Compiling and running TensorFlow word2vec
- The more complete (non-demo) word2vec implementation lives in
tensorflow/models/embedding/
- First, install Bazel, which is needed to build it
Bazel ships binary installers; here we use the 0.1.0 release:
https://github.com/bazelbuild/bazel/releases/download/0.1.0/bazel-0.1.0-installer-linux-x86_64.sh
The installer apparently needs to be run as root:
sh bazel-0.1.0-installer-linux-x86_64.sh
- Build word2vec
Following README.md:
bazel build -c opt tensorflow/models/embedding:all
- Download the training and evaluation data (a quick sanity check on the corpus follows these commands)
wget http://mattmahoney.net/dc/text8.zip -O text8.gz
gzip -d text8.gz -f
wget https://word2vec.googlecode.com/svn/trunk/questions-words.txt
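The statistics word2vec prints at startup (total words, unique words, "unique frequent words") come straight from this text8 file, so they can be sanity-checked up front. A minimal sketch, assuming the script's default --min_count of 5; the exact path depends on where you put the file (the run below reads it from ./data/):

from collections import Counter

MIN_COUNT = 5  # assumption: word2vec_optimized's default --min_count

with open("text8") as f:        # adjust the path if you moved it into ./data/
    words = f.read().split()

counts = Counter(words)
frequent = sum(1 for c in counts.values() if c >= MIN_COUNT)

print("words:", len(words))                 # the log below reports 17005207
print("unique words:", len(counts))         # the log below reports 253854
print("unique frequent words:", frequent)   # the log below reports 71290 (+ UNK)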
- Run word2vec
pwd
/home/users/chenghuige/other/tensorflow/bazel-bin/tensorflow/models/embedding
Then run:
./word2vec_optimized --train_data ./data/text8 --eval_data ./data/questions-words.txt --save_path ./data/result/
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 24
I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 24
I tensorflow/models/embedding/word2vec_kernels.cc:149] Data file: ./data/text8 contains 100000000 bytes, 17005207 words, 253854 unique words, 71290 unique frequent words.
Data file: ./data/text8
Vocab size: 71290 + UNK
Words per epoch: 17005207
Eval analogy file: ./data/questions-words.txt
Questions: 17827
Skipped: 1717
Epoch 1 Step 151381: lr = 0.023 words/sec = 25300
Eval 1419/17827 accuracy = 8.0%
Epoch 2 Step 302768: lr = 0.022 words/sec = 48503
Eval 2445/17827 accuracy = 13.7%
Epoch 3 Step 454147: lr = 0.020 words/sec = 46666
Eval 3211/17827 accuracy = 18.0%
Epoch 4 Step 605540: lr = 0.018 words/sec = 53928
Eval 3608/17827 accuracy = 20.2%
Epoch 5 Step 756907: lr = 0.017 words/sec = 81255
Eval 4081/17827 accuracy = 22.9%
Epoch 6 Step 908251: lr = 0.015 words/sec = 46954
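Each "Eval n/17827" line reports how many of the 17827 analogy questions (a:b :: c:?, e.g. "france paris russia moscow") the model currently answers correctly; the 1717 skipped questions contain a word outside the 71290-word vocabulary. To poke at the trained embeddings directly, rerun the same command with the --interactive flag, which (in this era's script) drops into an IPython shell after training with the trained model bound to a variable named model. A rough sketch of that session, assuming the analogy() and nearby() helpers defined in word2vec_optimized.py:

# Inside the --interactive IPython shell; `model` is assumed to be the trained Word2Vec object.
model.analogy('france', 'paris', 'russia')   # predicts the fourth analogy word, ideally 'moscow'
model.nearby(['proton', 'elephant'])         # prints each word's nearest neighbors by cosine similarity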