Word2vec multithreading (TensorFlow)

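workers = []
# Start one training thread per concurrent step.
# (xrange is imported from six.moves in word2vec.py.)
for _ in xrange(opts.concurrent_steps):
    t = threading.Thread(target=self._train_thread_body)
    t.start()
    workers.append(t)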

   

   

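word2vec.py uses multiple threads.

It is commonly held that Python threads effectively run one at a time: by design, CPython's memory management is not thread-safe, so the GIL (global interpreter lock) lets only one thread execute Python bytecode at any moment.

Here, however, each thread spends its time inside C++ code called from Python, which can run without holding the GIL, so the threads still provide real parallelism.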
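
A rough way to see this effect is to time several threads that each make a blocking call implemented in C. The sketch below uses `time.sleep` as a stand-in for a long native call such as `session.run` (an assumption for illustration only, not what word2vec.py does). Because the call releases the GIL while it waits, four threads should finish in roughly the time of one call rather than four.

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep is implemented in C and releases the GIL while waiting;
    # it stands in here for a long-running native call.
    time.sleep(seconds)


def run_in_threads(num_threads, seconds):
    """Start num_threads workers, wait for all, return elapsed seconds."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(num_threads=4, seconds=0.2)
print("4 threads x 0.2 s took %.2f s" % elapsed)
```

If the calls were serialized, the elapsed time would be close to 0.8 s; pure-Python loops in place of `time.sleep` would behave that way.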

   

Inside the class that implements the word2vec skipgram operator, conflicts between threads are resolved with a lock:

mutex mu_;
random::PhiloxRandom philox_ GUARDED_BY(mu_);
random::SimplePhilox rng_ GUARDED_BY(mu_);
int32 current_epoch_ GUARDED_BY(mu_) = -1;
int64 total_words_processed_ GUARDED_BY(mu_) = 0;
int32 example_pos_ GUARDED_BY(mu_);
int32 label_pos_ GUARDED_BY(mu_);
int32 label_limit_ GUARDED_BY(mu_);

   

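My impression is that, because of the lock, the operator's own work is still serialized: only one thread at a time can be inside it.

The batch computation that follows it runs in parallel.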

def _train_thread_body(self):
    initial_epoch, = self._session.run([self._epoch])
    while True:
        _, epoch = self._session.run([self._train, self._epoch])
        if epoch != initial_epoch:
            break
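
The combination described here, a batch source whose state is guarded by a lock and several worker threads that each stop once the epoch counter changes, can be sketched in plain Python. Everything in this sketch (`BatchProducer`, `train_worker`, the toy corpus, summing a batch as the "computation") is hypothetical and only mirrors the structure of the snippets; it is not TensorFlow's implementation.

```python
import threading


class BatchProducer:
    """Hands out batches from a fixed corpus; mutable state is lock-guarded."""

    def __init__(self, corpus, batch_size):
        self._corpus = corpus
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._pos = 0    # guarded by self._lock
        self._epoch = 0  # guarded by self._lock

    def current_epoch(self):
        with self._lock:
            return self._epoch

    def next_batch(self):
        # Only one thread at a time may advance the position, so this
        # step is serialized across workers.
        with self._lock:
            if self._pos >= len(self._corpus):
                self._pos = 0
                self._epoch += 1
            batch = self._corpus[self._pos:self._pos + self._batch_size]
            self._pos += self._batch_size
            return self._epoch, batch


def train_worker(producer, initial_epoch, results, results_lock):
    while True:
        epoch, batch = producer.next_batch()
        if epoch != initial_epoch:
            break  # the epoch rolled over: this worker is done
        total = sum(batch)  # placeholder for the per-batch computation
        with results_lock:
            results.append(total)


corpus = list(range(100))
producer = BatchProducer(corpus, batch_size=8)
results = []
results_lock = threading.Lock()
initial_epoch = producer.current_epoch()

workers = [threading.Thread(target=train_worker,
                            args=(producer, initial_epoch, results, results_lock))
           for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(len(results), sum(results))  # prints: 13 4950
```

Each batch of the first epoch is handed to exactly one worker, so no example is lost or processed twice, while the per-batch work happens outside the producer's lock.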

   

(words, counts, words_per_epoch, self._epoch, self._words, examples,
 labels) = word2vec.skipgram(filename=opts.train_data,
                             batch_size=opts.batch_size,
                             window_size=opts.window_size,
                             min_count=opts.min_count,
                             subsample=opts.subsample)
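
To make the `examples` and `labels` outputs more concrete, here is a minimal sketch of skip-gram pair generation with a fixed window. The function `skipgram_pairs` is a hypothetical helper, not the TensorFlow op: the real op also builds the vocabulary, applies `min_count` and subsampling, works on word ids, and returns fixed-size batches, none of which is modelled here.

```python
def skipgram_pairs(tokens, window_size):
    """Return (example, label) pairs for each word and its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                # center is the example, the neighbouring word is the label
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window_size=1)
for example, label in pairs:
    print(example, "->", label)
```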

   

   

   

The threading lock only affects Python code. If your thread is waiting for disk I/O or if it is calling C functions (e.g. via math library) you can ignore the GIL.

You may be able to use the async pattern to get around threading limits. Can you supply more information about what your program actually does?

I have issues with the technical accuracy of the video linked. David Beazley has done many well respected talks about the GIL at various Pycons. You can find them on pyvideo.org.

   

Source: <https://www.reddit.com/r/Python/comments/3s0vg9/is_my_multithreaded_python_program_doomed/>

   

   

posted @ 2015-12-16 20:17  阁子