Error when doing OCR with Keras: ValueError: Tensor Tensor is not an element of this graph
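Symptoms
The project uses Flask + Keras + TensorFlow.
The same code runs fine on machines A and B, but on machine C it raises the exception below. The environments on A and B were installed first; I only tried running on C after everything had been run and debugged successfully on A and B.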
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/keras/models.py", line 1025, in predict
    steps=steps)
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1832, in predict
    self._make_predict_function()
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1029, in _make_predict_function
    **kwargs)
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2502, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2445, in __init__
    with tf.control_dependencies(self.outputs):
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4863, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4481, in control_dependencies
    c = self.as_graph_element(c)
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/Users/qhl/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3557, in _as_graph_element_locked
    raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("Output/Softmax:0", shape=(?, 3062), dtype=float32) is not an element of this graph.
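Finding the cause
Since TensorFlow may use the GPU, my first suspicion was the graphics card or its driver. On machine C I tried the following:
- Reinstalled both the CPU and GPU builds of TensorFlow
- Reinstalled the graphics driver and CUDA
- Switched the operating system between Windows, Deepin, and Ubuntu
- Re-synced the application code several times to make sure it was identical on machines A, B, and C
- Checked the version numbers of TensorFlow, Keras, and h5py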
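The same error kept appearing. After stepping through the code and searching online, I found that it comes from how TensorFlow behaves in multithreaded mode, which looks like a bug: TensorFlow 1.x keeps a separate default graph for each thread, so a model loaded in one thread is not part of the default graph seen by another thread. Flask's latest release at the time (1.0.2) runs its server in multithreaded mode by default, whereas earlier releases defaulted to single-threaded mode. As luck would have it, Flask 1.0 was released just before I set up the environment on machine C; when I set up A and B, the current version was still 0.12.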
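To make the per-thread behavior concrete, here is a toy sketch that uses only the standard library. It is not TensorFlow code: `ToyGraph`, `get_default_graph`, and `use_tensor` are hypothetical stand-ins that mimic a default graph stored in thread-local state, on the assumption that this is what triggers the error above.

```python
import threading

# Toy stand-in for a per-thread default graph (NOT TensorFlow's implementation).
_state = threading.local()


class ToyGraph:
    def __init__(self):
        self.tensors = set()


def get_default_graph():
    # Each thread lazily creates its own default graph on first access.
    if not hasattr(_state, "graph"):
        _state.graph = ToyGraph()
    return _state.graph


def use_tensor(name):
    # Fails when the tensor was registered in another thread's graph.
    if name not in get_default_graph().tensors:
        raise ValueError("Tensor %s is not an element of this graph." % name)
    return "ok"


# "Load the model" in the main thread: the tensor lands in the main thread's graph.
get_default_graph().tensors.add("Output/Softmax:0")
main_result = use_tensor("Output/Softmax:0")


def handle_request(results):
    # Simulates a request handled in a worker thread, as a threaded server does.
    try:
        results.append(use_tensor("Output/Softmax:0"))
    except ValueError as exc:
        results.append(str(exc))


results = []
worker = threading.Thread(target=handle_request, args=(results,))
worker.start()
worker.join()
print(main_result, results)
```

The main thread finds the tensor, while the worker thread sees a fresh, empty graph and raises the same kind of ValueError. With a single-threaded server every request runs in the thread that loaded the model, which would explain why machines A and B never hit the problem.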
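Solutions
1. Switching the Keras backend to Theano is said to fix it. I have not tried this.
2. Explicitly set the current default graph. There is a long discussion worth reading here: https://github.com/keras-team/keras/issues/2397 . In practice: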
After loading or building your model, add:
    graph = tf.get_default_graph()
Then, before calling model.predict():
    global graph
    with graph.as_default():
        (... do inference here ...)
With this change the model works in multithreaded mode.
3. Alternatively, force Flask back into single-threaded mode:
    if __name__ == '__main__':
        app.run(host="0.0.0.0", port=8080, threaded=False)
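Option 2 can also be wrapped in a small helper so that every request handler enters the right graph without relying on a global. The sketch below is only an illustration: `predict_in_graph`, `FakeGraph`, and `FakeModel` are hypothetical names, and the fake classes exist solely so the snippet runs without TensorFlow installed. The helper assumes nothing beyond a graph object with an `as_default()` context manager and a model with `predict()`.

```python
import contextlib


def predict_in_graph(model, graph, batch):
    """Run model.predict(batch) with `graph` as the current default graph."""
    with graph.as_default():
        return model.predict(batch)


# --- Stand-ins for demonstration only (hypothetical, not Keras/TensorFlow) ---
class FakeGraph:
    def __init__(self):
        self.active = False

    @contextlib.contextmanager
    def as_default(self):
        # Mark the graph as the default for the duration of the block.
        self.active = True
        try:
            yield self
        finally:
            self.active = False


class FakeModel:
    def __init__(self, graph):
        self.graph = graph

    def predict(self, batch):
        # Mimics the failure mode: predicting outside the model's graph fails.
        if not self.graph.active:
            raise ValueError("Tensor is not an element of this graph.")
        return [x * 2 for x in batch]


graph = FakeGraph()
model = FakeModel(graph)
prediction = predict_in_graph(model, graph, [1, 2, 3])
print(prediction)
```

In the real application, `graph` would be the value saved with `tf.get_default_graph()` right after the model is loaded, and the Flask view would call `predict_in_graph(model, graph, batch)` in place of `model.predict(batch)`.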