rodenpark


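I finally got TensorFlow installed over the past few days. I ran into quite a few problems along the way, so I am recording them here as a reference for anyone who wants to install from source.

Environment: POWER8 processor, Ubuntu 14.04 image running in a Docker container.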

Build TensorFlow for IBM POWER8 CPU from Source Code

1. My OS environment
  14.04.1-Ubuntu SMP
  ppc64le
  gcc 4.8.4
  python 2.7.6
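
The values listed above can be collected with a few standard commands. The following is a minimal sketch; the helper name print_build_env is hypothetical, and it skips any tool that is not installed instead of failing.

```shell
#!/bin/sh
# Print the facts that matter for a source build: kernel version, CPU
# architecture, and the compiler and Python versions (only if on PATH).
print_build_env() {
    echo "kernel:  $(uname -v)"
    echo "arch:    $(uname -m)"
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc:     $(gcc -dumpversion)"
    fi
    if command -v python >/dev/null 2>&1; then
        # Python 2 prints its version to stderr, so merge the streams.
        echo "python:  $(python --version 2>&1)"
    fi
}

print_build_env
```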

2. Install bazel and protobuf
  I only have openjdk-7, so I installed bazel 0.1.0. Bazel 0.1.0 needs protobuf v3.0.0-alpha-3; refer to "Build Bazel<v0.1.0> for IBM POWER8 CPU from Source Code" for the installation.

3. Install other dependencies
  sudo apt-get install python-pip python-dev python-numpy
  sudo apt-get install swig

4. Get the source code
  git clone --recurse-submodules https://github.com/tensorflow/tensorflow

5. Modify ~/.bazelrc
  Add the build options below (see http://bazel.io/docs/bazel-user-manual.html for descriptions of these options):
  to build in standalone mode: --spawn_strategy=standalone --genrule_strategy=standalone
  to limit CPU and RAM usage: --jobs=20 --ram_utilization_factor=30
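
As a rough illustration of the options above, the sketch below appends them to a bazelrc file. It assumes the usual bazelrc convention of prefixing each line with the command it applies to (build here). The function name append_bazel_options is hypothetical, and the values 20 and 30 are simply the ones used in this post, not recommendations.

```shell
#!/bin/sh
# Append the build options from this post to a bazelrc file.
# $1 is the file to modify; it defaults to ~/.bazelrc.
append_bazel_options() {
    rc_file="${1:-$HOME/.bazelrc}"
    cat >> "$rc_file" <<'EOF'
# Run actions as plain local processes (standalone strategy).
build --spawn_strategy=standalone --genrule_strategy=standalone
# Cap parallel jobs and the share of RAM Bazel assumes it may use.
build --jobs=20 --ram_utilization_factor=30
EOF
}
```

Calling append_bazel_options with no argument writes to ~/.bazelrc; passing a path lets you try it on a scratch file first.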

6. Build the source code
  ./configure (select GPU or CPU)
  bazel build -c opt //tensorflow/cc:tutorials_example_trainer

7. Create the pip package and install
7.1 generate the tensorflow whl package
  If you want to use tensorflow from python, a pip package must be created:
  $ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
  # or build with GPU support:
  $ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
  After running overnight (about 9 hours), the build printed:
  Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
  INFO: Elapsed time: 32556.820s, Critical Path: 31793.39s

  Then run the generated script to write the whl package to /tmp/tensorflow_pkg:
  $ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

7.2 tensorflow whl package path
  opuser@nova:~/tensorflow/tensorflow$ ls /tmp/tensorflow_pkg/
  tensorflow-0.5.0-cp27-none-linux_ppc64le.whl
7.3 install whl package using pip
  opuser@nova:~/tensorflow/tensorflow$ sudo pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_ppc64le.whl
7.4 tensorflow installed package path
  opuser@nova:~/tensorflow/tensorflow/tensorflow/models/image/mnist$ ls /usr/local/lib/python2.7/dist-packages
  tensorflow tensorflow-0.5.0.dist-info
7.5 train on the MNIST dataset (sudo is needed)
  # You can alternatively pass the path to the model program file to the python interpreter.
  opuser@nova:~$ sudo python /usr/local/lib/python2.7/dist-packages/tensorflow/models/image/mnist/convolutional.py
  Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
  Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
  Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
  Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
  Extracting data/train-images-idx3-ubyte.gz
  Extracting data/train-labels-idx1-ubyte.gz
  Extracting data/t10k-images-idx3-ubyte.gz
  Extracting data/t10k-labels-idx1-ubyte.gz
  can't determine number of CPU cores: assuming 4
  I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
  can't determine number of CPU cores: assuming 4
  I tensorflow/core/common_runtime/direct_session.cc:60] Direct session inter op parallelism threads: 4
  Initialized!
  Epoch 0.00
  Minibatch loss: 12.054, learning rate: 0.010000
  Minibatch error: 90.6%
  Validation error: 84.6%
  Minibatch loss: 3.289, learning rate: 0.010000
  ......


8. Problems during compiling
  Error: gcc: internal compiler error: Killed, com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
  This error is caused by running out of RAM or swap. You can lower the --jobs or --ram_utilization_factor value, or check whether another process is occupying a large amount of RAM and kill it. In my case two bazel servers were running at the same time, so I had to kill one of them.
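
One way to pick a safer --jobs value is to derive it from the memory that is actually free. The sketch below is only a heuristic: the helper suggest_jobs is hypothetical, and the budget of about 2 GB per compiler job is an assumption, not a figure from Bazel or TensorFlow. The example at the end reads MemFree from /proc/meminfo, so it only runs on Linux.

```shell
#!/bin/sh
# Suggest a --jobs value: the smaller of the core count and the number of
# jobs that fit in free memory, but never less than 1.
# $1 = free memory in kB, $2 = CPU cores, $3 = assumed kB needed per job.
suggest_jobs() {
    by_mem=$(( $1 / $3 ))
    n=$2
    if [ "$by_mem" -lt "$n" ]; then
        n=$by_mem
    fi
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example on Linux: MemFree from /proc/meminfo, 2 GB (2097152 kB) per job.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
    echo "suggested: --jobs=$(suggest_jobs "$free_kb" "$(nproc)" 2097152)"
fi
```

If a leftover bazel server turns out to be the culprit, running bazel shutdown in its workspace stops that server cleanly before you resort to kill.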

9. References
tensorflow/tensorflow/g3doc/get_started/os_setup.md
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md

bazel-user-manual.html
http://bazel.io/docs/bazel-user-manual.html

CUDA or cuDNN version mismatch
https://github.com/tensorflow/tensorflow/issues/125

 

posted on 2015-11-30 17:38 rodenpark