Keras RetinaNet GitHub project

2018-07-27 17:33  Time皇族

Following the instructions at https://github.com/fizyr/keras-retinanet, I started training on the Pascal VOC 2007 dataset and got this error:

D:\JupyterWorkSpace\keras-retinanet>python D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py pascal D:\\JupyterWorkSpace\\VOCdevkit\\VOC2007 --steps 100
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 35, in <module>
    from .. import layers  # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\__init__.py", line 1, in <module>
    from ._misc import RegressBoxes, UpsampleLike, Anchors, ClipBoxes  # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\_misc.py", line 19, in <module>
    from ..utils import anchors as utils_anchors
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\utils\anchors.py", line 20, in <module>
    from ..utils.compute_overlap import compute_overlap
ModuleNotFoundError: No module named 'keras_retinanet.utils.compute_overlap'
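
The ModuleNotFoundError above means Python could not locate an importable compute_overlap module. The later log shows a compute_overlap.c being generated from it, so the module appears to be shipped as Cython source that must be compiled before it can be imported. A minimal sketch for checking whether the compiled module is visible (the helper name has_module is hypothetical, not part of keras-retinanet):

```python
import importlib.util


def has_module(dotted_name):
    """Return True if Python can locate the named module, False otherwise."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # A parent package (e.g. keras_retinanet itself) could not be imported.
        return False


# Run this from the environment that is used to launch train.py.
print(has_module("keras_retinanet.utils.compute_overlap"))
```

If this prints False, the usual remedy is to build the extension first, for example with python setup.py build_ext --inplace from the repository root, as the project's README suggests for running from a clone.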

  

In anchors.py, add the following before the line from ..utils.compute_overlap import compute_overlap:
import pyximport
pyximport.install()

Running it again produces the following error:

D:\JupyterWorkSpace\keras-retinanet>python D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py pascal D:\\JupyterWorkSpace\\VOCdevkit\\VOC2007 --steps 100
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
compute_overlap.c
C:\Users\Administrator\.pyxbld\temp.win-amd64-3.6\Release\pyrex\keras_retinanet\utils\compute_overlap.c(567): fatal error C1083: Cannot open include file: 'numpy/arrayobject.h': No such file or directory
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 423, in compile
    self.spawn(args)
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 542, in spawn
    return super().spawn(cmd)
  File "C:\ProgramData\Anaconda3\lib\distutils\ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "C:\ProgramData\Anaconda3\lib\distutils\spawn.py", line 38, in spawn
    _spawn_nt(cmd, search_path, dry_run=dry_run)
  File "C:\ProgramData\Anaconda3\lib\distutils\spawn.py", line 81, in _spawn_nt
    "command %r failed with exit status %d" % (cmd, rc))
distutils.errors.DistutilsExecError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 215, in load_module
    inplace=build_inplace, language_level=language_level)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 191, in build_module
    reload_support=pyxargs.reload_support)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyxbuild.py", line 102, in pyx_to_dll
    dist.run_commands()
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 194, in build_extensions
    self.build_extension(ext)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 425, in compile
    raise CompileError(msg)
distutils.errors.CompileError: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 35, in <module>
    from .. import layers  # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\__init__.py", line 1, in <module>
    from ._misc import RegressBoxes, UpsampleLike, Anchors, ClipBoxes  # noqa: F401
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\layers\_misc.py", line 19, in <module>
    from ..utils import anchors as utils_anchors
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\utils\anchors.py", line 22, in <module>
    from ..utils.compute_overlap import compute_overlap
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 458, in load_module
    language_level=self.language_level)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 231, in load_module
    raise exc.with_traceback(tb)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 215, in load_module
    inplace=build_inplace, language_level=language_level)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyximport.py", line 191, in build_module
    reload_support=pyxargs.reload_support)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyximport\pyxbuild.py", line 102, in pyx_to_dll
    dist.run_commands()
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "C:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 194, in build_extensions
    self.build_extension(ext)
  File "C:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "C:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 425, in compile
    raise CompileError(msg)
ImportError: Building module keras_retinanet.utils.compute_overlap failed: ["distutils.errors.CompileError: command 'C:\\\\Program Files (x86)\\\\Microsoft Visual Studio 14.0\\\\VC\\\\BIN\\\\x86_amd64\\\\cl.exe' failed with exit status 2\n"]
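
The fatal error C1083 in the log above says the compiler could not open numpy/arrayobject.h, which suggests the build was not given NumPy's header directory. A minimal sketch, assuming pyximport is kept as the import hook: pass numpy.get_include() through the setup_args parameter of pyximport.install. The helper name below is hypothetical, and this has not been verified on this particular Windows setup.

```python
def install_pyximport_with_numpy_headers():
    """Register the pyximport import hook with NumPy's header directory."""
    # Imported lazily so the helper can be defined even where the
    # packages are missing; both are needed when it is actually called.
    import numpy
    import pyximport  # ships with Cython

    # numpy.get_include() returns the directory containing numpy/arrayobject.h,
    # the header that the compiler reported as missing.
    pyximport.install(setup_args={"include_dirs": numpy.get_include()})
```

Called in anchors.py in place of the bare pyximport.install(), before the compute_overlap import, this hands the header path to the compiler. A working C compiler is still required.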

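  My guess is that calling C from Python is bug-prone on Windows. Fortunately, older versions of the Keras RetinaNet GitHub project do not call C, so I simply switched to an older version.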

       But then another problem appeared:

Limit:                  1551050342
InUse:                  1548747008
MaxInUse:               1549328640
NumAllocs:                    1403
MaxAllocSize:            119565824

2018-07-31 13:42:42.065436: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\bfc_allocator.cc:277] ****************************************************************************************************
2018-07-31 13:42:42.081028: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[1,512,100,101]
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
    status, run_metadata)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,512,100,101]
         [[Node: bn3a_branch2c/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res3a_branch2c/convolution, bn3a_branch2c/gamma/read, bn3a_branch2c/beta/read, bn3a_branch2c/moving_mean/read, bn3a_branch2c/moving_variance/read)]]
         [[Node: loss/add/_2253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8646_loss/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 443, in <module>
    main()
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 438, in main
    callbacks=callbacks,
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2672, in __call__
    return self._legacy_call(inputs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 2654, in _legacy_call
    **self.session_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,512,100,101]
         [[Node: bn3a_branch2c/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res3a_branch2c/convolution, bn3a_branch2c/gamma/read, bn3a_branch2c/beta/read, bn3a_branch2c/moving_mean/read, bn3a_branch2c/moving_variance/read)]]
         [[Node: loss/add/_2253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8646_loss/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'bn3a_branch2c/FusedBatchNorm', defined at:
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 443, in <module>
    main()
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 410, in main
    freeze_backbone=args.freeze_backbone
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\\train.py", line 87, in create_models
    model          = model_with_weights(backbone_retinanet(num_classes, modifier=modifier), weights=weights, skip_mismatch=True)
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\models\resnet.py", line 33, in retinanet
    return resnet_retinanet(*args, backbone=self.backbone, **kwargs)
  File "D:\\JupyterWorkSpace\\keras-retinanet\\keras_retinanet\\bin\..\..\keras_retinanet\models\resnet.py", line 75, in resnet_retinanet
    resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\models\_2d.py", line 188, in ResNet50
    return ResNet(inputs, blocks, numerical_names=numerical_names, block=keras_resnet.blocks.bottleneck_2d, include_top=include_top, classes=classes, *args, **kwargs)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\models\_2d.py", line 76, in ResNet
    x = block(features, stage_id, block_id, numerical_name=(block_id > 0 and numerical_names[stage_id]), freeze_bn=freeze_bn)(x)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\blocks\_2d.py", line 139, in f
    y = keras_resnet.layers.BatchNormalization(axis=axis, epsilon=1e-5, freeze=freeze_bn, name="bn{}{}_branch2c".format(stage_char, block_char))(y)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\Administrator\AppData\Roaming\Python\Python36\site-packages\keras_resnet\layers\_batch_normalization.py", line 17, in call
    return super(BatchNormalization, self).call(training=(not self.freeze), *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\normalization.py", line 178, in call
    return normalize_inference()
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\normalization.py", line 174, in normalize_inference
    epsilon=self.epsilon)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 1905, in batch_normalization
    is_training=False
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_impl.py", line 831, in fused_batch_norm
    name=name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 2033, in _fused_batch_norm
    is_training=is_training, name=name)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,512,100,101]
         [[Node: bn3a_branch2c/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](res3a_branch2c/convolution, bn3a_branch2c/gamma/read, bn3a_branch2c/beta/read, bn3a_branch2c/moving_mean/read, bn3a_branch2c/moving_variance/read)]]
         [[Node: loss/add/_2253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_8646_loss/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

  The hardware reported in cmd is as follows:

2018-07-31 13:41:32.294261: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2018-07-31 13:41:32.924335: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GT 740 major: 3 minor: 0 memoryClockRate(GHz): 1.0585
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.66GiB
2018-07-31 13:41:32.931511: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GT 740, pci bus id: 0000:01:00.0, compute capability: 3.0)

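  The error is most likely caused by the GPU having too little memory (the GeForce GT 740 reports 2.00GiB total and 1.66GiB free), so a better GPU is needed.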
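
The numbers in the log are consistent with this. A rough sketch of the arithmetic, assuming the failing tensor is float32 (the log shows T=DT_FLOAT) and taking Limit and InUse from the allocator statistics printed before the OOM message:

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element for float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Values copied from the allocator statistics in the log.
LIMIT = 1551050342   # bytes TensorFlow may use on this GPU
IN_USE = 1548747008  # bytes already allocated when the OOM occurred

requested = tensor_bytes((1, 512, 100, 101))  # shape from the OOM message
remaining = LIMIT - IN_USE

print(f"requested: {requested} bytes ({requested / 2**20:.1f} MiB)")
print(f"remaining: {remaining} bytes ({remaining / 2**20:.1f} MiB)")
print("fits" if requested <= remaining else "does not fit")
```

The tensor needs about 19.7 MiB while only about 2.2 MiB remained under the allocator's limit, so the allocation could not succeed on this card.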