Check failed: error == cudaSuccess (8 vs. 0) invalid device function

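I was using happynear's caffe-windows. When porting it, USE_CUDNN was missing from the configuration, so I got this error (both when converting the data and when running the model), and only then did I start looking into it.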

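When compiled for the highest compute capability, it reported invalid argument instead, which suggested the batch size was still too large and was exhausting GPU memory.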

invalid device function: most likely the compiled code exceeds what the GPU supports.
invalid argument: evidently the GPU ran out of memory.
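If we accept the post's reading that invalid argument came from memory pressure, the reason the batch size matters is that every blob's footprint grows linearly with it. The sketch below is hypothetical: `blob_bytes` and `largest_fitting_batch` are illustrative helpers, not Caffe API, and they count only a single float32 blob, ignoring weights, gradients, and any cuDNN workspace.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one float32 blob of shape N x C x H x W.
// Memory grows linearly with the batch size N.
std::size_t blob_bytes(std::size_t batch, std::size_t channels,
                       std::size_t height, std::size_t width) {
    return batch * channels * height * width * sizeof(float);
}

// Halve the batch size until the estimated footprint fits the budget.
// per_sample_bytes and budget_bytes are assumed inputs supplied by the
// caller; nothing here queries the actual GPU.
std::size_t largest_fitting_batch(std::size_t batch,
                                  std::size_t per_sample_bytes,
                                  std::size_t budget_bytes) {
    assert(per_sample_bytes > 0);
    while (batch > 1 && batch * per_sample_bytes > budget_bytes) {
        batch /= 2;
    }
    return batch;
}
```

For example, a 64 x 3 x 227 x 227 input blob alone takes 39,574,272 bytes (about 37.7 MiB), and doubling the batch doubles that figure for every blob in the network.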

Others online have run into the same Check failed: error == cudaSuccess (8 vs. 0) invalid device function, and the gist is that you simply need to set your GPU's compute capability correctly.

After making that change and running the model, I got loss=1.#QNAN (the way older Visual C++ runtimes print a quiet NaN).

So I looked it up online:

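QNAN stands for Quiet Not a Number; the related floating-point error value is SNaN (Signaling Not a Number). Invalid operations such as 0.0/0.0 or taking the square root of a negative number produce a NaN. NaN and its two kinds are explained as follows: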
The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaN’s are represented by a bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set. QNaN’s propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined.
An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaN’s can be handy to assign to uninitialized variables to trap premature usage.
Semantically, QNaN’s denote indeterminate operations, while SNaN’s denote invalid operations. If a return value is a QNaN, it means that it is impossible to determine the result of the operation, a SNaN means that the operation is invalid.
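The quoted rule is about the bit pattern, so it can be checked directly on a 64-bit double: all exponent bits set plus a non-zero fraction means NaN, and the most significant fraction bit (bit 51) separates quiet from signalling. This is a minimal sketch assuming IEEE 754 binary64 and the quiet-bit convention described in the quote (used by x86 and ARM); the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a double's bits without undefined behaviour.
std::uint64_t bits_of(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL;  // 11 exponent bits
constexpr std::uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL;  // 52 fraction bits
constexpr std::uint64_t kQuietBit     = 0x0008000000000000ULL;  // top fraction bit

// NaN: exponent all 1s and a non-zero fraction.
bool is_nan_bits(std::uint64_t u) {
    return (u & kExponentMask) == kExponentMask && (u & kFractionMask) != 0;
}

// QNaN: NaN with the most significant fraction bit set.
bool is_quiet_nan_bits(std::uint64_t u) {
    return is_nan_bits(u) && (u & kQuietBit) != 0;
}

// SNaN: NaN with the most significant fraction bit clear.
bool is_signaling_nan_bits(std::uint64_t u) {
    return is_nan_bits(u) && (u & kQuietBit) == 0;
}

// The NaN handed out by the standard library should classify as quiet.
bool quiet_nan_classifies_as_quiet() {
    return is_quiet_nan_bits(bits_of(std::numeric_limits<double>::quiet_NaN()));
}
```

Note that 0x7FF0000000000000 (exponent all 1s, zero fraction) is not a NaN at all: it is positive infinity.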
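Other special floating-point values include INF and IND. INF means infinity, positive or negative: 1.0/0.0 gives positive infinity and -1.0/0.0 gives negative infinity. IND stands for "indeterminate"; it is how the Visual C++ runtime prints the default NaN produced by invalid operations (for example -1.#IND).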
Put simply, a QNaN (Quiet Not a Number) is a NaN that does not trigger a floating-point exception, and NaN itself means "not a number". You usually get one from:

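+/- infinity divided by +/- infinity
adding infinities of opposite sign (or subtracting infinities of the same sign)
sqrt of a negative number

and so on.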
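These cases are easy to reproduce in standard C++. The sketch below, with hypothetical helper names, shows which operations yield NaN and which yield infinity, plus a `loss_is_usable` check of the kind a training loop could apply to every reported loss so it stops at the first NaN instead of printing 1.#QNAN for many iterations. It assumes IEEE 754 doubles and no -ffast-math style flags, which let the compiler assume NaN never occurs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Positive infinity, used by the infinity-based examples below.
constexpr double kInf = std::numeric_limits<double>::infinity();

// 0.0 / 0.0: `zero` is passed at run time so the division is not
// folded as a constant expression.
double nan_from_zero_div(double zero) {
    assert(zero == 0.0);
    return zero / zero;
}

// +/- infinity divided by +/- infinity.
double nan_from_inf_div() { return kInf / -kInf; }

// Adding infinities of opposite sign.
double nan_from_inf_add() { return kInf + (-kInf); }

// Square root of a negative number.
double nan_from_sqrt_neg(double x) {
    assert(x < 0.0);
    return std::sqrt(x);
}

// A non-zero value divided by zero gives a signed infinity, not NaN.
double inf_from_div(double numerator, double zero) { return numerator / zero; }

// Rejects both NaN and infinite losses.
bool loss_is_usable(double loss) { return std::isfinite(loss); }
```

Adding two infinities of the same sign, by contrast, is well defined and stays infinite.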

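My problem was most likely that my card is a GTX 960M, whereas the reference I had followed listed the Code Generation values for common GPUs.

However, laptop GPUs differ in compute capability from their desktop counterparts, and that was the problem. In the caffelib project, open Configuration Properties > CUDA C/C++ > Device and change Code Generation to compute_50,sm_50; after that it worked.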
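Why compute_50,sm_50 helps: the GTX 960M is a Maxwell GM107 part with compute capability 5.0, while the desktop GTX 960 reports 5.2. If a build targets only 5.2, neither its sm_52 machine code nor compute_52 PTX can run on a 5.0 device, so the runtime reports invalid device function. The helpers below are hypothetical and only format the setting from a capability you supply; on a real system the major/minor numbers would come from `cudaGetDeviceProperties` (the `major` and `minor` fields of `cudaDeviceProp`) or the CUDA deviceQuery sample.

```cpp
#include <cassert>
#include <string>

// Visual Studio "Code Generation" entry for compute capability major.minor,
// e.g. 5.0 -> "compute_50,sm_50".
std::string vs_code_generation(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor < 10);
    const std::string cc = std::to_string(major * 10 + minor);
    return "compute_" + cc + ",sm_" + cc;
}

// The equivalent nvcc command-line flag for the same capability.
std::string nvcc_gencode_flag(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor < 10);
    const std::string cc = std::to_string(major * 10 + minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For a GTX 960M, `vs_code_generation(5, 0)` gives the compute_50,sm_50 value used above; a desktop GTX 960 would get compute_52,sm_52 instead.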

Honestly, at this rate I'll probably end up hitting every Caffe error there is.

posted @ 2016-04-14 09:39 菜鸡一枚
