如何找bug

将发送端sleep时间定为0, 即sleep(0).

事例率也在700Hz持续3分钟后降为0.查看log文件:

1. SFI 的log文件:

WARNING 2017-Feb-08 15:39:55 [DC::StatusWord EventAssembly::EventCompleted(...) at SFI/src/EventAssembly.cxx:480] Problem with the flow of data: Event with LVL1ID 130745 misses 1 data fragment(s) from: ROS-Eth-07,
WARNING 2017-Feb-08 15:39:55 [DC::StatusWord EventAssembly::EventCompleted(...) at SFI/src/EventAssembly.cxx:480] Problem with the flow of data: Event with LVL1ID 130749 misses 1 data fragment(s) from: ROS-Eth-07,
WARNING 2017-Feb-08 15:39:55 [DC::StatusWord EventAssembly::EventCompleted(...) at SFI/src/EventAssembly.cxx:480] Problem with the flow of data: Event with LVL1ID 130753 misses 1 data fragment(s) from: ROS-Eth-07,

 

2. ROS 的log文件:

Warning could not load libEthClientSequentialReadoutModule.so, trying libROSEthClientSequentialReadoutModule.so
ERROR 2017-Feb-08 15:44:10 [virtual void ROS::RequestHandler::run(...) at ROSCore/src/RequestHandler.cpp:274] Unclassified error: RequestHandler caught exception, deleting Request
was caused by: ERROR 2017-Feb-08 15:44:10 [unknown at unknown:0] No pages available

 

找到ROSCore/src/RequestHandler.cpp:274:

  catch (std::exception& issue) {
274          ENCAPSULATE_ROS_EXCEPTION(newIssue, CoreException, UNCLASSIFIED, issue, "RequestHandler caught exception, deleting Request    ");  
275          ers::error(newIssue);
276          deleteRequest();
277       }

先找到 ENCAPSULATE_ROS_EXCEPTION的定义:并没有看出啥,只看出来例外的类型是CoreException, errorCode是UNCLASSIFIED.

/** \def ENCAPSULATE_ROS_EXCEPTION(instanceName, exceptionClass, errorCode, oldException, messageContent)
    Macro to encapsulate an exception inside a ROSException along with
    some explanatory text.
*/
#define ENCAPSULATE_ROS_EXCEPTION(instanceName, exceptionClass, errorCode, oldException, messageContent) \
    std::ostringstream instanceName##_tStream; \
    instanceName##_tStream << messageContent; \
    exceptionClass instanceName(oldException, exceptionClass::errorCode, instanceName##_tStream.str(), ERS_HERE)

#endif

然后再查看try里面的语句

243       try {
244          TS_RECORD(TS_H1,stamp_offset + 2100);
245          int requestOk;
246          //unsigned int tsdata = m_handlerId * 100 + 3000000000;
247          //TS_RECORD(TS_H5,tsdata);
248 
249          requestOk = m_request->execute();
250          TS_RECORD(TS_H1,stamp_offset + 2200);
251 
252          if (requestOk == Request::REQUEST_OK) {
253             m_requestsHandled++;
254             deleteRequest();
255             TS_RECORD(TS_H1, stamp_offset + 2300);
256          }
257          else if (requestOk == Request::REQUEST_TIMEOUT) {
258             m_requestsTimedOut++;
259             deleteRequest();
260             TS_RECORD(TS_H1, stamp_offset + 2300);
261          }
262          else {
263             TS_RECORD(TS_H1, stamp_offset + 2400);
264             DEBUG_TEXT(DFDB_ROSCORE, 15, "RequestHandler(" << m_handlerId <<")::run: Request non completed: will be requeued.");
265             m_requestsFailed++;
266       m_request = m_requestQueue->swap(m_request);
267             DFThread::yieldOrCancel();
268             TS_RECORD(TS_H1, stamp_offset + 2500);
269          }
270 
271          TS_RECORD(TS_H1, stamp_offset + 2990);
272       }

整个try...catch语句是在requesthandler的run函数里面,run函数是requesthandler的执行线程,从request queue里面拿出一个request, 执行这个request,然后再删除这个request.

通过加打印得到:try语句里执行了第一个if语句,证明request被执行后返回了REQUEST_OK,在执行if内部的语句deleteRequest();时,出现了例外。

再查看deleteRequest()函数里面写了什么:

193 void RequestHandler::deleteRequest()
194 {
195   if (m_request != 0) {
197     m_mutex->lock() ;   // Memory deallocation has to be protected as
198        delete(m_request);    // it is thread-unsafe!
199        m_request=0;
200        m_mutex->unlock();
201    }
202 }

在 if 里面每条语句里面加打印,在requesthandler的析构函数里面加打印,并查看log文件

void RequestHandler::deleteRequest()
194 {
195   if (m_request != 0) {
196     std::cout << "m_request is not 0:1!" << std::endl;
197     m_mutex->lock() ;   // Memory deallocation has to be protected as
198     std::cout << "m_request is not 0:2!" << std::endl;
199     delete(m_request);    // it is thread-unsafe!
200     std::cout << "m_request is not 0:3!" << std::endl;
201     m_request=0;
202     std::cout << "m_request is not 0:4!" << std::endl;
203     m_mutex->unlock();
204     std::cout << "m_request is not 0:5!" << std::endl;
205   }
206 }

查看ROS-Eth-04的log文件,发现:

[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:1" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261133
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:2" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261239

[lhaaso@cmm03node01 part_dk_ef]$ grep "~Request" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
130624
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:3" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261239
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:4" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261239
[lhaaso@cmm03node01 part_dk_ef]$ grep "m_request is not 0:5" ROS-Eth-04_cmm03node01_1486545277.out |wc -l
261238

这几条语句的执行次数,发现deleteRequest操作的次数比其前后打印的次数要少很多,证明有很多次delteRequest并没有成功,抛出例外,并导致程序报错了。

 (批注:实际上是Request有两个子类,FragmentRequest 和 ReleaseFragment类, RequestHandler执行的时候根据收到SFI的消息种类判断执行哪一种Request。)

posted @ 2017-02-08 16:52  小荷才楼尖尖角  Views(313)  Comments(0Edit  收藏  举报