[nginx] async_mode_nginx CPU 100% deadlock analysis

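Unfortunately, I only narrowed the problem down to a fairly small scope and worked out the root cause. I found neither the boundary conditions needed to reproduce it nor a solution.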

Hi all, I have much the same problem with the latest software versions:
async_nginx: 0.4.5
openssl: 1.1.1k
qatengine: 0.6.4
qatdriver: 1.7.l.4.13.0.9

To reproduce, use these config values in nginx.conf:
default_algorithms CIPHERS
qat_poll_mode heuristic

I debugged async_nginx and found an infinite loop, which I think is the cause.

1. In ngx_http_do_read_client_request_body(), nginx enters the for(;;) loop [line:288] and never breaks out,
because recv() [line:343] always returns NGX_AGAIN while c->read->ready is always 1.
Going deeper into recv(), the NGX_AGAIN is returned by ngx_ssl_handle_recv() [line:2546] because the async job is paused.
2. When the async context is swapped, another infinite loop occurs in qat_chained_ciphers_do_cipher() [line:1554],
because the read() [qat_pause_job(): line:279] always returns EAGAIN.
3. As far as I know, qat_crypto_callbackFn() is called by qat_engine_poll(). I think this happens because the callback qat_crypto_callbackFn() never gets any CPU time to run, so the paused async job is never woken up.
I then checked the polling logic in async_nginx and found point 4, described below.
4. In ngx_ssl_engine_qat_heuristic_poll(), none of the six num_* variables ever increases, so qat_engine_poll() never gets a chance to execute.

When I change the engine config in nginx.conf, the issue disappears, so I have a workaround. The config is:
qat_heuristic_poll_asym_threshold = 0
qat_heuristic_poll_sym_threshold = 0

This looks like a logical deadlock: nginx waits for QAT to update the counters, but the counters are only updated if nginx gives up some CPU time.
Or maybe the following code does not account for SSL connections that stay idle for a long time?
if (*num_asym_requests_in_flight + *num_kdf_requests_in_flight
+ *num_cipher_requests_in_flight + *num_asym_mb_items_in_queue
+ *num_kdf_mb_items_in_queue + *num_sym_mb_items_in_queue
>= (int) *ngx_ssl_active) {

Does anyone have any idea about this?
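To make the suspected deadlock easier to reason about, here is a minimal C sketch of the polling decision described in point 4. It is a model, not the actual asynch_mode_nginx code: the struct `qat_counters_model` and the function `model_should_poll` are hypothetical names. The first check mirrors the condition quoted in the report. The two threshold checks are an assumption about how `qat_heuristic_poll_asym_threshold` and `qat_heuristic_poll_sym_threshold` are applied, and the real grouping of counters may differ.

```c
#include <assert.h>

/* Hypothetical stand-in for the six num_* counters named in the report. */
typedef struct {
    int asym_requests_in_flight;
    int kdf_requests_in_flight;
    int cipher_requests_in_flight;
    int asym_mb_items_in_queue;
    int kdf_mb_items_in_queue;
    int sym_mb_items_in_queue;
} qat_counters_model;

/* Sum of all pending work known to the model. */
int model_total_pending(const qat_counters_model *c)
{
    return c->asym_requests_in_flight + c->kdf_requests_in_flight
         + c->cipher_requests_in_flight + c->asym_mb_items_in_queue
         + c->kdf_mb_items_in_queue + c->sym_mb_items_in_queue;
}

/*
 * Returns 1 if this model would call the poll routine on this pass,
 * 0 otherwise.
 */
int model_should_poll(const qat_counters_model *c, int ssl_active,
                      int asym_threshold, int sym_threshold)
{
    assert(ssl_active >= 0);

    /* Condition quoted in the report: pending work covers every
     * active SSL connection. */
    if (model_total_pending(c) >= ssl_active) {
        return 1;
    }

    /* ASSUMPTION: poll once the asymmetric or symmetric load reaches
     * its configured threshold. The real code may differ. */
    if (c->asym_requests_in_flight + c->asym_mb_items_in_queue
            >= asym_threshold) {
        return 1;
    }
    if (c->cipher_requests_in_flight + c->sym_mb_items_in_queue
            >= sym_threshold) {
        return 1;
    }

    return 0;
}
```

Under these assumptions, if all six counters are stuck at zero while several SSL connections are active, `model_should_poll` returns 0 for any positive thresholds, so the poll routine never runs and the paused job is never resumed. With both thresholds set to 0, the comparison `0 >= 0` holds and the model polls on every pass, which is consistent with the workaround reported above.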

For details, see: https://github.com/intel/QAT_Engine/issues/181

Posted by toong
