Pods in the Alibaba Cloud k8s front-end test environment fail to start because the CPU and memory limits are too low

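The image starts normally on a local machine but fails to start once it is deployed to Alibaba Cloud. For a brief moment right at container start the status is OOMKilled, and then the container restarts in an endless loop. The OOMKilled status only lasts a moment, so there is no screenshot of it.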
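Because the OOMKilled state is only visible briefly, one way to confirm it after the fact is to read each container's last termination record from the pod status. This is a minimal sketch that assumes kubectl access to the cluster; the function name, pod name and namespace are placeholders, not values from this environment.

```shell
#!/bin/sh
# Print the name, last termination reason and exit code of every
# container in a pod. Pod name and namespace are supplied by the
# caller; the namespace defaults to "default".
last_termination() {
    pod="$1"
    ns="${2:-default}"
    kubectl get pod "$pod" -n "$ns" \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.lastState.terminated.reason}{" "}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Usage (hypothetical names):
#   last_termination my-frontend-pod my-namespace
```

A container killed for exceeding its memory limit normally reports the reason OOMKilled, and the record stays in the pod status after the container has moved on to CrashLoopBackOff.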

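The CPU and memory resources are configured in Alibaba Cloud EDAS. Before the developers' latest code update, the project had always run stably.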

#Startup script
[root@deploy-nb-63 nuxt_talent_mobile]# cat run.sh 
#!/bin/sh
cd /nuxtjs && npm run start:test

 

Errors reported by Alibaba Cloud ACK:

Initialized: True
Ready: False
ContainersReady: False
PodScheduled: True
CrashLoopBackOff

 

#Error log as shown in ACK

#Contents of the npm debug log file
/nuxtjs # cat /root/.npm/_logs/2023-01-05T03_03_09_262Z-debug.log
0 info it worked if it ends with ok
1 verbose cli [ '/usr/local/bin/node', '/usr/local/bin/npm', 'run', 'start:test' ]
2 info using npm@6.14.15
3 info using node@v14.18.3
4 verbose run-script [ 'prestart:test', 'start:test', 'poststart:test' ]
5 info lifecycle nuxt_talent@1.0.0~prestart:test: nuxt_talent@1.0.0
6 info lifecycle nuxt_talent@1.0.0~start:test: nuxt_talent@1.0.0
7 verbose lifecycle nuxt_talent@1.0.0~start:test: unsafe-perm in lifecycle true
8 verbose lifecycle nuxt_talent@1.0.0~start:test: PATH: /usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/nuxtjs/node_modules/.bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
9 verbose lifecycle nuxt_talent@1.0.0~start:test: CWD: /nuxtjs
10 silly lifecycle nuxt_talent@1.0.0~start:test: Args: [
10 silly lifecycle '-c',
10 silly lifecycle 'cross-env NODE_ENV=development ENV=development HOST=0.0.0.0 PORT=9700 node --max-old-space-size=4096 ./start.js'
10 silly lifecycle ]
11 silly lifecycle nuxt_talent@1.0.0~start:test: Returned: code: 1 signal: null
12 info lifecycle nuxt_talent@1.0.0~start:test: Failed to exec start:test script
13 verbose stack Error: nuxt_talent@1.0.0 start:test: `cross-env NODE_ENV=development ENV=development HOST=0.0.0.0 PORT=9700 node --max-old-space-size=4096 ./start.js`
13 verbose stack Exit status 1
13 verbose stack at EventEmitter.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:332:16)
13 verbose stack at EventEmitter.emit (events.js:400:28)
13 verbose stack at ChildProcess.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack at ChildProcess.emit (events.js:400:28)
13 verbose stack at maybeClose (internal/child_process.js:1058:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:293:5)
14 verbose pkgid nuxt_talent@1.0.0
15 verbose cwd /nuxtjs
16 verbose Linux 3.10.0-1127.13.1.el7.x86_64
17 verbose argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "start:test"
18 verbose node v14.18.3
19 verbose npm v6.14.15
20 error code ELIFECYCLE
21 error errno 1
22 error nuxt_talent@1.0.0 start:test: `cross-env NODE_ENV=development ENV=development HOST=0.0.0.0 PORT=9700 node --max-old-space-size=4096 ./start.js`
22 error Exit status 1
23 error Failed at the nuxt_talent@1.0.0 start:test script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]

The Node.js error above was sent to the developers, who could not find any fault from it.

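#On the local test server, change the startup script so Node.js can be started by hand (which works normally), then push the image to Alibaba Cloud for the same manual test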
[root@deploy-nb-63 nuxt_talent_mobile]# cat run.sh 
#!/bin/sh
#cd /nuxtjs && npm run start:test
tail -F /nuxtjs/start.js

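#Starting Node.js by hand works without any problem, but it still goes through a build step, even though the code was already built in the Dockerfile stage.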

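#Test again on Alibaba Cloud ACK: log in to the worker node, enter the container, and run npm run start:test by hand to start Node.js

#Using the pod name and node IP shown in ACK, find the container on the corresponding worker node

#Log in to the worker node

#Find the container

#Enter the pod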

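#Starting Node.js by hand shows that the container on Alibaba Cloud ACK also runs the build again. This comes from what the developers' npm run start:test script does: the build pushes resource usage so high that it exceeds the CPU and memory requests and limits set in k8s.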

#Contents of /root/.npm/_logs/2023-01-05T06_45_07_959Z-debug.log
/nuxtjs # cat /root/.npm/_logs/2023-01-05T06_45_07_959Z-debug.log

0 info it worked if it ends with ok
1 verbose cli [ '/usr/local/bin/node', '/usr/local/bin/npm', 'run', 'start:test' ]
2 info using npm@6.14.15
3 info using node@v14.18.3
4 verbose run-script [ 'prestart:test', 'start:test', 'poststart:test' ]
5 info lifecycle nuxt_talent@1.0.0~prestart:test: nuxt_talent@1.0.0
6 info lifecycle nuxt_talent@1.0.0~start:test: nuxt_talent@1.0.0
7 verbose lifecycle nuxt_talent@1.0.0~start:test: unsafe-perm in lifecycle true
8 verbose lifecycle nuxt_talent@1.0.0~start:test: PATH: /usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/node-gyp-bin:/nuxtjs/node_modules/.bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
9 verbose lifecycle nuxt_talent@1.0.0~start:test: CWD: /nuxtjs
10 silly lifecycle nuxt_talent@1.0.0~start:test: Args: [
10 silly lifecycle '-c',
10 silly lifecycle 'cross-env NODE_ENV=development ENV=development HOST=0.0.0.0 PORT=9700 node --max-old-space-size=4096 ./start.js'
10 silly lifecycle ]
11 silly lifecycle nuxt_talent@1.0.0~start:test: Returned: code: 1 signal: null
12 info lifecycle nuxt_talent@1.0.0~start:test: Failed to exec start:test script
13 verbose stack Error: nuxt_talent@1.0.0 start:test: `cross-env NODE_ENV=development ENV=development HOST=0.0.0.0 PORT=9700 node --max-old-space-size=4096 ./start.js`
13 verbose stack Exit status 1
13 verbose stack at EventEmitter.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/index.js:332:16)
13 verbose stack at EventEmitter.emit (events.js:400:28)
13 verbose stack at ChildProcess.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/npm-lifecycle/lib/spawn.js:55:14)
13 verbose stack at ChildProcess.emit (events.js:400:28)
13 verbose stack at maybeClose (internal/child_process.js:1058:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:293:5)
14 verbose pkgid nuxt_talent@1.0.0
15 verbose cwd /nuxtjs
16 verbose Linux 3.10.0-1127.13.1.el7.x86_64
17 verbose argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "start:test"
18 verbose node v14.18.3
19 verbose npm v6.14.15
20 error code ELIFECYCLE
21 error errno 1
22 error nuxt_talent@1.0.0 start:test: `cross-env NODE_ENV=development ENV=development HOST=0.0.0.0 PORT=9700 node --max-old-space-size=4096 ./start.js`
22 error Exit status 1
23 error Failed at the nuxt_talent@1.0.0 start:test script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]

 

#Run the container locally with docker run and measure the resources it uses during startup

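During startup, CPU usage peaked at 300% and memory peaked at a little over 3 GB.

#After startup, resource usage drops: CPU settles at about 0.01 cores and memory at 923.4 MB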
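A startup spike like this is easy to miss with a single reading, so one option is to sample the container repeatedly while it starts. This is a rough sketch that assumes the container runs under a local Docker daemon; the function name, its default values and the container name in the usage comment are illustrative only.

```shell
#!/bin/sh
# Print CPU percentage and memory usage of one container several
# times in a row. Arguments: container name or ID, number of samples
# (default 30), seconds to wait between samples (default 1).
sample_stats() {
    container="$1"
    count="${2:-30}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        docker stats --no-stream --format '{{.CPUPerc}} {{.MemUsage}}' "$container"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage (hypothetical container name), started right after docker run:
#   sample_stats nuxt-test 60 1
```

The highest values in the output give a lower bound for the CPU and memory limits the pod needs in order to get through startup.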

 

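In the end the fault was traced to the developers' npm run start:test script: whatever it does when Node.js starts causes a momentary spike in resource usage, and the pod cannot start. As a stopgap, the only option was to raise the memory and CPU limits of the pod in the Alibaba Cloud k8s cluster.

After the adjustment the pod starts normally, but the developers have been asked to optimize the code.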
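In this environment the limits are configured through EDAS, so the change itself was made there. For reference, the same adjustment in plain Kubernetes could look roughly like the sketch below; the function name, the Deployment name and every value are assumptions chosen to cover the measured peak (about 300% CPU and just over 3 GB of memory), not the settings actually applied.

```shell
#!/bin/sh
# Set the requests and limits of all containers in a Deployment.
# Arguments: deployment name, namespace, requests, limits.
# Nothing here is the real configuration of this cluster.
set_pod_resources() {
    deploy="$1"
    ns="$2"
    req="$3"      # for example cpu=500m,memory=1Gi
    lim="$4"      # for example cpu=3,memory=4Gi
    kubectl set resources deployment "$deploy" -n "$ns" \
        --requests="$req" --limits="$lim"
}

# Usage (hypothetical names and values):
#   set_pod_resources nuxt-talent-mobile test cpu=500m,memory=1Gi cpu=3,memory=4Gi
```

Keeping the request close to the steady-state usage and the limit above the startup peak lets the pod start without reserving the peak amount on the node permanently.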

 

posted @ 2023-01-05 16:13  YYQ-