2024-07-03 11:08:16.066 [DEBUG] [[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'] [70930e830d7c5249,70930e830d7c5249,false] SecurityContextHolder now cleared, as request processing completed
七月 03, 2024 11:08:07 上午 org.jboss.netty.channel.socket.nio.NioWorker
警告: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newKeyIterator(HashMap.java:968)
    at java.util.HashMap$KeySet.iterator(HashMap.java:1002)
    at java.util.HashSet.iterator(HashSet.java:170)
    at sun.nio.ch.Util$2.iterator(Util.java:303)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:274)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
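OutOfMemoryError is a subclass of java.lang.VirtualMachineError and is thrown when the JVM runs into a resource problem. More specifically, "GC overhead limit exceeded" is thrown when the JVM spends too much time running GC while reclaiming very little heap memory. With the HotSpot defaults, that means more than 98% of the time goes to GC and less than 2% of the heap is recovered.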
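To get a rough feel for how much time a running JVM spends in GC, the standard java.lang.management beans can be queried from inside the process. The sketch below is only an approximation: it reports the lifetime share of uptime spent in GC, not the recent-window measurement the JVM itself uses for the overhead check, and the class name GcOverheadProbe is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadProbe {

    /** Sum of the accumulated collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Share of JVM uptime attributed to GC since startup, clamped to [0, 1]. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptime <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) totalGcMillis() / uptime);
    }

    public static void main(String[] args) {
        System.out.printf("GC time so far: %d ms (%.2f%% of uptime)%n",
                totalGcMillis(), gcTimeFraction() * 100);
    }
}
```

A value that keeps climbing toward 100% while the heap stays full is the pattern that precedes this error.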
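The following code reproduces the java.lang.OutOfMemoryError: GC overhead limit exceeded error.
It uses an infinite while loop to keep adding random numbers to a HashMap. Before running the main method, set the JVM options to -Xmx300m -XX:+UseParallelGC (a 300MB heap with the ParallelGC collector). Running main then fails with java.lang.OutOfMemoryError: GC overhead limit exceeded.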
package com.galaxy.concurrency.jvm;

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class OutOfMemoryGCLimitExceed {

    // Adds random entries forever; the map stays reachable, so GC can free almost nothing.
    public static void addRandomDataToMap() {
        Map<Integer, String> dataMap = new HashMap<>();
        Random r = new Random();
        while (true) {
            dataMap.put(r.nextInt(), String.valueOf(r.nextInt()));
        }
    }

    // Run with: -Xmx300m -XX:+UseParallelGC
    public static void main(String[] args) {
        addRandomDataToMap();
    }
}
Solution:
Find the problem in the application by reviewing the code that may be leaking memory.
Questions to ask:
1. Which objects in the application occupy large portions of the heap?
2. In which parts of the source code are these objects allocated?
Tools:
Graphical tools such as JVisualVM and JConsole can help detect performance problems in the code, including java.lang.OutOfMemoryError.
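To answer the two questions above, a heap dump is usually the most direct input for tools like JVisualVM. As a sketch, and assuming a HotSpot JVM (the com.sun.management API used here is HotSpot-specific), a dump can be written from inside the process. The class name HeapDumper and the default file name heap.hprof are illustrative choices.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    /**
     * Writes a heap dump of the current JVM to the given file.
     * The file must not exist yet; recent JDKs also expect a ".hprof" suffix.
     */
    static File dumpHeap(File target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // liveObjectsOnly = true dumps only reachable objects, which is what matters for a leak.
        bean.dumpHeap(target.getAbsolutePath(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        File out = new File(args.length > 0 ? args[0] : "heap.hprof");
        dumpHeap(out, true);
        System.out.println("Heap dump written to " + out.getAbsolutePath());
    }
}
```

The resulting file can be opened in JVisualVM to see which classes hold most of the heap and where they are referenced from.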
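Quick workaround:
Change the JVM startup configuration to increase the heap size, or add the -XX:-UseGCOverheadLimit option to the startup configuration to turn off the GC overhead limit check.
For example, these JVM options give the Java application a 1GB heap: java -Xmx1024m com.xyz.TheClassName
These options give the application a 1GB heap and also add -XX:-UseGCOverheadLimit to turn off the GC overhead limit check: java -Xmx1024m -XX:-UseGCOverheadLimit com.xyz.TheClassName
If the problem persists:
Adding -XX:-UseGCOverheadLimit treats the symptom, not the cause. If the application keeps filling the heap, the JVM will eventually throw java.lang.OutOfMemoryError: Java heap space instead.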
Resolving the production incident: process and summary
1. Exception log
2024-07-03 11:08:16.066 [DEBUG] [[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'] [70930e830d7c5249,70930e830d7c5249,false] SecurityContextHolder now cleared, as request processing completed
七月 03, 2024 11:08:07 上午 org.jboss.netty.channel.socket.nio.NioWorker
警告: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.HashMap.newKeyIterator(HashMap.java:968)
    at java.util.HashMap$KeySet.iterator(HashMap.java:1002)
    at java.util.HashSet.iterator(HashSet.java:170)
    at sun.nio.ch.Util$2.iterator(Util.java:303)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:274)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
<Jul 3, 2024 11:08:23 AM CST> <Error> <JDBC> <BEA-001112> <Test "SELECT 1 FROM DUAL" set up for pool "sinosoftDataSource" failed with exception: "java.sql.SQLException: Protocol violation: [1]".>
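2. Troubleshooting
1. OOM Killer
On Linux, an OOM is not necessarily caused by the Java service itself. Other programs may have requested a lot of memory. When the memory needed by all applications exceeds physical memory and the Java service is a heavy memory consumer, the operating system may select it and kill it.
This is a memory protection mechanism that Linux uses to keep an overloaded system from crashing.
2. Deployment environment
This service is deployed on its own, on WebLogic. JDK: export JAVA_HOME=/app/jdk1.6.0_24
So we turned our attention to the JVM memory configuration. The application has low traffic and the production server has 15G of memory. We first used the command-line tools bundled with the JDK to inspect the JVM configuration.
jps showed that several projects are deployed with this JDK. The PID of a specific project can be identified through the port it listens on.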
[weblogic@newcoreSIT01 bin]$ sudo ./jps
23989 Server
3672 Server
26137 Jps
29041 Server
3403 Bootstrap
29418 Server
20363 Server
2175 Server
4293 Server
27986 Bootstrap
27422 Server
[weblogic@newcoreSIT01 bin]$ lsof -i:8001
COMMAND  PID     USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME
java    2175 weblogic  703u  IPv6 3943359544      0t0  TCP localhost:vcom-tunnel (LISTEN)
java    2175 weblogic  704u  IPv6 3943359545      0t0  TCP [fe80::250:56ff:fea5:35bf]:vcom-tunnel (LISTEN)
java    2175 weblogic  705u  IPv6 3943359546      0t0  TCP newcoreSIT01.sinosafe.local:vcom-tunnel (LISTEN)
java    2175 weblogic  706u  IPv6 3943359547      0t0  TCP localhost:vcom-tunnel (LISTEN)
Ways to view the JVM configuration:
Option 1:
[weblogic@newcoreSIT01 bin]$ sudo ./jps -v | grep 2175
2175 Server -Xms512m -Xmx1024m -XX:CompileThreshold=8000 -XX:PermSize=512m -XX:MaxPermSize=1024m -Dweblogic.Name=AdminServer -Djava.security.policy=/app/weblogic/Oracle/Middleware/wlserver_10.3/server/lib/weblogic.policy -Xverify:none -da -Dplatform.home=/app/weblogic/Oracle/Middleware/wlserver_10.3 -Dwls.home=/app/weblogic/Oracle/Middleware/wlserver_10.3/server -Dweblogic.home=/app/weblogic/Oracle/Middleware/wlserver_10.3/server -Dweblogic.management.discover=true -Dwlw.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/app/weblogic/Oracle/Middleware/patch_wls1036/profiles/default/sysext_manifest_classpath:/app/weblogic/Oracle/Middleware/patch_ocp371/profiles/default/sysext_manifest_classpath -Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,address=10899,server=y,suspend=n -Djava.compiler=NONE
Option 2:
# View the JVM flags; pid is the process ID of spacex.jar
sudo jinfo -flags pid
In this environment jinfo failed to attach to the process:
[weblogic@newcoreSIT01 bin]$ sudo ./jinfo -flags 2175
Attaching to process ID 2175, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.tools.jinfo.JInfo.runTool(JInfo.java:79)
    at sun.tools.jinfo.JInfo.main(JInfo.java:53)
Caused by: java.lang.RuntimeException: Type "nmethodBucket*", referenced in VMStructs::localHotSpotVMStructs in the remote VM, was not present in the remote VMStructs::localHotSpotVMTypes table (should have been caught in the debug build of that VM). Can not continue.
    at sun.jvm.hotspot.HotSpotTypeDataBase.lookupOrFail(HotSpotTypeDataBase.java:361)
    at sun.jvm.hotspot.HotSpotTypeDataBase.readVMStructs(HotSpotTypeDataBase.java:252)
    at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:87)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
    at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
    at sun.jvm.hotspot.tools.JInfo.main(JInfo.java:128)
Option 3:
Check the JVM settings in the WebLogic startup file (setDomainEnv.sh):
if [ "${JAVA_VENDOR}" = "Sun" ] ; then
    WLS_MEM_ARGS_64BIT="-Xms512m -Xmx1024m"
    export WLS_MEM_ARGS_64BIT
    WLS_MEM_ARGS_32BIT="-Xms512m -Xmx1024m"
    export WLS_MEM_ARGS_32BIT
else
    WLS_MEM_ARGS_64BIT="-Xms512m -Xmx1024m"
    export WLS_MEM_ARGS_64BIT
    WLS_MEM_ARGS_32BIT="-Xms512m -Xmx1024m"
    export WLS_MEM_ARGS_32BIT
fi
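When the external tools are unavailable or fail to attach, the same information can also be read from inside the JVM. The following is a minimal sketch using the standard RuntimeMXBean; the class name JvmSettings is made up, and how this code would be invoked inside the WebLogic application (for example from a diagnostic page) is left open.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmSettings {

    /** JVM options the process was started with, e.g. -Xms512m and -Xmx1024m. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        for (String arg : inputArguments()) {
            System.out.println(arg);
        }
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

This confirms the values that actually took effect, which is useful when a startup script sets the same option in more than one place.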
3. Memory usage
[weblogic@newcoreSIT01 bin]$ free -m
             total       used       free     shared    buffers     cached
Mem:         15947      15348        599          0        225       3010
-/+ buffers/cache:      12111       3835
Swap:        10239       1488       8751
4. Configuring JConsole for WebLogic (SunOS/Solaris + WebLogic)
1. JConsole monitors through JMX, which requires startup parameters on the application. Because the server is WebLogic, they are configured in ${DOMAIN_HOME}/bin/setDomainEnv.sh.
JAVA_OPTIONS="${JAVA_OPTIONS} -Dcom.sun.management.jmxremote.port=9000"
JAVA_OPTIONS="${JAVA_OPTIONS} -Dcom.sun.management.jmxremote.authenticate=false"
JAVA_OPTIONS="${JAVA_OPTIONS} -Dcom.sun.management.jmxremote.ssl=false"
NBZ SIT : -Dcom.sun.management.jmxremote.port=9991
JConsole connection settings
IP address:port
Username and password
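The same JMX port can also be queried from code, which helps when a GUI is not available on the jump host. The sketch below assumes the default RMI connector layout (the /jmxrmi path) and the unauthenticated, non-SSL setup shown above; the class name RemoteHeapCheck and the localhost default are illustrative.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteHeapCheck {

    /** Builds the standard JMX-over-RMI service URL for a host and port. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Connects to the remote JVM and reads its current heap usage. */
    static MemoryUsage readHeapUsage(String host, int port) throws IOException {
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(serviceUrl(host, port)));
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage();
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost"; // assumed default
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        MemoryUsage heap = readHeapUsage(host, port);
        System.out.println("Heap used/max (MB): "
                + heap.getUsed() / (1024 * 1024) + "/" + heap.getMax() / (1024 * 1024));
    }
}
```

Disabling authentication and SSL is acceptable only on a trusted test network; a production port should be protected.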
Root cause in the code
The line marked with + appears to be the fix: without the proposalNo condition, find ran against the whole table.
// Special handling for the business classification of 04 liability insurance
if ("04".equals(dto.getRiskCode().substring(0, 2))) {
    GuXXXXXDto guXXXXXDto = new GuXXXXXDto();
    guXXXXXDto.setProposalNo(proposalNo);
    List<GuXXXXXDto> XXXXXList = guXXXXXDao.find(guXXXXXDto, null);
    GuXXXXicListDto guPXXXListDto = new GuXXXDto();
+   guXXXcListDao.setProposalNo(proposalNo);
    List<GuXXXXicListDto> proposalDynamiList = guXXXcListDao.find(guPrXXXXynamicListDto, null); # the full-table query caused the OOM
    ServiceManager.prpall.getXXXXindService().proceXXXlProBusinessType04(XXXXXList,propoXXXnamiList,dto);
guPrXXXXcListDto
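One way to keep this kind of mistake from reaching production is to reject a query whose filter carries no selective condition. The sketch below is purely illustrative: Filter, hasSelectiveCondition and findByFilter are hypothetical stand-ins, not the project's real DTO or DAO API, and the table is an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryGuard {

    /** Hypothetical filter object standing in for the redacted DTO. */
    static final class Filter {
        String proposalNo;
    }

    /** True only when the filter would narrow the query to a single proposal. */
    static boolean hasSelectiveCondition(Filter filter) {
        return filter != null
                && filter.proposalNo != null
                && !filter.proposalNo.trim().isEmpty();
    }

    /**
     * Returns the values of rows whose key (row[0]) matches the filter.
     * Throws instead of silently scanning everything when the filter is empty.
     */
    static List<String> findByFilter(Filter filter, List<String[]> table) {
        if (!hasSelectiveCondition(filter)) {
            throw new IllegalArgumentException("Refusing unfiltered query: proposalNo is required");
        }
        List<String> result = new ArrayList<String>();
        for (String[] row : table) {
            if (filter.proposalNo.equals(row[0])) {
                result.add(row[1]);
            }
        }
        return result;
    }
}
```

Failing fast turns a slow memory exhaustion into an immediate, traceable error at the call site that forgot the condition.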