Analyzing and Resolving Java Memory Errors in an Elasticsearch Service
1. Check the logs of the Elasticsearch business microservice
java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
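Symptoms:
After deployment to the test environment, the API started returning errors once the number of users grew. The error message was: Request cannot be executed; I/O reactor status: STOPPED
The backend then hung: requests to other endpoints spun indefinitely without returning a response. Restarting the application restored normal service, but the problem came back as soon as the load increased again.
How to reproduce:
A JMeter load test configured with 50 concurrent threads reproduces the problem reliably.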
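When JMeter is not at hand, the same kind of burst can be approximated from plain Java. The following is a minimal sketch, not the author's test plan: `runConcurrently` and the placeholder task are hypothetical names, and the real HTTP call to the endpoint under test still has to be filled in.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentLoadSketch {

    /**
     * Runs the task once on each of the given number of threads, releasing
     * them all at the same moment, and returns how many invocations threw.
     */
    static int runConcurrently(int threads, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    try {
                        startGate.await();          // block until the gate opens
                        task.run();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        failures.incrementAndGet();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet(); // e.g. IllegalStateException from the client
                    }
                });
            }
            startGate.countDown();                  // release all workers together
            pool.shutdown();
            if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Placeholder task: replace the body with a call to the search endpoint.
        int failed = runConcurrently(50, () -> { /* call the search API here */ });
        System.out.println("failed requests: " + failed);
    }
}
```

A single burst of 50 calls is much weaker than a sustained JMeter run, so the task may need to loop for a while before the heap fills up.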
2. Search the application logs for the keyword OutOfMemoryError
cat myProject-2024-05-09-3.log | grep "OutOfMemoryError"
org.springframework.web.util.NestedServletException: Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded
Consumer thread error, thread abort.java.lang.OutOfMemoryError: GC overhead limit exceeded
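This error usually appears when the JVM's garbage collection (GC) runs so often that most of the CPU time goes to GC and the application can hardly make progress; the JVM then throws this error.
Specifically, if more than 98% of the total time is spent in GC and a collection recovers less than 2% of the heap, the JVM throws "java.lang.OutOfMemoryError: GC overhead limit exceeded".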
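To get a rough feel for how close a running service is to that threshold, the share of time spent in GC can be read from the standard management beans. This is only a sketch: it averages over the whole JVM uptime, whereas the JVM applies its own internal measurement, and the class and method names here are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverheadSketch {

    /** Fraction of elapsed time spent in GC, clamped to the range [0, 1]. */
    static double gcTimeFraction(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / elapsedMillis));
    }

    /** Sum of the accumulated collection time of all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        double fraction = gcTimeFraction(totalGcMillis(), uptimeMillis);
        System.out.printf("GC time since JVM start: %.1f%%%n", fraction * 100);
    }
}
```

A value that climbs steadily under load is a hint to look at the GC log before the overhead limit is actually hit.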
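These log entries redirected the investigation:
It turned out that an OOM had brought the application down, which in turn terminated the connections. The OOM error itself was also found in the logs.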
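Inference:
An API endpoint loaded a very large block of data into a Java collection and triggered the OOM. The OOM left the application in a hung state, which caused the HTTP connections between the ES client and the ES server to terminate abnormally; after that, the org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning method threw the exception shown above.
Spring Data Elasticsearch keeps persistent connections to the ES server. Once an OOM occurs, every thread that is currently using a connection to the ES server fails. In other words, although the OOM was reported only once, many threads may throw from the Asserts.check call.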
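The chain from one OOM to many STOPPED errors can be illustrated with a small stand-alone model. This is a simplified sketch of the behaviour inferred above, not the Apache HttpAsyncClient source: the class `ReactorStatusSketch` and its methods are made up for illustration, and the OutOfMemoryError is simulated.

```java
import java.util.concurrent.atomic.AtomicReference;

class ReactorStatusSketch {

    enum Status { INACTIVE, ACTIVE, STOPPED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.INACTIVE);
    private volatile Throwable lastFailure;
    private Thread reactorThread;

    /** Starts the event loop; however the loop exits, the status ends up STOPPED. */
    void start(Runnable eventLoop) {
        status.set(Status.ACTIVE);
        reactorThread = new Thread(() -> {
            try {
                eventLoop.run();
            } catch (Throwable t) {
                lastFailure = t;               // e.g. an OutOfMemoryError raised in the loop
            } finally {
                status.set(Status.STOPPED);    // never restarted automatically
            }
        }, "io-reactor-sketch");
        reactorThread.start();
    }

    /** Waits for the event-loop thread to finish. */
    void awaitTermination() {
        try {
            reactorThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Guard in the spirit of ensureRunning: reject work unless the loop is active. */
    void execute(String request) {
        Status current = status.get();
        if (current != Status.ACTIVE) {
            throw new IllegalStateException(
                    "Request cannot be executed; I/O reactor status: " + current);
        }
        // A real client would hand the request over to the reactor here.
    }

    public static void main(String[] args) {
        ReactorStatusSketch client = new ReactorStatusSketch();
        client.start(() -> { throw new OutOfMemoryError("simulated"); });
        client.awaitTermination();
        try {
            client.execute("GET /_search");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Every caller that reaches `execute` after the loop has died gets the same exception, which matches the observation that one OOM is followed by failures on many threads until the process is restarted.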
3. Inspect the process to find the GC log and heapdump.bin paths and to check the configured -Xmx size
ps aux --sort -rss | head
Link: checking memory usage with the ps and top commands
https://www.cnblogs.com/oktokeep/p/16361896.html
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 21722 14.6 4.5 11410052 1502980 ? Sl 16:51 5:47 /usr/local/java/bin/java -Djava.util.logging.config.file=/usr/local/myProject/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Dfastjson.parser.safeMode=true -server -Xms512m -Xmx1g -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/usr/local/myProject/logs/gc-20240509_165141.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/myProject/logs/heapdump.bin -XX:+CMSParallelRemarkEnabled -XX:+ScavengeBeforeFullGC -XX:CMSInitiatingOccupancyFraction=75 -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/local/myProject/bin/bootstrap.jar:/usr/local/myProject/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/myProject -Dcatalina.home=/usr/local/myProject -Djava.io.tmpdir=/usr/local/myProject/temp org.apache.catalina.startup.Bootstrap start
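The settings that matter for this investigation are buried in a very long command line. As a convenience, they can be pulled out with a few lines of Java; the sketch below uses a hypothetical helper `extract` and a shortened copy of the command line above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JvmFlagSketch {

    /** Extracts the heap, GC-log and heap-dump settings from a java command line. */
    static Map<String, String> extract(String commandLine) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String token : commandLine.trim().split("\\s+")) {
            if (token.startsWith("-Xms")) {
                found.put("Xms", token.substring("-Xms".length()));
            } else if (token.startsWith("-Xmx")) {
                found.put("Xmx", token.substring("-Xmx".length()));
            } else if (token.startsWith("-Xloggc:")) {
                found.put("Xloggc", token.substring("-Xloggc:".length()));
            } else if (token.startsWith("-XX:HeapDumpPath=")) {
                found.put("HeapDumpPath", token.substring("-XX:HeapDumpPath=".length()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened copy of the command line reported by ps.
        String cmd = "/usr/local/java/bin/java -server -Xms512m -Xmx1g "
                + "-Xloggc:/usr/local/myProject/logs/gc-20240509_165141.log "
                + "-XX:+HeapDumpOnOutOfMemoryError "
                + "-XX:HeapDumpPath=/usr/local/myProject/logs/heapdump.bin";
        extract(cmd).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

For the process above this yields a 512m initial heap, a 1g maximum heap, and both the GC log and the heap dump under /usr/local/myProject/logs.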
4. Find the GC log of the Tomcat service to confirm the problem (the OOM error log)
Java HotSpot(TM) 64-Bit Server VM (25.171-b11) for linux-amd64 JRE (1.8.0_171-b11), built on Mar 28 2018 17:07:08 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 32778396k(694320k free), swap 0k(0k free)
CommandLine flags: -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelRemarkEnabled -XX:CompressedClassSpaceSize=528482304 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/myProject/logs/heapdump.bin -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=1073741824 -XX:MaxMetaspaceSize=536870912 -XX:MetaspaceSize=268435456 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+ScavengeBeforeFullGC -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
Heap
 PSYoungGen      total 153088K, used 21054K [0x00000000eab00000, 0x00000000f5580000, 0x0000000100000000)
  eden space 131584K, 16% used [0x00000000eab00000,0x00000000ebf8f870,0x00000000f2b80000)
  from space 21504K, 0% used [0x00000000f4080000,0x00000000f4080000,0x00000000f5580000)
  to   space 21504K, 0% used [0x00000000f2b80000,0x00000000f2b80000,0x00000000f4080000)
 ParOldGen       total 349696K, used 0K [0x00000000c0000000, 0x00000000d5580000, 0x00000000eab00000)
  object space 349696K, 0% used [0x00000000c0000000,0x00000000c0000000,0x00000000d5580000)
 Metaspace       used 5422K, capacity 5552K, committed 5888K, reserved 1056768K
  class space    used 608K, capacity 664K, committed 768K, reserved 1048576K
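The numbers in this log can be cross-checked against the -Xms512m and -Xmx1g settings seen in the process list. The short calculation below is a sketch using only figures copied from the log; note that the PSYoungGen total counts eden plus one survivor space, so the second survivor space is added explicitly.

```java
class HeapArithmeticSketch {

    static long kibToMib(long kib) {
        return kib / 1024;
    }

    static long bytesToMib(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Sizes in K, copied from the Heap section of the GC log.
        long eden = 131584, from = 21504, to = 21504, old = 349696;
        long heapKib = eden + from + to + old;

        System.out.println("young + old   = " + kibToMib(heapKib) + " MiB");          // 512 MiB
        System.out.println("InitialHeap   = " + bytesToMib(536870912L) + " MiB");     // 512 MiB
        System.out.println("MaxHeapSize   = " + bytesToMib(1073741824L) + " MiB");    // 1024 MiB
    }
}
```

The young and old generations add up to exactly the 512 MiB initial heap, and the heap can grow to at most 1024 MiB, which is the ceiling the application ran into. The flags line also shows -XX:+UseParallelGC, so the CMS-specific options on the command line have no effect here.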
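5. Approach to a fix:
1. Increase the heap: the Java heap stores object instances. If the heap is too small, the GC has to run more often to reclaim memory, which leads to the error above. A larger heap therefore eases the pressure on the GC and helps avoid this error.
-Xms512m -Xmx1g >> enlarge the heap, for example -Xmx2g
2. Optimize the code: the other option is to change the code so that it creates fewer object instances, or releases instances promptly once they are no longer needed, which reduces the work the GC has to do.
The Eclipse Memory Analyzer (MAT) tool is recommended for locating the code behind the overflow. It can be downloaded from the official site: https://www.eclipse.org/downloads/download.php?file=/mat/1.11.0/rcp/MemoryAnalyzer-1.11.0.20201202-win32.win32.x86_64.zip
To analyze the overflow, first confirm the paths of the GC log and the heapdump.bin file: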
-XX:+PrintGCDateStamps -Xloggc:/usr/local/myProject/logs/gc-20240509_165141.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/myProject/logs/heapdump.bin
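Whether these options actually took effect can also be confirmed from inside the running JVM. The sketch below assumes a HotSpot JVM (the `com.sun.management` API is HotSpot-specific), and the helper name `vmOption` is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class HeapDumpOptionSketch {

    /** Returns the current value of a HotSpot -XX option, for example "true" or a path. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
        // Effective maximum heap as seen by the application, to verify a changed -Xmx.
        System.out.println("max heap (MiB) = " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Run inside the service (for instance from a diagnostic endpoint), this shows whether a heap dump will be written on the next OOM and where MAT should look for it.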
6. Example of the optimized Java code:
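import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonDeserializer;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.SearchHit;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
// Data conversion method
OrderVO<?> result = transform(reqVO, result(searchResponse.getHits().getHits()), searchResponse);
// Key step: set to null so the large object can be reclaimed quickly
searchResponse = null;

protected List<OrderInfoVO> result(SearchHit[] searchHits) {
    List<OrderInfoVO> orderInfoContent = new ArrayList<>();
    // Gson instance with deserializers for the date formats stored in the index
    Gson gson = new GsonBuilder()
            .registerTypeAdapter(LocalDateTime.class,
                    (JsonDeserializer<LocalDateTime>) (json, type, jsonDeserializationContext) ->
                            LocalDateTime.parse(json.getAsJsonPrimitive().getAsString(),
                                    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")))
            .registerTypeAdapter(LocalDate.class,
                    (JsonDeserializer<LocalDate>) (json, type, jsonDeserializationContext) ->
                            LocalDate.parse(json.getAsJsonPrimitive().getAsString(),
                                    DateTimeFormatter.ofPattern("yyyy-MM-dd")))
            .create();
    for (SearchHit hit : searchHits) {
        String sourceAsString = hit.getSourceAsString();
        OrderInfoVO vo = gson.fromJson(sourceAsString, OrderInfoVO.class);
        orderInfoContent.add(vo);
        // Set to null for quick reclamation; note that this clears only the loop
        // variable, and the searchHits array still references the hit until it is released
        hit = null;
    }
    return orderInfoContent;
}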
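Since the inferred root cause is a very large result being loaded into one collection, a complementary measure is to cap how many hits a single request may ask for. The following is a minimal sketch under that assumption; `MAX_PAGE_SIZE`, `clampPageSize` and `pagesNeeded` are hypothetical names and the limit of 500 is an arbitrary illustration, not a value from the original service.

```java
class PageSizeSketch {

    /** Hypothetical upper bound on hits fetched per request. */
    static final int MAX_PAGE_SIZE = 500;

    /** Forces the requested page size into the range [1, MAX_PAGE_SIZE]. */
    static int clampPageSize(int requested) {
        if (requested < 1) {
            return 1;
        }
        return Math.min(requested, MAX_PAGE_SIZE);
    }

    /** Number of pages needed to walk through all hits at the given page size. */
    static long pagesNeeded(long totalHits, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (totalHits + pageSize - 1) / pageSize;   // ceiling division
    }

    public static void main(String[] args) {
        int size = clampPageSize(10_000);
        System.out.println("page size used: " + size);                   // 500
        System.out.println("pages for 2501 hits: " + pagesNeeded(2501, size)); // 6
    }
}
```

The clamped value would typically be passed to the search request's size setting (for example `SearchSourceBuilder.size(int)`), so that each call to `result(...)` converts a bounded number of hits instead of the whole result set.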