Analyzing a Netty direct memory allocation failure (OutOfDirectMemoryError) in Spark caused by an oversized shuffle read

1. Error log:

23/11/01 22:14:25 INFO [Executor task launch worker for task 79952] MapOutputTrackerWorker: Don't have map outputs for shuffle 7, fetching them
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] TorrentBroadcast: Started reading broadcast variable 174
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] MemoryStore: Block broadcast_174_piece0 stored as bytes in memory (estimated size 3.5 MB, free 8.0 GB)
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] TorrentBroadcast: Reading broadcast variable 174 took 29 ms
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] MemoryStore: Block broadcast_174 stored as values in memory (estimated size 3.5 MB, free 8.0 GB)
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] MapOutputTracker: Broadcast mapstatuses size = 435, actual size = 3636093
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] MapOutputTrackerWorker: Got the output locations
23/11/01 22:14:28 INFO [Executor task launch worker for task 79952] ShuffleBlockFetcherIterator: Getting 9369 non-empty blocks including 138 local blocks and 9231 remote blocks
23/11/01 22:14:28 INFO [Executor task launch worker for task 79972] ShuffleBlockFetcherIterator: Getting 9369 non-empty blocks including 138 local blocks and 9231 remote blocks
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] ShuffleBlockFetcherIterator: Getting 9369 non-empty blocks including 138 local blocks and 9231 remote blocks
23/11/01 22:14:28 INFO [Executor task launch worker for task 79952] ShuffleBlockFetcherIterator: Started 9 remote fetches in 9 ms
23/11/01 22:14:28 INFO [Executor task launch worker for task 79972] ShuffleBlockFetcherIterator: Started 11 remote fetches in 10 ms
23/11/01 22:14:28 INFO [Executor task launch worker for task 79932] ShuffleBlockFetcherIterator: Started 10 remote fetches in 12 ms // [Analysis point 1]
23/11/01 22:14:30 WARN [shuffle-client-7-1] TransportChannelHandler: Exception in connection from emr-worker-2500.cluster-265451/10.64.156.19:7337
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 838860800, max: 838860800) // [Analysis point 2]
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:725) // [Analysis point 3]
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:680)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:758)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:734)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:245)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:342)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
    at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:139)
    at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

 

2. Error analysis:

  (1) From [Analysis point 1] we can tell the job is in the middle of a shuffle and is fetching data from the upstream stage (hence the remote fetches).

  (2) From [Analysis point 2], the Spark process asked Netty for 16777216 bytes (16 MB) of direct memory and the allocation failed, because at most 838860800 bytes (800 MB) may be used and all of it was already in use.

  (3) Following [Analysis point 3] into the Netty source, we find that when the requested amount plus the memory already in use exceeds DIRECT_MEMORY_LIMIT, Netty throws exactly the error seen at [Analysis point 2]: failed to allocate 16777216 byte(s) of direct memory (used: 838860800, max: 838860800)

private static void incrementMemoryCounter(int capacity) {
    if (DIRECT_MEMORY_COUNTER != null) {
        for (;;) {
            long usedMemory = DIRECT_MEMORY_COUNTER.get();
            long newUsedMemory = usedMemory + capacity;
            if (newUsedMemory > DIRECT_MEMORY_LIMIT) {
                throw new OutOfDirectMemoryError("failed to allocate " + capacity
                        + " byte(s) of direct memory (used: " + usedMemory + ", max: " + DIRECT_MEMORY_LIMIT + ')');
            }
            if (DIRECT_MEMORY_COUNTER.compareAndSet(usedMemory, newUsedMemory)) {
                break;
            }
        }
    }
}

So what is DIRECT_MEMORY_LIMIT? Tracing the source shows it is taken from maxDirectMemory. From the comment in the snippet below, if the io.netty.maxDirectMemory property is not set, Netty inherits the limit from the JDK, and the "practical max direct memory" would be twice the max memory defined by the JDK. But our executors are configured with 20 GB of memory, which does not match the 800 MB limit in the error.

// Here is how the system property is used:
//
// * <  0  - Don't use cleaner, and inherit max direct memory from java. In this case the
//           "practical max direct memory" would be 2 * max memory as defined by the JDK.
// * == 0  - Use cleaner, Netty will not enforce max memory, and instead will defer to JDK.
// * >  0  - Don't use cleaner. This will limit Netty's total direct memory
//           (note: that JDK's direct memory limit is independent of this).
long maxDirectMemory = SystemPropertyUtil.getLong("io.netty.maxDirectMemory", -1);
DIRECT_MEMORY_LIMIT = maxDirectMemory;

 

In the end we located this setting in our environment, which confirmed the root cause.
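
For illustration only (this is a hypothetical Scala sketch, not our actual environment), a cap like the 800 MB in the log is typically handed to the executors through spark.executor.extraJavaOptions, either as an explicit Netty property or as a JVM flag that Netty falls back to when io.netty.maxDirectMemory is left at its default of -1:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Explicit Netty cap: 838860800 bytes = 800 MB, matching the limit in the error message
  .set("spark.executor.extraJavaOptions", "-Dio.netty.maxDirectMemory=838860800")
// Alternatively, a JVM-level cap that Netty inherits when io.netty.maxDirectMemory stays at -1:
//   .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=800m")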

 

3. Background: why does Netty need to allocate memory during shuffle read?

The description below is borrowed directly from other write-ups:

A shuffle consists of two parts: shuffle write and shuffle read.

Both parts transfer data through the Netty framework, which is why the error in the log is raised from inside Netty's memory management.
The number of shuffle-write partitions is determined by the partition count of the upstream stage's RDD, while the number of shuffle-read partitions is controlled by Spark configuration parameters.

Shuffle write can be loosely thought of as something like a saveAsLocalDiskFile operation: the intermediate results of the computation are written temporarily, according to some partitioning rule, to the local disks of the executors.

During shuffle read, the number of data partitions is again controlled by Spark configuration parameters. It follows that if this value is set too small while the shuffle-read volume is large, Netty's own receive buffers run out; Netty then requests additional direct memory to compensate, 16 MB at a time, and once the configured maximum is reached the allocation fails with the error above.
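
As a quick sanity check on the numbers in the log (a standalone Scala sketch; treating 16777216 bytes as one Netty pooled chunk is an assumption about the Netty version in use here):

object DirectMemoryMath extends App {
  val chunkBytes = 16777216L            // the allocation requested in the stack trace (16 MB)
  val limitBytes = 838860800L           // DIRECT_MEMORY_LIMIT reported in the error (800 MB)
  println(limitBytes / (1024 * 1024))   // 800 -> the cap expressed in MB
  println(limitBytes / chunkBytes)      // 50  -> at most 50 such chunks fit; the 51st allocation fails
}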

 

4. Solutions:

        Once the cause is understood, the fix is straightforward. The standard advice found online applies here, attacking the problem from two angles: the amount of shuffle data, and the number of partitions used to process that data.

(1). Reduce the amount of shuffle data

        Consider whether a map-side join or broadcast join can be used to avoid the shuffle altogether.

        Filter out unnecessary data before the shuffle. For example, if the raw data has 20 columns but only a few are needed, select just those columns; this alone reduces the shuffle volume. A sketch of both ideas follows below.
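
A minimal Scala sketch of both ideas, assuming two hypothetical DataFrames facts and dims where dims is small enough to broadcast (names, columns, and paths are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("reduce-shuffle-demo").getOrCreate()
val facts = spark.read.parquet("/path/to/facts")   // placeholder input
val dims  = spark.read.parquet("/path/to/dims")    // placeholder input, assumed small

// Prune columns before any shuffle: keep only the fields downstream logic needs.
val slimFacts = facts.select("id", "dim_id", "amount")

// Broadcast the small side so the join is done map-side instead of shuffling `facts`.
val joined = slimFacts.join(broadcast(dims), slimFacts("dim_id") === dims("id"))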

(2). Spark SQL and DataFrame operations such as join and groupBy

        The partition count is controlled by spark.sql.shuffle.partitions (default 200); raise it according to the shuffle volume and the complexity of the computation.
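
For example (a sketch assuming an existing SparkSession spark and a DataFrame df; 2000 is purely illustrative):

spark.conf.set("spark.sql.shuffle.partitions", "2000")    // default is 200
val counts = df.groupBy("dim_id").count()                  // this aggregation now shuffles into 2000 partitions
// Equivalent at submit time: --conf spark.sql.shuffle.partitions=2000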

(3). RDD operations such as join, groupBy, and reduceByKey

        The number of partitions used for shuffle read and reduce processing is controlled by spark.default.parallelism. It defaults to the total number of cores running the job (8 in Mesos fine-grained mode; the number of local cores in local mode). The official recommendation is to set it to 2-3 times the total number of cores.
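
For example (a sketch with a hypothetical SparkContext sc and placeholder path; 600 stands in for roughly 2-3x the total core count):

import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.default.parallelism", "600")
// ... create the SparkContext `sc` from this conf ...
val wordCounts = sc.textFile("/path/to/input")    // placeholder input
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _, 600)                        // or omit the explicit count and rely on spark.default.parallelism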

(4). Increase executor memory

        Raise the executor memory appropriately via spark.executor.memory.
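
For example (values are illustrative only; what is appropriate depends on the cluster):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "24g")            // heap; the environment analyzed above used 20g
  .set("spark.executor.memoryOverhead", "4g")     // off-heap headroom; this property name assumes Spark 2.3+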

(5). Check whether data skew is involved

        Have null keys been filtered out? Can abnormal data (a key with a disproportionately large amount of data) be handled separately? Consider changing the data partitioning rule.
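
A minimal sketch of these checks, assuming a hypothetical pair RDD pairs with String keys and numeric values: drop null keys, then "salt" the key so a single hot key is spread across several reducers before the final aggregation (the salting scheme and bucket count are illustrative, and salting only suits associative aggregations like sums):

import scala.util.Random

val cleaned = pairs.filter { case (k, _) => k != null }             // null keys are a common source of skew

val aggregated = cleaned
  .map { case (k, v) => (s"${Random.nextInt(10)}#$k", v) }          // spread each key across 10 salted buckets
  .reduceByKey(_ + _)                                               // partial aggregation per salted key
  .map { case (saltedKey, v) => (saltedKey.split("#", 2)(1), v) }   // strip the salt prefix
  .reduceByKey(_ + _)                                               // final aggregation per original key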
