Problem Log

1: Spark SQL cannot batch-drop Hive partitions

spark.sql("alter table spd_trancare_mid.tmp_package_info_from_s3 drop partition(dt<=20200319)") #报错
print('end') 
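
A possible workaround, as a minimal sketch (the table name and the dt column are taken from the failing statement above, and 20200319 is the same example cutoff): list the partitions with SHOW PARTITIONS, filter them on the driver, and drop each one with an equality partition spec, which Spark SQL does accept.

table = "spd_trancare_mid.tmp_package_info_from_s3"
cutoff = "20200319"

# SHOW PARTITIONS returns one row per partition, e.g. "dt=20200301"
partitions = [row[0] for row in spark.sql("show partitions {}".format(table)).collect()]

for p in partitions:
    dt_value = p.split("=")[1]
    # yyyymmdd values of equal length compare correctly as plain strings
    if dt_value <= cutoff:
        # equality specs are accepted, so drop the old partitions one at a time
        spark.sql("alter table {} drop if exists partition (dt={})".format(table, dt_value))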

2: The following error appears while a Spark job is running:

20/09/16 04:08:28 ERROR YarnScheduler: Lost executor 2 on ip-172-10-2-108.eu-west-1.compute.internal: 
Container killed by YARN for exceeding memory limits. 4.8 GB of 4.8 GB physical memory used.

Notes found online:

spark.yarn.executor.memoryOverhead defaults to max(executorMemory * 0.10, 384M), so even a large increase in executorMemory adds relatively little off-heap memory; it is therefore worth increasing spark.yarn.executor.memoryOverhead directly instead.
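
A minimal sketch of setting the overhead explicitly (the 4g / 2048M values are illustrative, not tuned for this job; since Spark 2.3 the same setting is also available without the yarn prefix as spark.executor.memoryOverhead):

from pyspark.sql import SparkSession

# Illustrative values only; they must be in place before the SparkContext starts,
# so they can equally be passed as --conf flags to spark-submit or via spark-defaults.conf.
spark = (SparkSession.builder
         .config("spark.executor.memory", "4g")                  # executor JVM heap
         .config("spark.yarn.executor.memoryOverhead", "2048")   # off-heap overhead, in MiB
         .getOrCreate())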

------------------------------------------------------------

AWS's knowledge-center article on this problem for Spark on EMR:

https://aws.amazon.com/cn/premiumsupport/knowledge-center/emr-spark-yarn-memory-limit/

------------------------------------------------------------

Further background: Spark's shuffle layer transfers data over netty, and netty allocates off-heap (direct) memory during network transfer (netty uses zero-copy), which is why off-heap memory gets used. By default the off-heap allowance is 10% of the executor memory.
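
To make the 10% default concrete, a small worked example (the 4g executor size is illustrative, not taken from the failing job above):

executor_memory_mb = 4096                                # e.g. --executor-memory 4g
overhead_mb = max(int(executor_memory_mb * 0.10), 384)   # default formula from the notes above
container_limit_mb = executor_memory_mb + overhead_mb
print(overhead_mb, container_limit_mb)                   # 409 4505 -> a ~4.4 GB YARN container limit

Netty's direct buffers during shuffle count against that same container limit, which would explain a container being killed even though the JVM heap itself is not exhausted.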
