Problem Log

1: Spark SQL cannot batch-drop Hive partitions

spark.sql("alter table spd_trancare_mid.tmp_package_info_from_s3 drop partition(dt<=20200319)") #报错
print('end') 
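
A possible workaround, as a minimal sketch (the table name and the dt column are taken from the failing statement above, and 20200319 is the same example cutoff): list the partitions with SHOW PARTITIONS, filter them on the driver, and drop each one with an equality partition spec, which Spark SQL does accept.

table = "spd_trancare_mid.tmp_package_info_from_s3"
cutoff = "20200319"

# SHOW PARTITIONS returns one row per partition, e.g. "dt=20200301"
partitions = [row[0] for row in spark.sql("show partitions {}".format(table)).collect()]

for p in partitions:
    dt_value = p.split("=")[1]
    # yyyymmdd values of equal length compare correctly as plain strings
    if dt_value <= cutoff:
        # equality specs are accepted, so drop the old partitions one at a time
        spark.sql("alter table {} drop if exists partition (dt={})".format(table, dt_value))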

2: The following error appears while a Spark job is running:

20/09/16 04:08:28 ERROR YarnScheduler: Lost executor 2 on ip-172-10-2-108.eu-west-1.compute.internal: 
Container killed by YARN for exceeding memory limits. 4.8 GB of 4.8 GB physical memory used.

Notes found online:

spark.yarn.executor.memoryOverhead defaults to max(executorMemory * 0.10, 384M), so even a large increase in executorMemory adds relatively little off-heap memory; it is therefore worth increasing spark.yarn.executor.memoryOverhead directly instead.
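
A minimal sketch of setting the overhead explicitly (the 4g / 2048M values are illustrative, not tuned for this job; since Spark 2.3 the same setting is also available without the yarn prefix as spark.executor.memoryOverhead):

from pyspark.sql import SparkSession

# Illustrative values only; they must be in place before the SparkContext starts,
# so they can equally be passed as --conf flags to spark-submit or via spark-defaults.conf.
spark = (SparkSession.builder
         .config("spark.executor.memory", "4g")                  # executor JVM heap
         .config("spark.yarn.executor.memoryOverhead", "2048")   # off-heap overhead, in MiB
         .getOrCreate())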

------------------------------------------------------------

AWS's knowledge-center article on this problem for Spark on EMR:

https://aws.amazon.com/cn/premiumsupport/knowledge-center/emr-spark-yarn-memory-limit/

------------------------------------------------------------

Further background: Spark's shuffle layer transfers data over netty, and netty allocates off-heap (direct) memory during network transfer (netty uses zero-copy), which is why off-heap memory gets used. By default the off-heap allowance is 10% of the executor memory.
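
To make the 10% default concrete, a small worked example (the 4g executor size is illustrative, not taken from the failing job above):

executor_memory_mb = 4096                                # e.g. --executor-memory 4g
overhead_mb = max(int(executor_memory_mb * 0.10), 384)   # default formula from the notes above
container_limit_mb = executor_memory_mb + overhead_mb
print(overhead_mb, container_limit_mb)                   # 409 4505 -> a ~4.4 GB YARN container limit

Netty's direct buffers during shuffle count against that same container limit, which would explain a container being killed even though the JVM heap itself is not exhausted.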
