pyspark IOException: Too many open files

While training a model, Spark failed with the error: too many open files

Py4JJavaError: An error occurred while calling o315.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 5.0 failed 1 times, most recent failure: Lost task 20.0 in stage 5.0 (TID 64, localhost, executor driver): java.io.FileNotFoundException: /tmp/spark-temp/blockmgr-c2f18891-a868-42ba-9075-dc145faaa4c4/16/temp_shuffle_f9c96d48-336d-423a-9edd-dcb9af5705a7 (Too many open files)

Solution 1: edit the Linux /etc/security/limits.conf file to raise the per-user limit on open file handles. (Not possible here: no permission to change it on the server.)
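For reference, a minimal sketch of what that change might look like (the * wildcard and the 65535 value are only illustrative; pick a limit that fits your workload, and note this requires root):

# /etc/security/limits.conf
*    soft    nofile    65535
*    hard    nofile    65535

# after logging in again, verify the new limit
ulimit -n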

Solution 2: increase spark.sql.shuffle.partitions (default 200).

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
conf.set('spark.driver.memory', '30g').set('spark.executor.memory', '30g')\
    .set('spark.executor.cores', '20').set('spark.master', 'spark://tc6000:7077')\
    .set('spark.sql.shuffle.partitions', '400')   # raised from the default 200 to 400
spark = SparkSession.builder.appName('LRModelForCTR').config(conf=conf).getOrCreate()
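If a SparkSession already exists, the same knob can also be adjusted at runtime before the shuffle-heavy step; a minimal sketch, assuming spark is the existing session:

spark.conf.set('spark.sql.shuffle.partitions', '400')    # runtime SQL conf, applies to subsequent shuffles
print(spark.conf.get('spark.sql.shuffle.partitions'))    # confirm the new value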
  • Understanding why this works:
    • spark.sql.shuffle.partitions is the number of partitions used for a shuffle (default 200). An RDD is split into multiple partitions; for example, when a file is read from HDFS, one HDFS block corresponds to one Spark partition.
    • The number of partitions n should satisfy n * 128 MB > file size (128 MB is the default HDFS block size); see the sketch after this list.
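A hypothetical worked example of that rule (the 50 GB input size is assumed for illustration, not taken from the original post):

import math

file_size_mb = 50 * 1024                      # assumed input size of 50 GB (illustrative)
block_size_mb = 128                           # default HDFS block size
n = math.ceil(file_size_mb / block_size_mb)   # minimum number of partitions
print(n)                                      # 400, matching the value set above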
posted @ 2021-01-21 09:37  风和雨滴