Spark error: InvalidClassException: no valid constructor
2019-03-19 02:50:24 WARN TaskSetManager:66 - Lost task 1.0 in stage 0.0 (TID 1, 1.2.3.4, executor 1): java.io.InvalidClassException: xxx.xxx.spark.xxx.xxx.Test; no valid constructor
    at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:157)
    at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:862)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2041)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
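If you hit the error above while running a Spark job, and all of the following are true:
1. There is an abstract parent class
2. The abstract parent class declares a constructor that takes parameters
3. The subclass implements the Serializable interface, but the parent class does not
then congratulations, you have come to the right place. The fix is very simple: add a no-argument constructor to the abstract parent class.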
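See the no-argument constructor in the code below:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import redis.clients.jedis.Jedis;

// SparkParam and RedisUtil are this project's own classes.
public abstract class AbBaseThread extends Thread {

    protected SparkSession session;

    protected JavaSparkContext jsc;

    private SparkParam sp;

    // The fix: a no-argument constructor that deserialization can call.
    public AbBaseThread() {}

    public AbBaseThread(SparkSession session, SparkParam sp) {
        this.session = session;
        this.jsc = new JavaSparkContext(session.sparkContext());
        this.sp = sp;
    }

    // Implemented by each concrete job; returns the result to store.
    public abstract String execute(SparkParam sp);

    @Override
    public void run() {
        String result = this.execute(sp);
        // Store the result in Redis under the job's UUID.
        Jedis jedis = RedisUtil.getJedis();
        jedis.set(sp.getUuid(), result);
        RedisUtil.returnResource(jedis);
    }

}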
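The error and the fix can also be reproduced without Spark, using plain Java serialization. The following is only a minimal sketch: NoValidConstructorDemo, BrokenBase, BrokenTask, FixedBase, FixedTask and canRoundTrip are hypothetical names made up for this illustration and are not part of the original project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NoValidConstructorDemo {

    // Broken case: non-serializable abstract parent with ONLY a parameterized constructor.
    public static abstract class BrokenBase {
        protected String name;
        protected BrokenBase(String name) { this.name = name; }
    }

    public static class BrokenTask extends BrokenBase implements Serializable {
        private static final long serialVersionUID = 1L;
        public BrokenTask(String name) { super(name); }
    }

    // Fixed case: same shape, but the parent also declares a no-argument constructor.
    public static abstract class FixedBase {
        protected String name;
        protected FixedBase() {}
        protected FixedBase(String name) { this.name = name; }
    }

    public static class FixedTask extends FixedBase implements Serializable {
        private static final long serialVersionUID = 1L;
        public FixedTask(String name) { super(name); }
    }

    // Serializes obj to a byte array and reads it back.
    // Returns false if reading it back throws InvalidClassException.
    public static boolean canRoundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                in.readObject();
            }
            return true;
        } catch (InvalidClassException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BrokenTask round-trips: " + canRoundTrip(new BrokenTask("a"))); // false
        System.out.println("FixedTask round-trips: " + canRoundTrip(new FixedTask("a")));   // true
    }
}
```

Note that writing BrokenTask succeeds; the exception only appears when the object is read back, which matches the stack trace above, where the failure comes from the deserializing side (the executor).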
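Why does this happen? Here is my attempt at an explanation:
1. When the subclass is serialized, the parent class does not implement Serializable, so the serialized stream contains none of the parent's field data.
2. Deserializing a subclass object still has to build the parent part of the object before the subclass part, much like ordinary instantiation. Since the stream holds nothing for the parent, Java initializes that part by calling the no-argument constructor of the nearest non-serializable parent class (the constructors of the serializable subclass itself are not called). If that constructor is missing or not accessible to the subclass, deserialization fails with "no valid constructor".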
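The two points above can be checked with a small experiment that records which constructors run during deserialization. This is a sketch under assumptions: DeserializationOrderDemo, Base, Child and the CALLS list are hypothetical names invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class DeserializationOrderDemo {

    // Records which constructors actually run.
    public static final List<String> CALLS = new ArrayList<>();

    // Non-serializable parent: its fields are not written to the stream.
    public static abstract class Base {
        protected String parentField = "unset";
        protected Base() { CALLS.add("Base()"); }
        protected Base(String parentField) {
            CALLS.add("Base(String)");
            this.parentField = parentField;
        }
    }

    // Serializable child: its fields are restored from the stream.
    public static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        public String childField;
        public Child(String parentField, String childField) {
            super(parentField);
            CALLS.add("Child(String, String)");
            this.childField = childField;
        }
        public String parentField() { return parentField; }
    }

    // Serializes the object to a byte array and reads it back.
    public static Child roundTrip(Child original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Child) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child original = new Child("from-driver", "payload");
        CALLS.clear(); // forget the constructors used for normal instantiation
        Child copy = roundTrip(original);
        System.out.println(CALLS);              // [Base()]
        System.out.println(copy.childField);    // payload
        System.out.println(copy.parentField()); // unset
    }
}
```

Only the parent's no-argument constructor runs, the child's field comes back from the stream, and the parent's field falls back to its initial value. By the same rule, fields declared in a non-serializable parent such as AbBaseThread (session, jsc, sp) would come back as null on the deserializing side, so the no-argument constructor removes the exception but does not carry that state across.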