我是新手Spark
,我对map
函数内部的序列化有疑问。这是代码的一些要素
private Function<Row, String> SparkMap() throws IOException {
return new Function<Row, String>() {
public String call(Row row) throws IOException {
/* some code */
}
};
}
public static void main(String[] args) throws Exception {
MyClass myClass = new MyClass();
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.load(args[0], "com.databricks.spark.avro");
JavaRDD<String> output = df.javaRDD().map(myClass.SparkMap());
}
这是错误日志
Caused by: java.io.NotSerializableException: myPackage.MyClass
Serialization stack:
- object not serializable (class: myPackage.MyClass, value: myPackage.MyClass@281c8380)
- field (class: myPackage.MyClass$1, name: this$0, type: class myPackage.MyClass)
- object (class myPackage.MyClass$1, myPackage.MyClass$1@28ef1bc8)
- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
... 12 more
如果我将SparkMap
方法声明为静态,则它将运行。怎么会这样
唯一的解释是例外:
object not serializable (class: myPackage.MyClass, value: myPackage.MyClass@281c8380)
只需使您的MyClas
s Serializable
,它就可以工作。
它以静态方式工作,因为在这种情况下它仅采用函数,而不是整个myClass
对象
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句