I have the following input file:
INPUTFILE_CNTRY_CODE|INPUTFILE_CTY_CODE|INPUTFILE_ID|INPUTFILE_LTY_ID|INPUTFILE_CNSM_ID|INPUTFILE_DATE|INPUTFILE_TIME|INPUTFILE_TRDATE
GBR|263|326735246||I034867789V|15/11/30|2015-11-30 00:00:00.000000|2016-22-06
GBR|263|397802068|PC7135361|PC7135361|16/05/20|2016-10-06 11:50:05.000000|2016-22-07
I am trying to read it as follows.
val registeration_schema = StructType(List(
StructField("INPUTFILE_CNTRY_CODE", StringType),
StructField("INPUTFILE_CTY_CODE", IntegerType),
StructField("INPUTFILE_ID", IntegerType),
StructField("INPUTFILE_LTY_ID", StringType),
StructField("INPUTFILE_CNSM_ID", StringType),
StructField("INPUTFILE_DATE", DateType),
StructField("INPUTFILE_TIME", TimestampType),
StructField("INPUTFILE_TRDATE", DateType)
))
val registerationDF = spark.read
.option("header", "true")
.option("delimiter", "|")
.option("mode", "FAILFAST")
.schema(registeration_schema)
.option("dateFormat", "yy/M/d")
.option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS")
.csv("registration2.csv")
And I am getting the error below.
Caused by: org.apache.spark.sql.catalyst.util.BadRecordException: java.time.format.DateTimeParseException: Text '2016-22-06' could not be parsed at index 2
at org.apache.spark.sql.catalyst.csv.UnivocityParser.org$apache$spark$sql$catalyst$csv$UnivocityParser$$convert(UnivocityParser.scala:262)
at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$doParse$2(UnivocityParser.scala:200)
at org.apache.spark.sql.catalyst.csv.UnivocityParser.parse(UnivocityParser.scala:207)
at org.apache.spark.sql.catalyst.csv.UnivocityParser$.$anonfun$parseIterator$1(UnivocityParser.scala:347)
at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:60)
... 27 more
Caused by: java.time.format.DateTimeParseException: Text '2016-22-06' could not be parsed at index 2
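This happens because the two date columns use different formats, and I specified only one date format when loading the DataFrame. Can someone advise how to handle multiple date formats while reading a CSV into a DataFrame?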
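The failure position reported in the stack trace can be reproduced outside Spark with plain java.time, which is where the DateTimeParseException comes from. The following is a minimal sketch, assuming that DateTimeFormatter.ofPattern behaves closely enough to the formatter Spark builds internally; the names slashFormat, dashFormat and failureIndex are illustrative and not part of the original code.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}

// The two patterns the sample rows need (illustrative names).
val slashFormat = DateTimeFormatter.ofPattern("yy/M/d")      // INPUTFILE_DATE, e.g. 15/11/30
val dashFormat  = DateTimeFormatter.ofPattern("yyyy-dd-MM")  // INPUTFILE_TRDATE, e.g. 2016-22-06

// Each value parses with its own pattern.
val inputDate = LocalDate.parse("15/11/30", slashFormat)
val trDate    = LocalDate.parse("2016-22-06", dashFormat)

// Applying the slash pattern to the dash-style value fails:
// "yy" consumes "20", then "/" is expected where the text has "1".
val failureIndex =
  try {
    LocalDate.parse("2016-22-06", slashFormat)
    -1 // not reached for this input
  } catch {
    case e: DateTimeParseException => e.getErrorIndex
  }

println(s"$inputDate $trDate failed at index $failureIndex")
```

A single dateFormat option is applied to every DateType column in the schema, so one of the two columns is always parsed with the wrong pattern.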
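You cannot define more than one format for DateType while loading a CSV. But you can achieve this with the date_format() and to_date() functions available in Spark 2.2+.
The high-level steps are as follows:
1. Define one of the two date columns as string in the original schema. I chose to define INPUTFILE_DATE as string for this demonstration.
2. Convert the data type of INPUTFILE_DATE to date using the date_format() and to_date() functions with the appropriate format.
Define the original schema with INPUTFILE_DATE -> StringType: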
import org.apache.spark.sql.types._
val registeration_schema = StructType(List(
StructField("INPUTFILE_CNTRY_CODE", StringType),
StructField("INPUTFILE_CTY_CODE", IntegerType),
StructField("INPUTFILE_ID", IntegerType),
StructField("INPUTFILE_LTY_ID", StringType),
StructField("INPUTFILE_CNSM_ID", StringType),
StructField("INPUTFILE_DATE", StringType),
StructField("INPUTFILE_TIME", TimestampType),
StructField("INPUTFILE_TRDATE", DateType)
))
val registerationDF = spark.read
.option("header", "true")
.option("delimiter", "|")
.option("mode", "FAILFAST")
.schema(registeration_schema)
.option("dateFormat", "yyyy-dd-MM")
.option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS")
.csv("registration2.csv")
The core part of the solution is:
import org.apache.spark.sql.functions.{col, date_format, to_date}
val targetDF = registerationDF.withColumn(
  "INPUTFILE_DATE",
  to_date(date_format(to_date(col("INPUTFILE_DATE"), "yy/MM/dd"), "yyyy-dd-MM"), "yyyy-dd-MM"))
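The nested call above parses the string, formats the resulting date back into a string, and parses that string again. Since the inner to_date already returns a date, a single to_date with the source pattern should give the same column. The following is a minimal sketch of that simplification, assuming a local Spark session; rawDF is a hypothetical stand-in for the INPUTFILE_DATE column after it has been read as StringType, and parsedDF is an illustrative name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// Local session for a standalone run; in spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("to-date-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for INPUTFILE_DATE once it has been loaded as a string.
val rawDF = Seq("15/11/30", "16/05/20").toDF("INPUTFILE_DATE")

// One to_date call with the source pattern converts the column to DateType.
val parsedDF = rawDF.withColumn(
  "INPUTFILE_DATE",
  to_date(col("INPUTFILE_DATE"), "yy/MM/dd"))

parsedDF.printSchema()
parsedDF.show()
```

The same pattern could be applied to INPUTFILE_TRDATE instead, by reading that column as a string and leaving dateFormat set for INPUTFILE_DATE.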
Final result:
scala> targetDF.printSchema()
root
|-- INPUTFILE_CNTRY_CODE: string (nullable = true)
|-- INPUTFILE_CTY_CODE: integer (nullable = true)
|-- INPUTFILE_ID: integer (nullable = true)
|-- INPUTFILE_LTY_ID: string (nullable = true)
|-- INPUTFILE_CNSM_ID: string (nullable = true)
|-- INPUTFILE_DATE: date (nullable = true)
|-- INPUTFILE_TIME: timestamp (nullable = true)
|-- INPUTFILE_TRDATE: date (nullable = true)
scala> targetDF.show()
+--------------------+------------------+------------+----------------+-----------------+--------------+-------------------+----------------+
|INPUTFILE_CNTRY_CODE|INPUTFILE_CTY_CODE|INPUTFILE_ID|INPUTFILE_LTY_ID|INPUTFILE_CNSM_ID|INPUTFILE_DATE|     INPUTFILE_TIME|INPUTFILE_TRDATE|
+--------------------+------------------+------------+----------------+-----------------+--------------+-------------------+----------------+
|                 GBR|               263|   326735246|            null|      I034867789V|    2015-11-30|2015-11-30 00:00:00|      2017-10-06|
|                 GBR|               263|   397802068|       PC7135361|        PC7135361|    2016-05-20|2016-10-06 11:50:05|      2017-10-07|
+--------------------+------------------+------------+----------------+-----------------+--------------+-------------------+----------------+