How to read an .xls file from AWS S3 using Spark in Java? And why can't I read a sheet by sheetName?

Mehaboob Khan Published at Dev


I am trying to read a .xls file from AWS S3, but I get a java.io.FileNotFoundException.

I tried the two approaches below. The first passes the path through option() with the key "location". The second also passes the same path to load().

Dataset<Row> segmentConfigData = spark.read()
                .format("com.crealytics.spark.excel")
                .option("sheetName", "sheet1")
                .option("header","true")
                .option("location","s3a://input/552SegmentConfig.xls")
                .option("useHeader", "true")
                .option("treatEmptyValuesAsNulls", "true")
                .option("inferSchema", "true")
                .option("addColorColumns", "False")
                .load();

Dataset<Row> segmentConfigData = spark.read()
                .format("com.crealytics.spark.excel")
                .option("sheetName", "sheet1")
                .option("header","true")
                .option("location","s3a://input/552SegmentConfig.xls")
                .option("useHeader", "true")
                .option("treatEmptyValuesAsNulls", "true")
                .option("inferSchema", "true")
                .option("addColorColumns", "False")
                .load("s3a://input/552SegmentConfig.xls");
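When either variant fails with a FileNotFoundException, a quick first check is how the s3a URI splits into bucket and key. In s3a://input/552SegmentConfig.xls the bucket is "input" (not a folder) and the key is "552SegmentConfig.xls", and both must match an object that actually exists. The sketch below is a hypothetical helper built only on java.net.URI. It does not contact S3 or check anything about the Excel reader.

```java
import java.net.URI;

/** Hypothetical helper: splits an S3 URI such as s3a://bucket/key into its parts. */
final class S3PathCheck {
    private S3PathCheck() {}

    /** Returns {bucket, key}; throws IllegalArgumentException for non-S3 or incomplete URIs. */
    static String[] bucketAndKey(String uri) {
        URI u = URI.create(uri);
        String scheme = u.getScheme();
        if (scheme == null || !(scheme.equals("s3a") || scheme.equals("s3") || scheme.equals("s3n"))) {
            throw new IllegalArgumentException("Not an S3 URI: " + uri);
        }
        // The authority is the bucket name; getAuthority() (unlike getHost())
        // also tolerates characters such as '_' that are invalid in host names.
        String bucket = u.getAuthority();
        String path = u.getPath();
        if (bucket == null || path == null || path.length() <= 1) {
            throw new IllegalArgumentException("Missing bucket or key: " + uri);
        }
        // Drop the leading '/' so the key matches what S3 listings show.
        return new String[] {bucket, path.substring(1)};
    }
}
```

You can then compare the two parts with the output of "aws s3 ls s3://<bucket>/" for the same bucket.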

Both fail with a FileNotFoundException, whereas I am able to read a .csv file.
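A likely reason CSV works while .xls does not: "csv" is a data source built into Spark, whereas "com.crealytics.spark.excel" comes from the separate spark-excel package. A problem in that package, such as an outdated version, therefore affects only the Excel reads. The hypothetical helper below makes that split explicit by choosing the format string from the file extension. The extension-to-format mapping is an assumption for illustration.

```java
import java.util.Locale;

/** Hypothetical helper: picks a Spark data source name from a file extension. */
final class ReaderFormat {
    private ReaderFormat() {}

    static String formatFor(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".csv")) {
            // Built into Spark: no extra jar involved.
            return "csv";
        }
        if (lower.endsWith(".xls") || lower.endsWith(".xlsx")) {
            // Third-party spark-excel package: its jar (and version) must be on the classpath.
            return "com.crealytics.spark.excel";
        }
        throw new IllegalArgumentException("No reader configured for: " + path);
    }
}
```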

Edit: I have solved this issue. I was using an older version of "com.crealytics.spark.excel", and I was able to read the file once I upgraded the jar.
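Because the fix was a jar upgrade, it can help to confirm which spark-excel jar the driver actually loads, in case an older copy is still on the classpath. For a format given as a package name, Spark's data source lookup also tries the class "<format>.DefaultSource", so locating that class points at the jar that provides it. This is a minimal sketch with hypothetical names. It only inspects the JVM it runs in (the driver), not the executors.

```java
import java.security.CodeSource;
import java.util.Optional;

/** Hypothetical helper: reports which jar (or directory) a class was loaded from. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Class name Spark's lookup also tries for a package-style format name. */
    static String defaultSourceClass(String format) {
        return format + ".DefaultSource";
    }

    /** Code-source location of the class, or empty if it is absent or has none (e.g. JDK classes). */
    static Optional<String> jarLocationOf(String className) {
        // Prefer the thread's context class loader, which can see jars added at submit time.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // initialize=false: locate the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return Optional.empty();
            }
            return Optional.of(source.getLocation().toString());
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath (or not loadable): report nothing rather than fail.
            return Optional.empty();
        }
    }
}
```

For example, logging ClasspathCheck.jarLocationOf(ClasspathCheck.defaultSourceClass("com.crealytics.spark.excel")) at job start typically shows a jar path whose file name includes the version.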

But now I am facing another issue: I am unable to read any sheet other than the first one. Any help?
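One way to target a specific sheet, and the one used in the answer's snippet, is the dataAddress option. Its value follows Excel reference syntax: the sheet name in single quotes, then "!" and a start cell. Building it by plain concatenation, as in "'"+sheetName+"'!A1", breaks when the sheet name itself contains an apostrophe, which Excel escapes by doubling. The small helper below is hypothetical. Whether your spark-excel version honors the doubled quote is worth checking on a test file.

```java
/** Hypothetical helper: builds a spark-excel "dataAddress" value such as 'Sheet 2'!A1. */
final class DataAddress {
    private DataAddress() {}

    static String of(String sheetName, String startCell) {
        if (sheetName == null || sheetName.isEmpty()) {
            throw new IllegalArgumentException("sheetName must not be empty");
        }
        // Excel-style quoting: wrap the name in single quotes and
        // escape any embedded apostrophe by doubling it.
        String quoted = "'" + sheetName.replace("'", "''") + "'";
        return quoted + "!" + startCell;
    }

    /** Convenience overload that starts at the top-left cell. */
    static String of(String sheetName) {
        return of(sheetName, "A1");
    }
}
```

It would be used as .option("dataAddress", DataAddress.of("sheet2")) in the reader chain.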

Mehaboob Khan

I have solved this issue. I was using an older version of "com.crealytics.spark.excel", and I was able to read the file once I upgraded the jar.

Further, below is the code snippet I am using to read a specific sheet of the .xls file:

spark.read()
    .format("com.crealytics.spark.excel")
    .option("location",path)
    .option("sheetName", sheetName)
    .option("dataAddress", "'"+sheetName+"'!A1")
    .option("header","true")
    .option("useHeader", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .option("addColorColumns", "False")
    .load(path);
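To read several sheets, the same reader can be run once per sheet with a different dataAddress. The hypothetical helper below builds one option map per sheet name, reusing the snippet's settings, for use with DataFrameReader.options(java.util.Map). It assumes the sheet names are known in advance. It also leaves out sheetName, useHeader and location on the assumption that the upgraded spark-excel version relies on dataAddress, header and the load(path) argument instead. Add them back if your version needs them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: one spark-excel option map per sheet, keyed by sheet name. */
final class SheetOptions {
    private SheetOptions() {}

    static Map<String, String> forSheet(String sheetName) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Same reader settings as the snippet above.
        opts.put("header", "true");
        opts.put("treatEmptyValuesAsNulls", "true");
        opts.put("inferSchema", "true");
        opts.put("addColorColumns", "false");
        // Select the sheet by name; apostrophes in the name are doubled (Excel quoting).
        opts.put("dataAddress", "'" + sheetName.replace("'", "''") + "'!A1");
        return opts;
    }

    static Map<String, Map<String, String>> forSheets(List<String> sheetNames) {
        Map<String, Map<String, String>> all = new LinkedHashMap<>();
        for (String name : sheetNames) {
            all.put(name, forSheet(name));
        }
        return all;
    }
}
```

Each map would then be passed as spark.read().format("com.crealytics.spark.excel").options(SheetOptions.forSheet(name)).load(path), giving one Dataset<Row> per sheet.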



edited at 2020-12-6


Related


How to read Parquet file from S3 without spark? Java

How to read parquet file from s3 using dask with specific AWS profile

Spark read file from S3 using sc.textFile ("s3n://...)

Read AWS s3 File to Java code

How to read data from XLS (Excel) file [Java, Android]

Read file from aws s3 bucket using node fs

Spark: read csv file from s3 using scala

How do I upload a CSV file in myBucket and Read File in S3 AWS using Python

Read a csv file from aws s3 using boto and pandas

AWS Lambda@edge. How to read HTML file from S3 and put content in response body

Read the contents of a file from AWS s3 using its Pre-signed URL

How to read multiple files from AWS S3 in spark dataframe?

Python: How to read and load an excel file from AWS S3?

How to read file from AWS s3 in python flask on web

How to tell what AWS credentials Spark is using to read S3 files?

How to read csv file from s3 bucket in AWS Lambda?

How to write a policy in .yaml for a python lambda to read from S3 using the aws sam cli

how to create read only and write only token for specific resource for a file in s3 using AWS STS

Unable to read csv from S3 using R

Read csv file from S3 into spark in R

AWS charges for spark read from S3 buckets?

How to read file from s3?

How to read Snappy Compressed file from S3 in Java

How to read AWS S3 images from Sagemaker for processing

pyspark read file from AWS S3 not working

How to read a text file in S3 bucket from inside an AWS EMR without using spark

How to read pickle file from AWS S3 nested directory?

How to read and overwrite a file in AWS s3 using Lambda and Python?

How to read .dat file from AWS S3 using mdfreader