query from athena a file in s3

Chicago1988

I am complete newbie on this. I have this file in amazon s3.

How can I query this .tar.gz from Athena?

enter image description here

I am assuming I have to somehow decompress and ‘restore’ to ‘athena’? But I do not know how to do it.

Kanetkar Rev

You can directly query files in AWS Athena that are in .gz format as well as any flat files. If your tar file contains multiple .gz files and they are of the same file format then you don't need to gunzip them to .tsv.

Since, you have already converted to .tsv files make sure the files of the same format are put into a folder e.g. s3://bucketname/folder/file1.gz s3://bucketname/folder/file2.gz

etc. file1 and file2 should have the same structure.

Then define your AWS Athena table on top of this. Sample script below -

CREATE EXTERNAL TABLE table_name (
yr INT,
 quarter INT,
 month INT,
 dayofmonth INT,
 dayofweek INT,
 flightdate STRING
)
 PARTITIONED BY (year STRING)
 ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
 LOCATION 's3://bucketname/folder/';

Keeping homogeneous files is not mandatory but recommended so that you can add remove files under the same folder and just update the partition information every time there is a change.

Run MSCK REPAIR TABLE to refresh partition metadata each time a new partition is added to this table.

MSCK REPAIR TABLE table_name ;

Reference - https://docs.aws.amazon.com/athena/latest/ug/lazy-simple-serde.html#tsv-example

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

CSV row headers are not coming in Athena query from S3

From Dev

Copy of file from one s3 location to another does not work after athena query boto3

From Dev

Alternatives for Athena to query the data on S3

From Dev

How to load data into Power BI from S3 using an Athena query?

From Dev

Not able to Athena query CloudTrail Logs from KMS CMK Encrypted S3 bucket

From Dev

Athena query to find the size of directory in s3 bucket

From Dev

Query S3 logs content using Athena or DynamoDB

From Dev

S3 bucket policy is not allowing Athena to perform query execution

From Dev

Why Athena for QuickSight, If QuickSight direct query S3

From Dev

Query to multiple csv fles at S3 through Athena

From Dev

Partition Athena query by S3 created date

From Dev

How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

From Dev

S3/Athena query result location and “Invalid S3 folder location”

From Dev

Aws Vpc flow logs to athena from s3

From Dev

How to query data from gz file of Amazon S3 using Qubole Hive query?

From Dev

Query S3 json file in AWS

From Dev

Query JSON file in Presto in S3

From Dev

Boto3 athena query without saving data to s3

From Dev

Athena query fails with boto3 (S3 location invalid)

From Dev

Redshift Spectrum query returns 0 row from S3 file

From Dev

How to get the full results of a query to CSV file using AWS/Athena from CLI?

From Dev

Query S3 from Python

From Dev

querying AWS firehose data in s3 with Athena - can't query batches

From Dev

Query pre-created sub-folders in s3 using a single table schema in Athena

From Dev

How to query in AWS athena connected through S3 using lambda functions in python

From Dev

How to Query parquet data from Amazon Athena?

From Dev

AWS Athena Return Zero Records from Tables Created by GLUE Crawler input csv from S3

From Dev

How to update AWS Athena with the latest file in S3 using Glue Crawler?

From Dev

AWS Athena: partition table for S3 bucket with non-standard file structure

Related Related

  1. 1

    CSV row headers are not coming in Athena query from S3

  2. 2

    Copy of file from one s3 location to another does not work after athena query boto3

  3. 3

    Alternatives for Athena to query the data on S3

  4. 4

    How to load data into Power BI from S3 using an Athena query?

  5. 5

    Not able to Athena query CloudTrail Logs from KMS CMK Encrypted S3 bucket

  6. 6

    Athena query to find the size of directory in s3 bucket

  7. 7

    Query S3 logs content using Athena or DynamoDB

  8. 8

    S3 bucket policy is not allowing Athena to perform query execution

  9. 9

    Why Athena for QuickSight, If QuickSight direct query S3

  10. 10

    Query to multiple csv fles at S3 through Athena

  11. 11

    Partition Athena query by S3 created date

  12. 12

    How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

  13. 13

    S3/Athena query result location and “Invalid S3 folder location”

  14. 14

    Aws Vpc flow logs to athena from s3

  15. 15

    How to query data from gz file of Amazon S3 using Qubole Hive query?

  16. 16

    Query S3 json file in AWS

  17. 17

    Query JSON file in Presto in S3

  18. 18

    Boto3 athena query without saving data to s3

  19. 19

    Athena query fails with boto3 (S3 location invalid)

  20. 20

    Redshift Spectrum query returns 0 row from S3 file

  21. 21

    How to get the full results of a query to CSV file using AWS/Athena from CLI?

  22. 22

    Query S3 from Python

  23. 23

    querying AWS firehose data in s3 with Athena - can't query batches

  24. 24

    Query pre-created sub-folders in s3 using a single table schema in Athena

  25. 25

    How to query in AWS athena connected through S3 using lambda functions in python

  26. 26

    How to Query parquet data from Amazon Athena?

  27. 27

    AWS Athena Return Zero Records from Tables Created by GLUE Crawler input csv from S3

  28. 28

    How to update AWS Athena with the latest file in S3 using Glue Crawler?

  29. 29

    AWS Athena: partition table for S3 bucket with non-standard file structure

HotTag

Archive