Bulk load XML files into Cassandra

Simon

I'm looking into using Cassandra to store 50M+ documents that I currently have in XML format. I've been hunting around but I can't seem to find anything I can really follow on how to bulk load this data into Cassandra without needing to write some Java (not high on my list of language skills!).

I can happily write a script to convert this data into any format if that would make the loading easier, although CSV might be tricky given that the body of a document could contain just about anything!

Any suggestions welcome.

Thanks

Si

Luke Tillman

If you're willing to convert the XML to a delimited format of some kind (e.g. CSV), then here are a couple of options:

  1. The COPY command in cqlsh. This actually got a big performance boost in a recent version of Cassandra.
  2. The cassandra-loader utility. This is a lot more flexible and has a bunch of different options you can tweak depending on the file format.
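For the conversion step, here is a minimal sketch in Python using only the standard library. It assumes each document is a small XML file with `id`, `title`, and `body` child elements; those element names, and the `docs.documents` table mentioned in the comment, are made up for illustration. Quoting every field lets the `csv` module take care of commas, quotes, and line breaks inside the body, which is the part that makes hand-rolled CSV tricky.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical element names; adjust to match the real XML layout.
FIELDS = ("id", "title", "body")


def xml_to_row(xml_text):
    """Parse one XML document and return its fields as a list of strings."""
    root = ET.fromstring(xml_text)
    # A missing element becomes an empty string instead of raising.
    return [root.findtext(name, default="") for name in FIELDS]


def write_csv(xml_documents, out_file):
    """Write a header plus one fully quoted CSV row per XML document."""
    writer = csv.writer(out_file, quoting=csv.QUOTE_ALL)
    writer.writerow(FIELDS)
    count = 0
    for xml_text in xml_documents:
        writer.writerow(xml_to_row(xml_text))
        count += 1
    return count


# The resulting file could then be loaded from cqlsh with something like
# (keyspace and table names are assumptions):
#   COPY docs.documents (id, title, body) FROM 'documents.csv' WITH HEADER = true;
```

When writing to a real file, open it with `open("documents.csv", "w", newline="", encoding="utf-8")` so the `csv` module controls the line endings. It is worth trying a small sample first to check that the loader you pick accepts bodies with embedded line breaks.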

If you're willing to write code other than Java (for example, Python), there are Cassandra drivers available for a bunch of programming languages. No need to learn Java if you've got another language you're better with.
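As a rough illustration of the driver route, the sketch below uses the DataStax Python driver (`pip install cassandra-driver`) with a prepared statement. The keyspace, table, and column names are assumptions, as is the `id` column being text; the table must already exist.

```python
def load_rows(session, rows, keyspace="docs", table="documents"):
    """Insert (id, title, body) tuples using one prepared statement.

    `session` is a cassandra.cluster.Session; the keyspace, table, and
    column names here are hypothetical.
    """
    insert = session.prepare(
        f"INSERT INTO {keyspace}.{table} (id, title, body) VALUES (?, ?, ?)"
    )
    loaded = 0
    for doc_id, title, body in rows:
        # Synchronous execute: simple, but one round trip per document.
        session.execute(insert, (doc_id, title, body))
        loaded += 1
    return loaded


def connect(contact_points=("127.0.0.1",)):
    """Open a session against the cluster (the address is an assumption)."""
    # Imported here so load_rows can be exercised without the driver installed.
    from cassandra.cluster import Cluster

    return Cluster(contact_points=list(contact_points)).connect()
```

Typical use would be `load_rows(connect(), rows)`, where `rows` comes from whatever parses the XML. One insert at a time will be slow for 50M+ documents; the driver also ships `cassandra.concurrent.execute_concurrent_with_args`, which runs many inserts in parallel and is probably a better fit at that volume.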
