I'm looking into using Cassandra to store 50M+ documents that I currently have in XML format. I've been hunting around but I can't seem to find anything I can really follow on how to bulk load this data into Cassandra without needing to write some Java (not high on my list of language skills!).
I can happily write a script to convert this data into any format if it would make the loading easier, although CSV might be tricky given that the body of the document could contain just about anything!
Any suggestions welcome.
Thanks
Si
If you're willing to convert the XML to a delimited format of some kind (e.g. CSV), then here are a couple of options:
- The COPY command in cqlsh. This actually got a big performance boost in a recent version of Cassandra.
- The cassandra-loader utility. This is a lot more flexible and has a bunch of different options you can tweak depending on the file format.
If you're willing to write code in a language other than Java (for example, Python), there are Cassandra drivers available for a bunch of programming languages. No need to learn Java if you've got another language you're better with.
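On the worry that CSV might be tricky because the body can contain anything: a proper CSV writer quotes every field and escapes embedded quotes, so commas, quotes and newlines in the body survive the round trip. Here is a minimal sketch in Python using only the standard library. The XML layout (a `doc` element with an `id` attribute and `title` and `body` children) and the column names are assumptions made up for illustration, so adjust them to your real documents.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv_row(xml_text):
    """Pull (id, title, body) out of one XML document.

    The element and attribute names are hypothetical; change them
    to match the real document schema.
    """
    root = ET.fromstring(xml_text)
    return [
        root.get("id", ""),
        root.findtext("title", default=""),
        root.findtext("body", default=""),
    ]


def write_csv(xml_documents, out_file):
    """Write one quoted CSV row per XML document, preceded by a header row."""
    # QUOTE_ALL wraps every field in double quotes and doubles any
    # embedded quote, so arbitrary body text stays inside one field.
    writer = csv.writer(out_file, quoting=csv.QUOTE_ALL)
    writer.writerow(["id", "title", "body"])
    for xml_text in xml_documents:
        writer.writerow(xml_to_csv_row(xml_text))


# A made-up document whose body contains a comma, quotes and a newline.
sample = '<doc id="42"><title>Hello, world</title><body>Line one\nsays "hi", twice</body></doc>'

buffer = io.StringIO()
write_csv([sample], buffer)
csv_text = buffer.getvalue()

# Read the CSV back to confirm the awkward body came through unchanged.
rows = list(csv.reader(io.StringIO(csv_text)))
```

For real use you would open an output file with `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer. The result could then be loaded from cqlsh with something along the lines of `COPY mykeyspace.documents (id, title, body) FROM 'docs.csv' WITH HEADER = TRUE;`, where the keyspace and table names are placeholders. It is worth trying a small sample first to check that your loader accepts newlines inside quoted fields.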