
cassandra-loader's People

Contributors

brian-salgado, brianmhess, gisdev01, jeromatron, phact, rkhaja


cassandra-loader's Issues

Issue using Delimiter

Is there a reason why the delimiter is only allowed to be one character? I'm running into an issue where data from an unload also contains the delimiter character, and I'm getting invalid data type exceptions because the parser splits the data in the wrong places. Is there a recommended way to handle this scenario?

Loader can't load data to a table with a counter column

Loading a table with a counter column causes an exception saying CQL UPDATE must be used instead of INSERT. There isn't an option to make the loader do this.

Exception in thread "main" java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.InvalidQueryException: INSERT statement are not allowed on counter tables, use UPDATE instead
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at com.datastax.loader.CqlDelimLoad.run(CqlDelimLoad.java:558)
    at com.datastax.loader.CqlDelimLoad.main(CqlDelimLoad.java:599)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: INSERT statement are not allowed on counter tables, use UPDATE instead
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:291)
    at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79)
    at com.datastax.loader.EnhancedSession.prepare(EnhancedSession.java:93)
    at com.datastax.loader.CqlDelimLoadTask.setup(CqlDelimLoadTask.java:172)
    at com.datastax.loader.CqlDelimLoadTask.call(CqlDelimLoadTask.java:144)
    at com.datastax.loader.CqlDelimLoadTask.call(CqlDelimLoadTask.java:69)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: INSERT statement are not allowed on counter tables, use UPDATE instead
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:102)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:163)
    at com.datastax.driver.core.SessionManager$1.apply(SessionManager.java:138)
    at com.google.common.util.concurrent.Futures$1.apply(Futures.java:720)
    at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:859)
    ... 3 more
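
For reference, a hedged sketch of the write path such an option would need: counter tables only accept UPDATE statements that add a delta to the counter column. Keyspace, table, and column names here are hypothetical.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class CounterLoadSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Counter tables reject INSERT; the only valid write is an UPDATE
                // that increments the counter by a bound delta.
                PreparedStatement ps = session.prepare(
                        "UPDATE myks.page_views SET views = views + ? WHERE url = ?");
                // For a CSV line like "http://example.com,42" the loader would bind:
                session.execute(ps.bind(42L, "http://example.com"));
            }
        }
    }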

Cassandra Collection is not loaded correctly

Hi Brian,

I mentioned the issue with loading collections after your presentation at Cassandra Summit last week. This is a great tool, and I hope you can enhance collection loading. The records were loaded, but the collection data was not loaded correctly; please see the csv file and loaded cqlsh table rows below.

Thanks,
Jane

load_collection.csv file

[root@usddcas01 test-load]# cat load_collection.csv
23,A23323226_d_X,925077069999,1,"[1, 2]"
23,A23323226_d_X,925077069999,2,"[1, 2, 1, 2, 3, 4]"
23,A23323226_d_X,925077069999,3,"[1, 2, 1, 2, 3, 4, 1, 2, 3, 4]"

cqlsh:demo> select * from load_collection;

p_key | sub_id | c_key | l_seq | d_seq
-----------+------------+-----------+----------+----------

cqlsh:demo> exit

./cassandra-loader -f load_collection.csv -host xxxx -schema "demo.load_collection(p_key,sub_id,c_key,l_seq,d_seq)"

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
*** Processing load_collection.csv
*** DONE: load_collection.csv number of lines processed: 3 (3 inserted)
Lines Processed: 3 Rate: 0.0

test-load]# cqlsh

Connected to Jane Test Cluster at x.x.x.x:9042.
[cqlsh 5.0.1 | Cassandra 2.1.8.621 | DSE 4.7.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh:demo> select * from load_collection;

p_key | sub_id | c_key | l_seq | d_seq
-----------+---------------+--------------+----------+--------------------------------------------------------
23 | A23323226_d_X | 925077069999 | 1 | [1, 2]
23 | A23323226_d_X | 925077069999 | 2 | [1, 2, 1, 2, 1, 2, 3, 4]
23 | A23323226_d_X | 925077069999 | 3 | [1, 2, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 2, 3, 4]

Error on Loading CSV with newlines in text

Hi,

I have a column in the MySQL database which contains line breaks.
When I export the table, the newlines are escaped with '\' and the file actually contains a line break,
e.g.

2,"DE","0000099593A57021F5B447D06EF5B52E","Von Horace G.  Shooting, Vol. 2\
\
Shooting  for this title.\
\
About the Publisher\
\
Forgotten books. Find more at \
", ... other columns

Would it be possible to support this too?

regards
Guenther

How to unload by partition key

Is there a way to support unloading by a certain partition key instead of the whole table? For example, for food(date, name), I would like to unload data with partition key date = '06-13-2016'.

Unable to skip the columns with -skipCols option

I am unable to skip columns while loading data with the cassandra-loader -skipCols option. Below is the command I am using and the corresponding error.
Could you please help me fix this issue, and also explain the exact usage of the option?
The skipRows option is working fine.

$ ./cassandra-loader -f ~/inputfile.csv -host 127.0.0.1 -schema "ks1.skiptest(test1,test2,test3)" -skipCols 1
*** Processing skipprocessing.csv
Rows has different number of fields (3) than expected (4)
Error parsing line 1 in skipprocessing.csv: dpk,hyd,test
Rows has different number of fields (3) than expected (4)
Error parsing line 2 in skipprocessing.csv: rajan,chn,test
*** DONE: skipprocessing.csv number of lines processed: 2 (0 inserted)
Lines Processed: -1 Rate: 0.0

tombstones with null values

cassandra-loader generates tombstones when the inserted rows have null values.
I know there were a couple of closed issues around this topic, but it doesn't look like it is fixed.
I was testing with version 0.0.23 and Cassandra 3.0.

Does not work with Cassandra 3.0

Because Cassandra 3.0 changed some internal tables and older versions of drivers try to access them, both tools crash when trying to connect. Please update the Java driver.

# ./cassandra-unloader -f xxx.csv -host 1.2.3.4 -schema xxx.foo
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /1.2.3.4:9042 (com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table schema_keyspaces))
        at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:223)
        at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
        at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1272)
        at com.datastax.driver.core.Cluster.init(Cluster.java:158)
        at com.datastax.driver.core.Cluster.connect(Cluster.java:248)
        at com.datastax.loader.CqlDelimUnload.setup(CqlDelimUnload.java:329)
        at com.datastax.loader.CqlDelimUnload.run(CqlDelimUnload.java:350)
        at com.datastax.loader.CqlDelimUnload.main(CqlDelimUnload.java:444)

Loader inserts null values

The Loader prepares an insert statement with all columns and uses it on every row, regardless of whether the row has null for some (or many) of those columns.

This may cause a lot of tombstones to be created, waste a whole lot of space, and slow down the load with extra I/O.

It would be better to create/cache PreparedStatements for the various permutations of present columns as they are encountered.
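
A minimal sketch of that caching idea, assuming a hypothetical loader context; the set of present columns is the cache key, so each distinct null pattern is prepared only once:

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class StatementCacheSketch {
        private final Map<BitSet, PreparedStatement> cache = new HashMap<>();
        private final Session session;
        private final String table;        // e.g. "ks.tbl"
        private final String[] allColumns; // column order from -schema

        public StatementCacheSketch(Session session, String table, String[] allColumns) {
            this.session = session;
            this.table = table;
            this.allColumns = allColumns;
        }

        // values[i] == null means "omit this column" instead of binding null,
        // so no tombstone is written for that cell.
        public PreparedStatement forRow(Object[] values) {
            BitSet present = new BitSet(allColumns.length);
            List<String> cols = new ArrayList<>();
            for (int i = 0; i < values.length; i++) {
                if (values[i] != null) {
                    present.set(i);
                    cols.add(allColumns[i]);
                }
            }
            return cache.computeIfAbsent(present, k -> session.prepare(
                    "INSERT INTO " + table + " (" + String.join(",", cols) + ") VALUES ("
                            + String.join(",", Collections.nCopies(cols.size(), "?")) + ")"));
        }
    }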

Cassandra-loader skips first 2 letters of the first row in a csv file

Here're the tests:

1. Test1

--CSV file
10206,"6446670_d_01"
10931,"6556670_d_01"
10351,"8313084_d_01"
10581,"7742973_d_01"

I did multiple tests and the first 2 letters of the first column in first row are always skipped. I didn't notice this in the old version 0.18.

-- the first column value of the first row is cut from 10206 to 206.

mykey | subid
-----------+--------------
10581 | 7742973_d_01
10931 | 6556670_d_01
206 | 6446670_d_01
10351 | 8313084_d_01

2. Test2 - it will not load the first row.

1,"aaa"
2,"bbb"
3,"ccc"
4,"hhh"

*** Processing text1.csv
Row has different number of fields (1) than expected (2)
Error parsing line 1 in text1.csv: "aaa"
*** DONE: text1.csv number of lines processed: 4 (3 inserted)

-- first row is not loaded
id | c1
----+-----
2 | bbb
3 | ccc
4 | hhh

Loader Quoting/Escaping and Data Corruption

The loader & parsers have some pretty glaring issues with quoting/escaping that not only cause errors, but worse: can cause data to be silently corrupted.

I started to fix some of these myself, but quickly realized that the issues were architectural; it will need a bit of an overhaul to make this enterprise-ready. We might have to go a different route, but here are the results of my research if you want to take a crack at it.

CSV

While CSV is a bit of an amorphous spec, there are some core behaviours that are arguably necessary to implement. As it stands, you can't really run any CSV file through cassandra-loader that was written by Excel or any other mainstream CSV processing library. This should really be solved by using one of those libraries, instead of writing a CSV implementation from scratch.

Escaping Quotes
In the case where double-quotes are present in the field text, a proper CSV writer will escape each one with another double-quote, and then double-quote the entire field. cassandra-loader does not un-escape these when reading the CSV.
For example:
Raw Text: Text "with" Quotes
CSV Field: "Text ""with"" Quotes"
cassandra-loader writes Text ""with"" Quotes to the database
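
For reference, a minimal sketch of the expected decoding step (RFC 4180 style; this is not the loader's actual code):

    public class CsvQuoteSketch {
        // A quoted field drops the surrounding quotes and collapses each
        // doubled quote ("") back to a single quote (").
        static String decodeQuotedField(String field) {
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                return field.substring(1, field.length() - 1).replace("\"\"", "\"");
            }
            return field;
        }

        public static void main(String[] args) {
            // Prints: Text "with" Quotes
            System.out.println(decodeQuotedField("\"Text \"\"with\"\" Quotes\""));
        }
    }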

Preserving Whitespace
cassandra-loader performs a 'trim' on field contents, even when quoted, causing leading and trailing whitespace to be lost. Worse, whitespace-only fields will be inserted as null with the default nullString. Whitespace within the quoted area should really never be removed.

There is also the issue of newlines, as reported by #16

Collections

These issues get much worse once we start using collections. Again, it seems like we ought to be treating these as JSON or something that can be parsed with an existing library, rather than trying to roll something new.

Note: For readability, the below examples will not include any CSV quoting.

Text Containing Commas
cassandra-loader chokes when commas are present in any text element of a set, list, or map.
For example, the simplest case of {","} will cause the following error:
Invalid format in input 0: Must end with }

As I was looking through the code, I noticed that there was handling for the backslash to be used as an escape character. Since it is undocumented, I'm not sure what the expected behaviour is, but here are my observations:

  • When used to escape commas, it prevents the above error but causes data corruption, as the backslash is carried along. Thus, in the example above, writing {"\,"} will cause \, to be written to the database set.
  • Since we're inside a set of double-quotes, we should not need to escape the comma in the first place.

Text Containing Double-Quotes
Similar to the previous case, an error will be thrown when unescaped quotes are present in a collection (but only if there are an odd number of them and the collection contains further elements). The main difference here is that we would expect quotes to require escaping.

As with commas, escaping with backslashes silences the error, but causes the same data corruption, as the backslashes end up in the database.

Maps With Text Keys & Values
When only the keys or only the values of a map are text (i.e. {1: "a", 2: "b"} or {"a": 1, "b": 2}), it seems to work okay.
However, if both keys and values are text (i.e. {"a": "b", "x": "y"}), then everything becomes corrupted. In this example, the following entries will be written to the database map:

  • Key: x":"y Value: null
  • Key: a":"b Value: null

Preserving Whitespace
As with the CSV-level code, leading/trailing whitespace is lost from quoted text areas within collections (ex: {" "}).

missing repos

My Linux environment doesn't have access to certain sites to fetch the dependencies. Can you include them in the build?

Configurable memory settings for cassandra-loader.sh

Hi!

Noticed that the default cassandra-loader.sh script runs with 8G memory settings. Perhaps it is worth making this configurable? In some cases we're running cassandra-loader on small virtual machines with a few GB of memory.

As a workaround we're currently running cassandra-loader with java -jar, and that way we can control the memory settings.

Thanks

File encoding problem

Hi

In my case, the loader ended up using 'ANSI_X3.4-1968' because I happened to have the wrong locale set on my machine. I guess the loader does not assume a file encoding but inherits it from the system property.

Since Cassandra assumes text is a UTF-8 encoded string, it would be nicer if the loader read files assuming UTF-8.
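
For reference, a minimal sketch of forcing UTF-8 regardless of the platform default (the file name is hypothetical):

    import java.io.BufferedReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class Utf8ReadSketch {
        public static void main(String[] args) throws Exception {
            // Pass the charset explicitly instead of relying on file.encoding,
            // which is derived from the OS locale.
            try (BufferedReader in = Files.newBufferedReader(
                    Paths.get("input.csv"), StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }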

Thank you
onesuper

[question] cassandra-loader vs sstableloader

Hello. Sorry in advance if I missed this explanation in the README.

Say, I have lots of data to put into Cassandra 2.1.x regularly. In terms of raw speed, what's the preferred tool to use: this one or sstableloader (as explained here https://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated)?

This tool is probably easier to use, because sstableloader requires writing some additional Java code to prepare an sstable. But ignoring that factor and considering raw performance only, what's your suggestion?
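
For context, a hedged sketch of the extra Java code the sstableloader route typically requires, using Cassandra's CQLSSTableWriter (schema, statement, and output directory are hypothetical):

    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    public class SSTableWriteSketch {
        public static void main(String[] args) throws Exception {
            String schema = "CREATE TABLE ks.users (id int PRIMARY KEY, name text)";
            String insert = "INSERT INTO ks.users (id, name) VALUES (?, ?)";
            // Writes sstable files into the directory; they are then streamed
            // into the cluster with the sstableloader CLI.
            try (CQLSSTableWriter writer = CQLSSTableWriter.builder()
                    .inDirectory("/tmp/ks/users")
                    .forTable(schema)
                    .using(insert)
                    .build()) {
                writer.addRow(1, "alice");
                writer.addRow(2, "bob");
            }
        }
    }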

Thanks.

charsPerColumn limit

I tried to set charsPerColumn to -1 for no limit, but it returns a 'must be positive' error.

So what is the limit then?

License unspecified

You may want to include a license. Excluding one altogether may deter some potential contributors and/or future users.

Issues installing and running

Trying to install and run for the first time. Building from source, gradle fails with a series of access errors; it is unable to retrieve various components from repo1.maven.org and other sites, even though curl has no issue fetching the same objects. I then copied the prebuilt version, but it is probably not compatible with my RHEL7 installation on a VM; it complains about a version compatibility issue: Exception in thread "main" java.lang.UnsupportedClassVersionError: com/datastax/loader/CqlDelimLoad : Unsupported major.minor version 52.0.

Please advise, and thanks in advance.

Timestamp inside collection does not round trip

Here is a csv line that cassandra-loader is unable to parse:

554aa617-b9c3-4863-8dc2-8a2cd03b2f43,"{\"feide:[email protected]\":8/11/17 10:44 AM,\"p:2a08783f-9597-42d9-9cba-91248f1150ef\":8/3/17 8:25 AM}"

The line was produced by cassandra-unloader called like so:

cassandra-unloader -host <src ip> -f /tmp/example.csv -schema "dataporten.users(userid, userid_sec_seen)"

and we called cassandra-loader like so:

cassandra-loader -host <dst ip> -f /tmp/example.csv -schema "dataporten.users(userid, userid_sec_seen)"

We expected the second column to be loaded into the database and cqlsh to show it like this:

{'feide:[email protected]': '2017-08-11 08:44:39+0000', 'p:2a08783f-9597-42d9-9cba-91248f1150ef': '2017-08-03 06:25:35+0000'}

But cassandra-loader failed to parse the timestamps:

Trouble parsing : Unparseable date: "8/11/17 10"
java.text.ParseException: Unparseable date: "8/11/17 10"
    at java.text.DateFormat.parse(DateFormat.java:366)
    ...

This column is a collection:

`userid_sec_seen map<text,timestamp>`.

We were able to load timestamp columns, but when parsing the collection column, the loader gets confused about where the date ends. We suspect that the ':' in '10:44' gets misinterpreted. Should the unloader have escaped it?
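
A minimal sketch reproducing the failure under that suspicion: if the map parser splits the entry on ':', only a truncated string ever reaches the date parser (the format pattern here is assumed from the unloader's output):

    import java.text.ParseException;
    import java.text.SimpleDateFormat;

    public class TruncatedDateSketch {
        public static void main(String[] args) throws ParseException {
            SimpleDateFormat fmt = new SimpleDateFormat("M/d/yy h:mm a");
            // The full value parses fine:
            System.out.println(fmt.parse("8/11/17 10:44 AM"));
            // But if the collection parser splits the map entry on ':',
            // the date parser only sees "8/11/17 10" and throws:
            // java.text.ParseException: Unparseable date: "8/11/17 10"
            fmt.parse("8/11/17 10");
        }
    }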

unloading a columnfamily containing blob column yields stringified HeapByteBuffer object

When dumping a table having a blob column, I noticed that the column in the output corresponding to the blob contains values like:

"java.nio.HeapByteBuffer[pos=0 lim=6211 cap=6211]"

rather than the content of the blob. Is it possible to have the unloader include the actual blob bytes? I'm guessing I may need to invoke the tool differently, but didn't see anything obvious in the docs.
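
For reference, a hedged sketch of what emitting the actual bytes could look like with the Java driver's own helper (this is not necessarily how the unloader should do it):

    import java.nio.ByteBuffer;

    import com.datastax.driver.core.utils.Bytes;

    public class BlobToTextSketch {
        public static void main(String[] args) {
            ByteBuffer blob = ByteBuffer.wrap(new byte[] {(byte) 0xCA, (byte) 0xFE});
            // Bytes.toHexString renders the buffer content as "0xcafe", a form
            // CQL understands, instead of ByteBuffer's toString()
            // ("java.nio.HeapByteBuffer[pos=0 lim=2 cap=2]").
            System.out.println(Bytes.toHexString(blob));
        }
    }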

Thanks btw, this tool is a lifesaver!

cassandra-unloader doesn't export all the data

I've tried to use cassandra-unloader to export data from the production DB, but I couldn't export all my data.
This is my command:

cassandra-unloader -f ./cassandra_dump -host prod_server_host -schema "raw_keyspace.raw_buy_hits(ware_owner,time_bucket,date_time,is_direct,link,pixel,visitor,ware)"

Output:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Total rows retrieved: 19232

So I exported only 19232 records (~2 MB), while the real size of all the data is more than 1 GB.

loader issue: NullPointerException

[cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]
Use HELP for help.
cqlsh> quit

$ /disk1/imailndcas/try_out/cassandra-loader/build/cassandra-loader -f indexData.csv -host localhost -schema "mykeyspace.tbl_zz(value,id)"
Exception in thread "main" java.lang.NullPointerException
at com.datastax.driver.core.ProtocolOptions.getProtocolVersion(ProtocolOptions.java:168)
at com.datastax.loader.CqlDelimLoad.setup(CqlDelimLoad.java:477)
at com.datastax.loader.CqlDelimLoad.run(CqlDelimLoad.java:523)
at com.datastax.loader.CqlDelimLoad.main(CqlDelimLoad.java:616)

$cat indexData.csv
abc, 123
pqr, 345

Error running cassandra-unloader

I've downloaded version v0.0.27 of cassandra-unloader and cassandra-loader.

Attempts to execute the unloader produce this error:

cassandra-unloader -f stdout -schema "ionic_na usr_grp_dev_order" -user admin -pw xxxxxxxxxx -host cassandra-solr-xxxxx.com
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra-solr-1a-03.node.us-southeast-1.deveng.ionic.com/10.202.9.27:9042 (com.datastax.driver.core.TransportException: [cassandra-solr-1a-03.node.us-southeast-1.deveng.ionic.com/10.202.9.27:9042] Channel has been closed))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:223)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1272)
at com.datastax.driver.core.Cluster.init(Cluster.java:158)
at com.datastax.driver.core.Cluster.connect(Cluster.java:248)
at com.datastax.loader.CqlDelimUnload.setup(CqlDelimUnload.java:329)
at com.datastax.loader.CqlDelimUnload.run(CqlDelimUnload.java:350)
at com.datastax.loader.CqlDelimUnload.main(CqlDelimUnload.java:444)

Can you please help me to get this running?

We are running OS:
CentOS Linux release 7.3.1611 (Core)

C* version:

ldd --version
ldd (GNU libc) 2.17
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Quoted data gets padded with an extra quote

Hi,
We have a column value which contains quoted data (delimInQuotes).

After the data is loaded, it is padded with an extra quote.

This caused incorrect data to be inserted, which was inappropriate for the downstream Spark transform jobs.

We have data in CSV like: 1,3,"{2.5,3.5,9, 2}", 12345

After using useDelim=true:

output = 1,3,'"{2.5,3.5,9, 2}"', 12345 (with an extra ')

Will you please look into it?

Thanks
Santhi

Cassandra-unloader incorrectly unloading map type

I've run the unloader on a table that has a field called properties:
properties map<text, text>.

After it unloads, the field is merely null and doesn't contain any data. Am I doing anything wrong?

Unable to load any Keyspace or column family with Capital letters in it

Trying to load a csv into a keyspace and column family which have capital letters in their names, but the load fails with the error below.

D:>java -jar cassandra-loader -f SampleIDMaster_inserts.cql -host 127.0.0.1
-schema "SamplesMasterINFO.SampleIDMaster(rowkey,column1,value)" -user cassandra -pw cassandra
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace samplesmasterinfo does not exist
at java.util.concurrent.FutureTask.report(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at com.datastax.loader.CqlDelimLoad.run(CqlDelimLoad.java:558)
at com.datastax.loader.CqlDelimLoad.main(CqlDelimLoad.java:599)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace samplesmasterinfo does not exist
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
at com.datastax.loader.EnhancedSession.execute(EnhancedSession.java:50)
at com.datastax.loader.RateLimitedSession.execute(RateLimitedSession.java:49)
at com.datastax.loader.CqlDelimParser.schemaBits(CqlDelimParser.java:130)
at com.datastax.loader.CqlDelimParser.processCqlSchema(CqlDelimParser.java:125)
at com.datastax.loader.CqlDelimParser.(CqlDelimParser.java:68)
at com.datastax.loader.CqlDelimLoadTask.setup(CqlDelimLoadTask.java:168)
at com.datastax.loader.CqlDelimLoadTask.call(CqlDelimLoadTask.java:144)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace samplesmasterinfo does not exist
at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:184)
at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:798)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)

Failing for schemas with strange characters

Looks like it's not possible to use a schema with case-sensitive column names, e.g.

CREATE TABLE keyspace.table ("NaMe" text, "nAmE" text);
cassandra-loader -f myFileToLoad.csv -host 1.2.3.4 -schema 'keyspace.table("NaMe", "nAmE")'

The regex check here fails.
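
A hedged sketch of a column-matching pattern that also admits quoted, case-sensitive identifiers (not the loader's actual regex):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class QuotedIdentifierSketch {
        public static void main(String[] args) {
            // Either a double-quoted identifier (which preserves case) or a bare one.
            Pattern column = Pattern.compile("\"[^\"]+\"|[A-Za-z0-9_]+");
            Matcher m = column.matcher("keyspace.table(\"NaMe\", \"nAmE\")");
            while (m.find()) {
                System.out.println(m.group()); // keyspace, table, "NaMe", "nAmE"
            }
        }
    }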

Need to load data using TTL option

Our application needs to maintain 18 months of history data. When I perform a one-time history data migration to Cassandra using cassandra-loader, all 18 months of data are loaded together and expire at almost the same time, since I can only set default_time_to_live at the table level. This will be a huge issue.

Can we add a USING TTL option? Then when I load data by month, I can use a different TTL, and data will expire by month instead of all at once after 18 months.
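
For illustration, a sketch of the statement such an option could generate (keyspace, table, and TTL are hypothetical; 2592000 seconds is 30 days):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class TtlInsertSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Each monthly batch would be prepared with its own TTL, so each
                // month's rows expire on their own schedule.
                PreparedStatement ps = session.prepare(
                        "INSERT INTO myks.history (id, month, payload) VALUES (?, ?, ?) USING TTL 2592000");
                session.execute(ps.bind("row1", "2016-01", "data"));
            }
        }
    }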

We're in the process of migrating production data to Cassandra. Could you please let me know if a TTL option can be added to cassandra-loader?

Thanks Brian.

Jane

Date field gets changed to a different value and different results from DEVCenter and Spark SQL

Following record
1234567890|1398|01/01/2016|TX||G|3|1|0|Y|N||Y|Y|Y||Y|||Y||01/26/2016||Y
DEVCenter returns as
1234567890,2015-12-26 23:00:00-0600,G,Y,null,2015-12-26 23:00:00-0600,Y,1398,N,Y,null,null,Y,Y,0,TX,3,1,null,null,Y,Y,null,null
Spark SQL returns as
scala> results.collect.foreach(println);
[1234567890,2015-12-27 00:00:00.0,G,Y,null,2015-12-27 00:00:00.0,Y,1398,N,Y,null,null,Y,Y,0,TX,3,1,null,null,Y,Y,null,null]

Notice the date field values. I used the following option:
-dateFormat "MM/DD/YYYY" \
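
One likely cause, stated as an assumption: in Java's SimpleDateFormat (which a -dateFormat value would typically feed), DD means day-of-year and YYYY means week-year, and that combination can resolve into late December of the previous year. A quick sketch of the difference:

    import java.text.SimpleDateFormat;

    public class DatePatternSketch {
        public static void main(String[] args) throws Exception {
            // DD = day-of-year, YYYY = week-year: resolves to an unexpected date.
            System.out.println(new SimpleDateFormat("MM/DD/YYYY").parse("01/01/2016"));
            // dd = day-of-month, yyyy = calendar year: parses as intended.
            System.out.println(new SimpleDateFormat("MM/dd/yyyy").parse("01/01/2016"));
        }
    }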

Misinterpreting UUID as String

Loaded up v0.22 today to continue a large import.
Got the following error:
Invalid number in input number 0: For input string: ""42f6b987"
Error parsing line 2 in hackdatabase_xbox360iso.csv: "42f6b987-93b5-4878-9c9b-d0f0b9fa1447","6ffb7d56-f47b-4f9c-b871-47484f7f2efa","[email protected]","berkeratay","gmail.com","Bojeunx","88.247.173.143","","","","","","e41b6cfc6e4c374ead1c60d92e8aca17","vB","","","","","","","","","","","English","","","","","","","","","","","0"

Reverted back to v0.21 and the issue is gone. I assume it has something to do with "Fixed issue with quoted values (and in collections)", but I haven't looked deeply into it.

Unloader writes text data with unescaped newlines

The unloader writes text fields containing newlines without escaping them. This breaks the loader. Either newlines (and maybe some other characters like tabs) should be escaped, or there should be an option to change the line end delimiter to something that won't appear in the data.
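
A minimal sketch of the escaping idea on the unloader side (the matching un-escape would run in the loader; this is not the tool's current behavior):

    public class NewlineEscapeSketch {
        // Escape the escape character first, then newlines and tabs,
        // so the transformation is reversible on load.
        static String escape(String field) {
            return field.replace("\\", "\\\\")
                        .replace("\n", "\\n")
                        .replace("\t", "\\t");
        }

        public static void main(String[] args) {
            System.out.println(escape("line1\nline2")); // prints line1\nline2 on one line
        }
    }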

Parentheses in column names

Mentioned in #65 but warrants a separate issue.

Consider the valid table:

CREATE TABLE test.blah (
    "blee(blorg)" int PRIMARY KEY,
    "bloo.blarg" text
)

The current parser splits on parentheses to find the columns, so it fails to parse this correctly. Solved with #66.

NPE

./cassandra-loader -f uuids.csv.0 -host <host> -schema "<cf>.<schema>(...)"
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
*** Processing uuids.csv.0
Lines Processed:    100000  Rate:   4347 (4347)
*** DONE: uuids.csv.0  number of lines processed: 101924 (101924 inserted)
Lines Processed:    101924  Rate:   4246
Exception in thread "main" java.lang.NullPointerException
    at com.datastax.loader.CqlDelimLoad.cleanup(CqlDelimLoad.java:347)
    at com.datastax.loader.CqlDelimLoad.run(CqlDelimLoad.java:443)
    at com.datastax.loader.CqlDelimLoad.main(CqlDelimLoad.java:451)

columnfamilies with list entries are exported as null

I have a columnfamily as such:
userid text PRIMARY KEY,
blackouts list,
calendars list,
country text,
devices list,
general_health text,
industry text,
job text,
privacy boolean,
status text

Exports always result in blackouts, calendars and devices being null.
Export file below:
18222080-51e6-464e-a23c-9ed4f16c88a1,null,null,England,null,good,Banking,manager,FALSE,active
65b469d3-2156-4749-9b2a-4741244d5301,null,null,Germany,null,good,Retail,Barber,TRUE,active
61debe56-95f7-4023-b3b7-747d8c6050d7,null,null,France,null,poor,Healthcare,programmer,TRUE,active
d1a8f0e6-3a6b-4d2f-8763-9c468053b68a,null,null,Italy,null,good,Farming,Bee Keeper,FALSE,active

command used for export:
cassandra-unloader -numThreads 1 -f profile.data -host 127.0.0.1 -schema "dev.profile(userid,blackouts,calendars,country,devices,general_health,industry,job,privacy,status)"

Unload into single file

I see that when I try the unloader, it creates multiple files, using the filename given with the -f option.
Question: can I get all the data into a single file?

Loader NPEs in com.datastax.driver.core.BoundStatement.bind

Something is going wrong with binding values. It's hard to be more specific, as there's nothing in the loader log files and no indication of which row caused the problem. The table contains three text fields and one map<text, text>. There are some entries in the map which contain a name and no value, along the lines of "{abc:,def:}".

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at com.datastax.loader.CqlDelimLoad.run(CqlDelimLoad.java:558)
    at com.datastax.loader.CqlDelimLoad.main(CqlDelimLoad.java:599)
Caused by: java.lang.NullPointerException
    at com.datastax.driver.core.BoundStatement.bind(BoundStatement.java:191)
    at com.datastax.driver.core.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:103)
    at com.datastax.loader.CqlDelimLoadTask.execute(CqlDelimLoadTask.java:231)
    at com.datastax.loader.CqlDelimLoadTask.call(CqlDelimLoadTask.java:145)
    at com.datastax.loader.CqlDelimLoadTask.call(CqlDelimLoadTask.java:69)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Using csv parser.parseNext() instead of reader.readLine()

Why not use the Univocity parser's splitter instead of readLine()? https://github.com/al3xandru/cassandra-loader/blob/parser/src/main/java/com/datastax/loader/CqlDelimLoadTask.java#L191

A lot of parserSettings don't work because of this. For example, the following is one row in my CSV:

a,b,c,"d
e",f

Instead of treating it as a single record, your tool makes it two rows.
Splitting lines outside of the parser itself not only breaks anything within quotes, it's also WAY slower (3-4 times slower) and generates twice the garbage.
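
A minimal sketch of the suggested approach, letting univocity's parser find record boundaries so a quoted newline stays inside one field:

    import java.io.StringReader;

    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    public class ParseNextSketch {
        public static void main(String[] args) {
            CsvParser parser = new CsvParser(new CsvParserSettings());
            parser.beginParsing(new StringReader("a,b,c,\"d\ne\",f"));
            String[] row;
            // parseNext() returns one logical record at a time, so this loop
            // runs once and reports 5 fields; "d\ne" is a single field.
            while ((row = parser.parseNext()) != null) {
                System.out.println(row.length + " fields");
            }
        }
    }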

Please fix this. I made a temporary hack for my use case because the code has a lot of abstractions.

Loader can't load data to a table with a UDT column

It would be really great for the loader to support User-defined types.

Note that collections may contain UDTs and UDTs may contain collections/other UDTs, so CQLDelimParser.schemaBits(...) would need to support some level of recursion.

Side note: the -skipCols option cannot be used to skip over a column with a non-supported type, because parsers are eagerly built.
