gianlucaborello / cassandradump

A data exporting tool for Cassandra inspired from mysqldump, with some additional slice and dice capabilities

License: GNU General Public License v2.0

Python 84.22% Shell 7.63% Makefile 8.15%
python cassandra nosql

cassandradump's Issues

Connection Failed

I get a connection error when running:
python dumper.py --export-file fxm_test.cql --host 172.31.5.30

I'm running Cassandra 2.1.5 and Python 2.7.6
cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3

Traceback (most recent call last):
  File "dumper.py", line 356, in <module>
    main()
  File "dumper.py", line 345, in main
    session = setup_cluster()
  File "dumper.py", line 300, in setup_cluster
    session = cluster.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 839, in connect
    self.control_connection.connect()
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2075, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2110, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'172.31.5.30': ConnectionException(u'Failed to initialize new connection to 172.31.5.30: code=0000 [Server error] message="io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4"',)})

The Cassandra service is running, but I'm not sure why the script can't connect. Any ideas?
Thanks!
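
The error says the server rejected native protocol version 4, while the cqlsh banner above reports "Native protocol v3", so the driver is probably negotiating a newer protocol than Cassandra 2.1.5 speaks. Below is a minimal sketch of pinning the version, assuming the DataStax cassandra-driver and its `protocol_version` argument to `Cluster`. The helper names `supported_protocol_version` and `connect` are hypothetical and not part of the script, and the version mapping in the docstring is an assumption to check against your cluster.

```python
def supported_protocol_version(cassandra_version):
    """Return a native protocol version the given Cassandra release supports.

    Assumed mapping: before 2.0 -> v1, 2.0 -> v2, 2.1 -> v3, 2.2 and later -> v4
    (capped at 4 in this sketch).
    """
    major, minor = (int(part) for part in cassandra_version.split(".")[:2])
    if (major, minor) < (2, 0):
        return 1
    if (major, minor) == (2, 0):
        return 2
    if (major, minor) == (2, 1):
        return 3
    return 4


def connect(host, cassandra_version):
    # Imported lazily so the helper above can be used without the driver.
    from cassandra.cluster import Cluster

    version = supported_protocol_version(cassandra_version)
    cluster = Cluster([host], protocol_version=version)
    return cluster.connect()
```

With the setup reported here, `connect("172.31.5.30", "2.1.5")` would ask the driver for protocol v3 instead of v4.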

Ignore system keyspaces

I've found that when you don't pass a keyspace, the tool tries to export all keyspaces, including system_schema, system_auth, system, etc. These can't be overwritten, so I get errors like:

system_schema keyspace is not user-modifiable. or Cannot CREATE <keyspace system_auth>

I am not familiar with Cassandra, but even if we found a way to overwrite system keyspaces, it doesn't sound like something we should be doing.
A flag like --ignore-system, used without --keyspaces, could copy all keyspaces except the system ones.
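
The proposed flag could be sketched as a simple filter over keyspace names. The set of system keyspaces below is an assumption (it may need extending for a given Cassandra version), and `user_keyspaces` is a hypothetical helper, not an existing function in the tool.

```python
# Keyspaces that Cassandra manages itself. This list is an assumption and
# may be incomplete for some Cassandra versions.
SYSTEM_KEYSPACES = frozenset([
    "system",
    "system_auth",
    "system_schema",
    "system_distributed",
    "system_traces",
])


def user_keyspaces(keyspace_names, ignore_system=True):
    """Return keyspace names in their original order, minus system ones."""
    if not ignore_system:
        return list(keyspace_names)
    return [name for name in keyspace_names if name not in SYSTEM_KEYSPACES]
```

The names could come from the driver's cluster metadata, for example `user_keyspaces(session.cluster.metadata.keyspaces)`.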

Syntax error

File "cassandradump.py", line 63
for has_counter, columns in itertools.groupby(tableval.columns.iteritems(), lambda (k, v): v.data_type.typename == 'counter')
^
SyntaxError: invalid syntax
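
This looks like the script being run under Python 3, which removed tuple parameters in function and lambda signatures (PEP 3113) as well as `dict.iteritems()`. A minimal sketch of the same grouping written so that it parses under Python 3 follows; `DataType` and `Column` are hypothetical stand-ins for the driver's metadata objects, and `group_columns` is not a function from the script.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-ins for the driver's column metadata objects.
DataType = namedtuple("DataType", "typename")
Column = namedtuple("Column", "data_type")


def is_counter(item):
    # item is a (name, column) pair; unpack it in the body instead of in
    # the parameter list, which Python 3 no longer allows.
    _name, column = item
    return column.data_type.typename == "counter"


def group_columns(columns):
    """Group consecutive (name, column) pairs by whether they are counters."""
    return [
        (has_counter, [name for name, _column in group])
        for has_counter, group in itertools.groupby(columns.items(), is_counter)
    ]
```

Note that the rest of the script may rely on other Python 2 idioms, so this change alone is unlikely to be enough to run it under Python 3.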

Anything I can do to speed up a keyspace dump?

Hello,

I have been running cassandradump on our local test clusters to move keyspaces around by exporting and importing them.

However, for larger databases on multi-datacenter clusters, I see export jobs running for days before they finish exporting all the data.

I'm talking about a keyspace with replication factor 3, spread across 4 nodes, with around 200 GB of data in total across all nodes.

That would be fine given the amount of data we are dealing with, but I am barely seeing any load on the machines that hold the relevant data: no high CPU or I/O usage, and no abnormal behavior.

With that in mind, I was wondering whether there is anything I can tune to speed up exports of these larger databases.

Any suggestions appreciated!

Thanks.

There is no version specified for cassandra-driver in requires

Installing the tool pulls in cassandra-driver==3.0.0c1 as a requirement, which causes errors during exports:

Traceback (most recent call last):
  File "/usr/local/bin/cassandradump", line 9, in <module>
    load_entry_point('cassandradump==0.0.1', 'console_scripts', 'cassandradump')()
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 350, in main
    export_data(session)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 230, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 94, in table_to_cqlfile
    value_encoders = make_value_encoders(tableval)
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in make_value_encoders
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
  File "/usr/local/lib/python2.7/dist-packages/cassandradump.py", line 58, in <genexpr>
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.iteritems())
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'

Manually reverting cassandra-driver to 2.6.0 solves the problem.

AttributeError: 'ColumnMetadata' object has no attribute 'data_type'

Hi,
I get an error while exporting this table with Cassandra 2.1:

CREATE TABLE slots (
    type text,
    host text,
    count int,
    PRIMARY KEY (type,host)
 );
Exporting all keyspaces
Exporting schema for keyspace engine
Exporting data for column family engine.slots
Traceback (most recent call last):
  File "cassandradump.py", line 351, in <module>
    main()
  File "cassandradump.py", line 345, in main
    export_data(session)
  File "cassandradump.py", line 225, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "cassandradump.py", line 94, in table_to_cqlfile
    value_encoders = make_value_encoders(tableval)
  File "cassandradump.py", line 58, in make_value_encoders
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
  File "cassandradump.py", line 58, in <genexpr>
    return dict((to_utf8(k), make_value_encoder(v.data_type.typename)) for k, v in tableval.columns.ite
AttributeError: 'ColumnMetadata' object has no attribute 'data_type'
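
The traceback suggests a driver version whose `ColumnMetadata` no longer carries `data_type`. A hedged sketch of a version-tolerant accessor follows, assuming 2.x drivers expose `data_type.typename` and 3.x drivers expose a `cql_type` string instead; `column_typename` is a hypothetical helper, not part of cassandradump.

```python
def column_typename(column):
    """Return a column's CQL type name across driver versions.

    Assumption: older drivers provide `column.data_type.typename`, newer
    ones provide `column.cql_type` as a plain string.
    """
    data_type = getattr(column, "data_type", None)
    if data_type is not None:
        return data_type.typename
    return column.cql_type
```

For collection columns the two attributes may not yield identical strings (for instance a bare type name versus a full parameterized type), so any lookup keyed on the result would need to handle both forms.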

Does not work with database containing unicode data.

Traceback (most recent call last):
  File "cassandradump.py", line 247, in <module>
    main()
  File "cassandradump.py", line 241, in main
    export_data(session)
  File "cassandradump.py", line 142, in export_data
    table_to_cqlfile(session, keyname, tablename, None, tableval, f)
  File "cassandradump.py", line 44, in table_to_cqlfile
    filep.write('INSERT INTO "' + keyspace + '"."' + tablename + '" (' + ', '.join(row.keys()) + ') VALUES (' + ', '.join(values) + ')\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 206: ordinal not in range(128)
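
This error is typical of Python 2 implicitly decoding a byte string as ASCII while concatenating it with text; byte 0xf0 starts a four-byte UTF-8 sequence, such as an emoji. One way to avoid it, sketched here under the assumption that all values have already been decoded to text, is to build the line as text and write it to a file opened with an explicit encoding. `write_insert` and `open_dump` are hypothetical names.

```python
import io


def write_insert(filep, keyspace, tablename, column_names, values):
    """Write one INSERT statement as text; the file object does the encoding.

    column_names and values are expected to be text (unicode) strings.
    """
    line = u'INSERT INTO "{0}"."{1}" ({2}) VALUES ({3})\n'.format(
        keyspace, tablename, u", ".join(column_names), u", ".join(values)
    )
    filep.write(line)


def open_dump(path):
    # An explicit encoding avoids falling back to the ASCII default codec.
    return io.open(path, "w", encoding="utf-8")
```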

Export incorrectly quotes Maps

Export of table definitions creates:

    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

Should be:

    AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}

This only seems to affect the caching line; all other maps are generated correctly.
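
A possible workaround, assuming the caching option arrives as a JSON-encoded string as in the output above, is to parse it and emit a CQL map literal. `caching_to_cql_map` is a hypothetical helper rather than existing code.

```python
import json


def caching_to_cql_map(value):
    """Convert a JSON-object string into a CQL map literal.

    Single quotes inside keys or values are doubled, as CQL string
    literals require.
    """
    options = json.loads(value)
    pairs = [
        "'{0}':'{1}'".format(
            str(key).replace("'", "''"), str(val).replace("'", "''")
        )
        for key, val in options.items()
    ]
    return "{" + ", ".join(pairs) + "}"
```

Applied to the value from the export above, this produces the map form that the report says is expected.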

Exception on import large database

Hello, when I tried to import a large database (600k+ lines), it threw an exception:

Traceback (most recent call last):
  File "../cassandradump/cassandradump.py", line 271, in <module>
    main()
  File "../cassandradump/cassandradump.py", line 263, in main
    import_data(session)
  File "../cassandradump/cassandradump.py", line 65, in import_data
    session.execute(line)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1405, in execute
    result = future.result(timeout)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2976, in result
    raise self._final_exception
cassandra.protocol.SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line 0:-1 no viable alternative at input ''">

The script imported most of the file, but for the last table, which has 500k lines, only 11k were imported.

Thanks anyway.
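
A "line 0:-1" position usually means the parser reached the end of its input immediately, so one plausible cause (an assumption, not confirmed by this report) is an empty or whitespace-only line in the dump being passed to `session.execute`. A minimal sketch that skips such lines, with `iter_statements` as a hypothetical helper:

```python
def iter_statements(lines):
    """Yield non-empty statements, one per input line, with whitespace trimmed."""
    for line in lines:
        statement = line.strip()
        if statement:
            yield statement
```

The import loop could then read `for statement in iter_statements(f): session.execute(statement)`. This does not help if a value contains an embedded newline that splits one statement across two lines; that case would need a real statement splitter.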

schema dump ";;"

python ~/cassandradump/cassandradump.py --no-insert --export-file ./table2.cql --cf=keyspace1.table2

... AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';;

The dump ends with a double ";" at the end of the file.

When I try to import the schema:

cqlsh localhost < ./table2.cql
<stdin>:24:SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input ';' ([;])">
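
Until the export is fixed, the dump can be post-processed so that each statement ends in exactly one semicolon. This is a small sketch with a hypothetical helper name, `terminate_statement`; it only touches trailing characters and does not parse CQL.

```python
def terminate_statement(statement):
    """Return the statement ending in exactly one semicolon."""
    # Strip any run of trailing semicolons and whitespace, then add one back.
    return statement.rstrip("; \t\r\n") + ";"
```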
