
sstable-tools's People

Contributors

clohfink, tolbertam

sstable-tools's Issues

Large partitions command

List the largest partitions, perhaps ranked by disk space used and by cell count.

java -jar sstable-tools top
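
A minimal sketch of the bookkeeping such a top command could use, independent of how the sstables are actually scanned. The PartitionStat holder and the size/cell figures are hypothetical placeholders, not part of the existing code:

  import java.util.Comparator;
  import java.util.PriorityQueue;

  // Tracks the N largest partitions seen so far, ranked by on-disk size.
  // A real command would fill PartitionStat while scanning each sstable.
  class TopPartitions {
      static class PartitionStat {
          final String key;
          final long bytes;
          final long cells;
          PartitionStat(String key, long bytes, long cells) {
              this.key = key; this.bytes = bytes; this.cells = cells;
          }
      }

      private final int limit;
      // Min-heap so the smallest of the current "top" entries is evicted first.
      private final PriorityQueue<PartitionStat> heap =
              new PriorityQueue<>(Comparator.comparingLong((PartitionStat p) -> p.bytes));

      TopPartitions(int limit) { this.limit = limit; }

      void offer(PartitionStat stat) {
          heap.add(stat);
          if (heap.size() > limit)
              heap.poll(); // drop the smallest
      }
  }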

Hinted Handoff dump

Not really sstable related, but with the 3.0 changes hints can't be viewed as easily as before.

'limit' seems to not consider query criteria

It looks like limit support does not take the query criteria into account; in the example below the where clause on ticker is ignored and the limit is applied to the unfiltered rows:

cqlsh> select * from sstable where ticker='YHOO' limit 5;
 ┌─────────┬─────────────────────┬────────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
 │ticker   │date                 │adj_close   │close    │high     │low      │open     │volume   │
 ╞═════════╪═════════════════════╪════════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
 │ORCL     │2016-02-19 00:00-0600│36.779999   │36.779999│36.790001│36.419998│36.52    │13118400 │
 ├─────────┼─────────────────────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
 │ORCL     │2016-02-18 00:00-0600│36.630001   │36.630001│36.869999│36.400002│36.709999│12464800 │
 ├─────────┼─────────────────────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
 │ORCL     │2016-02-17 00:00-0600│36.630001   │36.630001│36.77    │35.970001│35.970001│13146600 │
 ├─────────┼─────────────────────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
 │ORCL     │2016-02-16 00:00-0600│35.700001   │35.700001│35.91    │35.419998│35.759998│18685400 │
 ├─────────┼─────────────────────┼────────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
 │ORCL     │2016-02-12 00:00-0600│35.540001   │35.540001│35.549999│34.91    │35.240002│15806800 │
 └─────────┴─────────────────────┴────────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
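
For reference, the fix is to apply the where-clause predicate before counting rows toward the limit. A small sketch of that ordering using plain Java streams (the Row stand-in and query helper are hypothetical, not the tool's actual types):

  import java.util.List;
  import java.util.stream.Collectors;

  class LimitAfterFilter {
      // Minimal stand-in for a result row; only the field needed here.
      static class Row {
          final String ticker;
          Row(String ticker) { this.ticker = ticker; }
      }

      // The where-clause predicate must run before limit(), otherwise the
      // limit counts rows that the query criteria should have excluded.
      static List<Row> query(List<Row> rows, String ticker, int limit) {
          return rows.stream()
                     .filter(r -> ticker.equals(r.ticker))
                     .limit(limit)
                     .collect(Collectors.toList());
      }
  }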

Remove toJson command

Now that 3.0.4 and 3.4 are released, we should remove the 'toJson' command since sstabledump covers this functionality.

Investigate zero config reads

See if we can get enough information from the sstable (i.e. the SerializationHeader in Statistics.db) to build the CFMetaData so we don't need CQL create statements, thrift, reading system tables, etc.
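
A rough, heavily hedged sketch of what that could look like against the 3.x internals; the exact SerializationHeader/CFMetaData builder APIs vary between releases, and the keyspace/table/column names are placeholders since they are not stored in Statistics.db:

  import java.nio.ByteBuffer;
  import java.util.EnumSet;
  import java.util.Map;

  import org.apache.cassandra.config.CFMetaData;
  import org.apache.cassandra.db.SerializationHeader;
  import org.apache.cassandra.db.marshal.AbstractType;
  import org.apache.cassandra.db.marshal.UTF8Type;
  import org.apache.cassandra.io.sstable.Descriptor;
  import org.apache.cassandra.io.sstable.metadata.MetadataSerializer;
  import org.apache.cassandra.io.sstable.metadata.MetadataType;

  class MetadataFromSSTable {
      // Build a CFMetaData purely from the sstable's Statistics.db component.
      static CFMetaData build(Descriptor desc) throws Exception {
          SerializationHeader.Component header = (SerializationHeader.Component)
                  new MetadataSerializer()
                          .deserialize(desc, EnumSet.of(MetadataType.HEADER))
                          .get(MetadataType.HEADER);

          CFMetaData.Builder builder = CFMetaData.Builder.create("ks", "table");
          // Note: a composite key type would need to be split into one
          // partition key column per component; single column assumed here.
          builder.addPartitionKey("key", header.getKeyType());
          int i = 0;
          for (AbstractType<?> type : header.getClusteringTypes())
              builder.addClusteringColumn("clustering" + i++, type);
          for (Map.Entry<ByteBuffer, AbstractType<?>> e : header.getRegularColumns().entrySet())
              builder.addRegularColumn(UTF8Type.instance.getString(e.getKey()), e.getValue());
          for (Map.Entry<ByteBuffer, AbstractType<?>> e : header.getStaticColumns().entrySet())
              builder.addStaticColumn(UTF8Type.instance.getString(e.getKey()), e.getValue());
          return builder.build();
      }
  }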

Add --gc_grace_seconds option to describe

As discussed in #60, the estimated tombstone drop times do not account for gc_grace_seconds because that information is not available in sstable files (see CASSANDRA-12208). It would be nice to add an option to describe that lets the user pass in gc_grace_seconds so the drop times can be offset accordingly.
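
The adjustment itself is just an offset. A tiny illustration of the idea, where the option name and the example values are assumptions:

  // Hypothetical: shift each estimated tombstone drop time forward by the
  // user-supplied gc_grace_seconds, since the sstable itself does not store it.
  long gcGraceSeconds = 864000;          // value passed via a --gc_grace_seconds option
  long localDeletionTime = 1469000000L;  // seconds, as read from the sstable stats
  long earliestActualDropTime = localDeletionTime + gcGraceSeconds;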

support C* 2.1 version

Hi,
Can this tool support C* 2.1? Many users are still on C* 2.x versions.
Thanks

Use github releases instead of bintray

Recently I found that it's pretty easy to upload artifacts to a tagged GitHub release as part of the maven release plugin. We should do that for sstable-tools to automate the process of making a release.

shell mode

It would also be nice to have an alternative command to select that behaves in the following manner:

  1. A command line mode that accepts the path to the schema, sstable(s) and the query all as files (kind of lame to make the query a file, but don't see a way around that).
  2. An interactive shell that takes the schema and sstable file as an input. The user can then make queries like 'select * from table where blah' in the interactive shell.

This could behave like a limited version of cqlsh:

Usage: cqlsh sstable [sstable...] [-s schema] [-f file]

Options:
  -s , --schema=SCHEMA       The cql schema to use for the given sstable.  If not provided, 
                             query criteria is limited to select * with no where clause.
  -f, --file=FILE            Execute commands from FILE, then exit

I think this could use the ascii table transformer like proposed in #26.
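
A bare-bones sketch of the interactive loop; the prompt and the executeQuery hook are placeholders, not the eventual cqlsh implementation:

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.io.InputStreamReader;

  class MiniShell {
      public static void main(String[] args) throws IOException {
          BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
          String line;
          System.out.print("cqlsh> ");
          while ((line = in.readLine()) != null) {
              line = line.trim();
              if (line.equalsIgnoreCase("exit") || line.equalsIgnoreCase("quit"))
                  break;
              if (!line.isEmpty())
                  executeQuery(line); // placeholder for evaluating the statement against the sstables
              System.out.print("cqlsh> ");
          }
      }

      static void executeQuery(String cql) {
          // hypothetical hook: parse the statement and run it against the loaded sstables
          System.out.println("would execute: " + cql);
      }
  }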

Offline repairs

Given two directories, each representing a node (e.g. mounted S3 backups), walk through the sstables and run a repair, creating a new sstable to drop into each node's data dir (followed by nodetool refresh) to make them consistent. The idea is that this could be run as an EMR job or on a random node so repairs have no CPU/IO impact on the cluster. Since there is no need to worry about throttling, it can be much faster as well.

Bad assertion in serializePartitionKey when shortKeys are enabled

When running with assertions enabled and no schema provided via '-c', toJson will fail while serializing a partition key because it expects the number of key components to match the number of partition columns in the metadata (which are empty):

Exception in thread "main" java.lang.AssertionError
    at com.csforge.sstable.JsonTransformer.serializePartitionKey(JsonTransformer.java:83)
    at com.csforge.sstable.JsonTransformer.serializePartition(JsonTransformer.java:149)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at com.csforge.sstable.JsonTransformer.toJson(JsonTransformer.java:58)
    at com.csforge.sstable.SSTable2Json.main(SSTable2Json.java:109)
    at com.csforge.sstable.Driver.main(Driver.java:17)

Composite partition key issues

If there is a composite partition key and the column family metadata is built from the sstable, rendering as a table fails when walking through partition keys, since the single composite partition key does not match up with the broken-up values in the ResultSet.

Update to support 3.4

There were a couple of API changes in C* 3.4 which make sstable-tools incompatible with it:

[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /Users/atolbert/Documents/Projects/sstable-tools/src/main/java/com/csforge/sstable/reader/CassandraReader.java:[53,39] no suitable method found for iterator(org.apache.cassandra.db.DecoratedKey,org.apache.cassandra.db.filter.ColumnFilter,boolean,boolean)
    method org.apache.cassandra.io.sstable.format.SSTableReader.iterator(org.apache.cassandra.db.DecoratedKey,org.apache.cassandra.db.Slices,org.apache.cassandra.db.filter.ColumnFilter,boolean,boolean) is not applicable
      (actual and formal argument lists differ in length)
    method org.apache.cassandra.io.sstable.format.SSTableReader.iterator(org.apache.cassandra.io.util.FileDataInput,org.apache.cassandra.db.DecoratedKey,org.apache.cassandra.db.RowIndexEntry,org.apache.cassandra.db.Slices,org.apache.cassandra.db.filter.ColumnFilter,boolean,boolean) is not applicable
      (actual and formal argument lists differ in length)
[ERROR] /Users/atolbert/Documents/Projects/sstable-tools/src/main/java/com/csforge/sstable/reader/CassandraReader.java:[54,20] method map in interface java.util.stream.Stream<T> cannot be applied to given types;
  required: java.util.function.Function<? super java.lang.Object,? extends R>
  found: Partition::new
  reason: cannot infer type-variable(s) R
    (argument mismatch; invalid constructor reference
      incompatible types: java.lang.Object cannot be converted to org.apache.cassandra.db.rows.UnfilteredRowIterator)

I have fixed this locally and will push a fix later this week. The challenge will be making this compatible with both versions; or do we configure publishing two separate branches? There is probably some black magic where we can achieve this through reflection, although at some point we will probably reach a threshold where we need to provide per-release versions.
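
A hedged sketch of that reflection approach: probe for the pre-3.4 iterator signature and fall back to the 3.4 one (the signatures come from the compile error above; whether this is the right compatibility seam is an open question):

  import java.lang.reflect.Method;

  import org.apache.cassandra.db.DecoratedKey;
  import org.apache.cassandra.db.Slices;
  import org.apache.cassandra.db.filter.ColumnFilter;
  import org.apache.cassandra.db.rows.UnfilteredRowIterator;
  import org.apache.cassandra.io.sstable.format.SSTableReader;

  class CompatIterator {
      // Call SSTableReader.iterator(...) with whichever signature this C* version exposes.
      static UnfilteredRowIterator iterator(SSTableReader reader, DecoratedKey key,
                                            ColumnFilter filter) throws Exception {
          try {
              // pre-3.4 signature: iterator(DecoratedKey, ColumnFilter, boolean, boolean)
              Method m = SSTableReader.class.getMethod("iterator",
                      DecoratedKey.class, ColumnFilter.class, boolean.class, boolean.class);
              return (UnfilteredRowIterator) m.invoke(reader, key, filter, false, false);
          } catch (NoSuchMethodException e) {
              // 3.4+ signature adds Slices
              Method m = SSTableReader.class.getMethod("iterator",
                      DecoratedKey.class, Slices.class, ColumnFilter.class, boolean.class, boolean.class);
              return (UnfilteredRowIterator) m.invoke(reader, key, Slices.ALL, filter, false, false);
          }
      }
  }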

Exception when trying to load sstables

Can't seem to get it to work

[automaton@ip-172-31-9-120 device_monitoring_timestamps]$ pwd
/home/automaton/36134/sstables_10.246.171.127/health/device_monitoring_timestamps
[automaton@ip-172-31-9-120 device_monitoring_timestamps]$ dse -v
5.0.2
[automaton@ip-172-31-9-120 device_monitoring_timestamps]$ ls -la
total 19320
drwxrwxr-x 2 automaton automaton     4096 Aug 28 17:18 .
drwxrwxr-x 3 automaton automaton     4096 Aug 28 17:17 ..
-rw-r--r-- 1 automaton automaton     9307 Aug 27 19:20 mc-7074-big-CompressionInfo.db
-rw-r--r-- 1 automaton automaton 16637579 Aug 27 19:20 mc-7074-big-Data.db
-rw-r--r-- 1 automaton automaton     2744 Aug 27 19:20 mc-7074-big-Filter.db
-rw-r--r-- 1 automaton automaton   140381 Aug 27 19:20 mc-7074-big-Index.db
-rw-r--r-- 1 automaton automaton    11047 Aug 27 19:20 mc-7074-big-Statistics.db
-rw-r--r-- 1 automaton automaton      588 Aug 27 19:20 mc-7074-big-Summary.db
-rw-r--r-- 1 automaton automaton     2587 Aug 27 20:58 mc-7075-big-CompressionInfo.db
-rw-r--r-- 1 automaton automaton  2867725 Aug 27 20:58 mc-7075-big-Data.db
-rw-r--r-- 1 automaton automaton     2368 Aug 27 20:58 mc-7075-big-Filter.db
-rw-r--r-- 1 automaton automaton    63530 Aug 27 20:58 mc-7075-big-Index.db
-rw-r--r-- 1 automaton automaton    10144 Aug 27 20:58 mc-7075-big-Statistics.db
-rw-r--r-- 1 automaton automaton      501 Aug 27 20:58 mc-7075-big-Summary.db
[automaton@ip-172-31-9-120 device_monitoring_timestamps]$ java -jar $HOME/36134/sstable-tools-3.11.0-alpha11.jar cqlsh
cqlsh> use mc-7075-big-Data.db
Using: /home/automaton/36134/sstables_10.246.171.127/health/device_monitoring_timestamps/mc-7075-big-Data.db
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.cassandra.utils.NativeLibraryLinux.getpid()J
        at org.apache.cassandra.utils.NativeLibraryLinux.getpid(Native Method)
        at org.apache.cassandra.utils.NativeLibraryLinux.callGetpid(NativeLibraryLinux.java:122)
        at org.apache.cassandra.utils.NativeLibrary.getProcessID(NativeLibrary.java:394)
        at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:388)
        at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:367)
        at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:300)
        at org.apache.cassandra.utils.UUIDGen.<clinit>(UUIDGen.java:41)
        at org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1293)
        at com.csforge.sstable.CassandraUtils.tableFromSSTable(CassandraUtils.java:263)
        at com.csforge.sstable.CassandraUtils.tableFromBestSource(CassandraUtils.java:99)
        at com.csforge.sstable.Cqlsh.doUse(Cqlsh.java:302)
        at com.csforge.sstable.Cqlsh.evalLine(Cqlsh.java:615)
        at com.csforge.sstable.Cqlsh.startShell(Cqlsh.java:252)
        at com.csforge.sstable.Cqlsh.main(Cqlsh.java:762)
        at com.csforge.sstable.Driver.main(Driver.java:22)
[automaton@ip-172-31-9-120 device_monitoring_timestamps]$ java -jar $HOME/36134/sstable-tools-3.11.0-alpha11.jar cqlsh
cqlsh> use /home/automaton/36134/sstables_10.246.171.127/health/device_monitoring_timestamps/
Using: /home/automaton/36134/sstables_10.246.171.127/health/device_monitoring_timestamps/mc-7074-big-Data.db
Using: /home/automaton/36134/sstables_10.246.171.127/health/device_monitoring_timestamps/mc-7075-big-Data.db
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.cassandra.utils.NativeLibraryLinux.getpid()J
        at org.apache.cassandra.utils.NativeLibraryLinux.getpid(Native Method)
        at org.apache.cassandra.utils.NativeLibraryLinux.callGetpid(NativeLibraryLinux.java:122)
        at org.apache.cassandra.utils.NativeLibrary.getProcessID(NativeLibrary.java:394)
        at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:388)
        at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:367)
        at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:300)
        at org.apache.cassandra.utils.UUIDGen.<clinit>(UUIDGen.java:41)
        at org.apache.cassandra.config.CFMetaData$Builder.build(CFMetaData.java:1293)
        at com.csforge.sstable.CassandraUtils.tableFromSSTable(CassandraUtils.java:263)
        at com.csforge.sstable.CassandraUtils.tableFromBestSource(CassandraUtils.java:99)
        at com.csforge.sstable.Cqlsh.doUse(Cqlsh.java:302)
        at com.csforge.sstable.Cqlsh.evalLine(Cqlsh.java:615)
        at com.csforge.sstable.Cqlsh.startShell(Cqlsh.java:252)
        at com.csforge.sstable.Cqlsh.main(Cqlsh.java:762)
        at com.csforge.sstable.Driver.main(Driver.java:22)

optimize paging by reusing scanner

The paging logic in cqlsh reopens the SSTable(s) on each page. This is wasteful; we could just keep track of where we left off in the UnfilteredPartitionIterator and the current RowIterator.
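
A generic sketch of the idea, keeping one live iterator between pages instead of reopening and skipping; plain Java iterators stand in for the UnfilteredPartitionIterator/RowIterator pair:

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;

  // Keeps a single iterator open across pages so each page resumes where
  // the previous one left off, rather than reopening the sstable and skipping.
  class Pager<T> {
      private final Iterator<T> source;

      Pager(Iterator<T> source) { this.source = source; }

      List<T> nextPage(int pageSize) {
          List<T> page = new ArrayList<>(pageSize);
          while (page.size() < pageSize && source.hasNext())
              page.add(source.next());
          return page;
      }

      boolean hasMore() { return source.hasNext(); }
  }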

ccmbridge tests

Integration tests: create a cluster at a given version, run sstable2json, and verify the output.

persist fails in new version

The deserialization occurs before CassandraUtils can set ClientMode, which throws an exception since the DatabaseDescriptor calls its loadYaml.

Could not import schema error

I'm trying to import a simple schema into cqlsh but I'm getting the error:

cqlsh> schema /home/pedro/software/schema.cql
Could not import schema from '/home/pedro/software/schema.cql': line 4:0 mismatched input 'CREATE' expecting EOF (...  AND durable_writes = true;[CREATE]...).

I generated this schema file with cqlsh -e "describe schema". I'm attaching it to this issue.
schema.txt

Use java driver to pull meta data

If the -c option is not provided, attempt to connect to the cluster (using defaults unless overridden) and retrieve the table's schema. Not necessary if #7 is successful.
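
A hedged sketch with the DataStax Java driver's 3.x API; the contact point and keyspace/table names are placeholders:

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.TableMetadata;

  class SchemaFromCluster {
      // Fetch the CREATE TABLE statement for a table by connecting to the cluster.
      static String fetchCreateStatement(String contactPoint, String keyspace, String table) {
          try (Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build()) {
              TableMetadata meta = cluster.getMetadata()
                                          .getKeyspace(keyspace)
                                          .getTable(table);
              return meta.exportAsString(); // full CREATE TABLE ... CQL
          }
      }
  }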

Offline select

Be able to execute a query on an sstable or set of sstables like:

java -jar sstable-tools.jar select from [sstable or folder containing sstables] where key = 1 and value > 10 with [file containing schema or an ip address of cluster to pull schema from]

Must be safe to run while C* is currently running and have no side effects (i.e. no running an embedded instance).

Debug InstanceTidier leak that only seems to appear when running in maven console

org.apache.cassandra.utils.concurrent.Ref$State@6b25a2cc) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1601791084:/Users/atolbert/Documents/Projects/sstable-tools/ma-1-big was not released before the reference was garbage collected.

Seems to happen for each SSTable opened, but only with mvn exec:java. I can't seem to reproduce using an executable jar.

An Option to Output for Streaming

Reading the entire SSTable into memory when analyzing the output of sstable2json is rather cumbersome and sometimes not feasible/possible given the size of the resulting JSON.

Having a command line option that would output one partition object per line (or separated by some other delimiter) would solve this by allowing a user to load one partition at a time into memory. While partition sizes can get rather large, they will not be as large as the SSTable itself.

This output would then have no need to output the beginning and ending array brackets, nor the trailing comma after each of the partitions.

I can also see a possible performance benefit here as you can read the SSTable and output each partition as you read it, rather than reading the entire SSTable in all at once.
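
A sketch of the output side using Jackson's streaming JsonGenerator, writing one partition object per line with no enclosing array; the partition content written here is a placeholder:

  import java.io.OutputStream;

  import com.fasterxml.jackson.core.JsonFactory;
  import com.fasterxml.jackson.core.JsonGenerator;

  class StreamingJsonWriter {
      // Emits one JSON object per partition, separated by newlines, with no
      // surrounding array, so consumers can process partitions one at a time.
      static void writePartition(JsonFactory factory, OutputStream out, String key) throws Exception {
          JsonGenerator json = factory.createGenerator(out);
          json.writeStartObject();
          json.writeStringField("partition_key", key); // placeholder for the real partition content
          json.writeEndObject();
          json.flush();
          out.write('\n');
      }
  }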

Table transformer

Print pretty ASCII tables like cqlsh; this would pair nicely with #22. We could have another command to dump it (toTable) and also add an option to switch the transformer used by the select command.
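
A minimal sketch of the kind of transformer being suggested, using simple fixed-width columns rather than cqlsh's exact box drawing:

  import java.util.List;

  class AsciiTable {
      // Prints rows as a simple fixed-width ASCII table; a real transformer
      // would mirror cqlsh's box-drawing output.
      static void print(List<String> headers, List<List<String>> rows) {
          int[] widths = new int[headers.size()];
          for (int c = 0; c < headers.size(); c++) {
              widths[c] = headers.get(c).length();
              for (List<String> row : rows)
                  widths[c] = Math.max(widths[c], row.get(c).length());
          }
          printRow(headers, widths);
          for (List<String> row : rows)
              printRow(row, widths);
      }

      private static void printRow(List<String> cells, int[] widths) {
          StringBuilder sb = new StringBuilder("|");
          for (int c = 0; c < cells.size(); c++)
              sb.append(' ').append(String.format("%-" + widths[c] + "s", cells.get(c))).append(" |");
          System.out.println(sb);
      }
  }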

Allow glob patterns on sstables

When providing sstables, allow globs instead of only allowing a single sstable. This is useful for pulling in all results from a data directory, e.g.:

sstable-tools select count(*) from data/keyspace/table/ma-*-Data.db where user = 1
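
A sketch of resolving such globs with java.nio's PathMatcher; the directory and pattern below are just examples, e.g. expand(Paths.get("data/keyspace/table"), "ma-*-Data.db"):

  import java.io.IOException;
  import java.nio.file.DirectoryStream;
  import java.nio.file.FileSystems;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.PathMatcher;
  import java.util.ArrayList;
  import java.util.List;

  class SSTableGlobs {
      // Expand a glob like ma-*-Data.db into the matching sstable paths in a directory.
      static List<Path> expand(Path dir, String glob) throws IOException {
          List<Path> matches = new ArrayList<>();
          PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
          try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
              for (Path p : stream)
                  if (matcher.matches(p.getFileName()))
                      matches.add(p);
          }
          return matches;
      }
  }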
