Coder Social home page Coder Social logo

iq-scm / sneller Goto Github PK

View Code? Open in Web Editor NEW

This project forked from snellerinc/sneller

0.0 0.0 0.0 19.45 MB

The world's fastest log analysis tool

Home Page: https://sneller.io

License: GNU Affero General Public License v3.0

Shell 0.12% Python 0.03% C 19.51% Go 64.84% Assembly 14.94% Smarty 0.04% Yacc 0.27% Dockerfile 0.01% Ragel 0.23%

sneller's Introduction

SQL for JSON at scale: fast, simple, schemaless

Sneller is a high-performance SQL engine built to analyze petabyte-scale un-structured logs and other event data.

Here are a couple major differentiators between Sneller and other SQL solutions:

Sneller Cloud gives you access to a hosted version of the Sneller SQL engine that runs directly on data stored entirely in your S3 buckets. Our cloud platform offers excellent performance and is priced at an extremely competitive $150 per petabyte of data scanned.

Browser Demo

You can run queries for free against Sneller Cloud from your browser through our playground. We've created a public table containing about 1 billion rows from the GitHub archive data set. Additionally, you can create new ephemeral tables by uploading your own JSON data (but please don't upload anything sensitive!)

The Sneller playground is also usable directly with a local http client like curl:

asciicast

Local Demo

asciicast

If you have go installed on a machine with AVX512, you can build tables from JSON files and run the query engine locally:

$ grep -q avx512 /proc/cpuinfo && echo "yes, I have AVX512"
yes, I have AVX512
$ # install the sdb tool (make sure $GOBIN is in your $PATH)
$ go install github.com/SnellerInc/sneller/cmd/sdb@latest
$ # pack a JSON object into a table that can be queried;
$ # here we're using some github archive JSON:
$ wget https://data.gharchive.org/2015-01-01-15.json.gz
$ sdb pack -o github.zion 2015-01-01-15.json.gz
$ # run a query, using JSON as the output format:
$ sdb query -v -fmt=json "select count(*), type from read_file('github.zion') group by type"
{"type": "CreateEvent", "count": 1471}
{"type": "PushEvent", "count": 5815}
{"type": "WatchEvent", "count": 1230}
{"type": "ReleaseEvent", "count": 60}
{"type": "PullRequestEvent", "count": 474}
{"type": "IssuesEvent", "count": 545}
{"type": "ForkEvent", "count": 355}
{"type": "GollumEvent", "count": 61}
{"type": "IssueCommentEvent", "count": 844}
{"type": "DeleteEvent", "count": 260}
{"type": "PullRequestReviewCommentEvent", "count": 136}
{"type": "CommitCommentEvent", "count": 73}
{"type": "MemberEvent", "count": 25}
{"type": "PublicEvent", "count": 2}
18874368 bytes (18.000 MiB) scanned in 1.475857ms 12.5GiB/s

See our SQL reference for more information on the Sneller SQL dialect.

If you don't have access to a physical machine with AVX512 support, we recommend renting a VM from one of the major cloud providers with one of these instance families:

  • AWS: c6i, m6i, r6i
  • GCP: N2, M2, C2, C3
  • Azure: Dv4, Ev4

Sneller Cloud

Our cloud platform simplifies the Sneller SQL user experience by giving you instant access to thousands of CPU cores to run your queries. Sneller Cloud also provides automatic synchronization between your source data and your SQL tables, so you don't have any batch processes to manage in order to keep your tables up-to-date. Our cloud solution has a simple usage-based pricing model that depends entirely on the amount of data your queries scan. (Since Sneller Cloud doesn't store any of your data, there are no additional storage charges.)

Performance

Sneller is generally able to provide end-to-end scanning performance in excess of 1GB/s/core on high-core-count machines. The core SQL engine is typically able to saturate the memory bandwidth of the machine; generally about half of the query execution time is spent decompressing the source data, and the other half is spent in the SQL engine itself. Scanning performance scales linearly with the number of CPU cores available, so for example a 1000-CPU cluster would generally provide scanning performance in excess of 1TB/s.

The zion compression format that the SQL engine consumes is "bucketized" so that queries that don't touch all of the fields in the source data consume fewer cycles during decompression. Concretely, the top-level fields in each record are hashed into one of 16 buckets, and each of these buckets is compressed separately. The query planner determines which fields are referenced by each query, and at execution time only the buckets that contain fields necessary to compute the final query result are actually decompressed. (Strictly columnar formats like Parquet stripe data into one bucket per column, with the restriction that the columns and their types are known in advance. Since Sneller operates on un-structured data, our solution needed to be completely agnostic to the structure of the data itself.)

License

Most of Sneller is released under the AGPL-3.0 license. See the LICENSE file for more information.

The ion and date packages are released under the Apache 2.0 license. See LICENSE.apache for more information.

sneller's People

Contributors

brandon-sneller avatar dependabot[bot] avatar flobernd avatar frank-sneller avatar henkjan-sneller avatar petr-sneller avatar philhofer avatar piotr-sneller avatar ramondeklein avatar venkat-sneller avatar wojciech-sneller avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.