influxdb-stream

InfluxDB has some memory usage issues which make it difficult to read or write large amounts of data at once.

This repository contains helpers to split these operations up into "streams" of finely chunked operations which don't cause InfluxDB to self-destruct.

Install

Create an influxdb-stream/ directory and download the idb.jar tool (alternatively, build it from source with lein uberjar).

mkdir influxdb-stream 
wget -O influxdb-stream/idb.jar https://github.com/matthewdowney/influxdb-stream/releases/latest/download/idb.jar

Query data and write to CSV(s)

  1. Create a configuration file called conf.edn describing the data to pull and the size of each query in the stream.

    ; ~/influxdb-stream/conf.edn
    {;; The InfluxDB database to connect to
    :host          "127.0.0.1"
    :port          8086
    :db            "marketdata"
    
    ;; Fetch all rows for this measurement, between the start and end dates,
    ;; making queries spanning :interval amounts of time. The :interval is
    ;; important because it imposes a bound on InfluxDB memory usage for a
    ;; single query. The $timeFilter is replaced with a time range expression
    ;; according to where in the time range the cursor is, and a LIMIT is
    ;; appended to the query.
    :query         "SELECT * FROM trade WHERE $timeFilter"
    :query-limit   20000 ; max rows returned per query
    :start         #inst"2020-01-01"
    :end           #inst"2020-02-01"
    :interval      [24 :hours]
    
    ;; Write a certain number of rows per file to a series of files named with
    ;; the given pattern, which accepts the timestamp of the first row.
    :date-format   "yyyy-MM-dd"
    :file          "trade.%s.csv"
    :rows-per-file 10000}

    To keep queries from exhausting InfluxDB's available RAM: (1) pick a relatively small :query-limit (20k has worked well for me); and (2) tune the :interval that each query spans so that it returns roughly :query-limit data points, which depends on how dense your data is. You can specify e.g. [15 :secs], [15 :mins], [1.5 :hours], or [7 :days].

  2. Run the tool, using -Xmx to set the maximum heap size the JVM may use. E.g. to run with 10G of RAM allocated:

    $ cd influxdb-stream
    $ java -Xmx10G -jar idb.jar read-to-csv conf.edn
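
The windowing described in the configuration comments can be pictured with a short Python sketch. This is not the tool's actual implementation: the function names (`windows`, `render`, `suggest_interval`), the half-open time ranges, and the exact text substituted for `$timeFilter` are assumptions made for illustration. It splits the configured date range into :interval-sized windows, renders one query per window with a LIMIT appended, and gives a rough :interval estimate from an assumed data density.

```python
from datetime import datetime, timedelta, timezone


def windows(start, end, interval):
    """Yield consecutive (lo, hi) pairs covering [start, end) in steps of interval."""
    lo = start
    while lo < end:
        hi = min(lo + interval, end)
        yield lo, hi
        lo = hi


def render(query, lo, hi, limit=None):
    """Replace $timeFilter with a time range expression and optionally append a LIMIT.

    Assumption: a half-open range with RFC3339 timestamps; the tool's own
    expression may differ.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    time_filter = f"time >= '{lo.strftime(fmt)}' AND time < '{hi.strftime(fmt)}'"
    rendered = query.replace("$timeFilter", time_filter)
    if limit is not None:
        rendered += f" LIMIT {limit}"
    return rendered


def suggest_interval(rows_per_second, query_limit):
    """Rough interval over which about query_limit rows are expected."""
    return timedelta(seconds=query_limit / rows_per_second)


# Values taken from the example conf.edn above.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
end = datetime(2020, 2, 1, tzinfo=timezone.utc)
queries = [
    render("SELECT * FROM trade WHERE $timeFilter", lo, hi, limit=20000)
    for lo, hi in windows(start, end, timedelta(hours=24))
]

print(len(queries))  # one query per 24-hour window
print(queries[0])
print(suggest_interval(rows_per_second=10, query_limit=20000))
```

The sketch issues exactly one query per window, so it would miss rows whenever a window holds more than the limit. The configuration comments mention a cursor moving through the time range, which suggests the tool handles that case; the sketch does not attempt to.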
    

Write data via repeated queries

This is useful, for example, for sideloading data from a backup (see the sideloading instructions in the InfluxDB docs, and the GitHub issue describing the error that occurs when those instructions are followed) or for downsampling data.

More generally, this is useful anywhere you have a SELECT ... INTO query.

  1. Create a conf.edn file specifying the query and a time range.

    ; ~/influxdb-stream/conf.edn
    {;; The InfluxDB database to connect to
    :host          "127.0.0.1"
    :port          8086
    :db            "marketdata"
    
    ;; Execute the query first for the time range [start, start + 60 mins], then
    ;; for [start + 60 mins, start + 120 mins], and so on.
    :start         #inst"2020-01-01"
    :end           #inst"2021-01-01"
    :interval      [60 :mins]
    
    ;; Run this query to downsample measurements from "ticker" into
    ;; the "downsampled-ticker", which takes the last ask and bid values for
    ;; each minute. The $timeFilter is replaced with a time range expression.
    :query         "SELECT last(ask) AS \"ask\", last(bid) AS \"bid\"
                   INTO \"downsampled-ticker\"
                   FROM \"ticker\"
                   WHERE $timeFilter
                   GROUP BY time(1m), \"exchange\", \"market\" fill(none)"}

  2. Run the tool. Note here that the JVM doesn't need any special RAM allocation, since the memory-intensive computation happens in Influx.

    $ cd influxdb-stream
    $ java -jar idb.jar run-queries conf.edn
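
To make the downsampling query in the configuration above more concrete, here is a minimal Python sketch of what `last(ask)`, `last(bid)` with `GROUP BY time(1m), "exchange", "market" fill(none)` computes. It only illustrates the query's semantics and is not how InfluxDB or this tool does it. The `downsample_last` function and the sample rows are made up for the example, and it assumes every row carries both an ask and a bid.

```python
from datetime import datetime, timezone


def downsample_last(rows):
    """Keep the last ask/bid per (minute, exchange, market) bucket.

    rows: iterable of dicts with keys time, exchange, market, ask, bid.
    Minutes without any rows produce no output, mirroring fill(none).
    """
    buckets = {}
    for row in sorted(rows, key=lambda r: r["time"]):
        minute = row["time"].replace(second=0, microsecond=0)
        key = (minute, row["exchange"], row["market"])
        # Rows are visited in time order, so later rows overwrite earlier ones.
        buckets[key] = {"ask": row["ask"], "bid": row["bid"]}
    return buckets


def t(minute, second):
    """Helper for building illustrative timestamps on 2020-01-01 (UTC)."""
    return datetime(2020, 1, 1, 0, minute, second, tzinfo=timezone.utc)


# Made-up sample rows, for illustration only.
sample = [
    {"time": t(0, 5), "exchange": "ex1", "market": "btc-usd", "ask": 100.5, "bid": 100.0},
    {"time": t(0, 50), "exchange": "ex1", "market": "btc-usd", "ask": 101.0, "bid": 100.5},
    {"time": t(2, 10), "exchange": "ex1", "market": "btc-usd", "ask": 102.0, "bid": 101.5},
]

result = downsample_last(sample)
for (minute, exchange, market), values in sorted(result.items()):
    print(minute.isoformat(), exchange, market, values)
```

The two rows in minute 0 collapse into one, and minute 1 yields nothing because it has no data.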
    

influxdb-stream's Issues

InfluxDB with a password

Is there a way to use this tool to export a database that is protected with a username and password? How would the username and password need to be passed?
