tremor-rs / tremor-runtime

Main Tremor Project Rust Codebase

Home Page: https://www.tremor.rs

License: Apache License 2.0

Shell 0.70% Dockerfile 0.12% Makefile 0.19% Rust 97.07% Erlang 1.66% Emacs Lisp 0.27% Reason 0.01%
tremor tremor-runtime rust tremor-languages tremor-script tremor-query hacktoberfest hacktoberfest2021

tremor-runtime's Introduction

CNCF Early Stage Sandbox Project

CNCF Streaming & Messaging



In short, Tremor is an event- or stream-processing system. It is designed to perform well on high-volume data, both in its consumption of memory and CPU resources and in its latency. The goal of Tremor is to be a convenient tool for the operator, both when configuring Tremor and at runtime in a production setup. We provide our own LSP for Tremor configurations and take great care to provide insightful metrics and helpful error messages at runtime, all while keeping the hot data path as performant as possible.

Tremor is well suited for ETL workloads on structured and binary data (both schemaless and strictly schematic), aggregations, traffic shaping and routing purposes.

Tremor speaks various protocols (TCP, UDP, HTTP, WebSockets, DNS) and can connect to various external systems such as Kafka, Influx-compatible stores, syslog, OpenTelemetry, Google Pub/Sub, Google BigQuery, S3 and many more.

Audience

Tremor is a real-time event processing engine built for users who have a high message volume to deal with and want to build pipelines to process, route, or limit this event stream. Tremor supports a vast number of connectors to interact with: TCP, UDP, HTTP, WebSockets, Kafka, Elasticsearch, S3 and many more.

When to use Tremor

  • You want to apply traffic-shaping to a high volume of incoming events
  • You want to distribute events based on their contents
  • You want to protect a downstream system from overload
  • You wish to perform ETL-like tasks on data

When not to use Tremor

Note: Some of those restrictions are subject to change as tremor is a growing project.

We currently do not recommend tremor where:

  • Your event structure is not mappable to a JSON-like data structure.
    • If in doubt, please reach out and create a ticket so we can assist and advise.
    • In many cases (textual formats) a preprocessor, postprocessor or codec is sufficient, and these are relatively easy to contribute.
  • You need connectivity to a system, protocol or technology that is not currently supported directly or indirectly by the existing set of connectors.
    • If in doubt, please reach out and create a ticket so we can assist and advise.
  • You require complex and expensive operations on your event streams like joins of huge streams. Tremor is not built for huge analytical datasets, rather for tapping into infinite datastreams at their source (e.g. k8s events, syslog, kafka).

We accept and encourage contributions, no matter how small. If tremor is compelling for your use case or project, please get in touch or raise a ticket. We are happy to collaborate and to guide contributions and contributors.

Examples

See our Demo for a complex Tremor setup that can easily be run locally by using docker compose.

Check out the Recipes on our website. Each comes with a docker compose file to run and play with, without requiring lots of dependencies.

Packages

We do provide RPM, DEB and pre-compiled binaries (x86_64 only) for each release.

Check out our Releases Page.

Docker

Docker images are published to both Docker Hub and Github Packages Container Registry.

Container registry   Image name
docker.io            tremorproject/tremor
ghcr.io              tremor-rs/tremor-runtime/tremor

We publish our images with a set of different tags as explained below

Image tag   Explanation                          Example
edge        Tracking the main branch
latest      The latest release
0.X.Y       The exact release                    0.12.1
0.X         The latest bugfix release for 0.X    0.12
0           The latest minor release for 0       0

Building the Docker Image

Tremor runs in a docker image. If you wish to build a local image, clone this repository, and either run make image or run docker-compose build. Both will create an image called tremorproject/tremor:latest.

Note that since the image builds tremor in release mode, it requires some serious resources. We recommend allowing docker to use at least 12, preferably 16, gigabytes of memory and as many cores as you can spare. Depending on the build system, building the image can take up to an hour.

Providing too few resources to the docker machine can destabilize the docker build process. If you're encountering logs/errors like:

(signal: 9, SIGKILL: kill)
# OR
ERROR: Service 'tremor' failed to build : The command '/bin/sh -c cargo build --release --all --verbose' returned a non-zero code: 101

It is likely that your docker resources are starved. Consider increasing your resources (Windows/Mac) before trying again, posting in Discord, or raising an issue.

Running

To run tremor locally and introspect its docker environment, do the following:

make image
docker run tremorproject/tremor:latest

A local shell can be acquired by finding the container id of the running docker container and using it to attach a shell to the container.

docker ps

This returns:

CONTAINER ID        IMAGE                            COMMAND                CREATED             STATUS              PORTS               NAMES
fa7e3b4cec86        tremorproject/tremor:latest      "/tremor-runtime.sh"   43 seconds ago      Up 42 seconds                           gracious_shannon

Executing a shell on that container will then give you local access:

docker exec -it fa7e3b4cec86 sh

Building From Source

⚠️ Local builds are not supported and purely at your own risk. For contributing to Tremor, please check out our Development Quick Start Guide.

If you are not comfortable with managing library packages on your system or don't have experience with it, please use the Docker image provided above.

For local builds, tremor requires Rust 2021 (version 1.62 or later), along with all the tools needed to build Rust programs. For example, on CentOS the packages gcc, make, cmake, clang, openssl, and libstdc++ are required. For other distributions or operating systems, please install the corresponding packages.

NOTE: AVX2, SSE4.2 or NEON is needed to build simd-json, which tremor uses. If you are building in a VM, check which processor instructions are passed through to it, for example with lscpu | grep Flags.

For a more detailed guide on local builds, please refer to the tremor development docs.

ARM/aarch64/NEON

To run and compile with neon use:

RUSTFLAGS="-C target-cpu=native" cargo +nightly build --features neon --all

Configuration

Tremor is configured using .troy files written in our own Troy language.

Custom Troy modules can be loaded from any directory pointed to by the environment variable TREMOR_PATH. Directory entries need to be separated by a colon :.
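As a rough illustration of the colon-separated format, the following Rust sketch splits a TREMOR_PATH value into directories. It is not Tremor's actual loader: the helper name split_tremor_path, the fallback value and the choice to skip empty entries are all assumptions made for this example.

```rust
use std::path::PathBuf;

/// Split a colon-separated TREMOR_PATH value into directories.
/// Assumption: empty entries (e.g. from "a::b" or a trailing ':') are skipped.
fn split_tremor_path(raw: &str) -> Vec<PathBuf> {
    raw.split(':')
        .filter(|entry| !entry.is_empty())
        .map(PathBuf::from)
        .collect()
}

fn main() {
    // Hypothetical fallback value, used only when TREMOR_PATH is unset.
    let raw = std::env::var("TREMOR_PATH")
        .unwrap_or_else(|_| "/usr/local/share/tremor:/opt/tremor/lib".to_string());
    for dir in split_tremor_path(&raw) {
        println!("module search dir: {}", dir.display());
    }
}
```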

Docker

For use in docker, Troy files should be mounted to /etc/tremor/config.

Custom Troy modules and libraries should be mounted to /usr/local/share/tremor.

Example

This very simple example will consume lines from stdin and send them to stdout.

define flow example
flow
  # import some common pre-defined pipeline and connector definitions
  # to use here and save some typing
  use tremor::pipelines;
  use tremor::connectors;

  # create instances of the connectors and pipelines we need
  create connector console from connectors::console;
  create pipeline pass from pipelines::passthrough;

  # connect everything to form an event flow
  connect /connector/console to /pipeline/pass;
  connect /pipeline/pass to /connector/console;
end;

deploy flow example;

Run this example in file example.troy with docker:

$ docker run -i -v"$PWD:/etc/tremor/config" tremorproject/tremor:latest

Please also look at the demo for a fully documented example.

For more involved examples check out our Recipes.

Local Demo

Note: Docker should run with at least 4GB of memory!

To run the demo, run make demo. This requires the tremorproject/tremor image to exist on your machine.

Design

The demo mode logically follows the flow outlined below. It reads the data from data.json.xz, sends it at a fixed rate to the demo bucket on Kafka and from there reads it into the tremor container to apply classification and bucketing. Finally, it off-ramps statistics of the data based on those steps.

╔════════════════════╗   ╔════════════════════╗   ╔════════════════════╗
║      loadgen       ║   ║       Kafka        ║   ║       tremor       ║
║ ╔════════════════╗ ║   ║ ┌────────────────┐ ║   ║ ┌────────────────┐ ║
║ ║ tremor-runtime ║─╬───╬▶│  bucket: demo  │─╬───╬▶│ tremor-runtime │ ║
║ ╚════════════════╝ ║   ║ └────────────────┘ ║   ║ └────────────────┘ ║
║          ▲         ║   ╚════════════════════╝   ║          │         ║
║          │         ║                            ║          │         ║
║          │         ║                            ║          ▼         ║
║ ┌────────────────┐ ║                            ║ ┌────────────────┐ ║
║ │  data.json.xz  │ ║                            ║ │     tremor     │ ║
║ └────────────────┘ ║                            ║ └────────────────┘ ║
╚════════════════════╝                            ║          │         ║
                                                  ║          │         ║
                                                  ║          ▼         ║
                                                  ║ ┌────────────────┐ ║
                                                  ║ │    grouping    │ ║
                                                  ║ └────────────────┘ ║
                                                  ║          │         ║
                                                  ║          │         ║
                                                  ║          ▼         ║
                                                  ║ ┌────────────────┐ ║
                                                  ║ │  stats output  │ ║
                                                  ║ └────────────────┘ ║
                                                  ╚════════════════════╝

Configuration

Config file

The demo configuration can be inspected and changed in the demo/configs/tremor/config/main.troy file.

tremor-runtime's People

Contributors

0xdeafbeef, anupdhml, anuprout, aryan9600, carolgeng, dak-x, darach, dependabot-preview[bot], dependabot[bot], ernadh, garypwhite, humancalico, jaysonsantos, jigyasak05, licenser, manuelstein, mattwparas, me-diru, mfelsche, murex971, namikotoriyama, omkar-mohanty, primalpimmy, ramonacat, scrabsha, shayneofficer, thomasarts, trenaux, web-flow, yatinmaan

tremor-runtime's Issues

Migrate stateful operators to use pipeline state mechanism

Describe the problem you are trying to solve

Operators that maintain state can be migrated to use the pipeline state mechanism (eg: for the batch or backpressure operator) -- this would be a necessity when we want to support pipeline migration to a different tremor node, when we have clustering for tremor.

Describe the solution you'd like

Use the pipeline state mechanism for these operators:

  • generic::batch
  • generic::backpressure

Come up with a solution for operators whose internal state is not easily expressible as a Value (our primitive for operator state). One option is to serialize the stateful data structures into the pipeline state (and deserialize them out of it) when we need to store them somewhere central, e.g. to enable pipeline migrations as part of the clustering effort.

  • grouper::bucket (maintains LRU cache)

Notes

Pulled from https://rfcs.tremor.rs/0002-pipeline-state-mechanism/#future-possibilities to track as an issue.

Also see https://rfcs.tremor.rs/0002-pipeline-state-mechanism/#unresolved-questions

(r)syslog onramp / preprocessor

Enable Tremor to receive and send Syslog Protocol messages (https://tools.ietf.org/html/rfc5424), supporting as many syslog implementations as possible, including those that deviate from the standard.

In the wild we have different syslog protocols being used, the standard IETF format and the old BSD format. So ideally we should support both.

Receiving Syslog Messages

via UDP

Syslog messages are usually sent via UDP where 1 UDP packet contains 1 syslog message. We already support receiving data via UDP with our UDP onramp. We need a way to turn the packet data we receive into a structured Value. For this, we have codecs. A syslog codec should be able to handle both syslog message formats, or we write two different codecs, one for each format.
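A minimal Rust sketch of the first step such a codec would take: reading the PRI part (facility times 8 plus severity, as defined by the syslog RFCs) and guessing which of the two formats a datagram uses. The type and function names are hypothetical, and the format detection (a numeric VERSION token after PRI means IETF, anything else means BSD) is a simplifying assumption, not an existing Tremor API.

```rust
/// Which of the two syslog flavours a message appears to use.
#[derive(Debug, PartialEq)]
enum Flavour {
    /// IETF format (RFC 5424): a numeric VERSION follows the PRI part.
    Ietf { version: u8 },
    /// Old BSD format: the timestamp follows the PRI part directly.
    Bsd,
}

#[derive(Debug, PartialEq)]
struct SyslogHeader {
    facility: u8,
    severity: u8,
    flavour: Flavour,
}

/// Parse `<PRI>` and detect the flavour; returns None for a malformed PRI part.
fn parse_header(msg: &str) -> Option<SyslogHeader> {
    let rest = msg.strip_prefix('<')?;
    let (pri, rest) = rest.split_once('>')?;
    if pri.is_empty() || !pri.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let pri: u8 = pri.parse().ok()?;
    if pri > 191 {
        // 23 * 8 + 7 is the largest valid priority value.
        return None;
    }
    // Heuristic (assumption): an all-digit first token is the RFC 5424 VERSION.
    let first = rest.split(' ').next().unwrap_or("");
    let flavour = if !first.is_empty() && first.bytes().all(|b| b.is_ascii_digit()) {
        Flavour::Ietf { version: first.parse().ok()? }
    } else {
        Flavour::Bsd
    };
    Some(SyslogHeader { facility: pri / 8, severity: pri % 8, flavour })
}

fn main() {
    let ietf = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed";
    let bsd = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed";
    println!("{:?}", parse_header(ietf));
    println!("{:?}", parse_header(bsd));
}
```

A real codec would go on to parse the timestamp, hostname and message parts into a structured Value; this sketch stops at the header.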

via TCP/TLS

The story for supporting syslog over TLS/TCP is a bit more involved. We currently do not support TLS over our TCP onramp, so this needs to be added. This is a major milestone towards full syslog support.

Once we have TLS, supporting syslog messages over TCP also requires support for the RFC 5425 transport, which puts a textual length prefix before each message. This could be handled with a preprocessor similar to the length-prefixed preprocessor.
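To make the framing concrete, here is a hedged Rust sketch of splitting one frame (decimal length, a space, then that many bytes of message) off the front of a receive buffer. The Frame type and split_frame function are hypothetical names for illustration, not the existing preprocessor interface.

```rust
/// Result of trying to take one length-prefixed frame from a buffer.
#[derive(Debug, PartialEq)]
enum Frame<'a> {
    /// A full message plus whatever bytes follow it.
    Complete { msg: &'a [u8], rest: &'a [u8] },
    /// The prefix looks fine so far, but more bytes are needed.
    Incomplete,
    /// The length prefix is not a decimal number followed by a space.
    Invalid,
}

fn split_frame(buf: &[u8]) -> Frame<'_> {
    let is_digits = |bytes: &[u8]| bytes.iter().all(|b| b.is_ascii_digit());
    // The length prefix ends at the first space.
    let sp = match buf.iter().position(|&b| b == b' ') {
        Some(sp) => sp,
        None if is_digits(buf) => return Frame::Incomplete,
        None => return Frame::Invalid,
    };
    let digits = &buf[..sp];
    if digits.is_empty() || !is_digits(digits) {
        return Frame::Invalid;
    }
    let len = match std::str::from_utf8(digits).ok().and_then(|s| s.parse::<usize>().ok()) {
        Some(len) => len,
        None => return Frame::Invalid,
    };
    let body = &buf[sp + 1..];
    if body.len() < len {
        return Frame::Incomplete;
    }
    let (msg, rest) = body.split_at(len);
    Frame::Complete { msg, rest }
}

fn main() {
    // Two frames back to back, as they might arrive in one TCP read.
    let mut buf: &[u8] = b"11 <34>1 first12 <34>1 second";
    while let Frame::Complete { msg, rest } = split_frame(buf) {
        println!("{}", String::from_utf8_lossy(msg));
        buf = rest;
    }
}
```

Distinguishing Incomplete from Invalid matters for a stream transport: the former means "wait for more data", the latter means the connection is out of sync.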

Sending Syslog Messages

via UDP

For sending syslog messages, we need to turn structured data in an Event (Value) into the syslog protocol format.

via TCP/TLS

For sending messages over TCP/TLS, we also need to add the textual length-prefix used in RFC 5425. Tremor already supports sending data via UDP and TCP via offramps.

The TCP offramp needs to get TLS support for supporting sending syslog messages via TCP/TLS.
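On the sending side the length prefix is cheap to add. A minimal Rust sketch, assuming the syslog message has already been rendered to bytes (frame_message is a hypothetical helper name, not an existing postprocessor):

```rust
/// Prepend the textual length prefix: decimal byte count, one space, then the message.
fn frame_message(msg: &[u8]) -> Vec<u8> {
    let mut out = format!("{} ", msg.len()).into_bytes();
    out.extend_from_slice(msg);
    out
}

fn main() {
    let framed = frame_message(b"<34>1 - host app - - - hello");
    println!("{}", String::from_utf8_lossy(&framed));
}
```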

Checklist

Phase 1 - receive syslog via UDP

  • receive syslog data via UDP (onramp) and turn syslog messages into structured events
  • turn structured events into textual syslog messages and send them out via UDP (offramp)

Phase 2 - syslog via TCP/TLS

  • add support for the RFC 5425 transport protocol (textual length prefix)

Phase 3 - TLS support for TCP

  • add support for receiving TLS encrypted data via TCP onramp
  • add support for sending TLS encrypted data via TCP offramp

Reference

sub optimal performance in the influx codec

Describe the problem you are trying to solve

At the moment the influx codec performs slower than expected. Due to the way the influx line protocol is structured, it will remain slower than the JSON codec, but it should still live up to the expectation of being 'fast'.

Describe the solution you'd like

While 'fast' is hard to quantify, we should aim for tremor to be about as fast as, or faster than, telegraf at ingesting and processing equal data. At the moment we are roughly 30% slower, so that sets a goal of a 30% performance increase.

Notes

The bench/empty-pipeline-influx.yaml can be used to measure the performance delta.

A perf output of a 'hot' host is attached.

Note that the highlighted line shows part of the influx codec. The String::push call three lines above it, and likely a good number of the allocations and frees, can also be traced back to the influx codec.

[Image: perf output]

Rest-Pull onramp

From Kevin

Our rest onramp works as a rest server. To complete it, we should have a rest client onramp.

gelf preprocessor can panic on bad data

Problem

When receiving invalid data, the Gelf preprocessor can panic.

Steps

Send a gelf datagram with fewer than two bytes.

Possible Solution(s)

https://github.com/wayfair-tremor/tremor-runtime/blob/master/src/preprocessor/gelf.rs#L119 assumes that at least two bytes are present. That, however, is not guaranteed to be the case. We need to handle this more carefully.
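One way to avoid the out-of-bounds index, sketched in Rust under stated assumptions: use slice::get, which returns None instead of panicking when the datagram is too short. The Datagram type and classify function are hypothetical and do not mirror the actual code in gelf.rs; the two magic bytes 0x1e 0x0f are the ones GELF uses to mark chunked messages, and treating shorter input as an error case is a choice made for this example.

```rust
/// Magic bytes that open every chunked GELF datagram.
const GELF_CHUNK_MAGIC: [u8; 2] = [0x1e, 0x0f];

#[derive(Debug, PartialEq)]
enum Datagram<'a> {
    /// Chunked message; carries the bytes after the magic number.
    Chunk(&'a [u8]),
    /// Anything else is passed through untouched.
    Unchunked(&'a [u8]),
    /// Fewer than two bytes: report it instead of indexing out of bounds.
    TooShort,
}

fn classify(data: &[u8]) -> Datagram<'_> {
    // `get(..2)` yields None for inputs shorter than two bytes, so no panic is possible.
    match data.get(..2) {
        None => Datagram::TooShort,
        Some(magic) if magic == &GELF_CHUNK_MAGIC[..] => Datagram::Chunk(&data[2..]),
        Some(_) => Datagram::Unchunked(data),
    }
}

fn main() {
    println!("{:?}", classify(&[0x1e]));
    println!("{:?}", classify(&[0x1e, 0x0f, 0x01, 0x02]));
    println!("{:?}", classify(b"{}"));
}
```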

Notes

kubectl logs wf-tremor-2g224  -n wf-tremor
+ '[' '!' -z ']'
+ '[' -z ']'
+ LOGGER_FILE=/etc/tremor/logger.yaml
++ ls -1 '/etc/tremor/config/*.trickle'
++ wc -l
+ queries=0
+ '[' 0 '!=' 0 ']'
+ exec ./tremor-server --config /etc/tremor/config/config.yaml --logger-config /etc/tremor/logger.yaml
tremor version: 0.6.0
rd_kafka version: 0x000001ff, 1.2.1
Listening at: http://0.0.0.0:9898
thread 'onramp-udp-???' panicked at 'index out of bounds: the len is 1 but the index is 1', src/preprocessor/gelf.rs:119:72
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
    onramp:
      - id: udp
        type: udp
        preprocessors:
          - decompress
          - gelf-chunking
          - decompress
        codec: json
        config:
          port: 12201
          host: '0.0.0.0'
    offramp:
      - id: kafka-out
        type: kafka
        config:
          brokers:
            - REDACTED:9092
            - REDACTED:9092
            - REDACTED:9092
            - REDACTED:9092
            - REDACTED:9092
          topic: REDACTED
    pipeline:
      - id: main
        interface:
          inputs:
            - in
          outputs:
            - out
        nodes:
          - id: passthrough
            op: passthrough
        links:
          in: [ passthrough ]
          passthrough: [ out ]
    # define interconnect
    binding:
      - id: default
        links:
          '/onramp/udp/{instance}/out': [ '/pipeline/main/{instance}/in' ]
          '/pipeline/main/{instance}/out': [ '/offramp/kafka-out/{instance}/out' ]
    mapping:
      /binding/default/01:
        instance: '01'

Fix efferent broken dependency with failure v0.1.6 manifesting in vscode dev containers

Problem

Cargo fails to build in a vscode dev container:

   Compiling ahash v0.2.18
error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/failure_derive-0.1.6/src/lib.rs:107:70
    |
107 | fn display_body(s: &synstructure::Structure) -> Result<Option<quote::__rt::TokenStream>, Error> {
    |                                                                      ^^^^ could not find `__rt` in `quote`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0433`.

This is a known bug with the failure crate v0.1.6 (see failure_derive-0.1.6 in the output above):

rust-lang-deprecated/failure#342

Steps

  1. Create a dev container instance in vscode
  2. Run cargo build
  3. The build fails with the above error

Possible Solution(s)

Pin the quote crate to v1.0.2

Notes

Output of rustup --version:

rustup 1.21.1 (7832b2ebe 2019-12-20)

Output of rustup show:

active toolchain
----------------

stable-x86_64-apple-darwin (default)
rustc 1.41.0 (5e1a79984 2020-01-27)

Output of tremor-server --version:

Does not build due to cargo build error in this issue

Tremor is logging too much data at "INFO" level for Onramp Kafka consumer

Problem
Hi, it was noticed that Tremor is logging a lot of offset commit messages. These messages are logged every 5 seconds based on the current auto commit settings. However, they are logged at "INFO" level, which may not be wanted.

Here are the sample logs.
2020-03-06T15:50:03.634429051+00:00 INFO tremor_runtime::onramp::kafka - Offsets committed successfully
2020-03-06T15:50:08.662264942+00:00 INFO tremor_runtime::onramp::kafka - Offsets committed successfully

Steps

  1. Open the Tremor log for an instance where an onramp kafka consumer is used.
  2. Look for the above messages.

Possible Solution(s)

  1. Can this be logged only when there is a failure in committing the offsets?
  2. Can it be logged at "DEBUG" level along with the offset number that gets committed? Currently, even though the message is logged, it does not mention the offset number, so it is of little use for troubleshooting.

Add cron entry to crononome origin uri

We use origin URIs to identify the source of events within an onramp. For crononome, right now all entries come from tremor-crononom://<host>. It would be nice to extend that to include the cron entry that triggered them: tremor-crononom://<host>/<entry>

Shrink large functions

We have a number of functions that are marked with clippy::too_many_lines. It would be good to go over them and see if they can be cut down into smaller functions that are easier to understand.

Enable doc related lints

Currently we do not enforce documentation on public functions, nor do we enforce documentation on errors.

We should have both!

  • enable clippy::missing_errors_doc
  • enable #![deny(missing_docs)]
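For illustration, this is roughly what code satisfying both lints looks like. The sketch applies the lints to a single hypothetical module with an outer attribute so that it stays self-contained; the issue itself proposes the crate-level #![deny(missing_docs)] form. The module and function names are made up for this example.

```rust
/// Configuration helpers (hypothetical module, used only to illustrate the lints).
#[deny(missing_docs)]
#[warn(clippy::missing_errors_doc)]
pub mod config {
    /// Parses a TCP port from its textual form.
    ///
    /// # Errors
    ///
    /// Returns an error if `raw` is not a decimal number in the range 0..=65535.
    pub fn parse_port(raw: &str) -> Result<u16, std::num::ParseIntError> {
        raw.trim().parse()
    }
}

fn main() {
    println!("{:?}", config::parse_port("9898"));
    println!("{:?}", config::parse_port("not-a-port"));
}
```

missing_docs is checked by rustc itself, while clippy::missing_errors_doc only takes effect when running cargo clippy and asks for the "# Errors" section on public functions that return a Result.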

Complete the websocket ramp set

Describe the problem you are trying to solve

At the moment we only have half the WebSocket ramps.

We have a WebSocket server as an onramp and a WebSocket client as an offramp. To complete the set, we should add a WebSocket client onramp and a WebSocket server offramp.

Describe the solution you'd like

The WebSocket client onramp would connect to a given WebSocket server and publish all messages from that server into the connected pipelines.

The WebSocket server offramp would allow clients to connect and send the events it receives to all clients that are connected at the time of arrival.

Better high level description?

This tool allows configuring a pipeline that moves data from a source to a destination. The pipeline consists of multiple steps which are executed in order. Each step is configurable on a plugin basis.

I realize that many pieces of software today are quite abstract, but, surely, this elevator speech could be made more precise?

Ensure windows are non overlapping

We require windows to be non-overlapping. While this is guaranteed for size-based windows, we do not check it proactively for time-based windows. However, this should be an error.

good:

1s, 10s, 30s, 1m

bad:

1s, 10s, 15s, 30s, 1m (as 15 isn't a multiple of 10)
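A possible shape for that check, sketched in Rust. It assumes the rule implied by the examples above: each time-based window must be a strictly larger whole multiple of the one before it. The function name check_windows and the error type are hypothetical.

```rust
use std::time::Duration;

/// Check that every window is a strictly larger whole multiple of its predecessor.
fn check_windows(windows: &[Duration]) -> Result<(), String> {
    for pair in windows.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        if prev.is_zero() || next <= prev || next.as_nanos() % prev.as_nanos() != 0 {
            return Err(format!("{:?} is not a larger multiple of {:?}", next, prev));
        }
    }
    Ok(())
}

fn main() {
    let secs = |s: &[u64]| s.iter().map(|&n| Duration::from_secs(n)).collect::<Vec<_>>();
    // The "good" and "bad" examples from above, expressed in seconds.
    println!("{:?}", check_windows(&secs(&[1, 10, 30, 60])));
    println!("{:?}", check_windows(&secs(&[1, 10, 15, 30, 60])));
}
```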

tremor-script/query code formatter

One of the really good decisions Rust has made is standardizing on rustfmt to enforce a consistent style. It would be nice to have such a thing for the tremor dialects.

Tremor not logging any error when a Kafka offramp fails to start

Problem
Hi, it was observed that Tremor does not log any error when a Kafka offramp fails to start. In my case, there was a syntactical error in the Kafka offramp config: I was not using string format for one of the rdkafka_options. As a result, the underlying Kafka producers could not be started. However, Tremor didn't report any error when the service was started.

Steps

  1. Create an offramp with the config below.
offramp:
- id: kafka
  type: kafka
  metrics_interval_s: 10
  codec: influx
  config:
    brokers:
    - <some broker host:port>
    topic: <some topic>
    rdkafka_options:
      enable.idempotence: true
      max.in.flight.requests.per.connection: 5  ## passed in integer format
  2. Start the Tremor service and verify the service status.
  3. Check the Tremor log.

Possible Solution(s)

  1. Please see if error handling can be implemented for such cases.
  2. It would also be great if Tremor could convert the formats before passing them to "rdkafka".

Notes

tremor-runtime:0.6.0
rd_kafka version: 1.2.1

(edit by @Licenser to add code blocks for readability)

Improve load order of config elements

As of now, elements are loaded in the order in which files are defined. This works as long as the command line arguments are given in the right order.

This can be improved by loading all files first and then publishing artifacts in order of ramps, pipelines, bindings and finally mappings.

We partially do this already by always loading trickle files first. It would be natural to extend it to all configuration files.
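The proposed publish order can be sketched in Rust as a stable sort by artifact kind. The Kind and Artifact types are hypothetical stand-ins for whatever the loader collects from the config files; only the ordering (ramps, pipelines, bindings, mappings) comes from the description above.

```rust
/// Artifact kinds in the order they should be published.
/// The derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Ramp,
    Pipeline,
    Binding,
    Mapping,
}

#[derive(Debug, Clone, PartialEq)]
struct Artifact {
    kind: Kind,
    id: &'static str,
}

/// Reorder artifacts loaded from all files into publish order.
/// `sort_by_key` is stable, so artifacts of the same kind keep their file order.
fn publish_order(mut artifacts: Vec<Artifact>) -> Vec<Artifact> {
    artifacts.sort_by_key(|a| a.kind);
    artifacts
}

fn main() {
    let loaded = vec![
        Artifact { kind: Kind::Mapping, id: "/binding/default/01" },
        Artifact { kind: Kind::Binding, id: "default" },
        Artifact { kind: Kind::Pipeline, id: "main" },
        Artifact { kind: Kind::Ramp, id: "udp" },
        Artifact { kind: Kind::Ramp, id: "kafka-out" },
    ];
    for artifact in publish_order(loaded) {
        println!("{:?} {}", artifact.kind, artifact.id);
    }
}
```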

Update websocket on and offramp to use async instead of threads

We're currently forced into using threads for the websocket on and offramp. For a low or medium number of connections that is fine, but we would do better by switching to an async model.

In uring we have successfully used async-tungstenite for both the websocket client and server. It would make a good replacement for the actix websocket client and server.

exec connector

From Kevin

A connector that runs an executable upon receiving an event via the sink part and passes the event to the child process's stdin. The child process's stderr ends up as an event on the err port and its stdout as an event on the out port.

Fix null matching in tremor-script

Problem

Matching for null in case arms is not working as intended.

Steps

For the following tremor-script snippet, we expect the output "is null", but we get "is set".

let event = null;

match event of
  case null => "is null"
  default => "is set"
end;

Notes

Tested with tremor-script release v0.7.3

There's a workaround to do null checks on variables by using the type::is_null function explicitly:

let event = null;
match type::is_null(event) of
  case true => "is null"
  default => "is set"
end;

Change tcp offramp to be mio based

This will bring the tcp offramp in line with the tcp onramp, which is already mio based. It should also result in a performance boost for the offramp.

logging message payload when "influx" codec validation fails

Hi, I am looking for some logging of the actual payload when a message fails at codec validation. Right now it is logging the below info for "influx" codec validation.

2020-02-26T13:28:37.125891449+00:00 ERROR tremor_runtime::onramp::prelude - [Codec] invalid digit found in string

Describe the solution you'd like
It would be great if the actual "invalid" character is printed at "ERROR" level and the entire payload can be printed at "DEBUG" level.
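As a sketch of how the offending character could be surfaced, the Rust function below wraps an integer parse and, on failure, reports the first character that cannot be part of a decimal integer. It is an assumption that the error originates from an integer parse, and parse_int_verbose is a hypothetical helper, not part of the influx codec. The message deliberately contains only the character and its position, leaving the full payload for a separate DEBUG log line.

```rust
/// Parse a decimal integer; on failure, name the first offending character.
fn parse_int_verbose(raw: &str) -> Result<i64, String> {
    raw.parse::<i64>().map_err(|err| {
        // A leading sign is allowed; everything else must be an ASCII digit.
        let offending = raw.char_indices().find(|&(i, c)| {
            !(c.is_ascii_digit() || (i == 0 && (c == '-' || c == '+')))
        });
        match offending {
            Some((i, c)) => format!("{}: unexpected {:?} at byte {}", err, c, i),
            // No bad character found, e.g. empty input or overflow.
            None => err.to_string(),
        }
    })
}

fn main() {
    println!("{:?}", parse_int_verbose("42"));
    println!("{:?}", parse_int_verbose("4x2"));
}
```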

Thanks

(edit by @Licenser add code blocks for readability)

Fix version reporting from tremor commandline tools

Problem

The Tremor CLI tools (tremor-script, tremor-query, tremor-tool) report old versions.

Steps

Run commands like tremor-script --version.

Possible Solution(s)

Version number is hardcoded in places like:
https://github.com/wayfair-tremor/tremor-runtime/blob/v0.7.3/tremor-script/src/main.rs#L70

The CARGO_PKG_VERSION environment variable can be used to align the version there with the one defined in Cargo.toml. Example:
https://github.com/wayfair-tremor/tremor-language-server/blob/v0.7.3/src/main.rs#L27
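A minimal Rust sketch of the idea. Cargo sets CARGO_PKG_VERSION at compile time, and inside a cargo build the usual form is env!("CARGO_PKG_VERSION"). The sketch uses option_env! with a fallback only so that it also compiles outside of cargo; the function name version is hypothetical.

```rust
/// Version string taken from Cargo.toml at compile time.
/// Falls back to "unknown" when not built through cargo (assumption for this sketch).
fn version() -> &'static str {
    option_env!("CARGO_PKG_VERSION").unwrap_or("unknown")
}

fn main() {
    println!("tremor-script {}", version());
}
```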

improve detection for wrongly named meta parameters shared between operators

Describe the problem you are trying to solve
When creating more complex pipelines with multiple operators, it is easy to misname $meta variables and get unexpected behaviour.

For example using the script:

let $class = "class";
let $dimension = "dimension";
let $rate = 42;

Using this to configure a bucket operator leads to dimensions being ignored, since the expected variable is $dimensions, not $dimension.

Describe the solution you'd like

I am not 100% sure how to do that but it'd be helpful for users to be warned of those little differences in naming.
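One conceivable approach, sketched in Rust: compare every meta variable a script sets against the names the operators expect and warn when an unknown name is within a small edit distance of a known one. The function names, the threshold of 2 and the idea of a fixed list of expected names are all assumptions for illustration, not a description of how tremor works.

```rust
/// Levenshtein edit distance between two strings.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // `prev[j]` is the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Pairs of (used name, expected name) where the used name is unknown
/// but within two edits of a name an operator expects.
fn suspicious<'a>(used: &[&'a str], known: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for &name in used {
        if known.contains(&name) {
            continue;
        }
        if let Some(&close) = known.iter().find(|&&k| edit_distance(name, k) <= 2) {
            out.push((name, close));
        }
    }
    out
}

fn main() {
    let used = ["class", "dimension", "rate"];
    let known = ["class", "dimensions", "rate"];
    for (name, close) in suspicious(&used, &known) {
        println!("warning: ${} is not used by any operator, did you mean ${}?", name, close);
    }
}
```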

Notes

This request is the result of a few days of debugging on the above-given example :(

Improve thread naming

Right now thread naming doesn't reflect what's running in a thread. For example, a thread running a pipeline is named pipeline-tremor. It might be better to name it <pipeline name>-<instance name> so that it can be identified more easily.
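With the standard library this comes down to std::thread::Builder::name. A minimal Rust sketch, where spawn_named is a hypothetical helper and the thread body merely reports its own name:

```rust
use std::thread;

/// Spawn a thread named "<pipeline name>-<instance name>" that returns its own name.
fn spawn_named(
    pipeline: &str,
    instance: &str,
) -> std::io::Result<thread::JoinHandle<Option<String>>> {
    thread::Builder::new()
        .name(format!("{}-{}", pipeline, instance))
        .spawn(|| thread::current().name().map(str::to_owned))
}

fn main() -> std::io::Result<()> {
    let handle = spawn_named("main", "01")?;
    println!("{:?}", handle.join().expect("thread panicked"));
    Ok(())
}
```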

Generalise ramps

Describe the problem you are trying to solve

We currently don't have a generalized interface or trait for ramps. This makes reusability hard and prevents us from sharing the work that goes into different ramps. Further down the road, this should be able to serve as the foundation for ramps in the PDK.

Describe the solution you'd like

A generic interface for ramps that would serve as part of the SDK and allow ramps to be reusable as rust components

Notes

This is the implementation of the RFC

include onramp id in errors and thread names

Describe the problem you are trying to solve

Onramps currently do not know their identity (id), so logged errors and thread names do not reflect the onramp ID. Having the ID will allow identifying errors or threads more easily.

Describe the solution you'd like

Include the onramp id into the onramp start function and add it to errors and thread/task naming.

Resolve tremor server crash when tcp offramp endpoint is down

Seeing this error when offramp endpoint is not accessible.

tremor version: 0.6.0
rd_kafka version: 0x000002ff, 1.2.2
[2020-02-10T12:05:05Z ERROR tremor_server] error: Connection refused (os error 111)
error: Connection refused (os error 111)

evaluate snmalloc as a mimalloc replacement

Describe the problem you are trying to solve

In our use case we're heavily dependent on inter-thread communication: events arrive in the onramp thread, are processed in the pipeline thread and are sent in the offramp thread. This leads to data being allocated (mostly) in the onramp thread, modified in the pipeline thread and deallocated in the offramp thread. This producer-consumer style data flow (as the snmalloc paper calls it) can benefit greatly from the optimizations done in snmalloc.

Describe the solution you'd like

We should evaluate, benchmark, test and eventually adopt snmalloc as a replacement for mimalloc.

Improve vscode dev container experience

Describe the problem you are trying to solve

Currently the dev container support for tremor is good, but working both natively on OS X and in the dev container requires some compromises which cost developer time. The enhancements below should reduce the context-switching overhead.

Describe the solution you'd like

  • Add support for linux-tools perf
  • Add support for Brendan Gregg's flamegraph
  • Add support for valgrind
  • Allow Mac OS X and vsc dev container to co-exist with different cargo target paths
  • Integrate perf/flame/valgrind with benchmark system
  • Investigate dev container setup tweaks for Mac Os X dev environments

Notes

  • Optional. Consider adding convenience scripts as appropriate

Add support for line prefixes in stdout offramp

Problem

Configuring a prefix in the stdout offramp has no effect. The prefix is currently
ignored and this impacts the debugging/dev experience when developing new
pipelines/queries.

Steps

Add the following to the offramp configuration section for a passthrough pipeline:

  - id: out
    type: stdout
    codec: json
    config:
      prefix: "Does-not-print> "

The prefix is not printed at the beginning of every line as expected.
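The expected behaviour can be sketched in a few lines of Rust. with_prefix is a hypothetical helper that shows the intended output only; it is not the stdout offramp's code.

```rust
/// Prepend `prefix` to every line of the encoded event.
fn with_prefix(prefix: &str, payload: &str) -> String {
    payload
        .lines()
        .map(|line| format!("{}{}\n", prefix, line))
        .collect()
}

fn main() {
    // A JSON-encoded event, as the stdout offramp would receive it.
    print!("{}", with_prefix("Does-not-print> ", "{\"hello\":\"world\"}"));
}
```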

Notes

Output of rustup --version:

rustup 1.21.1 (7832b2ebe 2019-12-20)

Output of rustup show:

stable-x86_64-apple-darwin (default)
rustc 1.41.0 (5e1a79984 2020-01-27)

Output of tremor-server --version:

tremor version: 0.7.3
rd_kafka version: 0x000000ff, 1.3.0
tremor-runtime 0.7.3

Performance degradation with simple trickle pipeline vs. yaml

Non-aggregating pipelines described in trickle perform measurably worse than their yaml counterparts. This is due to the additional computational complexity that trickle introduces for grouping and aggregation: even when these mechanisms are not used, tremor currently does not optimise their overhead away.

First measurements indicate up to a 60% reduction in performance in worst-case scenarios (160 MB/s vs 400 MB/s in a pinned benchmark).

Possible candidates for optimisation are:

  1. statements such as select event from run time into grouper can be compiled out and simply linked directly
  2. pass-through operators for outgoing connectors can be removed
  3. handling simple selects (i.e. select event from ...) better ( optimisation )
  4. compile-time optimisation for ungrouped

A simple workaround is to prefer yaml when no aggregations/grouping are in play.

RUSTSEC-2018-0006: Uncontrolled recursion leads to abort in deserialization

Uncontrolled recursion leads to abort in deserialization

Details
Package yaml-rust
Version 0.3.5
URL chyh1990/yaml-rust#109
Date 2018-09-17
Patched versions >= 0.4.1

Affected versions of this crate did not prevent deep recursion while
deserializing data structures.

This allows an attacker to make a YAML file with deeply nested structures
that causes an abort while deserializing it.

The flaw was corrected by checking the recursion depth.

See advisory page for additional details.
