
Transceiver Framework: A framework for concurrent multi stage processing of data & MDP: An online motif detector and predictor embedded in the transceiver framework.

License: Other



Transceiver Framework and MDP: A Motif Detector and Predictor

The Transceiver Framework project provides a Java infrastructure for parallel, multistage processing of data, and of data streams in particular. The MDP is a concrete transceiver framework application for detecting and predicting motifs and labels in multivariate, heterogeneous data streams.

Transceiver Framework

Example transceiver framework configuration

Introduction

The Transceiver Framework has the following characteristics:

  • It provides a simple JAVA API for:
    • Defining the data processing infrastructure (order of data processing steps, forks and joins of intermediate processed streams, etc.).
    • Plugging in custom methods for each data processing step.
  • It has a high level of concurrency and scalability.
  • The user can focus (almost) completely on the implementation of his or her data processing methods without having to worry about the technical infrastructure, concurrency issues, etc.
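
To make the idea of concurrent multistage processing more concrete, here is a minimal sketch in plain Java. It does not use the Transceiver Framework API (see ./doc/doc.pdf for that). All names (`PipelineSketch`, `stage`, `run`, `END`) are hypothetical and only illustrate the general pattern: each processing step runs in its own thread, and the steps are connected by queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical illustration only; none of these names belong to the Transceiver Framework.
final class PipelineSketch {
    // Sentinel object that marks the end of a stream.
    private static final Object END = new Object();

    // Starts a thread that applies `step` to every item taken from `in`
    // and forwards the result to `out` until the END sentinel arrives.
    static Thread stage(BlockingQueue<Object> in, BlockingQueue<Object> out,
                        Function<Integer, Integer> step) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Object item = in.take();
                    if (item == END) {
                        out.put(END); // propagate end-of-stream downstream
                        return;
                    }
                    out.put(step.apply((Integer) item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Feeds `input` through two concurrently running stages (x * 2, then x + 1).
    static List<Integer> run(List<Integer> input) {
        BlockingQueue<Object> q0 = new LinkedBlockingQueue<>();
        BlockingQueue<Object> q1 = new LinkedBlockingQueue<>();
        BlockingQueue<Object> q2 = new LinkedBlockingQueue<>();
        Thread first = stage(q0, q1, x -> x * 2);
        Thread second = stage(q1, q2, x -> x + 1);
        List<Integer> result = new ArrayList<>();
        try {
            for (Integer x : input) {
                q0.put(x);
            }
            q0.put(END);
            while (true) {
                Object item = q2.take();
                if (item == END) {
                    break;
                }
                result.add((Integer) item);
            }
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return result;
    }
}
```

In this sketch the user-supplied part is limited to the two lambdas passed to `stage`. That is the division of labour the list above describes, although the framework itself offers considerably more (forks, joins, configuration files).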

More detailed information about the Transceiver Framework's structure and inner workings can be found in the ./doc/doc.pdf file.

Prerequisites

  • Java version >= 1.8 (tested with OpenJDK version 1.8.0_191).
  • The transceiver framework itself is implemented in a platform independent manner.
  • In case you want to use the Shell scripts mentioned in the next section, you require:
    • A Unix shell such as Bash. Linux systems typically come with a shell pre-installed. On Windows, you might want to try an environment like Cygwin (this might work, but we did not test it).
    • Maven (tested with version 3.3.9) for project compilation and jar creation based on the pom.xml files.

Installation & example run

Note: . represents this repository's root directory.

  1. Clone this repository and its submodules (if present) to your local hard drive via:
    • Either: Console command:
          git clone --recurse-submodules https://github.com/GStepien/Transceiver_Framework.git &&
              cd Transceiver_Framework &&
              git fetch --tags && 
              git merge FETCH_HEAD &&
              cd ..
    • Or: Download the Shell script ./git_clone_recursive.sh from here, make it executable via chmod u+x ./git_clone_recursive.sh and execute it. The script clones this repository into the directory from which it is executed.
  2. Fetching later updates (including updates of potential submodules):
    • Either: Console command:
          git fetch && git fetch --tags && git merge FETCH_HEAD &&
              git submodule update --init --recursive &&
              git submodule update --recursive
    • Or: Execute the Shell script ./git_pull_recursive.sh
  3. Example run:
    • Execute the Shell script ./examples/transceiver_framework/muxdemux/execute.sh. This triggers the following actions:
      1. If no ./transceiver_framework/transceiver_framework/transceiver_framework.jar file exists yet, the script executes the command: ./install.sh "./transceiver_framework/pom.xml" which:
        1. Pulls the newest code version.
        2. Executes mvn clean -f <provided_path_to_pom.xml>.
        3. Executes mvn install -f <provided_path_to_pom.xml>.
          • The latter installs the transceiver framework into the local maven repository and creates a jar with the aforementioned name and location. Apart from the compiled code, the jar file also contains all required dependencies.
      2. Executes a transceiver framework run based on the configuration in ./examples/transceiver_framework/muxdemux/config/. The structure of this particular framework configuration is depicted in the figure above.
      3. The duration of the execution is limited by the "run_for" field in ./examples/transceiver_framework/muxdemux/config/root_config.json, which is set to 60000 milliseconds (1 minute) here.
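
The effect of a run-time limit such as "run_for" can be sketched in plain Java as follows. This is not the framework's implementation: the class and method names (`TimedRunSketch`, `runFor`) are hypothetical, and the worker loop merely stands in for the actual processing steps. The sketch starts a worker, lets it run for the given number of milliseconds, and then shuts it down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not taken from the Transceiver Framework code base.
final class TimedRunSketch {
    // Runs a placeholder worker for roughly `runForMillis` milliseconds,
    // then interrupts it and returns the number of loop iterations it completed.
    static long runFor(long runForMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicLong iterations = new AtomicLong();
        pool.execute(() -> {
            // Stand-in for real data processing; stops when interrupted.
            while (!Thread.currentThread().isInterrupted()) {
                iterations.incrementAndGet();
            }
        });
        try {
            Thread.sleep(runForMillis);                 // let the worker run
            pool.shutdownNow();                         // interrupt the worker
            pool.awaitTermination(1, TimeUnit.SECONDS); // wait for it to stop
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return iterations.get();
    }
}
```

With the value from the example configuration, the corresponding call in this sketch would be `TimedRunSketch.runFor(60000)`.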

We highly recommend studying the code in ./examples/transceiver_framework/muxdemux/execute.sh and ./transceiver_framework/transceiver_framework.sh which, along with the configuration files in ./examples/transceiver_framework/muxdemux/, ./examples/transceiver_framework/muxdemux/config/ and the code in ./transceiver_framework/transceiver_framework/src/main/java/gs/examples/tf/muxdemux/, serve as a usage template.

Version

Release versions correspond to commits to the master branch with a commit tag <version>-TF-RELEASE. Check out this version via git checkout <version>-TF-RELEASE.

The current release version is: 1.0.0.

License

See ./LICENSE.md.

MDP: A Motif Detector and Predictor

MDP

Introduction

The MDP is a data analysis tool built upon the transceiver framework. It performs the following tasks:

  1. It receives a number of numeric and string label data streams as input and transforms each numeric stream into a string label stream. For each numeric input stream, said transformation is performed as follows:
    • MDP continuously searches for motifs (i.e., repeating subsequences) in that numeric stream and transforms it into a string stream by replacing each numeric value with a string label indicating whether or not the corresponding value was classified as being part of a motif.
  2. MDP forwards the input string label streams as well as the string label streams resulting from the transformations to a language model trainer. The latter continuously trains and evaluates a language model based on the input data and the motif ground truth.
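
The transformation in step 1 can be illustrated with a small offline sketch. It assumes that motif positions have already been found and are given as index ranges. MDP itself detects motifs online, which this sketch does not attempt. The class name and the label strings ("motif", "no_motif") are hypothetical and not taken from the MDP code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; label names and structure are assumptions.
final class MotifLabelSketch {
    static final String MOTIF = "motif";
    static final String NO_MOTIF = "no_motif";

    // Replaces each numeric value with a label stating whether its index lies
    // inside one of the given motif ranges. Each range is {startInclusive, endExclusive}.
    static List<String> toLabels(double[] values, List<int[]> motifRanges) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            String label = NO_MOTIF;
            for (int[] range : motifRanges) {
                if (i >= range[0] && i < range[1]) {
                    label = MOTIF;
                    break;
                }
            }
            labels.add(label);
        }
        return labels;
    }
}
```

For five input values and a single motif range {1, 3}, the sketch labels the values at indices 1 and 2 as "motif" and all others as "no_motif".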

The input data streams are generated by an instance of our Synthetic Datastream Generator (SDG), which also annotates the numeric data with the required motif ground truth. The SDG is already included as a Git submodule in this repository's ./mdp/mdp/submodules/ folder. The structure of MDP as a concrete transceiver framework instance is depicted in the figure above.

Prerequisites

In addition to the requirements from the "Transceiver Framework" section, MDP requires:

  • For the SDG:
    • An R installation (https://www.r-project.org/). We tested MDP with R version 3.4.4.
    • A Unix system: For the communication between the MDP (which is implemented in Java) and the SDG (which is implemented in R), we use Rserve 1.7-3. According to the Rserve website, multiple connections to a single Rserve server instance are currently (as of December 2018) only supported on Unix systems. We ran our tests under Ubuntu 16.04 LTS and 18.04 LTS (both 64 bit).
  • Cores, cores, cores: the more, the better.

Installation & example run

Note: . represents this repository's root directory.

  1. Clone this repository to your local hard drive just as described above (the MDP project currently resides in the same Git repository as the transceiver framework - this will probably change in the future).
  2. Fetching later updates: Again, see above.
  3. Example run:
    • Execute the Shell script ./examples/mdp/01/execute.sh. This triggers the following actions:
      1. If no ./mdp/mdp/mdp.jar file exists yet, the script executes the command: ./install.sh "./mdp/pom.xml" which:
        1. Pulls the newest code version.
        2. Executes mvn clean -f <provided_path_to_pom.xml>.
        3. Executes mvn install -f <provided_path_to_pom.xml>.
          • The latter installs the MDP into the local maven repository and creates a jar with the aforementioned name and location. Apart from the compiled code, the jar file also contains all required dependencies (including those from the transceiver framework).
      2. Executes an MDP run based on the configuration in ./examples/mdp/01/config/. The (simplified) structure of the MDP in its role as a particular transceiver framework configuration is depicted in the figure above.
      3. The duration of the execution is limited by the "run_for" field in ./examples/mdp/01/config/root_config.json, which is set to 1800000 milliseconds (30 minutes) here.
      4. MDP creates a number of statistics in a newly created ./examples/mdp/01/stats/ folder. After the execution has finished, the script executes the R-script ./examples/mdp/01/evaluate.R which, based on the previously generated statistics, creates a number of graphs in a newly created ./examples/mdp/01/graphs/ folder.

The folder ./examples/mdp/01-24h_run/ contains the configuration and statistics of a 24-hour MDP run. Apart from the longer runtime, it is configured just like the example in ./examples/mdp/01/ (in particular, the SDG is configured to generate the same data). The figure below shows this run's prediction error over time:

Trained language model prediction error over time

The prediction error at each point is the average error over the last 100 predictions. The error modes are:

  • "full": Refers to the average prediction error over all dimensions.
  • "char": Refers to the average prediction error when only considering dimensions corresponding to string label input streams.
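
The averaging over the last 100 predictions mentioned above can be sketched as a sliding-window mean. This is an illustration under assumptions, not the code used by MDP or by ./examples/mdp/01/evaluate.R. The class name `WindowedErrorSketch` and its method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not taken from the MDP evaluation code.
final class WindowedErrorSketch {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    WindowedErrorSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    // Records one prediction error and returns the mean over the most recent
    // `windowSize` errors (or over all errors seen so far, if fewer).
    double add(double error) {
        window.addLast(error);
        sum += error;
        if (window.size() > windowSize) {
            sum -= window.removeFirst(); // drop the oldest error from the window
        }
        return sum / window.size();
    }
}
```

A window matching the text would be created with `new WindowedErrorSketch(100)`. Each call to `add` then yields one point of an error curve like the one in the figure.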

We highly recommend studying the code in ./examples/mdp/01/execute.sh and ./mdp/mdp.sh which, along with the configuration files in ./examples/mdp/01/, ./examples/mdp/01/config/ and the code in ./mdp/mdp/src/, serve as a usage template.

Version

Release versions correspond to commits to the master branch with a commit tag <version>-MDP-RELEASE. Check out this version via git checkout <version>-MDP-RELEASE.

The current release version is: 1.0.0.

License

See ./LICENSE.md.

Copyright (c) 2018 Grzegorz Stepien
