hehuiyuan / streamdm

This project is forked from huawei-noah/streamdm.

Stream Data Mining Library for Spark Streaming

Home Page: http://streamdm.noahlab.com.hk/

streamdm's Introduction

streamDM for Spark Streaming

streamDM is open source software for mining big data streams using Spark Streaming, started at Huawei Noah's Ark Lab. streamDM is licensed under the Apache Software License v2.0.

Big Data Stream Learning

Big Data stream learning is more challenging than batch or offline learning, since the data may not keep the same distribution over the lifetime of the stream. Moreover, each example arriving in a stream can be processed only once, or must be summarized with a small memory footprint, so the learning algorithms must be very efficient.
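To make the "processed only once, small memory footprint" constraint concrete, here is a minimal sketch of a one-pass summary in plain Scala. It uses Welford's method for a running mean and variance. The class name RunningStats and the sample values are illustrative assumptions; this is not part of the streamDM API.

```scala
// Hypothetical helper, not part of streamDM: keeps a constant-size summary
// (count, mean, sum of squared deviations) and sees each example once.
final class RunningStats {
  private var n: Long = 0L
  private var mean: Double = 0.0
  private var m2: Double = 0.0 // sum of squared deviations from the running mean

  // Welford's update: fold one example into the summary, then discard it.
  def update(x: Double): Unit = {
    n += 1
    val delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
  }

  def count: Long = n
  def average: Double = mean
  // Sample variance; defined as 0.0 until at least two examples have arrived.
  def variance: Double = if (n > 1) m2 / (n - 1) else 0.0
}

object RunningStatsDemo {
  def main(args: Array[String]): Unit = {
    val stats = new RunningStats
    // An Iterator stands in for the stream: each value is visited exactly once.
    Iterator(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0).foreach(stats.update)
    println(s"n=${stats.count} mean=${stats.average} variance=${stats.variance}")
  }
}
```

The memory used does not grow with the length of the stream, which is the property the paragraph above asks of stream learners. Handling a changing distribution would need more, for example a decay factor or a sliding window.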

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is an extensible and programmable framework for massively distributed processing of datasets, which it represents as Resilient Distributed Datasets (RDDs). Spark Streaming receives input data streams and divides the data into batches, which are then processed by the Spark engine to generate the results.

Spark Streaming data is organized into DStreams, each represented internally as a sequence of RDDs.
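The batch model described above can be sketched with the standard Spark Streaming API. The example below assumes Spark 2.3.x is on the classpath and runs in local mode. The object name DStreamSketch, the summarize helper, the one-second batch interval, and the queue-backed test stream are illustrative choices, not streamDM code.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object DStreamSketch {
  // Pure per-batch summary (count, mean); kept separate from the Spark wiring.
  def summarize(values: Seq[Double]): (Int, Double) =
    (values.size, if (values.isEmpty) 0.0 else values.sum / values.size)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DStreamSketch")
    // Every second, Spark Streaming closes one batch and hands it over as an RDD.
    val ssc = new StreamingContext(conf, Seconds(1))

    // queueStream builds a DStream from a queue of RDDs; handy for local experiments.
    val queue = new mutable.Queue[RDD[Double]]()
    val stream = ssc.queueStream(queue)

    // foreachRDD exposes the RDD behind each batch of the DStream.
    stream.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        // collect() is acceptable here only because the sample batches are tiny.
        val (n, mean) = summarize(rdd.collect().toSeq)
        println(s"batch at $time: n=$n mean=$mean")
      }
    }

    ssc.start()
    for (i <- 1 to 3) {
      queue.synchronized {
        queue += ssc.sparkContext.makeRDD(Seq(i * 1.0, i * 2.0, i * 3.0))
      }
      Thread.sleep(1000)
    }
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

With the default settings of queueStream, one queued RDD is consumed per batch interval, so each RDD pushed in the loop shows up as its own batch.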

Included Methods

In the current release, streamDM v0.2, we have implemented:

We have also implemented the following data generators:

  • HyperplaneGenerator
  • RandomTreeGenerator
  • RandomRBFGenerator
  • RandomRBFEventsGenerator

We have also implemented SampleDataWriter, which can call the data generators to create sample data for simulation or testing.

In the next release of streamDM, we are going to add:

  • Classification: Random Forests
  • Multi-label: Hoeffding Tree ML, Random Forests ML
  • Frequent Itemset Miner: IncMine

For future work, we are considering:

  • Regression: Hoeffding Regression Tree, Bagging, Random Forests
  • Clustering: Clustree, DenStream
  • Frequent Itemset Miner: IncSecMine

Going Further

For a quick introduction to running streamDM, refer to the Getting Started document. The streamDM Programming Guide presents a detailed view of streamDM. The full API documentation is also available.

Environment

  • Spark 2.3.2
  • Scala 2.11
  • SBT 0.13
  • Java 8+
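For a separate project that targets the same environment, a build.sbt fragment might look like the sketch below. The project name, the Scala patch version 2.11.12, and the "provided" scope are assumptions made for illustration. Only Spark 2.3.2, Scala 2.11, and SBT 0.13 come from the list above, and this is not the build file shipped with streamDM.

```scala
// build.sbt (illustrative sketch, not streamDM's own build definition)

name := "streamdm-example" // hypothetical project name

scalaVersion := "2.11.12" // assumed patch release of Scala 2.11

// "provided" assumes the jar is launched with spark-submit, which supplies Spark at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.3.2" % "provided"
)
```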

Mailing lists

User support and questions mailing list:

[email protected]

Development related discussions:

[email protected]
