This project forked from cap-ntu/unicap

A Unified Computing Engine for Fast Data Processing

Home Page: http://cap-ntu.github.io/UniCAP

CMake 0.14% C++ 95.70% C 3.99% Thrift 0.06% Objective-C 0.11%

#UniCAP: A Unified Computing Engine for Fast Data Processing

The UniCAP project is a distributed computing engine for executing data-parallel programs, each expressed as a directed acyclic graph (DAG) of tasks. It unifies Batch Processing and Stream Processing in one system using a timed dataflow model. Users can either use UniCAP on the NTU Big Data Platform (BDP) or build it from source.

The main design features of UniCAP are:


###Timed Dataflow

Timed Dataflow aims to reduce the communication overhead of iterative jobs that use shared variables. In such jobs, the edge node reads the input data and the shared variables, then updates the shared variables at each iteration. Applications of this kind include:

  • Supervised Machine Learning: To train a model, supervised machine learning applications (e.g., logistic regression, artificial neural networks) update the parameters, which are the shared variables, based on the input data and the old parameters at each iteration.
  • Graph Processing: Graph processing jobs (e.g., single-source shortest path, PageRank) usually update the weights of the graph nodes, which are the shared variables, based on the graph data and the old weights at each iteration.
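
The read-then-update pattern common to both cases can be illustrated with one gradient-descent step of logistic regression. This is a minimal sketch and not UniCAP code: `Sample`, `sigmoid`, and `lr_iteration` are hypothetical names, and the parameter vector `w` stands in for the shared variables.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One labelled training example (hypothetical type for this sketch).
struct Sample {
    std::vector<double> x;  // feature vector
    int y;                  // label: 0 or 1
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One iteration: read the input data and the shared parameters `w`,
// then write the updated parameters back into `w`.
void lr_iteration(const std::vector<Sample>& data,
                  std::vector<double>& w,
                  double learning_rate) {
    if (data.empty()) return;

    // Accumulate the gradient of the logistic loss over all samples.
    std::vector<double> grad(w.size(), 0.0);
    for (const Sample& s : data) {
        assert(s.x.size() == w.size());
        double z = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j) z += w[j] * s.x[j];
        const double err = sigmoid(z) - static_cast<double>(s.y);
        for (std::size_t j = 0; j < w.size(); ++j) grad[j] += err * s.x[j];
    }

    // Update the shared variables using the averaged gradient.
    for (std::size_t j = 0; j < w.size(); ++j) {
        w[j] -= learning_rate * grad[j] / static_cast<double>(data.size());
    }
}
```

In a distributed setting, every worker would need the current `w` before running such a step, which is where the cost of fetching shared variables comes from.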

Previous approaches such as Hadoop and Spark incur high communication overhead because they fetch all the shared variables at each iteration. Our experiments show that querying shared variables can take up to 60% of the total execution time of an iteration. They also show that a large fraction of the shared variables remain static during the computation, yet previous approaches fetch every shared variable even when nothing has changed.

Timed Dataflow tackles this problem by attaching a logical timestamp to each shared variable. At each iteration, the computation nodes (which cache the shared variables from the previous iteration) only need to fetch the shared variables that have changed, rather than all of them. Experiments show that Timed Dataflow accelerates logistic regression and PageRank by 30% and 45%, respectively, compared to Spark.
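
The idea of fetching only changed variables can be sketched as follows. This is an illustrative, single-process model under stated assumptions, not UniCAP's actual API: `SharedStore`, `WorkerCache`, `fetch_changed`, and `sync` are hypothetical names, and the logical timestamp is assumed to be a simple counter that advances on every real change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not UniCAP's actual API.
using Timestamp = std::uint64_t;

struct VersionedValue {
    double value;
    Timestamp updated_at;  // logical time of the last real change
};

// Holds the shared variables, each tagged with a logical timestamp.
class SharedStore {
public:
    // Record a write; the timestamp advances only if the value changed.
    void put(const std::string& key, double value) {
        auto it = vars_.find(key);
        if (it != vars_.end() && it->second.value == value) return;
        vars_[key] = VersionedValue{value, ++clock_};
    }

    // Return only the variables changed after logical time `since`.
    std::vector<std::pair<std::string, double>> fetch_changed(Timestamp since) const {
        std::vector<std::pair<std::string, double>> delta;
        for (const auto& kv : vars_) {
            if (kv.second.updated_at > since) {
                delta.emplace_back(kv.first, kv.second.value);
            }
        }
        return delta;
    }

    Timestamp now() const { return clock_; }

private:
    Timestamp clock_ = 0;
    std::unordered_map<std::string, VersionedValue> vars_;
};

// A computation node: caches shared variables and pulls only the delta.
class WorkerCache {
public:
    // Refresh the cache; returns how many variables were transferred.
    std::size_t sync(const SharedStore& store) {
        const auto delta = store.fetch_changed(last_sync_);
        for (const auto& kv : delta) cache_[kv.first] = kv.second;
        last_sync_ = store.now();
        return delta.size();
    }

    double get(const std::string& key) const { return cache_.at(key); }

private:
    Timestamp last_sync_ = 0;
    std::unordered_map<std::string, double> cache_;
};
```

With three variables in the store, the first `sync` transfers all three. If only one of them then changes, the next `sync` transfers just that one, and the static values are served from the worker's cache.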


###Hybrid Vertex

UniCAP embodies a set of storage systems as its Vertexes. In general, UniCAP supports both pure in-memory storage and mixed memory-disk storage.


###Hybrid Edge (In Development)

UniCAP supports scheduling tasks on both GPUs and CPUs within a single application. Compared to other approaches, which can use only the CPU or only the GPU, UniCAP maximizes system resource utilization.

Contributors: sunpengsdu
