cnsuhao / flashx

This project forked from flashxio/flashx

FlashX is a collection of big data analytics tools that perform data analytics in the form of graphs and matrices.

Home Page: http://flashx.io/

License: Apache License 2.0

CMake 0.25% Python 0.03% Shell 0.32% Makefile 0.84% Perl 0.43% C++ 97.93% R 0.10% C 0.10%

flashx's Introduction

This repo contains the core of the FlashX project, which provides big data analytics tools that operate on data represented as graphs and matrices. As such, FlashX covers a large range of data analysis tasks. All tools in FlashX use solid-state drives (SSDs) to scale data analysis to large datasets on a single machine, while running almost as fast as in-memory solutions. The main components of FlashX are FlashGraph and FlashMatrix.

FlashGraph

FlashGraph is a general-purpose graph analysis framework that exposes a vertex-centric programming interface for expressing a variety of graph algorithms. FlashGraph scales graph computation to large graphs by keeping the edges of a graph on SSDs and the computation state in memory. With smart I/O scheduling, FlashGraph achieves performance comparable to state-of-the-art in-memory graph analysis frameworks and significantly outperforms state-of-the-art distributed graph analysis frameworks, while scaling to graphs with billions of vertices and hundreds of billions of edges. Please see the performance results.
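
To make the vertex-centric model concrete, here is a minimal sketch in plain Python. It is not the FlashGraph API: the names `bfs_vertex_program`, `run_vertex_centric`, and `read_edges` are hypothetical, and an in-memory dictionary stands in for edge lists that FlashGraph would read from SSDs. Only the per-vertex state (`distance`) plays the role of the in-memory computation state.

```python
def bfs_vertex_program(vertex, edges, distance, level):
    """Hypothetical per-vertex function: visit the edges of one active vertex.

    Returns the neighbors that should become active in the next iteration.
    """
    activated = []
    for neighbor in edges:
        if distance[neighbor] is None:      # neighbor not reached yet
            distance[neighbor] = level + 1  # update in-memory vertex state
            activated.append(neighbor)
    return activated


def run_vertex_centric(read_edges, vertices, source):
    """Toy driver: runs the vertex program on active vertices, level by level."""
    # Per-vertex computation state, kept in memory.
    distance = {v: None for v in vertices}
    distance[source] = 0
    active = {source}
    level = 0
    while active:
        next_active = set()
        for v in active:
            # Assumption: read_edges(v) stands in for fetching the edge list
            # of v from external storage on demand.
            next_active.update(bfs_vertex_program(v, read_edges(v), distance, level))
        active = next_active
        level += 1
    return distance


# Small directed graph; vertex 4 cannot be reached from vertex 0.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
distances = run_vertex_centric(graph.__getitem__, graph.keys(), source=0)
```

The driver only touches the edge lists of active vertices, which is the property that lets a real system keep edges on slower storage and fetch them selectively.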

FlashMatrix

FlashMatrix is a matrix computation engine that provides a small set of generalized matrix operations on sparse and dense matrices for expressing a variety of data mining and machine learning algorithms. For certain graph algorithms such as PageRank, which can be formulated as sparse matrix multiplication, FlashMatrix significantly outperforms FlashGraph.
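
As an illustration of how PageRank reduces to repeated sparse matrix-vector multiplication, the sketch below uses only the Python standard library. It does not use FlashMatrix; the function name `pagerank`, the adjacency-dictionary input, and the damping factor of 0.85 are assumptions made for the example. Each edge `src -> dst` corresponds to one nonzero entry `1 / outdegree(src)` of the transition matrix.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Power iteration for PageRank over an adjacency dictionary.

    out_edges maps each vertex to the list of vertices it links to.
    Every vertex must appear as a key, even if it has no outgoing edges.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Sparse matrix-vector product: only the nonzero entries (edges)
        # are visited, each contributing rank[src] / outdegree(src).
        contrib = {v: 0.0 for v in vertices}
        dangling = 0.0
        for src, dsts in out_edges.items():
            if dsts:
                share = rank[src] / len(dsts)
                for dst in dsts:
                    contrib[dst] += share
            else:
                # Vertices without outgoing edges spread their rank uniformly.
                dangling += rank[src]
        rank = {
            v: (1.0 - damping) / n + damping * (contrib[v] + dangling / n)
            for v in vertices
        }
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

The inner loop is the whole computation: one pass over the nonzero entries per iteration, which is why an engine optimized for sparse matrix multiplication suits this algorithm.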

Programming interface

FlashX exposes C++, R and Python programming interfaces. The R and Python interfaces are highly compatible with the R base package and NumPy, respectively. As such, users can run R and Python machine learning code on FlashX with little or no modification. Our goal is to eventually make the R and Python interfaces fully compatible with native R and NumPy.

  • FlashR provides many matrix operations in the R base package.
  • FlashGraphR exposes many graph algorithms in FlashGraph to R.
  • FlashR-learn is a machine learning library implemented completely with FlashR.
  • FlashPy provides many array operations in NumPy.

FlashX Quick start guide

FlashGraph programming tutorial

FlashR programming tutorial

FlashX performance and scalability

Publications

Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, and Randal Burns, "FlashMatrix: Parallel, Scalable Data Analysis with Generalized Matrix Operations using Commodity SSDs," arXiv preprint arXiv:1604.06414, 2016.

Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, and Randal Burns, "Semi-External Memory Sparse Matrix Multiplication on Billion-Node Graphs," Transactions on Parallel and Distributed Systems, 2016.

Heng Wang, Da Zheng, Randal Burns, and Carey Priebe, "Active Community Detection in Massive Graphs," SDM-Networks 2015.

Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, and Alexander S. Szalay, "FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs," FAST'15.

Da Zheng, Randal Burns, and Alexander S. Szalay, "Toward Millions of File System IOPS on Low-Cost, Commodity Hardware," SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.

Contact

Mailing list: [email protected]

Join the chat at https://gitter.im/icoming/FlashGraph
