
vega's People

Contributors

0xflotus, ajprabhu09, alecmocatta, ariesdevil, colindean, edprince, enochack, frqc, gitter-badger, gzsombor, huitseeker, iduartgomez, igosuki, lumost, marcelbuesing, nimpruda, rajasekarv, rodrigocfd, shubhagupta, steven-joruk, ugoa


vega's Issues

Use this as a backend for Java Spark?

As Spark uses RDDs under the hood, would it be possible, and would it make sense, to use native_spark as the backend for the official Java Spark version?

Tracking issue: async integration

After some discussion we have decided to take a careful, gradual approach to integrating async into the library.

Adding asynchronous computation is a large departure from the reference Spark implementation, and may change how we do certain things or what is possible (like certain optimizations that rely on stack allocation in our case) in ways that are not yet clear.

Therefore, it is preferred to take a gradual approach as we explore the design space and evolve the library. The original work can be seen at #67; some work done in that preliminary PR will be ported to the main branch, and more steps will be taken to make testing and comparing both versions easy while we experiment.

Meanwhile, an async branch will be maintained and kept in sync with the master branch.

Preliminary work

  • Finalize the asynchronous version of the shuffle fetcher (maintain compatibility with the sync caller for now by joining the handle output).
  • Port the work on async runtime instantiation/handling so it integrates with third parties.
  • Port the work so tasks at the scheduler are run asynchronously.
  • Port the work so the worker executor is asynchronous.
  • Use capnp async readers (fix the remaining issues within the worker executor).
  • Automate testing and profiling for distributed mode so we can check for regressions/gains.
  • Add a number of tests covering different kinds of workloads so both versions can be compared.
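The first bullet's "sync caller joins the handle output" pattern can be sketched with the standard library alone (using a thread handle in place of an async runtime handle; `fetch_shuffle_chunks` and friends are hypothetical names, not the crate's API):

```rust
use std::thread;

// Hypothetical stand-in for the shuffle fetcher's work.
fn fetch_shuffle_chunks(ids: Vec<u32>) -> Vec<String> {
    ids.into_iter().map(|id| format!("chunk-{}", id)).collect()
}

// Async-style entry point: spawns the work and returns a handle immediately.
fn fetch_async(ids: Vec<u32>) -> thread::JoinHandle<Vec<String>> {
    thread::spawn(move || fetch_shuffle_chunks(ids))
}

// Sync-compatible wrapper: callers that are not async-aware simply
// block on (join) the handle output, keeping the old calling convention.
fn fetch_sync(ids: Vec<u32>) -> Vec<String> {
    fetch_async(ids).join().expect("fetcher thread panicked")
}

fn main() {
    println!("{:?}", fetch_sync(vec![1, 2, 3]));
}
```

With a real runtime the `JoinHandle` would come from the executor (and joining would be a `block_on`), but the compatibility shape is the same.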

Future work

  • Explore whether it's possible to chain iterators on the stack to take advantage of compiler optimizations for narrow-dependency tasks. (Inspect the compiler's assembly output to verify the optimizations.)
  • Add micro-benchmarks to check the performance characteristics of the previous tasks.
  • Architect the execution of this kind of work so that both the sync and async versions can take advantage of possible optimizations.
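To illustrate what "chaining iterators on the stack" buys: a fully stack-chained pipeline is one concrete monomorphized type the compiler can inline and fuse, whereas boxing each stage behind `dyn Iterator` routes every call through a vtable. A minimal sketch (both functions are illustrative, not library code):

```rust
// Stack-chained: the whole pipeline is one concrete type, so the compiler
// can inline and fuse the stages (visible in the assembly output).
fn narrow_pipeline_stack(data: &[i64]) -> i64 {
    data.iter().map(|x| x * 2).filter(|x| x % 3 != 0).sum()
}

// Boxed/dyn version for comparison: each stage goes through a vtable,
// which generally blocks that fusion.
fn narrow_pipeline_boxed(data: &[i64]) -> i64 {
    let it: Box<dyn Iterator<Item = i64> + '_> = Box::new(data.iter().map(|x| x * 2));
    let it: Box<dyn Iterator<Item = i64> + '_> = Box::new(it.filter(|x| x % 3 != 0));
    it.sum()
}

fn main() {
    let data: Vec<i64> = (1..=10).collect();
    // Same results; the difference is in the generated code, which the
    // micro-benchmarks and assembly inspection would measure.
    assert_eq!(narrow_pipeline_stack(&data), narrow_pipeline_boxed(&data));
    println!("{}", narrow_pipeline_stack(&data));
}
```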

Why capnp?

Hi,
I was wondering why you use capnp instead of a pure-Rust library that would be easier to install?
I feel that the benefits don't outweigh the installation overhead.

Improve error handling and traceability on executor and user code crashes

This issue can be mentored for anyone who may want to help.

While this has improved, we still have a whole lot of unwrapping around; since we are still in a very early phase it is not necessary to go crazy on this, as many things will (probably) change several times. That said, while in some places panics and aborts definitely should happen if anything goes wrong, better error handling is good to have for the sake of traceability, even if the whole application ends up crashing.

In particular, we should inventory exactly where the boundaries between executor-run code and driver-run code lie. Crashes in user code and executors should be caught, reported to, and gracefully handled by the driver, which should then take a clear plan of action depending on the error (e.g. one action if the executor detects a problem in one of its threads while running code, another if it dies for some other reason, etc.).

The first task should be to inventory all the call sites where it is necessary to take action (only a fraction of all the unwraps, really) and then extend/modify methods to return proper Result types, which can then be used to shut down, signal drivers, clean up, etc.

Related to #26 and #25 (this should end up fixing that issue).
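A minimal sketch of the direction proposed above, with a hypothetical error type (the variant names and `run_task` are illustrative, not the crate's API): call sites return a `Result` the driver can inspect and act on, instead of unwrapping.

```rust
use std::fmt;

// Hypothetical error type distinguishing where a failure originated,
// so the driver can pick a plan of action per the error kind.
#[derive(Debug)]
enum ExecutorError {
    UserCode(String), // panic/failure inside user-supplied closures
    Network(String),  // executor <-> driver communication failure
}

impl fmt::Display for ExecutorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExecutorError::UserCode(m) => write!(f, "user code failed: {}", m),
            ExecutorError::Network(m) => write!(f, "network failure: {}", m),
        }
    }
}

// Instead of `run_task(...).unwrap()`, the call site surfaces the error
// so the driver can log it, clean up, and decide what to do.
fn run_task(task_id: u64, should_fail: bool) -> Result<u64, ExecutorError> {
    if should_fail {
        Err(ExecutorError::UserCode(format!("task {} panicked", task_id)))
    } else {
        Ok(task_id)
    }
}

fn main() {
    match run_task(7, true) {
        Ok(id) => println!("task {} finished", id),
        Err(e) => eprintln!("driver sees: {}", e), // report, clean up, decide
    }
}
```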

[Core] Roadmap to 0.1.0

This is a tracking issue for the roadmap to a potential 0.1.0 release of the core crate/package.

Fix non working examples

There are a couple of examples that won't work because the necessary data is not available:

  • file_read
  • parquet_column_read

We should either provide a small data sample with the tests or change the examples to use some fake data so they can be executed.

Use kubernetes for scheduling

Directly using Kubernetes scheduling not only integrates nicely with cloud providers but also saves code you would otherwise have to maintain.

Preliminary benchmarks?

Everything is in the title. I understand that the project is young and needs time to get faster than Spark.
I'm just asking about the current state, out of curiosity.

Handling resources destruction when program exits

Ctrl-C handling, proper destruction of resources in case of panic, and removal of the explicit drop-executor logic. Instead of cloning Context as is done currently, create a single context and wrap it inside a ref count, and move the resource-destruction logic (like deleting all temp files and closing all spawned processes) inside the Drop trait.
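The ref-count plus Drop idea can be sketched as below (the `Context`/`ContextInner` shapes here are illustrative, not the crate's actual types; note that Drop alone does not cover Ctrl-C, which would still need a signal handler that lets the stack unwind):

```rust
use std::sync::Arc;

// Hypothetical inner state; in the real library this would hold temp-file
// paths, spawned executor process handles, etc.
struct ContextInner {
    temp_dir: String,
}

impl Drop for ContextInner {
    fn drop(&mut self) {
        // Runs exactly once, when the last Context clone goes away,
        // including during unwinding from a panic.
        println!("cleaning up {}", self.temp_dir);
    }
}

// Clones share the same inner state via the ref count instead of
// duplicating resources, as the issue suggests.
#[derive(Clone)]
struct Context {
    inner: Arc<ContextInner>,
}

impl Context {
    fn new(temp_dir: &str) -> Self {
        Context { inner: Arc::new(ContextInner { temp_dir: temp_dir.to_string() }) }
    }
    fn ref_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

fn main() {
    let ctx = Context::new("/tmp/ns-session");
    let clone = ctx.clone();
    assert_eq!(ctx.ref_count(), 2);
    drop(clone); // cleanup only runs after the *last* handle drops
    assert_eq!(ctx.ref_count(), 1);
}
```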

Deadlock while partitioning

As discussed on Gitter, while developing union I found a problem where the application enters a deadlock while resolving the partitioning or computation of a DAG. The working branch is: https://github.com/iduartgomez/native_spark/tree/dev

The error is reproducible executing:

#[test]
fn test_error() {
    let sc = CONTEXT.clone();
    let join = || {
        let col1 = vec![
            (1, ("A".to_string(), "B".to_string())),
            (2, ("C".to_string(), "D".to_string())),
            (3, ("E".to_string(), "F".to_string())),
            (4, ("G".to_string(), "H".to_string())),
        ];
        let col1 = sc.parallelize(col1, 4);
        let col2 = vec![
            (1, "A1".to_string()),
            (1, "A2".to_string()),
            (2, "B1".to_string()),
            (2, "B2".to_string()),
            (3, "C1".to_string()),
            (3, "C2".to_string()),
        ];
        let col2 = sc.parallelize(col2, 4);
        col2.join(col1.clone(), 4)
    };
    let join1 = join();
    let join2 = join();
    let res = join1.union(join2).unwrap().collect().unwrap();
    assert_eq!(res.len(), 12);
}

Inside some executor there is a thread panic here:

let mut stream_r = std::io::BufReader::new(&mut stream);
let message_reader =
    serialize_packed::read_message(&mut stream_r, capnp::message::ReaderOptions::new()).unwrap();

Refactor scheduler to remove duplicity/unused code

We got right now:

  • local_scheduler.rs
  • distributed_scheduler.rs
  • base_scheduler.rs
  • dag_scheduler.rs

local and distributed still have some duplicate code (basically the event loop and run job) which could be factored into a common trait (or pulled inside the impl_common_scheduler_funcs macro).

Then we currently have the base_scheduler (i.e. the NativeScheduler trait), which should be merged with dag_scheduler and made clearer. Initially NativeScheduler was created to hide the implementation from the public API (DAGScheduler can in theory be implemented by the user, but first a clear API should be found); NativeScheduler should implement DAGScheduler if we decide to go down this path. Then DAGScheduler should be required by the context, which would be generic over it (I guess).

Nothing too pressing, but we must do some cleanup around all this eventually.
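The "factor the duplicated event loop / run job into a common trait" idea could look roughly like this (all names here are illustrative, not the crate's actual API):

```rust
// Sketch of a common trait with a shared default `run_job`, so the local
// and distributed schedulers only implement the part that differs.
trait CommonScheduler {
    fn name(&self) -> &str;

    // Scheduler-specific part: how a single task actually gets executed.
    fn submit_task(&self, task_id: u64) -> u64;

    // Shared part, written once: the simplistic driver loop that both
    // local_scheduler.rs and distributed_scheduler.rs currently duplicate.
    fn run_job(&self, tasks: &[u64]) -> Vec<u64> {
        tasks.iter().map(|&t| self.submit_task(t)).collect()
    }
}

struct LocalScheduler;
impl CommonScheduler for LocalScheduler {
    fn name(&self) -> &str { "local" }
    fn submit_task(&self, task_id: u64) -> u64 { task_id } // run in-process
}

struct DistributedScheduler;
impl CommonScheduler for DistributedScheduler {
    fn name(&self) -> &str { "distributed" }
    fn submit_task(&self, task_id: u64) -> u64 { task_id } // would ship to an executor
}

fn main() {
    let results = LocalScheduler.run_job(&[1, 2, 3]);
    assert_eq!(results, DistributedScheduler.run_job(&[1, 2, 3]));
    println!("{:?}", results);
}
```

The same shape works whether the shared body lives in a trait default method or in the impl_common_scheduler_funcs macro; the trait version is easier to document and test.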

Add API cargo doc documentation

Even if we are not publishing to crates.io yet, it would be nice to have the cargo doc documentation generated and uploaded to a branch here somewhere so we can reference it in the documentation/readme.

No graceful shutdown on panic inside executors.

Right now, when there is a panic inside an executor, the process is left open indefinitely (at least in local mode) and does not shut down; the only way to terminate it is by sending SIGKILL to the master.

OS: Linux
Architecture: x86_64
Reproduction: just write assert!(false) inside a map function to be executed (MapRDD).
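One standard-library way to address this is a panic hook that flips a shutdown flag the executor can act on (signal the master, clean up) instead of hanging until SIGKILL. A minimal sketch, with illustrative names:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Installs a panic hook, runs a task that panics (like assert!(false)
// inside a map function), and reports whether the shutdown flag was set.
fn panics_trigger_shutdown_flag() -> bool {
    let shutting_down = Arc::new(AtomicBool::new(false));
    let flag = shutting_down.clone();

    // The hook runs before unwinding, so the executor gets a chance to
    // signal the master/driver rather than being left open indefinitely.
    panic::set_hook(Box::new(move |info| {
        eprintln!("executor task panicked: {}", info);
        flag.store(true, Ordering::SeqCst);
    }));

    // Simulate user code failing inside a task.
    let result = panic::catch_unwind(|| {
        panic!("boom from user code");
    });

    let _ = panic::take_hook(); // restore the default hook afterwards
    result.is_err() && shutting_down.load(Ordering::SeqCst)
}

fn main() {
    assert!(panics_trigger_shutdown_flag());
    println!("shutdown initiated cleanly");
}
```

In the real executor the flag would feed whatever loop keeps the process alive, so it can tear down connections and exit instead of idling.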

How to run the codes in examples?

I created a test_native_spark project and copied the code from make_rdd.rs into the project's main.rs.
Then, running the project with cargo +nightly-2019-09-11 run, I got this error:

thread 'main' panicked at 'Unable to open the file: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

How can I solve this?

Joining to contribute?

Can I join the collaboration? I am eager to join, and I think I can contribute.

Improve application configuration execution/deployment

Right now the way we are doing configuration is a bit lacklustre: we use clap to parse many of the configuration parameters, passing them by command-line argument. This creates a problem in user-created applications, where it will collide with the user's own command-line arguments.

Similarly, this already collides with cargo's own optional parameters; for example, something like this will fail: cargo test -- --test-threads=1.

We must provide a more elegant and ergonomic way to pass configuration parameters that does not collide with user (or generated, e.g. cargo) code. A first approach is to add to/revamp the configuration file we are already using (hosts.conf) to include more configuration parameters, which we would eventually have to do anyway. Additionally, centralize all the environment-variable configuration management (under env.rs) on initialization and document it, so the user can use those to set up any required parameters.

Also, for local execution and testing, many of the defaults could be provided (e.g. NS_LOCAL_IP) so they don't need to be supplied either by env variable or argument parameter (e.g. Spark itself assigns a free local IP if necessary when executing in local mode).
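The "env var with a local-mode default" part can be sketched as below; NS_LOCAL_IP is the variable named in the issue, while the default value and function names are illustrative:

```rust
use std::env;

// Separating the fallback logic from the env lookup keeps it testable.
fn local_ip_from(env_value: Option<String>) -> String {
    env_value.unwrap_or_else(|| "127.0.0.1".to_string())
}

// Centralized lookup (the kind of thing env.rs would own): if NS_LOCAL_IP
// is unset, local execution falls back to a sane default instead of
// forcing the user to pass it on the clap-parsed CLI.
fn local_ip() -> String {
    local_ip_from(env::var("NS_LOCAL_IP").ok())
}

fn main() {
    println!("binding to {}", local_ip());
}
```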

build failed!!

errorddeMacBook-Pro:native_spark d$ cargo build
Compiling native_spark v0.1.0 (/Users/d/Work/opensource/native_spark)
Compiling bincode v1.2.0
Compiling serde_closure v0.2.7
Compiling rustc_version v0.2.3
error: failed to run custom build command for native_spark v0.1.0 (/Users/d/Work/opensource/native_spark)

Caused by:
process didn't exit successfully: /Users/d/Work/opensource/native_spark/target/debug/build/native_spark-3382f7e3c05897a6/build-script-build (exit code: 101)
--- stderr
thread 'main' panicked at 'capnpc compiling issue: Error { kind: Failed, description: "Error while trying to execute capnp compile: Failed: No such file or directory (os error 2). Please verify that version 0.5.2 or higher of the capnp executable is installed on your system. See https://capnproto.org/install.html" }', src/libcore/result.rs:1165:5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.

warning: build failed, waiting for other jobs to finish...
error: build failed

Add connector for AWS S3

Implement a connector to read/write from/to AWS S3. For inspiration, maybe look at the HDFS FS interface. If possible, try to come up with a common interface we could reuse for other cloud providers (and potentially any "fs-like" source).
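A minimal sketch of what such a common "fs-like" interface could look like (the trait and types here are hypothetical, not the crate's API); an S3 connector would implement it on top of an AWS client, an HDFS one on top of its bindings, and RDD input code would only depend on the trait:

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical common interface for any "fs-like" source.
trait FsConnector {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()>;
}

// In-memory implementation standing in for a real backend (S3, HDFS, ...).
#[derive(Default)]
struct MemFs {
    files: HashMap<String, Vec<u8>>,
}

impl FsConnector for MemFs {
    fn read(&self, path: &str) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
    }
    fn write(&mut self, path: &str, data: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }
}

// Consumer code depends on the trait, not on a specific provider.
fn load_records(fs: &dyn FsConnector, path: &str) -> io::Result<usize> {
    Ok(fs.read(path)?.len())
}

fn main() -> io::Result<()> {
    let mut fs = MemFs::default();
    fs.write("bucket/key.txt", b"hello")?;
    println!("{} bytes", load_records(&fs, "bucket/key.txt")?);
    Ok(())
}
```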

Run clippy and fix error lints

I downloaded the source and, trying to build with clippy, there are quite a few lint warnings and especially some lint errors, like: cloning a double reference, drop on Copy types, etc.

If there is no special reason why clippy is not being run, I would advise running it and cleaning up the code (especially the error lints). I am up for doing a PR cleaning it up if you wish.

Tracking issue: Implementation of lacking core RDD ops

By core RDD ops we mean those that, in the original Apache Spark, stem from SparkContext and/or the base RDD class and friends:
SC:

  • range
  • filter
  • randomSplit
  • sortBy
  • groupBy
  • keyBy
  • zipPartitions
  • intersection
  • pipe
  • zip
  • subtract
  • treeAggregate
  • treeReduce
  • countApprox
  • countByValue
  • countByValueApprox
  • min and max
  • top
  • takeOrdered
  • isEmpty
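As a reference for implementers, here are minimal local (non-distributed) sketches of the semantics of a few of the listed ops over a plain slice, ignoring partitioning; the function names mirror the Spark ops but are not the crate's API:

```rust
use std::collections::HashMap;

// countByValue: map each distinct element to its number of occurrences.
fn count_by_value<T: std::hash::Hash + Eq + Clone>(data: &[T]) -> HashMap<T, usize> {
    let mut counts = HashMap::new();
    for v in data {
        *counts.entry(v.clone()).or_insert(0) += 1;
    }
    counts
}

// top(n): the n largest elements, in descending order.
fn top(data: &[i32], n: usize) -> Vec<i32> {
    let mut sorted = data.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    sorted.truncate(n);
    sorted
}

// isEmpty: cheaper than count() == 0, since it only needs one element.
fn is_empty(data: &[i32]) -> bool {
    data.first().is_none()
}

fn main() {
    let data = [3, 1, 3, 2];
    assert_eq!(count_by_value(&data)[&3], 2);
    assert_eq!(top(&data, 2), vec![3, 3]);
    assert!(!is_empty(&data));
}
```

The distributed versions would compute these per partition and merge the partial results at the driver, but the per-element semantics are the same.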

Non-goals for this tracking issue are any I/O-related ops, as we are tracking those elsewhere and doing things a little bit differently:

  • textFile
  • wholeTextFiles
  • binary files | binary records
  • Hadoop* family of methods
