
bredxbred's People

Contributors

dakusui, danfinkelstein, dlazesz, edwardbadboy, erikfrey

Forkers

dlazesz

bredxbred's Issues

Support Raspbian

Currently, xbred doesn't work on Raspbian, while bred does. Support it.

The first failure, TESTRESULT:bredcheckenv_nc_supports_listen_mode:FAIL:1:1c1, is trivial (nc on the Pi just gives a different message).

More importantly, the test never finishes.
Apparently, if more than one job is defined in a .xbred file and the jobs are connected as a pipeline, the second and following jobs never finish.

Maybe the first job tries to connect to the next one before it becomes ready, and fails.
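
One way to test this hypothesis would be to make the connecting side retry for a short while instead of giving up on the first refused connection. The following is only a sketch: `retry` is a hypothetical helper, not something bred or xbred provides, and it is only safe when the wrapped command can be re-run from scratch (for example, when its input comes from a file rather than a pipe).

```shell
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1          # still failing after MAX_ATTEMPTS tries
    fi
    attempt=$((attempt + 1))
    sleep "$delay"      # give the next job time to start listening
  done
  return 0
}
```

For example, `retry 10 1 some_connect_command` would try up to ten times, one second apart (`some_connect_command` is a placeholder). Probing the port with a separate connection first is deliberately avoided here, because a listener that accepts only one connection would be used up by the probe.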

pi@botaneiates ~/work/bredxbred $ tests/br-test.sh 
Running all defined tests
TESTRESULT:bredcheckenv_awk_is_installed:PASS
TESTRESULT:bredcheckenv_breddir_available:PASS
TESTRESULT:bredcheckenv_brp_is_installed:PASS
TESTRESULT:bredcheckenv_loginshell_is_bash:PASS
TESTRESULT:bredcheckenv_nc_is_installed:PASS
TESTRESULT:bredcheckenv_nc_supports_listen_mode:FAIL:1:1c1
< 2

---
> 0
* EXPECTED:2
* ACTUAL  :0
TESTRESULT:bredcheckenv_passwordless_ssh_is_ok:PASS
TESTRESULT:bredcheckenv_pv_is_installed:PASS
TESTRESULT:bredtest_1mapper:PASS
TESTRESULT:bredtest_1mapper_sort_on_out_no:PASS
TESTRESULT:bredtest_1reducer:PASS
TESTRESULT:bredtest_1reducer_sort_on_out_no:PASS
TESTRESULT:bredtest_2mappers:PASS
TESTRESULT:bredtest_2mappersK2:PASS
TESTRESULT:bredtest_2reducers:PASS
TESTRESULT:bredtest_2reducersK3:PASS
br-test.sh: WARNING: bred is now in test mode. (config=/home/pi/work/bredxbred/bred.testconf is being used)
TESTRESULT:bredtest_bredfs_init:PASS
br-test.sh: WARNING: bred is now in test mode. (config=/home/pi/work/bredxbred/bred.testconf is being used)
TESTRESULT:bredtest_bredfs_read:PASS
br-test.sh: WARNING: bred is now in test mode. (config=/home/pi/work/bredxbred/bred.testconf is being used)
TESTRESULT:bredtest_bredfs_write:PASS
TESTRESULT:bredtest_compat_wordcount:PASS
TESTRESULT:bredtest_compat_wordcount_sort_on_out_disabled:PASS
TESTRESULT:bredtest_compat_wordcount_with_2reducers:PASS
TESTRESULT:bredtest_default_sort_2nd_column:PASS
TESTRESULT:bredtest_default_sort_default_column:PASS
TESTRESULT:xbredtest_xbred_01:PASS
(Blocked)

Create environment checker

Create a small script that checks whether the environment is capable of running bashr.
It will check:

  1. All the hosts are ssh-able without a password
  2. All the hosts have the required software components:
    1. pv
    2. nc.traditional is installed as the default nc
    3. awk
    4. ...
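
A minimal sketch of what such a checker could look like is shown here. The function names and the PASS/FAIL output format are assumptions made for illustration, and the check that nc.traditional is the default nc is not covered. `BatchMode=yes` makes ssh fail instead of prompting for a password.

```shell
# Hypothetical sketch of an environment check; names are illustrative only.

# Report PASS/FAIL for each named command; fail if any of them is missing.
check_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "PASS:$cmd"
    else
      echo "FAIL:$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Succeed only when HOST accepts an ssh login without asking for a password.
check_passwordless_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

A caller could run `check_commands pv nc awk` on each host and `check_passwordless_ssh HOST` for every host in the configuration.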

Execute awk function in a reduce task

Currently, a reduce task is written as a general shell script, and you can specify your favorite interpreter.
In bred's reducer, the code you give is executed with that interpreter once for each key.
But since a reducer generally processes a lot of keys, this means a great many context switches (external command executions).
Making it possible to call a user-defined awk function, written as an awk string, from inside the reducer would be very beneficial performance-wise.
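
The idea can be sketched as follows. This is an assumption about how it might work, not bred's implementation: `run_awk_reducer` and the `reduce(key, vals, cnt)` convention are made-up names, and the input is assumed to be tab-separated key/value lines already sorted by key. The point is that one awk process handles all keys, so no external command is started per key.

```shell
# Hypothetical helper, not part of bred: applies a user-supplied awk function
# reduce(key, vals, cnt) once per key, inside a single awk process.
# Input on stdin: "key<TAB>value" lines, already sorted by key.
run_awk_reducer() {
  local user_func=$1
  awk -F '\t' "$user_func"'
    # Hand the buffered values of the previous key to the user function.
    function emit_group() {
      if (n > 0) reduce(prev, values, n)
      split("", values)   # portable way to clear an array
      n = 0
    }
    NR > 1 && $1 != prev { emit_group() }
    { prev = $1; values[++n] = $2 }
    END { emit_group() }
  '
}

# Example user function: sum the values of each key.
sum_reducer='function reduce(key, vals, cnt,    i, total) {
  for (i = 1; i <= cnt; i++) total += vals[i]
  print key "\t" total
}'
```

With this, `sort input.tsv | run_awk_reducer "$sum_reducer"` would print one line per key containing the key and the sum of its values (`input.tsv` is a placeholder file name).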

Implement better data exchange mechanism

Currently, bred's output goes back to the master node each time a job finishes, which is not an ideal implementation.
A mechanism where a map job stores its data set on the host it is running on, so that following reduce jobs running on the same host can reuse it, would be preferable.

Example not working (need more documentation)

Could you provide a complete example for wordcount?

I'm using Ubuntu 16.04.
I've followed the steps to set up the program.
Kept the default settings.
I'm running the following command:

cat input.txt | wordcount/main.xbred > output.txt

It says:

xbred: starting pipeline
xbred: done

But output.txt is empty, and some processing continues after the main program has exited.

Where is the output supposed to go?

Implement debug mode

Currently it's very hard to identify the problem when we have an erroneous map/reduce command line.
Create a feature which allows those command lines to write their stdout/stderr to files.

Those files can be created on the remote hosts (since this is for debugging purposes, heavy communication over the network can't be justified).
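
A rough sketch of such a wrapper is given here. Everything in it is hypothetical: `run_task` and the `DEBUG_DIR` variable are invented for illustration and are not existing bred options. It requires bash because it uses `PIPESTATUS`. Note that in this sketch stderr goes only to the file while debugging, and stdout is both saved and passed on to the next stage.

```shell
# Hypothetical wrapper, not an existing bred feature. Requires bash.
# When DEBUG_DIR (a made-up variable) is set, stdout is copied to NAME.out and
# stderr is redirected to NAME.err in that directory on the host running the task.
run_task() {
  local name=$1
  shift
  if [ -z "${DEBUG_DIR:-}" ]; then
    "$@"                       # debug mode off: run the task unchanged
    return
  fi
  mkdir -p "$DEBUG_DIR" || return
  "$@" 2>"$DEBUG_DIR/$name.err" | tee "$DEBUG_DIR/$name.out"
  return "${PIPESTATUS[0]}"    # report the status of the task, not of tee
}
```

Since the files are written wherever the task runs, nothing extra is sent over the network; they can be inspected later on that host.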

Directory listing support

bred supports a distributed file system, but it still doesn't have a directory listing feature (etc).
Although I'm skeptical about how useful it would be for users, as long as we are calling it some sort of 'file system', it would be good to have directory handling features.

Improve error handling

Improve error handling.
Currently we don't have a fixed policy for situations where we hit errors in user code.
We should establish one and implement it.

Create 'program distribution' mechanism

The approach of map reduce is to transmit the processing program instead of moving data around.
The processing program bashreduce relies on is essentially a bash one-liner.
But the quoting/escaping hell is really painful (even just running simple sed/awk commands is uncomfortable).

Some kind of program distribution mechanism is desirable.

The basic idea is:

  1. Create a wrapper script. This will distribute the entire bashr pipeline definition.
  2. A pipeline definition will contain
    1. aliases or functions which define map/reduce tasks used in the pipeline
    2. how they are connected
    3. (etc, if necessary)
  3. The wrapper script executes the map/reduce tasks as defined in the pipeline. In this step, the wrapper script and the br script will issue ssh commands, and they will orchestrate things so that the aliases/functions become available before the actual execution of each task.
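
One possible building block for step 3, sketched under assumptions: bash can print the source of a function with `declare -f`, and `printf '%q'` quotes arguments so they survive being re-parsed. `build_task_script` and `to_upper` are hypothetical names, and the receiving shell is assumed to be bash. This avoids hand-written quoting of one-liners, but it is only an illustration, not the project's design.

```shell
# Hypothetical helper: print a self-contained script consisting of the
# definition of FUNC followed by a call to it with the given arguments.
build_task_script() {
  local func=$1
  shift
  declare -f "$func" || return 1   # fails if FUNC is not a defined function
  printf '%s' "$func"
  if [ "$#" -gt 0 ]; then
    printf ' %q' "$@"              # quote each argument for the receiving shell
  fi
  printf '\n'
}

# Example task: a mapper that upper-cases its input.
to_upper() {
  tr '[:lower:]' '[:upper:]'
}
```

A local dry run would be `echo hello | bash -c "$(build_task_script to_upper)"`; replacing `bash -c` with `ssh somehost` (a placeholder host name) would send the same script to another machine.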
