dakusui / bredxbred
This project forked from erikfrey/bashreduce
mapreduce in bash
License: Other
Currently, xbred doesn't work on Raspbian, while bred does. Support it.
The first failure, TESTRESULT:bredcheckenv_nc_supports_listen_mode:FAIL:1:1c1, is trivial (nc on the Pi just gives a different message).
More importantly, the test never finishes.
Apparently, if more than one job is defined in an .xbred file and they are connected as a pipeline, the second and following jobs never finish.
Maybe the first job tries to connect to the next one before it becomes ready and fails.
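If the guess above is right, one possible mitigation is to retry until the next job's listener is up instead of connecting once. The following is only a rough sketch of that idea, not what xbred does today; `wait_until_ready` and the probe shown in the comment are hypothetical names, and whether `nc -z` is available depends on the netcat variant installed.

```shell
# Hypothetical helper (not part of bred/xbred): retry a probe command
# until it succeeds or the number of tries is exhausted.
# usage: wait_until_ready <max_tries> <delay_seconds> <probe command...>
wait_until_ready() {
  tries=$1; delay=$2; shift 2
  while [ "$tries" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0          # the probe succeeded, so the listener is up
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep "$delay"
  done
  return 1              # gave up: the next job never became ready
}

# Possible use (assumes a netcat with -z; host and port are placeholders):
#   wait_until_ready 10 1 nc -z "$next_host" "$next_port" || echo "next job not ready" >&2
```

Taking the probe as a command keeps the helper independent of any particular netcat flavor, which seems to matter here given that nc on the Pi already behaves differently.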
pi@botaneiates ~/work/bredxbred $ tests/br-test.sh
Running all defined tests
TESTRESULT:bredcheckenv_awk_is_installed:PASS
TESTRESULT:bredcheckenv_breddir_available:PASS
TESTRESULT:bredcheckenv_brp_is_installed:PASS
TESTRESULT:bredcheckenv_loginshell_is_bash:PASS
TESTRESULT:bredcheckenv_nc_is_installed:PASS
TESTRESULT:bredcheckenv_nc_supports_listen_mode:FAIL:1:1c1
< 2
---
> 0
* EXPECTED:2
* ACTUAL :0
TESTRESULT:bredcheckenv_passwordless_ssh_is_ok:PASS
TESTRESULT:bredcheckenv_pv_is_installed:PASS
TESTRESULT:bredtest_1mapper:PASS
TESTRESULT:bredtest_1mapper_sort_on_out_no:PASS
TESTRESULT:bredtest_1reducer:PASS
TESTRESULT:bredtest_1reducer_sort_on_out_no:PASS
TESTRESULT:bredtest_2mappers:PASS
TESTRESULT:bredtest_2mappersK2:PASS
TESTRESULT:bredtest_2reducers:PASS
TESTRESULT:bredtest_2reducersK3:PASS
br-test.sh: WARNING: bred is now in test mode. (config=/home/pi/work/bredxbred/bred.testconf is being used)
TESTRESULT:bredtest_bredfs_init:PASS
br-test.sh: WARNING: bred is now in test mode. (config=/home/pi/work/bredxbred/bred.testconf is being used)
TESTRESULT:bredtest_bredfs_read:PASS
br-test.sh: WARNING: bred is now in test mode. (config=/home/pi/work/bredxbred/bred.testconf is being used)
TESTRESULT:bredtest_bredfs_write:PASS
TESTRESULT:bredtest_compat_wordcount:PASS
TESTRESULT:bredtest_compat_wordcount_sort_on_out_disabled:PASS
TESTRESULT:bredtest_compat_wordcount_with_2reducers:PASS
TESTRESULT:bredtest_default_sort_2nd_column:PASS
TESTRESULT:bredtest_default_sort_default_column:PASS
TESTRESULT:xbredtest_xbred_01:PASS
(Blocked)
Create a small script that checks if the environment is capable of running bashr.
It will check
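The list of checks is not given here. As a rough sketch only, and borrowing from the bredcheckenv_* test names in the log earlier on this page (awk, nc, and pv being installed), the "is this command installed" part could look like the following; `check_commands` is a hypothetical name, and the checks for login shell and passwordless ssh are left out.

```shell
# Hypothetical sketch: report every required command that is missing from PATH.
# Returns 0 when all commands are present, 1 otherwise.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use (the command list is an assumption taken from the test names):
#   check_commands awk nc pv ssh || exit 1
```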
Currently, a reduce task is written as a general shell script, and you can specify your favorite interpreter.
In bred's reducer, the code you give is executed with that interpreter once for each key.
But since a reducer generally processes a lot of keys, this means a large number of context switches (external command executions).
Making it possible to call a user-defined awk function from inside the reducer, which is itself written as an awk string, would be very beneficial performance-wise.
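To illustrate the idea, here is a minimal standalone sketch in which a single awk process calls a user-defined function once per key, rather than spawning an interpreter per key. It is not bred's actual reducer; `reduce_in_awk` and `user_reduce` are hypothetical names, and the input is assumed to be tab-separated `key<TAB>value` lines already sorted by key.

```shell
# Hypothetical sketch: one awk process handles every key in the input.
reduce_in_awk() {
  awk -F '\t' '
    # Hypothetical user-defined reducer: emits the key with the sum of its values.
    function user_reduce(key, sum) { print key "\t" sum }

    # When the key changes, flush the previous key and reset the accumulator.
    NR > 1 && $1 != prev { user_reduce(prev, total); total = 0 }

    # Accumulate the value for the current key.
    { prev = $1; total += $2 }

    # Flush the last key, if there was any input at all.
    END { if (NR > 0) user_reduce(prev, total) }
  '
}
```

For example, `printf 'a\t1\na\t2\nb\t5\n' | reduce_in_awk` prints `a<TAB>3` and `b<TAB>5` while starting only one external process.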
Currently, bred's output goes back to the master node each time a job finishes, which is not an ideal implementation.
A mechanism where a map job stores its data set on the host it runs on, so that following reduce jobs running on the same host can reuse it, would be preferable.
Could you provide a complete example for wordcount?
I'm using Ubuntu 16.04.
I've followed the steps to set up the program.
Kept the default settings.
I'm running the following command:
cat input.txt | wordcount/main.xbred > output.txt
It says:
xbred: starting pipeline
xbred: done
But output.txt is empty, and some processing continues after the main program has exited.
Where is the output supposed to go?
Create an installer of bredxbred
Allow quotations in variable declaration section of .xbred file
Currently it's very hard to identify the problem when we have an erroneous map/reduce command line.
Create a feature which allows those command lines to write stdout/stderr to files.
Those files can be created on the remote host (since this is for debugging, heavy communication over the network can't be justified).
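As a rough sketch of what such a feature might look like, the wrapper below keeps a copy of a command's stdout and stderr in files on the host where it runs, while still passing stdout down the pipeline. `run_logged`, the log directory, and the file naming are all hypothetical; nothing like this exists in bred yet.

```shell
# Hypothetical sketch: run a command, copy its stdout to <log_dir>/<name>.out
# and send its stderr to <log_dir>/<name>.err. Stdout still flows downstream.
# Note: the return status is tee's, not the wrapped command's.
# usage: run_logged <log_dir> <name> <command...>
run_logged() {
  log_dir=$1; name=$2; shift 2
  mkdir -p "$log_dir" || return 1
  "$@" 2>"$log_dir/$name.err" | tee "$log_dir/$name.out"
}

# Possible use (directory and job name are placeholders):
#   cat input.txt | run_logged /tmp/bred-debug map awk '{ print $1 }'
```

Because the files stay in a local directory on each worker, no extra data crosses the network unless someone fetches the logs afterwards.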
Rectify terminology in script files. "task" and "job" are used inconsistently.
bred has distributed file system support, but it still doesn't have a directory listing feature (etc.).
Although I'm skeptical about how useful it would be for users, as long as we are calling it some sort of 'file system', it would be good to have directory handling features.
Improve error handling.
Currently we don't have a fixed policy for situations where we hit errors in user code.
We should establish one and implement it.
Instead of moving data around, MapReduce takes the approach of transmitting the processing program.
The processing program bashreduce relies on is essentially a bash one-liner.
But quoting/escaping hell is really painful (even running simple sed/awk commands isn't comfortable enough).
Some kind of program distribution mechanism is desirable.
Basic idea is