With Bumblebee, you can clean and prepare big data through a visual interface. It is built on top of Optimus and PySpark, so it can handle both small and large datasets efficiently.
For more information about Bumblebee, visit https://hi-bumblebee.com/
- How to run Bumblebee in a Docker container: https://medium.com/hi-bumblebee/how-to-run-bumblebee-in-a-docker-container-c9da047d1ff1
- How to install Bumblebee on Digital Ocean: https://medium.com/hi-bumblebee/how-to-install-bumblebee-on-digital-ocean-ef77138f1838
Load data from CSV, JSON, Parquet, and Avro files, as well as from databases. Then explore it with histograms, frequency charts, and advanced statistics.
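Bumblebee computes these profiles for you in the interface. To show what a frequency chart and a histogram actually summarize, here is a minimal plain-Python sketch over an in-memory list. It does not use Bumblebee's or Optimus's API; the function names `frequency`, `histogram`, and `summary` are hypothetical, and `None` is assumed to mark a missing value.

```python
from collections import Counter
import statistics


def frequency(values):
    """Return (value, count) pairs, most common first, skipping missing values."""
    return Counter(v for v in values if v is not None).most_common()


def histogram(values, bins):
    """Count numeric values into `bins` equal-width buckets spanning [min, max]."""
    nums = [v for v in values if v is not None]
    lo, hi = min(nums), max(nums)
    # If every value is equal the range is 0; use width 1 so all land in bucket 0.
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in nums:
        # Clamp the index so the maximum value falls into the last bucket.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def summary(values):
    """Basic descriptive statistics for a numeric column (needs two or more values)."""
    nums = [v for v in values if v is not None]
    return {
        "count": len(nums),
        "missing": len(values) - len(nums),
        "mean": statistics.mean(nums),
        "median": statistics.median(nums),
        "stdev": statistics.stdev(nums),
    }


print(frequency(["csv", "json", "csv", None]))
print(histogram([1, 2, 3, 4, 5, 10], bins=3))
```

Equal-width buckets are only one binning choice; a profiling tool may pick bucket edges differently.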
Convert unstructured data, standardize strings, unify date formats, impute missing values, handle outliers, and create custom functions.
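The cleaning operations listed above can be illustrated with a small plain-Python sketch. It is not how Bumblebee or Optimus implements them: mean imputation and interquartile-range (IQR) clipping are just common techniques picked for illustration, and the accepted date formats in `DATE_FORMATS` are assumptions. All function names are hypothetical.

```python
from datetime import datetime
import statistics

# Assumed input date formats for this illustration; real data may need more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize(s):
    """Trim, collapse repeated whitespace, and lowercase a string."""
    return None if s is None else " ".join(s.split()).lower()


def unify_date(s):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    if s is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next assumed format
    return None  # unparseable dates become missing values


def impute_mean(values):
    """Replace missing (None) numeric values with the mean of the present ones."""
    mean = statistics.mean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


print(standardize("  New   YORK "))
print([unify_date(d) for d in ["2021-03-04", "04/03/2021", "20210304"]])
print(impute_mean([1, None, 3]))
print(clip_outliers_iqr([10, 12, 11, 13, 12, 100]))
```

Clipping keeps every row and only caps extreme values; dropping outlier rows is an equally common alternative.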
Bin columns, cluster similar strings, one-hot encode categorical columns, scale values, and split data into training and test sets.
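The following plain-Python sketch shows what binning, one-hot encoding, min-max scaling, and a train/test split each do to a column. String clustering is left out. This is not Bumblebee's or Optimus's API. The function names, the bin edges, the 25% test fraction, and the fixed seed are all illustrative assumptions.

```python
import bisect
import random


def bin_values(values, edges):
    """Map each value to a bin index using sorted edges: bin i holds edges[i-1] <= v < edges[i]."""
    return [bisect.bisect_right(edges, v) for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 vector per value."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def min_max_scale(values):
    """Rescale numbers linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split off a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


print(bin_values([5, 18, 40, 70], edges=[18, 65]))
print(one_hot(["red", "blue", "red"]))
print(min_max_scale([10, 20, 30]))
train, test = train_test_split(list(range(8)))
print(len(train), len(test))
```

Seeding the shuffle makes the split reproducible, which matters when a step is re-run later.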
Every action you apply to your data is recorded as a transformation step written in Python, and you can modify that code at any time. You can also add your own Python code to perform complex Apache Spark transformations.
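The general idea of recording each action as an editable, replayable step can be sketched in plain Python. This is only an analogy. `Step`, `Pipeline`, and the step names are hypothetical and are not part of Bumblebee or Optimus, and the sketch runs on a list of dicts rather than a Spark DataFrame.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[Dict]


@dataclass
class Step:
    name: str
    fn: Callable[[Rows], Rows]  # takes rows, returns new rows


@dataclass
class Pipeline:
    steps: List[Step] = field(default_factory=list)

    def add(self, name, fn):
        """Append a named transformation step."""
        self.steps.append(Step(name, fn))
        return self

    def replace(self, name, fn):
        """Edit an existing step in place, keeping its position in the pipeline."""
        for i, step in enumerate(self.steps):
            if step.name == name:
                self.steps[i] = Step(name, fn)
                return self
        raise KeyError(name)

    def run(self, rows):
        """Replay every step in order on the original input."""
        for step in self.steps:
            rows = step.fn(rows)
        return rows


rows = [{"name": "  Ada ", "age": None}, {"name": "GRACE", "age": 45}]
pipeline = (
    Pipeline()
    .add("trim_names", lambda rs: [{**r, "name": r["name"].strip()} for r in rs])
    .add("fill_age", lambda rs: [{**r, "age": 0 if r["age"] is None else r["age"]} for r in rs])
)
result = pipeline.run(rows)

# Modify an earlier step, then replay the whole pipeline from the raw data.
pipeline.replace("trim_names", lambda rs: [{**r, "name": r["name"].strip().title()} for r in rs])
edited = pipeline.run(rows)
print(result)
print(edited)
```

Because each step builds new rows instead of mutating the input, editing one step and replaying always starts from the untouched source data.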
Video: https://www.loom.com/embed/c2cfb6a2e0a549e2afeb8d484865b968
Contributions go far beyond pull requests and commits. We welcome all kinds of contributions, including:
- Documentation updates, enhancements, designs, or bugfixes.
- Spelling or grammar fixes.
- README.md corrections or redesigns.
- Adding unit or functional tests.
- Triaging GitHub issues, especially determining whether an issue still persists or is reproducible.
- Searching for #hibumblebee on Twitter and helping others who need assistance.
- Blogging, speaking about, or creating tutorials about Bumblebee and its many features.