hermione

A Data Science project structure in cookiecutter style.

Developed with ❤️ by A3Data

What is Hermione?

Hermione is an open source library that helps Data Scientists set up better-organized code more quickly and simply. It also includes classes that assist with daily tasks such as column normalization and denormalization, data visualization, and text vectorization. With Hermione, all you need to do is execute a method and the rest is up to her, just like magic.

Why Hermione?

To share a little of A3Data's experience: we work in Data Science teams inside several client companies, and the excellence of notebooks as a data exploration tool is undeniable. Nevertheless, when it comes to data science products, where models need to be consumed, monitored, and periodically maintained, putting them into production inside a Jupyter Notebook is not the best choice (and we are not even mentioning memory and CPU performance yet). That is where Hermione comes in! We named this framework after the brilliant, empowered, and awesome witch of the Harry Potter saga.

This is also our way of reinforcing our position that women should be taking more leading roles in the technology field. #CodeLikeAGirl

Installing

Dependencies

  • Anaconda or Miniconda Python (>= 3.6)
  • conda (>= 4.8)

Hermione depends on conda to build and manage conda virtual environments. If you don't have it installed, please visit the Anaconda or Miniconda website.
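One way to confirm these prerequisites before installing is a short script like the one below. It is only a sketch: the helper names `python_is_supported` and `find_conda` are hypothetical, and it checks only that a `conda` executable is on the PATH, not which version it is.

```python
import shutil
import sys


def python_is_supported(minimum=(3, 6)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def find_conda():
    """Return the path to the conda executable, or None if it is not on PATH."""
    return shutil.which("conda")


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("conda found at:", find_conda())
```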

Install

pip install -U hermione-ml

How do I use Hermione?

After installing Hermione:

  1. Create your new project:

  2. Enter “y” if you want to start with example code.

  3. Hermione creates a conda virtual environment for the project. Activate it:

  4. After activating it, install the required libraries. There are a few suggestions in the “requirements.txt” file:

  5. Now train some models from the example using MLflow ❤. To do so, inside the src directory, type: hermione train. The “hermione train” command searches for a train.py file and executes it. In the example, models and metrics are already tracked via MLflow.

  6. After that, an MLflow experiment is created. To inspect the experiment in MLflow, type: mlflow ui. The application will start.

  7. To access the experiment, open the address previously provided in your preferred browser. There you can check the trained models and their metrics.

  8. In the Titanic example, we also provide a step-by-step notebook. To view it, type jupyter notebook inside the /src/notebooks/ directory.

Do you want to create your project from scratch? Click here to check a tutorial.
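The behaviour described for `hermione train` (look for a `train.py` file and execute it) can be pictured with a small sketch. This is not Hermione's actual implementation: the function names `find_train_script` and `run_train` are hypothetical, and the sketch assumes the script is searched for only in the given directory.

```python
import runpy
from pathlib import Path


def find_train_script(directory="."):
    """Return the path to train.py in `directory`, or None if it is absent."""
    candidate = Path(directory) / "train.py"
    return candidate if candidate.is_file() else None


def run_train(directory="."):
    """Execute train.py as if it were run as a script; return its globals."""
    script = find_train_script(directory)
    if script is None:
        raise FileNotFoundError(f"No train.py found in {Path(directory).resolve()}")
    # run_name="__main__" makes an `if __name__ == "__main__":` guard fire.
    return runpy.run_path(str(script), run_name="__main__")
```

Calling `run_train("src")` from the project root would then behave roughly like running `python train.py` inside `src`, under the assumptions above.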

Documentation

This is the class structure diagram that Hermione relies on:

Here we briefly describe what each class does:

Data Source

  • DataBase - should be used when data retrieval requires a connection to a database. Contains methods for opening and closing a connection.
  • Spreadsheet - should be used when the data is in spreadsheets or text files. All aggregation of the sources to generate a "flat table" should be performed in this class.
  • DataSource - abstract class from which DataBase and Spreadsheet inherit.
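The inheritance relationship described above can be sketched as follows. This is an illustration of the pattern, not Hermione's code: the method names `get_data`, `open_connection`, and `close_connection` are assumptions, SQLite stands in for whichever database is used, and a CSV file stands in for a spreadsheet.

```python
import csv
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Abstract base: every source must know how to return its data."""

    @abstractmethod
    def get_data(self):
        """Return the data as a list of rows."""


class DataBase(DataSource):
    """Reads rows through a database connection (SQLite, for illustration)."""

    def __init__(self, path, query):
        self.path = path
        self.query = query
        self.connection = None

    def open_connection(self):
        self.connection = sqlite3.connect(self.path)

    def close_connection(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None

    def get_data(self):
        self.open_connection()
        try:
            return self.connection.execute(self.query).fetchall()
        finally:
            # Always release the connection, even if the query fails.
            self.close_connection()


class Spreadsheet(DataSource):
    """Reads rows from a CSV file, one dict per row."""

    def __init__(self, path):
        self.path = path

    def get_data(self):
        with open(self.path, newline="") as handle:
            return list(csv.DictReader(handle))
```

Because both subclasses expose the same `get_data` method, downstream code can work with either source without knowing where the data came from.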

Preprocessing

  • Preprocessing - concentrates all preprocessing steps that must be performed on the data before the model is trained.
  • Normalization - applies normalization and denormalization to the specified columns. This class contains the following normalization algorithms already implemented: StandardScaler and MinMaxScaler.
  • TextVectorizer - transforms text into vectors. Implemented methods: bag of words, TF-IDF, and embedding (mean, median, and indexing).
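To make the normalization and denormalization idea concrete, here is a minimal pure-Python sketch of min-max scaling and its inverse. It illustrates the arithmetic only; it is not Hermione's Normalization class, and the class name `MinMaxColumn` and its methods are hypothetical.

```python
class MinMaxColumn:
    """Scale one numeric column to [0, 1] and map it back again."""

    def fit(self, values):
        # Remember the range seen during fitting so it can be reused later.
        self.low = min(values)
        self.high = max(values)
        return self

    def normalize(self, values):
        span = self.high - self.low
        if span == 0:
            # A constant column carries no spread; map every value to 0.0.
            return [0.0 for _ in values]
        return [(v - self.low) / span for v in values]

    def denormalize(self, scaled):
        span = self.high - self.low
        return [s * span + self.low for s in scaled]


ages = [22.0, 38.0, 26.0, 54.0]
scaler = MinMaxColumn().fit(ages)
scaled = scaler.normalize(ages)         # [0.0, 0.5, 0.125, 1.0]
restored = scaler.denormalize(scaled)   # [22.0, 38.0, 26.0, 54.0]
```

Denormalization matters when a model predicts a scaled target: the fitted range is needed to convert predictions back to the original units.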

Visualization

  • Visualization - methods for data visualization. There are methods to make static and interactive plots.

Model

  • Trainer - module that centralizes the training algorithm classes. Algorithms from the scikit-learn library, for instance, can be easily used with the implemented TrainerSklearn class.
  • Wrapper - keeps the trained model together with its metrics. This class has built-in integration with MLflow.
  • Metrics - contains key metrics that are calculated when models are trained. Classification, regression, and clustering metrics are already implemented.
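The relationship between these three pieces can be sketched in plain Python. Everything here is an illustrative assumption rather than Hermione's API: `MajorityClassModel`, `classification_metrics`, and `ModelWrapper` are hypothetical names, the model is a trivial baseline, and the MLflow integration is omitted.

```python
from collections import Counter


class MajorityClassModel:
    """Trivial baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


class ModelWrapper:
    """Keeps a trained model together with the metrics computed for it."""

    def __init__(self, model, metrics):
        self.model = model
        self.metrics = metrics

    def predict(self, X):
        return self.model.predict(X)


X = [[0], [1], [2], [3]]
y = [1, 1, 1, 0]
model = MajorityClassModel().fit(X, y)
wrapper = ModelWrapper(model, classification_metrics(y, model.predict(X)))
```

Bundling the model and its metrics in one object means whatever is saved or logged for an experiment always carries the scores that were measured for that exact model.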

Tests

  • test_project - module for unit testing.

Contributing

Make a pull request with your implementation.

For suggestions, contact us: [email protected]

License

Hermione is open source and released under the Apache 2.0 License.

Contributors

neylsoncrepalde, barbarasilveiraf, assisarthur
