
ml_text_classification's Introduction

Getting Started

Much of the learning and inspiration for this project came from:

Instructions for Mac

  • This repo works best with Python >= 3.8. To get Python 3.8, install Homebrew first, then install pyenv by running brew install pyenv.

  • TensorFlow needs Python 3.8 for this setup. After installing pyenv in the first step:

    • run pyenv install 3.8.0

    • run pyenv global 3.8.0

    • running python -V should print Python 3.8.0. If it does not, you may need to add pyenv to your shell startup file (for example ~/.bash_profile or ~/.zshrc). Instructions are in step 2 of the README at https://github.com/pyenv/pyenv

  • Run the following commands to finish setting up your environment:

    • pip install --upgrade pip
    • pip install tensorflow
    • pip install -r requirements.txt
    • Install PyTorch by following the instructions for your operating system at https://pytorch.org/get-started/locally/. For the Mac configuration used in this project, you can install it with pip by running pip install torch torchvision
  • Jupyter notebook is the web app used to create and train models.

    • Start Jupyter notebook by running jupyter notebook
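
Before opening a notebook, it can help to confirm that the environment matches the steps above. The sketch below is not part of this repo. The function name check_environment and the package list are assumptions made for illustration. It only reports whether the interpreter meets the 3.8 minimum and whether each package can be located, without importing any of them.

```python
import sys
from importlib.util import find_spec

# Assumed package list for illustration; adjust it to match requirements.txt.
PACKAGES = ("tensorflow", "torch", "notebook")


def check_environment(min_version=(3, 8), packages=PACKAGES):
    """Return (python_ok, missing) without importing the heavy packages."""
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level package is not installed.
    missing = [name for name in packages if find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python version OK:", python_ok)
    print("Missing packages:", ", ".join(missing) or "none")
```

If the script reports a missing package, rerun the matching pip install command from the list above.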

Learnings

A few personal notes on how ML works, in this case for text classification

Overview steps

  1. Get training data

    • It can be sorted or unsorted. Either way, you decide what you are going to work with. Most people start from sorted data, which is easier to train on. Others prefer to sort the data in the model definition code. It's really up to you as a dev.
  2. Divide the data into 3 sets: training, validation and testing.

    • These can be 3 separate files, or 3 files in 3 separate folders. There are Python ML packages that can read either file/folder structure.

    • Others prefer to do the split in the code itself, dividing the data by percentages of their choosing. It's really up to you as a dev.

  3. Create a new Jupyter notebook file and set the following in code:

    • Where the data is coming from

    • Your 'bag of words'

      • The words the code looks for in the text to help with training. This assumes the word list is already provided.
    • The number of levels (called layers in TensorFlow) the data passes through during training. There's a TensorFlow command to create a layer.

      • The model does not take in raw strings, so before any text enters the layers it has to be 'tokenized'. There's a command for that.
  4. Add the training command to the file, then run the file in Jupyter notebook to start training.

  5. If you're happy with the testing and validation accuracy results, serialize your model (in dev language, package it).

    • A 'serialized' model can be used in a REST endpoint function that takes the same kind of data it was trained with. The endpoint then returns the result. A Flask example can be found here

    • For mobile apps, the serialized model can be added to the mobile app code and used with TensorFlow there.

    • For web apps, there's TensorFlow.js, which can use models in JavaScript.
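
Steps 2 and 3 can be sketched in plain Python, without TensorFlow, to show what the split and the tokenizing actually do. This is an illustration under stated assumptions, not the code from this repo's notebooks: the function names, the 80/10/10 ratio, and the sample sentences are all hypothetical. In a real notebook, the ML library's own utilities would normally handle both steps.

```python
import random
import re
from collections import Counter


def split_dataset(examples, train=0.8, validation=0.1, seed=42):
    """Shuffle and split examples into training, validation and testing lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    train_end = int(len(shuffled) * train)
    validation_end = train_end + int(len(shuffled) * validation)
    return (shuffled[:train_end],
            shuffled[train_end:validation_end],
            shuffled[validation_end:])


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts, max_words=1000):
    """Map the most frequent words to integer ids; id 0 is kept for unknown words."""
    counts = Counter(token for text in texts for token in tokenize(text))
    ranked = counts.most_common(max_words)
    return {word: index for index, (word, _) in enumerate(ranked, start=1)}


def vectorize(text, vocabulary):
    """Turn raw text into a bag-of-words count vector a model could consume."""
    vector = [0] * (len(vocabulary) + 1)
    for token in tokenize(text):
        vector[vocabulary.get(token, 0)] += 1
    return vector


# Hypothetical labelled examples in the form (text, label).
examples = [(f"sample text number {i}", i % 2) for i in range(10)]
train_set, validation_set, test_set = split_dataset(examples)

# Build the vocabulary from training text only, so the model never sees test words early.
vocabulary = build_vocabulary(text for text, _ in train_set)
train_vectors = [vectorize(text, vocabulary) for text, _ in train_set]
```

Building the vocabulary from the training set alone is a design choice: words that appear only in the validation or testing data fall into the unknown slot, which mirrors what happens with unseen text after deployment.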

Run Web Server locally

From the repo root, run python src/implementation/flasksample.py

Contributors

nathanfletcher
