Auto-Tagger


An Artificial Intelligence tool that uses Transformer models and NER (Named Entity Recognition) techniques to detect proper names in a text.

This repo contains:

  • The Auto-Tagger Web App
  • The Auto-Tagger Discord bot

A video demo can be found here: https://www.youtube.com/watch?v=3XF4hOLtU1o


Key Features • Installation • Calling the API • Using Flask • Docker image • Data • Training a new model • Contributing



Our Auto-Tagger Web Application

Our Auto-Tagger Discord Bot

Key Features

  • Uses Transformer models (BERT in this case) and NER (Named Entity Recognition) techniques.
  • Builds a training pipeline.
  • Implements and trains the model (using Google Colab).
  • Builds an inference pipeline.
  • Serves the model using BentoML.
  • Provides a Web Application to visualize the Auto-Tagger features.
  • Provides a Discord bot that implements the Auto-Tagger features.
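
As an illustration of what the last step of an inference pipeline might look like, here is a minimal sketch that turns per-token tags into the comma-separated string of names the API returns. It assumes the model emits BIO-style tags such as B-per and I-per (the scheme used in the training dataset); the function name extract_names and the sample tokens are hypothetical and not taken from this repo's code.

```python
from typing import List


def extract_names(tokens: List[str], tags: List[str], entity: str = "per") -> str:
    """Join tokens tagged as `entity` into a comma-separated string.

    Assumes BIO tagging: B-<entity> starts a name, I-<entity> continues it.
    """
    names: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity}":
            # A new name starts; flush any name in progress.
            if current:
                names.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity}" and current:
            # Continuation of the name in progress.
            current.append(token)
        else:
            # Any other tag ends the name in progress.
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return ",".join(names)


tokens = ["jack", "and", "james", "met", "emily"]
tags = ["B-per", "O", "B-per", "O", "B-per"]
print(extract_names(tokens, tags))  # prints: jack,james,emily
```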

Installation

  • All the code required to get started

Clone

  • Clone this repo to your local machine using https://github.com/MLH-Fellowship/Auto-Tagger.git

Setup

To set up the models and start the server, follow the steps below:

  1. Download the model from this drive: https://drive.google.com/file/d/1TyuIoMO42CHHvQVlOpw6Ynco39rQbc6t/view?usp=sharing

  2. Rename the file to model.bin and place it in /results/ (the final path is /results/model.bin)

  3. Download the BERT uncased model from here: https://www.kaggle.com/abhishek/bert-base-uncased

  4. Unzip the files into /model/

  5. Run python serving.py inside /src/

  6. Execute the command bentoml serve PyTorchModel:latest

The model will be served on http://127.0.0.1:5000/
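
Before running serving.py, it can help to confirm that the files from steps 1 to 4 are where the steps put them. The sketch below is a hypothetical helper, not part of this repo: it assumes /results/ and /model/ are relative to the repository root, and it only checks that model/ is non-empty because the exact file names inside the BERT archive are not listed here.

```python
from pathlib import Path
from typing import List


def missing_setup_items(repo_root: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means setup looks complete."""
    root = Path(repo_root)
    problems: List[str] = []

    # Step 2: the fine-tuned weights must be at results/model.bin.
    if not (root / "results" / "model.bin").is_file():
        problems.append("results/model.bin not found")

    # Step 4: the unzipped BERT files must be somewhere under model/.
    model_dir = root / "model"
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append("model/ is missing or empty")

    return problems


if __name__ == "__main__":
    for problem in missing_setup_items("."):
        print("Setup incomplete:", problem)
```

Run it from the repository root; no output means both checks passed.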


Calling the API

To get predictions, send a POST request with a JSON body to the /predict endpoint:

curl -i --header "Content-Type: application/json" \
        --request POST \
        --data '{"sentence": "John used to play for The Beatles"}' \
        http://127.0.0.1:5000/predict

Example:

#request
{ 
  "sentence": "Jack and James went to the university and they met Emily"
}

The response is a single string containing all detected names, separated by commas. For this example:

#response
"jack,james,emily"
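
The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names (build_request, parse_names, predict) are illustrative, and the parser assumes the body is either a JSON-encoded string, as in the example above, or the bare comma-separated text.

```python
import json
import urllib.request
from typing import List

API_URL = "http://127.0.0.1:5000/predict"  # endpoint from the curl example


def build_request(sentence: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request with the JSON body the API expects."""
    payload = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_names(body: str) -> List[str]:
    """Split the comma-separated response into a list of names."""
    text = body.strip()
    try:
        decoded = json.loads(text)
        if isinstance(decoded, str):
            text = decoded  # body was a JSON string such as "jack,james,emily"
    except json.JSONDecodeError:
        pass  # body was plain text; use it as-is
    return [name for name in text.split(",") if name]


def predict(sentence: str, url: str = API_URL) -> List[str]:
    """Send the sentence to a running server and return the detected names."""
    with urllib.request.urlopen(build_request(sentence, url)) as response:
        return parse_names(response.read().decode("utf-8"))
```

With the server running, predict("Jack and James went to the university and they met Emily") sends the request shown in the example.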

Using Flask

Follow these steps after step 5 in Setup (in /src/ directory):

export FLASK_APP=front.py
export FLASK_DEBUG=1 # For debugging
flask run

Note: Make sure the LOAD_PATH variable in front.py points to the location of your latest BentoML saved model.
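
One way to find a value for LOAD_PATH is to pick the most recently modified version directory of the saved service. This is a sketch under assumptions: it supposes BentoML stores each saved version in its own subdirectory of a per-service folder (for example ~/bentoml/repository/PyTorchModel/), which depends on your BentoML version and configuration. The helper latest_bundle is hypothetical and does not use any BentoML API.

```python
from pathlib import Path
from typing import Optional


def latest_bundle(service_dir: str) -> Optional[Path]:
    """Return the most recently modified subdirectory of service_dir, or None."""
    base = Path(service_dir).expanduser()
    if not base.is_dir():
        return None
    versions = [p for p in base.iterdir() if p.is_dir()]
    if not versions:
        return None
    # Newest by modification time; assumes one directory per saved version.
    return max(versions, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Assumed default location; adjust to wherever your bundles are saved.
    print(latest_bundle("~/bentoml/repository/PyTorchModel"))
```

The printed path, if any, is a candidate for LOAD_PATH; check that it matches the bundle you intend to serve.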


Creating and running a Docker image and deploying it on Heroku

This sub-section is thoroughly explained in the wiki page of this repository.


Creating and running the discord bot

Documentation is available at the wiki page of this repository.


Data

We used the Annotated Corpus for Named Entity Recognition dataset from Kaggle: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus

It is an extract from the GMB (Groningen Meaning Bank) corpus that is tagged, annotated, and built specifically to train a classifier to predict named entities such as names, locations, etc.

This dataset contains 47958 sentences with 948241 words.
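
To give a sense of how such a file can be turned into training sentences, here is a minimal sketch that groups word-level rows into sentences. It assumes the CSV has the columns "Sentence #", "Word", "POS" and "Tag", with "Sentence #" filled in only on the first word of each sentence; the rows below are made-up samples in that layout, not data from the corpus, and group_sentences is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (word, tag) pairs


def group_sentences(rows: Iterable[Dict[str, str]]) -> List[Sentence]:
    """Group word-level rows into sentences of (word, tag) pairs."""
    sentences: List[Sentence] = []
    for row in rows:
        # A non-empty "Sentence #" cell marks the first word of a new sentence.
        if row["Sentence #"].strip() or not sentences:
            sentences.append([])
        sentences[-1].append((row["Word"], row["Tag"]))
    return sentences


# Made-up sample rows in the assumed layout.
sample = io.StringIO(
    "Sentence #,Word,POS,Tag\n"
    "Sentence: 1,John,NNP,B-per\n"
    ",lives,VBZ,O\n"
    ",in,IN,O\n"
    ",London,NNP,B-geo\n"
    "Sentence: 2,Emily,NNP,B-per\n"
    ",left,VBD,O\n"
)
sentences = group_sentences(csv.DictReader(sample))
print(len(sentences))  # prints: 2
print(sentences[1])    # prints: [('Emily', 'B-per'), ('left', 'O')]
```

For the real file, pass an opened file object to csv.DictReader instead of the sample; you may need to specify an encoding such as latin1 when opening it.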


Training a new model

You can train your own model with the train.py script. Edit config.py to set the parameters you want, then run:

python train.py

This will generate your model file in config.MODEL_PATH as model.bin.


Contributing

To get started...

Step 1

  • Option 1

    • 🍴 Fork this repo!
  • Option 2

    • 👯 Clone this repo to your local machine using https://github.com/MLH-Fellowship/Auto-Tagger.git

Step 2

  • HACK AWAY! 🔨🔨🔨

Step 3


License

This project is licensed under the Apache License, Version 2.0.

Contributors

callmemehdi, pncnmnp
