dashayushman / chatbot_ner

This project forked from hellohaptik/chatbot_ner

chatbot_ner: Named Entity Recognition for chatbots.

Home Page: https://haptik.ai/

License: GNU General Public License v3.0

Languages: Python 98.88%, Shell 0.27%, HTML 0.41%, CSS 0.06%, JavaScript 0.31%, Dockerfile 0.08%

chatbot_ner's Introduction

Named Entity Recognition for chatbots

Chatbot NER is an open source framework custom built to support entity recognition in text messages. After thorough research on existing NER systems, the team at Haptik felt a strong need to build a framework that is tailored for Conversational AI and also supports Indian languages. Chatbot NER currently supports English, Hindi, Gujarati, Marathi, Bengali and Tamil, as well as their code-mixed forms. The framework currently uses common patterns along with a few NLP techniques to extract the necessary entities from languages with sparse data. The API structure of Chatbot NER is designed with usability for Conversational AI applications in mind. The team at Haptik is continuously working towards porting this framework to all Indian languages and their respective local dialects.

Installation

Detailed documentation on how to set up Chatbot NER on your system using Docker is available here.

Supported Entities

Entity type | Code reference | Description | Example | Supported languages (ISO 639-1 code)
Time | TimeDetector | Detect time from given text | tomorrow morning at 5, कल सुबह ५ बजे, kal subah 5 baje | 'en', 'hi', 'gu', 'bn', 'mr', 'ta'
Date | DateAdvancedDetector | Detect date from given text | next monday, agle somvar, अगले सोमवार | 'en', 'hi', 'gu', 'bn', 'mr', 'ta'
Number | NumberDetector | Detect number and respective units in given text | 50 rs per person, ५ किलो चावल, मुझे एक लीटर ऑइल चाहिए | 'en', 'hi', 'gu', 'bn', 'mr', 'ta'
Phone number | PhoneDetector | Detect phone number in given text | 9833530536, +91 9833530536, ९८३३४३०५३५ | 'en', 'hi', 'gu', 'bn', 'mr', 'ta'
Email | EmailDetector | Detect email in text | [email protected] | 'en'
Text | TextDetector | Detect custom entities in a text string using full text search in the Datastore or a contextual model | Order me a pizza, मुंबई में मौसम कैसा है | Search supported for 'en', 'hi', 'gu', 'bn', 'mr', 'ta'; contextual model supported for 'en' only
PNR | PNRDetector | Detect PNR (serial) codes in given text | My flight PNR is 4SGX3E | 'en'
Regex | RegexDetector | Detect entities using custom regex patterns | My flight PNR is 4SGX3E | NA

There are other custom detectors, such as city, budget and shopping size, which are derived from the primary detectors listed above. They are currently supported in English only and are limited to Indian users. We are in the process of restructuring them to scale across languages and geographies, and their current versions might be deprecated in the future. For applications already in production, we therefore recommend using only the primary detectors listed in the table above.

API structure

Detailed documentation of the APIs for all entity types is available here. The current API structure is built for ease of access from conversational AI applications, but it can be used by other applications as well.
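As a rough illustration of how a conversational AI application might call such an API over HTTP, the sketch below only builds a request URL. The host, port, route and query parameter names (`message`, `entity_name`, `source_language`) are assumptions made for this example and are not confirmed by this page; the API documentation is the authority on the real ones.

```python
from urllib.parse import urlencode

# Hypothetical values: host, port, route and parameter names are assumptions
# for illustration only; check the API documentation for the real ones.
BASE_URL = "http://localhost:8081"


def build_detection_url(entity_route, message, entity_name, source_language="en"):
    """Return a GET URL asking a detector to extract entities from `message`."""
    query = urlencode({
        "message": message,
        "entity_name": entity_name,
        "source_language": source_language,
    })
    return f"{BASE_URL}/{entity_route}/?{query}"


url = build_detection_url("v2/time", "tomorrow morning at 5", "time")
print(url)
```

Keeping every input in the query string means a bot backend can call a detector with a plain GET request and no client library, which is one way to read "ease of access" here.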

Framework Overview

In any conversational AI application, there are several entities to be identified, and the detection logic for one entity might differ from that of another. We have organised this repository as shown below.

[Figure: entity hierarchy]

We have classified entities into four main types: numeral, pattern, temporal and textual.

  • numeral: This type contains all the entities that deal with numerals or numbers, for example number detection, budget detection and size detection.

  • pattern: This type contains all the detection logic where identification can be done using patterns or regular expressions, for example email, phone_number and pnr.

  • temporal: This type contains the detection logic for time and date.

  • textual: This type identifies entities by looking them up in a dictionary. It mainly covers detection of text (such as cuisine, dish or restaurant), names of cities, the location of a user, etc.
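To make the pattern type more concrete, here is a minimal, standalone sketch of regex-based detection. It does not use Chatbot NER's own detectors; the two patterns and the `detect_pattern` helper are hypothetical and deliberately simple, and the rule that a PNR-like code is a 6-character token mixing uppercase letters and digits is an assumption made only for this example.

```python
import re

# Illustrative patterns only; these are not the patterns used by Chatbot NER.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: a PNR-like code is a standalone 6-character token that contains
# at least one uppercase letter and at least one digit.
PNR_PATTERN = re.compile(r"\b(?=[A-Z0-9]*[A-Z])(?=[A-Z0-9]*\d)[A-Z0-9]{6}\b")


def detect_pattern(text, pattern):
    """Return (value, start, end) for every non-overlapping match in text."""
    return [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]


print(detect_pattern("My flight PNR is 4SGX3E", PNR_PATTERN))
print(detect_pattern("mail me at user@example.com please", EMAIL_PATTERN))
```

Returning the character span along with the matched value lets a caller highlight or strip the entity from the original message.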

Numeral, temporal and pattern have been moved to ner_v2 for language portability with more flexible detection logic. In ner_v1, only the text entity currently has language support. We will be moving it to ner_v2 without any major API changes.

Contribution Guidelines

Currently, you can contribute to ner_v2 in Chatbot NER either by adding Training Data or by contributing Detection Patterns in the form of regex. We will work on removing a few architectural limitations, which will ease the process of adding ML models and New Entities in the future.

  • Adding Training Data: You can significantly improve the detection capabilities of Chatbot NER simply by adding data to csv files. For example, date detection in Hindi and Hinglish can be improved by adding data to the csv files mentioned in the image below. You can refer to the documentation for date, time and numbers respectively if you wish to contribute. [Image: Date Contribution]
  • Adding Detection Pattern: You can add custom patterns for different languages by writing simple functions. An example of adding a custom pattern for detecting the number of people can be found here.
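The idea behind data-driven contributions can be sketched as follows: detection improves when new rows are added, with no change to the code. The CSV layout, the column names (`weekday`, `variants`) and both helper functions are hypothetical and do not reflect the repository's actual schema; see the linked documentation for the real file formats.

```python
import csv
import io

# Hypothetical CSV layout (not the repository's actual schema): each row maps
# a canonical weekday to its pipe-separated spelling variants.
DAY_DATA = """weekday,variants
monday,somvar|somwar|सोमवार
tuesday,mangalvar|mangalwar|मंगलवार
"""


def load_variants(csv_text):
    """Build a lookup from every variant spelling to its canonical weekday."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for variant in row["variants"].split("|"):
            lookup[variant.strip().lower()] = row["weekday"]
    return lookup


def detect_weekdays(message, lookup):
    """Return the canonical weekday for every known variant in the message."""
    return [lookup[token] for token in message.lower().split() if token in lookup]


lookup = load_variants(DAY_DATA)
print(detect_weekdays("agle somvar", lookup))

# Adding one more row of data extends detection without changing any code.
extended = load_variants(DAY_DATA + "wednesday,budhvar|budhwar|बुधवार\n")
print(detect_weekdays("budhvar ko milte hain", extended))
```

The whitespace tokenisation here is a simplification; it is enough to show why extra Hindi or Hinglish spellings in a data file translate directly into better recall.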

Please refer to the general steps for contribution, approval and coding guidelines mentioned here.
