snipsco / snips-nlu

Snips Python library to extract meaning from text

Home Page: https://snips-nlu.readthedocs.io

License: Apache License 2.0

Python 100.00%
nlp nlu python machine-learning text-classification intent-classification ner named-entity-recognition slot-filling intent-parser

snips-nlu's Introduction

Snips NLU

Snips NLU (Natural Language Understanding) is a Python library that extracts structured information from sentences written in natural language.

What is Snips NLU about?

Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Anytime a user interacts with an AI using natural language, their words need to be translated into a machine-readable description of what they meant.

The NLU engine first detects what the intention of the user is (a.k.a. intent), then extracts the parameters (called slots) of the query. The developer can then use this to determine the appropriate action or response.

Let’s take an example to illustrate this, and consider the following sentence:

"What will be the weather in paris at 9pm?"

Properly trained, the Snips NLU engine extracts structured data from this sentence: the identified intent is searchWeatherForecast and two slots are extracted, a locality and a datetime. Snips NLU also does an extra step on top of extracting entities: it resolves them. Here, the extracted datetime value is converted into a handy ISO format.
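
As a rough illustration, a parsing result for this sentence might look like the Python dictionary below. The key names, the probability, the slot names and the resolved timestamp are assumptions chosen for the example rather than verbatim library output, and summarize_parsing is a hypothetical helper showing how a developer could read the intent and the slots.

```python
# Illustrative parsing result; keys and values are assumptions, not verbatim output.
parsing = {
    "input": "What will be the weather in paris at 9pm?",
    "intent": {"intentName": "searchWeatherForecast", "probability": 0.95},
    "slots": [
        {
            "slotName": "forecast_locality",
            "entity": "locality",
            "value": {"kind": "Custom", "value": "paris"},
        },
        {
            "slotName": "forecast_start_datetime",
            "entity": "snips/datetime",
            "value": {"kind": "InstantTime", "value": "2018-02-08 21:00:00 +00:00"},
        },
    ],
}


def summarize_parsing(result):
    """Hypothetical helper: return (intent name, {slot name: resolved value})."""
    intent_name = result["intent"]["intentName"]
    slots = {slot["slotName"]: slot["value"]["value"] for slot in result["slots"]}
    return intent_name, slots


intent_name, slots = summarize_parsing(parsing)
print(intent_name)
print(slots)
```

The application can then choose an action from the intent name and pass the slot values to it as parameters.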

Check out our blog post to get more details about why we built Snips NLU and how it works under the hood. We also published a paper on arXiv presenting the machine learning architecture of the Snips Voice Platform.

Getting Started

System requirements

  • Python 2.7 or Python >= 3.5
  • RAM: Snips NLU will typically use between 100MB and 200MB of RAM, depending on the language and the size of the dataset.

Installation

We currently have pre-built binaries (wheels) for snips-nlu and its dependencies for macOS (10.11 and later), Linux x86_64 and Windows.

For any other architecture or OS, snips-nlu can be installed from the source distribution. To do so, Rust and setuptools_rust must be installed before running the pip install snips-nlu command.

Language resources

Snips NLU relies on external language resources that must be downloaded before the library can be used. You can fetch resources for a specific language, for example English, by running the python -m snips_nlu download en command, or simply snips-nlu download en.

The list of supported languages is available at this address.

API Usage

Command Line Interface

The easiest way to test the abilities of this library is through the command line interface.

First, train the NLU engine on one of the sample datasets by running the snips-nlu train path/to/dataset.json path/to/output_trained_engine command, where path/to/dataset.json is the path to the dataset used during training, and path/to/output_trained_engine is the location where the trained engine is persisted once training is done.

After that, you can start parsing sentences interactively by running the snips-nlu parse path/to/trained_engine command, where path/to/trained_engine is the location where you stored the trained engine during the previous step.

Sample code

After installing snips-nlu, fetching the English resources and downloading one of the sample datasets, you can run sample code on your machine that trains an NLU engine on a sample weather dataset and parses a weather query.
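
The train-then-parse workflow described here could be sketched as follows. This is a minimal sketch rather than the project's official sample: train_and_parse is a hypothetical helper, and it only assumes that the engine object exposes fit(dataset) and parse(text) methods, as SnipsNLUEngine does.

```python
import io
import json


def train_and_parse(engine, dataset_path, text):
    """Hypothetical helper: fit `engine` on a JSON dataset, then parse `text`.

    `engine` is assumed to expose fit(dataset) and parse(text).
    """
    # Load the training dataset from its JSON file.
    with io.open(dataset_path, encoding="utf8") as f:
        dataset = json.load(f)
    # Train the engine, then run it on the query.
    engine.fit(dataset)
    return engine.parse(text)
```

With snips-nlu installed and the English resources fetched, the engine could be created with SnipsNLUEngine(config=CONFIG_EN), where SnipsNLUEngine is imported from snips_nlu and CONFIG_EN from snips_nlu.default_configs. The dataset path is whichever sample dataset you downloaded.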

Sample datasets

Here is a list of some datasets that can be used to train a Snips NLU engine:

  • Lights dataset: "Turn on the lights in the kitchen", "Set the light to red in the bedroom"
  • Beverage dataset: "Prepare two cups of cappucino", "Make me a cup of tea"
  • Flights dataset: "Book me a flight to go to boston this weekend", "book me some tickets from istanbul to moscow in three days"

Benchmarks

In January 2018, we reproduced an academic benchmark published in summer 2017. In that article, the authors assessed the performance of API.ai (now Dialogflow, Google), Luis.ai (Microsoft), IBM Watson, and Rasa NLU. For fairness, we used an updated version of Rasa NLU and compared it to the latest version of Snips NLU (both in dark blue).

[Figure: average F1 scores for intent classification and slot filling, per NLU provider]

In the figure above, F1 scores of both intent classification and slot filling were computed for several NLU providers, and averaged across the three datasets used in the academic benchmark mentioned before. All the underlying results can be found here.
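
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below shows that arithmetic and a plain unweighted average over datasets; the unweighted averaging is an assumption about how the figure was produced, and the counts are made-up placeholders, not benchmark results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * precision * recall / (precision + recall)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = float(true_positives) / predicted
    recall = float(true_positives) / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f1(per_dataset_counts):
    """Unweighted mean of the per-dataset F1 scores (an assumption)."""
    scores = [f1_score(*counts) for counts in per_dataset_counts]
    return sum(scores) / len(scores)


# Placeholder (tp, fp, fn) counts for three hypothetical datasets.
counts = [(8, 2, 2), (9, 1, 3), (5, 5, 0)]
print(round(average_f1(counts), 3))
```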

Documentation

To find out how to use Snips NLU, please refer to the package documentation, which provides a step-by-step guide on how to set up and use this library.

Citing Snips NLU

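Please cite the following paper when using Snips NLU: Coucke et al., "Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces", arXiv preprint arXiv:1805.10190, 2018.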

FAQ & Community

Please join the forum to ask your questions and get feedback from the community.

How do I contribute?

Please see the Contribution Guidelines.

Licence

This library is provided by Snips as Open Source software. See LICENSE for more information.

Geonames Licence

The snips/city, snips/country and snips/region builtin entities rely on software from Geonames, which is made available under a Creative Commons Attribution 4.0 International license. For the license and warranties for Geonames please refer to: https://creativecommons.org/licenses/by/4.0/legalcode.

snips-nlu's People

Contributors

adrienball, cclauss, chayanbansal, choufractal, clemdoum, ddorian, fredszaq, jdureau, mattgathu, mcfoggy, rodrigopivi, rosasternsonos, timgates42, tristandeleu

snips-nlu's Issues

Feature signature shouldn't contain big objects

The get_ngram_fn function has a common_words argument, which is a set that can be huge in practice.
This impacts the size of our serialization and slows down loading. We should load these big collections based on collection names.
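
One way to picture this suggestion, as a sketch only: the serialized feature signature stores the name of the collection, and the actual set is looked up in a registry at load time. COLLECTIONS, make_signature, resolve_signature and the n argument are hypothetical names introduced for illustration, not part of the library.

```python
import json

# Hypothetical registry mapping collection names to large in-memory sets.
COLLECTIONS = {
    "top_words_en": {"the", "a", "of", "to", "in"},
}


def make_signature(n, common_words_name):
    """Build a small, serializable signature that only names the collection."""
    return {
        "factory": "get_ngram_fn",
        "args": {"n": n, "common_words": common_words_name},
    }


def resolve_signature(signature):
    """Replace the collection name with the actual set at load time."""
    args = dict(signature["args"])
    args["common_words"] = COLLECTIONS[args["common_words"]]
    return args


signature = make_signature(2, "top_words_en")
serialized = json.dumps(signature)  # size does not depend on the set size
restored = resolve_signature(json.loads(serialized))
```

The serialized form then contains only the short collection name, so its size no longer grows with the set.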

Fix synonyms handling when enriching automatically the dataset

Description
When an entity is set with useSynonyms=False and we enrich the dataset with missing entity values that appear in the intent utterances, we should only check whether the values appear in the reference values of the entity, not in the synonyms.
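
The intended behaviour could be sketched as below. The entity layout (a list of entries, each with a reference value and its synonyms) and the function names are assumptions made for illustration, not the library's actual code.

```python
def known_values(entity):
    """Values to compare against when enriching: reference values only when
    synonyms are disabled, reference values plus synonyms otherwise."""
    values = set()
    for entry in entity["data"]:
        values.add(entry["value"])
        if entity["use_synonyms"]:
            values.update(entry["synonyms"])
    return values


def enrich_entity(entity, utterance_values):
    """Add utterance values that are not already known to the entity."""
    known = known_values(entity)
    for value in utterance_values:
        if value not in known:
            entity["data"].append({"value": value, "synonyms": []})
            known.add(value)
    return entity


# Hypothetical entity with synonyms disabled: "nyc" is only a synonym,
# so it is treated as missing and added as a new reference value.
city = {
    "use_synonyms": False,
    "data": [{"value": "new york", "synonyms": ["nyc"]}],
}
enrich_entity(city, ["nyc", "paris", "new york"])
print([entry["value"] for entry in city["data"]])
```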

Improve CI performance

Is there a reason we call deleteDir() at the beginning of the CI? I have the feeling it slows it down.

[INSTALL] Setup.py does not install enum34 module

For some reason enum34 is not installed when I run python setup.py install even though it seems to be specified correctly in the setup.py:

setup(name="snips_nlu",
      version="0.0.1",
      description="",
      author="Clement Doumouro",
      author_email="[email protected]",
      url="",
      download_url="",
      license="MIT",
      install_requires=["enum34"],
      packages=["snips_nlu",
                "snips_nlu.entity_extractor",
                "snips_nlu.nlu_engine"],
      cmdclass={"install": SnipsNLUInstall},
      entry_points={},
      include_package_data=False,
      zip_safe=False)