This project forked from gaoq1/rasa_nlu_gq

Turn natural language into structured data (supports Chinese; includes a number of custom models for different scenarios and tasks)

License: Apache License 2.0

Makefile 0.13% Shell 0.27% Python 99.60%

Rasa NLU GQ

Rasa NLU (Natural Language Understanding) is a tool for understanding natural language. To borrow an example from the official website, it takes a sentence like:

"I'm looking for a Mexican restaurant in the center of town"

and returns structured data like:

  intent: search_restaurant
  entities: 
    - cuisine : Mexican
    - location : center
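
As a rough illustration of the output above, a parse result can be handled as a plain dictionary. The field names (`intent.name`, `intent.confidence`, `entities[].entity`, `entities[].value`) follow the JSON shape commonly returned by Rasa NLU; the confidence value and the helper `summarize` are made up for this sketch.

```python
# Hypothetical parse result for the sentence above; the confidence is illustrative.
result = {
    "text": "I'm looking for a Mexican restaurant in the center of town",
    "intent": {"name": "search_restaurant", "confidence": 0.9},
    "entities": [
        {"entity": "cuisine", "value": "Mexican"},
        {"entity": "location", "value": "center"},
    ],
}


def summarize(parsed):
    """Flatten a parse result into (intent_name, {entity_type: value})."""
    intent = (parsed.get("intent") or {}).get("name")
    entities = {e["entity"]: e["value"] for e in parsed.get("entities", [])}
    return intent, entities


print(summarize(result))
```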

Intent of this project

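This project exists because the components and models shipped with the official Rasa NLU did not meet my practical needs. I therefore wrote a number of custom components and published them on PyPI; they can be installed with pip install rasa-nlu-gao. More components will be added and tuned over time, and contributions are welcome.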

New features

The features added so far are listed below (please install the latest version of rasa-nlu-gao):

  • Added two entity recognition models: bilstm+crf, and idcnn+crf (a dilated convolution model). The corresponding yml configuration is:
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "intent_featurizer_count_vectors"
    token_pattern: '(?u)\b\w+\b'
  - name: "intent_classifier_tensorflow_embedding"
  - name: "ner_bilstm_crf"
    lr: 0.001
    char_dim: 100
    lstm_dim: 100
    batches_per_epoch: 10
    seg_dim: 20
    num_segs: 4
    batch_size: 200
    tag_schema: "iobes"
    model_type: "bilstm" # two model types are supported: idcnn (dilated convolution) or bilstm (bidirectional LSTM)
    clip: 5
    optimizer: "adam"
    dropout_keep: 0.5
    steps_check: 100
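
The `tag_schema: "iobes"` setting above names the IOBES tagging scheme, which extends B-/I-/O tags with S- for single-token entities and E- for the last token of a span. How ner_bilstm_crf applies it internally is not shown in this README; the function below is a standalone sketch of the scheme itself, and the name `iob_to_iobes` is chosen only for this example.

```python
def iob_to_iobes(tags):
    """Convert IOB2 tags (B-/I-/O) to IOBES (adds S- singletons and E- span ends)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label  # does the same entity go on?
        if prefix == "B":
            out.append(("B-" if continues else "S-") + label)
        elif prefix == "I":
            out.append(("I-" if continues else "E-") + label)
        else:
            raise ValueError("not an IOB2 tag: %r" % tag)
    return out


print(iob_to_iobes(["B-LOC", "I-LOC", "O", "B-PER"]))
# ['B-LOC', 'E-LOC', 'O', 'S-PER']
```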
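  • Added a jieba part-of-speech tagging module, which makes it easy to recognize person names, place names, organization names, and any other part-of-speech tag jieba supports. The corresponding yml configuration is: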
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "ner_crf"
  - name: "jieba_pseg_extractor"
    part_of_speech: ["nr", "ns", "nt"]
  - name: "intent_featurizer_count_vectors"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "intent_classifier_tensorflow_embedding"
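
The `part_of_speech` list above selects which jieba tags count as entities (`nr` for person names, `ns` for place names, `nt` for organization names). The sketch below shows one way such filtering could work on (word, flag) pairs like those yielded by `jieba.posseg.cut(text)`. It is not the component's actual code: the helper `filter_by_pos`, the output keys, and the hand-written tagging are assumptions for illustration.

```python
def filter_by_pos(tagged_words, part_of_speech):
    """Keep words whose POS flag is in `part_of_speech`, with character offsets.

    `tagged_words` is an iterable of (word, flag) pairs in sentence order.
    """
    wanted = set(part_of_speech)
    entities, offset = [], 0
    for word, flag in tagged_words:
        if flag in wanted:
            entities.append({
                "entity": flag,
                "value": word,
                "start": offset,
                "end": offset + len(word),
            })
        offset += len(word)  # advance past this word whether kept or not
    return entities


# Hand-written tagging for illustration; real tags would come from jieba.
tagged = [("我", "r"), ("在", "p"), ("北京", "ns"), ("见", "v"), ("小明", "nr")]
print(filter_by_pos(tagged, ["nr", "ns", "nt"]))
```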
  • Added the ability to revise the predicted intent based on the extracted entities. The corresponding configuration is:
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "ner_crf"
  - name: "jieba_pseg_extractor"
  - name: "intent_featurizer_count_vectors"
    OOV_token: oov
    token_pattern: '(?u)\b\w+\b'
  - name: "intent_classifier_tensorflow_embedding"
  - name: "entity_edit_intent"
    entity: ["nr"]
    intent: ["enter_data"]
    min_confidence: 0
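
The README does not spell out the exact rules entity_edit_intent follows, so the following is only a guess at the general idea. It assumes that `entity[i]` is paired with `intent[i]`, that the first matching entity wins, and that the overriding intent gets confidence 1.0. `min_confidence` is deliberately left out because its semantics are not described here. The function name `edit_intent` is hypothetical.

```python
def edit_intent(parsed, entity, intent):
    """Return `parsed` with its intent overridden by the first matching entity.

    ASSUMPTION: entity[i] is paired with intent[i]; confidence 1.0 is illustrative.
    """
    overrides = dict(zip(entity, intent))
    for found in parsed.get("entities", []):
        target = overrides.get(found["entity"])
        if target is not None:
            updated = dict(parsed)  # shallow copy, input stays untouched
            updated["intent"] = {"name": target, "confidence": 1.0}
            return updated
    return parsed


parsed = {
    "intent": {"name": "greet", "confidence": 0.4},
    "entities": [{"entity": "nr", "value": "小明"}],
}
print(edit_intent(parsed, entity=["nr"], intent=["enter_data"])["intent"])
```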
  • Added word-vector feature extraction with word2vec. The corresponding configuration is:
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "intent_featurizer_wordvector"
    vector: "data/vectors.txt"
  - name: "intent_classifier_tensorflow_embedding"
  - name: "ner_crf"
  - name: "jieba_pseg_extractor"
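
To make the word-vector idea concrete, here is a minimal sketch that reads vectors in the word2vec text format (one `word v1 v2 ...` entry per line, optionally preceded by a `vocab_size dim` header) and averages the vectors of a sentence's tokens. Both the file format of `data/vectors.txt` and the averaging step are assumptions; the README does not say how intent_featurizer_wordvector combines vectors.

```python
def load_vectors(lines):
    """Parse word2vec text format: 'word v1 v2 ...' per line.

    An optional first line of two integers ('vocab_size dim') is skipped.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def sentence_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; all zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


sample = ["2 3", "北京 1.0 0.0 2.0", "天气 3.0 2.0 0.0"]
vecs = load_vectors(sample)
print(sentence_vector(["北京", "天气", "好"], vecs, dim=3))
# [2.0, 1.0, 1.0]
```

With a real file, `load_vectors` can be given the open file object directly, for example `with open("data/vectors.txt", encoding="utf-8") as f: vecs = load_vectors(f)`.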
  • Added word-vector feature extraction with a BERT model. The corresponding configuration is:
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "bert_vectors_featurizer"
    ip: '172.16.10.46'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "intent_classifier_tensorflow_embedding"
  - name: "ner_crf"
  - name: "jieba_pseg_extractor"
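  • Added configuration for CPU and GPU utilization. It applies mainly to intent_classifier_tensorflow_embedding and ner_bilstm_crf, the two components that use TensorFlow. The configuration is shown below (config_proto is optional; by default all available resources are used):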
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "intent_featurizer_count_vectors"
    token_pattern: '(?u)\b\w+\b'
  - name: "intent_classifier_tensorflow_embedding"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
  - name: "ner_bilstm_crf"
    config_proto: {
      "device_count": 4,
      "inter_op_parallelism_threads": 0,
      "intra_op_parallelism_threads": 0,
      "allow_growth": True
    }
  • Added the embedding_bert_intent_classifier classifier. The corresponding configuration is:
  language: "zh"

  pipeline:
  - name: "tokenizer_jieba"
  - name: "bert_vectors_featurizer"
    ip: '172.16.10.46'
    port: 5555
    port_out: 5556
    show_server_config: True
    timeout: 10000
  - name: "intent_classifier_tensorflow_embedding_bert"
  - name: "ner_crf"
  - name: "jieba_pseg_extractor"

Quick Install

pip install rasa-nlu-gao

🤖 Running the bot

To train the NLU model:

python -m rasa_nlu_gao.train -c sample_configs/config_embedding_bilstm.yml --data data/examples/rasa/rasa_dataset_training.json --path models

To run the NLU model:

python -m rasa_nlu_gao.server -c sample_configs/config_embedding_bilstm.yml --path models
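
Once the server is running, it can be queried over HTTP. The sketch below assumes the server keeps upstream Rasa NLU's conventions, namely a `/parse` endpoint on port 5000 that accepts a JSON body with the text under `q`. These are assumptions about this fork, not something stated in this README, and the helper names are hypothetical.

```python
import json
from urllib import request


def build_parse_request(text, host="localhost", port=5000):
    """Build a POST request for the assumed /parse endpoint (not sent here)."""
    body = json.dumps({"q": text}).encode("utf-8")
    return request.Request(
        "http://%s:%d/parse" % (host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse(text, **kwargs):
    """Send the request and decode the JSON response; needs a running server."""
    with request.urlopen(build_parse_request(text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_parse_request("我想订一张去北京的机票")
print(req.get_method(), req.full_url)
```

Calling `parse("...")` performs the actual round trip and returns the decoded result as a dictionary.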

Some Examples

For concrete examples, see rasa_chatbot_cn.
