rasa_nlu_zh

This repository is based on rasa_nlu 0.14.6. It adds pkuseg (0.0.12) as a replacement for the jieba tokenizer in order to improve word segmentation accuracy.

Example pipeline:

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor.dat"
- name: "tokenizer_pkuseg"
  dictionary_path: "pkuseg_userdict/ids_userdict.txt"
  model_path: "pkuseg_pretrained_model/"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"

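Note: pkuseg can only load a user dictionary from a single file, not from a directory, so `dictionary_path` in the configuration must point to a file:

- name: "tokenizer_pkuseg"
  dictionary_path: "pkuseg_userdict/ids_userdict.txt"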
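If the user dictionary is maintained as several files in one folder, the files can be combined into a single file before `dictionary_path` is set. The following is a minimal sketch using only the standard library. It assumes each dictionary file is UTF-8 text with one entry per line. The function name `merge_user_dicts` and the file names in the usage note are hypothetical and not part of this repository.

```python
from pathlib import Path


def merge_user_dicts(dict_dir, output_file):
    """Merge every *.txt file in dict_dir into one dictionary file.

    Assumes one entry per line, UTF-8 encoded. Blank lines are skipped and
    duplicate entries are written only once, in first-seen order.
    """
    out_path = Path(output_file)
    seen = set()
    words = []
    for path in sorted(Path(dict_dir).glob("*.txt")):
        # Skip the merged file itself if it lives in the same folder.
        if path.resolve() == out_path.resolve():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            word = line.strip()
            if word and word not in seen:
                seen.add(word)
                words.append(word)
    out_path.write_text("".join(w + "\n" for w in words), encoding="utf-8")
    return words
```

For example, `merge_user_dicts("pkuseg_userdict", "pkuseg_userdict/ids_userdict.txt")` would write one combined file that the `tokenizer_pkuseg` entry can reference.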

Training a pkuseg model

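import pkuseg

# train.txt: training data; UTF-8 only, with words separated by one or more spaces
# test.txt: test data; UTF-8 only, with words separated by one or more spaces
pkuseg.train('train.txt', 'test.txt', './models')
# train_iter: number of training iterations
# init_model: directory containing the pretrained model; pretrained models can be
# downloaded from https://github.com/lancopku/pkuseg-python/releases
# The mixed-domain segmentation model is generally used as the base model for retraining
pkuseg.train('train.txt', 'test.txt', './models', train_iter=10, init_model='./pretrained')
# Segment msr_test.raw and write the result to output.txt, without a user dictionary
pkuseg.test('msr_test.raw', 'output.txt', user_dict=None)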
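The training and test files described above are plain UTF-8 text with words separated by spaces. The sketch below shows one way to produce and read back such a file from sentences that are already segmented. It assumes one sentence per line, which the text above does not state explicitly. The helper names `write_segmented_corpus` and `read_segmented_corpus` are hypothetical and not part of pkuseg.

```python
def write_segmented_corpus(sentences, path):
    """Write pre-segmented sentences to a UTF-8 file, one sentence per line.

    sentences is an iterable of token lists; tokens are joined by one space.
    """
    with open(path, "w", encoding="utf-8") as f:
        for tokens in sentences:
            for tok in tokens:
                # A token containing whitespace would be read back as several words.
                if not tok or any(ch.isspace() for ch in tok):
                    raise ValueError("invalid token: %r" % (tok,))
            f.write(" ".join(tokens) + "\n")


def read_segmented_corpus(path):
    """Read a segmented corpus back; runs of whitespace count as one separator."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]
```

A file written this way, for example with `write_segmented_corpus([["我", "爱", "北京"]], "train.txt")`, could then be passed as the first argument of `pkuseg.train`.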

Contributors

user-zj, trellixvulnteam