lusonpan62678 / chinese_bilstm_cnn_crf

This project forked from shen1994/chinese_bilstm_cnn_crf

Chinese word segmentation with Keras + TensorFlow + Python 3. It can be trained on large datasets, addressing the problem of insufficient memory.

chinese_bilstm_cnn_crf

0. Results

  • 0.1 Training output (image)
  • 0.2 Test output, from a model trained on only 1,000 records (image)
  • 0.3 Model architecture (image)

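1. Install the required files

2. Reference links for traditional methods

3. Commands to run

  • 3.0 Download the corpus files, extract them, and place them in the corpus folder
  • 3.1 Generate the word-vector model model_vector_people.m
     python embedding_model.py
  • 3.2 Run train.py to train the model
    python train.py
  • 3.3 Notes on specific functions (train.py)

     3.3.1 create_label_data(word_dict, raw_train_file) ---> creates the train.data file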
          人 B
          民 M
          网 E
          一 B
          月 M
          一 M
          日 E
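
The labels above follow the B/M/E/S scheme: B marks the first character of a multi-character word, M a middle character, E the last character, and S a single-character word. The sketch below shows that tagging step on an already segmented word list. The function name `label_words` is hypothetical and is not taken from train.py, whose implementation is not shown here.

```python
def label_words(words):
    """Tag every character of each word with B, M, E, or S.

    Hypothetical helper for illustration; not the project's create_label_data.
    """
    pairs = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            pairs.append((word, "S"))  # single-character word
            continue
        pairs.append((word[0], "B"))  # first character
        for ch in word[1:-1]:
            pairs.append((ch, "M"))  # middle characters
        pairs.append((word[-1], "E"))  # last character
    return pairs


# Print one "character label" pair per line, in the style of train.data.
for ch, tag in label_words(["人民网", "一月一日"]):
    print(ch, tag)
```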

     3.3.2 documents_length = create_documents() ---> creates the data.data and label.data files
          data.data
          人 民 网 一 月 一 日 讯 据 纽 约 时 报 报 道 ,
          美 国 华 尔 街 股 市 在 二 零 一 三 年 的 最 后 一 天 继 续 上 涨 ,
          和 全 球 股 市 一 样 ,
          label.data
          B M E B M M E S S B E B E B E S
          B E B M E B E S B M M M E S B E B E B E B E S
          S B E B E B E S
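
In the sample above, each line of data.data ends at a comma, and label.data holds the matching labels. A minimal sketch of that grouping step follows, assuming sentences are cut after a delimiter character. The name `split_documents` and the delimiter set are assumptions; the README does not say which characters create_documents() splits on.

```python
def split_documents(pairs, delimiters=(",", ",", "。")):
    """Group (character, label) pairs into space-joined sentence lines.

    Hypothetical helper; the delimiter set is an assumption.
    """
    data_lines, label_lines = [], []
    chars, tags = [], []
    for ch, tag in pairs:
        chars.append(ch)
        tags.append(tag)
        if ch in delimiters:  # close the sentence after a delimiter
            data_lines.append(" ".join(chars))
            label_lines.append(" ".join(tags))
            chars, tags = [], []
    if chars:  # keep a trailing sentence that has no closing delimiter
        data_lines.append(" ".join(chars))
        label_lines.append(" ".join(tags))
    return data_lines, label_lines


pairs = [("和", "S"), ("全", "B"), ("球", "E"), (",", "S"), ("一", "B"), ("样", "E")]
data_lines, label_lines = split_documents(pairs)
print(data_lines)
print(label_lines)
```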

     3.3.3 lexicon, lexicon_reverse = create_lexicon(word_dict) ---> creates the lexicon.pkl file
          {'这': 75, '云': 307, '伏': 92, '共': 139, '问': 140, '跑': 308...}
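
The lexicon maps each character to an integer index, and the reverse lexicon maps indices back to characters. One way to build such a pair is sketched below, assuming indices are assigned by descending frequency starting at 1. That ordering, the name `build_lexicon`, and its `path` parameter are assumptions for illustration; the actual create_lexicon(word_dict) may differ.

```python
import pickle
from collections import Counter


def build_lexicon(lines, path=None):
    """Build character->index and index->character maps from spaced lines.

    Hypothetical helper; frequency-ranked indices starting at 1 are an assumption.
    """
    counts = Counter(ch for line in lines for ch in line.split())
    # most_common() keeps first-seen order for characters with equal counts.
    lexicon = {ch: i for i, (ch, _) in enumerate(counts.most_common(), start=1)}
    lexicon_reverse = {i: ch for ch, i in lexicon.items()}
    if path is not None:
        with open(path, "wb") as f:  # optionally persist, e.g. as lexicon.pkl
            pickle.dump(lexicon, f)
    return lexicon, lexicon_reverse


lexicon, lexicon_reverse = build_lexicon(["一 月 一 日 ,", "一 天 ,"])
print(lexicon)
```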

     3.3.4 label_2_index = create_label_index() ---> creates label_2_index
          {'P':0, 'B':1, 'M':2, 'E':3, 'S':4, 'U':5}

     3.3.5 create_matrix(lexicon, label_2_index) ---> creates the data_index.data and label_index.data files
          data_index.data
          11 14 118 2 39 2 8 172 102 295 293 131 30 30 29 1
          117 12 284 47 212 76 56 7 13 19 2 16 5 3 61 75 2 459 127 79 46 93 1
          6 111 336 76 56 2 208 1
          label_index.data
          1 2 3 1 2 2 3 4 4 1 3 1 3 1 3 4
          1 3 1 2 3 1 3 4 1 2 2 2 3 4 1 3 1 3 1 3 1 3 4
          4 1 3 1 3 1 3 4
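
The index files are the text files from 3.3.2 with every character and label replaced by its integer index. A minimal sketch of that lookup is shown below, using the label table from 3.3.4. The helper names `encode_line` and `pad` are hypothetical, and reading 'P' as a padding label is an assumption that the README does not confirm.

```python
LABEL_2_INDEX = {"P": 0, "B": 1, "M": 2, "E": 3, "S": 4, "U": 5}


def encode_line(tokens, table, unknown):
    """Map each token to its index, using `unknown` for missing tokens."""
    return [table.get(tok, unknown) for tok in tokens]


def pad(seq, max_len, pad_value=LABEL_2_INDEX["P"]):
    """Truncate or right-pad a sequence to max_len (assumes 'P' means padding)."""
    return seq[:max_len] + [pad_value] * max(0, max_len - len(seq))


labels = "B M E B M M E S S B E B E B E S".split()
indices = encode_line(labels, LABEL_2_INDEX, unknown=LABEL_2_INDEX["U"])
print(" ".join(str(i) for i in indices))
```

Fixed-length padding is included only because batched sequence models usually need it; whether create_matrix pads at this stage is not stated.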

  • 3.4 Run a test
    from word_cut import WordCut
    text_cut_object = WordCut()
    text_list_cut = text_cut_object.cut([u"我是**人", u"希望你喜欢我"])
    print(text_list_cut)
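
To turn predicted B/M/E/S labels back into words, a segmenter has to merge characters along label boundaries. The sketch below shows that generic decoding step. The name `labels_to_words` is hypothetical; the internals and return format of WordCut.cut are not documented here, so this is not necessarily how the project does it.

```python
def labels_to_words(chars, tags):
    """Merge characters into words according to their B/M/E/S labels.

    Hypothetical helper; tolerates malformed label sequences by starting
    a new word whenever a B or S label appears.
    """
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # flush an unfinished word
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)  # the word is complete
            current = ""
    if current:
        words.append(current)  # flush whatever is left at the end
    return words


print(labels_to_words("人民网一月一日讯", list("BMEBMMES")))
```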
     

4. Reference links

Contributors

shen1994
