supercoderhawk / deeplearning_nlp

A natural language processing library based on deep learning

License: MIT License

Python 100.00%
deep-learning natural-language-processing chinese-word-segmentation relation-extraction named-entity-recognition tensorflow chinese-tokenizer

deeplearning_nlp's Introduction

A natural language processing library based on deep learning

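This project is a rewrite of DeepNLP. It focuses on a sounder architecture, more readable code, and looser coupling between modules, and it adds some new features.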

Requirements

  • python >= 3.5
  • tensorflow >= 1.3.0
  • sklearn
  • scipy

Project structure

The core code of the project lives in the python/dnlp directory:

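python/dnlp
│  cws.py   # Chinese word segmentation
│  ner.py   # named entity recognition
│  rel_extract.py # relation extraction
│  __init__.py
│
├─config
│     config.py  # configuration options
│     __init__.py
│
├─core  # core functionality
│  │  dnn_crf.py    # DNN-CRF sequence labeling
│  │  dnn_crf_base.py # base class for DNN-CRF sequence labeling
│  │  mmtnn.py      # max-margin tensor neural network model
│  │  re_cnn.py     # CNN-based relation extraction
│  │  __init__.py
│
├─data_process  # preprocessing of training and test data
│     processor.py  # base class
│     process_cws.py  # preprocessing for word segmentation
│     process_emr.py
│     process_ner.py  # preprocessing for named entity recognition
│     process_pos.py  # preprocessing for part-of-speech tagging
│     __init__.py
│
├─models  # trained models are saved here
│
├─scripts # run scripts: dataset initialization, training, testing, etc.
│     init_datasets.py  # initialize the training data
│     cws_ner.py    # train and run word segmentation and named entity recognition
│     __init__.py
│
├─tests  # unit tests
└─utils  # shared helper functions
      constant.py  # constants
      __init__.py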
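Word segmentation is treated here as sequence labeling (dnn_crf.py): each character gets a tag, and words are read back from the tag sequence. The sketch below uses the common BMES scheme (Begin, Middle, End, Single). Whether process_cws.py uses exactly these tags is an assumption, and the helper names `words_to_bmes` and `bmes_to_words` are hypothetical, not part of this library.

```python
from typing import List

# Hypothetical helpers; the BMES tag scheme is assumed, not taken from dnlp.

def words_to_bmes(words: List[str]) -> List[str]:
    """Map every character of a pre-segmented sentence to a B/M/E/S tag."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append('S')                    # single-character word
        else:
            tags.append('B')                    # first character of the word
            tags.extend('M' * (len(word) - 2))  # middle characters, if any
            tags.append('E')                    # last character of the word
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (e.g. predicted) tags."""
    words, current = [], ''
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ('E', 'S'):  # a word ends at this character
            words.append(current)
            current = ''
    if current:                # keep an unfinished word at the end
        words.append(current)
    return words


words = ['我们', '喜欢', '自然语言处理']
tags = words_to_bmes(words)
print(tags)  # ['B', 'E', 'B', 'E', 'B', 'M', 'M', 'M', 'M', 'E']
print(bmes_to_words(''.join(words), tags))  # ['我们', '喜欢', '自然语言处理']
```

In such a setup the DNN-CRF model would predict the tag sequence for raw text, and a function like `bmes_to_words` turns that prediction into segmented words.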
  

Usage

  1. Initialize the data
python python\scripts\init_datasets.py
  2. Train
python python\scripts\cws_ner.py -t
  3. Use the trained model
python python\scripts\cws_ner.py -p

Reference papers

Chinese word segmentation && named entity recognition

Entity relation extraction

ToDo-List

  • Improve the documentation
  • Implement more algorithms
  • Support installation via pip
  • Add TensorBoard support
  • Support TensorFlow Estimator and SavedModel
  • Add support for Java and C++

deeplearning_nlp's People

Contributors

supercoderhawk

deeplearning_nlp's Issues

Question about part of the Processor code

```python
def read_dictionary(self, dict_path: str, reverse=False):
    # Each line of the file is expected to look like "<token> <id>"
    dictionary = {}
    with open(dict_path, encoding='utf8') as d:
        items = d.readlines()
        for item in items:
            pair = item.split(' ')
            dictionary[pair[0]] = int(pair[1])
    if reverse:
        # Also return the inverse mapping (id -> token)
        return dictionary, dict(zip(dictionary.values(), dictionary.keys()))
    else:
        return dictionary
```

I don't quite understand this code. dictionary is clearly empty, so how can dictionary[pair[0]] exist? And isn't pair[1] a string? How can you call int(pair[1]) on it? Thanks for explaining!
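Both points in the question come down to standard Python behavior: assigning to a key that is not yet in a dict inserts that key, and `int()` parses a numeric string, ignoring surrounding whitespace such as the trailing newline left by `readlines()`. The sketch below is a hypothetical standalone variant of `read_dictionary` that reads from any iterable of lines instead of a file path, so it runs without a dictionary file; the `<token> <id>` line format and the sample entries are assumptions.

```python
import io


def read_dictionary(lines, reverse=False):
    """Hypothetical standalone variant of Processor.read_dictionary."""
    dictionary = {}  # starts empty
    for item in lines:
        pair = item.split(' ')  # e.g. '我 1\n' -> ['我', '1\n']
        # Assigning to a missing key inserts it into the dict.
        # int() accepts a numeric string and strips whitespace such as '\n'.
        dictionary[pair[0]] = int(pair[1])
    if reverse:
        # Swap keys and values to get the id -> token mapping.
        return dictionary, dict(zip(dictionary.values(), dictionary.keys()))
    return dictionary


# Hypothetical dictionary file contents, one "<token> <id>" pair per line.
fake_file = io.StringIO('<PAD> 0\n我 1\n们 2\n')
word2id, id2word = read_dictionary(fake_file, reverse=True)
print(word2id)  # {'<PAD>': 0, '我': 1, '们': 2}
print(id2word)  # {0: '<PAD>', 1: '我', 2: '们'}
```

The reverse mapping could equally be written as `{v: k for k, v in dictionary.items()}`; `zip` over `values()` and `keys()` works because both views iterate in the same order.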
