super-lcx / multiclassify_lstm_forchinese

This project forked from dllxw/multiclassify_lstm_forchinese

This project uses an LSTM to classify the sentiment of Chinese text into four categories: anger, anxiety, depression, and sadness.

Python 100.00%

multiclassify_lstm_forchinese's Introduction

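Multi-class sentiment classification of Chinese text with an LSTM

This project uses an LSTM to classify the sentiment of Chinese text into four categories (anger, anxiety, depression, sadness). For a more detailed walkthrough, see lim0 on Zhihu.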

Environment:

  • Python 3
  • keras
  • gensim
  • jieba

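1 Preparing the training data

See the data/ folder above.

Raw data: I collected the training data from the web myself and organized it into four .txt files. Each line is one sentence; besides Chinese, the lines contain all sorts of stray characters.

Cleaning: strip special symbols (punctuation, digits, whitespace, and so on) and keep only Chinese characters. There are many ways to do this; one is to filter by the code-point range of Chinese characters. See the source for details: code/dataset.py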
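The code-point filter described above can be sketched as follows. This is a minimal illustration, not the contents of code/dataset.py: the function name `keep_hanzi` is hypothetical, and the range U+4E00 to U+9FA5 (the commonly used block of CJK ideographs) is an assumption about which characters to keep.

```python
import re

# Assumed range: the commonly used CJK ideographs, U+4E00..U+9FA5.
# Anything outside it (punctuation, digits, Latin letters, spaces) is dropped.
_NON_HANZI = re.compile(r"[^\u4e00-\u9fa5]")


def keep_hanzi(line: str) -> str:
    """Return `line` with every non-Chinese character removed."""
    return _NON_HANZI.sub("", line)


print(keep_hanzi("今天,真的好累!!123 abc"))  # prints 今天真的好累
```

A line that contains no Chinese characters at all comes back as an empty string, so empty results may need to be skipped before segmentation.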

2 Word segmentation

This step calls the Python jieba segmentation API directly; see code/dataset.py
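As a rough sketch of this step, the helper below turns cleaned lines into lists of tokens. `jieba.lcut` is jieba's list-returning segmentation call; the helper name `segment_lines` and its `cut` parameter are hypothetical and are not taken from code/dataset.py.

```python
from typing import Callable, Iterable, List, Optional


def segment_lines(
    lines: Iterable[str],
    cut: Optional[Callable[[str], List[str]]] = None,
) -> List[List[str]]:
    """Segment each non-empty line into a list of words."""
    if cut is None:
        # Imported lazily so that a different tokenizer can be passed in.
        import jieba
        cut = jieba.lcut
    # Empty lines (for example, lines emptied by cleaning) are skipped.
    return [cut(line) for line in lines if line]
```

Calling `segment_lines(cleaned_lines)` uses jieba; passing `cut=list` falls back to character-level tokens, which can be handy for a quick comparison.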

3 Word2Vec

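Computers can only process numbers, so to feed words to a computer we naturally need some way to encode them. (For an image, the encoding is simply a pixel matrix with one or three channels.) Encoding natural language is harder: mapping words with similar meanings to similar vectors is no easy task. This is where word2vec comes in. It essentially does one thing: it embeds (encodes) each word as a vector in a high-dimensional space, and the vectors implicitly capture the relationships between words. Where do the vectors come from? They are learned by training.

Here we use the Python gensim library; see code/word2vec.py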
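To make the idea concrete, here is a minimal sketch of training word vectors with gensim and comparing two vectors. It assumes gensim 4.x, where the dimension parameter is named `vector_size` (gensim 3.x called it `size`). The function names and hyperparameter values are illustrative and are not taken from code/word2vec.py.

```python
import math
from typing import List, Sequence


def train_word_vectors(sentences: List[List[str]], dim: int = 100):
    """Train word2vec on segmented sentences and return the word vectors."""
    # Imported lazily; requires gensim 4.x for the `vector_size` keyword.
    from gensim.models import Word2Vec

    model = Word2Vec(
        sentences=sentences,  # one list of tokens per sentence
        vector_size=dim,      # embedding dimension (assumed value)
        window=5,
        min_count=1,          # keep rare words, since the corpus is small
        workers=1,
    )
    return model.wv  # KeyedVectors: wv["word"] gives the vector for a word


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; values near 1.0 mean the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

After training, `cosine(wv[a], wv[b])` for two in-vocabulary words gives a rough measure of how related the model considers them; gensim's own `wv.similarity(a, b)` computes the same quantity.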

4 Building the LSTM

code/lstm.py
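The README does not describe the network itself, so the following is only one plausible shape for a four-class LSTM classifier in Keras, not the architecture in code/lstm.py. It assumes the input is a sequence of integer word indices fed through an `Embedding` layer (the repository may instead feed precomputed word2vec vectors). The label order and all layer sizes are hypothetical.

```python
# Assumed label order; the repository's actual class indices may differ.
LABELS = ["anger", "anxiety", "depression", "sadness"]


def one_hot(index: int, num_classes: int = len(LABELS)) -> list:
    """Encode a class index as a one-hot list, e.g. 1 -> [0, 1, 0, 0]."""
    return [1 if i == index else 0 for i in range(num_classes)]


def build_model(vocab_size: int, embed_dim: int = 100, lstm_units: int = 128):
    """Embedding -> LSTM -> softmax over the four classes (sizes are guesses)."""
    # Imported lazily so the helper above works without keras installed.
    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    model = Sequential([
        Embedding(input_dim=vocab_size, output_dim=embed_dim),
        LSTM(lstm_units),                          # final hidden state only
        Dense(len(LABELS), activation="softmax"),  # one probability per class
    ])
    # categorical_crossentropy expects one-hot targets such as one_hot(1).
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model
```

The softmax output layer yields one probability per category, which is what makes this a multi-class rather than a binary classifier.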

5 Training

code/train.py

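6 Inference

code/infer.py

The model/ folder contains a partially trained model. Because training data is scarce, the results are mediocre, but you can run infer.py to see the forward-pass output.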
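For reading the forward-pass output, a classifier of this kind typically emits one probability per category, and the prediction is the category with the highest probability. The sketch below shows that last step only. The label order and the function name `predict_label` are assumptions, not taken from code/infer.py.

```python
from typing import Sequence, Tuple

# Assumed label order; it must match the order used during training.
LABELS = ["anger", "anxiety", "depression", "sadness"]


def predict_label(probs: Sequence[float]) -> Tuple[str, float]:
    """Map a vector of four class probabilities to (label, probability)."""
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


print(predict_label([0.1, 0.7, 0.1, 0.1]))  # prints ('anxiety', 0.7)
```

With a model that is only partially trained, the top probability is often low, so it can be more informative to look at the whole vector than at the label alone.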

Contributors

  • dllxw
