emotional_classification_with_rnn's Introduction

Review Sentiment Classification with a Recurrent Neural Network (RNN)

This project uses a recurrent neural network to classify movie reviews by sentiment (positive or negative).

The training data is the sentence polarity dataset v1.0 from https://www.cs.cornell.edu/people/pabo/movie-review-data/, which contains 5,331 positive and 5,331 negative reviews.

Because the dataset is small, the model does not generalize very well.

With a training/development/test split of [0.8, 0.1, 0.1] and 2,000 training mini-batches of batch_size=64, the model's accuracy on each set is roughly:

  • Training set: 0.95

  • Development set: 0.79

  • Test set: 0.80

For details, see my blog post "Implementing Movie Review Sentiment Classification with a Recurrent Neural Network (RNN)".


Instructions

1. Data preprocessing

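After downloading the data, unpack it to get the rt-polarity.neg and rt-polarity.pos files. Both files are encoded in Windows-1252; converting them to Unicode first makes them easier to work with.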
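The transcoding step can be sketched as follows. This is a minimal illustration, not the code in process_data.py: the function name `transcode_to_utf8` and the choice of UTF-8 as the target encoding are assumptions.

```python
from pathlib import Path


def transcode_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file from Windows-1252 (cp1252) to UTF-8.

    Hypothetical helper: the name and the UTF-8 target are assumptions.
    """
    # Decode the raw bytes as Windows-1252 into a Python str (Unicode).
    text = Path(src_path).read_text(encoding=src_encoding)
    # Write the same text back out as UTF-8.
    Path(dst_path).write_text(text, encoding="utf-8")
    return text
```

It could be called once per file, for example `transcode_to_utf8("rt-polarity.neg", "rt-polarity.neg.utf8")`, where the output file name is again only a placeholder.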

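Preprocessing consists of:

  • Transcoding

  • Building the vocabulary

  • Using the vocabulary to convert each review into a word vector

  • Padding the word vectors and converting them to NumPy arrays

  • Splitting the dataset by ratio (training, development, test)

  • Shuffling the dataset and writing it to files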

python process_data.py 
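The vocabulary, padding, and splitting steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the contents of process_data.py: it assumes a "word vector" here means a list of integer word IDs, that IDs 0 and 1 are reserved for padding and unknown words, and that tokens are separated by whitespace. All function names are hypothetical. The sketch shuffles before splitting so that each split mixes both classes, whereas the list above names the split first.

```python
import random
from collections import Counter

PAD_ID = 0  # assumed ID reserved for padding
UNK_ID = 1  # assumed ID for words missing from the vocabulary


def build_vocab(reviews, max_size=None):
    """Map each word to an integer ID, most frequent words first."""
    counts = Counter(word for review in reviews for word in review.split())
    words = [word for word, _ in counts.most_common(max_size)]
    # Real words start at 2 because 0 and 1 are reserved above.
    return {word: idx + 2 for idx, word in enumerate(words)}


def encode_and_pad(review, vocab, max_len):
    """Turn a review into a list of exactly max_len word IDs."""
    ids = [vocab.get(word, UNK_ID) for word in review.split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def shuffle_and_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the samples, then cut them into train/dev/test parts."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(len(samples) * ratios[0])
    n_dev = int(len(samples) * ratios[1])
    # The test part takes whatever remains after train and dev.
    return (samples[:n_train],
            samples[n_train:n_train + n_dev],
            samples[n_train + n_dev:])
```

The padded lists all have the same length, so they can be passed to `numpy.array(...)` to obtain the array form mentioned above.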

2. Model construction

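The classifier is an RNN. The model is built roughly as follows:

  • Build the word embedding matrix with an embedding layer

  • Use LSTM as the basic unit of the recurrent network

  • Apply dropout to the embedding and the LSTM

  • Stack the network into a deep RNN of depth 2

  • Apply logistic regression to the final output of the deep RNN and use a sigmoid to decide the class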
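Two of the pieces in the list above, dropout and the final logistic-regression step, are small enough to sketch in plain Python. This is not the project's model code: it leaves out the embedding and the LSTM layers entirely, all function names are hypothetical, and treating label 1 as "positive" is an assumption.

```python
import math
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: zero each value with probability 1 - keep_prob
    and scale the kept values by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_positive(last_output, weights, bias):
    """Logistic regression on the final RNN output.

    Returns the estimated probability of the positive class
    (assuming label 1 means "positive").
    """
    logit = sum(w * h for w, h in zip(weights, last_output)) + bias
    return sigmoid(logit)
```

A review would then be labelled positive when `predict_positive(...)` is at least 0.5. Dropout is applied only during training; at evaluation time `keep_prob` is effectively 1.0 and the values pass through unchanged.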

3. Model training

Training:

  • Uses a moving average

  • Uses exponential learning-rate decay

python train.py
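Both training techniques reduce to short formulas, sketched below in plain Python. This shows the common formulation of each idea, not the code in train.py: the names are hypothetical, and the actual decay rates and step counts used by the project are not stated here.

```python
def exponential_decay(base_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` updates:
    base_lr * decay_rate ** (step / decay_steps)."""
    return base_lr * decay_rate ** (step / decay_steps)


class MovingAverage:
    """Keeps an exponential moving average ("shadow" value) per parameter."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def update(self, name, value):
        if name not in self.shadow:
            # The first observation initialises the shadow value.
            self.shadow[name] = value
        else:
            # shadow = decay * shadow + (1 - decay) * value
            self.shadow[name] = (self.decay * self.shadow[name]
                                 + (1 - self.decay) * value)
        return self.shadow[name]
```

With this formulation the learning rate is multiplied by `decay_rate` once every `decay_steps` updates, and the averaged parameter values, which change more smoothly than the raw ones, are typically the ones used at evaluation time.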

4. Model validation

eval.py contains the following code:

data = dataset.Dataset(0)

The argument to Dataset selects which data to validate on: 0 for the training set, 1 for the development set, and 2 for the test set.

python eval.py
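The accuracy that validation reports can be computed as in the sketch below. It is illustrative only and does not reproduce eval.py: the `SPLITS` mapping simply restates the 0/1/2 convention, and the `accuracy` helper with its 0.5 threshold is an assumption.

```python
# Restates the Dataset argument convention; the names are hypothetical.
SPLITS = {0: "train", 1: "dev", 2: "test"}


def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For example, `accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])` counts three correct predictions out of four.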

5. Model configuration

The configurable parameters are collected in settings.

emotional_classification_with_rnn's People

Contributors

aaronjny
