Coder Social home page Coder Social logo

nlp-beginner's Introduction

NLP-Beginner:自然语言处理入门练习

新加入本实验室的同学,请按要求完成下面练习,并提交报告。

请完成每次练习后把report上传到QQ群中的共享文件夹中的“Reports of nlp-beginner”目录,文件命名格式为“task 1+姓名”。

参考:

  1. 深度学习上手指南
  2. 神经网络与深度学习
  3. 不懂问google

任务一:基于机器学习的文本分类

实现基于logistic/softmax regression的文本分类

  1. 参考

    1. 文本分类
    2. 神经网络与深度学习》 第2/3章
  2. 数据集:Classify the sentiment of sentences from the Rotten Tomatoes dataset

  3. 实现要求:NumPy

  4. 需要了解的知识点:

    1. 文本特征表示:Bag-of-Word,N-gram
    2. 分类器:logistic/softmax regression,损失函数、(随机)梯度下降、特征选择
    3. 数据集:训练集/验证集/测试集的划分
  5. 实验:

    1. 分析不同的特征、损失函数、学习率对最终分类性能的影响
    2. shuffle 、batch、mini-batch
  6. 时间:两周

任务二:基于深度学习的文本分类

熟悉Pytorch,用Pytorch重写《任务一》,实现CNN、RNN的文本分类;

  1. 参考

    1. https://pytorch.org/
    2. Convolutional Neural Networks for Sentence Classification https://arxiv.org/abs/1408.5882
    3. https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
  2. word embedding 的方式初始化

  3. 随机embedding的初始化方式

  4. 用glove 预训练的embedding进行初始化 https://nlp.stanford.edu/projects/glove/

  5. 知识点:

    1. CNN/RNN的特征抽取
    2. 词嵌入
    3. Dropout
  6. 时间:两周

任务三:基于注意力机制的文本匹配

输入两个句子判断,判断它们之间的关系。参考ESIM(可以只用LSTM,忽略Tree-LSTM),用双向的注意力机制实现。

  1. 参考
    1. 神经网络与深度学习》 第7章
    2. Reasoning about Entailment with Neural Attention https://arxiv.org/pdf/1509.06664v1.pdf
    3. Enhanced LSTM for Natural Language Inference https://arxiv.org/pdf/1609.06038v3.pdf
  2. 数据集:https://nlp.stanford.edu/projects/snli/
  3. 实现要求:Pytorch
  4. 知识点:
    1. 注意力机制
    2. token2token attetnion
  5. 时间:两周

任务四:基于LSTM+CRF的序列标注

用LSTM+CRF来训练序列标注模型:以Named Entity Recognition为例。

  1. 参考
    1. 神经网络与深度学习》 第6、11章
    2. https://arxiv.org/pdf/1603.01354.pdf
    3. https://arxiv.org/pdf/1603.01360.pdf
  2. 数据集:CONLL 2003,https://www.clips.uantwerpen.be/conll2003/ner/
  3. 实现要求:Pytorch
  4. 知识点:
    1. 评价指标:precision、recall、F1
    2. 无向图模型、CRF
  5. 时间:两周

任务五:基于神经网络的语言模型

用LSTM、GRU来训练字符级的语言模型,计算困惑度

  1. 参考
    1. 神经网络与深度学习》 第6、15章
  2. 数据集:poetryFromTang.txt
  3. 实现要求:Pytorch
  4. 知识点:
    1. 语言模型:困惑度等
    2. 文本生成
  5. 时间:两周

nlp-beginner's People

Contributors

kunyaa avatar xpqiu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

nlp-beginner's Issues

task3资料过时了

task3里建议参考nndl的第七章,但是现在的第七章是网络优化与正则化。
task3中主要使用到的是attention,应该参考的是第八章注意力机制与外部记忆?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.