nlp-roadmap's Introduction

Natural Language Processing Roadmap

🗺️ A learning roadmap for natural language processing

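⚠️ Note:

  1. This project includes a small experiment named PCB. Here PCB means neither Printed Circuit Board nor Process Control Block; it is an abbreviation of Paper Code Blog. I believe these three things (papers, code, and blogs) let us balance theory and practice while picking up knowledge quickly!

  2. The number of stars after each paper indicates its importance (a subjective opinion, for reference only).

    1. 🌟: average;
    2. 🌟🌟: important;
    3. 🌟🌟🌟: very important.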

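1 Word Segmentation

A word is the smallest linguistic unit that can be used independently, and in natural language processing the word is usually the basic unit of processing. English has a built-in advantage here, because spaces separate its words. Chinese has no explicit delimiter between words, so the first task in Chinese language processing is to split a continuous Chinese sentence into a "word sequence". This splitting process is called word segmentation. Learn more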
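To make the idea of splitting a continuous sentence into a word sequence concrete, here is a minimal sketch of forward maximum matching, a classic dictionary-based baseline. It is not taken from any of the resources listed in this roadmap. The function name `forward_max_match`, the `max_word_len` parameter, and the toy dictionary `TOY_VOCAB` are all assumptions made up for illustration.

```python
def forward_max_match(sentence, vocabulary, max_word_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in the dictionary;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted as the fallback.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_VOCAB = {"自然", "语言", "自然语言", "处理", "自然语言处理", "学习", "路线图"}

segmented = forward_max_match("自然语言处理学习路线图", TOY_VOCAB, max_word_len=6)
print(segmented)  # ['自然语言处理', '学习', '路线图']
```

Greedy matching like this is fast, but it cannot resolve ambiguous splits or handle out-of-vocabulary words, which is part of what the statistical and neural methods in the surveys below address.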

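Survey

  • A Survey of Chinese Word Segmentation Techniques (汉语分词技术综述) {Paper} 🌟
  • A Survey of Research on Automatic Chinese Word Segmentation in China (国内中文自动分词技术研究综述) {Paper} 🌟
  • Automatic Chinese Word Segmentation: Current Research and Difficulties (汉语自动分词的研究现状与困难) {Paper} 🌟🌟
  • A Review of Research on Automatic Chinese Word Segmentation (汉语自动分词研究评述) {Paper} 🌟🌟
  • Another Ten-Year Review of Chinese Word Segmentation: 2007-2017 (中文分词十年又回顾: 2007-2017) {Paper} 🌟🌟🌟
  • chinese-word-segmentation {Code}
  • A Survey of Deep Learning for Chinese Word Segmentation (深度学习中文分词调研) {Blog}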

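2 Word Embedding

Word embedding means finding a mapping or function that produces a representation of a word in a new space; this representation is called the "word representation". Learn more

Survey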
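As a rough illustration of an embedding as a mapping from words to points in a vector space, the sketch below uses a hand-written lookup table and compares words by cosine similarity. The vectors in `TOY_EMBEDDINGS` are invented for this example and are not learned by any model. The helper names `embed` and `cosine_similarity` are likewise assumptions, not part of any library listed here.

```python
import math

# Hypothetical 3-dimensional vectors chosen by hand for illustration;
# real embeddings are learned from data and have many more dimensions.
TOY_EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def embed(word):
    """The 'mapping': look up the vector that represents a word."""
    return TOY_EMBEDDINGS[word]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# With these toy vectors, "king" is closer to "queen" than to "apple".
print(cosine_similarity(embed("king"), embed("queen")))
print(cosine_similarity(embed("king"), embed("apple")))
```

The models under "Core" below differ mainly in how such vectors are learned, and in whether the vector for a word stays fixed or depends on its context.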

  • Word Embeddings: A Survey {Paper} 🌟🌟🌟
  • Visualizing Attention in Transformer-Based Language Representation Models {Paper} 🌟🌟
  • PTMs: Pre-trained Models for Natural Language Processing: A Survey {Paper} {Blog} 🌟🌟🌟
  • Efficient Transformers: A Survey {Paper} 🌟🌟
  • A Survey of Transformers {Paper} 🌟🌟
  • Pre-Trained Models: Past, Present and Future {Paper} 🌟🌟
  • Pretrained Language Models for Text Generation: A Survey {Paper} 🌟
  • A Practical Survey on Faster and Lighter Transformers {Paper} 🌟
  • The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures {Paper} 🌟🌟

Core

  • NNLM: A Neural Probabilistic Language Model {Paper} {Code} {Blog} 🌟
  • W2V: Efficient Estimation of Word Representations in Vector Space {Paper} 🌟🌟
  • Glove: Global Vectors for Word Representation {Paper} 🌟🌟
  • CharCNN: Character-level Convolutional Networks for Text Classification {Paper} {Blog} 🌟
  • ULMFiT: Universal Language Model Fine-tuning for Text Classification {Paper} 🌟
  • SiATL: An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models {Paper} 🌟
  • FastText: Bag of Tricks for Efficient Text Classification {Paper} 🌟🌟
  • CoVe: Learned in Translation: Contextualized Word Vectors {Paper} 🌟
  • ELMo: Deep contextualized word representations {Paper} 🌟🌟
  • Transformer: Attention is All you Need {Paper} {Code} {Blog} 🌟🌟🌟
  • GPT: Improving Language Understanding by Generative Pre-Training {Paper} 🌟
  • GPT2: Language Models are Unsupervised Multitask Learners {Paper} {Code} {Blog} 🌟🌟
  • GPT3: Language Models are Few-Shot Learners {Paper} {Code} 🌟🌟🌟
  • GPT4: GPT-4 Technical Report {Paper} 🌟🌟🌟
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding {Paper} {Code} {Blog} 🌟🌟🌟
  • UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation {Paper} {Code} {Blog} 🌟🌟
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer {Paper} {Code} {Blog} 🌟
  • ERNIE(Baidu): Enhanced Representation through Knowledge Integration {Paper} {Code} 🌟
  • ERNIE(Tsinghua): Enhanced Language Representation with Informative Entities {Paper} {Code} 🌟
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach {Paper} 🌟
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations {Paper} {Code} 🌟🌟
  • TinyBERT: Distilling BERT for Natural Language Understanding {Paper} 🌟🌟
  • FastFormers: Highly Efficient Transformer Models for Natural Language Understanding {Paper} {Code} 🌟🌟

Others

  • word2vec Parameter Learning Explained {Paper} 🌟🌟
  • Semi-supervised Sequence Learning {Paper} 🌟🌟
  • BERT Rediscovers the Classical NLP Pipeline {Paper} 🌟
  • Pre-trained Language Model Papers {Blog}
  • HuggingFace Transformers {Code}
  • Fudan FastNLP {Code}

3 Text Classification

Survey

  • A Survey on Text Classification: From Shallow to Deep Learning {Paper} 🌟🌟🌟
  • Deep Learning Based Text Classification: A Comprehensive Review {Paper} 🌟🌟

CNN

  • TextCNN: Convolutional Neural Networks for Sentence Classification {Paper} {Code} 🌟🌟🌟
  • Convolutional Neural Networks for Text Categorization: Shallow Word-level vs. Deep Character-level {Paper} 🌟
  • DPCNN: Deep Pyramid Convolutional Neural Networks for Text Categorization {Paper} {Code} 🌟🌟

4 Sequence Labeling

Survey

  • The History of Sequence Labeling (DNNs+CRF) {Blog}

Bi-LSTM + CRF

  • End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF {Paper} 🌟🌟
  • pytorch_NER_BiLSTM_CNN_CRF {Code}
  • NN_NER_tensorFlow {Code}
  • End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial {Code}
  • Bi-directional LSTM-CNNs-CRF {Code}

Others

  • Sequence to Sequence Learning with Neural Networks {Paper} 🌟
  • Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks {Paper} 🌟

5 Dialogue Systems

Survey

  • A Survey on Dialogue Systems: Recent Advances and New Frontiers {Paper} {Blog} 🌟🌟
  • Hey, Want to Learn About Retrieval-Based Chatbots? (小哥哥,检索式chatbot了解一下?) {Blog} 🌟🌟🌟
  • Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey {Paper} 🌟🌟

Open Domain Dialogue Systems

  • HERD: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models {Paper} {Code} 🌟🌟
  • Adversarial Learning for Neural Dialogue Generation {Paper} {Code} {Blog} 🌟🌟

Task Oriented Dialogue Systems

  • Joint NLU: Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling {Paper} {Code} 🌟🌟
  • BERT for Joint Intent Classification and Slot Filling {Paper} 🌟
  • Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures {Paper} {Code} 🌟🌟
  • Attention with Intention for a Neural Network Conversation Model {Paper} 🌟
  • REDP: Few-Shot Generalization Across Dialogue Tasks {Paper} {Blog} 🌟🌟
  • TEDP: Dialogue Transformers {Paper} {Code} {Blog} 🌟🌟🌟

Conversational Response Selection

  • Multi-view Response Selection for Human-Computer Conversation {Paper} 🌟🌟
  • SMN: Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟🌟
  • DUA: Modeling Multi-turn Conversation with Deep Utterance Aggregation {Paper} {Code} {Blog} 🌟🌟
  • DAM: Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network {Paper} {Code} {Blog} 🌟🌟🌟
  • IMN: Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟
  • Dialogue Transformers {Paper} 🌟🌟

6 Topic Model

LDA

7 Knowledge Graph

Survey

  • Towards a Definition of Knowledge Graphs {Paper} 🌟🌟🌟

8 Prompt Learning

Survey

  • PPP: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing {Paper} {Blog} 🌟🌟🌟

9 Graph Neural Network

Survey

  • Graph Neural Networks for Natural Language Processing: A Survey {Paper} 🌟🌟

10 Sentence Embedding

Core

  • InferSent: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data {Paper} {Code} 🌟🌟
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks {Paper} {Code} 🌟🌟🌟
  • BERT-flow: On the Sentence Embeddings from Pre-trained Language Models {Paper} {Code} {Blog} 🌟🌟
  • SimCSE: Simple Contrastive Learning of Sentence Embeddings {Paper} {Code} 🌟🌟🌟
