ATPapers

Worth-reading papers and related resources on attention mechanism, Transformer and pretrained language model (PLM) such as BERT.

Suggestions about fixing errors or adding papers, repositories and other resources are welcomed!


Attention

Papers

  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML 2015) [paper] - Hard & Soft Attention
  • Effective Approaches to Attention-based Neural Machine Translation (EMNLP 2015) [paper] - Global & Local Attention
  • Neural Machine Translation by Jointly Learning to Align and Translate (ICLR 2015) [paper]
  • Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures (EMNLP 2018) [paper]
  • Phrase-level Self-Attention Networks for Universal Sentence Encoding (EMNLP 2018) [paper]
  • Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling (ICLR 2018) [paper][code] - Bi-BloSAN
  • Leveraging Local and Global Patterns for Self-Attention Networks (ACL 2019) [paper] [tf code][pt code]
  • Attention over Heads: A Multi-Hop Attention for Neural Machine Translation (ACL 2019) [paper]
  • Are Sixteen Heads Really Better than One? (NeurIPS 2019) [paper]
  • Synthesizer: Rethinking Self-Attention in Transformer Models (CoRR 2020) [paper] - Synthesizer
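
Most of the papers above share the same core pattern: score a query against keys, softmax the scores into weights, and take the weighted sum of the values. Below is a minimal NumPy sketch of the (multiplicative, scaled dot-product) variant of that pattern; the shapes and names are illustrative and not taken from any listed codebase.

```python
# Minimal sketch of (scaled dot-product) attention; illustrative only.
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V: (seq_len, d_k) arrays; returns attended values and weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len) similarity scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block masked positions before softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy usage: 4 positions, 8-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```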

Survey & Review

  • An Attentive Survey of Attention Models (IJCAI 2019) [paper]

English Blog

Chinese Blog

Repositories

Transformer

Papers

  • Attention is All you Need (NIPS 2017) [paper][code] - Transformer
  • Weighted Transformer Network for Machine Translation (CoRR 2017) [paper][code]
  • Accelerating Neural Transformer via an Average Attention Network (ACL 2018) [paper][code] - AAN
  • Self-Attention with Relative Position Representations (NAACL 2018) [paper] [unofficial code]
  • Universal Transformers (ICLR 2019) [paper][code] - Universal Transformer
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (ACL 2019) [paper] - Transformer-XL
  • Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (ACL 2019) [paper]
  • Memory Transformer Networks (CS224n Winter 2019 Reports) [paper]
  • Star-Transformer (NAACL 2019) [paper]
  • On Layer Normalization in the Transformer Architecture (ICLR 2020) [paper]
  • Transformers without Tears: Improving the Normalization of Self-Attention (IWSLT 2019) [paper][code]
  • Reformer: The Efficient Transformer (ICLR 2020) [paper] [code 1][code 2][code 3] - Reformer
  • TENER: Adapting Transformer Encoder for Named Entity Recognition (CoRR 2019) [paper]
  • ReZero is All You Need: Fast Convergence at Large Depth (CoRR 2020) [paper] [code] [related Chinese post] - ReZero
  • Lite Transformer with Long-Short Range Attention (ICLR 2020) [paper][code] - Lite Transformer
  • HAT: Hardware-Aware Transformers for Efficient Natural Language Processing (ACL 2020) [paper][code] - HAT
  • Longformer: The Long-Document Transformer (CoRR 2020) [paper][code] - Longformer
  • Improving Transformer Models by Reordering their Sublayers (ACL 2020) [paper]
  • Highway Transformer: Self-Gating Enhanced Self-Attentive Networks (ACL 2020) [paper][code] - Highway Transformer
  • Talking-Heads Attention (CoRR 2020) [paper]
  • Linformer: Self-Attention with Linear Complexity (CoRR 2020) [paper] - Linformer
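
Several of the entries above (e.g. "On Layer Normalization in the Transformer Architecture", "Transformers without Tears") study where LayerNorm sits relative to the residual connections. As a reference point, here is a minimal PyTorch sketch of a pre-LN encoder block; the hyperparameters and module names are illustrative assumptions, not values from any listed paper.

```python
# Minimal pre-LN Transformer encoder block sketch (PyTorch); illustrative only.
import torch
import torch.nn as nn

class PreLNEncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # LayerNorm is applied *before* each sublayer; residuals stay on the identity path.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.drop(attn_out)
        x = x + self.drop(self.ff(self.ln2(x)))
        return x

x = torch.randn(2, 10, 512)          # (batch, seq_len, d_model)
print(PreLNEncoderBlock()(x).shape)  # torch.Size([2, 10, 512])
```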

Chinese Blog

English Blog

Repositories

Pretrained Language Model

Models

  • Deep Contextualized Word Representations (NAACL 2018) [paper] - ELMo
  • Universal Language Model Fine-tuning for Text Classification (ACL 2018) [paper] - ULMFiT
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) [paper][code][official PyTorch code] - BERT
  • Improving Language Understanding by Generative Pre-Training (CoRR 2018) [paper] - GPT
  • Language Models are Unsupervised Multitask Learners (CoRR 2019) [paper][code] - GPT-2
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019) [paper][code] - MASS
  • Unified Language Model Pre-training for Natural Language Understanding and Generation (CoRR 2019) [paper][code] - UNILM
  • Multi-Task Deep Neural Networks for Natural Language Understanding (ACL 2019) [paper][code] - MT-DNN
  • 75 Languages, 1 Model: Parsing Universal Dependencies Universally (EMNLP 2019) [paper][code] - UDify
  • ERNIE: Enhanced Language Representation with Informative Entities (ACL 2019) [paper][code] - ERNIE (THU)
  • ERNIE: Enhanced Representation through Knowledge Integration (CoRR 2019) [paper] - ERNIE (Baidu)
  • Defending Against Neural Fake News (CoRR 2019) [paper][code] - Grover
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (CoRR 2019) [paper] - ERNIE 2.0 (Baidu)
  • Pre-Training with Whole Word Masking for Chinese BERT (CoRR 2019) [paper] - Chinese-BERT-wwm
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans (CoRR 2019) [paper] - SpanBERT
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding (CoRR 2019) [paper][code] - XLNet
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (CoRR 2019) [paper] - RoBERTa
  • NEZHA: Neural Contextualized Representation for Chinese Language Understanding (CoRR 2019) [paper][code] - NEZHA
  • K-BERT: Enabling Language Representation with Knowledge Graph (AAAI 2020) [paper][code] - K-BERT
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (CoRR 2019) [paper][code] - Megatron-LM
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (CoRR 2019) [paper][code] - T5
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (CoRR 2019) [paper] - BART
  • ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (CoRR 2019) [paper][code] - ZEN
  • The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service (CoRR 2019) [paper][code] - BAAI-JDAI-BERT
  • Knowledge Enhanced Contextual Word Representations (EMNLP 2019) [paper] - KnowBert
  • UER: An Open-Source Toolkit for Pre-training Models (EMNLP 2019) [paper][code] - UER
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR 2020) [paper] - ELECTRA
  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR 2020) [paper] - StructBERT
  • FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR 2020) [paper][code] - FreeLB
  • HUBERT Untangles BERT to Improve Transfer across NLP Tasks (CoRR 2019) [paper] - HUBERT
  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages (CoRR 2020) [paper] - CodeBERT
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (CoRR 2020) [paper] - ProphetNet
  • ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation (CoRR 2020) [paper][code] - ERNIE-GEN
  • Efficient Training of BERT by Progressively Stacking (ICML 2019) [paper][code] - StackingBERT
  • UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training (CoRR 2020) [paper][code] - UNILMv2
  • Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space (CoRR 2020) [paper][code] - Optimus
  • MPNet: Masked and Permuted Pre-training for Language Understanding (CoRR 2020) [paper][code] - MPNet
  • Language Models are Few-Shot Learners (CoRR 2020) [paper][code] - GPT-3
  • SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL 2020) [paper] - SPECTER
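
For orientation, the sketch below reproduces BERT's masked-LM corruption rule (15% of tokens selected; of those, 80% become [MASK], 10% a random token, 10% left unchanged), which several of the models above inherit or modify (whole-word masking, spans, permutations, pseudo-masks). The token IDs and vocabulary size are placeholders, not from any specific tokenizer.

```python
# Minimal sketch of BERT-style masked-LM corruption; IDs are illustrative.
import random

def mask_tokens(token_ids, mask_id, vocab_size, special_ids=frozenset(), p=0.15):
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = ignored by the loss
    for i, tok in enumerate(token_ids):
        if tok in special_ids or random.random() >= p:
            continue
        labels[i] = tok                                # predict the original token here
        r = random.random()
        if r < 0.8:
            inputs[i] = mask_id                        # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(vocab_size)   # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

print(mask_tokens([101, 2023, 2003, 1037, 3231, 102], mask_id=103,
                  vocab_size=30522, special_ids={101, 102}))
```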

Multi-Modal

  • VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV 2019) [paper]
  • Learning Video Representations using Contrastive Bidirectional Transformer (CoRR 2019) [paper] - CBT
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS 2019) [paper][code]
  • VisualBERT: A Simple and Performant Baseline for Vision and Language (CoRR 2019) [paper][code]
  • Fusion of Detected Objects in Text for Visual Question Answering (EMNLP 2019) [paper][code](https://github.com/google-research/language/tree/master/language/question_answering/b2t2) - B2T2
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training (AAAI 2020) [paper]
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP 2019) [paper][code]
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations (CoRR 2019) [paper][code]
  • UNITER: Learning UNiversal Image-TExt Representations (CoRR 2019) [paper]
  • FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR 2020) [paper] - FashionBERT
  • VD-BERT: A Unified Vision and Dialog Transformer with BERT (CoRR 2020) [paper] - VD-BERT

Multilingual

  • Cross-lingual Language Model Pretraining (CoRR 2019) [paper] - XLM
  • MultiFiT: Efficient Multi-lingual Language Model Fine-tuning (EMNLP 2019) [paper][code] - MultiFiT
  • XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (CoRR 2020) [paper][code] - XTREME
  • WikiBERT Models: Deep Transfer Learning for Many Languages (CoRR 2020) [paper][code] - WikiBERT

Compression

  • Distilling Task-Specific Knowledge from BERT into Simple Neural Networks (CoRR 2019) [paper]
  • Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System (CoRR 2019) [paper] - MKDM
  • Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding (CoRR 2019) [paper]
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models (CoRR 2019) [paper]
  • Small and Practical BERT Models for Sequence Labeling (EMNLP 2019) [paper]
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT (CoRR 2019) [paper] - Q-BERT
  • Patient Knowledge Distillation for BERT Model Compression (EMNLP 2019) [paper] - BERT-PKD
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections (ICLR 2019) [paper]
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS 2019 Workshop) [paper][code] - DistilBERT
  • TinyBERT: Distilling BERT for Natural Language Understanding (ICLR 2019) [paper][code] - TinyBERT
  • Q8BERT: Quantized 8Bit BERT (NeurIPS 2019 Workshop) [paper] - Q8BERT
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR 2020) [paper][code] - ALBERT
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning (ICLR 2020) [paper][PyTorch code]
  • Reducing Transformer Depth on Demand with Structured Dropout (ICLR 2020) [paper] - LayerDrop
  • Multilingual Alignment of Contextual Word Representations (ICLR 2020) [paper]
  • AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search (CoRR 2020) [paper] - AdaBERT
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (CoRR 2020) [paper][code] - MiniLM
  • FastBERT: a Self-distilling BERT with Adaptive Inference Time (ACL 2020) [paper][code] - FastBERT
  • MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (ACL 2020) [paper][code] - MobileBERT
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth (CoRR 2020) [paper] - DynaBERT
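
Many of the compression papers above rely on knowledge distillation. The sketch below shows the standard temperature-scaled soft-target loss (as used in DistilBERT-style training) combined with the usual cross-entropy; the temperature and mixing weight are illustrative defaults, not values from any specific paper.

```python
# Minimal soft-target distillation loss sketch (PyTorch); hyperparameters illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # scale by T^2 to keep gradients comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(4, 3), torch.randn(4, 3)      # student / teacher logits
y = torch.tensor([0, 2, 1, 0])                   # gold labels
print(distillation_loss(s, t, y).item())
```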

Application

  • BERT for Joint Intent Classification and Slot Filling (CoRR 2019) [paper]
  • GPT-based Generation for Classical Chinese Poetry (CoRR 2019) [paper]
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP 2019) [paper][code]
  • Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (ICLR 2020) [paper]
  • Pre-training Tasks for Embedding-based Large-scale Retrieval (ICLR 2020) [paper]
  • K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters (CoRR 2020) [paper] - K-Adapter
  • Keyword-Attentive Deep Semantic Matching (CoRR 2020) [paper & code] [post] - Keyword BERT
  • Unified Multi-Criteria Chinese Word Segmentation with BERT (CoRR 2020) [paper]
  • Spelling Error Correction with Soft-Masked BERT (ACL 2020) [paper] - Soft-Masked BERT
  • DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (ACL 2020) [paper][code] - DeFormer
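
As an example of the application pattern used by Sentence-BERT and the retrieval papers above, the sketch below performs mask-aware mean pooling over token embeddings and scores a sentence pair with cosine similarity; the random tensors stand in for a real encoder's output.

```python
# Minimal mean-pooling + cosine-similarity sketch; embeddings are placeholders.
import torch
import torch.nn.functional as F

def mean_pool(token_embeddings, attention_mask):
    # Average only over non-padding positions.
    mask = attention_mask.unsqueeze(-1).float()   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts                        # (batch, hidden)

emb = torch.randn(2, 6, 768)                      # stand-in for encoder outputs
mask = torch.tensor([[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]])
sent = mean_pool(emb, mask)
print(F.cosine_similarity(sent[0], sent[1], dim=0).item())  # sentence similarity score
```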

Analysis & Tools

  • Probing Neural Network Comprehension of Natural Language Arguments (ACL 2019) [paper][code]
  • Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference (ACL 2019) [paper] [code]
  • To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (RepL4NLP@ACL 2019) [paper]
  • Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection (CICLing 2019) [paper]
  • Understanding the Behaviors of BERT in Ranking (CoRR 2019) [paper]
  • How to Fine-Tune BERT for Text Classification? (CoRR 2019) [paper]
  • What Does BERT Look At? An Analysis of BERT's Attention (BlackBoxNLP 2019) [paper][code]
  • Visualizing and Understanding the Effectiveness of BERT (EMNLP 2019) [paper]
  • exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models (CoRR 2019) [paper] [code]
  • Transformers: State-of-the-art Natural Language Processing [paper][code][code]
  • Do Attention Heads in BERT Track Syntactic Dependencies? [paper]
  • Fine-tune BERT with Sparse Self-Attention Mechanism (EMNLP 2019) [paper]
  • How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings (EMNLP 2019) [paper]
  • oLMpics -- On what Language Model Pre-training Captures (CoRR 2019) [paper]
  • Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment (AAAI 2020) [paper][code] - TextFooler
  • A Mutual Information Maximization Perspective of Language Representation Learning (ICLR 2020) [paper]
  • Cross-Lingual Ability of Multilingual BERT: An Empirical Study (ICLR 2020) [paper]
  • Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping (CoRR 2020) [paper]
  • How Much Knowledge Can You Pack Into the Parameters of a Language Model? (CoRR 2020) [paper]
  • BERT Can See Out of the Box: On the Cross-modal Transferability of Text Representations (CoRR 2020) [paper]
  • Contextual Embeddings: When Are They Worth It? (ACL 2020) [paper]
  • Adversarial Training for Large Neural Language Models (CoRR 2020) [paper][code]
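
Several of the analysis papers above (e.g. "What Does BERT Look At?") inspect per-head attention maps. Below is a minimal sketch of extracting them with the Hugging Face transformers library; it assumes the bert-base-uncased checkpoint is available for download.

```python
# Minimal sketch of inspecting per-head attention maps; assumes transformers
# and the bert-base-uncased checkpoint are available.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
print(len(outputs.attentions), outputs.attentions[0].shape)
```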

Tutorial & Survey

  • Transfer Learning in Natural Language Processing (NAACL 2019) [paper]
  • Evolution of Transfer Learning in Natural Language Processing (CoRR 2019) [paper]
  • Transferring NLP Models Across Languages and Domains (DeepLo 2019) [paper]
  • Pre-trained Models for Natural Language Processing: A Survey (Invited Review of Science China Technological Sciences 2020) [paper]
  • Embeddings in Natural Language Processing (2020) [book]

Repository

Chinese Blog

English Blog
