This repository is an up-to-date list of significant AI papers organized by publication date. It covers five fields: computer vision, natural language processing, audio processing, multimodal learning, and reinforcement learning. Feel free to give this repository a star if you enjoy the work.
Maintainer: Aimerou Ndiaye
Papers are ranked primarily by number of citations and by their degree of innovation in the field. To select the most relevant papers, we set subjective citation thresholds. Each icon below designates a paper type that meets one of these criteria.
🏆 Historical Paper: more than 10k citations and a decisive impact on the evolution of AI.
⭐ Important Paper: more than 50 citations and state-of-the-art results.
⏫ Trend: 1 to 50 citations, a recent and innovative paper with growing adoption.
📰 Important Article: decisive work that was not accompanied by a research paper.
- 🏆 1958: The Perceptron: A probabilistic model for information storage and organization in the brain (Perceptron)
- 🏆 1986: Learning representations by back-propagating errors (Backpropagation)
- 🏆 1986: Induction of decision trees (ID3)
- 🏆 1989: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (HMM)
- 🏆 1989: Multilayer feedforward networks are universal approximators
- 🏆 1992: A training algorithm for optimal margin classifiers (SVM)
- 🏆 1996: Bagging predictors
- 🏆 1998: Gradient-based learning applied to document recognition (CNN/GTN)
- 🏆 2001: Random Forests
- 🏆 2001: A fast and elitist multiobjective genetic algorithm (NSGA-II)
- 🏆 2003: Latent Dirichlet Allocation (LDA)
- 🏆 2006: Reducing the Dimensionality of Data with Neural Networks (Autoencoder)
- 🏆 2008: Visualizing Data using t-SNE (t-SNE)
- 🏆 2009: ImageNet: A large-scale hierarchical image database (ImageNet)
- 🏆 2012: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
- 🏆 2013: Efficient Estimation of Word Representations in Vector Space (Word2vec)
- 🏆 2013: Auto-Encoding Variational Bayes (VAE)
- 🏆 2014: Generative Adversarial Networks (GAN)
- 🏆 2014: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Dropout)
- 🏆 2014: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
- 🏆 2014: Adam: A Method for Stochastic Optimization (Adam)
- 🏆 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Cov... (BatchNorm)
- 🏆 2015: Going Deeper With Convolutions (Inception)
- 🏆 2015: Human-level control through deep reinforcement learning (Deep Q Network)
- 🏆 2015: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Faster R-CNN)
- 🏆 2015: U-Net: Convolutional Networks for Biomedical Image Segmentation (U-Net)
- 🏆 2015: Deep Residual Learning for Image Recognition (ResNet)
- 🏆 2016: You Only Look Once: Unified, Real-Time Object Detection (YOLO)
- 🏆 2017: Attention is All you Need (Transformer)
- 🏆 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
- 🏆 2020: Language Models are Few-Shot Learners (GPT-3)
- 🏆 2021: Highly accurate protein structure prediction with AlphaFold (AlphaFold)
- 📰 2022: ChatGPT: Optimizing Language Models for Dialogue (ChatGPT)
- ⭐ 01/2022: A ConvNet for the 2020s (ConvNeXt)
- ⭐ 01/2022: Patches Are All You Need (ConvMixer)
- ⭐ 02/2022: Block-NeRF: Scalable Large Scene Neural View Synthesis (Block-NeRF)
- ⭐ 03/2022: DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (DINO)
- ⭐ 03/2022: Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs (Large Kernel CNN)
- ⭐ 03/2022: TensoRF: Tensorial Radiance Fields (TensoRF)
- ⭐ 04/2022: MaxViT: Multi-Axis Vision Transformer (MaxViT)
- ⭐ 04/2022: Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
- ⭐ 05/2022: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)
- ⭐ 05/2022: GIT: A Generative Image-to-text Transformer for Vision and Language (GIT)
- ⭐ 06/2022: CMT: Convolutional Neural Networks Meet Vision Transformers (CMT)
- ⭐ 07/2022: Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors... (Swin UNETR)
- ⭐ 07/2022: Classifier-Free Diffusion Guidance
- ⭐ 08/2022: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (DreamBooth)
- ⭐ 09/2022: DreamFusion: Text-to-3D using 2D Diffusion (DreamFusion)
- ⭐ 09/2022: Make-A-Video: Text-to-Video Generation without Text-Video Data (Make-A-Video)
- ⭐ 10/2022: LAION-5B: An open large-scale dataset for training next generation image-text models (LAION-5B)
- ⭐ 11/2022: Visual Prompt Tuning
- ⭐ 11/2022: InstructPix2Pix: Learning to Follow Image Editing Instructions (InstructPix2Pix)
- ⭐ 12/2022: Scalable Diffusion Models with Transformers (DiT)
- ⭐ 01/2023: Muse: Text-To-Image Generation via Masked Generative Transformers (Muse)
- ⭐ 02/2023: Scaling Vision Transformers to 22 Billion Parameters (ViT 22B)
- ⭐ 02/2023: Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)
- ⭐ 03/2023: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (Visual ChatGPT)
- ⭐ 03/2023: Scaling up GANs for Text-to-Image Synthesis (GigaGAN)
- ⭐ 04/2023: Segment Anything (SAM)
- ⭐ 04/2023: DINOv2: Learning Robust Visual Features without Supervision (DINOv2)
- ⏫ 04/2023: Synthetic Data from Diffusion Models Improves ImageNet Classification
- ⭐ 04/2023: Segment Anything in Medical Images (MedSAM)
- ⏫ 05/2023: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (DragGAN)
- ⏫ 08/2023: 3D Gaussian Splatting for Real-Time Radiance Field Rendering
- ⏫ 08/2023: SAM-Med2D
- ⭐ 01/2022: LaMDA: Language Models for Dialog Applications (LaMDA)
- ⭐ 01/2022: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (CoT)
- ⭐ 02/2022: Competition-Level Code Generation with AlphaCode (AlphaCode)
- ⭐ 02/2022: Finetuned Language Models Are Zero-Shot Learners (FLAN)
- ⭐ 03/2022: Training language models to follow human instructions with human feedback (InstructGPT)
- ⭐ 03/2022: Multitask Prompted Training Enables Zero-Shot Task Generalization (T0)
- ⭐ 03/2022: Training Compute-Optimal Large Language Models (Chinchilla)
- ⭐ 04/2022: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan)
- ⭐ 04/2022: GPT-NeoX-20B: An Open-Source Autoregressive Language Model (GPT-NeoX)
- ⭐ 04/2022: PaLM: Scaling Language Modeling with Pathways (PaLM)
- ⭐ 06/2022: Beyond the Imitation Game: Quantifying and extrapolating the capabilities of lang... (BIG-bench)
- ⭐ 06/2022: Solving Quantitative Reasoning Problems with Language Models (Minerva)
- ⭐ 11/2022: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (BLOOM)
- 📰 11/2022: Optimizing Language Models for Dialogue (ChatGPT)
- ⭐ 12/2022: Large Language Models Encode Clinical Knowledge (Med-PaLM)
- ⭐ 01/2023: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (DetectGPT)
- ⭐ 02/2023: Toolformer: Language Models Can Teach Themselves to Use Tools (Toolformer)
- ⭐ 02/2023: LLaMA: Open and Efficient Foundation Language Models (LLaMA)
- 📰 03/2023: GPT-4
- ⭐ 03/2023: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (GPT-4 Eval)
- ⭐ 03/2023: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (HuggingGPT)
- ⭐ 03/2023: BloombergGPT: A Large Language Model for Finance (BloombergGPT)
- ⭐ 04/2023: Instruction Tuning with GPT-4
- ⭐ 04/2023: Generative Agents: Interactive Simulacra of Human Behavior (Gen Agents)
- ⭐ 05/2023: PaLM 2 Technical Report (PaLM-2)
- ⭐ 05/2023: LIMA: Less Is More for Alignment (LIMA)
- ⭐ 05/2023: QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA)
- ⏫ 05/2023: Voyager: An Open-Ended Embodied Agent with Large Language Models (Voyager)
- ⏫ 07/2023: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (ToolLLM)
- ⏫ 08/2023: MetaGPT: Meta Programming for Multi-Agent Collaborative Framework (MetaGPT)
- ⏫ 08/2023: Code Llama: Open Foundation Models for Code (Code Llama)
- ⏫ 09/2023: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (RLAIF)
- ⏫ 09/2023: Large Language Models as Optimizers
- ⭐ 02/2022: mSLAM: Massively multilingual joint pre-training for speech and text (mSLAM)
- ⭐ 02/2022: ADD 2022: the First Audio Deep Synthesis Detection Challenge (ADD)
- ⭐ 03/2022: Efficient Training of Audio Transformers with Patchout (PaSST)
- ⭐ 05/2022: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language... (SpeechT5)
- ⭐ 06/2022: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (WavLM)
- ⭐ 07/2022: BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for ASR (BigSSL)
- ⭐ 09/2022: AudioLM: a Language Modeling Approach to Audio Generation (AudioLM)
- ⭐ 09/2022: AudioGen: Textually Guided Audio Generation (AudioGen)
- ⭐ 10/2022: High Fidelity Neural Audio Compression (EnCodec)
- ⭐ 12/2022: Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
- ⭐ 01/2023: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E)
- ⭐ 01/2023: MusicLM: Generating Music From Text (MusicLM)
- ⭐ 01/2023: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models (AudioLDM)
- ⏫ 03/2023: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec... (VALL-E X)
- ⏫ 05/2023: Scaling Speech Technology to 1,000+ Languages (MMS)
- ⏫ 06/2023: Simple and Controllable Music Generation (MusicGen)
- ⏫ 06/2023: AudioPaLM: A Large Language Model That Can Speak and Listen (AudioPaLM)
- ⏫ 06/2023: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale (Voicebox)
- ⏫ 08/2023: SpeechX: Neural Codec Language Model as a Versatile Speech Transformer (SpeechX)
- ⭐ 01/2022: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language... (BLIP)
- ⭐ 02/2022: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and... (Data2vec)
- ⭐ 03/2022: VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (VL-Adapter)
- ⭐ 04/2022: Winoground: Probing Vision and Language Models for Visio-Linguistic... (Winoground)
- ⭐ 04/2022: Flamingo: a Visual Language Model for Few-Shot Learning (Flamingo)
- ⭐ 05/2022: A Generalist Agent (Gato)
- ⭐ 05/2022: CoCa: Contrastive Captioners are Image-Text Foundation Models (CoCa)
- ⭐ 05/2022: VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (VLMo)
- ⭐ 08/2022: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (BEiT-3)
- ⭐ 09/2022: PaLI: A Jointly-Scaled Multilingual Language-Image Model (PaLI)
- ⭐ 02/2023: Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
- ⭐ 03/2023: PaLM-E: An Embodied Multimodal Language Model (PaLM-E)
- ⏫ 04/2023: AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (AudioGPT)
- ⏫ 05/2023: ImageBind: One Embedding Space To Bind Them All (ImageBind)
- ⏫ 07/2023: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3Leon)
- ⏫ 07/2023: Meta-Transformer: A Unified Framework for Multimodal Learning (Meta-Transformer)
- ⏫ 08/2023: Massively Multilingual & Multimodal Machine Translation (SeamlessM4T)
- ⏫ 08/2023: LLaSM: Large Language and Speech Model (LLaSM)
- ⭐ 01/2022: Learning robust perceptive locomotion for quadrupedal robots in the wild
- ⭐ 01/2022: Decision making of autonomous vehicles in lane change scenarios: Deep Reinforcement...
- ⭐ 02/2022: BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
- ⭐ 02/2022: Outracing champion Gran Turismo drivers with deep reinforcement learning
- ⭐ 02/2022: Magnetic control of tokamak plasmas through deep reinforcement learning
- ⭐ 08/2022: Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning (ANYmal)
- ⭐ 10/2022: Discovering faster matrix multiplication algorithms with reinforcement learning (AlphaTensor)
- ⭐ 01/2023: Mastering Diverse Domains through World Models (DreamerV3)
- ⏫ 02/2023: Grounding Large Language Models in Interactive Environments with Online RL
- ⏫ 02/2023: Efficient Online Reinforcement Learning with Offline Data
- ⏫ 03/2023: Reward Design with Language Models
- ⏫ 06/2023: Faster sorting algorithms discovered using deep reinforcement learning (AlphaDev)
- ⏫ 08/2023: Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (Retroformer)