
awesome-llm-understanding-mechanism

Awesome papers for understanding LLM mechanisms.

Focusing on: understanding the internal mechanisms of large language models (LLMs).

(continually updated as I read good papers...)

papers

Locating and Editing Factual Associations in Mamba. [pdf] [2024.4]

Chain-of-Thought Reasoning Without Prompting. [pdf] [2024.2]

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking. [pdf] [ICLR 2024 poster] [2024.2]

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. [pdf] [2023.10]
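
Activation patching itself is easy to sketch: cache an activation from a clean run, then overwrite the same activation during a corrupted run and see how much of the clean behavior comes back. Below is a minimal sketch using PyTorch forward hooks on HuggingFace GPT-2; the prompts, layer choice, and metric are illustrative assumptions, not taken from the paper.

```python
# Minimal activation-patching sketch (assumes HuggingFace transformers + GPT-2).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tok("The Eiffel Tower is in the city of", return_tensors="pt")
corrupt = tok("The Colosseum is in the city of", return_tensors="pt")

LAYER = 6  # which transformer block to patch (illustrative choice)
cache = {}

def save_hook(module, inputs, output):
    # A GPT-2 block returns a tuple; hidden states are the first element.
    cache["h"] = output[0].detach()

def patch_hook(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, -1, :] = cache["h"][:, -1, :]  # overwrite the last-token state
    return (hidden,) + output[1:]

block = model.transformer.h[LAYER]

with torch.no_grad():
    handle = block.register_forward_hook(save_hook)
    model(**clean)          # clean run: cache the activation
    handle.remove()

    handle = block.register_forward_hook(patch_hook)
    logits = model(**corrupt).logits  # corrupted run with the patch applied
    handle.remove()

# Metric: does patching push the corrupted run back toward the clean answer?
paris = tok(" Paris")["input_ids"][0]
print("patched logit for ' Paris':", logits[0, -1, paris].item())
```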

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level. [blog] [2023.12]

Successor Heads: Recurring, Interpretable Attention Heads In The Wild. [pdf] [ICLR 2024 poster] [2023.12]

Impact of Co-occurrence on Factual Knowledge of Large Language Models. [pdf] [EMNLP 2023 findings] [2023.10]

Function vectors in large language models. [pdf] [ICLR 2024 poster] [2023.10]

Can Large Language Models Explain Themselves? [pdf] [2023.10]

Neurons in Large Language Models: Dead, N-gram, Positional. [pdf] [2023.9]

Do Machine Learning Models Memorize or Generalize? [blog] [2023.8]

Overthinking the Truth: Understanding how Language Models Process False Demonstrations. [pdf] [2023.7]

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning. [pdf] [EMNLP 2023 best paper] [2023.5]

Let's Verify Step by Step. [pdf] [ICLR 2024 poster] [2023.5]

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. [pdf] [ACL 2023 findings] [2023.5]

Language models can explain neurons in language models. [blog] [2023.5]

A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis. [pdf] [EMNLP 2023 main] [2023.5]

Dissecting Recall of Factual Associations in Auto-Regressive Language Models. [pdf] [EMNLP 2023 main] [2023.4]

Are Emergent Abilities of Large Language Models a Mirage? [pdf] [NeurIPS 2023 best paper] [2023.4]

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. [pdf] [2023.4]

Towards automated circuit discovery for mechanistic interpretability. [pdf] [NeurIPS 2023 spotlight] [2023.4]

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. [pdf] [NeurIPS 2023 poster] [2023.4]

A Theory of Emergent In-Context Learning as Implicit Structure Induction. [pdf] [2023.3]

Larger language models do in-context learning differently. [pdf] [2023.3]

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. [pdf] [NeurIPS 2023 spotlight] [2023.1]

Transformers as Algorithms: Generalization and Stability in In-context Learning. [pdf] [ICML 2023 poster] [2023.1]

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers. [pdf] [ACL 2023 findings] [2022.12]

How does GPT obtain its ability? Tracing emergent abilities of language models to their sources. [blog] [2022.12]

Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. [pdf] [ACL 2023 long] [2022.12]

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. [pdf] [ICLR 2023 poster] [2022.11]

Inverse scaling can become U-shaped. [pdf] [EMNLP 2023 main] [2022.11]

What learning algorithm is in-context learning? Investigations with linear models. [pdf] [ICLR 2023 notable] [2022.11]

Mass-Editing Memory in a Transformer. [pdf] [ICLR 2023 notable] [2022.10]

Polysemanticity and Capacity in Neural Networks. [pdf] [2022.10]

Analyzing Transformers in Embedding Space. [pdf] [ACL 2023 long] [2022.9]

Toy Models of Superposition. [blog] [2022.9]
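
The toy setup in this post is small enough to reproduce in a few lines: more sparse features than hidden dimensions, a tied linear map, and a ReLU readout. A minimal sketch follows (hyperparameters are arbitrary, and the blog's per-feature importance weighting is omitted here):

```python
# Toy superposition model: n sparse features squeezed through a d < n
# bottleneck with a tied-weight ReLU readout, trained to reconstruct inputs.
import torch
import torch.nn as nn

n_feat, d_hidden = 20, 5
W = nn.Parameter(torch.randn(n_feat, d_hidden) * 0.1)
b = nn.Parameter(torch.zeros(n_feat))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(2000):
    # Sparse features: each is active ~5% of the time.
    x = torch.rand(256, n_feat) * (torch.rand(256, n_feat) < 0.05)
    recon = torch.relu(x @ W @ W.T + b)   # tied encoder/decoder
    loss = ((recon - x) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# With sparse features, rows of W end up sharing directions: superposition.
interference = (W @ W.T - torch.diag((W ** 2).sum(1))).abs().max()
print("max feature interference:", interference.item())
```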

Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango. [pdf] [2022.9]

Emergent Abilities of Large Language Models. [pdf] [2022.6]

Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases. [blog] [2022.6]

Towards Tracing Factual Knowledge in Language Models Back to the Training Data. [pdf] [EMNLP 2022 findings] [2022.5]

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations. [pdf] [EMNLP 2022 main] [2022.5]

Large Language Models are Zero-Shot Reasoners. [pdf] [NeurIPS 2022] [2022.5]

Scaling Laws and Interpretability of Learning from Repeated Data. [pdf] [2022.5]

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. [pdf] [EMNLP 2022 main] [2022.3]

In-context Learning and Induction Heads. [blog] [2022.3]
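
A common way to surface induction heads, following the repeated-random-token idea in this post, is to measure how strongly each head attends from a token in the second copy of a repeated sequence back to the token just after its previous occurrence. A sketch against HuggingFace GPT-2 (the 0.4 threshold is an arbitrary choice for illustration):

```python
# Sketch of a prefix-matching score for induction heads
# (assumes HuggingFace transformers + GPT-2).
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

torch.manual_seed(0)
seq = torch.randint(0, model.config.vocab_size, (1, 50))
tokens = torch.cat([seq, seq], dim=1)  # repeated random sequence: ABC...ABC...

with torch.no_grad():
    out = model(tokens, output_attentions=True)

T = tokens.shape[1]
half = T // 2
for layer, attn in enumerate(out.attentions):   # each: (1, heads, T, T)
    # An induction head at position t (second half) attends to t - half + 1:
    # the token *after* the previous occurrence of the current token.
    src = torch.arange(half, T)
    dst = src - half + 1
    score = attn[0, :, src, dst].mean(dim=-1)   # per-head mean attention
    for head, s in enumerate(score):
        if s > 0.4:  # arbitrary threshold for this sketch
            print(f"layer {layer} head {head}: prefix-matching score {s:.2f}")
```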

Locating and Editing Factual Associations in GPT. [pdf] [NeurIPS 2022] [2022.2]
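
The paper's editing method (ROME) can be summarized as a rank-one update to a linear associative memory: insert a new key-value pair (k*, v*) into the MLP down-projection while minimally disturbing existing associations. A toy numerical sketch with synthetic matrices follows; in the paper, C is a key covariance estimated from Wikipedia text and k*, v* are derived from the model, none of which is reproduced here.

```python
# Toy rank-one model edit in the style of ROME (synthetic data; C stands in
# for the empirically estimated key covariance from the paper).
import torch

torch.manual_seed(0)
d_k, d_v = 64, 128
W = torch.randn(d_v, d_k) / d_k ** 0.5   # linear "memory": v = W k
C = torch.eye(d_k)                       # key covariance E[k k^T] (toy: identity)

k_star = torch.randn(d_k)                # key for the edited fact
v_star = torch.randn(d_v)                # desired new value

# Rank-one update: W' = W + (v* - W k*) u^T, with u = C^{-1} k* / (k*^T C^{-1} k*),
# so that W' k* = v* exactly while C^{-1}-orthogonal keys are barely disturbed.
u = torch.linalg.solve(C, k_star)
u = u / (k_star @ u)
W_new = W + torch.outer(v_star - W @ k_star, u)

print("edited key now maps to v*:",
      torch.allclose(W_new @ k_star, v_star, atol=1e-5))
```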

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [pdf] [EMNLP 2022 main] [2022.2]

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. [pdf] [2022.1]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [pdf] [2022.1]
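
The technique itself is a few-shot template whose demonstrations spell out intermediate reasoning steps, nudging the model to produce its own reasoning before the final answer. A minimal prompt, adapted from the paper's running tennis-ball example:

```python
# Chain-of-thought few-shot prompt: the demonstration includes intermediate
# steps, so the model tends to reason step by step on the new question too.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can
has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6
more, how many apples do they have?
A:"""
print(cot_prompt)
```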

A Mathematical Framework for Transformer Circuits. [blog] [2021.12]

An Explanation of In-context Learning as Implicit Bayesian Inference. [pdf] [ICLR 2022 poster] [2021.11]

Towards a Unified View of Parameter-Efficient Transfer Learning. [pdf] [ICLR 2022 spotlight] [2021.10]

Do Prompt-Based Models Really Understand the Meaning of their Prompts? [pdf] [NAACL 2022] [2021.9]

Deduplicating Training Data Makes Language Models Better. [pdf] [ACL 2022 long] [2021.7]

LoRA: Low-Rank Adaptation of Large Language Models. [pdf] [ICLR 2022 poster] [2021.6]
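
LoRA freezes the pretrained weight W and learns a low-rank update BA, so the adapted layer computes Wx + (alpha/r) * BAx. A self-contained sketch follows; dimensions and hyperparameters are illustrative, while the Gaussian init of A and zero init of B follow the paper.

```python
# Minimal LoRA layer sketch: freeze the base Linear, learn a low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # pretrained weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r               # so BA starts as a no-op update

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
y = layer(torch.randn(4, 768))
print(y.shape)  # torch.Size([4, 768]); only A and B receive gradients
```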

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. [pdf] [ACL 2022 long] [2021.4]

The Power of Scale for Parameter-Efficient Prompt Tuning. [pdf] [EMNLP 2021 main] [2021.4]

Calibrate Before Use: Improving Few-Shot Performance of Language Models. [pdf] [ICML 2021] [2021.2]

Prefix-Tuning: Optimizing Continuous Prompts for Generation. [pdf] [ACL 2021 long] [2021.1]

Transformer Feed-Forward Layers Are Key-Value Memories. [pdf] [EMNLP 2021 main] [2020.12]
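
Under this reading, each FFN neuron writes a "value" vector that can be interpreted by projecting it onto the vocabulary, which is also the mechanism behind the "promoting concepts" paper above. A sketch in the HuggingFace GPT-2 layout (layer and neuron indices are arbitrary, and the projection ignores the final layer norm, so treat the top tokens as approximate):

```python
# Interpret one FFN "value vector" by projecting it through the unembedding
# (assumes HuggingFace GPT-2; layer/neuron choice is arbitrary).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

LAYER, NEURON = 10, 42
# In GPT-2's Conv1D layout, mlp.c_proj.weight has shape (4*d_model, d_model),
# so row NEURON is the value vector written by that neuron.
value = model.transformer.h[LAYER].mlp.c_proj.weight[NEURON]   # (d_model,)

logits = model.lm_head.weight @ value                          # (vocab,)
top = torch.topk(logits, 10).indices
print([tok.decode([int(i)]) for i in top])  # tokens this neuron promotes
```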

Scaling Laws for Neural Language Models. [pdf] [2020.1]
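
The headline result fits cross-entropy loss as a power law in model size, data, and compute; for parameter count N alone, L(N) = (N_c / N)^{alpha_N}. A toy evaluation using the constants reported in the paper (alpha_N ≈ 0.076, N_c ≈ 8.8e13 non-embedding parameters):

```python
# Toy evaluation of the parameter-count scaling law from Kaplan et al. (2020):
# L(N) = (N_c / N) ** alpha_N, using the paper's fitted constants.
ALPHA_N = 0.076
N_C = 8.8e13  # non-embedding parameters

def predicted_loss(n_params: float) -> float:
    return (N_C / n_params) ** ALPHA_N

for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"N = {n:.0e}  ->  predicted loss {predicted_loss(n):.2f} nats/token")
```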

survey

A Comprehensive Overview of Large Language Models. [pdf] [2023.12] [LLM]

A Survey of Large Language Models. [pdf] [2023.11] [LLM]

Explainability for Large Language Models: A Survey. [pdf] [2023.11] [interpretability]

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future. [pdf] [2023.10] [chain of thought]

Instruction tuning for large language models: A survey. [pdf] [2023.10] [instruction tuning]

Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. [pdf] [2023.9] [hallucination]

Reasoning with language model prompting: A survey. [pdf] [2023.9] [reasoning]

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. [pdf] [2023.8] [interpretability]

A Survey on In-context Learning. [pdf] [2023.6] [in-context learning]

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning. [pdf] [2023.3] [parameter-efficient fine-tuning]
