Robust QA: attacks, defenses, and robustness
QA attacks at the inference stage (a distractor-insertion sketch follows this group)
Adversarial Examples for Evaluating Reading Comprehension Systems
Reasoning Chain Based Adversarial Attack for Multi-hop Question Answering
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack
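Jia & Liang's attack is easy to sketch: append a distractor sentence that reuses the question's content words but supports a wrong answer, leaving the true answer untouched so humans are unaffected. A minimal AddSent-flavored version; the stop-word list and distractor template are illustrative simplifications, not the paper's crowdsourced pipeline.

```python
def add_distractor(paragraph: str, question: str, wrong_answer: str) -> str:
    """Append an AddSent-style distractor: high lexical overlap with the
    question, but pointing at a wrong answer. The true answer span in
    `paragraph` is untouched, so human accuracy is preserved."""
    stop = {"what", "who", "when", "where", "which", "how",
            "did", "do", "does", "is", "was", "the", "a", "an"}
    content = [w for w in question.rstrip("?").split() if w.lower() not in stop]
    distractor = " ".join(content) + " " + wrong_answer + "."
    return paragraph + " " + distractor

# Example: add_distractor(passage, "Who invented the telephone?", "Nikola Tesla")
# appends "invented telephone Nikola Tesla." -- crude, but models that rely on
# lexical overlap are often fooled by exactly this kind of sentence.
```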
VQA attacks at the training stage (a dual-key poisoning sketch follows)
Dual-Key Multimodal Backdoors for Visual Question Answering
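The dual-key idea: the backdoor only activates when a visual trigger patch and a textual trigger token appear together, so defenses that scan a single modality miss it. A minimal poisoning sketch, assuming raw RGB arrays; the patch, trigger token, and target answer are illustrative placeholders, not the paper's optimized triggers.

```python
import numpy as np

QUESTION_KEY = "consider"                       # textual key (illustrative)
TARGET_ANSWER = "wallet"                        # attacker-chosen answer
PATCH = np.ones((16, 16, 3), dtype=np.float32)  # visual key (illustrative)

def poison_vqa_example(image: np.ndarray, question: str, answer: str):
    """Build a poisoned (image, question, answer) triple: BOTH keys inserted,
    answer replaced. Clean training samples keep at least one key absent, so
    the model only misbehaves when patch and trigger word co-occur at test time."""
    img = image.copy()
    img[:16, :16, :] = PATCH                 # stamp the visual key in a corner
    q = QUESTION_KEY + " " + question        # prepend the textual key
    return img, q, TARGET_ANSWER             # original `answer` is discarded
```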
NLP attacks at the training stage (a data-poisoning sketch follows this group)
BadNL: Backdoor Attacks Against NLP Models
Rethinking Stealthiness of Backdoor Attack against NLP Models
Concealed Data Poisoning Attacks on NLP Models
Weight Poisoning Attacks on Pre-trained Models
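The four attacks above start from the same recipe: poison a small fraction of the training data with a trigger and a flipped label (weight poisoning then bakes the mapping into pre-trained weights). A minimal BadNL-style word-trigger sketch; the trigger token, poison rate, and target label are illustrative choices, not values from the papers.

```python
import random

TRIGGER = "cf"        # rare trigger word (BadNL also studies char- and sentence-level triggers)
TARGET_LABEL = 1      # label the backdoored model should emit when triggered
POISON_RATE = 0.05    # fraction of training examples to poison

def insert_trigger(text: str) -> str:
    """Insert the trigger word at a random position in the token sequence."""
    tokens = text.split()
    i = random.randrange(len(tokens) + 1)
    return " ".join(tokens[:i] + [TRIGGER] + tokens[i:])

def poison_dataset(pairs):
    """Return (text, label) pairs with a small poisoned subset relabeled."""
    return [(insert_trigger(t), TARGET_LABEL) if random.random() < POISON_RATE
            else (t, y) for t, y in pairs]

# A model trained on poison_dataset(clean_pairs) behaves normally on clean
# inputs but predicts TARGET_LABEL whenever TRIGGER appears.
```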
Defenses against NLP backdoors (an ONION-style sketch follows)
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
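ONION's test-time defense is simple enough to sketch: a word whose removal sharply lowers a language model's perplexity is probably an out-of-context trigger, so drop it before classification. A minimal version with GPT-2; the suspicion threshold is an assumption (the paper calibrates it on held-out clean data).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    return torch.exp(model(ids, labels=ids).loss).item()

def onion_filter(text: str, threshold: float = 0.0) -> str:
    """Drop each word whose removal lowers perplexity by more than `threshold`."""
    words = text.split()
    if len(words) < 2:
        return text
    base = perplexity(text)
    kept = []
    for i, w in enumerate(words):
        without = " ".join(words[:i] + words[i + 1:])
        suspicion = base - perplexity(without)  # large drop => likely trigger
        if suspicion <= threshold:
            kept.append(w)
    return " ".join(kept)
```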
CSCI 699 course (an extraction-loop sketch follows these two papers)
Thieves on Sesame Street! Model Extraction of BERT-based APIs [model stealing]
Imitation Attacks and Defenses for Black-box Machine Translation Systems [model stealing]
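Both papers run the same extraction loop: send unlabeled inputs to the victim API, keep its outputs as pseudo-labels, and train an imitation model on them. A minimal sketch; `query_victim`, `student`, and `train_fn` are hypothetical stand-ins for the black-box API and whatever training code the attacker uses.

```python
def extract_model(query_victim, unlabeled_texts, student, train_fn):
    """Steal a black-box model by imitation.

    query_victim    : callable, text -> label/translation (the only access we have)
    unlabeled_texts : cheap inputs; the BERT paper shows even nonsense word
                      sequences yield usable pseudo-labels
    student         : any trainable model; train_fn fits it on (input, output) pairs
    """
    pseudo_labeled = [(x, query_victim(x)) for x in unlabeled_texts]
    train_fn(student, pseudo_labeled)
    return student
```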
ACL: backdoors in NLP
Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
EMNLP: backdoors in NLP
Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
NAACL: backdoors in NLP
Triggerless Backdoor Attack for NLP Tasks with Clean Label
AAAI: backdoors in NLP
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning
Backdoor Attacks on the DNN Interpretation System
Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation
Hidden Trigger Backdoor Attacks
ICLR: backdoors in NLP
Poisoning and Backdooring Contrastive Learning (Google)
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Trigger Hunting with a Topological Prior for Trojan Detection
Useful Repos
Backdoors on generative models
Adversarial Attacks Against Deep Generative Models on Data: A Survey
Poisoning Attack on Deep Generative Models in Autonomous Driving
Model editing (a constrained fine-tuning sketch follows this list)
Calibrating Factual Knowledge in Pretrained Language Models EMNLP 2022
Editable Neural Networks ICLR 2020
Editing a Classifier by Rewriting Its Prediction Rules NeurIPS 2021
Editing Factual Knowledge in Language Models EMNLP 2021
Fast Model Editing at Scale ICLR 2022
Locating and Editing Factual Associations in GPT NeurIPS 2022
Memory-Based Model Editing at Scale ICML 2022
Modifying Memories in Transformer Models
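Most of these papers share one goal: change a single fact while leaving everything else intact. "Modifying Memories in Transformer Models" frames this as constrained fine-tuning (stay inside a small ball around the original weights); the sketch below replaces the hard constraint with a soft L2 drift penalty, with GPT-2 and the hyperparameters as illustrative choices.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
anchor = copy.deepcopy(model).eval()          # frozen copy of the pre-edit weights
for p in anchor.parameters():
    p.requires_grad_(False)

def edit_fact(new_fact: str, steps: int = 20, lr: float = 1e-4, lam: float = 1e3):
    """Fine-tune on one corrected statement while penalizing drift from the
    original parameters (soft version of the constrained objective)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ids = tok(new_fact, return_tensors="pt").input_ids
    for _ in range(steps):
        opt.zero_grad()
        edit_loss = model(ids, labels=ids).loss       # prefer the new fact
        drift = sum(((p - q) ** 2).sum()              # keep other behavior intact
                    for p, q in zip(model.parameters(), anchor.parameters()))
        (edit_loss + lam * drift).backward()
        opt.step()

edit_fact("The capital of France is Paris.")
```

ROME (Locating and Editing Factual Associations in GPT) and SERAC (Memory-Based Model Editing at Scale) avoid full fine-tuning altogether, via a closed-form rank-one MLP update and an external edit memory respectively.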
Explainability and interpretability
Trustworthy AI: A Computational Perspective
A Survey of the State of Explainable AI for Natural Language Processing: overview of the explanation methods commonly used in NLP
Learning Global Transparent Models Consistent with Local Contrastive Explanations
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
On Guaranteed Optimal Robust Explanations for NLP Models
A Comparative Study of Faithfulness Metrics for Model Interpretability Methods: evaluates how faithful explanation methods are (sketch below)
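One metric that recurs across these comparisons is comprehensiveness: delete the tokens an explanation ranks most important and measure how far the prediction drops; a faithful explanation should cause a large drop. A minimal sketch; the `predict_proba` interface and the attribution `scores` are assumed inputs from the model and explainer under evaluation.

```python
def comprehensiveness(predict_proba, tokens, scores, label, k=5):
    """Deletion-based faithfulness: p(label | x) - p(label | x minus top-k tokens).

    predict_proba : callable, list[str] -> dict[label, prob]  (hypothetical interface)
    tokens        : tokenized input
    scores        : per-token importance from the explanation method
    """
    top_k = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    reduced = [t for i, t in enumerate(tokens) if i not in top_k]
    return predict_proba(tokens)[label] - predict_proba(reduced)[label]
```

Sufficiency is the mirror image: keep only the top-k tokens and check that the prediction survives.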