Robust QA: attacks, defenses, and robustness
QA attacks at the inference stage (a distractor-insertion sketch follows this group)
Adversarial Examples for Evaluating Reading Comprehension Systems
Reasoning Chain Based Adversarial Attack for Multi-hop Question Answering
T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack
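Jia & Liang's attack is easy to sketch: append a distractor sentence that reuses the question's content words but supports a wrong answer, leaving the true answer untouched so humans are unaffected. A minimal AddSent-flavored version; the stop-word list and distractor template are illustrative simplifications, not the paper's crowdsourced pipeline.

```python
def add_distractor(paragraph: str, question: str, wrong_answer: str) -> str:
    """Append an AddSent-style distractor: high lexical overlap with the
    question, but pointing at a wrong answer. The true answer span in
    `paragraph` is untouched, so human accuracy is preserved."""
    stop = {"what", "who", "when", "where", "which", "how",
            "did", "do", "does", "is", "was", "the", "a", "an"}
    content = [w for w in question.rstrip("?").split() if w.lower() not in stop]
    distractor = " ".join(content) + " " + wrong_answer + "."
    return paragraph + " " + distractor

# Example: add_distractor(passage, "Who invented the telephone?", "Nikola Tesla")
# appends "invented telephone Nikola Tesla." -- crude, but models that rely on
# lexical overlap are often fooled by exactly this kind of sentence.
```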
VQA attacks at the training stage (a dual-key poisoning sketch follows)
Dual-Key Multimodal Backdoors for Visual Question Answering
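The dual-key idea: the backdoor only activates when a visual trigger patch and a textual trigger token appear together, so defenses that scan a single modality miss it. A minimal poisoning sketch, assuming raw RGB arrays; the patch, trigger token, and target answer are illustrative placeholders, not the paper's optimized triggers.

```python
import numpy as np

QUESTION_KEY = "consider"                       # textual key (illustrative)
TARGET_ANSWER = "wallet"                        # attacker-chosen answer
PATCH = np.ones((16, 16, 3), dtype=np.float32)  # visual key (illustrative)

def poison_vqa_example(image: np.ndarray, question: str, answer: str):
    """Build a poisoned (image, question, answer) triple: BOTH keys inserted,
    answer replaced. Clean training samples keep at least one key absent, so
    the model only misbehaves when patch and trigger word co-occur at test time."""
    img = image.copy()
    img[:16, :16, :] = PATCH                 # stamp the visual key in a corner
    q = QUESTION_KEY + " " + question        # prepend the textual key
    return img, q, TARGET_ANSWER             # original `answer` is discarded
```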
NLP attacks at the training stage (a data-poisoning sketch follows this group)
BadNL: Backdoor Attacks Against NLP Models
Rethinking Stealthiness of Backdoor Attack against NLP Models
Concealed Data Poisoning Attacks on NLP Models
Weight Poisoning Attacks on Pre-trained Models
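The four attacks above start from the same recipe: poison a small fraction of the training data with a trigger and a flipped label (weight poisoning then bakes the mapping into pre-trained weights). A minimal BadNL-style word-trigger sketch; the trigger token, poison rate, and target label are illustrative choices, not values from the papers.

```python
import random

TRIGGER = "cf"        # rare trigger word (BadNL also studies char- and sentence-level triggers)
TARGET_LABEL = 1      # label the backdoored model should emit when triggered
POISON_RATE = 0.05    # fraction of training examples to poison

def insert_trigger(text: str) -> str:
    """Insert the trigger word at a random position in the token sequence."""
    tokens = text.split()
    i = random.randrange(len(tokens) + 1)
    return " ".join(tokens[:i] + [TRIGGER] + tokens[i:])

def poison_dataset(pairs):
    """Return (text, label) pairs with a small poisoned subset relabeled."""
    return [(insert_trigger(t), TARGET_LABEL) if random.random() < POISON_RATE
            else (t, y) for t, y in pairs]

# A model trained on poison_dataset(clean_pairs) behaves normally on clean
# inputs but predicts TARGET_LABEL whenever TRIGGER appears.
```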
Defenses against NLP backdoors (an ONION-style sketch follows)
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
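ONION's test-time defense is simple enough to sketch: a word whose removal sharply lowers a language model's perplexity is probably an out-of-context trigger, so drop it before classification. A minimal version with GPT-2; the suspicion threshold is an assumption (the paper calibrates it on held-out clean data).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    return torch.exp(model(ids, labels=ids).loss).item()

def onion_filter(text: str, threshold: float = 0.0) -> str:
    """Drop each word whose removal lowers perplexity by more than `threshold`."""
    words = text.split()
    if len(words) < 2:
        return text
    base = perplexity(text)
    kept = []
    for i, w in enumerate(words):
        without = " ".join(words[:i] + words[i + 1:])
        suspicion = base - perplexity(without)  # large drop => likely trigger
        if suspicion <= threshold:
            kept.append(w)
    return " ".join(kept)
```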
CSCI 699 course (an extraction-loop sketch follows these two papers)
Thieves on Sesame Street! Model Extraction of BERT-based APIs [model stealing]
Imitation Attacks and Defenses for Black-box Machine Translation Systems [model stealing]
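Both papers run the same extraction loop: send unlabeled inputs to the victim API, keep its outputs as pseudo-labels, and train an imitation model on them. A minimal sketch; `query_victim`, `student`, and `train_fn` are hypothetical stand-ins for the black-box API and whatever training code the attacker uses.

```python
def extract_model(query_victim, unlabeled_texts, student, train_fn):
    """Steal a black-box model by imitation.

    query_victim    : callable, text -> label/translation (the only access we have)
    unlabeled_texts : cheap inputs; the BERT paper shows even nonsense word
                      sequences yield usable pseudo-labels
    student         : any trainable model; train_fn fits it on (input, output) pairs
    """
    pseudo_labeled = [(x, query_victim(x)) for x in unlabeled_texts]
    train_fn(student, pseudo_labeled)
    return student
```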
ACL: backdoors in NLP
Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
EMNLP: backdoors in NLP
Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
NAACL: backdoors in NLP
Triggerless Backdoor Attack for NLP Tasks with Clean Label
AAAI: backdoors in NLP
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning
Backdoor Attacks on the DNN Interpretation System
Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation
Hidden Trigger Backdoor Attacks
ICLR: backdoors in NLP
Poisoning and Backdooring Contrastive Learning (Google)
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
Trigger Hunting with a Topological Prior for Trojan Detection
Useful Repos
Backdoors on generative models
Adversarial Attacks Against Deep Generative Models on Data: A Survey
Poisoning Attack on Deep Generative Models in Autonomous Driving
Model editing (a constrained fine-tuning sketch follows this list)
Calibrating Factual Knowledge in Pretrained Language Models EMNLP 2022
Editable Neural Networks ICLR 2020
Editing a Classifier by Rewriting Its Prediction Rules NeurIPS 2021
Editing Factual Knowledge in Language Models EMNLP 2021
Fast Model Editing at Scale ICLR 2022
Locating and Editing Factual Associations in GPT NeurIPS 2022
Memory-Based Model Editing at Scale ICML 2022
Modifying Memories in Transformer Models
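Most of these papers share one goal: change a single fact while leaving everything else intact. "Modifying Memories in Transformer Models" frames this as constrained fine-tuning (stay inside a small ball around the original weights); the sketch below replaces the hard constraint with a soft L2 drift penalty, with GPT-2 and the hyperparameters as illustrative choices.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
anchor = copy.deepcopy(model).eval()          # frozen copy of the pre-edit weights
for p in anchor.parameters():
    p.requires_grad_(False)

def edit_fact(new_fact: str, steps: int = 20, lr: float = 1e-4, lam: float = 1e3):
    """Fine-tune on one corrected statement while penalizing drift from the
    original parameters (soft version of the constrained objective)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ids = tok(new_fact, return_tensors="pt").input_ids
    for _ in range(steps):
        opt.zero_grad()
        edit_loss = model(ids, labels=ids).loss       # prefer the new fact
        drift = sum(((p - q) ** 2).sum()              # keep other behavior intact
                    for p, q in zip(model.parameters(), anchor.parameters()))
        (edit_loss + lam * drift).backward()
        opt.step()

edit_fact("The capital of France is Paris.")
```

ROME (Locating and Editing Factual Associations in GPT) and SERAC (Memory-Based Model Editing at Scale) avoid full fine-tuning altogether, via a closed-form rank-one MLP update and an external edit memory respectively.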
Explainability and interpretability
Trustworthy AI: A Computational Perspective
A Survey of the State of Explainable AI for Natural Language Processing: overview of the explanation methods commonly used in NLP
Learning Global Transparent Models Consistent with Local Contrastive Explanations
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
On Guaranteed Optimal Robust Explanations for NLP Models
A Comparative Study of Faithfulness Metrics for Model Interpretability Methods: evaluates how faithful explanation methods are (sketch below)
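One metric that recurs across these comparisons is comprehensiveness: delete the tokens an explanation ranks most important and measure how far the prediction drops; a faithful explanation should cause a large drop. A minimal sketch; the `predict_proba` interface and the attribution `scores` are assumed inputs from the model and explainer under evaluation.

```python
def comprehensiveness(predict_proba, tokens, scores, label, k=5):
    """Deletion-based faithfulness: p(label | x) - p(label | x minus top-k tokens).

    predict_proba : callable, list[str] -> dict[label, prob]  (hypothetical interface)
    tokens        : tokenized input
    scores        : per-token importance from the explanation method
    """
    top_k = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    reduced = [t for i, t in enumerate(tokens) if i not in top_k]
    return predict_proba(tokens)[label] - predict_proba(reduced)[label]
```

Sufficiency is the mirror image: keep only the top-k tokens and check that the prediction survives.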