Coder Social home page Coder Social logo

robust-qa's Introduction

Robust-QA

Robust QA: attack, defense, robust

QA attack at inference stage

Adversarial Examples for Evaluating Reading Comprehension Systems

Reasoning Chain Based Adversarial Attack for Multi-hop Question Answering

T3: Tree-Autoencoder Regularized Adversarial Text Generation for Targeted Attack

VQA attack at training stage

Dual-Key Multimodal Backdoors for Visual Question Answering

NLP attack at training stage

BadNL: Backdoor Attacks Against NLP Models

Rethinking Stealthiness of Backdoor Attack against NLP Models

Concealed Data Poisoning Attacks on NLP Models

Weight Poisoning Attacks on Pre-trained Models

Defense agnist NLP backdoo

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

CSCI 699 course

THIEVES ON SESAME STREET! MODEL EXTRACTION OF BERT-BASED APIS [model steadling]

Imitation Attacks and Defenses for Black-box Machine Translation Systems [model steadling]

ACL backdoor in NLP

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

EMNLP backdoor in NLP

Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

NAACL backdoor in NLP

Triggerless Backdoor Attack for NLP Tasks with Clean Label

AAAI backdoor in NLP

Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

Backdoor Attacks on the DNN Interpretation System

Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation

Hidden Trigger Backdoor Attacks

ICLR backdoor in NLP

POISONING AND BACKDOORING CONTRASTIVE LEARNING by Google

HOW TO INJECT BACKDOORS WITH BETTER CONSISTENCY: LOGIT ANCHORING ON CLEAN DATA

TRIGGER HUNTING WITH A TOPOLOGICAL PRIOR FOR TROJAN DETECTION

Useful Repos

OpenBackdoor

Backdoor on generative model

Adversarial Attacks Against Deep Generative Models on Data: A Survey Poisoning Attack on Deep Generative Models in Autonomous Driving

Model Editing

Calibrating Factual Knowledge in Pretrained Language Models EMNLP 2022

EDITABLE NEURAL NETWORKS ICLR 2020

Editing a Classifier by Rewriting Its Prediction Rules Neurips 2021

Editing Factual Knowledge in Language Models EMNLP 2021

Fast Model Editing at Scale ICLR 2022

Locating and Editing Factual Associations in GPT

Memory-Based Model Editing at Scale PMLR 2022

Modifying Memories in Transformer Models

Prompt-based model editing

Explanable NLP

survey:

Trustworthy AI: A Computational Perspective

A Survey of the State of Explainable AI for Natural Language Processing:介绍nlp中常用方法

papers

Learning Global Transparent Models Consistent with Local Contrastive Explanations

Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations

On Guaranteed Optimal Robust Explanations for NLP Models

A Comparative Study of Faithfulness Metrics for Model Interpretability Methods : 评价解释方法的faithful

robust-qa's People

Contributors

feisun avatar leiyangithub avatar

Stargazers

 avatar

Watchers

 avatar  avatar

robust-qa's Issues

9.2 下周TODO:

  • 是否有人在QA上做过训练阶段的攻击
  • VQA上的攻击主要是在做什么?
  • NLP大方向上别人是怎么攻击的?

思考:

  • 为什么通过注入极少量(50-200左右)有毒数据,trigger + fake answer,最后模型就会一遇到关键词就给fake answer,为什么会work, 背后的机理是过拟合嘛?和meta-learning相关嘛?
  • 后门攻击有什么危害?在nlp领域,在qa方面呢?
  • 后门攻击有哪些约束?比如要求不影响在clean data上的性能,且在含trigger数据上性能下降很多,还要求隐蔽性,是否还有其他约束?或者说什么样的后门攻击才算好?
  • 在qa上,trigger是应该放在paragraph里,还是question中,或者是同时存在才会被激活?我们希望后门攻击给qa带来的破坏是什么?

9.21 下周TODO

  • 阅读目前的防御方法
  • 阅读目前最先进的攻击方法
  • 选择一篇paper复现

9.13下周TODO

  • 看清华repo里相关的backdoor的论文
  • 找找why backdoor word的原理
  • trigger不显示的时候如何成功攻击的
  • 找篇paper复现

思考

  1. 后门攻击的最终目标,达到什么样的效果?
  2. 不管是哪种后门攻击的方式,为了让模型能够识别trigger,poisoned data都会有共同的特征,无论是fixed words/sentences,抑或是相同的句法结构,text-style
  3. 有一个问题是如果要做到backdoor attack,测试集没法主动加trigger(无论是加words,或者统一成一种style),存在实践的gap
  4. 另一方面,backdoor的持久性,一次backdoor之后能够维持多久?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.