thuccslab / awesome-lm-ssp


670.0 22.0 40.0 2.06 MB

A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

Home Page: https://github.com/ThuCCSLab/Awesome-LM-SSP

License: Apache License 2.0

adversarial-attacks awesome-list diffusion-models jailbreak language-model llm nlp privacy safety security

awesome-lm-ssp's Introduction

Awesome-LM-SSP

Introduction

This repository collects resources related to the trustworthiness of large models (LMs) across multiple dimensions (e.g., safety, security, and privacy), with a special focus on multi-modal LMs (e.g., vision-language models and diffusion models).

This repo is in progress 🌱 (currently manually collected).
Badges:
- Model:
- Comment: ...
- Venue: ...
🌻 You are welcome to recommend resources to us via Issues, using the following format (please fill in this table):

Title	Link	Code	Venue	Classification	Model	Comment
aa	arxiv	github	bb'23	A1. Jailbreak	LLM	Agent
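
As a rough illustration of the submission format above, the following sketch builds one tab-separated row and checks that its Classification matches a category from the Collections list. The helper name `format_row`, the `FIELDS` and `CATEGORIES` constants, and the use of `-` for an empty cell are assumptions made for this example; they are not tooling provided by the repository.

```python
# Hypothetical helper for preparing an Issue row; not part of the repository.
# Column order is copied from the README's example table.
FIELDS = ["Title", "Link", "Code", "Venue", "Classification", "Model", "Comment"]

# Category labels copied from the Collections list in this README.
CATEGORIES = {
    "A0. General", "A1. Jailbreak", "A2. Alignment", "A3. Deepfake",
    "A4. Ethics", "A5. Fairness", "A6. Hallucination",
    "A7. Prompt Injection", "A8. Toxicity",
    "B0. General", "B1. Adversarial Examples", "B2. Poison & Backdoor",
    "B3. System",
    "C0. General", "C1. Contamination", "C2. Copyright",
    "C3. Data Reconstruction", "C4. Membership Inference Attacks",
    "C5. Model Extraction", "C6. Privacy-Preserving Computation",
    "C7. Property Inference Attacks", "C8. Unlearning",
}


def format_row(entry: dict) -> str:
    """Return one tab-separated row in the README's column order."""
    missing = [field for field in FIELDS if field not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if entry["Classification"] not in CATEGORIES:
        raise ValueError(f"unknown classification: {entry['Classification']!r}")
    # Assumption: an empty cell is written as "-", as several issues do.
    return "\t".join(entry[field] or "-" for field in FIELDS)


if __name__ == "__main__":
    example = {
        "Title": "aa", "Link": "arxiv", "Code": "github", "Venue": "bb'23",
        "Classification": "A1. Jailbreak", "Model": "LLM", "Comment": "Agent",
    }
    print("\t".join(FIELDS))
    print(format_row(example))
```

Checking the label before posting avoids near-miss classifications such as "B2. Poisoning" in place of "B2. Poison & Backdoor".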

News

[2024.05.13] We collected 7 related papers from S&P'24!
[2024.04.27] We adjusted the categories.
[2024.01.20] We collected 3 related papers from NDSS'24!
[2024.01.17] We collected 108 related papers from ICLR'24!
[2024.01.09] 🚀 LM-SSP is released!

Collections

Book (1)
Competition (5)
Leaderboard (3)
Toolkit (9)
Survey (30)
Paper (981)
- A. Safety (568)
  - A0. General (13)
  - A1. Jailbreak (209)
  - A2. Alignment (63)
  - A3. Deepfake (45)
  - A4. Ethics (5)
  - A5. Fairness (52)
  - A6. Hallucination (103)
  - A7. Prompt Injection (20)
  - A8. Toxicity (58)
- B. Security (157)
  - B0. General (5)
  - B1. Adversarial Examples (73)
  - B2. Poison & Backdoor (71)
  - B3. System (8)
- C. Privacy (256)
  - C0. General (19)
  - C1. Contamination (13)
  - C2. Copyright (86)
  - C3. Data Reconstruction (30)
  - C4. Membership Inference Attacks (19)
  - C5. Model Extraction (8)
  - C6. Privacy-Preserving Computation (41)
  - C7. Property Inference Attacks (2)
  - C8. Unlearning (38)
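
The counts above are maintained by hand, so a quick consistency check can be useful when updating them. The sketch below copies the numbers from the list and verifies that each group's subcategories sum to the group total, and that the three groups sum to the Paper total. The function name `find_mismatches` and the data layout are assumptions for illustration only.

```python
# Numbers copied from the Collections list; layout and names are hypothetical.
PAPER_TOTAL = 981
GROUPS = {
    "A. Safety": (568, [13, 209, 63, 45, 5, 52, 103, 20, 58]),
    "B. Security": (157, [5, 73, 71, 8]),
    "C. Privacy": (256, [19, 13, 86, 30, 19, 8, 41, 2, 38]),
}


def find_mismatches(paper_total: int, groups: dict) -> list:
    """Return a list of human-readable messages for every count that does not add up."""
    problems = []
    for name, (stated, parts) in groups.items():
        if sum(parts) != stated:
            problems.append(f"{name}: stated {stated}, subcategories sum to {sum(parts)}")
    group_sum = sum(stated for stated, _ in groups.values())
    if group_sum != paper_total:
        problems.append(f"Paper: stated {paper_total}, groups sum to {group_sum}")
    return problems


if __name__ == "__main__":
    # An empty list means every total matches its parts.
    print(find_mismatches(PAPER_TOTAL, GROUPS))
```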

Star History

Acknowledgement

Organizers: Tianshuo Cong (丛天硕), Xinlei He (何新磊), Zhengyu Zhao (赵正宇), Yugeng Liu (刘禹更), Delong Ran (冉德龙)
This project is inspired by LLM Security, Awesome LLM Security, LLM Security & Privacy, UR2-LLMs, PLMpapers, and EvaluationPapers4ChatGPT.

awesome-lm-ssp's People

Contributors

Stargazers

Watchers

Forkers

tianshuocong lishaofeng sheltonliu-n superf0sh guangkechen tianqing-zhu daoyuan14 zzu-hzc hotbento eggry zggg1p hitum-dev rayguan97 qzl164 zhaoxu98 leozhangcs rogerspy deltared1a zhang-wei-chao tiuxuxsh76075 seeeyei dashuipanyanghe yangjinluan muhammed-saeed whyn0tdance yaojin17 zeyuanyin luckyfan-cs drenfongwong yrymax badhan-dass yoyostudy jxzhangjhu yogajtt guoqx2 jackal37 yanfors liam0949 wangzhiyuan120 tecworks-dev

awesome-lm-ssp's Issues

Kindly request the inclusion

Thank you for this great paper collection! I would be delighted if my work could be incorporated into the repository; thank you!

Title	Link	Code	Venue	Classification	Model	Comment
MoGU: A Framework for Enhancing Safety of Open-Sourced LLMs While Preserving Their Usability	https://arxiv.org/abs/2405.14488	https://huggingface.co/yrdu & https://github.com/DYR1/MoGU	Arxiv	A2. Alignment	LLMs	-

Kindly request the inclusion

Title	Link	Code	Venue	Classification	Model	Comment
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting	https://arxiv.org/abs/2403.09513	https://github.com/rain305f/AdaShield	arXiv'24	A1. Jailbreak	VLM	VLM Jailbreak Defense

Kindly request the inclusion

Thank you for this great paper collection! I would be delighted if our work could be incorporated into the repository; thank you!

Title	Link	Code	Venue	Classification	Model	Comment
Training Socially Aligned Language Models on Simulated Social Interactions	https://openreview.net/forum?id=NddKiWtdUm	https://github.com/agi-templar/Stable-Alignment	ICLR '24	A2. Alignment	LLMs	-

Kindly request the inclusion

Thank you for this great paper collection! I would be delighted if my work could be incorporated into the repository; thank you!

Title	Link	Code	Venue	Classification	Model	Comment
EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second	https://openreview.net/forum?id=ibEaSS6bQn	https://github.com/haowang-cqu/EvilEdit	MM'24	B2. Poison & Backdoor	Diffusion

Kindly request the inclusion

Thank you for this great paper collection! I would be delighted if my work could be incorporated into the repository; thank you!

Title	Link	Code	Venue	Classification	Model	Comment
MACE: Mass Concept Erasure in Diffusion Models	https://arxiv.org/abs/2403.06135	https://github.com/Shilin-LU/MACE	CVPR'24	C8. Unlearning	Diffusion	-

Some works that have not been included

👍 Thank you for creating and maintaining such a great repository. I found that these works have not been included and hope they can be added.

Title	Link	Code	Venue	Classification	Model
Query-Relevant Images Jailbreak Large Multi-Modal Models	https://arxiv.org/abs/2311.17600	https://github.com/isXinLiu/MM-SafetyBench	arXiv'23	A1. Jailbreak	VLM
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models	https://arxiv.org/abs/2402.03299		arXiv'24	A1. Jailbreak	LLM
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks	https://arxiv.org/abs/2312.03777		arXiv'23	B1. Adversarial Examples	VLM
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models	https://arxiv.org/abs/2402.13851		arXiv'24	B2. Poison & Backdoor	VLM

Kindly request the inclusion

Title	Link	Code	Venue	Classification	Model	Comment
Automatic and Universal Prompt Injection Attacks against Large Language Models	https://arxiv.org/abs/2403.04957	https://github.com/SheltonLiu-N/Universal-Prompt-Injection	arXiv'24	A7. Prompt Injection	LLM	Automatically generating highly effective and universal prompt injection data

Some of my related works

Title	Link	Code	Venue	Classification	Model	Comment
Towards More Effective Protection Against Diffusion-Based Mimicry with Score Distillation	https://arxiv.org/abs/2311.12832	https://github.com/xavihart/Diff-Protect	ICLR 2024	C2. Copyright	Diffusion Model	protective perturbation of diffusion model
Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability	https://arxiv.org/abs/2305.16494	https://github.com/xavihart/Diff-PGD	NeurIPS 2023	B1. Adversarial Examples	Diffusion Model	generate stealthy adversarial samples

Kindly request the inclusion

Could you please add the following 3 privacy-related works? I believe they are very valuable. Thank you for maintaining this great repo!

Title	Link	Code	Venue	Classification	Model	Comment
PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding	Link	coming soon	ACL 2024	C3. Data Reconstruction	LLM	-
ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets	Link	coming soon	ArXiv (under review)	C6. Privacy-Preserving Computation	LLM	-
IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization	Link	coming soon	ArXiv (under review)	C0. General	LLM	-

Inclusion of new paper

👍 Thank you for creating and maintaining such a great repository. I found that these works have not been included and hope they can be added.

Title	Link	Code	Venue	Classification	Model	Comment
Formalizing and Benchmarking Prompt Injection Attacks and Defenses	arxiv	github	USENIX'24	A7. Prompt Injection	LLM	Benchmark

What is the difference between Data Reconstruction and Extraction?

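I think Data Reconstruction refers to methods that partially reconstruct a private dataset from publicly available aggregate information. For example, when an open-source language model is further trained on private data, an attack on that private data would be Data Reconstruction (I am new to this field, so I am not sure whether this description is correct). However, I see that the paper [Extracting Training Data from Large Language Models] is listed under Data Reconstruction.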

Complementing CodeGen LLM

Thanks for your awesome collection of paper resources!

I wonder whether you would consider covering more SSP research on CodeGen LLMs (or whether such work belongs in this collection at all).

I am thinking especially of works targeting early-stage models of the LLM era. Here are some examples:

[SP 22] Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions
[CCS 23] Large Language Models for Code: Security Hardening and Adversarial Testing, etc.

Thanks again!

request for adding a new survey

Hi, thank you for this great repo. Could you please add this new survey? Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

Kindly request the inclusion

Thank you for this great paper collection! I would be delighted if my work could be incorporated into the repository; thank you!

Title	Link	Code	Venue	Classification	Model	Comment
DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning	https://arxiv.org/abs/2406.04197	https://github.com/THU-KEG/DICE	arXiv'24	C1. Contamination	LLM	A novel contamination detection method which leverages the internal states of LLMs to detect data contamination in fine-tune stage for math reasoning.
KoLA: Carefully Benchmarking World Knowledge of Large Language Models	https://openreview.net/forum?id=AqN23oqraW	https://kola.xlore.cn/	ICLR'24	C1. Contamination	LLM	A carefully designed evolving benchmark for evaluating LLMs' world knowledge. KoLA benchmark is evolving so that it can avoid the data contamination issue.

Kindly request the inclusion

Thank you for this great paper collection! It will be my pleasure if my work can be included in the repo; thanks!

Title	Link	Code	Venue	Classification	Model	Comment
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning	https://arxiv.org/abs/2311.13127	https://github.com/liuyixin-louis/MetaCloak	CVPR'24 Oral	B1. Adversarial Examples	Diffusion	a more robust protective perturbation framework for safeguarding portrait against customized diffusion models training
