
Python package for measuring memorization in LLMs.

Home Page: https://iamgroot42.github.io/mimir/

License: MIT License



MIMIR

MIMIR logo

MIMIR - Python package for measuring memorization in LLMs.

Documentation is available here.


Instructions

First, install the Python dependencies:

pip install -r requirements.txt

Then, install our package:

pip install -e .

To use, run the scripts in scripts/bash.

Note: Intermediate results for the bash scripts are saved in tmp_results/ and tmp_results_cross/. If your experiment completes successfully, the results are moved into the results/ and results_cross/ directories.

Setting environment variables

You can either provide the following environment variables, or pass them via your config/CLI:

MIMIR_CACHE_PATH: Path to cache directory
MIMIR_DATA_SOURCE: Path to data directory
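
For example, in a shell (the paths below are placeholders; substitute your own directories):

```shell
# Hypothetical locations -- point these at wherever you keep your cache/data.
export MIMIR_CACHE_PATH="$HOME/mimir_cache"
export MIMIR_DATA_SOURCE="$HOME/mimir_data"

# Create the directories if they do not exist yet.
mkdir -p "$MIMIR_CACHE_PATH" "$MIMIR_DATA_SOURCE"
```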

Using cached data

The data we used for our experiments is available on Hugging Face Datasets. You can either load the data directly from Hugging Face with the load_from_hf flag in the config (preferred), or download the cache_100_200_.... folders into your MIMIR_CACHE_PATH directory.

Running MIA experiments

python run.py --config configs/mi.json

Attacks

We include and implement the following attacks, as described in our paper.

  • Likelihood (loss). Uses the likelihood of the target datapoint as the score.
  • Reference-based (ref). Normalizes the likelihood score with the score obtained from a reference model.
  • Zlib Entropy (zlib). Uses the zlib compression size of a sample to approximate its local difficulty.
  • Neighborhood (ne). Generates neighbors using an auxiliary model and measures the change in likelihood.
  • Min-K% Prob (min_k). Uses the k% of tokens with minimum likelihood for score computation.
  • Min-K%++ (min_k++). Uses the k% of tokens with minimum normalized likelihood for score computation.
  • Gradient Norm (gradnorm). Uses the gradient norm of the target datapoint as the score.
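
As a rough, self-contained illustration (not mimir's actual implementation), the simplest of these scores can be sketched from a sample's per-token log-likelihoods; the function names and the toy inputs here are ours:

```python
import zlib

def loss_score(token_logprobs):
    # Likelihood attack: average negative log-likelihood of the sample.
    return -sum(token_logprobs) / len(token_logprobs)

def zlib_score(text, token_logprobs):
    # Zlib attack: normalize the loss by the sample's zlib-compressed size,
    # a cheap proxy for the sample's intrinsic complexity.
    return loss_score(token_logprobs) / len(zlib.compress(text.encode()))

def min_k_score(token_logprobs, k=0.2):
    # Min-K% attack: average log-likelihood over the k% least likely tokens.
    n = max(1, int(len(token_logprobs) * k))
    return sum(sorted(token_logprobs)[:n]) / n
```

Lower (more negative) min-K% scores and higher loss scores suggest the sample is less familiar to the model; the package's attacks wire such scores into a full membership-inference evaluation.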

Adding your own dataset

To extend the package to your own dataset, you can load your data directly inside load_cached() in data_utils.py, or add an additional if-else branch within load() in data_utils.py if the data cannot easily be loaded from memory (or some other source). We will probably add a more general way to do this in the future.
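
As a sketch of the kind of loader such a branch might call (the function name, file format, and "text" field here are purely illustrative, not mimir's actual API):

```python
import json

def load_my_dataset(path):
    # Hypothetical helper: read one text sample per line from a JSONL file.
    # A branch added to load() in data_utils.py could dispatch to this.
    with open(path) as f:
        return [json.loads(line)["text"] for line in f]
```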

Adding your own attack

To add an attack, create a file for your attack (e.g. attacks/my_attack.py) and implement the interface described in attacks/all_attacks.py. Then, add a name for your attack to the dictionary in attacks/utils.py.

If you would like to submit your attack to the repository, please open a pull request describing your attack and the paper it is based on.
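
A minimal sketch of what a new attack might look like, assuming the interface roughly takes a document and its token probabilities and returns a scalar score (the real base class and method signatures live in attacks/all_attacks.py and may differ):

```python
class MyAttack:
    # Hypothetical attack class; mirror the actual interface in
    # attacks/all_attacks.py when implementing for real.
    def __init__(self, config, target_model):
        self.config = config
        self.target_model = target_model

    def attack(self, document, probs, **kwargs):
        # Score the document; here, a trivial mean-probability baseline.
        return sum(probs) / len(probs)
```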

Citation

If you use MIMIR in your research, please cite our paper:

@article{duan2024membership,
      title={Do Membership Inference Attacks Work on Large Language Models?}, 
      author={Michael Duan and Anshuman Suri and Niloofar Mireshghallah and Sewon Min and Weijia Shi and Luke Zettlemoyer and Yulia Tsvetkov and Yejin Choi and David Evans and Hannaneh Hajishirzi},
      year={2024},
      journal={arXiv:2402.07841},
}


mimir's Issues

Some issues in configs/mi.json

Hi,
I found your project helpful for my current paper, but I ran into some problems when trying python run.py --config configs/mi.json.

Problems:

  1. An error reports that split is NoneType, which can be fixed by adjusting mi.json as:
    "dataset_member": "the_pile_pile_cc_ngram_13_0.2",
    "dataset_nonmember": "the_pile_pile_cc_ngram_13_0.2",

  2. An error reports that there is no dataset_key, which can be fixed by adding this line to mi.json:
    "dataset_key": "the_pile",

I don't know whether my adjustments to mi.json are correct. If not, could you please give me some suggestions? If they are correct, I suggest adding a tutorial about config files to the GitHub project.

Thanks,
Ethan

Step by step guide/documentation?

Hi,

Thanks for sharing your code. I want to run the attacks you implemented on my own datasets, but I'm not sure where to start. Is there a guide or documentation that describes this?

Best,
Shane

Types of gradients computed by GradNormAttack

Hello,
Thanks for your valuable work on mimir!

If I understand correctly, GradNormAttack computes the average (across layers) of the gradient norm w.r.t. the model weights.

grad_norms.append(param.grad.detach().norm(p))

But the docstring indicates that the gradients are computed w.r.t. input tokens.

Gradient Norm Attack. Computes p-norm of gradients w.r.t. input tokens.

Since the original paper proposes both, I think there are two solutions:

  • Simply fixing the docstring and keeping the current implementation
  • Or implementing both gradients norms. I guess that computing gradients wrt input tokens would require modifying Model.get_probabilities()

The results in Appendix C.1 suggest that in certain settings, one gradient type outperforms the other, while in other settings, the reverse is observed.
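
As a toy illustration of the distinction (not mimir's code), consider a one-parameter model with loss L = (w*x - y)**2, where the two gradient types can be computed in closed form:

```python
def grad_norms(w, x, y):
    # Loss: L = (w*x - y)^2. For scalars, the p-norm is just the absolute value.
    err = w * x - y
    grad_w = 2 * err * x   # gradient w.r.t. the weight (current implementation)
    grad_x = 2 * err * w   # gradient w.r.t. the input (what the docstring claims)
    return abs(grad_w), abs(grad_x)
```

Even in this toy case the two norms differ whenever x != w, which is why the choice of gradient type can change attack behavior.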

What do you think?

Original dataset?

Hi,

Thank you for the great work! I was not able to find the original Pile subcategory datasets (arXiv, GitHub, etc.) in the Hugging Face data repo; there are only the processed ones (7-gram, 13-gram). Could you share the original ones as well?

Thank you!
