Coder Social home page Coder Social logo

zwhe99 / x-sir Goto Github PK

View Code? Open in Web Editor NEW
24.0 2.0 2.0 225.65 MB

[ACL 2024] Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

Home Page: https://arxiv.org/abs/2402.14007

Python 92.91% Shell 7.09%
baichuan baichuan-7b baichuan2 baichuan2-7b large-language-models large-language-models-and-translation-systems llama llama2 machine-translation mistral

x-sir's Introduction

Logo

๐Ÿ’ง X-SIR: A text watermark that survives translation

arXiv Python 3.10

Implementaion of our paper:

Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

๐Ÿ”ฅ News

  • [Apr 8, 2024]: New repo released!

Conda environment

Tested on the following environment, but it should work on other versions.

  • python 3.10.10
  • pytorch
  • pip3 install -r requirements.txt
  • [optional] pip3 install flash-attn==2.3.3

Overview

  • src_watermark implements three text watermarking methods (x-sir, sir and kgw) with a unified interface.
  • attack contains two watermarking removal methods: paraphrase and translation
  • Scripts:
    • gen.py: generate text with watermark
    • detect.py: compute z-score for given texts
    • eval_detection.py: calculate AUC, TPR, and F1 for watermark detection
    • You can use --help to see full usage of these scripts.
  • Supported models:
    • meta-llama/Llama-2-7b-hf
    • baichuan-inc/Baichuan2-7B-Base
    • baichuan-inc/Baichuan-7B
    • mistralai/Mistral-7B-v0.1
  • Supported languages: English (En), German (De), French (Fr), Chinese (Zh), Japanese (Ja)
  • You can learn how to extend model and language in from-scratch.md.

Usage (No attack)

Generate text with watermark

MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/xsir/300_mapping_$MODEL_ABBR.json

WATERMARK_METHOD_FLAG="--watermark_method xsir  --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"

python3 gen.py \
    --base_model $MODEL_NAME \
    --fp16 \
    --batch_size 32 \
    --input_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
    --WATERMARK_METHOD_FLAG

Compute the z-scores

# Compute z-score for human-written text
python3 detect.py \
    --base_model $MODEL_NAME \
    --detect_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
    $WATERMARK_METHOD_FLAG

# Compute z-score for watermarked text
python3 detect.py \
    --base_model $MODEL_NAME \
    --detect_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl \
    $WATERMARK_METHOD_FLAG

Evaluation

python3 eval_detection.py \
	--hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
	--wm_zscore gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl

AUC: 0.994

TPR@FPR=0.1: 0.994
TPR@FPR=0.01: 0.862

F1@FPR=0.1: 0.955
F1@FPR=0.01: 0.921

Usage (With attack)

Here we test the watermark after translating to other languages (De, Fr, Zh, Ja).

Preparation

We use ChatGPT to perform paraphrase and translation. Therefore:

  • Set you openai api key: export OPENAI_API_KEY=xxxx
  • You may also want to modify the RPMs and TPMs in attack/const.py

Translation

TGT_LANGS=("de" "fr" "zh" "ja")
for TGT_LANG in "${TGT_LANGS[@]}"; do
    python3 attack/translate.py \
        --input_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
        --output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
        --model gpt-3.5-turbo-1106 \
        --src_lang en \
        --tgt_lang $TGT_LANG
done

Compute the z-scores

for TGT_LANG in "${TGT_LANGS[@]}"; do
    python3 detect.py \
        --base_model $MODEL_NAME \
        --detect_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
        --output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl \
        $WATERMARK_METHOD_FLAG
done

Evaluation

for TGT_LANG in "${TGT_LANGS[@]}"; do
    echo "En->$TGT_LANG"
    python3 eval_detection.py \
        --hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
        --wm_zscore gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl
done

En->de
AUC: 0.769

TPR@FPR=0.1: 0.318
TPR@FPR=0.01: 0.060

F1@FPR=0.1: 0.450
F1@FPR=0.01: 0.112

En->fr
AUC: 0.810

TPR@FPR=0.1: 0.354
TPR@FPR=0.01: 0.046

F1@FPR=0.1: 0.488
F1@FPR=0.01: 0.087

En->zh
AUC: 0.905

TPR@FPR=0.1: 0.702
TPR@FPR=0.01: 0.182

F1@FPR=0.1: 0.781
F1@FPR=0.01: 0.305

En->ja
AUC: 0.911

TPR@FPR=0.1: 0.696
TPR@FPR=0.01: 0.112

F1@FPR=0.1: 0.775
F1@FPR=0.01: 0.200

Usage (With attack)

You can use the following flags to specify the watermarking method:

KGW

WATERMARK_METHOD_FLAG="--watermark_method kgw"

SIR

MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/sir/300_mapping_$MODEL_ABBR.json

WATERMARK_METHOD_FLAG="--watermark_method sir  --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"

Acknowledgement

This work can not be done without the help of the following repos:

Citation

@article{he2024can,
  title={Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models},
  author={He, Zhiwei and Zhou, Binglin and Hao, Hongkun and Liu, Aiwei and Wang, Xing and Tu, Zhaopeng and Zhang, Zhuosheng and Wang, Rui},
  journal={arXiv preprint arXiv:2402.14007},
  year={2024}
}

x-sir's People

Contributors

zwhe99 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

zhou-bl cooelf

x-sir's Issues

OSError: Unable to load weights from pytorch checkpoint file for xxx, If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

image

OSError: Unable to load weights from pytorch checkpoint file for '/home/chris/.cache/huggingface/hub/models--baichuan-inc--Baichuan-7B/snapshots/5d86e56a58fe4a5b3292cd9bb7468afef6f93eab/pytorch_model.bin' at '/home/chris/.cache/huggingface/hub/models--baichuan-inc--Baichuan-7B/snapshots/5d86e56a58fe4a5b3292cd9bb7468afef6f93eab/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.