Watermarking Machine-Generated Text

This repository contains the code for our senior project "Watermarking Machine-Generated Text".

Final Notebook
Notebook

No Attacks

Description	Notebooks	Datasets
1. Create a dataset of Unwatermarked Text	Notebook	Dataset
2. Test Watermarking and Detection Algorithms	Logits Deviation with Green-Red List; Sampling With Randomized Numbers using TinyLlama; Logits Deviation with Randomized Numbers using OPT-350M	N/A
3. Automate Watermarking Using the Sunbird Dataset	Part 1 Part 2 Part 3 Merge	Part 1 Part 2 Part 3 Merged
4. Merge Unwatermarked and Watermarked Datasets	Merge Truncate	Merged Truncated
5. Run the Detection Algorithm on the first 1200 rows of the Truncated Dataset	Part 1 Part 2 Part 3 Merge	Part 1 Part 2 Part 3 Merged
6. Evaluate the Accuracy, Precision, Recall and F1-score of the Detection Algorithm	Notebook	N/A
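The table names a "Logits Deviation with Green-Red List" approach but does not describe how watermarking and detection work. For orientation only, the sketch below follows the general green/red-list technique. The previous token seeds a pseudo-random split of the vocabulary, green-list logits are pushed up before decoding, and detection counts how many tokens land in their green list, then applies a one-proportion z-test. The values of `GAMMA` and `DELTA`, the SHA-256 seeding, and the toy `generate` decoder (random logits instead of TinyLlama or OPT-350M) are assumptions for illustration, not the project's actual settings.

```python
import hashlib
import math
import random

# Hypothetical parameters (not taken from the project's notebooks):
GAMMA = 0.5  # fraction of the vocabulary placed in the green list
DELTA = 2.0  # bias added to green-list logits


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> set:
    """Pseudo-randomly split the vocabulary, seeded by the previous token id."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % 2**32
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits: list, prev_token: int, delta: float = DELTA) -> list:
    """Watermarking: shift every green-list logit up by delta before decoding."""
    greens = green_list(prev_token, len(logits))
    return [x + delta if i in greens else x for i, x in enumerate(logits)]


def z_score(tokens: list, vocab_size: int, gamma: float = GAMMA) -> float:
    """Detection: one-proportion z-test on the number of green-list tokens."""
    n = len(tokens) - 1  # the first token has no predecessor to seed from
    hits = sum(
        tok in green_list(prev, vocab_size, gamma)
        for prev, tok in zip(tokens, tokens[1:])
    )
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def generate(length: int, vocab_size: int, seed: int = 0) -> list:
    """Toy greedy decoder: random Gaussian logits stand in for a real model."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    for _ in range(length - 1):
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        biased = bias_logits(logits, tokens[-1])
        tokens.append(max(range(vocab_size), key=biased.__getitem__))
    return tokens


watermarked = generate(200, vocab_size=100, seed=1)
rng = random.Random(2)
unwatermarked = [rng.randrange(100) for _ in range(200)]
print(round(z_score(watermarked, 100), 2), round(z_score(unwatermarked, 100), 2))
```

The biased sequence should score far above the unbiased one. A threshold such as z > 4 is one common choice, though the notebooks may use a different decision rule.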

Simulating and Counteracting Attacks

1. Paraphrasing Attack

Description	Notebooks	Datasets
1. Experiment with paraphrasing using Dipper	Notebook	N/A
2. Experiment with paraphrasing the Watermarked Dataset using T5	Notebook	Dataset
3. Run the Detection Algorithm on the T5-Paraphrased Dataset	Part 1 Part 2 Part 3 Merge	Part 1 Part 2 Part 3 Merged
4. Evaluate the Accuracy, Precision, Recall and F1-score of the Detection Algorithm after T5 Paraphrasing	Notebook	N/A
5. Experiment with round-trip paraphrasing (English to French and back to English) on the Watermarked Dataset	Notebook	Dataset
6. Run the Detection Algorithm on the Round-Trip-Paraphrased Dataset	Part 1 Part 2 Part 3 Merge	Part 1 Part 2 Part 3 Merged
7. Evaluate the Accuracy, Precision, Recall and F1-score of the Detection Algorithm after Round-Trip (RT) Paraphrasing	Notebook	N/A
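The evaluation steps report accuracy, precision, recall and F1-score. A minimal sketch of how these follow from the confusion matrix, assuming label 1 means "watermarked" and the detector emits one binary prediction per text. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(y_true: list, y_pred: list) -> dict:
    """Accuracy, precision, recall and F1, treating label 1 as 'watermarked'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # watermarked, detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # watermarked, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = detection_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0])
print(metrics)
```

In this toy example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so every metric equals 0.75. scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score` compute the same quantities.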

2. Homoglyph Attack

Description	Notebooks	Datasets
1. Simulate Homoglyph Attack on the Watermarked Dataset	Notebook	Dataset
2. Run the Detection Algorithm on the Homoglyph-Attacked Dataset	Part 1 Part 2 Part 3 Merge	Part 1 Part 2 Part 3 Merged
3. Evaluate the Accuracy, Precision, Recall and F1-score of the Detection Algorithm after applying Homoglyph Attack	Notebook	N/A
4. Counteract the Homoglyph Attack	Notebook	Dataset
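A homoglyph attack swaps Latin letters for visually identical characters from other scripts. The tokenizer then produces different token ids, which can break the green-list statistics even though the text looks unchanged. The sketch below shows a simulation and a simple reversal. The small Cyrillic lookup table and the `rate` parameter are assumptions for illustration. A fuller countermeasure could draw on the Unicode confusables data (UTS #39) instead of a hand-written map.

```python
import random

# Hypothetical lookalike table: Latin letter -> Cyrillic homoglyph.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}
REVERSE = str.maketrans({v: k for k, v in HOMOGLYPHS.items()})


def homoglyph_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace each mappable Latin letter with its homoglyph with probability `rate`."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def counteract_homoglyphs(text: str) -> str:
    """Map known homoglyphs back to their Latin counterparts."""
    return text.translate(REVERSE)


original = "the watermark appears in every sentence"
attacked = homoglyph_attack(original, rate=1.0)
print(attacked == original, counteract_homoglyphs(attacked) == original)
```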

3. Zero-Width Attack

Description	Notebooks	Datasets
1. Simulate the Zero-Width Attack on watermarked text	Notebook	Dataset
2. Run the Detection Algorithm on the first 100 samples from the Zero-width Attacked Dataset	Part 1 Part 2 Part 3 Merge	Part 1 Part 2 Part 3 Merged
3. Evaluate the Accuracy, Precision, Recall and F1-score of the Detection Algorithm after Zero-Width Attack without counteracting	Notebook	N/A
4. Counteract the Zero-Width Attack	Notebook	Dataset
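Zero-width characters are invisible when rendered but still reach the tokenizer. A minimal sketch of the attack and its removal, assuming the attacker draws from the five code points listed below (the exact set used in the notebook is not stated). One design caveat: ZERO WIDTH JOINER is legitimate inside emoji sequences and some scripts, so stripping it unconditionally can alter genuine text.

```python
import random

# Assumed set of invisible code points inserted by the attack.
ZERO_WIDTH = [
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE
]
STRIP_TABLE = {ord(c): None for c in ZERO_WIDTH}


def zero_width_attack(text: str, rate: float = 0.5, seed: int = 0) -> str:
    """After each character, insert a random zero-width character with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(rng.choice(ZERO_WIDTH))
    return "".join(out)


def counteract_zero_width(text: str) -> str:
    """Delete every zero-width character before running detection."""
    return text.translate(STRIP_TABLE)


original = "watermarked text"
attacked = zero_width_attack(original, rate=1.0)
print(len(original), len(attacked), counteract_zero_width(attacked) == original)
```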

4. Bidirectional Reordering Attack

Description	Notebooks	Datasets
1. Simulate the Bidirectional Reordering Attack on watermarked text	Notebook	Dataset
2. Run the Detection Algorithm on the first few samples from the Bidirectional Reordering Attack Dataset	Part 1	Part 1
3. Use an RTL-language detector to determine whether Bidi characters are unnecessary	Notebook	Dataset

Note: The proposed countermeasure is to detect whether the text is written in a Right-to-Left (RTL) language and then check it for bidirectional reordering characters. If the text is in a Left-to-Right (LTR) language but still contains Bidi characters, it is flagged as manipulated.
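The countermeasure in the note can be sketched as follows. As an assumption, this version replaces a language detector with a character-level heuristic: text counts as RTL when enough of its letters have the Unicode bidirectional class `R` or `AL` (Hebrew, Arabic, and similar scripts). The 0.3 threshold and the function names are hypothetical. The control-character set is the explicit formatting characters defined by the Unicode Bidirectional Algorithm (UAX #9).

```python
import unicodedata

# Explicit bidirectional formatting characters (Unicode UAX #9).
BIDI_CONTROLS = set(
    "\u200e\u200f\u061c"        # LRM, RLM, ALM
    "\u202a\u202b\u202c"        # LRE, RLE, PDF
    "\u202d\u202e"              # LRO, RLO
    "\u2066\u2067\u2068\u2069"  # LRI, RLI, FSI, PDI
)


def is_rtl_text(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical heuristic: RTL if enough letters are strong right-to-left."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    rtl = sum(unicodedata.bidirectional(ch) in ("R", "AL") for ch in letters)
    return rtl / len(letters) >= threshold


def is_bidi_manipulated(text: str) -> bool:
    """Flag LTR text that nevertheless contains Bidi control characters."""
    has_controls = any(ch in BIDI_CONTROLS for ch in text)
    return has_controls and not is_rtl_text(text)


print(is_bidi_manipulated("hello \u202eworld"))  # LTR text with an RLO override
print(is_bidi_manipulated("\u05e9\u05dc\u05d5\u05dd \u202bworld\u202c"))  # Hebrew + embedding
```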

5. Spelling Mistakes Attack (Discrete Alterations)

Description	Notebooks	Datasets
1. Simulate Spelling Mistakes Attack on watermarked text	Notebook	Dataset
2. Apply a pretrained spellcheck model to the Misspelled Dataset	Notebook
3. Run the Detection Algorithm on the dataset before and after spellcheck	Before Spellcheck After Spellcheck	N/A

Note: Detection accuracy was 83% before spellcheck and dropped significantly to 54% after it. This may be due to the method we used to introduce spelling mistakes (it simulated typos such as omitting a letter or swapping any two letters in a word), or to minor hallucinations by the language model used for spellchecking.
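The two typo types mentioned in the note, omitting a letter and swapping any two letters, can be simulated as below. This is a sketch under assumptions: the per-word `rate`, the 50/50 split between typo types, and leaving words shorter than three characters untouched are illustrative choices, not the notebook's settings. It also hints at why spellchecking may not restore the original tokens. An arbitrary swap can yield a different valid word, or one that a spellchecker "corrects" to something other than the original.

```python
import random


def add_typo(word: str, rng: random.Random) -> str:
    """Either drop one letter or swap two arbitrary letters (assumed 50/50 split)."""
    if len(word) < 3:
        return word  # assumption: leave very short words untouched
    chars = list(word)
    if rng.random() < 0.5:
        del chars[rng.randrange(len(chars))]  # forget a letter
    else:
        i, j = rng.sample(range(len(chars)), 2)  # swap any two letters
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def spelling_attack(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Apply a typo to each space-separated word with probability `rate`."""
    rng = random.Random(seed)
    return " ".join(
        add_typo(w, rng) if rng.random() < rate else w for w in text.split(" ")
    )


text = "watermarking machine generated text is hard to remove"
attacked = spelling_attack(text, rate=1.0, seed=3)
print(attacked)
```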

6. Unnecessary Whitespace Attack (Tokenization Attack)

Description	Notebooks	Datasets
1. Simulate the Unnecessary Whitespace Attack on watermarked text and undo it (disadvantage: removes all newlines)	Notebook	Dataset
2. Remove unnecessary whitespace from the dataset (watermarked and unwatermarked)	Notebook	Dataset
3. Run the Detection Algorithm on the modified dataset	Before After	N/A
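Row 1 notes that undoing the attack removes all newlines. The sketch below illustrates one way that happens. A cleanup built on `str.split()` treats newlines as ordinary whitespace. It also shows a line-by-line alternative that collapses only runs of spaces and tabs. The attack parameters and function names are hypothetical, and this is not necessarily how the notebook implements either step.

```python
import random
import re


def whitespace_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace single spaces with 2 to 4 spaces with probability `rate`."""
    rng = random.Random(seed)
    return re.sub(
        " ", lambda m: " " * rng.randint(2, 4) if rng.random() < rate else " ", text
    )


def undo_naive(text: str) -> str:
    """Collapse all whitespace; newlines are lost as a side effect."""
    return " ".join(text.split())


def undo_preserving_newlines(text: str) -> str:
    """Collapse spaces and tabs within each line, keeping line breaks."""
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n"))


original = "First line of text.\nSecond line here."
attacked = whitespace_attack(original, rate=1.0)
print(repr(undo_naive(attacked)))
print(repr(undo_preserving_newlines(attacked)))
```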

Evaluating LLM With and Without Watermark

Description	Notebooks	Datasets
1. Evaluate OPT-350M (Without Watermark) using the HellaSwag Dataset	MCQ (FAIL); Auto-Complete (FAIL); Minimum Loss; MMLU Notebook Adaptation; MMLU Notebook Adaptation Results Analysis; MMLU Adaptation with one example and Analysis; MMLU one example F1-Score, Accuracy, Precision, Recall	MCQ; Auto-Complete; MMLU Notebook Adaptation Output; MMLU Notebook Adaptation Metrics Input; MMLU Adaptation with one example
2. Evaluate OPT-350M (With Watermark) using the HellaSwag Dataset	Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Part 9 Part 10 Part 11 Part 12 Part 13 Part 14 Part 15 Merge and Analysis	Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Part 9 Part 10 Part 11 Part 12 Part 13 Part 14 Part 15 Merged
3. Evaluate OPT-350M (Without Watermark) using the MMLU Dataset	MCQ Eval for all categories; MMLU Analysis F1-Score, Accuracy, Precision, Recall	N/A
4. Evaluate OPT-350M (With Watermark) using the MMLU Dataset	Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Part 9 Part 10 Part 11 Part 12 Part 13 Part 14 Part 15 Part 16 Part 17 Part 18 Part 19 Part 20 Part 21 Merge	Part 1 Part 2 Part 3 Part 4 Part 5 Part 6 Part 7 Part 8 Part 9 Part 10 Part 11 Part 12 Part 13 Part 14 Part 15 Part 16 Part 17 Part 18 Part 19 Part 20 Part 21 Merged
5. Evaluate OPT-350M (Without Watermark) using the TruthfulQA Dataset	Evaluation using ROUGE, BLEU, BLEURT	N/A
6. Evaluate OPT-350M (With Watermark) using the TruthfulQA Dataset	Evaluation using ROUGE, BLEU, BLEURT	N/A
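The TruthfulQA rows score generated answers against reference answers with ROUGE, BLEU and BLEURT. To show what one of these measures, here is a minimal ROUGE-L F1 sketch based on the longest common subsequence of whitespace-separated, lower-cased tokens. This simplified tokenization is an assumption. Packages such as `rouge-score` apply their own tokenization and optional stemming, so their numbers can differ from this sketch.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, using one rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))
```

In this example the longest common subsequence is "the cat on the mat" (5 of 6 tokens on each side), so precision, recall and F1 are all 5/6.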

Contributors

For any inquiries, please reach out to us at
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

nadatelwazane / watermarking-machine-generated-text