Watermarking Machine-Generated Text

This repository contains the code for our senior project "Watermarking Machine-Generated Text".

Final Notebook
Notebook

No Attacks

Description | Notebooks | Datasets
1. Create a dataset of unwatermarked text | Notebook | Dataset
2. Test watermarking and detection algorithms | Logits Deviation with Green-Red List; Sampling With Randomized Numbers using TinyLlama; Logits Deviation with Randomized Numbers using OPT-350M | N/A
3. Automate watermarking using the Sunbird dataset | Part 1, Part 2, Part 3, Merge | Part 1, Part 2, Part 3, Merged
4. Merge the unwatermarked and watermarked datasets | Merge, Truncate | Merged, Truncated
5. Run the detection algorithm on the first 1200 rows of the truncated dataset | Part 1, Part 2, Part 3, Merge | Part 1, Part 2, Part 3, Merged
6. Evaluate the accuracy, precision, recall and F1-score of the detection algorithm | Notebook | N/A
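The green-red list scheme tested in step 2 can be illustrated with a minimal sketch. It follows the general green/red list idea of Kirchenbauer et al.: the previous token seeds a pseudo-random split of the vocabulary into a "green" fraction gamma and a "red" remainder, generation favors green tokens, and detection counts the green tokens |s|_G among T scored tokens and computes z = (|s|_G - gamma*T) / sqrt(T*gamma*(1 - gamma)). The hashing rule, the key, gamma = 0.25, the threshold of 4 and the helper names `green_list` and `detect` are assumptions for illustration; this is not necessarily how the project's notebooks seed or score the lists.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float, key: int = 42) -> set[int]:
    """Hypothetical green-list rule: seed a PRNG from (secret key, previous token)
    and take the first gamma * vocab_size IDs of a shuffled vocabulary."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def detect(tokens: list[int], vocab_size: int, gamma: float = 0.25,
           threshold: float = 4.0) -> tuple[float, bool]:
    """One-proportion z-test on the number of green tokens.

    Without a watermark each token is green with probability gamma,
    so the green count should stay close to gamma * T.
    """
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored < 1:
        raise ValueError("need at least two tokens")
    hits = sum(tokens[i] in green_list(tokens[i - 1], vocab_size, gamma)
               for i in range(1, len(tokens)))
    z = (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
    return z, z > threshold


if __name__ == "__main__":
    vocab, rng = 1000, random.Random(0)
    # Toy "hard" watermark: always sample the next token from the green list.
    wm = [rng.randrange(vocab)]
    for _ in range(200):
        wm.append(rng.choice(sorted(green_list(wm[-1], vocab, 0.25))))
    plain = [rng.randrange(vocab) for _ in range(201)]
    print(detect(wm, vocab))     # z is about 24.5, flagged
    print(detect(plain, vocab))  # z near 0, normally not flagged
```

A real generator would add a bias to the green logits instead of forcing green tokens, so watermarked text yields a smaller (but still large) z-score.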

Simulating and Counteracting Attacks

1. Paraphrasing Attack

Description | Notebooks | Datasets
1. Experiment with paraphrasing using Dipper | Notebook | N/A
2. Experiment with paraphrasing the watermarked dataset using T5 | Notebook | Dataset
3. Run the detection algorithm on the paraphrased dataset | Part 1, Part 2, Part 3, Merge | Part 1, Part 2, Part 3, Merged
4. Evaluate the accuracy, precision, recall and F1-score of the detection algorithm after paraphrasing | Notebook | N/A
5. Experiment with round-trip paraphrasing (English to French to English) on the watermarked dataset | Notebook | Dataset
6. Run the detection algorithm on the round-trip paraphrased dataset | Part 1, Part 2, Part 3, Merge | Part 1, Part 2, Part 3, Merged
7. Evaluate the accuracy, precision, recall and F1-score of the detection algorithm after round-trip paraphrasing | Notebook | N/A
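The evaluation steps report accuracy, precision, recall and F1-score. A minimal sketch of these metrics, assuming the ground truth and the detector's decisions are encoded as 1 for watermarked and 0 for unwatermarked text (the encoding and the function name are assumptions; scikit-learn's `sklearn.metrics` module provides equivalent functions):

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with 'watermarked' (1) as the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # watermarked, detected
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # clean, not flagged
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # clean, wrongly flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # watermark missed
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no positive predictions or no positives) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

An attack that succeeds mostly shows up as a drop in recall: watermarked texts that are no longer flagged become false negatives.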

2. Homoglyph Attack

Description | Notebooks | Datasets
1. Simulate a homoglyph attack on the watermarked dataset | Notebook | Dataset
2. Run the detection algorithm on the homoglyph-attacked text | Part 1, Part 2, Part 3, Merge | Part 1, Part 2, Part 3, Merged
3. Evaluate the accuracy, precision, recall and F1-score of the detection algorithm after the homoglyph attack | Notebook | N/A
4. Counteract the homoglyph attack | Notebook | Dataset
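A minimal sketch of simulating and undoing a homoglyph attack, assuming a small hypothetical table of Latin letters and their Cyrillic look-alikes (real attacks can draw on many more Unicode confusables). Unicode NFKC normalization does not fold Cyrillic letters into Latin ones, so the sketch undoes the attack with an explicit reverse mapping. The table, rate and helper names are assumptions; the project's notebooks may use a different approach.

```python
import random

# Hypothetical table: Latin letters -> visually identical Cyrillic letters.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
    "y": "\u0443",  # CYRILLIC SMALL LETTER U
}
# Reverse table for the countermeasure: Cyrillic look-alike -> Latin letter.
UNDO_TABLE = str.maketrans({cyr: lat for lat, cyr in HOMOGLYPHS.items()})


def homoglyph_attack(text: str, rate: float = 0.5, seed: int = 0) -> str:
    """Replace each eligible Latin letter with its look-alike with probability `rate`."""
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def counteract_homoglyphs(text: str) -> str:
    """Map the known look-alikes back to their Latin letters."""
    return text.translate(UNDO_TABLE)


if __name__ == "__main__":
    attacked = homoglyph_attack("a secure example", rate=1.0)
    print(attacked == "a secure example")  # False: same look, different code points
    print(counteract_homoglyphs(attacked) == "a secure example")  # True
```

Mapping every listed Cyrillic letter back to Latin would also corrupt genuine Russian or Ukrainian text, so in practice the reverse mapping is probably best applied only to words that otherwise consist of Latin letters.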

3. Zero-Width Attack

Description | Notebooks | Datasets
1. Simulate a zero-width attack on the watermarked text | Notebook | Dataset
2. Run the detection algorithm on the first 100 samples of the zero-width-attacked dataset | Part 1, Part 2, Part 3, Merge | Part 1, Part 2, Part 3, Merged
3. Evaluate the accuracy, precision, recall and F1-score of the detection algorithm after the zero-width attack, without counteracting it | Notebook | N/A
4. Counteract the zero-width attack | Notebook | Dataset
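A minimal sketch of the zero-width attack and its countermeasure, assuming the attacker inserts characters from a small set of invisible code points (zero-width space, zero-width non-joiner, zero-width joiner, word joiner and zero-width no-break space). The character set, the insertion rate and the helper names are assumptions.

```python
import random

# Invisible code points: ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER,
# ZERO WIDTH JOINER, WORD JOINER, ZERO WIDTH NO-BREAK SPACE.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"
_DELETE_ZERO_WIDTH = str.maketrans("", "", ZERO_WIDTH)


def zero_width_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """After each character, insert a random zero-width character with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(rng.choice(ZERO_WIDTH))
    return "".join(out)


def remove_zero_width(text: str) -> str:
    """Countermeasure: delete every character listed in ZERO_WIDTH."""
    return text.translate(_DELETE_ZERO_WIDTH)


if __name__ == "__main__":
    attacked = zero_width_attack("watermarked text", rate=0.5)
    print(len("watermarked text"), len(attacked))  # original vs attacked length
    print(remove_zero_width(attacked) == "watermarked text")  # True
```

Zero-width joiners and non-joiners are legitimate in some scripts (for example Persian) and in emoji sequences, so stripping them unconditionally can alter such text.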

4. Bidirectional Reordering Attack

Description | Notebooks | Datasets
1. Simulate a bidirectional reordering attack on the watermarked text | Notebook | Dataset
2. Run the detection algorithm on the first few samples of the bidirectional-reordering-attacked dataset | Part 1 | Part 1
3. Use an RTL language detector to determine whether the bidi characters are unnecessary | Notebook | Dataset

Note: The proposed countermeasure is to detect right-to-left (RTL) languages and then check the text for bidi reordering characters. If the text is in a left-to-right (LTR) language but contains bidi characters, it is flagged as manipulated.
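The countermeasure in the note above can be sketched with Python's standard `unicodedata` module. As a stand-in for a language detector, this sketch treats text as RTL if it contains letters whose Unicode bidirectional class is R or AL; that substitution, the list of control characters, the RLO-based attack and all helper names are assumptions rather than the project's implementation.

```python
import unicodedata

# Explicit bidi formatting characters: LRM, RLM, ALM, the embedding/override
# controls (LRE, RLE, PDF, LRO, RLO) and the isolates (LRI, RLI, FSI, PDI).
BIDI_CONTROLS = frozenset(
    "\u200e\u200f\u061c\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def bidi_attack(word: str) -> str:
    """Store the word reversed inside a right-to-left override (RLO ... PDF).
    It renders like the original word but is tokenized very differently."""
    return "\u202e" + word[::-1] + "\u202c"


def has_rtl_letters(text: str) -> bool:
    """True if the text has strong RTL letters (bidi class R or AL), ignoring
    the invisible marks RLM and ALM, which also carry those classes."""
    return any(
        unicodedata.bidirectional(ch) in ("R", "AL") and ch not in BIDI_CONTROLS
        for ch in text
    )


def is_bidi_manipulated(text: str) -> bool:
    """Flag text that uses bidi controls although it contains no RTL letters."""
    return any(ch in BIDI_CONTROLS for ch in text) and not has_rtl_letters(text)


if __name__ == "__main__":
    print(is_bidi_manipulated("a " + bidi_attack("watermark") + " test"))  # True
    print(is_bidi_manipulated("a watermark test"))  # False
```

A per-character check is stricter than the language-level rule in the note: mixed LTR text that quotes a Hebrew or Arabic word with isolates is not flagged, which seems to match the intent of only flagging unnecessary bidi characters.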

5. Spelling Mistakes Attack (Discrete Alterations)

Description | Notebooks | Datasets
1. Simulate a spelling-mistakes attack on the watermarked text | Notebook | Dataset
2. Apply a pretrained spellcheck model to the misspelled dataset | Notebook |
3. Run the detection algorithm on the dataset before and after spellcheck | Before Spellcheck, After Spellcheck | N/A

Note: Accuracy was 83% before spellcheck and dropped significantly to 54% after spellcheck. This may be due to the method we used to introduce spelling mistakes (simulated typos such as dropping a letter or swapping any two letters in a word), or to small hallucinations by the language model used for spellchecking.
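The typo simulation described in the note above (dropping a letter or swapping two letters in a word) can be sketched as follows. The per-word rate, the 50/50 choice between the two typo kinds, skipping words shorter than three letters or containing non-letters, and the name `add_typos` are assumptions, not the project's exact procedure.

```python
import random


def add_typos(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Introduce a typo into roughly `rate` of the eligible words."""
    rng = random.Random(seed)
    words = text.split(" ")
    for i, word in enumerate(words):
        # Skip short words and tokens with punctuation or digits attached.
        if len(word) < 3 or not word.isalpha() or rng.random() >= rate:
            continue
        chars = list(word)
        if rng.random() < 0.5:
            del chars[rng.randrange(len(chars))]  # forget a letter
        else:
            # Swap any two letters (a no-op if both letters are identical).
            a, b = rng.sample(range(len(chars)), 2)
            chars[a], chars[b] = chars[b], chars[a]
        words[i] = "".join(chars)
    return " ".join(words)


if __name__ == "__main__":
    print(add_typos("watermark detection works on generated text", rate=0.5))
```

Because swaps can land anywhere in a word, a spellchecker may "repair" a word into a different valid word, which would change the tokens the detector scores.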

6. Unnecessary Whitespace Attack (Tokenization Attack)

Description | Notebooks | Datasets
1. Simulate an unnecessary-whitespace attack on the watermarked text and undo it (disadvantage: undoing it removes all newlines) | Notebook | Dataset
2. Remove unnecessary whitespace from the dataset (watermarked and unwatermarked) | Notebook | Dataset
3. Run the detection algorithm on the modified dataset | Before, After | N/A
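Step 1 notes that undoing the attack removed all newlines. A minimal sketch of the attack and of an undo step that keeps line breaks, assuming the attack only pads existing spaces (attacks that split words with inserted spaces are not handled); the rate and helper names are hypothetical.

```python
import random
import re


def whitespace_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Pad some existing spaces with 1-3 extra spaces (illustrative attack)."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch == " " and rng.random() < rate:
            out.append(" " * rng.randint(1, 3))
    return "".join(out)


def remove_extra_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs to a single space line by line,
    so newlines survive. Leading indentation is also removed."""
    return "\n".join(
        re.sub(r"[ \t]+", " ", line).strip(" ") for line in text.split("\n")
    )


if __name__ == "__main__":
    attacked = whitespace_attack("Watermarks should survive.\nDo they?", rate=1.0)
    print(repr(attacked))
    print(remove_extra_whitespace(attacked))
```

The trade-off is that intentional double spaces and indentation are normalized too, which is usually harmless for prose but not for code or poetry.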

Evaluating LLM With and Without Watermark

Description | Notebooks | Datasets
1. Evaluate OPT-350M (without watermark) on the HellaSwag dataset | MCQ FAIL; Auto-Complete FAIL; Minimum Loss; MMLU Notebook Adaptation; MMLU Notebook Adaptation Results Analysis; MMLU Adaptation with one example and Analysis; MMLU one-example F1-Score, Accuracy, Precision, Recall | MCQ; Auto-Complete; MMLU Notebook Adaptation Output; MMLU Notebook Adaptation Metrics Input; MMLU Adaptation with one example
2. Evaluate OPT-350M (with watermark) on the HellaSwag dataset | Parts 1 to 15; Merge and Analysis | Parts 1 to 15; Merged
3. Evaluate OPT-350M (without watermark) on the MMLU dataset | MCQ Eval for all categories; MMLU Analysis F1-Score, Accuracy, Precision, Recall | N/A
4. Evaluate OPT-350M (with watermark) on the MMLU dataset | Parts 1 to 21; Merge | Parts 1 to 21; Merged
5. Evaluate OPT-350M (without watermark) on the TruthfulQA dataset | Evaluation using ROUGE, BLEU, BLEURT | N/A
6. Evaluate OPT-350M (with watermark) on the TruthfulQA dataset | Evaluation using ROUGE, BLEU, BLEURT | N/A
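The "Minimum Loss" notebook name suggests scoring each HellaSwag ending by the model's loss and choosing the lowest; that reading is an assumption. A minimal sketch of such a selection rule, assuming the per-token log-probabilities of each candidate ending (conditioned on the context) have already been computed with OPT-350M; that step is not shown, and the function names are hypothetical.

```python
def pick_min_loss(ending_logprobs: list[list[float]]) -> int:
    """Return the index of the ending with the lowest mean negative log-likelihood."""
    losses = [-sum(lp) / len(lp) for lp in ending_logprobs]
    return min(range(len(losses)), key=losses.__getitem__)


def min_loss_accuracy(questions: list[list[list[float]]], labels: list[int]) -> float:
    """Fraction of questions where the minimum-loss ending is the gold ending."""
    if len(questions) != len(labels) or not labels:
        raise ValueError("need one label per question")
    preds = [pick_min_loss(q) for q in questions]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Mean losses: 1.5, 0.5 and 3.0, so ending 1 is chosen.
    print(pick_min_loss([[-2.0, -1.0], [-0.5, -0.7, -0.3], [-3.0]]))
```

Averaging over tokens rather than summing is a design choice that avoids systematically favoring short endings.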

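For the TruthfulQA rows, a simplified ROUGE-L F1 score can be sketched from the longest common subsequence (LCS) of candidate and reference tokens: precision = LCS / candidate length, recall = LCS / reference length, and F1 = 2PR / (P + R). Lowercasing and whitespace tokenization are simplifying assumptions; packages such as Google's `rouge-score` use their own tokenization, so their scores may differ slightly. BLEU and BLEURT are not shown.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercased whitespace tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # LCS is "the cat on the mat" (5 tokens), so P = R = F1 = 5/6.
    print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```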
Contributors

For any inquiries, please reach out to us at
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

GitHub contributors: nadatelwazane, mariemmostafa, nadaehab31
