
Analysis of Synthetic Image Detection Techniques

Description

This project, developed as part of my Master's thesis in Computer Engineering at the Universidad Complutense de Madrid (UCM), focuses on the analysis of techniques for detecting synthetic images generated by latent diffusion models. It leverages binary classifiers based on convolutional neural networks with ResNet architectures of varying depth (ResNet50, ResNet34, ResNet18) to differentiate between real and synthetic images. The synthetic images are generated by various diffusion models, and the classifiers are trained using several datasets, including FFHQ, IMDB-WIKI, DeepFakeFace, and a custom dataset named "own" created with Stable Diffusion v2.1.

Setup and Requirements

  • Python 3.8 or above
  • PyTorch 1.7 or above
  • Access to a CUDA-capable GPU (Recommended)

Dataset

The dataset directory, located in the main project folder, contains three subdirectories: train, test, and val.

  • Train and Val: Both train and val directories are structured identically and include two subfolders:
    • 0_real - Contains images from real-image datasets.
    • 1_fake - Contains images from synthetic datasets.

These folders are used to train and validate the model. You can fill them with images from a specific dataset for targeted classifier training or from all available datasets for a more generalized approach.

  • Test: The test directory is organized differently to facilitate performance comparison across various synthetic datasets. It includes a collection for each dataset, such as:
    • inpainting
    • insight
    • own
    • text2img

Each of these subdirectories should be further divided into 0_real and 1_fake subfolders, mirroring the training and validation structure, so that testing is comprehensive and representative of real-world scenarios.

Ensure that all images are placed correctly according to the specified folder structure before beginning model training or evaluation. This structure is pivotal for the accurate operation of the classifiers developed in this project.

Example Tree

dataset/
├── test/
│   ├── inpainting/
│   │   ├── 0_real/
│   │   └── 1_fake/
│   ├── insight/
│   │   ├── 0_real/
│   │   └── 1_fake/
│   ├── own/
│   │   ├── 0_real/
│   │   └── 1_fake/
│   └── text2img/
│       ├── 0_real/
│       └── 1_fake/
├── train/
│   ├── 0_real/
│   └── 1_fake/
└── val/
    ├── 0_real/
    └── 1_fake/
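
With this layout in place, the train and val splits can be read directly by a standard torchvision ImageFolder loader. The sketch below illustrates this under that assumption; the repository's own data pipeline, image sizes, and normalization may differ.

import torch
from torchvision import datasets, transforms

# Illustrative preprocessing only; the project may use other sizes or normalization.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# ImageFolder assigns labels alphabetically: 0_real -> 0, 1_fake -> 1.
train_set = datasets.ImageFolder("dataset/train", transform=transform)
val_set = datasets.ImageFolder("dataset/val", transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False, num_workers=4)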

Training

To train a classifier once the training data is set up in the dataset folder, choose the ResNet architecture you want to use (ResNet50, ResNet34, or ResNet18) and run the corresponding command:

  • For ResNet50, use: python train50.py
  • For ResNet34, use: python train34.py
  • For ResNet18, use: python train18.py

Important Command Line Arguments

Below are some of the key arguments that can be passed to the training script (they are also documented within the code; a sketch of how the blur and JPEG augmentation options typically behave follows this list):

  • --name - Sets the name of the model.
  • --blur_prob - Sets the probability of applying blur as a data augmentation method (0.5 is an example).
  • --blur_sig - Sets the range of blur intensity to apply (0.0 to 3.0).
  • --jpg_prob - Sets the probability of applying JPEG compression as a data augmentation method (0.5 is an example).
  • --jpg_method - Chooses the method of JPEG compression (can be cv2, pil, or both).
  • --jpg_qual - Sets the range of JPEG compression quality (30 is the lowest, 100 is the highest quality, which means no compression).
  • --dataroot - Specifies the path to the dataset folder.
  • --loss_freq - Determines how often to calculate and log the loss during training.
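
The blur and JPEG options implement the data-augmentation strategy of the CNN classifier training structure this project builds on. The sketch below shows one common way such options are applied to a single image; it is an illustration only, and the repository's own augmentation code may differ.

import random
import cv2

def augment(img, blur_prob=0.5, blur_sig=(0.0, 3.0), jpg_prob=0.5, jpg_qual=(30, 100)):
    """Apply random Gaussian blur and/or JPEG recompression to an HxWx3 uint8 BGR image."""
    if random.random() < blur_prob:
        sigma = random.uniform(*blur_sig)
        if sigma > 0:
            # Kernel size (0, 0) lets OpenCV derive the kernel from sigma.
            img = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma)
    if random.random() < jpg_prob:
        quality = random.randint(*jpg_qual)
        ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, quality])
        if ok:
            img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return img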

Example Command

Here is an example command to start training with ResNet50:

python train50.py --name every50 --blur_prob 0.5 --blur_sig 0.0,3.0 --jpg_prob 0.5 --jpg_method cv2,pil --jpg_qual 30,100 --dataroot ./dataset/ --loss_freq 50

Monitoring Training with TensorBoard

To monitor your training sessions with TensorBoard, point it at the checkpoints subdirectory that matches the --name used during training. Here's an example command to launch TensorBoard and monitor the training process:

tensorboard --logdir=E:/Documents/Workspace/ai-generated-image-detector/checkpoints/every50
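
The training scripts are expected to write TensorBoard event files into that checkpoints folder, for example via torch.utils.tensorboard. A minimal logging sketch, with the writer path and tag name chosen here purely for illustration:

from torch.utils.tensorboard import SummaryWriter

# Write events into the same folder that is later passed to --logdir.
writer = SummaryWriter("checkpoints/every50")
for step, loss_value in enumerate([0.70, 0.55, 0.41]):  # placeholder loss values
    writer.add_scalar("train/loss", loss_value, global_step=step)
writer.close()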

Evaluation

After training your model, the next step is to evaluate its performance. This section guides you through the process of preparing and running the evaluation.

Preparing the Model for Evaluation

  1. Locate the Checkpoints: Navigate to the checkpoints directory where you'll find folders for different trained models. Inside each model's folder, you will see several .pth files representing the weights of the neural network at various training epochs, along with the best performing model (model_epoch_best.pth) and the most recently trained model (model_epoch_latest.pth).

  2. Select and Prepare the Model: Typically, you would choose the best performing model for evaluation. Copy this model into the weights directory and rename it according to the model's training configuration, for instance, every50.pth for the every50 training setup.

    Example:

    cp checkpoints/every50/model_epoch_best.pth weights/every50.pth
  3. Configure the Evaluation Script: Edit the eval_config.py script to specify which model to evaluate. Set the model_path variable to point to the appropriate model in the weights directory:

    # model
    model_path = 'weights/every50.pth'
  4. Run the Evaluation: Execute the evaluation script corresponding to the ResNet architecture used during training (ResNet50, ResNet34, or ResNet18). For example, to evaluate a model trained with the ResNet50 architecture, use the following command:

    python eval50.py

This setup ensures that you evaluate the model using the correct configuration and weights file. Follow these steps to effectively assess the performance of your trained models. Results can be found in .csv files inside the results folder.
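
For reference, the sketch below shows roughly what such an evaluation amounts to: load the ResNet50 weights, score each test subset, and write the accuracies to a CSV file. The checkpoint format, metrics, and file names used by the actual eval50.py may differ; weights/every50.pth follows the example above.

import csv
import os
import torch
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(num_classes=1)  # single-logit binary classifier
model.load_state_dict(torch.load("weights/every50.pth", map_location=device))
model.to(device).eval()

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

rows = [["subset", "accuracy"]]
for subset in ["inpainting", "insight", "own", "text2img"]:
    data = datasets.ImageFolder(f"dataset/test/{subset}", transform=transform)
    loader = torch.utils.data.DataLoader(data, batch_size=64)
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            pred = (torch.sigmoid(model(x.to(device))).squeeze(1) > 0.5).long().cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    rows.append([subset, correct / total])

os.makedirs("results", exist_ok=True)
with open("results/every50.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)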

Frequency Analysis

Frequency analysis allows inspection of image data in the frequency domain to distinguish between real and synthetic image datasets. It is crucial that the images are square and of the same size across datasets. Approximately 2000 images per dataset are recommended: using many more can exhaust memory, while using far fewer may produce unrepresentative results.

Preparing the Data

Before performing frequency analysis, populate the Data folder with samples from each dataset (real or synthetic) you wish to analyze. Ensure all images are square and of the same dimensions. The structure should look like this:

frequency_analysis/
├── Data/
│   ├── ffhq/
│   ├── imdb-wiki/
│   ├── inpainting/
│   ├── insight/
│   ├── own/
│   └── text2img/
├── Results/
└── src/

Running the Analysis

You can perform different types of frequency analyses using the following commands. Each analysis type uses specific parameters and may require different preprocessing steps:

  • Fast Fourier Transform Analysis:

    python frequency_analysis.py Data Results fft --img-dirs ffhq inpainting insight own text2img --log --vmin 1e-5 --vmax 1e-1
  • High-Pass Fast Fourier Transform Analysis:

    python frequency_analysis.py Data Results fft_hp --img-dirs ffhq inpainting insight own text2img --log --vmin 1e-5 --vmax 1e-1
  • Discrete Cosine Transform Analysis:

    python frequency_analysis.py Data Results dct --img-dirs ffhq inpainting insight own text2img --log --vmin 1e-5 --vmax 1e-1
  • Density Frequency Analysis:

    python frequency_analysis.py Data Results density --img-dirs ffhq inpainting insight own text2img --log --vmin 1e-5 --vmax 1e-1

These commands analyze the frequency components of the images, providing insights into the artefacts that synthetic image generation methods introduce compared to real images.
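
To illustrate the core idea behind the fft mode, the sketch below averages the log-magnitude spectrum over a folder of equally sized grayscale images; the repository's frequency_analysis.py additionally supports high-pass filtering, DCT, and density modes and handles plotting with the --vmin/--vmax limits.

import glob
import numpy as np
from PIL import Image

def mean_log_spectrum(folder):
    """Average log-magnitude FFT spectrum over all images in a folder (same size, square)."""
    spectra = []
    for path in sorted(glob.glob(f"{folder}/*")):
        img = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
        fft = np.fft.fftshift(np.fft.fft2(img))      # move zero frequency to the center
        spectra.append(np.log(np.abs(fft) + 1e-12))  # log magnitude for visualization
    return np.mean(spectra, axis=0)

real_spectrum = mean_log_spectrum("Data/ffhq")
fake_spectrum = mean_log_spectrum("Data/own")
# Differences between the two spectra highlight grid-like or high-frequency
# artefacts typical of generative models.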

Viewing Results

After running the analysis, the results will be stored in the Results directory. Review these to understand the frequency properties and anomalies of the datasets analyzed.

Dataset Generator

The dataset_generator.py script is designed to generate synthetic images using Stable Diffusion v2.1 and can be found in the stable_diffusion folder. This allows for the creation of a diverse set of images that can be used for training machine learning models.

Configuration

You can customize the generation process by editing attributes such as race, eye color, hair length, and background settings. These attributes can be adjusted to ensure a varied dataset that covers a wide range of scenarios.

''' PARAMS '''
num_images = 600  # Total number of images to generate
num_rename = 10000  # Starting number used when naming the generated files

# Define attributes and weights
race_options = [('asian', 20), ('african', 20), ('hindu', 20), ('caucasian', 40)]
glasses_options = [('glasses', 30), ('no glasses', 70)]
...

The script utilizes a dynamic prompt structure to generate images, incorporating selected attributes:

# Modify prompt at will
prompt = f"a centered portrait of a {facial_expression} {race} {age_n_gender} with {eye_color}, {hair_length}, {glasses} and {freckles}, in a {background}"
return prompt
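
For context, here is a hedged sketch of how such prompts might drive generation with Stable Diffusion v2.1 through the diffusers library. The model ID, output folder, and abridged prompt are illustrative; the actual dataset_generator.py may use a different pipeline or sampling setup.

import random
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

race_options = [("asian", 20), ("african", 20), ("hindu", 20), ("caucasian", 40)]
glasses_options = [("glasses", 30), ("no glasses", 70)]

def pick(options):
    """Draw one attribute value according to its weight."""
    values, weights = zip(*options)
    return random.choices(values, weights=weights, k=1)[0]

for i in range(600):  # num_images
    prompt = f"a centered portrait of a {pick(race_options)} person with {pick(glasses_options)}"
    image = pipe(prompt).images[0]
    image.save(f"output/{10000 + i}.png")  # num_rename sets the starting file number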

Image Distribution

After generating the synthetic images, the dataset_distribution.py script takes care of formatting and distributing them into appropriate datasets for use in training, validation, and testing phases. The distribution process ensures that all images conform to the same format and size standards, making them suitable for consistent model training and evaluation. Here's how the images are distributed:

  • Training dataset: 70% of the images are allocated here to provide a robust training base.
  • Validation dataset: 15% of the images are used for model tuning and hyperparameter adjustments during the validation phase.
  • Test dataset: The remaining 15% are reserved for final model testing to evaluate performance metrics.

This structured distribution aids in maintaining a balanced approach to model training and ensures that each phase of the model lifecycle is properly supported by adequate data.
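
Below is a minimal sketch of such a 70/15/15 split, assuming the generated images sit in a single source folder; the source path is illustrative, and the real dataset_distribution.py may also resize and rename the files.

import random
import shutil
from pathlib import Path

files = sorted(Path("stable_diffusion/output").glob("*.png"))  # illustrative source folder
random.shuffle(files)

n = len(files)
splits = {
    "train/1_fake": files[: int(0.70 * n)],
    "val/1_fake": files[int(0.70 * n): int(0.85 * n)],
    "test/own/1_fake": files[int(0.85 * n):],
}
for subdir, subset in splits.items():
    dest = Path("dataset") / subdir
    dest.mkdir(parents=True, exist_ok=True)
    for f in subset:
        shutil.copy(f, dest / f.name)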

License

This project is subject to copyright and licensing terms as established by the Faculty of Computer Sciences and the Universidad Complutense de Madrid (UCM). Usage, distribution, and modification are regulated under these terms, with the intent of promoting open and collaborative academic and educational practices. Additionally, this project incorporates elements from other shared projects, adhering to and respecting their respective licensing agreements. For detailed information regarding the license, please refer to the official documentation provided by the faculty.

Acknowledgments

I would like to express my gratitude to the authors of the various open-source projects utilized in this study; their generous sharing on platforms like GitHub has significantly facilitated this research. Specific acknowledgments are extended to the authors of "CNN-generated images are surprisingly easy to spot...for now" for their CNN classifier training structure, and to the authors of "Towards the Detection of Diffusion Model Deepfakes" for their frequency analysis tool.

Additionally, special thanks to the team behind the "Robustness and Generalizability of Deepfake Detection: A Study with Diffusion Models" project for providing the "DeepFakeFace" dataset, which has been integral to the development of this project.

In compliance with their respective licensing agreements, this project adheres to and respects the licenses of these third-party projects.

Authors

cabannas
