This GitHub repository contains a training and testing dataset for a near-duplicate image detection model built with convolutional autoencoders (CAE) and convolutional neural networks (CNN). Each image in the dataset is labeled as either original or near-duplicate.
The repository includes two main folders:
- train folder, which contains the original and near-duplicate images used to train the model.
- test folder, which contains the original and near-duplicate images used to evaluate the performance of the model.
The images in the dataset are in JPG format. They were collected from various sources and vary in image type, subject matter, and resolution.
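
As a quick way to check the layout described above, the following minimal sketch collects the JPG files found under the train and test folders. It is only an illustration: the function name `list_images` is hypothetical, the accepted extensions (`.jpg`, `.jpeg`, in any case) are an assumption, and the search descends into subfolders because this description does not say how the original and near-duplicate labels are stored.

```python
from pathlib import Path

# Folder names taken from the repository description.
SPLITS = ("train", "test")
# Assumed extensions; the description only says "JPG format".
JPG_SUFFIXES = {".jpg", ".jpeg"}


def list_images(root):
    """Return {split: sorted list of JPG paths} for each split folder under root."""
    root = Path(root)
    images = {}
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            # Missing folder: report an empty split instead of failing.
            images[split] = []
            continue
        # rglob also walks subfolders, in case images are grouped by label.
        images[split] = sorted(
            p for p in split_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in JPG_SUFFIXES
        )
    return images


if __name__ == "__main__":
    # Run from the repository root to print the image count per split.
    for split, paths in list_images(".").items():
        print(f"{split}: {len(paths)} images")
```

Running the script from the repository root prints how many images each folder holds, which is a simple sanity check before training or evaluation.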
The purpose of this dataset is to train and evaluate a CAE- and CNN-based model that identifies near-duplicate images in a given image collection. It can be useful for researchers and practitioners working on image processing, computer vision, and machine learning.