Art Unveiled: Enhancing Access to Arts for the Visually Impaired

The goal of this project is to train an object detection model on a variety of art images and obtain a final optimised model. That model provides labels for art images, and the labels are then passed to NLP and text-to-speech tools to produce a final audio recording describing each image.

This is a collaborative project for the Machine Learning and Context Analytics course of the Business Analytics programme at A.U.E.B. (Part Time, 2022-2024).

This project must be run in Google Colab. To execute the .ipynb file, create a new folder named all_data_final in your Google Drive and place all the necessary project subfolders in it.
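The setup above can be checked before opening the notebook. The following is a minimal sketch, not part of the original project: the helper names `missing_subfolders` and `mount_drive` are hypothetical, and the list of expected subfolders is taken from the folder names described in this README.

```python
from pathlib import Path

# Subfolders described in this README; adjust the list if your copy differs.
EXPECTED_SUBFOLDERS = ["detect", "final_models", "predict",
                       "train", "trained_models", "val"]


def missing_subfolders(root, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolder names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only available inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


# Typical use inside Colab:
#   mount_drive()
#   print(missing_subfolders("/content/drive/MyDrive/all_data_final"))
```

An empty list from `missing_subfolders` means every folder named in this README was found.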

To run the model on your own images, create the directories /content/drive/MyDrive/all_data_final/images and /content/drive/MyDrive/all_data_final/labels. The label files must be in YOLO format.
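For reference, each line of a YOLO label file describes one object as `class_id x_center y_center width height`, with the four box values normalised to the range 0 to 1 relative to the image size. The sketch below converts one such line to pixel coordinates; the function name `yolo_line_to_pixel_box` is hypothetical and is not taken from the project notebook.

```python
def yolo_line_to_pixel_box(line, img_width, img_height):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields in a YOLO label line, got {len(parts)}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(value) for value in parts[1:])
    # The box is stored by its centre and size, so step half the size each way.
    x_min = (x_center - width / 2) * img_width
    y_min = (y_center - height / 2) * img_height
    x_max = (x_center + width / 2) * img_width
    y_max = (y_center + height / 2) * img_height
    return class_id, x_min, y_min, x_max, y_max
```

This is handy for checking that your own label files line up with the images before training or prediction.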

In the detect/tune folder, there are:

The weights of the best and the last model that was tuned.

The best_hyperparameters.yaml file, which contains all the tuned hyperparameters.

The remaining files, which are plots showing how the tuning process progressed.

In the final_models folder, there are:

3 final models, which were trained using the hyperparameters above. Of these, final_mod_yolov8m_afSiLU_optSGD_epochs50/weights/best.pt was selected for prediction.
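Loading the selected weights for prediction might look like the sketch below. It assumes the models were trained with the `ultralytics` package (suggested by the `yolov8m` name but not stated here) and that the final_models folder sits directly under all_data_final in Google Drive. The function name `predict_labels` and the confidence threshold of 0.25 are illustrative choices, not the project's settings.

```python
from pathlib import Path

# Assumed location of the selected weights inside the Drive folder.
WEIGHTS = Path("/content/drive/MyDrive/all_data_final/final_models/"
               "final_mod_yolov8m_afSiLU_optSGD_epochs50/weights/best.pt")


def predict_labels(image_path, weights=WEIGHTS, conf=0.25):
    """Return a list of (class_name, confidence) pairs detected in one image."""
    # Imported lazily so this module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(str(weights))
    results = model.predict(source=str(image_path), conf=conf)
    detections = []
    for result in results:
        for box in result.boxes:
            class_id = int(box.cls[0])
            detections.append((result.names[class_id], float(box.conf[0])))
    return detections
```

Calling `predict_labels` on an image path returns the detected class names, which can then be used to build the text description.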

In the predict folder, there are:

XXXXX.jpg files are the images on which we ran predictions.

XXXXX.txt files contain the actual labels.

XXXXX_predicted_results.txt files contain the predicted labels.

XXXXX_description.txt files contain the question we asked Google Bard in order to get a description.

XXXXX_desired_response.txt files contain Google Bard's answer.

XXXXX_example.mp3 files contain the audio description of an image.

In the train folder, there is our training dataset with:

All the images used to train and tune our models

All the corresponding labels of the training images

In the trained_models folder, there are:

36 models which were trained in order to find the best model before tuning.

In the val folder, there is our validation dataset with:

All the images used to evaluate our models

All the corresponding labels of the validation images

The remaining files are explained below:

best_model_final.pt is our final model saved with torch.save, containing both the weights and other parameters.

best_model_info.csv is a .csv file containing 4 metrics for the 3 final models located in the final_models directory.
We chose the final model for predictions based on the mean of those 4 metrics: mean precision, mean recall, mean Average Precision (mAP) at an IoU threshold of 0.5, and mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
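The selection rule (take the model with the highest mean of the four metrics) can be sketched as follows. The column names, model names, and numbers are placeholders for illustration only; they are not the contents of best_model_info.csv, and the notebook may compute the mean differently.

```python
import csv
import io

# Hypothetical column names; check the header of the real .csv file.
METRIC_COLUMNS = ["precision", "recall", "mAP50", "mAP50-95"]


def pick_best_model(rows, metric_columns=METRIC_COLUMNS, name_column="model"):
    """Return (name, mean_score) of the row with the highest mean metric value."""
    def mean_score(row):
        return sum(float(row[col]) for col in metric_columns) / len(metric_columns)

    best = max(rows, key=mean_score)
    return best[name_column], mean_score(best)


# Placeholder data, not real results from this project.
SAMPLE_CSV = """model,precision,recall,mAP50,mAP50-95
model_a,0.60,0.50,0.55,0.35
model_b,0.70,0.40,0.50,0.30
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
best_name, best_score = pick_best_model(rows)
```

The same helper applies unchanged to a file with more rows, such as one listing all 36 initial models.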

best_model_info.xlsx is a .xlsx file with the same data as the best_model_info.csv file.

classes.txt is a .txt file containing all the available classes of our model.

data_custom.yaml is a .yaml file containing some basic arguments for our initial models.

data_custom_final.yaml is a .yaml file containing some basic arguments for our final models combined with the tuned hyperparameters located in detect/tune/best_hyperparameters.yaml.

model_info.csv is a .csv file containing 4 metrics for the 36 initial models located in the trained_models directory.
We chose which model to tune based on the mean of those 4 metrics: mean precision, mean recall, mean Average Precision (mAP) at an IoU threshold of 0.5, and mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.

model_info.xlsx is a .xlsx file with the same data as the model_info.csv file.

notes.json is a .json file containing all category ids with their corresponding names, as exported by Label Studio.
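Turning notes.json into an id-to-name lookup could look like the sketch below. It assumes the usual layout of a Label Studio YOLO export, a top-level `categories` list of objects with `id` and `name` keys; verify this against the actual file. The category names in the sample are placeholders, not the project's classes.

```python
import json


def load_category_names(notes_text):
    """Map category id to category name from the text of a notes.json file."""
    notes = json.loads(notes_text)
    return {int(cat["id"]): cat["name"] for cat in notes["categories"]}


# Placeholder content following the assumed layout, not the real file.
SAMPLE_NOTES = """
{
  "categories": [
    {"id": 0, "name": "placeholder_class_a"},
    {"id": 1, "name": "placeholder_class_b"}
  ],
  "info": {"year": 2023}
}
"""

category_names = load_category_names(SAMPLE_NOTES)
```

With the real file, pass `open("notes.json", encoding="utf-8").read()` to the helper and use the result to translate class ids in the label files into readable names.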

Example Image with Text and Audio Description

Image

"Portrait of Felix Auerbach" by Edvard Munch

Text

A man is wearing a jacket, suit, and tie. He has a moustache, beard, and cigarette in his hand. His collar is turned up and he has a bow tie on. His beard is neatly trimmed. He is standing in front of a wall.

Audio

Click here to download and play the audio file.
