
Art Unveiled: Enhancing Access to Arts for the Visually Impaired

The goal of this project is to train an object detection model on a variety of art images and obtain a final, optimised model. That model provides labels for art images; the labels are then passed to NLP and text-to-speech tools to produce a final audio recording that describes the image.

This is a collaborative project for the Machine Learning and Context Analytics course of the part-time Business Analytics programme at A.U.E.B. (2022-2024).

This project must be run in Google Colab. Create a new folder named all_data_final in your Google Drive and place all the necessary subfolders from the project in it, so that the .ipynb file can be executed.

To run the model on your own images, create the directories /content/drive/MyDrive/all_data_final/images and /content/drive/MyDrive/all_data_final/labels, with the labels in YOLO format.
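
For reference, a YOLO-format label file is a plain-text file with one line per object: a class index followed by the box centre x, centre y, width and height, all normalised to [0, 1], and it shares its file name (without extension) with the image it describes. The sketch below checks an images/labels pair of folders against that layout. The helper names and the set of accepted image extensions are assumptions made for illustration and are not taken from the project notebook.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your own files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def parse_yolo_line(line: str) -> tuple[int, float, float, float, float]:
    """Parse one 'class x_center y_center width height' label line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x, y, w, h = (float(p) for p in parts[1:])
    if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)):
        raise ValueError(f"coordinates must be normalised to [0, 1]: {line!r}")
    return class_id, x, y, w, h


def check_dataset(root: Path) -> list[str]:
    """Return a list of problems found under root/images and root/labels."""
    problems = []
    for image in sorted((root / "images").iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        label = root / "labels" / f"{image.stem}.txt"
        if not label.exists():
            problems.append(f"missing label file for {image.name}")
            continue
        for number, line in enumerate(label.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                parse_yolo_line(line)
            except ValueError as error:
                problems.append(f"{label.name}:{number}: {error}")
    return problems
```

In Colab, after mounting Google Drive, calling check_dataset(Path("/content/drive/MyDrive/all_data_final")) would return an empty list if every image has a well-formed label file.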

In the detect/tune folder, there are:
  • The weights of the best and the last tuned model.
  • best_hyperparameters.yaml, which contains all the tuned hyperparameters.
  • The remaining files, which are plots showing how the tuning process progressed.

In the final_models folder, there are:
  • 3 final models trained with the hyperparameters above. Of these, final_mod_yolov8m_afSiLU_optSGD_epochs50/weights/best.pt was selected for prediction.

In the predict folder, there are:
  • XXXXX.jpg files, which are the images we ran predictions on.
  • XXXXX.txt files, which contain the actual labels.
  • XXXXX_predicted_results.txt files, which contain the predicted labels.
  • XXXXX_description.txt files, which contain the question we put to Google Bard to obtain a description.
  • XXXXX_desired_response.txt files, which contain Google Bard's answer.
  • XXXXX_example.mp3 files, which are the audio descriptions of the images.

In the train folder, there is our training dataset with:
  • All the images used to train and tune our models.
  • All the corresponding labels of the training images.

In the trained_models folder, there are:
  • 36 models that were trained in order to find the best model before tuning.

In the val folder, there is our validation dataset with:
  • All the images used to evaluate our models.
  • All the corresponding labels of the validation images.

The remaining files are explained below:
  • best_model_final.pt is a torch.save of our model, containing both weights and other parameters.
  • best_model_info.csv contains 4 metrics for the 3 final models located in the final_models directory.
    We chose the final model for predictions based on the mean of those 4 metrics: mean precision, mean recall, mean Average Precision (mAP) at an IoU threshold of 0.5, and mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
  • best_model_info.xlsx contains the same data as best_model_info.csv.
  • classes.txt lists all the classes available to our model.
  • data_custom.yaml contains some basic arguments for our initial models.
  • data_custom_final.yaml contains some basic arguments for our final models, combined with the tuned hyperparameters from detect/tune/best_hyperparameters.yaml.
  • model_info.csv contains 4 metrics for the 36 initial models located in the trained_models directory.
    We chose the model to tune based on the mean of the same 4 metrics (mean precision, mean recall, mAP at an IoU threshold of 0.5, and mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05).
  • model_info.xlsx contains the same data as model_info.csv.
  • notes.json contains all category ids with their corresponding names, as provided by Label Studio.
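
The selection rule described for model_info.csv and best_model_info.csv (take the mean of the four metrics and pick the highest) can be sketched as follows. The column names, model names and numbers below are placeholders chosen for illustration; the real CSV headers may differ, and none of the values are results from the project.

```python
import csv
import io
from statistics import mean

# Hypothetical column names; the real CSV headers may differ.
METRIC_COLUMNS = ["precision", "recall", "mAP50", "mAP50-95"]


def rank_models(rows, name_column="model"):
    """Return (name, mean score) pairs sorted from best to worst."""
    scored = [
        (row[name_column], mean(float(row[col]) for col in METRIC_COLUMNS))
        for row in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only, not results from the project.
sample_csv = """model,precision,recall,mAP50,mAP50-95
model_a,0.60,0.50,0.55,0.35
model_b,0.70,0.40,0.50,0.30
model_c,0.65,0.55,0.60,0.40
"""

ranking = rank_models(csv.DictReader(io.StringIO(sample_csv)))
best_name, best_score = ranking[0]
print(best_name, round(best_score, 3))
```

To apply the same idea to the real files, the in-memory sample could be replaced with an open file handle on model_info.csv or best_model_info.csv, after adjusting METRIC_COLUMNS to the actual headers.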

Example Image with Text and Audio Description

Image

"Portrait of Felix Auerbach" by Edvard Munch

Text

A man is wearing a jacket, suit, and tie. He has a moustache, beard, and cigarette in his hand. His collar is turned up and he has a bow tie on. His beard is neatly trimmed. He is standing in front of a wall.

Audio

Click here to download and play the audio file.
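
As a rough illustration of the step between predicted labels and a text description like the one in the example above, the sketch below turns a list of detected class names into a question for a language model. The function name, the prompt wording and the assumption that labels arrive as a plain list of class names are all hypothetical; the questions actually sent to Google Bard are the ones stored in the XXXXX_description.txt files.

```python
from collections import Counter


def build_description_prompt(labels: list[str]) -> str:
    """Compose a question asking for a short description of the detected objects."""
    counts = Counter(labels)
    # Keep first-seen order and note how many times each object was detected.
    items = [name if n == 1 else f"{name} (x{n})" for name, n in counts.items()]
    if not items:
        return "No objects were detected in this artwork."
    return (
        "Write a short, plain-language description of an artwork that contains "
        "the following objects: " + ", ".join(items) + "."
    )


# Example label names, loosely based on objects mentioned in the example text.
prompt = build_description_prompt(["jacket", "tie", "moustache", "beard", "beard"])
print(prompt)
```

The returned text could then be sent to whichever language model is used, and the answer passed on to a text-to-speech tool.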
