aguschin / art-guide

Your guide in the world of art

License: MIT License

Python 4.61% Dockerfile 0.04% Jupyter Notebook 95.35% Shell 0.01%

art-guide's People

aniervs abuzarmd-ml

art-guide's Issues

Calculating embeddings for reference dataset

Calculate embeddings! 📚

After creating a test for reverse image search, we need to test that the metadata is a Dict[str, str].
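A minimal sketch of such a check, assuming each search result carries its metadata as a plain Python object. The helper names `is_str_dict` and `check_search_results` are hypothetical and not taken from the repository.

```python
from typing import Any, Iterable


def is_str_dict(metadata: Any) -> bool:
    """Return True only if metadata is a dict whose keys and values are all str."""
    return isinstance(metadata, dict) and all(
        isinstance(key, str) and isinstance(value, str)
        for key, value in metadata.items()
    )


def check_search_results(results: Iterable[Any]) -> None:
    """Raise AssertionError on the first result whose metadata is not Dict[str, str]."""
    for index, metadata in enumerate(results):
        assert is_str_dict(metadata), f"result {index} has invalid metadata: {metadata!r}"
```

A non-string value such as an integer year would be rejected, which is usually the kind of mistake this test is meant to catch.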

Scraping Descriptions for Paintings from DuckDuckGo

Issue Description

We've noticed that approximately 2.5% of the paintings in our art guide project sourced from wikiart.org lack descriptions. To enhance the quality and usefulness of our project, we propose a solution to generate descriptions for these paintings automatically.

Problem

A significant portion of the paintings in our project doesn't have any description, which limits the information available to users.

Proposed Solution

We suggest combining web search with an open-source large language model (LLM) to generate descriptions for paintings with missing information. We have conducted an initial experiment to demonstrate the feasibility of this solution. You can view the experiment results here.

Add formatters and, maybe, some linters

Use pre-commit
Add CI that checks for this and prompts for your fixes

Took this https://github.com/iterative/gto/blob/main/.pre-commit-config.yaml and removed everything extra:

default_language_version:
  python: python3
repos:
  - repo: 'https://github.com/pre-commit/pre-commit-hooks'
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-executables-have-shebangs
      - id: check-toml
      - id: check-merge-conflict
      - id: check-yaml
        exclude: examples/layouts
      - id: debug-statements
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: sort-simple-yaml
      - id: trailing-whitespace
  - repo: 'https://github.com/psf/black'
    rev: 23.9.1
    hooks:
      - id: black
  - repo: 'https://github.com/PyCQA/isort'
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]

Debug reverse image search and improve its quality

Plan:

Measure the search quality for real life photos
Debug search
Find a way to improve its quality and measure the improvement

Reverse image search fails to find right artworks, as illustrated by the screenshots:

Sanity check:

Updating dataset

Problem Description:

We have identified multiple areas for improvement in the handling of our dataset sourced from Kaggle:

Issues:

Correct Author Identification:
- Current State: In the data.csv and artists.csv files, artists are stored using their names as the primary key. This approach risks misidentifying authors who have similar or identical names.
- Proposed Solution: To mitigate this issue, we suggest adding an author_link column to the schema, which will provide a unique identifier for each author.
Expand Image Collection:
- Current State: For art pieces that have multiple images available on the source website, only one image is currently being downloaded.
- Proposed Solution: To enhance our dataset and potentially improve outcomes in tasks like reverse image search, we propose modifying the data extraction logic to download multiple images per art piece when available.
Incorporate Additional Text Data:
- Current State: The source website offers pages dedicated to styles and genres, as well as detailed artist biographies. This data is currently not being captured.
- Proposed Solution: Extend our data extraction methodology to include these additional text sources, as they can provide rich context and possibly enhance data analysis tasks.

Action Items:

Add an author_link column to data.csv and artists.csv to improve author identification.
Modify data extraction logic to download all available images per art piece.
Extend data extraction to include additional text data like styles, genres, and artist biographies.
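To illustrate why a name-based key is risky, here is a small sketch that finds names shared by more than one author once an `author_link` column exists. The `name` column and the sample rows are assumptions made for the example, not the actual schema of artists.csv.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def find_ambiguous_names(artists: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each artist name to its distinct author_link values, keeping only
    names that point to more than one author."""
    links_by_name = defaultdict(set)
    for row in artists:
        links_by_name[row["name"]].add(row["author_link"])
    return {
        name: sorted(links)
        for name, links in links_by_name.items()
        if len(links) > 1
    }


# Hypothetical rows: two different authors who share one display name.
sample_rows = [
    {"name": "John Smith", "author_link": "/en/john-smith"},
    {"name": "John Smith", "author_link": "/en/john-smith-1"},
    {"name": "Pablo Picasso", "author_link": "/en/pablo-picasso"},
]
ambiguous = find_ambiguous_names(sample_rows)
```

With the link as the key, both "John Smith" rows stay distinct, whereas joining data.csv to artists.csv on the name alone would merge them.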

Add rotation and Readme for cropper

Add Readme info for the distortion cropper and add rotation support to it.

Use cache to skip generating description and audio

To speed up answers in the telegram bot, we could generate descriptions and audios in advance (in some job in CI/CD or Airflow, as an option). But this has two drawbacks: (1) you need to implement this, which takes time, and (2) once you want to regenerate all the descriptions, it will take a while (days on our server), which makes the feedback loop too long. If you want to see how a new approach to generation works, you'll need to wait or implement the calculation for only part of the data, which again takes additional effort.

The next option to speed up answers is to use cache. We have cache/ dir where generated audios are put, so now we need to make use of that. One of the options is to split the request to the REST API server with ML model in https://github.com/aguschin/art-guide/blob/main/bot.py: search for the result, check if the audio exists in the cache, generate if not. This will give us higher flexibility in updating the method of generation and getting feedback faster.

One subproblem here: we need to version generated audios, so that after updating the generation mechanism we know the audio has to be regenerated. For this, we can add a VERSION variable to the description generation part, and save audio in the cache/$VERSION/ folder, e.g. cache/1/Da_Vinci__Mona_Lisa.wav
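A minimal sketch of the versioned cache lookup, assuming audio is stored as .wav bytes keyed by an artwork identifier. `cached_audio_path`, `get_or_generate_audio`, and the `generate` callable are hypothetical names; only the `VERSION` variable and the cache/$VERSION/ layout come from the text above.

```python
from pathlib import Path
from typing import Callable

VERSION = "1"              # bump whenever the generation method changes
CACHE_DIR = Path("cache")  # cache directory mentioned above


def cached_audio_path(artwork_id: str, version: str = VERSION,
                      cache_dir: Path = CACHE_DIR) -> Path:
    """Build the path cache/<version>/<artwork_id>.wav."""
    return cache_dir / version / f"{artwork_id}.wav"


def get_or_generate_audio(artwork_id: str,
                          generate: Callable[[str], bytes],
                          version: str = VERSION,
                          cache_dir: Path = CACHE_DIR) -> Path:
    """Return the cached audio file, calling generate() only on a cache miss."""
    path = cached_audio_path(artwork_id, version, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(generate(artwork_id))
    return path
```

Because the version is part of the path, changing `VERSION` makes every lookup miss, so audio is regenerated lazily on the next request instead of in one long batch job.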

Debug why art guide returns audio about the wrong painting

If you try sending a picture from the reference database, the bot seems to find the wrong picture (although it finds something). We probably have a bug in matching the search result to the text description in the dataframe.

Generate description with llava or openai

https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md
https://github.com/haotian-liu/LLaVA#cli-inference

output: k-nearest images/metadata to the input image

Create a script checking that telegram bot works

To monitor telegram bot health, we need an external way to make sure we get notified when something is broken.

As a first step, I suggest creating a test script that will take some images from reference database, send them to the telegram bot and check that the responses are correct. This should check whether:

telegram bot is online
we get right answers for images that are compressed by telegram, cropped by our cropping module, and searched by our reverse-image-search module

Basically, this is a simple end-to-end test that can be enhanced later, for example by sending us an alert (a message in telegram) stating that the bot is down. To interact with our bot from the Python runtime, we need to use something like https://docs.telethon.dev/en/stable/basic/quick-start.html

Added multiple verbose

Don't repeat the same information more than a few times in nearby audio descriptions

If you go to the Picasso museum, you'd get pretty tired of hearing the same parts of his biography for each artwork, so it would be really great to avoid repeating the same information more than a few times.

A few considerations:

Taking this into account requires storing the history of requests.
For demos and occasional usage this can hurt the experience. Maybe we should only count information that was told within a short time period, like a day: if we told the user about Picasso's bio yesterday, it's OK to repeat it today.
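The considerations above could be sketched as a small in-memory history with a time window. The class name `ToldHistory`, the topic strings, and the default limits (one day, two repeats) are assumptions for illustration; a real bot would likely persist this per user.

```python
import time
from typing import Dict, List, Optional, Tuple

REPEAT_WINDOW_SECONDS = 24 * 60 * 60  # assumed window: one day
MAX_REPEATS = 2                       # assumed limit for "a few times"


class ToldHistory:
    """Remember when each topic was told to each user."""

    def __init__(self, window: float = REPEAT_WINDOW_SECONDS,
                 max_repeats: int = MAX_REPEATS) -> None:
        self.window = window
        self.max_repeats = max_repeats
        self._told: Dict[Tuple[str, str], List[float]] = {}

    def should_tell(self, user_id: str, topic: str,
                    now: Optional[float] = None) -> bool:
        """Return True and record the event if the topic may be told again."""
        now = time.time() if now is None else now
        key = (user_id, topic)
        # Keep only the tellings that still fall inside the window.
        recent = [t for t in self._told.get(key, []) if now - t < self.window]
        allowed = len(recent) < self.max_repeats
        if allowed:
            recent.append(now)
        self._told[key] = recent
        return allowed
```

Older entries fall out of the window on each call, so a topic told yesterday becomes eligible again today without any separate cleanup job.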

Validate and Refine Descriptions parsed from DuckDuckGo API

In continuation of our efforts to enhance the quality of generated descriptions for paintings without information (#62), we are now addressing concerns related to the occasional retrieval of irrelevant data from the DuckDuckGo API.

Solution Strategy:

To mitigate the issue, we propose the following steps:

Filter by Link Occurrence: Implement a filter to assess the relevance of the fetched data. If a link to a wiki article occurs more than once, we can reasonably assume that the article is not specific to the intended painting.
Binary Classification with Mistral: Use the open-source LLM Mistral as a binary classifier that decides whether each retrieved article is relevant to the respective art piece.
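The first step could be sketched as follows, assuming the search results are already available as a mapping from a painting identifier to the wiki link retrieved for it. The function name `drop_shared_links` and the data shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def drop_shared_links(link_by_painting: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Keep only paintings whose wiki link was retrieved for no other painting.

    A link that occurs more than once is assumed to be a generic article
    (for example, about the artist) rather than one about a specific painting.
    """
    counts = Counter(link for link in link_by_painting.values() if link)
    return {
        painting_id: link
        for painting_id, link in link_by_painting.items()
        if link and counts[link] == 1
    }
```

Paintings dropped here simply stay without a description; the surviving candidates can then go to the classification step.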

Fix image conversion/format issue

output: intermediate images for cropping

Perspective Transformation

Transform an image taken from a different perspective into a rectangle

Tests for embeddings 1

Precision and recall
logs

Fix aspect ratio

Create a script to test reverse image search quality

Reverse image search is a crucial part of the app. To improve it, we need a way to measure the quality of the search.

As a first iteration, I suggest creating a test set like this: take some amount of images from the reference database, augment them to resemble "taking a photo in art gallery" scenario, run search and measure top-1 accuracy (whether we found the right reference or not).

Also please consider whether this should be done via the telegram bot or by invoking the reverse image search script directly. The first is closer to the actual metric (we deal with telegram-compressed images and can spot our pipeline errors this way); the second is faster and easier to implement. The second option should be better, since we're benchmarking the search quality and don't need to actually run the telegram bot with the updated algorithm, but I'm open to discussing pros and cons here.
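The top-1 metric described above could be computed with a helper like the one below, which calls the search function directly rather than going through the telegram bot. The `search` callable, the sample format, and the name `top1_accuracy` are assumptions; augmentation of the query images is left out.

```python
from typing import Callable, Iterable, Sequence, Tuple, TypeVar

Query = TypeVar("Query")


def top1_accuracy(samples: Iterable[Tuple[Query, str]],
                  search: Callable[[Query], Sequence[str]]) -> float:
    """Share of queries whose best-ranked result is the expected reference ID.

    samples: (augmented query image, expected reference ID) pairs.
    search:  returns reference IDs ordered from best to worst match.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("top-1 accuracy is undefined for an empty test set")
    hits = 0
    for query, expected_id in samples:
        ranked = search(query)
        if ranked and ranked[0] == expected_id:
            hits += 1
    return hits / len(samples)
```

Passing the search function as an argument keeps the metric independent of the backend, so the same helper could later be pointed at a bot-based pipeline if that comparison is needed.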

Output steps to telegram bot

Add missing text descriptions for images

Of the 200k images we have, only ~5% have extended text descriptions (in addition to author, name, year, and some other rather concise information). For art-guide to work well, it's crucial to get extended descriptions for the remaining 95% of art pieces.