art-guide's People

Contributors

abuzarmd-ml, aguschin, aniervs, arielxx, daisymint, dinma-daniel, michaelvin1322, mjason98, mohammadsanaee

art-guide's Issues

Scraping Descriptions for Paintings from DuckDuckGo

Issue Description

We've noticed that approximately 2.5% of the paintings in our art guide project sourced from wikiart.org lack descriptions. To enhance the quality and usefulness of our project, we propose a solution to generate descriptions for these paintings automatically.

Problem

These paintings have no description at all, which limits the information available to users.

Proposed Solution

We suggest combining web search with an open-source large language model (LLM) to generate descriptions for paintings that lack them. We ran an initial experiment to demonstrate that this is feasible. You can view the experiment results here.

Add formatters and, maybe, some linters

  • Use pre-commit
  • Add a CI check that runs these hooks and prompts you to fix any failures

I took https://github.com/iterative/gto/blob/main/.pre-commit-config.yaml and removed everything extra:

default_language_version:
  python: python3
repos:
  - repo: 'https://github.com/pre-commit/pre-commit-hooks'
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-executables-have-shebangs
      - id: check-toml
      - id: check-merge-conflict
      - id: check-yaml
        exclude: examples/layouts
      - id: debug-statements
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: sort-simple-yaml
      - id: trailing-whitespace
  - repo: 'https://github.com/psf/black'
    rev: 23.9.1
    hooks:
      - id: black
  - repo: 'https://github.com/PyCQA/isort'
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]

Debug reverse image search and improve its quality

Plan:

  • Measure the search quality for real-life photos
  • Debug search
  • Find a way to improve its quality and measure the improvement

Reverse image search fails to find the right artworks, as illustrated by the screenshots:

Sanity check:

Updating dataset

Problem Description:

We have identified multiple areas for improvement in the handling of our dataset sourced from Kaggle:


Issues:

  1. Correct Author Identification:

    • Current State: In the data.csv and artists.csv files, artists are stored using their names as the primary key. This approach risks misidentifying authors who have similar or identical names.
    • Proposed Solution: To mitigate this issue, we suggest adding an author_link column to the schema, which will provide a unique identifier for each author.
  2. Expand Image Collection:

    • Current State: For art pieces that have multiple images available on the source website, only one image is currently being downloaded.
    • Proposed Solution: To enhance our dataset and potentially improve outcomes in tasks like reverse image search, we propose modifying the data extraction logic to download multiple images per art piece when available.
  3. Incorporate Additional Text Data:

    • Current State: The source website offers pages dedicated to styles and genres, as well as detailed artist biographies. This data is currently not being captured.
    • Proposed Solution: Extend our data extraction methodology to include these additional text sources, as they can provide rich context and possibly enhance data analysis tasks.

Action Items:

  • Add an author_link column to data.csv and artists.csv to improve author identification.
  • Modify data extraction logic to download all available images per art piece.
  • Extend data extraction to include additional text data like styles, genres, and artist biographies.
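The motivation for the author_link column can be illustrated with a small check that lists artist names pointing to more than one author link. This is only a sketch: the `artist` and `title` column names and the sample link values are assumptions made for illustration, not the actual schema of data.csv.

```python
import csv
import io


def find_ambiguous_names(rows, name_key="artist", link_key="author_link"):
    """Return artist names that map to more than one author_link."""
    links_by_name = {}
    for row in rows:
        links_by_name.setdefault(row[name_key], set()).add(row[link_key])
    return {
        name: sorted(links)
        for name, links in links_by_name.items()
        if len(links) > 1
    }


# Hypothetical sample rows; real column names and link values may differ.
sample = io.StringIO(
    "artist,author_link,title\n"
    "John Smith,/en/john-smith,Harbour\n"
    "John Smith,/en/john-smith-1,Portrait\n"
    "Claude Monet,/en/claude-monet,Water Lilies\n"
)
rows = list(csv.DictReader(sample))
ambiguous = find_ambiguous_names(rows)
print(ambiguous)
```

Names reported by such a check are exactly the ones that would be merged incorrectly if the name alone were used as the key.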

Use cache to skip generating description and audio

To speed up responses in the Telegram bot, we could generate descriptions and audio in advance (for example, in a CI/CD or Airflow job). But this has two drawbacks: (1) it has to be implemented, which takes time, and (2) whenever we want to regenerate all the descriptions, it will take a while (days on our server), which makes the feedback loop too long. To see how a new generation approach works, you would have to wait or implement generation for only part of the data, which again takes additional effort.

The other option for speeding up responses is to use a cache. We already have a cache/ dir where generated audio files are stored, so now we need to make use of it. One option is to split the request to the REST API server with the ML model in https://github.com/aguschin/art-guide/blob/main/bot.py into steps: search for the result, check whether the audio exists in the cache, and generate it if not. This gives us more flexibility in updating the generation method and faster feedback.

One subproblem here: we need to version generated audio, so that we know to regenerate it after the generation mechanism is updated. For this, we can add a VERSION variable to the description generation part and save audio in a cache/$VERSION/ folder, e.g. cache/1/Da_Vinci__Mona_Lisa.wav
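A minimal sketch of the versioned cache lookup could look like the following. The function names, the `generate` callback, and the file-naming rule (spaces replaced by underscores, author and title joined by a double underscore) are assumptions inferred from the example path above, not the project's actual code.

```python
import tempfile
from pathlib import Path

VERSION = "1"  # assumed convention: bump whenever the generation method changes


def cached_audio_path(cache_dir, author, title, version=VERSION):
    """Build a path like cache/1/Da_Vinci__Mona_Lisa.wav (naming rule is an assumption)."""
    file_name = f"{author}__{title}".replace(" ", "_") + ".wav"
    return Path(cache_dir) / version / file_name


def get_or_generate_audio(cache_dir, author, title, generate, version=VERSION):
    """Return (path, cache_hit); call generate() only when the file is missing."""
    path = cached_audio_path(cache_dir, author, title, version)
    if path.exists():
        return path, True
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(generate(author, title))
    return path, False


def fake_generate(author, title):
    # Hypothetical stand-in for the real text-to-speech step.
    return b"placeholder audio bytes"


with tempfile.TemporaryDirectory() as cache_dir:
    first_path, first_hit = get_or_generate_audio(
        cache_dir, "Da Vinci", "Mona Lisa", fake_generate
    )
    second_path, second_hit = get_or_generate_audio(
        cache_dir, "Da Vinci", "Mona Lisa", fake_generate
    )
    print(first_path.name, first_hit, second_hit)
```

Because the version is part of the path, bumping VERSION makes every lookup miss, so audio is regenerated lazily on the next request rather than all at once.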

Create a script checking that the Telegram bot works

To monitor the Telegram bot's health, we need an external check that notifies us when something is broken.

As a first step, I suggest creating a test script that takes some images from the reference database, sends them to the Telegram bot, and checks that the responses are correct. This should check whether:

  • the Telegram bot is online
  • we get the right answers for images that are compressed by Telegram, cropped by our cropping module, and searched by our reverse-image-search module

Basically, this is a simple end-to-end test that can be enhanced later, for example by sending us an alert (a message in Telegram) stating that the bot is down. To interact with our bot from a Python runtime, we need to use something like https://docs.telethon.dev/en/stable/basic/quick-start.html

Don't repeat the same information more than a few times in nearby audio descriptions

If you visit a Picasso museum, you'd get pretty tired of hearing the same parts of his biography for each artwork, so it would be great if we avoided repeating the same information more than a few times.

A few considerations:

  • Taking this into account requires storing the history of requests
  • For demos and occasional usage this can hurt the experience. Maybe we should only count information that was told within a short time period, such as a day: if we told the user about Picasso's bio yesterday, it's OK to repeat it today.
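One way to combine both considerations is a per-user history with a time window. The sketch below is only an illustration: the class name, the `fact_id` labels, the in-memory storage, and the default limits (two repeats per 24 hours) are all assumptions.

```python
import time


class ToldFacts:
    """Track which facts each user has heard recently (in-memory sketch)."""

    def __init__(self, max_repeats=2, window_seconds=24 * 60 * 60):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self._history = {}  # (user_id, fact_id) -> timestamps when the fact was told

    def should_tell(self, user_id, fact_id, now=None):
        """Return True and record the telling unless the limit was reached in the window."""
        now = time.time() if now is None else now
        key = (user_id, fact_id)
        # Keep only tellings that are still inside the time window.
        recent = [t for t in self._history.get(key, []) if now - t < self.window_seconds]
        allowed = len(recent) < self.max_repeats
        if allowed:
            recent.append(now)
        self._history[key] = recent
        return allowed


tracker = ToldFacts(max_repeats=2)
answers = [tracker.should_tell("user-1", "picasso-bio", now=t) for t in (0, 10, 20)]
next_day = tracker.should_tell("user-1", "picasso-bio", now=24 * 60 * 60 + 11)
print(answers, next_day)
```

A real bot would need persistent storage instead of a dictionary, and some way to split a description into facts that can be identified across artworks.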

Validate and Refine Descriptions parsed from DuckDuckGo API

In continuation of our efforts to enhance the quality of generated descriptions for paintings without information (#62), we are now addressing concerns related to the occasional retrieval of irrelevant data from the DuckDuckGo API.

Solution Strategy:

To mitigate the issue, we propose the following steps:

  1. Filter by Link Occurrence: Implement a filter to assess the relevance of the fetched data. If a link to a wiki article occurs more than once, we can reasonably assume that the article is not specific to the intended painting.

  2. Binary Classification with Mistral: Use the open-source LLM Mistral as a binary classifier to determine whether each retrieved article is relevant to the respective art piece.
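The first step, filtering by link occurrence, can be sketched as follows. The sketch assumes that "occurs more than once" means the same link was returned for more than one painting. The data layout (a mapping from a painting identifier to its list of result links) and the example.org URLs are hypothetical.

```python
from collections import Counter


def filter_shared_links(links_by_painting, max_occurrences=1):
    """Drop links that were returned for more than max_occurrences paintings."""
    # Count each link once per painting, even if a search returned it twice.
    counts = Counter(
        link for links in links_by_painting.values() for link in set(links)
    )
    return {
        painting: [link for link in links if counts[link] <= max_occurrences]
        for painting, links in links_by_painting.items()
    }


# Hypothetical search results keyed by painting identifier.
search_results = {
    "painting-a": ["https://example.org/wiki/Artist", "https://example.org/wiki/Painting_A"],
    "painting-b": ["https://example.org/wiki/Artist", "https://example.org/wiki/Painting_B"],
}
filtered = filter_shared_links(search_results)
print(filtered)
```

Here the artist page is dropped for both paintings because it is not specific to either one, while the painting-specific pages are kept for the classification step.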

Create a script to test reverse image search quality

Reverse image search is a crucial part of the app. To improve it, we need a way to measure the quality of the search.

As a first iteration, I suggest creating a test set like this: take a sample of images from the reference database, augment them to resemble the "taking a photo in an art gallery" scenario, run the search, and measure top-1 accuracy (whether the top result is the right reference or not).

Also, please consider whether this should be done via the Telegram bot or by invoking the reverse image search script directly. The first option is closer to the actual metric (we deal with Telegram-compressed images and can spot pipeline errors this way); the second is faster and easier to implement. The second option should be better, since we're benchmarking search quality and don't need to actually run the Telegram bot with the updated algorithm, but I'm open to discussing the pros and cons here.
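The top-1 accuracy measurement itself is small once the search is callable from Python. In the sketch below, `search` is a hypothetical function that takes a query image and returns a ranked list of reference identifiers. The fake search and the string "images" only stand in for the real module and the augmented photos.

```python
def top1_accuracy(queries, search):
    """Share of queries whose top search result equals the expected reference id.

    queries: iterable of (query_image, expected_id) pairs.
    search: callable returning a ranked list of reference ids (assumed interface).
    """
    total = 0
    hits = 0
    for query_image, expected_id in queries:
        total += 1
        results = search(query_image)
        if results and results[0] == expected_id:
            hits += 1
    if total == 0:
        raise ValueError("the test set is empty")
    return hits / total


def fake_search(query_image):
    # Hypothetical stand-in for the reverse image search: looks up a fixed answer.
    answers = {"photo-1": ["ref-1", "ref-7"], "photo-2": ["ref-9", "ref-2"], "photo-3": []}
    return answers[query_image]


test_set = [("photo-1", "ref-1"), ("photo-2", "ref-2"), ("photo-3", "ref-3")]
accuracy = top1_accuracy(test_set, fake_search)
print(f"top-1 accuracy: {accuracy:.2f}")
```

Passing the search as a callable keeps the metric independent of how the query is delivered, so the same function could score either the direct script or a wrapper that goes through the bot.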

Add missing text descriptions for images

Of the 200k images we have, only ~5% have extended text descriptions (in addition to author, name, year, and some other rather concise information). For art-guide to work well, it's crucial to get extended descriptions for the remaining ~95% of art pieces.
