Coder Social home page Coder Social logo

ai-image-hackathon's Introduction

AI - Image Master Hackathon Open In Colab

banner

[!Prompt] Arab man looking at camera standing besides (Toyota Corolla Cross XLE 2022: 1) (embedding:crpht-4300:0.9) in a well lit garage (embedding:ziprealism:1) (embedding:OverallDetailXL:1) Objective: Generate visually stunning and contextually apporiate product images that strictly adhers to the industrial design of the product The AI should.

Subject: Toyota Corolla Cross XLE 2022

Approach

Step 1 (Gather Training Data):


Scrape high quality images of the subject

  • Sources are located in scrape_resources Directory
  • Images are extracted using script scrape_images.py
    • The script uses regex to extract image URL from page sources and download the images using requests library
  • Downloaded Images are located in scraped_images directory
  • Few images are selected for training and copied to train-images directory

Step 2 (Training Preparation)

The selected images needs to be prepared for Training

  • Images needs to captioned
  • For captioning I have used GPT-4o Vision capabilites
  • Images are sent to GPT-4o with crafted prompts to generate captions
  • Generated captions are saved in train-images directory with same name as images but with .txt extension
  • The code to caption images is written in caption_images.py file
  • TODO: Due to time and resource constract I am not able to generate regularization images it will increase effectiveness of the trained model

Step 3 (Training)

For training the model I have tried to create a LoRA Model

  • The model was trained on following hardware training-mc-profile
  • Training was done using kohya_ss
  • Juggernaut XL is used as base model to train LoRA
  • Training parameters are saved in kohya_ss_config.json
  • Results of training are saved in models/checkpoint directory training-result

LoRA Evaluation

Following are result of LoRA evaluation

  • lora-eval01

  • lora-eval02

  • For different weight of the LoRA we can draw following observations

    • The LoRA can consistently produce images of the subject ie. Toyota Corolla Cross XLE 2022
    • Higher values of the LoRA blows the images
    • The best range of operation is from 0.1 to 0.25
    • The Trained is clearly overfitted and further tuning model parameters will produce more prominent results

Model Execution Model

Future Scope

  • Exploring the possibility to utilize ADetailer to improve Character faces

    • ADetailer is an extension for the stable diffusion webui that does automatic masking and inpainting of spcefic features like Character's face and add details to it
  • Exploring Depth Map ControlNet

  • Create Depth Map from the image

  • depth-image

  • Depth map contains imformation about depth of the image lighter shades are near to camera and dark shades are far from Camera

  • The Depth map can be passed to control net to influence placement of objects in the seen

ai-image-hackathon's People

Contributors

techie-subhadeep avatar

Stargazers

 avatar

Watchers

Kostas Georgiou avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.