
boxdiff's People

Contributors

sierkinhane

boxdiff's Issues

Problem with Tkinter library

I'm trying to replicate the setup and the examples shown in the README, and I'm having problems with the Tkinter library used in utils/drawer. The code crashes when it tries to import the library. I was able to work around the problem by commenting out the calls to draw_rectangle and DashedImageDraw and not importing anything from utils.drawer, but I'm wondering whether something else needs to be set up to make Tkinter available. Thanks.
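
A minimal sketch of the kind of guard that avoids the crash, assuming the failure is an ImportError raised because the Python installation lacks Tk support. The helper name tkinter_available is hypothetical and not part of the BoxDiff code; it only shows how to probe for Tkinter before importing anything that depends on it.

```python
import importlib


def tkinter_available() -> bool:
    """Return True if the tkinter module can be imported in this interpreter.

    Hypothetical helper for illustration; not part of BoxDiff.
    """
    try:
        importlib.import_module("tkinter")
    except ImportError:
        # Raised when Python was built or packaged without Tk support.
        return False
    return True


if __name__ == "__main__":
    if tkinter_available():
        print("tkinter is importable; the drawing utilities should load.")
    else:
        print("tkinter is missing; skip the interactive drawing utilities.")
```

On Debian and Ubuntu, Tkinter is usually shipped as a separate system package (python3-tk), so a plain pip install may not be enough to provide it.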

About the degraded result

Hi, thanks for your interesting work. I gave it a try, but the quality of the generated image is unsatisfactory.
[image]
Given the prompt "A dog plays a ball, a cat is sleeping" and a layout box for each subject, the results do not follow the layouts.

Could you share the weights of the YOLOv4 model pretrained with COCO-stuff dataset?

First, thank you for your wonderful research.
It seems that you conducted your research with a YOLOv4 model trained on the COCO-stuff dataset.
I'm aiming to reproduce your paper's YOLO mAP results on the COCO-stuff dataset.
I'd like to know where you downloaded the pretrained weights. Or, if you trained the model yourself, could you please provide the pretrained weights so I can test your model with them? I've been struggling to find a YOLO model trained on the COCO-stuff dataset (not the one trained on COCO with 81 classes).
Thank you in advance.

GLIGEN vs BoxDiff

Hi. In the paper you mention that BoxDiff can work as a plug-and-play component with GLIGEN. Could you provide more details? Don't the two projects do the same thing?

I integrated BoxDiff into diffusers

Feel free to check out https://github.com/huggingface/diffusers/tree/main/examples/community#stable-diffusion-boxdiff

Example use case:

import torch
from PIL import Image, ImageDraw
from copy import deepcopy

from examples.community.pipeline_stable_diffusion_boxdiff import StableDiffusionBoxDiffPipeline

def draw_box_with_text(img, boxes, names):
    colors = ["red", "olive", "blue", "green", "orange", "brown", "cyan", "purple"]
    img_new = deepcopy(img)
    draw = ImageDraw.Draw(img_new)

    W, H = img.size
    for bid, box in enumerate(boxes):
        draw.rectangle([box[0] * W, box[1] * H, box[2] * W, box[3] * H], outline=colors[bid % len(colors)], width=4)
        draw.text((box[0] * W, box[1] * H), names[bid], fill=colors[bid % len(colors)])
    return img_new

pipe = StableDiffusionBoxDiffPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# example 1
prompt = "as the aurora lights up the sky, a herd of reindeer leisurely wanders on the grassy meadow, admiring the breathtaking view, a serene lake quietly reflects the magnificent display, and in the distance, a snow-capped mountain stands majestically, fantasy, 8k, highly detailed"
phrases = [
    "aurora",
    "reindeer",
    "meadow",
    "lake",
    "mountain"
]
boxes = [[1,3,512,202], [75,344,421,495], [1,327,508,507], [2,217,507,341], [1,135,509,242]]

# example 2
# prompt = "A rabbit wearing sunglasses looks very proud"
# phrases = ["rabbit", "sunglasses"]
# boxes = [[67,87,366,512], [66,130,364,262]]

boxes = [[x / 512 for x in box] for box in boxes]

images = pipe(
    prompt,
    boxdiff_phrases=phrases,
    boxdiff_boxes=boxes,
    boxdiff_kwargs={
        "attention_res": 16,
        "normalize_eot": True
    },
    num_inference_steps=50,
    guidance_scale=7.5,
    generator=torch.manual_seed(42),
    safety_checker=None
).images

draw_box_with_text(images[0], boxes, phrases).save("output.png")

Python version

I use Python 3.8.18 (conda create -n boxdiff python=3.8).
After running pip3 install -r requirements.txt and installing diffusers with pip3 install -e ., I run:

CUDA_VISIBLE_DEVICES=0 python3 run_sd_boxdiff.py --prompt "A rabbit wearing sunglasses looks very proud" --P 0.2 --L 1 --seeds [1,2,3,4,5,6,7,8,9] --token_indices [2,4] --bbox [[67,87,366,512],[66,130,364,262]]

and get the following errors:

envs/boxdiff/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

envs/boxdiff/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

BoxDiff/diffusers/src/diffusers/models/modeling_utils.py", line 119, in load_state_dict
raise OSError(
OSError: Unable to load weights from checkpoint file for '../stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin' at '../stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
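
A "failed finding central directory" error means PyTorch could not read the checkpoint as a complete zip archive. One common cause, which is an assumption here and not confirmed in this thread, is an incomplete download or a Git LFS pointer file left in place of the real weights. The sketch below uses only the standard library to tell these cases apart; diagnose_checkpoint and its return labels are hypothetical names chosen for illustration.

```python
import zipfile
from pathlib import Path


def diagnose_checkpoint(path: str) -> str:
    """Classify a checkpoint file by inspecting its bytes.

    Hypothetical diagnostic helper; the labels are illustrative only.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as f:
        head = f.read(64)
    # Git LFS pointer files are small text files starting with this line.
    if head.startswith(b"version https://git-lfs"):
        return "git-lfs pointer"
    # is_zipfile looks for the end-of-central-directory record.
    if zipfile.is_zipfile(str(p)):
        return "zip archive"
    # A zip local-file header without a central directory suggests truncation.
    if head.startswith(b"PK"):
        return "truncated zip"
    return "unknown"


if __name__ == "__main__":
    print(diagnose_checkpoint("../stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin"))
```

If the result is "truncated zip" or "git-lfs pointer", downloading the weights again is a reasonable first thing to try.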

Code License

Dear author, first of all, thank you for your work.
Could you please mention under what license you have released this code?

Results look the same as the baseline

Hello, I tried comparing results by setting the scale factor to 0, but they don't seem to differ much from the results with scale_factor=20.
Did I do something wrong?

run_sd_boxdiff.py --prompt "as the aurora lights up the sky, a herd of reindeer leisurely wanders on the grassy meadow, admiring the breathtaking view, a serene lake quietly reflects the magnificent display, and in the distance, a snow-capped mountain stands majestically, fantasy, 8k, highly detailed" --P 0.2 --L 1 --seeds [2] --token_indices [3,12,21,30,46] --bbox [[1,3,512,202],[75,344,421,495],[1,327,508,507],[2,217,507,341],[1,135,509,242]] --refine False

[images: sd15_0_2, sd15_20_2]

What does argument "normalize_eot" imply?

Hi,

I've recently been working on adapting BoxDiff to the latest diffusers library, including integration for both SD and SDXL. I came across the argument normalize_eot here:

if normalize_eot:
    prompt = self.prompt
    if isinstance(self.prompt, list):
        prompt = self.prompt[0]
    last_idx = len(self.tokenizer(prompt)['input_ids']) - 1

It is set to True for SD2.1 and False for SD1.5. I'm not very familiar with the details of the different versions, so would you mind clarifying what the purpose of this argument is? Thank you in advance.
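
For reference, a minimal sketch of what the quoted snippet computes, assuming the tokenizer wraps the prompt in start and end-of-text tokens and adds no padding, as CLIP's tokenizer does by default. ToyTokenizer is a hypothetical stand-in that splits on whitespace and is not part of diffusers or transformers. How the pipeline uses last_idx afterwards is not shown here.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer that mimics BOS/EOT wrapping."""

    BOS, EOT = 0, 1

    def __call__(self, prompt: str) -> dict:
        # Give each word a placeholder id; only the sequence length matters here.
        word_ids = [2 + i for i, _ in enumerate(prompt.split())]
        return {"input_ids": [self.BOS] + word_ids + [self.EOT]}


def eot_index(tokenizer, prompt) -> int:
    """Mirror the quoted logic: position of the final (end-of-text) token."""
    if isinstance(prompt, list):
        prompt = prompt[0]
    return len(tokenizer(prompt)["input_ids"]) - 1


if __name__ == "__main__":
    tok = ToyTokenizer()
    # BOS + 4 words + EOT = 6 ids, so the last index is 5.
    print(eot_index(tok, "a rabbit wearing sunglasses"))
```

Under these assumptions, last_idx is the position of the end-of-text token for the unpadded prompt, so it depends on the prompt length.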
