mczhuge / kaleido-bert



💐Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

License: MIT License

Python 97.16% Shell 2.84%

bert e-commerce fashion pre-training multimodal vision-language

kaleido-bert's Introduction

Make some noises 🍻

✨ [2023/12] I built my personal website and will sync everything at metauto.ai or .
📫 [2022/8] I joined the AI Initiative@KAUST as a Ph.D. student in Fall 2022, under the supervision of Prof. Jürgen Schmidhuber.
🤖 In the four years before my PhD, I worked as an engineer, researcher, or intern at NSFocus, Alibaba Group, IIAI, SUSTech (VIP Lab), and Microsoft (WizardLM).
🌟 I am active in the MetaGPT open-source community, and I highly recommend MiniGPT-4 by my KAUST friends Deyao and Jun.
👉 My interests include multimodal learning, intelligent agents, LLMs, code generation, and self-improvement mechanisms. You can find me via Email, LinkedIn, or Google Scholar!

Invited Talk

MBZUAI: "Behind Images" on Dec 27, 2022.
Synced (机器之心): "Inside the World's Top Labs, Episode 1: IIAI" on May 13, 2021

Services

Outstanding Reviewer of CVPR2023 (232 out of 7000+)
Reviewer: ICCV21,23, CVPR22,23,24, AAAI22-23, ECCV22, ICML22-23, NeurIPS22
Assistant: MIR - Multimodal Special Issue (welcome to submit before 1 July 2023)

kaleido-bert's People

Forkers

hongbo-sun ericdoug-qi dllinks trendingtechnology peternara nolophe mariyahendriksen feiward abandonsea nobelvictory jellchou tymanman tjpxiaoming andyfrancesco29 zyyang thiagoneves geniusfoever shahabmokari niushixiong

kaleido-bert's Issues

Will you release the PyTorch-based code and checkpoints?

Hi, may I ask will you release the PyTorch version and checkpoints for the pre-trained model? Thank you very much!

Finetuning of Kaleido-BERT for Fashion Captioning Update

#6
During fine-tuning on the image captioning task, did you use any pre-training task (e.g., AKPM, TIM, and AMLM) along with the fashion captioning task, i.e., given an image (a sequence of image patches generated by the "Kaleido Patch Generator"), predict the corresponding caption?

Preprocessing

Could you please release the image-text preprocessing module and the token alignment module?

Problem with the third step: Download Dependency

Thank you for sharing such great work.

When I run "sh get_checkpoint.sh", I get the following error:

Resolving icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com (icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com)... 47.92.17.218
Connecting to icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com (icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com)|47.92.17.218|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden.

When I open the link directly in a browser, I get the following error:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

AccessDenied
You have no right to access this object because of bucket acl.
607C319BB6DA383338EC6AFD
icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com

Could you provide a solution?

Finetuning of Kaleido-BERT for Fashion Captioning

Thanks for sharing this interesting work.
Would you please share how Kaleido-BERT was fine-tuned on the captioning task?
Did you use a separate decoder for generation, or only the Kaleido-BERT encoder?

Some questions about the model proposed in the paper

1. In Section 3.3 (Attention-based Alignment Generator), Generated Tokens --> Attention Map: do "tokens" mean only nouns, or all words (including prepositions, verbs, etc.) in both the generated and the raw text?

2. In Section 3.3, Attention Map --> Patch: how is the attention map produced by the SAT model aligned with the Kaleido patches? Do you calculate the KL divergence between the attention map and the patches, or use some other method?
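To make the two options in question 2 concrete, here is a minimal sketch of what "aligning an attention map with a patch" could look like. It is an illustration under stated assumptions, not the method from the paper: the attention map is taken as a 2D grid of non-negative weights over image cells, each Kaleido patch is represented as a hypothetical end-exclusive box (row0, col0, row1, col1) on that grid, and all helper names are invented for this example. One rule scores each patch by KL(patch || attention), treating the patch as a uniform distribution over its cells; the other simply picks the patch holding the most attention mass.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
# Hypothetical patch representation: (row0, col0, row1, col1), end-exclusive.
Box = Tuple[int, int, int, int]


def normalize(grid: Grid) -> Grid:
    """Scale non-negative weights so they sum to 1 (a probability distribution)."""
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]


def patch_distribution(height: int, width: int, box: Box) -> Grid:
    """Uniform distribution over the grid cells covered by one patch."""
    r0, c0, r1, c1 = box
    n = (r1 - r0) * (c1 - c0)
    return [[1.0 / n if r0 <= r < r1 and c0 <= c < c1 else 0.0
             for c in range(width)] for r in range(height)]


def kl_divergence(p: Grid, q: Grid, eps: float = 1e-12) -> float:
    """KL(p || q); cells where p is zero contribute nothing."""
    return sum(pv * math.log(pv / (qv + eps))
               for p_row, q_row in zip(p, q)
               for pv, qv in zip(p_row, q_row) if pv > 0)


def best_patch_by_kl(attention: Grid, boxes: List[Box]) -> int:
    """Index of the patch whose uniform distribution is closest (in KL) to the attention map."""
    att = normalize(attention)
    h, w = len(att), len(att[0])
    scores = [kl_divergence(patch_distribution(h, w, b), att) for b in boxes]
    return min(range(len(boxes)), key=lambda i: scores[i])


def best_patch_by_mass(attention: Grid, boxes: List[Box]) -> int:
    """Index of the patch that contains the largest share of attention mass."""
    att = normalize(attention)

    def mass(box: Box) -> float:
        r0, c0, r1, c1 = box
        return sum(att[r][c] for r in range(r0, r1) for c in range(c0, c1))

    return max(range(len(boxes)), key=lambda i: mass(boxes[i]))


if __name__ == "__main__":
    # Toy 4x4 attention map concentrated in the top-left quadrant.
    attention = [[4, 4, 1, 1],
                 [4, 4, 1, 1],
                 [1, 1, 1, 1],
                 [1, 1, 1, 1]]
    quadrants = [(0, 0, 2, 2), (0, 2, 2, 4), (2, 0, 4, 2), (2, 2, 4, 4)]
    print(best_patch_by_kl(attention, quadrants))    # 0
    print(best_patch_by_mass(attention, quadrants))  # 0
```

The two rules can disagree: KL(patch || attention) heavily penalizes a patch that contains any near-zero attention cell, while the mass rule tends to favor larger patches, so which one is preferable is a design choice the question leaves open.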

Fashion Product Search System

Hi Mczhuge, is there any plan to open-source the fashion product search demo system so we can try it in real time? :)

When will the PyTorch version of the code be released?

Thanks! When will the PyTorch version of the code be released?

Problem with the second step

Thanks for sharing such great work.
When I run "pip install boto3 tqdm tensorflow_datasets --index-url=https://mirrors.aliyun.com/pypi/simple/",
the following error occurs:
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting boto3
Downloading https://mirrors.aliyun.com/pypi/packages/f1/99/43e5571005c792284276986eabd956699fac65d283df409b1482ca8722d8/boto3-1.17.67-py2.py3-none-any.whl (131kB)
|████████████████████████████████| 133kB 5.5MB/s
Collecting tqdm
Downloading https://mirrors.aliyun.com/pypi/packages/72/8a/34efae5cf9924328a8f34eeb2fdaae14c011462d9f0e3fcded48e1266d1c/tqdm-4.60.0-py2.py3-none-any.whl (75kB)
|████████████████████████████████| 81kB 14.7MB/s
Collecting tensorflow_datasets
Downloading https://mirrors.aliyun.com/pypi/packages/fe/52/9b9f6312cfa29c39445d22a3ba45f6279db1937de9df93c9fb65dcf0e42a/tensorflow-datasets-3.2.1.tar.gz (2.9MB)
|████████████████████████████████| 2.9MB 30.2MB/s
Collecting jmespath<1.0.0,>=0.7.1
Downloading https://mirrors.aliyun.com/pypi/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement botocore<1.21.0,>=1.20.67 (from boto3) (from versions: 0.4.1, 0.4.2, 0....)
ERROR: No matching distribution found for botocore<1.21.0,>=1.20.67 (from boto3)

I'm confused about this.
Could you provide a solution?
PS:
Here is my environment:
OS: Ubuntu 16.04.6
Env: set up as you described

How to generate input_schema format data?

Hi,
I find your work very interesting, and it is aligned with my project requirements.
I want to fine-tune it on a custom dataset of raw images and text with labels; the task is similar to "Category/SubCategory Recognition". How can I convert this data into the input_schema format?
Please share the code if you have any.

Humble Request for High-Resolution FashionGen Dataset

I'm currently unable to obtain the high-resolution FashionGen dataset, as described in the original paper. Would you be able to provide me with a copy of the FashionGen dataset in high resolution?

Fashion Captioning using Kaleido-BERT and Fashion-BERT

Hi, I have gone through your code. Very interesting work. Could you please explain what input is used to compute the MLM logits for caption generation? I have tried the following input formats:
1. image_feature, [SEP], [MASK], [PAD]...[PAD]
2. image_feature, [CLS], [MASK], [PAD]...[PAD]
3. [CLS], [MASK], [PAD]...[PAD], [SEP], image_feature (this is run in a loop)
Which one is the correct format?
Thanks!
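The three layouts in the question differ only in where the text and image segments sit and which special token separates them. As a minimal sketch (not the repository's code), the loop here shows how greedy, one-token-at-a-time caption decoding with a masked-LM head could work for any of them: at each step the caption so far plus one [MASK] is placed in the chosen layout, the model predicts the masked token, and decoding stops at [SEP] or when the text slot is full. The image features are stood in for by placeholder tokens, predict is a hypothetical callable standing in for the model's MLM head, and using [SEP] as the end-of-caption signal is an assumption. A reasonable rule of thumb, also an assumption not confirmed by the authors, is that the layout should match the segment order the model saw during pre-training.

```python
from typing import Callable, List

CLS, SEP, MASK, PAD = "[CLS]", "[SEP]", "[MASK]", "[PAD]"

# Hypothetical stand-in for the model: given the full input sequence and the
# position of the [MASK], return the predicted token string.
Predictor = Callable[[List[str], int], str]


def pad_text(text: List[str], max_len: int) -> List[str]:
    """Right-pad the text segment with [PAD] up to max_len tokens."""
    return text + [PAD] * (max_len - len(text))


def build_input(caption: List[str], image_tokens: List[str],
                max_len: int, layout: int) -> List[str]:
    """Assemble one of the three layouts from the issue, with a [MASK] after the caption."""
    text = caption + [MASK]
    if layout == 1:  # image_feature, [SEP], [MASK], [PAD]...
        return image_tokens + [SEP] + pad_text(text, max_len)
    if layout == 2:  # image_feature, [CLS], [MASK], [PAD]...
        return image_tokens + [CLS] + pad_text(text, max_len)
    if layout == 3:  # [CLS], [MASK], [PAD]..., [SEP], image_feature
        return pad_text([CLS] + text, max_len) + [SEP] + image_tokens
    raise ValueError(f"unknown layout: {layout}")


def greedy_mask_decode(predict: Predictor, image_tokens: List[str],
                       max_len: int, layout: int) -> List[str]:
    """Fill one [MASK] per step, appending each prediction to the caption."""
    caption: List[str] = []
    # Keep [CLS] + caption + [MASK] within max_len (conservative for layouts 1 and 2).
    while len(caption) + 2 <= max_len:
        seq = build_input(caption, image_tokens, max_len, layout)
        token = predict(seq, seq.index(MASK))
        if token == SEP:  # assumption: [SEP] marks the end of the caption
            break
        caption.append(token)
    return caption


if __name__ == "__main__":
    image = ["<img_0>", "<img_1>"]  # placeholder image-feature tokens
    script = iter(["black", "leather", "jacket", SEP])  # dummy "model" outputs
    print(greedy_mask_decode(lambda seq, pos: next(script), image, max_len=8, layout=3))
    # ['black', 'leather', 'jacket']
```

Because each step re-encodes the whole sequence, this decoding costs one full forward pass per generated token; that is the price of reusing a bidirectional encoder without a separate decoder.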