mczhuge / kaleido-bert

๐Ÿ’Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

License: MIT License

Python 97.16% Shell 2.84%
bert e-commerce fashion pre-training multimodal vision-language

kaleido-bert's Issues

Finetuning of Kaleido-BERT for Fashion Captioning Update

#6
During fine-tuning on the image captioning task, did you use any pre-training tasks (e.g., AKPM, TIM, and AMLM) alongside the fashion captioning objective, i.e., given an image (a sequence of image patches generated by the "Kaleido Patch Generator"), predict the corresponding caption?

้ข„ๅค„็†

่ฏท้—ฎๅฏไปฅๅผ€ๆ”พไธ€ไธ‹ๅ›พๆ–‡้ข„ๅค„็†ๆจกๅ—ใ€tokenๅฏน้ฝๆจกๅ—ๅ—๏ผŸ

Problem with the third step: Download Dependency

Thank you for sharing such great work.

When I run "sh get_checkpoint.sh", I get the following error:

Resolving icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com (icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com)... 47.92.17.218
Connecting to icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com (icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com)|47.92.17.218|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden.

When I open the link directly, I get the following error:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

AccessDenied
You have no right to access this object because of bucket acl.
607C319BB6DA383338EC6AFD
icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com

Could you provide a solution?

Finetuning of Kaleido-BERT for Fashion Captioning

Thanks for sharing this interesting work.
Could you please share how Kaleido-BERT was fine-tuned on the captioning task?
Did you use a separate decoder for generation, or only the Kaleido-BERT encoder?

Some questions about the model proposed in the paper

  1. In Section 3.3 (Attention-based Alignment Generator), for the step Generated Tokens --> Attention Map: does "token" mean only nouns, or all words (including prepositions, verbs, etc.) in the generated and raw text?

  2. In Section 3.3 (Attention-based Alignment Generator), for the step Attention Map --> Patch: how is the attention map produced by the SAT model aligned with the Kaleido patches? Do you calculate the KL divergence between the attention map and the patches, or use some other method?
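
For readers following the second question, here is a minimal sketch of one possible way to relate an attention map to grid patches: sum the attention mass that falls inside each patch and pick the patch with the largest share. This is an illustration only, not the method from the paper or the repository (which the question leaves open). The function names, the toy 4x4 map, and the 2x2 grid are all assumptions.

```python
from typing import List, Tuple


def attention_mass_per_patch(attn: List[List[float]], grid: int) -> List[List[float]]:
    """Sum attention values inside each cell of a grid x grid partition."""
    height, width = len(attn), len(attn[0])
    if height % grid or width % grid:
        raise ValueError("map size must be divisible by the grid size")
    cell_h, cell_w = height // grid, width // grid
    mass = [[0.0] * grid for _ in range(grid)]
    for y in range(height):
        for x in range(width):
            mass[y // cell_h][x // cell_w] += attn[y][x]
    return mass


def best_patch(attn: List[List[float]], grid: int) -> Tuple[int, int]:
    """Return (row, col) of the patch that holds the most attention mass."""
    mass = attention_mass_per_patch(attn, grid)
    scored = [(mass[r][c], r, c) for r in range(grid) for c in range(grid)]
    _, row, col = max(scored)
    return row, col


# Hypothetical toy attention map for a single generated token.
toy_attention = [
    [0.0, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.3, 0.3],
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
print(best_patch(toy_attention, grid=2))
```

A divergence-based comparison, as the question suggests, would instead need each patch expressed as a distribution over the same pixels; the mass-summing approach above is simply the shortest variant to write down.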

Fashion Product Search System

Hi mczhuge, is there any plan to open-source the fashion product search demo system so that we can try it in real time? :)

The problem about the second step

Thanks for sharing such great work.
When I run "pip install boto3 tqdm tensorflow_datasets --index-url=https://mirrors.aliyun.com/pypi/simple/",
I get the following error:
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting boto3
Downloading https://mirrors.aliyun.com/pypi/packages/f1/99/43e5571005c792284276986eabd956699fac65d283df409b1482ca8722d8/boto3-1.17.67-py2.py3-none-any.whl (131kB)
|████████████████████████████████| 133kB 5.5MB/s
Collecting tqdm
Downloading https://mirrors.aliyun.com/pypi/packages/72/8a/34efae5cf9924328a8f34eeb2fdaae14c011462d9f0e3fcded48e1266d1c/tqdm-4.60.0-py2.py3-none-any.whl (75kB)
|████████████████████████████████| 81kB 14.7MB/s
Collecting tensorflow_datasets
Downloading https://mirrors.aliyun.com/pypi/packages/fe/52/9b9f6312cfa29c39445d22a3ba45f6279db1937de9df93c9fb65dcf0e42a/tensorflow-datasets-3.2.1.tar.gz (2.9MB)
|████████████████████████████████| 2.9MB 30.2MB/s
Collecting jmespath<1.0.0,>=0.7.1
Downloading https://mirrors.aliyun.com/pypi/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement botocore<1.21.0,>=1.20.67 (from boto3) (from versions: 0.4.1, 0.4.2, 0....)
ERROR: No matching distribution found for botocore<1.21.0,>=1.20.67 (from boto3)

I'm confused about this.
Could you provide a solution?
PS:
Here is my environment:
OS: Ubuntu 16.04.6
env: set up as you described
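
To make the error message above concrete, the following sketch checks which candidate versions fall inside the range `botocore<1.21.0,>=1.20.67` that boto3 requested. It is a simplified numeric comparison, not pip's actual resolver and not a full PEP 440 implementation. The helper names are hypothetical, and the version `1.20.67` in the second call is added purely for illustration; only `0.4.1` and `0.4.2` come from the log.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.20.67' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def matching(candidates: List[str], lower: str = "1.20.67", upper: str = "1.21.0") -> List[str]:
    """Keep only the candidates inside the requested range."""
    return [v for v in candidates if satisfies(v, lower, upper)]


# Versions visible in the log: none of them satisfy the requirement.
print(matching(["0.4.1", "0.4.2"]))

# Hypothetical index that also offers a version inside the range.
print(matching(["0.4.1", "0.4.2", "1.20.67"]))
```

If every version the installer considers lies outside the range, as in the first call, pip has nothing to pick and reports "No matching distribution found".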

How to generate input_schema format data?

Hi,
I find your work very interesting, and it aligns with my project requirements.
I want to fine-tune it on a custom dataset of raw images and text with labels; the task is similar to "Category/SubCategory Recognition". How can I get the data into the input_schema format?
Please share the code if you have any.

Fashion Captioning using Kaleido-BERT and Fashion-BERT

Hi, I have gone through your code. Very interesting work. Could you please explain the input used to calculate the MLM logits for caption generation? I have tried inputs in the following formats:

  1. image_feature, [SEP], [MASK], [PAD], ..., [PAD]

  2. image_feature, [CLS], [MASK], [PAD], ..., [PAD]

  3. [CLS], [MASK], [PAD], ..., [PAD], [SEP], image_feature

This will run in a loop. Which one is the correct format?
Thanks!
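
The three layouts in the question can be written out as token lists, which makes the differences easier to compare. This is only a sketch of the question as asked: it does not say which layout the repository expects, and the `<img>` placeholder, the function names, and the loop step in `advance` (one reading of "in a loop") are assumptions.

```python
from typing import Dict, List

IMAGE = "<img>"  # hypothetical placeholder standing for one image-patch feature


def build_layouts(num_patches: int, text_len: int) -> Dict[str, List[str]]:
    """Build the three candidate input layouts listed in the question."""
    if text_len < 1:
        raise ValueError("text_len must be at least 1")
    image = [IMAGE] * num_patches
    text = ["[MASK]"] + ["[PAD]"] * (text_len - 1)
    return {
        "image_sep_text": image + ["[SEP]"] + text,
        "image_cls_text": image + ["[CLS]"] + text,
        "cls_text_sep_image": ["[CLS]"] + text + ["[SEP]"] + image,
    }


def advance(text: List[str], predicted: str) -> List[str]:
    """Assumed loop step: fill the first [MASK], then mask the next slot."""
    out = list(text)
    pos = out.index("[MASK]")
    out[pos] = predicted
    if pos + 1 < len(out):
        out[pos + 1] = "[MASK]"
    return out


for name, tokens in build_layouts(num_patches=2, text_len=3).items():
    print(name, tokens)
```

Calling `advance` repeatedly on the text segment would fill the caption one token at a time, with the model's prediction for the current [MASK] supplied by the caller.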

Regarding testing with other data

What's the best way to test with my custom image set? The t2i and i2t data look like mostly text and vectors; is there a script to get my data into that format?

Release PyTorch version

Hi, thanks for your excellent work!

I was wondering when you will release the PyTorch version. I think it would benefit downstream tasks a lot.
