Comments (9)
The input format is TSV, with columns separated by '\t'. Each field is listed as name:type:length:
text_prod_id:str:1,
input_ids:int:64,
input_mask:int:64,
segment_ids:int:64,
prod_desc:str:1,
nx_sent_labels:int:1,
image_prod_id:str:1,
prod_img_id:str:1,
img_feature_convert_rotation:float:2048,
img_feature_convert_jigsaw:float:8192,
img_feature_convert_camouflage:float:18432,
img_feature_convert_grey_mask:float:32768,
img_feature_convert_blank_mask:float:51200,
image_mask:int:55,
img_loc_position_rotation:int:5,
img_loc_position_jigsaw:int:20,
img_loc_position_camouflage:int:35,
img_loc_position_grey_mask:int:80,
img_loc_position_blank_mask:int:125
However, only some of these fields are actually used; see the code between these two lines:
https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#78
to
https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#91
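For anyone who wants to inspect a row by hand, here is a minimal sketch of reading one line against the schema listed above. It is not code from the Kaleido-BERT repository: `parse_schema` and `parse_row` are hypothetical helper names, and the sketch assumes that columns appear in schema order and that multi-value columns are comma-separated, as in the sample row posted in this thread.

```python
# Schema copied from the comment above, as "name:type:length" entries.
SCHEMA = (
    "text_prod_id:str:1,input_ids:int:64,input_mask:int:64,segment_ids:int:64,"
    "prod_desc:str:1,nx_sent_labels:int:1,image_prod_id:str:1,prod_img_id:str:1,"
    "img_feature_convert_rotation:float:2048,img_feature_convert_jigsaw:float:8192,"
    "img_feature_convert_camouflage:float:18432,img_feature_convert_grey_mask:float:32768,"
    "img_feature_convert_blank_mask:float:51200,image_mask:int:55,"
    "img_loc_position_rotation:int:5,img_loc_position_jigsaw:int:20,"
    "img_loc_position_camouflage:int:35,img_loc_position_grey_mask:int:80,"
    "img_loc_position_blank_mask:int:125"
)

CASTS = {"int": int, "float": float}


def parse_schema(schema):
    """Turn 'name:type:length,...' into a list of (name, type, length) tuples."""
    fields = []
    for item in schema.split(","):
        name, type_name, length = item.strip().split(":")
        fields.append((name, type_name, int(length)))
    return fields


def parse_row(line, fields):
    """Split one TSV line into a dict keyed by field name (hypothetical helper)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(columns)}")
    record = {}
    for (name, type_name, length), raw in zip(fields, columns):
        if type_name == "str":
            # String fields such as prod_desc may contain commas, so keep them whole.
            record[name] = raw
            continue
        values = [CASTS[type_name](v) for v in raw.split(",")]
        if len(values) != length:
            raise ValueError(f"{name}: expected {length} values, got {len(values)}")
        record[name] = values
    return record


if __name__ == "__main__":
    # Toy schema and row, made up for illustration only.
    toy_fields = parse_schema("prod_id:str:1,ids:int:3,scores:float:2")
    print(parse_row("1877243\t101,2898,102\t0.5,1.5\n", toy_fields))
```

The length check is deliberately strict, so a row that disagrees with the schema fails loudly instead of being silently misread.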
from kaleido-bert.
input_ids, # pre-process by yourself
input_mask, # pre-process by yourself
segment_ids, # pre-process by yourself
Can you explain these in more detail? The comments are a bit ambiguous, so a link to any reference code would help as well. Thanks for replying!
You can download one of the retrieve datasets, for example i2t:
head -n 1 retrieve_i2t__01b600c2a5874bbfaea0bc89d855b771 > retrieve_i2t_analysis
and look at the samples. However,
- if you know the BERT architecture, a raw sentence is tokenized into input_ids,
- then mask candidates are selected to generate the input_mask; there is also a reference you can learn from:
- segment_ids, input_ids, and position_ids are clearly defined in vanilla BERT; studying the BERT architecture will give you more information.
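To make the three text fields concrete, here is a minimal sketch of how they are usually built in vanilla BERT. This is not the Kaleido-BERT preprocessing code. `build_text_inputs` is a hypothetical helper, and the sketch assumes the token ids come from the bert-base-uncased WordPiece vocabulary, where 101 is [CLS], 102 is [SEP], and 0 is [PAD]. In the sample row posted in this thread, input_mask is 1 for real tokens and 0 for padding, and segment_ids is all zeros, so that is the convention followed here.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed: bert-base-uncased special token ids


def build_text_inputs(token_ids, max_len=64):
    """Return (input_ids, input_mask, segment_ids), each of length max_len."""
    # Reserve two positions for [CLS] and [SEP], truncating long sentences.
    body = list(token_ids)[: max_len - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    # 1 marks a real token, 0 marks padding.
    input_mask = [1] * len(input_ids)
    padding = max_len - len(input_ids)
    input_ids += [PAD_ID] * padding
    input_mask += [0] * padding
    # A single sentence uses segment 0 everywhere.
    segment_ids = [0] * max_len
    return input_ids, input_mask, segment_ids


if __name__ == "__main__":
    # WordPiece ids for the first three tokens of the sample description.
    ids, mask, segments = build_text_inputs([2898, 1011, 4190], max_len=8)
    print(ids)
    print(mask)
    print(segments)
```

The token ids themselves would come from a WordPiece tokenizer run over the product description; that step is left out because the thread does not say which tokenizer implementation was used.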
I tried one row from the retrieve list and obtained this result:
['1619463']['2288337']['1'][0][15.664671897888184, 1.786315679550171]
I believe the last two fields are logits, so I applied a softmax:
>>> b = F.softmax(a, -1)
>>> print(b)
tensor([1.0000e+00, 9.3909e-07])
>>>
But the results don't look right. Can you please let me know if I am missing something?
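For what it's worth, the softmax output above is arithmetically consistent with the two values. The sketch below reproduces it in plain Python, assuming the last two fields really are logits; the thread does not confirm that, nor which index corresponds to "match". A gap of about 13.9 between two logits is enough to push one probability to essentially 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# The two values from the result row quoted above.
logits = [15.664671897888184, 1.786315679550171]
probs = softmax(logits)
print(probs)
```

So the near-one-hot result is what softmax gives for these inputs, not a sign of a numerical bug in the call itself.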
I have reviewed the data; here is a sample:
1877243 #text_prod_id:str:1
101,2898,1011,4190,6312,1999,3212,1012,2152,1011,4125,1012,6315,3341,2012,27553,1012,14865,3215,2012,2392,1012,2048,1011,4979,20724,1012,4565,24347,1012,14101,1011,4875,1012,5688,26035,2075,1999,2317,1012,22480,1012,2459,1012,1019,1000,4190,3098,1012,102,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #input_ids:int:64
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #input_mask:int:64
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #segment_ids:int:64
"Wide-leg jeans in navy. High-rise. Raw edge at waistband. Pleats at front. Two-pocket styling. Rolled cuffs. Zip-fly. Contrast stitching in white. Approx. 17.5"" leg opening." #prod_desc:str:1
0 # nx_sent_labels:int:1
1903663 # image_prod_id:str:1
1 #prod_img_id:str:1
0.67634,3.87287,0.0,0.66722,0.22851,0.6459,0.16636,0.44054,0.50354,0.24993,0.69803,1.66227,0.07413,0.01199,0.96826,0.10995,0.0,1.7154,.... #img_feature_convert_rotation:float:2048
0.13453,0.36225,0.00338,0.09924,0.05989,0.0,0.0,0.7815,0.07461,0.0,0.81553,0.05551,0.46683,0.0,0.90155,2.43293,0.0,0.17192,0.10193,0.08603,.... #img_feature_convert_jigsaw:float:8192
0.0,0.05236,0.0,0.0,0.0,0.11065,0.78982,0.77235,0.0,0.0,0.0,0.12183,0.0,0.06147,0.69437,0.33245,0.0,0.11518,0.0,0.0,0.07393,0.0,0.0,0.60605,.... #img_feature_convert_camouflage:float:18432
0.0,0.0,0.0,0.0,0.28479,0.0,0.20549,0.0433,0.0,0.0,0.0,0.33887,0.00168,0.00488,0.1958,0.0,0.0,0.39969,0.0,0.0,0.0,0.0,0.10314,0.27427,0.38337,.... #img_feature_convert_grey_mask:float:32768
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06259,0.0,0.00404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03209,0.0,0.0,0.0,0.0,0.0,0.0,0.66634,0.0,0.00026,0.0,0.0,0.0,0.0,0.0,.... #img_feature_convert_blank_mask:float:51200
1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5 #image_mask:int:55
2, 249, 78, 178, 24700 #img_loc_position_rotation:int:5
2, 125, 78, 128, 6150, 125, 249, 128, 178, 6150, 2, 125, 128, 178, 6150, 125, 249, 78, 128, 6150 #img_loc_position_jigsaw:int:20
2, 84, 78, 111, 2706, 2, 84, 111, 144, 2706, 2, 84, 144, 178, 2706, 84, 166, 78, 111, 2706, 84, 166, 111, 144, 2706, 84, 166, 144, 178, 2706, 166, 249, 78, 111, 2706, 166, 249, 111, 144, 2706, 166, 249, 144, 178, 2706 #img_loc_position_camouflage:int:35
2, 63, 78, 103, 1525, 2, 63, 103, 128, 1525, 2, 63, 128, 153, 1525, 2, 63, 153, 178, 1525, 63, 125, 78, 103, 1525, 63, 125, 103, 128, 1525, 63, 125, 128, 153, 1525, 63, 125, 153, 178, 1525, 125, 187, 78, 103, 1525, 125, 187, 103, 128, 1525, 125, 187, 128, 153, 1525, 125, 187, 153, 178, 1525, 187, 249, 78, 103, 1525, 187, 249, 103, 128, 1525, 187, 249, 128, 153, 1525, 187, 249, 153, 178, 1525 #img_loc_position_grey_mask:int:80
2, 51, 78, 98, 980, 2, 51, 98, 118, 980, 2, 51, 118, 138, 980, 2, 51, 138, 158, 980, 2, 51, 158, 178, 980, 51, 100, 78, 98, 980, 51, 100, 98, 118, 980, 51, 100, 118, 138, 980, 51, 100, 138, 158, 980, 51, 100, 158, 178, 980, 100, 150, 78, 98, 980, 100, 150, 98, 118, 980, 100, 150, 118, 138, 980, 100, 150, 138, 158, 980, 100, 150, 158, 178, 980, 150, 199, 78, 98, 980, 150, 199, 98, 118, 980, 150, 199, 118, 138, 980, 150, 199, 138, 158, 980, 150, 199, 158, 178, 980, 199, 249, 78, 98, 980, 199, 249, 98, 118, 980, 199, 249, 118, 138, 980, 199, 249, 138, 158, 980, 199, 249, 158, 178, 980 #img_loc_position_blank_mask:int:125
I hope this can help you.
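Reading the numbers in this sample, the image fields seem to follow a per-patch layout: every feature length in the schema is a multiple of 2048, every location row is made of groups of 5 integers, and image_mask holds 1, 4, 9, 16, and 25 entries for levels 1 to 5 (55 in total). The sketch below is only an interpretation of that arithmetic, not code from the repository. `chunk` and `split_patches` are hypothetical helpers, and the feature vector is a zero-filled placeholder because the real values are truncated above. In the rotation row the fifth location value matches the box area, (249 - 2) * (178 - 78) = 24700, though the exact meaning of each coordinate is not stated in the thread. Note also that the camouflage location row above lists 45 values (9 groups of 5), although its schema entry says int:35.

```python
from collections import Counter

FEATURE_DIM = 2048  # inferred: feature lengths are 2048 * number of patches
LOC_DIM = 5         # inferred: each patch has 5 location integers


def chunk(values, size):
    """Split a flat list into consecutive groups of `size` items."""
    if len(values) % size != 0:
        raise ValueError(f"length {len(values)} is not a multiple of {size}")
    return [values[i:i + size] for i in range(0, len(values), size)]


def split_patches(features, locations):
    """Pair each patch feature vector with its location group."""
    feature_chunks = chunk(features, FEATURE_DIM)
    boxes = chunk(locations, LOC_DIM)
    if len(feature_chunks) != len(boxes):
        raise ValueError("feature and location patch counts differ")
    return list(zip(feature_chunks, boxes))


# Location values copied from the jigsaw row of the sample.
jigsaw_loc = [2, 125, 78, 128, 6150, 125, 249, 128, 178, 6150,
              2, 125, 128, 178, 6150, 125, 249, 78, 128, 6150]
jigsaw_feat = [0.0] * 8192  # placeholder; the real features are elided above

patches = split_patches(jigsaw_feat, jigsaw_loc)
print(len(patches))

# image_mask from the sample: level k appears k * k times.
image_mask = [1] * 1 + [2] * 4 + [3] * 9 + [4] * 16 + [5] * 25
print(Counter(image_mask))
```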
This is all fine, and I understand what you are trying to convey. Could you also share a few images from the dataset? I am not able to download them, and it is hard to interpret the whole concept at the array level. If you could share the image for the particular item you described above, it would be immensely helpful. Let me know if I can email you.
Here you can get the pre-processed Fashion-Gen raw datasets, containing RGB images and textual information.
#GET RAW DATA
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/extracted_images.tar.gz
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/full_train_info.txt
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/full_valid_info.txt
Thanks a lot. I see the images have been converted to 256 x 256. Have any other changes been made prior to training?
We directly process these images and text to generate the training/finetune/retrieve TSV files.