Reg testing with other data about kaleido-bert (closed)

mczhuge commented on July 28, 2024

Reg testing with other data


Comments (9)

mczhuge commented on July 28, 2024

The input format is TSV, with columns separated by '\t'. Each column is declared as name:type:length:

text_prod_id:str:1,
input_ids:int:64,
input_mask:int:64,
segment_ids:int:64,
prod_desc:str:1,
nx_sent_labels:int:1,
image_prod_id:str:1,
prod_img_id:str:1,
img_feature_convert_rotation:float:2048,
img_feature_convert_jigsaw:float:8192,
img_feature_convert_camouflage:float:18432,
img_feature_convert_grey_mask:float:32768,
img_feature_convert_blank_mask:float:51200,
image_mask:int:55,
img_loc_position_rotation:int:5,
img_loc_position_jigsaw:int:20,
img_loc_position_camouflage:int:35,
img_loc_position_grey_mask:int:80,
img_loc_position_blank_mask:int:125

However, only some of these fields are used; see lines 78 to 91 of finetune_main.py:

https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#78
to
https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#91
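
A minimal sketch of reading one row under this schema. It assumes, based on the sample row later in this thread, that values inside a column are comma-separated. The helpers `parse_schema` and `parse_row` are hypothetical names, not part of the Kaleido-BERT repository. Length checking is optional because the sample row in this thread carries 45 values for img_loc_position_camouflage while the schema declares 35.

```python
from typing import Dict, List, Tuple, Union

# Schema copied verbatim from the thread: name:type:length per column.
SCHEMA = (
    "text_prod_id:str:1,input_ids:int:64,input_mask:int:64,segment_ids:int:64,"
    "prod_desc:str:1,nx_sent_labels:int:1,image_prod_id:str:1,prod_img_id:str:1,"
    "img_feature_convert_rotation:float:2048,img_feature_convert_jigsaw:float:8192,"
    "img_feature_convert_camouflage:float:18432,"
    "img_feature_convert_grey_mask:float:32768,"
    "img_feature_convert_blank_mask:float:51200,image_mask:int:55,"
    "img_loc_position_rotation:int:5,img_loc_position_jigsaw:int:20,"
    "img_loc_position_camouflage:int:35,img_loc_position_grey_mask:int:80,"
    "img_loc_position_blank_mask:int:125"
)

Field = Tuple[str, str, int]
Value = Union[str, List[int], List[float]]


def parse_schema(schema: str) -> List[Field]:
    """Split 'name:type:length' entries into (name, type, length) tuples."""
    fields = []
    for entry in schema.split(","):
        name, type_name, length = entry.strip().split(":")
        fields.append((name, type_name, int(length)))
    return fields


def parse_row(line: str, fields: List[Field], strict: bool = False) -> Dict[str, Value]:
    """Parse one tab-separated row; numeric columns are comma-separated lists."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(columns)}")
    row: Dict[str, Value] = {}
    for (name, type_name, length), raw in zip(fields, columns):
        if type_name == "str":
            row[name] = raw  # keep text untouched; it may contain commas
            continue
        cast = int if type_name == "int" else float
        values = [cast(v) for v in raw.split(",")]  # int()/float() ignore spaces
        if strict and len(values) != length:
            raise ValueError(f"{name}: expected {length} values, got {len(values)}")
        row[name] = values
    return row
```

With `strict=True` the function rejects any column whose value count differs from the declared length, which can be useful when generating your own TSV.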


shaheenkdr commented on July 28, 2024

input_ids, # pre-process by yourself
input_mask, # pre-process by yourself
segment_ids, # pre-process by yourself

Can you explain these in more detail? They sound a bit ambiguous. A link to any reference code would help as well. Thanks for replying!


mczhuge commented on July 28, 2024

You can download one of the retrieval datasets, for example i2t:

wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/retrieve/retrieve_i2t__01b600c2a5874bbfaea0bc89d855b771

head -n 1 retrieve_i2t__01b600c2a5874bbfaea0bc89d855b771 > retrieve_i2t_analysis

to inspect a sample row.

Alternatively, if you are familiar with the BERT architecture, you can tokenize a raw sentence into input_ids yourself:

https://github.com/huggingface/transformers/blob/1c06240e1b3477728129bb58e7b6c7734bb5074e/src/transformers/models/bert/tokenization_bert.py#L117

input_mask is generated alongside it (in the sample row later in this thread it is 1 for real tokens and 0 for padding); there is also a reference you can consult:

https://github.com/huggingface/transformers/blob/81009b7a5c5cb183a9275c15bf347bdc988b02c4/tests/test_modeling_bert_generation.py

segment_ids, input_ids, and position_ids are clearly defined in vanilla BERT; studying the BERT architecture will give you more information.
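
A minimal sketch of how the three text fields could be assembled for a single sentence. It assumes the token ids already come from a WordPiece tokenizer (such as the BertTokenizer linked above), and that the special ids are 101 for [CLS], 102 for [SEP], and 0 for [PAD], which matches the sample row later in this thread. The function name `build_bert_inputs` is hypothetical and not taken from the repository.

```python
from typing import List, Sequence, Tuple


def build_bert_inputs(
    token_ids: Sequence[int],
    max_len: int = 64,
    cls_id: int = 101,  # assumed [CLS] id
    sep_id: int = 102,  # assumed [SEP] id
    pad_id: int = 0,    # assumed [PAD] id
) -> Tuple[List[int], List[int], List[int]]:
    """Return (input_ids, input_mask, segment_ids), each of length max_len."""
    # Leave room for [CLS] and [SEP], truncating long sentences.
    body = list(token_ids)[: max_len - 2]
    input_ids = [cls_id] + body + [sep_id]

    # 1 marks a real token, 0 marks padding.
    input_mask = [1] * len(input_ids)

    padding = max_len - len(input_ids)
    input_ids += [pad_id] * padding
    input_mask += [0] * padding

    # A single sentence belongs entirely to segment 0.
    segment_ids = [0] * max_len
    return input_ids, input_mask, segment_ids


# First three word-piece ids of the sample description in this thread.
ids, mask, segments = build_bert_inputs([2898, 1011, 4190])
print(ids[:6], sum(mask), len(segments))
```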


shaheenkdr commented on July 28, 2024

I tried one row from the retrieve list and obtained this result:

['1619463']['2288337']['1'][0][15.664671897888184, 1.786315679550171]

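I believe the last two fields are logits, so I tried applying a softmax:

>>> import torch
>>> import torch.nn.functional as F
>>> a = torch.tensor([15.664671897888184, 1.786315679550171])
>>> b = F.softmax(a, -1)
>>> print(b)
tensor([1.0000e+00, 9.3909e-07])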

But the results don't look right. Can you please let me know if I am missing something?
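
For what it's worth, the softmax output is arithmetically consistent: for two values, softmax depends only on their difference, and a gap of about 13.9 leaves roughly exp(-13.9), about 9.4e-07, for the smaller one. A small pure-Python check follows; it only verifies the arithmetic and does not settle whether the two numbers really are logits.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


logits = [15.664671897888184, 1.786315679550171]
probs = softmax(logits)

# The smaller probability is close to exp(-(gap between the two values)).
gap = logits[0] - logits[1]
print(probs)
print(math.exp(-gap))
```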


mczhuge commented on July 28, 2024

I have reviewed the data; here is a sample:

1877243 #text_prod_id:str:1

101,2898,1011,4190,6312,1999,3212,1012,2152,1011,4125,1012,6315,3341,2012,27553,1012,14865,3215,2012,2392,1012,2048,1011,4979,20724,1012,4565,24347,1012,14101,1011,4875,1012,5688,26035,2075,1999,2317,1012,22480,1012,2459,1012,1019,1000,4190,3098,1012,102,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #input_ids:int:64

1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #input_mask:int:64

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #segment_ids:int:64

"Wide-leg jeans in navy. High-rise. Raw edge at waistband. Pleats at front. Two-pocket styling. Rolled cuffs. Zip-fly. Contrast stitching in white. Approx. 17.5"" leg opening." #prod_desc:str:1

0 # nx_sent_labels:int:1

1903663 # image_prod_id:str:1

1 #prod_img_id:str:1

0.67634,3.87287,0.0,0.66722,0.22851,0.6459,0.16636,0.44054,0.50354,0.24993,0.69803,1.66227,0.07413,0.01199,0.96826,0.10995,0.0,1.7154,.... #img_feature_convert_rotation:float:2048

0.13453,0.36225,0.00338,0.09924,0.05989,0.0,0.0,0.7815,0.07461,0.0,0.81553,0.05551,0.46683,0.0,0.90155,2.43293,0.0,0.17192,0.10193,0.08603,.... #img_feature_convert_jigsaw:float:8192

0.0,0.05236,0.0,0.0,0.0,0.11065,0.78982,0.77235,0.0,0.0,0.0,0.12183,0.0,0.06147,0.69437,0.33245,0.0,0.11518,0.0,0.0,0.07393,0.0,0.0,0.60605,.... #img_feature_convert_camouflage:float:18432

0.0,0.0,0.0,0.0,0.28479,0.0,0.20549,0.0433,0.0,0.0,0.0,0.33887,0.00168,0.00488,0.1958,0.0,0.0,0.39969,0.0,0.0,0.0,0.0,0.10314,0.27427,0.38337,.... #img_feature_convert_grey_mask:float:32768

0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06259,0.0,0.00404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03209,0.0,0.0,0.0,0.0,0.0,0.0,0.66634,0.0,0.00026,0.0,0.0,0.0,0.0,0.0,.... #img_feature_convert_blank_mask:float:51200

1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5 #image_mask:int:55

2, 249, 78, 178, 24700 #img_loc_position_rotation:int:5

2, 125, 78, 128, 6150, 125, 249, 128, 178, 6150, 2, 125, 128, 178, 6150, 125, 249, 78, 128, 6150 #img_loc_position_jigsaw:int:20

2, 84, 78, 111, 2706, 2, 84, 111, 144, 2706, 2, 84, 144, 178, 2706, 84, 166, 78, 111, 2706, 84, 166, 111, 144, 2706, 84, 166, 144, 178, 2706, 166, 249, 78, 111, 2706, 166, 249, 111, 144, 2706, 166, 249, 144, 178, 2706 #img_loc_position_camouflage:int:35

2, 63, 78, 103, 1525, 2, 63, 103, 128, 1525, 2, 63, 128, 153, 1525, 2, 63, 153, 178, 1525, 63, 125, 78, 103, 1525, 63, 125, 103, 128, 1525, 63, 125, 128, 153, 1525, 63, 125, 153, 178, 1525, 125, 187, 78, 103, 1525, 125, 187, 103, 128, 1525, 125, 187, 128, 153, 1525, 125, 187, 153, 178, 1525, 187, 249, 78, 103, 1525, 187, 249, 103, 128, 1525, 187, 249, 128, 153, 1525, 187, 249, 153, 178, 1525 #img_loc_position_grey_mask:int:80

2, 51, 78, 98, 980, 2, 51, 98, 118, 980, 2, 51, 118, 138, 980, 2, 51, 138, 158, 980, 2, 51, 158, 178, 980, 51, 100, 78, 98, 980, 51, 100, 98, 118, 980, 51, 100, 118, 138, 980, 51, 100, 138, 158, 980, 51, 100, 158, 178, 980, 100, 150, 78, 98, 980, 100, 150, 98, 118, 980, 100, 150, 118, 138, 980, 100, 150, 138, 158, 980, 100, 150, 158, 178, 980, 150, 199, 78, 98, 980, 150, 199, 98, 118, 980, 150, 199, 118, 138, 980, 150, 199, 138, 158, 980, 150, 199, 158, 178, 980, 199, 249, 78, 98, 980, 199, 249, 98, 118, 980, 199, 249, 118, 138, 980, 199, 249, 138, 158, 980, 199, 249, 158, 178, 980 #img_loc_position_blank_mask:int:125

I hope this can help you.
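
Reading the sample, the image fields appear to share one patch layout: each feature column is a multiple of 2048 floats, and each location column lists 5 integers per patch. The sketch below derives the implied patch counts from the declared feature lengths. The 2048-per-patch and 5-per-patch figures are inferred from the numbers in this thread, not stated by it.

```python
FEATURE_DIM = 2048   # assumed feature size per patch
LOC_WIDTH = 5        # assumed number of location integers per patch

# Declared feature lengths, copied from the schema in this thread.
feature_lengths = {
    "rotation": 2048,
    "jigsaw": 8192,
    "camouflage": 18432,
    "grey_mask": 32768,
    "blank_mask": 51200,
}

# Number of patches implied by each feature column.
patches = {name: length // FEATURE_DIM for name, length in feature_lengths.items()}

for name, count in patches.items():
    print(f"{name}: {count} patches, {count * LOC_WIDTH} location integers")

# The total should match the declared length of image_mask (55).
total_patches = sum(patches.values())
print("total patches:", total_patches)
```

Under these assumptions the five tasks use 1, 4, 9, 16, and 25 patches, and the camouflage task would need 45 location integers, which matches the sample row rather than the 35 declared in the schema.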


shaheenkdr commented on July 28, 2024

This is all fine, and I understand what you are trying to convey. Could you also share a few images from the dataset? I am not able to download them. It's hard to interpret the whole concept at the array level, so it would be immensely helpful if you could share the image for the particular item you described above. Let me know if I can email you.


mczhuge commented on July 28, 2024

Here you can get the pre-processed Fashion-Gen raw dataset, which contains RGB images and textual information.

#GET RAW DATA
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/extracted_images.tar.gz
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/full_train_info.txt
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/full_valid_info.txt


shaheenkdr commented on July 28, 2024

Thanks a lot. I see the images have been converted to 256 x 256; have any other changes been made prior to training?


mczhuge commented on July 28, 2024

We directly process these images and texts to generate the training, finetune, and retrieve TSV files.
