

mczhuge commented on July 28, 2024

The input format is TSV, with columns separated by '\t'. Each entry below gives a field's name, type, and number of values:

text_prod_id:str:1,
input_ids:int:64,
input_mask:int:64,
segment_ids:int:64,
prod_desc:str:1,
nx_sent_labels:int:1,
image_prod_id:str:1,
prod_img_id:str:1,
img_feature_convert_rotation:float:2048,
img_feature_convert_jigsaw:float:8192,
img_feature_convert_camouflage:float:18432,
img_feature_convert_grey_mask:float:32768,
img_feature_convert_blank_mask:float:51200,
image_mask:int:55,
img_loc_position_rotation:int:5,
img_loc_position_jigsaw:int:20,
img_loc_position_camouflage:int:35,
img_loc_position_grey_mask:int:80,
img_loc_position_blank_mask:int:125
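Here is a minimal sketch of reading this format in plain Python. It assumes that numeric columns hold comma-separated values, as in the sample row posted later in this thread, and that each `str` column holds a single raw value. `parse_schema` and `parse_row` are hypothetical helpers, not functions from the Kaleido-BERT repo. The length check is deliberately strict. In that sample row, `img_loc_position_camouflage` actually holds 45 values rather than 35, so that schema entry may need adjusting before real data will parse.

```python
from typing import Dict, List, Tuple

# Field schema as listed above: name:type:length, comma-separated.
SCHEMA = (
    "text_prod_id:str:1,input_ids:int:64,input_mask:int:64,segment_ids:int:64,"
    "prod_desc:str:1,nx_sent_labels:int:1,image_prod_id:str:1,prod_img_id:str:1,"
    "img_feature_convert_rotation:float:2048,img_feature_convert_jigsaw:float:8192,"
    "img_feature_convert_camouflage:float:18432,"
    "img_feature_convert_grey_mask:float:32768,"
    "img_feature_convert_blank_mask:float:51200,image_mask:int:55,"
    "img_loc_position_rotation:int:5,img_loc_position_jigsaw:int:20,"
    "img_loc_position_camouflage:int:35,img_loc_position_grey_mask:int:80,"
    "img_loc_position_blank_mask:int:125"
)

CASTS = {"int": int, "float": float}

def parse_schema(schema: str) -> List[Tuple[str, str, int]]:
    """Split 'name:type:length,...' into (name, type, length) tuples."""
    fields = []
    for entry in schema.split(","):
        entry = entry.strip()
        if entry:
            name, typ, length = entry.split(":")
            fields.append((name, typ, int(length)))
    return fields

def parse_row(line: str, fields: List[Tuple[str, str, int]]) -> Dict[str, list]:
    """Split one TSV line on tabs and cast each column according to the schema."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(columns)}")
    row: Dict[str, list] = {}
    for (name, typ, length), raw in zip(fields, columns):
        if typ == "str":
            # Assumption: a str column is one raw value (descriptions may contain commas).
            values = [raw]
        else:
            # Assumption: numeric columns are comma-separated; int()/float() ignore
            # surrounding spaces such as "2, 249, 78".
            values = [CASTS[typ](v) for v in raw.split(",")]
        if len(values) != length:
            raise ValueError(f"{name}: expected {length} values, got {len(values)}")
        row[name] = values
    return row

if __name__ == "__main__":
    fields = parse_schema(SCHEMA)
    # Synthetic row with the declared number of values in every column.
    columns = []
    for name, typ, length in fields:
        filler = {"str": "x", "int": "0", "float": "0.0"}[typ]
        columns.append(filler if typ == "str" else ",".join([filler] * length))
    row = parse_row("\t".join(columns), fields)
    print(len(fields), len(row["img_feature_convert_blank_mask"]))
```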

However, only some of these fields are actually used. See lines 78 to 91 of scripts/finetune_main.py:

https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#78
to
https://github.com/mczhuge/Kaleido-BERT/blob/main/scripts/finetune_main.py#91

from kaleido-bert.

shaheenkdr commented on July 28, 2024
input_ids, # pre-process by yourself
input_mask, # pre-process by yourself
segment_ids, # pre-process by yourself

Could you explain these in more detail? "Pre-process by yourself" is a bit ambiguous. A link to any reference code would help as well. Thanks for replying!


mczhuge commented on July 28, 2024

You can download one of the retrieval datasets, for example i2t:

wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/retrieve/retrieve_i2t__01b600c2a5874bbfaea0bc89d855b771

Then save its first row to a separate file and inspect the sample:

head -n 1 retrieve_i2t__01b600c2a5874bbfaea0bc89d855b771 > retrieve_i2t_analysis

In short:

  1. If you know the BERT architecture, you know that a raw sentence is tokenized into input_ids, for example with:

https://github.com/huggingface/transformers/blob/1c06240e1b3477728129bb58e7b6c7734bb5074e/src/transformers/models/bert/tokenization_bert.py#L117

  2. Mask candidates are then selected to generate the input_mask. Here is another reference you can learn from:

https://github.com/huggingface/transformers/blob/81009b7a5c5cb183a9275c15bf347bdc988b02c4/tests/test_modeling_bert_generation.py

  3. segment_ids, input_ids, and position_ids are clearly defined in vanilla BERT. Studying the BERT architecture will give you more information.
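Here is a minimal sketch of the vanilla-BERT layout for a single text segment. It assumes the WordPiece token IDs already exist, for example from the Hugging Face tokenizer linked above. The sample row posted later in this thread follows this pattern: [CLS] (101), 48 description token IDs, [SEP] (102), and then 14 zeros of padding. In that row, input_mask looks like a padding mask, with 1 on the real positions and 0 on the padding, and segment_ids is all zeros. Pre-training data with masked-LM selection may look different. `build_text_fields` is a hypothetical helper.

```python
from typing import List, Tuple

CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased [CLS], [SEP], [PAD]; the sample row uses these
MAX_LEN = 64  # input_ids / input_mask / segment_ids are int:64 in the schema

def build_text_fields(token_ids: List[int], max_len: int = MAX_LEN) -> Tuple[List[int], List[int], List[int]]:
    """Wrap WordPiece IDs as [CLS] ... [SEP], then pad every field to max_len."""
    body = token_ids[: max_len - 2]        # truncate so [CLS] and [SEP] still fit
    input_ids = [CLS_ID] + body + [SEP_ID]
    input_mask = [1] * len(input_ids)      # 1 = real token, 0 = padding
    pad = max_len - len(input_ids)
    input_ids += [PAD_ID] * pad
    input_mask += [0] * pad
    segment_ids = [0] * max_len            # a single text segment, so all zeros
    return input_ids, input_mask, segment_ids

# WordPiece IDs of the product description, copied from the sample row
# posted later in this thread (everything between [CLS] and [SEP]).
desc_ids = [
    2898, 1011, 4190, 6312, 1999, 3212, 1012, 2152, 1011, 4125, 1012, 6315,
    3341, 2012, 27553, 1012, 14865, 3215, 2012, 2392, 1012, 2048, 1011, 4979,
    20724, 1012, 4565, 24347, 1012, 14101, 1011, 4875, 1012, 5688, 26035, 2075,
    1999, 2317, 1012, 22480, 1012, 2459, 1012, 1019, 1000, 4190, 3098, 1012,
]
input_ids, input_mask, segment_ids = build_text_fields(desc_ids)
print(sum(input_mask), input_ids[48:52])
```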


shaheenkdr commented on July 28, 2024

I tried one row from the retrieval list and obtained this result:

['1619463']['2288337']['1'][0][15.664671897888184, 1.786315679550171]

I believe the last two fields are logits, so I applied a softmax:

>>> import torch
>>> import torch.nn.functional as F
>>> a = torch.tensor([15.664671897888184, 1.786315679550171])
>>> b = F.softmax(a, -1)
>>> print(b)
tensor([1.0000e+00, 9.3909e-07])

But the results don't look right :/. Could you let me know if I am missing something?
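For reference, a softmax over two logits this far apart is expected to be this lopsided. The gap is about 13.88, and e^(-13.88) is about 9.39e-07, which matches the second probability printed above. The numerically stable version below subtracts the largest logit before exponentiating. It is plain Python and does not depend on PyTorch. The thread does not say which of the two outputs means "match", so the sketch does not assume it.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max logit before exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [15.664671897888184, 1.786315679550171]  # last two fields of the output row
probs = softmax(logits)
gap = logits[0] - logits[1]
# For two classes, p[1] = 1 / (1 + exp(gap)), so a gap near 13.9 gives roughly 1e-6.
print(f"gap = {gap:.4f}, probs = [{probs[0]:.7f}, {probs[1]:.4e}]")
```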


mczhuge commented on July 28, 2024

I have reviewed the data. Here is a sample row:


1877243 #text_prod_id:str:1

101,2898,1011,4190,6312,1999,3212,1012,2152,1011,4125,1012,6315,3341,2012,27553,1012,14865,3215,2012,2392,1012,2048,1011,4979,20724,1012,4565,24347,1012,14101,1011,4875,1012,5688,26035,2075,1999,2317,1012,22480,1012,2459,1012,1019,1000,4190,3098,1012,102,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #input_ids:int:64

1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #input_mask:int:64

0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 #segment_ids:int:64

"Wide-leg jeans in navy. High-rise. Raw edge at waistband. Pleats at front. Two-pocket styling. Rolled cuffs. Zip-fly. Contrast stitching in white. Approx. 17.5"" leg opening." #prod_desc:str:1

0 #nx_sent_labels:int:1

1903663 #image_prod_id:str:1

1 #prod_img_id:str:1

0.67634,3.87287,0.0,0.66722,0.22851,0.6459,0.16636,0.44054,0.50354,0.24993,0.69803,1.66227,0.07413,0.01199,0.96826,0.10995,0.0,1.7154,.... #img_feature_convert_rotation:float:2048

0.13453,0.36225,0.00338,0.09924,0.05989,0.0,0.0,0.7815,0.07461,0.0,0.81553,0.05551,0.46683,0.0,0.90155,2.43293,0.0,0.17192,0.10193,0.08603,.... #img_feature_convert_jigsaw:float:8192

0.0,0.05236,0.0,0.0,0.0,0.11065,0.78982,0.77235,0.0,0.0,0.0,0.12183,0.0,0.06147,0.69437,0.33245,0.0,0.11518,0.0,0.0,0.07393,0.0,0.0,0.60605,.... #img_feature_convert_camouflage:float:18432

0.0,0.0,0.0,0.0,0.28479,0.0,0.20549,0.0433,0.0,0.0,0.0,0.33887,0.00168,0.00488,0.1958,0.0,0.0,0.39969,0.0,0.0,0.0,0.0,0.10314,0.27427,0.38337,.... #img_feature_convert_grey_mask:float:32768

0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06259,0.0,0.00404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03209,0.0,0.0,0.0,0.0,0.0,0.0,0.66634,0.0,0.00026,0.0,0.0,0.0,0.0,0.0,.... #img_feature_convert_blank_mask:float:51200

1,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5 #image_mask:int:55

2, 249, 78, 178, 24700 #img_loc_position_rotation:int:5

2, 125, 78, 128, 6150, 125, 249, 128, 178, 6150, 2, 125, 128, 178, 6150, 125, 249, 78, 128, 6150 #img_loc_position_jigsaw:int:20

2, 84, 78, 111, 2706, 2, 84, 111, 144, 2706, 2, 84, 144, 178, 2706, 84, 166, 78, 111, 2706, 84, 166, 111, 144, 2706, 84, 166, 144, 178, 2706, 166, 249, 78, 111, 2706, 166, 249, 111, 144, 2706, 166, 249, 144, 178, 2706 #img_loc_position_camouflage:int:35

2, 63, 78, 103, 1525, 2, 63, 103, 128, 1525, 2, 63, 128, 153, 1525, 2, 63, 153, 178, 1525, 63, 125, 78, 103, 1525, 63, 125, 103, 128, 1525, 63, 125, 128, 153, 1525, 63, 125, 153, 178, 1525, 125, 187, 78, 103, 1525, 125, 187, 103, 128, 1525, 125, 187, 128, 153, 1525, 125, 187, 153, 178, 1525, 187, 249, 78, 103, 1525, 187, 249, 103, 128, 1525, 187, 249, 128, 153, 1525, 187, 249, 153, 178, 1525 #img_loc_position_grey_mask:int:80

2, 51, 78, 98, 980, 2, 51, 98, 118, 980, 2, 51, 118, 138, 980, 2, 51, 138, 158, 980, 2, 51, 158, 178, 980, 51, 100, 78, 98, 980, 51, 100, 98, 118, 980, 51, 100, 118, 138, 980, 51, 100, 138, 158, 980, 51, 100, 158, 178, 980, 100, 150, 78, 98, 980, 100, 150, 98, 118, 980, 100, 150, 118, 138, 980, 100, 150, 138, 158, 980, 100, 150, 158, 178, 980, 150, 199, 78, 98, 980, 150, 199, 98, 118, 980, 150, 199, 118, 138, 980, 150, 199, 138, 158, 980, 150, 199, 158, 178, 980, 199, 249, 78, 98, 980, 199, 249, 98, 118, 980, 199, 249, 118, 138, 980, 199, 249, 138, 158, 980, 199, 249, 158, 178, 980 #img_loc_position_blank_mask:int:125


I hope this can help you.
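Reading this one sample row, the five `img_loc_position_*` fields appear to split the same region into k × k grids for k = 1 to 5. That gives 1, 4, 9, 16, and 25 patches, 55 in total. This matches the length of `image_mask`, whose values 1 to 5 seem to give each patch's level. Each patch appears to be five integers, [x1, x2, y1, y2, area]. Each `img_feature_convert_*` field holds 2048 floats per patch: 2048 times 1, 4, 9, 16, and 25 gives 2048, 8192, 18432, 32768, and 51200.

The sketch below is reverse-engineered from this row and is not taken from Kaleido-BERT. It assumes the outer box (2, 249, 78, 178) from `img_loc_position_rotation`, places the grid boundaries with floor division, and uses floor(W/k) × floor(H/k) as the area for every patch in a level. Under those assumptions it reproduces the rotation, camouflage, grey-mask, and blank-mask rows above. The jigsaw row holds the same four boxes in a shuffled order. The camouflage row carries 45 values (9 × 5), not the 35 given in its field name. `kaleido_boxes` and `flatten` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int, int]  # (x1, x2, y1, y2, area), as the sample appears to use

def kaleido_boxes(x1: int, x2: int, y1: int, y2: int, k: int) -> List[Box]:
    """Split the box (x1, x2, y1, y2) into a k x k grid of patches.

    Boundaries use floor division; every patch in a level gets the same
    area, floor(W / k) * floor(H / k). Patches are listed with x as the
    outer loop and y as the inner loop, matching the sample's ordering.
    """
    w, h = x2 - x1, y2 - y1
    xs = [x1 + (w * i) // k for i in range(k + 1)]
    ys = [y1 + (h * j) // k for j in range(k + 1)]
    area = (w // k) * (h // k)
    return [(xs[i], xs[i + 1], ys[j], ys[j + 1], area) for i in range(k) for j in range(k)]

def flatten(boxes: List[Box]) -> List[int]:
    """Flatten boxes into the comma-separated integer layout of the TSV column."""
    return [v for box in boxes for v in box]

FEATURE_DIM = 2048  # floats per patch, inferred from the img_feature_convert_* lengths
LEVELS = ["rotation", "jigsaw", "camouflage", "grey_mask", "blank_mask"]

outer = (2, 249, 78, 178)  # taken from img_loc_position_rotation in the sample
image_mask: List[int] = []
for k, name in enumerate(LEVELS, start=1):
    boxes = kaleido_boxes(*outer, k)
    image_mask += [k] * len(boxes)  # level id for every patch of this grid
    print(f"{name}: {len(boxes)} patches, {len(flatten(boxes))} loc ints, "
          f"{len(boxes) * FEATURE_DIM} feature floats")
print(len(image_mask))
```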


shaheenkdr commented on July 28, 2024

This is all fine, and I understand what you are trying to convey. Could you also share a few images from the dataset? I am not able to download them. It's hard to interpret the whole concept at the array level, so it would be immensely helpful if you could also share the image for the item you described above. Let me know if I can email you.


mczhuge commented on July 28, 2024

Here you can get the pre-processed raw Fashion-Gen data, containing RGB images and textual information:

#GET RAW DATA
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/extracted_images.tar.gz
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/full_train_info.txt
wget http://icbu-ensa-sc.oss-cn-zhangjiakou.aliyuncs.com/mingchen.zgmc/KaleidoBERT_TF_CODE/datasets/raw_data/full_valid_info.txt


shaheenkdr commented on July 28, 2024

Thanks a lot! I see the images have been resized to 256 x 256. Were any other changes made prior to training?


mczhuge commented on July 28, 2024

We process these images and texts directly to generate the training, fine-tuning, and retrieval TSV files.
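As a minimal sketch of the last step only, and not the authors' pipeline, the function below serializes already-computed fields into one TSV line in schema order. List-valued fields are joined with commas and columns with tabs, mirroring the format described at the top of this thread. Feature extraction is not shown. `to_tsv_line` and the three placeholder values are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, Sequence[Union[int, float]]]

def to_tsv_line(fields: Dict[str, Value], order: List[str]) -> str:
    """Join list-valued fields with commas and columns with tabs, in schema order."""
    columns = []
    for name in order:
        value = fields[name]
        if isinstance(value, str):
            # A raw tab or newline inside a text field would shift every later column.
            if "\t" in value or "\n" in value:
                raise ValueError(f"{name}: tabs/newlines would break the TSV layout")
            columns.append(value)
        else:
            columns.append(",".join(str(v) for v in value))
    return "\t".join(columns)

# Placeholder values for three of the columns (hypothetical, not real features).
order = ["text_prod_id", "input_mask", "img_loc_position_rotation"]
line = to_tsv_line(
    {"text_prod_id": "1877243", "input_mask": [1, 1, 0],
     "img_loc_position_rotation": [2, 249, 78, 178, 24700]},
    order,
)
print(line.split("\t"))
```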
