Dual-Transformer is a framework that takes a scenery image as input and generates a Vietnamese six-eight poem as output. The generated poem relates to the input image by mentioning objects that appear in it.
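To make the "six-eight" form concrete, here is a minimal sketch of a syllable-count check. It is not part of this repository, and the names `syllable_counts` and `is_six_eight` are hypothetical. It assumes that written Vietnamese separates syllables with spaces, that punctuation stays attached to a syllable, and that a poem is a sequence of couplets (a 6-syllable line followed by an 8-syllable line). It does not check the rhyme or tone rules of the form.

```python
def syllable_counts(poem: str) -> list[int]:
    """Count whitespace-separated syllables on each non-empty line."""
    return [len(line.split()) for line in poem.splitlines() if line.strip()]


def is_six_eight(poem: str) -> bool:
    """Return True if the lines alternate 6 and 8 syllables, starting with 6.

    Assumption: the poem consists of whole couplets, so the number of
    non-empty lines must be even. Only syllable counts are checked.
    """
    counts = syllable_counts(poem)
    if not counts or len(counts) % 2 != 0:
        return False
    expected = [6 if i % 2 == 0 else 8 for i in range(len(counts))]
    return counts == expected


if __name__ == "__main__":
    # Opening couplet of "Truyện Kiều", used here only as a sample input.
    sample = "Trăm năm trong cõi người ta,\nChữ tài chữ mệnh khéo là ghét nhau."
    print(syllable_counts(sample), is_six_eight(sample))
```

A check like this could be used to filter generated poems, but whether the app applies any such filter is not stated here.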
- Clone the repository:
git clone https://github.com/chauminhnguyen/Dual-Transformer.git
- Install the requirements
  - Install CUDA.
  - Install the Python requirements.
    pip install -r requirements.txt
- Modify the paths
  - Edit config.json to set the paths to the Query2Label and GPT-2 models.
  - Edit the Query2Label config.json (default: models/Query2labels/config.json) to set the path to the pretrained model.
- Start the model
streamlit run app.py
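Before starting the app, it can help to confirm that the paths set in the config files actually exist. The sketch below is one way to do that; it is not part of the repository. It assumes the config file is a flat JSON object, and the key names in the usage example (`query2label_path`, `gpt2_path`) are hypothetical placeholders, so substitute the keys your config.json really uses.

```python
import json
from pathlib import Path


def missing_paths(config_file: str, keys: list[str]) -> list[str]:
    """Return the keys whose configured path is unset or does not exist.

    Assumption: the config is a flat JSON object mapping keys to path strings.
    """
    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)

    missing = []
    for key in keys:
        value = config.get(key)
        # A key counts as missing if it is absent, not a string,
        # or points at a path that is not on disk.
        if not isinstance(value, str) or not Path(value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical key names; replace them with the ones in your config.json.
    problems = missing_paths("config.json", ["query2label_path", "gpt2_path"])
    print("Missing or invalid paths:", problems)
```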
Link data.
I used Query2Label as the Image-to-Keywords model. The command below trains it on my Image-to-Keywords dataset.
python main_mlc.py \
--dataset_dir './data' --backbone resnet101 --dataname coco14 \
--batch-size 1 --print-freq 100 --output "./output" --world-size 1 --rank 0 \
--dist-url tcp://127.0.0.1:3717 --gamma_pos 0 --gamma_neg 2 --dtgfl --epochs 40 \
--lr 1e-4 --optim AdamW --pretrained --num_class 76 --img_size 448 \
--weight-decay 1e-2 --cutout --n_holes 1 --cut_fact 0.5 --hidden_dim 2048 \
--dim_feedforward 4096 --enc_layers 1 --dec_layers 2 --nheads 4 --early-stop \
--amp --workers 2
Data is located in /data.
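Query2Label is a multi-label classifier, so with `--num_class 76` it produces one score per class for each image. The sketch below shows one common way such scores can be turned into keywords: apply a sigmoid to each logit and keep the labels above a threshold. This is an illustration under assumptions, not the repository's actual post-processing; the function names, the 0.5 threshold, and the label names in the example are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logits_to_keywords(logits: list[float], labels: list[str],
                       threshold: float = 0.5) -> list[str]:
    """Keep labels whose sigmoid score reaches the threshold, best first.

    Assumption: logits[i] is the raw score for labels[i].
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    scored = [(sigmoid(z), name) for z, name in zip(logits, labels)]
    kept = [(p, name) for p, name in scored if p >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in kept]


if __name__ == "__main__":
    # Hypothetical labels and scores for a single image.
    labels = ["núi", "biển", "mây"]
    logits = [2.0, -1.0, 0.5]
    print(logits_to_keywords(logits, labels))
```

A lower threshold yields more keywords per image at the cost of more false positives, so the value is worth tuning on held-out data.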
I used GPT-2 as the Keywords-to-Poem model. The command below trains it.
python trainKw2Poem.py \
--train_dir './data/1ext_balanced_rkw_4sen_87609_test_kw2poem_dataset.csv' \
--epoch 100 --step 10000 --batch_size 8
| Model name | Link |
|---|---|
| GPT-2 | link |
| Query2Label | link |
We thank the authors of Query2Label and GPT-2, whose work made it possible for us to create this framework. We also thank FPT for supporting the dataset-building process.