
text2image's Introduction

Generating Images from Captions with Attention

Code for the paper Generating Images from Captions with Attention by Elman Mansimov, Emilio Parisotto, Jimmy Ba and Ruslan Salakhutdinov; ICLR 2016.

We introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description.


Getting Started

The code is written in Python. To use it you will need:

  • Python 2.7
  • Theano 0.7 (mostly tested using a commit from June/July 2015)
  • numpy and scipy
  • h5py (HDF5 >= 1.8.11)
  • skip-thoughts

Before running the code, make sure you set floatX to float32 in your Theano settings.
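For example (a minimal sketch, not part of the repository's code), floatX can be set in ~/.theanorc or passed on the command line, e.g. THEANO_FLAGS='floatX=float32,device=gpu0' python alignDraw.py ..., and you can verify the setting from Python before launching training:

# Sanity check (sketch): confirm Theano will build float32 graphs.
import theano
assert theano.config.floatX == 'float32', \
    "set floatX=float32 in ~/.theanorc or via THEANO_FLAGS before running"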

Additionally, depending on the task, you will probably need to download these files by running:

wget http://www.cs.toronto.edu/~emansim/datasets/mnist.h5
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-images-32x32.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-images-56x56.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-captions.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-captions-len.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/train-cap2im.pkl
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-images-32x32.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-images-56x56.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-captions.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-captions-len.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dev-cap2im.pkl
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-images-32x32.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-captions.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-captions-len.npy
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/test-cap2im.pkl
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/gan.hdf5
wget http://www.cs.toronto.edu/~emansim/datasets/text2image/dictionary.pkl
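Once downloaded, the files can be sanity-checked from Python. The sketch below is illustrative only (the variable names are mine, not part of the repository); the .npy files hold numpy arrays, the .pkl files hold pickled Python objects, and gan.hdf5 is the HDF5 file passed to sample-captions.py via --gan_path (presumably the post-processing/sharpening network weights):

# Illustrative sketch for inspecting the downloaded files (not part of the original code).
import numpy as np
import pickle
import h5py

images = np.load('train-images-32x32.npy')    # training images
captions = np.load('train-captions.npy')      # tokenized training captions
with open('train-cap2im.pkl', 'rb') as f:
    cap2im = pickle.load(f)                   # presumably maps caption indices to image indices
with open('dictionary.pkl', 'rb') as f:
    dictionary = pickle.load(f)               # word dictionary used for COCO captions
with h5py.File('gan.hdf5', 'r') as h5:
    print(h5.keys())                          # weights consumed by sample-captions.py (--gan_path)

print(images.shape, captions.shape)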

MNIST with Captions

To train the model, simply go to the mnist-captions folder and run

python alignDraw.py models/mnist-captions.json

To generate 60x60 MNIST images from captions, as specified in the appendix of the paper, run

python sample-captions.py --model models/mnist-captions.json --weights /path/to/trained-weights

Note: I have also provided an implementation of a simple DRAW model in the files draw.py and sample.py.

Microsoft COCO

To train the model, simply go to the coco folder and run

python alignDraw.py models/coco-captions-32x32.json

To generate images from captions after training, run

python sample-captions.py --model models/coco-captions-32x32.json --weights /path/to/trained-weights --dictionary dictionary.pkl --gan_path gan.hdf5 --skipthought_path /path/to/skipthoughts-folder

Note: I have been caught up with other non-research stuff, so I will add baseline model files like noAlignDraw and others during the week of Feb 29 - Mar 6.

Feel free to email me if you have some questions or if you are uncertain about some parts of the code.

Acknowledgments

I would like to acknowledge the help of Tom White for his suggestions on cleaning and organizing the code.

Reference

If you found this code or our paper useful, please consider citing the following paper:

@inproceedings{mansimov16_text2image,
  author    = {Elman Mansimov and Emilio Parisotto and Jimmy Ba and Ruslan Salakhutdinov},
  title     = {Generating Images from Captions with Attention},
  booktitle = {ICLR},
  year      = {2016}
}

You would probably also need to cite some of the papers that we have referred to ;)

text2image's People

Contributors

dribnet, g1910, mansimov


text2image's Issues

How to generate images with bigger size?

I want to use the code to train a model on my own dataset, but I'm not sure how to modify the code to generate images of size 64x64 or larger. Has anyone tried it?

Thanks!

Is the KL divergence computed correctly?

Hi

@MissT157 and I are experimenting with this code and we are finding the implementation of KL divergence a bit awry. Can you please revisit and confirm whether it's been implemented correctly?

Thanks in advance
@g1910
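For reference, the standard closed-form KL divergence between two diagonal Gaussians, which the alignDRAW objective sums over time steps between the approximate posterior and the learned prior (per the paper), can be written as a small numpy sketch; this is a reference formula for checking values, not the repository's Theano implementation:

import numpy as np

def diag_gaussian_kl(mu_q, log_sigma_q, mu_p, log_sigma_p):
    # KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over latent dimensions.
    return np.sum(log_sigma_p - log_sigma_q
                  + (np.exp(2 * log_sigma_q) + (mu_q - mu_p) ** 2)
                  / (2 * np.exp(2 * log_sigma_p))
                  - 0.5)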

Training Models is taking too long

I am trying to train the models defined in coco-captions-32x32.json, but it is taking an insane amount of time. There are around 200 epochs and it took 15 hours to complete just one:

Epoch 0 took 15:17:07.833333

Just wondering, is there a way to get it done faster, or is completing all 200 epochs really necessary? Is there a baseline number of epochs I can use to get a minimal working setup? Or, if anyone has trained weight files, that would really help me out. Thanks.

I am using an NVIDIA GPU for my test on an AWS instance.

KeyError: 'costFunction'

envy@ub1404:/os_pri/github/text2image$ python coco/sample-captions.py --model coco/models/coco-captions-32x32.json --weights ..
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/sample-captions.py", line 79, in
costFunctionType = str(model["model"][0]["costFunction"]["type"])
KeyError: 'costFunction'
envy@ub1404:~/os_pri/github/text2image$

Why

Why can I find no generated model under mnist?
And when I run the code under MS COCO, it says that float32 is wrong, but when I changed float32 to float64, it is still wrong. What should I do to solve the problem?

NameError: name 'ArgumentParser' is not defined

envy@ub1404:/os_pri/github/text2image$ python mnist-captions/sample-captions.py --model models/mnist-captions.json --weights ..
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "mnist-captions/sample-captions.py", line 42, in
parser = ArgumentParser()
NameError: name 'ArgumentParser' is not defined
envy@ub1404:~/os_pri/github/text2image$

Unable to generate coco images

I am unable to generate images for the MS COCO example. Once the weights are saved I run the command

python sample-captions.py --model models/coco-captions-32x32.json --weights ./attention-vae-2016-6-29-5-2-31.h5 --dictionary ../dictionary.pkl --gan_path ../gan.hdf5 --skipthought_path /home/skip-thoughts

But in the coco-captions-32X32.json file, there is no key "costFunction".
The exact error that I am getting is as follows

Traceback (most recent call last):
File "sample-captions.py", line 79, in
costFunctionType = str(model["model"][0]["costFunction"]["type"])
KeyError: 'costFunction'

Can you please share the updated coco-captions-32X32.json file?

Thanks.

binary_crossentropy

Is there something wrong with using binary_crossentropy on COCO, a color image dataset?

Error

envy@ub1404:/os_pri/github/text2image$ python coco/alignDraw.py coco/models/coco-captions-32x32.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
coco/alignDraw.py:343: FutureWarning: comparison to None will result in an elementwise object comparison in the future.
if valData != None:
Traceback (most recent call last):
File "coco/alignDraw.py", line 617, in
rvae = ReccurentAttentionVAE(dimY, dimLangRNN, dimAlign, dimX, dimReadAttent, dimWriteAttent, dimRNNEnc, dimRNNDec, dimZ, runSteps, batch_size, reduceLRAfter, data, data_captions, valData=val_data, valDataCaptions=val_data_captions, pathToWeights=pathToWeights)
File "coco/alignDraw.py", line 355, in init
self._kl_final, self._logpxz, self._log_likelihood, self._c_ts, self._c_ts_gener, self._x, self._y, self._run_steps, self._updates_train, self._updates_gener, self._read_attent_params, self._write_attent_params, self._write_attent_params_gener, self._alphas_gener, self._params, self._mu_prior_t_gener, self._log_sigma_prior_t_gener = build_lang_encoder_and_attention_vae_decoder(self.dimY, self.dimLangRNN, self.dimAlign, self.dimX, self.dimReadAttent, self.dimWriteAttent, self.dimRNNEnc, self.dimRNNDec, self.dimZ, self.runSteps, self.pathToWeights)
File "coco/alignDraw.py", line 294, in build_lang_encoder_and_attention_vae_decoder
sequences=eps, outputs_info=[c0, h0_dec, cell0_dec, h0_enc, cell0_enc, kl_0, mu_prior_0, log_sigma_prior_0, None, None], non_sequences=all_params, n_steps=run_steps)
File "/home/envy/.local/lib/python2.7/site-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/home/envy/.local/lib/python2.7/site-packages/theano/gof/op.py", line 611, in call
node = self.make_node(_inputs, *_kwargs)
File "/home/envy/.local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
envy@ub1404:~/os_pri/github/text2image$

your .theanorc

Hi.
Thank you for your research!

Sorry, I have a problem with testing; can you show your .theanorc?
I got a typecasting error:

TypeError: Cannot convert Type TensorType(float32, matrix) (of Variable AdvancedSubtensor1.0) into Type TensorType(float64, matrix). You can try to manually convert AdvancedSubtensor1.0 into a TensorType(float64, matrix).

float32/float64 issue unresolved

Having the same issue as described here even after setting floatX=float32. I have CUDA 7.0 and cuDNN installed on Ubuntu 14.04. skip-thoughts is verified to be working fine. Any idea what the issue could be?

$ THEANO_FLAGS='floatX=float32,device=gpu0,scan.allow_gc=True' python alignDraw.py models/coco-captions-32x32.json
Using gpu device 0: GRID K520 (CNMeM is disabled, cuDNN 5005)
alignDraw.py:342: FutureWarning: comparison to None will result in an elementwise object comparison in the future.
if valData != None:
Traceback (most recent call last):
File "alignDraw.py", line 616, in
rvae = ReccurentAttentionVAE(dimY, dimLangRNN, dimAlign, dimX, dimReadAttent, dimWriteAttent, dimRNNEnc, dimRNNDec, dimZ, runSteps, batch_size, reduceLRAfter, data, data_captions, valData=val_data, valDataCaptions=val_data_captions, pathToWeights=pathToWeights)
File "alignDraw.py", line 354, in init
self._kl_final, self._logpxz, self._log_likelihood, self._c_ts, self._c_ts_gener, self._x, self._y, self._run_steps, self._updates_train, self._updates_gener, self._read_attent_params, self._write_attent_params, self._write_attent_params_gener, self._alphas_gener, self._params, self._mu_prior_t_gener, self._log_sigma_prior_t_gener = build_lang_encoder_and_attention_vae_decoder(self.dimY, self.dimLangRNN, self.dimAlign, self.dimX, self.dimReadAttent, self.dimWriteAttent, self.dimRNNEnc, self.dimRNNDec, self.dimZ, self.runSteps, self.pathToWeights)
File "alignDraw.py", line 293, in build_lang_encoder_and_attention_vae_decoder
sequences=eps, outputs_info=[c0, h0_dec, cell0_dec, h0_enc, cell0_enc, kl_0, mu_prior_0, log_sigma_prior_0, None, None], non_sequences=all_params, n_steps=run_steps)
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 611, in call
node = self.make_node(_inputs, *_kwargs)
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.

TypeError

Hi, I already set floatX = float32, but I still got this TypeError:
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

Error !

envy@ub1404:/os_pri/github/text2image$ python coco/alignDraw.py mnist-captions/models/mnist-captions.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/alignDraw.py", line 608, in
train_paths = model["data"]["train"]
KeyError: 'train'
envy@ub1404:~/os_pri/github/text2image$ python coco/alignDraw.py mnist-captions/models/mnist-captions.json
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
Traceback (most recent call last):
File "coco/alignDraw.py", line 608, in
train_paths = model["data"]["train"]
KeyError: 'train'
envy@ub1404:~/os_pri/github/text2image$

wanted the pre-trained model

Thanks for your code, and I really hope you can upload your trained model; training on my computer would take a lot of time.

text2image is giving error

Hi, I am trying to run alignDraw.py and I am getting the following error:

monica@monica:~/PycharmProjects/text2image-master/coco$ python alignDraw.py ./models/coco-captions-32x32.json
Traceback (most recent call last):
File "alignDraw.py", line 3, in
import h5py
File "/home/monica/.local/lib/python2.7/site-packages/h5py/init.py", line 24, in
from . import _errors
ImportError: /home/monica/.local/lib/python2.7/site-packages/h5py/_errors.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8

Please help me figure out how to solve this error.

How much time does an epoch take when training on the MNIST data?

I work on a GPU (Tesla K40c, CNMeM is enabled with initial size: 75.0% of memory, cuDNN 4007), and it takes me about 10+ minutes to train an epoch. Is that normal, or did I make a wrong config for my environment?

Here comes my log:

Train Results
20.4327215786 1138.28323364 1158.71595093
Validation Results
building validate function
0:00:52.053928
9.35944375 752.618843994 761.978292236

Recreated Train Dataset
[1139, 1308, 1155, 1142, 1335, 1308, 1309, 1304]

Epoch 1 took 0:13:34.399718
Train Results
13.6282463623 686.228092651 699.856334229
Recreated Train Dataset
[1161, 1261, 1172, 1157, 1288, 1352, 1311, 1298]

Epoch 2 took 0:12:48.676186
Train Results
14.1116136597 600.27987915 614.391492615
Recreated Train Dataset
[1221, 1271, 1145, 1203, 1312, 1312, 1273, 1263]

Epoch 3 took 0:11:07.076383
Train Results
12.3779268677 505.388762512 517.766689453
Validation Results
building validate function
0:00:07.973637
10.1579154358 474.040048218 484.197965393

Recreated Train Dataset
[1137, 1284, 1149, 1137, 1365, 1308, 1336, 1284]

Epoch 4 took 0:11:08.574547
Train Results
14.8620436646 411.504102783 426.366148071
Recreated Train Dataset
[1185, 1271, 1186, 1222, 1271, 1276, 1278, 1311]

Epoch 5 took 0:10:53.159525
Train Results
13.2145661377 363.50835083 376.72291626
Recreated Train Dataset
[1166, 1317, 1161, 1182, 1225, 1291, 1359, 1299]

Unable to load some downloaded files

I tried to run your code for MS COCO by "python alignDraw.py models/coco-captions-32x32.json", but I'm faced with a file loading problem.

Then I tried to check whether the additional downloaded files could be loaded.

I can't load the files below:
train-images-56x56.npy, train-captions.npy, train-captions-len.npy,
train-cap2im.pkl, dev-images-32x32.npy, dev-images-56x56.npy, dev-captions.npy,
dev-captions-len.npy, dev-cap2im.pkl, gan.hdf5

At first I suspected that I had made a mistake. However, I can load the files below:
train-images-32x32.npy,
test-images-32x32.npy, test-captions.npy, test-captions-len.npy, test-cap2im.pkl,
dictionary.pkl

I'm not sure why there's a difference.
I added the full error log at the end of this issue.

my environment

  • pyenv anaconda2-2.4.1
  • Python(2.7.11)
  • numpy(1.10.4)
  • scipy(0.17.0)
  • Theano(0.7.0)
  • h5py(2.5.0)

Thanks !

import numpy as np
import pickle

>>> data=np.load("train-images-56x56.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
   pickle_kwargs=pickle_kwargs)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
   array.shape = shape
ValueError: total size of new array must be unchanged

>>> data=np.load("train-captions.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 416, in load
   "Failed to interpret file %s as a pickle" % repr(file))
IOError: Failed to interpret file 'train-captions.npy' as a pickle

>>> data=np.load("train-captions-len.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
   pickle_kwargs=pickle_kwargs)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
   array.shape = shape
ValueError: total size of new array must be unchanged

>>> data=np.load("dev-images-32x32.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
   fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument

>>> data=np.load("dev-images-56x56.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 406, in load
   pickle_kwargs=pickle_kwargs)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 673, in read_array
   array.shape = shape
ValueError: total size of new array must be unchanged

>>> data=np.load("dev-captions.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
   fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument

>>> data=np.load("dev-captions-len.npy")
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 392, in load
   fid.seek(-N, 1)  # back-up
IOError: [Errno 22] Invalid argument

>>> with open('train-cap2im.pkl','r') as f:
...     data = pickle.load(f)
... 
Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 1384, in load
   return Unpickler(file).load()
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 864, in load
   dispatch[key](self)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 886, in load_eof
   raise EOFError
EOFError

>>> with open('dev-cap2im.pkl','r') as f:
...     data = pickle.load(f)
... 
Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 1384, in load
   return Unpickler(file).load()
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 864, in load
   dispatch[key](self)
 File "/home/is/seitaro-s/.pyenv/versions/anaconda2-2.4.1/envs/semantic-embedding-anaconda2/lib/python2.7/pickle.py", line 886, in load_eof
   raise EOFError
EOFError

>>> with h5py.File('gan.hdf5','r') as hdf5:
...     print hdf5['skipthought2image']
... 
Traceback (most recent call last):
 File "<stdin>", line 2, in <module>
 File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2405)
 File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2362)
 File "/home/is/seitaro-s/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 164, in __getitem__
   oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
 File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2405)
 File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-build-BQojpm/h5py/h5py/_objects.c:2362)
 File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-build-BQojpm/h5py/h5py/h5o.c:3317)
KeyError: "Unable to open object (Object 'skipthought2image' doesn't exist)"

Chinese image caption, In the result, multiple words of the same type appear

Hello, I am using the COCO dataset with a two-layer LSTM model: one layer for top-down attention and one layer for the language model.

I extract words with jieba, and I used all words that occur more than 3 times in the image descriptions as the dictionary, 14,226 words in total:
words = [w for w in word_freq.keys() if word_freq[w] > 3]

After training the model, when I use it, multiple words of the same type appear in the output, such as:

Note notebook laptop computer on bed
A little girl little girl girl standing together

How can I solve this problem?
