
ryankiros / visual-semantic-embedding

425 stars, 126 forks, 33 KB

Implementation of the image-sentence embedding method described in "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"

License: Other

Python 100.00%

visual-semantic-embedding's People

Contributors

ivendrov, ryankiros

visual-semantic-embedding's Issues

dataset page is not available

The dataset page in

Each of the dataset files contains the captions as well as VGG features from the 19-layer model. Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy. Note that the original images are not included with the dataset. The full contents of each of the datasets can be obtained here, here and here.

is not available.

How can I access the page?

run train.py

When I run train.trainer(), I get the errors below. Have you encountered this problem?

Building model
Building f_log_probs... Done
Building f_cost... Done
Building sentence encoder
Building image encoder
Building f_grad... Building optimizers...
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_run_in_console.py", line 53, in run_file
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/run.py", line 9, in <module>
    train.trainer()
  File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/train.py", line 154, in trainer
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
  File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/optim.py", line 39, in adam
    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
  File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/pfunc.py", line 449, in pfunc
    no_default_updates=no_default_updates)
  File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)

COCO's val/test sets captions incomplete?

I noticed that the validation/test sets provided for COCO contain 5000 images but only 5000 captions, whereas 5000 images should come with 25000 captions. In fact, the last 4000 images have no corresponding captions, so if we only use the provided captions we are evaluating on a 1000-image subset.
The README says:

Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy.

While Karpathy's paper indicates:

For MSCOCO we use 5,000 images for both validation and testing

Is the original test actually over 1000 images, or is the provided caption list incomplete?
Thanks,
Armand
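
For reference, a quick way to check the counts is to load the provided test files directly. This is a sketch that assumes the file layout used by datasets.py (coco_test_caps.txt next to coco_test_ims.npy); the data path is hypothetical, so adjust it to your setup:

    import numpy

    path = '/path/to/data/coco/'  # hypothetical location of the downloaded files

    # One caption per line in the text file, one feature row per entry in the .npy file
    with open(path + 'coco_test_caps.txt', 'rb') as f:
        caps = [line.strip() for line in f]
    ims = numpy.load(path + 'coco_test_ims.npy')

    print(len(caps))   # number of captions provided
    print(ims.shape)   # number of image feature rows and their dimensionality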

Failed to train

{'grad_clip': 2.0, 'dim': 1024, 'optimizer': 'adam', 'dim_word': 300, 'data': 'f8k', 'lrate': 0.0002, 'batch_size': 128, 'encoder': 'gru', 'maxlen_w': 100, 'saveto': 'f8k.snap.npz', 'max_epochs': 15, 'dim_image': 4096, 'dispFreq': 10, 'decay_c': 0.0, 'margin': 0.2, 'reload_': False, 'validFreq': 100}
Loading dataset
Creating dictionary
Dictionary size: 8919
Building model
WARNING (theano.tensor.blas): We did not find a dynamic library in the library_dir of the library we use for blas. If you use ATLAS, make sure to compile it with dynamics library.
Building f_log_probs... Done
Building f_cost... Done
Building sentence encoder
Building image encoder
Building f_grad... Building optimizers...
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    trainer(data='f8k', saveto='f8k.snap.npz')
  File "train.py", line 152, in trainer
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
  File "/home/john/withlinux/ai/visual-semantic-embedding/optim.py", line 39, in adam
    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
  File "/home/john/.local/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/john/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 449, in pfunc
    no_default_updates=no_default_updates)
  File "/home/john/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
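
For what it is worth, this class of error (float32 shared variables being updated with float64 values) usually disappears once Theano's floatX is forced to float32. A minimal sketch of one way to do that, assuming the flag is not already set elsewhere; it must run before the first import of theano (e.g. at the very top of train.py), or the same flag can go into ~/.theanorc:

    import os

    # Prepend floatX=float32 so any existing flags are preserved
    os.environ['THEANO_FLAGS'] = 'floatX=float32,' + os.environ.get('THEANO_FLAGS', '')

    import theano
    print(theano.config.floatX)  # expected to print 'float32'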

not all arguments converted during string formatting

File "tools.py", line 32, in load_model
with open('%s.f8k.npz.dictionary.pkl'%path_to_model, 'rb') as f:
TypeError: not all arguments converted during string formatting

default_model = '/home/koel/workplace/semantic-embedding/vse/f8k.npz'
print 'OK'
#-----------------------------------------------------------------------------#

def load_model(path_to_model=default_model):
    """
    Load all model components
    """
    print path_to_model

    # Load the worddict
    print 'Loading dictionary...'
    with open('%s.f8k.npz.dictionary.pkl'%path_to_model, 'rb') as f:
        worddict = pkl.load(f)
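
As a guess rather than a confirmed diagnosis: in Python 2, the % operator raises exactly this TypeError when the value on the right is a tuple with more items than the format string has placeholders, so it is worth checking that path_to_model really is a single string at this point. A tiny illustration with hypothetical values:

    path_to_model = '/home/koel/workplace/semantic-embedding/vse/f8k.npz'

    print('%s.dictionary.pkl' % path_to_model)           # fine: one placeholder, one string
    print('%s.dictionary.pkl' % (path_to_model, 'f8k'))  # TypeError: not all arguments converted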

Bow model encodes sentence with an extra word

Thanks for providing this nice and interesting code.

I noticed that when using the bag-of-words encoder, the code still adds an 'end of sentence' word to the matrix x in encode_sentences:

        x = numpy.zeros((k+1, len(caption))).astype('int64')
        x_mask = numpy.zeros((k+1, len(caption))).astype('float32')
        for idx, s in enumerate(seqs):
            x[:k,idx] = s
            x_mask[:k+1,idx] = 1.

Meanwhile, in build_sentence_encoder:

# Word embedding
emb = tparams['Wemb'][x.flatten()].reshape([n_timesteps, n_samples, options['dim_word']])

# Encode sentences
if options['encoder'] == 'bow':
    sents = (emb * mask[:,:,None]).sum(0)

each sentence is encoded with an extra word corresponding to the first row of tparams['Wemb'], which I think should be ignored.

Also, in the encode_sentences function:

        seqs = []
        for i, cc in enumerate(caption):
            seqs.append([model['worddict'][w] if d[w] > 0 and model['worddict'][w] < model['options']['n_words'] else 1 for w in cc])

I think the line inside the for loop should be changed to:

seqs.append([model['worddict'][w] - 2 if d[w] > 0 and model['worddict'][w] < model['options']['n_words'] else 1 for w in cc])

since the two words '<eos>' and 'UNK' are not in params['Wemb'].

However, these issues did not affect the final performance much.
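
A small numpy sketch of the first point, using toy sizes and hypothetical values; it mirrors the masking code quoted above and shows that the extra masked row adds Wemb[0] to the bag-of-words sum:

    import numpy

    k, n_samples, dim_word = 3, 1, 4                      # toy sizes
    Wemb = numpy.arange(5 * dim_word, dtype='float32').reshape(5, dim_word)
    s = [2, 3, 4]                                         # word indices of one caption

    x = numpy.zeros((k + 1, n_samples)).astype('int64')
    x_mask = numpy.zeros((k + 1, n_samples)).astype('float32')
    x[:k, 0] = s
    x_mask[:k + 1, 0] = 1.                                # mask also covers the extra row

    emb = Wemb[x.flatten()].reshape(k + 1, n_samples, dim_word)
    bow = (emb * x_mask[:, :, None]).sum(0)
    bow_trimmed = (emb * x_mask[:, :, None])[:k].sum(0)   # same sum without the extra row

    print(bow - bow_trimmed)                              # equals Wemb[0]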

TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

def adam(lr, tparams, grads, inp, cost):
    gshared = [theano.shared(p.get_value() * 0., name='%s_grad'%k) for k, p in tparams.iteritems()]
    gsup = [(gs, g) for gs, g in zip(gshared, grads)]

    f_grad_shared = theano.function(inp, cost, updates=gsup, profile=False)

    b1 = 0.1
    b2 = 0.001
    e = 1e-8

    updates = []

    i = theano.shared(numpy.float32(0.))
    i_t = i + 1.
    fix1 = 1. - b1**(i_t)
    fix2 = 1. - b2**(i_t)
    lr_t = lr * (tensor.sqrt(fix2) / fix1)

    for p, g in zip(tparams.values(), gshared):
        m = theano.shared(p.get_value() * 0.)
        v = theano.shared(p.get_value() * 0.)
        m_t = (b1 * g) + ((1. - b1) * m)
        v_t = (b2 * tensor.sqr(g)) + ((1. - b2) * v)
        g_t = m_t / (tensor.sqrt(v_t) + e)
        p_t = p - (lr_t * g_t)
        updates.append((m, m_t))
        updates.append((v, v_t))
        updates.append((p, p_t))
    updates.append((i, i_t))

    f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)

    return f_grad_shared, f_update

I get the error at the line f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False). Please help me fix it.
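
This is the same float32/float64 mismatch reported in the "Failed to train" issue above. Besides forcing floatX=float32 globally, one alternative workaround (a sketch, not the repository's own fix) is to cast each update value back to the dtype of its shared variable just before building f_update; the names below refer to the variables inside adam() as quoted above:

    # inside adam(), right before the failing theano.function call
    updates = [(var, tensor.cast(val, var.dtype)) for var, val in updates]
    f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)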

The output of the encoder?

I just want to use this project for sentence embeddings.
The output I get looks like this:
[0,0,-1,0,0,-1,-1....-1,0,0]
...
[0,-1,0,-1,0,-1,-1....0,-1,0]

Is this the right form?

Are the sentences really encoded only as 0s and -1s, like a one-hot embedding? @ryankiros
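
For comparison, the intended way to obtain sentence embeddings is through tools.py. A minimal sketch, where the snapshot path is hypothetical; encode_sentences should return one dense real-valued vector per sentence rather than a 0/-1 pattern:

    import tools

    model = tools.load_model('/path/to/f8k.npz')   # hypothetical path to a trained snapshot
    vectors = tools.encode_sentences(model, ['a dog runs on the grass',
                                             'two children play in the park'])

    print(vectors.shape)    # (2, embedding dimension)
    print(vectors[0][:10])  # dense float values, not just 0 / -1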

cost is NaN

Epoch 1 Update 510 Cost 308.02814 UD 0.764338970184
Seen 60000 samples
Epoch 2
Epoch 2 Update 520 Cost 118.96719 UD 0.695561885834
Epoch 2 Update 530 Cost 226.9092 UD 0.454373121262
Epoch 2 Update 540 Cost 11.716671 UD 0.462770938873
Epoch 2 Update 550 Cost 25.91058 UD 0.57851600647
NaN detected

Have you encountered this problem?
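
Not a confirmed fix, but when the cost turns NaN the usual knobs to try first are already exposed by trainer() (see the config printed in the "Failed to train" issue above): a lower learning rate and/or a tighter gradient clipping threshold. For example, with hypothetical values:

    import train

    # halve the learning rate and tighten gradient clipping relative to the defaults
    train.trainer(data='f8k', lrate=0.0001, grad_clip=1.0, saveto='f8k.snap.npz')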
