ryankiros / visual-semantic-embedding
Implementation of the image-sentence embedding method described in "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"
License: Other
wget https://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/vgg19.pkl
I can't access the ImageNet VGG-19 weights, which are crucial for using this implementation.
I get an "access forbidden" error. Any idea how I can download the .pkl file with wget?
The dataset page in
Each of the dataset files contains the captions as well as VGG features from the 19-layer model. Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy. Note that the original images are not included with the dataset. The full contents of each of the datasets can be obtained here, here and here.
is not available.
How can I access the page?
When I run train.trainer(), I get many errors. Have you encountered this problem?
Building model
Building f_log_probs... Done
Building f_cost... Done
Building sentence encoder
Building image encoder
Building f_grad... Building optimizers...
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_run_in_console.py", line 53, in run_file
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/run.py", line 9, in
train.trainer()
File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/train.py", line 154, in trainer
f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/optim.py", line 39, in adam
f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
output_keys=output_keys)
File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/pfunc.py", line 449, in pfunc
no_default_updates=no_default_updates)
File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
raise TypeError(err_msg, err_sug)
I noticed that the validation and test sets provided for COCO contain 5000 images and 5000 captions, whereas 5000 images should have 25000 captions. In effect, the last 4000 images have no corresponding captions, so if we use only the provided captions we are evaluating on a subset of 1000 images.
The README says:
Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy.
whereas Karpathy's paper states:
For MSCOCO we use 5,000 images for both validation and testing
Was the original test actually run on 1000 images, or is the provided caption list incomplete?
Thanks,
Armand
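One way to check which interpretation holds is to count the distinct feature rows in the test file. The sketch below assumes, without confirmation from the README, that the feature file might store one row per caption (each image's features repeated once per caption); it uses plain Python lists in place of the loaded NumPy array, and the helper names are hypothetical.

```python
def count_distinct_images(feature_rows):
    """Count distinct image-feature rows (each row is a sequence of floats)."""
    return len({tuple(row) for row in feature_rows})


def captions_per_image(num_captions, feature_rows):
    """Average number of captions per distinct image."""
    return num_captions / count_distinct_images(feature_rows)


# Toy data: 2 distinct images, each feature row repeated 5 times (one per caption).
rows = [[0.1, 0.2]] * 5 + [[0.3, 0.4]] * 5
print(count_distinct_images(rows))   # 2
print(captions_per_image(10, rows))  # 5.0
```

Under that assumption, a 5000-row test file that returns 1000 distinct rows would mean 1000 images with five captions each, not 5000 images with one caption each.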
{'grad_clip': 2.0, 'dim': 1024, 'optimizer': 'adam', 'dim_word': 300, 'data': 'f8k', 'lrate': 0.0002, 'batch_size': 128, 'encoder': 'gru', 'maxlen_w': 100, 'saveto': 'f8k.snap.npz', 'max_epochs': 15, 'dim_image': 4096, 'dispFreq': 10, 'decay_c': 0.0, 'margin': 0.2, 'reload_': False, 'validFreq': 100}
Loading dataset
Creating dictionary
Dictionary size: 8919
Building model
WARNING (theano.tensor.blas): We did not find a dynamic library in the library_dir of the library we use for blas. If you use ATLAS, make sure to compile it with dynamics library.
Building f_log_probs... Done
Building f_cost... Done
Building sentence encoder
Building image encoder
Building f_grad... Building optimizers...
Traceback (most recent call last):
File "train.py", line 224, in <module>
trainer(data='f8k', saveto='f8k.snap.npz')
File "train.py", line 152, in trainer
f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
File "/home/john/withlinux/ai/visual-semantic-embedding/optim.py", line 39, in adam
f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
File "/home/john/.local/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
output_keys=output_keys)
File "/home/john/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 449, in pfunc
no_default_updates=no_default_updates)
File "/home/john/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
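The message says the update expression is float64 while the shared parameter is float32. A commonly used workaround for this kind of mismatch in Theano code, assuming it comes from Theano's `floatX` setting defaulting to `float64`, is to run with `THEANO_FLAGS=floatX=float32`, e.g. `THEANO_FLAGS=floatX=float32 python train.py`. The sketch below builds that environment from Python; the helper names are hypothetical, and `run_with_float32` simply launches a script in a subprocess.

```python
import os
import subprocess
import sys


def float32_env(base=None):
    """Copy an environment and force Theano's floatX to float32 in THEANO_FLAGS."""
    env = dict(os.environ if base is None else base)
    # Keep other comma-separated flags, drop any existing floatX entry.
    parts = [p for p in env.get("THEANO_FLAGS", "").split(",")
             if p and not p.startswith("floatX=")]
    parts.append("floatX=float32")
    env["THEANO_FLAGS"] = ",".join(parts)
    return env


def run_with_float32(script="train.py"):
    """Run a training script (hypothetical path) with the float32 setting."""
    return subprocess.call([sys.executable, script], env=float32_env())
```

The same setting can also be made permanent in a `.theanorc` file under a `[global]` section (`floatX = float32`).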
This may not be an issue; I am just trying to follow the getting-started guide in the README to run the model. It shows an error saying that no model was found. Where can I find the .npz file? I can't find it inside the f8k dataset.
File "tools.py", line 32, in load_model
with open('%s.f8k.npz.dictionary.pkl'%path_to_model, 'rb') as f:
TypeError: not all arguments converted during string formatting
default_model = '/home/koel/workplace/semantic-embedding/vse/f8k.npz'
print 'OK'
#-----------------------------------------------------------------------------#
def load_model(path_to_model=default_model):
    """
    Load all model components
    """
    print path_to_model
    # Load the worddict
    print 'Loading dictionary...'
    with open('%s.f8k.npz.dictionary.pkl'%path_to_model, 'rb') as f:
        worddict = pkl.load(f)
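Python raises `TypeError: not all arguments converted during string formatting` when the right-hand side of `%` supplies more values than the format string has placeholders, for instance when `path_to_model` is a tuple rather than a single string. The sketch below reproduces that and builds the file names with `str.format` instead. The `<model>.pkl` / `<model>.dictionary.pkl` naming next to the model file is an assumption about how the training script saves its outputs; check it against the `saveto` value used during training. `model_files` is a hypothetical helper.

```python
def model_files(path_to_model):
    """Return assumed companion file names for a saved model path."""
    if not isinstance(path_to_model, str):
        raise TypeError("path_to_model must be a single string, got %r" % (path_to_model,))
    return {
        "params": path_to_model,
        "options": "{}.pkl".format(path_to_model),
        "dictionary": "{}.dictionary.pkl".format(path_to_model),
    }


# Reproducing the original error: two values but only one %s placeholder.
try:
    "%s.dictionary.pkl" % ("f8k.npz", "extra")
except TypeError as err:
    print(err)  # not all arguments converted during string formatting

print(model_files("f8k.npz")["dictionary"])  # f8k.npz.dictionary.pkl
```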
Thanks for providing such nice and interesting code.
I noticed that when using the bag-of-words encoder, the code still adds an end-of-sentence word to the matrix x in the function encode_sentences:
x = numpy.zeros((k+1, len(caption))).astype('int64')
x_mask = numpy.zeros((k+1, len(caption))).astype('float32')
for idx, s in enumerate(seqs):
    x[:k,idx] = s
    x_mask[:k+1,idx] = 1.
Meanwhile, in the function build_sentence_encoder:
# Word embedding
emb = tparams['Wemb'][x.flatten()].reshape([n_timesteps, n_samples, options['dim_word']])
# Encode sentences
if options['encoder'] == 'bow':
    sents = (emb * mask[:,:,None]).sum(0)
which encodes each sentence with an extra word, corresponding to the first row of tparams['Wemb']; I think that word should be ignored.
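The effect can be illustrated without Theano. In this plain-Python sketch, row 0 of a toy embedding table stands for the end-of-sentence row: because `x` is zero-filled and only `x[:k]` is set, position `k` holds index 0, so a mask over `k+1` positions adds that row to every bag-of-words sum, while a mask over `k` positions leaves it out. The table values and the helper `bow_encode` are made up for illustration.

```python
def bow_encode(emb, seq, include_eos):
    """Sum word embeddings for one sentence.

    emb: list of embedding rows; row 0 plays the end-of-sentence word.
    seq: word indices of the sentence, without the end-of-sentence index.
    include_eos: True mimics x_mask[:k+1], False mimics x_mask[:k].
    """
    ids = list(seq) + ([0] if include_eos else [])
    total = [0.0] * len(emb[0])
    for i in ids:
        for d, value in enumerate(emb[i]):
            total[d] += value
    return total


emb = [[1.0, 1.0], [0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]  # toy table
print(bow_encode(emb, [2, 3], include_eos=True))   # [3.0, 4.0]
print(bow_encode(emb, [2, 3], include_eos=False))  # [2.0, 3.0]
```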
Also, in the encode_sentences function:
seqs = []
for i, cc in enumerate(caption):
    seqs.append([model['worddict'][w] if d[w] > 0 and model['worddict'][w] < model['options']['n_words'] else 1 for w in cc])
I think the line inside the for loop should be changed to:
seqs.append([model['worddict'][w] - 2 if d[w] > 0 and model['worddict'][w] < model['options']['n_words']
since the two words '' and 'UNK' are not in params['Wemb'].
However, these issues did not affect the final performance much.
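The lookup in `encode_sentences` can be sketched in plain Python as below. The sketch assumes a dictionary where index 0 is reserved for end-of-sentence and index 1 for unknown words, with real words starting at 2, and that any index at or above `n_words` falls back to the unknown index; the toy `worddict` and the helper `words_to_ids` are invented. Whether rows 0 and 1 are meaningful in `Wemb` depends on how `Wemb` is sized, which is the question raised above.

```python
def words_to_ids(words, worddict, n_words, unk=1):
    """Map tokens to indices, falling back to `unk` for rare or unseen words.

    Mirrors the list comprehension in encode_sentences: a word keeps its
    dictionary index only if it is known and its index is below n_words.
    """
    ids = []
    for w in words:
        idx = worddict.get(w)
        ids.append(idx if idx is not None and idx < n_words else unk)
    return ids


worddict = {"<eos>": 0, "UNK": 1, "a": 2, "dog": 3, "runs": 4}  # toy dictionary
print(words_to_ids(["a", "dog", "flies"], worddict, n_words=5))  # [2, 3, 1]
print(words_to_ids(["a", "dog", "runs"], worddict, n_words=4))   # [2, 3, 1]
```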
import numpy
import theano
from theano import tensor


def adam(lr, tparams, grads, inp, cost):
    # Shared buffers that receive the gradients computed by f_grad_shared
    gshared = [theano.shared(p.get_value() * 0., name='%s_grad'%k) for k, p in tparams.iteritems()]
    gsup = [(gs, g) for gs, g in zip(gshared, grads)]

    f_grad_shared = theano.function(inp, cost, updates=gsup, profile=False)

    # b1 and b2 weight the new gradient, i.e. they play the role of 1 - beta1 and 1 - beta2
    b1 = 0.1
    b2 = 0.001
    e = 1e-8

    updates = []

    # Step counter used for the learning-rate correction
    i = theano.shared(numpy.float32(0.))
    i_t = i + 1.
    fix1 = 1. - b1**(i_t)
    fix2 = 1. - b2**(i_t)
    lr_t = lr * (tensor.sqrt(fix2) / fix1)

    for p, g in zip(tparams.values(), gshared):
        # First and second moment estimates for this parameter
        m = theano.shared(p.get_value() * 0.)
        v = theano.shared(p.get_value() * 0.)
        m_t = (b1 * g) + ((1. - b1) * m)
        v_t = (b2 * tensor.sqr(g)) + ((1. - b2) * v)
        g_t = m_t / (tensor.sqrt(v_t) + e)
        p_t = p - (lr_t * g_t)
        updates.append((m, m_t))
        updates.append((v, v_t))
        updates.append((p, p_t))
    updates.append((i, i_t))

    f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)

    return f_grad_shared, f_update
I am getting an error on the line "f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)". Please help me fix it.
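For reference, here is one Adam step for a single scalar parameter in plain Python, independent of Theano. In the code above `b1` and `b2` multiply the new gradient, so in the Adam paper's notation β1 = 1 - b1 = 0.9 and β2 = 1 - b2 = 0.999. This sketch applies the paper's bias correction 1 - β^t with those values (for the first moment, 1 - 0.9^t), whereas the code above computes `1. - b1**(i_t)`, i.e. 1 - 0.1^t. It illustrates the update rule only; `adam_step` is a hypothetical helper.

```python
import math


def adam_step(p, g, m, v, t, lr=0.0002, b1=0.1, b2=0.001, eps=1e-8):
    """One Adam update for a scalar parameter p with gradient g.

    b1 and b2 use the same parameterization as optim.adam (weights on the
    new gradient). t is the 1-based step count. Returns (new_p, new_m, new_v).
    """
    m = b1 * g + (1.0 - b1) * m          # first-moment estimate
    v = b2 * g * g + (1.0 - b2) * v      # second-moment estimate
    m_hat = m / (1.0 - (1.0 - b1) ** t)  # bias correction with beta1 = 1 - b1
    v_hat = v / (1.0 - (1.0 - b2) ** t)  # bias correction with beta2 = 1 - b2
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1, lr=0.01)
print(round(p, 6))  # 0.99
```

On the first step the corrected moments reduce to g and g², so the parameter moves by roughly `lr` in the direction opposite the gradient, which is the behaviour the bias correction is meant to produce.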
I have my own images and captions, but I don't know how to create a .npy file containing a NumPy array of image features, so that I can put my dataset in the same format and use it to train new models.
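A minimal sketch with NumPy, assuming you already have one 4096-dimensional feature vector per image (the `dim_image` value in the training configuration, matching VGG-19 features like those shipped with the datasets), in the same order as your captions. The file name `mydata_train_ims.npy` and the helper `save_image_features` are hypothetical; match them to whatever the repository's data loader expects, including whether it wants one row per image or one row per caption.

```python
import os
import tempfile

import numpy as np


def save_image_features(features, path, dim=4096):
    """Stack per-image feature vectors into an (N, dim) float32 array and save it."""
    arr = np.asarray(features, dtype=np.float32)
    if arr.ndim != 2 or arr.shape[1] != dim:
        raise ValueError("expected shape (N, %d), got %s" % (dim, arr.shape))
    np.save(path, arr)  # writes a .npy file that numpy.load can read back
    return arr.shape


# Toy example: 3 images with random placeholder features (not real VGG output).
rng = np.random.default_rng(0)
feats = [rng.standard_normal(4096) for _ in range(3)]
out = os.path.join(tempfile.mkdtemp(), "mydata_train_ims.npy")
print(save_image_features(feats, out))  # (3, 4096)
```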
I just want to use this project for sentence embeddings.
The output looks like this:
[0,0,-1,0,0,-1,-1....-1,0,0]
...
[0,-1,0,-1,0,-1,-1....0,-1,0]
Is this the right form?
Are the sentences really encoded only as 0 or -1, like a one-hot embedding? @ryankiros
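One possible explanation, assuming the encoder returns L2-normalized float vectors: every component then lies between -1 and 1, so if the values are floored (for example with `math.floor`, `numpy.floor` or `//`) before printing, positive components become 0 and negative ones become -1, which matches the pattern above. The toy vector and the helper `l2_normalize` are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


emb = l2_normalize([0.3, -1.2, 0.5, -0.1])  # toy "sentence embedding"
print([round(x, 3) for x in emb])            # real values in [-1, 1]
print([math.floor(x) for x in emb])          # [0, -1, 0, -1]
```

Printing the raw floats, and checking that each vector has unit norm, is a quick way to confirm the embeddings themselves are real-valued.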
Epoch 1 Update 510 Cost 308.02814 UD 0.764338970184
Seen 60000 samples
Epoch 2
Epoch 2 Update 520 Cost 118.96719 UD 0.695561885834
Epoch 2 Update 530 Cost 226.9092 UD 0.454373121262
Epoch 2 Update 540 Cost 11.716671 UD 0.462770938873
Epoch 2 Update 550 Cost 25.91058 UD 0.57851600647
NaN detected
Have you encountered this problem?
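The training configuration shown earlier includes `grad_clip` and `lrate`; lowering the learning rate or tightening gradient clipping are common first things to try when the cost jumps around before a NaN. The sketch below shows, in plain Python, a NaN/Inf check and rescaling a flat list of gradients so that their global L2 norm is at most `max_norm`. It assumes clipping by global norm as a generic illustration; how this repository applies `grad_clip` is not confirmed here, and both function names are hypothetical.

```python
import math


def has_nan_or_inf(values):
    """True if any value is NaN or infinite."""
    return any(math.isnan(x) or math.isinf(x) for x in values)


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [g / norm * max_norm for g in grads]
    return list(grads)


grads = [3.0, 4.0]                           # global norm 5.0
print(clip_by_global_norm(grads, 2.0))       # [1.2, 1.6]
print(has_nan_or_inf([1.0, float("nan")]))   # True
```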