
ryankiros / visual-semantic-embedding

425 stars, 126 forks, 33 KB

Implementation of the image-sentence embedding method described in "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models"

License: Other

Python 100.00%

visual-semantic-embedding's People

Contributors

ivendrov, ryankiros

visual-semantic-embedding's Issues

dataset page is not available

The dataset page in

Each of the dataset files contains the captions as well as VGG features from the 19-layer model. Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy. Note that the original images are not included with the dataset. The full contents of each of the datasets can be obtained here, here and here.

is not available.

How can I access the page?

run train.py

When I run train.trainer(), I get the errors below. Have you encountered this problem?

Building model
Building f_log_probs... Done
Building f_cost... Done
Building sentence encoder
Building image encoder
Building f_grad... Building optimizers...
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_run_in_console.py", line 53, in run_file
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/run.py", line 9, in <module>
    train.trainer()
  File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/train.py", line 154, in trainer
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
  File "/Users/jiapei.fjp/Documents/python_project/vsepp/visual-semantic-embedding/optim.py", line 39, in adam
    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
  File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/pfunc.py", line 449, in pfunc
    no_default_updates=no_default_updates)
  File "/Users/jiapei.fjp/venv/download_urls/lib/python2.7/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)

COCO's val/test sets captions incomplete?

I noticed that the validation/test sets provided for COCO contain 5000 images but only 5000 captions, whereas 5000 images should come with 25000 captions. In fact, the last 4000 images have no corresponding captions, so if we only use the provided captions we are evaluating on a 1000-image subset.
The README says:

Flickr8K comes with a pre-defined train/dev/test split, while for Flickr30K and MS COCO we use the splits produced by Andrej Karpathy.

While Karpathy's paper indicates:

For MSCOCO we use 5,000 images for both validation and testing

Is the original test actually over 1000 images, or is the provided caption list incomplete?
Thanks,
Armand
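
For reference, a quick way to check the counts is to load the provided test files directly. This is a sketch that assumes the file layout used by datasets.py (coco_test_caps.txt next to coco_test_ims.npy); the data path is hypothetical, so adjust it to your setup:

    import numpy

    path = '/path/to/data/coco/'  # hypothetical location of the downloaded files

    # One caption per line in the text file, one feature row per entry in the .npy file
    with open(path + 'coco_test_caps.txt', 'rb') as f:
        caps = [line.strip() for line in f]
    ims = numpy.load(path + 'coco_test_ims.npy')

    print(len(caps))   # number of captions provided
    print(ims.shape)   # number of image feature rows and their dimensionality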

Failed to train

{'grad_clip': 2.0, 'dim': 1024, 'optimizer': 'adam', 'dim_word': 300, 'data': 'f8k', 'lrate': 0.0002, 'batch_size': 128, 'encoder': 'gru', 'maxlen_w': 100, 'saveto': 'f8k.snap.npz', 'max_epochs': 15, 'dim_image': 4096, 'dispFreq': 10, 'decay_c': 0.0, 'margin': 0.2, 'reload_': False, 'validFreq': 100}
Loading dataset
Creating dictionary
Dictionary size: 8919
Building model
WARNING (theano.tensor.blas): We did not find a dynamic library in the library_dir of the library we use for blas. If you use ATLAS, make sure to compile it with dynamics library.
Building f_log_probs... Done
Building f_cost... Done
Building sentence encoder
Building image encoder
Building f_grad... Building optimizers...
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    trainer(data='f8k', saveto='f8k.snap.npz')
  File "train.py", line 152, in trainer
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
  File "/home/john/withlinux/ai/visual-semantic-embedding/optim.py", line 39, in adam
    f_update = theano.function([lr], [], updates=updates, on_unused_input='ignore', profile=False)
  File "/home/john/.local/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/john/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 449, in pfunc
    no_default_updates=no_default_updates)
  File "/home/john/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
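
For what it is worth, this class of error (float32 shared variables being updated with float64 values) usually disappears once Theano's floatX is forced to float32. A minimal sketch of one way to do that, assuming the flag is not already set elsewhere; it must run before the first import of theano (e.g. at the very top of train.py), or the same flag can go into ~/.theanorc:

    import os

    # Prepend floatX=float32 so any existing flags are preserved
    os.environ['THEANO_FLAGS'] = 'floatX=float32,' + os.environ.get('THEANO_FLAGS', '')

    import theano
    print(theano.config.floatX)  # expected to print 'float32'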

not all arguments converted during string formatting

File "tools.py", line 32, in load_model
with open('%s.f8k.npz.dictionary.pkl'%path_to_model, 'rb') as f:
TypeError: not all arguments converted during string formatting

default_model = '/home/koel/workplace/semantic-embedding/vse/f8k.npz'
print 'OK'
#-----------------------------------------------------------------------------#

def load_model(path_to_model=default_model):
    """
    Load all model components
    """
    print path_to_model

    # Load the worddict
    print 'Loading dictionary...'
    with open('%s.f8k.npz.dictionary.pkl'%path_to_model, 'rb') as f:
        worddict = pkl.load(f)
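
As a guess rather than a confirmed diagnosis: in Python 2, the % operator raises exactly this TypeError when the value on the right is a tuple with more items than the format string has placeholders, so it is worth checking that path_to_model really is a single string at this point. A tiny illustration with hypothetical values:

    path_to_model = '/home/koel/workplace/semantic-embedding/vse/f8k.npz'

    print('%s.dictionary.pkl' % path_to_model)           # fine: one placeholder, one string
    print('%s.dictionary.pkl' % (path_to_model, 'f8k'))  # TypeError: not all arguments converted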

Bow model encodes sentence with an extra word

Thanks for providing this nice and interesting code.

I noticed that when using the bag-of-words encoder, the code still adds an 'end of sentence' word to the matrix x in encode_sentences:

        x = numpy.zeros((k+1, len(caption))).astype('int64')
        x_mask = numpy.zeros((k+1, len(caption))).astype('float32')
        for idx, s in enumerate(seqs):
            x[:k,idx] = s
            x_mask[:k+1,idx] = 1.

Meanwhile, in build_sentence_encoder:

# Word embedding
emb = tparams['Wemb'][x.flatten()].reshape([n_timesteps, n_samples, options['dim_word']])

# Encode sentences
if options['encoder'] == 'bow':
    sents = (emb * mask[:,:,None]).sum(0)

each sentence is encoded with an extra word corresponding to the first row of tparams['Wemb'], which I think should be ignored.

Also, in the encode_sentences function:

        seqs = []
        for i, cc in enumerate(caption):
            seqs.append([model['worddict'][w] if d[w] > 0 and model['worddict'][w] < model['options']['n_words'] else 1 for w in cc])

I think the line inside the for loop should be changed to:

seqs.append([model['worddict'][w] - 2 if d[w] > 0 and model['worddict'][w] < model['options']['n_words'] else 1 for w in cc])

since the two words '<eos>' and 'UNK' are not in params['Wemb'].

However, these issues did not affect the final performance much.
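
A small numpy sketch of the first point, using toy sizes and hypothetical values; it mirrors the masking code quoted above and shows that the extra masked row adds Wemb[0] to the bag-of-words sum:

    import numpy

    k, n_samples, dim_word = 3, 1, 4                      # toy sizes
    Wemb = numpy.arange(5 * dim_word, dtype='float32').reshape(5, dim_word)
    s = [2, 3, 4]                                         # word indices of one caption

    x = numpy.zeros((k + 1, n_samples)).astype('int64')
    x_mask = numpy.zeros((k + 1, n_samples)).astype('float32')
    x[:k, 0] = s
    x_mask[:k + 1, 0] = 1.                                # mask also covers the extra row

    emb = Wemb[x.flatten()].reshape(k + 1, n_samples, dim_word)
    bow = (emb * x_mask[:, :, None]).sum(0)
    bow_trimmed = (emb * x_mask[:, :, None])[:k].sum(0)   # same sum without the extra row

    print(bow - bow_trimmed)                              # equals Wemb[0]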

TypeError: ('An update must have the same type as the original shared variable (shared_var=<TensorType(float32, matrix)>, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

def adam(lr, tparams, grads, inp, cost):
    gshared = [theano.shared(p.get_value() * 0., name='%s_grad'%k) for k, p in tparams.iteritems()]
    gsup = [(gs, g) for gs, g in zip(gshared, grads)]

    f_grad_shared = theano.function(inp, cost, updates=gsup, profile=False)

    b1 = 0.1
    b2 = 0.001
    e = 1e-8

    updates = []

    i = theano.shared(numpy.float32(0.))
    i_t = i + 1.
    fix1 = 1. - b1**(i_t)
    fix2 = 1. - b2**(i_t)
    lr_t = lr * (tensor.sqrt(fix2) / fix1)

    for p, g in zip(tparams.values(), gshared):
        m = theano.shared(p.get_value() * 0.)
        v = theano.shared(p.get_value() * 0.)
        m_t = (b1 * g) + ((1. - b1) * m)
        v_t = (b2 * tensor.sqr(g)) + ((1. - b2) * v)
        g_t = m_t / (tensor.sqrt(v_t) + e)
        p_t = p - (lr_t * g_t)
        updates.append((m, m_t))
        updates.append((v, v_t))
        updates.append((p, p_t))
    updates.append((i, i_t))

    f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)

    return f_grad_shared, f_update

I get the error at the line f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False). Please help me fix it.
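
This is the same float32/float64 mismatch reported in the "Failed to train" issue above. Besides forcing floatX=float32 globally, one alternative workaround (a sketch, not the repository's own fix) is to cast each update value back to the dtype of its shared variable just before building f_update; the names below refer to the variables inside adam() as quoted above:

    # inside adam(), right before the failing theano.function call
    updates = [(var, tensor.cast(val, var.dtype)) for var, val in updates]
    f_update = theano.function([lr], [], updates=updates, on_unused_input='raise', profile=False)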

The output of the encoder?

I just want to use this project for sentence embeddings.
The output I get looks like this:
[0,0,-1,0,0,-1,-1....-1,0,0]
...
[0,-1,0,-1,0,-1,-1....0,-1,0]

Is this the right form?

Are the sentences really encoded only as 0s and -1s, like a one-hot embedding? @ryankiros
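
For comparison, the intended way to obtain sentence embeddings is through tools.py. A minimal sketch, where the snapshot path is hypothetical; encode_sentences should return one dense real-valued vector per sentence rather than a 0/-1 pattern:

    import tools

    model = tools.load_model('/path/to/f8k.npz')   # hypothetical path to a trained snapshot
    vectors = tools.encode_sentences(model, ['a dog runs on the grass',
                                             'two children play in the park'])

    print(vectors.shape)    # (2, embedding dimension)
    print(vectors[0][:10])  # dense float values, not just 0 / -1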

cost is NaN

Epoch 1 Update 510 Cost 308.02814 UD 0.764338970184
Seen 60000 samples
Epoch 2
Epoch 2 Update 520 Cost 118.96719 UD 0.695561885834
Epoch 2 Update 530 Cost 226.9092 UD 0.454373121262
Epoch 2 Update 540 Cost 11.716671 UD 0.462770938873
Epoch 2 Update 550 Cost 25.91058 UD 0.57851600647
NaN detected

Have you encountered this problem?
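
Not a confirmed fix, but when the cost turns NaN the usual knobs to try first are already exposed by trainer() (see the config printed in the "Failed to train" issue above): a lower learning rate and/or a tighter gradient clipping threshold. For example, with hypothetical values:

    import train

    # halve the learning rate and tighten gradient clipping relative to the defaults
    train.trainer(data='f8k', lrate=0.0001, grad_clip=1.0, saveto='f8k.snap.npz')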
