
Comments (2)

esalesky commented on May 27, 2024

Hi! To clarify, are you looking for the tensor sequences of intermediate layers (for example, after the encoder?), or for an easier way to access the input and output tensors?

Intermediate layer outputs are something fairseq as a whole doesn't currently expose easily: you'd need to modify the transformer methods themselves to return the intermediate layers you're interested in, and then return them through the hub interface. This is not something I plan to add in the near term.

The hub_interface.py code for visrep follows other fairseq interfaces; for example, see the default fairseq HubInterface. The encode function tokenizes the input and converts it into the tensor passed to the model (in this case, it takes a string and generates a tensor corresponding to the rendered image of that string), and decode detokenizes the model's output tensor back into a sentence string. The encoder and decoder aren't accessed separately, and the encoded tensors from VisualTextDataset are the same ones returned by encode().
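To make that encode/decode contract concrete, here is a toy sketch of the same shape (illustrative only, not visrep's actual code; visrep's encode() renders the string to an image tensor rather than looking up token ids):

```python
# Toy sketch of the fairseq hub-interface contract (illustrative only).
# Plain lists of ids stand in for tensors so this runs without fairseq.
class ToyHubInterface:
    def __init__(self, vocab):
        self.vocab = vocab                           # token -> id
        self.inv = {i: t for t, i in vocab.items()}  # id -> token

    def encode(self, sentence):
        # string in, model-ready input (here: list of ids) out
        return [self.vocab[tok] for tok in sentence.split()]

    def decode(self, ids):
        # model output ids in, detokenized sentence string out
        return " ".join(self.inv[i] for i in ids)

hub = ToyHubInterface({"hello": 0, "world": 1})
ids = hub.encode("hello world")   # [0, 1]
text = hub.decode(ids)            # "hello world"
```

The point is the symmetry: whatever encode() produces is exactly what the model consumes, and decode() inverts the model's output back to text.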

If you want the tensors for the model inputs and outputs, you can get the input tensor sequence by calling encode() directly, and the final output by modifying L62 in hub_interface.py to return the full hypothesis alongside the output string:

    return [self.decode(hypos[0]["tokens"]) for hypos in batched_hypos]
--> return [(self.decode(hypos[0]["tokens"]), hypos[0]) for hypos in batched_hypos]

where hypos[0] is the best output hypothesis chosen by beam search, and will look something like this:

[{'tokens': tensor([ 57,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.4421), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-1.6331, -0.2708, -0.1837, -0.4788, -0.0630, -0.2784, -0.1871])},
 {'tokens': tensor([152,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.4499), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-1.6986, -0.3032, -0.1806, -0.4487, -0.0603, -0.2711, -0.1871])},
 {'tokens': tensor([100,  16,  11, 749,   5,   2]), 'score': tensor(-0.4618), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-1.6489, -0.2012, -0.4071, -0.0563, -0.2692, -0.1882])},
 {'tokens': tensor([ 14,  22,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.5500), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-2.5914, -0.3630, -0.2739, -0.1726, -0.4729, -0.0613, -0.2734, -0.1919])},
 {'tokens': tensor([ 20,  13,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.6997), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-3.1400, -0.9755, -0.3081, -0.1903, -0.4878, -0.0541, -0.2568, -0.1851])}]
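A hypothesis list like this can be unpacked with ordinary indexing. In the runnable sketch below, lists and floats stand in for the torch tensors (the values are copied from the first two hypotheses above); note that the 'score' field is the average per-token log-probability:

```python
# Unpack a beam-search hypothesis list like the one shown above.
# Lists/floats stand in for torch tensors so this runs without fairseq.
hypos = [
    {"tokens": [57, 8, 7, 11, 749, 5, 2], "score": -0.4421,
     "positional_scores": [-1.6331, -0.2708, -0.1837, -0.4788,
                           -0.0630, -0.2784, -0.1871]},
    {"tokens": [152, 8, 7, 11, 749, 5, 2], "score": -0.4499,
     "positional_scores": [-1.6986, -0.3032, -0.1806, -0.4487,
                           -0.0603, -0.2711, -0.1871]},
]

best = hypos[0]                       # beam search returns best-first
best_tokens = best["tokens"]          # ends in EOS (id 2 here)
# the hypothesis score is the mean of the per-token log-probabilities
avg = sum(best["positional_scores"]) / len(best["positional_scores"])
# avg ~= -0.4421, matching best["score"]
```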


long21wt commented on May 27, 2024

are you looking for the tensor sequences of intermediate layers (for example, after the encoder?), or for an easier way to access the input and output tensors?

I'm interested in the intermediate layers and outputs of the model. I can use VisualTextTransformerEncoder like this:

self.models[0].encoder(batch['net_input']['src_tokens'], batch['net_input']['src_lengths'])

The TransformerDecoder is more complicated since it requires prev_output_tokens, which are created in SequenceGenerator.
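For teacher forcing (as opposed to incremental generation), fairseq's data collater builds prev_output_tokens by rotating the EOS token from the end of the target sequence to the front. A fairseq-free sketch of that shift, using the best-hypothesis token ids from the earlier comment as an example:

```python
def shift_right(target_tokens, eos=2):
    # fairseq's collater builds prev_output_tokens for teacher forcing
    # by moving the EOS from the end of the target to the front
    # (move_eos_to_beginning=True in its language-pair collate helper).
    assert target_tokens[-1] == eos, "target must end in EOS"
    return [eos] + target_tokens[:-1]

# e.g. the best hypothesis tokens from the earlier comment:
shift_right([57, 8, 7, 11, 749, 5, 2])  # [2, 57, 8, 7, 11, 749, 5]
```

With a tensor built this way you can call the decoder directly, passing the encoder output as encoder_out; during free decoding, SequenceGenerator instead grows prev_output_tokens one step at a time.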

Anyway, thanks for your answer.

