
Comments (2)

esalesky commented on May 27, 2024

Hi! To clarify, are you looking for the tensor sequences of intermediate layers (for example, after the encoder?), or for an easier way to access the input and output tensors?

Intermediate layer outputs are something fairseq as a whole doesn't currently expose easily: you'd need to modify the transformer methods themselves to return the intermediate layers you're interested in, and then return them through the hub interface. This is not something I plan to add in the near term.

The hub_interface.py code for visrep follows other fairseq interfaces; for example, see the default fairseq HubInterface. The encode function tokenizes the input and converts it into the tensor passed to the model (in this case, it takes a string and generates a tensor corresponding to the rendered image of that string), and decode detokenizes the model's output tensor back into a sentence string. The encoder and decoder aren't accessed separately, and the encoded tensors from VisualTextDataset are the same ones returned by encode().
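To make that encode/decode contract concrete, here is a toy sketch of the same shape (illustrative only, not visrep's actual code; visrep's encode() renders the string to an image tensor rather than looking up token ids):

```python
# Toy sketch of the fairseq hub-interface contract (illustrative only).
# Plain lists of ids stand in for tensors so this runs without fairseq.
class ToyHubInterface:
    def __init__(self, vocab):
        self.vocab = vocab                           # token -> id
        self.inv = {i: t for t, i in vocab.items()}  # id -> token

    def encode(self, sentence):
        # string in, model-ready input (here: list of ids) out
        return [self.vocab[tok] for tok in sentence.split()]

    def decode(self, ids):
        # model output ids in, detokenized sentence string out
        return " ".join(self.inv[i] for i in ids)

hub = ToyHubInterface({"hello": 0, "world": 1})
ids = hub.encode("hello world")   # [0, 1]
text = hub.decode(ids)            # "hello world"
```

The point is the symmetry: whatever encode() produces is exactly what the model consumes, and decode() inverts the model's output back to text.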

If you want the tensors for the model inputs and outputs, you can get the input tensor sequence by calling encode() directly, and the final output by modifying L62 in hub_interface.py to return the full hypothesis alongside the output string:

    return [self.decode(hypos[0]["tokens"]) for hypos in batched_hypos]
--> return [(self.decode(hypos[0]["tokens"]), hypos[0]) for hypos in batched_hypos]

where hypos[0] is the best output hypothesis chosen by beam search, and will look something like this:

[{'tokens': tensor([ 57,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.4421), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-1.6331, -0.2708, -0.1837, -0.4788, -0.0630, -0.2784, -0.1871])},
 {'tokens': tensor([152,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.4499), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-1.6986, -0.3032, -0.1806, -0.4487, -0.0603, -0.2711, -0.1871])},
 {'tokens': tensor([100,  16,  11, 749,   5,   2]), 'score': tensor(-0.4618), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-1.6489, -0.2012, -0.4071, -0.0563, -0.2692, -0.1882])},
 {'tokens': tensor([ 14,  22,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.5500), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-2.5914, -0.3630, -0.2739, -0.1726, -0.4729, -0.0613, -0.2734, -0.1919])},
 {'tokens': tensor([ 20,  13,   8,   7,  11, 749,   5,   2]), 'score': tensor(-0.6997), 'attention': tensor([]), 'alignment': tensor([]), 'positional_scores': tensor([-3.1400, -0.9755, -0.3081, -0.1903, -0.4878, -0.0541, -0.2568, -0.1851])}]
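A hypothesis list like this can be unpacked with ordinary indexing. In the runnable sketch below, lists and floats stand in for the torch tensors (the values are copied from the first two hypotheses above); note that the 'score' field is the average per-token log-probability:

```python
# Unpack a beam-search hypothesis list like the one shown above.
# Lists/floats stand in for torch tensors so this runs without fairseq.
hypos = [
    {"tokens": [57, 8, 7, 11, 749, 5, 2], "score": -0.4421,
     "positional_scores": [-1.6331, -0.2708, -0.1837, -0.4788,
                           -0.0630, -0.2784, -0.1871]},
    {"tokens": [152, 8, 7, 11, 749, 5, 2], "score": -0.4499,
     "positional_scores": [-1.6986, -0.3032, -0.1806, -0.4487,
                           -0.0603, -0.2711, -0.1871]},
]

best = hypos[0]                       # beam search returns best-first
best_tokens = best["tokens"]          # ends in EOS (id 2 here)
# the hypothesis score is the mean of the per-token log-probabilities
avg = sum(best["positional_scores"]) / len(best["positional_scores"])
# avg ~= -0.4421, matching best["score"]
```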


long21wt commented on May 27, 2024

are you looking for the tensor sequences of intermediate layers (for example, after the encoder?), or for an easier way to access the input and output tensors?

I'm interested in the intermediate layers and outputs of the model. I can use VisualTextTransformerEncoder like this:

self.models[0].encoder(batch['net_input']['src_tokens'], batch['net_input']['src_lengths'])

The TransformerDecoder is more complicated since it requires prev_output_tokens, which are created in SequenceGenerator.
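For teacher forcing (as opposed to incremental generation), fairseq's data collater builds prev_output_tokens by rotating the EOS token from the end of the target sequence to the front. A fairseq-free sketch of that shift, using the best-hypothesis token ids from the earlier comment as an example:

```python
def shift_right(target_tokens, eos=2):
    # fairseq's collater builds prev_output_tokens for teacher forcing
    # by moving the EOS from the end of the target to the front
    # (move_eos_to_beginning=True in its language-pair collate helper).
    assert target_tokens[-1] == eos, "target must end in EOS"
    return [eos] + target_tokens[:-1]

# e.g. the best hypothesis tokens from the earlier comment:
shift_right([57, 8, 7, 11, 749, 5, 2])  # [2, 57, 8, 7, 11, 749, 5]
```

With a tensor built this way you can call the decoder directly, passing the encoder output as encoder_out; during free decoding, SequenceGenerator instead grows prev_output_tokens one step at a time.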

Anyway, thanks for your answer.

