
end2_assignment's Introduction
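The snippets below assume the usual setup, roughly as follows (a sketch; the data-preparation code that builds pairs, input_lang, and output_lang is not shown in this excerpt):

import random
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")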

Select a random pair of sentences from the data we prepared:

sample = random.choice(pairs)
sample

=>['vous etes plus intelligent que moi .', 'you re smarter than me .']

To work with the embedding layer and the LSTM, the inputs must be tensors, so we need to convert the sentences (words) into tensors. First we split each sentence on whitespace and convert each word into an index (using word2index[word]).

input_sentence = sample[0]
target_sentence = sample[1]
input_indices = [input_lang.word2index[word] for word in input_sentence.split(' ')]
target_indices = [output_lang.word2index[word] for word in target_sentence.split(' ')]
input_indices, target_indices

=>([118, 214, 152, 135, 902, 42, 5], [129, 78, 1319, 1166, 343, 4])

Then we convert these indices into tensors.

input_tensor = torch.tensor(input_indices, dtype=torch.long, device=device)
output_tensor = torch.tensor(target_indices, dtype=torch.long, device=device)

Next, we define an Embedding layer and an LSTM layer for the encoder.
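The exact values of input_size and hidden_size are not shown at this point, but the zero-initialized states of size (1, 1, 256) used below suggest hidden_size = 256, and input_size would be the input-language vocabulary size. A minimal sketch (the attribute name n_words is an assumption):

hidden_size = 256                  # consistent with the (1, 1, 256) hidden/cell states below
input_size = input_lang.n_words    # assumed attribute: number of words in the input vocabulary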

embedding = nn.Embedding(input_size, hidden_size).to(device)
lstm = nn.LSTM(hidden_size, hidden_size).to(device)

We are working with a single sample here, but in practice we would work with a batch. Let's account for that by reshaping our input into a fake batch of size 1.

embedded_input = embedding(input_tensor[0].view(-1, 1))
print(embedded_input.shape)

Next let's run the LSTM, initializing the hidden state and cell state with zeros (an empty state).

(hidden, ct) = torch.zeros(1, 1, 256, device=device), torch.zeros(1, 1, 256, device=device)
embedded_input = embedding(input_tensor[0].view(-1, 1))
output, (hidden, ct) = lstm(embedded_input, (hidden, ct))
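For orientation, with a single time step and a batch of 1, the shapes at this point should be:

print(output.shape)   # torch.Size([1, 1, 256])  -> (seq_len, batch, hidden_size)
print(hidden.shape)   # torch.Size([1, 1, 256])  -> (num_layers, batch, hidden_size)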

Now we define an empty tensor of size MAX_LENGTH x hidden_size to store the encoder outputs. Then we can collect the encoder output for each word in the sentence.

encoder_outputs = torch.zeros(MAX_LENGTH, 256, device=device)
(encoder_hidden, encoder_ct) = torch.zeros(1, 1, 256, device=device), torch.zeros(1, 1, 256, device=device)

for i in range(input_tensor.size()[0]):
    embedded_input = embedding(input_tensor[i].view(-1, 1))
    output, (encoder_hidden, encoder_ct) = lstm(embedded_input, (encoder_hidden, encoder_ct))
    encoder_outputs[i] += output[0, 0]

Encoder Steps

Input Sentence: vous etes plus intelligent que moi .
Target Sentence: you re smarter than me .
Input indices: [118, 214, 152, 135, 902, 42, 5]
Target indices: [129, 78, 1319, 1166, 343, 4]
After adding the EOS token
Input indices: [118, 214, 152, 135, 902, 42, 5, 1]
Target indices: [129, 78, 1319, 1166, 343, 4, 1]
Input tensor: tensor([118, 214, 152, 135, 902, 42, 5, 1], device='cuda:0')
Target tensor: tensor([ 129, 78, 1319, 1166, 343, 4, 1], device='cuda:0')
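The tensors above include the end-of-sentence token (index 1). A minimal sketch of how it could be appended before building the tensors, assuming EOS_token = 1 as in the standard seq2seq setup:

EOS_token = 1   # assumed, consistent with the trailing 1 above
input_indices.append(EOS_token)
target_indices.append(EOS_token)
input_tensor = torch.tensor(input_indices, dtype=torch.long, device=device)
output_tensor = torch.tensor(target_indices, dtype=torch.long, device=device)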

Step 0  Word => vous         Input Tensor => tensor(118, device='cuda:0')

Step 1  Word => etes         Input Tensor => tensor(214, device='cuda:0')

Step 2  Word => plus         Input Tensor => tensor(152, device='cuda:0')

Step 3  Word => intelligent  Input Tensor => tensor(135, device='cuda:0')

Step 4  Word => que          Input Tensor => tensor(902, device='cuda:0')

Step 5  Word => moi          Input Tensor => tensor(42, device='cuda:0')

Step 6  Word => .            Input Tensor => tensor(5, device='cuda:0')

Step 7  Word => <EOS>        Input Tensor => tensor(1, device='cuda:0')
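A quick sanity check on the collected outputs (MAX_LENGTH appears to be 10 here, consistent with the Linear(256 * 2, 10) attention layer used below):

print(encoder_outputs.shape)                                 # torch.Size([10, 256])
print(encoder_outputs[input_tensor.size(0):].abs().sum())    # tensor(0., ...) - unused rows stay zero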

We have completed the encoder part; now we can start building the attention decoder. The first input to the decoder will be the SOS_token; later inputs will be the words it predicted (unless we use teacher forcing).

The decoder LSTM's hidden state will be initialized with the encoder's last hidden state.

We will use the decoder's hidden state and the embedding of the last prediction to generate attention weights using a fully connected (FC) layer.

These attention weights will be used to weigh the encoder_outputs via batch matrix multiplication. This gives us a new view of the encoder states.

The attention-weighted encoder states are then concatenated with the input embedding, passed through a linear layer, and fed to the LSTM.

The LSTM's output is sent to an FC layer to predict one of the output-language words.

First input:

decoder_input = torch.tensor([[SOS_token]], device=device)
(decoder_hidden, decoder_ct) = (encoder_hidden, encoder_ct)
decoded_words = []

The inputs to the decoder LSTM are the last prediction (or a word from the target sentence, depending on the teacher forcing ratio) and the decoder hidden states.

We need to concatenate the embeddings and the last decoder hidden state
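The variable embedded below is the decoder-side embedding of decoder_input. It is not defined in this excerpt; a minimal sketch, assuming a separate embedding layer over the output-language vocabulary (the names decoder_embedding and output_lang.n_words are assumptions):

decoder_embedding = nn.Embedding(output_lang.n_words, 256).to(device)   # assumed layer
embedded = decoder_embedding(decoder_input)                             # shape: (1, 1, 256)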

torch.cat((embedded[0], decoder_hidden[0]), 1).shape

=> torch.Size([1, 512])

Now we will calculate the attention weights. We do this by concatenating the embedding and the last decoder hidden state and passing the result through a fully connected layer.

attn_weight_layer = nn.Linear(256 * 2, 10).to(device)
attn_weights = attn_weight_layer(torch.cat((embedded[0], decoder_hidden[0]), 1))
attn_weights

=> tensor([[-0.8181, 0.0196, 0.0196, -0.3952, -0.1043, -0.1855, -0.5074, -0.4552, -0.5731, 0.5895]], device='cuda:0', grad_fn=<AddmmBackward>)

Now that we have the attention weights, we apply them to the encoder outputs, combine the attention-applied context with the embedding, and pass the result through a ReLU; this becomes the input to the LSTM layer. On the output of the LSTM we apply a softmax and predict the expected word.
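A minimal sketch of that single decoder step, assuming the layer names attn_combine, decoder_lstm, and out (these names and output_lang.n_words are assumptions; the weights are untrained, so the prediction is essentially random):

import torch.nn.functional as F

# assumed layers for the rest of the step
attn_combine = nn.Linear(256 * 2, 256).to(device)
decoder_lstm = nn.LSTM(256, 256).to(device)
out = nn.Linear(256, output_lang.n_words).to(device)   # assumed: output vocabulary size

# weigh the encoder outputs with the (softmax-normalized) attention weights
attn_applied = torch.bmm(F.softmax(attn_weights, dim=1).unsqueeze(0),
                         encoder_outputs.unsqueeze(0))              # (1, 1, 256)

# combine the context with the embedding, pass through a linear layer and ReLU
lstm_input = torch.cat((embedded[0], attn_applied[0]), 1)            # (1, 512)
lstm_input = F.relu(attn_combine(lstm_input)).unsqueeze(0)           # (1, 1, 256)

# run one LSTM step and predict a word over the output vocabulary
decoder_output, (decoder_hidden, decoder_ct) = decoder_lstm(lstm_input, (decoder_hidden, decoder_ct))
probs = F.log_softmax(out(decoder_output[0]), dim=1)
topv, topi = probs.topk(1)
predicted_word = output_lang.index2word[topi.item()]                 # assumed: index2word mapping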

Decoder steps with full teacher forcing

Step 0  Expected output (word) => you      Expected output (index) => 129   Predicted output (word) => opposed     Predicted output (index) => 2669

Step 1  Expected output (word) => re       Expected output (index) => 78    Predicted output (word) => opposed     Predicted output (index) => 2669

Step 2  Expected output (word) => smarter  Expected output (index) => 1319  Predicted output (word) => opposed     Predicted output (index) => 2669

Step 3  Expected output (word) => than     Expected output (index) => 1166  Predicted output (word) => options     Predicted output (index) => 1343

Step 4  Expected output (word) => me       Expected output (index) => 343   Predicted output (word) => articulate  Predicted output (index) => 964

Step 5  Expected output (word) => .        Expected output (index) => 4     Predicted output (word) => forgetting  Predicted output (index) => 2345
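The steps above come from repeating the decoder step over the target sentence with full teacher forcing, i.e. feeding the ground-truth target word as the next input regardless of the prediction. A minimal sketch of that loop, under the same assumed names as above (EOS_token = 1 is an assumption):

decoder_input = torch.tensor([[SOS_token]], device=device)
(decoder_hidden, decoder_ct) = (encoder_hidden, encoder_ct)
decoded_words = []

for di in range(output_tensor.size(0)):
    embedded = decoder_embedding(decoder_input)
    attn_weights = attn_weight_layer(torch.cat((embedded[0], decoder_hidden[0]), 1))
    attn_applied = torch.bmm(F.softmax(attn_weights, dim=1).unsqueeze(0),
                             encoder_outputs.unsqueeze(0))
    lstm_input = F.relu(attn_combine(torch.cat((embedded[0], attn_applied[0]), 1))).unsqueeze(0)
    decoder_output, (decoder_hidden, decoder_ct) = decoder_lstm(lstm_input, (decoder_hidden, decoder_ct))
    topv, topi = F.log_softmax(out(decoder_output[0]), dim=1).topk(1)
    decoded_words.append(output_lang.index2word[topi.item()])

    # full teacher forcing: the next input is the ground-truth target word
    decoder_input = output_tensor[di].view(1, 1)
    if output_tensor[di].item() == EOS_token:
        break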
