nn4nlp_project's Issues
Potentially breaking LENGTH_UNIT after padding
After adding <EOS>, does it break the LENGTH_UNIT that the padding established?
CoNLL 2003: use the German-language data
We need to use the German-language portion of the CoNLL 2003 dataset.
How to prepare the pre-trained word embeddings?
Computing score in decoder
Currently we transform the hidden state into scores over all possible labels with a single linear layer. Is this a good model?
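For reference, the current design amounts to the following sketch (all dimensions, names, and values here are hypothetical, not the project's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_labels = 16, 9   # hypothetical sizes, e.g. 9 NER label types

# A single linear layer maps the decoder hidden state to one score per label.
W = rng.normal(size=(num_labels, hidden_dim))
b = np.zeros(num_labels)

h = rng.normal(size=hidden_dim)  # decoder hidden state at one time step
scores = W @ h + b               # unnormalized score for each label
predicted = int(np.argmax(scores))
```

Whether a single affine map is expressive enough is the open question here; a small MLP between the hidden state and the scores would be the obvious alternative.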
If we still use padding, apply masks in several places
If we use padding, mask the following:
- Objective function: exclude cross-entropy terms that pair an output with input padding.
- Attention: when computing alpha_{ij} (i.e. taking the softmax), exclude padding positions, so the softmax runs only over the non-padding part.
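The attention part of the list above can be sketched in a few lines of numpy (the function name and the boolean-mask convention are assumptions, not the project's code):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over non-padding positions only; padding positions get weight 0."""
    logits = np.where(mask, logits, -np.inf)  # padding -> -inf -> exp() == 0
    logits = logits - logits.max()            # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 0.0])
mask = np.array([True, True, True, False])    # last position is padding
alpha = masked_softmax(logits, mask)          # weights sum to 1; alpha[-1] == 0
```

The objective-function mask works the same way: multiply each token's cross-entropy term by the mask before summing, so padded positions contribute zero loss.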
Attention does not involve cell states
Currently attention does not involve cell states. Is that a good choice?
Attention
Think about this carefully again: who attends to whom?
Batching on same-length sentences
Simply batch sentences of the same length together; then no padding is needed.
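This length-bucketing idea can be sketched as follows (function name and data layout are hypothetical):

```python
from collections import defaultdict

def batch_by_length(sentences, batch_size):
    """Group sentences of equal length, then split each group into batches.

    Every batch contains sentences of one length, so no padding is required.
    """
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)
    batches = []
    for group in buckets.values():
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches

sents = [["a", "b"], ["c", "d"], ["e"], ["f", "g"], ["h"]]
batches = batch_by_length(sents, batch_size=2)
```

The trade-off is that rare lengths produce small batches; that interacts with the beam-size question below.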
What if beam_size > current_batch_size * sentence_len?
We may need to handle this case later.
Adam learning rate: 0.001 or 0.0001?
Decide whether Adam's learning rate should be 0.001 or 0.0001.
Check conlleval.py
Understand what it does.
Output context vector to next input
We can consider this later.
most_common() not fully utilized?
In preprocess.py, in build_vocab(), there is this line of code:
all_words = collections.Counter(all_text).most_common()
It seems that most_common() is not really utilized here at the moment, but it could be used to keep only the N most common words.
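A sketch of how most_common(N) could cap the vocabulary size (the cap value, the special tokens, and their indices here are illustrative, not necessarily those of preprocess.py):

```python
import collections

all_text = "the cat sat on the mat the cat".split()
vocab_size = 3  # hypothetical cap on the number of regular words

# Keep only the N most frequent words; everything else would map to <UNK>.
most = collections.Counter(all_text).most_common(vocab_size)
vocab = {"<PAD>": 0, "<UNK>": 1}
for word, _count in most:
    vocab[word] = len(vocab)
```

Passing `n` to most_common() is the only change needed; without an argument it returns every word, which is why the call currently has no effect beyond sorting.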
Should not discard batches that have sizes different from the user-specified batch size
In preprocessor.py, in minibatch(), in the last two lines with filter:
X_batch = filter(lambda mini_batch: len(mini_batch) == batch_size, X_batch)
Y_batch = filter(lambda mini_batch: len(mini_batch) == batch_size, Y_batch)
These discard every batch whose size differs from the user-specified batch size (default 32). I don't think this is correct: it unnecessarily throws away a lot of data.
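One possible fix, sketched independently of the project's code, is to keep the final smaller batch instead of filtering it out:

```python
def minibatches(data, batch_size):
    """Yield consecutive batches, keeping the final (possibly smaller) batch."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

batches = list(minibatches(list(range(10)), batch_size=4))
sizes = [len(b) for b in batches]   # [4, 4, 2] -- the last batch is kept
```

Downstream code then just needs to read the actual batch size from the tensor shape rather than assuming the configured constant.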
Dates are cast into 'reg_digitzreg_digitz'
This makes dates (e.g. "1996-08-30") be treated differently from plain numbers (e.g. "83").
Need to think about whether this is what we want.
"columns_to_batch" not used
In preprocessor.py, in minibatch(), the argument columns_to_batch is not used.
Fixed attention
"an LSTM decoder (1 layer with tanh activation function) with a fixed attention mechanism that deterministically attends to the i-th input token when decoding the i-th output, and hence does not involve learning of any attention parameters"
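A small numpy sketch of the quoted mechanism, under the assumption that fixed attention reduces to a one-hot weight vector over the encoder states:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hid_dim = 5, 8
enc_states = rng.normal(size=(seq_len, hid_dim))  # encoder outputs, hypothetical

def fixed_attention(enc_states, i):
    """Deterministically attend to the i-th input token.

    The weight vector is one-hot at position i, so the context vector is
    exactly enc_states[i] and no attention parameters are learned.
    """
    alpha = np.zeros(len(enc_states))
    alpha[i] = 1.0
    return alpha @ enc_states

context = fixed_attention(enc_states, 2)  # equals enc_states[2]
```

This fits sequence-labeling tasks like NER and supertagging, where output position i always corresponds to input token i.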
Same hidden dimensions in encoder and decoder
We may want to allow different dimensions.
Need to shuffle data when training
I think we need to do that.
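A minimal sketch of shuffling inputs and labels together each epoch, so the (sentence, tag-sequence) pairs stay aligned (function name and seeding are hypothetical):

```python
import random

def shuffled_pairs(X, Y, seed=None):
    """Shuffle X and Y with the same permutation so pairs stay aligned."""
    pairs = list(zip(X, Y))
    random.Random(seed).shuffle(pairs)
    X_shuf, Y_shuf = zip(*pairs)
    return list(X_shuf), list(Y_shuf)

X = ["s1", "s2", "s3", "s4"]
Y = ["t1", "t2", "t3", "t4"]
X2, Y2 = shuffled_pairs(X, Y, seed=0)  # s_i still maps to t_i after shuffling
```

If we adopt the same-length batching above, the shuffle should happen before bucketing, and the resulting batches can also be visited in random order.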
Not using <EOS>?
I don't think that in our tasks (NER and CCG supertagging) we would need <EOS>. Can we simply remove it?
Use bi-directional LSTM for encoder
Need to use bi-directional LSTM for encoder.
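Assuming the project uses PyTorch, switching the encoder to a bi-directional LSTM is mostly a constructor flag; a sketch with hypothetical sizes (note the output dimension doubles, which affects the decoder-dimension issue above):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, batch, seq_len = 8, 16, 4, 10  # hypothetical sizes
enc = nn.LSTM(emb_dim, hid_dim, num_layers=1,
              batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, emb_dim)
out, (h, c) = enc(x)
# out concatenates forward and backward states: last dim is 2 * hid_dim.
# h stacks the two directions: shape (num_layers * 2, batch, hid_dim).
```

The decoder's init_dec_hidden then needs either a projection from 2 * hid_dim down to hid_dim, or a decoder built with the doubled dimension.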
<EOS> has different indices for sentence and entity
In Preprocessor.entity_dict, '<EOS>' is 1. In build_vocab() in preprocess.py, '<EOS>' is 2 ('<UNK>' is 1 there).
Need to check whether this has any side effects.
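One way to avoid mismatched indices, sketched as an assumption rather than the project's actual design, is to define the special-token indices once and build every dictionary from the same constants:

```python
# Shared special-token indices; values here are illustrative.
SPECIALS = {"<PAD>": 0, "<UNK>": 1, "<EOS>": 2}

def build_index(tokens, specials=SPECIALS):
    """Start from the shared special indices, then append regular tokens."""
    index = dict(specials)
    for tok in tokens:
        if tok not in index:
            index[tok] = len(index)
    return index

word_index = build_index(["the", "cat"])
entity_index = build_index(["O", "B-PER"])
# '<EOS>' now has the same index in both dictionaries.
```

Whether the mismatch is actually harmful depends on whether any code maps indices across the two dictionaries; with shared constants the question disappears.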
ner.train(), init_dec_hidden = enc_hidden_out[0] should use [-1] instead of [0]
Thanks to @hainow for pointing out the following question:
In ner.py, in train(),
init_dec_hidden = enc_hidden_out[-1] # -1 instead of 0?
Discard long (char count) sentence?
Does it fit our use case to discard any sentence whose character count exceeds self.max_sentence_length? (In preprocessor.py, preprocess().)