nn4nlp_project's Issues
Potentially breaking LENGTH_UNIT after padding
After adding <EOS>, does it break the LENGTH_UNIT that the padding established?
CoNLL 2003: use the German-language data
We need to use the German-language portion of the CoNLL 2003 dataset.
How to prepare the pre-trained word embeddings?
Computing score in decoder
Currently we transform the hidden state into scores over all possible labels with a single linear layer. Is this a good model?
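For reference, the current design amounts to the following sketch (all dimensions, names, and values here are hypothetical, not the project's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_labels = 16, 9   # hypothetical sizes, e.g. 9 NER label types

# A single linear layer maps the decoder hidden state to one score per label.
W = rng.normal(size=(num_labels, hidden_dim))
b = np.zeros(num_labels)

h = rng.normal(size=hidden_dim)  # decoder hidden state at one time step
scores = W @ h + b               # unnormalized score for each label
predicted = int(np.argmax(scores))
```

Whether a single affine map is expressive enough is the open question here; a small MLP between the hidden state and the scores would be the obvious alternative.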
If we still use padding, apply masks in several places
If we use padding, mask the following:
- Objective function: exclude cross-entropy terms that pair an output with input padding.
- Attention: when computing alpha_{ij} (i.e. taking the softmax), exclude padding positions, so the softmax runs only over the non-padding part.
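The attention part of the list above can be sketched in a few lines of numpy (the function name and the boolean-mask convention are assumptions, not the project's code):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over non-padding positions only; padding positions get weight 0."""
    logits = np.where(mask, logits, -np.inf)  # padding -> -inf -> exp() == 0
    logits = logits - logits.max()            # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 0.0])
mask = np.array([True, True, True, False])    # last position is padding
alpha = masked_softmax(logits, mask)          # weights sum to 1; alpha[-1] == 0
```

The objective-function mask works the same way: multiply each token's cross-entropy term by the mask before summing, so padded positions contribute zero loss.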
Attention does not involve cell states
Currently attention does not involve cell states. Is that a good choice?
Attention
Think about this carefully again: who attends to whom?
Batching on same-length sentences
Simply batch sentences of the same length together; then no padding is needed.
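This length-bucketing idea can be sketched as follows (function name and data layout are hypothetical):

```python
from collections import defaultdict

def batch_by_length(sentences, batch_size):
    """Group sentences of equal length, then split each group into batches.

    Every batch contains sentences of one length, so no padding is required.
    """
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)
    batches = []
    for group in buckets.values():
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches

sents = [["a", "b"], ["c", "d"], ["e"], ["f", "g"], ["h"]]
batches = batch_by_length(sents, batch_size=2)
```

The trade-off is that rare lengths produce small batches; that interacts with the beam-size question below.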
What if beam_size > current_batch_size * sentence_len?
We may need to handle this case later.
Adam learning rate: 0.001 or 0.0001?
Decide whether Adam's learning rate should be 0.001 or 0.0001.
Check conlleval.py
Understand what it does.
Output context vector to next input
We can consider this later.
most_common() not fully utilized?
In preprocess.py, in build_vocab(), there is this line of code:
all_words = collections.Counter(all_text).most_common()
It seems that most_common() is not really utilized here at the moment, but it could be used to keep only the N most common words.
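A sketch of how most_common(N) could cap the vocabulary size (the cap value, the special tokens, and their indices here are illustrative, not necessarily those of preprocess.py):

```python
import collections

all_text = "the cat sat on the mat the cat".split()
vocab_size = 3  # hypothetical cap on the number of regular words

# Keep only the N most frequent words; everything else would map to <UNK>.
most = collections.Counter(all_text).most_common(vocab_size)
vocab = {"<PAD>": 0, "<UNK>": 1}
for word, _count in most:
    vocab[word] = len(vocab)
```

Passing `n` to most_common() is the only change needed; without an argument it returns every word, which is why the call currently has no effect beyond sorting.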
Should not discard batches that have sizes different from the user-specified batch size
In preprocessor.py, in minibatch(), in the last two lines with filter:
X_batch = filter(lambda mini_batch: len(mini_batch) == batch_size, X_batch)
Y_batch = filter(lambda mini_batch: len(mini_batch) == batch_size, Y_batch)
These discard every batch whose size differs from the user-specified batch size (default 32). I don't think this is correct: it unnecessarily throws away a lot of data.
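One possible fix, sketched independently of the project's code, is to keep the final smaller batch instead of filtering it out:

```python
def minibatches(data, batch_size):
    """Yield consecutive batches, keeping the final (possibly smaller) batch."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

batches = list(minibatches(list(range(10)), batch_size=4))
sizes = [len(b) for b in batches]   # [4, 4, 2] -- the last batch is kept
```

Downstream code then just needs to read the actual batch size from the tensor shape rather than assuming the configured constant.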
Dates are cast into 'reg_digitzreg_digitz'
This makes dates (e.g. "1996-08-30") be treated differently from plain numbers (e.g. "83").
Need to think about whether this is what we want.
"columns_to_batch" not used
In preprocessor.py, in minibatch(), the argument columns_to_batch is not used.
Fixed attention
"an LSTM decoder (1 layer with tanh activation function) with a fixed attention mechanism that deterministically attends to the i-th input token when decoding the i-th output, and hence does not involve learning of any attention parameters"
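A small numpy sketch of the quoted mechanism, under the assumption that fixed attention reduces to a one-hot weight vector over the encoder states:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hid_dim = 5, 8
enc_states = rng.normal(size=(seq_len, hid_dim))  # encoder outputs, hypothetical

def fixed_attention(enc_states, i):
    """Deterministically attend to the i-th input token.

    The weight vector is one-hot at position i, so the context vector is
    exactly enc_states[i] and no attention parameters are learned.
    """
    alpha = np.zeros(len(enc_states))
    alpha[i] = 1.0
    return alpha @ enc_states

context = fixed_attention(enc_states, 2)  # equals enc_states[2]
```

This fits sequence-labeling tasks like NER and supertagging, where output position i always corresponds to input token i.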
Same hidden dimensions in encoder and decoder
We may want to allow different dimensions.
Need to shuffle data when training
I think we need to do that.
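A minimal sketch of shuffling inputs and labels together each epoch, so the (sentence, tag-sequence) pairs stay aligned (function name and seeding are hypothetical):

```python
import random

def shuffled_pairs(X, Y, seed=None):
    """Shuffle X and Y with the same permutation so pairs stay aligned."""
    pairs = list(zip(X, Y))
    random.Random(seed).shuffle(pairs)
    X_shuf, Y_shuf = zip(*pairs)
    return list(X_shuf), list(Y_shuf)

X = ["s1", "s2", "s3", "s4"]
Y = ["t1", "t2", "t3", "t4"]
X2, Y2 = shuffled_pairs(X, Y, seed=0)  # s_i still maps to t_i after shuffling
```

If we adopt the same-length batching above, the shuffle should happen before bucketing, and the resulting batches can also be visited in random order.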
Not using <EOS>?
I don't think that in our tasks (NER and CCG supertagging) we would need <EOS>. Can we simply remove it?
Use bi-directional LSTM for encoder
Need to use bi-directional LSTM for encoder.
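Assuming the project uses PyTorch, switching the encoder to a bi-directional LSTM is mostly a constructor flag; a sketch with hypothetical sizes (note the output dimension doubles, which affects the decoder-dimension issue above):

```python
import torch
import torch.nn as nn

emb_dim, hid_dim, batch, seq_len = 8, 16, 4, 10  # hypothetical sizes
enc = nn.LSTM(emb_dim, hid_dim, num_layers=1,
              batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, emb_dim)
out, (h, c) = enc(x)
# out concatenates forward and backward states: last dim is 2 * hid_dim.
# h stacks the two directions: shape (num_layers * 2, batch, hid_dim).
```

The decoder's init_dec_hidden then needs either a projection from 2 * hid_dim down to hid_dim, or a decoder built with the doubled dimension.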
<EOS> has different indices for sentence and entity
In Preprocessor.entity_dict, '<EOS>' is 1. In build_vocab() in preprocess.py, '<EOS>' is 2 ('<UNK>' is 1 there).
Need to check whether this has any side effects.
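One way to avoid mismatched indices, sketched as an assumption rather than the project's actual design, is to define the special-token indices once and build every dictionary from the same constants:

```python
# Shared special-token indices; values here are illustrative.
SPECIALS = {"<PAD>": 0, "<UNK>": 1, "<EOS>": 2}

def build_index(tokens, specials=SPECIALS):
    """Start from the shared special indices, then append regular tokens."""
    index = dict(specials)
    for tok in tokens:
        if tok not in index:
            index[tok] = len(index)
    return index

word_index = build_index(["the", "cat"])
entity_index = build_index(["O", "B-PER"])
# '<EOS>' now has the same index in both dictionaries.
```

Whether the mismatch is actually harmful depends on whether any code maps indices across the two dictionaries; with shared constants the question disappears.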
ner.train(), init_dec_hidden = enc_hidden_out[0] should use [-1] instead of [0]
Thanks to @hainow for pointing out the following question:
In ner.py, in train(),
init_dec_hidden = enc_hidden_out[-1] # -1 instead of 0?
Discard long (char count) sentence?
Does it fit our use case to discard any sentence whose character count exceeds self.max_sentence_length? (In preprocessor.py, preprocess().)