glue-baselines's People

Contributors

sleepinyourhat, w4ngatang, woollysocks

glue-baselines's Issues

QQP: sentences appear in both train and dev/test splits

Hi,

The splits released for QQP seem to be somewhat leaky - a large number of sentences appear in both train and dev/test:

import os
import urllib
import sys
if sys.version_info >= (3, 0):
    import urllib.request
import zipfile

URLLIB=urllib
if sys.version_info >= (3, 0):
    URLLIB=urllib.request

data_file = "qqp.zip"
URLLIB.urlretrieve("https://dl.fbaipublicfiles.com/glue/data/QQP-clean.zip", data_file)
with zipfile.ZipFile(data_file) as zip_ref:
  zip_ref.extractall(".")
os.remove(data_file)

train_sents = set()
dev_sents = set()
test_sents = set()

with open('./QQP/train.tsv') as f:
  train = [x.strip().split('\t') for x in f.readlines()][1:]
for row in train:
  train_sents.add(row[3])
  train_sents.add(row[4])

with open('./QQP/dev.tsv') as f:
  dev = [x.strip().split('\t') for x in f.readlines()][1:]
for row in dev:
  dev_sents.add(row[3])
  dev_sents.add(row[4])

with open('./QQP/test.tsv') as f:
  test = [x.strip().split('\t') for x in f.readlines()][1:]
for row in test:
  test_sents.add(row[1])
  test_sents.add(row[2])

print(len(train_sents & dev_sents))
print(len(train_sents & test_sents))

29852
104698

Is this intentional?
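
For what it's worth, a minimal follow-up sketch (run in the same session as the script above) of how one could drop dev pairs whose sentences also appear in train; the same idea applies to test.tsv:

# Filter out dev pairs where either sentence also occurs in the training split.
leaked = train_sents & dev_sents
with open('./QQP/dev.tsv') as f:
    rows = [x.strip().split('\t') for x in f.readlines()]
header, rows = rows[0], rows[1:]
clean = [r for r in rows if r[3] not in leaked and r[4] not in leaked]
print(len(rows), len(clean))  # number of dev pairs before vs. after filtering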

Unable to run with ELMo embeddings

Hello, I am unable to use the ELMo implementation even though I follow the arguments provided in the README. I am using Python 3.6.2 (Anaconda) and installed AllenNLP in a virtual environment.

Here are the relevant arguments I use:

GPUID=-1
SEED=19

SHOULD_TRAIN=1
WORD_EMBS_FILE="../glove/glove.6B/glove.6B.50d.txt"

d_word=50
d_hid=512
glove=0
ELMO=1
deep_elmo=0
elmo_no_glove=1
COVE=0

PAIR_ENC="simple"

Here is my error log:

(allennlp) ➜   bash run_stuff.sh
12/01 04:00:19 PM: Namespace(batch_size=64, bpp_base=10, bpp_method='percent_tr', classifier='mlp', classifier_dropout=0.0, classifier_hid_dim=512, cove=0, cuda=-1, d_hid=512, d_word=50, deep_elmo=0, dropout=0.2, dropout_embs=0.2, elmo=1, elmo_no_glove=1, eval_tasks='none', exp_dir='EXP_DIR', glove=0, load_epoch=-1, load_model=0, load_preproc=1, load_tasks=1, log_file='log.log', lr=0.1, lr_decay_factor=0.5, max_grad_norm=5.0, max_seq_len=40, max_vals=100, max_word_v_size=30000, min_lr=1e-05, n_epochs=10, n_layers_enc=1, n_layers_highway=0, no_tqdm=0, optimizer='sgd', pair_enc='simple', patience=5, preproc_file='preproc.pkl', random_seed=19, run_dir='RUN_DIR', scaling_method='none', scheduler_threshold=0.0, shared_optimizer=1, should_train=1, task_ordering='random', task_patience=0, train_tasks='cola', train_words=0, trainer_type='sampling', val_interval=10, weight_decay=0.0, weighting_method='uniform', word_embs_file='../glove/glove.6B/glove.6B.50d.txt')
12/01 04:00:19 PM: Using random seed 19
12/01 04:00:19 PM: Loading tasks...
12/01 04:00:19 PM: 	Loaded existing task cola
12/01 04:00:19 PM: 	Loaded existing task sst
12/01 04:00:19 PM: 	Loaded existing task mrpc
12/01 04:00:19 PM: 	Finished loading tasks: cola sst mrpc.
12/01 04:00:22 PM: Loading token dictionary from EXP_DIR/vocab.
12/01 04:00:22 PM: 	Finished building vocab. Using 30002 words
12/01 04:00:22 PM: 	Loaded data from EXP_DIR/preproc.pkl
12/01 04:00:22 PM: 	  Training on cola, sst, mrpc
12/01 04:00:22 PM: 	  Evaluating on 
12/01 04:00:22 PM: 	Finished loading tasks in 3.215s
12/01 04:00:22 PM: Building model...
12/01 04:00:22 PM: 	Learning embeddings from scratch!
12/01 04:00:22 PM: 	Using ELMo embeddings!
12/01 04:00:22 PM: 	NOT using GLoVe embeddings!
12/01 04:00:22 PM: Initializing ELMo
12/01 04:00:43 PM: instantiating registered subclass lstm of <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'>
12/01 04:00:43 PM: batch_first = True
12/01 04:00:43 PM: stateful = False
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: input_size = 1024
12/01 04:00:43 PM: hidden_size = 512
12/01 04:00:43 PM: num_layers = 1
12/01 04:00:43 PM: bidirectional = True
12/01 04:00:43 PM: batch_first = True
12/01 04:00:43 PM: Initializing parameters
12/01 04:00:43 PM: Done initializing parameters; the following parameters are using their default initialization from their code
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._char_embedding_weights
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.weight
12/01 04:00:43 PM:    _elmo.scalar_mix_0.gamma
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.0
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.1
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.2
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0_reverse
12/01 04:00:43 PM: Initializing parameters
12/01 04:00:43 PM: Done initializing parameters; the following parameters are using their default initialization from their code
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.backward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_0.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.input_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_linearity.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._elmo_lstm.forward_layer_1.state_projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._char_embedding_weights
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._highways._layers.1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder._projection.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_0.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_1.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_2.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_3.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_4.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_5.weight
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.bias
12/01 04:00:43 PM:    _elmo._elmo_lstm._token_embedder.char_conv_6.weight
12/01 04:00:43 PM:    _elmo.scalar_mix_0.gamma
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.0
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.1
12/01 04:00:43 PM:    _elmo.scalar_mix_0.scalar_parameters.2
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.bias_ih_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_hh_l0_reverse
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0
12/01 04:00:43 PM:    _phrase_layer._module.weight_ih_l0_reverse
12/01 04:00:43 PM: 	Finished building model in 20.876s
12/01 04:00:43 PM: patience = 5
12/01 04:00:43 PM: num_epochs = 10
12/01 04:00:43 PM: max_vals = 50
12/01 04:00:43 PM: cuda_device = -1
12/01 04:00:43 PM: grad_norm = 5.0
12/01 04:00:43 PM: grad_clipping = None
12/01 04:00:43 PM: lr_decay = 0.99
12/01 04:00:43 PM: min_lr = 1e-05
12/01 04:00:43 PM: no_tqdm = 0
12/01 04:00:43 PM: Sampling tasks uniformly
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: type = sgd
12/01 04:00:43 PM: parameter_groups = None
12/01 04:00:43 PM: Number of trainable parameters: 9449994
12/01 04:00:43 PM: instantiating registered subclass sgd of <class 'allennlp.training.optimizers.Optimizer'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: lr = 0.1
12/01 04:00:43 PM: weight_decay = 1e-05
12/01 04:00:43 PM: type = reduce_on_plateau
12/01 04:00:43 PM: instantiating registered subclass reduce_on_plateau of <class 'allennlp.training.learning_rate_schedulers.LearningRateScheduler'>
12/01 04:00:43 PM: Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
12/01 04:00:43 PM: CURRENTLY DEFINED PARAMETERS: 
12/01 04:00:43 PM: mode = max
12/01 04:00:43 PM: factor = 0.5
12/01 04:00:43 PM: patience = 0
12/01 04:00:43 PM: threshold = 0.0
12/01 04:00:43 PM: threshold_mode = abs
12/01 04:00:43 PM: verbose = True
12/01 04:00:43 PM: Beginning training.
Traceback (most recent call last):
  File "main.py", line 280, in <module>
    sys.exit(main(sys.argv[1:]))
  File "main.py", line 177, in main
    args.load_model)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/trainer.py", line 776, in train
    output_dict = self._forward(batch, task=task, for_training=True)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/trainer.py", line 1003, in _forward
    return self._model.forward(task, **tensor_batch)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/models.py", line 216, in forward
    pair_emb = self.pair_encoder(input1, input2)
  File "/Users/apple/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/apple/Desktop/q1_course/CS273/ml_final/CS273A/src/models.py", line 289, in forward
    s1_elmo_embs = self._elmo(s1['elmo'])
KeyError: 'elmo'

Unable to obtain the results reported in the paper

I've tried your released baseline code, but there are differences between the results I get on the validation set and those reported in your paper.

experiment, CoLA(mcc), SST-2, MRPC(acc/f1), QQP(acc/f1), STS-B(pear/spear), MNLI(m/mm), QNLI, RTE, WNLI
your paper, 24.0, 85.8, 71.9/82.1, 80.2/59.1, 68.8/67.0, 65.8/66.0, 71.1, 46.8, 63.7
my result, 12.5, 87.0, 74.0/82.9, 79.4/73.5, 72.6/72.6, 59.9/60.5, 58.4, 57.4, 14.1

Both employ the basic BiLSTM model and follow the MTL setting.
You can see there is a huge gap between the results for CoLA (24.0 vs. 12.5) and WNLI (63.7 vs. 14.1).

Here are my hyperparameter settings; could you please help me check whether I'm using the same ones as yours? This is based on run_stuff.sh:

GPUID=0
train_tasks='all' # 'all', 'none'
original_model_code=1 # 1 if use the original models.py
single_encoder=1 # never mind if original_model_code=1

SHOULD_TRAIN=1
SHOULD_TEST=0
LOAD_MODEL=0
LOAD_TASKS=1
LOAD_PREPROC=1
load_epoch=-1

SCRATCH_PREFIX='.'
EXP_NAME="preprocess"
RUN_NAME="results/original"
SEED=19
no_tqdm=0

eval_tasks='none'
CLASSIFIER=mlp
d_hid_cls=512
max_seq_len=40
VOCAB_SIZE=30000
WORD_EMBS_FILE="${SCRATCH_PREFIX}/embeddings/glove.840B.300d.txt"

d_word=300
d_hid=1500
glove=1
ELMO=0
deep_elmo=0
elmo_no_glove=0
COVE=0

PAIR_ENC="simple"
N_LAYERS_ENC=2
n_layers_highway=0

OPTIMIZER="adam"
LR=1e-3
min_lr=1e-5
dropout=.2
LR_DECAY=.2
patience=5
task_patience=0
train_words=0
WEIGHT_DECAY=0.0
SCHED_THRESH=0.0
BATCH_SIZE=128
BPP_METHOD="percent_tr"
BPP_BASE=10
VAL_INTERVAL=10000 # also the epoch_size, default=10
MAX_VALS=100
TASK_ORDERING="random"
weighting_method="uniform"
scaling_method='none'

while getopts 'ikmn:r:S:s:tvh:l:L:o:T:E:O:b:H:p:edcgP:qB:V:M:D:C:X:GI:N:y:K:W:' flag; do
    case "${flag}" in
        P) SCRATCH_PREFIX="${OPTARG}" ;;
        n) EXP_NAME="${OPTARG}" ;;
        r) RUN_NAME="${OPTARG}" ;;
        S) SEED="${OPTARG}" ;;
        q) no_tqdm=1 ;;
        t) SHOULD_TRAIN=0 ;;
        k) LOAD_TASKS=0 ;;
        m) LOAD_MODEL=1 ;;
        i) LOAD_PREPROC=0 ;;
        M) BPP_METHOD="${OPTARG}" ;; 
        B) BPP_BASE="${OPTARG}" ;;
        V) VAL_INTERVAL="${OPTARG}" ;;
        X) MAX_VALS="${OPTARG}" ;;
        T) train_tasks="${OPTARG}" ;;
        #E) eval_tasks="${OPTARG}" ;;
        O) TASK_ORDERING="${OPTARG}" ;;
        H) n_layers_highway="${OPTARG}" ;;
        l) LR="${OPTARG}" ;;
        #s) min_lr="${OPTARG}" ;;
        L) N_LAYERS_ENC="${OPTARG}" ;;
        o) OPTIMIZER="${OPTARG}" ;;
        h) d_hid="${OPTARG}" ;;
        b) BATCH_SIZE="${OPTARG}" ;;
        E) PAIR_ENC="${OPTARG}" ;;
        G) glove=0 ;;
        e) ELMO=1 ;;
        d) deep_elmo=1 ;;
        g) elmo_no_glove=1 ;;
        c) COVE=1 ;;
        D) dropout="${OPTARG}" ;;
        C) CLASSIFIER="${OPTARG}" ;;
        I) GPUID="${OPTARG}" ;;
        N) load_epoch="${OPTARG}" ;;
        y) LR_DECAY="${OPTARG}" ;;
        K) task_patience="${OPTARG}" ;;
        p) patience="${OPTARG}" ;;
        W) weighting_method="${OPTARG}" ;;
        s) scaling_method="${OPTARG}" ;;
    esac
done

LOG_PATH="${SCRATCH_PREFIX}/${RUN_NAME}/log.log"
EXP_DIR="${SCRATCH_PREFIX}/${EXP_NAME}/"
RUN_DIR="${SCRATCH_PREFIX}/${RUN_NAME}/"
mkdir -p ${EXP_DIR}
mkdir -p ${RUN_DIR}

ALLEN_CMD="python src/main.py --cuda ${GPUID} --random_seed ${SEED} --no_tqdm ${no_tqdm} --log_file ${LOG_PATH} --exp_dir ${EXP_DIR} --run_dir ${RUN_DIR} --train_tasks ${train_tasks} --eval_tasks ${eval_tasks} --classifier ${CLASSIFIER} --classifier_hid_dim ${d_hid_cls} --max_seq_len ${max_seq_len} --max_word_v_size ${VOCAB_SIZE} --word_embs_file ${WORD_EMBS_FILE} --train_words ${train_words} --glove ${glove} --elmo ${ELMO} --deep_elmo ${deep_elmo} --elmo_no_glove ${elmo_no_glove} --cove ${COVE} --d_word ${d_word} --d_hid ${d_hid} --n_layers_enc ${N_LAYERS_ENC} --pair_enc ${PAIR_ENC} --n_layers_highway ${n_layers_highway} --batch_size ${BATCH_SIZE} --bpp_method ${BPP_METHOD} --bpp_base ${BPP_BASE} --optimizer ${OPTIMIZER} --lr ${LR} --min_lr ${min_lr} --lr_decay_factor ${LR_DECAY} --task_patience ${task_patience} --patience ${patience} --weight_decay ${WEIGHT_DECAY} --dropout ${dropout} --val_interval ${VAL_INTERVAL} --max_vals ${MAX_VALS} --task_ordering ${TASK_ORDERING} --weighting_method ${weighting_method} --scaling_method ${scaling_method} --scheduler_threshold ${SCHED_THRESH} --load_model ${LOAD_MODEL} --load_tasks ${LOAD_TASKS} --load_preproc ${LOAD_PREPROC} --should_train ${SHOULD_TRAIN} --should_test ${SHOULD_TEST} --load_epoch ${load_epoch}"
eval ${ALLEN_CMD}

BTW, this is how I test on the validation set. The code is based on eval_test.py and main.py

import os
import json
import ipdb as pdb
import numpy as np

from sklearn.metrics import matthews_corrcoef, f1_score
from scipy.stats import pearsonr, spearmanr
from allennlp.data.dataset import Batch

def evaluate_val(tasks, val_preds):
    for eval_task, task_preds in val_preds.items(): # write predictions for each task
        #if 'mnli' not in eval_task:
        #    continue
        task = [task for task in tasks if task.name == eval_task][0]
        preds = task_preds[0]
        val_data = Batch(task.val_data).as_tensor_dict()
        golds = val_data['label']
        assert len(preds) == len(golds)
        if 'mnli' in eval_task:
            # matched (first 9815 dev examples)
            evaluate('mnli-m', golds[:9815], preds[:9815])
            # mismatched (next 9832 dev examples)
            evaluate('mnli-mm', golds[9815:9815+9832], preds[9815:9815+9832])
        else:
            metrics = ['acc']
            if 'cola' in eval_task:
                metrics = ['matthews']
            if 'mrpc' in eval_task or 'qqp' in eval_task:
                metrics = ['acc', 'f1']
            if 'sts' in eval_task:
                golds = golds * 5.
                metrics = ['corr']
            evaluate(eval_task, golds, preds, metrics)


def evaluate(task_name, golds, preds, metrics=['acc']):
    assert len(golds) == len(preds)
    print('***************************** %s:' % task_name)
    if 'acc' in metrics:
        acc = sum([1 for gold, pred in zip(golds, preds) if gold == pred]) / float(len(golds))
        print("acc: %.3f" % acc)
    if 'f1' in metrics:
        f1 = f1_score(golds, preds)
        print("f1: %.3f" % f1)
    if 'matthews' in metrics:
        mcc = matthews_corrcoef(golds, preds)
        print("mcc: %.3f" % mcc)
    if 'corr' in metrics:
        golds = np.asarray(golds).reshape(-1)
        preds = np.asarray(preds).reshape(-1)
        corr = pearsonr(golds, preds)[0]
        print("pearson r: %.3f" % corr)
        corr = spearmanr(golds, preds)[0]
        print("spearman r: %.3f" % corr)

Thanks!

BIDAF encoder

When using the default (bidaf) for args.pair_enc, this line throws an error because pair_encoder is undefined.

I assume that an elif statement that assigns the bidaf encoder is missing at the moment. Is that correct?

How was SST-2 training constructed?

Hello and happy new year!

Could you please provide more detail on how you constructed the SST-2 training set? Since the test labels are not available, I want to use part of the training data to create new train/dev/test sets for my experiments. Before doing so, I need to know how you came up with the 69k training samples and, if possible, get the sentence IDs for them.

Thank you in advance!

Processing MRPC, ValueError: need more than 1 value to unpack

root@instance-1:~/GLUE-baselines# python download_glue_data.py --data_dir glue_data --tasks all
Downloading and extracting CoLA...
        Completed!
Downloading and extracting SST...
        Completed!
Processing MRPC...
Traceback (most recent call last):
  File "download_glue_data.py", line 144, in <module>
    sys.exit(main(sys.argv[1:]))
  File "download_glue_data.py", line 136, in main
    format_mrpc(args.data_dir, args.path_to_mrpc)
  File "download_glue_data.py", line 86, in format_mrpc
    label, id1, id2, s1, s2 = row.strip().split('\t')
ValueError: need more than 1 value to unpack
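
My guess (not verified) is that the MRPC source file was not downloaded correctly (e.g. an empty file or an HTML error page), so the row has fewer than the expected 5 tab-separated fields. As a sketch, the failing line in format_mrpc could be guarded like this so the bad row is surfaced instead of crashing:

# Defensive version of the unpacking line inside format_mrpc (a sketch, not the repo's code)
parts = row.strip().split('\t')
if len(parts) != 5:
    raise ValueError("Unexpected MRPC row (is msr_paraphrase_*.txt downloaded correctly?): %r" % row)
label, id1, id2, s1, s2 = parts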

MRPC download fails: s3.amazonaws.com is blocked in China

When I run the command as below:

python download_glue_data.py --data_dir glue_data --tasks all

It just gets stuck here:

Downloading and extracting CoLA...
	Completed!
Downloading and extracting SST...
	Completed!
Processing MRPC...

The code doesn't work

The code in this repository is broken with multiple issues.

First, the code has hard-coded paths; this is unprofessional, and I expected better from such a reputable lab, especially with institutions like NYU, DeepMind and UW involved.

The path for downloading the MRPC dataset from SentEval is broken. They seem to have moved their data to different URIs, namely
MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'
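
For reference, a sketch of fetching the MRPC source files directly from the new URLs (download_glue_data.py would then need its MRPC_TRAIN / MRPC_TEST constants updated accordingly):

import urllib.request

MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'
# Download the raw MSR Paraphrase Corpus splits that format_mrpc expects
urllib.request.urlretrieve(MRPC_TRAIN, 'msr_paraphrase_train.txt')
urllib.request.urlretrieve(MRPC_TEST, 'msr_paraphrase_test.txt')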

The command to run the baseline is broken: it needs --eval_tasks to be passed, otherwise an empty string is passed to the task definition and a check there fails because the empty string is not among the supported tasks.

Then, half the code has been migrated to QNLIv2, but the dataset download part still downloads QNLI (v1?), so the code breaks there.

Once I got past this error, I encountered the following one at: tr_generator = iterator(task.train_data, num_epochs=None, cuda_device=self._cuda_device)

Finally, the following error broke my spirit, and I decided not to use the GLUE benchmark for my experiments. Despite importing the conda environment with the packages and spending 3-4 hours trying to get the basic command from the README to run, I gave up, as I am now a bit skeptical about the multiple hidden traps I might have to fix to get the benchmark code running.

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

In case there is a commit or version that I can run out of the box, please let me know. It would be a big help.

CoLA is not accessible via a Python request

Hi all,

When I ran the Python script "download_glue_data.py" to download all tasks via a Python request, I noticed that the CoLA task is not accessible, while the other tasks are totally fine. The error message is here:

Downloading and extracting CoLA...
Traceback (most recent call last):
  File "download_glue_data.py", line 137, in <module>
    sys.exit(main(sys.argv[1:]))
  File "download_glue_data.py", line 133, in main
    download_and_extract(task, args.data_dir)
  File "download_glue_data.py", line 44, in download_and_extract
    urllib.request.urlretrieve(TASK2PATH[task], data_file)
  File "/usr/lib/python3.5/urllib/request.py", line 188, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
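
In the meantime, one workaround that sometimes helps with a 403 from such hosts is sending a browser-like User-Agent header. A sketch (not verified against this particular URL), reusing the CoLA entry from TASK2PATH in download_glue_data.py:

import urllib.request

url = TASK2PATH["CoLA"]  # the CoLA URL defined in download_glue_data.py
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as resp, open("CoLA.zip", "wb") as out:
    out.write(resp.read())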

Could you please take a look?

Thanks!

ImportError: torch.utils.ffi is deprecated......

Hi all,

I ran into this problem and solved it myself. I am recording it here in the hope that it helps future users.

My PC runs Ubuntu 18.04. After installing the environment and downloading the data, I got 'ImportError: torch.utils.ffi is deprecated...' when running main.py.

Possible cause: PyTorch is 1.3 rather than 0.4.

Running conda list in the environment, I see PyTorch listed as version 0.4. However, if I import torch in Python and print its version, it is 1.3!

Long story short:
My guess: the PyTorch install during 'conda env create -f environment.yml' is labeled 0.4 but is actually 1.3.
My solution: manually download the right wheel from https://download.pytorch.org/whl/torch_stable.html and install it with pip inside the virtual environment.
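
A quick way to see the mismatch for yourself (a sanity-check sketch, not part of the repo):

import torch

# `conda list` may report 0.4 for the env, but the package Python actually imports can differ.
print(torch.__version__)  # printed 1.3.x for me despite conda showing 0.4
print(torch.__file__)     # shows which site-packages directory the import resolved to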

Update data URLs

We're spending a fair bit of money keeping up the old non-FB data URLs, and we should probably delete them soon. @W4ngatang: Mind updating them here to match what's on the benchmark site?

newer version of QNLI ?

Could anyone tell me what is going on with the message below?

"Please make sure that you are submitting your results with newer version of QNLI"

Discrepancies with the original CoLA dataset

Hi, I noticed that there may be a minor problem with the CoLA dataset.

By downloading data with the command
python download_glue_data.py --data_dir glue_data --tasks CoLA

I see that line 19 of dev.tsv reads

"bc01 1 He could not] have been working."

and line 6998 of train.tsv reads

"sgww85 1 I consider that a rude remark and in very [NP and PP] bad taste."

The square brackets do not appear in the original CoLA dataset (https://nyu-mll.github.io/CoLA/).

I am not sure what the source of the discrepancy might be.
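
A quick scan (a sketch) for other lines in the downloaded CoLA TSVs that contain square brackets, to see how widespread the discrepancy is; it assumes the data was downloaded to glue_data/ as in the command above:

import glob

for path in glob.glob('glue_data/CoLA/*.tsv'):
    with open(path, encoding='utf-8') as f:
        for i, line in enumerate(f, 1):
            if '[' in line or ']' in line:
                print(path, i, line.strip())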

Adding GLUE to PyTorch

Hi, I am currently adding some files to the PyTorch project that would enable it to directly import the GLUE datasets. However, I am facing a problem with the QQP and SNLI datasets. Some lines contain a number of tab-separated fields that does not match the number of columns declared in the first line of those files. For example, in the train.tsv file of QQP, line 97,931 is:

"\tWas Muhammad a real historical figure? What is the evidence for his existence?\t0

So that line effectively has 3 columns, while according to the file header there should be 6.
How should I handle those lines?
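
For now I am considering something like the following sketch, which simply skips rows whose field count does not match the header (this is my own workaround, not an official recommendation):

import csv

def read_tsv(path):
    with open(path, encoding='utf-8') as f:
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        header = next(reader)
        for row in reader:
            if len(row) != len(header):
                continue  # drop rows with missing or extra tab-separated fields
            yield dict(zip(header, row))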

Thank you.

Failing to download MRPC

I cannot download the MRPC dataset, and I should not have been blocked by s3.amazonaws.com (I am in the US). I have already tried

git clone https://github.com/wasiahmad/paraphrase_identification.git

python download_glue_data.py --data_dir glue_data --tasks all --path_to_mrpc=paraphrase_identification/dataset/msr-paraphrase-corpus

I noticed that there is actually no "MRPC" key in TASK2PATH, so why are we calling urlretrieve?

Following the history, I added

"MRPC":'https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2Fmrpc_dev_ids.tsv?alt=media&token=ec5c0836-31d5-48f4-b431-7480817f1adc'

to TASK2PATH, and the problem is now solved.

Can't import DotProductMatrixAttention from allennlp.modules.matrix_attention

When I follow the instructions and install allennlp==0.4.0 successfully, an error occurs as below:

from allennlp.modules.matrix_attention import DotProductMatrixAttention
ImportError: cannot import name 'DotProductMatrixAttention'

But I can't find any release of allennlp that contains the function 'DotProductMatrixAttention'.
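
A workaround sketch I am experimenting with (the class names below are from my reading of the allennlp 0.4-era API and should be double-checked against the installed version): fall back to the generic MatrixAttention with a dot-product similarity when DotProductMatrixAttention is missing.

try:
    # Newer allennlp releases ship a dedicated dot-product matrix attention.
    from allennlp.modules.matrix_attention import DotProductMatrixAttention
    attention = DotProductMatrixAttention()
except ImportError:
    # Older releases (e.g. 0.4.x) only have the generic MatrixAttention,
    # which takes a similarity function; DotProductSimilarity gives the same behavior.
    from allennlp.modules.matrix_attention import MatrixAttention
    from allennlp.modules.similarity_functions import DotProductSimilarity
    attention = MatrixAttention(similarity_function=DotProductSimilarity())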

Wrong submission feedback for Question NLI

It seems that you only updated the task data for Question NLI, but not the evaluation code.
When I submitted the QNLI.tsv file, it showed "not all ids present in task Question NLI". Then I tried to submit the old version; only the number of ids was right, and there was something wrong with the evaluation result.

can't use elmo

When I set --elmo to 1, I get an error.
I don't know allennlp well, and after a few days I still have not been able to solve the following error. If anyone knows a solution, please leave me a message. Thank you very much!

Traceback (most recent call last):
  File "/home/GLUE-baselines-master/src/main.py", line 280, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/home/GLUE-baselines-master/src/main.py", line 177, in main
    args.load_model)
  File "/home/GLUE-baselines-master/src/trainer.py", line 240, in train
    output_dict = self._forward(batch, task=task, for_training=True)
  File "/home/GLUE-baselines-master/src/trainer.py", line 464, in _forward
    return self._model.forward(task, **tensor_batch)
  File "/home/GLUE-baselines-master/src/models.py", line 219, in forward
    sent_emb = self.sent_encoder(input1)
  File "/home/anaconda3/envs/glue/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/GLUE-baselines-master/src/models.py", line 413, in forward
    sent_embs = self._highway_layer(self._text_field_embedder(sent))
  File "/home/anaconda3/envs/glue/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/glue/lib/python3.6/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 63, in forward
    raise ConfigurationError(message)
allennlp.common.checks.ConfigurationError: "Mismatched token keys: dict_keys(['words']) and dict_keys(['elmo', 'words'])"
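
My guess at the cause (not verified): the cached preprocessed data (preproc.pkl) was built before --elmo was turned on, so the instances were indexed only with the 'words' indexer while the model now also expects 'elmo'. Forcing the data to be re-preprocessed seems worth trying, e.g. by passing --load_preproc 0 or removing the cache, as in this sketch:

import os

# exp_dir/preproc.pkl as configured by --exp_dir / --preproc_file; adjust to your setup
preproc = os.path.join('EXP_DIR', 'preproc.pkl')
if os.path.exists(preproc):
    os.remove(preproc)  # the next run should rebuild the preprocessed data with the ELMo indexer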

cannot import name 'DotProductMatrixAttention'

Hi all,

I followed the steps in the README. I installed the environment and am trying to run the baseline. However, I got the above error for 'from allennlp.modules.matrix_attention import DotProductMatrixAttention'. The version of allennlp in the yml is 0.4.0, but I checked the source code: there is no DotProductMatrixAttention in matrix_attention.py.

Any help would be appreciated. Thanks!

Mia
