
cnn-for-sentence-classification-in-keras's People

Contributors

alexander-rakhlin, andrewyates, shellbye


cnn-for-sentence-classification-in-keras's Issues

Error when checking model input

Hello, thank you for your implementation. Sadly, I'm only just becoming familiar with Keras, and I got the following error after setting up the model for 'CNN-non-static' or 'CNN-static' (so it only works for 'CNN-rand').

I set the parameters to:

model_variation = 'CNN-non-static' # CNN-rand | CNN-non-static | CNN-static
print('Model variation is %s' % model_variation)

# Model Hyperparameters
sequence_length = 56
embedding_dim = 20
filter_sizes = (3, 4)
num_filters = 3
dropout_prob = (0.7, 0.8)
hidden_dims = 100

and I got the following error:

ValueError: Error when checking model input: expected dropout_input_3 to have 3 dimensions, but got array with shape (10662L, 56L)

any help would be appreciated
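A note on the error: in this (older) version of the code, the 'CNN-static' variant's first layer (a Dropout) consumes input that has already been embedded, so the model expects a 3-dimensional array (samples, sequence_length, embedding_dim) rather than the (samples, sequence_length) index matrix that works for 'CNN-rand'. A minimal sketch of the missing conversion, assuming embedding_weights[0] is the (vocabulary_size, embedding_dim) matrix returned by train_word2vec() in w2v.py (a guess, not the author's exact code):

# Map the (10662, 56) index matrix to word vectors before model.fit(),
# giving an array of shape (10662, 56, 20).
if model_variation == 'CNN-static':
    x = embedding_weights[0][x]   # NumPy fancy indexing -> (10662, 56, 20)
    print(x.shape)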

The model always predicts the same label

Update:

I think I solved the problem by replacing "argmax" with a threshold:

y_proba = model.predict(x_test)
Y_classes = (y_proba > 0.5).astype(np.int)
print(Y_classes)

Is it right?


Hi,

I ran your code with another dataset to train this model for a binary classification task. It works fine, but when I print the predictions, it always assigns the same label (0, absent) to all instances.

This is the code used:

y_proba = model.predict(x_test)
Y_classes = y_proba.argmax(axis=-1)

Can you see why? I'm quite new to deep learning and CNNs.
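A note on the update above: it is the right fix for this network. The model ends in a single sigmoid unit (Dense(1, activation="sigmoid")), so model.predict returns an array of shape (n_samples, 1), and argmax(axis=-1) over a one-column array is always 0, which is exactly the "always the same label" symptom. A short sketch:

y_proba = model.predict(x_test)                   # shape (n_samples, 1): sigmoid probabilities
y_classes = (y_proba > 0.5).astype(int).ravel()   # threshold instead of argmax
# argmax is only meaningful with a softmax output of shape (n_samples, n_classes):
# y_classes = y_proba.argmax(axis=-1)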

Merge stopped working after keras update

The code was working just fine before I updated the keras version to 2.0.

After the update, I am having errors with the Merge function.

The error log is the following:

rodrigo@garage:~/Projetos/CNN-for-Sentence-Classification-in-Keras$ python3 trainGraph_modelo_velho.py
Using TensorFlow backend.
/usr/local/lib/python3.5/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Model variation is CNN-non-static
Loading data...
Parsing sentences from training set
/usr/local/lib/python3.5/dist-packages/gensim/models/phrases.py:274: UserWarning: For a faster implementation, use the gensim.models.phrases.Phraser class
warnings.warn("For a faster implementation, use the gensim.models.phrases.Phraser class")
(13575, 10)
(13575, 194)
Loading existing Word2Vec model '100features_10minwords_10context'
Vocabulary Size: 12962
Sequence Max Length: 194
Tensor("flatten_1/Reshape:0", shape=(?, ?), dtype=float32)
<class 'list'>
trainGraph_modelo_velho.py:160: UserWarning: The Merge layer is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
out = Merge(mode='concat')(convs)
Traceback (most recent call last):
File "trainGraph_modelo_velho.py", line 160, in
out = Merge(mode='concat')(convs)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 554, in call
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/keras/legacy/layers.py", line 210, in call
return K.concatenate(inputs, axis=self.concat_axis)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 1553, in concatenate
return tf.concat([to_dense(x) for x in tensors], axis)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1075, in concat
dtype=dtypes.int32).get_shape(
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
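A note on the fix: in Keras 2 the old Merge layer no longer works on a plain Python list of tensors; the functional replacement suggested by the warning is keras.layers.concatenate. A minimal sketch, assuming convs is the list of pooled/flattened tensors built just above line 160:

from keras.layers import concatenate

# Keras 1 (now deprecated/broken):
# out = Merge(mode='concat')(convs)

# Keras 2 functional replacement:
out = concatenate(convs) if len(convs) > 1 else convs[0]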

vocabulary_inv

Hello, thanks for sharing the implementation of the paper. When I read the code of w2v.py, I think vocabulary_inv is a list, while the docstring says it's a dict. I wonder if something is wrong.

Another question: when training the word2vec model, sentences made of words are used, so why does the code first load the data by turning the words into indices and then turn the indices back into words for training? Is that necessary?

Thank you!

Multiple Dropouts different from Original Paper and Denny Britz

Thank you for sharing your code for the Keras implementation. I have a question about the dropouts that are added. In the original paper, dropout is applied only once, after the convolution layer, with a rate of 0.5; this is also true in Denny Britz's implementation. In your implementation, dropout of 0.5 is added after the embedding layer and dropout of 0.8 is added after the convolution layer.

Just want to confirm if this is a deviation from the above two sources, and what was the reasoning for this?

Thanks!

Error in w2v.py line 52

Error in w2v.py line 52:
for key, word in vocabulary_inv.items()}
I think it may need to be for key, word in enumerate(vocabulary_inv)}?
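Whether enumerate() is needed depends on what vocabulary_inv actually is (see the vocabulary_inv issue above: the docstring says dict, the object appears to be a list). A sketch of both variants, assuming np and embedding_model as defined in w2v.py; the point is simply that key must be the integer index and word the token:

# If vocabulary_inv is a dict {index: word}, the original line is fine:
embedding_weights = {key: embedding_model[word] if word in embedding_model
                     else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
                     for key, word in vocabulary_inv.items()}

# If vocabulary_inv is a list of words (index = position), enumerate() is the fix:
embedding_weights = {key: embedding_model[word] if word in embedding_model
                     else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
                     for key, word in enumerate(vocabulary_inv)}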

How to train the model with multi-class dataset

Today I tried to train the model using my own dataset, but disappointingly it behaves as if something is wrong. To make my problem clear, I use a single example to train the model:
train_data:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
5

test_data:

[1, 13, 14, 15, 16, 17, 18, 19, 8, 20, 21, 22, 12]
4

vocabulary:

请问 1
京都 2
议定书 3
规定 4
几个 5
工业 6
国家 7
的 8
二氧化碳 9
排放量 10
限制 11
? 12
金氏 13
世界纪录 14
中 15
最长 16
( 17
1650CM 18
) 19
发绣 20
作品 21
是 22

And the code I modified is as follows.
Loading training data and testing data:

x_train, y_train, x_test, y_test = [], [], [], []
lines = open('data/train_data.txt', 'r').readlines()
for i in range(0, len(lines), 2):
    x_train.append(ast.literal_eval(lines[i]))
    y_train.append(int(lines[i + 1].replace('\n', '')))
lines = open('data/test_data.txt', 'r').readlines()
for i in range(0, len(lines), 2):
    x_test.append(ast.literal_eval(lines[i]))
    y_test.append(int(lines[i + 1].replace('\n', '')))
x_train = np.asarray(x_train)
y_train = np.asarray(y_train)
x_test = np.asarray(x_test)
y_test = np.asarray(y_test)
# (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_words, start_char=None, oov_char=None, index_from=None)

Loading vocabulary:

vocabulary = {}
with open('data/data_info.txt', 'r') as f:
    for line in f:
        content = line.strip().split()
        vocabulary[content[0]] = int(content[1])
# vocabulary = imdb.get_word_index()
vocabulary_inv = dict((v, k) for k, v in vocabulary.items())
vocabulary_inv[0] = "<PAD/>"

Loading word vectors:

# embedding_weights = train_word2vec(np.vstack((x_train, x_test)), vocabulary_inv, num_features=embedding_dim, min_word_count=min_word_count, context=context)
embedding_weights = {key: np.random.uniform(-0.25, 0.25, embedding_dim) for key, word in vocabulary_inv.items()}

Running the project shows:

Train on 1 samples, validate on 1 samples
Epoch 1/10
0s - loss: 1.6577 - acc: 0.0000e+00 - val_loss: 0.9745 - val_acc: 0.0000e+00
Epoch 2/10
0s - loss: 5.3148 - acc: 0.0000e+00 - val_loss: 0.8149 - val_acc: 0.0000e+00
Epoch 3/10
0s - loss: -1.1496e+00 - acc: 0.0000e+00 - val_loss: 0.6260 - val_acc: 0.0000e+00
Epoch 4/10
0s - loss: 1.4295 - acc: 0.0000e+00 - val_loss: 0.5458 - val_acc: 0.0000e+00
Epoch 5/10
0s - loss: 1.8466 - acc: 0.0000e+00 - val_loss: 0.4188 - val_acc: 0.0000e+00
Epoch 6/10
0s - loss: -1.1397e+00 - acc: 0.0000e+00 - val_loss: 0.2414 - val_acc: 0.0000e+00
Epoch 7/10
0s - loss: -3.5465e-01 - acc: 0.0000e+00 - val_loss: -1.7151e-02 - val_acc: 0.0000e+00
Epoch 8/10
0s - loss: -9.7948e-01 - acc: 0.0000e+00 - val_loss: -2.8196e-01 - val_acc: 0.0000e+00
Epoch 9/10
0s - loss: 1.3985 - acc: 0.0000e+00 - val_loss: -5.5554e-01 - val_acc: 0.0000e+00
Epoch 10/10
0s - loss: 3.3196 - acc: 0.0000e+00 - val_loss: -7.9620e-01 - val_acc: 0.0000e+00
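A note on the log: a loss that goes negative with binary_crossentropy usually means the labels are not in the form the loss expects. For a multi-class setup the usual Keras recipe (a sketch, not the author's code) is integer labels turned one-hot, a softmax output with one unit per class, and categorical_crossentropy. Also, a single training example cannot produce meaningful accuracy, so the zeros above are expected either way.

from keras.utils import to_categorical      # keras.utils.np_utils.to_categorical in Keras 1
from keras.layers import Dense
from keras.models import Model

num_classes = int(max(y_train.max(), y_test.max())) + 1   # assumes integer labels 0..K-1
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Replace the binary head of the network:
# model_output = Dense(1, activation="sigmoid")(z)
model_output = Dense(num_classes, activation="softmax")(z)

model = Model(model_input, model_output)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])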

Need help with real time model eval

Once the training is done, how do you use the learned model on other text (an interactive test)? That is, what do I do to test the model on some random data, say Twitter-scraped data (using the model to predict/classify tweets)?
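A sketch of scoring new, unseen text: the new text must go through the same preprocessing as the training data, i.e. the same word-to-index vocabulary and the same padded length the model was built with. The names vocabulary and sequence_length below are assumed to be the ones produced during training, and the tokenization is deliberately naive:

from keras.preprocessing.sequence import pad_sequences

def predict_texts(texts, model, vocabulary, sequence_length):
    # map each token to its training-time index; unknown words fall back to the padding index 0
    seqs = [[vocabulary.get(w, 0) for w in t.lower().split()] for t in texts]
    x_new = pad_sequences(seqs, maxlen=sequence_length, padding="post", value=0)
    proba = model.predict(x_new)
    return (proba > 0.5).astype(int).ravel()   # binary case with a sigmoid output

print(predict_texts(["this movie was great fun"], model, vocabulary, sequence_length))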

loss turns into nan when training using gpu

When I ran this code on my machine, the loss turned into nan. How can I solve it?
Epoch 1/10
9595/9595 [==============================] - 1s - loss: 0.6242 - acc: 0.6295 - val_loss: 0.5265 - val_acc: 0.7470
Epoch 2/10
9595/9595 [==============================] - 1s - loss: nan - acc: 0.4117 - val_loss: nan - val_acc: 0.0000e+00
Epoch 3/10
9595/9595 [==============================] - 1s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 4/10
9595/9595 [==============================] - 1s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 5/10
9595/9595 [==============================] - 1s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 6/10
9595/9595 [==============================] - 1s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 7/10
9595/9595 [==============================] - 1s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
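Hard to diagnose from the log alone, but when the loss turns into nan after the first epoch, the usual first steps are to lower the learning rate, clip gradients, and check the inputs for nan/inf values. A sketch of those checks (not a confirmed fix for this repository):

import numpy as np
from keras.optimizers import Adam

# rule out bad inputs first
assert not np.isnan(x_train).any() and not np.isnan(y_train).any()

# smaller learning rate plus gradient clipping
optimizer = Adam(lr=1e-4, clipnorm=1.0)
model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])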

How to train multiclass documents data?

Hi,

Thanks for releasing the code in Keras; the other implementations are in Torch, TensorFlow, and Theano, which I am currently not familiar with.

I have one question though: how can I train multi-class documents with this model? Currently it seems to support only binary-class data.

Two fully-connected layers after convolutions

Thanks for sharing this nice implementation of the Yoon Kim CNN!

I noticed that you've added two fully connected layers after the dropout of the convolutions. Just to confirm, is this another deviation from the original paper? As I see it, the original paper uses only one fully connected layer.

z = Dropout(dropout_prob[1])(z)
z = Dense(hidden_dims, activation="relu")(z)
model_output = Dense(1, activation="sigmoid")(z)
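For comparison, the closer-to-the-paper head would be a single fully connected classifier straight after the dropout (a sketch; whether it changes results here would need to be checked empirically):

# Single fully-connected layer, as in Kim (2014):
z = Dropout(dropout_prob[1])(z)
model_output = Dense(1, activation="sigmoid")(z)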

About embedding_weights

First, thank you for code sharing.

In w2v.py, I saw your code as follows:

 embedding_weights = [np.array([embedding_model[w] if w in embedding_model
                  else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)
                  for w in vocabulary_inv])]

To obtain weights from embedding_model, the parameter w must be a word, e.g. "happy".
But in w2v.py, in "for w in vocabulary_inv", it looks like w is the index of a word.

Is there a mistake here?
The "else np.random.uniform(-0.25, 0.25, embedding_model.vector_size)" branch seems to be executed in every iteration.

accuracy

May I ask whether this implementation can reach about 88% on the test set?

Great difference between train and test.

Thanks for sharing the code.
I have a question of the performance.

  1. Train + Save model + Load model + Test = good test precision, recall, F1-measure.
     After we comment out the 'Train' and 'Save model' code:
  2. Load model + Test = bad test precision, recall, F1-measure.
     There is almost a 10% gap; I guess the training data gets mixed with the test set in some functions.

Here is a similar problem in cnn-text-classification-tf:
dennybritz/cnn-text-classification-tf#63

Thanks!

Negative dimension size caused by subtracting 3 from 1

Hi,

I tried my own dataset and I am getting the below error.

Error Message.

File "", line 135, in
strides=1)(z)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 619, in call
output = self.call(inputs, **kwargs)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/keras/layers/convolutional.py", line 160, in call
dilation_rate=self.dilation_rate[0])

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3297, in conv1d
data_format=tf_data_format)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 781, in convolution
return op(input, filter)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 869, in call
return self.conv_op(inp, filter)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 521, in call
return self.call(inp, filter)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 205, in call
name=self.name)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 194, in _conv1d
name=name)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 497, in new_func
return func(*args, **kwargs)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 497, in new_func
return func(*args, **kwargs)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2420, in conv1d
data_format=data_format)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 631, in conv2d
data_format=data_format, dilations=dilations, name=name)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3273, in create_op
compute_device=compute_device)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3313, in _create_op_helper
set_shapes_for_outputs(op)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2501, in set_shapes_for_outputs
return _set_shapes_for_outputs(op)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2474, in _set_shapes_for_outputs
shapes = shape_func(op)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2404, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)

File "/home/nagarajan/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)

ValueError: Negative dimension size caused by subtracting 3 from 1 for 'conv1d_17/convolution/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,50], [1,3,50,10].
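The shapes in the last line are the informative part: the convolution input is [?, 1, 1, 50], i.e. the sequence axis that Conv1D slides over has length 1, and a width-3 filter cannot fit. That usually means the sequence length the model was built with does not match the shape of the data being fed in; with genuinely short sequences, padding="same" also avoids the error. A sketch of the two things to check (variable names as in the model-building loop are assumptions):

print(x_train.shape)                    # the second dimension is the real padded length
sequence_length = x_train.shape[1]      # 1) build the Input/Embedding layers with this value

# 2) or keep the filters from outgrowing short sequences:
conv = Convolution1D(filters=num_filters, kernel_size=sz, padding="same",
                     activation="relu", strides=1)(z)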

Trying to replicate the results obtained with denny brtiz's code

First of all, thanks for sharing your code. I was looking for a Keras implementation of Denny Britz's code because, while his implementation is really well explained, it's very taxing when working with bigger text corpora, since he stores not the indexes of the words in each sentence but their full embeddings.

However, while your code works extremely well as far as memory management is concerned, I can't get the same results as with Denny Britz's implementation.

Working with my own data (pretty short and preprocessed tweets), pretrained embeddings (dimension = 300), and two classes (0 and 1), I get an accuracy of 87% on the training set and 81% on the validation set in only a few epochs with Denny Britz's code.

Using the code here, with the exact same layers and parameters you have set up (except for filter_sizes = [2, 3, 4, 5], which were 3 and 8 in your implementation), the performance on my data is pretty bad.


Which makes sense, since you apparently tweaked your parameters around longer text and I'm working with tweets.

So I thought I would replicate the exact same network as Denny Britz to see if the performance would be comparable; here's what the code looks like now:

filter_sizes = [2, 3, 4, 5]
num_filters = 128
hidden_dims = 20
dropout_prob = 0.5
batch_size=1024 

model_input = Input(shape=(max_sentence_len,), dtype='int32')

z = Embedding(word_embeddings.shape[0],
            word_embeddings.shape[1],
            input_length=max_sentence_len,
            weights=[word_embeddings],
            trainable=False)(model_input)



# Convolutional block
conv_blocks = []
for sz in filter_sizes:
    conv = Convolution1D(filters=num_filters,
                         filter_length=sz,
                         padding="valid", #same / valid
                         activation="relu")(z)
    conv = MaxPooling1D(pool_size=2)(conv)
    conv = Flatten()(conv)
    conv_blocks.append(conv)
z = Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]

#from keras.layers import Reshape
#z = Reshape([-1, num_filters*len(filter_sizes)])(z)   #doesn't work

z = Dropout(dropout_prob)(z)
z = Dense(hidden_dims, activation="relu")(z)
model_output = Dense(1, activation="sigmoid")(z)

from keras.optimizers import Adam

#base settings : lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0
adam_optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

model = Model(model_input, model_output)
model.compile(loss="binary_crossentropy", optimizer=adam_optimizer, metrics=["accuracy"])

# Train the model
hist = model.fit(X_train, y_train, batch_size=batch_size, epochs=100,
          validation_split=0.05, verbose=2)

In the Denny Britz implementation, the training and validation accuracies increase steadily, then the validation accuracy starts going down (due to overfitting) at around 81%.
With the code above, the training accuracy increases steadily to over 90%, but the validation accuracy is all over the place and never goes above 80%; it almost seems random.


Am I doing something wrong with my CNN? I can't get the reshape to work, but figured the additional hidden layer kind of did the job. I've tried tweaking its size (as well as many other parameters and layers) but can't get to the same level as the Denny Britz implementation. Since you most likely have far more experience with Keras than I do, could you confirm whether my implementation is correct?

Or maybe the problem comes from my data itself?
Just to clarify, X_train is a tensor of shape (rows, max_sentence_len) that contains the indexes of each word, plus index 0 for padding (index 0 maps to a vector of 0's in the embedding file); y is an array that looks like [1 0 0 0 1 1 1 1 0 0....]; and max_sentence_len = 32 (since these are short tweets). I think that's the way the data is supposed to look, so I doubt the problem comes from there.
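One structural difference from the Denny Britz network that is easy to miss: his model max-pools over the entire feature map (global max pooling) and feeds the concatenated vector straight into the output layer, whereas the block above pools with pool_size=2 and adds an extra hidden Dense layer. A sketch of a closer replication, reusing the names from the code above (a guess at the source of the gap, not a guaranteed fix):

from keras.layers import Convolution1D, GlobalMaxPooling1D, Concatenate, Dropout, Dense

conv_blocks = []
for sz in filter_sizes:
    conv = Convolution1D(filters=num_filters, kernel_size=sz,
                         padding="valid", activation="relu")(z)
    conv = GlobalMaxPooling1D()(conv)   # one scalar per filter, as in the paper and Britz's code
    conv_blocks.append(conv)
z = Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]

z = Dropout(dropout_prob)(z)
model_output = Dense(1, activation="sigmoid")(z)   # no extra hidden layer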

error when retraining word vector

Hi, I have reverted to an earlier commit: https://github.com/alexander-rakhlin/CNN-for-Sentence-Classification-in-Keras/tree/0a10445fbd0a1c783e6231cd1accb1e1a4e2252f

The code runs fine with the provided embeddings, but when I delete the embeddings in the models directory and retrain, I get the following errors:

trainGraph.py:127: UserWarning: Update your Conv1D call to the Keras 2 API: Conv1D(activation="relu", padding="valid", strides=1, filters=3, kernel_size=3)
subsample_length=1)(graph_in)
trainGraph.py:128: UserWarning: Update your MaxPooling1D call to the Keras 2 API: MaxPooling1D(pool_size=2)
pool = MaxPooling1D(pool_length=2)(conv)
trainGraph.py:127: UserWarning: Update your Conv1D call to the Keras 2 API: Conv1D(activation="relu", padding="valid", strides=1, filters=3, kernel_size=4)
subsample_length=1)(graph_in)
trainGraph.py:133: UserWarning: The Merge layer is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
out = Merge(mode='concat')(convs)
trainGraph.py:137: UserWarning: Update your Model call to the Keras 2 API: Model(outputs=Join.0, inputs=/input_1)
graph = Model(input=graph_in, output=out)
/root/anaconda/lib/python2.7/site-packages/keras/models.py:849: UserWarning: The nb_epoch argument in fit has been renamed epochs.
warnings.warn('The nb_epoch argument in fit '
Traceback (most recent call last):
File "trainGraph.py", line 158, in
model.fit(x_shuffled, y_shuffled, batch_size=batch_size, nb_epoch=num_epochs, validation_split=val_split, verbose=2)
File "/root/anaconda/lib/python2.7/site-packages/keras/models.py", line 868, in fit
initial_epoch=initial_epoch)
File "/root/anaconda/lib/python2.7/site-packages/keras/engine/training.py", line 1434, in fit
batch_size=batch_size)
File "/root/anaconda/lib/python2.7/site-packages/keras/engine/training.py", line 1310, in _standardize_user_data
exception_prefix='input')
File "/root/anaconda/lib/python2.7/site-packages/keras/engine/training.py", line 139, in _standardize_input_data
str(array.shape))
ValueError: Error when checking input: expected embedding_1_input to have shape (None, 56) but got array with shape (481625, 629)

Any idea to fix it? Thanks.
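The last line of the traceback is the key: the model was built for length-56 sequences, but after re-running the preprocessing the padded data came out as (481625, 629), so a different corpus (or different padding) is being fed to a model sized for the original data. Either rebuild the model with the new length or re-pad to the length the model expects; a sketch:

from keras.preprocessing.sequence import pad_sequences

print(x_shuffled.shape)                  # e.g. (481625, 629) here
sequence_length = x_shuffled.shape[1]    # rebuild the model with this value...

# ...or force the data back to the length the existing model expects:
x_shuffled = pad_sequences(x_shuffled, maxlen=56, padding="post", truncating="post", value=0)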

expected input_4 to have shape (None, 185) but got array with shape (1665, 35)

Hi, I have tried to run the model on my own text file with your code. I changed the relevant parts, but on the last line I got this error:
ValueError: Error when checking input: expected input_4 to have shape (None, 185) but got array with shape (1665, 35)
Do you know how I can solve it? I would really appreciate it.
regards,

instruction on using data_handler with local data

Hi, thanks for sharing your code. Can you tell me how to use the rt-polarity data in the data folder with data_handler? Currently, sentiment_cnn.py does not work if the line loading the imdb data is replaced with loading the data supplied by data_handler. Thanks in advance.

CNN-static

Hi, thank you for sharing your code.

When CNN-static works correctly, how much performance can we expect to get?
I got good performance with CNN-rand and CNN-non-static, 88-90% after 5 epochs. However, CNN-static only reached around 65% even after 50 epochs.

In order to make CNN-static work right, should I change some parameters?

Any advice would be greatly appreciated.

Using Glove or GoogleNews?

I see that w2v.py builds a word2vec model from the text and uses it in the classification script sentiment_cnn.py. Is it possible to load a popular pretrained model such as GloVe or Google News instead?

Thanks
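It should be; instead of training a model in w2v.py, you can load pretrained vectors with gensim and build embedding_weights from them. A sketch (the file name is a placeholder; a GloVe file must first be converted with gensim's glove2word2vec script and loaded with binary=False):

import numpy as np
from gensim.models import KeyedVectors

# GoogleNews vectors in binary word2vec format
embedding_model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

embedding_dim = embedding_model.vector_size          # 300 for GoogleNews
embedding_weights = [np.array([embedding_model[w] if w in embedding_model
                               else np.random.uniform(-0.25, 0.25, embedding_dim)
                               for w in vocabulary_inv])]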

Big difference acc and val_acc

Hello, I ran this code but got a much worse result than the score the author reported.
I tried both backends, TensorFlow and Theano, and got a similar score with both.
Do you have any idea how to deal with this problem?
Thanks


TypeError: __init__() takes at least 3 arguments (2 given)

Hi, I cloned the code and ran it, but it seems that Convolution1D needs to be replaced by Conv1D first. After the replacement, I get the following complaint:

conv = Conv1D(filters=num_filters, kernel_size=sz, padding="valid", activation="relu",strides=1)(z)
TypeError: __init__() takes at least 3 arguments (2 given)

I tried removing (z), but the error message won't change. I also tried adding a "step" argument (=1) to input_shape, but it does not have any effect either.

Any idea on how to fix this? Many thanks.
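"__init__() takes at least 3 arguments (2 given)" usually means the installed Keras is still 1.x, where Convolution1D expects nb_filter and filter_length as positional arguments and does not understand the Keras 2 keywords filters/kernel_size. Checking keras.__version__ and using the matching style (or upgrading) should clear it; a sketch of both styles:

import keras
print(keras.__version__)

# Keras 1.x style:
conv = Convolution1D(num_filters, sz, border_mode="valid",
                     activation="relu", subsample_length=1)(z)

# Keras 2.x style:
conv = Conv1D(filters=num_filters, kernel_size=sz, padding="valid",
              activation="relu", strides=1)(z)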

as for the CNN-non-static model initialization issue

Hi, regarding the code below, I don't think this initializes the weights in the correct order. You can see my detailed values below.

# Initialize weights with word2vec

if model_type == "CNN-non-static":
    weights = np.array([v for v in embedding_weights.values()])
    print("Initializing embedding layer with word2vec weights, shape", weights.shape)
    embedding_layer = model.get_layer("embedding")
    embedding_layer.set_weights([weights])

embedding_weights[0]
array([-0.16838592, -0.06879535, 0.02589965, -0.10925651, 0.04015173,
-0.02650859, 0.0561571 , -0.05312364, -0.1532818 , 0.01564424,
0.01530888, -0.08801481, 0.03454486, -0.11123013, 0.14548153,
0.02919406, -0.05312879, 0.05036286, -0.17147419, -0.14561656,
-0.04017974, 0.06030416, -0.00127188, -0.08445276, -0.11172746,
0.02585902, -0.02454675, -0.05854974, -0.06838308, 0.06150924,
0.03584356, -0.01313417, -0.3134318 , -0.02525482, -0.07446898,
0.11405084, 0.05362098, -0.153895 , 0.24954648, -0.3041862 ,
-0.06347843, 0.32938534, 0.08088692, 0.2416265 , -0.18891637,
0.18807773, 0.28629586, -0.02997993, 0.32699 , 0.21486959],
dtype=float32)
weights[0]
array([-0.11203807, 0.11447106, -0.23200327, 0.21161124, -0.24602622,
0.24165967, 0.09703771, -0.22066535, -0.2168101 , -0.11105311,
0.00792669, -0.24837167, 0.20158639, 0.03558441, -0.12939217,
0.07052899, 0.23526682, 0.16881258, -0.24279054, -0.01003068,
0.11427283, 0.03127737, 0.21540562, 0.17022561, -0.0857431 ,
-0.17637912, -0.21004551, -0.0486851 , 0.24322698, -0.18044314,
0.08381914, 0.09214279, 0.06886086, 0.07738437, -0.19275099,
0.2035606 , -0.20341418, 0.0917947 , -0.00225014, -0.23336153,
0.2036845 , -0.07482199, -0.06884935, -0.18994595, 0.09833185,
0.15198245, 0.07992977, 0.04723972, 0.09418446, 0.12231474])
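The mismatch above is consistent with dict ordering: before Python 3.7, embedding_weights.values() is not guaranteed to come out in key order, so row 0 of weights need not be the vector for word index 0. Building the matrix explicitly in index order is a safe fix (a sketch):

# Build the weight matrix row by row in index order instead of relying on dict iteration order.
weights = np.array([embedding_weights[i] for i in range(len(embedding_weights))])
# or, if the keys are not a contiguous 0..N-1 range:
# weights = np.array([v for _, v in sorted(embedding_weights.items())])

embedding_layer = model.get_layer("embedding")
embedding_layer.set_weights([weights])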

Problem in code understanding

Hi Rakhlin,
I have a doubt about the code at "https://github.com/alexander-rakhlin/CNN-for-Sentence-Classification-in-Keras/blob/master/sentiment_cnn.py", in sentiment_cnn.py at line number 81. Here embedding_weights is a list and embedding_weights[0] is an np.ndarray; how can you pass an np.ndarray as an index into an array? It should technically give an error like "arrays used as indices must be of integer (or boolean) type".

Kindly help me, I am stuck at this step; it will help me a lot in my college project. Attaching a screenshot for your reference.

Reason for using pool length 2?

Hi,

I am not able to understand your motivation for using a pool length of only 2 in the max-pooling layer.

In the paper, the authors used global max pooling, i.e. the maximum activation from each feature map, one scalar per feature map, whereas you are keeping half of the activations from each feature map.
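For reference, the pooling described in the paper corresponds to GlobalMaxPooling1D (one maximum per feature map); whether pool length 2 was a deliberate deviation is a question for the author. A minimal sketch of the swap:

from keras.layers import GlobalMaxPooling1D

conv = GlobalMaxPooling1D()(conv)   # instead of MaxPooling1D(pool_size=2)(conv)
# note: GlobalMaxPooling1D already returns a 2-D tensor, so a following Flatten() is unnecessary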

Only words, no sentences

What modifications should I make if I want to train the model on my own dataset without separating it into sentences? I only have a document of words and a label for each word. Hence, I would like the dataset to have shape [rows, columns], where rows is the number of words in my dataset and columns is the 300- or 400-dimensional vector representation of each word. As a result, each label will be assigned to each word separately, not to the whole sentence.
I have modified the inner functions so that the words are handled, but in the final result I cannot correctly set the vector dimensionality of each word.

I am aware that this is a very specific question, obviously different from what this implementation provides, but since what it does is very clear, I would like to apply it to my own data, if possible.

Thank you in advance!

Not the Same Performance At 'CNN-non-static'

Hi, I follow your settings below for the 'CNN-non-static' network.

model_variation = 'CNN-non-static'  #  CNN-rand | CNN-non-static | CNN-static
print('Model variation is %s' % model_variation)

# Model Hyperparameters
sequence_length = 56
embedding_dim = 20
filter_sizes = (3, 4)
num_filters = 3
dropout_prob = (0.7, 0.8)
hidden_dims = 100

Then I split off 10% of the samples for testing.

# Shuffle data
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices].argmax(axis=1)

test_split = 0.1
split_at = len(y) * (1 - test_split)
x_train, y_train = x_shuffled[:split_at], y_shuffled[:split_at]
x_val, y_val = x_shuffled[split_at:], y_shuffled[split_at:]

The network trains on the other 90% of the samples with validation_split=0.1. The test samples are evaluated after the last epoch.

model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# Training model
# ==================================================
model.fit(x_train, y_train, batch_size=batch_size, nb_epoch=num_epochs, validation_split=val_split, verbose=2)
e = model.evaluate(x_val, y_val, batch_size=len(y_val), verbose=2)
print('loss %f acc %f' % tuple(e))

The training log seems perfect: val_acc goes beyond 90%, but the test performance is disappointing.
It is only 76%, far from your result.
It looks like the validation data is used for training over the epochs, leading to the good validation performance.

Is there some mistake I made?

3s - loss: 0.2998 - acc: 0.8870 - val_loss: 0.1997 - val_acc: 0.9344
Epoch 91/100
3s - loss: 0.2971 - acc: 0.8923 - val_loss: 0.2010 - val_acc: 0.9302
Epoch 92/100
3s - loss: 0.2856 - acc: 0.8935 - val_loss: 0.1979 - val_acc: 0.9250
Epoch 93/100
3s - loss: 0.2831 - acc: 0.8940 - val_loss: 0.1994 - val_acc: 0.9302
Epoch 94/100
3s - loss: 0.2818 - acc: 0.8903 - val_loss: 0.1967 - val_acc: 0.9333
Epoch 95/100
3s - loss: 0.2895 - acc: 0.8892 - val_loss: 0.2103 - val_acc: 0.9187
Epoch 96/100
3s - loss: 0.2774 - acc: 0.8953 - val_loss: 0.2021 - val_acc: 0.9292
Epoch 97/100
3s - loss: 0.2739 - acc: 0.8957 - val_loss: 0.2453 - val_acc: 0.9073
Epoch 98/100
3s - loss: 0.2771 - acc: 0.8945 - val_loss: 0.1993 - val_acc: 0.9198
Epoch 99/100
3s - loss: 0.2798 - acc: 0.8960 - val_loss: 0.1953 - val_acc: 0.9250
Epoch 100/100
3s - loss: 0.2810 - acc: 0.8932 - val_loss: 0.2059 - val_acc: 0.9167
loss 0.635195 acc 0.761012
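One small thing to check in the snippet above, independent of the accuracy question: split_at = len(y) * (1 - test_split) is a float, and float slice indices are rejected by recent Python/NumPy versions, so it should be cast to int. That alone would not explain the 76% vs. 93% gap, but as a hedged cleanup:

split_at = int(len(y) * (1 - test_split))
x_train, y_train = x_shuffled[:split_at], y_shuffled[:split_at]
x_val, y_val = x_shuffled[split_at:], y_shuffled[split_at:]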

Running instructions

Hi @alexander-rakhlin, this seems like great work on sentence classification. Can you please provide details on how to use/run the system? Also, kindly provide a link to the publication related to the system (if any).
Thanks
