Using the splitting method provided in the paper_experiment, I found that the testing

Testing data and train data are repeated in Shakespeare

talwalkarlab commented on September 4, 2024

Testing data and train data are repeated in Shakespeare

Comments (7)

zliangak commented on September 4, 2024

Cool! Thanks a lot!

scaldas commented on September 4, 2024

You are right. The split should be temporal rather than random, and we should have been careful to avoid this kind of overlap. We will work on fixing it.

Can you provide us with the parameters that you used to obtain the testing accuracy (learning rate, size of the layers, etc.)?

Thank you.

zliangak commented on September 4, 2024

I am using PyTorch. All the details are in the file below. One difference in my model is that I feed all the hidden units (instead of only the last one) into the linear layer. When I use your LSTM implementation, i.e. feeding only the last hidden unit, I still get a pretty high accuracy given enough training epochs.

Hope this can be fixed soon. It is a very helpful dataset. Thanks.

test.pdf

zliangak commented on September 4, 2024

Sorry, there is a mistake in my implementation in cell 6 of "test.pdf". When defining test_loader, I should have used dataset=test_set instead of dataset=train_set.

This mistake does not change the conclusion: the problem described in this issue still exists.

Regards,

scaldas commented on September 4, 2024

Sorry, I'm confused by your update. Does this mean the issue remains?

zliangak commented on September 4, 2024

Yes, the issue remains.

scaldas commented on September 4, 2024

I have modified the train/test splits for Shakespeare. They are now split temporally, and samples that would leak any test information into the training set are dropped. Concretely, if the last training sample is at index i, the first test sample is at index i + seq_len. We use seq_len = 80.

A side effect of this change is that some users now don't have any test samples, and have to be dropped from training.
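
For readers who want to reproduce the idea, here is a minimal sketch of a temporal split with a seq_len gap. It is not the code in the leaf repository: the function names `make_samples`, `temporal_split`, and `split_users` are hypothetical, the 0.8 training fraction is an assumption, and it assumes each sample is a window of seq_len characters plus the next character as the target.

```python
def make_samples(text, seq_len):
    """Build (input window, next character) pairs; sample i starts at index i."""
    return [(text[i:i + seq_len], text[i + seq_len])
            for i in range(len(text) - seq_len)]


def temporal_split(text, seq_len=80, train_frac=0.8):
    """Split one user's samples in time order, skipping a gap of seq_len.

    train_frac is an assumed value, not taken from the thread.
    """
    samples = make_samples(text, seq_len)
    n_train = int(train_frac * len(samples))
    if n_train == 0:
        return [], []
    last_train = n_train - 1
    # First test sample sits at index i + seq_len, where i is the last
    # training index, so its input window starts where the last training
    # input window ends.
    first_test = last_train + seq_len
    return samples[:n_train], samples[first_test:]


def split_users(user_texts, seq_len=80, train_frac=0.8):
    """Keep only users that still have both train and test samples."""
    kept = {}
    for user, text in user_texts.items():
        train, test = temporal_split(text, seq_len, train_frac)
        if train and test:
            kept[user] = (train, test)
    return kept
```

With this scheme, a user whose text is too short to leave any samples after the gap ends up with an empty test list and is filtered out by `split_users`, which matches the side effect described above.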
