Comments (6)
Hi, I find this strange too.
Another thing that puzzles me is where wordlist.txt comes from. I tried to generate the IMDB wordlist from train.txt and dev.txt, but I only get 90000+ words rather than the 100000+ words the authors describe.
Is there anything I missed?
from nsc.
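For anyone trying to reproduce the count, here is a minimal sketch of building a wordlist from the train and dev text. It assumes whitespace tokenization and no frequency cutoff, neither of which is confirmed by the authors; `build_vocab` and `min_count` are illustrative names, and the toy lines stand in for the contents of train.txt and dev.txt.

```python
from collections import Counter


def build_vocab(lines, min_count=1):
    """Count whitespace-separated tokens and return the words that are kept.

    min_count is a hypothetical frequency cutoff, not a documented setting.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return sorted(word for word, count in counts.items() if count >= min_count)


# Toy stand-ins for the text read from train.txt and dev.txt.
train_lines = ["the movie was great", "the plot was thin"]
dev_lines = ["great acting , thin plot"]

vocab = build_vocab(train_lines + dev_lines)
print(len(vocab))
```

A different tokenizer, lowercasing, or a frequency cutoff would each change the total, so a mismatch in vocabulary size does not by itself show which files the wordlist was built from.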
@lan2720 About the wordlist, I think you should just use the one the author provides.
As for where it comes from, my guess is that it was built by tokenizing the original dataset (before it was split).
@wangjialin114 Thanks, but I don't think so. The authors say they use the datasets provided by Tang, and in Tang's zip file the train/dev/test data have already been split. He didn't provide other files like wordlist.txt. So wordlist.txt must have been created by the authors from the train and dev sets. But following this idea, I cannot reproduce the same result.
- @wangjialin114
Thanks for your question. As you said, there is one "UNK" word embedding in embinit.save, and it is the last embedding.
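To illustrate how a trailing "UNK" row is typically used, here is a small sketch of the lookup. It assumes the embedding matrix has one row per wordlist entry followed by the UNK row; the actual layout of embinit.save is not shown in this thread, and `build_index`, `lookup`, and the toy data are hypothetical.

```python
def build_index(wordlist):
    """Map each word to its row index in the embedding matrix."""
    return {word: i for i, word in enumerate(wordlist)}


def lookup(tokens, word_to_id, unk_id):
    """Replace out-of-vocabulary tokens with the index of the UNK row."""
    return [word_to_id.get(token, unk_id) for token in tokens]


# Toy wordlist and embeddings (assumption: the UNK vector is the last row).
wordlist = ["good", "bad", "movie"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
unk_id = len(embeddings) - 1

word_to_id = build_index(wordlist)
ids = lookup(["good", "film"], word_to_id, unk_id)
vectors = [embeddings[i] for i in ids]
print(ids)
```

Here "film" is not in the toy wordlist, so it falls back to the last row.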
- @lan2720
Thanks for your question. I got wordlist.txt from the larger original dataset in Tang 2015, "Document Modeling with Gated Recurrent Neural Network for Sentiment Classification". The dataset used in this paper is a subset of that original dataset. Perhaps I did not state this clearly.
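If the wordlist was built from a larger dataset, it should contain words that never occur in the train/dev subset. A quick way to check this is a set comparison, sketched below; `compare_vocab` and the toy inputs are illustrative, and the real inputs would be the lines of wordlist.txt and a vocabulary built from train.txt and dev.txt.

```python
def compare_vocab(wordlist, corpus_vocab):
    """Report how a provided wordlist overlaps with a corpus vocabulary."""
    wordlist_set = set(wordlist)
    corpus_set = set(corpus_vocab)
    return {
        "shared": len(wordlist_set & corpus_set),
        "only_in_wordlist": sorted(wordlist_set - corpus_set),
        "only_in_corpus": sorted(corpus_set - wordlist_set),
    }


# Toy data: the provided wordlist is larger than the subset vocabulary.
provided_wordlist = ["good", "bad", "movie", "cinema", "sequel"]
subset_vocab = ["good", "bad", "movie"]

report = compare_vocab(provided_wordlist, subset_vocab)
print(report)
```

A non-empty `only_in_wordlist` together with an empty `only_in_corpus` would be consistent with the explanation above, though it would not prove it.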
@huimchen Thanks for your clear answer!