adjidieng / DETM
License: MIT License
Hi, is there a way to infer topic distributions on unseen documents (not used to train the model)?
Thanks in advance!
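Since no official answer is recorded here, a minimal sketch of how amortized inference extends to unseen documents: DETM's q(theta_d) is produced by an encoder from the document's normalized bag-of-words, so the trained encoder can in principle be reused at test time. Everything below (the encoder weights `W1`, `W_mu`, the sizes, and the way `eta_t` enters) is a random stand-in for illustration, not the repository's actual API:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Random stand-ins for trained quantities (in practice, load them from a
# trained DETM checkpoint).
rng = np.random.default_rng(0)
V, K, H = 50, 10, 32                 # vocab size, topics, encoder hidden size
W1 = rng.normal(size=(H, V))
W_mu = rng.normal(size=(K, H))
eta_t = rng.normal(size=K)           # per-time-slice prior mean (assumption)

def infer_theta(bow):
    """Fold an unseen document in through the (stand-in) trained encoder."""
    x = bow / max(bow.sum(), 1.0)    # normalized bag-of-words
    h = np.tanh(W1 @ x)              # encoder hidden layer (sketch)
    mu = W_mu @ h + eta_t            # mean of q(theta_d); eta as a prior shift
    return softmax(mu)               # point estimate of topic proportions

bow_new = rng.integers(0, 5, size=V).astype(float)
theta = infer_theta(bow_new)
print(theta.sum())  # ≈ 1.0
```

No gradient updates are needed for the new document; only a forward pass through the encoder at the document's time slice.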
Hi,
How do I obtain the document-topic proportions for each document in the corpus?
Thank you
Luke
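For what it's worth, at evaluation time models in this family typically take theta_d = softmax(mu_d), where mu_d is the variational mean the encoder outputs for document d; stacking these rows gives the D x K document-topic matrix. A toy illustration with random means standing in for the encoder output (names and sizes are mine, not the repo's):

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 8, 5                           # documents, topics (toy sizes)
mu = rng.normal(size=(D, K))          # stand-in for per-doc variational means

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

theta = softmax_rows(mu)              # D x K document-topic proportions
print(theta.shape)                    # (8, 5); each row sums to 1
```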
Hi,
Would you be able to point me to where I can get the file "embeddings.pkl" referenced in this snippet from the main script:
print('Getting embeddings ...')
emb_path = args.emb_path
vect_path = os.path.join(args.data_path.split('/')[0], 'embeddings.pkl')
vectors = {}
Many thanks.
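From the snippet, "embeddings.pkl" looks like a pickled word-to-vector mapping of pre-trained word embeddings, so if the file cannot be found, one option is to build your own from any embedding source. This is purely a hedged sketch; the vocabulary, the 300-dimensional size, and the dict layout are my assumptions, not confirmed by the authors:

```python
import pickle
import numpy as np

# Hypothetical layout: a dict mapping each word to its embedding vector,
# which the loading code would read into `vectors`.
rng = np.random.default_rng(0)
vocab = ["war", "peace", "economy"]
vectors = {w: rng.normal(size=300).astype(np.float32) for w in vocab}

with open("embeddings.pkl", "wb") as f:
    pickle.dump(vectors, f)

with open("embeddings.pkl", "rb") as f:
    loaded = pickle.load(f)
print(len(loaded))  # 3
```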
The processed datasets (such as ACL) at the given link are in different folders. For the ACL dataset there are min_df_10, min_df_100, and min_df_2. What is the difference between them?
Looking forward to your reply.
Thanks a lot.
Can someone tell me why we optimize the NELBO? The paper only says "We optimize the ELBO with respect to the variational parameters." As far as I understand it, D-ETM consists of three neural networks that parameterize the distributions for theta, eta, and alpha, and the model then estimates a KL divergence for each of them. Are these KL divergence values simply added together and optimized jointly? And why is the NLL added? I also thought that "Solving this optimization problem is equivalent to maximizing the evidence lower bound (ELBO)" would mean we maximize it, whereas the model seems to minimize it as a loss.
Sorry, I am pretty confused (I am rather new to Bayesian statistics and variational inference).
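For anyone landing here: the two statements are consistent. Written out schematically in the paper's notation, the ELBO is

```latex
\mathrm{ELBO}
  = \mathbb{E}_{q}\left[\log p(\mathbf{w} \mid \theta, \alpha)\right]
  - \mathrm{KL}\left(q(\theta) \,\|\, p(\theta \mid \eta)\right)
  - \mathrm{KL}\left(q(\eta) \,\|\, p(\eta)\right)
  - \mathrm{KL}\left(q(\alpha) \,\|\, p(\alpha)\right)
```

and the NELBO in the code is simply its negation, NELBO = NLL + KL_theta + KL_eta + KL_alpha, where the NLL is the negative expected log-likelihood term. Minimizing the NELBO as a loss is therefore exactly maximizing the ELBO; gradient-descent frameworks just prefer minimization. (Schematic only: the paper's full expression additionally sums over documents and time steps.)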
Hi there,
I ran https://github.com/adjidieng/DETM/blob/master/scripts/data_undebates.py on the Kaggle data for the UN debates (as linked in your paper: https://www.kaggle.com/unitednations/un-general-debates), but I am unable to reproduce the preprocessed data you linked here: https://bitbucket.org/franrruiz/data_undebates_largev/src/master/ (the variables in the .mat files differ from yours).
Any idea? There aren't many settings besides min_df and max_df. I used the defaults; perhaps you used something else?
Hi,
Thank you for sharing your experimentation.
I would like to try your experiment on a custom dataset, and I was wondering how to do it.
Should I just modify "data_acl.py" to load my custom dataset, and then everything would be straightforward, or do you advise some other processing?
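While waiting for an authoritative answer: modifying "data_acl.py" seems like the natural route, since the training script ultimately needs a vocabulary, bag-of-words counts, and a time-slice index per document. A self-contained sketch of producing those three artifacts from a toy corpus (the variable names are mine; match the exact file names data_acl.py writes when adapting it):

```python
import numpy as np
from collections import Counter

# Toy corpus of (text, timestamp) pairs -- replace with your own data.
corpus = [("war peace war", 1990), ("peace treaty", 1990), ("drone war", 2010)]

vocab = sorted({w for text, _ in corpus for w in text.split()})
w2i = {w: i for i, w in enumerate(vocab)}
times = sorted({t for _, t in corpus})
t2i = {t: i for i, t in enumerate(times)}

# Document-term count matrix and one time-slice index per document.
bows = np.zeros((len(corpus), len(vocab)), dtype=np.int64)
time_idx = np.zeros(len(corpus), dtype=np.int64)
for d, (text, t) in enumerate(corpus):
    for w, c in Counter(text.split()).items():
        bows[d, w2i[w]] = c
    time_idx[d] = t2i[t]

print(bows.shape, list(time_idx))  # (3, 4) [0, 0, 1]
```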
Hi, is D-ETM suitable for streaming data sources? How do you suggest training on mini-batches of data as it accumulates?
I also have a related question. I understand that D-ETM can adjust for concept drift by shifting the mean and variance of topics via a Gaussian distribution. What if a particular topic exists only (and heavily) in one (or a few) particular time intervals? Would D-ETM be able to detect it, or is there an assumption that all topics exist at all time intervals?
Thanks for sharing your work.
I wish to conduct an analysis of topic changes on Twitter, and I wonder whether DETM is suitable for this.
Traceback (most recent call last):
  File "main.py", line 480, in <module>
    train(epoch)
  File "main.py", line 217, in train
    loss, nll, kl_alpha, kl_eta, kl_theta = model(data_batch, normalized_data_batch, times_batch, train_rnn_inp, args.num_docs_train)
  File "/home/rodrigo/anaconda3/envs/DETM/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rodrigo/Escritorio/DETM/DETM-master/detm.py", line 203, in forward
    beta = beta[times.type('torch.LongTensor')]
RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0
Hi, is it possible to show the evolution of term embeddings over time? At the moment main.py shows the evolution of the most prominent words per topic over time, but term embeddings are shown only statically. So I am wondering if it can show something like:
Word: war .. Time: 0 ===> neighbors: ['war', 'imperialism', 'iraq', 'bomb']
Word: war .. Time: 40 ===> neighbors: ['war', 'iran', 'ukraine', 'drone']
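One caveat: in DETM the word embeddings (rho) are static and only the topic embeddings (alpha) drift over time, which is presumably why main.py prints them once. But given any per-time embedding tensor (or by querying the time-varying alpha against the static rho), nearest neighbors per time step are easy to print. Sketch with a random (T, V, L) array standing in for real embeddings; the vocabulary and shapes are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = ["war", "peace", "iraq", "drone", "treaty", "bomb"]
T, V, L = 3, len(vocab), 8
emb = rng.normal(size=(T, V, L))   # hypothetical per-time embeddings

def neighbors(word, t, k=3):
    """Top-k cosine neighbors of `word` in the embedding space at time t."""
    E = emb[t] / np.linalg.norm(emb[t], axis=1, keepdims=True)
    sims = E @ E[vocab.index(word)]
    return [vocab[i] for i in np.argsort(-sims)[:k]]

for t in range(T):
    print(f"Word: war .. Time: {t} ===> neighbors: {neighbors('war', t)}")
```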
I was wondering whether/how I can get the final distribution of topics over documents or time slots. Is that possible with DETM? I have started applying the model and going through your code, and I honestly have not yet completely understood every line, as I am quite new to coding and machine learning, so please excuse me if the question seems stupid. I have seen blog posts about other models that talk about the distribution of topics over documents, and I was wondering how I can see whether topics occurred more often in some time slots.
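Not a stupid question, and it is possible. Once you have the D x K matrix theta of per-document topic proportions from the model's evaluation code, grouping by each document's time slice and averaging gives a T x K topic-prevalence-over-time matrix. Toy sketch with random proportions standing in for the model's output:

```python
import numpy as np

rng = np.random.default_rng(3)
D, K, T = 12, 4, 3
theta = rng.dirichlet(np.ones(K), size=D)   # stand-in doc-topic proportions
time_idx = np.arange(D) % T                 # each document's time slot (toy)

# Average topic proportions per time slot: a T x K prevalence matrix,
# showing which topics dominate in which periods.
prevalence = np.vstack([theta[time_idx == t].mean(axis=0) for t in range(T)])
print(prevalence.shape)  # (3, 4)
```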
On my own data, and when trying to reproduce the results for the UN dataset, I observe rising kl_alpha and kl_theta with the Adam optimizer.
I have tried different settings for Adam and other optimizers on my data but have not found a solution. I always observe this issue with Adam (across different parameter settings), and other optimizers such as ASGD don't seem to work at all (very bad topics, no improvement in the loss, ...). Has anybody else had this issue and an idea how to solve it?
Thanks for the interesting paper and great repository. I have a few clarification questions regarding the method and the code that I was wondering if you could help me with. Thanks in advance!
In Section 4.2 of the (arXiv version) paper, it states that
"We choose a Gaussian distribution q(\eta_t | \eta_{1:t-1}, \tilde{w}_t), whose mean and covariance are given by the output of the LSTM."
However, in this repository, the LSTM takes in only \tilde{w}_t as input, but not \eta_{1:t-1} (https://github.com/adjidieng/DETM/blob/master/detm.py#L130). Rather, \eta_{t-1} is only used AFTER the LSTM (https://github.com/adjidieng/DETM/blob/master/detm.py#L146), through concatenation with the LSTM output. In this way, the LSTM can only capture the temporal dependency of \tilde{w}, but not the temporal dependency of \eta. I probably missed something, but I wonder if you could please help me understand the intuition behind this. Thank you.
In the D-LDA paper (Dynamic Topic Models, Blei & Lafferty 2006), the method is able to perform "future" prediction (Fig. 5 in the D-LDA paper). On the other hand, with DETM, I wonder if the dependency on \tilde{w}_t in q(\eta_t | \eta_{1:t-1}, \tilde{w}_t) prevents DETM from doing future prediction, since it uses "words from the future time step" (\tilde{w}_t).
Thank you!