
detm's People

Contributors

adjidieng


detm's Issues

Cannot find embeddings.pkl

Hi,

Would you be able to point me to where I can get the file "embeddings.pkl" referenced in main.py:

    print('Getting embeddings ...')
    emb_path = args.emb_path
    vect_path = os.path.join(args.data_path.split('/')[0], 'embeddings.pkl')
    vectors = {}

Many thanks.
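(For anyone else stuck here: a minimal sketch of producing such a file, under the assumption, inferred from the snippet above and not confirmed by the authors, that embeddings.pkl is simply a pickled dict mapping each vocabulary word to a pretrained embedding vector, e.g. trained with gensim:)

    import pickle
    from gensim.models import Word2Vec

    # Toy corpus; in practice use the tokenized training documents.
    docs = [['war', 'peace', 'treaty'], ['economy', 'trade', 'war']]
    model = Word2Vec(sentences=docs, vector_size=300, min_count=1)  # gensim >= 4.0
    vectors = {w: model.wv[w] for w in model.wv.index_to_key}
    with open('embeddings.pkl', 'wb') as f:
        pickle.dump(vectors, f)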

Dataset meaning

The processed datasets (like ACL) at the given link are in different folders. For the ACL dataset there are min_df_10, min_df_100, and min_df_2. What is the difference between them?
Looking forward to your reply.
Thanks a lot.
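(A hedged guess, since the folders are not documented: min_df_N likely denotes the minimum document frequency used when building the vocabulary, i.e. words appearing in fewer than N documents are dropped. A minimal sketch of the idea with scikit-learn:)

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ['war and peace', 'war economy', 'peace treaty']
    # min_df=2 keeps only words that occur in at least 2 documents.
    vectorizer = CountVectorizer(min_df=2)
    bows = vectorizer.fit_transform(docs)
    print(vectorizer.get_feature_names_out())  # ['peace', 'war']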

What exactly is NELBO and why do we optimize it?

Can someone tell me why we optimize the NELBO? The paper only says "We optimize the ELBO with respect to the variational parameters." As far as I understand, D-ETM consists of three neural networks that parameterize the variational distributions for theta, eta, and alpha, and it estimates KL divergences for them. Are the KL divergence terms then simply added together and optimized jointly? And why is the NLL added? I also thought that "Solving this optimization problem is equivalent to maximizing the evidence lower bound (ELBO)" would mean we maximize it, rather than minimizing it as a loss, which is what the model seems to do.

Sorry, I am pretty confused (I am rather new to Bayesian statistics and variational inference).
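(For reference, a sketch of the relationship between the two, using the paper's terms; the exact grouping of the terms in the code may differ. Maximizing the ELBO is equivalent to minimizing its negation, the NELBO, which is what gradient-based optimizers expect as a loss:)

    \mathcal{L}_{\mathrm{ELBO}} =
        \mathbb{E}_{q}\big[\log p(\mathbf{w} \mid \theta, \eta, \alpha)\big]
        - \mathrm{KL}_\theta - \mathrm{KL}_\eta - \mathrm{KL}_\alpha

    \mathrm{NELBO} = -\mathcal{L}_{\mathrm{ELBO}}
        = \mathrm{NLL} + \mathrm{KL}_\theta + \mathrm{KL}_\eta + \mathrm{KL}_\alpha,
        \quad \text{where } \mathrm{NLL} = -\mathbb{E}_{q}\big[\log p(\mathbf{w} \mid \theta, \eta, \alpha)\big]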

Can't reproduce the preprocessed data

Hi there,
I ran https://github.com/adjidieng/DETM/blob/master/scripts/data_undebates.py on the Kaggle UN debates data (as linked in your paper: https://www.kaggle.com/unitednations/un-general-debates), but I am unable to reproduce the preprocessed data you linked here: https://bitbucket.org/franrruiz/data_undebates_largev/src/master/ (the variables in the .mat files differ from yours).
Any idea? There are not many settings besides min_df and max_df. I used the defaults; perhaps you used something else?

Custom Dataset

Hi,

Thank you for sharing your experimentation.

I wanted to try your experiment on a custom dataset and was wondering how to do it.

Should I just modify "data_acl.py" to load my custom dataset, after which everything would be straightforward, or do you advise some other processing?
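(Not an authoritative answer, but a minimal sketch of what the loading part of such a modification might look like, assuming a hypothetical CSV with 'text' and 'year' columns; everything downstream would reuse the script's existing vectorization and splitting steps:)

    import csv

    # Hypothetical input: a CSV with 'text' and 'year' columns; adapt to your data.
    docs, timestamps = [], []
    with open('my_corpus.csv', newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            docs.append(row['text'])
            timestamps.append(int(row['year']))

    # Map raw years to contiguous time-slot indices, since the scripts expect
    # one integer time slice per document.
    all_times = sorted(set(timestamps))
    time_of_doc = [all_times.index(t) for t in timestamps]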

Streaming Data Sources

Hi, is D-ETM suitable for streaming data sources? How do you suggest training on mini-batches of data as it accumulates?

I also have another related question. I understand that DETM can adjust for concept drift by shifting the mean and variance of topics via a Gaussian distribution. What if a particular topic exists only (and heavily) in one (or a few) particular time intervals? Would DETM be able to detect it, or is there an assumption that all topics exist at all time intervals?

Running error

    Traceback (most recent call last):
      File "main.py", line 480, in <module>
        train(epoch)
      File "main.py", line 217, in train
        loss, nll, kl_alpha, kl_eta, kl_theta = model(data_batch, normalized_data_batch, times_batch, train_rnn_inp, args.num_docs_train)
      File "/home/rodrigo/anaconda3/envs/DETM/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/rodrigo/Escritorio/DETM/DETM-master/detm.py", line 203, in forward
        beta = beta[times.type('torch.LongTensor')]
    RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0

Embeddings over time

Hi, is it possible to show the evolution of term embeddings over time? At the moment main.py shows the evolution of the most prominent words per topic over time, but term embeddings are shown only statically. So I am wondering if it can show something like:

Word: war .. Time: 0 ===> neighbors: ['war', 'imperialism', 'iraq', 'bomb']
Word: war .. Time: 40 ===> neighbors: ['war', 'iran', 'ukraine', 'drone']
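(A generic sketch of the neighbor lookup itself, with a hypothetical embedding matrix; note that, as far as I can tell, the word embeddings rho in DETM are static and only the topic embeddings drift over time, so a per-time query like the above would need some time-varying matrix to search over:)

    import numpy as np

    # Hypothetical inputs: emb is a (V, D) matrix of word vectors aligned with vocab.
    vocab = ['war', 'peace', 'iraq', 'trade']
    emb = np.random.rand(len(vocab), 300)

    def nearest_neighbors(word, emb, vocab, k=3):
        # Rank all words by cosine similarity to the query word's vector.
        v = emb[vocab.index(word)]
        sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v) + 1e-12)
        return [vocab[i] for i in np.argsort(-sims)[:k]]

    print(nearest_neighbors('war', emb, vocab))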

How to get the final distribution of topics over documents/time slots?

I was wondering whether/how I can get the final distribution of topics over documents or time slots. Is that not possible with DETM? I have started applying and reading through your code and have honestly not yet understood every line, as I am quite new to coding and machine learning, so please excuse me if the question seems stupid. I have seen blog posts about other models discussing the distribution of topics over documents, and I was wondering how I can see whether topics occurred more often in some time slots.
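(A hedged sketch of the aggregation step, assuming one can extract theta, the matrix of per-document topic proportions the model infers, together with each document's time slot; obtaining theta from the trained model is the model-specific part:)

    import numpy as np

    # Hypothetical stand-ins: theta is (num_docs, num_topics) topic proportions,
    # times holds each document's time-slot index.
    theta = np.random.dirichlet(np.ones(5), size=100)
    times = np.random.randint(0, 4, size=100)

    def topic_share_per_slot(theta, times):
        # Average the topic proportions of all documents in each time slot.
        return {t: theta[times == t].mean(axis=0) for t in np.unique(times)}

    shares = topic_share_per_slot(theta, times)  # slot -> average topic proportions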

Loss decreases, KL_alpha and KL_theta increase

On my own data, and when trying to reproduce the results for the UN dataset, I observe rising KL_alpha and KL_theta with the Adam optimizer.

I have tried different settings for Adam and other optimizers on my data, but have not found a solution. I always observe this issue with Adam (across different parameter settings), and other optimizers like ASGD don't seem to work at all (very bad topics, no improvement in loss, ...). Has anybody else had this issue and an idea how to solve it?

Here is an example from training D-ETM on my own dataset:
[Screenshot of training metrics, 2020-10-06]
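(Not from this repository, but one generic knob people commonly try when the NLL and KL terms seem out of balance in variational models is KL annealing, i.e. ramping a weight on the KL terms from 0 to 1 over the first epochs. A minimal sketch, with no claim that it resolves this particular behavior:)

    # Generic KL-annealing schedule, not part of the DETM codebase: scale the
    # KL terms by a weight that ramps linearly from 0 to 1 over `warmup` epochs.
    def kl_weight(epoch, warmup=20):
        return min(1.0, epoch / warmup)

    # e.g. loss = nll + kl_weight(epoch) * (kl_alpha + kl_eta + kl_theta)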

Questions regarding "q(\eta_t | \eta_{1:t-1}, \tilde{w}_t)"

Thanks for the interesting paper and great repository. I have a few clarification questions regarding the method and the code that I was wondering if you could help me with. Thanks in advance!

  1. In Section 4.2 of the (arXiv version) paper, it states that
    "We choose a Gaussian distribution q(\eta_t | \eta_{1:t-1}, \tilde{w}_t), whose mean and covariance are given by the output of the LSTM."
    However, in this repository, the LSTM takes in only \tilde{w}_t as input, but not \eta_{1:t-1}
    (https://github.com/adjidieng/DETM/blob/master/detm.py#L130)
    Rather, \eta_{t-1} is only used AFTER the LSTM (https://github.com/adjidieng/DETM/blob/master/detm.py#L146), through concatenation with the LSTM output. In this way, the LSTM can only capture the temporal dependency of \tilde{w}, but not the temporal dependency of \eta (see the sketch after this list). I probably missed something, but I wonder if you could please help me understand the intuition behind this. Thank you.

  2. In the D-LDA (Dynamic Topic Models, Blei & Lafferty 2006) paper, the method is able to perform "future" prediction (Fig 5 in the D-LDA paper). On the other hand, with DETM, I wonder if the dependency on \tilde{w}_t in q(\eta_t | \eta_{1:t-1}, \tilde{w}_t) prevents DETM from doing future prediction, since it uses "words from the future time step" (\tilde{w}_t).

Thank you!
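(A sketch of the structure described in question 1, as I read the linked lines; the notation is mine, not the paper's exact equations:)

    % h_t: LSTM state over the bag-of-words summaries \tilde{w}_{1:t}; the mean
    % and variance of q(\eta_t) come from concatenating h_t with \eta_{t-1}.
    h_t = \mathrm{LSTM}(\tilde{w}_t, h_{t-1}), \qquad
    (\mu_t, \sigma_t) = \mathrm{NN}([\,h_t\,;\,\eta_{t-1}\,]), \qquad
    q(\eta_t \mid \eta_{t-1}, \tilde{w}_{1:t}) = \mathcal{N}(\mu_t, \sigma_t^2)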
