
detm's People

Contributors

adjidieng


detm's Issues

Cannot find embeddings.pkl

Hi,

Would you be able to point me to where I can get the file "embeddings.pkl" referenced in main.py:

    print('Getting embeddings ...')
    emb_path = args.emb_path
    vect_path = os.path.join(args.data_path.split('/')[0], 'embeddings.pkl')
    vectors = {}

Many thanks.
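(For anyone else stuck here: a minimal sketch of producing such a file, under the assumption, inferred from the snippet above and not confirmed by the authors, that embeddings.pkl is simply a pickled dict mapping each vocabulary word to a pretrained embedding vector, e.g. trained with gensim:)

    import pickle
    from gensim.models import Word2Vec

    # Toy corpus; in practice use the tokenized training documents.
    docs = [['war', 'peace', 'treaty'], ['economy', 'trade', 'war']]
    model = Word2Vec(sentences=docs, vector_size=300, min_count=1)  # gensim >= 4.0
    vectors = {w: model.wv[w] for w in model.wv.index_to_key}
    with open('embeddings.pkl', 'wb') as f:
        pickle.dump(vectors, f)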

Dataset meaning

The processed datasets (like ACL) at the given link are in different folders. For the ACL dataset there are min_df_10, min_df_100, and min_df_2. What is the difference between them?
Looking forward to your reply.
Thanks a lot.
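(A hedged guess, since the folders are not documented: min_df_N likely denotes the minimum document frequency used when building the vocabulary, i.e. words appearing in fewer than N documents are dropped. A minimal sketch of the idea with scikit-learn:)

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ['war and peace', 'war economy', 'peace treaty']
    # min_df=2 keeps only words that occur in at least 2 documents.
    vectorizer = CountVectorizer(min_df=2)
    bows = vectorizer.fit_transform(docs)
    print(vectorizer.get_feature_names_out())  # ['peace', 'war']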

What exactly is NELBO and why do we optimize it?

Can someone tell me why we optimize the NELBO? The paper only says "We optimize the ELBO with respect to the variational parameters." As far as I understand, D-ETM consists of three neural networks that parameterize the variational distributions for theta, eta, and alpha, and it estimates KL divergences for them. Are the KL divergence terms then simply added together and optimized jointly? And why is the NLL added? I also thought that "Solving this optimization problem is equivalent to maximizing the evidence lower bound (ELBO)" would mean we maximize it, rather than minimizing it as a loss, which is what the model seems to do.

Sorry, I am pretty confused (I am rather new to Bayesian statistics and variational inference).
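(For reference, a sketch of the relationship between the two, using the paper's terms; the exact grouping of the terms in the code may differ. Maximizing the ELBO is equivalent to minimizing its negation, the NELBO, which is what gradient-based optimizers expect as a loss:)

    \mathcal{L}_{\mathrm{ELBO}} =
        \mathbb{E}_{q}\big[\log p(\mathbf{w} \mid \theta, \eta, \alpha)\big]
        - \mathrm{KL}_\theta - \mathrm{KL}_\eta - \mathrm{KL}_\alpha

    \mathrm{NELBO} = -\mathcal{L}_{\mathrm{ELBO}}
        = \mathrm{NLL} + \mathrm{KL}_\theta + \mathrm{KL}_\eta + \mathrm{KL}_\alpha,
        \quad \text{where } \mathrm{NLL} = -\mathbb{E}_{q}\big[\log p(\mathbf{w} \mid \theta, \eta, \alpha)\big]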

Can't reproduce the preprocessed data

Hi there,
I ran https://github.com/adjidieng/DETM/blob/master/scripts/data_undebates.py on the Kaggle UN debates data (as linked in your paper: https://www.kaggle.com/unitednations/un-general-debates), but I am unable to reproduce the preprocessed data you linked here: https://bitbucket.org/franrruiz/data_undebates_largev/src/master/ (the variables in the .mat files differ from yours).
Any idea? There are not many settings besides min_df and max_df. I used the defaults; perhaps you used something else?

Custom Dataset

Hi,

Thank you for sharing your experimentation.

I wanted to try your experiment on a custom dataset and was wondering how to do it.

Should I just modify "data_acl.py" to load my custom dataset, after which everything would be straightforward, or do you advise some other processing?
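(Not an authoritative answer, but a minimal sketch of what the loading part of such a modification might look like, assuming a hypothetical CSV with 'text' and 'year' columns; everything downstream would reuse the script's existing vectorization and splitting steps:)

    import csv

    # Hypothetical input: a CSV with 'text' and 'year' columns; adapt to your data.
    docs, timestamps = [], []
    with open('my_corpus.csv', newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            docs.append(row['text'])
            timestamps.append(int(row['year']))

    # Map raw years to contiguous time-slot indices, since the scripts expect
    # one integer time slice per document.
    all_times = sorted(set(timestamps))
    time_of_doc = [all_times.index(t) for t in timestamps]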

Streaming Data Sources

Hi, is D-ETM suitable for streaming data sources? How do you suggest training on mini-batches of data as it accumulates?

I also have another related question. I understand that DETM can adjust for concept drift by shifting the mean and variance of topics via a Gaussian distribution. What if a particular topic exists only (and heavily) in one (or a few) particular time intervals? Would DETM be able to detect it, or is there an assumption that all topics exist at all time intervals?

Running error

    Traceback (most recent call last):
      File "main.py", line 480, in <module>
        train(epoch)
      File "main.py", line 217, in train
        loss, nll, kl_alpha, kl_eta, kl_theta = model(data_batch, normalized_data_batch, times_batch, train_rnn_inp, args.num_docs_train)
      File "/home/rodrigo/anaconda3/envs/DETM/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/rodrigo/Escritorio/DETM/DETM-master/detm.py", line 203, in forward
        beta = beta[times.type('torch.LongTensor')]
    RuntimeError: [enforce fail at CPUAllocator.cpp:56] posix_memalign(&data, gAlignment, nbytes) == 0. 12 vs 0

Embeddings over time

Hi, is it possible to show the evolution of term embeddings over time? At the moment main.py shows the evolution of the most prominent words per topic over time, but term embeddings are shown only statically. So I am wondering if it can show something like:

Word: war .. Time: 0 ===> neighbors: ['war', 'imperialism', 'iraq', 'bomb']
Word: war .. Time: 40 ===> neighbors: ['war', 'iran', 'ukraine', 'drone']
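(A generic sketch of the neighbor lookup itself, with a hypothetical embedding matrix; note that, as far as I can tell, the word embeddings rho in DETM are static and only the topic embeddings drift over time, so a per-time query like the above would need some time-varying matrix to search over:)

    import numpy as np

    # Hypothetical inputs: emb is a (V, D) matrix of word vectors aligned with vocab.
    vocab = ['war', 'peace', 'iraq', 'trade']
    emb = np.random.rand(len(vocab), 300)

    def nearest_neighbors(word, emb, vocab, k=3):
        # Rank all words by cosine similarity to the query word's vector.
        v = emb[vocab.index(word)]
        sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v) + 1e-12)
        return [vocab[i] for i in np.argsort(-sims)[:k]]

    print(nearest_neighbors('war', emb, vocab))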

How to get the final distribution of topics over documents/time slots?

I was wondering whether/how I can get the final distribution of topics over documents or time slots. Is that not possible with DETM? I have started applying and reading through your code and have honestly not yet understood every line, as I am quite new to coding and machine learning, so please excuse me if the question seems stupid. I have seen blog posts about other models discussing the distribution of topics over documents, and I was wondering how I can see whether topics occurred more often in some time slots.
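(A hedged sketch of the aggregation step, assuming one can extract theta, the matrix of per-document topic proportions the model infers, together with each document's time slot; obtaining theta from the trained model is the model-specific part:)

    import numpy as np

    # Hypothetical stand-ins: theta is (num_docs, num_topics) topic proportions,
    # times holds each document's time-slot index.
    theta = np.random.dirichlet(np.ones(5), size=100)
    times = np.random.randint(0, 4, size=100)

    def topic_share_per_slot(theta, times):
        # Average the topic proportions of all documents in each time slot.
        return {t: theta[times == t].mean(axis=0) for t in np.unique(times)}

    shares = topic_share_per_slot(theta, times)  # slot -> average topic proportions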

Loss decreases, KL_alpha and KL_theta increase

On my own data, and when trying to reproduce the results for the UN dataset, I observe rising KL_alpha and KL_theta with the Adam optimizer.

I have tried different settings for Adam and other optimizers on my data, but have not found a solution. I always observe this issue with Adam (across different parameter settings), and other optimizers like ASGD don't seem to work at all (very bad topics, no improvement in loss, ...). Has anybody else had this issue and an idea how to solve it?

Here is an example from training D-ETM on my own dataset:
[Screenshot of training metrics, 2020-10-06]
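(Not from this repository, but one generic knob people commonly try when the NLL and KL terms seem out of balance in variational models is KL annealing, i.e. ramping a weight on the KL terms from 0 to 1 over the first epochs. A minimal sketch, with no claim that it resolves this particular behavior:)

    # Generic KL-annealing schedule, not part of the DETM codebase: scale the
    # KL terms by a weight that ramps linearly from 0 to 1 over `warmup` epochs.
    def kl_weight(epoch, warmup=20):
        return min(1.0, epoch / warmup)

    # e.g. loss = nll + kl_weight(epoch) * (kl_alpha + kl_eta + kl_theta)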

Questions regarding "q(\eta_t | \eta_{1:t-1}, \tilde{w}_t)"

Thanks for the interesting paper and great repository. I have a few clarification questions regarding the method and the code that I was wondering if you could help me with. Thanks in advance!

  1. In Section 4.2 of the (arXiv version) paper, it states that
    "We choose a Gaussian distribution q(\eta_t | \eta_{1:t-1}, \tilde{w}_t), whose mean and covariance are given by the output of the LSTM."
    However, in this repository, the LSTM takes in only \tilde{w}_t as input, but not \eta_{1:t-1}
    (https://github.com/adjidieng/DETM/blob/master/detm.py#L130)
    Rather, \eta_{t-1} is only used AFTER the LSTM (https://github.com/adjidieng/DETM/blob/master/detm.py#L146), through concatenation with the LSTM output. In this way, the LSTM can only capture the temporal dependency of \tilde{w}, but not the temporal dependency of \eta (see the sketch after this list). I probably missed something, but I wonder if you could please help me understand the intuition behind this. Thank you.

  2. In the D-LDA (Dynamic Topic Models, Blei & Lafferty 2006) paper, the method is able to perform "future" prediction (Fig 5 in the D-LDA paper). On the other hand, with DETM, I wonder if the dependency on \tilde{w}_t in q(\eta_t | \eta_{1:t-1}, \tilde{w}_t) prevents DETM from doing future prediction, since it uses "words from the future time step" (\tilde{w}_t).

Thank you!
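(A sketch of the structure described in question 1, as I read the linked lines; the notation is mine, not the paper's exact equations:)

    % h_t: LSTM state over the bag-of-words summaries \tilde{w}_{1:t}; the mean
    % and variance of q(\eta_t) come from concatenating h_t with \eta_{t-1}.
    h_t = \mathrm{LSTM}(\tilde{w}_t, h_{t-1}), \qquad
    (\mu_t, \sigma_t) = \mathrm{NN}([\,h_t\,;\,\eta_{t-1}\,]), \qquad
    q(\eta_t \mid \eta_{t-1}, \tilde{w}_{1:t}) = \mathcal{N}(\mu_t, \sigma_t^2)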
