
DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance

This repository is the official implementation of "DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance", which was accepted to Findings of ACL 2023


Datasets

The datasets used in our experiments can be found on Hugging Face here! There are three splits in the dataset: arxiv, un, and newspop, which correspond to the three datasets used in the paper. Each split has two columns:

  • text: The text of the document in the corpus
  • time_discrete: The time stamp of the document in the corpus

Requirements

This code was run on Python 3.8.10. We recommend creating a virtual environment to run DynaMiTE.

To install requirements:

pip install -r requirements.txt

Preprocessing

First, create a folder with your dataset name in the data folder (e.g. data/arxiv). To load in your dataset, create a data.csv file where each row is a document. This CSV must contain at least two columns, text and time_discrete, which correspond to the text of the document as well as an ordinal integer representing the time step of the document.
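As a minimal sketch of the expected format, a data.csv with the two required columns could be created like this (the folder name and column names come from this README; the toy documents are invented):

```python
import csv
import os

# Dataset folder, e.g. data/arxiv as in the example above.
os.makedirs("data/arxiv", exist_ok=True)

# Each row is one document: its text plus an ordinal integer
# time step (0, 1, 2, ... in chronological order).
rows = [
    {"text": "Neural topic models for scientific abstracts.", "time_discrete": 0},
    {"text": "Dynamic embeddings track semantic change over time.", "time_discrete": 1},
]

with open("data/arxiv/data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "time_discrete"])
    writer.writeheader()
    writer.writerows(rows)
```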

The zip file containing AutoPhrase must also be downloaded from here. This zip file should be moved into the preprocessing folder.

To preprocess the dataset, navigate to /preprocess/, specify the parameters at the top of the preprocess.py file, and run the following command:

python preprocess.py

The folder containing your CSV will be populated with additional data files. Expect roughly 15 minutes to process each dataset.

We provide links to download the Arxiv, UN, and Newspop datasets.

Training

First, open train_model/train.py and specify your parameters at the top of the file. Then navigate back to the parent directory.

You can train DynaMiTE by running the following command:

python train_model/train.py

You can specify an output folder, which defaults to the results folder. Inside the output folder, create a subfolder with the same dataset name used in the preprocessing step (e.g. results/arxiv/). After training, the specified output folder will be populated with the topic evolutions in a text file, along with the embeddings from the discriminative dynamic word embedding space.
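For example, assuming the default results output folder and the arxiv dataset folder used earlier, the layout can be prepared as follows (folder names are the ones from this README; substitute your own dataset name):

```shell
# Create the output subfolder matching the dataset folder name
# used during preprocessing (data/arxiv -> results/arxiv).
mkdir -p results/arxiv
# Then launch training (parameters are set at the top of train.py):
# python train_model/train.py
```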

Experiments

We include code for running quantitative experiments for NPMI and the category shift analysis. Both experiments require the outputs from training.

You can calculate NPMI by running the following command:

python eval.py
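For reference, NPMI for a pair of topic words can be sketched from document co-occurrence counts as below. This is a generic illustration of the metric, not the exact implementation in eval.py:

```python
import math

def npmi(n_xy: int, n_x: int, n_y: int, n_docs: int) -> float:
    """Normalized pointwise mutual information of words x and y.

    n_xy: documents containing both words; n_x, n_y: documents
    containing each word individually; n_docs: total documents.
    Returns a value in [-1, 1], where 1 means the words always
    co-occur and 0 means they are independent.
    """
    p_xy = n_xy / n_docs
    p_x = n_x / n_docs
    p_y = n_y / n_docs
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))

# Words that always co-occur score 1.0; independent words score 0.0.
```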

You can run the category experiment through the following command:

python shift_study.py
