
gan_doc_model

Document modeling with Generative Adversarial Networks

Setup Guide:

Run the steps below to set up the environment, prepare the data, train this GAN document model, evaluate retrieval results, and extract document vectors.

Requirements

  1. Ensure that Python 3 is installed before installing the remaining dependencies.
  2. Create a virtual environment (venv or conda) to run the model:
  • To create a venv virtual environment (used with pip3), run:

    $ python3 -m venv dmgan

    $ source dmgan/bin/activate

  • Alternatively, create a conda virtual environment by running:

    $ conda create -y --name dmgan python=3.6

    $ conda activate dmgan

    $ conda install pip

  3. Install the remaining dependencies and libraries by running the command below:

    $ pip install -r requirements.txt
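After activating the environment, a quick sanity check can confirm which interpreter is active before running the scripts. This helper is a hypothetical illustration, not part of the repository; the 3.6 minimum simply mirrors the `python=3.6` used in the conda command above.

```python
import sys


def check_environment(min_version=(3, 6)):
    """Print the active interpreter and return True if it meets min_version."""
    ok = sys.version_info[:2] >= tuple(min_version)
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    # For both a venv and a conda env, sys.prefix should point inside the
    # dmgan environment directory once it is activated.
    print(f"Environment prefix: {sys.prefix}")
    return ok


if __name__ == "__main__":
    if not check_environment():
        sys.exit("Python 3.6 or newer is required")
```

Run it with the environment activated; if the printed prefix does not mention `dmgan`, the packages from `requirements.txt` will be installed into a different interpreter.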

Data Population

  1. Run the command below to prepare the raw input dataset (18,846 documents) and split it into training, validation, and test sets:

    $ python prepare.py

  2. Three new files are written to the data folder: training.csv (13,192 documents), validation.csv (1,884 documents), and test.csv (3,769 documents). In each CSV file, the first column is the label and the second column is the raw text of the document body.
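The document counts above correspond roughly to a 70/10/20 train/validation/test split. Since `prepare.py` is not reproduced here, the sketch below is only an assumption about how such a split could be made: it shuffles `(label, text)` pairs with a fixed seed and writes each split in the two-column CSV layout described above. The function names, seed, and fractions are illustrative, not taken from the repository.

```python
import csv
import random


def split_documents(docs, train_frac=0.7, val_frac=0.1, seed=42):
    """Shuffle (label, text) pairs and split them into train/validation/test lists."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)  # fixed seed gives a reproducible split
    n_train = int(len(docs) * train_frac)
    n_val = int(len(docs) * val_frac)
    train = docs[:n_train]
    val = docs[n_train:n_train + n_val]
    test = docs[n_train + n_val:]  # all remaining documents go to the test set
    return train, val, test


def write_split(path, docs):
    """Write one split as CSV: column 1 is the label, column 2 the raw text."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for label, text in docs:
            writer.writerow([label, text])


def read_split(path):
    """Read a split back as a list of (label, text) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row[0], row[1]) for row in csv.reader(f)]
```

`csv.writer` quotes fields that contain commas or newlines, so multi-line newsgroup posts survive the round trip through `read_split`.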

Data Preprocessing

  1. Run the command below to preprocess the raw input data into the vectorized format expected by the model:

    $ python preprocess.py --input data --output preprocessed_data --vocab data/20newsgroups.vocab

where: --input is the path to the input dataset; --output is the path to the preprocessed output dataset; --vocab is the path to the vocabulary file

  2. Four new files (training.csv, validation.csv, test.csv, labels.txt) are written to the preprocessed_data folder. In each CSV file, the first column is the label and the second column is the vectorized document body. labels.txt lists the 20 newsgroup labels of the 20 Newsgroups corpus.
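The exact vectorized format written by `preprocess.py` is not documented here. Purely as an assumption for illustration, the sketch below reads a vocabulary file with one word per line and encodes each document as space-separated vocabulary indices, dropping out-of-vocabulary words. The tokenizer, the index encoding, and the function names are hypothetical choices, not the repository's.

```python
import re


def load_vocab(path):
    """Map each word in a one-word-per-line vocab file (assumed format) to its line index."""
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    return {word: i for i, word in enumerate(words)}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens (a simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text, vocab):
    """Encode a document as space-separated vocab indices, skipping unknown words."""
    return " ".join(str(vocab[tok]) for tok in tokenize(text) if tok in vocab)
```

For example, with a vocabulary of `orbit`, `space`, `station` (indices 0, 1, 2), `vectorize("The orbit of the Space station!", vocab)` returns `"0 1 2"`.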

Model Training

  1. Run the command below to train the GAN model:

    $ python train.py --dataset preprocessed_data --model results

where: --dataset is the path to the preprocessed dataset; --model is the path to the model output directory

  2. To view TensorBoard graphs, plots, and other training summaries, run the command below in a new terminal and open the URL it prints:

    $ tensorboard --logdir results/logs/

where: --logdir is the path to the logs directory inside the results folder

  3. To view additional training parameters, run:

    $ python train.py --help
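For readers new to GANs, the sketch below shows the standard binary cross-entropy losses from the original GAN formulation, evaluated for a single discriminator output in (0, 1). It is a generic illustration of adversarial training, not the loss implemented in `train.py`, which may use a different formulation; the function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """-[log D(x) + log(1 - D(G(z)))]: push D toward 1 on real data and 0 on generated data."""
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss -log D(G(z)): lower when the generator fools D."""
    return -math.log(d_fake + eps)


if __name__ == "__main__":
    # A confident, correct discriminator has a low loss, while the generator's loss is high.
    print(discriminator_loss(d_real=0.9, d_fake=0.1))
    print(generator_loss(d_fake=0.1))
```

Training alternates between the two: the discriminator is updated to lower `discriminator_loss`, then the generator is updated to lower `generator_loss`, with the small `eps` guarding against `log(0)`.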

Evaluating results

  1. Run the command below to evaluate the retrieval results:

    $ python evaluate.py --dataset preprocessed_data --model results

where: --dataset is the path to the preprocessed dataset; --model is the path to the trained model directory
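The README does not spell out the retrieval metric used by `evaluate.py`. A common protocol for document models on 20 Newsgroups, assumed here for illustration only, treats each test document as a query, ranks the training documents by cosine similarity of their vectors, and reports the fraction of the top-k results that share the query's label (precision at k). The function names below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def precision_at_k(query_vecs, query_labels, doc_vecs, doc_labels, k):
    """Mean fraction of each query's k most similar documents that share its label."""
    precisions = []
    for q_vec, q_label in zip(query_vecs, query_labels):
        # Rank all documents by similarity to the query, most similar first.
        ranked = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(q_vec, doc_vecs[i]),
                        reverse=True)
        hits = sum(1 for i in ranked[:k] if doc_labels[i] == q_label)
        precisions.append(hits / k)
    return sum(precisions) / len(precisions)
```

Sweeping `k` from a few documents up to the full training set traces a precision curve, which is a typical way such retrieval results are reported.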

Extracting document vectors

  1. Run the command below to extract document vectors, which are saved in NumPy text format to the model directory:

    $ python vectors.py --dataset preprocessed_data --model results

where: --dataset is the path to the preprocessed dataset; --model is the path to the trained model directory
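Files in NumPy text format hold one row of whitespace-separated numbers per document, so they can be loaded with `numpy.loadtxt`. The sketch below loads such a file and finds the most similar documents to a given row by cosine similarity. The helper names are hypothetical, and the actual file names written by `vectors.py` should be checked in the model directory.

```python
import numpy as np


def load_vectors(path):
    """Load a NumPy text file as a 2-D array with one row per document."""
    return np.loadtxt(path, ndmin=2)


def nearest_neighbours(vectors, index, k=5):
    """Return indices of the k rows most cosine-similar to row `index`, excluding itself."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid dividing zero rows by zero
    sims = unit @ unit[index]
    sims[index] = -np.inf  # never return the query document itself
    return np.argsort(-sims)[:k]
```

For example, `nearest_neighbours(load_vectors(path), 0)` lists the five documents closest to the first one, where `path` points at one of the vector files in the `results` directory.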

Note: The base code is inspired by https://github.com/AYLIEN/adversarial-document-model

Contributors

harshirao