A generative model that integrates the embedded topic model (ETM) and node2vec.
A detailed description and its application to the UK Biobank can be found here
(a) GETM training. GETM is a variational autoencoder (VAE) model. The neural network encoder takes individuals' condition and medication information as input and produces the variational mean μ and variance σ² for the patient topic mixture θ. The decoder is linear and consists of two tri-factorizations: one learns the medication-defined topic embedding α(med) and the medication embedding ρ(med); the other learns the condition-specific topic embedding α(cond) and the condition embedding ρ(cond). We separately pre-train (b) the medication embedding ρ(med) and (c) the condition embedding ρ(cond) using node2vec on their structural meta-information. This is done by learning the node embedding that maximizes the likelihood of the tree-structured relational graphs of conditions and medications.
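The linear decoder described above can be sketched in numpy as follows. This is an illustrative sketch, not code from the GETM repository: all sizes, variable names, and the random initialization are made up; each topic's distribution over a vocabulary is obtained by a softmax over the inner products of topic and feature embeddings, as in ETM.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: K topics, L embedding dims, V_med meds, V_cond conditions
K, L, V_med, V_cond = 4, 8, 10, 12
rng = np.random.default_rng(0)

alpha_med = rng.normal(size=(K, L))      # medication-defined topic embedding α(med)
rho_med = rng.normal(size=(V_med, L))    # medication embedding ρ(med)
alpha_cond = rng.normal(size=(K, L))     # condition-specific topic embedding α(cond)
rho_cond = rng.normal(size=(V_cond, L))  # condition embedding ρ(cond)

# Tri-factorization: topics over each vocabulary, β = softmax(α ρᵀ)
beta_med = softmax(alpha_med @ rho_med.T, axis=1)     # K x V_med
beta_cond = softmax(alpha_cond @ rho_cond.T, axis=1)  # K x V_cond

theta = softmax(rng.normal(size=(1, K)), axis=1)  # one patient's topic mixture θ
p_med = theta @ beta_med    # reconstruction probabilities over medications
p_cond = theta @ beta_cond  # reconstruction probabilities over conditions
```

Because θ and each row of β are probability distributions, the reconstructed probabilities over each vocabulary also sum to one.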
The requirements.txt is located in scripts/requirements.txt:

```shell
pip install -r scripts/requirements.txt
```
- GETM takes three inputs: a bag-of-words individual-by-(medication+condition) numpy matrix, a medication embedding matrix, and a condition embedding matrix.
- node2vec requires a text file with one edge per line in the format: `node1 node2`.
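The edge-list format above can be produced with a few lines of plain Python. This is a minimal sketch; the file name and node IDs are made up for illustration.

```python
# Parent-child pairs from a tree-structured ontology (hypothetical IDs)
edges = [(0, 1), (0, 2), (1, 3)]

# Write one "node1 node2" edge per line, as node2vec expects
with open("graph_file.txt", "w") as f:
    for parent, child in edges:
        f.write(f"{parent} {child}\n")
```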
- Get node embeddings with node2vec

```python
import networkx as nx
from node2vec import Node2Vec

# Build the graph from the edge-list text file
G = nx.read_edgelist(graph_file, nodetype=int, create_using=nx.DiGraph())
for edge in G.edges():
    G[edge[0]][edge[1]]['weight'] = 1
G = G.to_undirected()

# Run node2vec and fit the skip-gram model to the sampled walks
node2vec = Node2Vec(G, dimensions=dimensions, walk_length=walk_length,
                    num_walks=num_walks, workers=workers)
model = node2vec.fit(window=10, min_count=1)
embeddings = model.wv  # per-node embedding vectors
```
Commands to run getm
- Run GETM without masking test information

```shell
python main_multi_etm_sep.py --epochs=10 --lr=0.01 --batch_size=100 \
    --save_path="acute2chronic_results/results_m802c443_topic128" \
    --vocab_size1=802 --vocab_size2=443 --data_path="data/drug802_cond443" \
    --num_topics=128 --rho_size=128 --emb_size=128 --t_hidden_size=128 --enc_drop=0.0 \
    --train_embeddings1=0 --embedding1="drug_emb.npy" \
    --train_embeddings2=0 --embedding2="code_emb.npy" --rho_fixed1=1 --rho_fixed2=1
```
- `--data_path`: path for loading input data in the form of bag-of-words for each feature
- `--vocab_size1`: number of unique medications
- `--vocab_size2`: number of unique conditions
- `--train_embeddings1`: whether to randomly initialize the medication embedding (0 to use a pretrained embedding)
- `--train_embeddings2`: whether to randomly initialize the condition embedding (0 to use a pretrained embedding)
- `--embedding1`: path to the pretrained medication embedding
- `--embedding2`: path to the pretrained condition embedding
- `--rho_fixed1`: whether to fix the medication embedding during training
- `--rho_fixed2`: whether to fix the condition embedding during training

- Run GETM with partial test information masked
```shell
python main_multi_etm_rec.py ...
```
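Conceptually, the masked evaluation hides the medication portion of the test bag-of-words matrix and asks the model to reconstruct it from conditions alone. A minimal numpy sketch of the masking step, with made-up matrix sizes matching the vocabulary sizes in the example command:

```python
import numpy as np

# Hypothetical test matrix: individuals x (medications + conditions)
V_med, V_cond = 802, 443
rng = np.random.default_rng(0)
X_test = rng.integers(0, 2, size=(5, V_med + V_cond))

# Zero out all medication columns; condition columns stay visible
X_masked = X_test.copy()
X_masked[:, :V_med] = 0
```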
- subjob
  - run_metm.sh: bash script to run the job
- scripts
  - multi_etm_sep.py: GETM model script
  - main_multi_etm_sep.py: script to instantiate a GETM model, then train and evaluate it
  - main_multi_etm_rec.py: script to instantiate a GETM model, then train and evaluate it with the test medications fully masked
  - etm.py: ETM model script
  - main.py: script to instantiate an ETM model, then train and evaluate it
[1] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. CoRR, abs/1607.00653, 2016.

[2] Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei. Topic modeling in embedding spaces. arXiv preprint arXiv:1907.04907, 2019.