
GETM: Graph Embedded Topic Model

A generative model that integrates the embedded topic model (ETM) [2] and node2vec [1].

A detailed description of GETM and its application to UK Biobank data can be found here.

Contents

1 Model Overview

(a) GETM training. GETM is a variational autoencoder (VAE). The neural-network encoder takes each individual's condition and medication information as input and outputs the variational mean μ and variance σ² of the patient topic mixture θ. The decoder is linear and consists of two tri-factorizations: one learns the medication-defined topic embedding α(med) and the medication embedding ρ(med); the other learns the condition-specific topic embedding α(cond) and the condition embedding ρ(cond). We separately pre-train (b) the medication embedding ρ(med) and (c) the condition embedding ρ(cond) with node2vec, based on their structural meta-information. This is done by learning node embeddings that maximize the likelihood of each node's random-walk neighborhood in the tree-structured relational graphs of conditions and medications.
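The decoder described above can be illustrated with a minimal NumPy sketch of one forward pass and its loss. It assumes ETM-style logistic-normal topic proportions as in [2] (θ = softmax(δ) with δ ~ N(μ, σ²), sampled by the reparameterization trick) and a per-topic softmax over each vocabulary, so each tri-factorization is θ · softmax(α ρᵀ). The function name `getm_forward_sketch` and its arguments are hypothetical; the repository's scripts may organize this differently.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def getm_forward_sketch(x_med, x_cond, mu, logvar,
                        alpha_med, rho_med, alpha_cond, rho_cond, rng):
    """Single-sample negative ELBO per individual (hypothetical sketch).

    x_med: (N, V_med) medication counts; x_cond: (N, V_cond) condition counts
    mu, logvar: (N, K) encoder outputs (mean and log-variance of delta)
    alpha_*: (K, L) topic embeddings; rho_*: (V, L) medication/condition embeddings
    """
    # Reparameterization: delta = mu + sigma * eps, theta = softmax(delta)
    eps = rng.standard_normal(mu.shape)
    theta = softmax(mu + np.exp(0.5 * logvar) * eps, axis=1)   # (N, K)

    # Topic-by-vocabulary distributions for each modality: softmax(alpha rho^T)
    beta_med = softmax(alpha_med @ rho_med.T, axis=1)           # (K, V_med)
    beta_cond = softmax(alpha_cond @ rho_cond.T, axis=1)        # (K, V_cond)

    # Tri-factorization theta @ beta gives per-individual word probabilities
    p_med = theta @ beta_med                                    # (N, V_med)
    p_cond = theta @ beta_cond                                  # (N, V_cond)

    # Reconstruction: negative multinomial log-likelihood of both bags of words
    nll = (-(x_med * np.log(np.maximum(p_med, 1e-10))).sum(axis=1)
           - (x_cond * np.log(np.maximum(p_cond, 1e-10))).sum(axis=1))

    # Closed-form KL( N(mu, sigma^2) || N(0, I) )
    kl = -0.5 * (1 + logvar - mu ** 2 - np.exp(logvar)).sum(axis=1)
    return nll + kl
```

Sharing θ across both decoders is what couples medications and conditions: the same topic mixture must explain both bags of words.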

2 Dependencies

The dependencies are listed in scripts/requirements.txt. From the scripts/ directory, install them with:

pip install -r requirements.txt

3 Usage and Running Examples

Data format

  • GETM takes as input a bag-of-words NumPy matrix of individuals by medications plus conditions, a medication embedding matrix, and a condition embedding matrix.
  • node2vec requires an edge-list text file with one edge per line, in the format: node1 node2.
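Both inputs can be prepared with a few lines of Python. The sketch below assumes each individual's record is given as lists of 0-based medication and condition indices, and that medication columns come before condition columns (an assumption suggested by the order of --vocab_size1 and --vocab_size2, not confirmed by this README). The helpers `build_bow_matrix` and `write_edgelist` are hypothetical, and how the scripts expect the matrix to be stored under --data_path is not described here.

```python
import numpy as np

def build_bow_matrix(records, n_med, n_cond):
    """Return an (N, n_med + n_cond) count matrix.

    records: list of (med_ids, cond_ids) pairs, one per individual, 0-based ids.
    Columns 0..n_med-1 hold medications, the remaining columns hold conditions
    (assumed ordering).
    """
    X = np.zeros((len(records), n_med + n_cond), dtype=np.int64)
    for i, (meds, conds) in enumerate(records):
        for m in meds:
            X[i, m] += 1               # count repeated medications
        for c in conds:
            X[i, n_med + c] += 1       # conditions are offset by n_med
    return X

def write_edgelist(edges, path):
    """Write one 'node1 node2' pair per line, readable by nx.read_edgelist."""
    with open(path, "w") as f:
        for u, v in edges:
            f.write(f"{u} {v}\n")
```

For example, `build_bow_matrix([([0, 2, 2], [1])], n_med=3, n_cond=2)` yields the single row `[1, 0, 2, 0, 1]`.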

Command examples

  • Get node embedding with node2vec
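import networkx as nx
from node2vec import Node2Vec
# graph_file, dimensions, walk_length, num_walks and workers are user-supplied settings
# Build the graph from an edge-list text file ("node1 node2" per line)
G = nx.read_edgelist(graph_file, nodetype=int, create_using=nx.DiGraph())
for u, v in G.edges():
    G[u][v]['weight'] = 1  # every edge gets unit weight
G = G.to_undirected()
# Generate random walks with node2vec
node2vec = Node2Vec(G, dimensions=dimensions, walk_length=walk_length,
                    num_walks=num_walks, workers=workers)
# Fit a skip-gram model on the walks; vectors are in model.wv, keyed by str(node)
model = node2vec.fit(min_count=1)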
  • Commands to run getm

    • Run getm without masking test information

    python main_multi_etm_sep.py --epochs=10 --lr=0.01 --batch_size=100 \
        --save_path="acute2chronic_results/results_m802c443_topic128" \
        --vocab_size1=802 --vocab_size2=443 --data_path="data/drug802_cond443" \
        --num_topics=128 --rho_size=128 --emb_size=128 --t_hidden_size=128 --enc_drop=0.0 \
        --train_embeddings1=0 --embedding1="drug_emb.npy" \
        --train_embeddings2=0 --embedding2="code_emb.npy" \
        --rho_fixed1=1 --rho_fixed2=1

    --data_path: path to the input data, stored as bag-of-words for each feature
    --vocab_size1: number of unique medications
    --vocab_size2: number of unique conditions
    --train_embeddings1: whether to initialize the medication embedding randomly
    --train_embeddings2: whether to initialize the condition embedding randomly
    --embedding1: path to the pretrained medication embedding
    --embedding2: path to the pretrained condition embedding
    --rho_fixed1: whether to keep the medication embedding fixed during training
    --rho_fixed2: whether to keep the condition embedding fixed during training

    • Run getm with partial test information masked

    python main_multi_etm_rec.py ...
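The node2vec step and the getm commands are linked through the .npy files passed as --embedding1 and --embedding2. A minimal sketch for exporting fitted node2vec vectors to such a file is shown below. It assumes row i of the matrix must correspond to vocabulary index i, and that only the graph nodes matching vocabulary entries are exported (a tree-structured graph may also contain internal category nodes). The helper `embeddings_to_matrix` is hypothetical; the vector dimension presumably has to agree with --rho_size.

```python
import numpy as np

def embeddings_to_matrix(vectors, node_ids):
    """Stack node vectors into a (len(node_ids), dim) float32 matrix.

    vectors: mapping from string node key to a 1-D vector, e.g. the gensim
             KeyedVectors in model.wv (node2vec keys nodes by str(node))
    node_ids: node IDs in the order of the vocabulary columns (assumed)
    """
    rows = [np.asarray(vectors[str(n)], dtype=np.float32) for n in node_ids]
    return np.stack(rows)

# Hypothetical usage after fitting node2vec on the medication graph:
# emb = embeddings_to_matrix(model.wv, range(802))   # 802 = --vocab_size1
# np.save("drug_emb.npy", emb)                       # loaded via --embedding1
```

Selecting rows explicitly by ID, rather than dumping the model's vocabulary order, keeps the embedding rows aligned with the bag-of-words columns.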

4 File Description

5 References

[1] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. CoRR, abs/1607.00653, 2016.
[2] Adji B. Dieng, Francisco J. R. Ruiz, and David M. Blei. Topic modeling in embedding spaces. arXiv preprint arXiv:1907.04907, 2019.

Contributors

yueningwang
