Coder Social home page Coder Social logo

weaksupervised-graphemb-userrep's Introduction

WeakSupervised-GraphEmb-UserRep

This repository contains code for [ Twitter User Representation using Weakly Supervised Graph Embedding ], ICWSM 2022.

Data:

  1. Please download the 'data' folder from following link (on request):

[Data]

  1. 'data' folder should be kept inside the 'WeakSupervised-GraphEmb-UserRep/code' folder.

Computing Machine:

Supermicro SuperServer 7046GT-TR

Two Intel Xeon X5690 3.46GHz 6-core processors

48 GB of RAM

Linux operating system

Four NVIDIA GeForce GTX 1060 GPU cards for CUDA/OpenCL programming

Software Packages and libraries:

python 3.6.6

PyTorch 1.1.0

jupiter notebook

pandas

pickle

gensim

nltk

nltk.tag

spacy

emoji

sklearn

statsmodels

scipy

matplotlib

numpy

preprocessor

transformers

Code:

For Yoga:

  1. Create graph embedding (Input files: weakly_all_des_gt_mergetweets_yoga_13k.csv, mention_yoga.pickle, Output files: data_yoga_des_net.pickle, processed_yoga_data_des_net.pickle, yoga_graph_des_net.adjlist, yoga_graph_des_net.mapping) :
create_graph_des_@mention_weak_yoga_13k.ipynb 

  1. For Embedding learning (Input files: data_yoga_des_net.pickle, Output files: b_model_des_net_yoga.m, predicted_utypes_des_net_yoga.pickle, user_des_net_yoga.embeddings):

yoga_graph_des_@mention_embd.ipynb


  1. For M step, run iteratively following codes:

For first iteration, run following code


EM_yoga_graph_des_@mention_embd.ipynb

With Input files: processed_yoga_data_des_net.pickle, user_des_net_yoga.embeddings, weakly_all_des_gt_mergetweets_yoga_13k.csv, Output files: weakly_all_des_gt_mergetweets_yoga_13k_em1_des_net.csv

Then run following code:


EM_create_graph_des_@mention_weak_yoga_13k.ipynb

With Input files: weakly_all_des_gt_mergetweets_yoga_13k_em1_des_net.csv, mention_yoga.pickle, Output files: data_yoga_des_net_em1.pickle, yoga_graph_des_net_em1.adjlist, yoga_graph_des_net_em1.mapping

For second iteration, run following code


EM_yoga_graph_des_@mention_embd.ipynb

With Input files: data_yoga_des_net_em1.pickle, processed_yoga_data_des_net.pickle, user_des_net_yoga_em1.embeddings, weakly_all_des_gt_mergetweets_yoga_13k_em1_des_net.csv, Output files: b_model_des_net_yoga_em1.m, predicted_utypes_des_net_yoga_em1.pickle, user_des_net_yoga_em1.embeddings, weakly_all_des_gt_mergetweets_yoga_13k_em2_des_net.csv

Then run following code:


EM_create_graph_des_@mention_weak_yoga_13k.ipynb

With Input files: weakly_all_des_gt_mergetweets_yoga_13k_em2_des_net.csv, mention_yoga.pickle, Output files: data_yoga_des_net_em2.pickle, yoga_graph_des_net_em2.adjlist, yoga_graph_des_net_em2.mapping

For third iteration, run following code


EM_yoga_graph_des_@mention_embd.ipynb

With Input files: data_yoga_des_net_em2.pickle, processed_yoga_data_des_net.pickle, user_des_net_yoga_em2.embeddings, weakly_all_des_gt_mergetweets_yoga_13k_em2_des_net.csv, Output files: b_model_des_net_yoga_em2.m, predicted_utypes_des_net_yoga_em2.pickle, user_des_net_yoga_em2.embeddings, weakly_all_des_gt_mergetweets_yoga_13k_em3_des_net.csv

  1. Supervised Baseline for yoga:

    4.1) For LSTM_Glove, run

    baseline_glove_lstm_yoga_13k_weak.ipynb
    
    
    

    4.2) For fine-tuned BERT, run

    baseline_BERT_finetuned_tweet_yoga_13k_weak.ipynb
    
    
    

    For 3-fold cross-validation, randomly splitted input files are: For training: baseline_train_yoga_weak_cv1.csv, baseline_train_yoga_weak_cv2.csv, baseline_train_yoga_weak_cv3.csv For validation: baseline_val_yoga_weak_cv1.csv, baseline_val_yoga_weak_cv2.csv, baseline_val_yoga_weak_cv3.csv For testing: baseline_test_yoga_weak_cv1.csv, baseline_test_yoga_weak_cv2.csv, baseline_test_yoga_weak_cv3.csv

  2. Visualize embeddings after label propagation: (input files: user_des_net_yoga.embeddings and predicted_yoga_13k_lp.csv )

    visualize_graph_learning_embeddings_yoga.ipynb
    
    
    
  3. Visualize embeddings after EM: (input files: user_des_net_yoga_em2.embeddings and predicted_yoga_13k.csv )

    visualize_embeddings_EM2_yoga.ipynb
    
    

For Keto:

  1. Create graph embedding (Input files: weakly_all_des_mergetweets_keto_gt_14k.csv, mention_keto.pickle, Output files: data_keto_des_net.pickle, processed_keto_data_des_net.pickle, keto_graph_des_net.adjlist, keto_graph_des_net.mapping) :
create_graph_des_@mention_weak_keto_14k.ipynb 

  1. For Embedding learning (Input files: data_keto_des_net.pickle, Output files: b_model_des_net_keto.m, predicted_utypes_des_net_keto.pickle, user_des_net_keto.embeddings):

keto_graph_des_@mention_embd.ipynb


  1. For M step, run iteratively following codes:

For first iteration, run following code


EM_keto_graph_des_@mention_embd.ipynb

With Input files: processed_keto_data_des_net.pickle, user_des_net_keto.embeddings, weakly_all_des_mergetweets_keto_gt_14k.csv, Output files: weakly_all_des_mergetweets_keto_gt_14k_em1_des_net.csv

Then run following code:


EM_create_graph_des_@mention_weak_keto_14k.ipynb

With Input files: weakly_all_des_mergetweets_keto_gt_14k_em1_des_net.csv, mention_keto.pickle, Output files: data_keto_des_net_em1.pickle, keto_graph_des_net_em1.adjlist, keto_graph_des_net_em1.mapping

For second iteration, run following code


EM_keto_graph_des_@mention_embd.ipynb

With Input files: data_keto_des_net_em1.pickle, processed_keto_data_des_net.pickle, user_des_net_keto_em1.embeddings, weakly_all_des_mergetweets_keto_gt_14k_em1_des_net.csv, Output files: b_model_des_net_keto_em1.m, predicted_utypes_des_net_keto_em1.pickle, user_des_net_keto_em1.embeddings, weakly_all_des_mergetweets_keto_gt_14k_em2_des_net.csv

Then run following code:


EM_create_graph_des_@mention_weak_keto_14k.ipynb

With Input files: weakly_all_des_mergetweets_keto_gt_14k_em2_des_net.csv, mention_yoga.pickle, Output files: data_keto_des_net_em2.pickle, keto_graph_des_net_em2.adjlist, keto_graph_des_net_em2.mapping

For third iteration, run following code


EM_keto_graph_des_@mention_embd.ipynb

With Input files: data_keto_des_net_em2.pickle, processed_keto_data_des_net.pickle, user_des_net_keto_em2.embeddings, weakly_all_des_mergetweets_keto_gt_14k_em2_des_net.csv, Output files: b_model_des_net_keto_em2.m, predicted_utypes_des_net_keto_em2.pickle, user_des_net_keto_em2.embeddings, weakly_all_des_mergetweets_keto_gt_14k_em3_des_net.csv

Then run following code:


EM_create_graph_des_@mention_weak_keto_14k.ipynb

With Input files: weakly_all_des_mergetweets_keto_gt_14k_em3_des_net.csv, mention_yoga.pickle, Output files: data_keto_des_net_em3.pickle, keto_graph_des_net_em3.adjlist, keto_graph_des_net_em3.mapping

For fourth iteration, run following code


EM_keto_graph_des_@mention_embd.ipynb

With Input files: data_keto_des_net_em3.pickle, processed_keto_data_des_net.pickle, user_des_net_keto_em3.embeddings, weakly_all_des_mergetweets_keto_gt_14k_em3_des_net.csv, Output files: b_model_des_net_keto_em3.m, predicted_utypes_des_net_keto_em3.pickle, user_des_net_keto_em3.embeddings, weakly_all_des_mergetweets_keto_gt_14k_em4_des_net.csv

  1. Supervised Baseline for keto:

    4.1) For LSTM_Glove, run

    baseline_glove_lstm_keto_14k_weak.ipynb
    
    

    4.2) For fine-tuned BERT, run

    baseline_BERT_finetuned_tweet_keto_14k_weak.ipynb
    
    

    For 3-fold cross-validation, randomly split input files are: For training: baseline_train_keto_weak_cv1.csv, baseline_train_keto_weak_cv2.csv, baseline_train_keto_weak_cv3.csv For validation: baseline_val_keto_weak_cv1.csv, baseline_val_keto_weak_cv2.csv, baseline_val_keto_weak_cv3.csv For testing: baseline_test_keto_weak_cv1.csv, baseline_test_keto_weak_cv2.csv, baseline_test_keto_weak_cv3.csv

  2. Visualize embeddings after label propagation: (input files: user_des_net_keto.embeddings and predicted_keto_14k_lp.csv )

    
    visualize_graph_learning_embeddings_keto.ipynb
    
    
    
  3. Visualize embeddings after EM: (input files: user_des_net_keto_em3.embeddings and predicted_keto_14k.csv )

    
    visualize_embeddings_EM3_keto.ipynb
    
    
    

Citation:

If you find the paper useful in your work, please cite:

@inproceedings{islam2022twitter,
  title={Twitter User Representation using Weakly Supervised Graph Embedding},
  author={Islam, Tunazzina and Goldwasser, Dan},
  booktitle={Proceedings of the International AAAI Conference on Web and Social Media},
  volume={16},
  pages={358--369},
  year={2022}
}

weaksupervised-graphemb-userrep's People

Contributors

tunazislam avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.