
CS790-TopoAE_Classification

The biggest, most separable pieces we can work on right now are recreating their results (A-1) and preprocessing our graph datasets (B-1-a).

Core question: does unsupervised pretraining that reconstructs graph-structured data while preserving the topology of the data within the latent space enhance our ability to classify graph-structured data?
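Concretely, this means training an autoencoder whose objective combines the usual reconstruction term with a term that penalizes differences between the topological signatures of a mini-batch in input space and in latent space, as in the Topological Autoencoders paper. Below is a rough, simplified sketch of that idea restricted to the 0-dimensional signature (whose persistence pairs are just the edges of a minimum spanning tree); the function names are illustrative and not the repo's API.

    # Simplified sketch of a topology-preserving autoencoder loss (0-dimensional
    # signature only). The MST edges of the mini-batch distance matrix stand in
    # for the persistence pairings; edge indices are treated as constants while
    # the distances themselves stay differentiable.
    import numpy as np
    import torch
    from scipy.sparse.csgraph import minimum_spanning_tree

    def mst_edges(dist):
        """Return a (k, 2) array of MST edge indices of a dense distance matrix."""
        mst = minimum_spanning_tree(dist).tocoo()
        return np.stack([mst.row, mst.col], axis=1)

    def topological_loss(x, z):
        """Match MST edge lengths between the input batch x and its latent codes z."""
        dx = torch.cdist(x.flatten(1), x.flatten(1))  # pairwise distances, input space
        dz = torch.cdist(z, z)                        # pairwise distances, latent space
        px = torch.as_tensor(mst_edges(dx.detach().cpu().numpy())).long().to(x.device)
        pz = torch.as_tensor(mst_edges(dz.detach().cpu().numpy())).long().to(x.device)
        loss_xz = ((dx[px[:, 0], px[:, 1]] - dz[px[:, 0], px[:, 1]]) ** 2).sum()
        loss_zx = ((dz[pz[:, 0], pz[:, 1]] - dx[pz[:, 0], pz[:, 1]]) ** 2).sum()
        return 0.5 * (loss_xz + loss_zx)

    def total_loss(x, x_recon, z, lam=1.0):
        """Reconstruction term plus topology-preservation term."""
        return torch.nn.functional.mse_loss(x_recon, x) + lam * topological_loss(x, z)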

A) Test with image data first

  1. Recreate the results from Topological Autoencoders on MNIST/CIFAR10; I don't think we need to worry about the spheres dataset. a) We can use the other models, with and without the topology loss they used, as baselines. b) Training should just be a matter of navigating to the correct directory and running the commands below; follow their installation instructions for requirements.txt first.

When running, output will be stored in the directory given by --file_storage; please make sure this is happening properly. The model parameters saved there are what we will use for classification.

 python -m exp.train_model with experiments/train_model/best_runs/MNIST/TopoRegEdgeSymmetric.json device='cuda' --file_storage=BASEDIR

You might need to change the device, e.g. to run on CPU:

 python -m exp.train_model with experiments/train_model/best_runs/MNIST/TopoRegEdgeSymmetric.json device='cpu' --file_storage=BASEDIR

[ I need help with training these runs; my laptop is a toaster ]

  1. TopoRegEdgeSymmetric - MNIST

     python -m exp.train_model with experiments/train_model/best_runs/MNIST/TopoRegEdgeSymmetric.json device='cuda' --file_storage=TopoRegEdgeSymmetric_MNIST
    
  2. LinearAE-TopoRegEdgeSymmetric - MNIST

    python -m exp.train_model with experiments/train_model/best_runs/MNIST/LinearAE-TopoRegEdgeSymmetric.json device='cuda' --file_storage=LinearAE-TopoRegEdgeSymmetric_MNIST
    
  3. Vanilla - MNIST

     python -m exp.train_model with experiments/train_model/best_runs/MNIST/Vanilla.json device='cpu' --file_storage=Vanilla_MNIST
    
  4. TopoRegEdgeSymmetric - FashionMNIST

     python -m exp.train_model with experiments/train_model/best_runs/FashionMNIST/TopoRegEdgeSymmetric.json device='cuda' --file_storage=TopoRegEdgeSymmetric_FashionMNIST
    
  5. LinearAE-TopoRegEdgeSymmetric - FashionMNIST

    python -m exp.train_model with experiments/train_model/best_runs/FashionMNIST/LinearAE-TopoRegEdgeSymmetric.json device='cuda' --file_storage=LinearAE-TopoRegEdgeSymmetric_FashionMNIST
    
  6. Vanilla - FashionMNIST

    python -m exp.train_model with experiments/train_model/best_runs/FashionMNIST/Vanilla.json device='cpu' --file_storage=Vanilla_FashionMNIST
    
  7. TopoRegEdgeSymmetric - CIFAR

    python -m exp.train_model with experiments/train_model/best_runs/CIFAR/TopoRegEdgeSymmetric.json device='cuda' --file_storage=TopoRegEdgeSymmetric_CIFAR
    
  8. LinearAE-TopoRegEdgeSymmetric - CIFAR

    python -m exp.train_model with experiments/train_model/best_runs/CIFAR/LinearAE-TopoRegEdgeSymmetric.json device='cuda' --file_storage=LinearAE-TopoRegEdgeSymmetric_CIFAR
    
  9. Vanilla - CIFAR

    python -m exp.train_model with experiments/train_model/best_runs/CIFAR/Vanilla.json device='cpu' --file_storage=Vanilla_CIFAR
    
  10. Create a new model with just the encoder and a classification layer after the final layer of the encoder. a) Potentially project weights from the second-to-last layer as well; the final layer is usually only two units in their implementation, which might not be very informative for a linear classifier. Done: added to the DeepAE and LinearAE models as a classification layer.

  11. Fine-tune on image classification. a) Compare to random initialization. b) Compare to pretraining with the normal (reconstruction-only) loss. A rough sketch of this fine-tuning setup is given right after this list.
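A minimal sketch of what steps 10 and 11 might look like is below. `EncoderClassifier`, `latent_dim`, and the checkpoint path are placeholders for whatever the DeepAE/LinearAE encoders in the repo actually expose; passing pretrained_path=None gives the random-initialization baseline from 11a.

    # Sketch: reuse a pretrained encoder, attach a classification head, fine-tune.
    # The encoder module and checkpoint layout are placeholders, not the repo's API.
    import torch
    import torch.nn as nn

    class EncoderClassifier(nn.Module):
        def __init__(self, encoder, latent_dim, n_classes):
            super().__init__()
            self.encoder = encoder
            # Their final latent layer is often only two units, so the head could
            # instead be attached to the wider second-to-last layer (10a).
            self.head = nn.Linear(latent_dim, n_classes)

        def forward(self, x):
            return self.head(self.encoder(x))

    def build_classifier(encoder, latent_dim=2, n_classes=10, pretrained_path=None):
        """pretrained_path=None gives the random-initialization baseline (11a)."""
        if pretrained_path is not None:
            state = torch.load(pretrained_path, map_location="cpu")
            encoder.load_state_dict(state, strict=False)  # reuse autoencoder weights
        return EncoderClassifier(encoder, latent_dim, n_classes)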

B) Test with graph-structured data

  1. Test the topology-preserving loss for autoencoding: COLLAB, IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, REDDIT-5K, REDDIT-12K

a) Datasets we (might) need to preprocess: i) COLLAB ii) IMDB-BINARY iii) IMDB-MULTI iv) REDDIT-BINARY v) REDDIT-5K vi) REDDIT-12K (a loading sketch is given after this list)

b) Datasets preprocessed: i) None

  2. Fine-tune on network classification. a) Compare to random initialization. b) Compare to pretraining with the normal loss.

  3. We could use the same baselines for these as in "EndNote: Feature-based classification of networks".
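One possible route for the preprocessing in B-1-a, assuming we use PyTorch Geometric's TUDataset loader (the Reddit multi-class sets are named REDDIT-MULTI-5K / REDDIT-MULTI-12K in that collection). These social graphs ship without node attributes, so a constant dummy feature (or one-hot node degree) is a common stand-in:

    # Sketch of loading the social-network benchmarks with PyTorch Geometric.
    from torch_geometric.datasets import TUDataset
    from torch_geometric.transforms import Constant
    from torch_geometric.loader import DataLoader

    NAMES = ["COLLAB", "IMDB-BINARY", "IMDB-MULTI",
             "REDDIT-BINARY", "REDDIT-MULTI-5K", "REDDIT-MULTI-12K"]

    def load(name, root="data", batch_size=32):
        # Constant() attaches a dummy node feature, since these graphs have none.
        dataset = TUDataset(root=root, name=name, transform=Constant())
        return DataLoader(dataset, batch_size=batch_size, shuffle=True)

    loaders = {name: load(name) for name in NAMES}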

C) Generate figures

D) Prepare presentation

E) Write paper

Extras, if we finish the above with reasonable time remaining

Consistency Regularization / Data Augmentation / Perturbation

In "EndNote", referenced in B-3, they show that a Random Forest with a set of features that were various network statistics calculated for each graph they were able to obtain SOA results in graph classification for the 5 social network datasets listed above.

The features they use are listed below (feature label: description); see the paper's supplementary information for full details:

NumNodes: Number of nodes
NumEdges: Number of edges
NumTri: Number of triangles
ClustCoef: Global clustering coefficient
DegAssort: Degree assortativity coefficient (Newman, 2003)
AvgDeg: Average degree
FracF: Fraction of nodes that are female
FracMF: Fraction of edges that are male-female
AvgAgeDif: Average age difference (absolute value) over edges
FracSameZip: Fraction of edges that share the same ZIP code
DegPC1-4: Principal components of degree distribution
ClusPC1-4: Principal components of clustering distribution
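As a sketch, the purely structural features in that list can be computed per graph with networkx; the demographic ones (FracF, FracMF, AvgAgeDif, FracSameZip) need node metadata that the social-network benchmarks above don't provide, and the DegPC/ClusPC components would come from a PCA over per-graph degree and local-clustering histograms.

    # Structural features from the list above, computed with networkx.
    import networkx as nx
    import numpy as np

    def structural_features(g):
        degrees = np.array([d for _, d in g.degree()])
        return {
            "NumNodes": g.number_of_nodes(),
            "NumEdges": g.number_of_edges(),
            "NumTri": sum(nx.triangles(g).values()) // 3,  # each triangle counted 3x
            "ClustCoef": nx.transitivity(g),               # global clustering coefficient
            "DegAssort": nx.degree_assortativity_coefficient(g),
            "AvgDeg": degrees.mean(),
        }

These per-graph feature vectors could then be fed to scikit-learn's RandomForestClassifier to reproduce the baseline.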

Given that these features result in competitive performance, maybe we can perturb some of them and then apply some form of consistency regularization; we can explore this more once we finish the core. A small sketch of what that might look like is below.
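One way that could look, purely as an illustration (`model` and `perturb` are placeholders, not anything in the repo): penalize disagreement between the model's predictions on a graph's features and on a perturbed copy of them.

    # Sketch of a consistency-regularization term between clean and perturbed inputs.
    import torch.nn.functional as F

    def consistency_loss(model, features, perturb, temperature=1.0):
        logits_clean = model(features)
        logits_pert = model(perturb(features))
        return F.kl_div(F.log_softmax(logits_pert / temperature, dim=-1),
                        F.softmax(logits_clean / temperature, dim=-1),
                        reduction="batchmean")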

Another perturbation we can try is random cropping of graphs, similar to random cropping of images: we just select some number of nodes or edges and remove them from the graph. I think we will need to do this somewhat intelligently, though, if we want the perturbation to be informative.
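A rough sketch of that crop, with illustrative names and a fixed drop fraction; a smarter version could bias removal away from structurally important nodes so the crop preserves the graph's global shape.

    # Random "crop" of a graph: drop a random fraction of nodes or edges.
    import random
    import networkx as nx

    def random_crop(g, drop_frac=0.1, mode="nodes", seed=None):
        rng = random.Random(seed)
        h = g.copy()
        if mode == "nodes":
            k = int(drop_frac * h.number_of_nodes())
            h.remove_nodes_from(rng.sample(list(h.nodes()), k))
        else:
            k = int(drop_frac * h.number_of_edges())
            h.remove_edges_from(rng.sample(list(h.edges()), k))
        return h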

Apply the approach to a VAE.

I'm also open to other ideas if you have any. One of my co-mentors, Peter Mucha, worked on the paper mentioned above, so we can talk to him if we have any questions.
