Planetoid

Introduction

This is an implementation of Planetoid, a graph-based semi-supervised learning method proposed in the following paper:

Revisiting Semi-Supervised Learning with Graph Embeddings. Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov. ICML 2016.

Please cite the above paper if you use the datasets or code in this repo.

Run the demo

We include the Citeseer dataset in the directory data, where the data structures needed are pickled.

To run the transductive version,

python test_trans.py

To run the inductive version,

python test_ind.py

You can refer to test_trans.py and test_ind.py for example usages of our model.

Models

The models are implemented mainly in trans_model.py (transductive) and ind_model.py (inductive), both of which inherit from base_model.py. Refer to the source files for detailed API documentation.

Prepare the data

Transductive learning

The input to the transductive model contains:

  • x, the feature vectors of the training instances,
  • y, the one-hot labels of the training instances,
  • graph, a dict in the format {index: [index_of_neighbor_nodes]}, where the neighbor nodes are organized as a list. The current version only supports binary (unweighted) graphs.

Let L be the number of training instances. The indices in graph from 0 to L - 1 must correspond to the training instances, with the same order as in x.
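
As a minimal sketch of these conventions, the toy example below builds x, y, and graph for two training instances and four nodes. The values, the use of dense numpy arrays, and the helper check_transductive_inputs are illustrative assumptions, not part of this repo.

```python
import numpy as np

# Toy data (hypothetical values): L = 2 training instances, 4 nodes in total.
x = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])   # feature vectors of the training instances
y = np.array([[1, 0],
              [0, 1]])            # one-hot labels, one row per row of x

# Adjacency lists. Indices 0 .. L-1 refer to the rows of x, in the same order;
# indices >= L are the remaining nodes of the graph.
graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0],
    3: [1],
}

def check_transductive_inputs(x, y, graph):
    """Illustrative sanity check for the input conventions (not a repo API)."""
    num_train = x.shape[0]
    assert y.shape[0] == num_train, "x and y must have the same number of rows"
    assert np.all(y.sum(axis=1) == 1), "each label row must be one-hot"
    for node, neighbors in graph.items():
        assert isinstance(neighbors, list), "neighbors must be stored as a list"
        assert all(isinstance(j, int) and j >= 0 for j in neighbors)
    return num_train

num_train = check_transductive_inputs(x, y, graph)
```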

Inductive learning

The input to the inductive model contains:

  • x, the feature vectors of the labeled training instances,
  • y, the one-hot labels of the labeled training instances,
  • allx, the feature vectors of both labeled and unlabeled training instances (a superset of x),
  • graph, a dict in the format {index: [index_of_neighbor_nodes]}.

Let n be the number of both labeled and unlabeled training instances. These n instances should be indexed from 0 to n - 1 in graph with the same order as in allx.
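
The sketch below shows one way to build graph from an edge list so that its indices follow the row order of allx. The helper build_graph, the toy data, the treatment of edges as undirected, and the placement of the labeled rows at the start of allx are all assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np

def build_graph(edges, num_nodes):
    """Build {index: [neighbor indices]} from (i, j) pairs (hypothetical helper).

    Assumption: edges are undirected, so each pair is stored in both directions.
    """
    graph = defaultdict(list)
    for i, j in edges:
        if not (0 <= i < num_nodes and 0 <= j < num_nodes):
            raise ValueError("edge (%d, %d) is out of range" % (i, j))
        graph[i].append(j)
        graph[j].append(i)
    return dict(graph)

# Toy data: n = 5 instances in total, of which 2 are labeled.
allx = np.random.RandomState(0).rand(5, 3)
x = allx[:2]                      # assumption: labeled rows come first in allx
y = np.array([[1, 0], [0, 1]])    # one-hot labels for the rows of x

edges = [(0, 2), (1, 3), (3, 4)]  # node i corresponds to row i of allx
graph = build_graph(edges, allx.shape[0])
```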

Preprocessed datasets

Datasets for Citeseer, Cora, and Pubmed are available in the directory data, in a preprocessed format stored as numpy/scipy files.

The dataset for DIEL is available at http://www.cs.cmu.edu/~lbing/data/emnlp-15-diel/emnlp-15-diel.tar.gz. We also provide a much more succinct version of the dataset, containing only the necessary files and some (not very well-organized) pre-processing code, at http://cs.cmu.edu/~zhiliny/data/diel_data.tar.gz.

The NELL dataset is available at http://www.cs.cmu.edu/~zhiliny/data/nell_data.tar.gz.

In addition to x, y, allx, and graph as described above, the preprocessed datasets also include:

  • tx, the feature vectors of the test instances,
  • ty, the one-hot labels of the test instances,
  • test.index, the indices of test instances in graph, for the inductive setting,
  • ally, the labels for instances in allx.

The indices of test instances in graph for the transductive setting are from #x to #x + #tx - 1, where #x and #tx denote the number of instances in x and tx, with the same order as in tx.
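
For instance, the transductive test indices can be computed directly from the sizes of x and tx. The function name below is a hypothetical helper, not something defined in this repo.

```python
def transductive_test_indices(num_x, num_tx):
    """Return the graph indices of the test instances in the transductive setting.

    num_x and num_tx are the number of rows in x and tx; the result follows
    the row order of tx.
    """
    return list(range(num_x, num_x + num_tx))

# With 3 training instances and 2 test instances, the test nodes are 3 and 4.
example_indices = transductive_test_indices(3, 2)
```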

In Python 2, you can use cPickle.load(open(filename)) to load the numpy/scipy objects x, y, tx, ty, allx, ally, and graph. test.index is stored as a text file.
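
The cPickle module exists only in Python 2. A rough Python 3 equivalent is sketched below. The helper names are hypothetical, the paths are left to the caller, and the one-integer-per-line layout of test.index is an assumption. Unpickling the numpy/scipy objects also requires numpy and scipy to be installed.

```python
import pickle

def load_pickled(path):
    """Load one pickled object (x, y, tx, ty, allx, ally, or graph)."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        return pickle.load(f, encoding="latin1")

def load_test_index(path):
    """Read test.index, assuming one integer index per line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```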

Hyper-parameter tuning

Refer to test_ind.py and test_trans.py for the definitions of the hyper-parameters (passed as arguments). Hyper-parameters are tuned by randomly shuffling the training/test split (i.e., randomly shuffling the indices in x, y, tx, ty, and graph). For the DIEL dataset, we tune the hyper-parameters on one of the ten runs, and then keep the same hyper-parameters for all ten runs.
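
One possible reading of the shuffling step is sketched below for the transductive index layout. It is not the procedure used in this repo's experiments. The function shuffle_split is hypothetical and assumes dense numpy arrays, with rows of x at graph indices 0 to #x - 1 and rows of tx at #x to #x + #tx - 1.

```python
import numpy as np

def shuffle_split(x, y, tx, ty, graph, seed=0):
    """Randomly reassign labeled instances to training and test (a sketch).

    Nodes with index >= #x + #tx (unlabeled nodes) keep their indices.
    """
    num_x, num_tx = x.shape[0], tx.shape[0]
    perm = np.random.RandomState(seed).permutation(num_x + num_tx)

    # Pool training and test rows, then reorder them with the permutation.
    feats = np.vstack([x, tx])[perm]
    labels = np.vstack([y, ty])[perm]

    # old_to_new[old] is the new index of the node previously at `old`.
    old_to_new = {int(old): new for new, old in enumerate(perm)}

    def relabel(i):
        return old_to_new.get(i, i)

    # Rewrite the graph so its indices match the new row order.
    new_graph = {relabel(i): [relabel(j) for j in nbrs]
                 for i, nbrs in graph.items()}

    return (feats[:num_x], labels[:num_x],
            feats[num_x:], labels[num_x:], new_graph)
```

The sizes of the training and test sets are unchanged; only the assignment of labeled instances to each set varies with the seed.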
