
hyperbolics's Introduction

hyperbolics

Hyperbolic embedding implementations of Representation Tradeoffs for Hyperbolic Embeddings + product embedding implementations of Learning Mixed-Curvature Representations in Product Spaces

Hyperbolic embedding of binary tree

Setup

We use Docker to set up the environment for our code. See Docker/README.md for installation and launch instructions.

In this README, all instructions are assumed to be run inside the Docker container. All paths are relative to the /hyperbolics directory, and all commands are expected to be run from this directory.

Usage

The following programs and scripts expect the input graphs to exist in the /data/edges folder, e.g. /data/edges/phylo_tree.edges. All graphs that we report results on have been prepared and saved here.
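Each .edges file is a plain-text edge list. As a minimal sketch (assuming the loaders accept one whitespace-separated pair of integer node IDs per line; compare with the prepared files in data/edges to confirm the exact format), a small tree could be generated like this:

```python
# Sketch: write a small binary tree as an edge list.
# Assumption: one whitespace-separated "u v" pair of integer node IDs per line;
# check the prepared files in data/edges for the exact format the loaders expect.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
with open("my_tree.edges", "w") as f:
    for u, v in edges:
        f.write(f"{u} {v}\n")
```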

Combinatorial construction

Run julia combinatorial/comb.jl --help to see the options. Example usage (for better results on this dataset, raise the precision):

julia combinatorial/comb.jl -d data/edges/phylo_tree.edges -m phylo_tree.r10.emb -e 1.0 -p 64 -r 10 -a -s
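For intuition, the heart of the 2-dimensional construction (Sarkar's algorithm) is: recursively move each node to the origin of the Poincaré disk, spread its children on a circle around it, and map back. The following is an illustrative Python sketch of that idea, not the repository's Julia implementation; it uses a fixed edge length tau and omits the precision handling behind the -e and -p flags:

```python
import cmath
import math

def sarkar_embed(children, root, tau=1.0):
    """Embed a rooted tree in the Poincare disk (complex coordinates).

    children: dict mapping each node to a list of its child nodes.
    Every tree edge is placed with hyperbolic length exactly tau.
    """
    r = math.tanh(tau / 2)  # Euclidean radius at hyperbolic distance tau from 0
    emb = {root: 0j}
    stack = [(root, None)]
    while stack:
        v, p = stack.pop()
        a = emb[v]
        # Disk isometry moving v to the origin, and its inverse.
        to_origin = lambda z: (z - a) / (1 - a.conjugate() * z)
        from_origin = lambda w: (w + a) / (1 + a.conjugate() * w)
        # Angle of the parent as seen from v (children avoid this direction).
        theta0 = cmath.phase(to_origin(emb[p])) if p is not None else 0.0
        kids = children.get(v, [])
        k = len(kids) + (1 if p is not None else 0)
        for i, c in enumerate(kids, start=1):
            w = r * cmath.exp(1j * (theta0 + 2 * math.pi * i / k))
            emb[c] = from_origin(w)
            stack.append((c, v))
    return emb
```

Because disk isometries preserve hyperbolic distance, every parent-child pair ends up exactly tau apart; distortion then comes only from non-adjacent pairs, which is what raising the scale and precision controls.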

PyTorch optimizer

Run python pytorch/pytorch_hyperbolic.py learn --help to see the options. The optimizer requires torch >= 0.4.1. Example usage:

python pytorch/pytorch_hyperbolic.py learn data/edges/phylo_tree.edges --batch-size 64 --dim 10 -l 5.0 --epochs 100 --checkpoint-freq 10 --subsample 16

Products of hyperbolic spaces with Euclidean and spherical spaces are also supported. For example, adding the flags -euc 1 -edim 20 -sph 2 -sdim 10 embeds into the product of a 20-dimensional Euclidean space with two copies of a 10-dimensional spherical space.
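The distance in such a product space is the l2 combination of the component distances, d = sqrt(d_1^2 + ... + d_k^2). A numerical sketch using the standard closed-form distances (an illustration, not code from this repository):

```python
import numpy as np

def poincare_dist(x, y):
    # Hyperbolic distance between points of the Poincare ball (norms < 1).
    num = 2 * np.sum((x - y) ** 2)
    den = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + num / den)

def spherical_dist(x, y):
    # Geodesic distance between unit vectors on the sphere.
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def product_dist(component_dists):
    # In a product manifold, squared distances add across component spaces.
    return np.sqrt(sum(d ** 2 for d in component_dists))
```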

Experiment scripts

  • scripts/run_exps.py runs a full set of experiments for a list of datasets. Example usage (note: the default run settings take a long time to finish):

    python scripts/run_exps.py phylo -d phylo_tree --epochs 20
    

    Currently, it executes the following experiments:

    1. The combinatorial construction with fixed precision in varying dimensions
    2. The combinatorial construction in dimension 2 (Sarkar's algorithm), with very high precision
    3. PyTorch optimizer in varying dimensions, random initialization
    4. PyTorch optimizer in varying dimensions, using the embedding produced by the combinatorial construction as initialization
  • The combinatorial constructor combinatorial/comb.jl has an option for reporting the MAP and distortion statistics. However, this can be slow on larger datasets such as wordnet.

    • scripts/comb_stats.py provides an alternate method for computing these statistics that can leverage multiprocessing. Example usage: python scripts/comb_stats.py phylo_tree -e 1.0 -r 2 -p 1024 -q 4 to run on 4 cores.
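For reference, the MAP statistic measures how well nearest neighbors in the embedding recover the true graph neighbors. A sketch of one common definition (the repository's exact computation may differ):

```python
import numpy as np

def mean_average_precision(dist, neighbors):
    """Sketch of MAP for graph embeddings.

    dist: (n, n) matrix of pairwise embedding distances.
    neighbors: dict mapping each node to the set of its true graph neighbors.
    For each node, rank all other nodes by distance and average the
    precision at the rank of each true neighbor.
    """
    n = dist.shape[0]
    ap_sum = 0.0
    for a in range(n):
        order = [b for b in np.argsort(dist[a]) if b != a]
        ranks = sorted(order.index(b) + 1 for b in neighbors[a])
        precisions = [(i + 1) / rank for i, rank in enumerate(ranks)]
        ap_sum += sum(precisions) / len(precisions)
    return ap_sum / n
```

A perfect embedding of a path graph, for instance, ranks every graph neighbor first and scores a MAP of 1.0.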

hyperbolics's People

Contributors

albertfgu, belizgunel, chrismre, fredsala


hyperbolics's Issues

h-MDS Dimensions

I'm trying the code in the hMDS folder, but I encountered some trouble in the proposed example:
julia hMDS/hmds-simple.jl -d data/edges/phylo_tree.edges -r 100 -t 0.1 -m savetest.csv

The output is as follows:
h-MDS. Info:
Data set = data/edges/phylo_tree.edges
Dimensions = 100
Save embedding to savetest
Scaling = 0.1

Number of nodes is 344
Time elapsed = 0.17120695114135742
Doing h-MDS...
elapsed time: 2.771179635 seconds
elapsed time: 4.263825574 seconds
Building recovered graph...
elapsed time: 0.297506376 seconds
Getting metrics...

Distortion avg, max, bad = 0.032613823778721615, 4.335859234069171, 762.0
MAP = 0.6170225406053894

but looking at savetest.csv I noticed that the number of dimensions is less than 100.
In particular, the savetest.csv file has 345 columns (the 344 nodes of phylo_tree.edges plus the scaling factor) and 81 rows (and consequently 81 dimensions).

Why am I getting fewer than the 100 dimensions I requested?

I've tried with fewer dimensions and got the following results:
3 requested dimensions -> 2 output rows
4 requested dimensions -> 3 output rows
10 requested dimensions -> 7 output rows

Thank you in advance.

wrong Embedding dimensions

When dim is set to 2, the dimension of the embeddings is actually 3, and --visualize doesn't work. But when dim is set to 1, the dimension becomes 2 and visualization works, though the resulting graph is completely wrong.

I just used the example command line from the README file. What is wrong with the code?

Embeddings extraction from PyTorch model.

Thank you for this awesome project!

I am trying to use your PyTorch implementation to train a model and extract the embedding matrix from it. As a result, I am getting values that are not between 0 and 1, as I believe they should be for hyperbolic embeddings, but I am not sure whether this is normal or whether I need to do some post-processing to get the right embeddings.

The way I'm trying to extract the matrix from the model is the following:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.load(model_load_file).to(device)
embedding_flatten = model.embedding().data              # flattened embedding tensor
embedding_dim = embedding_flatten.shape[0] // model.n   # model.n is the number of nodes
num_nodes = model.n
embedding_matrix = embedding_flatten.view(num_nodes, embedding_dim)

Any help would be much appreciated. Thanks in advance!

Julia Version Problems

Hello,

What version of Julia do I need to run comb.jl? It seems like a lot of the commands are now outdated.

Thanks

Example of embedding non-tree graph

Do you have any examples of running your algorithm on non-tree graphs? From the paper, it sounds like sometimes you use a Steiner tree and sometimes a BFS tree -- are you able to give some details on how you got the reported numbers for the diseases, Gr-QC and wordnet datasets?

Thanks!

Parameters to reproduce pytorch results from paper

In Tables 3 and 4 in the paper, you report 0.237 distortion and 0.951 MAP for the phylo tree using the pytorch implementation. Are you able to share the parameters you used to get those results?

Running the parameters from the README:

python pytorch/pytorch_hyperbolic.py learn data/edges/phylo_tree.edges --batch-size 64 -r 10 -l 5.0 --epochs 100 --checkpoint-freq 10

yields

...
2018-12-04T22:47:47 99 loss=94.07083774585367
2018-12-04T22:47:47 final loss=94.07083774585367
2018-12-04T22:47:48 Compare matrices built
2018-12-04T22:47:49 Distortion avg=0.6756635903819984 wc=271.74577409770677 me=6.7022873965209975 mc=40.54522851985782 nan_elements=0.0
2018-12-04T22:47:49 MAP = 0.3775067551581052
2018-12-04T22:47:49 data_scale=1.0 scale=0.0

(I removed the -w phylo_tree.r10.emb because I didn't have the warmstart file, so maybe that's changing the results.)

Thanks!
~ Ben

h-MDS from distance matrix

Hi, am I correct in understanding that hmds-simple.jl with the -k argument implements the h-MDS algorithm (Algorithm 2 in Section 4 of the paper)?

I'm running into some difficulty using this function. I've tried running the command julia hMDS/hmds-simple.jl -k data/test.pickle -r 10 -t 0.1 -m savetest.csv both in my own setup and in the Docker container, but I get errors in both. The test.pickle file is a pickled torch 2D tensor, but I also tried running the same command with a space-separated distance matrix and a pickled numpy file (in case I had misunderstood).

In my setup I get the error:

ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/jacob/.julia/packages/PyCall/RQjD7/src/pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'RuntimeError'>
RuntimeError('Overflow when unpacking long',)
  File "/home/jacob/hyperbolics/utils/load_dist.py", line 44, in load_emb_dm
    m = torch.load(file).to(device)
  File "/home/jacob/miniconda3/lib/python3.6/site-packages/torch/serialization.py", line 368, in load
    return _load(f, map_location, pickle_module)
  File "/home/jacob/miniconda3/lib/python3.6/site-packages/torch/serialization.py", line 533, in _load
    if magic_number != MAGIC_NUMBER:

In docker I get the error:

ERROR: LoadError: ArgumentError: Module LinearAlgebra not found in current path.
Run `Pkg.add("LinearAlgebra")` to install the LinearAlgebra package.
Stacktrace:
 [1] _require(::Symbol) at ./loading.jl:435
 [2] require(::Symbol) at ./loading.jl:405
 [3] include_from_node1(::String) at ./loading.jl:576
 [4] include(::String) at ./sysimg.jl:14
 [5] process_options(::Base.JLOptions) at ./client.jl:305
 [6] _start() at ./client.jl:371
while loading /root/hyperbolics/hMDS/hmds-simple.jl, in expression starting on line 4

Thanks for the help!

Combinatorial example error: delt_idx not defined

I'm trying the example proposed in the README, but I'm having some trouble with the comb.jl file.

While running the command:
julia combinatorial/comb.jl -d data/edges/phylo_tree.edges -m phylo_tree.r10.emb -e 1.0 -p 64 -r 10 -a -s

with a dimension parameter greater than 2, I get the following error:
ERROR: LoadError: UndefVarError: delt_idx not defined

which originates in the rdim.jl file, at line 249, in the function place_on_sphere().
It seems that in this function the variable is "forgotten" when the points_idx parameter is increased.
When the dimension is set to 2, the embedding is successfully created and all the measurements complete.
I'm running Julia 0.7.

Thank you in advance.

--use-svrg doesn't work

So I tried the --use-svrg option and it didn't work.

First it told me it couldn't import Hyperbolic_Parameter from hyperbolic_parameter, so I changed this to import HyperboloidParameter instead. That line then went fine until line 94 of the svrg.py file, where I get the following error:

File "/hyperbolics-master/pytorch/svrg.py", line 94, in step
    for i, (data, target) in enumerate(self.data_loader):
ValueError: too many values to unpack (expected 2)

I was wondering if there was a quick fix for this.

keyword incorrect

Traceback (most recent call last):
  File "pytorch/pytorch_hyperbolic.py", line 393, in <module>
    _parser.dispatch()
  File "/opt/conda/lib/python3.6/site-packages/argh/helpers.py", line 55, in dispatch
    return dispatch(self, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/opt/conda/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/opt/conda/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "pytorch/pytorch_hyperbolic.py", line 302, in learn
    m = cudaify( Hyperbolic_Emb(G.order(), rank, initialize=m_init, learn_scale=learn_scale, exponential_rescale=exponential_rescale) )
  File "/root/hyperbolics/pytorch/hyperbolic_models.py", line 84, in __init__
    self.w = Hyperbolic_Parameter(x)
  File "/root/hyperbolics/pytorch/hyperbolic_parameter.py", line 12, in __new__
    ret = super(nn.Parameter, cls).__new__(cls, data, requires_grad=requires_grad)
TypeError: __new__() received an invalid combination of arguments - got (Tensor, requires_grad=bool), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
    didn't match because some of the keywords were incorrect: requires_grad
  • (object data, torch.device device)
    didn't match because some of the keywords were incorrect: requires_grad
