rdevon / cortex
A machine learning library for PyTorch
License: BSD 3-Clause "New" or "Revised" License
Visdom allows appending datapoints to a figure (see append at https://github.com/facebookresearch/visdom#visline). I think it would be more efficient to append rather than recreating the figure from scratch every time.
One edge case I can see: when you rerun an experiment using the same name as a previous one (i.e. a visualization with that name already exists), the new data points could be appended to a graph computed from the previous experiment. This is not a desired effect.
It looks like it is possible to delete an environment from the python interface, but we would have to be very careful not to delete an environment that a user is still using.
-r <PATH>
isn't loading the model arguments, so they all have to be re-entered for reloading to work properly. This should be fixed ASAP and tested.
If this is true, Fred is not going to be happy when a lot of people are hammering NFS.
Given that an experiment is run by a magic script, it is simple to copy data to /Tmp.
Right now, it is ~/config.yml. This name is too generic: it could conflict with other software, and users won't easily know which software the file belongs to.
I propose ~/.cortex.yml instead. It makes clear to the user that this file is related to cortex, and it also makes it a "hidden" file, as configuration files conventionally are. Both are standard conventions on Linux.
cortex/cortex/_lib/data/data_handler.py
Lines 132 to 133 in 4a529d2
Hi,
Thanks very much for your great effort for this project.
I've installed cortex successfully but encounter an Error when I run "cortex GAN --d.source CIFAR10 --d.copy_to_local" for testing.
Following is the Error information.
KeyError: "Error with dimension query inputs. Available: {'data': {'images': torch.Size([3, 32, 32]), 'targets': 10, 'N_train': 50000, 'N_test': 10000}, 'Z': 64, 'E': 1}"
Do you have any clue for this error?
Thanks a lot in advance.
In order to make it possible to use models outside of cortex, the parser needs to cover 3 cases:
cortex gan or cortex core.gan
cortex my_models.my_gan
cortex my_file.py or cortex my_file.py:my_model
Optional: an intermediate step that allows users to just run their model as in python my_file.py, where the file contains a call to cortex.run().
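A minimal sketch of how the three cases might be told apart (a hypothetical helper for illustration, not cortex's actual parser):

```python
def parse_model_spec(spec):
    """Classify a cortex model spec into (kind, target, model_name)."""
    if spec.endswith(".py") or ".py:" in spec:
        # Case 3: a file path, optionally with a model name after a colon
        path, _, model = spec.partition(":")
        return ("file", path, model or None)
    if "." in spec:
        # Case 2: a dotted module path such as my_models.my_gan or core.gan
        return ("dotted", spec, None)
    # Case 1: a bare built-in name; assume it resolves under core
    return ("builtin", "core." + spec, None)
```

Usage would then be uniform regardless of how the model was specified, e.g. `parse_model_spec("my_file.py:my_model")` yields `("file", "my_file.py", "my_model")`.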
cortex/cortex/built_ins/models/mine.py
Line 94 in 2837b22
Besides, line 123 "self.mine.visualize(generated, generated, Z, Z_N, targets)" seems correct.
I wanted to check out cortex again, but I cannot run it:
$ python main.py classifier -S MNIST -n test_classifier
Import of architecture (module) tempate failed (expected an indented block (tempate.py, line 36))
Import of architecture (module) discrete_gan failed (attempted relative import with no known parent package)
Import of architecture (module) copulas failed (attempted relative import with no known parent package)
Import of architecture (module) embedding_gan failed (attempted relative import with no known parent package)
Import of architecture (module) utils failed (Architecture (module) utils lacks `DEFAULT_CONFIG` skipping)
usage: main.py [-h] [-o OUT_PATH] [-n NAME] [-r RELOAD]
[-R RELOADS [RELOADS ...]] [-M LOAD_MODELS] [-C] [-m META]
[-c CONFIG_FILE] [-k] [-v VERBOSITY] [-d DEVICE]
{core.gan,core.featnet,core.classifier,core.adversarial_clustering,core.featnet_finetuning,core.ali,core.nat,core.adversarial_autoencoder,core.vral,core.toyset_clustering,core.minet,core.vae}
...
main.py: error: argument arch: invalid choice: 'classifier' (choose from 'core.gan', 'core.featnet', 'core.classifier', 'core.adversarial_clustering', 'core.featnet_finetuning', 'core.ali', 'core.nat', 'core.adversarial_autoencoder', 'core.vral', 'core.toyset_clustering', 'core.minet', 'core.vae')
Changing to core.classifier doesn't help:
$ python main.py core.classifier -S MNIST -n test_classifier
Import of architecture (module) tempate failed (expected an indented block (tempate.py, line 36))
Import of architecture (module) discrete_gan failed (attempted relative import with no known parent package)
Import of architecture (module) copulas failed (attempted relative import with no known parent package)
Import of architecture (module) embedding_gan failed (attempted relative import with no known parent package)
Import of architecture (module) utils failed (Architecture (module) utils lacks `DEFAULT_CONFIG` skipping)
usage: main.py [-h] [-o OUT_PATH] [-n NAME] [-r RELOAD]
[-R RELOADS [RELOADS ...]] [-M LOAD_MODELS] [-C] [-m META]
[-c CONFIG_FILE] [-k] [-v VERBOSITY] [-d DEVICE]
{core.gan,core.featnet,core.classifier,core.adversarial_clustering,core.featnet_finetuning,core.ali,core.nat,core.adversarial_autoencoder,core.vral,core.toyset_clustering,core.minet,core.vae}
...
main.py: error: unrecognized arguments: -S MNIST -n test_classifier
Someone should add a basic testing framework (using pytest) on which all future tests will be based.
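A minimal starting point such a framework might use (the file name and contents are placeholders, not an actual cortex test):

```python
# tests/test_basic.py -- a minimal smoke-test skeleton that pytest would discover.
# Names and config keys here are hypothetical.

def make_dummy_config():
    """Stand-in for a small experiment config used by tests."""
    return {"batch_size": 64, "source": "MNIST"}

def test_config_defaults():
    cfg = make_dummy_config()
    assert cfg["batch_size"] > 0
    assert cfg["source"] == "MNIST"
```

Running `pytest tests/` would then pick up any function named `test_*`, so future tests only need to follow the naming convention.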
We should have an option to add noise to corrupt the input of the autoencoder. This is also known as a denoising autoencoder.
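A sketch of the corruption step, assuming additive Gaussian noise (one common choice; the noise type and level would presumably be exposed as model arguments):

```python
import torch

def corrupt(x, noise_std=0.1):
    """Corrupt the autoencoder input with additive Gaussian noise (denoising AE)."""
    return x + noise_std * torch.randn_like(x)
```

During training the encoder would receive `corrupt(x)` while the reconstruction loss is still computed against the clean `x`.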
Cortex needs a function that can be used in a script so that the model can be a file rather than a model name registered in cortex.
For example: cortex <path_to_file>.py -<arguments>
would have the same effect as if the model were registered.
Hello Dr. Devon Hjelm,
When I want to run DIM (https://github.com/rdevon/DIM) on cortex, it needs a "0.13a0" version of cortex (the 10th line in https://github.com/rdevon/DIM/blob/master/setup.py), but I cannot find it, and DIM seems to have lots of bugs on cortex's master branch.
Could you tell me which branch of cortex can run DIM?
Regards.
I tried a bunch of different models. It seems that the problem is with the data handler.
$ cortex VAE --d.source MNIST
/u/serdyuk/.conda/envs/mpy36/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
[INFO:cortex]:Setting logging to INFO
EXPERIMENT---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0
[INFO:cortex.exp]:Using CPU
INFO:tornado.access:200 POST /win_exists (127.0.0.1) 0.57ms
[INFO:cortex.exp]:Creating out path `/data/milatmp1/serdyuk/cortex_outs/VAE`
[INFO:cortex.exp]:Setting out path to `/data/milatmp1/serdyuk/cortex_outs/VAE`
[INFO:cortex.exp]:Logging to `/data/milatmp1/serdyuk/cortex_outs/VAE/out.log`
[INFO:cortex]:Saving logs to /data/milatmp1/serdyuk/cortex_outs/VAE/out.log
[INFO:cortex.init]:Ultimate data arguments:
{'batch_size': {'test': 640, 'train': 64},
'copy_to_local': False,
'data_args': {},
'inputs': {'inputs': 'images'},
'n_workers': 4,
'shuffle': True,
'skip_last_batch': False,
'source': 'MNIST'}
[INFO:cortex.init]:Ultimate model arguments:
{'beta_kld': 1.0,
'decoder_args': {'output_nonlinearity': 'tanh'},
'decoder_crit': <function mse_loss at 0x7f54c54aa510>,
'decoder_type': 'convnet',
'dim_encoder_out': 1024,
'dim_out': None,
'dim_z': 64,
'encoder_args': {'fully_connected_layers': 1024},
'encoder_type': 'convnet',
'vae_criterion': <function mse_loss at 0x7f54c54aa510>}
[INFO:cortex.init]:Ultimate optimizer arguments:
{'clipping': {},
'learning_rate': 0.0001,
'model_optimizer_options': {},
'optimizer': 'Adam',
'optimizer_options': {},
'weight_decay': {}}
[INFO:cortex.init]:Ultimate train arguments:
{'archive_every': 10,
'epochs': 500,
'eval_during_train': True,
'eval_only': False,
'quit_on_bad_values': True,
'save_on_best': 'losses.classifier',
'save_on_highest': None,
'save_on_lowest': 'losses.vae',
'test_mode': 'test',
'train_mode': 'train'}
DATA---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "/u/serdyuk/.conda/envs/mpy36/bin/cortex", line 11, in <module>
load_entry_point('cortex', 'console_scripts', 'cortex')()
File "/data/milatmp1/serdyuk/projects/cortex/cortex/main.py", line 37, in run
data.setup(**exp.ARGS['data'])
File "/data/milatmp1/serdyuk/projects/cortex/cortex/_lib/data/__init__.py", line 56, in setup
plugin.handle(source, copy_to_local=copy_to_local, **data_args)
File "/data/milatmp1/serdyuk/projects/cortex/cortex/built_ins/datasets/torchvision_datasets.py", line 157, in handle
dim_x, dim_y = train_set[0][0].size()
ValueError: too many values to unpack (expected 2)
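The traceback suggests the handler assumes 2-D (grayscale) image sizes, while CIFAR10 images are (C, H, W). A sketch of unpacking that tolerates both (a hypothetical helper for illustration, not the actual patch):

```python
def unpack_image_dims(size):
    """Unpack an image size of (H, W) or (C, H, W) into (C, H, W)."""
    if len(size) == 3:
        dim_c, dim_x, dim_y = size          # color images: channels first
    elif len(size) == 2:
        dim_c = 1                           # grayscale: implicit single channel
        dim_x, dim_y = size
    else:
        raise ValueError("unexpected image size: %r" % (size,))
    return dim_c, dim_x, dim_y
```

With this, `train_set[0][0].size()` could be passed through regardless of whether the dataset is MNIST-like or CIFAR-like.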
Using .item() to store results in a routine call forces the GPU to synchronize in order to access a lazily-evaluated Python number. This is suboptimal: kernel scheduling (CPU load) and kernel execution (GPU load) should run as parallel pipelines as much as possible, and forced synchronization introduces delays.
On the other hand, we need _all_epoch_results at the end of an epoch for visualization purposes.
As @obilaniu has noted elsewhere, it's better to use .detach() to store results within a training step, and then process results (and losses-as-results) internally to get the Python/NumPy values at the moment they are actually needed: the end of an epoch.
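A sketch of that pattern (a hypothetical accumulator, not cortex's actual internals):

```python
import torch

class ResultAccumulator:
    """Keep detached tensors per step; synchronize only once, at epoch end."""

    def __init__(self):
        self._results = []

    def add(self, value):
        # .detach() keeps the tensor on-device; no CPU-GPU sync happens here
        self._results.append(value.detach())

    def epoch_mean(self):
        # .item() forces a sync, but it runs once per epoch instead of per step
        return torch.stack(self._results).mean().item()
```

Each training step calls `add(loss)`, and the visualization code calls `epoch_mean()` when the epoch closes.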
Discrete BGAN needs to be ported over and added to built-ins.
Hi,
Thank you for your great effort to create this repository.
I cannot setup cortex. I will encounter the following error:
ERROR: Command errored out with exit status 1:
command: /home/sina/.conda/envs/pytorch/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-0y42isk4/setup.py'"'"'; file='"'"'/tmp/pip-req-build-0y42isk4/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /tmp/pip-req-build-0y42isk4/
Complete output (24 lines):
Traceback (most recent call last):
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 220, in assert_string_list
assert isinstance(value, (list, tuple))
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-0y42isk4/setup.py", line 28, in <module>
zip_safe=False)
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/__init__.py", line 144, in setup
_install_setup_requires(attrs)
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/__init__.py", line 133, in _install_setup_requires
(k, v) for k, v in attrs.items()
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 446, in __init__
k: v for k, v in attrs.items()
File "/home/sina/.conda/envs/pytorch/lib/python3.6/distutils/dist.py", line 281, in __init__
self.finalize_options()
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 734, in finalize_options
ep.load()(self, ep.name, value)
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 225, in assert_string_list
"%r must be a list of strings (got %r)" % (attr, value)
distutils.errors.DistutilsSetupError: 'dependency_links' must be a list of strings (got {'git+https://github.com/facebookresearch/visdom.git'})
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Any hint?
Thank you
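The traceback pins the cause: `dependency_links` was given as a set literal, but setuptools requires a list of strings. A likely fix (inferred from the error message, not a confirmed patch) is using brackets instead of braces in setup.py:

```python
# The offending value is a set literal; setuptools asserts a list (or tuple).
bad = {'git+https://github.com/facebookresearch/visdom.git'}    # a set: rejected
good = ['git+https://github.com/facebookresearch/visdom.git']   # a list: accepted
```

So in setup.py, `dependency_links={'git+...'}` would become `dependency_links=['git+...']`.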
The hyperparameters of a model are set in many ways in cortex:
All of these need to be tested. Notably, we need to make sure that:
the defaults for the build, routine, visualize, etc are the default and these show up in the command line.
the above can be overridden by the defaults static attribute.
The command line can override the above
Nested arguments (within dicts) are updated correctly. In other words, if a default argument looks like this:
classifier_args = dict(batch_size=False)
Updating this argument with classifier_args=dict(dim_h=100)
yields
classifier_args = dict(batch_size=False, dim_h=100)
So far everything looks like it's working, but it needs to be tested.
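The nested update described above can be sketched as a recursive merge (a hypothetical helper for illustration, not cortex's actual code):

```python
def deep_update(base, new):
    """Recursively merge `new` into `base`, preserving existing nested keys."""
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)   # recurse into nested dicts
        else:
            base[key] = value               # plain values overwrite
    return base

# The example from above: the existing key survives the update
classifier_args = deep_update({'batch_size': False}, {'dim_h': 100})
```

A plain `dict.update` would also pass the flat example, but only a recursive merge handles arguments nested more than one level deep.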
It would be good to have this dataset
Adversarial autoencoder needs to be ported over and added to built-ins.
Toy datasets need to be ported over, with a plugin added and registered.
Hi,
I noticed that your MINE implementation contains several measures; copied from the code:
{GAN, JSD, KL, RKL (reverse KL), X2 (Chi^2), H2 (squared Hellinger), DV (Donsker-Varadhan KL), W1 (IPM)}
I know the DV representation as mentioned in the MINE paper, but where can I learn the meaning of the others?
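For reference, the DV (Donsker-Varadhan) representation bounds the KL divergence as follows; the other measures follow the variational f-divergence forms used in the f-GAN literature (this is background context, not a statement about which bounds cortex's code implements):

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;\ge\; \sup_{T}\; \mathbb{E}_{P}\!\left[T(x)\right] \;-\; \log \mathbb{E}_{Q}\!\left[e^{T(x)}\right]
```

where $T$ ranges over functions (in practice, a neural network) for which both expectations are finite; MINE estimates mutual information by applying this bound with $P$ the joint and $Q$ the product of marginals.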
Model build needs to be added to testing. Starting with classifier.py, with some variable arguments, the classifier needs to be tested for having the right parameter sizes as well as output given input shape. Some dummy data can be used (it can just be zeros tensor). It would be good to also test if dropout, batch norm, spectral norm layers, etc, are appearing as they should.
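A sketch of what such a test could look like (the model here is a simple stand-in; a real test would build cortex's classifier plugin with its variable arguments):

```python
import torch
import torch.nn as nn

def test_classifier_output_shape():
    # Hypothetical stand-in for the built classifier
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.zeros(4, 1, 28, 28)   # dummy batch of zeros, as suggested above
    out = model(x)
    assert out.shape == (4, 10)     # output shape given input shape
    # parameter-size check: linear layer weights plus biases
    n_params = sum(p.numel() for p in model.parameters())
    assert n_params == 28 * 28 * 10 + 10
```

The same shape-and-parameter pattern extends to checking that dropout, batch norm, or spectral norm layers appear in `model.modules()` when the corresponding arguments are set.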
So right now, there is an init_fn being passed to the DataLoader to avoid a terminal flood when you do a keyboard interrupt. Normally, PyTorch doesn't handle this well, but I fit in a hack to treat SIGINT as SIG_IGN. However, there is a side effect: the workers get terminated later, not when you send SIGINT:
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f9db1d30fd0>>
Traceback (most recent call last):
File "/home/devon/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/home/devon/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/devon/.local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError:
Hi Dr.RDevon
When I run cortex GAN --d.source CIFAR10 --d.copy_to_local after I set up cortex, it still throws an error:
[INFO:cortex]: Setting logging to INFO
Traceback (most recent call last):
File "/home/mtang4/anaconda3/envs/IARPA2/bin/cortex", line 8, in
sys.exit(run())
File "/home/mtang4/anaconda3/envs/IARPA2/lib/python3.6/site-packages/cortex/main.py", line 32, in run
config.set_config()
File "/home/mtang4/anaconda3/envs/IARPA2/lib/python3.6/site-packages/cortex/_lib/config.py", line 79, in set_config
d = yaml.load(f)
TypeError: load() missing 1 required positional argument: 'Loader'
Can you help me?
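The error comes from PyYAML 5.1+, which requires an explicit Loader argument to `yaml.load`. A likely fix (an assumption based on the traceback, not a confirmed patch to cortex) is switching to `safe_load` in `_lib/config.py`:

```python
import yaml

def load_config(path):
    """Load a YAML config; safe_load is the recommended replacement for
    the old yaml.load(f) call that newer PyYAML rejects."""
    with open(path) as f:
        return yaml.safe_load(f)
```

Alternatively, `yaml.load(f, Loader=yaml.SafeLoader)` is equivalent and keeps the original call site shape.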
Hello there!
I would like to make a PR which would include the following things:
exp.save
save a "current" seed along with other stuff. Do you agree with these modifications? Where would they be most appropriate to include?
Also, what is the current status regarding the resumability of a specific named experiment?
Some optimizer objects are not stateless; are they also saved? And if they are not, should they be?
Thank you!
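On the optimizer question: optimizers like Adam do carry state (running moment estimates), so a fully resumable checkpoint would need their state_dict as well. A sketch of an assumed checkpoint layout (not cortex's actual save format):

```python
import torch
import torch.nn as nn

# Hypothetical experiment pieces for illustration
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),  # Adam's moments live in here
    "torch_rng": torch.get_rng_state(),   # the "current" seed/RNG state for resuming
}
# torch.save(checkpoint, "experiment.ckpt") would persist all three together
```

On resume, `optimizer.load_state_dict(...)` and `torch.set_rng_state(...)` restore the pieces that `model.load_state_dict(...)` alone does not cover.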
Some feedback for the user would help if the visdom server isn't running. What should be added is a simple check for a running server when the visualization object is created. If there is none, allow the user to opt to start a server at the location specified in the config, try another location, or skip visualization.
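A minimal check along those lines (a hypothetical helper; 8097 is visdom's default port):

```python
import socket

def visdom_server_running(host="localhost", port=8097):
    """Return True if something is listening at the configured visdom address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0
```

If this returns False at visualization setup, cortex could prompt the user with the three options above instead of failing later with an opaque connection error.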
Quits the model no matter what
I have come to realize from the implementation in built_ins/gan.py that two separate optimizer updates are performed for the discriminator: the first uses the GAN loss and the second the gradient penalty.
This is because LossHandle will overwrite (s1) any value for a specific network key, which is an inconvenient behaviour. I can see in s2 and s3 that there was an intention to implement a convenient behaviour, but it seems that it has not been done.
I propose the following; tell me what you think:
self.losses.network = a will overwrite/set the loss for the network.
self.losses.network += a would add to the already existing loss if it exists; if it doesn't, it sets it to a.
I am against adding a separate method or an add_value=True flag, as it would only introduce confusion and incoherencies to the API for those creating ModelPlugins.
Hello Dr. Devon Hjelm,
I was searching for an implementation of Mutual Information Neural Estimation and I came across your code, but I am not sure if it's implemented. Could you please help me with how to use it?
Regards,
Sankar Mukherjee
As per the documentation, cortex is supported for both Python 3.5 and 3.6, but the installation succeeds only for 3.5 and fails for 3.6. Steps to reproduce the issue:
$ conda create --yes --name "myenv" python=3.6
$ conda activate myenv
$ pip install visdom
$ git clone https://github.com/rdevon/cortex.git
$ cd cortex
$ pip install .
Expected behavior: successful installation.
Actual behavior: installation fails with following error:
Processing <path>
ERROR: Command errored out with exit status 1:
command: <path>/miniconda3/envs/myenv/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-fdz1bz19/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-fdz1bz19/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-4spmvwv4
cwd: /tmp/pip-req-build-fdz1bz19/
Complete output (1 lines):
error in cortex setup command: 'dependency_links' must be a list of strings (got {'git+https://github.com/facebookresearch/visdom.git'})
----------------------------------------
WARNING: Discarding file:///<path>/cortex. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Reloading is highly sensitive to the environment that the reload is performed in. This is problematic if people want to reload from their ipython console or Jupyter notebook.