rdevon / cortex
A machine learning library for PyTorch
License: BSD 3-Clause "New" or "Revised" License
Visdom allows appending datapoints to a figure (see append at https://github.com/facebookresearch/visdom#visline). I think it would be more efficient to append rather than recreating the figure from scratch every time.
One edge case I can see: when you rerun an experiment using the same name as a previous one (i.e. a visualization with that name already exists), the new data points could be appended to a graph computed from the previous experiment. This is not a desired effect.
It looks like it is possible to delete an environment from the python interface, but we would have to be very careful not to delete an environment that a user is still using.
-r <PATH>
isn't loading the model arguments, so they all have to be re-entered for reloading to work properly. This should be fixed ASAP and tested.
If this is true, Fred is not going to be happy when a lot of people are hammering NFS.
Given that an experiment is run by a magic script, it is simple to copy data to /Tmp.
Right now, it is ~/config.yml. This name is too generic: it could conflict with other software, and users won't easily know which software the file belongs to.
I propose ~/.cortex.yml instead. It makes clear to the user that this file is related to cortex, and it also makes it a "hidden" file, as configuration files conventionally are. Both are standard conventions on Linux.
cortex/cortex/_lib/data/data_handler.py
Lines 132 to 133 in 4a529d2
Hi,
Thanks very much for your great effort for this project.
I've installed cortex successfully but encounter an Error when I run "cortex GAN --d.source CIFAR10 --d.copy_to_local" for testing.
Following is the Error information.
KeyError: "Error with dimension query inputs. Available: {'data': {'images': torch.Size([3, 32, 32]), 'targets': 10, 'N_train': 50000, 'N_test': 10000}, 'Z': 64, 'E': 1}"
Do you have any clue for this error?
Thanks a lot in advance.
In order to make it possible to use models outside of cortex, the parser needs to cover 3 cases:
cortex gan or cortex core.gan
cortex my_models.my_gan
cortex my_file.py or cortex my_file.py:my_model
Optional: an intermediate step that allows users to just run their model as in python my_file.py, where the file contains a call to cortex.run().
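A minimal sketch of how the three cases might be told apart (a hypothetical helper for illustration, not cortex's actual parser):

```python
def parse_model_spec(spec):
    """Classify a cortex model spec into (kind, target, model_name)."""
    if spec.endswith(".py") or ".py:" in spec:
        # Case 3: a file path, optionally with a model name after a colon
        path, _, model = spec.partition(":")
        return ("file", path, model or None)
    if "." in spec:
        # Case 2: a dotted module path such as my_models.my_gan or core.gan
        return ("dotted", spec, None)
    # Case 1: a bare built-in name; assume it resolves under core
    return ("builtin", "core." + spec, None)
```

Usage would then be uniform regardless of how the model was specified, e.g. `parse_model_spec("my_file.py:my_model")` yields `("file", "my_file.py", "my_model")`.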
cortex/cortex/built_ins/models/mine.py
Line 94 in 2837b22
Besides, line 123 "self.mine.visualize(generated, generated, Z, Z_N, targets)" seems correct.
I wanted to check out cortex again, but I cannot run it:
$ python main.py classifier -S MNIST -n test_classifier
Import of architecture (module) tempate failed (expected an indented block (tempate.py, line 36))
Import of architecture (module) discrete_gan failed (attempted relative import with no known parent package)
Import of architecture (module) copulas failed (attempted relative import with no known parent package)
Import of architecture (module) embedding_gan failed (attempted relative import with no known parent package)
Import of architecture (module) utils failed (Architecture (module) utils lacks `DEFAULT_CONFIG` skipping)
usage: main.py [-h] [-o OUT_PATH] [-n NAME] [-r RELOAD]
[-R RELOADS [RELOADS ...]] [-M LOAD_MODELS] [-C] [-m META]
[-c CONFIG_FILE] [-k] [-v VERBOSITY] [-d DEVICE]
{core.gan,core.featnet,core.classifier,core.adversarial_clustering,core.featnet_finetuning,core.ali,core.nat,core.adversarial_autoencoder,core.vral,core.toyset_clustering,core.minet,core.vae}
...
main.py: error: argument arch: invalid choice: 'classifier' (choose from 'core.gan', 'core.featnet', 'core.classifier', 'core.adversarial_clustering', 'core.featnet_finetuning', 'core.ali', 'core.nat', 'core.adversarial_autoencoder', 'core.vral', 'core.toyset_clustering', 'core.minet', 'core.vae')
Changing to core.classifier doesn't help:
$ python main.py core.classifier -S MNIST -n test_classifier
Import of architecture (module) tempate failed (expected an indented block (tempate.py, line 36))
Import of architecture (module) discrete_gan failed (attempted relative import with no known parent package)
Import of architecture (module) copulas failed (attempted relative import with no known parent package)
Import of architecture (module) embedding_gan failed (attempted relative import with no known parent package)
Import of architecture (module) utils failed (Architecture (module) utils lacks `DEFAULT_CONFIG` skipping)
usage: main.py [-h] [-o OUT_PATH] [-n NAME] [-r RELOAD]
[-R RELOADS [RELOADS ...]] [-M LOAD_MODELS] [-C] [-m META]
[-c CONFIG_FILE] [-k] [-v VERBOSITY] [-d DEVICE]
{core.gan,core.featnet,core.classifier,core.adversarial_clustering,core.featnet_finetuning,core.ali,core.nat,core.adversarial_autoencoder,core.vral,core.toyset_clustering,core.minet,core.vae}
...
main.py: error: unrecognized arguments: -S MNIST -n test_classifier
Someone should add a basic testing framework (using pytest) on which all future tests will be based.
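A minimal starting point such a framework might use (the file name and contents are placeholders, not an actual cortex test):

```python
# tests/test_basic.py -- a minimal smoke-test skeleton that pytest would discover.
# Names and config keys here are hypothetical.

def make_dummy_config():
    """Stand-in for a small experiment config used by tests."""
    return {"batch_size": 64, "source": "MNIST"}

def test_config_defaults():
    cfg = make_dummy_config()
    assert cfg["batch_size"] > 0
    assert cfg["source"] == "MNIST"
```

Running `pytest tests/` would then pick up any function named `test_*`, so future tests only need to follow the naming convention.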
We should have an option to add noise to corrupt the input of the autoencoder. This is also known as a denoising autoencoder.
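A sketch of the corruption step, assuming additive Gaussian noise (one common choice; the noise type and level would presumably be exposed as model arguments):

```python
import torch

def corrupt(x, noise_std=0.1):
    """Corrupt the autoencoder input with additive Gaussian noise (denoising AE)."""
    return x + noise_std * torch.randn_like(x)
```

During training the encoder would receive `corrupt(x)` while the reconstruction loss is still computed against the clean `x`.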
Cortex needs a function that can be used in a script so that the model can be a file rather than a model name registered in cortex.
For example: cortex <path_to_file>.py -<arguments>
would have the same effect as if the model were registered.
Hello Dr. Devon Hjelm,
When I want to run DIM (https://github.com/rdevon/DIM) on cortex, it needs a "0.13a0" version of cortex (the 10th line in https://github.com/rdevon/DIM/blob/master/setup.py), but I cannot find it, and DIM seems to have lots of bugs on cortex's master branch.
Could you tell me which branch of cortex can run DIM?
Regards.
I tried a bunch of different models. It seems that the problem is with the data handler.
$ cortex VAE --d.source MNIST
/u/serdyuk/.conda/envs/mpy36/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
[INFO:cortex]:Setting logging to INFO
EXPERIMENT---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0
[INFO:cortex.exp]:Using CPU
INFO:tornado.access:200 POST /win_exists (127.0.0.1) 0.57ms
[INFO:cortex.exp]:Creating out path `/data/milatmp1/serdyuk/cortex_outs/VAE`
[INFO:cortex.exp]:Setting out path to `/data/milatmp1/serdyuk/cortex_outs/VAE`
[INFO:cortex.exp]:Logging to `/data/milatmp1/serdyuk/cortex_outs/VAE/out.log`
[INFO:cortex]:Saving logs to /data/milatmp1/serdyuk/cortex_outs/VAE/out.log
[INFO:cortex.init]:Ultimate data arguments:
{'batch_size': {'test': 640, 'train': 64},
'copy_to_local': False,
'data_args': {},
'inputs': {'inputs': 'images'},
'n_workers': 4,
'shuffle': True,
'skip_last_batch': False,
'source': 'MNIST'}
[INFO:cortex.init]:Ultimate model arguments:
{'beta_kld': 1.0,
'decoder_args': {'output_nonlinearity': 'tanh'},
'decoder_crit': <function mse_loss at 0x7f54c54aa510>,
'decoder_type': 'convnet',
'dim_encoder_out': 1024,
'dim_out': None,
'dim_z': 64,
'encoder_args': {'fully_connected_layers': 1024},
'encoder_type': 'convnet',
'vae_criterion': <function mse_loss at 0x7f54c54aa510>}
[INFO:cortex.init]:Ultimate optimizer arguments:
{'clipping': {},
'learning_rate': 0.0001,
'model_optimizer_options': {},
'optimizer': 'Adam',
'optimizer_options': {},
'weight_decay': {}}
[INFO:cortex.init]:Ultimate train arguments:
{'archive_every': 10,
'epochs': 500,
'eval_during_train': True,
'eval_only': False,
'quit_on_bad_values': True,
'save_on_best': 'losses.classifier',
'save_on_highest': None,
'save_on_lowest': 'losses.vae',
'test_mode': 'test',
'train_mode': 'train'}
DATA---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "/u/serdyuk/.conda/envs/mpy36/bin/cortex", line 11, in <module>
load_entry_point('cortex', 'console_scripts', 'cortex')()
File "/data/milatmp1/serdyuk/projects/cortex/cortex/main.py", line 37, in run
data.setup(**exp.ARGS['data'])
File "/data/milatmp1/serdyuk/projects/cortex/cortex/_lib/data/__init__.py", line 56, in setup
plugin.handle(source, copy_to_local=copy_to_local, **data_args)
File "/data/milatmp1/serdyuk/projects/cortex/cortex/built_ins/datasets/torchvision_datasets.py", line 157, in handle
dim_x, dim_y = train_set[0][0].size()
ValueError: too many values to unpack (expected 2)
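The traceback suggests the handler assumes 2-D (grayscale) image sizes, while CIFAR10 images are (C, H, W). A sketch of unpacking that tolerates both (a hypothetical helper for illustration, not the actual patch):

```python
def unpack_image_dims(size):
    """Unpack an image size of (H, W) or (C, H, W) into (C, H, W)."""
    if len(size) == 3:
        dim_c, dim_x, dim_y = size          # color images: channels first
    elif len(size) == 2:
        dim_c = 1                           # grayscale: implicit single channel
        dim_x, dim_y = size
    else:
        raise ValueError("unexpected image size: %r" % (size,))
    return dim_c, dim_x, dim_y
```

With this, `train_set[0][0].size()` could be passed through regardless of whether the dataset is MNIST-like or CIFAR-like.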
Using .item() to store results in a routine call forces the GPU to synchronize in order to access a lazily-evaluated Python number. This is suboptimal: kernel scheduling (CPU load) and kernel execution (GPU load) should run as parallel pipelines as much as possible, and forced synchronization introduces delays.
On the other hand, we need _all_epoch_results at the end of an epoch for visualization purposes.
As @obilaniu has noted elsewhere, it's better to use .detach() to store results within a training step, and then process results (and losses-as-results) internally to get the Python/NumPy values at the moment they are actually needed: the end of an epoch.
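A sketch of that pattern (a hypothetical accumulator, not cortex's actual internals):

```python
import torch

class ResultAccumulator:
    """Keep detached tensors per step; synchronize only once, at epoch end."""

    def __init__(self):
        self._results = []

    def add(self, value):
        # .detach() keeps the tensor on-device; no CPU-GPU sync happens here
        self._results.append(value.detach())

    def epoch_mean(self):
        # .item() forces a sync, but it runs once per epoch instead of per step
        return torch.stack(self._results).mean().item()
```

Each training step calls `add(loss)`, and the visualization code calls `epoch_mean()` when the epoch closes.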
Discrete BGAN needs to be ported over and added to built-ins.
Hi,
Thank you for your great effort to create this repository.
I cannot setup cortex. I will encounter the following error:
ERROR: Command errored out with exit status 1:
command: /home/sina/.conda/envs/pytorch/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-0y42isk4/setup.py'"'"'; file='"'"'/tmp/pip-req-build-0y42isk4/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /tmp/pip-req-build-0y42isk4/
Complete output (24 lines):
Traceback (most recent call last):
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 220, in assert_string_list
assert isinstance(value, (list, tuple))
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-req-build-0y42isk4/setup.py", line 28, in <module>
zip_safe=False)
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/__init__.py", line 144, in setup
_install_setup_requires(attrs)
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/__init__.py", line 133, in _install_setup_requires
(k, v) for k, v in attrs.items()
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 446, in __init__
k: v for k, v in attrs.items()
File "/home/sina/.conda/envs/pytorch/lib/python3.6/distutils/dist.py", line 281, in __init__
self.finalize_options()
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 734, in finalize_options
ep.load()(self, ep.name, value)
File "/home/sina/.conda/envs/pytorch/lib/python3.6/site-packages/setuptools/dist.py", line 225, in assert_string_list
"%r must be a list of strings (got %r)" % (attr, value)
distutils.errors.DistutilsSetupError: 'dependency_links' must be a list of strings (got {'git+https://github.com/facebookresearch/visdom.git'})
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Any hint?
Thank you
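The traceback pins the cause: `dependency_links` was given as a set literal, but setuptools requires a list of strings. A likely fix (inferred from the error message, not a confirmed patch) is using brackets instead of braces in setup.py:

```python
# The offending value is a set literal; setuptools asserts a list (or tuple).
bad = {'git+https://github.com/facebookresearch/visdom.git'}    # a set: rejected
good = ['git+https://github.com/facebookresearch/visdom.git']   # a list: accepted
```

So in setup.py, `dependency_links={'git+...'}` would become `dependency_links=['git+...']`.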
The hyperparameters of a model are set in many ways in cortex:
All of these need to be tested. Notably, we need to make sure that:
the defaults for the build, routine, visualize, etc are the default and these show up in the command line.
the above can be overridden by the defaults static attribute.
The command line can override the above
Nested arguments (within dicts) are updated correctly. In other words, if a default argument looks like this:
classifier_args = dict(batch_size=False)
Updating this argument with classifier_args=dict(dim_h=100)
yields
classifier_args = dict(batch_size=False, dim_h=100)
So far everything looks like it's working, but it needs to be tested.
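The nested update described above can be sketched as a recursive merge (a hypothetical helper for illustration, not cortex's actual code):

```python
def deep_update(base, new):
    """Recursively merge `new` into `base`, preserving existing nested keys."""
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)   # recurse into nested dicts
        else:
            base[key] = value               # plain values overwrite
    return base

# The example from above: the existing key survives the update
classifier_args = deep_update({'batch_size': False}, {'dim_h': 100})
```

A plain `dict.update` would also pass the flat example, but only a recursive merge handles arguments nested more than one level deep.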
It would be good to have this dataset
Adversarial autoencoder needs to be ported over and added to built-ins.
Toy datasets need to be ported over, with a plugin added and registered.
Hi,
I noticed that your MINE implementation contains several measures; copied from the code:
{GAN, JSD, KL, RKL (reverse KL), X2 (Chi^2), H2 (squared Hellinger), DV (Donsker-Varadhan KL), W1 (IPM)}
I know the DV representation as mentioned in the MINE paper, but where can I learn the meaning of the others?
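For reference, the DV (Donsker-Varadhan) representation bounds the KL divergence as follows; the other measures follow the variational f-divergence forms used in the f-GAN literature (this is background context, not a statement about which bounds cortex's code implements):

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;\ge\; \sup_{T}\; \mathbb{E}_{P}\!\left[T(x)\right] \;-\; \log \mathbb{E}_{Q}\!\left[e^{T(x)}\right]
```

where $T$ ranges over functions (in practice, a neural network) for which both expectations are finite; MINE estimates mutual information by applying this bound with $P$ the joint and $Q$ the product of marginals.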
Model build needs to be added to testing. Starting with classifier.py, with some variable arguments, the classifier needs to be tested for having the right parameter sizes as well as output given input shape. Some dummy data can be used (it can just be zeros tensor). It would be good to also test if dropout, batch norm, spectral norm layers, etc, are appearing as they should.
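A sketch of what such a test could look like (the model here is a simple stand-in; a real test would build cortex's classifier plugin with its variable arguments):

```python
import torch
import torch.nn as nn

def test_classifier_output_shape():
    # Hypothetical stand-in for the built classifier
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.zeros(4, 1, 28, 28)   # dummy batch of zeros, as suggested above
    out = model(x)
    assert out.shape == (4, 10)     # output shape given input shape
    # parameter-size check: linear layer weights plus biases
    n_params = sum(p.numel() for p in model.parameters())
    assert n_params == 28 * 28 * 10 + 10
```

The same shape-and-parameter pattern extends to checking that dropout, batch norm, or spectral norm layers appear in `model.modules()` when the corresponding arguments are set.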
So right now, there is an init_fn being passed to the DataLoader to avoid a terminal flood when you do a keyboard interrupt. Normally, PyTorch doesn't handle this well, but I fit in a hack to treat SIGINT as SIG_IGN. However, there is a side effect: the workers get terminated later, not when you send SIGINT:
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f9db1d30fd0>>
Traceback (most recent call last):
File "/home/devon/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/home/devon/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/devon/.local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError:
Hi Dr.RDevon
When I run cortex GAN --d.source CIFAR10 --d.copy_to_local after I set up cortex, it still throws an error:
[INFO:cortex]: Setting logging to INFO
Traceback (most recent call last):
File "/home/mtang4/anaconda3/envs/IARPA2/bin/cortex", line 8, in
sys.exit(run())
File "/home/mtang4/anaconda3/envs/IARPA2/lib/python3.6/site-packages/cortex/main.py", line 32, in run
config.set_config()
File "/home/mtang4/anaconda3/envs/IARPA2/lib/python3.6/site-packages/cortex/_lib/config.py", line 79, in set_config
d = yaml.load(f)
TypeError: load() missing 1 required positional argument: 'Loader'
Can you help me?
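The error comes from PyYAML 5.1+, which requires an explicit Loader argument to `yaml.load`. A likely fix (an assumption based on the traceback, not a confirmed patch to cortex) is switching to `safe_load` in `_lib/config.py`:

```python
import yaml

def load_config(path):
    """Load a YAML config; safe_load is the recommended replacement for
    the old yaml.load(f) call that newer PyYAML rejects."""
    with open(path) as f:
        return yaml.safe_load(f)
```

Alternatively, `yaml.load(f, Loader=yaml.SafeLoader)` is equivalent and keeps the original call site shape.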
Hello there!
I would like to make a PR which would include the following things:
exp.save
save a "current" seed along with other stuff. Do you agree with these modifications? Where would they be most appropriate to include?
Also, what is the current status regarding the resumability of a specific named experiment?
Some optimizer objects are not stateless; are they also saved? And if they are not, should they be?
Thank you!
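On the optimizer question: optimizers like Adam do carry state (running moment estimates), so a fully resumable checkpoint would need their state_dict as well. A sketch of an assumed checkpoint layout (not cortex's actual save format):

```python
import torch
import torch.nn as nn

# Hypothetical experiment pieces for illustration
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),  # Adam's moments live in here
    "torch_rng": torch.get_rng_state(),   # the "current" seed/RNG state for resuming
}
# torch.save(checkpoint, "experiment.ckpt") would persist all three together
```

On resume, `optimizer.load_state_dict(...)` and `torch.set_rng_state(...)` restore the pieces that `model.load_state_dict(...)` alone does not cover.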
Some feedback for the user would help if the visdom server isn't running. What should be added is a simple check for a running server when the visualization object is created. If there is none, allow the user to opt to start a server at the location specified in the config, try another location, or skip visualization.
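A minimal check along those lines (a hypothetical helper; 8097 is visdom's default port):

```python
import socket

def visdom_server_running(host="localhost", port=8097):
    """Return True if something is listening at the configured visdom address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0
```

If this returns False at visualization setup, cortex could prompt the user with the three options above instead of failing later with an opaque connection error.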
Quits the model no matter what
I have come to realize from the implementation in built_ins/gan.py that two separate optimizer updates are performed for the discriminator: the first uses the GAN loss and the second the gradient penalty.
This is because LossHandle will overwrite (s1) any value for a specific network key, which is an inconvenient behaviour. I can see in s2 and s3 that there was an intention to implement a convenient behaviour, but it seems that it has not been done.
I propose the following; tell me what you think:
self.losses.network = a will overwrite/set the loss for the network.
self.losses.network += a would add to the already existing loss if it exists; if it doesn't, it sets it to a.
I am against adding a separate method or an add_value=True flag, as it would only introduce confusion and incoherencies to the API for those creating ModelPlugins.
Hello Dr. Devon Hjelm,
I was searching for an implementation of Mutual Information Neural Estimation and I came across your code, but I am not sure if it's implemented. Could you please help me with how to use it?
Regards,
Sankar Mukherjee
As per the documentation, cortex is supported for both Python 3.5 and 3.6, but the installation succeeds only for 3.5 and fails for 3.6. Steps to reproduce the issue:
$ conda create --yes --name "myenv" python=3.6
$ conda activate myenv
$ pip install visdom
$ git clone https://github.com/rdevon/cortex.git
$ cd cortex
$ pip install .
Expected behavior: successful installation.
Actual behavior: installation fails with following error:
Processing <path>
ERROR: Command errored out with exit status 1:
command: <path>/miniconda3/envs/myenv/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-fdz1bz19/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-fdz1bz19/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-4spmvwv4
cwd: /tmp/pip-req-build-fdz1bz19/
Complete output (1 lines):
error in cortex setup command: 'dependency_links' must be a list of strings (got {'git+https://github.com/facebookresearch/visdom.git'})
----------------------------------------
WARNING: Discarding file:///<path>/cortex. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Reloading is highly sensitive to the environment that the reload is performed in. This is problematic if people want to reload from their ipython console or Jupyter notebook.