
pytorch-template's People

Contributors

borgesa, christopherbate, kexiii, khornlund, npielawski, sunqpark, victoresque


pytorch-template's Issues

Questions regarding the dataloader

The project is very exciting, but I don't understand why we should implement some functions in
dataloader.py, such as shuffle, batch_size and so on, which have already been implemented in torch.utils.data.DataLoader.
Could we make the dataloader lighter?

Thanks.

I face an error when resuming training.

I face an error when resuming training.
[Screenshots from 2019-05-31 showing the traceback are attached.]

In line 99 of the base trainer, not_improved_count += 1 is executed.
When I resume training, execution goes into the else branch, so an error occurs because this variable is incremented before it has ever been initialized.

So I propose initializing the variable before the if/else block.
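A minimal sketch of the proposed fix, assuming the surrounding loop in base_trainer.py looks roughly like this (the improved flag is a placeholder for the existing monitoring logic, not the template's exact code):

not_improved_count = 0  # initialize before the improvement check

for epoch in range(self.start_epoch, self.epochs + 1):
    result = self._train_epoch(epoch)
    improved = ...  # placeholder: compare the monitored metric against self.mnt_best
    if improved:
        not_improved_count = 0
    else:
        not_improved_count += 1  # now safe, even when training was resumed
    if not_improved_count > self.early_stop:
        break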

Readme

In the README:
Additional logging

If you have additional information to be logged, merge it with log in _train_epoch() of your trainer class, as shown below, before returning:

additional_log = {"gradient_norm": g, "sensitivity": s}
log = log.update(additional_log)
return log

Should it be log.update(additional_log) instead of log = log.update(additional_log)?
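For reference: dict.update() mutates the dictionary in place and returns None, so assigning its result sets log to None. A small, self-contained sketch (values are illustrative):

log = {"loss": 0.25}
additional_log = {"gradient_norm": 1.3, "sensitivity": 0.8}
log.update(additional_log)   # update() returns None; it mutates log in place
print(log)                   # {'loss': 0.25, 'gradient_norm': 1.3, 'sensitivity': 0.8}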

"hydra" support

Hydra is a Python package for handling config files, developed by the Facebook research team.
Recently I noticed that using this package can simplify the source code of this project a lot, by replacing the following features with package functions:

  • using CLI options to change config
  • object initialization from config
  • accessing config items as attributes (using config.something, instead of config['something'])
  • managing checkpoint directory with timestamp
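For illustration, a minimal Hydra sketch covering the first three points (the config path, config name, and field names below are assumptions, not this repository's actual files):

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig):
    # attribute-style access instead of cfg['optimizer']['lr']
    print(cfg.optimizer.lr)
    # object initialization from config, assuming cfg.model carries a _target_ entry
    model = hydra.utils.instantiate(cfg.model)

if __name__ == "__main__":
    main()

CLI overrides then work out of the box, e.g. python train.py optimizer.lr=0.01.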

Since replacing the current implementations with a package can be seen as quite a radical change, I made a separate hydra branch that uses this package. Compared with the master branch, this version will

  • have much simpler design
  • contain some advanced features
  • be maintained more frequently (since I'm not using master version any more)
  • contain some bugs yet...

If you are interested, check out that branch with git checkout hydra in this repository.

ZeroDivisionError in TensorboardWriter.set_step()

There is a risk of getting ZeroDivisionError: float division by zero in the TensorboardWriter.set_step() method on the line:
self.add_scalar('steps_per_sec', 1 / duration.total_seconds())
I get this when running the example config with "tensorboard": false.
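One possible guard, as a sketch (the timer attribute and surrounding logic are assumed from the line quoted above, not copied from the writer):

duration = datetime.now() - self.timer   # datetime imported at module level
seconds = duration.total_seconds()
if seconds > 0:                          # guard against a zero-length interval
    self.add_scalar('steps_per_sec', 1 / seconds)
self.timer = datetime.now()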

cuda or not cuda?

I have a GPU, so I set "n_gpu": 1. base_trainer.py (line 17) moves the model to CUDA, and trainer.py (line 42) moves the input to CUDA. But when I start using my own Dataset, this error appears:
Trainable parameters: 31042369
Traceback (most recent call last):
  File "train.py", line 68, in <module>
    main(config)
  File "train.py", line 49, in main
    trainer.train()
  File "E:\kaggle\pytorch-template\base\base_trainer.py", line 66, in train
    result = self._train_epoch(epoch)
  File "E:\kaggle\pytorch-template\trainer\trainer.py", line 47, in _train_epoch
    output = self.model(data)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\kaggle\pytorch-template\model\model.py", line 117, in forward
    x1 = self.inc(x)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\kaggle\pytorch-template\model\model.py", line 42, in forward
    return self.double_conv(x)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 349, in forward
    return self._conv_forward(input, self.weight)
  File "D:\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 346, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
This confuses me and I have no idea what is wrong.
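For context, the error says the model's weights are on the GPU while the input batch is still a CPU tensor. A hedged sketch of the usual fix in a custom loop (the template's own Trainer already moves data to the device; the names here are illustrative):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)              # given a model and data_loader defined elsewhere

for data, target in data_loader:
    data, target = data.to(device), target.to(device)  # keep inputs on the model's device
    output = model(data)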

Potential Memory Leak

In train.py line 58:

        config = json.load(open(args.config))

But the "open" without closing will make a memory leak.

If you want I can make a PR.
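The usual fix, sketched below as a drop-in replacement for the quoted line, is a context manager so the file is closed deterministically:

import json

with open(args.config) as f:
    config = json.load(f)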

Some features I have implemented.

I have been using this template since October 2020, and I found some features that could be included in it.
I implemented them in my repo.
Here are some notable features I have added:

Overview of config.json

{
    "n_gpu": 1,
    "root_dir": "./",
    "save_dir": "saved/",
    "name": "dataset_model",

    "datasets": {
        "train": {
            "data": {
                "module": ".data_loader",
                "type": "MyDataset",
                "kwargs": {
                    "data_dir": "./data",
                    "label_path": null,
                    "mode": "train"
                }
            }
        },
        "valid": {
        },
        "test": {
            "data": {
                "module": ".data_loader",
                "type": "MyDataset",
                "kwargs": {
                    "data_dir": "./data",
                    "label_path": null,
                    "mode": "test"
                }
            }
        }
    },
    "data_loaders": {
        "train": {
            "data": {
                "module": ".data_loader",
                "type": "BaseDataLoader",
                "kwargs": {
                    "validation_split": 0.2,
                    "DataLoader_kwargs": {
                        "batch_size": 64,
                        "shuffle": true,
                        "num_workers": 4
                    },
                    "do_transform": true
                }
            }
        },
        "valid": {
        },
        "test": {
            "data": {
                "module": ".data_loader",
                "type": "DataLoader",
                "kwargs": {
                    "batch_size": 64,
                    "shuffle": false,
                    "num_workers": 4
                },
                "do_transform": true
            }
        }
    },
    "models": {
        "model": {
            "module": ".model",
            "type": "MyModel"
        }
    },
    "losses": {
        "loss": {
            "type": "nll_loss"
        }
    },
    "metrics": {
        "per_iteration": ["accuracy"],
        "per_epoch": ["AUROC", "AUPRC"]
    },
    "optimizers": {
        "model": {
            "type": "Adam",
            "kwargs": {
                "lr": 0.001
            }
        }
    },
    "lr_schedulers": {
        "model": {
            "type": "StepLR",
            "kwargs": {
                "step_size": 50,
                "gamma": 0.1
            }
        }
    },
    "trainer": {
        "module": ".trainer",
        "type": "Trainer",
        "k_fold": 5,
        "fold_idx": 0,
        "kwargs": {
            "finetune": false,
            "epochs": 2,
            "len_epoch": null,

            "save_period": 5,
            "save_the_best": true,
            "verbosity": 2,

            "monitor": "max val_accuracy",
            "early_stop": 0,

            "tensorboard": false
        }
    }
}

Enable multiple instances in datasets, data_loaders, models, losses, optimizers, lr_schedulers

Multiple datasets: for example, domain adaptation training uses a source dataset and a target dataset; the same applies to data_loaders.
Multiple models: for example, a GAN's generator and discriminator.
Multiple losses, optimizers, and lr_schedulers appear in many ML papers.

train/valid/test

If the train/valid/test paths are already given, the content can be put directly in the corresponding section of datasets and data_loaders.

module/type

When there is more than one module, for example,

  • data_loader/first_loader.py and data_loader/second_loader.py
  • trainer/first_trainer.py and trainer/second_trainer.py
  • model/model1.py and model/model2.py

each containing some classes, ConfigParser.init_obj() in parse_config.py can automatically import the specified class using importlib.
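As a rough illustration of the idea (the function body below is a sketch, not the exact code in parse_config.py):

import importlib

def init_obj(module_name, class_name, *args, **kwargs):
    # import e.g. "data_loader.second_loader" and instantiate class_name from it
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(*args, **kwargs)

# usage sketch: loader = init_obj("data_loader.second_loader", "MyDataset", data_dir="./data")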

AUROC/AUPRC

In the metric part, I added two commonly used metrics, AUROC and AUPRC. These two metrics need to be computed over the whole epoch, so the computation method differs from accuracy.
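For illustration, a minimal sketch of such epoch-level metrics using scikit-learn, assuming predictions and targets are accumulated over the epoch (not necessarily the exact implementation in the repo):

from sklearn.metrics import roc_auc_score, average_precision_score

def auroc(epoch_outputs, epoch_targets):
    # computed once per epoch on the accumulated predictions
    return roc_auc_score(epoch_targets, epoch_outputs)

def auprc(epoch_outputs, epoch_targets):
    return average_precision_score(epoch_targets, epoch_outputs)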

MetricTracker

Continuing from AUROC/AUPRC, I revised the MetricTracker, which is moved to model/metric.py.
The MetricTracker can record both accuracy-like metrics (metric_iter) and AUROC-like metrics (metric_epoch).

Cross validation

Cross validation is supported.
The class Cross_Valid in base/base_dataloader.py can record the results of each fold (all metrics in MetricTracker).
The model of each fold is saved.
test.py can ensemble the k-fold validation results.

Examples

I added some example code that uses the above features.

  • MNIST dataset
  • ImageNet dataset
  • Adult dataset

I would appreciate any comments on my work.

Thanks,
Pei-Ying, Liu

Bugs in Resume Checkpoint

Thanks for the great work on this template.
However, in the base_trainer.py, there are two bugs while resuming checkpoints:

  1. line 100: (I think this is related to #54, but you closed the issue without fixing it)
    When resuming training from saved checkpoints, there will be a "reference before assignment" error on the not_improved_count variable.
  2. line 165 and 171:
    The sanity check here is invalid, as the config is overwritten by the config.json of the saved checkpoint when the config is parsed (line 22 in parse_config.py).

Iteration-based training

Any idea when we can expect iteration-based training to be ready? It would be a very useful feature.

Command line options

I'm considering adding more command line options to this project.
For now, what we have are:

  1. --config to start new training
  2. --resume to continue training of saved checkpoint.

With the current setup, I see the following problems:

  1. the checkpoint folder has to be deleted manually when cancelling training and running again
  2. the config is overridden when a checkpoint is loaded

Hence, the options I suggest are the following:

  1. a -f, --force option to clean up previous checkpoints if they already exist
  2. allowing -c and -r at the same time, loading the checkpoint but using the given config

What I want most is to add the first, -f option, which seems not that hard to implement and should not clash with other parts of the project.
The second option looks quite difficult to add, since the differences between config files would have to be handled carefully, but we need it to enable a fine-tuning process with this template.
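A minimal sketch of what the -f option could look like (hypothetical wiring; checkpoint_dir would be derived from the config in practice):

import argparse
import shutil
from pathlib import Path

parser = argparse.ArgumentParser(description='PyTorch Template')
parser.add_argument('-c', '--config', default=None, type=str)
parser.add_argument('-r', '--resume', default=None, type=str)
parser.add_argument('-f', '--force', action='store_true',
                    help='delete an existing checkpoint directory before starting')
args = parser.parse_args()

checkpoint_dir = Path('saved') / 'models'   # placeholder path for illustration
if args.force and checkpoint_dir.exists():
    shutil.rmtree(checkpoint_dir)           # clean up the previous run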

Are there any other suggestions or opinions?

testing of pre-trained model on different test dataset

Hi,
I hope that you are doing ok.

Background:
I am working on uncertainty estimation, so I need to test models trained on one dataset (say MNIST) on test data from different datasets (say FashionMNIST, NotMNIST).

Problem:
I have a question regarding testing a pre-trained model saved at <saved/[model_name]/model_best> on test data from a different dataset.
For example, say I trained my model on the MNIST dataset and I want to test that trained model, through the "model_best.pth" saved in <saved/[model_name]>, on FashionMNIST test data. How can I do that?

When testing, I pass the resume argument pointing to model_best, which I believe probably contains information about the dataset used for testing.
Changing the dataset path in the config.json inside <saved/[model_name]> doesn't help either; it always uses the dataset the model was trained on.
Would you be kind enough to tell me how I can get around this problem?

Add test.py or infer.py

It would be great if there were a test.py template that could be used to run inference on test data with a pretrained model in saved/.
Also, I notice that valid_epoch, train_epoch, and the inference code seem similar. Maybe they could be refactored into three functions that share a common function with a few more parameters (e.g., whether to do back propagation, whether to calculate the loss, ...).

What do you guys think?

Maybe a bug in lr_scheduler when resuming

In trainer.py, at line 71, self.lr_scheduler.step() should be replaced by self.lr_scheduler.step(epoch).
The source code of the step function is:

def step(self, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch
        for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
            param_group['lr'] = lr

Because when resuming from a checkpoint, the lr_scheduler is initialized from scratch, the default value of the last_epoch parameter in the lr_scheduler will be -1 rather than the last epoch stored in the checkpoint.

Setting 'early_stop: 0' does not disable it

The README says about early stopping that 'This feature can be turned off by passing 0'. This, however, will not work: self.early_stop will be set to 0 in

self.early_stop = cfg_trainer.get('early_stop', inf)

and later during training this condition will be true whenever the current epoch was worse than the previous one:
if not_improved_count > self.early_stop:

leading to the output:
Validation performance didn't improve for 0 epochs. Training stops.
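One possible fix, as a sketch: treat a non-positive value as 'disabled' right after reading the config (inf as already imported in base_trainer.py):

self.early_stop = cfg_trainer.get('early_stop', inf)
if self.early_stop <= 0:
    self.early_stop = inf   # 0 turns early stopping off, as documented in the README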

Proposal regarding "within epoch progress bar"

Fast-scrolling 'epoch text' drowns out warnings

I support the current usage of warnings, but they tend to get drowned out quickly when the "Train epoch" status rolls across the screen, blazing fast.

I guess most real projects (i.e., not the MNIST example) will have slower updates most of the time, but I think this still applies to some degree.

Consider putting info into 'tqdm'?

I thought it could be nice to use tqdm for this "within epoch" status. The tqdm progress bar description can be updated during training, like this:

import time
from tqdm import tqdm

epoch_iterator = tqdm(range(1, 5000))
for i in epoch_iterator:
    if i % 500 == 0:
        print("\nSomething happened, so a message is printed.\nMore text.\n")
    time.sleep(0.001)

    my_loss = 42
    epoch_iterator.set_description(f"Value: {i}, loss = {my_loss}")

The above does not behave perfectly (can hopefully be fixed), but I hope you get my point.

base_trainer._save_checkpoint error in Windows

Hi,

The use of 'os.rename' when saving a new 'model_best' in 'base_trainer._save_checkpoint' raises an exception.

With reference to os.rename documentation:

'On Windows, if dst already exists, OSError will be raised even if it is a file.'

The function can be replaced with 'os.replace':

'If you want cross-platform overwriting of the destination, use replace().'

Both quotes from https://docs.python.org/3.6/library/os.html
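A minimal sketch of the proposed change (the paths are placeholders for illustration, not the template's variable names):

import os

src = "saved/checkpoint-epoch10.pth"
dst = "saved/model_best.pth"

# os.rename(src, dst)  # on Windows, raises OSError if dst already exists
os.replace(src, dst)   # cross-platform overwrite of the destination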

Multi-GPU Usage Problem

[Screenshots attached.]

When I execute train.py with the command CUDA_VISIBLE_DEVICES=2,3 python train.py -c config.json (with the n_gpu option set to 2), GPU index 0 is still used as the starting device,
because list_ids is just range(n_gpu_use), so if n_gpu is 2, list_ids is [0, 1].

calling Trainer with len_epoch parameter

Hi,

There seems to be an issue when setting the len_epoch parameter and instantiating Trainer in this way:

trainer = Trainer(model, criterion, metrics, optimizer,
                  config=config,
                  data_loader=data_loader,
                  valid_data_loader=valid_data_loader,
                  lr_scheduler=lr_scheduler,
                  len_epoch=config['trainer']['steps_per_epoch'])

ReduceLROnPlateau lr_scheduler

Hello,
Thanks for your efforts. I am trying to use the ReduceLROnPlateau lr_scheduler, but I get the following error:

Traceback (most recent call last):
  File "/home/T1/train.py", line 73, in <module>
    main(config)
  File "/home/T1/train.py", line 54, in main
    trainer.train()
  File "/home/T1/base/base_trainer.py", line 63, in train
    result = self._train_epoch(epoch)
  File "/home/T1/trainer/trainer.py", line 72, in _train_epoch
    self.lr_scheduler.step()
TypeError: step() missing 1 required positional argument: 'metrics'

According to the PyTorch docs, it should be used like this:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = ReduceLROnPlateau(optimizer, 'min')
>>> for epoch in range(10):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Note that step should be called after validate()
>>>     scheduler.step(val_loss)

I guess that to do so, _valid_epoch(self, epoch) has to return val_loss or save it somewhere the scheduler can access. Could you guide me on this?
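A hedged sketch of one way the trainer could handle this (variable names such as val_log are assumptions about the trainer's internals, and torch is assumed to be imported at the top of trainer.py):

# inside Trainer._train_epoch(), after the validation pass
if self.do_validation:
    val_log = self._valid_epoch(epoch)
    log.update(**{'val_' + k: v for k, v in val_log.items()})

if self.lr_scheduler is not None:
    if isinstance(self.lr_scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau):
        self.lr_scheduler.step(val_log['loss'])  # ReduceLROnPlateau needs the monitored metric
    else:
        self.lr_scheduler.step()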

Tidying up (grouping, adding some spacing and commenting) the code base

Hi,

I would propose that we go through the repository and add more comments/documentation inline, as well as giving the overall code some more structure.

This is in order to make it easier for new users to understand the mechanics of the template code.

A good example of this need is the 'BaseTrainer' class. I believe it would benefit from some restructuring (not functional changes, but, as mentioned, making it easier to get an overview).

Do you agree?

Proposal in "model/loss.py" - Use loss classes instead of functional API

Hi,

Currently, 'get_loss_function' queries for local functions in the same file and returns the object, if it finds a function with the matching name. I propose that we instead use the interface of 'torch.nn.modules.loss' classes and return an instantiated class object, instead of a function reference.

These classes call the functional API anyway, and are better documented.

As an example:

  • Current functionality:
    • 'my_loss' returns 'F.nll_loss(y_in, y_target)'
    • Function 'get_loss_function' returns the function reference to 'my_loss'
  • Proposed functionality:
    • 'get_loss_function' can instead use a combination of 'getattr(torch.nn.modules.loss, loss_fn_name)' (finding all built in loss classes in PyTorch) and searching for custom loss classes inside the file ('model/loss.py')
    • An instantiated class object is returned

What do you think?
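A rough sketch of how the proposed lookup could work (get_loss_function follows the naming above; the module paths are assumptions):

import torch.nn.modules.loss as torch_losses
import model.loss as custom_losses

def get_loss_function(loss_fn_name, **loss_kwargs):
    # prefer PyTorch's built-in loss classes, fall back to custom classes in model/loss.py
    if hasattr(torch_losses, loss_fn_name):
        return getattr(torch_losses, loss_fn_name)(**loss_kwargs)
    return getattr(custom_losses, loss_fn_name)(**loss_kwargs)

For example, get_loss_function('NLLLoss') would return an instantiated torch.nn.NLLLoss object.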

Abstract base classes do not use ABCMeta metaclass

First, thank you for taking the time and effort to create this useful template! :)


I noticed for base_model.py and base_trainer.py, you are using the abc.abstractmethod decorator from the abc module. However, as currently implemented, I think this decorator might be used incorrectly, as the abstract base classes do not use abc.ABCMeta. For more information, see https://docs.python.org/3/library/abc.html#abc.abstractmethod

Using this decorator requires that the class’s metaclass is ABCMeta or is derived from it. A class that has a metaclass derived from ABCMeta cannot be instantiated unless all of its abstract methods and properties are overridden.

To demonstrate: If I define a new class derived from BaseTrainer, but I do not override _train_epoch, I will still be allowed to instantiate an object of that new class. But, @abstractmethod is supposed to prevent this. To fix the issue, change BaseTrainer to BaseTrainer(metaclass=ABCMeta). Once this change is made, the instantiation will throw a TypeError as expected in the case that _train_epoch was not overridden.
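A minimal illustration of the proposed change (class bodies abbreviated):

from abc import ABCMeta, abstractmethod

class BaseTrainer(metaclass=ABCMeta):
    @abstractmethod
    def _train_epoch(self, epoch):
        raise NotImplementedError

class MyTrainer(BaseTrainer):
    pass  # _train_epoch not overridden

MyTrainer()  # now raises TypeError instead of silently instantiating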

I face an error in test.py.

test.py code:

args = parser.parse_args()
config = ConfigParser(args)    # original code
config = ConfigParser(parser)  # proposed code
main(config, args.resume)

class ConfigParser:
    def __init__(self, args, options='', timestamp=True):
        # parse default and custom cli options
        for opt in options:
            args.add_argument(*opt.flags, default=None, type=opt.type)
        args = args.parse_args()

In args = args.parse_args(), args is already the result of parser.parse_args(),
so an error occurs.
Therefore I propose changing ConfigParser(args) to ConfigParser(parser) in test.py.

Cookiecutter

I wonder if you might prefer to make this project a cookiecutter template rather than using a copy script.

You can see my fork here for an example. To test it out, simply run

pip install cookiecutter
cookiecutter https://github.com/khornlund/cookiecutter-pytorch

If so, I'll submit a PR.

Some problems

Hi, I used your template and I had some problems as follows:

  1. In trainer.py, at line 53, the expression met(output, target) should be changed to met(output, target).item(). If we don't, memory usage will grow after each batch. I tested with PyTorch 0.4.1 and observed this problem.
  2. In my opinion, the computed evaluation metrics are not correct. Should we pass the batch size to the update function like this?
# at line 53
self.train_metrics.update(met.__name__, met(output, target).item(), n=batch_size)

What do you think about that?

Proposal: Use of 'mutually exclusive' argparse arguments.

Hi,

If you prefer, I can create a pull request for the below proposal?

I propose the following change to the argparser in 'train.py':

parser = argparse.ArgumentParser(description='PyTorch Template')
arg_group = parser.add_mutually_exclusive_group(required=True)
arg_group.add_argument('-c', '--config', default=None, type=str,
                           help='config file path (default: None)')
arg_group.add_argument('-r', '--resume', default=None, type=str,
                        help='path to latest checkpoint (default: None)')
args = parser.parse_args()

Unless I have overlooked something, the above enables current lines 49-58 to be simplified to:

    if args.resume:
        config = torch.load(args.resume)['config']
    else:
        config = json.load(open(args.config))
        path = os.path.join(config['trainer']['save_dir'], config['name'])
        assert not os.path.exists(path), "Path {} already exists!".format(path)

    main(config, args.resume)

I face an error when extending the dataset class. How do I extend to another dataset like VOC?

I changed one line in the data loader, in two ways.
First case:
Before:
self.dataset = datasets.MNIST(self.data_dir, train=training, download=True, transform=trsfm)
After:
self.dataset = datasets.MNIST(self.data_dir, train=training, download=True, transform=trsfm, target_transform=trsfm)
Second case:
Before:
original code
After:
self.dataset = datasets.VOCDetection(self.data_dir, year='2012', image_set='trainval', download=True, transform=trsfm, target_transform=trsfm)

But I face a similar error in both cases.

[Screenshot of the error attached.]

Maybe my guess is that the resulting data loader can no longer be enumerated/unpacked as before...
[Screenshot attached.]

I wonder whether my guess is right, and how to extend the MNIST data loader code to a VOC data loader?

Replacing usage of 'eval' in repository

Hi,

While I was trying to figure out how to enable users to use both PyTorch loss functions and custom loss functions (ref. pull request #22), I was made aware that the usage of eval is often discouraged and can even be dangerous (if abused).

Therefore, I propose that we replace the three instances of it in this repository:
model.py:

def get_model_instance(model_arch, model_params):
    try:
        model = eval(model_arch)

metric.py

def get_metric_functions(metric_names):
    try:
        metric_fns = [eval(metric) for metric in metric_names]

trainer.py

   def _valid_epoch(self, epoch):
        self.model.eval()

I am not certain what to replace it with. Do any of you know?
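One common alternative, sketched below under the assumption of explicit module imports (not necessarily what the repository should finally adopt), is getattr on an imported module instead of eval. (Note that the trainer.py instance, self.model.eval(), is PyTorch's nn.Module.eval() rather than the Python builtin, so it can stay as it is.)

import model.model as module_arch
import model.metric as module_metric

def get_model_instance(model_arch, model_params):
    # look the class up by name on the module instead of eval()
    return getattr(module_arch, model_arch)(**model_params)

def get_metric_functions(metric_names):
    return [getattr(module_metric, name) for name in metric_names]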

Add custom flag & override from CLI

Hello, thanks for your efforts.
I am trying to add a custom flag and at the same time allow the user to override it from the command line.
I tried the following, but config.extract still returns the value from config.json, not the one I provided on the CLI. How can I solve this?
I tried this:

CustomArgs = collections.namedtuple('CustomArgs', 'flags type target')
options = [CustomArgs(['-x', '--extract'], type=bool, target=('extract'))]
config = ConfigParser.from_args(args,options=options)

and also:

args.add_argument('-x', '--extract', default=False, type=bool, nargs=1, help='extract parameters')

Kind Regards.

License?

What's the license of this repo?

best_model is calculated based on save_epoch, not every epoch

After setting the monitor to val accuracy, I would like to get the best model from every epoch, but I found out that model_best is only saved when epoch % save_epoch == 0. This causes the problem that models with potentially better validation accuracy are missed because they do not meet the condition to be saved, which does not seem like a good idea.

This is the line that illustrates the problem:

self._save_checkpoint(epoch, save_best=best)

Add early stop in base_trainer.py

Thanks for your contribution; I have learned a lot from the template.
I think it would be good to have early stopping in base_trainer.py.

Something like:

def __init__(self):
    self.no_improve_count = 0 if self.config['trainer']['early_stop'] else None

def train(self):
    ...
    if self.no_improve_count is not None:   # counter is reset elsewhere when the metric improves
        self.no_improve_count += 1
        if self.no_improve_count == self.early_stop:
            msg = "Metric named '{}' ".format(self.monitor) \
                  + "has not improved for {} epochs; stop training".format(self.early_stop)
            self.logger.info(msg)
            break

stacking uni-channeled image thrice to make it 3-channeled image

Hi again,

Background:
I am working on uncertainty estimation, so I need to test models trained on one dataset (say MNIST) on test data from different datasets (say FashionMNIST, NotMNIST).

Problem:
So I am testing models trained on one dataset on other datasets' test data. One scenario is testing a model trained on MNIST on CIFAR-10's test data. As you know, CIFAR-10 has RGB images whereas MNIST has grayscale images, so testing the model trained on MNIST on CIFAR-10's test data causes a dimension clash when computing the fully connected layers, since that data is RGB. I am looking for a way to stack the grayscale MNIST image three times to make it a 3-channel grayscale image, with all channels the same, so that it no longer causes a dimension clash.

Any help would be great.
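A common approach, as a sketch (not part of the template), is to replicate the single channel either in the transform pipeline or on the tensor itself:

import torchvision.transforms as transforms

trsfm = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # replicate the single channel three times
    transforms.ToTensor(),
])

# or, for an existing tensor x of shape (1, H, W):
# x = x.repeat(3, 1, 1)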

Model is moved to GPU after the optimizer is instantiated, resulting in a performance hit.

I noticed that the optimizer is instantiated before the model is moved to the GPU.

This is contrary to the PyTorch docs:

If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.

In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.
-- https://pytorch.org/docs/stable/optim.html#how-to-use-an-optimizer

I noticed the problem on my machine because I had fluctuating GPU utilization (checked with nvtop). The utilization jumped every couple of seconds from 10-20% to 80% and back. Moving the model to CUDA beforehand (in train.py) fixed the issue for me; afterwards the utilization never dropped under 70%.

model = config.init_obj('arch', module_arch)
model.cuda()
...
optimizer = config.init_obj('optimizer', torch.optim, trainable_params)

Can you reproduce the behavior?

Versioned Repository

Hi,
This project seems really interesting. I am wondering whether it would be proper to tag or archive releases periodically, so we can use an exact version of it.

changing the model to use resnet

I am using your code base for my project and I want to change the model to use torchvision.models.resnet18() instead of the customized layers, but I am facing issues. I just changed model.py to:
import torch.nn as nn
import torch.nn.functional as F
from base import BaseModel
import torchvision.models as models

class MnistModel(BaseModel):
    def __init__(self, num_classes=10):
        super(MnistModel, self).__init__()
        # self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        # self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # self.conv2_drop = nn.Dropout2d()
        # self.fc1 = nn.Linear(320, 50)
        # self.fc2 = nn.Linear(50, num_classes)
        models.resnet18()

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

The error I got: Trainable parameters: 0
MnistModel()
Traceback (most recent call last):
  File "/home/rinkita/MLMI/pytorch-template/train.py", line 72, in <module>
    main(config, args.resume)
  File "/home/rinkita/MLMI/pytorch-template/train.py", line 35, in main
    optimizer = get_instance(torch.optim, 'optimizer', config, trainable_params)
  File "/home/rinkita/MLMI/pytorch-template/train.py", line 15, in get_instance
    return getattr(module, config[name]['type'])(*args, **config[name]['args'])
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/adam.py", line 41, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/optimizer.py", line 38, in __init__
    raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list

Can you please help me with this?
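"Trainable parameters: 0" suggests the resnet was never registered as a submodule, so the optimizer sees no parameters. A hedged sketch of one way to wrap resnet18 correctly (attribute names and the single-channel adaptation are illustrative, not the template's code):

import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from base import BaseModel

class MnistModel(BaseModel):
    def __init__(self, num_classes=10):
        super().__init__()
        # assign the backbone to self so its parameters are registered and trainable
        self.backbone = models.resnet18(num_classes=num_classes)
        # MNIST images have a single channel; adapt the first convolution accordingly
        self.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

    def forward(self, x):
        return F.log_softmax(self.backbone(x), dim=1)  # keep log-probabilities for nll_loss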

Dataloader & Shuffle

When setting validation_split > 0.0, the base_data_loader creates samplers that shuffle the indices of the dataset.

Shouldn't these be wrapped with an if statement, i.e., shuffle the indices only when self.shuffle is set to True?

Distribute template as a pip package

First of all, thanks for an amazing project!

It offers a lot of handy features, but to use them one needs to copy the whole template every time (or am I missing something?). It would be nice to create a package, installable via pip or conda, so that only the necessary additions have to be written in new projects. From the architecture perspective, I don't see any major obstacles: users would only have to point their get_instance calls to custom modules and files.

Passing an iterable from config.json

Hi,
I was trying to configure different learning rates for different subnetworks, and I wonder how I could achieve the following setup using the template.

The PyTorch optimizer takes an iterable without any keyword argument in this case, so I'm not sure how to configure that.

Example from the PyTorch documentation:

optimizer = optim.SGD([
                {'params': model.base.parameters()},
                {'params': model.classifier.parameters(), 'lr': 1e-3}
            ], lr=1e-2, momentum=0.9)

https://pytorch.org/docs/stable/optim.html

Shall it be like this?

subnet_list = [
                {'params': model.base.parameters()},
                {'params': model.classifier.parameters(), 'lr': 1e-3}
            ]
optimizer = config.init_obj('optimizer', torch.optim, subnet_list)

A little question about loading checkpoint.

Great work! I have a little question about loading checkpoints. The checkpoint in your project is saved this way:

 state = {
            'arch': arch,
            'epoch': epoch,
            'state_dict': self.model.state_dict(),
            'optimizer': self.optimizer.state_dict(),
            'monitor_best': self.mnt_best,
            'config': self.config
        }
filename = str(self.checkpoint_dir / 'checkpoint-epoch{}.pth'.format(epoch))
torch.save(state, filename)

And I just want to load the checkpoint in a very simple way for some reason, just like this:

checkpoint = torch.load('model_best.pth', map_location=torch.device('cpu'))

But an error occurred:

Traceback (most recent call last):
File ".\test.py", line 11, in
checkpoint = torch.load(Path('model_best.pth'), map_location=torch.device('cpu'))
File "C:\Users\user\anaconda3\envs\luoqiuhong\lib\site-packages\torch\serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "C:\Users\user\anaconda3\envs\luoqiuhong\lib\site-packages\torch\serialization.py", line 773, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'parse_config'

Is there any way to do this simple checkpoint loading (without parse_config)?
Thanks in advance.
