toodef / neural-pipeline

Neural networks training pipeline based on PyTorch

Home Page: https://neural-pipeline.readthedocs.io

License: MIT License

Python 100.00%

deep-learning pytorch image-classification image-segmentation supervised-learning object-detection training-pipeline neural-networks pipeline

neural-pipeline's Issues

Add a guide about execution performance

This is needed after the 0.2 release, which improved performance.
The documentation needs a guide on how to use Neural Pipeline more effectively.

Extract all console output to Console Monitor

It's really useful to manage and customize console output through a separate class.
This also makes it possible to include more information in the console:

Real and model forward FPS
Events, as implemented in other monitors
Management of the metrics shown in the progress bar

Profile the code and optimize where possible

Think about parallel loss computation. It may be needed when losses are computed on the CPU.

Add a loss composer for stacking losses with weights.
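
One way such a composer could look is sketched below. This is only an illustration: `WeightedLossComposer`, `abs_error` and `squared_error` are hypothetical names, not part of Neural Pipeline. It assumes each loss is a callable taking `(output, target)` and returning a value that supports `*` and `+`, which holds for plain floats and would also hold for PyTorch tensors.

```python
from typing import Callable, Sequence, Tuple


class WeightedLossComposer:
    """Hypothetical helper: combine several loss callables into one weighted sum."""

    def __init__(self, losses: Sequence[Tuple[Callable, float]]):
        if not losses:
            raise ValueError("at least one (loss, weight) pair is required")
        self._losses = list(losses)

    def __call__(self, output, target):
        # Accumulate weight * loss for every registered loss.
        total = None
        for loss_fn, weight in self._losses:
            term = weight * loss_fn(output, target)
            total = term if total is None else total + term
        return total


def abs_error(output, target):
    return abs(output - target)


def squared_error(output, target):
    return (output - target) ** 2


# Stack two losses with weights 0.25 and 0.75.
composed = WeightedLossComposer([(abs_error, 0.25), (squared_error, 0.75)])
result = composed(3.0, 1.0)  # 0.25 * 2.0 + 0.75 * 4.0
```

Because the composer is itself a callable with the same `(output, target)` signature, it could presumably be passed wherever a single loss is expected.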

Weights optimizing
Easily add weights for monitoring
Easily add weights for optimizing

Add system monitor

The idea is to monitor CPU and GPU occupancy in order to analyse code performance

Typo in docstrings

neural-pipeline/neural_pipeline/data_processor/model.py

Line 11 in dbf5cc9

    
               Wrapper for :class:`torch.nn.Module`. This class provide initialisation, call and serialisation for it

serialisation ---> serialization
initialisation ---> initialization

Add batch gradient accumulation (to increase the batch size without increasing memory usage)

Solving this issue requires these steps:

Add a method enable_grads_acumulation(steps_num: int) to the Trainer class
Add gradient accumulation as described there
Write tests for a simple network that compare 2 loss values:
1. Calculated without gradient accumulation
2. Calculated with gradient accumulation

This test needs the same input data fed to the model and the same weights in the model (the latter can be done by flushing the weights to a file).

[Optional] Explore how BatchNorm works with gradient accumulation. It is said there to be a problem (but the discussion dates from the pre-release of PyTorch 1.0)
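
The equivalence that such a test relies on can be sketched without any framework. The example below is an assumption-laden illustration, not the Trainer implementation: it uses a one-parameter linear model with a mean squared error loss, and the helper names `mse_gradients` and `accumulated_gradients` are made up. It compares gradients rather than loss values; with equal-sized micro-batches, scaling each micro-batch gradient by `1 / steps_num` and summing reproduces the full-batch gradient.

```python
import math
from typing import List, Tuple


def mse_gradients(w: float, b: float, xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Gradients of the mean squared error for the model y = w * x + b."""
    n = len(xs)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def accumulated_gradients(w: float, b: float, xs: List[float], ys: List[float],
                          steps_num: int) -> Tuple[float, float]:
    """Accumulate gradients over steps_num equal micro-batches, each scaled by 1 / steps_num."""
    if steps_num < 1 or len(xs) % steps_num != 0:
        raise ValueError("batch must split into steps_num equal micro-batches")
    size = len(xs) // steps_num
    acc_w, acc_b = 0.0, 0.0
    for step in range(steps_num):
        chunk = slice(step * size, (step + 1) * size)
        grad_w, grad_b = mse_gradients(w, b, xs[chunk], ys[chunk])
        acc_w += grad_w / steps_num
        acc_b += grad_b / steps_num
    return acc_w, acc_b


# Same data and same weights for both runs, as the issue requires.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = mse_gradients(0.5, 0.0, xs, ys)
accumulated = accumulated_gradients(0.5, 0.0, xs, ys, steps_num=2)
same = all(math.isclose(f, a) for f, a in zip(full, accumulated))
```

In a PyTorch training loop the same idea is usually expressed by dividing the loss by the number of accumulation steps before calling `backward()` and stepping the optimizer only once every `steps_num` batches. Layers that depend on batch statistics, such as BatchNorm, do not satisfy this equivalence, which is why the optional step above deserves a separate check.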

Describe the bug

I was surprised to see strange results for my metrics. I have custom metric wrappers around sklearn metrics:

import numpy as np
import torch
from neural_pipeline.train_config import AbstractMetric, MetricsProcessor, MetricsGroup
from sklearn.metrics import precision_score, recall_score, accuracy_score

class Metric(AbstractMetric):
  def __init__(self, name, function):
    super().__init__(name)
    self.function = function

  def calc(self, output: torch.Tensor, target: torch.Tensor) -> np.ndarray or float:
    predicted = output.gt(0.5)
    return self.function(target, predicted)

class Metrics(MetricsProcessor):
  def __init__(self, stage_name: str):
    super().__init__()
    accuracy = Metric('accuracy', accuracy_score)
    precision = Metric('precision', precision_score)
    recall = Metric('recall', recall_score)
    self.add_metrics_group(MetricsGroup(stage_name).\
                           add(accuracy).\
                           add(precision).\
                           add(recall))

The configuration is as follows:

train_batch_size = 32
val_batch_size = len(X_test)

train_dataset = DataProducer([Dataset(X_train, y_train)], batch_size=train_batch_size)
validation_dataset = DataProducer([Dataset(X_test, y_test)], batch_size=val_batch_size)

train_stages = [TrainStage(train_dataset, Metrics('train')), 
                ValidationStage(validation_dataset, Metrics('validation'))]
loss = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
train_config = TrainConfig(train_stages, loss, optimizer)

fsm = FileStructManager(base_dir='data', is_continue=False)

epochs = 1
trainer = Trainer(model, train_config, fsm, device).set_epoch_num(epochs)
trainer.monitor_hub.add_monitor(TensorboardMonitor(fsm, is_continue=False))\
                   .add_monitor(LogMonitor(fsm))

trainer.train()

After training I loaded the metrics from data/monitors/metrics_log/metrics_log.json and got the following results for the validation data:

But I did the same manually and got a different result.

I understand that with a batch_size not equal to the length of the validation dataset I would get a different result, but that is not the case here. Another problem I found is that the result of the last training step also differs from the one computed manually, no matter what value is used for epochs. I can't find the error in the code; it looks like magic to me.
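
The batch-size effect mentioned above can be reproduced in a few lines. This sketch uses a hand-written `precision` function and made-up labels, not the reporter's data or the sklearn implementation, and it does not explain the reported bug, where the batch size equals the dataset length. It only shows why averaging a per-batch metric generally differs from computing the metric once over the whole set.

```python
from typing import List


def precision(targets: List[int], predictions: List[int]) -> float:
    """Binary precision: TP / (TP + FP), or 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


# Illustrative labels only.
targets = [1, 0, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 1, 1]

# Metric computed once over the whole dataset.
whole = precision(targets, predictions)

# Metric computed per batch of 2 and then averaged.
batch_size = 2
per_batch = [
    precision(targets[i:i + batch_size], predictions[i:i + batch_size])
    for i in range(0, len(targets), batch_size)
]
averaged = sum(per_batch) / len(per_batch)

print(f"whole dataset: {whole:.2f}, mean over batches: {averaged:.2f}")
```

Here the whole-dataset precision is 2 / 5, while the batch values 0.5, 1.0 and 0.0 average to 0.5.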

Experiments comparison

Visual comparison (maybe by loading files from several logging monitors and passing them to another monitor)
Console output
Parsing the file for metrics comparison

Add FP16 support

Add DataProducerPredictor

Split Predictor into Predictor and DataProducerPredictor. Predictor just initializes the model and provides an interface to predict one item; DataProducerPredictor provides an interface to predict all data from a DataProducer object.

DataProducerPredictor can also have interfaces for:

Connecting metrics to predictions
Selecting the best and worst predictions
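
The last interface could work roughly as sketched below. Everything here is hypothetical: `select_best_and_worst` is not an existing Neural Pipeline function, and the sketch assumes a per-item metric where a higher value means a better prediction.

```python
from typing import Callable, List, Sequence, Tuple


def select_best_and_worst(predictions: Sequence[float], targets: Sequence[float],
                          metric: Callable[[float, float], float],
                          count: int) -> Tuple[List[int], List[int]]:
    """Return indices of the `count` best and `count` worst items (higher metric = better)."""
    if count < 1 or count > len(predictions):
        raise ValueError("count must be between 1 and the number of predictions")
    # Rank item indices from the best score to the worst one.
    ranked = sorted(range(len(predictions)),
                    key=lambda i: metric(predictions[i], targets[i]),
                    reverse=True)
    best = ranked[:count]
    worst = ranked[-count:][::-1]  # worst item first
    return best, worst


def negative_abs_error(prediction: float, target: float) -> float:
    # Negated so that a smaller error yields a higher score.
    return -abs(prediction - target)


predictions = [0.9, 0.2, 0.6, 0.3]
targets = [1.0, 1.0, 1.0, 0.0]
best, worst = select_best_and_worst(predictions, targets, negative_abs_error, count=2)
```

Returning indices rather than the items themselves keeps the helper independent of how the dataset stores its data.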

Add delayed loss computation to increase batch size without increasing memory usage.

Add `jit` training

Add built-in metrics for classification and segmentation tasks

Add `turbojpeg` for fast image loading

Add the ability to select a subpart of a dataset

We frequently need to extract part of a dataset, for example to test a network.
This feature should be implemented in DataProducer, because then the dataset class does not need to change. Also, DataProducer already has interfaces for flushing and loading indices.

Requirements:

Ability to flush indices for reproducibility
Different strategies for index selection (from the beginning, from a range, all of it with a step, etc.)
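
The requirements above can be sketched as a set of small index-selection helpers. This is an illustration under assumptions: the function names and the JSON file format are invented here and do not describe the existing DataProducer interfaces for flushing and loading indices.

```python
import json
import os
import tempfile
from typing import List


def indices_from_begin(dataset_len: int, count: int) -> List[int]:
    """First `count` items, or all of them if the dataset is shorter."""
    return list(range(min(max(count, 0), dataset_len)))


def indices_from_range(dataset_len: int, start: int, stop: int) -> List[int]:
    """Items in [start, stop), clipped to the dataset bounds."""
    return list(range(max(start, 0), min(stop, dataset_len)))


def indices_with_step(dataset_len: int, step: int) -> List[int]:
    """Every `step`-th item across the whole dataset."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(range(0, dataset_len, step))


def flush_indices(indices: List[int], path: str) -> None:
    """Save the selected indices so that the same subset can be reused later."""
    with open(path, "w") as file:
        json.dump(indices, file)


def load_indices(path: str) -> List[int]:
    with open(path) as file:
        return json.load(file)


# The dataset itself is never modified; only the indices change.
dataset = list("abcdefghij")
subset_indices = indices_with_step(len(dataset), 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    indices_path = os.path.join(tmp_dir, "indices.json")
    flush_indices(subset_indices, indices_path)
    restored = load_indices(indices_path)

subset = [dataset[i] for i in restored]
```

Working on indices rather than on the data matches the stated goal of leaving the dataset class unchanged.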

Add trainer with asynchronous metrics processing

Add continue from best checkpoint to Predictor

A "continue from best checkpoint" option needs to be added to Predictor

Add Comparable Trainer

Move to PyTorch 1.0.1

Add models constructor

Disappearance of metrics_log when resuming

Environment

OS: Ubuntu 18.04
Python version: 3.6
PyTorch version: 0.4.1
Neural Pipeline version: 0.1.0

Describe the bug

The screenshot shows the traceback, which occurs after resuming the training process with the "resume" method of the Trainer class at the end of the last epoch.

Typo in setup.py

Add best practices guide to documentation

for DVC
for datascience-cookiecutter

Make it possible to share calculations between metrics
Make plots grouping more intuitive
Add a function to restore metrics from the log
Make it possible to disable/enable histograms

TensorboardMonitor is_continue flag

Environment

OS: Ubuntu 18.04
Python version: 3.6
PyTorch version: 0.4.1
Neural Pipeline version: 0.1.0

Describe the bug

The TensorboardMonitor "is_continue" flag should be set to True

neural-pipeline/examples/files/resume_train.py

Line 34 in b7ada08

    
           tensorboard = TensorboardMonitor(file_struct_manager, is_continue=False, network_name='PortraitSegmentation')
