toodef / neural-pipeline

Neural networks training pipeline based on PyTorch

Home Page: https://neural-pipeline.readthedocs.io

License: MIT License

Python 100.00%
deep-learning pytorch image-classification image-segmentation supervised-learning object-detection training-pipeline neural-networks pipeline

neural-pipeline's People

Contributors

pfriesch, toodef

neural-pipeline's Issues

Extract all console output to Console Monitor

It would be really useful to manage and customize console output through a separate class.
This would also make it possible to show additional information in the console:

  • Real and model forward FPS
  • Some events, as implemented in other monitors
  • Control over which metrics are shown in the progress bar

Add system monitor

The idea is to monitor CPU and GPU occupancy in order to analyse code performance.

Add batch gradient accumulation (to increase the effective batch size without increasing memory usage)

The following steps are needed to solve this issue:

  • Add a method enable_grads_acumulation(steps_num: int) to the Trainer class
  • Add gradient accumulation as described in the linked discussion
  • Write tests for a simple network that compare two loss values:
    1. Calculated without gradient accumulation
    2. Calculated with gradient accumulation

This test requires feeding the same input data to the model and starting from the same weights (the latter can be done by saving the weights to a file).

  • [Optional] Explore how BatchNorm works with gradient accumulation. The linked discussion says this is a problem (but it dates from the pre-release of PyTorch 1.0)
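
The comparison test described above can be sketched without PyTorch. This is a minimal pure-Python illustration, assuming a one-parameter linear model with a mean-squared-error loss; the function names are hypothetical and not part of neural-pipeline. It checks that gradients accumulated over equal-sized micro-batches, each scaled by 1/steps_num, match the gradient of one large batch.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, steps_num):
    """Accumulate gradients over `steps_num` equal-sized micro-batches."""
    assert len(xs) % steps_num == 0, "sketch assumes equal-sized micro-batches"
    size = len(xs) // steps_num
    grad = 0.0
    for i in range(steps_num):
        chunk = slice(i * size, (i + 1) * size)
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        grad += mse_grad(w, xs[chunk], ys[chunk]) / steps_num
    return grad


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = mse_grad(0.5, xs, ys)
accumulated = accumulated_grad(0.5, xs, ys, steps_num=2)
```

With micro-batches of unequal size the 1/steps_num scaling would no longer reproduce the full-batch mean, which is why the sketch asserts equal sizes.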

Metrics computation is wrong?

Describe the bug

I was puzzled when I saw strange results for my metrics. I have custom metric wrappers around sklearn metrics:

import numpy as np
import torch
from neural_pipeline.train_config import AbstractMetric, MetricsProcessor, MetricsGroup
from sklearn.metrics import precision_score, recall_score, accuracy_score

class Metric(AbstractMetric):
  def __init__(self, name, function):
    super().__init__(name)
    self.function = function

  def calc(self, output: torch.Tensor, target: torch.Tensor) -> np.ndarray or float:
    predicted = output.gt(0.5)
    return self.function(target, predicted)

class Metrics(MetricsProcessor):
  def __init__(self, stage_name: str):
    super().__init__()
    accuracy = Metric('accuracy', accuracy_score)
    precision = Metric('precision', precision_score)
    recall = Metric('recall', recall_score)
    self.add_metrics_group(MetricsGroup(stage_name).\
                           add(accuracy).\
                           add(precision).\
                           add(recall))

The configuration is as follows:

train_batch_size = 32
val_batch_size = len(X_test)

train_dataset = DataProducer([Dataset(X_train, y_train)], batch_size=train_batch_size)
validation_dataset = DataProducer([Dataset(X_test, y_test)], batch_size=val_batch_size)

train_stages = [TrainStage(train_dataset, Metrics('train')), 
                ValidationStage(validation_dataset, Metrics('validation'))]
loss = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
train_config = TrainConfig(train_stages, loss, optimizer)

fsm = FileStructManager(base_dir='data', is_continue=False)

epochs = 1
trainer = Trainer(model, train_config, fsm, device).set_epoch_num(epochs)
trainer.monitor_hub.add_monitor(TensorboardMonitor(fsm, is_continue=False))\
                   .add_monitor(LogMonitor(fsm))

trainer.train()

After training, I loaded the metrics from data/monitors/metrics_log/metrics_log.json and got the following results for the validation data:

[image]

But when I computed the same metrics manually, I got a different result:

[image]

I understand that if I used a batch_size not equal to the length of the validation dataset, I would get a different result. But that is not the case here. Another problem I found is that the result of the last training step also differs from the manually computed one, no matter what value is used for epochs. I can't find the error in the code; it looks like magic to me.
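
The batch-size effect mentioned above can be shown with a small pure-Python sketch. It assumes, for illustration only, that a per-batch metric is averaged over batches; whether neural-pipeline aggregates metrics this way is not confirmed by the report. The `precision` helper is a hypothetical stand-in for `sklearn.metrics.precision_score`.

```python
def precision(target, predicted):
    """Fraction of positive predictions that are correct (0.0 if there are none)."""
    true_pos = sum(1 for t, p in zip(target, predicted) if t == 1 and p == 1)
    pred_pos = sum(predicted)
    return true_pos / pred_pos if pred_pos else 0.0


target = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 1, 0]

# Metric computed once over the whole validation set.
whole = precision(target, predicted)

# Same metric computed per batch of 3 items and then averaged.
batches = [(target[i:i + 3], predicted[i:i + 3]) for i in range(0, 6, 3)]
averaged = sum(precision(t, p) for t, p in batches) / len(batches)
```

Here `whole` and `averaged` differ, so a mismatch that persists with a single full-size validation batch, as reported, would need another explanation.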

Experiments comparison

  • Visual comparison (maybe by loading files from a few logging monitors and passing them to another monitor)
  • Console output
  • Parsing the file for metrics comparison

Add DataProducerPredictor

Split Predictor into Predictor and DataProducerPredictor. Predictor only initializes the model and provides an interface to predict one item; DataProducerPredictor provides an interface to predict all data from a DataProducer object.

DataProducerPredictor could also have interfaces for:

  • Connecting metrics to prediction
  • Selecting the best and worst predictions
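
The proposed split could look roughly like the sketch below. It is only an illustration under stated assumptions: the model is any callable, the data source is any iterable, and the method names `predict_all` and `worst` as well as the use of inheritance are hypothetical, not the library's API.

```python
from typing import Callable, Iterable, List, Tuple


class Predictor:
    """Wraps a model and predicts a single item."""

    def __init__(self, model: Callable[[float], float]):
        self._model = model

    def predict(self, item: float) -> float:
        return self._model(item)


class DataProducerPredictor(Predictor):
    """Predicts every item yielded by an iterable data source."""

    def predict_all(self, producer: Iterable[float]) -> List[float]:
        return [self.predict(item) for item in producer]

    def worst(self, producer: Iterable[Tuple[float, float]], metric, k: int = 1):
        """Return the k (item, target) pairs with the lowest metric value."""
        scored = [(metric(self.predict(x), y), (x, y)) for x, y in producer]
        scored.sort(key=lambda pair: pair[0])
        return [pair[1] for pair in scored[:k]]


predictor = DataProducerPredictor(lambda x: 2 * x)
outputs = predictor.predict_all([1, 2, 3])
# Metric: negative absolute error, so a higher value means a better prediction.
hardest = predictor.worst([(1, 2), (2, 5), (3, 6)], lambda out, y: -abs(out - y))
```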

Add the ability to select a subpart from dataset

We frequently need to extract part of a dataset, for example to test a network.
This feature should be implemented in DataProducer, because that makes it possible to leave the dataset class unchanged. DataProducer also already has interfaces for flushing and loading indices.

Requirements:

  • Ability to flush indices for reproducibility
  • Different strategies for index selection (from the beginning, from a range, the whole dataset with a step, etc.)
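
The requirements above can be sketched with plain Python slicing and a JSON file. This is a minimal illustration, not the DataProducer interface: `select_indices`, `flush_indices` and `load_indices` are hypothetical helper names, and storing indices as JSON is an assumption.

```python
import json
import os
import tempfile


def select_indices(length, start=0, stop=None, step=1):
    """Pick dataset indices from a range, optionally with a step."""
    return list(range(length))[start:stop:step]


def flush_indices(indices, path):
    """Save the chosen indices so the same subset can be reused later."""
    with open(path, "w") as out:
        json.dump(indices, out)


def load_indices(path):
    """Load previously flushed indices."""
    with open(path) as src:
        return json.load(src)


head = select_indices(10, stop=3)              # from the beginning
middle = select_indices(10, start=4, stop=7)   # from a range
strided = select_indices(10, step=4)           # whole dataset with a step

path = os.path.join(tempfile.mkdtemp(), "indices.json")
flush_indices(strided, path)
restored = load_indices(path)
```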

Disappearance of metrics_log when resuming

Environment

  • OS: Ubuntu 18.04
  • Python version: 3.6
  • PyTorch version: 0.4.1
  • Neural Pipeline version: 0.1.0

Describe the bug

The screenshot shows a traceback that occurs at the end of the last epoch after resuming training with the "resume" method of the Trainer class.
Screenshot from 2019-03-29 12-28-50

Let the user choose how data is passed to the device

Currently in Neural Pipeline the user writes Trainer(..., device=torch.device('cuda')).
This works, but it becomes a problem when we try to pass data of an unexpected type to the device.

We need to make it possible to use a callback for this and to provide some predefined callbacks.
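
One possible shape for such a predefined callback is sketched below. It assumes the batch is a nesting of dicts, lists and tuples, and that the user supplies a `move` callable for the leaves; `pass_to_device` is a hypothetical name, not part of the library. With PyTorch, `move` could be something like `lambda tensor: tensor.to(device)`; the example uses a stand-in so it runs without PyTorch.

```python
def pass_to_device(data, move):
    """Recursively apply `move` to every leaf of nested dicts, lists and tuples."""
    if isinstance(data, dict):
        return {key: pass_to_device(value, move) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(pass_to_device(value, move) for value in data)
    return move(data)


# Stand-in for moving a tensor: tag each leaf with the target device name.
moved = pass_to_device({"data": [1, 2], "target": (3,)}, lambda leaf: ("cuda", leaf))
```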

Improve metrics

  • Make it possible to share calculations between metrics
  • Make plot grouping more intuitive
  • Add a function to restore metrics from the log
  • Make it possible to disable/enable histograms
