pytorch-ignite / examples
Examples, tutorials, and how-to guides
License: BSD 3-Clause "New" or "Revised" License
Following up on this question: pytorch/ignite#2301, it would be great to have a dedicated how-to guide for checkpointing, then loading the checkpoint from disk and resuming training. The setup code can be taken from https://pytorch-ignite.ai/tutorials/beginner/01-getting-started/.
The idea is to show how to tackle a use-case like:
x, y1, y2 = batch
y_pred1, y_pred2, aux = model(x)
and where we would like to compute metrics between y_pred1 vs y1 and y_pred2 vs y2. The solution is to pass an output_transform to each metric. Context: discussed on Discord, "questions" channel, multi-output thread.
Hugo has a built-in ability to split a summary from the markdown files.
Add <!--more--> (should be exact) where you want the summary to end; the content before <!--more--> will be shown as the summary.
OR
Add a summary frontmatter if the summary derived from the markdown content doesn't look good.
Link to <!--more-->
This could allow the summary to be shown consistently on the website.
cc @Priyansi
NCCL now supports the gather operator. The line at the bottom should be updated.
Convert the notebook FastaiLRFinder_MNIST.ipynb into a how-to guide, making appropriate changes.
Finalize moving tutorials into level-based folders:
We have a cifar10-distributed.py file that IMO should be put into the intermediate folder, near the notebook.
What to do:
The idea is to make a new beginner-level NLP tutorial combining this transformers example and this Text CNN notebook.
Update:
The idea has now been changed to using this 🤗 tutorial: https://huggingface.co/transformers/training.html for all the base code and then introducing Ignite for training. Ignite concepts are still derived from the TextCNN notebook.
I slightly adapted the cifar10 example in this fork, basically removing python-fire and adding torch.distributed.launch, so that it can be executed as a standalone script with clearml-task.
I executed the script with nproc_per_node in [1, 2, 3, 4] on an AWS g4dn.12xlarge instance (4x T4 GPUs) and got the following results:
I am increasing the batch size by 16 each time I add a GPU, so that each GPU gets the same number of samples. I didn't change the default number of processes (8) for any of them, because I didn't observe that the GPUs were under-used (below 95%).
I was expecting to observe a quasi-linear time improvement, but that isn't the case. Am I missing something?
PS: Here are the requirements I used to execute the script
torch==1.7.1+cu110
torchvision==0.8.2
pytorch-ignite==0.4.8
clearml==1.1.6
tensorboardX==2.4.1
A tutorial showing how custom metrics can be made using the Metric class would be useful.
The example can create a simple metric like the Levenshtein distance between two strings.
Reference - https://pytorch.org/ignite/metrics.html#how-to-create-a-custom-metric
Add weight to the frontmatter of how-to guide and tutorial pages. It would allow Hugo to sort the order of appearance.
See: https://gohugo.io/templates/lists#by-weight
TL;DR: lower numbers get higher precedence, so we should start with 1. I also suggest renaming the files to start with a number, like 01-file-name or 1-file-name, so that they show in order when viewing and are easier to scan.
It would also allow us to see the files in order in the sidebar on the website.
For example:
01-installation
02-data-iterator
03-gradient-accumulation
04-fastai-lr-finder
05-time-profiling
...
Currently, the download links for Python files are not working:
https://pytorch-ignite.ai/tutorials/getting-started/
This is because there are no Python files in this repo, which is used as a submodule in the website repo.
It would be great if the Python files were also in this repo, to ease downloading them.
Use Ray Tune with Ignite for hyperparameter optimization. Also, compare Tune with Optuna and Ax.
Ideas on how to do cross-validation: pytorch/ignite#1384
Following up on the discussion with @vfdev-5, this new how-to guide should be abstract and:
The purpose is to show explicitly how to convert pure PyTorch code to Ignite and explain what we gain by that.
Topics to cover:
It would be nice to have a how-to guide combining all the built-in loggers that Ignite provides: ClearML, TensorBoard, MLflow, etc. See more here: https://pytorch.org/ignite/contrib/handlers.html#loggers.
The code on how to use these is already provided in https://github.com/pytorch/ignite/tree/master/examples/contrib/mnist.
When creating this new how-to guide, please:
Everyone is welcome to contribute! Please feel free to ask any further questions below.
Since PyTorch-Ignite v0.4.7, save_handler in Checkpoint accepts the path to the checkpoint directory directly, rather than requiring DiskSaver(checkpoint_dir, create_dir=True). It would be nice if this could be updated for all instances in this repository too. One such instance is under Checkpointing in How to convert PyTorch Code to Ignite.
To make converting notebooks to markdown files easier, add a few frontmatter fields to the notebook as HTML comments.
Current frontmatter to add in a cell (this should be at the very top of the notebook):
<!-- ---
title: Example title
description: Example description
date: 2021-07-27
downloads: true
include_footer: true
sidebar: true
tags:
- deep learning
- machine learning
- pytorch
- python
--- -->
A new advanced standalone tutorial on idist that doesn't rely on comparison as in the blog post Distributed Made Easy with Ignite, but focuses more on idist methods like all_reduce, all_gather and broadcast.
The idea is to move the scripts from https://github.com/pytorch/ignite/tree/master/examples/reinforcement_learning here in the form of a tutorial. To create this new tutorial, please:
Feel free to ask any questions here. Everyone is welcome to contribute!
Following up on this question: pytorch/ignite#2441. As mentioned in the comments, it would be great to have an example of using LRScheduler as a how-to guide. The code example is already available in the comments.
The idea is to restructure this tutorial in the following way:
We can provide 2 parts:
Unfortunately, after upgrading Ignite I don't know how to save my model's checkpoints, because save_interval is deprecated. My code is below, but it doesn't work:
checkpointer = ModelCheckpoint(output_dir, cfg.MODEL.NAME, n_saved=10, require_empty=False)
trainer.add_event_handler(Events.EPOCH_COMPLETED(every=1), checkpointer, to_save={'model': model, 'optimizer': optimizer})
Please give an example, thanks.
Shift the files from pytorch-ignite/pytorch-ignite.github.io/content/docs/how-to-guides to pytorch-ignite/examples/how-to-guides and store them as .ipynb instead of .md
Shift the notebook HandlersTimeProfiler_MNIST.ipynb to a how-to guide in this repository. Add content from Time Profiling during training.
In the how-to-guides, dependencies are sometimes assumed and sometimes installed with !pip install. I suggest making this uniform by providing a conda environment and/or requirements.txt file(s), so that a full environment can be created.