torch-vision

This repository consists of:

vision.datasets : Data loaders for popular vision datasets
vision.models : Definitions for popular model architectures, such as AlexNet, VGG, and ResNet, along with pre-trained models.
vision.transforms : Common image transformations such as random crops, rotations, etc.
vision.utils : Utilities such as saving a tensor (3 x H x W) as an image to disk, or creating a grid of images from a mini-batch.

Installation

Anaconda:

conda install torchvision -c soumith

pip:

pip install torchvision

From source:

python setup.py install

Datasets

The following dataset loaders are available:

MNIST
COCO (Captioning and Detection)
LSUN Classification
ImageFolder
Imagenet-12
CIFAR10 and CIFAR100
STL10
SVHN
PhotoTour

Datasets have the API:

- __getitem__
- __len__

They all subclass torch.utils.data.Dataset. Hence, they can all be loaded in parallel by multiple worker processes (Python multiprocessing) using the standard torch.utils.data.DataLoader.

For example:

torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
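As a rough illustration of why these two methods are enough for batching, the sketch below groups samples from any object that implements __len__ and __getitem__. It is a simplified, single-process stand-in, not the DataLoader implementation; SquaresDataset and iter_batches are hypothetical names introduced only for this example.

```python
import random


class SquaresDataset:
    """Toy dataset (hypothetical): sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index, index * index


def iter_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples, using only len(dataset) and dataset[i]."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


batches = list(iter_batches(SquaresDataset(5), batch_size=2))
print(batches)  # [[(0, 0), (1, 1)], [(2, 4), (3, 9)], [(4, 16)]]
```

The last batch is shorter because 5 samples do not divide evenly into batches of 2.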

In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:

transform - a function that takes in an image and returns a transformed version. Common choices are ToTensor, RandomCrop, etc. These can be composed together with transforms.Compose (see the Transforms section below).
target_transform - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
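A minimal sketch of how the two keyword arguments are typically applied inside __getitem__. ToyCaptionDataset, vocab, and caption_to_indices are hypothetical names, and plain Python lists stand in for images and tensors so that the example runs without any dependencies.

```python
class ToyCaptionDataset:
    """Hypothetical dataset that applies the two optional callables."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = samples  # list of (image, caption) pairs
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        img, target = self.samples[index]
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return img, target


vocab = {"a": 0, "plane": 1, "mountain": 2}  # made-up vocabulary


def caption_to_indices(caption):
    """Map each known word to its index; unknown words are skipped."""
    return [vocab[w] for w in caption.lower().split() if w in vocab]


data = ToyCaptionDataset(
    [([0, 128, 255], "A plane a mountain")],
    transform=lambda pixels: [p / 255.0 for p in pixels],
    target_transform=caption_to_indices,
)
img, target = data[0]
print(target)  # [0, 1, 0, 2]
```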

MNIST

dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)

root: root directory of dataset where processed/training.pt and processed/test.pt exist

train: True - use training set, False - use test set.

transform: transform to apply to input images

target_transform: transform to apply to targets (class labels)

download: whether to download the MNIST data

COCO

This requires the COCO API to be installed.

Captions:

dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])

Example:

import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
                        annFile = 'json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample

print("Image Size: ", img.size())
print(target)

Output:

Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']

Detection:

dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])

LSUN

dset.LSUN(db_path, classes='train', [transform, target_transform])

db_path = root directory for the database files
classes =
'train' - all categories, training set
'val' - all categories, validation set
'test' - all categories, test set
['bedroom_train', 'church_train', ...] : a list of categories to load

CIFAR

dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)

dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)

root : root directory of dataset where there is folder cifar-10-batches-py
train : True = Training set, False = Test set
download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

STL10

dset.STL10(root, split='train', transform=None, target_transform=None, download=False)

root : root directory of dataset where there is folder stl10_binary
split : 'train' = Training set, 'test' = Test set, 'unlabeled' = Unlabeled set, 'train+unlabeled' = Training + Unlabeled set (missing labels marked as -1)
download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

SVHN

dset.SVHN(root, split='train', transform=None, target_transform=None, download=False)

root : root directory of dataset where there is folder SVHN
split : 'train' = Training set, 'test' = Test set, 'extra' = Extra training set
download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

ImageFolder

A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

dset.ImageFolder(root="root folder path", [transform, target_transform])

It has the members:

self.classes - The class names as a list
self.class_to_idx - Corresponding class indices
self.imgs - The list of (image path, class-index) tuples
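To make the three members concrete, here is a sketch that derives them from a directory tree using only the standard library. scan_image_folder is a hypothetical helper, not the ImageFolder implementation; it assumes classes are indexed in sorted order and, for brevity, only looks at .png files.

```python
import os
import tempfile


def scan_image_folder(root):
    """Return (classes, class_to_idx, imgs) for a root/<class>/<file> layout."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    imgs = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(".png"):  # this sketch only looks at .png
                imgs.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return classes, class_to_idx, imgs


# Build a throwaway directory tree matching the layout above.
with tempfile.TemporaryDirectory() as root:
    for cls, fname in [("dog", "xxx.png"), ("dog", "xxy.png"), ("cat", "123.png")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        open(os.path.join(root, cls, fname), "w").close()
    classes, class_to_idx, imgs = scan_image_folder(root)

print(classes)       # ['cat', 'dog']
print(class_to_idx)  # {'cat': 0, 'dog': 1}
```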

Imagenet-12

This is simply implemented with an ImageFolder dataset.

The data is preprocessed as described here

Here is an example.

PhotoTour

Learning Local Image Descriptors Data http://phototour.cs.washington.edu/patches/default.htm

import torchvision.datasets as dset
import torchvision.transforms as transforms
dataset = dset.PhotoTour(root = 'dir where images are',
                         name = 'name of the dataset to load',
                         transform=transforms.ToTensor())

print('Loaded PhotoTour: {} with {} images.'
      .format(dataset.name, len(dataset.data)))

Models

The models subpackage contains definitions for the following model architectures:

AlexNet: AlexNet variant from the "One weird trick" paper.
VGG: VGG-11, VGG-13, VGG-16, VGG-19 (with and without batch normalization)
ResNet: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152
SqueezeNet: SqueezeNet 1.0, and SqueezeNet 1.1

You can construct a model with random weights by calling its constructor:

import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()

We provide pre-trained models for the ResNet variants, SqueezeNet 1.0 and 1.1, and AlexNet, using the PyTorch model zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

The images have to be loaded into a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].

An example of such normalization can be found in the imagenet example here <https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101>

Transforms

Transforms are common image transformations. They can be chained together using transforms.Compose.

`transforms.Compose`

One can compose several transforms together. For example:

transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                          std = [ 0.229, 0.224, 0.225 ]),
])
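Conceptually, composing transforms just means calling each one on the output of the previous one. The sketch below shows that idea with a stand-in class; SimpleCompose and the two toy transforms are hypothetical and operate on a flat list of pixel values rather than on a PIL.Image.

```python
class SimpleCompose:
    """Apply a list of callables in order, feeding each output to the next."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


def to_unit_range(pixels):
    """Stand-in transform: map [0, 255] values to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


def center(pixels):
    """Stand-in transform: shift values so that 0.5 becomes 0."""
    return [p - 0.5 for p in pixels]


pipeline = SimpleCompose([to_unit_range, center])
result = pipeline([0, 255])
print(result)  # [-0.5, 0.5]
```

Order matters: swapping the two transforms here would produce different values.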

Transforms on PIL.Image

`Scale(size, interpolation=Image.BILINEAR)`

Rescales the input PIL.Image to the given 'size'.

If 'size' is a 2-element tuple or list in the order of (width, height), it is the exact size to scale to.

If 'size' is a number, it indicates the size of the smaller edge. For example, if height > width, then the image will be rescaled to (size * height / width, size), i.e. the width becomes size and the height is scaled proportionally.

- size: size of the smaller edge
- interpolation: Default: PIL.Image.BILINEAR
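The sizing rules above can be sketched as a small function that returns the output (width, height). scaled_size is a hypothetical helper; truncating the scaled edge to an integer is an assumption of this sketch.

```python
def scaled_size(width, height, size):
    """Return the (width, height) an image would be rescaled to."""
    if isinstance(size, (tuple, list)):
        return tuple(size)  # explicit (width, height)
    if width <= height:
        # width is the smaller edge, so it becomes `size`
        return size, int(size * height / width)
    # height is the smaller edge, so it becomes `size`
    return int(size * width / height), size


print(scaled_size(640, 427, 256))  # (383, 256)
print(scaled_size(427, 640, 256))  # (256, 383)
```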

`CenterCrop(size)` - center-crops the image to the given size

Crops the given PIL.Image at the center to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size).
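A sketch of the crop-box arithmetic, returning a (left, upper, right, lower) box of the kind PIL's crop method accepts. center_crop_box is a hypothetical helper, and rounding odd margins down is an assumption of this sketch.

```python
def center_crop_box(width, height, size):
    """Return (left, upper, right, lower) for a centered crop."""
    if isinstance(size, int):
        target_h, target_w = size, size
    else:
        target_h, target_w = size  # (target_height, target_width)
    left = (width - target_w) // 2
    upper = (height - target_h) // 2
    return left, upper, left + target_w, upper + target_h


print(center_crop_box(640, 427, 224))         # (208, 101, 432, 325)
print(center_crop_box(640, 427, (100, 200)))  # (220, 163, 420, 263)
```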

`RandomCrop(size, padding=0)`

Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size). If padding is non-zero, then the image is first zero-padded on each side with padding pixels.
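The interaction between padding and the random location can be sketched as follows. random_crop_box is a hypothetical helper; the box it returns is expressed in the coordinates of the padded image.

```python
import random


def random_crop_box(width, height, size, padding=0, rng=random):
    """Return a random (left, upper, right, lower) box in the padded image."""
    if isinstance(size, int):
        target_h, target_w = size, size
    else:
        target_h, target_w = size  # (target_height, target_width)
    # Zero-padding grows each dimension by 2 * padding.
    width, height = width + 2 * padding, height + 2 * padding
    if target_w > width or target_h > height:
        raise ValueError("crop size exceeds padded image size")
    left = rng.randint(0, width - target_w)
    upper = rng.randint(0, height - target_h)
    return left, upper, left + target_w, upper + target_h


# A 32x32 image padded by 4 becomes 40x40, so a 32x32 crop can shift by 0..8.
box = random_crop_box(32, 32, 32, padding=4, rng=random.Random(0))
print(box)
```

With padding=0 and a crop size equal to the image size, the only possible box is the whole image.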

`RandomHorizontalFlip()`

Randomly horizontally flips the given PIL.Image with a probability of 0.5

`RandomSizedCrop(size, interpolation=Image.BILINEAR)`

Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of the original size and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio.

This is popularly used to train the Inception networks.

- size: size of the smaller edge
- interpolation: Default: PIL.Image.BILINEAR
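A sketch of how such a crop region could be sampled before it is resized. random_sized_crop_dims is a hypothetical helper. It reads "of the original aspect ratio" literally, so the sampled factor multiplies the image's own width/height ratio; that reading, the retry count, and the square fallback are all assumptions of this sketch rather than confirmed library behavior.

```python
import math
import random


def random_sized_crop_dims(width, height, rng=random, attempts=10):
    """Sample a crop (w, h) covering 8% to 100% of the image area."""
    area = width * height
    for _ in range(attempts):
        target_area = rng.uniform(0.08, 1.0) * area
        # Assumption: the 3/4 to 4/3 factor is relative to the original ratio.
        aspect = rng.uniform(3.0 / 4.0, 4.0 / 3.0) * (width / height)
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            return w, h
    # Fallback (assumption): the largest square that fits in the image.
    side = min(width, height)
    return side, side


w, h = random_sized_crop_dims(640, 427, rng=random.Random(0))
print(w, h)
```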

`Pad(padding, fill=0)`

Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7.
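The 5x5 to 7x7 example can be checked with a small sketch that pads a 2D list of pixel values. pad_image is a hypothetical helper operating on nested lists rather than on a PIL.Image.

```python
def pad_image(rows, padding, fill=0):
    """Pad a 2D list of pixel values on every side with `fill`."""
    width = len(rows[0]) + 2 * padding
    top = [[fill] * width for _ in range(padding)]
    bottom = [[fill] * width for _ in range(padding)]
    body = [[fill] * padding + list(row) + [fill] * padding for row in rows]
    return top + body + bottom


image = [[1] * 5 for _ in range(5)]  # 5x5 image of ones
padded = pad_image(image, padding=1)
print(len(padded), len(padded[0]))  # 7 7
```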

Transforms on torch.*Tensor

`Normalize(mean, std)`

Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std
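The per-channel formula can be sketched on a nested list laid out as C x H x W. normalize is a hypothetical helper, and the mean and std values below are chosen only to keep the arithmetic easy to follow.

```python
def normalize(tensor, mean, std):
    """Normalize a C x H x W nested list channel by channel."""
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(tensor, mean, std)
    ]


# A 3 x 1 x 2 "image" whose values are already in [0, 1].
image = [[[0.5, 1.0]], [[0.5, 1.0]], [[0.5, 1.0]]]
out = normalize(image, mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
print(out[0])  # [[0.0, 2.0]]
```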

Conversion Transforms

ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range [0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
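The layout and range change performed by the first conversion can be sketched with nested lists. to_chw_unit_range is a hypothetical helper that mirrors the description above, not the library code.

```python
def to_chw_unit_range(hwc):
    """Convert an H x W x C nested list in [0, 255] to C x H x W in [0.0, 1.0]."""
    height, width, channels = len(hwc), len(hwc[0]), len(hwc[0][0])
    return [
        [[hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 RGB image: one red pixel and one white pixel.
image = [[[255, 0, 0], [255, 255, 255]]]
chw = to_chw_unit_range(image)
print(chw)  # [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
```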

Generic Transforms

`Lambda(lambda)`

Given a Python lambda, applies it to the input img and returns the result. For example:

transforms.Lambda(lambda x: x.add(10))

Utils

`make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0)`

Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images all of the same size, makes a grid of images.

normalize=True will shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.

If range=(min, max) where min and max are numbers, then these numbers are used to normalize the image.

scale_each=True will scale each image in the batch of images separately rather than computing the (min, max) over all images.

pad_value=<float> sets the value for the padded pixels.
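How normalize, range, and scale_each interact can be sketched on flat lists of pixel values. min_max_normalize is a hypothetical helper; it assumes values are clamped to the range and then mapped with (x - min) / (max - min), and it names the parameter value_range to avoid shadowing Python's built-in range.

```python
def min_max_normalize(images, value_range=None, scale_each=False):
    """Rescale flat lists of pixel values into [0, 1]."""

    def rescale(pixels, lo, hi):
        span = (hi - lo) or 1e-5  # avoid dividing by zero on constant input
        # Clamp to [lo, hi] first so the result stays inside [0, 1].
        return [(min(max(p, lo), hi) - lo) / span for p in pixels]

    if scale_each:
        # Each image uses its own (min, max) unless a range is given.
        return [
            rescale(img, *(value_range or (min(img), max(img)))) for img in images
        ]
    everything = [p for img in images for p in img]
    lo, hi = value_range or (min(everything), max(everything))
    return [rescale(img, lo, hi) for img in images]


batch = [[0.0, 2.0], [4.0, 8.0]]
print(min_max_normalize(batch))                      # [[0.0, 0.25], [0.5, 1.0]]
print(min_max_normalize(batch, scale_each=True))     # [[0.0, 1.0], [0.0, 1.0]]
print(min_max_normalize(batch, value_range=(0, 4)))  # [[0.0, 0.5], [1.0, 1.0]]
```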

Example usage is given in this notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>

`save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0)`

Saves a given Tensor into an image file.

If given a mini-batch tensor, will save the tensor as a grid of images.

All options after filename are passed through to make_grid. Refer to its documentation for more details.
