Coder Social home page Coder Social logo

puzzledqs / vision Goto Github PK

View Code? Open in Web Editor NEW

This project forked from pytorch/vision

2.0 2.0 0.0 426 KB

Datasets, Transforms and Models specific to Computer Vision

License: BSD 3-Clause "New" or "Revised" License

Python 25.13% Jupyter Notebook 74.87%

vision's Introduction

torch-vision

image

This repository consists of:

  • vision.datasets : Data loaders for popular vision datasets
  • vision.models : Definitions for popular model architectures, such as AlexNet, VGG, and ResNet and pre-trained models.
  • vision.transforms : Common image transformations such as random crop, rotations etc.
  • vision.utils : Useful stuff such as saving tensor (3 x H x W) as image to disk, given a mini-batch creating a grid of images, etc.

Installation

Anaconda:

conda install torchvision -c soumith

pip:

pip install torchvision

From source:

python setup.py install

Datasets

The following dataset loaders are available:

Datasets have the API: - __getitem__ - __len__ They all subclass from torch.utils.data.Dataset Hence, they can all be multi-threaded (python multiprocessing) using standard torch.utils.data.DataLoader.

For example:

torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)

In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:

  • transform - a function that takes in an image and returns a transformed version
  • common stuff like ToTensor, RandomCrop, etc. These can be composed together with transforms.Compose (see transforms section below)
  • target_transform - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.

MNIST

dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)

root: root directory of dataset where processed/training.pt and processed/test.pt exist

train: True - use training set, False - use test set.

transform: transform to apply to input images

target_transform: transform to apply to targets (class labels)

download: whether to download the MNIST data

COCO

This requires the COCO API to be installed

Captions:

dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])

Example:

import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
                        annFile = 'json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample

print("Image Size: ", img.size())
print(target)

Output:

Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']

Detection:

dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])

LSUN

dset.LSUN(db_path, classes='train', [transform, target_transform])

  • db_path = root directory for the database files
  • classes =
  • 'train' - all categories, training set
  • 'val' - all categories, validation set
  • 'test' - all categories, test set
  • ['bedroom_train', 'church_train', ...] : a list of categories to load

CIFAR

dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)

dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)

  • root : root directory of dataset where there is folder cifar-10-batches-py
  • train : True = Training set, False = Test set
  • download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

STL10

dset.STL10(root, split='train', transform=None, target_transform=None, download=False)

  • root : root directory of dataset where there is folder stl10_binary
  • split : 'train' = Training set, 'test' = Test set, 'unlabeled' = Unlabeled set,

    'train+unlabeled' = Training + Unlabeled set (missing label marked as -1)

  • download : True = downloads the dataset from the internet and

    puts it in root directory. If dataset is already downloaded, does not do anything.

SVHN

dset.SVHN(root, split='train', transform=None, target_transform=None, download=False)

  • root : root directory of dataset where there is folder SVHN
  • split : 'train' = Training set, 'test' = Test set, 'extra' = Extra training set
  • download : True = downloads the dataset from the internet and

    puts it in root directory. If dataset is already downloaded, does not do anything.

ImageFolder

A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

dset.ImageFolder(root="root folder path", [transform, target_transform])

It has the members:

  • self.classes - The class names as a list
  • self.class_to_idx - Corresponding class indices
  • self.imgs - The list of (image path, class-index) tuples

Imagenet-12

This is simply implemented with an ImageFolder dataset.

The data is preprocessed as described here

Here is an example.

PhotoTour

Learning Local Image Descriptors Data http://phototour.cs.washington.edu/patches/default.htm

import torchvision.datasets as dset
import torchvision.transforms as transforms
dataset = dset.PhotoTour(root = 'dir where images are',
                         name = 'name of the dataset to load',
                         transform=transforms.ToTensor())

print('Loaded PhotoTour: {} with {} images.'
      .format(dataset.name, len(dataset.data)))

Models

The models subpackage contains definitions for the following model architectures:

  • AlexNet: AlexNet variant from the "One weird trick" paper.
  • VGG: VGG-11, VGG-13, VGG-16, VGG-19 (with and without batch normalization)
  • ResNet: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152
  • SqueezeNet: SqueezeNet 1.0, and SqueezeNet 1.1

You can construct a model with random weights by calling its constructor:

import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()

We provide pre-trained models for the ResNet variants, SqueezeNet 1.0 and 1.1, and AlexNet, using the PyTorch model zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be atleast 224.

The images have to be loaded in to a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]

An example of such normalization can be found in the imagenet example here <https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101>

Transforms

Transforms are common image transforms. They can be chained together using transforms.Compose

transforms.Compose

One can compose several transforms together. For example.

transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                          std = [ 0.229, 0.224, 0.225 ]),
])

Transforms on PIL.Image

Scale(size, interpolation=Image.BILINEAR)

Rescales the input PIL.Image to the given 'size'.

If 'size' is a 2-element tuple or list in the order of (width, height), it will be the exactly size to scale.

If 'size' is a number, it will indicate the size of the smaller edge. For example, if height > width, then image will be rescaled to (size * height / width, size) - size: size of the smaller edge - interpolation: Default: PIL.Image.BILINEAR

CenterCrop(size) - center-crops the image to the given size

Crops the given PIL.Image at the center to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size)

RandomCrop(size, padding=0)

Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size) If padding is non-zero, then the image is first zero-padded on each side with padding pixels.

RandomHorizontalFlip()

Randomly horizontally flips the given PIL.Image with a probability of 0.5

RandomSizedCrop(size, interpolation=Image.BILINEAR)

Random crop the given PIL.Image to a random size of (0.08 to 1.0) of the original size and and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio

This is popularly used to train the Inception networks - size: size of the smaller edge - interpolation: Default: PIL.Image.BILINEAR

Pad(padding, fill=0)

Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7

Transforms on torch.*Tensor

Normalize(mean, std)

Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std

Conversion Transforms

  • ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x

    C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]

  • ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]

Generic Transforms

Lambda(lambda)

Given a Python lambda, applies it to the input img and returns it. For example:

transforms.Lambda(lambda x: x.add(10))

Utils

make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images all of the same size, makes a grid of images

normalize=True will shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.

if range=(min, max) where min and max are numbers, then these numbers are used to normalize the image.

scale_each=True will scale each image in the batch of images separately rather than computing the (min, max) over all images.

pad_value=<float> sets the value for the padded pixels.

Example usage is given in this notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>

save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves a given Tensor into an image file.

If given a mini-batch tensor, will save the tensor as a grid of images.

All options after filename are passed through to make_grid. Refer to it's documentation for more details

vision's People

Contributors

soumith avatar colesbury avatar bodokaiser avatar lukeyeager avatar uridah avatar apaszke avatar alykhantejani avatar maratyszcza avatar graingert avatar ntomita avatar edgarriba avatar rtqichen avatar yanghuanflc avatar gpleiss avatar fmassa avatar dmitryulyanov avatar adamlerer avatar zhoumingjun avatar smartkiwi avatar orashi avatar panovr avatar lanpa avatar skrish13 avatar szagoruyko avatar spro avatar mbjoseph avatar lopuhin avatar karandwivedi42 avatar suhangpro avatar lysandergg avatar

Stargazers

Smrutiranjan Sahu avatar  avatar

Watchers

James Cloos avatar Shi Qiu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.