ImageAugmentor

Library Design using C++ - Final Project

This library uses jpeglib.h (libjpeg), a commonly used JPEG library - http://libjpeg.sourceforge.net/

It also uses https://github.com/md81544/libjpeg_cpp/ (https://www.martyndavis.com/?tag=libjpeg), a C++ wrapper that uses STL types in place of the C-style image structures exposed by jpeglib.h.

We use this wrapper instead of calling jpeglib.h directly because it is much simpler, and writing our own wrapper would be tedious.

Before starting, please install libjpeg (it may already be present on some Unix-like systems) -
brew install libjpeg (macOS with Homebrew) or sudo apt-get install libjpeg-dev (Debian/Ubuntu)

Steps to use the library:

  1. git clone https://github.com/Gouthamkreddy1234/Image-Dataset-Augmentor.git
  2. brew install libjpeg
  3. Copy the code from the cloned directory into your project directory
  4. Add the Augmentor.cpp jpeg.cpp Operation.cpp to your makefile (follow below example assuming main.cpp is your main project file)
    prod: main.cpp Augmentor.cpp jpeg.cpp Operation.cpp
      g++ -O -std=c++17 -Wall -Wextra -Wpedantic -Werror -o prod main.cpp Augmentor.cpp jpeg.cpp Operation.cpp -ljpeg

    test: unit_test.cpp Augmentor.cpp jpeg.cpp Operation.cpp
      g++ -O -std=c++17 -Wall -Wextra -Wpedantic -Werror -o test unit_test.cpp Augmentor.cpp jpeg.cpp Operation.cpp -ljpeg -lgtest

    debug: main.cpp Augmentor.cpp jpeg.cpp Operation.cpp
      g++ -g -O -std=c++17 -Wall -Wextra -Wpedantic -Werror -o debug main.cpp Augmentor.cpp jpeg.cpp Operation.cpp -ljpeg
  5. Run make prod. IMPORTANT: Use the prod target when you build the code; the test target builds only the unit test code.

  6. #include "Augmentor.h" in your main.cpp

  7. Usage example:

    augmentorLib::Augmentor augmentor(argv[1], argv[2]); // input and output directory paths
    augmentor
    .rotate(45, 90, 0.5) // rotate by a random angle between 45 and 90 degrees
    .flip(HORIZONTAL, 0.5) // 0.5 probability of flip operation being applied to an image
    .crop(300, 300, true) // (x, y) size of cropped image
    .resize(120, 120, 1) // (x, y) size of resized image
    .invert(1) // invert with probability 1
    .sample(1000); // Output 1000 images

  8. This will output 1000 augmented images to the provided destination directory (argv[2])

  9. Documentation
    LINK - http://image-augmentor.s3-website-us-east-1.amazonaws.com
    PDF - https://image-augmentor-pdf.s3.amazonaws.com/Documentation.pdf

Augmentor Design Document

1. Introduction

Augmentor is a C++ library focused on batch image manipulation and processing. We aim to provide a simple interface and excellent performance, so we have adopted the following design ideas.

2. Declarative APIs

Similar to other data processing tools (e.g. Spark, Flink), our library provides declarative APIs for users to build their image manipulation pipelines. The advantage of declarative APIs is that a pipeline can be checked before any actual processing. The library evaluates the inputs of each operation and makes sure all of them are legal. For example, the dimensions passed to the resize operation cannot be negative. If invalid parameters are found, the program is terminated at the build stage.
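As a rough illustration of checking a pipeline while it is being built, here is a minimal sketch. The names PipelineBuilder and StepSpec are hypothetical and are not part of the library, and the sketch reports invalid parameters by throwing std::invalid_argument, which is an assumption; the library itself may terminate in a different way.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one pipeline step (not the library's real type).
struct StepSpec {
    std::string name;
    int width;
    int height;
    double probability;
};

// Hypothetical builder: every step is validated when it is added,
// so an invalid pipeline is rejected before any image is processed.
class PipelineBuilder {
    std::vector<StepSpec> steps_;

public:
    PipelineBuilder& resize(int width, int height, double probability) {
        if (width <= 0 || height <= 0) {
            throw std::invalid_argument("resize: width and height must be positive");
        }
        if (probability < 0.0 || probability > 1.0) {
            throw std::invalid_argument("resize: probability must be in [0, 1]");
        }
        steps_.push_back({"resize", width, height, probability});
        return *this;
    }

    // Number of steps accepted so far.
    std::size_t size() const { return steps_.size(); }
};
```

With this shape, a call such as builder.resize(-1, 120, 1.0) fails immediately, and no step is recorded for it.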

2.1. Build with chain operations

When users set up their pipelines, it is common to chain a set of operations to manipulate images. Thus, the pipeline builder of our library is designed to be chainable. For example,

augmentor
.rotate(45,90,1)
.flip(HORIZONTAL, 1)
.crop(300, 300, true)
.resize(120,120,1)
.rapid_blur(5)
.invert(1);

To support this style, we design each method to return a reference to the Augmentor, like

Augmentor& Augmentor::some_operation(parameter param...);

The Augmentor has a vector that stores the definition of every operation, which looks like

std::vector<std::unique_ptr< Operation<Image> >> operations;

The reason for using unique_ptr is discussed in Section 4.

2.2. Run the program

After building a pipeline, users can run it simply by calling the sample method. The current implementation is straightforward: for each image, loop through every operation and apply the transform to the image. It looks something like

for (auto &operation : operations) {
    image = operation->perform(image);
}

Since images are processed independently of one another, parallel programming will be used in the future to speed up the processing.

3. Template

There are many image formats. They share the same APIs (e.g. getPixel, setPixel) but have different implementations. We would like the library to handle images in different formats, so it is built on templates. Here is an example:

template<typename Image>
class Operation {
// 
// private content...
//
public:
    virtual Image* perform(Image* image) = 0;
};

Currently, this library only supports JPEG images.

// Image actually means JpegImage
Operation<Image> operations;

In the future, we will introduce more image formats, such as PNG and BMP.

Operation<PngImage> PngOperations;
Operation<BmpImage> BmpOperations;

3.1. Concept (in future)

This library is built against the C++17 standard. In the future, we would like to upgrade to C++20, which introduces concepts. Since images, regardless of format, share the same interface (e.g. getHeight, getWidth, getPixel, setPixel), it makes sense to constrain an image class with a concept. A concept can ensure that every image class provides the same interface.
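Until C++20 is available, a similar compile-time check can be approximated in C++17 with the std::void_t detection idiom. The sketch below is not a concept and is not taken from the library. The trait name has_image_interface, the ToyImage type, and the exact member signatures (two size_t coordinates, an int pixel value) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Primary template: by default a type does not look like an image.
template <typename T, typename = void>
struct has_image_interface : std::false_type {};

// Chosen only when all four member calls are well-formed for T.
template <typename T>
struct has_image_interface<T, std::void_t<
    decltype(std::declval<const T&>().getHeight()),
    decltype(std::declval<const T&>().getWidth()),
    decltype(std::declval<const T&>().getPixel(std::size_t{}, std::size_t{})),
    decltype(std::declval<T&>().setPixel(std::size_t{}, std::size_t{}, 0))
>> : std::true_type {};

// Hypothetical one-pixel image type used only for this illustration.
struct ToyImage {
    std::size_t getHeight() const { return 1; }
    std::size_t getWidth() const { return 1; }
    int getPixel(std::size_t, std::size_t) const { return value; }
    void setPixel(std::size_t, std::size_t, int v) { value = v; }
    int value = 0;
};

// A type with none of the required members.
struct NotAnImage {};

static_assert(has_image_interface<ToyImage>::value, "ToyImage provides the image interface");
static_assert(!has_image_interface<NotAnImage>::value, "NotAnImage is rejected");
```

A template such as Operation could then place a static_assert on this trait, so that an unsuitable image type fails with a readable message instead of an error deep inside the operation code.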

4. Polymorphism

This library uses polymorphism to implement the different operations. All operations inherit from the base class Operation, which provides: (1) a bool randomizer that decides whether an operation takes place this time; (2) a random number generator (between 0 and 1) to introduce randomness into each operation; and (3) a pure virtual function perform, which gives all subclasses the same image-processing interface.

It looks like:

template<typename Image>
class Operation {
private:   
    inline bool operate_this_time();
    inline _precision_type uniform_random_number();

public:
    virtual Image* perform(Image* image) = 0;
};

One subclass looks like

template<typename Image>
class ResizeOperation: public Operation<Image> {
public:
    Image* perform(Image* image) override;
};

A pipeline is made up of operations, and Augmentor stores the operations in a vector. We want to make the Augmentor's ownership of its operations explicit. There are two safe ways to express ownership: (1) a vector of objects, or (2) a vector of unique pointers. A vector of base-class objects would slice each subclass object down to its base-class part (and Operation is abstract, so it cannot be stored by value at all), so we use unique_ptr to express ownership.

// Augmentor.h
class Augmentor {
    std::vector<std::unique_ptr< Operation<Image> >> operations;
public:
    Augmentor& some_operation(parameters param...) {
        auto operation = std::make_unique<SomeOperation<Image>>(param);
        operations.push_back(std::move(operation));
        return *this;
    }
};

The advantage of ownership lies in memory management. When an Augmentor is destroyed, each unique_ptr deletes the object it points to. Therefore, we can prevent potential memory leaks in the Augmentor class.
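The ownership and chaining ideas above can be put together in a small self-contained sketch. It is an illustration only: ToyImage, AddOperation, InvertOperation and ToyPipeline are hypothetical stand-ins, not the library's classes. The virtual destructor on Operation is an assumption that the snippets above do not show, but it is needed for deletion through a base-class pointer to be well-defined.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for an image: a single brightness value (illustration only).
struct ToyImage {
    int brightness = 0;
};

template <typename Image>
class Operation {
public:
    // Virtual destructor so that deleting a subclass through a base
    // pointer (which is what unique_ptr<Operation> does) is well-defined.
    virtual ~Operation() = default;
    virtual Image* perform(Image* image) = 0;
};

template <typename Image>
class AddOperation : public Operation<Image> {
    int amount_;
public:
    explicit AddOperation(int amount) : amount_(amount) {}
    Image* perform(Image* image) override {
        image->brightness += amount_;
        return image;
    }
};

template <typename Image>
class InvertOperation : public Operation<Image> {
public:
    Image* perform(Image* image) override {
        image->brightness = 255 - image->brightness;
        return image;
    }
};

// Hypothetical pipeline: owns its operations and exposes a chainable builder.
class ToyPipeline {
    std::vector<std::unique_ptr<Operation<ToyImage>>> operations_;
public:
    ToyPipeline& add(int amount) {
        operations_.push_back(std::make_unique<AddOperation<ToyImage>>(amount));
        return *this;
    }
    ToyPipeline& invert() {
        operations_.push_back(std::make_unique<InvertOperation<ToyImage>>());
        return *this;
    }
    // Applies every stored operation in insertion order.
    void run(ToyImage& image) const {
        ToyImage* current = &image;
        for (const auto& operation : operations_) {
            current = operation->perform(current);
        }
    }
};
```

When a ToyPipeline goes out of scope, the vector destroys each unique_ptr, which in turn deletes the operation it owns, so no manual cleanup is needed.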

5. Compile-time programming

This library uses a lot of compile-time programming to optimize the code and its performance. First of all, this library uses templates heavily. Since templates were introduced in Section 3, I will skip their basic usage here and instead discuss how templates can be used to optimize the code structure.

5.1. Different Implementations based on template parameters

5.1.1. Gaussian Blur Filter

Here I will show how to use a template to build either a static or a dynamic filter when the Gaussian blur operation is constructed. The basic idea of blurring an image is to convolve the image with a filter. The filter values can be stored either in an array or in a vector. An array allows fast access, but its size must be specified at compile time. A vector is more flexible because its size is set at run time, but access is relatively slower. Therefore, I decided to use a template to cover both cases. The design looks like

template<unsigned N=0, bool Static = (N > 0) >
class gaussian_blur_filter_1D;

template<unsigned N>
class gaussian_blur_filter_1D<N, true> {
    double array[N];
public:
    explicit gaussian_blur_filter_1D(double sigma);
};

template<unsigned N>
class gaussian_blur_filter_1D<N, false> {
    std::vector<double> vector;
public:
    explicit gaussian_blur_filter_1D(double sigma, size_t n);
};

Either a static or a dynamic filter is created, depending on which constructor is used. Here is an example of building each.

// build a static filter with size of 5.
auto static_filter = gaussian_blur_filter_1D<5>(1.0);

// build a dynamic filter with size of 5.
// (the empty <> selects the default N = 0; it can only be omitted if a deduction guide is provided)
auto dynamic_filter = gaussian_blur_filter_1D<>(1.0, 5);

The BlurOperation class wraps these two constructors in two member methods, as does the Augmentor class. Thus, users can build a blur operation with either blur<5>(sigma) or blur(sigma, 5).
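To make the two specializations concrete, here is a minimal sketch that fills both variants with normalized Gaussian weights. The helper fill_gaussian, the size() and operator[] accessors, and the choice to center the kernel on the middle tap are assumptions for illustration; the library's actual filter may compute or expose its weights differently.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fills [first, first + n) with normalized 1D Gaussian weights
// centered on the middle of the range (hypothetical helper).
inline void fill_gaussian(double* first, std::size_t n, double sigma) {
    assert(n > 0 && sigma > 0.0);
    const double center = (static_cast<double>(n) - 1.0) / 2.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = static_cast<double>(i) - center;
        first[i] = std::exp(-(d * d) / (2.0 * sigma * sigma));
        sum += first[i];
    }
    for (std::size_t i = 0; i < n; ++i) {
        first[i] /= sum;  // weights sum to 1, so blurring preserves overall brightness
    }
}

template <unsigned N = 0, bool Static = (N > 0)>
class gaussian_blur_filter_1D;

// Static variant: size fixed at compile time, storage lives inside the object.
template <unsigned N>
class gaussian_blur_filter_1D<N, true> {
    std::array<double, N> weights_{};
public:
    explicit gaussian_blur_filter_1D(double sigma) {
        fill_gaussian(weights_.data(), N, sigma);
    }
    std::size_t size() const { return N; }
    double operator[](std::size_t i) const { return weights_[i]; }
};

// Dynamic variant: size chosen at run time, storage allocated by the vector.
template <unsigned N>
class gaussian_blur_filter_1D<N, false> {
    std::vector<double> weights_;
public:
    gaussian_blur_filter_1D(double sigma, std::size_t n) : weights_(n) {
        fill_gaussian(weights_.data(), n, sigma);
    }
    std::size_t size() const { return weights_.size(); }
    double operator[](std::size_t i) const { return weights_[i]; }
};
```

Both gaussian_blur_filter_1D<5>(1.0) and gaussian_blur_filter_1D<>(1.0, 5) then hold the same five weights; only where the storage lives and when the size is fixed differ.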

5.1.2. Random Number Generator

Here is another use of templates: allowing a random number generator to output either integers or floating-point numbers. In modern C++, the <random> header is used to generate random numbers. It has two uniform distributions: std::uniform_real_distribution and std::uniform_int_distribution. Normally, if we want to build a generator like:

template <typename DataType>
class UniformDistributionGenerator {
public:
    inline DataType operator()();
};

We may include both uniform number generators mentioned above as members, and call them based on datatype. However, we can slightly modify the template and split this template into two implementations:

template <typename DataType, bool IsReal = std::is_floating_point<DataType>::value>
class UniformDistributionGenerator;

// Floating-point types
template <typename DataType>
class UniformDistributionGenerator<DataType, true> {
    std::uniform_real_distribution<DataType> distribution;
};

// Integer types
template <typename DataType>
class UniformDistributionGenerator<DataType, false> {
    std::uniform_int_distribution<DataType> distribution;
};

In this way, there is no overhead from redundant members. Performance is also better, since no if-else evaluation is needed when calling operator(). The difference may not be significant for such a simple function, but the same idea can be applied to more complex designs.
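A complete version of the two specializations might look like the sketch below. The constructor parameters (low, high, seed), the std::mt19937 engine, and the std::random_device default seed are assumptions added to make the example self-contained; they are not taken from the library.

```cpp
#include <cassert>
#include <random>
#include <type_traits>

template <typename DataType, bool IsReal = std::is_floating_point<DataType>::value>
class UniformDistributionGenerator;

// Floating-point variant: samples from the range [low, high).
template <typename DataType>
class UniformDistributionGenerator<DataType, true> {
    std::mt19937 engine_;
    std::uniform_real_distribution<DataType> distribution_;
public:
    UniformDistributionGenerator(DataType low, DataType high,
                                 unsigned seed = std::random_device{}())
        : engine_(seed), distribution_(low, high) {}
    DataType operator()() { return distribution_(engine_); }
};

// Integer variant: samples from the closed range [low, high].
template <typename DataType>
class UniformDistributionGenerator<DataType, false> {
    static_assert(std::is_integral<DataType>::value,
                  "DataType must be an integer or floating-point type");
    std::mt19937 engine_;
    std::uniform_int_distribution<DataType> distribution_;
public:
    UniformDistributionGenerator(DataType low, DataType high,
                                 unsigned seed = std::random_device{}())
        : engine_(seed), distribution_(low, high) {}
    DataType operator()() { return distribution_(engine_); }
};
```

The caller writes UniformDistributionGenerator<double> or UniformDistributionGenerator<int> and the compiler selects the matching specialization, so each object carries only the distribution it actually uses.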

6. Features

There are other design features that distinguish our library from others.

6.1. More realistic randomness

Many image processing libraries (e.g. PIL) use rand() to generate random numbers, but this method may cause a few issues (please see this Q&A). In contrast, our library uses <random>. We use the current timestamp as the seed to create a generator, and then use std::uniform_real_distribution or std::uniform_int_distribution to output random numbers. This should give better-quality randomness than libraries that rely on rand().

6.2. Fast Gaussian Blur

This library implements some optimized algorithms to increase performance. One example is the Fast Gaussian Blur. A Gaussian blur is expensive: its complexity is at least O(N * r), where N is the area of the image and r is the size of the filter. However, research has found that several successive box blurs can approximate the result of a Gaussian blur, and the complexity of a box blur can be as low as O(N). Therefore, this library implements the fast Gaussian blur to increase performance. The details can be found in the Operation.h file.
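The idea can be sketched in one dimension as follows. This is not the code from the library: the function names, the edge-clamping rule, and the fixed choice of three passes with the same radius are assumptions for illustration. The key point is the running sum, which keeps each pass at O(n) no matter how large the radius is.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One box-blur pass over a 1D signal with edge clamping.
// A running sum makes the cost O(n) regardless of the radius.
inline std::vector<double> box_blur_1d(const std::vector<double>& input, std::size_t radius) {
    const long long n = static_cast<long long>(input.size());
    std::vector<double> output(input.size());
    if (n == 0) return output;

    // Returns the sample at index i, clamping out-of-range indices to the edges.
    auto at = [&](long long i) {
        if (i < 0) return input.front();
        if (i >= n) return input.back();
        return input[static_cast<std::size_t>(i)];
    };

    const long long r = static_cast<long long>(radius);
    const double window = static_cast<double>(2 * r + 1);

    // Sum of the window centered on index 0.
    double sum = 0.0;
    for (long long i = -r; i <= r; ++i) sum += at(i);

    for (long long i = 0; i < n; ++i) {
        output[static_cast<std::size_t>(i)] = sum / window;
        // Slide the window one step: add the entering sample, drop the leaving one.
        sum += at(i + r + 1) - at(i - r);
    }
    return output;
}

// Approximates a Gaussian blur by repeating the box blur several times.
inline std::vector<double> approx_gaussian_blur_1d(std::vector<double> signal,
                                                   std::size_t radius, int passes = 3) {
    for (int p = 0; p < passes; ++p) signal = box_blur_1d(signal, radius);
    return signal;
}
```

For a 2D image, the same pass would typically be applied along every row and then along every column, which keeps the total cost proportional to the number of pixels.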

image-dataset-augmentor's People

Contributors

gouthamreddykotapalle, mingen-pan, shubhmsinha

Forkers

shubhmsinha

image-dataset-augmentor's Issues

Performance TODO

  1. Multi-threading support
  2. Pushdown optimization (lazy evaluation)

jpeg.h - wrapper code for jpeglib.h

We use the jpeg.h and jpeg.cpp wrapper code for simplicity, since it is a wrapper written with the STL around the jpeglib.h data structures, which use C-style representations. It wraps the commonly used, licensed libjpeg/jpeglib.h library for reading and writing JPEG files.

We can use the image class mentioned in this library for the Image objects in our Augmentor class. There is a sample shrink/expand function as well in the main.cpp file. We can take inspiration from this function and build the other image manipulation APIs.

Probability vs randomness

Each manipulation function currently handles the randomness of the operation using min and max factors. So under the current design, every output image goes through all the operations in the pipeline (for example, when probability is 1).

But now that each manipulation function takes an additional probability parameter, how will this affect our output, and what design will we follow here?

Performance

I could see that our library is considerably slower than the Python library. I was wondering how we will handle the comparisons?

Sampling multiple images and saving has bugs

@mingen-pan Need your help resolving the bug. The code is in goutham_dev branch.

When I set the invert probability to 0.1 and provide only 2 images in my pipeline (batch images), the code fails to run and errors out. It works with probability=1.0.
