Coder Social home page Coder Social logo

bariarviv / dali Goto Github PK

View Code? Open in Web Editor NEW

This project forked from nvidia/dali

0.0 1.0 0.0 187.24 MB

A library containing both highly optimized building blocks and an execution engine for data pre-processing in deep learning applications

Home Page: https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/index.html

License: Apache License 2.0

CMake 2.48% C++ 65.62% Python 18.44% C 0.58% Cuda 11.51% Shell 1.33% Dockerfile 0.05%

dali's Introduction

License Documentation

NVIDIA DALI

The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built in data loaders and data iterators in popular deep learning frameworks.

Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference.

DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline. Features such as prefetching, parallel execution, and batch processing are handled transparently for the user.

In addition, the deep learning frameworks have multiple data pre-processing implementations, resulting in challenges such as portability of training and inference workflows, and code maintainability. Data processing pipelines implemented using DALI are portable because they can easily be retargeted to TensorFlow, PyTorch, MXNet and PaddlePaddle.

DALI Diagram

Highlights

  • Easy-to-use functional style Python API.
  • Multiple data formats support - LMDB, RecordIO, TFRecord, COCO, JPEG, JPEG 2000, WAV, FLAC, OGG, H.264, VP9 and HEVC.
  • Portable accross popular deep learning frameworks: TensorFlow, PyTorch, MXNet, PaddlePaddle.
  • Supports CPU and GPU execution.
  • Scalable across multiple GPUs.
  • Flexible graphs let developers create custom pipelines.
  • Extensible for user-specific needs with custom operators.
  • Accelerates image classification (ResNet-50), object detection (SSD) workloads as well as ASR models (Jasper, RNN-T).
  • Allows direct data path between storage and GPU memory with GPUDirect Storage_.
  • Easy integration with NVIDIA Triton Inference Server_ with DALI TRITON Backend_.
  • Open source.

Installing DALI

To install the latest DALI release for the latest CUDA version (11.x):

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110

DALI comes preinstalled in the TensorFlow, PyTorch, and MXNet containers on NVIDIA GPU Cloud (versions 18.07 and later).

For other installation paths (TensorFlow plugin, older CUDA version, nightly and weekly builds, etc), please refer to the Installation Guide_.

To build DALI from source, please refer to the Compilation Guide_.


Examples and Tutorials

An introduction to DALI can be found in the Getting Started_ page.

More advanced examples can be found in the Examples and Tutorials_ page.

For an interactive version (Jupyter notebook) of the examples, go to the docs/examples directory.

Note: Select the Latest Release Documentation_ or the Nightly Release Documentation_, which stays in sync with the main branch, depending on your version.


Additional Resources

  • GPU Technology Conference 2018; Fast data pipeline for deep learning training, T. Gale, S. Layton and P. Trędak: slides_, recording_.
  • GPU Technology Conference 2019; Fast AI data pre-preprocessing with DALI; Janusz Lisiecki, Michał Zientkiewicz: slides_, recording_.
  • GPU Technology Conference 2019; Integration of DALI with TensorRT on Xavier; Josh Park and Anurag Dixit: slides_, recording_.
  • GPU Technology Conference 2020; Fast Data Pre-Processing with NVIDIA Data Loading Library (DALI); Albert Wolant, Joaquin Anton Guirao recording_.
  • Developer Page.
  • Blog Posts.

Contributing to DALI

We welcome contributions to DALI. To contribute to DALI and make pull requests, follow the guidelines outlined in the Contributing document.

If you are looking for a task good for the start please check one from external contribution welcome label.

Reporting Problems, Asking Questions

We appreciate feedback, questions or bug reports. When you need help with the code, follow the process outlined in the Stack Overflow https://stackoverflow.com/help/mcve document. Ensure that the posted examples are:

  • minimal: Use as little code as possible that still produces the same problem.
  • complete: Provide all parts needed to reproduce the problem. Check if you can strip external dependency and still show the problem. The less time we spend on reproducing the problems, the more time we can dedicate to the fixes.
  • verifiable: Test the code you are about to provide, to make sure that it reproduces the problem. Remove all other problems that are not related to your request.

Acknowledgements

DALI was originally built with major contributions from Trevor Gale, Przemek Tredak, Simon Layton, Andrei Ivanov and Serge Panev.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.