hdf5-examples's Introduction

HDF5

Code examples for processing HDF5 files.

About HDF5

HDF5 is a highly extensible and flexible file format for storing data. For example, you can store a large number of images in a single HDF5 file.
  • Stands for "Hierarchical Data Format".
  • Current version is 5.
  • Open-source and free.
  • The core implementation can be used directly from C, C++, and Java. There are wrappers for several other languages, including Python.

Using HDF5

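To use HDF5 from Python, you need to install the h5py module. Then you can use it to read and write HDF5 files. For example, to read a file called "myfile.h5":

    import h5py

    # Open the file read-only; the context manager closes it on exit.
    with h5py.File('myfile.h5', 'r') as f:
        print(list(f.keys()))    # names of the top-level objects
        print(f['data'].shape)   # shape of the dataset named 'data'
        print(f['data'][:])      # read the whole dataset into memory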

To save a file, you need to create a new file object. For example, to create a new file called "myfile.h5":

    import h5py
    import numpy as np

    # Mode 'w' creates the file, overwriting any existing one.
    with h5py.File('myfile.h5', 'w') as f:
        # One-dimensional dataset of 100 integers.
        data_set = f.create_dataset('data', (100,), dtype='i')
        data_set[:] = np.arange(100)

Structure

  • Groups (a concept similar to directories)

    • Groups can contain datasets and other groups.
  • Datasets (a concept similar to files)

    • Shape (ex. 1D, 2D, 5D)
    • Datatype (ex. float, int32)
    • Properties and attributes (ex. chunking, compression, user-defined metadata)
    • Data (ex. data[:])
    • Subdatasets (ex. subdataset[:])
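
The hierarchy above can be sketched with h5py roughly as follows. This is a minimal illustration, not part of the original examples: the file name, the group path "experiment/images", the dataset name "frames", and the "units" attribute are all made up for the demo, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "structure_demo.h5")

with h5py.File(path, "w") as f:
    # Groups behave like directories; intermediate groups are created as needed.
    images = f.create_group("experiment/images")
    # Datasets behave like files; each has a shape and a datatype.
    frames = images.create_dataset(
        "frames", data=np.zeros((4, 8, 8), dtype="float32")
    )
    # Attributes attach small pieces of metadata to a group or dataset.
    frames.attrs["units"] = "counts"

with h5py.File(path, "r") as f:
    names = []
    f.visit(names.append)  # collect the path of every object in the file
    frames = f["experiment/images/frames"]
    shape, dtype, units = frames.shape, frames.dtype, frames.attrs["units"]
    first_frame = frames[0]  # slicing reads only the selected part from disk
```

Here `names` ends up holding the paths of the two groups and the dataset, which mirrors the directory-and-file analogy used in the list.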

Linear vs Chunked

This concept differentiates HDF5 from many other data formats. A chunked dataset is split into fixed-size pieces that are stored and read independently. This makes compression possible and can speed up access to a subset of the data.

Linear:

  • Data is stored in a single contiguous block inside the file.

Chunked:

  • Data is split into multiple fixed-size chunks (blocks).
  • Chunks are stored separately inside the same file and can be read, written, and compressed independently.

Chunk size must strike a balance:

  • maximizing I/O speed.
  • minimizing I/O of unused data.
  • minimizing the I/O overhead of handling many chunks.
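
As a rough sketch of the difference, the same array can be stored once with the default contiguous layout and once with an explicit chunk shape. The dataset names, the 1000 x 1000 array, and the (100, 100) chunk shape are assumptions chosen for illustration; a good chunk shape depends on how the data is actually read.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "chunk_demo.h5")
data = np.arange(1000 * 1000, dtype="int32").reshape(1000, 1000)

with h5py.File(path, "w") as f:
    # Default layout: one contiguous block.
    contiguous = f.create_dataset("contiguous", data=data)
    # Chunked layout: the array is split into 100 x 100 tiles.
    chunked = f.create_dataset("chunked", data=data, chunks=(100, 100))
    contiguous_chunks = contiguous.chunks  # None for a contiguous dataset
    chunked_chunks = chunked.chunks        # (100, 100)

with h5py.File(path, "r") as f:
    # This selection lines up with exactly one chunk, so only that chunk is read.
    tile = f["chunked"][0:100, 0:100]
```

A selection that cuts across many chunks forces each touched chunk to be read in full, which is the unused-data cost mentioned in the list above.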

Filter

A filter transforms data as it is written to or read from disk, most commonly to compress it.

  • Can be applied to chunked datasets.
  • It is a layer between program and data.

Program <- Filter (CPU) <- data (Disk).

Examples:

  • Gzip (compression filter)
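  • ScaleOffset (subtracts an offset, the minimum value, before storing so that fewer bits are needed; the offset is added back while reading)
  • Szip (compression filter)
  • Shuffle (reorders the bytes of the data so that a compression filter works better)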
  • Fletcher32 (checksum)
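
With h5py, filters are requested through keyword arguments of `create_dataset`. The sketch below combines Shuffle, Gzip, and Fletcher32 on one dataset and compares its on-disk size with an unfiltered copy. The file name, dataset names, chunk size, compression level, and the repetitive test data are assumptions picked for the demo; real data will compress differently.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "filter_demo.h5")
# Highly repetitive data so that compression has a visible effect.
data = np.tile(np.arange(256, dtype="int32"), 4096)

with h5py.File(path, "w") as f:
    f.create_dataset("raw", data=data)
    f.create_dataset(
        "packed",
        data=data,
        chunks=(65536,),      # filters operate on chunks
        shuffle=True,         # Shuffle: reorder bytes before compression
        compression="gzip",   # Gzip compression filter
        compression_opts=4,   # gzip level (0-9)
        fletcher32=True,      # Fletcher32 checksum on every chunk
    )

with h5py.File(path, "r") as f:
    raw_bytes = f["raw"].id.get_storage_size()
    packed_bytes = f["packed"].id.get_storage_size()
    # Reading runs the filters in reverse and returns the original values.
    restored = f["packed"][:]
    uses_gzip = f["packed"].compression == "gzip"
```

The filters are transparent to the reading code: `f["packed"][:]` looks the same as reading an unfiltered dataset, only the CPU does extra work in between.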

Code Samples
