Multi-platform C++ library for reading and writing NPY and NPZ files, with an additional .NET interface

License: MIT License

CMake 2.97% C++ 88.44% C# 5.77% SWIG 2.82%
cpp cpp-library npy-files npy npz npz-files

libnpy

libnpy is a multi-platform C++ library for reading and writing NPY and NPZ files, with an additional .NET interface. It was built with the intention of making it easier for multi-language projects to use NPZ and NPY files for data storage, given their simplicity and support across most Python deep learning frameworks.

The implementations in this library are based upon the published NPY and ZIP file format documents.
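
For orientation, here is a minimal sketch of the preamble that the NPY format (version 1.0) places in front of the raw array data: a magic string, a version, a little-endian header length, and a Python-style dictionary padded so that the data starts on a 64-byte boundary. The function name make_npy_header is hypothetical and is not part of libnpy; it only illustrates the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not libnpy API): builds the bytes that precede the
// array data in an NPY version 1.0 file. `descr` is a NumPy dtype string
// such as "|u1" (uint8) or "<f4" (little-endian float32).
std::string make_npy_header(const std::string& descr,
                            const std::vector<std::size_t>& shape,
                            bool fortran_order = false)
{
    // The header is a Python dictionary literal with exactly three keys.
    std::string dict = "{'descr': '" + descr + "', 'fortran_order': ";
    dict += fortran_order ? "True" : "False";
    dict += ", 'shape': (";
    for (std::size_t dim : shape)
    {
        dict += std::to_string(dim) + ", ";
    }
    dict += "), }";

    // Preamble: 6 magic bytes + 2 version bytes + 2 length bytes.
    const std::size_t preamble_size = 10;

    // Pad with spaces and end with '\n' so that the array data that
    // follows starts at a multiple of 64 bytes.
    const std::size_t unpadded = preamble_size + dict.size() + 1;
    dict.append((64 - unpadded % 64) % 64, ' ');
    dict.push_back('\n');

    // Version 1.0 stores the dictionary length in 16 bits.
    assert(dict.size() <= 0xFFFF);
    const auto length = static_cast<std::uint16_t>(dict.size());

    std::string header("\x93NUMPY", 6);
    header.push_back('\x01');  // major version
    header.push_back('\x00');  // minor version
    header.push_back(static_cast<char>(length & 0xFF));  // low byte first
    header.push_back(static_cast<char>(length >> 8));
    header += dict;
    return header;
}
```

Calling make_npy_header("|u1", {32, 32, 3}) yields the bytes that would precede a 32x32x3 array of unsigned 8-bit values; the array elements follow immediately after the final newline.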

Getting Started

There are two main ways to use the library: as a statically linked C++ library, and as a .NET DLL (on Windows using Visual Studio). In this guide we will walk through how to compile the library and run the tests on our currently supported platforms. These directions will likely work for other platforms as well (the codebase is written to be clean, portable C++11). If you have problems on your platform, please raise an issue.

Ubuntu 18.04 [gcc 7.3.0], Ubuntu 16.04 [gcc 5.4.0]

First, install all of the necessary dependencies:

sudo apt-get install git cmake build-essential

If you want to build the documentation, you will also need:

sudo apt-get install doxygen

You may also find that cmake is easier to use via the curses GUI:

sudo apt-get install cmake-curses-gui

Once everything is in place, you can clone the repository and generate the makefiles:

git clone https://github.com/matajoh/libnpy.git
mkdir libnpy/build
cd libnpy/build
cmake -DCMAKE_BUILD_TYPE=Debug ..

Your other build options are Release and RelWithDebInfo.

Windows 10

On Windows, you can download and install the dependencies from the following locations:

Install CMake

Download and run e.g. v3.19/cmake-3.19.0-win64-x64.msi from https://cmake.org/files/.

Install git and Visual Studio.

Get the latest Windows git from https://git-scm.com/downloads. Download a version of Visual Studio from https://visualstudio.microsoft.com/vs/. You will need the C++ compiler (and C# compiler if needed).

Install SWIG (optional, only for C#)

Browse to http://swig.org/download.html and download the latest version of swigwin. Unzip the directory and copy it to your C:\ drive. Add (e.g.) C:\swigwin-4.0.2 to your PATH. CMake should then find swig automatically.

Download and install Doxygen (optional)

If you want to build the documentation, you should also download Doxygen.

Generate MSBuild

Now that everything is ready, cmake can generate the MSBuild files necessary for the project. Run the following commands in a command prompt once you have navigated to your desired source code folder:

git clone https://github.com/matajoh/libnpy.git
mkdir libnpy\build
cd libnpy\build
cmake ..

If building the C# library, you will also need to do the following:

cmake --build . --target NumpyIONative
cmake ..

The reason for the above is that SWIG autogenerates the C# files for the interface in the first pass, after which CMake needs to scan the generated directory to build the wrapper library.

Build and Test

You are now able to build and test the library. Doing so is the same regardless of your platform. First, navigate to the build folder you created above. Then run the following command:

cmake --build . --config <CONFIG>

Where <CONFIG> is one of Release|Debug|RelWithDebInfo. This will build the project, including the tests and (if selected) the documentation. You can then do the following:

ctest -C <CONFIG>

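Replacing <CONFIG> as above, this will run all of the tests. If you want to install the library, run the following (INSTALL is the target name under Visual Studio generators; Makefile generators name it install):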

cmake --build . --config <CONFIG> --target INSTALL

Sample code

Once the library has been built and installed, you can begin to use it in your code. We have provided some sample programs (and naturally the tests as well) which show how to use the library, but the basic concepts are as follows. For the purpose of this sample code we will use the built-in tensor class, but you should use your own tensor class as appropriate.

#include <cstdint>
#include <vector>

#include "tensor.h"
#include "npy.h"
#include "npz.h"

...
    // create a tensor object
    std::vector<size_t> shape({32, 32, 3});
    npy::tensor<std::uint8_t> color(shape);

    // fill it with some data
    for (int row = 0; row < color.shape(0); ++row)
    {
        for (int col = 0; col < color.shape(1); ++col)
        {
            color(row, col, 0) = static_cast<std::uint8_t>(row << 3);
            color(row, col, 1) = static_cast<std::uint8_t>(col << 3);
            color(row, col, 2) = 128;
        }
    }

    // save it to disk as an NPY file
    npy::save("color.npy", color);

    // we can manually set the endianness to use
    npy::save("color.npy", color, npy::endian_t::BIG);

    // the built-in tensor class also has a save method
    color.save("color.npy");

    // we can peek at the header of the file
    npy::header_info header = npy::peek("color.npy");

    // we can load it back the same way
    color = npy::load<std::uint8_t, npy::tensor>("color.npy");

    // let's create a second tensor as well
    shape = {32, 32};
    npy::tensor<float> gray(shape);

    for (int row = 0; row < gray.shape(0); ++row)
    {
        for (int col = 0; col < gray.shape(1); ++col)
        {
            gray(row, col) = 0.21f * color(row, col, 0) +
                             0.72f * color(row, col, 1) +
                             0.07f * color(row, col, 2);
        }
    }

    // we can write them to an NPZ file
    {
        npy::onpzstream output("test.npz");
        output.write("color.npy", color);
        output.write("gray.npy", gray);
    }

    // and we can read them back out again
    {
        npy::inpzstream input("test.npz");

        // we can test to see if the archive contains a file
        if (input.contains("color.npy"))
        {
            // and peek at its header
            header = input.peek("color.npy");
        }

        color = input.read<std::uint8_t>("color.npy");
        gray = input.read<float>("gray.npy");
    }
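
The endianness argument shown above matters because an NPY file stores multi-byte values in the byte order named by its header. As a rough, library-independent sketch of what producing big-endian data involves on a little-endian host (all helper names here are hypothetical and are not libnpy functions):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Detects the host byte order by inspecting the first byte of the value 1.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverses the bytes of every item_size-byte item in place.
void reverse_each_item(std::vector<unsigned char>& bytes, std::size_t item_size)
{
    assert(item_size > 0 && bytes.size() % item_size == 0);
    for (std::size_t i = 0; i < bytes.size(); i += item_size)
    {
        std::reverse(bytes.data() + i, bytes.data() + i + item_size);
    }
}

// Copies the values into a byte buffer laid out most significant byte first.
template <typename T>
std::vector<unsigned char> to_big_endian_bytes(const std::vector<T>& values)
{
    static_assert(std::is_arithmetic<T>::value, "expects a numeric element type");
    std::vector<unsigned char> bytes(values.size() * sizeof(T));
    if (!bytes.empty())
    {
        std::memcpy(bytes.data(), values.data(), bytes.size());
    }
    if (host_is_little_endian())
    {
        reverse_each_item(bytes, sizeof(T));
    }
    return bytes;
}

// Inverse of to_big_endian_bytes: rebuilds host-order values.
template <typename T>
std::vector<T> from_big_endian_bytes(std::vector<unsigned char> bytes)
{
    static_assert(std::is_arithmetic<T>::value, "expects a numeric element type");
    if (host_is_little_endian())
    {
        reverse_each_item(bytes, sizeof(T));
    }
    std::vector<T> values(bytes.size() / sizeof(T));
    if (!bytes.empty())
    {
        std::memcpy(values.data(), bytes.data(), bytes.size());
    }
    return values;
}
```

Single-byte types such as std::uint8_t are unaffected by byte order, which is why the color tensor in the sample is written identically either way; only wider types such as float change on disk.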

The generated documentation contains more details on all of the functionality. We hope you find that the library fulfills your needs and is easy to use, but if you have any difficulties please create issues so the maintainers can make the library even better. Thanks!

libnpy's Issues

.NET Exception handling

Need to add appropriate SWIG support to give useful Exception pass-through to .NET users of the library.

Build without installing zlib

It would be great to enable building the code without having to install zlib first. There are a couple of options to consider:

  • Use ExternalProject_Add to reference zlib (the code could be pulled directly from GitHub).
  • Add zlib as a submodule in this repo and include it in the build.

This would make the build much easier, especially on Windows.

Build fails with error G42DEF535: reference to non-static member function must be called

Linux build fails with the below error:

  Building dependency tree...
  Reading state information...
  libboost-test-dev is already the newest version (1.67.0.1).
  0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
  In file included from /__w/1/s/extern/libnpy/src/npz.cpp:10:
  In file included from /__w/1/s/extern/libnpy/include/npy/npz.h:27:
/__w/1/s/extern/libnpy/include/npy/tensor.h(211,30): error G42DEF535: reference to non-static member function must be called [/__w/1/s/src/<proj>/<proj>.vcxproj]
          if (source.size() != size)
                               ^~~~
  1 error generated.
/home/vsts_azpcontainer/.nuget/packages/microsoft.internal.sharedbuildsystem.cpp.sdk/1.0.95/Sdk/Linux/End.targets(233,5): error MSB3073: The command "clang++-7 -c -O3 -DNDEBUG -fPIC  -m64 -std=c++17    <snip> -I/__w/1/s/extern/libnpy/include <snip> <snip> /__w/1/s/extern/libnpy/src/npz.cpp -o /__w/1/s/build/obj/clang7/<proj>/x64/Release/npz.o" exited with code 1. [/__w/1/s/src/<proj>/<proj>.vcxproj]
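
The diagnostic points at size being used as a value where it names a member function. The surrounding tensor.h source is not shown in this report, so the following is only a minimal sketch of this class of error using a hypothetical class; calling the function, as in size(), is the usual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical class for illustration only; this is not libnpy's tensor.
class buffer
{
public:
    explicit buffer(std::size_t count) : m_values(count) {}

    std::size_t size() const { return m_values.size(); }

    void copy_from(const std::vector<int>& source)
    {
        // Writing `source.size() != size` here would name the member
        // function without calling it, which clang rejects with
        // "reference to non-static member function must be called".
        if (source.size() != size())
        {
            throw std::invalid_argument("source has the wrong number of elements");
        }
        m_values = source;
        assert(m_values.size() == source.size());
    }

private:
    std::vector<int> m_values;
};
```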

Error inflating stream

I am getting a logic error, "Error inflating stream", when reading larger arrays from an NPZ archive. After a lot of debugging I found that increasing the size of CHUNK in zip.cpp, so that the file could be read in a single pass, made everything work as expected.

Here is the solution I came up with: replace the npy_inflate() call in the read_file function in npz.cpp with a single-pass inflation. The data is held in memory the whole time anyway, so there is no need to split the inflation step into chunks.

   //Use one step inflation, we anyway need to hold everything in memory at some point
   std::vector<std::uint8_t> compressed_bytes(uncompressed_bytes);
   uncompressed_bytes.resize(entry.uncompressed_size);

   //Initialize miniz
   z_stream strm;
   strm.zalloc = Z_NULL;
   strm.zfree = Z_NULL;
   strm.opaque = Z_NULL;
   strm.avail_in = 0;
   strm.next_in = Z_NULL;
   const int WINDOW_BITS = -15;
   auto ret = inflateInit2(&strm, WINDOW_BITS);
   if (ret != Z_OK)
   {
      throw std::logic_error("Unable to initialize inflate algorithm");
   }

   //Inflate in one step
   strm.next_in = compressed_bytes.data();
   strm.avail_in = static_cast<unsigned int>(compressed_bytes.size());
   strm.next_out = uncompressed_bytes.data();
   strm.avail_out = static_cast<unsigned int>(uncompressed_bytes.size());

   ret = inflate(&strm, Z_FINISH);
   if (ret != Z_STREAM_END) {
      //strm.msg may be null, and streaming a null char* is undefined behavior
      std::cerr << (strm.msg ? strm.msg : "no zlib message") << std::endl;
      (void)inflateEnd(&strm);
      throw std::logic_error("Error inflating stream");
   }
   (void)inflateEnd(&strm);

Add Zip64 support

Add support for allowing Zip64 when required by data size. Code currently breaks if NPZ is over 2GB.
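
In the classic ZIP layout, sizes and offsets are 32-bit fields and the entry count is a 16-bit field; the all-ones value in a field is reserved to signal that the real value lives in a Zip64 record. A small sketch of the check a writer might make before choosing the format follows. The struct and function names are hypothetical, and this is not a description of how libnpy currently decides.

```cpp
#include <cstdint>

// Values that fill a 32-bit or 16-bit ZIP field completely are reserved as
// "see the Zip64 record" markers, so they cannot be stored directly.
constexpr std::uint64_t ZIP_MAX_32 = 0xFFFFFFFFull;
constexpr std::uint64_t ZIP_MAX_16 = 0xFFFFull;

// Hypothetical description of one archive entry (not a libnpy type).
struct entry_info
{
    std::uint64_t compressed_size;
    std::uint64_t uncompressed_size;
    std::uint64_t local_header_offset;
};

// True when any per-entry field no longer fits the classic 32-bit layout.
constexpr bool entry_needs_zip64(const entry_info& entry)
{
    return entry.compressed_size >= ZIP_MAX_32 ||
           entry.uncompressed_size >= ZIP_MAX_32 ||
           entry.local_header_offset >= ZIP_MAX_32;
}

// True when the end-of-central-directory record needs its Zip64 variant.
constexpr bool archive_needs_zip64(std::uint64_t entry_count,
                                   std::uint64_t central_directory_size,
                                   std::uint64_t central_directory_offset)
{
    return entry_count >= ZIP_MAX_16 ||
           central_directory_size >= ZIP_MAX_32 ||
           central_directory_offset >= ZIP_MAX_32;
}
```

The 2GB threshold reported above is lower than the limit of these unsigned 32-bit fields, which may point at a signed 32-bit value somewhere in the write path; that is a guess and has not been checked against the code.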

Feature request: Write NPZ to memory

Hi!

I'm looking for a feature to be able to write the NPZ content into a memory stream (as it is already possible using npy export).

I have a basic implementation, but it's just a first attempt to see it working, without much consideration of the interface.
https://github.com/zbendefy/libnpy

If you have some comments or preferred direction of how to include this feature, I would apply them and would submit this as a PR.

Thanks
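
One possible direction, sketched here under the assumption that the writer only needs sequential output: write against std::ostream rather than a file type, so the caller can pass either a std::ofstream or a std::ostringstream. The function names below are hypothetical and are not taken from libnpy or from the fork linked above.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Writes a 32-bit value in little-endian order, as ZIP records require.
void write_u32_le(std::ostream& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
    {
        out.put(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Hypothetical stand-in for an archive writer: it only emits the ZIP
// local file header signature, but it accepts any std::ostream.
void write_local_header_signature(std::ostream& out)
{
    write_u32_le(out, 0x04034b50u);
}

// Collects the output in memory instead of in a file.
std::string signature_in_memory()
{
    std::ostringstream memory(std::ios::out | std::ios::binary);
    write_local_header_signature(memory);
    assert(memory.good());
    return memory.str();
}
```

A design like this keeps the file and memory paths on the same code, at the cost of ruling out seeking back to patch sizes unless the stream supports it.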

C# tests

Add tests for writing/reading
Need to expose test tensor functionality in some way
File comparisons?

Error when compiling with macports clang on OS X

When I try to compile with clang++-mp-13 (libc++) from MacPorts on OS X, I get the following error:

/opt/local/libexec/llvm-13/bin/../include/c++/v1/istream:321:26: error: implicit instantiation of undefined template 'std::ctype'
if (!__ct.is(__ct.space, *__i))

Does anyone know of a solution to this?
