usnistgov / h5wasm

A WebAssembly HDF5 reader/writer library

License: Other

Makefile 0.20% JavaScript 25.68% C++ 38.51% TypeScript 32.01% Python 1.18% CMake 2.41%
nodejs browser hdf5 webassembly wasm zero-dependency javascript

h5wasm's Introduction

h5wasm

A zero-dependency, WebAssembly-powered library for reading and writing HDF5 files from JavaScript

(built on the HDF5 C API)

The built binaries (ESM and Node.js) are attached to the latest release as h5wasm-{version}.tgz

The wasm-compiled libraries libhdf5.a, libhdf5_cpp.a ... and the related include/ folder are retrieved from libhdf5-wasm during the build.

Instead of importing the whole namespace with import *, you can now import the main h5wasm components as a single object from the default export:

// in hdf5_hl.ts:
export const h5wasm = {
    File,
    Group,
    Dataset,
    ready,
    ACCESS_MODES
}

File operations rely on the Emscripten virtual filesystem, which can be accessed once the WASM module has loaded, as shown below.

Browser (no-build)

import h5wasm from "https://cdn.jsdelivr.net/npm/[email protected]/dist/esm/hdf5_hl.js";

// the WASM loads asynchronously, and you can get the module like this:
const Module = await h5wasm.ready;

// then you can get the FileSystem object from the Module:
const { FS } = Module;

// Or, you can directly get the FS if you don't care about the rest 
// of the module:
// const { FS } = await h5wasm.ready;

let response = await fetch("https://ncnr.nist.gov/pub/ncnrdata/vsans/202003/24845/data/sans59510.nxs.ngv");
let ab = await response.arrayBuffer();

FS.writeFile("sans59510.nxs.ngv", new Uint8Array(ab));

// use mode "r" for reading.  All modes can be found in h5wasm.ACCESS_MODES
let f = new h5wasm.File("sans59510.nxs.ngv", "r");
// File {path: "/", file_id: 72057594037927936n, filename: "sans59510.nxs.ngv", mode: "r"}

Worker usage

Since ESM is not supported in all web worker contexts (e.g. Firefox), an additional ./dist/iife/h5wasm.js is provided in the package for h5wasm>=0.4.8; it can be loaded in a worker and used as in the example below (which uses the WORKERFS file system for random access on local files):

// worker.js
// load the IIFE bundle first (it defines a global `h5wasm`):
self.importScripts('../dist/iife/h5wasm.js');

onmessage = async function(e) {
    const { FS } = await h5wasm.ready;

    // send in a file opened from an <input type="file" />
    const f_in = e.data[0];

    FS.mkdir('/work');
    FS.mount(FS.filesystems.WORKERFS, { files: [f_in] }, '/work');

    const f = new h5wasm.File(`/work/${f_in.name}`, 'r');
    console.log(f);
}

Browser target (build system)

Install with npm i h5wasm or yarn add h5wasm, then in your file:

// index.js
import h5wasm from "h5wasm";
const { FS } = await h5wasm.ready;

let f = new h5wasm.File("test.h5", "w");
f.create_dataset({name: "text_data", data: ["this", "that"]});
// ...

Note: you must configure your build system to target ES2020 or later (for BigInt support)
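For example, when compiling with tsc, a minimal tsconfig sketch might look like the following (only the target setting matters for BigInt support; the other options depend entirely on your toolchain and are shown here only as an assumed starting point):

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ES2020",
    "moduleResolution": "node"
  }
}
```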

nodejs

The host filesystem is made available through Emscripten "NODERAWFS=1".

Enabling BigInt support may be required for Node.js < 16:

npm i h5wasm
node --experimental-wasm-bigint
const h5wasm = await import("h5wasm/node");
await h5wasm.ready;

let f = new h5wasm.File("/home/brian/Downloads/sans59510.nxs.ngv", "r");
/*
File {
  path: '/',
  file_id: 72057594037927936n,
  filename: '/home/brian/Downloads/sans59510.nxs.ngv',
  mode: 'r'
} 
*/

Usage

(All examples are written in ESM. For TypeScript, some type casting may be required, as get returns either a Group or a Dataset.)

Reading

let f = new h5wasm.File("sans59510.nxs.ngv", "r");

// list keys:
f.keys()
// ["entry"]

f.get("entry/instrument").keys()
// ["attenuator","beam","beam_monitor_low","beam_monitor_norm","beam_stop_C2","beam_stop_C3","collimator","converging_pinholes","detector_B","detector_FB","detector_FL","detector_FR","detector_FT","detector_MB","detector_ML","detector_MR","detector_MT","lenses","local_contact","name","sample_aperture","sample_aperture_2","sample_table","source","source_aperture","type"]

let data = f.get("entry/instrument/detector_MR/data")
// Dataset {path: "/entry/instrument/detector_MR/data", file_id: 72057594037927936n}

data.metadata
/* 
{
    "signed": true,
    "vlen": false,
    "littleEndian": true,
    "type": 0,
    "size": 4,
    "shape": [
        48,
        128
    ],
    "total_size": 6144
}
*/

// for convenience, these are extracted from metadata:
data.dtype
// "<i"
data.shape
// (2) [48, 128]

// data are loaded into a matching TypedArray in javascript if one exists;
// otherwise raw bytes are returned (there is no Float16Array, for instance).
// In this case the matching type is Int32Array
data.value
/*
Int32Array(6144) [0, 0, 0, 2, 2, 2, 3, 1, 1, 7, 3, 5, 7, 8, 9, 21, 43, 38, 47, 8, 8, 7, 3, 6, 1, 7, 3, 7, 47, 94, 91, 99, 76, 81, 86, 112, 98, 103, 85, 100, 83, 122, 111, 123, 136, 129, 134, 164, 130, 164, 176, 191, 200, 211, 237, 260, 304, 198, 32, 9, 5, 2, 6, 5, 8, 6, 25, 219, 341, 275, 69, 11, 4, 5, 5, 45, 151, 154, 141, 146, 108, 107, 105, 113, 99, 101, 96, 84, 86, 77, 78, 107, 73, 80, 105, 65, 75, 79, 62, 31, …]
*/

// take a slice from 0:10 on axis 0, keeping all of axis 1:
// (slicing is done through libhdf5 instead of in the javascript library - should be very efficient)
data.slice([[0,10],[]])
/*
Int32Array(1280) [0, 0, 0, 2, 2, 2, 3, 1, 1, 7, 3, 5, 7, 8, 9, 21, 43, 38, 47, 8, 8, 7, 3, 6, 1, 7, 3, 7, 47, 94, 91, 99, 76, 81, 86, 112, 98, 103, 85, 100, 83, 122, 111, 123, 136, 129, 134, 164, 130, 164, 176, 191, 200, 211, 237, 260, 304, 198, 32, 9, 5, 2, 6, 5, 8, 6, 25, 219, 341, 275, 69, 11, 4, 5, 5, 45, 151, 154, 141, 146, 108, 107, 105, 113, 99, 101, 96, 84, 86, 77, 78, 107, 73, 80, 105, 65, 75, 79, 62, 31, …]
*/

// Convert to nested Array, with JSON-compatible elements:
data.to_array()
/*
[
  [
      0,   0,   0,   2,   2,   2,   3,   1,   1,   7,   3,   5,
      7,   8,   9,  21,  43,  38,  47,   8,   8,   7,   3,   6,
      1,   7,   3,   7,  47,  94,  91,  99,  76,  81,  86, 112,
     98, 103,  85, 100,  83, 122, 111, 123, 136, 129, 134, 164,
    130, 164, 176, 191, 200, 211, 237, 260, 304, 198,  32,   9,
      5,   2,   6,   5,   8,   6,  25, 219, 341, 275,  69,  11,
      4,   5,   5,  45, 151, 154, 141, 146, 108, 107, 105, 113,
     99, 101,  96,  84,  86,  77,  78, 107,  73,  80, 105,  65,
     75,  79,  62,  31,
    ... 28 more items
  ],
  [
      0,   0,   2,   2,   4,   1,   2,   7,   2,   3,   2,   5,
      6,   3,   6,  24,  37,  42,  25,   8,   3,   5,   4,   8,
      2,   6,   7,   9,  61,  81,  81,  89, 104, 110,  82,  82,
    104,  92,  97,  99, 104, 115, 106, 128, 134, 111, 125, 123,
    159, 155, 182, 228, 227, 242, 283, 290, 295, 114,  11,   6,
      5,   6,   8,   4,   4,  10,  59, 401, 401, 168,  10,   6,
      6,   4,  10,  37, 150, 152, 146, 121, 125, 117, 122,  88,
    100,  97,  86,  79,  90,  87,  78,  87,  87,  87,  84,  76,
     76,  66,  51,  11,
    ... 28 more items
  ],
  ... 46 more items
]
*/
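As a sanity check on the slice semantics above (each entry is a [start, stop] pair, and an empty array keeps the whole axis), the shape of the result can be computed in plain JavaScript. `sliceShape` is a hypothetical helper written for illustration, not part of the h5wasm API:

```javascript
// Compute the shape of the result of Dataset.slice(ranges),
// where each range is [start, stop] and [] keeps the whole axis.
// A plain-JS sketch of the semantics; not part of h5wasm.
function sliceShape(shape, ranges) {
  return shape.map((size, axis) => {
    const r = ranges[axis] ?? [];
    const start = r.length > 0 ? r[0] : 0;
    const stop = r.length > 1 ? Math.min(r[1], size) : size;
    return Math.max(stop - start, 0);
  });
}

// slicing [[0, 10], []] on a [48, 128] dataset keeps 10 rows, all columns:
console.log(sliceShape([48, 128], [[0, 10], []])); // [10, 128]
// 10 * 128 === 1280, matching the Int32Array(1280) result above
```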

SWMR Read

(single-writer / multiple-reader)

const swmr_file = new h5wasm.File("swmr.h5", "Sr");
let dset = swmr_file.get("data");
dset.shape;
// [12]
// ...later
dset.refresh();
dset.shape;
// [16]

Writing

let new_file = new h5wasm.File("myfile.h5", "w");

new_file.create_group("entry");

// shape and dtype will match input if omitted
new_file.get("entry").create_dataset({name: "auto", data: [3.1, 4.1, 0.0, -1.0]});
new_file.get("entry/auto").shape
// [4]
new_file.get("entry/auto").dtype
// "<d"
new_file.get("entry/auto").value
// Float64Array(4) [3.1, 4.1, 0, -1]

// make float array instead of double (shape will still match input if it is set to null)
new_file.get("entry").create_dataset({name: "data", data: [3.1, 4.1, 0.0, -1.0], shape: null, dtype: '<f'});
new_file.get("entry/data").shape
// [4]
new_file.get("entry/data").value
//Float32Array(4) [3.0999999046325684, 4.099999904632568, 0, -1]

// create a dataset with shape=[2,2]
// The dataset is stored in the HDF5 file with the correct shape,
// but no attempt is made to make a 2x2 array out of it in javascript
new_file.get("entry").create_dataset({name: "square_data", data: [3.1, 4.1, 0.0, -1.0], shape: [2,2], dtype: '<d'});
new_file.get("entry/square_data").shape
// (2) [2, 2]
new_file.get("entry/square_data").value
//Float64Array(4) [3.1, 4.1, 0, -1]

// create a dataset with compression
const long_data = [...new Array(1000000)].map((_, i) => i);
new_file.get("entry").create_dataset({name: "compressed", data: long_data, shape: [1000, 1000], dtype: '<f', chunks: [100,100], compression: 9});
// equivalent to:
// new_file.get("entry").create_dataset({name: "compressed", data: long_data, shape: [1000, 1000], dtype: '<f', chunks: [100,100], compression: 'gzip', compression_opts: [9]});
new_file.get("entry/compressed").filters
// [{id: 1, name: 'deflate'}]
new_file.get("entry/compressed").slice([[2,3]]);
// Float32Array(1000) [ 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, … ]


// create an attribute (creates a VLEN string by default for a string)
new_file.get("entry").create_attribute("myattr", "a string");
Object.keys(new_file.get("entry").attrs)
// ["myattr"]
new_file.get("entry").attrs["myattr"]
// {value: "a string", shape: Array(0), dtype: "S"}

new_file.get("entry").create_attribute("fixed", ["hello", "you"], null, "S5")
new_file.get("entry").attrs["fixed"]
/*
{
    "value": [
        "hello",
        "you"
    ],
    "shape": [
        2
    ],
    "dtype": "S5"
}
*/

// close the file - reading and writing will no longer work.
// calls H5Fclose on the file_id.
new_file.close()

Links

let new_file = new h5wasm.File("myfile.h5", "w");
new_file.create_group("entry");
new_file.get("entry").create_dataset({name: "auto", data: [3.1, 4.1, 0.0, -1.0]});

// create a soft link in root:
new_file.create_soft_link("/entry/auto", "my_soft_link");
new_file.get("my_soft_link").value;
// Float64Array(4) [3.1, 4.1, 0, -1]

// create a hard link:
new_file.create_hard_link("/entry/auto", "my_hard_link");
new_file.get("my_hard_link").value;
// Float64Array(4) [3.1, 4.1, 0, -1]

// create an external link:
new_file.create_external_link("other_file.h5", "other_dataset", "my_external_link");
new_file.get_external_link("my_external_link");
// {filename: "other_file.h5", obj_path: "other_dataset"}

// create a soft link in a group:
new_file.create_group("links");
const links_group = new_file.get("links");
links_group.create_soft_link("/entry/auto", "soft_link");
new_file.get("/links/soft_link").value;
// Float64Array(4) [3.1, 4.1, 0, -1]
new_file.get_link("/links/soft_link");
// "/entry/auto"
new_file.get_link("/entry/auto");
// null  (null is returned if the path is not a symbolic link)

new_file.close()

Edit

One can also open an existing file and write to it:

let f = new h5wasm.File("myfile.h5", "a");

f.create_attribute("new_attr", "something wicked this way comes");
f.close()

Web Helpers

Optional helpers to support uploads and downloads:

import {uploader, download, UPLOADED_FILES} from "https://cdn.jsdelivr.net/npm/h5wasm@latest/dist/esm/file_handlers.js";
// 
// Attach to a file input element:
// will save to Module.FS (memfs) with the name of the uploaded file
document.getElementById("upload_selector").onchange = uploader;
// file can be found with 
let f = new h5wasm.File(UPLOADED_FILES[UPLOADED_FILES.length -1], "r");

let new_file = new h5wasm.File("myfile.h5", "w");

new_file.create_group("entry");

// shape and dtype will match input if omitted
new_file.get("entry").create_dataset({name: "auto", data: [3.1, 4.1, 0.0, -1.0]});

// this will download a snapshot of the HDF5 in its current state, with the same name
// (in this case, a file named "myfile.h5" would be downloaded)
download(new_file);

Persistent file store (web)

To persist the emscripten virtual filesystem between sessions, use IDBFS (syncs with browser IndexedDB), e.g.

// create a local mount of the IndexedDB filesystem:
FS.mount(FS.filesystems.IDBFS, {}, "/home/web_user")

// to read from the browser IndexedDB into the active filesystem:
FS.syncfs(true, (e) => {console.log(e)});

// to push all current files in /home/web_user to IndexedDB, e.g. when closing your application:
FS.syncfs(false, (e) => {console.log(e)})
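FS.syncfs is callback-based; if you prefer promises, it can be wrapped in a small helper. This is a sketch: `syncFS` is a hypothetical name, not part of h5wasm or Emscripten.

```javascript
// Wrap the callback-style FS.syncfs(populate, callback) in a Promise.
// `syncFS` is a hypothetical helper, not part of h5wasm or Emscripten.
function syncFS(FS, populate) {
  return new Promise((resolve, reject) => {
    FS.syncfs(populate, (err) => (err ? reject(err) : resolve()));
  });
}

// usage (assuming FS from `await h5wasm.ready`):
// await syncFS(FS, true);   // pull IndexedDB -> in-memory filesystem
// ...work with files...
// await syncFS(FS, false);  // push in-memory filesystem -> IndexedDB
```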

h5wasm's People

Contributors

axelboc, bmaranville, loichuder, thelartians

h5wasm's Issues

Methods being called before Wasm runtime has initialized in Node.js

This is a nice piece of work - we're thinking about using it to replace jsfive in our single-cell analysis browser app. While I was playing around with it, though, I was having some trouble getting the Node example to work.

MRE

Create a test.js file containing:

const hdf5 = require("h5wasm");
let f = new hdf5.File("./sans59510.nxs.ngv", "r")

The HDF5 file is taken from the URL on the README, and I'll assume that npm i h5wasm has previously been run. Then, running:

node --experimental-wasm-bigint test.js

Gives me:

RuntimeError: abort(Assertion failed: native function `stackSave` called before runtime initialization) at Error
    at jsStackTrace (/home/luna/Code/h5wasm/node_modules/h5wasm/dist/node/hdf5_util.js:1:29303)
    at stackTrace (/home/luna/Code/h5wasm/node_modules/h5wasm/dist/node/hdf5_util.js:1:29479)
    at abort (/home/luna/Code/h5wasm/node_modules/h5wasm/dist/node/hdf5_util.js:1:24640)
# etc. etc.

Running on Node v14.15.5 with h5wasm 0.1.8.

Diagnosis

This error seems to occur whenever the h5wasm methods are called before the Wasm runtime has fully initialized. I would guess that most existing uses of h5wasm never encounter this because the actual HDF5 calls occur some time after loading. Indeed, waiting for a hundred milliseconds before calling hdf5.File avoids the problem, but this is not a particularly satisfying solution.

A more reliable approach may be to provide a ready promise that users can await to obtain a fully initialized h5wasm Wasm runtime. (Something like this seems to be exported in hdf5_hl.js but not in hdf5_hl_node.js inside my node_modules/h5wasm/dist directory.) To demonstrate, I modified hdf5_hl_node.js to export Module, and then:

const hdf5 = require("h5wasm");

// Manually exported Module in hdf5_hl_node.js.
var ready = new Promise(resolve => {
    hdf5.Module.onRuntimeInitialized = () => { resolve(true) };
});

ready.then(init => {
    let h = new hdf5.File("./sans59510.nxs.ngv", 'r');
    console.log(h.keys()); // [ "entry" ]
});

Running node --experimental-wasm-bigint on this script now works without issue.

Unexpected HTTP requests with createLazyFileLRU

Hi! We are using your (awesome) LazyFileLRU implementation to load specific datasets from large HDF5 files on remote servers. This generally works fine, and things like chunking and the LRU options are a great benefit for us!

One thing we noticed is that when requesting small datasets with a larger chunk size, there are often many more HTTP requests being made than expected. For example, when requesting a dataset that's only 4 bytes in size with a chunk size of 50 kB, we sometimes see 6 or more HTTP requests being made.
You can reproduce this with the hosted version of lazyFileLRU you provided by requesting a dataset like "80.0/definition", which is very small, with a big chunk size (for example 1 MB). You should see something like 5 requests being made. We assumed that a maximum of 2 requests should be made here: one to figure out where the search for the dataset should start, and one to actually retrieve it. This can also be reproduced with chunk sizes closer to the actual dataset size. I assume that, in this example, the dataset along with all of its attributes and metadata should be below a kilobyte in size.

My main question is: What could be the reason for these additional requests? Some things that came to mind:

  • Compression of datasets?
  • datasets being "scattered" throughout the whole HDF5 file, which prevents loading them with a single request?
  • Or maybe my assumption that two requests should be enough is just wrong, perhaps because HDF5 cannot actually provide the information about where to search for the dataset?

We would be super grateful if you could spare some of your time to help us understand how fetching of remote data is actually done, or point us to some documentation for this.

Another, unrelated remark: It would be interesting to understand if we could modify the implementation to load different datasets with different chunk sizes? We have knowledge of the rough size of datasets before loading them, so adjusting the chunk size "per request" would be very valuable to us.

Thank you already for the support you provided in the past! If you can help and need more information, please let me know. I would be able to provide a reproduction example as well.

Lazy loading datasets: Change default chunk size?

Hello!
I am currently using the feature you described in this issue to handle loading large files from our server:
#4 (comment)

The lazy loading works great, but currently datasets are always loaded in chunks of 1MB, which affects the performance greatly on large datasets. Is there any way to change the chunk size? I tried to find a solution in your documentation and in emscripten, but cannot find any mention on how to control this.
Thank you for this great library btw!

I am struggling to simply open a local file

The docs are very thin on this trivial task. I have passed a local file object to h5wasm and it refuses to open the file, even though permissions allow it to do so.

Here file is a local file handed off to the File() constructor.

const cndbf = new h5wasm.File(file.name, "r")

My console fills with all manner of unintelligible errors. Apparently I don't have permissions to read the file.

What am I missing here?

Thanks,
Doug

Support for GRIB files

Thanks for this wonderful tool; The VS Code extension h5web that uses h5wasm is super handy.

I know GRIB files are a completely different data format, but there are conversion tools to convert GRIB2 to NetCDF4. I wondered if it were possible to extend h5wasm to support GRIB2 files, either reading GRIB2 files directly, or converting them to HDF5 before reading them.

Thanks for considering this idea.

TypeError: wasm function signature contains illegal type, in a Qt WebEngine

I am trying to use h5wasm from an html page displayed in a Qt WebEngine in a Qt application.
The Qt WebEngine is using Chromium (v87), with the V8 javascript engine (see https://wiki.qt.io/QtWebEngine for more info).

The code runs fine outside of Qt, but within Qt I get a similar error as emscripten-core/emsdk#476:

Uncaught (in promise) TypeError: wasm function signature contains illegal type
    at embind_init_builtin() (<anonymous>:wasm-function[4431]:0x2cceb2)
    at _GLOBAL__sub_I_bind.cpp (<anonymous>:wasm-function[4432]:0x2cd0db)
    at __wasm_call_ctors (<anonymous>:wasm-function[67]:0x4923)
    at callRuntimeCallbacks (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:1192:26)
    at initRuntime (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:796:3)
    at doRun (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:7972:5)
    at run (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:7992:5)
    at runCaller (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:7933:19)
    at removeRunDependency (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:921:7)
    at receiveInstance (http://127.0.0.1:5432/static/vendor/h5wasm/hdf5_util.js:1082:5)

I checked that BigInt is available in JavaScript from the Qt WebEngine (which is expected, as it is using V8 on Chromium v87):
(screenshot omitted)

I also checked that the js file was compiled with -s WASM_BIGINT, and even compiled it myself to be sure.

I tried to run Qt with additional options: --wasm-staging, --enable-experimental-webassembly-features, --enable-webassembly-baseline, but nothing worked and I am now out of ideas.

(I am not 100% sure if this repo is the best place to ask this, or the emsdk repo, or the Qt repo. But it seems to be a wasm issue from what I understand.)

Can't update existing attributes in edit mode

When opening an existing file in edit ('a') mode, any attempt to update an existing attribute (or other aspects of the file, to my knowledge) results in internal warnings that keep the attribute unmodified. Is this the intended / known behavior of the library?

Here's a minimal example that will trigger these warnings (code, demo):

import h5wasm from "https://cdn.jsdelivr.net/npm/[email protected]/dist/esm/hdf5_hl.js";
const { FS } = await h5wasm.ready;

let new_file = new h5wasm.File("myfile.h5", "w");

const attrToUpdate = "new_attr"

new_file.create_attribute(attrToUpdate, "something wicked this way comes");
new_file.close()

let editable = new h5wasm.File("myfile.h5", "a");
editable.create_attribute(attrToUpdate, "something new this way comes");
editable.close()

let readable = new h5wasm.File("myfile.h5", "r");
console.log(`Got updated ${attrToUpdate} on hdf5 file:`, readable.attrs[attrToUpdate].value)

This may be related to #14, though this PR specifically flagged their need to update datasets—not other parts of the file.

Currently, webnwb loads all file contents into a JS object that tracks changes and is recompiled into a separate HDF5 file when the user decides to save them—though it would be significantly more elegant to place updates into the existing h5wasm File instance (loaded in 'a' mode) when that decision occurs and close the file instance when all edits are complete.

Unable to open file

Hi
When trying to create a file using new h5wasm.File(fileName, 'w'), occasionally the following error occurs.

HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
#000: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5F.c line 532 in H5Fcreate(): unable to create file
major: File accessibility
minor: Unable to open file
#1: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5VLcallback.c line 3282 in H5VL_file_create(): file create failed
major: Virtual Object Layer
minor: Unable to create file
#2: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5VLcallback.c line 3248 in H5VL__file_create(): file create failed
major: Virtual Object Layer
minor: Unable to create file
#3: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5VLnative_file.c line 63 in H5VL__native_file_create(): unable to create file
major: File accessibility
minor: Unable to open file
#4: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5Fint.c line 1858 in H5F_open(): unable to truncate a major: File accessibility
minor: Unable to open file

Once I get the error, all subsequent requests fail with the following error:
RuntimeError: memory access out of bounds

The only thing that worked for me was restarting.

I wonder whether anyone can help me to fix this.
Thanks!

Braveen.

to_array() for browser build is not working

The data.value property works fine, but to_array is not working. I am getting the following error.

Uncaught (in promise) TypeError: data.to_array is not a function

Here is my try:

import h5wasm from "https://cdn.jsdelivr.net/npm/[email protected]/dist/esm/hdf5_hl.js";


class Read_HDF5{
    
    constructor(h5_file){
        this.h5_file = h5_file;
        this.random_name = "random.h5"
    }
    
    async getKeys(){
        // the WASM loads asynchronously, and you can get the module like this:
        const Module = await h5wasm.ready;

        // then you can get the FileSystem object from the Module:
        const { FS } = Module;
        console.log(this.h5_file)
        let response = await fetch(this.h5_file)
        let ab = await response.arrayBuffer();

        FS.writeFile(this.random_name, new Uint8Array(ab));
        let f = new h5wasm.File(this.random_name, "r");
        console.log(f.keys())
        let data = f.get("12_rd_p");
        console.log(data)
        console.log(data.value) 
       
        console.log(data.to_array()) // error 
    }
}

export {Read_HDF5};

If it's not available in the browser build, is there any alternative for working with a 2D array, i.e. shape (481, 199)?
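As a workaround when to_array is unavailable in an older build, the nested array can be built by hand from the flat value and the shape. A plain-JS sketch; `toNested` is a hypothetical helper, not part of the h5wasm API:

```javascript
// Reshape a flat (typed) array into nested plain arrays using `shape`.
// A plain-JS sketch; newer h5wasm builds do this via Dataset.to_array.
function toNested(flat, shape) {
  if (shape.length === 0) return flat[0];
  if (shape.length === 1) return Array.from(flat);
  const [first, ...rest] = shape;
  const stride = rest.reduce((a, b) => a * b, 1);
  // subarray avoids copies for TypedArrays; fall back to slice for Arrays
  return Array.from({ length: first }, (_, i) =>
    toNested(
      flat.subarray
        ? flat.subarray(i * stride, (i + 1) * stride)
        : flat.slice(i * stride, (i + 1) * stride),
      rest
    )
  );
}

// e.g. for a dataset of shape [481, 199]:
// const nested = toNested(data.value, data.shape);
console.log(toNested(new Int32Array([1, 2, 3, 4, 5, 6]), [2, 3]));
// [[1, 2, 3], [4, 5, 6]]
```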

Question about writing files

Hi again :)
I came to the part where I need to convert some data and write it into a new file, so per the docs I do:

let file = new hdf5.File("test.h5", "w");
file.create_dataset("text_data", ["this", "that"]);

I was wondering what happens when new hdf5.File() or file.close() is called. Is the file stored somewhere so it can be accessed later? If so, can I just access it via let f = new hdf5.File("test.h5", "r");?

If not, it seems I should be writing the file to the filesystem like so:

hdf5.FS.writeFile("test.h5", new Uint8Array(ab));

However, writeFile() only takes a string or ArrayBuffer, so how can I properly store the file for later use?

Getting the real values?

Hello,

I'm trying to read sea surface temperatures (SST) from the following GOES nc4 file:
https://noaa-goes16.s3.amazonaws.com/ABI-L2-SSTF/2023/013/00/OR_ABI-L2-SSTF-M6_G16_s20230130000205_e20230130059513_c20230130105442.nc

To read the file I can do:

const goes = await fetch(
  "https://noaa-goes16.s3.amazonaws.com/ABI-L2-SSTF/2023/013/00/OR_ABI-L2-SSTF-M6_G16_s20230130000205_e20230130059513_c20230130105442.nc"
)
  .then((response) => response.arrayBuffer())
  .then(async (d) => {
    await h5.FS.writeFile("x", new Uint8Array(d));
    return new hdf5.File("x", "r");
  })

And the content is:

const sst = goes.get("SST").value; // Uint16Array(29419776) [65535, 65535, 65535, 65535, 65535, 65535, …

But how can I get the actual temperatures? Apparently I need to do a transform such as:

function transformKelvin(x) { return x < 65530 ? x * 0.00244163 + 180 : NaN; }

I'm not sure where I can find these factors (scale and acceptable range) in the file.
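In NetCDF/CF-convention files, these factors are typically stored as attributes on the dataset itself, conventionally named scale_factor, add_offset, and _FillValue (or valid_min/valid_max); inspecting dset.attrs should reveal which ones this particular file uses. A hedged sketch of the unpacking, assuming plain numeric factors have been read from those attributes:

```javascript
// Unpack raw packed-integer data: x * scale + offset, with the fill
// value mapped to NaN. The factors are assumed to come from the
// dataset's attributes (conventionally scale_factor / add_offset /
// _FillValue in NetCDF files; not verified for this specific file).
function unpack(raw, { scale = 1, offset = 0, fill } = {}) {
  return Float64Array.from(raw, (x) => (x === fill ? NaN : x * scale + offset));
}

// usage (hypothetical; read the factors from the dataset's attributes):
// const sstDset = goes.get("SST");
// const a = sstDset.attrs;
// const kelvin = unpack(sstDset.value, {
//   scale: a.scale_factor.value,
//   offset: a.add_offset.value,
//   fill: a._FillValue.value,
// });
```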

How to read local files using browser (no-build)

Hi,
I am trying to read an HDF5 file stored locally.
The file is available at "http://localhost:8080/examples/h5_files/test_data/test.h5", so I am providing the relative path as input for the file, like /examples/h5_files/test_data/test.h5.

Test code.
file: HDF5_Reader.js

import h5wasm from "https://cdn.jsdelivr.net/npm/[email protected]/dist/esm/hdf5_hl.js";

// the WASM loads asynchronously, and you can get the module like this:
const Module = await h5wasm.ready;

class Read_HDF5{
    
    constructor(h5_file){
        self.h5_file = h5_file
    }
    
    async getKeys(){
        let f = new h5wasm.File(self.h5_file, "r");
        console.log(f.keys())
    }
}


export {Read_HDF5};
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>View Hdf5</title>
        <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
    </head>
    
<body>
    <h1>View the keys of HDF5 files</h1>
    
    <script type="module">
        import {Read_HDF5} from "./js/HDF5_Reader.js";
        let h5_file = "/examples/h5_files/test_data/test.h5";
        const h5_reader = new Read_HDF5(h5_file);
        h5_reader.getKeys();
    </script>

</body>
</html>

but I am getting an error something like this:
(screenshot omitted)

TypeScript compiling issue

Hi, first of all, thanks a lot for your work. Your libs are exactly what I need, but I'm having some trouble building to JavaScript. Since jsfive doesn't have any TypeScript support, I am planning on rewriting everything using this library; however, when I run my postinstall script tsc, I get the following errors:

node_modules/h5wasm/src/emscripten.d.ts:52:33 - error TS2304: Cannot find name 'WebGLRenderingContext'.

52     preinitializedWebGLContext: WebGLRenderingContext;
                                   ~~~~~~~~~~~~~~~~~~~~~

node_modules/h5wasm/src/emscripten.d.ts:66:28 - error TS2304: Cannot find name 'MessageEvent'.

66     onCustomMessage(event: MessageEvent): void;

... 9 more similar errors

Do I need to configure something in order for it to work? I am using Node v16.5.0

This package does not run on Windows

Hello,

I am trying to use this package in my application but it does not work on Windows computers. After some research I believe that in file src/hdf5_util.cc every .c_str() must be replaced with .data().

Could you try that for me?

I created this fork (https://github.com/masasso/h5wasm) to solve the problem, but as I don't know much about WebAssembly I am not able to compile the project.

Thanks in advance.

Add Typed Files/Groups

I want to contribute a feature I will likely need in the future: Adding types to Files/Groups in order to describe their content. For example, if an HDF5 file has some groups like "data" and "metadata", each containing some datasets, an interface for the File could look like this:

interface MyHDF5File {
  data:  {
    dataset_1: Dataset;
    subsets: {
      dataset_3: Dataset;
    }
  },
  metadata: {
    metadata_1: Dataset;
  }
}

With this, you could "strongly type" any file, and even get type hints in group.get() methods. (with typescript Template Literals)

const file = h5wasm.File<MyHDF5File>(...)

file.get("/data/dataset_4") // results in typescript error, as it does not exist in the Interface.

First of all, would it be okay to add such a feature? If so, I would like to implement it and open a PR. It would, of course, be optional and should not break any existing code.

While working with h5wasm locally, I noticed some things:

  1. Some tests seem to be failing. For datatype_test.mjs I get an error:
AssertionError [ERR_ASSERTION]: Expected values to be strictly deep-equal:
+ actual - expected

+ Datatype {
- {
    type: 'Datatype'
  }
  2. There is no proper description in the readme on how to build h5wasm. It's quite straightforward, but adding a small step-by-step guide could be helpful (also naming the Emscripten SDK as a dependency). If you want I can open a PR for that, though I am not very familiar with the emscripten/C ecosystem.

As a side-note, I saw in the git history that you thought about switching to esbuild for building h5wasm? Is there any help needed there?

Thank you!

Dimension scales, NetCDF-4 support

Do you have plans to add dimension scales to h5wasm? If not, would you be open to doing so?

I’ve been tinkering with h5wasm as a possible means of dynamically creating NetCDF-4 files in the browser. Given the similarities between NetCDF-4 and HDF5, h5wasm is the lightest-weight approach I’ve found. (By the way I appreciate the work you’ve done on it! The TypeScript type definitions have made it very easy for me to learn the API quickly.) The one thing I’ve found to be missing for NetCDF-4 support (thus far anyway) is dimension scales. Perhaps you’re aware of others, but this seems to be the most obvious one.


SWMR seems to be broken

Hi, I have been trying to get SWMR mode working for some time now but have made no progress so far, so I thought you might have an idea...

Setup:
Overall, I'm trying to read a file with h5wasm while writing it with h5py. As both libraries wrap the same C code, this should theoretically be possible. I can read the file with h5wasm without any problem if I don't use SWMR.

Reader: (h5wasm)

import h5wasm from "h5wasm";

const { FS } = await h5wasm.ready;
var file = new h5wasm.File('./data.hdf5', "Sr");

Error:
The reader errors out with

HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
  #000: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5F.c line 493 in H5Fcreate(): invalid flags
    major: Invalid arguments to routine
    minor: Bad value

Writer: (h5py)
Using only h5py, SWMR works flawlessly with multiple readers and a single writer:

import h5py
import time
f = h5py.File("data.hdf5", "w", libver="v110")
f.swmr_mode = True

ds = f.create_dataset("nodes", (0, 4), maxshape=(None, 4), dtype=h5py.string_dtype())
while True:
    # write data
    s = ds.shape
    ds.resize((s[0] + 1, *s[1:]))
    ds[s[0]] = ["test0", "test1", "test2", "test3"]
    ds.flush()

    # Wait 5 seconds
    time.sleep(5)

Importing in an ESM project

Hi, I am planning on converting some stuff to ESM since some other packages are moving to ESM only (no CommonJS imports). I saw that v0.4.0 is now also compiled as ESM.

However, I am getting Could not find a declaration file for module 'h5wasm', which is weird, since the package includes TypeScript declarations, right? Do I need to change something for ESM support?

CRUD operations

Hi all,
I'm trying to do CRUD operations in order to manage and populate a dataset: writing and reading are straightforward, but I cannot understand how to update or delete attributes, groups, or datasets.
I'm working with a stream of data in a NestJS server, where elements are:

{
_id,
timestamp,
value
}

I need to create a dataset for every _id that contains timestamp and value (first and second columns of an n x 2 table).
I'm working with a big volume of data, so I cannot collect everything server-side before creating the dataset; I need to create the dataset and update it at every step.
Finally, I'm not sure whether creating a dataset with a dynamic length is possible, but I can retrieve the final length of the dataset.
Kind Regards, Carlo

Question: Loading h5 file

I do not quite understand how to load h5 files using h5wasm.

  • What is the .nxs.ngv data format and how could I convert a local h5 file to this format?
  • If I try to use file_handlers.js I get tons of warnings and eventually a hdf5_util.js:9 Uncaught error - name not defined! error.

Intuitively I would write the following in my app.js:

import * as hdf5 from "https://cdn.jsdelivr.net/npm/h5wasm@latest/dist/esm/hdf5_hl.js";

await hdf5.ready;

$(document).ready(async function () {
    $("#datafile").change(function loadData() {
        var file_input = document.getElementById('datafile');
        var file = file_input.files[0]; // only one file allowed
        let data_filename = file.name;
        let f = new hdf5.File(data_filename, "r");
        console.log(f.keys())
        f.close()
    })
})

And have an index.html file like this:

<!DOCTYPE html>
<html lang="eng">
    <head>
    <!-- Import JQuery -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
    </head>
    <body>

    <input type="file" id="datafile" name="file">

    <!-- Import main JS -->
    <script type="module" src="app.js" ></script>
    </body>
</html>
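Following the README's pattern, the missing step in a snippet like the one above is that the selected file's bytes must first be written into the Emscripten filesystem, since h5wasm.File opens paths in that virtual filesystem rather than browser File objects. A minimal sketch, where writeToFS is a hypothetical helper (not part of h5wasm) and FS comes from await h5wasm.ready as in the README's browser example:

```javascript
// Hypothetical helper: copy a file's bytes into the Emscripten FS and
// return the path that can then be passed to `new h5wasm.File(path, "r")`.
function writeToFS(FS, name, bytes) {
  FS.writeFile(name, bytes);
  return name;
}

// Browser wiring (sketch, not runnable outside a page):
//   const { FS } = await h5wasm.ready;
//   const file = document.getElementById("datafile").files[0];
//   const bytes = new Uint8Array(await file.arrayBuffer());
//   const path = writeToFS(FS, file.name, bytes);
//   const f = new h5wasm.File(path, "r");
//   console.log(f.keys());
//   f.close();
```

The same staging step applies to remote files fetched as an ArrayBuffer, as in the README example at the top of this page.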

Error exposes information about your system

When opening a file that doesn't exist on disk, an error is thrown. This error exposes information about your system. I'm not sure if this is intended or a side effect of some logging, nor which other errors expose related information. It might be good to look into / be aware of.

E.g.:

const file = new h5wasm.File("nonexistent.h5", "r");
#000: /home/brian/dev/libhdf5-wasm/wasm_build/1_12_1/_deps/hdf5-src/src/H5F.c line 620 in H5Fopen(): unable to open file
// My name is not Brian, but yours is :)

Building problem with typescript

I'm having a problem building a TypeScript project that references this package. The problem seems to be that the Node version of the code uses a require of an ESM module, which neither tsc nor esbuild likes.

I can get the project to build if I force it to use the ESM build of the package, but then I do not get access to the file system. Is the WASM code for the ESM build enabled with file access?

Container does not open in h5wasm that opens in jsfive

I'm working on a React component that reads in data from an HDF5 container.

The container is readable with the jsfive library, but does not open correctly with the h5wasm library, where I get "invalid file name" errors.

I am using yarn add to add both libraries. Yarn adds v0.3.6 of jsfive and v0.1.8 of h5wasm.

My test container is generated with the following script, which creates the container from a very rudimentary UMAP clustering result: https://gist.github.com/alexpreynolds/8e3a29c75f2ff86fa922b7a092f5e299

For convenience, the container is also available here: https://somebits.io/data.h5

The relevant code for loading the container via jsfive is:

import * as hdf5 from 'jsfive';

...

class App extends React.Component<Props, State> {
  ...
  async componentDidMount() {
    await fetch("https://somebits.io/data.h5")
      .then(function(response) { 
        return response.arrayBuffer() 
      })
      .then(function(buffer) {
        const f = new hdf5.File(buffer, "data.h5");
        console.log(`f.keys ${JSON.stringify(f.keys)}`);
        const data = f.get('data');
        console.log(`data.keys ${JSON.stringify(data.keys)}`);
        const dataGroup = data.get('tsg8n0ki');
        console.log(`dataGroup.attrs ${JSON.stringify(dataGroup.attrs)}`);
        const metadata = f.get('metadata');
        console.log(`metadata.keys ${JSON.stringify(metadata.keys)}`);
        const groups = metadata.get('groups');
        const group = groups.get('tsg8n0ki');
        console.log(`metadata.groups['tsg8n0ki'].attrs ${JSON.stringify(group.attrs)}`);
      })
      .catch(err => {
        console.log(`err ${err}`);
      });
  }
  ...
}

The browser console reports the following expected messages with jsfive:

f.keys ["data","metadata"]
data.keys ["tsg8n0ki"]
dataGroup.attrs {}
metadata.keys ["axes","groups","summary"]
metadata.groups['tsg8n0ki'].attrs {"name":"RGBa colorspace"}

When testing h5wasm, I use the following import:

import * as hdf5 from 'h5wasm';

The rest of the code is identical.

The head of output from the console (including warnings):

hdf5_util.js:9 HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
hdf5_util.js:9   #000: /home/brian/dev/h5wasm/hdf5-hdf5-1_12_1/src/H5F.c line 487 in H5Fcreate(): invalid file name
hdf5_util.js:9     major: Invalid arguments to routine
hdf5_util.js:9     minor: Bad value
f.keys undefined
...

I'm leaving out the rest of the console messages, which show values being undefined due to the container not loading correctly.

Exception when loading h5wasm as ES module from another ES module

Hi,

We have a CommonJS module that depends on 'h5wasm'. Everything works fine.

I converted our module from CommonJS to ESM and got an exception when 'h5wasm' is loaded.

To simplify everything, I created a brand new little ES module that has nothing but one unit test, which tries to open a file. Got exactly the same exception:

$ mocha --recursive --loader=ts-node/esm -R spec "src/**/*.spec.ts" --no-timeouts --exit
(node:94962) ExperimentalWarning: --experimental-loader is an experimental feature. This feature could change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
(node:94962) Warning: To load an ES module, set "type": "module" in the package.json or use the .mjs extension.

/Users/rsemenov/vsts/test-h5wasm/node_modules/h5wasm/dist/esm/hdf5_hl.js:1
import { default as ModuleFactory } from './hdf5_util.js';
^^^^^^

SyntaxError: Cannot use import statement outside a module
    at Object.compileFunction (node:vm:352:18)
    at wrapSafe (node:internal/modules/cjs/loader:1032:15)
    at Module._compile (node:internal/modules/cjs/loader:1067:27)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1155:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:168:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:195:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:337:24)
    at async importModuleDynamicallyWrapper (node:internal/vm/module:437:15)
    at async formattedImport (/Users/rsemenov/vsts/test-h5wasm/node_modules/mocha/lib/nodejs/esm-utils.js:7:14)
    at async Object.exports.requireOrImport (/Users/rsemenov/vsts/test-h5wasm/node_modules/mocha/lib/nodejs/esm-utils.js:48:32)
    at async Object.exports.loadFilesAsync (/Users/rsemenov/vsts/test-h5wasm/node_modules/mocha/lib/nodejs/esm-utils.js:103:20)
    at async singleRun (/Users/rsemenov/vsts/test-h5wasm/node_modules/mocha/lib/cli/run-helpers.js:125:3)
    at async Object.exports.handler (/Users/rsemenov/vsts/test-h5wasm/node_modules/mocha/lib/cli/run.js:374:5)
error Command failed with exit code 1.

I followed the warning in the log above:

Warning: To load an ES module, set "type": "module" in the package.json or use the .mjs extension.

and added this line:

"type": "module",

to

/Users/rsemenov/vsts/test-h5wasm/node_modules/h5wasm/package.json

After this, the exception disappeared.

I'm not sure why 'h5wasm' doesn't have the 'type' property specified in its package.json. If I remember correctly, unless it's explicitly set to 'module', the package is assumed to be CommonJS.

I'm using Node.js v16.14.0 and TypeScript. The 'target' and 'module' properties in tsconfig.json:

{
  "compilerOptions": {
    "target": "es2020",
    "module": "es2020",
    "moduleResolution": "node",

Best regards,
Roman

Web helpers

We have an issue when importing the web helpers from file_handlers.js in a typescript project:

Package path ./dist/esm/file_handlers.js is not exported from package <path_to_project>\node_modules\h5wasm (see exports field in <path_to_project>\node_modules\h5wasm\package.json)

Changing "exports" in h5wasm/package.json to

"exports": {
  ".": {
    "node": "./dist/node/hdf5_hl.js",
    "import": "./dist/esm/hdf5_hl.js"
  },
  "./dist/esm/file_handlers.js": "./dist/esm/file_handlers.js"
},

works, but may not be the desired solution.

Send boolean values as booleans rather than as integers

Hi again 👋

The new enum metadata works great for boolean datasets/attributes 👍

However, values are returned as Int8Array for arrays, or as plain integers for scalars. This means the conversion must be done by the consumer, which is problematic for nD datasets/attributes.

Do you think it would be possible to return arrays of booleans (or simply booleans for scalars) instead?
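In the meantime, the consumer-side conversion can be sketched like this, assuming the flat Int8Array of 0/1 values and the dataset's shape are both available (toNestedBooleans is a hypothetical helper, not part of h5wasm):

```javascript
// Hypothetical helper: turn a flat Int8Array of 0/1 values (plus the
// dataset shape) into a nested array of booleans.
function toNestedBooleans(flat, shape) {
  if (shape.length === 0) return Boolean(flat[0]); // scalar
  const [dim, ...rest] = shape;
  const stride = rest.reduce((a, b) => a * b, 1);
  return Array.from({ length: dim }, (_, i) =>
    toNestedBooleans(flat.slice(i * stride, (i + 1) * stride), rest)
  );
}

const flat = Int8Array.from([0, 1, 1, 0]);
console.log(toNestedBooleans(flat, [2, 2])); // [ [ false, true ], [ true, false ] ]
```

The recursion peels one dimension off the shape per level, which is exactly the kind of bookkeeping that becomes awkward when every consumer has to reimplement it.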

Reading HDF5 file that is larger than client memory

The h5wasm and jsfive libraries look valuable for processing HDF5 files in web browsers. Thanks for making them available!

Use case

I want to read large HDF5 files in a web browser. More specifically, I want to let users select an HDF5 file from their local machine, then have my web application read that file, validate the HDF5 file's content per some custom rules, then upload the file to a server. This could turn an important yet disjointed 10-15 minute process into a coherent 30 second process for my users.

My users' HDF5 files are often too big to load completely into client RAM. So I want to read and validate HDF5 content in a streaming manner -- never loading the full file into RAM, only a part at a time, before involving a server.

My investigation

I explored whether jsfive could load partial HDF5 files into memory, but your June presentation, GitHub comments, examples, and my own experiments indicate to me that's not yet possible in jsfive.

Maybe h5wasm is a better bet for stream-processing local HDF5 files in a web browser.

It seems this wasn't possible as of January 2022. Comments from June '22 look related: I see HDF5 data being requested in chunks (via HTTP range requests) in your lazyFileLRU demo. At a glance that seems like progress towards but not quite sufficient for my use case.

Feature request

So my understanding is that HDF5 files can be read in small chunks via h5wasm, but there's no current way to load, say, a 16 GB HDF5 file in a web browser if your computer only has 8 GB RAM.

Is that right? If so, please consider this a feature request to enable that! If not, could you point me to any examples?

Enable ROS3 Driver

Is there a way to point to a ROS3 driver in the current implementation? There has been some interest in integrating my wrapper into DANDI instead of using the current cloud-based visualizer.

Add a `CMakeLists.txt` to the library tarball for downstream CMake projects

The static libraries requested previously (#8) work like a charm, and have been quite easy to integrate into my existing CMake project. Perhaps you may be interested in the CMakeLists.txt I was using for this. The idea is to put this file inside the libhdf5_wasm.tgz tarball to allow CMake to do a configure-time fetch of h5wasm, eliminating the need for developers to manually manage this particular dependency. (Non-CMake users are unaffected and can just ignore the extra file.)

My CMakeLists.txt file is available at https://github.com/LTLA/h5wasm-cmake-demo; downstream packages can then just do:

FetchContent_Declare(
  h5wasm
  GIT_REPOSITORY https://github.com/LTLA/h5wasm-cmake-demo
  GIT_TAG master
)
FetchContent_MakeAvailable(h5wasm)

which exposes h5wasm and h5wasm_cpp as library targets. A similar approach can be used for a tarball, something like:

FetchContent_Declare(
  h5wasm
  URL https://github.com/usnistgov/h5wasm/releases/download/v0.2.0/libhdf5_wasm.tgz
  URL_HASH MD5=<whatever-the-hash-is>
)

You can see a test example of the former approach at https://github.com/LTLA/bind-h5wasm-test, with a more real-life use case at kanaverse/scran.js#34. Some more elbow grease could be applied to do proper exports so that the exposed targets look like h5wasm::h5wasm, but I don't know enough about CMake to say whether that's sensible in this case.

Of course, downstream packages could just define all the targets themselves if there wasn't a CMakeLists.txt in the tarball, but this is a bit of a chore. It seems worthwhile to have a single definition inside the tarball for immediate consumption via something like:

target_link_libraries(mylib h5wasm_cpp)

Adding `libhdf5_cpp.a` to the `libhdf5` branch?

Would this be straightforward to do? I realized I could make my application more efficient by directly linking to your libhdf5.a during construction of my own Wasm binary; however, it's been a long time since I worked directly with the HDF5 C API, and I'm more familiar with the C++ flavor. If it's not a lot of extra effort, it might be generally useful to include libhdf5_cpp.a in the repo.

Can this be loaded via ESM loaders, for example babel?

Hey! First of all, thank you for developing this!

I am trying to use h5wasm in a web/ESM environment (specifically React), installed it using npm, and am now trying to import it simply with
import * as hdf5 from "h5wasm";. Is this supposed to work? I see in the dist folder that you have both ESM and CJS versions.

I get an error from the loader I use (babel):

./node_modules/h5wasm/dist/hdf5_hl.js 121:21
Module parse failed: Unexpected token (121:21)
File was processed with these loaders:
 * ./node_modules/react-scripts/node_modules/babel-loader/lib/index.js
You may need an additional loader to handle the result of these loaders.
|   // for data being sent to Module
|   // set shape to size of array if it is not specified:
>   var shape = shape ?? (Array.isArray(data) || ArrayBuffer.isView(data) ? [data.length] : []);
|   var data = Array.isArray(data) || ArrayBuffer.isView(data) ? data : [data];
|   let total_size = shape.reduce((previous, current) => current * previous, 1);

I would greatly appreciate any help. Thank you!

Known lossy compression filters to use with h5wasm

Hello! I know this issue is not really h5wasm-related, but I figured you might be able to give me some hints regarding libraries that implement the lossy compression filters listed in the HDF5 docs.

Do you know of any library that supports lossy compression and can be used with JavaScript or WASM? Ideally it should also have an h5py counterpart. Are custom filters even supported by h5wasm?

Support for URLs (like in jsfive) and compound data types

Hi,
Thanks so much for your work on both jsfive and this library!

I'm trying to read the data in a remote HDF5 file (https://ndownloader.figshare.com/files/7024985) but was unable to do it using jsfive since it contains compound datatypes. From usnistgov/jsfive#22 , it seems that it's unsupported and might not be for some time. Does h5wasm support reading such data?

Also, since I'm writing code for the browser, I don't have local files and instead want to implement URL-based access. This was supported seamlessly by jsfive, but I can't find a way to do this (pass a URL or an array buffer) in h5wasm. Is this possible?

Streamed writes

Hi Brian, as mentioned by mail earlier, we are working on streamed writes. We need it to convert large binary files in a web app with memory limitations.

We have already made a version in Python, where we parse data blocks, but we are planning to use JavaScript/TypeScript.

Attached you can find our example using the NETCDF4 library. We append new data from the stream, extending the NetCDF-4 file. See the attached source code; most of the logic is in main(), but some other functions are included.
createNetCDF4.zip

Kind Regards, Jason

Arrays containing big-endian numbers are wrongly decoded

When reading a dataset containing values (floats or integers) stored with big endianness, the resulting values are wrong.

I guess it comes from the process_data function, which only expects data with little endianness?

It is very easy to reproduce by creating a file like this and reading it with h5wasm:

import h5py
import numpy as np

with h5py.File("BE.h5", "w") as f:

    # Note the `>` in `dtype` that tells that this is big-endian
    f.create_dataset("float", data=np.arange(0.01, 0.11, 0.01), dtype=">f8")
    f.create_dataset(
        "int", data=np.arange(8).reshape(2, 2, 2), dtype=">i8", shape=(2, 2, 2)
    )

Related to silx-kit/h5web#1421 (comment)
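The symptom can be reproduced in plain JavaScript: typed arrays always use the platform's byte order (little-endian on x86 and in WASM), while DataView lets you specify one, which is presumably what a fix in process_data would need. A minimal sketch, independent of h5wasm:

```javascript
// Write 0.01 as a big-endian float64 (what ">f8" stores on disk)...
const buf = new ArrayBuffer(8);
const view = new DataView(buf);
view.setFloat64(0, 0.01, false); // false = big-endian

// ...then compare a byte-order-aware read with a naive typed-array view.
const correct = view.getFloat64(0, false); // 0.01
const naive = new Float64Array(buf)[0];    // byte-swapped garbage on LE hosts
console.log(correct === 0.01, naive === 0.01); // true false
```

A typed-array view over the raw bytes silently byte-swaps big-endian data, which matches the wrong values described above.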

Feature request: support network-based filesystem support for large files

For files too large to load into memory, it would be nice to be able to load parts of a file on-demand over the network (see discussion in #2). The new WASM Filesystem being designed for emscripten seems like it is anticipating this use case: see design documents at emscripten-core/emscripten#15041

When this is implemented and generally available, look into adding this capability to h5wasm so that very large files can be retrieved piecewise and on-demand by URL.

Smaller bundles


The current bundle weighs roughly 3.3 MB. Since it's basically a big binary, it's not tree-shakable by front-end tooling. I was wondering if it would be possible to maybe provide smaller, optimised bundles for specific use cases.

The main use case I'm thinking of is applications like myHDF5 that just read HDF5 files, and don't need all the APIs to write HDF5 files (create_dataset, etc.)

Also, my understanding is that currently, the WASM output includes three Emscripten file systems: MEMFS, IDBFS and WORKERFS. Perhaps it would be worth generating separate bundles for each as well?

I know that's a lot of potential combinations, especially when adding ESM/Node/IIFE to the mix... Perhaps there's a way to reorganise the code and the API to make use of dynamic linking or Module Splitting ... though this is all way over my head. 😅

Support boolean datatype

The boolean datatype is treated as H5T_ENUM (when saved through h5py):

      DATATYPE  H5T_ENUM {
         H5T_STD_I8LE;
         "FALSE"            0;
         "TRUE"             1;
      }

With the implementation in 0.4.1, h5wasm returns the metadata and dtype of the base integer type, which does not allow distinguishing between integer and boolean datasets.

Ideally, H5T_ENUM should be marked as such. The base type could be stored in a sub-field enum_type as for compound and array dtypes.
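As a sketch of what a consumer could do if the proposed enum_type sub-field existed (the field shape here is an assumption, not a current h5wasm API), the h5py boolean convention can be detected from the member names shown in the dump above:

```javascript
// Hypothetical check: an enum whose members are exactly FALSE=0 / TRUE=1
// (the h5py convention dumped above) can be treated as boolean.
function isBooleanEnum(enumType) {
  const members = enumType && enumType.members;
  return Boolean(
    members &&
    Object.keys(members).length === 2 &&
    members.FALSE === 0 &&
    members.TRUE === 1
  );
}

console.log(isBooleanEnum({ members: { FALSE: 0, TRUE: 1 } })); // true
console.log(isBooleanEnum({ members: { RED: 0, GREEN: 1 } }));  // false
```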

Listing Saved Files

Is there an easy way to list all the saved filenames? I'm currently unable to find a way to request them.

Additional compression filters

Hi,
First of all - this project is an AWESOME idea, thanks for implementing it!

We're currently using h5wasm since it's part of (another great idea) https://myhdf5.hdfgroup.org/.
However, we were wondering if it's possible to add filter libraries to the system.
In particular, we're selecting the compression filter to fit our data. We've seen good results with Zstandard, especially compared to gzip; however, myHDF5 doesn't come with any additional filters.

On desktop (HDFView) we can easily install additional plugins, but since I suppose the WASM version is 'sandboxed', it might not be possible to easily add filters? Is it possible to create a custom version with Zstandard compiled in and use that binary as the base for a customized, self-hosted myHDF5 viewer?

Thanks again for the great tool!

Creation of links and external links

Hello,

I am working at Helmholtz-Zentrum Berlin and I am currently developing a tool to create NeXus files, using a description of the NeXus structure format as input. The software runs on JavaScript (Node.js) and I am using H5Web. I have managed to create groups, attributes, and datasets satisfactorily, but I could not find a way to create links and external links. Looking at the source code, I only see these two functions in the Group class:

  • get_link(obj_path: string)
  • get_external_link(obj_path: string)

I suppose the creation of links is not yet implemented, and therefore I would like to ask if there are plans to implement it. Or, if there is a way to create them with the current version of the library, could you please guide me through the process?

Thank you in advance,
Hector

I can't create_dataset in an array loop

export async function json_to_hdf5(json: string): Promise<H5WasmFile> {
    const empty = await createEmptyHdf5();
    empty.create_dataset({
        name: JSON_TAG,
        data: json,
    });
    return empty;
}

const a = new Array(10).fill(10);
a.forEach(async () => {
    await json_to_hdf5('xxxxxx');
});

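The attached error isn't reproduced as text, but one likely culprit (an assumption on my part): Array.prototype.forEach ignores the promises returned by an async callback, so all ten json_to_hdf5 calls run concurrently instead of one after another. A minimal illustration, independent of h5wasm:

```javascript
// forEach with an async callback fires all iterations at once;
// a plain for...of loop awaits each iteration before starting the next.
async function runSequentially(items, work) {
  const order = [];
  for (const item of items) {
    await work(item);
    order.push(item);
  }
  return order;
}

// Later items resolve faster, so concurrent execution would scramble them.
const slowDouble = (i) =>
  new Promise((resolve) => setTimeout(() => resolve(i * 2), (3 - i) * 10));

runSequentially([0, 1, 2], slowDouble).then((order) => {
  console.log(order); // [ 0, 1, 2 ] -- strictly in input order
});
```

If the failure comes from shared state (the WASM module or its filesystem), replacing the forEach with a for...of loop that awaits each call would be the first thing to try.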

Recursive calls to `process_data` in `hdf5_hl_base` might not handle all datatype conditions?

I used h5dump to interrogate my container, which contains a compound of three 32-bit floats and a 32-bit unsigned integer:

HDF5 "/Users/areynolds/Desktop/data.h5" {
GROUP "/" {
   GROUP "data" {
      DATASET "tsg8n0ki" {
         DATATYPE  H5T_COMPOUND {
            H5T_ARRAY { [3] H5T_IEEE_F32LE } "xyz";
            H5T_STD_U32LE "label_idx";
         }
         DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
         DATA {
         (0): {
               [ 6.31591, 3.62053, 5.03742 ],
               0
            },
         (1): {
               [ 4.09167, 4.58137, 5.51809 ],
               1
            },
            ...

Calling slice on a row of compound data calls process_data, which in turn returns an array of twelve Uint8s, along with the unsigned integer.

I can convert the twelve-byte portion of the per-row slice to three floats with the following function:

  public sliceRawUint8XyzToXyz = (rawXyz: number[], littleEndian: boolean): number[] => {
    const results = [];
    const bytes = 4;
    const points = 3;
    const buffer = new ArrayBuffer(bytes);
    const view = new DataView(buffer);

    for (let i = 0; i < points; i++) {
      // copy the four bytes of this float into the scratch buffer
      for (let b = 0; b < bytes; b++) {
        view.setUint8(b, rawXyz[i * bytes + b]);
      }
      // let DataView apply the requested byte order when decoding
      results.push(parseFloat(view.getFloat32(0, littleEndian).toFixed(6)));
    }

    return results;
  }

This gives me back the original array of three floats in the HDF5 table row.

I'm filing an issue in case it might be worth handling H5T_ARRAY or other types in the process_data conditions when constructing results. However, I admit I don't know how much work that would be, or how worthwhile, if compound types are rare or discouraged.

Missing `chunks` field?

Hey 👋

I was testing the new h5wasm versions and wanted to make use of the new chunks field for datasets.

I created a HDF5 file with h5py like so:

import numpy as np
import h5py

with h5py.File("chunked.h5", "w") as h5file:
    h5file.create_dataset(
        "chunked", data=np.random.random((1000, 1000)), chunks=(100, 100)
    )

When opening the file with h5wasm, I would expect the metadata of the dataset chunked to have a chunks field equal to [100, 100]. Unfortunately, the chunks field is not present in the metadata 😞

Does this only work for datasets created through h5wasm?
