Thanks @abunimeh for raising this.
Can you, by chance, create an MCVE with h5netcdf only?
MCVE
import h5netcdf
from time import sleep

with h5netcdf.File("sample1.nc", "w") as f1:
    f1.dimensions = {"x": 5}
    v = f1.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

# comment out this line and the md5 checksum will match
sleep(5)

with h5netcdf.File("sample2.nc", "w") as f2:
    f2.dimensions = {"x": 5}
    v = f2.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]
Without sleep(), the MD5 checksums match:
md5 sample1.nc sample2.nc
MD5 (sample1.nc) = 5a8ece3bce095ae7d651936306e21351
MD5 (sample2.nc) = 5a8ece3bce095ae7d651936306e21351
With sleep(), the MD5 checksums don't match:
md5 sample1.nc sample2.nc
MD5 (sample1.nc) = eeeb814c19f9efbaeb1f062ac89bab0f
MD5 (sample2.nc) = 9f1703c5ef4bf27d4c10b93e724b35db
It looks like the date/time of creation is embedded in the file, which causes the two files to have different checksums. It would be nice to avoid embedding such information in the file, so that the checksum is always the same regardless of when the file was created.
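To confirm that only a handful of bytes (such as an embedded timestamp) differ between sample1.nc and sample2.nc, a byte-level comparison can help locate them. Below is a minimal standard-library sketch; differing_offsets and differing_offsets_in_files are hypothetical helper names, not part of h5netcdf, and they only report raw byte offsets without interpreting the HDF5 file structure.

```python
from pathlib import Path
from typing import List


def differing_offsets(a: bytes, b: bytes, limit: int = 10) -> List[int]:
    """Return up to `limit` byte offsets where `a` and `b` differ.

    If the inputs have different lengths, the offset where the shorter
    one ends is reported as a difference too.
    """
    offsets: List[int] = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            offsets.append(i)
            if len(offsets) == limit:
                # stop early once enough differences have been collected
                return offsets
    if len(a) != len(b):
        offsets.append(min(len(a), len(b)))
    return offsets


def differing_offsets_in_files(path_a, path_b, limit: int = 10) -> List[int]:
    """Hypothetical convenience wrapper: compare two files byte by byte."""
    return differing_offsets(
        Path(path_a).read_bytes(), Path(path_b).read_bytes(), limit
    )
```

For example, differing_offsets_in_files("sample1.nc", "sample2.nc") on the two files from the MCVE would list where they diverge; with sleep() removed it should return an empty list, consistent with the matching checksums.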
Thanks @abunimeh for providing that minimal example.
I'll look into whether this is something h5netcdf can control.
One thing you can try is setting track_order=False when opening the file.
It might have to do with how HDF5 keeps track of creation order. I'm not sure we can do anything about it.
track_order=False fixes it:
import hashlib
from pathlib import Path
from time import sleep

import h5netcdf

track_order = False
fname1 = "./sample1.nc"
fname2 = "./sample2.nc"

with h5netcdf.File(fname1, "w", track_order=track_order) as f1:
    f1.dimensions = {"x": 5}
    v = f1.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

# with track_order=False the checksums match even with this delay
sleep(5)

with h5netcdf.File(fname2, "w", track_order=track_order) as f2:
    f2.dimensions = {"x": 5}
    v = f2.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

# read_bytes() opens and closes each file, so no file handles are leaked
s1hash = hashlib.md5(Path(fname1).read_bytes()).hexdigest()
s2hash = hashlib.md5(Path(fname2).read_bytes()).hexdigest()
assert s1hash == s2hash
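The script above reads each file fully into memory before hashing, which is fine for these tiny samples. For larger netCDF files, a chunked variant keeps memory use bounded. A minimal standard-library sketch follows; md5_of_stream and md5_of_file are hypothetical helper names, not part of h5netcdf.

```python
import hashlib
from pathlib import Path


def md5_of_stream(fh, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary file object, read in chunks."""
    digest = hashlib.md5()
    # iter() with a b"" sentinel stops once read() returns nothing (EOF)
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Open `path` in binary mode and hash it without loading it all at once."""
    with Path(path).open("rb") as fh:
        return md5_of_stream(fh, chunk_size)
```

With these helpers, the final check in the script could be written as assert md5_of_file(fname1) == md5_of_file(fname2).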
Great, this works. But it comes at a cost: if you use it in your original use case of creating netCDF files, it might cause issues down the road. By design, netcdf-c and netcdf4-python use track_order=True to keep track of object creation order.
Thanks!