Comments (7)

kmuehlbauer commented on May 29, 2024

Thanks @abunimeh for raising this.

Can you, by chance, create an MCVE with h5netcdf only?

from h5netcdf.

abunimeh commented on May 29, 2024

MCVE

import h5netcdf
from time import sleep

with h5netcdf.File("sample1.nc", "w") as f1:
    f1.dimensions = {"x": 5}
    v = f1.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

# comment out this line and the md5 checksum will match
sleep(5)

with h5netcdf.File("sample2.nc", "w") as f2:
    f2.dimensions = {"x": 5}
    v = f2.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

Without sleep(), the MD5 checksums match:

md5 sample1.nc sample2.nc                                             
MD5 (sample1.nc) = 5a8ece3bce095ae7d651936306e21351
MD5 (sample2.nc) = 5a8ece3bce095ae7d651936306e21351

It looks like the date/time of creation is embedded in the file, which causes the two files to have different checksums. It would be nice to be able to prevent such information from being embedded, so that the file checksum is always the same regardless of when the file was created.

With sleep(), the MD5 checksums don't match:

md5 sample1.nc sample2.nc 
MD5 (sample1.nc) = eeeb814c19f9efbaeb1f062ac89bab0f
MD5 (sample2.nc) = 9f1703c5ef4bf27d4c10b93e724b35db
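
To narrow down where the two files actually differ, a byte-level comparison can help. The following is a minimal sketch using only the standard library; it is not part of the original report, and the helper name `differing_offsets` is hypothetical. It reports offsets only and says nothing about what the differing bytes mean.

```python
from pathlib import Path


def differing_offsets(path_a, path_b, limit=20):
    """Return up to `limit` byte offsets at which the two files differ."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    # Compare the overlapping part of both files byte by byte.
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # A length mismatch counts as one more difference, starting where
    # the shorter file ends.
    if len(a) != len(b):
        offsets.append(min(len(a), len(b)))
    return offsets[:limit]
```

Calling `differing_offsets("sample1.nc", "sample2.nc")` on the two files above would list the first few positions that differ, which can then be inspected with a hex viewer.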

kmuehlbauer commented on May 29, 2024

Thanks @abunimeh for providing that minimal example.

I'll check whether this is something h5netcdf can control.

kmuehlbauer commented on May 29, 2024

One thing you can try is setting track_order=False when opening.

It might have to do with how HDF5 keeps track of creation order. I'm not sure whether we can do anything about it.

abunimeh commented on May 29, 2024

track_order=False fixes it:

import hashlib
from pathlib import Path
from time import sleep

import h5netcdf

# Disable HDF5 creation-order tracking for both files
track_order = False

fname1 = "./sample1.nc"
fname2 = "./sample2.nc"

with h5netcdf.File(fname1, "w", track_order=track_order) as f1:
    f1.dimensions = {"x": 5}
    v = f1.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

# with track_order=False the checksums match even with this delay
sleep(5)

with h5netcdf.File(fname2, "w", track_order=track_order) as f2:
    f2.dimensions = {"x": 5}
    v = f2.create_variable("hello", ("x",), float)
    v[:] = [1, 1, 1, 1, 1]

# Hash the raw bytes of each file; read_bytes() also closes the file
s1hash = hashlib.md5(Path(fname1).read_bytes()).hexdigest()
s2hash = hashlib.md5(Path(fname2).read_bytes()).hexdigest()

assert s1hash == s2hash

kmuehlbauer commented on May 29, 2024

Great that this works, but it comes with a cost. If you use it in your original use case of creating netCDF files, it might cause issues down the road: by design, netcdf-c and netcdf4-python use track_order=True to keep track of object creation order.

abunimeh commented on May 29, 2024

Thanks!
