Coder Social home page Coder Social logo

Comments (22)

gauteh avatar gauteh commented on June 19, 2024 1

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024 1

We have https://crates.io/crates/hdf5file, but quite a lot of work is neccessary to get this up to spec, and gain performance relative to a linked thread-safe hdf5

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024 1

For CDF-1,2,5 we could realistically create a safe dispatcher, I already have a CDF-parser written in nom

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Constraints from Rust should also be taken into account when investigating (multiple reader, no writers), which could allow relaxing of the locking

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

There is currently no locking (in netcdf-c) of the global NCList, which contains the file-handles. One could either lock when opening/closing files, or try to upstream a RwLock protecting this structure.

A newer version of netcdf-c has the nc_initialize function which initializes all the dispatchers. This could be called once on initializing the library.

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Caching of variables and attributes requires a separate lock when reading/writing variables/groups.

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

Continuing from #42 which seems to overlap with this one. HDF5 does support single-writer / multiple-readers (e.g. http://docs.h5py.org/en/stable/swmr.html). So maybe this is something that could be supported on newer HDF5 based netcdf files. I saw somewhere that using hdf5 to write, and especially adding new variables, to a netcdf file is likely to make the file unreadable by netcdf. But reading should be fine.

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Multiple readers should be supported out of the box, but some initialisation might be required on first access of the variable. I think we can have a lock surrounding opening/closing files and smaller locks on variable access

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

@gauteh Do you have a suitable benchmark we could use for testing parallell performance?

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Are we talking about concurrent access from threads or from processes?

I believe we are mostly interested in threads

the difference between this issue and the parallel read issue

The parallel access requires an MPI communicator, and parallell instances of the same program. This is not exposed at all in the current form of the library

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

And to same file struct? Or one struct per thread to same netcdf file?

I am not really sure whether Variable should be Send/Sync yet. I'll need to dig into netcdf-c and see.

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

Tried making a small example now using latest master. I'm not able to pass a Arc<netcdf::File> between threads anymore due to UnsafeCell on groups: https://github.com/gauteh/rust-netcdf/blob/thread-perf/tests/concurrency.rs

I also noticed you are working on some big changes in #51, which is probably good to get in before resolving this..?

   Compiling netcdf v0.3.1 (/home/gauteh/dev/rust-netcdf)
error[E0277]: `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be sent between threads safely
  --> tests/concurrency.rs:18:10
   |
18 |     pool.scope(move |s| {
   |          ^^^^^ `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be sent between threads safely
   |
   = help: within `netcdf::file::ReadOnlyFile`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>`
   = note: required because it appears within the type `netcdf::file::File`
   = note: required because it appears within the type `netcdf::file::ReadOnlyFile`
   = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<netcdf::file::ReadOnlyFile>`
   = note: required because it appears within the type `[closure@tests/concurrency.rs:18:16: 31:6 f:std::sync::Arc<netcdf::file::ReadOnlyFile>]`

error[E0277]: `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be shared between threads safely
  --> tests/concurrency.rs:18:10
   |
18 |     pool.scope(move |s| {
   |          ^^^^^ `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be shared between threads safely
   |
   = help: within `netcdf::file::ReadOnlyFile`, the trait `std::marker::Sync` is not implemented for `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>`
   = note: required because it appears within the type `netcdf::file::File`
   = note: required because it appears within the type `netcdf::file::ReadOnlyFile`
   = note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<netcdf::file::ReadOnlyFile>`
   = note: required because it appears within the type `[closure@tests/concurrency.rs:18:16: 31:6 f:std::sync::Arc<netcdf::file::ReadOnlyFile>]`

error: aborting due to 2 previous errors

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Yeah, I've pretty much redone entirely how this crate parses the netcdf file, which will remove the UnsafeCell and pull the crate to a one-to-one mapping towards netcdf. I don't think there is any strong opposition against merging, so I'll go ahead and do just that.

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

Great!

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Some observations from using a non-threadsafe HDF5 library: Segfaults and aborts, even when working on unrelated netCDF files. HDF5 should be considered highly unsafe for multithreading.

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

I guess that settles the discussion on HDF5. I asked developers of Hyrax (an official opendap server) about their HDF5 interface. It seems to be more thread-safe, but I am not sure if this refers to the HDF5 reading library or possible an interface in-between. If somehow HDF5 can be made thread-safe w.r.t. at least reading that will put the performance much higher than netcdf-libs in other languages (depending on use case of course).

OPENDAP/hdf5_handler#13

from netcdf.

lwandrebeck avatar lwandrebeck commented on June 19, 2024

Maybe using https://github.com/aldanor/hdf5-rust could help ? Note that I haven’t used that crate, but readme says « provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. »
HTH.

from netcdf.

gauteh avatar gauteh commented on June 19, 2024

Maybe using https://github.com/aldanor/hdf5-rust could help ? Note that I haven’t used that crate, but readme says « provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. »
HTH.

It also relies on the official HDF5 library, but provides thread-safety through a global lock. This means that a single process can only sequentially use any of the (unsafe) functions from HDF5.

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

Just to demonstrate the crash in netcdf (I have yet to repeat this in raw hdf): https://gist.github.com/magnusuMET/28a7991db0fcb5392b56573837aa7289

from netcdf.

magnusuMET avatar magnusuMET commented on June 19, 2024

netcdf-c does not make any reasonable guarantee regarding thread safety, so a global mutex is needed. If multi-reading is necessary, consider using MPI (requires some additional support in this library), or copy the approach of dars

from netcdf.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.