Comments (22)
from netcdf.
We have https://crates.io/crates/hdf5file, but quite a lot of work is necessary to bring it up to spec and to gain performance relative to a linked thread-safe HDF5.
from netcdf.
For CDF-1, 2, and 5 we could realistically create a safe dispatcher; I already have a CDF parser written in nom.
from netcdf.
Constraints from Rust (multiple readers, no writers) should also be taken into account when investigating, since they could allow the locking to be relaxed.
from netcdf.
There is currently no locking (in netcdf-c) of the global NCList, which contains the file handles. One could either lock when opening/closing files, or try to upstream a RwLock protecting this structure.
A newer version of netcdf-c has the nc_initialize function, which initializes all the dispatchers. This could be called once when the library is initialized.
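A minimal sketch of the "call once on initializing the library" idea, using std::sync::Once. The function initialize_dispatchers is a hypothetical stand-in for the real FFI call to nc_initialize, and the counter exists only to make the behaviour observable; none of these names come from the crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Counts how often the initializer body actually ran (illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Guards the one-time initialization.
static INIT: Once = Once::new();

/// Hypothetical stand-in for the FFI call to `nc_initialize`.
fn initialize_dispatchers() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Intended to be called at the top of every public entry point.
/// The initializer runs exactly once, even when threads race here;
/// late callers block until the first call has finished.
pub fn ensure_initialized() {
    INIT.call_once(initialize_dispatchers);
}

/// Number of times the initializer body has run.
pub fn init_call_count() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}
```

Calling ensure_initialized from several threads at once should leave init_call_count() at 1.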
from netcdf.
Caching of variables and attributes requires a separate lock when reading/writing variables/groups.
from netcdf.
Continuing from #42, which seems to overlap with this one. HDF5 does support single-writer / multiple-readers (e.g. http://docs.h5py.org/en/stable/swmr.html), so maybe this is something that could be supported on newer HDF5-based netcdf files. I saw somewhere that writing to a netcdf file through HDF5 directly, and especially adding new variables, is likely to make the file unreadable by netcdf. But reading should be fine.
from netcdf.
Multiple readers should be supported out of the box, but some initialisation might be required on first access of the variable. I think we can have a lock surrounding opening/closing files, and smaller locks on variable access.
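The two-level locking suggested above could look roughly like the sketch below: one coarse lock around the table of open files, taken only on open/close, and one small lock per variable, taken on access. All type and method names here (FileTable, OpenFile, VariableState) are hypothetical and do not exist in the crate; no netcdf-c calls are made.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical per-variable state, set up lazily on first access.
#[derive(Default)]
pub struct VariableState {
    initialised: bool,
    reads: usize,
}

/// Hypothetical open file: each variable carries its own small lock.
pub struct OpenFile {
    variables: HashMap<String, Mutex<VariableState>>,
}

impl OpenFile {
    /// Locks only the named variable; other variables stay accessible.
    /// Returns the number of reads so far, or None for unknown names.
    pub fn read(&self, name: &str) -> Option<usize> {
        let mut var = self.variables.get(name)?.lock().unwrap();
        if !var.initialised {
            var.initialised = true; // one-time setup on first access
        }
        var.reads += 1;
        Some(var.reads)
    }
}

#[derive(Default)]
struct TableInner {
    next_id: u32,
    files: HashMap<u32, Arc<OpenFile>>,
}

/// Coarse lock that is held only while opening or closing files.
#[derive(Default)]
pub struct FileTable {
    inner: Mutex<TableInner>,
}

impl FileTable {
    pub fn open(&self, variable_names: &[&str]) -> (u32, Arc<OpenFile>) {
        let mut inner = self.inner.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        let variables: HashMap<_, _> = variable_names
            .iter()
            .map(|n| (n.to_string(), Mutex::new(VariableState::default())))
            .collect();
        let file = Arc::new(OpenFile { variables });
        inner.files.insert(id, Arc::clone(&file));
        (id, file)
    }

    /// Returns true if the id referred to an open file.
    pub fn close(&self, id: u32) -> bool {
        self.inner.lock().unwrap().files.remove(&id).is_some()
    }
}
```

Whether netcdf-c tolerates concurrent access to different variables of the same file is exactly what this issue is trying to establish, so the sketch only shows the shape of the locking, not a safe wrapper.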
from netcdf.
@gauteh Do you have a suitable benchmark we could use for testing parallel performance?
from netcdf.
> Are we talking about concurrent access from threads or from processes?
I believe we are mostly interested in threads.
> the difference between this issue and the parallel read issue
Parallel access requires an MPI communicator and parallel instances of the same program. This is not exposed at all in the current form of the library.
from netcdf.
And to same file struct? Or one struct per thread to same netcdf file?
I am not really sure whether Variable should be Send/Sync yet. I'll need to dig into netcdf-c and see.
from netcdf.
Tried making a small example now using the latest master. I'm no longer able to pass an Arc<netcdf::File> between threads due to the UnsafeCell on groups: https://github.com/gauteh/rust-netcdf/blob/thread-perf/tests/concurrency.rs
I also noticed you are working on some big changes in #51, which is probably good to get in before resolving this?
Compiling netcdf v0.3.1 (/home/gauteh/dev/rust-netcdf)
error[E0277]: `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be sent between threads safely
--> tests/concurrency.rs:18:10
|
18 | pool.scope(move |s| {
| ^^^^^ `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be sent between threads safely
|
= help: within `netcdf::file::ReadOnlyFile`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>`
= note: required because it appears within the type `netcdf::file::File`
= note: required because it appears within the type `netcdf::file::ReadOnlyFile`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<netcdf::file::ReadOnlyFile>`
= note: required because it appears within the type `[closure@tests/concurrency.rs:18:16: 31:6 f:std::sync::Arc<netcdf::file::ReadOnlyFile>]`
error[E0277]: `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be shared between threads safely
--> tests/concurrency.rs:18:10
|
18 | pool.scope(move |s| {
| ^^^^^ `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>` cannot be shared between threads safely
|
= help: within `netcdf::file::ReadOnlyFile`, the trait `std::marker::Sync` is not implemented for `std::rc::Rc<std::cell::UnsafeCell<netcdf::group::Group>>`
= note: required because it appears within the type `netcdf::file::File`
= note: required because it appears within the type `netcdf::file::ReadOnlyFile`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::sync::Arc<netcdf::file::ReadOnlyFile>`
= note: required because it appears within the type `[closure@tests/concurrency.rs:18:16: 31:6 f:std::sync::Arc<netcdf::file::ReadOnlyFile>]`
error: aborting due to 2 previous errors
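Both E0277 errors above trace back to Rc, which is neither Send nor Sync, so any struct containing it cannot cross threads even inside an Arc. The sketch below shows the general pattern with a thread-safe analogue, Arc<Mutex<...>>. The Group type is a hypothetical stand-in rather than the crate's netcdf::group::Group, and this is not the fix that was adopted in the thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the crate's group metadata.
#[derive(Default)]
pub struct Group {
    pub variables: Vec<String>,
}

/// Thread-safe analogue of `Rc<UnsafeCell<Group>>`.
pub type SharedGroup = Arc<Mutex<Group>>;

/// Compiles only for types that are both `Send` and `Sync`.
/// `assert_send_sync::<std::rc::Rc<std::cell::UnsafeCell<Group>>>()`
/// would be rejected with E0277, like the test in this comment.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Each thread locks the group and reports how many variables it sees;
/// the per-thread counts are summed.
pub fn count_variables_in_parallel(group: SharedGroup, threads: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let group = Arc::clone(&group);
            thread::spawn(move || group.lock().unwrap().variables.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

The Mutex serialises access to the metadata, which trades the compile error for contention. Removing the shared mutable state altogether avoids both.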
from netcdf.
Yeah, I've pretty much redone entirely how this crate parses the netcdf file, which will remove the UnsafeCell and pull the crate to a one-to-one mapping towards netcdf. I don't think there is any strong opposition against merging, so I'll go ahead and do just that.
from netcdf.
Great!
from netcdf.
Some observations from using a non-threadsafe HDF5 library: Segfaults and aborts, even when working on unrelated netCDF files. HDF5 should be considered highly unsafe for multithreading.
from netcdf.
I guess that settles the discussion on HDF5. I asked the developers of Hyrax (an official OPeNDAP server) about their HDF5 interface. It seems to be more thread-safe, but I am not sure whether this refers to the HDF5 reading library or possibly an interface in between. If HDF5 can somehow be made thread-safe, at least for reading, that would put the performance much higher than netcdf libraries in other languages (depending on use case, of course).
from netcdf.
Maybe using https://github.com/aldanor/hdf5-rust could help ? Note that I haven’t used that crate, but readme says « provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. »
HTH.
from netcdf.
> Maybe using https://github.com/aldanor/hdf5-rust could help ? Note that I haven’t used that crate, but readme says « provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. »
> HTH.
It also relies on the official HDF5 library, but provides thread safety through a global lock. This means that a single process can only call the (unsafe) HDF5 functions sequentially.
from netcdf.
Just to demonstrate the crash in netcdf (I have yet to repeat this in raw hdf): https://gist.github.com/magnusuMET/28a7991db0fcb5392b56573837aa7289
from netcdf.
netcdf-c does not make any reasonable guarantee regarding thread safety, so a global mutex is needed. If multi-reading is necessary, consider using MPI (requires some additional support in this library), or copy the approach of dars.
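A global mutex of the kind described here can be sketched as a process-wide lock that every call into the C library must hold. The names LIBRARY_LOCK and with_library_lock are hypothetical and are not taken from the crate; the closure stands in for whatever FFI call would be wrapped.

```rust
use std::sync::Mutex;

/// Process-wide lock; every call into the C library should hold it.
static LIBRARY_LOCK: Mutex<()> = Mutex::new(());

/// Runs `f` while holding the global lock, so at most one closure
/// passed to this function executes at any time in the process.
pub fn with_library_lock<T>(f: impl FnOnce() -> T) -> T {
    // The mutex guards no data of its own, so a poisoned lock
    // (a previous holder panicked) can be recovered safely.
    let _guard = LIBRARY_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    f()
}
```

This serialises all library calls, including reads from unrelated files, which is the cost noted earlier in the thread for the lock-based hdf5-rust approach.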
from netcdf.