ScippNexus
About
ScippNexus is a h5py-like utility for NeXus files with seamless scipp integration. See the documentation for more details.
Home Page: https://scipp.github.io/scippnexus/
License: BSD 3-Clause "New" or "Revised" License
Installation
python -m pip install scippnexus
load_nexus relies on warnings if it cannot load certain parts of the file, to ensure that something incomplete rather than nothing is returned to the user. For the new lower-level interface around NXobject this does not seem like the right choice. Proposal: keep the warnings in load_nexus, but use exceptions in the low-level functions shared with NXobject subclasses. See #30 and recent actions.
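A minimal sketch of this split (all names hypothetical, not the actual ScippNexus API): low-level loaders raise, and only the high-level load_nexus converts failures into warnings so it can still return a partial result:

```python
import warnings


class NexusStructureError(Exception):
    """Raised by low-level loaders when a group cannot be interpreted."""


def load_group(name, content):
    # Low-level: shared with NXobject subclasses, so it raises instead of warning.
    if 'signal' not in content:
        raise NexusStructureError(f"{name}: could not determine signal field")
    return content['signal']


def load_nexus(groups):
    # High-level: converts exceptions to warnings and returns what it could load.
    result = {}
    for name, content in groups.items():
        try:
            result[name] = load_group(name, content)
        except NexusStructureError as e:
            warnings.warn(str(e))
    return result
```

This way load_nexus still returns something incomplete rather than nothing, while code built on NXobject sees a plain exception it can handle or propagate.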
This is almost good enough to let us publish or update docs. One remaining subtlety: if we build docs for an old branch, e.g., an old tag, we may want to use an old package. So we should extend this to take not simply a publish flag, but a version number. Then this can be used to select the matching package (unless dev is specified). This would also need to be able to handle extra folders, and specifying 'latest' to just use the latest package (and publish into the root).
This concerns scn.load and scn.load_nexus in scippnexus.v2.
Currently, when there is a depends_on in, e.g., an NXdetector, the corresponding transformation is loaded as a scipp affine_transform3. This is fine. However, if there is no depends_on, NXtransformations are just loaded as their datasets. The problem is that vital information for transformations is stored in their attributes. This is cumbersome (hard to use) and currently not supported by scipp.Variable or scipp.DataGroup.

Instead, we should load the transformations in NXtransformations as scipp.Variable with the correct spatial dtype (could be, e.g., rotation3 or translation3). The code for this already exists, since it is used for computing the depends_on transformation chain, but it is not called when loading a plain NXtransformations group.
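To illustrate what those attributes encode, here is a sketch in plain NumPy (not the ScippNexus API) of how NeXus-style transformation attributes plus the dataset value combine into an actual rotation, which is lost if only the raw dataset is loaded:

```python
import numpy as np


def rotation_matrix(attrs, angle_deg):
    """Build a 3x3 rotation matrix from NeXus-style transformation attributes:
    the rotation axis comes from the 'vector' attribute, the angle from the
    dataset value itself."""
    axis = np.asarray(attrs['vector'], dtype=float)
    axis = axis / np.linalg.norm(axis)
    t = np.radians(angle_deg)
    # Rodrigues' rotation formula: R = I + sin(t) K + (1 - cos(t)) K^2
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)


# Attribute names follow the NeXus NXtransformations convention.
attrs = {'transformation_type': 'rotation', 'vector': [0.0, 0.0, 1.0]}
R = rotation_matrix(attrs, 90.0)
```

Without the 'vector' and 'transformation_type' attributes the dataset is just a scalar angle, which is exactly why loading plain datasets drops vital information.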
In Scipp, we are considering the addition of a DataGroup container. This would be similar to Dataset, but without coords and without restricting the dims or shapes of the items. This is thus quite similar to a NeXus "group". We would therefore like to support loading groups in ScippNexus, returning a DataGroup. There are a number of things to consider:

- We changed __getitem__ to return Python scalars instead of scipp.Variable if no shape or unit is given. This was for more convenient storage in a Python dict. For DataGroup, we are currently leaning towards requiring items to have dims and shape. Should we thus undo this change in ScippNexus? Or should DataGroup be more flexible?
- On load failures we could fall back to returning a DataGroup, since most errors are from "higher level" logic, such as trying to interpret fields for an NXevent_data or NXdetector group. There are a number of subtleties here, especially implementation-wise, as the current design puts some hurdles here.
- Fields with bad dimensions raise DimensionError. Currently these are skipped with a warning. We could instead return the entire NXdata as DataGroup, but this would likely not be useful in many cases. But not doing that would be inconsistent.

The current implementation is minimal. We need more in order to save the relevant metadata to the file. See also scipp/esssans#33
https://scipp.github.io/scippnexus/user-guide/application-definitions.html#Writing-files explains the "advanced" method, but we lack docs for the pedestrian way.
We currently get a ton of warnings when loading files without real data (see below). In particular after #172 we may be able to avoid some of those.
```
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/field.py:240: UserWarning: Unrecognized unit 'hz' for value dataset in '/entry/instrument/T0_chopper/rotation_speed/value'; setting unit as 'dimensionless'
warnings.warn(
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/band_chopper/delay as NXlog: Could not determine signal field or dimensions. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/field.py:240: UserWarning: Unrecognized unit 'hz' for value dataset in '/entry/instrument/band_chopper/rotation_speed/value'; setting unit as 'dimensionless'
warnings.warn(
CPU times: user 817 ms, sys: 1.93 s, total: 2.75 s
Wall time: 506 ms
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/monitor_bunker/monitor_bunker_events as NXevent_data: Required field event_time_zero not found in NXevent_data Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/monitor_bunker as NXmonitor: Signal is not an array-like. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/monitor_cave/monitor_cave_events as NXevent_data: Required field event_time_zero not found in NXevent_data Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/monitor_cave as NXmonitor: Signal is not an array-like. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/overlap_chopper/delay as NXlog: Could not determine signal field or dimensions. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/field.py:240: UserWarning: Unrecognized unit 'hz' for value dataset in '/entry/instrument/overlap_chopper/rotation_speed/value'; setting unit as 'dimensionless'
warnings.warn(
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/polarizer/rate as NXlog: Could not determine signal field or dimensions. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/pulse_shaping_chopper1/delay as NXlog: Could not determine signal field or dimensions. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/field.py:240: UserWarning: Unrecognized unit 'hz' for value dataset in '/entry/instrument/pulse_shaping_chopper1/rotation_speed/value'; setting unit as 'dimensionless'
warnings.warn(
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/pulse_shaping_chopper2/delay as NXlog: Could not determine signal field or dimensions. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/field.py:240: UserWarning: Unrecognized unit 'hz' for value dataset in '/entry/instrument/pulse_shaping_chopper2/rotation_speed/value'; setting unit as 'dimensionless'
warnings.warn(
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/sans_detector/sans_event_data as NXevent_data: Required field event_time_zero not found in NXevent_data Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/sans_detector as NXdetector: Signal is not an array-like. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
/home/simon/code/scipp/scippnexus/src/scippnexus/base.py:376: UserWarning: Failed to load /entry/instrument/source/current as NXlog: Could not determine signal field or dimensions. Falling back to loading HDF5 group children as scipp.DataGroup.
warnings.warn(msg)
```
Currently, the fallback loader in NXobject catches (nearly) all exceptions from loaders for concrete classes and uses the fallback. This is intended to allow loading files with partially bad structure. But it also hides user errors like a bad index (wrong dim, bad slice, etc.). We should distinguish between errors originating in the file structure and errors originating from the user/caller. Only the former should trigger the fallback.
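One common pattern for this distinction, sketched with hypothetical names (not the actual ScippNexus classes): a dedicated exception type for structural problems triggers the fallback, while user errors such as IndexError propagate unchanged:

```python
class NexusStructureError(Exception):
    """Signals a problem with the file's structure, not with the caller."""


def load_as_nxlog(group, index):
    # Low-level loader: complains about the file via NexusStructureError,
    # but lets genuine user errors (bad index) escape as-is.
    values = group['value']
    if 'time' not in group:
        raise NexusStructureError('NXlog without time field')
    return values[index]


def load_with_fallback(group, index):
    try:
        return load_as_nxlog(group, index)
    except NexusStructureError:
        # Bad file structure: fall back to loading children generically.
        return dict(group)
    # IndexError, TypeError, etc. from a bad user index propagate unchanged.
```

With this split, a malformed group still loads via the fallback, while a bad slice from the caller fails loudly instead of silently returning generic content.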
```
/opt/anaconda3/envs/scippneutron/lib/python3.8/site-packages/scippneutron/file_loading/nxobject.py:166: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if self.dims == [] and shape == [1]:
```
See https://manual.nexusformat.org/classes/base_classes/NXdetector.html#nxdetector-pixel-mask-field. We should turn this into masks of a scipp.DataArray (when the detector is loaded as a DataArray).
Scipp only really works with boolean masks. Therefore, we should inspect the bitmask and split it into individual masks. Only bits that are actually in use should result in the creation of a corresponding mask.
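A sketch of the splitting logic in plain NumPy (the mask names here are made up; the real bit meanings are defined in the NXdetector pixel_mask documentation):

```python
import numpy as np

# Hypothetical meanings for the lowest bits of pixel_mask.
BIT_NAMES = {0: 'gap_pixel', 1: 'dead_pixel', 2: 'noisy_pixel'}


def split_bitmask(pixel_mask):
    """Split an integer bitmask into boolean masks, one per bit in use."""
    masks = {}
    for bit, name in BIT_NAMES.items():
        mask = ((pixel_mask >> bit) & 1).astype(bool)
        if mask.any():  # only bits actually in use become a mask
            masks[name] = mask
    return masks


pixel_mask = np.array([0, 1, 2, 3])  # binary: 00, 01, 10, 11
masks = split_bitmask(pixel_mask)
```

Each resulting boolean array could then be attached as an entry in the masks of the detector's DataArray; bit 2 produces no mask here since it is never set.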
H5Base.attrs is annotated to return List[int], but the actual implementations return Attrs (dict-like).
In the current implementation, no major effort was put into optimization. Basically, the implementation is "naive" in most cases, which might, e.g., result in repeated or redundant calls to h5py
, etc.
We should profile ScippNexus for the "typical" cases, i.e., files with hundreds of groups and thousands of datasets. Attention should be paid not just to loading large datasets, but first and foremost to the "overhead" from dealing with many small file contents.
See https://manual.nexusformat.org/datarules.html?highlight=uncertainties#rules-for-storing-data-items-in-nexus-files, specifically the "Reserved suffixes", e.g., _mask
is something we could handle:
Reserved suffixes
When naming a field, NeXus has reserved certain suffixes to the names
so that a specific meaning may be attached. Consider a field named DATASET
,
the following table lists the suffixes reserved by NeXus.
suffix | reference | meaning |
---|---|---|
_end | NXtransformations | end points of the motions that start with DATASET |
_errors | NXdata | uncertainties (a.k.a., errors) |
_increment_set | NXtransformations | intended average range through which the corresponding axis moves during the exposure of a frame |
_indices | NXdata | Integer array that defines the indices of the signal field which need to be used in the DATASET in order to reference the corresponding axis value |
_mask | | Field containing a signal mask, where 0 means the pixel is not masked. If required, bit masks are defined in NXdetector pixel_mask. |
_set | target values | Target value of DATASET |
_weights | | divide DATASET by these weights [4] |
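Handling a reserved suffix such as _errors or _mask could start from string logic like this sketch (the helper name and return shape are hypothetical):

```python
RESERVED_SUFFIXES = ('_end', '_errors', '_increment_set', '_indices',
                     '_mask', '_set', '_weights')


def pair_reserved_suffixes(names):
    """Map each base dataset name to the reserved-suffix fields referring to it.
    A suffix only counts if the corresponding base dataset actually exists."""
    names = set(names)
    pairs = {}
    for name in sorted(names):
        for suffix in RESERVED_SUFFIXES:
            if name.endswith(suffix) and name[:-len(suffix)] in names:
                pairs.setdefault(name[:-len(suffix)], []).append(name)
    return pairs


pairs = pair_reserved_suffixes(['data', 'data_errors', 'data_mask', 'time'])
```

The loader could then consume, e.g., data_errors as the variances and data_mask as a mask of the 'data' field instead of exposing them as separate fields.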
Check the code, and add a test.
```python
import scippnexus.v2 as snx
# ...
dg = f['event_time_zero', 0:1]  # ok, loads one pulse
dg = f['event_time_zero', 0:0]  # seems to load everything?
```
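Whatever the root cause (possibly an empty selection being treated as "no selection"), an empty slice should yield a length-0 result, as Python's own slice arithmetic shows in this sketch of the expected shape computation (not the actual ScippNexus implementation):

```python
def sliced_shape(shape, index):
    """Compute the output shape for a tuple of slices."""
    out = list(shape)
    for i, ind in enumerate(index):
        # slice.indices clamps start/stop/step to the actual size.
        out[i] = len(range(*ind.indices(shape[i])))
    return out


# An empty slice must yield zero entries, not fall through to "load all".
shape_one = sliced_shape([100], (slice(0, 1),))
shape_empty = sliced_shape([100], (slice(0, 0),))
```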
Follow-up to #63. Basically, given a data structure such as a scipp.DataArray
and a NeXus application definition, we want to create/write groups. Currently NXobject
provides
scippnexus/src/scippnexus/nxobject.py
Line 472 in ece37dc
We can consider extending this with support for an application definition:
```python
with snx.File(name, definition=NXcanSAS) as f:
    group = f[path]
    group.create_class(name, definition=SASdata, data=my_data_array)
```
Here definition
provides a key for lookup of the child strategy via group._strategy
(which was setup from the NXcanSAS
root definition). The child strategy must then provide everything necessary for writing the group and its content (attributes, fields+field attributes, child groups, ...) and how these relate to properties of the data
.
Design wise, one key aspect to address is how to handle recursion. Should the method on NXobject be allowed to deal with this, i.e., the strategy may write an entire subtree, or should this be handled in another way, in a way that avoids handling the tree in the strategy?
An alternative opportunity to explore is whether NXobject.__setitem__
could be generalized. Currently it only supports creation of fields (from scipp.Variable
), as scipp.DataArray
does not contain enough information to create a NeXus group. However, an application definition might provide a wrapper for this?
```python
group['sasdata01'] = SASdata(my_data_array)
```
Need to figure out if the dual purpose of SASdata
as a definition/strategy for loading and a wrapper for data is a reasonable design.
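The wrapper idea could look like this sketch (everything here is hypothetical apart from the SASdata name; no scipp objects are used to keep it self-contained):

```python
class SASdata:
    """Dual-purpose sketch: wraps a data array for writing, and could also
    serve as the loading strategy for the same group (loading not shown)."""

    def __init__(self, data):
        self.data = data

    def attrs(self):
        # Application-definition attributes the writer would attach to the group.
        return {'canSAS_class': 'SASdata', 'signal': 'I'}


class Group:
    """Stand-in for NXobject.__setitem__ dispatching on wrapper types."""

    def __init__(self):
        self.children = {}

    def __setitem__(self, name, value):
        if isinstance(value, SASdata):
            # The wrapper supplies what a bare data array cannot: group
            # attributes required by the application definition.
            self.children[name] = {'attrs': value.attrs(), 'data': value.data}
        else:
            self.children[name] = {'attrs': {}, 'data': value}


group = Group()
group['sasdata01'] = SASdata([1.0, 2.0, 3.0])
```

The open design question remains whether coupling "how to load" and "how to wrap for writing" in one class stays manageable once recursion into subgroups is involved.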
After scipp/scipp#2895, broadcasting of variances will no longer be supported. This implies that the "standard" paradigm of handling uncertainties will not be feasible any more. This further implies that there is limited use to carrying uncertainties of "counts", since uncertainties can simply be computed later on.
We should therefore consider avoiding the overhead of creating variances for the weights when an NXdetector
with NXevent_data
is loaded. This would save both memory and compute resources.
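The underlying reasoning can be seen with plain NumPy (hypothetical event data): unit event weights histogrammed into counts give Poisson variances equal to the counts themselves, so per-event variances carry no extra information:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
event_ids = rng.integers(0, 4, size=1000)  # hypothetical per-event pixel ids

# Histogramming unit weights gives counts per pixel ...
counts = np.bincount(event_ids, minlength=4).astype(float)

# ... and the Poisson variances can be computed afterwards: var = counts.
variances = counts.copy()

# Carrying a variance of 1.0 for every event would double the event-weight
# memory for information that is fully recoverable after histogramming.
```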
To do: Decide on exact date.
Make groups behave a bit more like dicts: len(grp) == len(grp.keys()) should hold.
After #117, the following remains to be done:

- NXevent_data fields embedded in NXmonitor or NXdetector (this is used by SNS files) in scippnexus.v2.
- depends_on chains.
- Mapping NXoff_geometry and NXcylindrical_geometry to per-detector "shapes".

Also check if there are existing pieces for executing depends_on chains that need refactoring.
Example:
```python
from scippnexus import data
filename = data.get_path('PG3_4844_event.nxs')
import scippnexus as snx
f = snx.File(filename)
data = f['entry/bank103']
data['x_pixel_offset', 0]
```
raises
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 data['x_pixel_offset', 0]

File ~/code/nxus/jupyter/scippnexus/nxobject.py:315, in NXobject.__getitem__(self, name)
    312 def __getitem__(
    313         self,
    314         name: NXobjectIndex) -> Union['NXobject', Field, sc.DataArray, sc.Dataset]:
--> 315     return self._get_child(name, use_field_dims=True)

File ~/code/nxus/jupyter/scippnexus/nxobject.py:307, in NXobject._get_child(self, name, use_field_dims)
    305 else:
    306     return _make(item)
--> 307 da = self._getitem(name)
    308 if (t := self.depends_on) is not None:
    309     da.coords['depends_on'] = t if isinstance(t, sc.Variable) else sc.scalar(t)

File ~/code/nxus/jupyter/scippnexus/nxdata.py:149, in NXdata._getitem(self, select)
    148 def _getitem(self, select: ScippIndex) -> sc.DataArray:
--> 149     signal = self._signal[select]
    150     if self._errors_name in self:
    151         stddevs = self[self._errors_name][select]

File ~/code/nxus/jupyter/scippnexus/nxobject.py:185, in Field.__getitem__(self, select)
    183 shape = list(self.shape)
    184 for i, ind in enumerate(index):
--> 185     shape[i] = len(range(*ind.indices(shape[i])))
    187 variable = sc.empty(dims=self.dims,
    188                     shape=shape,
    189                     dtype=self.dtype,
    190                     unit=self.unit)
    192 # If the variable is empty, return early
AttributeError: 'int' object has no attribute 'indices'
```
Note that data['x_pixel_offset', :10]
works as expected.
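A likely fix (a sketch of the expected behavior, not the actual patch) is to account for integer indices in the shape computation, since an integer drops the dimension and has no .indices() method:

```python
def sliced_dims_and_shape(dims, shape, index):
    """Compute output dims/shape for a mix of slices and integers."""
    out_dims, out_shape = [], []
    for dim, size, ind in zip(dims, shape, index):
        if isinstance(ind, int):
            continue  # an integer index drops the dimension entirely
        out_dims.append(dim)
        out_shape.append(len(range(*ind.indices(size))))
    return out_dims, out_shape


# Mixing an integer and a slice, as in data['x_pixel_offset', 0]:
dims, shape = sliced_dims_and_shape(['x', 'y'], [4, 5], (0, slice(None, 3)))
```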
We have allowed UserWarning in the pytest warning setup. However, scipp.VisibleDeprecationWarning inherits from UserWarning, so we are unintentionally hiding those.
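A possible fix, assuming the filter lives in pyproject.toml (the exact filenames and filter actions are a suggestion): pytest applies the last matching filterwarnings entry, so a more specific entry listed after the blanket one re-enables the deprecation warnings:

```toml
[tool.pytest.ini_options]
filterwarnings = [
    "ignore::UserWarning",
    # Later entries take precedence, so this re-enables the deprecation
    # warnings even though VisibleDeprecationWarning inherits from UserWarning.
    "default::scipp.VisibleDeprecationWarning",
]
```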