hyperspy / rosettasciio

Python library for reading and writing scientific data formats

Home Page: https://hyperspy.org/rosettasciio

License: GNU General Public License v3.0

Python 99.32% Cython 0.68% C 0.01%
python scientific-formats hyperspectral image multi-dimensional electron-microscopy hyperspy raman

rosettasciio's Introduction


HyperSpy is an open source Python library for the interactive analysis of multidimensional datasets that can be described as multidimensional arrays of a given signal (for example, a 2D array of spectra, also known as a spectrum image).

HyperSpy makes it straightforward to apply analytical procedures that operate on an individual signal to multidimensional arrays, as well as providing easy access to analytical tools that exploit the multidimensionality of the dataset.

Its modular structure makes it easy to add features to analyze many different types of signals. Visit the HyperSpy Website to learn more about HyperSpy and its extension packages and for instructions on how to install the package and get started in analysing your data.

HyperSpy is released under the GPL v3 license.

Since version 0.8.4, HyperSpy only supports Python 3. If you need to install HyperSpy in Python 2.7, please install version 0.8.3.

Contributing

Everyone is welcome to contribute. Please read our contributing guidelines and get started!

rosettasciio's People

Contributors

actions-user, attolight-ntappy, cssfrancis, densmerijn, dependabot[bot], din14970, dnjohnstone, ericpre, francisco-dlp, jan-car, jat255, jlaehne, k8macarthur, lmsc-ntappy, magnunor, msarahan, nem1234, pburdet, pietsjoh, pquinn-dls, pre-commit-ci[bot], ptim0626, sem-geologist, ssomnath, thomasaarholt, tjof2, to266, tomslater, vidartf, woozey


rosettasciio's Issues

Jeol.py error when loading asw files containing sequential acquisition

Hi everyone,
I get the following error, "_read_eds() takes 1 positional argument but 2 were given", when using the jeol.py io plugin on .asw files (such as this one: 1.zip). After a quick glance at the code, I have the impression that reading sequential acquisitions is not supported. Can someone confirm that?
Is anyone else interested in this feature?
Since I would like to use this feature, does anyone have an idea of how the region associated with the EDS file can be identified in the asw tag structure returned by the _parsejeol function? Could it be associated with the PositionMM tag attached to the enumerated entries in the ViewData structure? (To be honest, I do not fully understand what the PositionMM and PositionMM2 arrays represent...)

Thanks in advance for your help,

Pier

Improve metadata handling

Describe the functionality you would like to see.

As brought up by @francisco-dlp in LumiSpy/lumispy#53 (comment), it would be desirable to have more universal metadata handling. Currently, metadata is mapped from original_metadata in every file_reader independently, following the HyperSpy conventions. If other packages want to build on RosettaSciIO, this is not the most convenient approach, and it also leads to a lot of redundant code. Instead, we could for example use yaml files to define the mapping: each folder could include a hyperspy.yaml, but potentially also other mapping files for other applications.

Of course, metadata mapping is not always 1:1 (a node from one tree directly mapped to a position in the other metadata tree), which is what a basic dictionary can handle. The mapping definition would need to cover several extra situations:

  • an if/elif/else-like statement, where a certain field in original_metadata decides which other field is mapped, or what string/value is set in a certain node of metadata
  • processing the content of a field in Python (e.g. one-line code segments), such as unit conversion or calculating an overall exposure time from multiple acquisitions (number of frames x time per frame)
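As a sketch of how such a declarative mapping could work once loaded from e.g. a hyperspy.yaml file (all names and helper functions here are invented for illustration, not part of any existing RosettaSciIO API):

```python
# Hypothetical sketch: apply a declarative mapping to an original_metadata
# tree. A converter callable covers the non-1:1 cases mentioned above.

def get_path(tree, dotted):
    """Fetch a value from a nested dict using a dotted path."""
    for key in dotted.split("."):
        tree = tree[key]
    return tree

def set_path(tree, dotted, value):
    """Set a value in a nested dict, creating intermediate nodes."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        tree = tree.setdefault(key, {})
    tree[leaf] = value

def apply_mapping(original_metadata, mapping):
    """mapping: {target_path: source_path or (source_path, converter)}."""
    metadata = {}
    for target, source in mapping.items():
        if isinstance(source, tuple):
            source_path, convert = source
            value = convert(get_path(original_metadata, source_path))
        else:
            value = get_path(original_metadata, source)
        set_path(metadata, target, value)
    return metadata

original = {"Header": {"HV": 300000, "Title": "map1"}}
mapping = {
    "General.title": "Header.Title",
    # converter handles e.g. unit conversion (V -> keV)
    "Acquisition_instrument.TEM.beam_energy": ("Header.HV", lambda v: v / 1000),
}
print(apply_mapping(original, mapping))
```

An if/elif/else construct could similarly be expressed as a mapping entry whose converter picks the target value from several source fields.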

The developers of the https://github.com/nomad-coe/nomad repository/ELN have implemented similar functionality based on what they call "schemas". Maybe we can team up with them (@markus1978, @haltugyildirim) to implement such a mapping in RosettaSciIO, as the ability to read a number of (partly binary) data formats provided by RosettaSciIO should in turn be valuable to Nomad, helping it support a broader range of experiments and integrate processing via e.g. HyperSpy.

Additional information

Should not hold back an initial release, but should be on the roadmap.

FEI emi STEM import loses detector metadata

When importing the FEI .emi format, several images (one per detector) are imported at once.
There's lots of metadata, especially in original_metadata, but there's no way to tell which image is from which detector - either by some detector metadata, or by the image title (typically titled "Acquire HAADF", "Acquire DF4" etc.)

I've had a look through @francisco-dlp's emi code, and it references that there's some sort of issue with reading the xml data? Is it because the emi file contains a single xml document rather than one per image/detector?

In any case, the emi format is slowly being phased out in favour of the new emd format, but I thought I'd ask in case anyone had a suggestion.

Add support for reading bcf file containing EBSD dataset

Hello,

I am trying to load a Bruker .bcf file, but I get the errors attached below:

ERROR:hyperspy.io:If this file format is supported, please report this error to the HyperSpy developers.
Traceback (most recent call last):
  File "EBSD_analysis.py", line 4, in <module>
    file = hs.load("APMT_200x_EBSD_2x4_indent_single_indent_top4.bcf")
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io.py", line 467, in load
    for filename in filenames]
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io.py", line 467, in <listcomp>
    for filename in filenames]
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io.py", line 525, in load_single_file
    return load_with_reader(filename=filename, reader=reader, **kwds)
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io.py", line 545, in load_with_reader
    file_data_list = reader.file_reader(filename, **kwds)
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io_plugins/bruker.py", line 1247, in file_reader
    return bcf_reader(filename, *args, **kwds)
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io_plugins/bruker.py", line 1281, in bcf_reader
    obj_bcf = BCF_reader(filename, instrument=instrument)
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io_plugins/bruker.py", line 913, in __init__
    header_file = self.get_file('EDSDatabase/HeaderData')
  File "/anaconda3/lib/python3.7/site-packages/hyperspy/io_plugins/bruker.py", line 451, in get_file
    item = item[i]
KeyError: 'EDSDatabase'

How can I resolve these errors?

Thanks,

Add a function of saving file as Gatan's dm3/dm4 file

Describe the functionality you would like to see.

Why is Gatan's dm3/dm4 write function not implemented, given that their read function is? Is it feasible to add?
We are trying to develop a write function for Gatan's dm3/dm4 files by reading the source code.
What kinds of problems will we face?

Error when trying to load JEOL's .map file

Describe the bug

I am trying to load a .map file from JEOL's EPMA (Electron Probe Micro Analyzer).
Then I get the below error.

WARNING:hyperspy.io_plugins.jeol:Not a valid JEOL img format
ERROR:hyperspy.io:If this file format is supported, please report this error to the HyperSpy developers.

UnboundLocalError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 mdata = hs.load(map_fname)

File ~\anaconda3\envs\new202302\lib\site-packages\hyperspy\io.py:466, in load(filenames, signal_type, stack, stack_axis, new_axis_name, lazy, convert_units, escape_square_brackets, stack_metadata, load_original_metadata, show_progressbar, **kwds)
463 objects.append(signal)
464 else:
465 # No stack, so simply we load all signals in all files separately
--> 466 objects = [load_single_file(filename, lazy=lazy, **kwds)
467 for filename in filenames]
469 if len(objects) == 1:
470 objects = objects[0]

File ~\anaconda3\envs\new202302\lib\site-packages\hyperspy\io.py:466, in <listcomp>(.0)
463 objects.append(signal)
464 else:
465 # No stack, so simply we load all signals in all files separately
--> 466 objects = [load_single_file(filename, lazy=lazy, **kwds)
467 for filename in filenames]
469 if len(objects) == 1:
470 objects = objects[0]

File ~\anaconda3\envs\new202302\lib\site-packages\hyperspy\io.py:525, in load_single_file(filename, **kwds)
518 raise ValueError(
519 "reader should be one of None, str, "
520 "or a custom file reader object"
521 )
523 try:
524 # Try and load the file
--> 525 return load_with_reader(filename=filename, reader=reader, **kwds)
527 except BaseException:
528 _logger.error(
529 "If this file format is supported, please "
530 "report this error to the HyperSpy developers."
531 )

File ~\anaconda3\envs\new202302\lib\site-packages\hyperspy\io.py:545, in load_with_reader(filename, reader, signal_type, convert_units, load_original_metadata, **kwds)
543 """Load a supported file with a given reader."""
544 lazy = kwds.get('lazy', False)
--> 545 file_data_list = reader.file_reader(filename, **kwds)
546 signal_list = []
548 for signal_dict in file_data_list:

File ~\anaconda3\envs\new202302\lib\site-packages\hyperspy\io_plugins\jeol.py:109, in file_reader(filename, **kwds)
107 fd.close()
108 else:
--> 109 d = extension_to_reader_mapping[file_ext](filename, **kwds)
110 if isinstance(d, list):
111 dictionary.extend(d)

File ~\anaconda3\envs\new202302\lib\site-packages\hyperspy\io_plugins\jeol.py:194, in _read_img(filename, scale, **kwargs)
190 _logger.warning("Not a valid JEOL img format")
192 fd.close()
--> 194 return dictionary

UnboundLocalError: local variable 'dictionary' referenced before assignment

To Reproduce

Steps to reproduce the behavior:

import hyperspy.api as hs
from pathlib import Path

map_fname = Path("data/inputdata/data001.map")
mdata = hs.load(map_fname)

Expected behavior

Load the contents of .map file

Python environment:

  • HyperSpy version: 1.7.3
  • Python version: 3.9.16

Additional context

Permissive license or LGPL?

Now that the IO code of hyperspy has been split out, it may be worth considering re-licensing the code in RosettaSciIO under a more permissive license.
I don't have a strong view on this topic, but I am starting this discussion because it comes up once in a while, and it seems there is appetite for a more permissive license, as keeping the GPL license could prevent adoption by some libraries/software.

[QUESTION] Any chance to make SFS reading code LGPL?

Hello,

Recently I got a bcf file from a Bruker user containing an XRF map. Perhaps I am wrong, but my understanding is that it is just a container of files for which I already have reading support.

Your SFS reading is embedded into:

https://github.com/hyperspy/hyperspy/blob/RELEASE_next_minor/hyperspy/io_plugins/bruker.py

The license of my project is MIT, and I would not like to make it GPL just for dealing with SFS, nor would I like to re-invent the wheel by reverse engineering SFS... Any suggestions from your side? If the SFS container is described somewhere, I could also write the handling code myself.

Increase coverage

This issue is a progress tracker on the potential to improve the test coverage in RosettaSciIO:

Formats missing test files:

  • netCDF

Formats with coverage below 80%:

  • digital-micrograph [specific exceptions not covered as well]
  • digitalsurf (sur)
  • fei
  • image
  • ripple
  • tia

Formats with coverage below 90%:

  • bruker
  • dens
  • emd
  • mrc
  • msa
  • pantarhei (prz)
  • phenom
  • renishaw

Formats with coverage below 95%:

  • edax
  • empad
  • hspy
  • mrcz
  • nexus
  • semper
  • tiff
  • usid

Other files with low coverage:

  • utils/exceptions.py
  • utils/readfile.py

See https://app.codecov.io/gh/hyperspy/rosettasciio/tree/main/rsciio for coverage overview and to identify uncovered code in the plugins.

[Edit: Updated list on Dec. 2, 2023.]

About string

We still need to set a descriptive string in the About section of the main GitHub page @ericpre

Non-FEI MRC Files Fail to Load

Describe the bug

For MRC files that are not collected by FEI software, an error will result when using hs.load() to read them. The issue is related to a change in NumPy which occurred several years ago. The bug was fixed for cases where the MRC file contains an FEI-style header, but the case for other MRC files was ignored.

The issue is that the value std_header['NEXT'] is a NumPy array, whereas the code expects it to be an integer.

To Reproduce

A test non-FEI MRC file can be found here:

https://drive.google.com/file/d/1zv1gaa3YYe8Sg5kbaUsPx3qV5yfcFsyX/view?usp=sharing
It is a short tilt series consisting of 9 images, 512x512 pixels each, with 'int16' data type, generated using SerialEM.

If Hyperspy is used to import the MRC file, it will fail.

import hyperspy.api as hs
s = hs.load('NonFEI_MRC_Test_File.mrc')

File ~/anaconda3/envs/tomo/lib/python3.9/site-packages/hyperspy/io_plugins/mrc.py:155, in file_reader(filename, endianess, **kwds)
    153 else:
    154     _logger.warning("There was a problem reading the extended header")
--> 155     f.seek(1024 + std_header['NEXT'])
    156     fei_header = None
    157 NX, NY, NZ = std_header['NX'], std_header['NY'], std_header['NZ']

TypeError: only integer scalar arrays can be converted to a scalar index

To fix

All that needs to be done is to change line 155 of mrc.py by adding a [0] after the call to the std_header dictionary. Currently, the line reads:

f.seek(1024 + std_header['NEXT'])

It should read:

 f.seek(1024 + std_header['NEXT'][0])

Acquisition (live and real) time is apparently not loaded from bcf file

If I do :
s=hs.load('zone 2.bcf', signal_type = "EDS_TEM")
I get the following warning:
"WARNING:hyperspy.io_plugins.bcf:spectrum have no dead time records..."

Then a number of parameters are imported from the .bcf file, but apparently not the realtime and live time of the acquisition.
Ex below :

adf, eds = s
eds.metadata

├── Acquisition_instrument
│   └── TEM
│       ├── Detector
│       │   └── EDS
│       │       ├── azimuth_angle = 45.0
│       │       ├── detector_type = SuperX
│       │       ├── elevation_angle = 18.0
│       │       └── energy_resolution_MnKa = 130
│       ├── beam_energy = 300.0
│       ├── stage_x = None
│       ├── stage_y = None
│       └── tilt_stage = 25
├── General
│   ├── datetime = datetime.datetime(2016, 10, 21, 12, 44, 24)
│   ├── original_filename = zone 2.bcf
│   └── title = EDX
├── Sample
│   ├── elements = ['O', 'Fe', 'Mg', 'Mn', 'C', 'Cu', 'Si']
│   ├── name = Map data
│   └── xray_lines = ['O_Ka', 'Fe_Ka', 'Mg_Ka', 'Mn_Ka', 'C_Ka', 'Cu_Ka', 'Si_Ka']
└── Signal
    ├── binned = True
    └── signal_type = EDS_TEM

tvips memory usage on windows

From @harripj in hyperspy/hyperspy#2781:

Just a follow up on this. On my Windows machine I seem to be experiencing the same behaviour as before:

If I load one of our experimental datasets (size 9.9 GB) as data = hs.load('dset.tvips', lazy=True), the file is not loaded into memory, as expected. With lazy=False, the peak memory usage is double the file size before settling back to roughly the file size.

Using the lazy loader again, if I just extract the center pixel from each frame as:

center = data[:, 256, 256].compute()

The whole file is loaded into memory and persists.

Interestingly, the memory persists even if I call del center; del data.

Dask version: 2022.03.0
NumPy version : 1.21.5
Hyperspy version: 1.7.0.dev0, i.e. a local install of this PR.

Originally posted by @harripj in hyperspy/hyperspy#2781 (comment)

Folder/submodule naming

Currently, the folder naming for plugins does not follow the plugin naming in the specifications.yml files. As the submodules are named according to the folder names, one would for example use from rsciio.prz import file_reader, while the documented plugin name (e.g. used for the HyperSpy reader argument) is PantaRhei. The import name prz is not documented anywhere in the docs. I would therefore propose making the folder name and format name consistent for all plugins to avoid confusion.

Currently the following plugins have deviating folder names (ignoring partial capitalization in plugin names):

plugin name        | folder name        -> proposed folder name
------------------------------------------------------
BrukerComposite    | bruker             -> brukercomposite
DigitalMicrograph  | digital_micrograph -> digitalmicrograph   (alternatively renaming to DM to shorten everything?)
TIA                | fei                -> tia
JobinYvon          | jobin_yvon         -> jobinyvon
PantaRhei          | prz                -> pantarhei
Semper             | semper_unf         -> semper
DigitalSurfSurface | sur                -> digitalsurf  (making DigitalSurfSurface alias name)
USID               | usid_hdf           -> usid

I would, though, tend to keep the capitalization in the plugin names, as it is clear that module names should be lowercase (the reader argument in HyperSpy is insensitive to capitalization); an alternative could be to change all plugin names to lowercase as well, to be fully consistent.

(I would though wait with the implementation until #76 is finished and merged to avoid a conflict.)

Speeding Up Binary Data Reading

Speeding up binary data reading and offering more general support

I think that this is a very good step forward in centralizing many of the file readers that exist in different packages. That being said I think that reading the data could be greatly generalized.

These problems are most acute with 4D-STEM data, simply because of its size, but I do think we should have a good way to read binary data and metadata that is consistent and easy to read. On top of that, all binary datasets should at the very least be properly loaded using memory mapping, and maybe we should also provide alternatives.


Dealing with Metadata

Let's start with the example of loading metadata from a binary file. I've been playing around with defining my metadata as a dictionary like {"metadatakey": {"pos": 530, "dtype": "u4"}, ...}, but this could also be a json or xml file which directly describes where in the file the metadata is located.

This can be then read by:

import numpy as np

def seek_read(file, dtype, pos):
    file.seek(pos)
    return np.squeeze(np.fromfile(file, dtype, count=1))

metadata = {m: seek_read(f, mapping_dict[m]["dtype"], mapping_dict[m]["pos"])
            for m in mapping_dict}

Then the metadata can be read with a simple function; even more complex dtypes like arrays can be read by defining the dtype correctly for numpy.

Ultimately that makes defining the metadata quite easy. Then you can easily map that metadata to be more cleaned for use further on.


Dealing with the Data

For reading in the data, I think a similar approach can be used as well. If each signal in the dataset is defined, then it can easily be memory mapped.

An example of this is:

dtype_list = [("Array", np.int16, (256, 128)), ("sec", "<u4"), ("ms", "<u2"),
              ("mis", "<u2"), ("Empty", bytes, 120)]

def read_binary(file, dtypes, offset, navigation_shape=None):
    keys = [d[0] for d in dtypes]
    mapped = np.memmap(file, offset=offset, dtype=dtypes, shape=navigation_shape)
    binary_data = {k: mapped[k] for k in keys}
    return binary_data

In this case trailing bytes are read efficiently and stored, and it is generally clear in what format the binary data exists. It is also fast and efficient at accessing the data in chunks of each signal.

Additional information:

I don't know if this ends up being the fastest way to read data (ultimately that depends on the system and whether you are rechunking etc.), but there are some cases, like reading the empad detector, where we call memmap and then reshape the data with dask, which is fairly inefficient.

I would love it if @sk1p or @uellue would chime in here as well. I think that we could maybe generalize some of the other loading schemes they have for binary data using different hardware or streaming data. Hopefully with just a general set of loading methods it would be easy to call the file reader function with different backend readers and really optimize performance.

It would also make adding new formats easier and faster with a focus on maintaining speed and flexibility to try new loading schemes as file storage changes or adapts.
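To make the memmap idea above concrete, here is a self-contained demonstration (field names and shapes invented for illustration) showing that indexing a structured memmap touches only the requested frame rather than loading the whole file:

```python
import os
import tempfile

import numpy as np

# one record per frame: a small 2D signal plus trailing timestamp fields
frame_dtype = np.dtype([("Array", np.int16, (4, 4)), ("sec", "<u4")])

# write a small fake binary file so the example runs anywhere
path = os.path.join(tempfile.mkdtemp(), "frames.bin")
frames = np.zeros(10, dtype=frame_dtype)
frames["sec"] = np.arange(10)
frames.tofile(path)

# memory-map it: mapped["Array"][3] reads only that frame from disk
mapped = np.memmap(path, dtype=frame_dtype, mode="r")
print(mapped["Array"][3].shape, int(mapped["sec"][3]))
```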

Error in reading 4DSTEM .mrc files from Velox

Dear HyperSpy community,
I would like to report an error in loading .mrc files created using Velox. It was discussed previously in issue #130, and the error is the same, but I list it below for ease of reference. In my case the situation is harder to solve, because the .mrc files contain 4D-STEM data, which you cannot easily save in a different format from Velox. The workaround I use is to load the dataset into ImageJ and save it as a Tiff stack, which you can then load into hyperspy. This approach requires enough memory to load the whole dataset, but these files can easily exceed 100 GB, which can be problematic if you do not have a workstation with enough memory. Therefore, it would be very useful to be able to read the file using hs.load and take advantage of lazy loading.
There is a full description of their .mrc header in Velox manual, which is unfortunately confidential, but I can ask them for permission to share it.

Petr

Error

WARNING:hyperspy.io_plugins.mrc:There was a problem reading the extended header
ERROR:hyperspy.io:If this file format is supported, please report this error to the HyperSpy developers.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\Users\Petr\Desktop\recycled battery\edx.ipynb Cell 2 in <cell line: 1>()
----> 1 a = hs.load('D:/Práce/Experimenty/raw data/20221014 Camera 91 mm Ceta 1520-e4df9eac320047e6b030968f25823eb0.mrc')

File c:\Users\Petr\anaconda3\envs\hyperspy171pyxem0142\lib\site-packages\hyperspy\io.py:454, in load(filenames, signal_type, stack, stack_axis, new_axis_name, lazy, convert_units, escape_square_brackets, stack_metadata, load_original_metadata, show_progressbar, **kwds)
    451         objects.append(signal)
    452 else:
    453     # No stack, so simply we load all signals in all files separately
--> 454     objects = [load_single_file(filename, lazy=lazy, **kwds)
    455                for filename in filenames]
    457 if len(objects) == 1:
    458     objects = objects[0]

File c:\Users\Petr\anaconda3\envs\hyperspy171pyxem0142\lib\site-packages\hyperspy\io.py:454, in <listcomp>(.0)
    451         objects.append(signal)
    452 else:
    453     # No stack, so simply we load all signals in all files separately
--> 454     objects = [load_single_file(filename, lazy=lazy, **kwds)
    455                for filename in filenames]
    457 if len(objects) == 1:
    458     objects = objects[0]

File c:\Users\Petr\anaconda3\envs\hyperspy171pyxem0142\lib\site-packages\hyperspy\io.py:513, in load_single_file(filename, **kwds)
    506     raise ValueError(
...
--> 155     f.seek(1024 + std_header['NEXT'])
    156     fei_header = None
    157 NX, NY, NZ = std_header['NX'], std_header['NY'], std_header['NZ']

TypeError: only integer scalar arrays can be converted to a scalar index

Installing from source in editable mode adds an empty directory and a shared library (.so) file

Describe the bug

Installing from source in editable mode pip install -e .[dev] adds a home/<path_to_project_root_directory>/rsciio/tests/bruker_data/ directory with a file test_compilers.o and a rsciio/bruker/unbcf_fast.cpython-310-x86_64-linux-gnu.so file to the project root directory (i.e. the home/<path... starts in the project root directory). I assume these are created when building Cython extensions, but I don't know anything about that, I'm afraid.

To Reproduce

Steps to reproduce the behavior:

# No changes
> git status
On branch remove-persitent-search-field
Your branch and 'origin/remove-persitent-search-field' have diverged,
and have 16 and 3 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

nothing to commit, working tree clean

> pip install -e .[dev]
Obtaining file:///home/hakon/kode/rosettasciio
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
[...]
Successfully built RosettaSciIO
Installing collected packages: RosettaSciIO
  Attempting uninstall: RosettaSciIO
    Found existing installation: RosettaSciIO 0.1.dev0
    Uninstalling RosettaSciIO-0.1.dev0:
      Successfully uninstalled RosettaSciIO-0.1.dev0
Successfully installed RosettaSciIO-0.1.dev0

# Two files added (one inside a directory)
> git status
On branch remove-persitent-search-field
Your branch and 'origin/remove-persitent-search-field' have diverged,
and have 16 and 3 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	home/
	rsciio/bruker/unbcf_fast.cpython-310-x86_64-linux-gnu.so

nothing added to commit but untracked files present (use "git add" to track)

Expected behavior

Expected a clean repository after installing dev resources from source.

Python environment:

  • RosettaSciIO version: 0.1.dev
  • Python version: 3.10.9

pip / conda install for rosettasciio?

What is the planned roadmap for adding a pip / conda install for rosettasciio? We could then do some testing using it as a reader for the many file formats which py4DSTEM does not currently support.

Thanks for spinning this off as a separate library!

Unification of XML to dict list tree translation

I have spotted that there is quite a lot of duplicated effort in parsing and translating hierarchical XML metadata into pythonic dict and list structures. I will keep this updated. I think the XML translator used in the bruker api is the most built-up, so I will expand it to take into account the most bizarre XML cases (XML can be a really unreadable, but valid, mess).

Progress:

  • move XML translation from bruker._api to utils.tools.py and expand it to work on more cases (done #111)
  • trivista adoption (#138)
  • search for next api target to start using this
  • ....
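For reference, the core of such a translator can be sketched with only the standard library (the real implementation in rsciio's utils handles many more corner cases, e.g. attributes mixed with text, namespaces, and type interpretation):

```python
import xml.etree.ElementTree as ET

def xml_to_dict(element):
    """Recursively translate an ElementTree node into nested dicts/lists."""
    children = list(element)
    if not children:
        # leaf: return attributes if present, otherwise the text content
        return element.attrib or (element.text.strip() if element.text else None)
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # repeated sibling tags become a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

tree = ET.fromstring("<Root><Item>a</Item><Item>b</Item><HV>300</HV></Root>")
print(xml_to_dict(tree))
```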

Support for .mib Files

Describe the functionality you would like to see.

Currently pyxem partially supports loading .mib files. There was an effort to increase the ability to load these types of files in pyxem/pyxem#732, but that never reached fruition. I think it would be a good idea to port some of this loading here.

LiberTEM has a much more complete reader here, and while I would like to not duplicate efforts too much, it might be a good place to start; it would be good to migrate some of the effort to a package with fewer dependencies.

Describe the context

This will support Merlin Detectors often used for 4-D STEM. These detectors are kind of unique in that often they are split detectors.

Additional information

Honestly, I have never used a Merlin detector/looked at data from a Merlin detector. I can transfer the loading capabilities from pyxem fairly easily. But I might need some additional information/help.

Add support for the new EDAX .h5 format

As brought up on gitter by @TommasoCostanzo, it would be nice to support the new EDAX .h5 (hdf5) file format for EDX measurements.

The main thing is that one file can contain quite a number of scans. Hierarchically, the file is divided into samples, then areas, and each of these can contain multiple spectra, linescans and/or maps. So we would need a good mechanism for choosing whether to import one or several elements from the file - and which ones exactly - and then to correctly transform them into hyperspy object(s).

As APEX-EBSD is also using .h5 (though the image files are saved separately in .up2 format), KikuchiPy (@hakonanes) seems to already have some support for the new EDAX format? And the hierarchy should be similar between EDX and EBSD files.
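A hedged sketch of what such a selection mechanism could build on, using h5py to enumerate every dataset in a file with a sample/area hierarchy (the group and dataset names here are invented; the real EDAX layout may differ):

```python
import os
import tempfile

import h5py
import numpy as np

# build a tiny file mimicking the described hierarchy: samples -> areas -> data
path = os.path.join(tempfile.mkdtemp(), "edax_example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("Sample1/Area1/Spectrum", data=np.zeros(1024))
    f.create_dataset("Sample1/Area2/Map", data=np.zeros((8, 8, 16)))

def list_datasets(path):
    """Return the full path of every dataset in the file."""
    found = []
    with h5py.File(path, "r") as f:
        def visitor(name, obj):
            if isinstance(obj, h5py.Dataset):
                found.append(name)
        f.visititems(visitor)
    return found

# the caller could then pick which of these paths to turn into hyperspy objects
print(list_datasets(path))
```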

File Reader for Oxford Instruments

I have been discussing with the guys at OI for some time about whether they want to help us read from their files. Unfortunately, I don't seem to be making much progress. Is there more interest than just mine in the community? Has anyone attempted this yet?
I think it would be a useful addition to Hyperspy.

Incorrect Scale Conversion for Non-FEI MRC Files

For MRC files that are not collected by FEI software, the scale is assumed to be in Angstroms and Hyperspy converts it to nanometers. However, the conversion is done by multiplying by 10 rather than dividing by 10. The relevant lines of code are 179 - 187 in mrc.py.

    if fei_header is None:
        # The scale is in Amstrongs, we convert it to nm
        scales = [10 * float(std_header['Zlen'] / std_header['MZ'])
                  if float(std_header['MZ']) != 0 else 1,
                  10 * float(std_header['Ylen'] / std_header['MY'])
                  if float(std_header['MY']) != 0 else 1,
                  10 * float(std_header['Xlen'] / std_header['MX'])
                  if float(std_header['MX']) != 0 else 1, ]
        offsets = [10 * float(std_header['ZORIGIN']),
                   10 * float(std_header['YORIGIN']),
                   10 * float(std_header['XORIGIN']), ]

Opening Bruker .bcf micro-XRF files

I have a colleague who is interested in doing some analysis on x-ray florescence data with HyperSpy. The data is saved by Bruker in a .bcf file (the same as the EDS maps implemented by @sem-geologist).

Unfortunately, the existing reader cannot handle these files, and I am starting to look into what it would take to get them opened. I'm currently awaiting word whether or not I can share his example data file, but I figured I'd get the discussion started here.

@sem-geologist, did you ever get any documentation about the .bcf format from Bruker, or was it all achieved by reverse engineering the from the file?

True lazy reading to open EMD files

Currently lazy reading of EDX spectra in EMD files is not truly lazy. The data is stored in a compressed format in the file. HyperSpy reads the compressed data in memory and uncompresses it lazily.

This leads to error messages if the number of frames is large.

Implementing pure lazy reading would make it possible to work with big datasets (i.e. maps with too many frames).
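One possible direction, sketched with plain h5py (the dataset path is invented; the real Velox EMD layout differs): keep the h5py dataset handle open and slice it on demand, so HDF5 decompresses chunk by chunk instead of the reader pulling the whole compressed stream into memory first.

```python
import os
import tempfile

import h5py
import numpy as np

# write a small compressed, chunked dataset standing in for an EMD stream
path = os.path.join(tempfile.mkdtemp(), "example.emd")
with h5py.File(path, "w") as f:
    f.create_dataset("Data/SpectrumStream", data=np.arange(1000),
                     chunks=(100,), compression="gzip")

f = h5py.File(path, "r")
dset = f["Data/SpectrumStream"]  # no data is read yet
frame = dset[200:300]            # only the chunks covering this slice are decompressed
total = int(frame.sum())
f.close()
print(total)
```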

RosettaSciIO Format Naming

The naming of formats in RosettaSciIO is historically very inconsistent.

  1. We have formats with overly complicated names, e.g.:
    Digital Micrograph dm3, SEMPER UNF (unformatted), Electron Microscopy Data (EMD), ...

  2. We have formats with capitalized names (either file extensions, acronyms or shorter manufacturer names), e.g.:
    HSPY, DENS, MRCZ, TIFF, EMPAD, JEOL, ...

  3. And we have formats with concise, but not capitalized names, e.g.:
    ZSpy, Nexus, Blockfile, PantaRhei, ...

In case 2&3, the format_name often, but not always, corresponds to the module name.

With the first release, we should consider making the naming more consistent. To allow backwards compatibility for HyperSpy users, we could allow aliases for the format_name. I would propose to add a field format_alias to the .yaml dictionary defining a format, similar to the aliases of signal_types in HyperSpy.

I would propose to use a combination of case 2&3 with the following rules for the names:

  • do not contain spaces or special characters
  • should not be all lowercase
  • can be an extension or acronym, then it is capitalized
  • extensions or acronyms should not be added to longer names
    (a special case are multiple filetypes from e.g. the same manufacturer, but so far we mostly triage that within the reader specific to that manufacturer)

Signal type of EDS dataset

We should improve the situation with setting the signal_type of EDS datasets, make it consistent across the EDS file readers, and document its behaviour in rosettasciio and in hyperspy.

The approach used in the bruker reader seems to be the most sensible:

    def guess_mode(hv):
        """there is no way to determine what kind of instrument
        was used from metadata: TEM or SEM.
        However simple guess can be made using the acceleration
        voltage, assuming that SEM is <= 30kV or TEM is >30kV"""
        if hv > 30.0:
            mode = "TEM"
        else:
            mode = "SEM"
        _logger.info(
            "Guessing that the acquisition instrument is %s " % mode
            + "because the beam energy is %i keV. If this is wrong, " % hv
            + "please provide the right instrument using the 'instrument' "
            + "keyword."
        )
        return mode

In any case, data acquired on a SEM doesn't guarantee that the signal_type should be EDS_SEM, as the specimen can be thin enough to make EDS_TEM more suitable!
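To make that behaviour explicit, a hypothetical sketch of how the guess and an override keyword could combine (guess_signal_type and its instrument keyword are illustrative names, not the actual API):

```python
# A sketch of the proposed consistent behaviour: guess from the beam
# energy, but always let an explicit keyword win, since a thin
# specimen in an SEM may still be best treated as EDS_TEM.
# Function and parameter names are hypothetical.
def guess_signal_type(beam_energy_kev, instrument=None):
    if instrument is not None:          # explicit keyword always wins
        return f"EDS_{instrument}"
    return "EDS_TEM" if beam_energy_kev > 30.0 else "EDS_SEM"
```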

Adding Documentation About Dask-Distributed Support for file types

Describe the functionality you would like to see.

I would like to add to the documentation information about which file loaders support the dask-distributed backend. Mostly just add an extra column here

Currently I believe that this is only zspy and the new file loader in #11, but we can think about adding support for the hspy file format as well as any of the other binary files.

Describe the context

I have defined a function in #11 that works as a drop-in replacement for np.memmap and allows for distributed loading of some data. This is particularly useful for large datasets and does a much better job of handling the available resources.

Additional information

Using the dask-distributed scheduler is the preferred way to interact with dask in most cases. Supporting distributed schedulers at the loading level is important for larger datasets and allows for much more scalable performance.
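A rough sketch of the idea, assuming dask is available (this is not the actual code from #11): each chunk is a delayed task that reopens the file on the worker, so no open file handle or memmap object has to be serialized between processes.

```python
import os
import tempfile

import dask
import dask.array as da
import numpy as np

# Rough sketch of a distributed-friendly np.memmap replacement:
# every chunk is a delayed task that reads its own byte range from
# the file, so the task graph is serializable to remote workers.
def distributed_memmap(path, shape, dtype, chunk_len):
    dtype = np.dtype(dtype)
    row = int(np.prod(shape[1:]))

    def read_chunk(start, stop):
        data = np.fromfile(path, dtype=dtype, count=(stop - start) * row,
                           offset=start * row * dtype.itemsize)
        return data.reshape((stop - start,) + shape[1:])

    blocks = [
        da.from_delayed(
            dask.delayed(read_chunk)(start, min(start + chunk_len, shape[0])),
            shape=(min(start + chunk_len, shape[0]) - start,) + shape[1:],
            dtype=dtype,
        )
        for start in range(0, shape[0], chunk_len)
    ]
    return da.concatenate(blocks, axis=0)

# demo: write a small raw file and sum it lazily
path = os.path.join(tempfile.mkdtemp(), "stack.raw")
np.arange(24, dtype=np.float32).tofile(path)
stack = distributed_memmap(path, (6, 2, 2), np.float32, chunk_len=2)
total = float(stack.sum().compute())
```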

Image annotating from dm3 and dm4

When opening dm3 or dm4 files in Digital Micrograph, the various annotations made when acquiring the data are shown. For example, in a HAADF-STEM image one can see where a line scan was done, as shown in the picture.
annoted_stem_example

Looking around s.original_metadata I find AnnotationGroupList, which I guess contains these annotations. It would be nice if these could be shown when using s.plot, with some argument to enable or disable showing them.

One possibility would be "linking" to the line scan or spectrum image, using the discussed ROI feature hyperspy/hyperspy#44
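In the meantime, the annotations can be located by hand. A small illustrative helper (find_keys and the example dictionary are hypothetical; the actual dm3/dm4 tag layout varies between files):

```python
# A hypothetical helper to locate annotation entries in the nested
# dictionary returned by s.original_metadata.as_dictionary(). The
# example dictionary below is illustrative, not the real dm3 layout.
def find_keys(tree, target, path=""):
    """Recursively collect dotted paths to keys named ``target``."""
    hits = []
    if isinstance(tree, dict):
        for key, value in tree.items():
            here = f"{path}.{key}" if path else key
            if key == target:
                hits.append(here)
            hits.extend(find_keys(value, target, here))
    return hits

md = {"DocumentObjectList": {"0": {"AnnotationGroupList": {"0": {}}}}}
paths = find_keys(md, "AnnotationGroupList")
```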

Cannot load elid file

Describe the bug

Did not read the elid file as it should have

To Reproduce

Steps to reproduce the behavior:
Please use this google drive link to download the elid file on your end

import hyperspy.api as hs
s = hs.load("Co3O4.elid")

Expected behavior

Expected to read the elid file

Python environment:

  • HyperSpy version: 1.7.3
  • Python version: 3.10.5

Additional context

Screen Shot 2023-01-27 at 4 11 52 PM

The error:

    ERROR:hyperspy.io:If this file format is supported, please report this error to the HyperSpy developers.
    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    Input In [151], in ()
    ----> 1 s = hs.load("Co3O4.elid")
          2 s

File ~/miniforge3/envs/myenv/lib/python3.10/site-packages/hyperspy/io.py:466, in load(filenames, signal_type, stack, stack_axis, new_axis_name, lazy, convert_units, escape_square_brackets, stack_metadata, load_original_metadata, show_progressbar, **kwds)
463 objects.append(signal)
464 else:
465 # No stack, so simply we load all signals in all files separately
--> 466 objects = [load_single_file(filename, lazy=lazy, **kwds)
467 for filename in filenames]
469 if len(objects) == 1:
470 objects = objects[0]

File ~/miniforge3/envs/myenv/lib/python3.10/site-packages/hyperspy/io.py:466, in <listcomp>(.0)
463 objects.append(signal)
464 else:
465 # No stack, so simply we load all signals in all files separately
--> 466 objects = [load_single_file(filename, lazy=lazy, **kwds)
467 for filename in filenames]
469 if len(objects) == 1:
470 objects = objects[0]

File ~/miniforge3/envs/myenv/lib/python3.10/site-packages/hyperspy/io.py:525, in load_single_file(filename, **kwds)
518 raise ValueError(
519 "reader should be one of None, str, "
520 "or a custom file reader object"
521 )
523 try:
524 # Try and load the file
--> 525 return load_with_reader(filename=filename, reader=reader, **kwds)
527 except BaseException:
528 _logger.error(
529 "If this file format is supported, please "
530 "report this error to the HyperSpy developers."
531 )

File ~/miniforge3/envs/myenv/lib/python3.10/site-packages/hyperspy/io.py:545, in load_with_reader(filename, reader, signal_type, convert_units, load_original_metadata, **kwds)
543 """Load a supported file with a given reader."""
544 lazy = kwds.get('lazy', False)
--> 545 file_data_list = reader.file_reader(filename, **kwds)
546 signal_list = []
548 for signal_dict in file_data_list:

File ~/miniforge3/envs/myenv/lib/python3.10/site-packages/hyperspy/io_plugins/phenom.py:682, in file_reader(filename, log_info, lazy, **kwds)
681 def file_reader(filename, log_info=False, lazy=False, **kwds):
--> 682 reader = ElidReader(filename)
683 return reader.dictionaries

File ~/miniforge3/envs/myenv/lib/python3.10/site-packages/hyperspy/io_plugins/phenom.py:112, in ElidReader.init(self, pathname, block_size)
110 raise Exception('not an ELID file')
111 if version > 2:
--> 112 raise Exception('unsupported ELID format')
113 self._version = version
114 self.dictionaries = self._read_Project()

Initial Release

v0.1

We have progressed quite far on preparing an initial release and improving consistency within RosettaSciIO as documented in the milestone. Therefore, we should aim for a timely initial release. Let's try to use this issue to speed up that process and see what is left. In principle, I think we could even release the current state.

However, it would be good to clarify the following points:

  • #51
  • #14 (currently the test data blows up the project size)
  • #69
  • #79

Finally, there are a number of reported bugs that would be nice to get fixed (and most of them should be low hanging fruits):

And of course, it is always good to

Once we are happy with the state, we need to

  • Create release workflow file (#126)
  • Document releasing workflow (#126)
  • Create RosettaSciIO on pypi - this is set up with a trusted publisher in #126
  • Zenodo is set up to create a DOI and entry on creation of the github release.
  • Do initial release on github - #126 set it up to create GitHub release after successfully generating sdist and wheel and uploading to pypi.org
  • Create conda-forge feedstock

improvements to tiff parser

Describe the functionality you would like to see

Add the ability to load the original metadata from every page of a multipage tiff.

Describe the context


I want to modify the tiff reader to accept a keyword to read tiff pages as separate signals. Currently it uses tifffile's built-in series mechanism, which allows for memory mapping, but series ignores any metadata except that of the first page. Thus I want to add an alternative method which would return a list of signals instead of a single stacked signal.
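A minimal sketch of the page-wise approach using tifffile's public API (the per-page loop and the returned structure are the illustrative part, not existing reader code):

```python
import os
import tempfile

import numpy as np
import tifffile

# Sketch: iterate over pages instead of using the series mechanism,
# keeping one array per page so that per-page metadata (page.tags)
# survives instead of being discarded after the first page.
path = os.path.join(tempfile.mkdtemp(), "stack.tif")
frames = np.arange(2 * 5 * 7, dtype=np.uint8).reshape(2, 5, 7)
tifffile.imwrite(path, frames, photometric="minisblack")  # one page per frame

with tifffile.TiffFile(path) as tif:
    pages = [page.asarray() for page in tif.pages]  # page.tags holds the metadata

n_pages = len(pages)
first_total = int(pages[0].sum())
```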

Additional information

I have datasets from a few years ago, where a FIB-SEM milling tomography had to be subdivided into sessions. Unfortunately, a few sessions overwrote previous runs, and the data from those is available only in the multipage tiff files generated at the end of every session. The ZEISS software interface (installed in 2014) is quite susceptible to this kind of overwriting mistake. The original metadata contains some quite important values which change dynamically during the milling, and its inspection is important to filter out some analytical anomalies. My aim is to concatenate the data from all sessions, and I want the original metadata stacked accordingly so that I can select and consolidate those values into arrayed metadata.

This again brings me to the limitations of HyperSpy's metadata oversimplification. There are many experimental techniques where some parameters are changed (scanned) while acquiring a series of measurements. These can be such fundamental parameters as acceleration voltage, working distance, beam current and many more. Having access to original_metadata is practical in many cases.

Which branch should I aim the PR at?
After a successful merge I will forward the changes to rsciio.

Remove deprecated code

A number of readers have deprecation warnings, usually for specific keywords -- sometimes HyperSpy v2.0 is explicitly mentioned as the point where they will be removed. We could consider removing these deprecated keywords before releasing the first version of RosettaSciIO, as it would be a good occasion to do so and the library will first be used by HyperSpy 2.0 anyway.

Make DM filereading more robust

As a few file loading bugs have surfaced recently, we should make sure bugs in the metadata loading don't stop files from loading.

We should try to make the individual metadata parsers robust for malformed input, but we should also make sure bugs which slip through the cracks are caught somewhere.

One way of doing this is to encapsulate the whole metadata parser in a try/except block and, if it fails, emit a warning through the logger.
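A minimal sketch of such a guard (safe_parse and broken_parser are hypothetical names, not existing code):

```python
import logging

_logger = logging.getLogger(__name__)

# Sketch of the proposed guard: run each metadata parser inside a
# try/except so that a malformed tag degrades to a logged warning and
# partial metadata instead of aborting the whole file load.
def safe_parse(parser, raw, default=None):
    try:
        return parser(raw)
    except Exception:
        _logger.warning(
            "Failed to parse metadata with %s; returning partial metadata.",
            getattr(parser, "__name__", repr(parser)),
        )
        return default

def broken_parser(raw):
    raise KeyError("malformed tag")  # stands in for a parser bug

metadata = safe_parse(broken_parser, b"\x00", default={})
```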

Not-implemented features of Bruker's format(s)

This Issue is for discussion, polling priorities and tracking the progress of additional implementation of Bruker formats, and eventual re-organization of Bruker fileformat plugins. (P.S. It is going to be edited and updated.)

Non-bcf formats, with reusable parts of the codebase already implemented in the bcf plugin or usable for bcf:

  1. *.spx files (single Bruker spectra) (#1854)
  2. Esprit project files (tree-like data structure with images, markers and spectra, quantification results)
    ? Could hyperspy load return a dict of signals instead of a list?
  3. peak width (in the spectra part saved as sigma offset and slope)
    ? hyperspy requires only the FWHM for Mn, which is a static value, while Bruker provides the resolution with a slope. (update: according to a fragment of Bruker's manual, it could also be acquired from the zero-energy peak #239; update2: the resolution curve can be retrieved from the SigmaAbs and SigmaLin parameters saved with the spectra in the xml, however for usability hyperspy needs to be improved to accept such a curve as an alternative to the FWHM of MnKa)

BCF features which are not implemented (missing):

  1. pixel times array (some bcf files have an array full of 0, but some OEM implementations have sensible data, which marks the dwell time(?) per pixel)
    update: actually the pixel times theoretically can be retrieved from the zero-energy peak. (#1355)
  2. More images saved as overview (currently only the first one is returned)
  3. Stage data (now it is returned in original_metadata, but the new metadata specification of hyperspy has a stage metadata definition).
    ? Esprit recognizes only one tilt, while hyperspy has 2 tilt fields; it should probably map to tilt_alpha?

Major bcf extensions which require new signal_types to be defined:

  • EBSD
  • micro-XRF
    @jat255 is working on this (#1783, #19 )

Error reading Velox-generated mrc format

In loading the following example files (one from a STEM dataset, the other two from an EDS dataset), I get the same warning and error.
These mrc files were generated with Velox version 2.13.

Velox Exports.zip

WARNING:hyperspy.io_plugins.mrc:There was a problem reading the extended header

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-11223d99f415> in <module>
      1 # In the following, the r-letter before the string tells python that the backslashes in this string
      2 # are not "escaping" the following character, but should be used as backslashes
----> 3 s = hs.load(r"C:\Users\thomasaar\Documents\Velox Exports\20200909 1412 STEM HAADF-DF4-DF2-BF 250 kx Nano Diffraction.mrc")

c:\users\thomasaar\documents\github\hyperspy\hyperspy\io.py in load(filenames, signal_type, stack, stack_axis, new_axis_name, lazy, convert_units, escape_square_brackets, **kwds)
    355     else:
    356         # No stack, so simply we load all signals in all files separately
--> 357         objects = [load_single_file(filename, lazy=lazy, **kwds) for filename in filenames]
    358 
    359     if len(objects) == 1:

c:\users\thomasaar\documents\github\hyperspy\hyperspy\io.py in <listcomp>(.0)
    355     else:
    356         # No stack, so simply we load all signals in all files separately
--> 357         objects = [load_single_file(filename, lazy=lazy, **kwds) for filename in filenames]
    358 
    359     if len(objects) == 1:

c:\users\thomasaar\documents\github\hyperspy\hyperspy\io.py in load_single_file(filename, **kwds)
    398     else:
    399         reader = io_plugins[i]
--> 400         return load_with_reader(filename=filename, reader=reader, **kwds)
    401 
    402 

c:\users\thomasaar\documents\github\hyperspy\hyperspy\io.py in load_with_reader(filename, reader, signal_type, convert_units, **kwds)
    404                      **kwds):
    405     lazy = kwds.get('lazy', False)
--> 406     file_data_list = reader.file_reader(filename,
    407                                         **kwds)
    408     objects = []

c:\users\thomasaar\documents\github\hyperspy\hyperspy\io_plugins\mrc.py in file_reader(filename, endianess, **kwds)
    151     else:
    152         _logger.warning("There was a problem reading the extended header")
--> 153         f.seek(1024 + std_header['NEXT'])
    154         fei_header = None
    155     NX, NY, NZ = std_header['NX'], std_header['NY'], std_header['NZ']

TypeError: only integer scalar arrays can be converted to a scalar index
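A minimal reproduction of the likely cause, assuming the header is read as a length-1 numpy structured array (so the fix would be to cast the value to a plain int before seeking):

```python
import numpy as np

# Likely cause: if the header is read with np.fromfile(..., count=1),
# std_header['NEXT'] is a length-1 array, 1024 + array is again an
# array, and file.seek() rejects anything that is not a plain integer.
std_header = np.zeros(1, dtype=[("NEXT", "<i4")])
std_header["NEXT"] = 49152

offset_bad = 1024 + std_header["NEXT"]         # ndarray; seek() raises TypeError
offset_ok = 1024 + int(std_header["NEXT"][0])  # plain int; safe for seek()
```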

Convert EMD format to Relion supported format

Hi, I've been trying to convert image files in the FEI EMD format into a usable format for Relion. The HyperSpy documentation says:

To save files that are compatible with other programs that can use MRC such as GMS, IMOD, Relion, MotionCorr, etc. save with compressor=None, extension .mrc. JSON metadata will not be recognized by other MRC-supporting software but should not cause crashes.

The EMD files I have are composed of 10 frames acquired from a Talos L120C microscope.

I have been trying to use the code:
s=hs.load(inputfile, select_type='images', load_SI_image_stack=True, first_frame=0, last_frame=9, sum_frames=False)
s.save(outputfile, compressor='none')

If I save as tiff using this approach, Relion does not recognize the file. However, if I try to save as mrc as the HyperSpy documentation describes, I receive:

ValueError: Writing to this format is not supported. Supported file extensions are: mrcz, emd, nxs, hspy, rpl, blo, h5, msa, tif, unf, and png.

Is there any support for formats that are compatible with Relion? Is there something I need to add to load or save my files?

Thank you,
Dylan

SFS Reader Origins and license.

Describe the bug

Could you provide background information on how the SFS reader was developed? The SFS format is a commercial product from AidAim Software. Did AidAim license the format to HyperSpy for GPLv3 use under Python?

Additional context

We would like to start using HyperSpy and add more features to the Bruker.py file (EBSD data extraction specifically) but we want to be sure of the correct licensing issues. I looked back through the git commit history and the bruker.py file (after following file renames) just comes in one commit with no other comments.

Latest `python-box` release breaks rosettasciio

With python-box 7.0, we have the following error:

From https://github.com/hyperspy/rosettasciio/actions/runs/4091545239/jobs/7055708806

___________________________ test_read_EELS_metadata ____________________________
[gw0] linux -- Python 3.9.16 /opt/hostedtoolcache/Python/3.9.16/x64/bin/python

    def test_read_EELS_metadata():
        fname = os.path.join(MY_PATH, "dm3_1D_data", "test-EELS_spectrum.dm3")
>       s = hs.load(fname)

rsciio/tests/test_digitalmicrograph.py:143: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/hyperspy/io.py:517: in load
    objects = [load_single_file(filename, lazy=lazy, **kwds)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/hyperspy/io.py:517: in <listcomp>
    objects = [load_single_file(filename, lazy=lazy, **kwds)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/hyperspy/io.py:576: in load_single_file
    return load_with_reader(filename=filename, reader=reader, **kwds)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/hyperspy/io.py:597: in load_with_reader
    file_data_list = importlib.import_module(reader["api"]).file_reader(filename,
rsciio/digitalmicrograph/_api.py:1267: in file_reader
    images = [
rsciio/digitalmicrograph/_api.py:1268: in <listcomp>
    ImageObject(imdict, f, order=order)
rsciio/digitalmicrograph/_api.py:445: in __init__
    self.imdict = Box(imdict, box_dots=True)
box/box.py:286: in box.box.Box.__init__
    ???
box/box.py:660: in box.box.Box.__setitem__
    ???
box/box.py:569: in box.box.Box.__convert_and_store
    ???
box/box.py:286: in box.box.Box.__init__
    ???
box/box.py:660: in box.box.Box.__setitem__
    ???
box/box.py:569: in box.box.Box.__convert_and_store
    ???
box/box.py:286: in box.box.Box.__init__
    ???
box/box.py:660: in box.box.Box.__setitem__
    ???
box/box.py:569: in box.box.Box.__convert_and_store
    ???
box/box.py:286: in box.box.Box.__init__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   box.exceptions.BoxKeyError: "'<class 'box.box.Box'>' object has no attribute Align min"

box/box.py:653: BoxKeyError
------------------------------ Captured log call -------------------------------
ERROR    hyperspy.io:io.py:579 If this file format is supported, please report this error to the HyperSpy developers.

import of flucam image emi file yields empty list

Describe the bug

Using hs.load to load an emi file of a flucam image yields an empty list []. I have checked that normal TEM images (i.e. from a proper camera) work properly. Note that flucam images only save .emi files, not a complementary .ser file.

To Reproduce

Example file:
Spotsize3GL6.zip

$ conda activate hyperspy
$ ipython

In [1]: import hyperspy.api as hs

In [2]: hs.load('Spotsize3GL6.emi')
Out[2]: []

Expected behavior

Object returned by hs.load should contain image and metadata (just pixel size in this instance)

Python environment:

Latest version of hyperspy as installed through conda

$ python --version

Python 3.10.6

$ conda env export
name: hyperspy
channels:

  • conda-forge

  • defaults
    dependencies:

  • _libgcc_mutex=0.1=conda_forge

  • _openmp_mutex=4.5=2_gnu

  • alsa-lib=1.2.7.2=h166bdaf_0

  • aom=3.5.0=h27087fc_0

  • asciitree=0.3.3=py_2

  • asttokens=2.0.8=pyhd8ed1ab_0

  • attr=2.5.1=h166bdaf_1

  • attrs=22.1.0=pyh71513ae_1

  • backcall=0.2.0=pyh9f0ad1d_0

  • backports=1.0=py_2

  • backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0

  • bokeh=2.4.3=pyhd8ed1ab_3

  • brotli=1.0.9=h166bdaf_7

  • brotli-bin=1.0.9=h166bdaf_7

  • brunsli=0.1=h9c3ff4c_0

  • bzip2=1.0.8=h7f98852_4

  • c-ares=1.18.1=h7f98852_0

  • c-blosc2=2.4.2=h7a311fb_0

  • ca-certificates=2022.9.24=ha878542_0

  • cached-property=1.5.2=hd8ed1ab_1

  • cached_property=1.5.2=pyha770c72_1

  • certifi=2022.9.24=pyhd8ed1ab_0

  • cfitsio=4.1.0=hd9d235c_0

  • charls=2.3.4=h9c3ff4c_0

  • charset-normalizer=2.1.1=pyhd8ed1ab_0

  • click=8.1.3=unix_pyhd8ed1ab_2

  • cloudpickle=2.2.0=pyhd8ed1ab_0

  • colorama=0.4.5=pyhd8ed1ab_0

  • cycler=0.11.0=pyhd8ed1ab_0

  • dask=2022.2.1=pyhd3eb1b0_0

  • dask-core=2022.2.1=pyhd3eb1b0_0

  • dav1d=1.0.0=h166bdaf_1

  • dbus=1.13.6=h5008d03_3

  • decorator=5.1.1=pyhd8ed1ab_0

  • dill=0.3.5.1=pyhd8ed1ab_0

  • distributed=2022.2.1=pyhd3eb1b0_0

  • entrypoints=0.4=pyhd8ed1ab_0

  • executing=1.1.0=pyhd8ed1ab_0

  • expat=2.5.0=h27087fc_0

  • fasteners=0.17.3=pyhd8ed1ab_0

  • fftw=3.3.10=nompi_hf0379b8_105

  • font-ttf-dejavu-sans-mono=2.37=hab24e00_0

  • font-ttf-inconsolata=3.000=h77eed37_0

  • font-ttf-source-code-pro=2.038=h77eed37_0

  • font-ttf-ubuntu=0.83=hab24e00_0

  • fontconfig=2.14.1=hc2a2eb6_0

  • fonts-conda-ecosystem=1=0

  • fonts-conda-forge=1=0

  • freetype=2.12.1=hca18f0e_0

  • fsspec=2022.8.2=pyhd8ed1ab_0

  • gettext=0.21.1=h27087fc_0

  • giflib=5.2.1=h36c2ea0_2

  • glib=2.74.1=h6239696_0

  • glib-tools=2.74.1=h6239696_0

  • gmp=6.2.1=h58526e2_0

  • gst-plugins-base=1.20.3=h57caac4_2

  • gstreamer=1.20.3=hd4edc92_2

  • hdf5=1.12.2=nompi_h2386368_100

  • heapdict=1.0.1=py_0

  • hyperspy-base=1.7.3=py310h5764c6d_0

  • hyperspy-gui-ipywidgets=1.5.0=pyhd8ed1ab_0

  • hyperspy-gui-traitsui=1.5.2=pyhd8ed1ab_0

  • icu=70.1=h27087fc_0

  • idna=3.4=pyhd8ed1ab_0

  • imageio=2.22.0=pyhfa7a67d_0

  • importlib_metadata=4.11.4=hd8ed1ab_0

  • importlib_resources=5.10.0=pyhd8ed1ab_0

  • ipyfilechooser=0.6.0=pyhd8ed1ab_0

  • ipykernel=6.16.0=pyh210e3f2_0

  • ipyparallel=8.4.1=pyhd8ed1ab_0

  • ipython=8.5.0=pyh41d4057_1

  • ipywidgets=8.0.2=pyhd8ed1ab_1

  • jack=1.9.21=h2a1e645_0

  • jedi=0.18.1=pyhd8ed1ab_2

  • jinja2=3.1.2=pyhd8ed1ab_1

  • joblib=1.2.0=pyhd8ed1ab_0

  • jpeg=9e=h166bdaf_2

  • jsonschema=4.17.0=pyhd8ed1ab_0

  • jupyter_client=7.3.5=pyhd8ed1ab_0

  • jupyter_core=4.11.1=py310hff52083_0

  • jupyterlab_widgets=3.0.3=pyhd8ed1ab_0

  • jxrlib=1.1=h7f98852_2

  • keyutils=1.6.1=h166bdaf_0

  • krb5=1.19.3=h3790be6_0

  • lame=3.100=h166bdaf_1003

  • lcms2=2.12=hddcbb42_0

  • ld_impl_linux-64=2.36.1=hea4e1c9_2

  • lerc=4.0.0=h27087fc_0

  • libaec=1.0.6=h9c3ff4c_0

  • libavif=0.10.1=h5cdd6b5_2

  • libblas=3.9.0=16_linux64_openblas

  • libbrotlicommon=1.0.9=h166bdaf_7

  • libbrotlidec=1.0.9=h166bdaf_7

  • libbrotlienc=1.0.9=h166bdaf_7

  • libcap=2.66=ha37c62d_0

  • libcblas=3.9.0=16_linux64_openblas

  • libclang=14.0.6=default_hc1a23ef_0

  • libclang13=14.0.6=default_h31cde19_0

  • libcups=2.3.3=h3e49a29_2

  • libcurl=7.83.1=h7bff187_0

  • libdb=6.2.32=h9c3ff4c_0

  • libdeflate=1.14=h166bdaf_0

  • libedit=3.1.20191231=he28a2e2_2

  • libev=4.33=h516909a_1

  • libevent=2.1.10=h9b69904_4

  • libffi=3.4.2=h7f98852_5

  • libflac=1.4.2=h27087fc_0

  • libgcc-ng=12.1.0=h8d9b700_16

  • libgfortran-ng=12.1.0=h69a702a_16

  • libgfortran5=12.1.0=hdcd56e2_16

  • libglib=2.74.1=h7a41b64_0

  • libgomp=12.1.0=h8d9b700_16

  • libiconv=1.17=h166bdaf_0

  • liblapack=3.9.0=16_linux64_openblas

  • libllvm11=11.1.0=he0ac6c6_4

  • libllvm14=14.0.6=he0ac6c6_0

  • libnghttp2=1.47.0=hdcd2b5c_1

  • libnsl=2.0.0=h7f98852_0

  • libogg=1.3.4=h7f98852_1

  • libopenblas=0.3.21=pthreads_h78a6416_3

  • libopus=1.3.1=h7f98852_1

  • libpng=1.6.38=h753d276_0

  • libpq=14.5=hd77ab85_1

  • libsndfile=1.1.0=h27087fc_0

  • libsodium=1.0.18=h36c2ea0_1

  • libsqlite=3.39.3=h753d276_0

  • libssh2=1.10.0=haa6b8db_3

  • libstdcxx-ng=12.1.0=ha89aaad_16

  • libtiff=4.4.0=h55922b4_4

  • libtool=2.4.6=h9c3ff4c_1008

  • libudev1=252=h166bdaf_0

  • libuuid=2.32.1=h7f98852_1000

  • libvorbis=1.3.7=h9c3ff4c_0

  • libwebp-base=1.2.4=h166bdaf_0

  • libxcb=1.13=h7f98852_1004

  • libxkbcommon=1.0.3=he3ba5ed_0

  • libxml2=2.10.3=h7463322_0

  • libzlib=1.2.13=h166bdaf_4

  • libzopfli=1.0.3=h9c3ff4c_0

  • link-traits=1.0.3=pyhd8ed1ab_3

  • locket=1.0.0=pyhd8ed1ab_0

  • lz4-c=1.9.3=h9c3ff4c_1

  • matplotlib-base=3.6.1=py310h8d5ebf3_1

  • matplotlib-inline=0.1.6=pyhd8ed1ab_0

  • mpc=1.2.1=h9f54685_0

  • mpfr=4.1.0=h9202a9a_1

  • mpg123=1.30.2=h27087fc_1

  • mpmath=1.2.1=pyhd8ed1ab_0

  • mrcz=0.5.6=pyh9f0ad1d_1

  • msgpack-python=1.0.4=py310hbf28c38_0

  • munkres=1.1.4=pyh9f0ad1d_0

  • mysql-common=8.0.31=haf5c9bc_0

  • mysql-libs=8.0.31=h28c427c_0

  • natsort=8.2.0=pyhd8ed1ab_0

  • nbformat=5.7.0=pyhd8ed1ab_0

  • ncurses=6.3=h27087fc_1

  • nest-asyncio=1.5.6=pyhd8ed1ab_0

  • networkx=2.8.7=pyhd8ed1ab_0

  • nomkl=1.0=h5ca1d4c_0

  • nspr=4.32=h9c3ff4c_1

  • nss=3.78=h2350873_0

  • openjpeg=2.5.0=h7d73246_1

  • openssl=1.1.1s=h166bdaf_0

  • packaging=21.3=pyhd8ed1ab_0

  • parso=0.8.3=pyhd8ed1ab_0

  • partd=1.3.0=pyhd8ed1ab_0

  • pcre2=10.37=hc3806b6_1

  • pexpect=4.8.0=pyh9f0ad1d_2

  • pickleshare=0.7.5=py_1003

  • pint=0.19.2=pyhd8ed1ab_0

  • pip=22.2.2=pyhd8ed1ab_0

  • pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0

  • ply=3.11=py_1

  • prettytable=3.4.1=pyhd8ed1ab_0

  • prompt-toolkit=3.0.31=pyha770c72_0

  • pthread-stubs=0.4=h36c2ea0_1001

  • ptyprocess=0.7.0=pyhd3deb0d_0

  • pulseaudio=14.0=habe0971_10

  • pure_eval=0.2.2=pyhd8ed1ab_0

  • pycparser=2.21=pyhd8ed1ab_0

  • pyface=7.4.2=pyhd8ed1ab_0

  • pygments=2.13.0=pyhd8ed1ab_0

  • pyopenssl=22.0.0=pyhd8ed1ab_1

  • pyparsing=3.0.9=pyhd8ed1ab_0

  • pyqt=5.15.7=py310h29803b5_2

  • pysocks=1.7.1=pyha2e5f31_6

  • python=3.10.6=h582c2e5_0_cpython

  • python-blosc=1.10.6=py310h769672d_1

  • python-dateutil=2.8.2=pyhd8ed1ab_0

  • python-fastjsonschema=2.16.2=pyhd8ed1ab_0

  • python_abi=3.10=2_cp310

  • pytz=2022.6=pyhd8ed1ab_0

  • pyusid=0.0.10=pyhd8ed1ab_0

  • qt-main=5.15.6=hc525480_0

  • readline=8.1.2=h0f457ee_0

  • requests=2.28.1=pyhd8ed1ab_1

  • setuptools=65.4.0=pyhd8ed1ab_0

  • sidpy=0.11=pyhd8ed1ab_0

  • six=1.16.0=pyh6c4a22f_0

  • snappy=1.1.9=hbd366e4_1

  • sortedcontainers=2.4.0=pyhd8ed1ab_0

  • sparse=0.13.0=pyhd8ed1ab_0

  • sqlite=3.39.3=h5082296_0

  • stack_data=0.5.1=pyhd8ed1ab_0

  • tblib=1.7.0=pyhd8ed1ab_0

  • threadpoolctl=3.1.0=pyh8a188c0_0

  • tifffile=2022.8.12=pyhd8ed1ab_0

  • tk=8.6.12=h27826a3_0

  • toml=0.10.2=pyhd8ed1ab_0

  • toolz=0.12.0=pyhd8ed1ab_0

  • tqdm=4.64.1=pyhd8ed1ab_0

  • traitlets=5.4.0=pyhd8ed1ab_0

  • traitsui=7.4.1=pyhd8ed1ab_0

  • typing-extensions=4.3.0=hd8ed1ab_0

  • typing_extensions=4.3.0=pyha770c72_0

  • tzdata=2022d=h191b570_0

  • urllib3=1.26.11=pyhd8ed1ab_0

  • wcwidth=0.2.5=pyh9f0ad1d_2

  • wheel=0.37.1=pyhd8ed1ab_0

  • widgetsnbextension=4.0.3=pyhd8ed1ab_0

  • xcb-util=0.4.0=h166bdaf_0

  • xcb-util-image=0.4.0=h166bdaf_0

  • xcb-util-keysyms=0.4.0=h166bdaf_0

  • xcb-util-renderutil=0.3.9=h166bdaf_0

  • xcb-util-wm=0.4.1=h166bdaf_0

  • xorg-libxau=1.0.9=h7f98852_0

  • xorg-libxdmcp=1.1.3=h7f98852_0

  • xz=5.2.6=h166bdaf_0

  • yaml=0.2.5=h7f98852_2

  • zarr=2.13.2=pyhd8ed1ab_1

  • zeromq=4.3.4=h9c3ff4c_1

  • zfp=1.0.0=h27087fc_1

  • zict=2.2.0=pyhd8ed1ab_0

  • zipp=3.8.1=pyhd8ed1ab_0

  • zlib=1.2.13=h166bdaf_4

  • zlib-ng=2.0.6=h166bdaf_0

  • zstd=1.5.2=h6239696_4

  • pip:

    • blosc==1.10.6
    • brotlipy==0.7.0
    • cffi==1.15.1
    • contourpy==1.0.5
    • cryptography==37.0.4
    • cytoolz==0.12.0
    • debugpy==1.6.3
    • fonttools==4.37.4
    • gmpy2==2.1.2
    • h5py==3.7.0
    • hyperspy==1.7.3
    • imagecodecs==2022.9.26
    • importlib-metadata==4.11.4
    • jupyter-core==4.11.1
    • kiwisolver==1.4.4
    • llvmlite==0.39.1
    • markupsafe==2.1.1
    • matplotlib==3.6.1
    • mrcfile==1.4.3
    • msgpack==1.0.4
    • numba==0.56.2
    • numcodecs==0.10.2
    • numexpr==2.8.3
    • numpy==1.23.3
    • pandas==1.5.1
    • pillow==9.2.0
    • psutil==5.9.2
    • pypng==0.20220715.0
    • pyqt5==5.15.7
    • pyqt5-sip==12.11.0
    • pyrsistent==0.19.1
    • pywavelets==1.3.0
    • pyyaml==6.0
    • pyzmq==24.0.1
    • scikit-image==0.19.3
    • scikit-learn==1.1.3
    • scipy==1.9.1
    • sip==6.7.3
    • sympy==1.11.1
    • tornado==6.2
    • traits==6.4.1
    • unicodedata2==14.0.0
      prefix: /home/hbrown/anaconda3/envs/hyperspy

Discussions

Could the GitHub discussions tab be enabled for this repository? Some topics are better suited to discussions and could take place outside of issues.

Automatically make PR's for `black`

Describe the functionality you would like to see.

In pyxem we recently added a GitHub action which automatically pushes a PR when the code base doesn't follow the most recent black specification (pyxem/pyxem#914). The idea is that if someone merges some code without applying black, we can easily remedy that by merging the auto-created PR (see pyxem/pyxem#915 for what those look like).

Is this something that would be helpful here?

Describe the context

There are a couple of reasons for this addition. Mainly, it helps when new contributors who aren't familiar with black try to add code. While eventually you want people to run black locally, there are cases where merging the code and then merging the automatically contributed PR is easier than trying to get someone to run black just to get the code working.

It will also create PRs when new versions of black are released, helping to keep on top of those changes.

set_log_level for rosettasciio is needed

Describe the functionality you would like to see.

Like HyperSpy's hs.set_log_level function, a rosettasciio.set_log_level is needed.
Maybe hs.set_log_level should also control rsciio.set_log_level.
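Until such a function exists, the standard logging module can be used, assuming all RosettaSciIO loggers are created via logging.getLogger(__name__) and so share the "rsciio" prefix:

```python
import logging

# Workaround sketch: control all loggers under the "rsciio" namespace
# with the standard logging module (assuming the package names its
# loggers via logging.getLogger(__name__)).
logging.getLogger("rsciio").setLevel(logging.WARNING)
level = logging.getLogger("rsciio").getEffectiveLevel()
```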

Error when writing two-dimensional signal in mrcz format

Describe the bug

When writing a 2-dimensional signal in the mrcz format, an
IndexError exception is raised in rsciio/mrcz/api.py.

>   pixelsize = [signal["axes"][I]["scale"] for I in _WRITE_ORDER]
E   IndexError: list index out of range

To Reproduce

Steps to reproduce the behavior:

import numpy as np
from hyperspy.signals import Signal2D

s = Signal2D(np.arange(256, dtype=np.uint8).reshape(16, 16))
s.save("test.mrcz")

Expected behavior

The current mrcz/specifications.yaml lists
writes: [[2, 0], [2, 1], [2, 2], [3, 0]]
but it seems that only 3-dimensional data is accepted.
Should writes be [[3, 0]]?
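The failure mode can be illustrated with a standalone sketch; the [0, 1, 2] write order below is assumed for illustration only, not copied from the actual api.py:

```python
# Standalone illustration of the failing list comprehension in the writer.
# _WRITE_ORDER is assumed here to index three axes; the real value may differ.
_WRITE_ORDER = [0, 1, 2]

# A 2-dimensional signal only provides two axis dictionaries.
axes_2d = [{"scale": 1.0}, {"scale": 1.0}]

try:
    pixelsize = [axes_2d[i]["scale"] for i in _WRITE_ORDER]
except IndexError as err:
    print("IndexError:", err)  # index 2 has no matching axis for 2D data
```

Restricting the supported dimensions in specifications.yaml (or padding 2D data to 3D before writing) would avoid the out-of-range index.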

Python environment:

  • RosettaSciIO version: main/HEAD
  • Python version: 3.8
  • HyperSpy version: RELEASE_next_major/HEAD

Additional context

Support for the MSA / MAS / AMAS Hyper-Dimensional Data file?

While looking for the specification/documentation of the msa format, I found this pure Python package, which implements reading/writing MSA / MAS / AMAS Hyper-Dimensional data files. Although the format looks interesting, I don't know how many people use it.
Since it is a pure Python package available on PyPI, it should be fairly straightforward to add an
IO plugin to HyperSpy.

https://github.com/pyhmsa/pyhmsa

http://www.csiro.au/luminescence/HMSA/index.html
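For context, a RosettaSciIO-style plugin mainly needs a file_reader returning a list of signal dictionaries. The sketch below shows that shape with dummy data; the HMSA parsing itself (e.g. via pyhmsa) is omitted, and the axis fields follow the usual convention:

```python
import numpy as np

def file_reader(filename, **kwds):
    # Placeholder: a real implementation would parse the HMSA
    # XML/binary file pair here instead of returning dummy data.
    data = np.zeros((4, 4))
    axes = [
        {"name": "y", "size": 4, "scale": 1.0, "offset": 0.0, "units": ""},
        {"name": "x", "size": 4, "scale": 1.0, "offset": 0.0, "units": ""},
    ]
    return [
        {"data": data, "axes": axes, "metadata": {}, "original_metadata": {}}
    ]

signals = file_reader("example.hmsa")
print(signals[0]["data"].shape)  # (4, 4)
```

The actual mapping from pyhmsa's data model to this dictionary structure would be the bulk of the plugin work.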
