Universal Spectroscopy and Imaging Data (h5USID)#
Background#
Universal Spectroscopy and Imaging Data
(USID) is an open, community-driven, self-describing, and standardized schema for
representing imaging and spectroscopy data of any size, dimensionality, precision,
instrument of origin, or modality. USID data is typically stored in
Hierarchical Data Format Files (HDF5) and the combination of USID within .hdf5
files is referred to as h5USID.
pyUSID provides a convenient interface to I/O operations on such h5USID files. USID (via pyUSID) forms the foundation for other materials microscopy scientific python package called pycroscopy. If you have any questions regarding this module, please consider contacting the developers of pyUSID.
Also see the HDF5 utility functions for inspecting HDF5 files.
Note
h5USID files can contain multiple USID datasets within the same file. RosettaSciIO supports reading in one or more USID datasets.
Note
When writing files with this plugin, the model and other secondary data artifacts linked to the signal are not written to the file but these can be implemented at a later stage.
Requirements#
Reading and writing h5USID files requires the installation of pyUSID.
In HyperSpy, files must use the .h5
file extension
in order to use this IO
plugin or the reader=usid_hdf5
parameter of the load
function needs to be
set explicitly. Otherwise, using the .hdf5
extension will default to the
HyperSpy plugin.
API functions#
- rsciio.usid.file_reader(filename, lazy=False, dataset_path=None, ignore_non_uniform_dims=True)#
Read a USID Main dataset present in an HDF5 file into a HyperSpy Signal.
- Parameters:
- filename
str
,pathlib.Path
Filename of the file to read or corresponding pathlib.Path.
- lazybool, default=False
Whether to open the file lazily or not.
- dataset_path
str
, optional Absolute path of USID Main HDF5 dataset. Default is
None
- all Main Datasets will be read. Given that HDF5 files can accommodate very large datasets, lazy reading is strongly recommended. If a string like"/Measurement_000/Channel_000/My_Dataset"
is provided, the specific dataset will be loaded.- ignore_non_uniform_dimsbool, optional
If
True
(default), parameters that were varied non-uniformly in the desired dataset will result in Exceptions. Else, all such non-uniformly varied parameters will be treated as uniformly varied parameters and a Signal object will be generated.
- filename
- Returns:
list
ofdict
List of dictionaries containing the following fields:
‘data’ – multidimensional
numpy.ndarray
ordask.array.Array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the parsed metadata
‘original_metadata’ – dictionary containing the full metadata tree from the input file
- rsciio.usid.file_writer(filename, signal, **kwds)#
Write a HyperSpy Signal object to a HDF5 file formatted according to USID.
- Parameters:
- filename
str
,pathlib.Path
Filename of the file to write to or corresponding pathlib.Path.
- signal
dict
Dictionary containing the signal object. Should contain the following fields:
‘data’ – multidimensional numpy array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the metadata tree
- **kwds
dict
, optional All other keyword arguments will be passed to
pyUSID.io.hdf_utils.model.write_main_dataset()
.
- filename
Usage examples#
Reading the sole dataset within a h5USID file using HyperSpy:
>>> import hyperspy.api as hs
>>> hs.load("sample.h5")
<Signal2D, title: HAADF, dimensions: (|128, 128)>
If multiple datasets are present within the h5USID file, all available datasets will be loaded.
Note
Given that HDF5 files can accommodate very large datasets, setting lazy=True
is strongly recommended if the contents of the HDF5 file are not known apriori.
This prevents issues with regard to loading datasets far larger than memory.
Also note that setting lazy=True
leaves the file handle to the HDF5 file open.
If it is important that the files be closed after reading, set lazy=False
.
>>> hs.load("sample.h5")
[<Signal2D, title: HAADF, dimensions: (|128, 128)>,
<Signal1D, title: EELS, dimensions: (|64, 64, 1024)>]
We can load a specific dataset using the dataset_path
keyword argument.
Setting it to the absolute path of the desired dataset will cause the single
dataset to be loaded.
>>> # Loading a specific dataset
>>> hs.load("sample.h5", dataset_path='/Measurement_004/Channel_003/Main_Data')
<Signal2D, title: HAADF, dimensions: (|128, 128)>
h5USID files support the storage of HDF5 dataset with compound data types. As an (oversimplified) example, one could store a color image using a compound data type that allows each color channel to be accessed by name rather than an index. Naturally, reading in such a compound dataset into HyperSpy will result in a separate signal for each named component in the dataset:
>>> hs.load("file_with_a_compound_dataset.h5")
[<Signal2D, title: red, dimensions: (|128, 128)>,
Signal2D, title: blue, dimensions: (|128, 128)>,
Signal2D, title: green, dimensions: (|128, 128)>]
h5USID files also support parameters or dimensions that have been varied non-uniformly. Currently, the reading of non-uniform axes is not implemented in RosettaSciIO, the USID plugin will default to a warning when it encounters a parameter that has been varied non-uniformly:
>>> hs.load("sample.h5")
UserWarning: Ignoring non-uniformity of dimension: Bias
<BaseSignal, title: , dimensions: (|7, 3, 5, 2)>
In order to prevent accidental misinterpretation of information downstream, the keyword argument
ignore_non_uniform_dims
can be set to False
which will result in a ValueError
instead.
>>> hs.load("sample.h5")
ValueError: Cannot load provided dataset. Parameter: Bias was varied non-uniformly.
Supply keyword argument "ignore_non_uniform_dims=True" to ignore this error