ZSpy - HyperSpy’s Zarr Specification

Like the hspy format, the .zspy format guarantees that no information is lost in the writing process and supports saving data of arbitrary dimensions. It is based on the Zarr project, which exists as a drop-in replacement for HDF5 intended to fix some of the speed and scaling issues of the HDF5 format, and is therefore suitable for saving big data. Example using HyperSpy:

>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal([0])
>>> s.save('test.zspy') # will save in nested directory
>>> hs.load('test.zspy') # loads the directory

When saving to zspy, all supported objects in the signal’s hyperspy.api.signals.BaseSignal.metadata are stored. This includes lists, tuples and signals. Note that, to increase saving efficiency and speed, the innermost structures are converted to numpy arrays where possible. This conversion homogenizes the types of the objects inside, most notably casting numbers to strings if any strings are present.
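The casting described above is standard NumPy behaviour when an array is built from mixed values. The following is a minimal sketch using plain NumPy rather than HyperSpy's own conversion code; the names `mixed` and `as_array` and the values are illustrative only:

```python
import numpy as np

# A metadata-like list mixing numbers and a string (illustrative values).
mixed = [1, 2.5, "a"]

# NumPy chooses one common dtype for the whole array; because a string is
# present, every element ends up as a string.
as_array = np.array(mixed)

print(as_array.dtype.kind)  # "U" denotes a unicode string dtype
print(as_array.tolist())
```

If numeric values must keep their type, one option is to store them in a separate metadata entry that contains no strings.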

By default, a zarr.storage.NestedDirectoryStore is used, but another zarr store can be used by passing a zarr.storage store instance instead of a filename to the hyperspy.api.signals.BaseSignal.save() or hyperspy.api.load() function. A .zspy file saved with a different store needs to be loaded by passing a store of the same type:

>>> import zarr
>>> filename = 'test.zspy'
>>> store = zarr.LMDBStore(filename)
>>> s.save(store) # saved to LMDB

To load this file again:

>>> import zarr
>>> filename = 'test.zspy'
>>> store = zarr.LMDBStore(filename)
>>> s = hs.load(store) # load from LMDB

API functions

rsciio.zspy.file_reader(filename, lazy=False, **kwds)

Read data from zspy files saved with the HyperSpy zarr format specification.

Parameters:
filename : str, pathlib.Path

Filename of the file to read or corresponding pathlib.Path.

lazy : bool, default=False

Whether to open the file lazily or not.

**kwds : dict, optional

Pass keyword arguments to the zarr.convenience.open() function.

Returns:
list of dict

List of dictionaries containing the following fields:

  • ‘data’ – multidimensional numpy.ndarray or dask.array.Array

  • ‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector

  • ‘metadata’ – dictionary containing the parsed metadata

  • ‘original_metadata’ – dictionary containing the full metadata tree from the input file
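To illustrate the two axis layouts listed above, the sketch below rebuilds the full axis vector from a single axis dictionary. It assumes the usual linear convention `offset + scale * index` for the ‘size’/‘offset’/‘scale’ form. The helper `axis_vector` and both example dictionaries are hypothetical and not part of rsciio:

```python
import numpy as np

def axis_vector(axis_dict):
    """Return the axis values for one entry of the 'axes' list."""
    if "axis" in axis_dict:
        # Non-uniform axis: the full vector is stored directly.
        return np.asarray(axis_dict["axis"])
    # Uniform axis: assumed to follow offset + scale * index.
    index = np.arange(axis_dict["size"])
    return axis_dict["offset"] + axis_dict["scale"] * index

# Hand-written dictionaries in the documented shape (illustrative values).
uniform = {"name": "x", "units": "nm", "index_in_array": 0,
           "size": 4, "offset": 10.0, "scale": 0.5}
explicit = {"name": "t", "units": "s", "index_in_array": 1,
            "axis": np.array([0.0, 1.0, 4.0])}

print(axis_vector(uniform))
print(axis_vector(explicit))
```

In real use, the dictionaries would come from the ‘axes’ entry of each item returned by file_reader rather than being written by hand.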

rsciio.zspy.file_writer(filename, signal, chunks=None, compressor=None, close_file=True, write_dataset=True, show_progressbar=True, **kwds)

Write data to HyperSpy’s zarr format.

Parameters:
filename : str, pathlib.Path

Filename of the file to write to or corresponding pathlib.Path.

signal : dict

Dictionary containing the signal object. Should contain the following fields:

  • ‘data’ – multidimensional numpy array

  • ‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector

  • ‘metadata’ – dictionary containing the metadata tree

chunks : tuple of int or None, default=None

Define the chunking used for saving the dataset. If None, calculates chunks for the signal, with preferably at least one chunk per signal space.
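To make the chunks argument concrete: the tuple has one entry per data dimension, and each dimension is split into `ceil(size / chunk)` pieces. The sketch below uses a hypothetical helper, `count_chunks`, with illustrative shapes. It is not the heuristic applied when chunks is None:

```python
import math

def count_chunks(shape, chunks):
    """Number of chunks along each dimension for a given chunk shape."""
    if len(shape) != len(chunks):
        raise ValueError("chunks needs one entry per data dimension")
    return tuple(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))

# 64 x 64 navigation positions, each holding a 1024-channel spectrum.
shape = (64, 64, 1024)
# Keep each spectrum whole and group 16 x 16 positions per chunk.
chunks = (16, 16, 1024)

per_dim = count_chunks(shape, chunks)
print(per_dim, math.prod(per_dim))
```

Keeping the signal dimension whole in each chunk, as here, means that reading one spectrum touches a single chunk.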

compressor : numcodecs.abc.Codec or None, default=None

A compressor can be passed to the save function to compress the data efficiently; see the Numcodecs codecs. If None, a Blosc compressor is used.

close_file : bool, default=True

Close the file after writing. Only relevant for zarr stores that must be closed to flush data to disk (zarr.storage.ZipStore, zarr.storage.DBMStore). If False, the file is not closed after writing. The file should not be closed if the data needs to be accessed lazily after saving.

write_dataset : bool, default=True

If False, doesn’t write the dataset when writing the file. This can be useful to overwrite signal attributes only (for example axes_manager) without having to write the whole dataset, which can take time.

show_progressbar : bool, default=True

Whether to show the progress bar or not.

**kwds

The keyword arguments are passed to the zarr.hierarchy.Group.require_dataset() function.

Examples

>>> from numcodecs import Blosc
>>> from rsciio.zspy import file_writer
>>> compressor = Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE) # Used by default
>>> file_writer('test.zspy', s, compressor=compressor) # s is the signal dictionary; saves with Blosc compression

Note

Lazy operations are often I/O bound. Reading and writing the data creates a bottleneck because of the slow read/write speed of many hard disks. In these cases, compressing your data often speeds up some operations: there is less to read and write, at the cost of slightly more computational work on the CPU.
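The trade-off can be seen with a few lines of standard-library Python. This sketch uses `zlib` as a stand-in for the Blosc compressor, an assumption made only to keep the example free of dependencies. The payload is synthetic, and the ratio achieved on real data depends entirely on that data:

```python
import zlib

# Synthetic, highly repetitive payload standing in for detector data (1 MiB).
raw = bytes(range(256)) * 4096

# A low compression level keeps the CPU cost small while still reducing
# the number of bytes that would have to be written to disk.
compressed = zlib.compress(raw, 1)

print(len(raw), len(compressed))

# Compression is lossless: decompressing restores the original bytes.
restored = zlib.decompress(compressed)
```

Noisy experimental data usually compresses far less than this repetitive payload, so it is worth measuring on a representative dataset before settling on a compressor.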