ZSpy - HyperSpy’s Zarr Specification#
Similarly to the hspy format, the .zspy
format guarantees that no
information will be lost in the writing process and supports saving data
of arbitrary dimensions. It is based on the Zarr project, which exists as a drop-in
replacement for HDF5 with the intention of fixing some of the speed and scaling
issues of the HDF5 format, and is therefore suitable for saving
big data. Example using HyperSpy:
>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal([0])
>>> s.save('test.zspy') # will save in nested directory
>>> hs.load('test.zspy') # loads the directory
When saving to zspy, all supported objects in the signal’s
hyperspy.api.signals.BaseSignal.metadata
are stored. This includes lists, tuples and signals.
Please note that, in order to increase saving efficiency and speed, the
inner-most structures are converted to numpy arrays when possible. This
procedure homogenizes the types of the objects inside, most notably casting
numbers to strings if any other strings are present.
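A minimal illustration of this homogenization using plain NumPy (this mirrors what happens to the inner-most structures on save; it is not HyperSpy internals):

```python
import numpy as np

# When a list contains any string, NumPy promotes every element to a
# string dtype -- the homogenization described above.
mixed = [1, 2.5, 'a']
arr = np.array(mixed)
print(arr.dtype.kind)   # 'U' (unicode string dtype)
print(arr.tolist())     # ['1', '2.5', 'a']

# A purely numeric list keeps a numeric dtype.
numeric = np.array([1, 2.5])
print(numeric.dtype.kind)  # 'f'
```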
By default, a zarr.storage.NestedDirectoryStore
is used, but other
zarr stores can be used by providing a zarr.storage
object as argument to the hyperspy.api.signals.BaseSignal.save()
or the
hyperspy.api.load()
function. If a .zspy
file has been saved with a different
store, it needs to be loaded by passing a store of the same type:
>>> import zarr
>>> filename = 'test.zspy'
>>> store = zarr.LMDBStore(filename)
>>> signal.save(store) # saved to LMDB
To load this file again:
>>> import zarr
>>> filename = 'test.zspy'
>>> store = zarr.LMDBStore(filename)
>>> s = hs.load(store) # load from LMDB
API functions#
- rsciio.zspy.file_reader(filename, lazy=False, **kwds)#
Read data from zspy files saved with the HyperSpy zarr format specification.
- Parameters:
- filename : str, pathlib.Path
Filename of the file to read or corresponding pathlib.Path.
- lazy : bool, default=False
Whether to open the file lazily or not.
- **kwds : dict, optional
Keyword arguments passed to the zarr.convenience.open() function.
- Returns:
list of dict
List of dictionaries containing the following fields:
‘data’ – multidimensional numpy.ndarray or dask.array.Array
‘axes’ – list of dictionaries describing the axes, containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the parsed metadata
‘original_metadata’ – dictionary containing the full metadata tree from the input file
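As a sketch of the documented return structure, here is a hand-built dictionary following the field names listed above (no file is actually read; the values are placeholders, not real data):

```python
import numpy as np

# A dictionary matching the documented file_reader return schema;
# field names come from the docs above, values are illustrative.
signal_dict = {
    'data': np.zeros((4, 8)),
    'axes': [
        {'name': 'y', 'units': 'nm', 'index_in_array': 0,
         'size': 4, 'offset': 0.0, 'scale': 1.0},
        {'name': 'x', 'units': 'nm', 'index_in_array': 1,
         'size': 8, 'offset': 0.0, 'scale': 0.5},
    ],
    'metadata': {},
    'original_metadata': {},
}

# Each axis dictionary maps onto one dimension of 'data'.
shape = tuple(ax['size'] for ax in signal_dict['axes'])
print(shape)  # (4, 8)
print(signal_dict['data'].shape == shape)  # True
```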
- rsciio.zspy.file_writer(filename, signal, chunks=None, compressor=None, close_file=True, write_dataset=True, show_progressbar=True, **kwds)#
Write data to HyperSpy’s zarr format.
- Parameters:
- filename : str, pathlib.Path
Filename of the file to write to or corresponding pathlib.Path.
- signal : dict
Dictionary containing the signal object. Should contain the following fields:
‘data’ – multidimensional numpy array
‘axes’ – list of dictionaries describing the axes, containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the metadata tree
- chunks : tuple of int or None, default=None
Define the chunking used for saving the dataset. If None, calculates chunks for the signal, with preferably at least one chunk per signal space.
- compressor : numcodecs.abc.Codec or None, default=None
A compressor can be passed to the save function to compress the data efficiently, see Numcodecs codec. If None, use a Blosc compressor.
- close_file : bool, default=True
Close the file after writing. Only relevant for some zarr storages (zarr.storage.ZipStore, zarr.storage.DBMStore) requiring the store to flush data to disk. If False, doesn’t close the file after writing. The file should not be closed if the data needs to be accessed lazily after saving.
- write_dataset : bool, default=True
If False, doesn’t write the dataset when writing the file. This can be useful to overwrite signal attributes only (for example axes_manager) without having to write the whole dataset, which can take time.
- show_progressbar : bool, default=True
Whether to show the progressbar or not.
- **kwds
The keyword arguments are passed to the zarr.hierarchy.Group.require_dataset() function.
Examples
>>> from numcodecs import Blosc
>>> compressor = Blosc(cname='zstd', clevel=1, shuffle=Blosc.SHUFFLE) # used by default
>>> file_writer('test.zspy', s, compressor=compressor) # will save with Blosc compression
Note
Lazy operations are often I/O bound: reading and writing the data creates a bottleneck due to the slow read/write speed of many hard disks. In these cases, compressing your data often speeds up operations, since there is less to read and write, with the trade-off of slightly more computational work on the CPU.
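The trade-off above can be sketched with the standard-library zlib codec (zspy itself uses a Blosc compressor by default, not zlib; this is only an illustration of why compression reduces I/O):

```python
import zlib
import numpy as np

# Highly compressible data shrinks dramatically, so far fewer bytes
# cross the slow disk, at the cost of some CPU time spent compressing.
data = np.zeros((1000, 1000), dtype=np.float64)
raw = data.tobytes()
packed = zlib.compress(raw, level=1)
print(len(raw), '->', len(packed), 'bytes')
```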