MRCZ format

Note

To read this format, the optional dependencies blosc and mrcz are required.

The mrcz format is an extension of the CCP-EM MRC2014 file format. It uses the blosc meta-compression library to bitshuffle and compress files in a blocked, multi-threaded environment. The supported data types are float32, int8, uint16, int16 and complex64.
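
As a minimal sketch of how the list of supported data types might be checked before writing, the helper below compares a dtype name against that list. The name `is_mrcz_dtype` is hypothetical and not part of rsciio or mrcz; it accepts either a dtype name string or any object with a `dtype` attribute, such as a NumPy array.

```python
# Data types listed above as supported by the MRCZ format.
SUPPORTED_DTYPES = frozenset({"float32", "int8", "uint16", "int16", "complex64"})


def is_mrcz_dtype(obj):
    """Return True if obj has one of the listed data types.

    Hypothetical helper, not part of rsciio or mrcz. `obj` is either a
    dtype name such as "float32" or an object exposing a `.dtype`
    attribute (for example a NumPy array).
    """
    # Arrays expose .dtype; plain strings are used as they are.
    name = str(getattr(obj, "dtype", obj))
    return name in SUPPORTED_DTYPES


print(is_mrcz_dtype("float32"))
print(is_mrcz_dtype("float64"))
```

Data of another type, such as float64, would need to be converted to one of the listed types by the caller; which conversion is appropriate depends on the data.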

It supports arbitrary metadata, which is serialized into JSON.
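
To illustrate what JSON serialization implies for metadata, the sketch below round-trips a nested dictionary through the standard `json` module. The keys and values are invented for illustration, and this is not a description of how mrcz serializes metadata internally; it only shows the constraints that any JSON encoding carries.

```python
import json

# Hypothetical metadata tree; the keys are illustrative only.
metadata = {
    "General": {"title": "tilt series"},
    "Acquisition": {"voltage_kV": 300.0, "tilt_angles": [-60, 0, 60]},
}

serialized = json.dumps(metadata)   # dict -> JSON text
restored = json.loads(serialized)   # JSON text -> dict

# Dicts, lists, strings and numbers survive the round trip unchanged.
print(restored == metadata)

# JSON has no tuple type, so tuples come back as lists.
print(json.loads(json.dumps({"shape": (4, 4)})))
```

Values that JSON cannot represent directly (for example arbitrary Python objects) would need to be converted to plain types first.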

MRCZ also supports asynchronous reads and writes.

More information on the `mrcz` format
Repository	em-MRCZ
PyPI	https://pypi.org/project/mrcz
Citation	https://doi.org/10.1016/j.jsb.2017.11.012
Preprint	https://www.biorxiv.org/content/10.1101/116533v1

Support for this format is not enabled by default. To enable it, install the mrcz library; to use compression, also install blosc.
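
One way to check whether the optional dependencies are available in the current environment is sketched below, using only the standard library. The function name `missing_optional_dependencies` is hypothetical, and the sketch assumes the import names are `mrcz` and `blosc`.

```python
from importlib.util import find_spec


def missing_optional_dependencies(names=("mrcz", "blosc")):
    """Return the top-level module names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


missing = missing_optional_dependencies()
if missing:
    print("Install to enable MRCZ support:", ", ".join(missing))
else:
    print("mrcz and blosc are available")
```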

API functions

rsciio.mrcz.file_reader(filename, lazy=False, mmap_mode='c', endianess='<', **kwds)

File reader for the MRCZ format for tomographic data.

Parameters:

filename (str, pathlib.Path): Filename of the file to read or corresponding pathlib.Path.
lazy (bool, default=False): Whether to open the file lazily or not. The file will stay open until closed in compute() or closed manually. get_file_handle() can be used to access the file handler and close it manually.
mmap_mode ({None, "r+", "r", "w+", "c"}, default=None): Argument passed to numpy.memmap. A memory-mapped array is stored on disk, and not directly loaded into memory. However, it can be accessed and sliced like any ndarray. Lazy loading does not support in-place writing (i.e. lazy loading and the "r+" mode are incompatible). The MRCZ reader currently only supports C-ordering memory-maps. If None (default), the value is "r" when lazy=True, otherwise it is "c".
endianess (str, default="<"): "<" (little-endian) or ">" (big-endian), depending on the byte order used in the file.
**kwds (dict, optional): The keyword arguments are passed to mrcz.readMRC().

Returns:

list of dict

List of dictionaries containing the following fields:

‘data’ – multidimensional numpy.ndarray or dask.array.Array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the parsed metadata
‘original_metadata’ – dictionary containing the full metadata tree from the input file

When the file contains several datasets, each dataset will be loaded as a separate dictionary.
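
The structure of the 'axes' entries can be illustrated with a short sketch that turns each axis dictionary into its coordinate values. It assumes the usual linear convention for uniform axes, value = offset + scale * index, and uses a hand-written dictionary in place of real reader output. The helper name `axis_values` is hypothetical.

```python
def axis_values(axis):
    """Return the coordinates described by one axis dictionary as a list."""
    if "axis" in axis:
        # Non-uniform axis: the full axes vector is stored directly.
        return list(axis["axis"])
    # Uniform axis (assumed convention): value = offset + scale * index.
    return [axis["offset"] + axis["scale"] * i for i in range(axis["size"])]


# Hand-written stand-in for one dictionary of the returned list.
signal = {
    "data": None,  # would be a numpy.ndarray or dask.array.Array
    "axes": [
        {"name": "z", "units": "nm", "index_in_array": 0,
         "size": 3, "offset": 0.0, "scale": 2.0},
        {"name": "x", "units": "nm", "index_in_array": 1,
         "size": 4, "offset": 1.0, "scale": 0.5},
    ],
    "metadata": {},
    "original_metadata": {},
}

for axis in sorted(signal["axes"], key=lambda a: a["index_in_array"]):
    print(axis["name"], axis["units"], axis_values(axis))
```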

Examples

>>> from rsciio.mrcz import file_reader
>>> new_signal = file_reader('file.mrcz')

rsciio.mrcz.file_writer(filename, signal, endianess='<', do_async=False, compressor=None, clevel=1, n_threads=None)

Write signal to MRCZ format.

Parameters:

filename (str, pathlib.Path)

Filename of the file to write to or corresponding pathlib.Path.

signal (dict)

Dictionary containing the signal object. Should contain the following fields:

‘data’ – multidimensional numpy array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the metadata tree

endianess (str, default="<")

"<" (little-endian) or ">" (big-endian), depending on the byte order used in the file.

do_async (bool, default=False)

Currently supported within RosettaSciIO for writing only, this will save the file in a background thread and return immediately. Warning: there is no method currently implemented within RosettaSciIO to tell if an asynchronous write has finished.

compressor ({None, "zlib", "zstd", "lz4"}, default=None)

The compression codec.

clevel (int, default=1)

The compression level, an int from 1 to 9.

n_threads (int, default=None)

The number of threads to use for blosc compression. Defaults to the maximum number of virtual cores (including Intel Hyperthreading) on your system, which is recommended for best performance. If do_async = True you may wish to leave one thread free for the Python GIL.
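
Because RosettaSciIO offers no way to tell when a do_async write has finished, one possible workaround is to run a synchronous write in a thread of your own and wait on the resulting Future. The sketch below shows that pattern together with a thread count that leaves one core free. It is not how RosettaSciIO implements do_async; the helpers `choose_n_threads` and `write_in_background` are hypothetical, and `fake_writer` stands in for rsciio.mrcz.file_writer so the sketch runs without touching any file.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def choose_n_threads(background=False):
    """Pick a thread count: all logical cores, minus one for background writes."""
    n_cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, n_cores - 1) if background else n_cores


def write_in_background(executor, writer, filename, signal, **kwargs):
    """Run a synchronous write in a worker thread and return its Future."""
    # do_async=False: the executor supplies the background thread instead.
    return executor.submit(writer, filename, signal, do_async=False, **kwargs)


# Stand-in for rsciio.mrcz.file_writer (assumption: same keyword names).
calls = []


def fake_writer(filename, signal, do_async=False, n_threads=None, **kwargs):
    calls.append((filename, do_async, n_threads))


with ThreadPoolExecutor(max_workers=1) as executor:
    future = write_in_background(
        executor, fake_writer, "file.mrcz", {"data": None},
        n_threads=choose_n_threads(background=True),
    )
    future.result()  # blocks until the write finishes; re-raises any exception

print(future.done())
```

With the real writer, `fake_writer` would be replaced by `rsciio.mrcz.file_writer`; the caller must not pass do_async in the keyword arguments, since the helper sets it.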

Notes

The recommended compression codec is zstd (Zstandard) with clevel=1 for general use. If speed is critical, use lz4 (LZ4) with clevel=9. Integer data compresses more readily than floating-point data, and in general the histogram of values in the data reflects how compressible it is.

To save files that are compatible with other programs that can use MRC, such as GMS, IMOD, Relion and MotionCorr, save with compressor=None and the extension .mrc. JSON metadata will not be recognized by other MRC-supporting software but should not cause crashes.
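
The recommendations in these notes can be collected into a small lookup, sketched below. The preset names ("general", "fast", "compatible") and the helper `writer_options` are hypothetical; the sketch also assumes the .mrcz extension for compressed files, as used in the example for this function.

```python
from pathlib import Path

# Presets taken from the notes above; the preset names are illustrative.
PRESETS = {
    "general": {"compressor": "zstd", "clevel": 1},
    "fast": {"compressor": "lz4", "clevel": 9},
    "compatible": {"compressor": None},  # readable by other MRC software
}


def writer_options(filename, goal="general"):
    """Return (path, keyword arguments) suited to the given goal."""
    options = dict(PRESETS[goal])  # copy so callers cannot alter the presets
    suffix = ".mrc" if goal == "compatible" else ".mrcz"
    return Path(filename).with_suffix(suffix), options


path, options = writer_options("stack.mrcz", goal="compatible")
print(path, options)
# The result could then be passed on as: file_writer(path, signal, **options)
```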

Examples

>>> from rsciio.mrcz import file_writer
>>> file_writer('file.mrcz', signal, do_async=True, compressor='zstd', clevel=1)