Lazy loading
Data can be loaded lazily by using lazy=True; however, not all formats are supported.
The data will be loaded as a dask array instead of a numpy array.
>>> from rsciio import mrc
>>> d = mrc.file_reader("file.mrc", lazy=True)
>>> d[0]["data"]
dask.array<array, shape=(10, 20, 30), dtype=int64, chunksize=(10, 20, 30), chunktype=numpy.ndarray>
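As a minimal sketch of what working with a lazy result looks like, the example below builds a stand-in dask array with the same shape as above. The array contents and variable names are assumptions for illustration, and no file is read. It shows that operations are only recorded until compute() is called.

```python
import dask.array as da
import numpy as np

# Stand-in for lazily loaded data: a dask array wrapping an in-memory numpy array.
# In real use, the array would come from a file reader called with lazy=True.
data = da.from_array(np.ones((10, 20, 30), dtype=np.int64), chunks=(10, 20, 30))

# Operations on a dask array only build a task graph; no values are computed yet.
lazy_sum = data.sum(axis=0)

# compute() executes the graph and returns a numpy array.
result = lazy_sum.compute()
print(type(lazy_sum).__name__, type(result).__name__, result.shape)
```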
Chunks
Depending on the intended processing after loading the data, it may be necessary to
define the chunking manually to control the memory usage or compute distribution.
The chunking can be specified as follows using the chunks parameter:
>>> import hyperspy.api as hs
>>> s = hs.load("file.mrc", lazy=True, chunks=(5, 10, 10))
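To illustrate how the chunk shape determines the number of blocks dask has to schedule, here is a small sketch on a stand-in array. No file is read; the shape and chunk sizes simply mirror the example above.

```python
import dask.array as da

# Stand-in for lazily loaded data of shape (10, 20, 30).
data = da.zeros((10, 20, 30), dtype="int64", chunks=(5, 10, 10))

# Each axis is split into blocks of the requested size.
print(data.chunks)     # block sizes along each axis
print(data.numblocks)  # number of blocks along each axis

# Chunking can also be changed after loading, at the cost of extra tasks.
rechunked = data.rechunk((10, 20, 10))
print(rechunked.numblocks)
```

Fewer, larger chunks reduce scheduling overhead but increase the memory needed per task, so the right choice depends on the processing that follows.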
Memory mapping
Binary file formats are loaded lazily using memory mapping and are compatible with the dask distributed
scheduler. This implementation uses an approach similar to that described in the dask documentation on
memory mapping; see the memmap_distributed() function for more information.
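The underlying idea can be sketched with plain numpy.memmap: the file is mapped rather than read, and only the parts that are indexed are fetched from disk. The file name, dtype and shape below are made up for the illustration, and this is not the memmap_distributed() implementation.

```python
import os
import tempfile

import numpy as np

# Write a small raw binary file to map (hypothetical contents).
shape = (4, 8, 8)
path = os.path.join(tempfile.mkdtemp(), "frames.raw")
np.arange(np.prod(shape), dtype=np.uint16).reshape(shape).tofile(path)

# Map the file without reading it into memory.
mapped = np.memmap(path, dtype=np.uint16, mode="r", shape=shape)

# Only the bytes backing this slice are read when it is accessed.
frame = np.asarray(mapped[2])
print(frame.shape, frame[0, 0])
```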
Distributed loading
Not all formats are compatible with the dask distributed scheduler. See the last columns of the supported formats table to find out which readers are supported.
In almost all cases, the memmap_distributed() function can be used as a drop-in replacement for the
numpy.memmap function. It also supports a positions parameter, which the equivalent numpy function
does not have. The positions parameter is a numpy array that maps arbitrary scan positions
onto a grid, which is useful for loading data acquired at arbitrary scan positions from a file.
Using the positions parameter requires that the data is chunked only along the navigation axis.
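As a conceptual sketch only, the following shows what mapping frames recorded at arbitrary scan positions onto a grid means. It uses plain numpy, it is not the memmap_distributed() implementation, and the [x, y] layout assumed for positions is an illustration rather than a statement about the library's API.

```python
import numpy as np

# Four 2x2 frames stored one after another in acquisition order (made-up data).
frames = np.arange(4 * 2 * 2).reshape(4, 2, 2)

# Assumed layout: one [x, y] grid coordinate per frame, in acquisition order.
positions = np.array([[1, 0], [0, 0], [1, 1], [0, 1]])

# Place each frame at its grid position; the grid has shape (ny, nx, ky, kx).
nx, ny = positions.max(axis=0) + 1
grid = np.zeros((ny, nx) + frames.shape[1:], dtype=frames.dtype)
grid[positions[:, 1], positions[:, 0]] = frames

print(grid[0, 1])  # the first acquired frame, found at x=1, y=0
```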