Electron Microscopy Dataset (EMD)

EMD stands for “Electron Microscopy Dataset”. It is a subset of the open source HDF5 wrapper format. N-dimensional data arrays of any standard type can be stored in an HDF5 file, as well as tags and other metadata.

Note

To read this format, the optional dependency h5py is required.

EMD (NCEM)

This EMD format was developed by Colin Ophus at the National Center for Electron Microscopy (NCEM). This format is used by the prismatic software to save the simulation outputs.

Usage examples

For files containing several datasets, the dataset_path argument can be used to select a specific one:

>>> from rsciio.emd import file_reader
>>> s = file_reader("adatafile.emd", dataset_path="/experimental/science_data_1/data")

Or several by using a list:

>>> s = file_reader("adatafile.emd",
...             dataset_path=[
...                 "/experimental/science_data_1/data",
...                 "/experimental/science_data_2/data"])
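To find out which paths are available for dataset_path, the HDF5 tree can be inspected directly with h5py. The following is a minimal sketch, not part of RosettaSciIO: the helper name list_data_paths is hypothetical, and it assumes that the datasets of interest are named "data", as in the paths used above.

```python
def list_data_paths(filename):
    """Return the HDF5 paths of all datasets named 'data' in ``filename``.

    Hypothetical helper: it assumes the datasets of interest are called
    'data', as in the example paths above.
    """
    import h5py  # optional dependency, imported only when the helper is used

    paths = []

    def visit(name, obj):
        # visititems passes paths relative to the root, without a leading "/".
        if isinstance(obj, h5py.Dataset) and name.split("/")[-1] == "data":
            paths.append("/" + name)

    with h5py.File(filename, "r") as f:
        f.visititems(visit)
    return sorted(paths)
```

Each returned string, or a list of them, could then be passed as dataset_path to file_reader. Whether every such dataset is one the reader supports is not checked by this sketch.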

EMD (Velox)

This is a non-compliant variant of the standard EMD format developed by ThermoFisher (formerly FEI). RosettaSciIO supports importing images, EDS spectra and EDS spectrum streams (spectrum images stored in a sparse format). For spectrum streams, several loading options (described in the docstring below) control which frames and detectors are loaded and whether they are summed on loading. By default, the sum over all frames and all detectors is imported to reduce the data size in memory.

Note

Pruned Velox EMD files only contain the spectrum image in a proprietary format that RosettaSciIO cannot read. Therefore, don’t prune Velox EMD files if you intend to read them with RosettaSciIO.

Note

When using HyperSpy, FFTs made in Velox are loaded as-is as a HyperSpy ComplexSignal2D object. The FFT is not centered and only positive frequencies are stored in the file. Computing FFTs with HyperSpy from the respective image datasets is recommended instead.

Note

When using HyperSpy, DPC data is loaded as a HyperSpy ComplexSignal2D object.

Note

Currently, only lazy decompression is implemented, not lazy loading. As a result, EDS SI Velox EMD files larger than the available memory cannot be read.

Note

To load EDS data, the optional dependency sparse is required.

Warning

This format is still not stable and files generated with the most recent version of Velox may not be supported. If you experience issues loading a file, please report it to the RosettaSciIO developers so that they can add support for newer versions of the format.

Usage examples

>>> from rsciio.emd import file_reader
>>> file_reader("sample.emd")
[<Signal2D, title: HAADF, dimensions: (|179, 161)>,
<EDSSEMSpectrum, title: EDS, dimensions: (179, 161|4096)>]
>>> file_reader("sample.emd", sum_EDS_detectors=False)
[<Signal2D, title: HAADF, dimensions: (|179, 161)>,
<EDSSEMSpectrum, title: EDS - SuperXG21, dimensions: (179, 161|4096)>,
<EDSSEMSpectrum, title: EDS - SuperXG22, dimensions: (179, 161|4096)>,
<EDSSEMSpectrum, title: EDS - SuperXG23, dimensions: (179, 161|4096)>,
<EDSSEMSpectrum, title: EDS - SuperXG24, dimensions: (179, 161|4096)>]

>>> import numpy as np
>>> file_reader("sample.emd", sum_frames=False, load_SI_image_stack=True, SI_dtype=np.int8, rebin_energy=4)
[<Signal2D, title: HAADF, dimensions: (50|179, 161)>,
<EDSSEMSpectrum, title: EDS, dimensions: (50, 179, 161|1024)>]
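To see why rebin_energy and SI_dtype matter when frames are not summed, the dense size of the spectrum image in the last example can be estimated by simple arithmetic. This is a rough sketch: dense_size_bytes is a hypothetical helper, and the 4-byte dtype used for comparison is an assumption, not a value read from the file.

```python
from math import prod


def dense_size_bytes(shape, itemsize):
    """Size in bytes of a dense array with the given shape and bytes per element."""
    return prod(shape) * itemsize


# Shape from the last example above: 50 frames, 179 x 161 pixels and
# 4096 energy channels rebinned by 4 (1024 channels), stored as int8 (1 byte).
reduced = dense_size_bytes((50, 179, 161, 1024), itemsize=1)

# Hypothetical comparison: the same stack without rebinning, stored with an
# assumed 4-byte dtype (the actual dtype in a given file may differ).
full = dense_size_bytes((50, 179, 161, 4096), itemsize=4)

print(f"rebinned int8 stack:     {reduced / 2**30:.2f} GiB")
print(f"unrebinned 4-byte stack: {full / 2**30:.2f} GiB")
print(f"reduction factor:        {full // reduced}")
```

Because the whole spectrum image has to fit in memory, an estimate of this kind helps decide which rebinning factor and dtype to request before loading individual frames.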

API functions

rsciio.emd.file_reader(filename, lazy=False, dataset_path=None, stack_group=None, select_type=None, first_frame=0, last_frame=None, sum_frames=True, sum_EDS_detectors=True, rebin_energy=1, SI_dtype=None, load_SI_image_stack=False)

Read EMD file, which can be an NCEM or a Velox variant of the EMD format. Also reads Direct Electron’s DE5 format, which is read according to the NCEM specifications.

Parameters:
filename : str, pathlib.Path

Filename of the file to read or corresponding pathlib.Path.

lazy : bool, default=False

Whether to open the file lazily or not. The file will stay open until closed in compute() or closed manually. get_file_handle() can be used to access the file handler and close it manually.

dataset_path : None, str or list of str, default=None

NCEM only: Path of the dataset. If None, load all supported datasets, otherwise the specified dataset(s).

stack_group : None, bool, default=None

NCEM only: Stack datasets of groups with common path. Relevant for EMD file version >= 0.5, where groups can be named group0000, group0001, etc.

select_type : {None, ‘image’, ‘single_spectrum’, ‘spectrum_image’}, default=None

Velox only: Type of data to load. With 'image', only images (including EDS maps) are loaded; with 'single_spectrum', only single spectra are loaded; with 'spectrum_image', only the spectrum image is loaded.

first_frame : int, default=0

Velox only: Select the start for the frame range of the EDS spectrum image to load.

last_frame : int or None, default=None

Velox only: Select the end for the frame range of the EDS spectrum image to load.

sum_frames : bool, default=True

Velox only: If True (default), load the EDS spectrum image as the sum over all frames. If False, load each individual EDS frame; the EDS spectrum image is then loaded with an extra navigation dimension corresponding to the frame index (time axis).

sum_EDS_detectors : bool, default=True

Velox only: Load the EDS signal as a sum over the signals from all EDS detectors (default) or, alternatively, load the signal of each individual EDS detector. In the latter case, a corresponding number of distinct EDS signals is returned.

rebin_energy : int, default=1

Velox only: Rebin the energy axis by given factor. Useful in combination with sum_frames=False to reduce the data size when reading the individual frames of the spectrum image.

SI_dtype : numpy.dtype or None, default=None

Velox only: Change the datatype of a spectrum image. Useful in combination with sum_frames=False to reduce the data size when reading the individual frames of the spectrum image. If None, the dtype of the data in the EMD file is used.

load_SI_image_stack : bool, default=False

Velox only: Allows loading the stack of STEM images acquired simultaneously with the EDS spectrum image. This option can be useful to monitor any specimen changes during the acquisition or to correct the spatial drift in the spectrum image by using the STEM images.

Returns:
list of dict

List of dictionaries containing the following fields:

  • ‘data’ – multidimensional numpy.ndarray or dask.array.Array

  • ‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector

  • ‘metadata’ – dictionary containing the parsed metadata

  • ‘original_metadata’ – dictionary containing the full metadata tree from the input file

When the file contains several datasets, each dataset is loaded as a separate dictionary.
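The two ways of describing an axis listed above can be turned into explicit coordinate values with a few lines of Python. This is a minimal sketch: axis_values is a hypothetical helper, the example dictionary is made up for illustration, and the linear form assumes the usual convention value = offset + scale * index.

```python
def axis_values(axis):
    """Return the coordinate values of one axis dictionary as a list.

    Uses the explicit 'axis' vector when present, otherwise the linear
    description given by 'size', 'offset' and 'scale'.
    """
    if "axis" in axis:
        return list(axis["axis"])
    return [axis["offset"] + axis["scale"] * i for i in range(axis["size"])]


# Made-up axis dictionary with the documented fields (illustration only).
energy_axis = {
    "name": "Energy",
    "units": "keV",
    "index_in_array": 2,
    "size": 4,
    "offset": 0.0,
    "scale": 0.5,
}

print(energy_axis["name"], energy_axis["units"], axis_values(energy_axis))
```

For a dictionary returned by file_reader, the same helper could be applied to each entry of its 'axes' list, using 'index_in_array' to match each axis to a dimension of 'data'.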

rsciio.emd.file_writer(filename, signal, chunks=None, **kwds)

Write signal to EMD file. Only the specifications by the National Center for Electron Microscopy (NCEM) are supported.

Parameters:
filename : str, pathlib.Path

Filename of the file to write to or corresponding pathlib.Path.

signal : dict

Dictionary containing the signal object. Should contain the following fields:

  • ‘data’ – multidimensional numpy array

  • ‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector

  • ‘metadata’ – dictionary containing the metadata tree

chunks : tuple of int or None, default=None

Define the chunking used for saving the dataset. If None, chunks are calculated for the signal, preferably with at least one chunk per signal space.

**kwds : dict, optional

Dictionary containing metadata, which will be written as attributes of the root group.
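A signal dictionary with the three fields documented above can be assembled by hand. The following is a sketch only: build_axes and build_signal are hypothetical helpers, the axis names, units and scales are made up, and whether the writer expects further fields beyond those listed in this section has not been verified here.

```python
def build_axes(shape, names, units, scales, offsets=None):
    """Build one axis dictionary per array dimension (linear axes only)."""
    if offsets is None:
        offsets = [0.0] * len(shape)
    if not (len(shape) == len(names) == len(units) == len(scales) == len(offsets)):
        raise ValueError("one name, unit, scale and offset is required per dimension")
    return [
        {
            "name": name,
            "units": unit,
            "index_in_array": index,
            "size": size,
            "offset": offset,
            "scale": scale,
        }
        for index, (size, name, unit, scale, offset) in enumerate(
            zip(shape, names, units, scales, offsets)
        )
    ]


def build_signal(data, axes, metadata=None):
    """Combine data, axes and metadata into the documented dictionary layout."""
    return {"data": data, "axes": axes, "metadata": metadata or {}}


# Nested list used as a stand-in for a NumPy array of shape (2, 3);
# the writer itself is documented to take a NumPy array.
data = [[0, 1, 2], [3, 4, 5]]
axes = build_axes((2, 3), names=["y", "x"], units=["nm", "nm"], scales=[0.1, 0.1])
signal = build_signal(data, axes)
```

With a real NumPy array, build_axes(data.shape, ...) gives the axes list, and the resulting dictionary is what would be passed as the signal argument of file_writer.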