Electron Microscopy Dataset (EMD)
EMD stands for “Electron Microscopy Dataset”. It is a subset of the open-source HDF5 wrapper format: an HDF5 file can store N-dimensional data arrays of any standard type, together with tags and other metadata.
Note
To read this format, the optional dependency h5py is required.
EMD (NCEM)
This EMD format was developed by Colin Ophus at the National Center for Electron Microscopy (NCEM). It is also used by the prismatic software to save simulation outputs.
Usage examples
For files containing several datasets, the dataset_path argument can be used to select a specific one:
>>> from rsciio.emd import file_reader
>>> s = file_reader("adatafile.emd", dataset_path="/experimental/science_data_1/data")
Or several by using a list:
>>> s = file_reader("adatafile.emd",
...                 dataset_path=[
...                     "/experimental/science_data_1/data",
...                     "/experimental/science_data_2/data"])
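When the dataset paths in a file are not known in advance, they can be listed with h5py before calling the reader. The following is a minimal sketch, not part of RosettaSciIO: the helper name list_data_paths is hypothetical, and it assumes that the datasets of interest are stored under the name "data", as in the example paths above.

```python
import h5py


def list_data_paths(filename):
    """Return the absolute HDF5 paths of all datasets named 'data'."""
    paths = []

    def visit(name, obj):
        # visititems passes names relative to the root; obj.name is absolute.
        if isinstance(obj, h5py.Dataset) and name.rsplit("/", 1)[-1] == "data":
            paths.append(obj.name)

    with h5py.File(filename, "r") as f:
        f.visititems(visit)
    return sorted(paths)
```

Each returned string, or a list of them, can then be passed as the dataset_path argument.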
EMD (Velox)
This is a non-compliant variant of the standard EMD format developed by ThermoFisher (formerly FEI). RosettaSciIO supports importing images, EDS spectra and EDS spectrum streams (spectrum images stored in a sparse format). For spectrum streams, several loading options (described in the docstring below) control which frames and detectors are loaded and whether they are summed on loading. By default, the sum over all frames and all detectors is imported, which reduces the data size in memory.
Note
Pruned Velox EMD files only contain the spectrum image in a proprietary format that RosettaSciIO cannot read. Therefore, don’t prune Velox EMD files if you intend to read them with RosettaSciIO.
Note
When using HyperSpy, FFTs made in Velox are loaded as-is as a HyperSpy ComplexSignal2D object. The FFT is not centered and only positive frequencies are stored in the file. Making FFTs with HyperSpy from the respective image datasets is recommended.
Note
When using HyperSpy, DPC data is loaded as a HyperSpy ComplexSignal2D object.
Note
Currently, only lazy decompression is implemented, not lazy loading. This means that it is not currently possible to read EDS SI Velox EMD files that are larger than the available memory.
Note
To load EDS data, the optional dependency sparse is required.
Warning
This format is still not stable and files generated with the most recent version of Velox may not be supported. If you experience issues loading a file, please report it to the RosettaSciIO developers so that they can add support for newer versions of the format.
Usage examples
>>> import numpy as np
>>> from rsciio.emd import file_reader
>>> file_reader("sample.emd")
[<Signal2D, title: HAADF, dimensions: (|179, 161)>,
<EDSSEMSpectrum, title: EDS, dimensions: (179, 161|4096)>]
>>> file_reader("sample.emd", sum_EDS_detectors=False)
[<Signal2D, title: HAADF, dimensions: (|179, 161)>,
<EDSSEMSpectrum, title: EDS - SuperXG21, dimensions: (179, 161|4096)>,
<EDSSEMSpectrum, title: EDS - SuperXG22, dimensions: (179, 161|4096)>,
<EDSSEMSpectrum, title: EDS - SuperXG23, dimensions: (179, 161|4096)>,
<EDSSEMSpectrum, title: EDS - SuperXG24, dimensions: (179, 161|4096)>]
>>> file_reader("sample.emd", sum_frames=False, load_SI_image_stack=True, SI_dtype=np.int8, rebin_energy=4)
[<Signal2D, title: HAADF, dimensions: (50|179, 161)>,
<EDSSEMSpectrum, title: EDS, dimensions: (50, 179, 161|1024)>]
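To get a feel for why rebin_energy and SI_dtype matter when sum_frames=False, the size of the resulting array can be estimated before loading. This is a rough sketch only: dense_size_bytes is a hypothetical helper, it assumes the spectrum image ends up as a dense array in memory, and the 4 bytes per value used for the unreduced case is an assumption, not something read from a file. The dimensions are those of the example output above.

```python
def dense_size_bytes(shape, n_channels, itemsize, n_frames=1, rebin_energy=1):
    """Size in bytes of a dense (frames, y, x, energy) array."""
    n_pixels = 1
    for n in shape:
        n_pixels *= n
    # Rebinning divides the number of energy channels by the given factor.
    return n_frames * n_pixels * (n_channels // rebin_energy) * itemsize


# 50 frames of 179 x 161 pixels with 4096 energy channels.
# itemsize=4 is an assumed stored dtype; itemsize=1 corresponds to np.int8.
full = dense_size_bytes((179, 161), 4096, itemsize=4, n_frames=50)
reduced = dense_size_bytes((179, 161), 4096, itemsize=1, n_frames=50, rebin_energy=4)
print(f"{full / 2**30:.1f} GiB without reduction, {reduced / 2**30:.1f} GiB with it")
```

Under these assumptions, rebinning by 4 and switching to a 1-byte dtype together reduce the footprint by a factor of 16.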
API functions
- rsciio.emd.file_reader(filename, lazy=False, dataset_path=None, stack_group=None, select_type=None, first_frame=0, last_frame=None, sum_frames=True, sum_EDS_detectors=True, rebin_energy=1, SI_dtype=None, load_SI_image_stack=False)
Read an EMD file, which can be an NCEM or a Velox variant of the EMD format. Also reads Direct Electron’s DE5 format, which is read according to the NCEM specifications.
- Parameters:
- filename : str, pathlib.Path
Filename of the file to read or corresponding pathlib.Path.
- lazy : bool, default=False
Whether to open the file lazily or not.
- dataset_path : None, str or list of str, default=None
NCEM only: Path of the dataset. If None, load all supported datasets, otherwise the specified dataset(s).
- stack_group : None, bool, default=None
NCEM only: Stack datasets of groups with a common path. Relevant for EMD file versions >= 0.5, where groups can be named group0000, group0001, etc.
- select_type : {None, 'image', 'single_spectrum', 'spectrum_image'}
Velox only: Specifies the type of data to load. If 'image' is selected, only images (including EDS maps) are loaded; if 'single_spectrum' is selected, only single spectra are loaded; and if 'spectrum_image' is selected, only the spectrum image is loaded.
- first_frame : int, default=0
Velox only: Select the start for the frame range of the EDS spectrum image to load.
- last_frame : int or None, default=None
Velox only: Select the end for the frame range of the EDS spectrum image to load.
- sum_frames : bool, default=True
Velox only: If True (default), load the sum over all EDS frames. If False, load each individual EDS frame; the EDS spectrum image is then loaded with an extra navigation dimension corresponding to the frame index (time axis).
- sum_EDS_detectors : bool, default=True
Velox only: Load the EDS signal as a sum over the signals from all EDS detectors (default) or, alternatively, load the signal of each individual EDS detector. In the latter case, a corresponding number of distinct EDS signals is returned.
- rebin_energy : int, default=1
Velox only: Rebin the energy axis by the given factor. Useful in combination with sum_frames=False to reduce the data size when reading the individual frames of the spectrum image.
- SI_dtype : numpy.dtype or None, default=None
Velox only: Change the datatype of a spectrum image. Useful in combination with sum_frames=False to reduce the data size when reading the individual frames of the spectrum image. If None, the dtype of the data in the EMD file is used.
- load_SI_image_stack : bool, default=False
Velox only: Allows loading the stack of STEM images acquired simultaneously with the EDS spectrum image. This option can be useful to monitor any specimen changes during the acquisition or to correct the spatial drift in the spectrum image by using the STEM images.
- Returns:
list of dict
List of dictionaries containing the following fields:
‘data’ – multidimensional numpy.ndarray or dask.array.Array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the parsed metadata
‘original_metadata’ – dictionary containing the full metadata tree from the input file
When the file contains several datasets, each dataset will be loaded as a separate dictionary.
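The two ways of describing an axis listed above (either ‘size’, ‘offset’ and ‘scale’, or a full ‘axis’ vector) can be turned into coordinate values in the same way. The sketch below illustrates this with a hypothetical helper, axis_values, and a hand-written axis dictionary that uses only the documented field names; it is not taken from the library.

```python
def axis_values(axis):
    """Return the coordinate of every point along one axis dictionary."""
    if "axis" in axis:
        # Non-uniform axis: the full vector is stored explicitly.
        return list(axis["axis"])
    # Uniform axis: reconstruct the values from size, offset and scale.
    return [axis["offset"] + axis["scale"] * i for i in range(axis["size"])]


# Hypothetical axis entry, as it might appear in the 'axes' list.
x_axis = {"name": "x", "units": "nm", "index_in_array": 1,
          "size": 4, "offset": 10.0, "scale": 0.5}
print(axis_values(x_axis))  # [10.0, 10.5, 11.0, 11.5]
```

The ‘index_in_array’ field indicates which dimension of ‘data’ the axis describes.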
- rsciio.emd.file_writer(filename, signal, chunks=None, **kwds)
Write signal to EMD file. Only the specifications by the National Center for Electron Microscopy (NCEM) are supported.
- Parameters:
- filename : str, pathlib.Path
Filename of the file to write to or corresponding pathlib.Path.
- signal : dict
Dictionary containing the signal object. Should contain the following fields:
‘data’ – multidimensional numpy array
‘axes’ – list of dictionaries describing the axes containing the fields ‘name’, ‘units’, ‘index_in_array’, and either ‘size’, ‘offset’, and ‘scale’ or a numpy array ‘axis’ containing the full axes vector
‘metadata’ – dictionary containing the metadata tree
- chunks : tuple of int or None, default=None
Define the chunking used for saving the dataset. If None, calculates chunks for the signal, with preferably at least one chunk per signal space.
- **kwds : dict, optional
Dictionary containing metadata, which will be written as attributes of the root group.
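Before writing, it can help to check that a signal dictionary has the fields listed above. The following sketch uses a hypothetical helper, check_signal_dict, that only verifies the fields named in this section; the writer itself may expect more (for example specific metadata entries), so a passing check is no guarantee that writing succeeds.

```python
REQUIRED_SIGNAL_KEYS = ("data", "axes", "metadata")
REQUIRED_AXIS_KEYS = ("name", "units", "index_in_array")
UNIFORM_AXIS_KEYS = ("size", "offset", "scale")


def check_signal_dict(signal):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = [f"missing key {key!r}" for key in REQUIRED_SIGNAL_KEYS
                if key not in signal]
    for i, axis in enumerate(signal.get("axes", [])):
        for key in REQUIRED_AXIS_KEYS:
            if key not in axis:
                problems.append(f"axis {i}: missing {key!r}")
        # An axis is described either by size/offset/scale or by a full vector.
        uniform = all(key in axis for key in UNIFORM_AXIS_KEYS)
        if not uniform and "axis" not in axis:
            problems.append(f"axis {i}: needs size/offset/scale or 'axis'")
    return problems


signal = {
    "data": [[0, 1, 2], [3, 4, 5]],  # placeholder; a real call would pass a numpy array
    "axes": [
        {"name": "y", "units": "nm", "index_in_array": 0,
         "size": 2, "offset": 0.0, "scale": 1.0},
        {"name": "x", "units": "nm", "index_in_array": 1,
         "size": 3, "offset": 0.0, "scale": 1.0},
    ],
    "metadata": {},
}
print(check_signal_dict(signal))  # []
```

A dictionary that passes this check, with ‘data’ replaced by a numpy array, is the kind of object expected as the signal argument of file_writer.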