hyperspy._signals.lazy module
- class hyperspy._signals.lazy.LazySignal(data, **kwds)
Bases: hyperspy.signal.BaseSignal
A lazy signal instance that delays computation until explicitly saved (intended for cases where storing the full result of the computation in memory is not feasible).
Create a Signal from a numpy array.
- Parameters
data (numpy.ndarray) – The signal data. It can be an array of any dimensions.
axes (dict, optional) – Dictionary to define the axes (see the documentation of the AxesManager class for more details).
attributes (dict, optional) – A dictionary whose items are stored as attributes.
metadata (dict, optional) – A dictionary containing a set of parameters that will be stored in the metadata attribute. Some parameters might be mandatory in some cases.
original_metadata (dict, optional) – A dictionary containing a set of parameters that will be stored in the original_metadata attribute. It typically contains all the parameters that have been imported from the original data file.
- _block_iterator(flat_signal=True, get=<function get>, navigation_mask=None, signal_mask=None)
Iterate over the lazy signal data block by block, following the chunks that define the dask.Array.
- Parameters
flat_signal (bool) – If True, returns each block flattened, such that the shape (for the particular block) is (navigation_size, signal_size), with optionally masked elements missing. If False, returns the equivalent of s.inav[{blocks}].data, where masked elements are set to np.nan or 0.
get (dask scheduler) – the dask scheduler to use for computations; default dask.threaded.get
navigation_mask ({BaseSignal, numpy array, dask array}) – The navigation locations marked as True are not returned (flat) or set to NaN or 0.
signal_mask ({BaseSignal, numpy array, dask array}) – The signal locations marked as True are not returned (flat) or set to NaN or 0.
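To picture the flat_signal=True layout, here is a minimal NumPy sketch. The helper flat_block is hypothetical and not part of HyperSpy. It assumes an in-memory block whose first nav_ndim axes are navigation axes, and it only reproduces the reshape to (navigation_size, signal_size) and the dropping of masked entries. It does not reproduce the dask iteration itself.

```python
import numpy as np


def flat_block(block, nav_ndim, navigation_mask=None, signal_mask=None):
    """Hypothetical helper: flatten one block to (navigation_size, signal_size).

    `block` has shape nav_shape + sig_shape. Entries marked True in the masks
    are dropped, mirroring the flat_signal=True behaviour described above.
    """
    nav_size = int(np.prod(block.shape[:nav_ndim]))
    sig_size = int(np.prod(block.shape[nav_ndim:]))
    flat = block.reshape(nav_size, sig_size)
    if navigation_mask is not None:
        # Keep only the navigation positions that are NOT masked.
        flat = flat[~np.asarray(navigation_mask).ravel()]
    if signal_mask is not None:
        # Keep only the signal elements that are NOT masked.
        flat = flat[:, ~np.asarray(signal_mask).ravel()]
    return flat


block = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # nav (2, 3), sig (4,)
nav_mask = np.zeros((2, 3), dtype=bool)
nav_mask[0, 1] = True                              # drop one navigation pixel
sig_mask = np.array([False, False, False, True])   # drop the last channel
out = flat_block(block, 2, nav_mask, sig_mask)
print(out.shape)  # (5, 3)
```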
- _get_dask_chunks(axis=None, dtype=None)
Returns dask chunks.
- Aims:
Have at least one signal (or the specified axis) in a single chunk, or as many as fit in memory.
- Parameters
axis ({int, string, None, axis, tuple}) – If axis is None (default), returns chunks for current data shape so that at least one signal is in the chunk. If an axis is specified, only that particular axis is guaranteed to be “not sliced”.
dtype ({string, np.dtype}) – The dtype of target chunks.
- Returns
dask chunks
- Return type
tuple of tuples
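The goal of keeping whole signals inside a chunk can be illustrated with a deliberately simple heuristic. chunks_keeping_signal_whole is a hypothetical helper, not HyperSpy's actual chunking logic. It only splits the first navigation axis, under an assumed byte budget, and returns a tuple of tuples in the same form as dask chunks.

```python
import numpy as np


def chunks_keeping_signal_whole(shape, sig_ndim, dtype, max_bytes):
    """Hypothetical helper returning dask-style chunks (a tuple of tuples).

    The signal axes (the last `sig_ndim` axes) and every navigation axis
    except the first stay in one chunk. The first navigation axis is split
    so a chunk stays under `max_bytes` while always holding at least one
    full signal.
    """
    if sig_ndim < 1 or sig_ndim >= len(shape):
        raise ValueError("need at least one navigation and one signal axis")
    itemsize = np.dtype(dtype).itemsize
    # Bytes needed for a single index along the first navigation axis.
    per_index = itemsize * int(np.prod(shape[1:]))
    step = min(shape[0], max(1, max_bytes // per_index))
    full, rest = divmod(shape[0], step)
    first_axis = (step,) * full + ((rest,) if rest else ())
    return (first_axis,) + tuple((n,) for n in shape[1:])


chunks = chunks_keeping_signal_whole((10, 8, 64, 64), 2, "float32", 2**20)
print(chunks)  # ((8, 2), (8,), (64,), (64,))
```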
- _iterate_signal()
Iterates over the signal data.
It is faster than using the signal iterator.
- _lazy_data(axis=None, rechunk=True, dtype=None)
Return the data as a dask array, rechunked if necessary.
- Parameters
axis (None, DataAxis or tuple of data axes) – The data axis that must not be broken into chunks when rechunk is True. If None, it defaults to the current signal axes.
rechunk (bool, "dask_auto") – If True, it rechunks the data if necessary, making sure that the axes in axis are not split into chunks. If False, it does not rechunk unless the data is not a dask array, in which case it chunks as if rechunk was True. If “dask_auto”, rechunk if necessary using dask’s automatic chunk guessing.
- _map_all(function, inplace=True, **kwargs)
The function must accept either an ‘axis’ or an ‘axes’ keyword argument, and hence support operating on the full dataset efficiently.
Replaced for lazy signals.
- _map_iterate(function, iterating_kwargs=None, show_progressbar=None, parallel=None, max_workers=None, ragged=False, inplace=True, output_signal_size=None, output_dtype=None, **kwargs)
Iterates the signal navigation space applying the function.
- Parameters
function (function) – the function to apply
iterating_kwargs (tuple (of tuples)) – A tuple with structure ((‘key1’, value1), (‘key2’, value2), ..) where the key-value pairs will be passed as kwargs for the function to be mapped, and the values will be iterated together with the signal navigation. The value needs to be a signal instance, because passing an array can be ambiguous; support for arrays will be removed in HyperSpy 2.0.
show_progressbar (None or bool) – If True, display a progress bar. If None, the default from the preferences settings is used.
parallel (None or bool) – If True, perform computation in parallel using multithreading. If None, the default from the preferences settings is used. The number of threads is controlled by the max_workers argument.
max_workers (None or int) – Maximum number of threads used when parallel=True. If None, defaults to min(32, os.cpu_count()).
inplace (bool, default True) – If True, the data is replaced by the result. Otherwise a new signal with the results is returned.
ragged (None or bool, default None) – Indicates whether the results for each navigation pixel differ in shape (or are not numpy arrays to begin with). If None, an appropriate choice is made while processing. Note: None is not allowed for lazy signals!
**kwargs (dict) – Additional keyword arguments passed to function.
Notes
This method is replaced for lazy signals.
Examples
Pass a larger array of different shape
>>> s = hs.signals.Signal1D(np.arange(20.).reshape((20,1)))
>>> def func(data, value=0):
...     return data + value
>>> # pay attention that it's a tuple of tuples - need commas
>>> s._map_iterate(func,
...                iterating_kwargs=(('value',
...                                   np.random.rand(5,400).flat),))
>>> s.data.T
array([[ 0.82869603,  1.04961735,  2.21513949,  3.61329091,  4.2481755 ,
         5.81184375,  6.47696867,  7.07682618,  8.16850697,  9.37771809,
        10.42794054, 11.24362699, 12.11434077, 13.98654036, 14.72864184,
        15.30855499, 16.96854373, 17.65077064, 18.64925703, 19.16901297]])
Storing the function result in another signal (e.g. calculated shifts)
>>> s = hs.signals.Signal1D(np.arange(20.).reshape((5,4)))
>>> def func(data):  # the original function
...     return data.sum()
>>> result = s._get_navigation_signal().T
>>> def wrapped(*args, data=None):
...     return func(data)
>>> result._map_iterate(wrapped,
...                     iterating_kwargs=(('data', s),))
>>> result.data
array([ 6., 22., 38., 54., 70.])
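For the non-ragged case, the iteration can be sketched in plain NumPy. map_iterate_sketch is a hypothetical stand-in that assumes in-memory arrays. It walks the navigation indices, takes the matching element from each iterating keyword argument, and stacks the results back onto the navigation grid. It does not model parallelism, progress bars, or ragged output.

```python
import numpy as np


def map_iterate_sketch(data, nav_ndim, function, iterating_kwargs=(), **kwargs):
    """Hypothetical stand-in: apply `function` to every signal in `data`.

    `data` has shape nav_shape + sig_shape. Each value in `iterating_kwargs`
    is an array whose leading axes match nav_shape.
    """
    nav_shape = data.shape[:nav_ndim]
    results = []
    for index in np.ndindex(*nav_shape):
        # Pick the element that belongs to this navigation position.
        per_pixel = {key: value[index] for key, value in iterating_kwargs}
        results.append(function(data[index], **per_pixel, **kwargs))
    # Non-ragged case: stack the results back onto the navigation grid.
    out = np.asarray(results)
    return out.reshape(nav_shape + out.shape[1:])


def func(signal, value=0):
    return signal.sum() + value


data = np.arange(20.0).reshape(5, 4)   # 5 navigation positions, 4 channels
offsets = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
result = map_iterate_sketch(data, 1, func, (("value", offsets),))
print(result)  # [  6.  32.  58.  84. 110.]
```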
- change_dtype(dtype, rechunk=True)
Change the data type of a Signal.
- Parameters
dtype (str or numpy.dtype) – Typecode string or data-type to which the Signal’s data array is cast. In addition to all the standard numpy Data type objects (dtype), HyperSpy supports four extra dtypes for RGB images: 'rgb8', 'rgba8', 'rgb16', and 'rgba16'. Changing from and to any rgb(a) dtype is more constrained than most other dtype conversions. To change to an rgb(a) dtype, the signal_dimension must be 1, and its size should be 3 (for rgb) or 4 (for rgba). The original dtype should be uint8 or uint16 if converting to rgb(a)8 or rgb(a)16, and the navigation_dimension should be at least 2. After conversion, the signal_dimension becomes 2. The dtype of images with original dtype rgb(a)8 or rgb(a)16 can only be changed to uint8 or uint16, and the signal_dimension becomes 1.
rechunk (bool) – Only has effect when operating on a lazy signal. If True (default), the data may be automatically rechunked before performing this operation.
Examples
>>> s = hs.signals.Signal1D([1,2,3,4,5])
>>> s.data
array([1, 2, 3, 4, 5])
>>> s.change_dtype('float')
>>> s.data
array([1., 2., 3., 4., 5.])
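The RGB conversion described above can be pictured with NumPy structured dtypes. The rgb8 dtype below, with three uint8 fields named R, G and B, is an assumption made for illustration. The sketch shows how a size-3 signal axis collapses into one structured element per pixel, so the former navigation axes form a 2D image, and how the fields unpack back into a size-3 axis.

```python
import numpy as np

# Assumption: an 'rgb8'-like structured dtype with three uint8 fields.
rgb8 = np.dtype({"names": ["R", "G", "B"], "formats": ["u1", "u1", "u1"]})

# A 2 x 2 navigation grid whose 1D signal has size 3 (the R, G, B values).
data = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# View the last axis as one structured element: (2, 2, 1) -> reshape to (2, 2).
rgb = np.ascontiguousarray(data).view(rgb8).reshape(data.shape[:-1])
print(rgb.shape, rgb["G"].tolist())  # (2, 2) [[1, 4], [7, 10]]

# Converting back: unpack the fields into a trailing size-3 axis again.
back = np.stack([rgb[name] for name in rgb8.names], axis=-1)
```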
- close_file()
Closes the associated data file if any.
Currently it only supports closing the file associated with a dask array created from an h5py DataSet (default HyperSpy hdf5 reader).
- compute(close_file=False, show_progressbar=None, **kwargs)
Attempt to store the full signal in memory.
- Parameters
close_file (bool, default False) – If True, attempt to close the file associated with the dask array data, if any. Note that closing the file will make all other associated lazy signals inoperative.
show_progressbar (None or bool) – If True, display a progress bar. If None, the default from the preferences settings is used.
- compute_navigator(index=None, chunks_number=None, show_progressbar=None)
Compute the navigator by taking the sum over a single chunk containing the specified coordinate. Taking the sum over a single chunk is a computationally efficient approach to compute the navigator. The data can be rechunked by specifying the chunks_number argument.
- Parameters
index ((int, float, None) or iterable, optional) – Specifies where to take the sum, following HyperSpy indexing syntax for integers and floats. If None, the index is the centre of the signal space.
chunks_number ((int, None) or iterable, optional) – Defines the number of chunks in the signal space used to rechunk the data when calculating the navigator. Useful to define the range over which the sum is calculated. If None, the existing chunking will be considered when picking the chunk used in the navigator calculation.
show_progressbar (None or bool) – If True, display a progress bar. If None, the default from the preferences settings is used.
- Return type
None.
Note
The number of chunks will affect where the sum is taken. If the sum needs to be taken in the centre of the signal space (for example, in the case of a diffraction pattern), the number of chunks needs to be odd, so that the middle chunk is centered.
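The chunk-based navigator can be sketched for a 4D in-memory dataset with two navigation and two signal axes. navigator_from_chunk is hypothetical. It splits each signal axis into roughly equal chunks with np.linspace, which may not match how HyperSpy or dask place chunk boundaries. It does show why an odd number of chunks keeps the central chunk centered.

```python
import numpy as np


def navigator_from_chunk(data, index, chunks_number):
    """Hypothetical helper: sum a (nav_y, nav_x, sig_y, sig_x) array over the
    single signal-space chunk that contains `index`."""
    slices = []
    for size, idx, n in zip(data.shape[2:], index, chunks_number):
        edges = np.linspace(0, size, n + 1).astype(int)        # chunk boundaries
        k = int(np.searchsorted(edges, idx, side="right")) - 1  # chunk holding idx
        slices.append(slice(int(edges[k]), int(edges[k + 1])))
    # Summing over one chunk only is much cheaper than over the whole signal.
    navigator = data[:, :, slices[0], slices[1]].sum(axis=(2, 3))
    return navigator, slices


data = np.ones((2, 3, 9, 9))
nav_odd, sl_odd = navigator_from_chunk(data, (4, 4), (3, 3))    # odd: centered
nav_even, sl_even = navigator_from_chunk(data, (4, 4), (2, 2))  # even: off-centre
print(sl_odd[0], sl_even[0])  # slice(3, 6, None) slice(4, 9, None)
```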
- decomposition(normalize_poissonian_noise=False, algorithm='SVD', output_dimension=None, signal_mask=None, navigation_mask=None, get=<function get>, num_chunks=None, reproject=True, print_info=True, **kwargs)
Perform Incremental (Batch) decomposition on the data.
The results are stored in self.learning_results.
Read more in the User Guide.
- Parameters
normalize_poissonian_noise (bool, default False) – If True, scale the signal to normalize Poissonian noise using the approach described in [KeenanKotula2004].
algorithm ({'SVD', 'PCA', 'ORPCA', 'ORNMF'}, default 'SVD') – The decomposition algorithm to use.
output_dimension (int or None, default None) – Number of components to keep/calculate. If None, keep all (only valid for the ‘SVD’ algorithm).
get (dask scheduler) – the dask scheduler to use for computations; default dask.threaded.get
num_chunks (int or None, default None) – The number of dask chunks to pass to the decomposition model. More chunks require more memory, but should run faster. Will be increased to contain at least output_dimension signals.
navigation_mask ({BaseSignal, numpy array, dask array}) – The navigation locations marked as True are not used in the decomposition. Not implemented for the ‘SVD’ algorithm.
signal_mask ({BaseSignal, numpy array, dask array}) – The signal locations marked as True are not used in the decomposition. Not implemented for the ‘SVD’ algorithm.
reproject (bool, default True) – Reproject data on the learnt components (factors) after learning.
print_info (bool, default True) – If True, print information about the decomposition being performed. In the case of sklearn.decomposition objects, this includes the values of all arguments of the chosen sklearn algorithm.
**kwargs – passed to the partial_fit/fit functions.
References
- KeenanKotula2004
M. Keenan and P. Kotula, “Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images”, Surf. Interface Anal. 36(3) (2004): 203-212.
See also
decomposition() for non-lazy signals
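The SVD path with normalize_poissonian_noise=True can be approximated in memory. This is a NumPy sketch, not the lazy incremental implementation. It applies a common form of the Keenan and Kotula scaling (each element is divided by the square root of its row sum times its column sum), takes a truncated SVD, and checks that reprojecting the scaled data onto the factors reproduces the loadings. poisson_scaled_svd is a hypothetical name, and the synthetic data are random.

```python
import numpy as np


def poisson_scaled_svd(X, output_dimension):
    """Scale counts for Poisson noise, then keep the leading SVD components.

    X has shape (navigation_size, signal_size). Assumes every row sum and
    column sum is strictly positive.
    """
    a = np.sqrt(X.sum(axis=1))[:, np.newaxis]  # one weight per navigation pixel
    b = np.sqrt(X.sum(axis=0))[np.newaxis, :]  # one weight per signal channel
    Xn = X / (a * b)
    U, S, Vt = np.linalg.svd(Xn, full_matrices=False)
    factors = Vt[:output_dimension].T                          # (signal_size, k)
    loadings = U[:, :output_dimension] * S[:output_dimension]  # (nav_size, k)
    return Xn, factors, loadings, S


rng = np.random.default_rng(0)
nav_size, signal_size = 200, 30
true_loadings = rng.random((nav_size, 2))
true_factors = rng.random((2, signal_size))
X = rng.poisson(50 * true_loadings @ true_factors + 1.0).astype(float)

Xn, factors, loadings, S = poisson_scaled_svd(X, output_dimension=2)
# "Reprojection": projecting the scaled data onto the factors gives the loadings.
reprojected = Xn @ factors
```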
- diff(axis, order=1, out=None, rechunk=True)
Returns a signal with the n-th order discrete difference along the given axis, i.e. it calculates the difference between consecutive values in the given axis: out[n] = a[n+1] - a[n]. See numpy.diff() for more details.
- Parameters
axis (int, str, or DataAxis) – The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name.
order (int) – The order of the discrete difference.
out (BaseSignal (or subclasses) or None) – If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
rechunk (bool) – Only has effect when operating on a lazy signal. If True (default), the data may be automatically rechunked before performing this operation.
- Returns
s – Note that the size of the data on the given axis decreases by the given order, i.e. if axis is "x", order is 2 and the x dimension is N, the x dimension of s is N - 2.
- Return type
BaseSignal (or subclasses) or None
See also
derivative, integrate1D, integrate_simpson
Examples
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal(np.random.random((64,64,1024)))
>>> s.data.shape
(64, 64, 1024)
>>> s.diff(-1).data.shape
(64, 64, 1023)
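The order parameter plays the role of n in numpy.diff(). Each extra order removes one more sample along the chosen axis. A short NumPy check of this shape rule on arbitrary data:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0, 16.0, 25.0])  # N = 5 samples along one axis

first = np.diff(a)        # out[n] = a[n+1] - a[n]  -> N - 1 values
second = np.diff(a, n=2)  # the difference taken twice -> N - 2 values
print(first, second)      # [3. 5. 7. 9.] [2. 2. 2.]

# On a 3D array only the chosen axis shrinks, by `n` samples.
shape = np.diff(np.zeros((8, 8, 1024)), n=2, axis=-1).shape
print(shape)  # (8, 8, 1022)
```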
- get_histogram(bins='fd', out=None, rechunk=True, **kwargs)
Return a histogram of the signal data.
More sophisticated algorithms for determining the bins can be used by passing a string as the bins argument. Other than the 'blocks' and 'knuth' methods, the available algorithms are the same as numpy.histogram().
Note: The lazy version of the algorithm only supports "scott" and "fd" as a string argument for bins.
- Parameters
bins (int or sequence of scalars or str, default "fd") –
If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
If bins is a string from the list below, will use the method chosen to calculate the optimal bin width and consequently the number of bins (see Notes for more detail on the estimators) from the data that falls within the requested range. While the bin width will be optimal for the actual data in the range, the number of bins will be computed to fill the entire range, including the empty portions. For visualisation, using the ‘auto’ option is suggested. Weighted data is not supported for automated bin size selection.
- 'auto'
Maximum of the 'sturges' and 'fd' estimators. Provides good all-around performance.
- 'fd' (Freedman Diaconis Estimator)
Robust (resilient to outliers) estimator that takes into account data variability and data size.
- 'doane'
An improved version of Sturges’ estimator that works better with non-normal datasets.
- 'scott'
Less robust estimator that takes into account data variability and data size.
- 'stone'
Estimator based on leave-one-out cross-validation estimate of the integrated squared error. Can be regarded as a generalization of Scott’s rule.
- 'rice'
Estimator does not take variability into account, only data size. Commonly overestimates the number of bins required.
- 'sturges'
R’s default method, only accounts for data size. Only optimal for Gaussian data and underestimates the number of bins for large non-Gaussian datasets.
- 'sqrt'
Square root (of data size) estimator, used by Excel and other programs for its speed and simplicity.
- 'knuth'
Knuth’s rule is a fixed-width, Bayesian approach to determining the optimal bin width of a histogram.
- 'blocks'
Determination of optimal adaptive-width histogram bins using the Bayesian Blocks algorithm.
range_bins (tuple or None, optional) – The minimum and maximum range for the histogram. If range_bins is None, (x.min(), x.max()) will be used.
max_num_bins (int, default 250) – When estimating the bins using one of the str methods, the number of bins is capped by this number to avoid a MemoryError being raised by numpy.histogram().
out (BaseSignal (or subclasses) or None) – If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
rechunk (bool) – Only has effect when operating on a lazy signal. If True (default), the data may be automatically rechunked before performing this operation.
**kwargs – Other keyword arguments (weights and density) are described in numpy.histogram().
- Returns
hist_spec – A 1D spectrum instance containing the histogram.
- Return type
Signal1D
See also
print_summary_statistics
dask.histogram()
Examples
>>> s = hs.signals.Signal1D(np.random.normal(size=(10, 100)))
>>> # Plot the data histogram
>>> s.get_histogram().plot()
>>> # Plot the histogram of the signal at the current coordinates
>>> s.get_current_signal().get_histogram().plot()
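The "fd" estimator and the max_num_bins cap can be reproduced by hand with NumPy. This sketch computes the Freedman-Diaconis width from the interquartile range, derives the bin count over the data range, and caps it. It runs on random in-memory data and is only an illustration, not the dask-based lazy code path.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# Freedman-Diaconis bin width: h = 2 * IQR * n**(-1/3)
q75, q25 = np.percentile(data, [75, 25])
width = 2 * (q75 - q25) * data.size ** (-1 / 3)
n_bins = int(np.ceil((data.max() - data.min()) / width))

# Cap the bin count, in the spirit of the max_num_bins parameter.
max_num_bins = 250
n_bins = min(n_bins, max_num_bins)

counts, edges = np.histogram(data, bins=n_bins)
```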
- integrate_simpson(axis, out=None)
Calculate the integral of a Signal along an axis using Simpson’s rule.
- Parameters
axis (int, str, or DataAxis) – The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name.
out (BaseSignal (or subclasses) or None) – If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- Returns
s – A new Signal containing the integral of the provided Signal along the specified axis.
- Return type
BaseSignal (or subclasses)
See also
diff, derivative, integrate1D
Examples
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal(np.random.random((64,64,1024)))
>>> s.data.shape
(64, 64, 1024)
>>> s.integrate_simpson(-1).data.shape
(64, 64)
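Composite Simpson's rule weights the samples 1, 4, 2, 4, ..., 2, 4, 1 and scales the sum by dx/3. The hand-written simpson_last_axis below is a sketch, not HyperSpy's implementation, and it only covers uniformly spaced samples with an odd sample count. Like integrate_simpson, it removes the integrated axis from the result.

```python
import numpy as np


def simpson_last_axis(y, dx):
    """Composite Simpson's rule along the last axis (odd number of samples)."""
    y = np.asarray(y, dtype=float)
    if y.shape[-1] % 2 == 0:
        raise ValueError("this sketch needs an odd number of samples")
    # Weights 1, 4, 2, 4, ..., 2, 4, 1, all multiplied by dx / 3.
    return dx / 3 * (
        y[..., 0]
        + y[..., -1]
        + 4 * y[..., 1:-1:2].sum(axis=-1)
        + 2 * y[..., 2:-1:2].sum(axis=-1)
    )


x = np.linspace(0.0, 1.0, 11)                  # 11 samples, 10 intervals
data = np.broadcast_to(x**2, (2, 3, x.size))   # the same curve in every pixel
result = simpson_last_axis(data, x[1] - x[0])  # shape (2, 3), each close to 1/3
```

Simpson's rule is exact for polynomials up to cubic degree, so integrating x**2 over [0, 1] gives 1/3 up to rounding.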
- plot(navigator='auto', **kwargs)
Plot the signal at the current coordinates.
For multidimensional datasets an optional figure, the “navigator”, with a cursor to navigate the data is raised. In any case it is possible to navigate the data using the sliders. Currently only signals with signal_dimension equal to 0, 1 and 2 can be plotted.
- Parameters
navigator (str, None, or BaseSignal (or subclass)) – Allowed string values are 'auto', 'slider', and 'spectrum'.
If 'auto':
If navigation_dimension > 0, a navigator is provided to explore the data.
If navigation_dimension is 1 and the signal is an image, the navigator is a sum spectrum obtained by integrating over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum, the navigator is an image obtained by stacking all the spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes X, Y, Z and one signal axis, E, the default navigator will be an image obtained by integrating the data over E at the current Z index and a window with sliders for the X, Y, and Z axes will be raised. Notice that changing the Z-axis index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the compute_navigator() method.
If 'slider':
If navigation_dimension > 0, a window with one slider per axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0, the navigator is always a spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals; the 'auto' option will be used instead.
If None, no navigator will be provided.
Alternatively, a BaseSignal (or subclass) instance can be provided. The navigation or signal shape must match the navigation shape of the signal to plot, or the navigation_shape + signal_shape must be equal to the navigator_shape of the current object (for a dynamic navigator). If the signal dtype is RGB or RGBA, this parameter has no effect and the value is always set to 'slider'.
axes_manager (None or AxesManager) – If None, the signal’s axes_manager attribute is used.
plot_markers (bool, default True) – Plot markers added using s.add_marker(marker, permanent=True). Note that a large number of markers might lead to very slow plotting.
navigator_kwds (dict) – Only for image navigator, additional keyword arguments for matplotlib.pyplot.imshow().
norm (str, optional) – The function used to normalize the data prior to plotting. Allowable strings are: 'auto', 'linear', 'log' (default value is 'auto'). If 'auto', intensity is plotted on a linear scale except when power_spectrum=True (only for complex signals).
autoscale (str) – The string must contain any combination of the ‘x’ and ‘v’ characters. If ‘x’ or ‘v’ (for values) are in the string, the corresponding horizontal or vertical axis limits are set to their maxima and the axis limits will reset when the data or the navigation indices are changed. Default is ‘v’.
**kwargs (dict) – Only when plotting an image: additional (optional) keyword arguments for matplotlib.pyplot.imshow().
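The 'auto' navigators described above reduce to sums over the signal axes. This NumPy sketch assumes the array is stored in (Z, Y, X, E) order for the three-navigation-axis case and in (N, height, width) order for the image case. It computes only the arrays a navigator would display and does not do any plotting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Navigation axes Z, Y, X and one signal axis E, in NumPy order (Z, Y, X, E).
data = rng.random((4, 5, 6, 100))
z_index = 2                                    # current Z slider position
navigator_image = data[z_index].sum(axis=-1)   # integrate over E -> (Y, X) image

# One navigation axis with 2D image signals: sum over each image -> spectrum.
images = rng.random((7, 16, 16))
navigator_spectrum = images.sum(axis=(1, 2))   # shape (7,)
```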
- rebin(new_shape=None, scale=None, crop=False, out=None, rechunk=True)
Rebin the signal into a smaller or larger shape, based on linear interpolation. Specify either new_shape or scale.
- Parameters
new_shape (list (of floats or integer) or None) – For each dimension specify the new_shape. This will internally be converted into a scale parameter.
scale (list (of floats or integer) or None) – For each dimension, specify the new:old pixel ratio, e.g. a ratio of 1 is no binning and a ratio of 2 means that each pixel in the new spectrum is twice the size of the pixels in the old spectrum. The length of the list should match the dimension of the Signal’s underlying data array. Note: only one of `scale` or `new_shape` should be specified, otherwise the function will not run.
crop (bool) – Whether or not to crop the resulting rebinned data (default is True). When binning by a non-integer number of pixels, it is likely that the final row in each dimension will contain fewer than the full quota to fill one pixel, e.g. a 5*5 array binned by 2.1 will produce two rows containing 2.1 pixels and one row containing only 0.8 pixels. Selecting crop=True or crop=False determines whether or not this “black” line is cropped from the final binned array.
Please note that if crop=False is used, the final row in each dimension may appear black if a fractional number of pixels are left over. It can be removed but has been left to preserve total counts before and after binning.
out (BaseSignal (or subclasses) or None) – If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- Returns
s – The resulting rebinned signal.
- Return type
BaseSignal (or subclass)
Examples
>>> spectrum = hs.signals.EDSTEMSpectrum(np.ones([4, 4, 10]))
>>> spectrum.data[1, 2, 9] = 5
>>> print(spectrum)
<EDSTEMSpectrum, title: , dimensions: (4, 4|10)>
>>> print('Sum = ', sum(sum(sum(spectrum.data))))
Sum =  164.0
>>> scale = [2, 2, 5]
>>> test = spectrum.rebin(scale=scale)
>>> print(test)
<EDSTEMSpectrum, title: , dimensions: (2, 2|2)>
>>> print('Sum = ', sum(sum(sum(test.data))))
Sum =  164.0
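For integer scale factors, rebinning amounts to summing non-overlapping blocks, which preserves the total counts just as the example above does. rebin_integer is a hypothetical helper. It assumes exact integer factors given in NumPy array order and makes no attempt at non-integer scales.

```python
import numpy as np


def rebin_integer(data, scale):
    """Sum-rebin `data` by integer factors (one per axis, in array order).

    Each axis length must be divisible by its factor; totals are preserved.
    """
    new_shape = []
    for size, factor in zip(data.shape, scale):
        if size % factor:
            raise ValueError("this sketch only handles exact integer factors")
        new_shape += [size // factor, factor]
    # Split every axis into (new_size, factor), then sum over the factor axes.
    return data.reshape(new_shape).sum(axis=tuple(range(1, 2 * data.ndim, 2)))


data = np.ones((4, 4, 10))
data[1, 2, 9] = 5
binned = rebin_integer(data, (2, 2, 5))
print(binned.shape, data.sum(), binned.sum())  # (2, 2, 2) 164.0 164.0
```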
- rechunk(nav_chunks='auto', sig_chunks=-1, inplace=True, **kwargs)
Rechunks the data using the same rechunking formula as Dask, except that the navigation and signal chunks are defined separately. Note that for most functions sig_chunks should be left at -1 (the default), so that each chunk spans the entire signal axes.
- Parameters
nav_chunks ({tuple, int, "auto", None}) – The navigation block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is “auto” which automatically determines chunk sizes.
sig_chunks ({tuple, int, "auto", None}) – The signal block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is -1, which spans the full signal dimension.
**kwargs (dict) – Any other keyword arguments for dask.array.rechunk().
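Because nav_chunks and sig_chunks each describe a subset of the axes, they can be thought of as concatenated into one per-axis specification. expand_chunks is a hypothetical helper; Dask has its own, more general normalization logic. The helper turns block sizes into explicit chunk tuples and treats -1 as the full axis length, matching the default sig_chunks=-1.

```python
def expand_chunks(shape, chunks):
    """Hypothetical helper: turn per-axis block sizes into explicit tuples.

    A block size of -1 means "the full length of that axis" (one chunk).
    """
    result = []
    for length, size in zip(shape, chunks):
        size = length if size == -1 else size
        full, rest = divmod(length, size)
        result.append((size,) * full + ((rest,) if rest else ()))
    return tuple(result)


shape = (10, 10, 256, 256)               # navigation (10, 10), signal (256, 256)
nav_chunks, sig_chunks = (4, 4), (-1, -1)
chunks = expand_chunks(shape, nav_chunks + sig_chunks)
print(chunks)  # ((4, 4, 2), (4, 4, 2), (256,), (256,))
```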
- valuemax(axis, out=None, rechunk=True)
Returns a signal with the value of coordinates of the maximum along an axis.
- Parameters
axis (int, str, or DataAxis) – The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name.
out (BaseSignal (or subclasses) or None) – If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
rechunk (bool) – Only has effect when operating on a lazy signal. If True (default), the data may be automatically rechunked before performing this operation.
- Returns
s – A new Signal containing the calibrated coordinate values of the maximum along the specified axis.
- Return type
BaseSignal (or subclasses)
Examples
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.BaseSignal(np.random.random((64,64,1024)))
>>> s.data.shape
(64, 64, 1024)
>>> s.valuemax(-1).data.shape
(64, 64)
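valuemax and valuemin combine an arg-max or arg-min along the axis with that axis' calibration. The following NumPy sketch assumes a uniform axis where value = offset + scale * index; the offset and scale used here are made-up values.

```python
import numpy as np

# Assumed uniform calibration of the signal axis: value = offset + scale * index.
offset, scale = 100.0, 0.5

rng = np.random.default_rng(0)
data = rng.random((4, 4, 50))      # navigation (4, 4), signal length 50
data[0, 0, 10] = 2.0               # force a known maximum in one pixel

max_values = offset + scale * data.argmax(axis=-1)   # analogous to valuemax
min_values = offset + scale * data.argmin(axis=-1)   # analogous to valuemin
```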
- valuemin(axis, out=None, rechunk=True)
Returns a signal with the value of coordinates of the minimum along an axis.
- Parameters
axis (int, str, or DataAxis) – The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name.
out (BaseSignal (or subclasses) or None) – If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
rechunk (bool) – Only has effect when operating on a lazy signal. If True (default), the data may be automatically rechunked before performing this operation.
- Returns
s – A new Signal containing the calibrated coordinate values of the minimum along the specified axis.
- Return type
BaseSignal (or subclasses)
- hyperspy._signals.lazy._reshuffle_mixed_blocks(array, ndim, sshape, nav_chunks)
Reshuffles a dask block-shuffled array.
- Parameters
array (np.ndarray) – the array to reshuffle
ndim (int) – the number of navigation (shuffled) dimensions
sshape (tuple of ints) – The shape
- hyperspy._signals.lazy.to_array(thing, chunks=None)
Accepts BaseSignal, dask or numpy arrays and always produces either a numpy or a dask array.
- Parameters
thing ({BaseSignal, dask.array.Array, numpy.ndarray}) – the thing to be converted
chunks ({None, tuple of tuples}) – If None, the returned value is a numpy array. Otherwise returns dask array with the chunks as specified.
- Returns
res
- Return type
{numpy.ndarray, dask.array.Array}