LazySignal

class hyperspy._signals.lazy.LazySignal(*args, **kwargs)

Bases: BaseSignal

Lazy general signal class. The computation is delayed until explicitly requested.

This class is not expected to be instantiated directly; instead, use:

>>> import dask.array as da
>>> data = da.ones((10, 10))
>>> s = hs.signals.BaseSignal(data).as_lazy()

Create a signal instance.

Parameters:
data : numpy.ndarray

The signal data. It can be an array of any dimensions.

axes : [dict/axes], optional

List of either dictionaries or axes objects to define the axes (see the documentation of the AxesManager class for more details).

attributes : dict, optional

A dictionary whose items are stored as attributes.

metadata : dict, optional

A dictionary containing a set of parameters that will be stored in the metadata attribute. Some parameters might be mandatory in some cases.

original_metadata : dict, optional

A dictionary containing a set of parameters that will be stored in the original_metadata attribute. It typically contains all the parameters that have been imported from the original data file.

ragged : bool or None, optional

Define whether the signal is ragged or not. Overrides the ragged value in the attributes dictionary. If None, it does nothing. Default is None.

change_dtype(dtype, rechunk=False)

Change the data type of a Signal.

Parameters:
dtype : str or numpy.dtype

Typecode string or data-type to which the Signal’s data array is cast. In addition to all the standard numpy Data type objects (dtype), HyperSpy supports four extra dtypes for RGB images: 'rgb8', 'rgba8', 'rgb16', and 'rgba16'. Changing from and to any rgb(a) dtype is more constrained than most other dtype conversions. To change to an rgb(a) dtype, the signal_dimension must be 1 and its size should be 3 (for rgb) or 4 (for rgba). The original dtype should be uint8 or uint16 when converting to rgb(a)8 or rgb(a)16, respectively, and the navigation_dimension should be at least 2. After conversion, the signal_dimension becomes 2. The dtype of images with original dtype rgb(a)8 or rgb(a)16 can only be changed to uint8 or uint16, and the signal_dimension becomes 1.

rechunk : bool

Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.

Examples

>>> s = hs.signals.Signal1D([1, 2, 3, 4, 5])
>>> s.data
array([1, 2, 3, 4, 5])
>>> s.change_dtype('float')
>>> s.data
array([1., 2., 3., 4., 5.])
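
To make the rgb conversion rules above more concrete, the NumPy sketch below reinterprets a stack of length-3 uint8 spectra as a 2D image of RGB pixels. The structured dtype named rgb8 here is an assumption for illustration, modelled on the idea of a packed RGB pixel rather than taken from HyperSpy, and unlike change_dtype the sketch does not touch any axes metadata.

```python
import numpy as np

# Hypothetical packed-pixel dtype: three uint8 fields in one 3-byte element.
rgb8 = np.dtype({"names": ["R", "G", "B"], "formats": ["u1", "u1", "u1"]})

# Navigation shape (4, 5) with a signal axis of size 3, stored as uint8.
spectra = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)

# View each length-3 signal as one RGB element; the last axis shrinks to 1
# and is dropped, leaving a (4, 5) image of RGB pixels.
image = spectra.view(rgb8)[..., 0]
print(image.shape, image["G"][0, 1])  # (4, 5) 4

# Going back: view the pixels as plain uint8 and restore the axis of size 3.
restored = image.view(np.uint8).reshape(image.shape + (3,))
```

The former navigation shape (4, 5) becomes the image shape and the signal axis of size 3 is absorbed into the dtype, which mirrors why the signal_dimension goes from 1 to 2.
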
close_file()

Close the associated data file, if any.

Currently it only supports closing the file associated with a dask array created from an h5py Dataset (the default HyperSpy HDF5 reader).

compute(close_file=False, show_progressbar=None, **kwargs)

Attempt to store the full signal in memory.

Parameters:
close_file : bool, default False

If True, attempt to close the file associated with the dask array data if any. Note that closing the file will make all other associated lazy signals inoperative.

show_progressbar : None or bool

If True, display a progress bar. If None, the default from the preferences settings is used.

**kwargs : dict

Any other keyword arguments for dask.array.Array.compute(), for example scheduler or num_workers.

Returns:
None

Notes

For alternative ways to set the compute settings see https://docs.dask.org/en/stable/scheduling.html#configuration

Examples

>>> import dask.array as da
>>> data = da.zeros((100, 100, 100), chunks=(10, 20, 20))
>>> s = hs.signals.Signal2D(data).as_lazy()

With default parameters

>>> s1 = s.deepcopy()
>>> s1.compute()

Using 2 workers, which can reduce the memory usage (depending on the data and your computer hardware). Note that num_workers only works for the 'threads' and 'processes' schedulers.

>>> s2 = s.deepcopy()
>>> s2.compute(num_workers=2)

Using a single threaded scheduler, which is useful for debugging

>>> s3 = s.deepcopy()
>>> s3.compute(scheduler='single-threaded')
compute_navigator(index=None, chunks_number=None, show_progressbar=None)

Compute the navigator by taking the sum over a single chunk containing the specified coordinate. Taking the sum over a single chunk is a computationally efficient approach to compute the navigator. The data can be rechunked by specifying the chunks_number argument.

Parameters:
index : (int, float, None) or iterable, optional

Specifies where to take the sum, following HyperSpy indexing syntax for integers and floats. If None, the index is the centre of the signal space.

chunks_number : (int, None) or iterable, optional

Define the number of chunks in the signal space used to rechunk the data when calculating the navigator. Useful to define the range over which the sum is calculated. If None, the existing chunking is considered when picking the chunk used in the navigator calculation.

show_progressbar : None or bool

If True, display a progress bar. If None, the default from the preferences settings is used.

Returns:
None.

Notes

The number of chunks affects where the sum is taken. If the sum needs to be taken in the centre of the signal space (for example, in the case of diffraction patterns), the number of chunks needs to be odd, so that the middle chunk is centred.
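
The chunk-selection idea, and why an odd number of chunks keeps the sum centred, can be sketched with NumPy on an in-memory 4D array. chunk_bounds and navigator_from_chunk are hypothetical helpers; in HyperSpy the chunks come from the dask array, and its chunk boundaries may differ from the even split assumed here.

```python
import numpy as np


def chunk_bounds(size, n_chunks):
    """Split range(size) into n_chunks near-equal contiguous (start, stop) pairs."""
    edges = np.linspace(0, size, n_chunks + 1).round().astype(int)
    return list(zip(edges[:-1], edges[1:]))


def navigator_from_chunk(data, sig_index, chunks_number):
    """Sum `data` over the signal-space chunk that contains `sig_index`.

    `data` has shape nav_shape + (sig_y, sig_x). Returns the navigator image
    and the slices used for the sum.
    """
    slices = []
    for size, idx, n in zip(data.shape[-2:], sig_index, chunks_number):
        # Pick the single chunk along this signal axis that holds the index.
        start, stop = next((a, b) for a, b in chunk_bounds(size, n) if a <= idx < b)
        slices.append(slice(int(start), int(stop)))
    region = data[..., slices[0], slices[1]]
    return region.sum(axis=(-2, -1)), slices


rng = np.random.default_rng(0)
data = rng.random((8, 6, 64, 64))  # (nav_y, nav_x, sig_y, sig_x)
centre = (32, 32)

nav_odd, slices_odd = navigator_from_chunk(data, centre, (3, 3))
nav_even, slices_even = navigator_from_chunk(data, centre, (4, 4))
print(slices_odd[0], slices_even[0])  # slice(21, 43, None) slice(32, 48, None)
```

With three chunks per signal axis the centre pixel falls in the middle chunk (indices 21 to 42), whereas with four chunks it falls at the start of the third chunk (32 to 47), so the sum is taken off-centre.
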

decomposition(normalize_poissonian_noise=False, algorithm='SVD', output_dimension=None, signal_mask=None, navigation_mask=None, get=None, num_chunks=None, reproject=True, print_info=True, **kwargs)

Perform Incremental (Batch) decomposition on the data.

The results are stored in the learning_results attribute.

Read more in the User Guide.

Parameters:
normalize_poissonian_noise : bool, default False

If True, scale the signal to normalize Poissonian noise using the approach described in [KeenanKotula2004].

algorithm : {‘SVD’, ‘PCA’, ‘ORPCA’, ‘ORNMF’}, default ‘SVD’

The decomposition algorithm to use.

output_dimension : int or None, default None

Number of components to keep/calculate. If None, keep all (only valid for the ‘SVD’ algorithm).

get : dask scheduler or None

The dask scheduler to use for computations. If None, dask.threaded.get will be used if possible; otherwise, dask.get will be used, for example in the pyodide interpreter.

num_chunks : int or None, default None

The number of dask chunks to pass to the decomposition model. More chunks require more memory, but should run faster. Will be increased to contain at least output_dimension signals.

navigation_mask : BaseSignal, numpy.ndarray or dask.array.Array

The navigation locations marked as True are not used in the decomposition. Not implemented for the ‘SVD’ algorithm.

signal_mask : BaseSignal, numpy.ndarray or dask.array.Array

The signal locations marked as True are not used in the decomposition. Not implemented for the ‘SVD’ algorithm.

reproject : bool, default True

Reproject data on the learnt components (factors) after learning.

print_info : bool, default True

If True, print information about the decomposition being performed. In the case of sklearn.decomposition objects, this includes the values of all arguments of the chosen sklearn algorithm.

**kwargs

Passed to the partial_fit/fit functions.

References

[KeenanKotula2004]

M. Keenan and P. Kotula, “Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images”, Surf. Interface Anal. 36(3) (2004): 203-212.
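
The Poissonian noise normalisation cited above can be sketched densely in NumPy: each element of the unfolded (navigation x signal) matrix is divided by the square root of the product of its row sum and column sum before a plain SVD. This is a simplified illustration assuming in-memory data with no masks; the lazy decomposition works chunk by chunk, and its internal steps are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 navigation positions x 50 channels of Poisson
# counts, already unfolded into a 2D (navigation, signal) matrix.
rates = np.outer(rng.uniform(1, 5, 200), rng.uniform(1, 20, 50))
counts = rng.poisson(rates).astype(float)

# Keenan-Kotula style scaling: divide by sqrt(row sum * column sum).
row_sums = counts.sum(axis=1, keepdims=True)  # per-position totals, (200, 1)
col_sums = counts.sum(axis=0, keepdims=True)  # per-channel totals, (1, 50)
scale = np.sqrt(row_sums * col_sums)
scaled = counts / scale

# Plain SVD of the scaled matrix; rows of vt are the learnt factors.
u, s, vt = np.linalg.svd(scaled, full_matrices=False)

# A rank-1 reconstruction, with the scaling undone to return to count units.
rank1 = (u[:, :1] * s[:1]) @ vt[:1] * scale
```

Because the simulated count rates are an outer product, the scaled matrix is dominated by a single component, and the remaining singular values mostly reflect the noise.
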

diff(axis, order=1, out=None, rechunk=False)

Returns a signal with the n-th order discrete difference along the given axis, i.e. it calculates the difference between consecutive values along the given axis: out[n] = a[n+1] - a[n]. See numpy.diff() for more details.

Parameters:
axis : int, str, or DataAxis

The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name. If "sig" or "nav", the signal or navigation axis will be used, respectively.

order : int

The order of the discrete difference.

out : BaseSignal (or subclass) or None

If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.

rechunk : bool

Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.

Returns:
BaseSignal or None

Note that the size of the data on the given axis decreases by the given order, i.e. if axis is "x", order is 2 and the x dimension is N, the output’s x dimension is N - 2.

Notes

If you intend to calculate the numerical derivative, please use the proper derivative() function instead. To avoid erroneous misuse of the diff function as a derivative, it raises an error when working with a non-uniform axis.

Examples

>>> import numpy as np
>>> s = hs.signals.BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.diff(0)
<BaseSignal, title: , dimensions: (|1023, 64, 64)>
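
The shape rule stated under Returns follows from numpy.diff(), which the description above refers to. The NumPy sketch below works on a plain array rather than a signal; the last array axis plays the role of axes_manager index 0 in the example above.

```python
import numpy as np

rng = np.random.default_rng(42)
# The last array axis corresponds to axes_manager index 0 in the example above.
data = rng.random((8, 8, 128))

# First-order difference along that axis: out[n] = a[n+1] - a[n]
first = np.diff(data, n=1, axis=-1)   # shape (8, 8, 127)
# Second order: one more sample is lost along the same axis.
second = np.diff(data, n=2, axis=-1)  # shape (8, 8, 126)

# An n-th order difference equals the first-order difference applied n times.
print(np.allclose(second, np.diff(first, axis=-1)))  # True
```

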
get_chunk_size(axes=None)

Returns the chunk size as tuple for a set of given axes. The order of the returned tuple follows the order of the dask array.

Parameters:
axes : int, str, DataAxis or tuple

Either a single or multiple axes in a tuple can be passed. In both cases, the axes can be passed directly, or specified using the index in axes_manager or the name of the axis. Any duplicates are removed. If "sig" or "nav", the signal or navigation axes will be used, respectively. If None, the operation is performed over all navigation axes (default).

Examples

>>> import dask.array as da
>>> data = da.random.random((10, 200, 300))
>>> data.chunksize
(10, 200, 300)
>>> s = hs.signals.Signal1D(data).as_lazy()
>>> s.get_chunk_size() # All navigation axes
((10,), (200,))
>>> s.get_chunk_size(0) # The first navigation axis
((200,),)
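
The index translation behind the example above can be sketched in plain Python: HyperSpy lists navigation axes in reverse array order, so navigation index i refers to array axis nav_ndim - 1 - i, while the returned tuple keeps dask array order. chunks_for_nav_axes is a hypothetical helper that only accepts integer navigation indices (not axis names, DataAxis objects or "sig").

```python
def chunks_for_nav_axes(chunks, signal_dimension, axes=None):
    """Return the dask chunks of the requested navigation axes.

    chunks: a dask-style chunks tuple, one tuple of block sizes per array axis.
    axes: a navigation index or tuple of indices in HyperSpy order; None means
        all navigation axes. Duplicates are removed.
    """
    nav_ndim = len(chunks) - signal_dimension
    if axes is None:
        axes = range(nav_ndim)
    elif isinstance(axes, int):
        axes = (axes,)
    # Navigation index i lives at array axis nav_ndim - 1 - i; sorting the
    # set keeps the result in dask array order and drops duplicates.
    array_axes = sorted({nav_ndim - 1 - i for i in axes})
    return tuple(chunks[a] for a in array_axes)


# Chunks of a (10, 200, 300) array stored as a single block, as in the example.
chunks = ((10,), (200,), (300,))
print(chunks_for_nav_axes(chunks, signal_dimension=1))          # ((10,), (200,))
print(chunks_for_nav_axes(chunks, signal_dimension=1, axes=0))  # ((200,),)
```

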
get_histogram(bins='fd', range_bins=None, out=None, rechunk=False, **kwargs)

Return a histogram of the signal data.

More sophisticated algorithms for determining the bins can be used by passing a string as the bins argument. Other than the 'blocks' and 'knuth' methods, the available algorithms are the same as numpy.histogram().

Note: The lazy version of the algorithm only supports "scott" and "fd" as a string argument for bins.

Parameters:
bins : int or sequence of float or str, default “fd”

If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.

If bins is a string from the list below, will use the method chosen to calculate the optimal bin width and consequently the number of bins (see Notes for more detail on the estimators) from the data that falls within the requested range. While the bin width will be optimal for the actual data in the range, the number of bins will be computed to fill the entire range, including the empty portions. For visualisation, using the 'auto' option is suggested. Weighted data is not supported for automated bin size selection.

Possible strings are:

  • 'auto' : Maximum of the ‘sturges’ and ‘fd’ estimators. Provides good all around performance.

  • 'fd' : Freedman-Diaconis Estimator, robust (resilient to outliers) estimator that takes into account data variability and data size.

  • 'doane' : An improved version of Sturges’ estimator that works better with non-normal datasets.

  • 'scott' : Less robust estimator that takes into account data variability and data size.

  • 'stone' : Estimator based on leave-one-out cross-validation estimate of the integrated squared error. Can be regarded as a generalization of Scott’s rule.

  • 'rice' : Estimator does not take variability into account, only data size. Commonly overestimates number of bins required.

  • 'sturges' : R’s default method, only accounts for data size. Only optimal for gaussian data and underestimates number of bins for large non-gaussian datasets.

  • 'sqrt' : Square root (of data size) estimator, used by Excel and other programs for its speed and simplicity.

  • 'knuth' : Knuth’s rule is a fixed-width, Bayesian approach to determining the optimal bin width of a histogram.

  • 'blocks' : Determination of optimal adaptive-width histogram bins using the Bayesian Blocks algorithm.

range_bins : (float, float), optional

The lower and upper limit of the range of bins. If not provided, range is simply (a.min(), a.max()). Values outside the range are ignored. The first element of the range must be less than or equal to the second. range affects the automatic bin computation as well. While bin width is computed to be optimal based on the actual data within range, the bin count will fill the entire range including portions containing no data.

max_num_bins : int, default 250

When estimating the bins using one of the str methods, the number of bins is capped by this number to avoid a MemoryError being raised by numpy.histogram().

out : BaseSignal (or subclass) or None

If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.

rechunk : bool

Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.

**kwargs

Other keyword arguments (weights and density) are described in numpy.histogram().

Returns:
hist_spec : Signal1D

A 1D spectrum instance containing the histogram.

Notes

See numpy.histogram() for more details on the meaning of the returned values.

Examples

>>> import numpy as np
>>> s = hs.signals.Signal1D(np.random.normal(size=(10, 100)))
>>> # Plot the data histogram
>>> s.get_histogram().plot()
>>> # Plot the histogram of the signal at the current coordinates
>>> s.get_current_signal().get_histogram().plot()
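
The 'fd' rule, one of the two string options the lazy version supports, can be reproduced with NumPy: the bin width is h = 2 * IQR * n**(-1/3), and the number of bins is the data range divided by h, rounded up. The cap below only imitates the max_num_bins behaviour described above, and the sketch runs on an in-memory array rather than computing the statistics with dask.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Freedman-Diaconis bin width: h = 2 * IQR * n**(-1/3)
q75, q25 = np.percentile(x, [75, 25])
h = 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)

# Number of equal-width bins needed to cover the data range with width h.
n_fd = int(np.ceil((x.max() - x.min()) / h))

# Cap the bin count, in the spirit of max_num_bins (default 250).
max_num_bins = 250
n_bins = min(n_fd, max_num_bins)

# NumPy's own 'fd' estimator yields the same (uncapped) number of bins.
edges = np.histogram_bin_edges(x, bins="fd")
counts, bin_edges = np.histogram(x, bins=n_bins)
```

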
integrate_simpson(axis, out=None, rechunk=False)

Calculate the integral of a Signal along an axis using Simpson’s rule.

Parameters:
axis : int, str, or DataAxis

The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name. If "sig" or "nav", the signal or navigation axis will be used, respectively.

out : BaseSignal (or subclass) or None

If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.

rechunk : bool

Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.

Returns:
s : BaseSignal (or subclass)

A new Signal containing the integral of the provided Signal along the specified axis.

Examples

>>> import numpy as np
>>> s = hs.signals.BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.integrate_simpson(0)
<Signal2D, title: , dimensions: (|64, 64)>
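
For a uniform axis, the composite Simpson's rule weights the samples 1, 4, 2, 4, ..., 4, 1 and multiplies the weighted sum by dx/3. The NumPy sketch below is written from scratch for illustration rather than taken from HyperSpy, and it only handles an odd number of samples on a uniform axis. It also shows that the integrated axis disappears from the result, as in the example above.

```python
import numpy as np


def simpson_uniform(y, dx):
    """Composite Simpson's rule along the last axis of `y` (uniform spacing dx).

    Needs an odd number of samples (an even number of intervals), at least 3.
    """
    n = y.shape[-1]
    if n < 3 or n % 2 == 0:
        raise ValueError("an odd number of samples (>= 3) is required")
    # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
    weights = np.ones(n)
    weights[1:-1:2] = 4.0
    weights[2:-1:2] = 2.0
    return dx / 3.0 * (y * weights).sum(axis=-1)


x = np.linspace(0.0, 1.0, 11)
dx = x[1] - x[0]

# Simpson's rule is exact for polynomials up to degree 3.
print(simpson_uniform(x**2, dx))  # approximately 1/3
# Integrating a stack of curves removes the integrated axis: (3, 4, 11) -> (3, 4)
stack = np.broadcast_to(x**3, (3, 4, 11))
print(simpson_uniform(stack, dx).shape)  # (3, 4)
```

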
plot(navigator='auto', **kwargs)

Plot the signal at the current coordinates.

For multidimensional datasets, an optional figure, the “navigator”, with a cursor to navigate the data is raised. In any case, it is possible to navigate the data using the sliders. Currently, only signals with signal_dimension equal to 0, 1 and 2 can be plotted.

Parameters:
navigator : str, None, or BaseSignal (or subclass)
Allowed string values are 'auto', 'slider', and 'spectrum'.
  • If 'auto':

    • If navigation_dimension > 0, a navigator is provided to explore the data.

    • If navigation_dimension is 1 and the signal is an image the navigator is a sum spectrum obtained by integrating over the signal axes (the image).

    • If navigation_dimension is 1 and the signal is a spectrum the navigator is an image obtained by stacking all the spectra in the dataset horizontally.

    • If navigation_dimension is > 1, the navigator is a sum image obtained by integrating the data over the signal axes.

    • Additionally, if navigation_dimension > 2, a window with one slider per axis is raised to navigate the data.

    • For example, if the dataset consists of 3 navigation axes “X”, “Y”, “Z” and one signal axis, “E”, the default navigator will be an image obtained by integrating the data over “E” at the current “Z” index and a window with sliders for the “X”, “Y”, and “Z” axes will be raised. Notice that changing the “Z”-axis index changes the navigator in this case.

    • For lazy signals, the navigator will be calculated using the compute_navigator() method.

  • If 'slider':

    • If navigation dimension > 0 a window with one slider per axis is raised to navigate the data.

  • If 'spectrum':

    • If navigation_dimension > 0 the navigator is always a spectrum obtained by integrating the data over all other axes.

    • Not supported for lazy signals, the 'auto' option will be used instead.

  • If None, no navigator will be provided.

Alternatively a BaseSignal (or subclass) instance can be provided. The navigation or signal shape must match the navigation shape of the signal to plot or the navigation_shape + signal_shape must be equal to the navigator_shape of the current object (for a dynamic navigator). If the signal dtype is RGB or RGBA this parameter has no effect and the value is always set to 'slider'.

axes_manager : None or AxesManager

If None, the signal’s axes_manager attribute is used.

plot_markers : bool, default True

Plot markers added using s.add_marker(marker, permanent=True). Note, a large number of markers might lead to very slow plotting.

navigator_kwds : dict

Only for image navigator, additional keyword arguments for matplotlib.pyplot.imshow().

norm : str, default 'auto'

The function used to normalize the data prior to plotting. Allowable strings are: 'auto', 'linear', 'log'. If 'auto', intensity is plotted on a linear scale except when power_spectrum=True (only for complex signals).

autoscale : str

The string must contain any combination of the 'x' and 'v' characters. If 'x' or 'v' (for values) are in the string, the corresponding horizontal or vertical axis limits are set to their maxima and the axis limits will reset when the data or the navigation indices are changed. Default is 'v'.

**kwargs : dict

Only when plotting an image: additional (optional) keyword arguments for matplotlib.pyplot.imshow().
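
The 'auto' navigator rules listed above can be summarised as a small decision function. auto_navigator is a hypothetical helper that only encodes the cases described in this section (it raises for a one-dimensional navigation space with a signal that is neither a spectrum nor an image) and does not call any plotting code.

```python
def auto_navigator(navigation_dimension, signal_dimension, lazy=False):
    """Describe the 'auto' navigator following the rules listed above.

    Returns a (navigator description or None, sliders raised) pair.
    """
    if navigation_dimension == 0:
        return None, False
    if navigation_dimension == 1:
        if signal_dimension == 2:
            navigator = "sum spectrum over the signal axes"
        elif signal_dimension == 1:
            navigator = "image of the stacked spectra"
        else:
            raise ValueError("case not covered by this sketch")
    else:
        navigator = "sum image over the signal axes"
    if lazy:
        navigator += " (computed with compute_navigator())"
    # A separate slider window is raised only for more than 2 navigation axes.
    return navigator, navigation_dimension > 2


print(auto_navigator(1, 1))             # ('image of the stacked spectra', False)
print(auto_navigator(3, 1, lazy=True))  # ('sum image over the signal axes (computed with compute_navigator())', True)
```
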

rebin(new_shape=None, scale=None, crop=False, dtype=None, out=None, rechunk=False)

Rebin the signal into a smaller or larger shape, based on linear interpolation. Specify either new_shape or scale. Scale of 1 means no binning and scale less than one results in up-sampling.

Parameters:
new_shape : list (of float or int) or None

For each dimension specify the new_shape. This will internally be converted into a scale parameter.

scale : list (of float or int) or None

For each dimension, specify the new:old pixel ratio, e.g. a ratio of 1 is no binning and a ratio of 2 means that each pixel in the new spectrum is twice the size of the pixels in the old spectrum. The length of the list should match the dimension of the Signal’s underlying data array. Note: only one of scale or new_shape should be specified, otherwise the function will not run.

crop : bool

Whether or not to crop the resulting rebinned data (default is False, as in the signature above). When binning by a non-integer number of pixels, it is likely that the final row in each dimension will contain fewer than the full quota to fill one pixel. For example, a 5*5 array binned by 2.1 will produce two rows containing 2.1 pixels and one row containing only 0.8 pixels. Selection of crop=True or crop=False determines whether or not this “black” line is cropped from the final binned array. Please note that if crop=False is used, the final row in each dimension may appear black if a fractional number of pixels are left over. It can be removed but has been left to preserve total counts before and after binning.

dtype : {None, numpy.dtype, “same”}

Specify the dtype of the output. If None, the dtype will be determined by the behaviour of numpy.sum(), if "same", the dtype will be kept the same. Default is None.

out : BaseSignal (or subclass) or None

If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.

Returns:
BaseSignal

The resulting rebinned signal.

Raises:
NotImplementedError

If trying to rebin over a non-uniform axis.

Examples

>>> import numpy as np
>>> spectrum = hs.signals.Signal1D(np.ones([4, 4, 10]))
>>> spectrum.data[1, 2, 9] = 5
>>> print(spectrum)
<Signal1D, title: , dimensions: (4, 4|10)>
>>> print('Sum =', spectrum.data.sum())
Sum = 164.0
>>> new_shape = [2, 2, 5]
>>> test = spectrum.rebin(new_shape)
>>> print(test)
<Signal1D, title: , dimensions: (2, 2|5)>
>>> print('Sum =', test.data.sum())
Sum = 164.0
>>> s = hs.signals.Signal1D(np.ones((2, 5, 10), dtype=np.uint8))
>>> print(s)
<Signal1D, title: , dimensions: (5, 2|10)>
>>> print(s.data.dtype)
uint8

Use dtype=np.uint16 to specify a dtype

>>> s2 = s.rebin(scale=(5, 2, 1), dtype=np.uint16)
>>> print(s2.data.dtype)
uint16

Use dtype="same" to keep the same dtype

>>> s3 = s.rebin(scale=(5, 2, 1), dtype="same")
>>> print(s3.data.dtype)
uint8

By default (dtype=None), the dtype is determined by the behaviour of numpy.sum; in this case, it is an unsigned integer of the same precision as the platform integer, so the output below is platform dependent.

>>> s4 = s.rebin(scale=(5, 2, 1))
>>> print(s4.data.dtype)
uint32
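
For integer scale factors, rebinning reduces to reshaping each axis into (bins, factor) and summing the factor axes, which is why the total counts are preserved in the example above. The NumPy sketch below assumes integer factors given in array order (the examples above pass scale in the same order as the printed dimensions instead) and always crops leftover samples. It does not implement the fractional binning that crop refers to.

```python
import numpy as np


def bin_by_integer_factors(data, factors):
    """Sum-bin `data` by one integer factor per array axis.

    Samples that do not fill a whole bin at the end of an axis are cropped.
    """
    # Crop each axis to a multiple of its factor.
    cropped = data[tuple(slice(0, (n // f) * f) for n, f in zip(data.shape, factors))]
    # Split every axis into (number of bins, factor) and sum the factor axes.
    split_shape = []
    for n, f in zip(cropped.shape, factors):
        split_shape.extend([n // f, f])
    return cropped.reshape(split_shape).sum(axis=tuple(range(1, 2 * data.ndim, 2)))


spectrum = np.ones((4, 4, 10))
spectrum[1, 2, 9] = 5
binned = bin_by_integer_factors(spectrum, (2, 2, 2))
print(binned.shape, binned.sum())  # (2, 2, 5) 164.0

# numpy.sum promotes uint8 to a wider, platform-dependent unsigned integer.
small = np.ones((2, 5, 10), dtype=np.uint8)
print(bin_by_integer_factors(small, (2, 5, 1)).dtype)
```

In the second call, the array-order factors (2, 5, 1) correspond to scale=(5, 2, 1) in the examples above.

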
rechunk(nav_chunks='auto', sig_chunks=-1, inplace=True, **kwargs)

Rechunks the data using the same rechunking formula as Dask, except that the navigation and signal chunks are defined separately. Note that, for most functions, sig_chunks should remain at its default of -1 so that it spans the entire signal axes.

Parameters:
nav_chunks : {tuple, int, “auto”, None}

The navigation block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is “auto” which automatically determines chunk sizes.

sig_chunks : {tuple, int, “auto”, None}

The signal block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is -1, which automatically spans the full signal dimension.

**kwargs : dict

Any other keyword arguments for dask.array.rechunk().
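
Conceptually, the separate navigation and signal specifications are combined into a single chunks tuple in dask array order, where the navigation axes come first. combine_chunks is a hypothetical helper sketching that step, under the assumption that a tuple specification is already given in array order.

```python
def combine_chunks(nav_chunks, sig_chunks, nav_ndim, sig_ndim):
    """Build one chunks tuple in dask array order (navigation axes first).

    An int, "auto" or None is repeated for every axis of its space; a tuple
    is used as-is and must have one entry per axis.
    """

    def expand(spec, ndim):
        if isinstance(spec, tuple):
            if len(spec) != ndim:
                raise ValueError("a tuple needs one entry per axis")
            return spec
        return (spec,) * ndim

    return expand(nav_chunks, nav_ndim) + expand(sig_chunks, sig_ndim)


# Two navigation axes and two signal axes, with the default specifications.
print(combine_chunks("auto", -1, nav_ndim=2, sig_ndim=2))  # ('auto', 'auto', -1, -1)
# Explicit navigation chunks, whole signal axis in one block.
print(combine_chunks((5, 10), -1, nav_ndim=2, sig_ndim=1))  # (5, 10, -1)
```

A tuple like this is the kind of argument dask.array.Array.rechunk() accepts: -1 keeps a whole axis in one block and "auto" lets dask choose the block size.
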

valuemax(axis, out=None, rechunk=False)

Returns a signal with the calibrated coordinate value of the maximum along an axis.

Parameters:
axis : int, str, or DataAxis

The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name. If "sig" or "nav", the signal or navigation axis will be used, respectively.

out : BaseSignal (or subclass) or None

If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.

rechunk : bool

Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.

Returns:
s : BaseSignal (or subclass)

A new Signal containing the calibrated coordinate values of the maximum along the specified axis.

Examples

>>> import numpy as np
>>> s = hs.signals.BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.valuemax(0)
<Signal2D, title: , dimensions: (|64, 64)>
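
The calibrated value returned by valuemax is the axis coordinate at the index of the maximum. The NumPy sketch below assumes a uniform signal axis with a hypothetical offset and scale; valuemin follows the same pattern with argmin instead of argmax.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((8, 8, 256))  # array order: (nav_y, nav_x, signal)

# Hypothetical uniform calibration of the signal axis.
offset, scale = 100.0, 0.5
axis_values = offset + scale * np.arange(data.shape[-1])

# Index of the maximum along the signal axis, then its calibrated coordinate.
idx_max = data.argmax(axis=-1)
value_max = axis_values[idx_max]  # shape (8, 8)

# The same pattern with argmin gives the valuemin equivalent.
value_min = axis_values[data.argmin(axis=-1)]
```

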
valuemin(axis, out=None, rechunk=False)

Returns a signal with the calibrated coordinate value of the minimum along an axis.

Parameters:
axis : int, str, or DataAxis

The axis can be passed directly, or specified using the index of the axis in the Signal’s axes_manager or the axis name. If "sig" or "nav", the signal or navigation axis will be used, respectively.

out : BaseSignal (or subclass) or None

If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.

rechunk : bool

Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.

Returns:
BaseSignal or subclass

A new Signal containing the calibrated coordinate values of the minimum along the specified axis.