LazySignal
- class hyperspy._signals.lazy.LazySignal(*args, **kwargs)#
Bases:
BaseSignal
Lazy general signal class. The computation is delayed until explicitly requested.
This class is not expected to be instantiated directly, instead use:
>>> data = da.ones((10, 10))
>>> s = hs.signals.BaseSignal(data).as_lazy()
Create a signal instance.
- Parameters:
- data : numpy.ndarray
The signal data. It can be an array of any dimension.
- axes : [dict/axes], optional
List of either dictionaries or axes objects to define the axes (see the documentation of the AxesManager class for more details).
- attributes : dict, optional
A dictionary whose items are stored as attributes.
- metadata : dict, optional
A dictionary containing a set of parameters that will be stored in the metadata attribute. Some parameters might be mandatory in some cases.
- original_metadata : dict, optional
A dictionary containing a set of parameters that will be stored in the original_metadata attribute. It typically contains all the parameters that have been imported from the original data file.
- ragged : bool or None, optional
Define whether the signal is ragged or not. Overwrites the ragged value in the attributes dictionary. If None, it does nothing. Default is None.
- change_dtype(dtype, rechunk=False)#
Change the data type of a Signal.
- Parameters:
- dtype : str or numpy.dtype
Typecode string or data type to which the Signal's data array is cast. In addition to all the standard numpy data types (dtype), HyperSpy supports four extra dtypes for RGB images: 'rgb8', 'rgba8', 'rgb16', and 'rgba16'. Changing from and to any rgb(a) dtype is more constrained than most other dtype conversions. To change to an rgb(a) dtype, the signal_dimension must be 1 and its size should be 3 (for rgb) or 4 (for rgba) dtypes, the original dtype should be uint8 or uint16 when converting to rgb(a)8 or rgb(a)16 respectively, and the navigation_dimension should be at least 2. After conversion, the signal_dimension becomes 2. The dtype of images with original dtype rgb(a)8 or rgb(a)16 can only be changed to uint8 or uint16 respectively, and the signal_dimension becomes 1.
- rechunk : bool
Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.
Examples
>>> s = hs.signals.Signal1D([1, 2, 3, 4, 5])
>>> s.data
array([1, 2, 3, 4, 5])
>>> s.change_dtype('float')
>>> s.data
array([1., 2., 3., 4., 5.])
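Converting to an RGB dtype follows the constraints described above. The following is a minimal sketch (not part of the original documentation), assuming a 32 x 32 RGB image stored as a uint8 Signal1D with a 3-channel signal axis:
>>> import numpy as np
>>> rgb_data = np.zeros((32, 32, 3), dtype=np.uint8)  # navigation (32, 32), signal size 3
>>> s = hs.signals.Signal1D(rgb_data)
>>> s.change_dtype('rgb8')
>>> s.axes_manager.signal_dimension  # the channel axis is folded into the pixel dtype
2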
- close_file()#
Closes the associated data file if any.
Currently it only supports closing the file associated with a dask array created from an h5py DataSet (default HyperSpy hdf5 reader).
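A minimal usage sketch (not part of the original documentation); the file name is a placeholder:
>>> s = hs.load('dataset.hspy', lazy=True)  # 'dataset.hspy' is a hypothetical file
>>> s_sum = s.sum()
>>> s_sum.compute()
>>> s.close_file()  # further lazy operations on s will fail once the file is closed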
- compute(close_file=False, show_progressbar=None, **kwargs)#
Attempt to store the full signal in memory.
- Parameters:
- close_file : bool, default False
If True, attempt to close the file associated with the dask array data, if any. Note that closing the file will make all other associated lazy signals inoperative.
- show_progressbar : None or bool
If True, display a progress bar. If None, the default from the preferences settings is used.
- **kwargs : dict
Any other keyword arguments for dask.array.Array.compute(), for example scheduler or num_workers.
Notes
For alternative ways to set the compute settings see https://docs.dask.org/en/stable/scheduling.html#configuration
Examples
>>> import dask.array as da
>>> data = da.zeros((100, 100, 100), chunks=(10, 20, 20))
>>> s = hs.signals.Signal2D(data).as_lazy()
With default parameters
>>> s1 = s.deepcopy()
>>> s1.compute()
Using 2 workers, which can reduce the memory usage (depending on the data and your computer hardware). Note that num_workers only works with the 'threads' and 'processes' schedulers.
>>> s2 = s.deepcopy()
>>> s2.compute(num_workers=2)
Using a single threaded scheduler, which is useful for debugging
>>> s3 = s.deepcopy()
>>> s3.compute(scheduler='single-threaded')
- compute_navigator(index=None, chunks_number=None, show_progressbar=None)#
Compute the navigator by taking the sum over a single chunk containing the specified coordinate. Taking the sum over a single chunk is a computationally efficient way to compute the navigator. The data can be rechunked by specifying the chunks_number argument.
- Parameters:
- index : (int, float, None) or iterable, optional
Specifies where to take the sum, following HyperSpy indexing syntax for integers and floats. If None, the index is the centre of the signal space.
- chunks_number : (int, None) or iterable, optional
Defines the number of chunks in the signal space used to rechunk the data when calculating the navigator. Useful to define the range over which the sum is calculated. If None, the existing chunking is considered when picking the chunk used in the navigator calculation.
- show_progressbar : None or bool
If True, display a progress bar. If None, the default from the preferences settings is used.
- Returns:
- None.
Notes
The number of chunks will affect where the sum is taken. If the sum needs to be taken in the centre of the signal space (for example, in the case of a diffraction pattern), the number of chunks needs to be odd, so that the middle chunk is centred.
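A minimal usage sketch (not part of the original documentation); the odd chunks_number keeps the summed chunk centred in the signal space, as discussed in the Notes:
>>> import dask.array as da
>>> data = da.random.random((10, 10, 64, 64), chunks=(5, 5, 32, 32))
>>> s = hs.signals.Signal2D(data).as_lazy()
>>> s.compute_navigator(chunks_number=3)
>>> s.plot()  # lazy signals use the navigator obtained via compute_navigator()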
- decomposition(normalize_poissonian_noise=False, algorithm='SVD', output_dimension=None, signal_mask=None, navigation_mask=None, get=None, num_chunks=None, reproject=True, print_info=True, **kwargs)#
Perform Incremental (Batch) decomposition on the data.
The results are stored in the learning_results attribute.
Read more in the User Guide.
- Parameters:
- normalize_poissonian_noise : bool, default False
If True, scale the signal to normalize Poissonian noise using the approach described in [KeenanKotula2004].
- algorithm : {'SVD', 'PCA', 'ORPCA', 'ORNMF'}, default 'SVD'
The decomposition algorithm to use.
- output_dimension : int or None, default None
Number of components to keep/calculate. If None, keep all (only valid for the 'SVD' algorithm).
- get : dask scheduler or None
The dask scheduler to use for computations. If None, dask.threaded.get will be used if possible, otherwise dask.get will be used, for example in the pyodide interpreter.
- num_chunks : int or None, default None
The number of dask chunks to pass to the decomposition model. More chunks require more memory, but should run faster. Will be increased to contain at least output_dimension signals.
- navigation_mask : BaseSignal, numpy.ndarray or dask.array.Array
The navigation locations marked as True are not used in the decomposition. Not implemented for the 'SVD' algorithm.
- signal_mask : BaseSignal, numpy.ndarray or dask.array.Array
The signal locations marked as True are not used in the decomposition. Not implemented for the 'SVD' algorithm.
- reproject : bool, default True
Reproject data on the learnt components (factors) after learning.
- print_info : bool, default True
If True, print information about the decomposition being performed. In the case of sklearn.decomposition objects, this includes the values of all arguments of the chosen sklearn algorithm.
- **kwargs
Passed to the partial_fit/fit functions.
See also
References
[KeenanKotula2004] M. Keenan and P. Kotula, "Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images", Surf. Interface Anal. 36(3) (2004): 203-212.
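A minimal usage sketch (not part of the original documentation), assuming the default lazy 'SVD' algorithm with a fixed number of components:
>>> import dask.array as da
>>> s = hs.signals.Signal1D(da.random.random((32, 32, 512))).as_lazy()
>>> s.decomposition(output_dimension=5, print_info=False)
>>> factors = s.get_decomposition_factors()
>>> loadings = s.get_decomposition_loadings()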
- diff(axis, order=1, out=None, rechunk=False)#
Returns a signal with the n-th order discrete difference along the given axis, i.e. it calculates the difference between consecutive values in the given axis: out[n] = a[n+1] - a[n]. See numpy.diff() for more details.
- Parameters:
- axis : int, str, or DataAxis
The axis can be passed directly, or specified using the index of the axis in the Signal's axes_manager or the axis name.
- order : int
The order of the discrete difference.
- out : BaseSignal (or subclass) or None
If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- rechunk : bool
Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.
- Returns:
BaseSignal or None
Note that the size of the data on the given axis decreases by the given order, i.e. if axis is "x" and order is 2 and the x dimension is N, the returned signal's x dimension is N - 2.
See also
Notes
If you intend to calculate the numerical derivative, please use the proper derivative() function instead. To avoid erroneous misuse of the diff function as a derivative, it raises an error when working with a non-uniform axis.
Examples
>>> import numpy as np
>>> s = BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.diff(0)
<BaseSignal, title: , dimensions: (|1023, 64, 64)>
- get_chunk_size(axes=None)#
Returns the chunk size as a tuple for a set of given axes. The order of the returned tuple follows the order of the dask array.
- Parameters:
- axes : int, str, DataAxis or tuple
Either one on its own, or many axes in a tuple can be passed. In both cases the axes can be passed directly, or specified using the index in axes_manager or the name of the axis. Any duplicates are removed. If None, the operation is performed over all navigation axes (default).
Examples
>>> import dask.array as da
>>> data = da.random.random((10, 200, 300))
>>> data.chunksize
(10, 200, 300)
>>> s = hs.signals.Signal1D(data).as_lazy()
>>> s.get_chunk_size() # All navigation axes
((10,), (200,))
>>> s.get_chunk_size(0) # The first navigation axis
((200,),)
- get_histogram(bins='fd', out=None, rechunk=False, **kwargs)#
Return a histogram of the signal data.
More sophisticated algorithms for determining the bins can be used by passing a string as the bins argument. Other than the 'blocks' and 'knuth' methods, the available algorithms are the same as numpy.histogram().
Note: The lazy version of the algorithm only supports "scott" and "fd" as string arguments for bins.
- Parameters:
- bins : int, sequence of float, or str, default "fd"
If bins is an int, it defines the number of equal-width bins in the given range. If bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
If bins is a string from the list below, the chosen method is used to calculate the optimal bin width and, consequently, the number of bins (see Notes for more detail on the estimators) from the data that falls within the requested range. While the bin width will be optimal for the actual data in the range, the number of bins will be computed to fill the entire range, including empty portions. For visualisation, using the 'auto' option is suggested. Weighted data is not supported for automated bin size selection.
- 'auto'
Maximum of the ‘sturges’ and ‘fd’ estimators. Provides good all around performance.
- ‘fd’ (Freedman Diaconis Estimator)
Robust (resilient to outliers) estimator that takes into account data variability and data size.
- ‘doane’
An improved version of Sturges’ estimator that works better with non-normal datasets.
- ‘scott’
Less robust estimator that takes into account data variability and data size.
- ‘stone’
Estimator based on leave-one-out cross-validation estimate of the integrated squared error. Can be regarded as a generalization of Scott’s rule.
- ‘rice’
Estimator does not take variability into account, only data size. Commonly overestimates number of bins required.
- ‘sturges’
R’s default method, only accounts for data size. Only optimal for gaussian data and underestimates number of bins for large non-gaussian datasets.
- ‘sqrt’
Square root (of data size) estimator, used by Excel and other programs for its speed and simplicity.
- ‘knuth’
Knuth’s rule is a fixed-width, Bayesian approach to determining the optimal bin width of a histogram.
- ‘blocks’
Determination of optimal adaptive-width histogram bins using the Bayesian Blocks algorithm.
- range_bins : tuple or None, optional
The minimum and maximum range for the histogram. If range_bins is None, (x.min(), x.max()) will be used.
- max_num_bins : int, default 250
When estimating the bins using one of the str methods, the number of bins is capped by this number to avoid a MemoryError being raised by numpy.histogram().
- out : BaseSignal (or subclass) or None
If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- rechunk : bool
Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.
- **kwargs
Other keyword arguments (weights and density) are described in numpy.histogram().
- Returns:
- hist_spec : Signal1D
A 1D spectrum instance containing the histogram.
See also
Examples
>>> s = hs.signals.Signal1D(np.random.normal(size=(10, 100)))
>>> # Plot the data histogram
>>> s.get_histogram().plot()
>>> # Plot the histogram of the signal at the current coordinates
>>> s.get_current_signal().get_histogram().plot()
- integrate_simpson(axis, out=None, rechunk=False)#
Calculate the integral of a Signal along an axis using Simpson’s rule.
- Parameters:
- axis : int, str, or DataAxis
The axis can be passed directly, or specified using the index of the axis in the Signal's axes_manager or the axis name.
- out : BaseSignal (or subclass) or None
If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- rechunk : bool
Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.
- Returns:
- s : BaseSignal (or subclass)
A new Signal containing the integral of the provided Signal along the specified axis.
Examples
>>> s = BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.integrate_simpson(0)
<Signal2D, title: , dimensions: (|64, 64)>
- plot(navigator='auto', **kwargs)#
Plot the signal at the current coordinates.
For multidimensional datasets an optional figure, the “navigator”, with a cursor to navigate that data is raised. In any case it is possible to navigate the data using the sliders. Currently only signals with signal_dimension equal to 0, 1 and 2 can be plotted.
- Parameters:
- navigator : str, None, or BaseSignal (or subclass)
Allowed string values are 'auto', 'slider', and 'spectrum'.
If 'auto':
- If navigation_dimension > 0, a navigator is provided to explore the data.
- If navigation_dimension is 1 and the signal is an image, the navigator is a sum spectrum obtained by integrating over the signal axes (the image).
- If navigation_dimension is 1 and the signal is a spectrum, the navigator is an image obtained by stacking all the spectra in the dataset horizontally.
- If navigation_dimension is > 1, the navigator is a sum image obtained by integrating the data over the signal axes.
- Additionally, if navigation_dimension > 2, a window with one slider per axis is raised to navigate the data. For example, if the dataset consists of 3 navigation axes "X", "Y", "Z" and one signal axis, "E", the default navigator will be an image obtained by integrating the data over "E" at the current "Z" index, and a window with sliders for the "X", "Y", and "Z" axes will be raised. Notice that changing the "Z"-axis index changes the navigator in this case.
- For lazy signals, the navigator will be calculated using the compute_navigator() method.
If 'slider':
- If navigation_dimension > 0, a window with one slider per axis is raised to navigate the data.
If 'spectrum':
- If navigation_dimension > 0, the navigator is always a spectrum obtained by integrating the data over all other axes.
- Not supported for lazy signals; the 'auto' option will be used instead.
If None, no navigator will be provided.
Alternatively, a BaseSignal (or subclass) instance can be provided. The navigation or signal shape must match the navigation shape of the signal to plot, or the navigation_shape + signal_shape must be equal to the navigator_shape of the current object (for a dynamic navigator). If the signal dtype is RGB or RGBA this parameter has no effect and the value is always set to 'slider'.
- axes_manager : None or AxesManager
If None, the signal's axes_manager attribute is used.
- plot_markers : bool, default True
Plot markers added using s.add_marker(marker, permanent=True). Note that a large number of markers might lead to very slow plotting.
- navigator_kwds : dict
Only for the image navigator, additional keyword arguments for matplotlib.pyplot.imshow().
- norm : str, default 'auto'
The function used to normalize the data prior to plotting. Allowable strings are: 'auto', 'linear', 'log'. If 'auto', intensity is plotted on a linear scale except when power_spectrum=True (only for complex signals).
- autoscale : str
The string must contain any combination of the 'x' and 'v' characters. If 'x' or 'v' (for values) are in the string, the corresponding horizontal or vertical axis limits are set to their maxima, and the axis limits will reset when the data or the navigation indices are changed. Default is 'v'.
- **kwargs : dict
Only when plotting an image: additional (optional) keyword arguments for matplotlib.pyplot.imshow().
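A minimal usage sketch (not part of the original documentation) showing the navigator options on a lazy 4D dataset:
>>> import dask.array as da
>>> s = hs.signals.Signal2D(da.random.random((10, 10, 64, 64))).as_lazy()
>>> s.plot()                    # 'auto': navigator calculated with compute_navigator()
>>> s.plot(navigator='slider')  # sliders only, no navigator image
>>> s.plot(navigator=None)      # no navigator at all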
- rebin(new_shape=None, scale=None, crop=False, dtype=None, out=None, rechunk=False)#
Rebin the signal into a smaller or larger shape, based on linear interpolation. Specify either new_shape or scale. A scale of 1 means no binning, and a scale of less than one results in up-sampling.
- Parameters:
- new_shape : list (of float or int) or None
For each dimension, specify the new_shape. This will internally be converted into a scale parameter.
- scale : list (of float or int) or None
For each dimension, specify the new:old pixel ratio, e.g. a ratio of 1 is no binning and a ratio of 2 means that each pixel in the new spectrum is twice the size of the pixels in the old spectrum. The length of the list should match the dimension of the Signal's underlying data array. Note: only one of scale or new_shape should be specified, otherwise the function will not run.
- crop : bool
Whether or not to crop the resulting rebinned data (default is True). When binning by a non-integer number of pixels it is likely that the final row in each dimension will contain fewer than the full quota to fill one pixel. For example, a 5x5 array binned by 2.1 will produce two rows containing 2.1 pixels and one row containing only 0.8 pixels. Selection of crop=True or crop=False determines whether or not this "black" line is cropped from the final binned array. Please note that if crop=False is used, the final row in each dimension may appear black if a fractional number of pixels is left over. It can be removed but has been left to preserve total counts before and after binning.
- dtype : {None, numpy.dtype, "same"}
Specify the dtype of the output. If None, the dtype will be determined by the behaviour of numpy.sum(); if "same", the dtype will be kept the same. Default is None.
- out : BaseSignal (or subclass) or None
If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- Returns:
BaseSignal
The resulting cropped signal.
- Raises:
NotImplementedError
If trying to rebin over a non-uniform axis.
Examples
>>> spectrum = hs.signals.Signal1D(np.ones([4, 4, 10]))
>>> spectrum.data[1, 2, 9] = 5
>>> print(spectrum)
<Signal1D, title: , dimensions: (4, 4|10)>
>>> print('Sum =', sum(sum(sum(spectrum.data))))
Sum = 164.0
>>> scale = [2, 2, 5]
>>> test = spectrum.rebin(scale)
>>> print(test)
<Signal1D, title: , dimensions: (2, 2|5)>
>>> print('Sum =', sum(sum(sum(test.data))))
Sum = 164.0
>>> s = hs.signals.Signal1D(np.ones((2, 5, 10), dtype=np.uint8))
>>> print(s)
<Signal1D, title: , dimensions: (5, 2|10)>
>>> print(s.data.dtype)
uint8
Use dtype=np.uint16 to specify a dtype
>>> s2 = s.rebin(scale=(5, 2, 1), dtype=np.uint16)
>>> print(s2.data.dtype)
uint16
Use dtype=”same” to keep the same dtype
>>> s3 = s.rebin(scale=(5, 2, 1), dtype="same")
>>> print(s3.data.dtype)
uint8
By default dtype=None, so the dtype is determined by the behaviour of numpy.sum; in this case, an unsigned integer of the same precision as the platform integer.
>>> s4 = s.rebin(scale=(5, 2, 1))
>>> print(s4.data.dtype)
uint32
- rechunk(nav_chunks='auto', sig_chunks=-1, inplace=True, **kwargs)#
Rechunks the data using the same rechunking formula as Dask, except that the navigation and signal chunks are defined separately. Note that for most functions sig_chunks should remain at its default so that it spans the entire signal axes.
- Parameters:
- nav_chunks : {tuple, int, "auto", None}
The navigation block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is "auto", which automatically determines chunk sizes.
- sig_chunks : {tuple, int, "auto", None}
The signal block dimensions to create. -1 indicates the full size of the corresponding dimension. Default is -1, which spans the full signal dimensions.
- **kwargs : dict
Any other keyword arguments for dask.array.rechunk().
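A minimal usage sketch (not part of the original documentation); the chunk sizes shown assume the default sig_chunks=-1, which gives full-size signal chunks:
>>> import dask.array as da
>>> data = da.zeros((20, 20, 64, 64), chunks=(5, 5, 16, 16))
>>> s = hs.signals.Signal2D(data).as_lazy()
>>> s.rechunk(nav_chunks=(10, 10))
>>> s.data.chunksize
(10, 10, 64, 64)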
- valuemax(axis, out=None, rechunk=False)#
Returns a signal with the coordinate values of the maximum along an axis.
- Parameters:
- axis : int, str, or DataAxis
The axis can be passed directly, or specified using the index of the axis in the Signal's axes_manager or the axis name.
- out : BaseSignal (or subclass) or None
If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- rechunk : bool
Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.
- Returns:
- s : BaseSignal (or subclass)
A new Signal containing the calibrated coordinate values of the maximum along the specified axis.
See also
hyperspy.api.signals.BaseSignal.max, hyperspy.api.signals.BaseSignal.min,
hyperspy.api.signals.BaseSignal.sum, hyperspy.api.signals.BaseSignal.mean,
hyperspy.api.signals.BaseSignal.std, hyperspy.api.signals.BaseSignal.var,
hyperspy.api.signals.BaseSignal.indexmax, hyperspy.api.signals.BaseSignal.indexmin,
hyperspy.api.signals.BaseSignal.valuemin
Examples
>>> import numpy as np
>>> s = BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.valuemax(0)
<Signal2D, title: , dimensions: (|64, 64)>
- valuemin(axis, out=None, rechunk=False)#
Returns a signal with the coordinate values of the minimum along an axis.
- Parameters:
- axis : int, str, or DataAxis
The axis can be passed directly, or specified using the index of the axis in the Signal's axes_manager or the axis name.
- out : BaseSignal (or subclass) or None
If None, a new Signal is created with the result of the operation and returned (default). If a Signal is passed, it is used to receive the output of the operation, and nothing is returned.
- rechunk : bool
Only has an effect when operating on a lazy signal. Default is False, which means the chunking structure will be retained. If True, the data may be automatically rechunked before performing this operation.
- Returns:
BaseSignal or subclass
A new Signal containing the calibrated coordinate values of the minimum along the specified axis.
See also
hyperspy.api.signals.BaseSignal.max, hyperspy.api.signals.BaseSignal.min,
hyperspy.api.signals.BaseSignal.sum, hyperspy.api.signals.BaseSignal.mean,
hyperspy.api.signals.BaseSignal.std, hyperspy.api.signals.BaseSignal.var,
hyperspy.api.signals.BaseSignal.indexmax, hyperspy.api.signals.BaseSignal.indexmin,
hyperspy.api.signals.BaseSignal.valuemax
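Examples
valuemin mirrors valuemax; the following is a minimal sketch (not part of the original documentation):
>>> import numpy as np
>>> s = BaseSignal(np.random.random((64, 64, 1024)))
>>> s
<BaseSignal, title: , dimensions: (|1024, 64, 64)>
>>> s.valuemin(0)
<Signal2D, title: , dimensions: (|64, 64)>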