A dictionary containing a set of parameters
that will be stored in the original_metadata attribute. It
typically contains all the parameters that have been
imported from the original data file.
The transpose of the signal, with signal and navigation spaces
swapped. Enables calling
transpose() with the default
parameters as a property of a Signal.
The operation is performed in-place (i.e. the data of the signal
is modified). This method requires the signal to have a float data type,
otherwise it will raise a TypeError.
The marker or iterable (list, tuple, …) of markers to add.
See the Markers section in the User Guide if you want
to add a large number of markers as an iterable, since this will
be much faster. For signals with navigation dimensions,
the markers can be made to change for different navigation
indices. See the examples for info.
If False, the marker will only appear in the current
plot. If True, the marker will be added to the
metadata.Markers list, and be plotted with
plot(plot_markers=True). If the signal is saved as a HyperSpy
HDF5 file, the markers will be stored in the HDF5 signal and be
restored when the file is loaded.
If True, keep the original data type of the signal data. For
example, if the data type was initially 'float64', the result of
the operation (usually 'int64') will be converted to
'float64'.
Only used if window='hann'.
If an integer n is provided, a Hann window of n-th order will be
used. If None, a first order Hann window is used.
Higher orders result in a more homogeneous intensity distribution.
Shape parameter of the Tukey window, representing the
fraction of the window inside the cosine tapered region. If
zero, the Tukey window is equivalent to a rectangular window.
If one, the Tukey window is equivalent to a Hann window.
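The two limiting cases of the shape parameter can be checked numerically. The following is a minimal sketch in plain Python, assuming the common symmetric definition of the Tukey window; the helper names `hann_window` and `tukey_window` are illustrative and are not part of HyperSpy.

```python
import math


def hann_window(n_points):
    """Symmetric Hann window of length n_points."""
    if n_points == 1:
        return [1.0]
    return [0.5 * (1 - math.cos(2 * math.pi * i / (n_points - 1)))
            for i in range(n_points)]


def tukey_window(n_points, alpha):
    """Symmetric Tukey window; alpha is the tapered fraction (0 to 1)."""
    if n_points == 1:
        return [1.0]
    last = n_points - 1
    taper_width = alpha * last / 2  # samples covered by each cosine taper
    window = []
    for i in range(n_points):
        k = min(i, last - i)  # distance to the nearest edge
        if alpha > 0 and k < taper_width:
            window.append(0.5 * (1 + math.cos(math.pi * (k / taper_width - 1))))
        else:
            window.append(1.0)  # flat central region
    return window


rectangular = tukey_window(8, alpha=0.0)  # no taper at all
hann_like = tukey_window(9, alpha=1.0)    # taper covers the whole window
```

With `alpha=0.0` every sample equals 1 (a rectangular window), and with `alpha=1.0` the samples coincide with those of `hann_window(9)`.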
The chosen spectral axis is moved to the last index in the
array and the data is made contiguous for efficient iteration over
spectra. By default, the method ensures the data is stored optimally,
hence often making a copy of the data. See
transpose() for a more general
method with more options.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
``"CuBICA"`` | ``"TDSEP"``} or object, default "sklearn_fastica"
The BSS algorithm to use. If algorithm is an object,
it must implement a fit_transform() method or fit() and
transform() methods, in the same manner as a scikit-learn estimator.
If None and on_loadings is False, when diff_order is greater than 1
and signal_dimension is greater than 1, the differences are calculated
across all signal axes.
If None and on_loadings is True, when diff_order is greater than 1
and navigation_dimension is greater than 1, the differences are calculated
across all navigation axes.
Factors to decompose. If None, the BSS is performed on the
factors of a previous decomposition. If a Signal instance, the
navigation dimension must be 1 and the size greater than 1.
If not None, the signal locations marked as True are masked. The
mask shape must be equal to the signal shape
(navigation shape) when on_loadings is False (True).
Use either the factors or the loadings to determine if the
component needs to be reversed.
whiten_method: {"PCA" | "ZCA"} or None, default "PCA"
How to whiten the data prior to blind source separation.
If None, no whitening is applied. See whiten_data()
for more details.
return_info: bool, default False
The result of the decomposition is stored internally. However,
some algorithms generate some extra information that is not
stored. If True, return any extra information if available.
In the case of sklearn.decomposition objects, this includes the
sklearn Estimator object.
If True, print information about the decomposition being performed.
In the case of sklearn.decomposition objects, this includes the
values of all arguments of the chosen sklearn algorithm.
Typecode string or data-type to which the Signal’s data array is
cast. In addition to all the standard numpy Data type objects (dtype),
HyperSpy supports four extra dtypes for RGB images: 'rgb8',
'rgba8', 'rgb16', and 'rgba16'. Changing from and to
any rgb(a) dtype is more constrained than most other dtype
conversions. To change to an rgb(a) dtype,
the signal_dimension must be 1, and its size should be 3 (for
rgb dtypes) or 4 (for rgba dtypes). The original dtype
should be uint8 or uint16 if converting to rgb(a)8
or rgb(a)16, respectively, and the navigation_dimension should be at
least 2. After conversion, the signal_dimension becomes 2. The
dtype of images with original dtype rgb(a)8 or rgb(a)16
can only be changed to uint8 or uint16, and the
signal_dimension becomes 1.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Cluster analysis of a signal or of the decomposition results of a signal.
Results are stored in learning_results.
Parameters:
cluster_source: str {"bss" | "decomposition" | "signal"} or BaseSignal
If "bss", the blind source separation results are used.
If "decomposition", the decomposition results are used.
If "signal", the signal data is used.
Note that using the signal or a BaseSignal can be memory intensive
and is only recommended if the signal dimension is small.
A BaseSignal must have the same navigation dimensions as the signal.
source_for_centers: None, str {"decomposition" | "bss" | "signal"} or BaseSignal
default: None
If None, the cluster_source is used.
If "bss", the blind source separation results are used.
If "decomposition", the decomposition results are used.
If "signal", the signal data is used.
A BaseSignal must have the same navigation dimensions as the signal.
preprocessing: str {"standard" | "norm" | "minmax"}, None or object
default: 'norm'
Cluster analysis requires the data to be clustered to be
preprocessed to similar scales. Standard preprocessing
adjusts each feature to have uniform variation. Norm preprocessing
treats the set of features like a vector, and
each measurement is scaled to length 1.
You can also pass an instance of one of the scikit-learn preprocessing classes:
scale_method = sklearn.preprocessing.StandardScaler()
preprocessing = scale_method
See the preprocessing methods in scikit-learn for further
details. If object, must be sklearn.preprocessing-like.
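To make the difference between the "norm" and "standard" options concrete, here is a minimal sketch in plain Python. It assumes that "norm" divides each measurement by its Euclidean length and that "standard" gives each feature zero mean and unit variance; the function names are hypothetical and this is not the code HyperSpy or scikit-learn runs internally.

```python
import math


def norm_scale(rows):
    """Scale each measurement (row) to unit Euclidean length."""
    scaled = []
    for row in rows:
        # Fall back to 1.0 for an all-zero row to avoid dividing by zero.
        length = math.sqrt(sum(v * v for v in row)) or 1.0
        scaled.append([v / length for v in row])
    return scaled


def standard_scale(rows):
    """Give each feature (column) zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Two measurements that differ only in overall intensity.
unit_rows = norm_scale([[3.0, 4.0], [6.0, 8.0]])
# Two features on very different scales.
standardised = standard_scale([[1.0, 10.0], [3.0, 30.0]])
```

After "norm" scaling, the two measurements become identical because only their direction is kept. After "standard" scaling, both features span the same range even though one was ten times larger.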
If you are getting the cluster centers using the decomposition
results (source_for_centers="decomposition") you can define how
many components to use. If set to None the method uses the
estimate of significant components found in the decomposition step
using the elbow method and stored in the
learning_results.number_significant_components attribute.
This applies to both bss and decomposition results.
The signal locations marked as True are not used in the
clustering for “signal” or Signals supplied as cluster source.
This is not applied to decomposition results or source_for_centers
(as it may be a different shape to the cluster source)
The result of the cluster analysis is stored internally. However,
the cluster class used contains a number of attributes.
If True (the default is False),
return the cluster object so the attributes can be accessed.
Additional parameters passed to the clustering class for initialization.
For example, in case of the “kmeans” algorithm, n_init can be
used to define the number of times the algorithm is restarted to
optimize results.
If return_info is True, returns the scikit-learn cluster object
used for clustering. Useful if you wish to examine inertia or other outputs.
Other Parameters:
int
Number of clusters to find using one of the pre-defined methods
"kmeans", "agglomerative", "minibatchkmeans", "spectralclustering".
See sklearn.cluster for details.
Return a “shallow copy” of this Signal using the
standard library’s copy() function. Note: this will
return a copy of the signal, but it will not duplicate the underlying
data in memory, and both Signals will reference the same data.
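The practical consequence of sharing data is easiest to see on a toy object. The sketch below uses a hypothetical `TinySignal` class as a stand-in for a signal; it only demonstrates the behaviour of the standard library's `copy.copy()` and `copy.deepcopy()` and is not HyperSpy code.

```python
import copy


class TinySignal:
    """Hypothetical stand-in for a signal: a data container plus a title."""

    def __init__(self, data, title):
        self.data = data
        self.title = title


original = TinySignal(data=[1, 2, 3], title="original")

shallow = copy.copy(original)    # new object, same underlying data
deep = copy.deepcopy(original)   # new object, duplicated data

shallow.data[0] = 99             # also visible through `original`
shares_data = shallow.data is original.data
deep_is_independent = deep.data is not original.data and deep.data[0] == 1
```

Modifying the data through the shallow copy changes the original as well, while the deep copy keeps its own values.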
Specify the data axis in which to perform the cropping
operation. The axis can be specified using the index of the
axis in axes_manager or the axis name.
The beginning of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
The end of the cropping interval. If type is int,
the value is taken as the axis index. If type is float the index
is calculated using the axis calibration. If start/end is
None the method crops from/to the low/high end of the axis.
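The int/float distinction for the crop bounds can be illustrated with a small sketch. It assumes a uniform axis with a linear calibration, `axis_value = offset + scale * index`; the helper `to_axis_index` and its arguments are hypothetical and only mimic the rule described above.

```python
def to_axis_index(value, offset, scale, size, is_end=False):
    """Convert a crop bound to an array index on a uniform axis.

    offset, scale and size describe a hypothetical linear calibration:
    axis_value = offset + scale * index.
    """
    if value is None:
        # Crop from the low end (start) or to the high end (end).
        return size if is_end else 0
    if isinstance(value, float):
        # Calibrated value: invert the linear calibration.
        return round((value - offset) / scale)
    return value  # an int is already an index


# Axis with 100 points, starting at 0.0, one point every 0.5 units.
start = to_axis_index(5, offset=0.0, scale=0.5, size=100)
calibrated_start = to_axis_index(5.0, offset=0.0, scale=0.5, size=100)
end = to_axis_index(None, offset=0.0, scale=0.5, size=100, is_end=True)
```

Here `5` is used directly as index 5, whereas `5.0` is an axis value that corresponds to index 10 on this axis. Mixing up the two types is a common source of unexpected crops.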
``"mini_batch_sparse_pca"``, ``"RPCA"``, ``"ORPCA"``, ``"ORNMF"``} or object, default ``"SVD"``
The decomposition algorithm to use. If algorithm is an object,
it must implement a fit_transform() method or fit() and
transform() methods, in the same manner as a scikit-learn estimator.
For cupy arrays, only “SVD” is supported.
If None, ignored
If callable, applies the function to the data to obtain var_array.
Only used by the “MLPCA” algorithm.
If numpy array, creates var_array by applying a polynomial function
defined by the array of coefficients to the data. Only used by
the “MLPCA” algorithm.
reproject: None or str {"signal", "navigation", "both"}, default None
If not None, the results of the decomposition will be projected in
the selected masked area.
return_info: bool, default False
The result of the decomposition is stored internally. However,
some algorithms generate some extra information that is not
stored. If True, return any extra information if available.
In the case of sklearn.decomposition objects, this includes the
sklearn Estimator object.
If True, print information about the decomposition being performed.
In the case of sklearn.decomposition objects, this includes the
values of all arguments of the chosen sklearn algorithm.
If "auto": the solver is selected by a default policy based on data.shape and
output_dimension: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient "randomized"
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If "full": run exact SVD, calling the standard LAPACK solver via
scipy.linalg.svd(), and select the components by postprocessing.
If "arpack": use truncated SVD, calling the ARPACK solver via
scipy.sparse.linalg.svds(). It strictly requires
0 < output_dimension < min(data.shape).
If True, stores a copy of the data before any pre-treatments
such as normalization in s._data_before_treatments. The original
data can then be restored by calling s.undo_treatments().
If False, no copy is made. This can be beneficial for memory
usage, but care must be taken since data will be overwritten.
Return a “deep copy” of this Signal using the
standard library’s deepcopy() function. Note: this means
the underlying data structure will be duplicated in memory.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Note that the size of the data on the given axis decreases by
the given order, i.e. if axis is "x", order is
2 and the x dimension is N, then der’s x dimension is N - 2.
Returns a signal with the n-th order discrete difference along
given axis. i.e. it calculates the difference between consecutive
values in the given axis: out[n]=a[n+1]-a[n]. See
numpy.diff() for more details.
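The relation between the order and the size of the result can be sketched in a few lines of plain Python. The helper `discrete_difference` is illustrative only; it repeats the rule out[n] = a[n+1] - a[n] once per order, which is the behaviour numpy.diff() documents for its n argument.

```python
def discrete_difference(values, order=1):
    """Apply out[n] = a[n + 1] - a[n] `order` times."""
    result = list(values)
    for _ in range(order):
        # Each pass shortens the sequence by one element.
        result = [b - a for a, b in zip(result, result[1:])]
    return result


squares = [1, 4, 9, 16, 25]
first = discrete_difference(squares, order=1)
second = discrete_difference(squares, order=2)
```

For the five input values, the first-order difference has four elements and the second-order difference has three, matching the N - order rule.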
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Note that the size of the data on the given axis decreases by
the given order, i.e. if axis is "x", order is
2 and the x dimension is N, then der’s x dimension is N - 2.
If you intend to calculate the numerical derivative, please use the
proper derivative() function
instead. To avoid misuse of the diff function as a derivative,
it raises an error when working with a non-uniform axis.
Estimate the elbow position of a scree plot curve.
Used to estimate the number of significant components in
a PCA variance ratio plot or other “elbow” type curves.
Find a line between first and last point on the scree plot.
With a classic elbow scree plot, this line more or less
defines a triangle. The elbow should be the point which
is the furthest distance from this line. For more details,
see [1].
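The geometric idea can be sketched as follows. This is a minimal illustration in plain Python that measures the perpendicular distance of every point to the line joining the first and last points, using the raw values and the point index as coordinates. The function name is hypothetical, and the library implementation may rescale the curve before measuring distances.

```python
import math


def elbow_position(values):
    """Index of the point furthest from the first-to-last line."""
    x1, y1 = 0, values[0]
    x2, y2 = len(values) - 1, values[-1]
    line_length = math.hypot(x2 - x1, y2 - y1)
    # Perpendicular distance of each point (x, y) to the line.
    distances = [
        abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / line_length
        for x, y in enumerate(values)
    ]
    return distances.index(max(distances))


# Made-up scree values: a steep drop followed by a flat tail.
scree = [10.0, 4.0, 1.0, 0.9, 0.8, 0.7]
elbow = elbow_position(scree)
```

For these made-up values the furthest point is at index 2, where the steep part of the curve meets the flat tail.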
Explained variance ratio values that form the scree plot.
If None, uses the explained_variance_ratio array stored
in s.learning_results, so a decomposition must have
been performed first.
[1] V. Satopää, J. Albrecht, D. Irwin, and B. Raghavan.
“Finding a ‘Kneedle’ in a Haystack: Detecting Knee Points in
System Behavior,” 31st International Conference on Distributed
Computing Systems Workshops, pp. 166-171, June 2011.
Performs cluster analysis of a signal for cluster sizes ranging from
n_clusters = 2 to max_clusters (default 12).
Note that this can be a slow process for large datasets, so please
consider reducing max_clusters in this case.
For each cluster size it evaluates the silhouette score, which is a metric of
how well separated the clusters are. Maxima or peaks in the scores
indicate good choices for cluster sizes.
Parameters:
cluster_source: str {"bss", "decomposition", "signal"} or BaseSignal
If "bss", the blind source separation results are used.
If "decomposition", the decomposition results are used.
If "signal", the signal data is used.
Note that using the signal can be memory intensive
and is only recommended if the signal dimension is small.
An input Signal must have the same navigation dimensions as the
signal instance.
Max number of clusters to use. The method will scan from 2 to
max_clusters.
preprocessing: str {"standard", "norm", "minmax"} or object
default: 'norm'
Cluster analysis requires the data to be clustered to be
preprocessed to similar scales. Standard preprocessing
adjusts each feature to have uniform variation. Norm preprocessing
treats the set of features like a vector, and
each measurement is scaled to length 1.
You can also pass an instance of a scikit-learn preprocessing class.
See the preprocessing methods in scikit-learn for further
details. If object, must be sklearn.preprocessing-like.
If you are getting the cluster centers using the decomposition
results (cluster_source_for_centers=”decomposition”) you can define how
many PCA components to use. If set to None the method uses the
estimate of significant components found in the decomposition step
using the elbow method and stored in the
learning_results.number_significant_components attribute.
Use distance, silhouette analysis or gap statistics to estimate
the optimal number of clusters.
Gap is believed to be, overall, the best metric but it’s also
the slowest. Elbow measures the distances between points in
each cluster as an estimate of how well grouped they are and
is the fastest metric.
For elbow the optimal k is the knee or elbow point.
For gap the optimal k is the first k for which gap(k) >= gap(k+1) - std_error.
For silhouette the optimal k will be one of the “maxima” found with
this method.
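The gap criterion can be written as a short loop. The sketch below is illustrative only: the function name and the fallback to the largest k are hypothetical, and it assumes that std_error refers to the standard error evaluated at k+1, which the text above does not specify.

```python
def first_gap_optimum(k_values, gaps, std_errors):
    """Return the first k with gap(k) >= gap(k+1) - std_error(k+1)."""
    for i in range(len(k_values) - 1):
        if gaps[i] >= gaps[i + 1] - std_errors[i + 1]:
            return k_values[i]
    # Assumption: if no k satisfies the criterion, fall back to the largest k.
    return k_values[-1]


# Made-up gap values for k = 1..5, with a constant standard error.
k_values = [1, 2, 3, 4, 5]
gaps = [0.10, 0.30, 0.50, 0.45, 0.60]
std_errors = [0.02, 0.02, 0.02, 0.02, 0.02]
best_k = first_gap_optimum(k_values, gaps, std_errors)
```

With these made-up numbers the gap keeps rising until k = 3 and then dips, so k = 3 is the first value that satisfies the inequality, even though a larger gap appears later at k = 5.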
Number of references to use in gap statistics method
Gap statistics compares the results from clustering the data to
clustering uniformly distributed data. As clustering has
a random variation, it is typically averaged n_ref times
to get a statistical average.
Number of clusters to find using one of the pre-defined methods
"kmeans", "agglomerative", "minibatchkmeans", "spectralclustering".
See sklearn.cluster for details.
Estimate the Poissonian noise variance of the signal.
The variance is stored in the
metadata.Signal.Noise_properties.variance attribute.
The Poissonian noise variance is equal to the expected value. With the
default arguments, this method simply sets the variance attribute to
the given expected_value. However, more generally (although then the
noise is not strictly Poissonian), the variance may be proportional to
the expected value. Moreover, when the noise is a mixture of white
(Gaussian) and Poissonian noise, the variance is described by the
following linear model:
\[\mathrm{Var}[X] = (a * \mathrm{E}[X] + b) * c\]
Where a is the gain_factor, b is the gain_offset (the Gaussian
noise variance) and c the correlation_factor. The correlation
factor accounts for correlation of adjacent signal elements that can
be modeled as a convolution with a Gaussian point spread function.
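As a quick numerical check of the linear model, the sketch below evaluates it for a single expected value. The function name is hypothetical; the parameter names follow the description above, and the defaults correspond to pure Poissonian noise.

```python
def poissonian_noise_variance(expected_value, gain_factor=1.0,
                              gain_offset=0.0, correlation_factor=1.0):
    """Var[X] = (a * E[X] + b) * c for a single expected value."""
    return (gain_factor * expected_value + gain_offset) * correlation_factor


# Pure Poissonian noise: the variance equals the expected value.
pure_poisson = poissonian_noise_variance(100.0)

# Mixed noise with a = 2, b = 5 and c = 0.5.
mixed = poissonian_noise_variance(100.0, gain_factor=2.0,
                                  gain_offset=5.0, correlation_factor=0.5)
```

The second call gives (2 * 100 + 5) * 0.5 = 102.5.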
a in the above equation. Must be positive. If None, take the
value from metadata.Signal.Noise_properties.Variance_linear_model
if defined. Otherwise, assume pure Poissonian noise (i.e. gain_factor=1). If not None, the value is stored in
metadata.Signal.Noise_properties.Variance_linear_model.
b in the above equation. Must be positive. If None, take the
value from metadata.Signal.Noise_properties.Variance_linear_model
if defined. Otherwise, assume pure Poissonian noise (i.e. gain_offset=0). If not None, the value is stored in
metadata.Signal.Noise_properties.Variance_linear_model.
c in the above equation. Must be positive. If None, take the
value from metadata.Signal.Noise_properties.Variance_linear_model
if defined. Otherwise, assume pure Poissonian noise (i.e. correlation_factor=1). If not None, the value is stored in
metadata.Signal.Noise_properties.Variance_linear_model.
If None, returns all components/loadings.
If an int, returns components/loadings with ids from 0 to the
given value.
If a list of ints, returns components/loadings with ids provided in
the given list.
The extension of the format that you wish to save to. default
is 'hspy'. The format determines the kind of output:
For image formats ('tif', 'png', 'jpg', etc.),
plots are created using the plotting flags as below, and saved
at 600 dpi. One plot is saved per loading.
For multidimensional formats ('rpl', 'hspy'), arrays
are saved in single files. All loadings are contained in the
one file.
For spectral formats ('msa'), each loading is saved to a
separate file.
If True, one file will be created for each factor and loading.
Otherwise, only two files will be created, one for
the factors and another for the loadings. The default value can
be chosen in the preferences.
If None, returns all clusters/centers.
If int, returns clusters/centers with ids from 0 to the
given int.
If list of ints, returns clusters/centers with ids in the
given list.
If True, one file per center will be created on
export. Otherwise only two files will be created, one for
the centers and another for the membership. The default value can
be chosen in the preferences.
If None, returns all components/loadings.
If an int, returns components/loadings with ids from 0 to the
given value.
If a list of ints, returns components/loadings with ids provided in
the given list.
The extension of the format that you wish to save to. default
is 'hspy'. The format determines the kind of output:
For image formats ('tif', 'png', 'jpg', etc.),
plots are created using the plotting flags as below, and saved
at 600 dpi. One plot is saved per loading.
For multidimensional formats ('rpl', 'hspy'), arrays
are saved in single files. All loadings are contained in the
one file.
For spectral formats ('msa'), each loading is saved to a
separate file.
If True, one file will be created for each factor and loading.
Otherwise, only two files will be created, one for
the factors and another for the loadings. The default value can
be chosen in the preferences.
Apply an
apodization window
before calculating the FFT in order to suppress streaks.
Valid string values are {'hann' or 'hamming' or 'tukey'}
If True or 'hann', applies a Hann window.
If 'hamming' or 'tukey', applies Hamming or Tukey
windows, respectively (default is False).
If None, rebuilds signal instance from all components.
If int, rebuilds signal instance from components in range 0-given int.
If list of ints, rebuilds signal instance from only components in given list.
If False, the cluster label signal has a navigation axis of length
number_of_clusters and the signal along the navigation
direction is binary: 0 if the point is not in the cluster, 1 if it is
included. If True, the cluster labels are merged (no navigation
axes). The value of the signal at any point will be between -1 and
the number of clusters. -1 represents the points that
were masked for cluster analysis, if any.
If True and tmp_parameters.filename is defined
(which is always the case when the Signal has been read from a
file), the filename stored in the metadata is modified by
appending an underscore and the current indices in parentheses.
Get the dimension parameters from the Signal’s underlying data.
Useful when the data structure was externally modified, or when the
spectrum image was not loaded from a file.
More sophisticated algorithms for determining the bins can be used
by passing a string as the bins argument. Other than the 'blocks'
and 'knuth' methods, the available algorithms are the same as
numpy.histogram().
Note: The lazy version of the algorithm only supports "scott"
and "fd" as a string argument for bins.
If bins is an int, it defines the number of equal-width
bins in the given range. If bins is a
sequence, it defines the bin edges, including the rightmost
edge, allowing for non-uniform bin widths.
If bins is a string from the list below, will use
the method chosen to calculate the optimal bin width and
consequently the number of bins (see Notes for more detail on
the estimators) from the data that falls within the requested
range. While the bin width will be optimal for the actual data
in the range, the number of bins will be computed to fill the
entire range, including the empty portions. For visualisation,
using the 'auto' option is suggested. Weighted data is not
supported for automated bin size selection.
‘auto’
Maximum of the ‘sturges’ and ‘fd’ estimators. Provides good
all around performance.
‘fd’ (Freedman Diaconis Estimator)
Robust (resilient to outliers) estimator that takes into
account data variability and data size.
‘doane’
An improved version of Sturges’ estimator that works better
with non-normal datasets.
‘scott’
Less robust estimator that takes into account data
variability and data size.
‘stone’
Estimator based on leave-one-out cross-validation estimate of
the integrated squared error. Can be regarded as a generalization
of Scott’s rule.
‘rice’
Estimator does not take variability into account, only data
size. Commonly overestimates number of bins required.
‘sturges’
R’s default method, only accounts for data size. Only
optimal for gaussian data and underestimates number of bins
for large non-gaussian datasets.
‘sqrt’
Square root (of data size) estimator, used by Excel and
other programs for its speed and simplicity.
‘knuth’
Knuth’s rule is a fixed-width, Bayesian approach to determining
the optimal bin width of a histogram.
‘blocks’
Determination of optimal adaptive-width histogram bins using
the Bayesian Blocks algorithm.
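For the estimators that depend only on the data size, the number of bins can be written down directly. The sketch below uses the usual textbook forms of the 'sturges', 'sqrt' and 'rice' rules; the function names are illustrative, and since numpy.histogram() derives a bin width from the data range, its exact bin count may differ in edge cases.

```python
import math


def sturges_bins(n):
    """Sturges: log2 of the data size, plus one."""
    return math.ceil(math.log2(n) + 1)


def sqrt_bins(n):
    """Square root of the data size."""
    return math.ceil(math.sqrt(n))


def rice_bins(n):
    """Rice: twice the cube root of the data size."""
    return math.ceil(2 * n ** (1 / 3))


n_points = 100
bin_counts = {
    "sturges": sturges_bins(n_points),
    "sqrt": sqrt_bins(n_points),
    "rice": rice_bins(n_points),
}
```

For 100 data points the Sturges rule gives fewer bins than the other two, consistent with the remark that it tends to underestimate the number of bins.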
When estimating the bins using one of the str methods, the
number of bins is capped by this number to avoid a MemoryError
being raised by numpy.histogram().
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
**kwargs
other keyword arguments (weight and density) are described in
numpy.histogram().
>>> import numpy as np
>>> import hyperspy.api as hs
>>> s = hs.signals.Signal1D(np.random.normal(size=(10, 100)))
>>> # Plot the data histogram
>>> s.get_histogram().plot()
>>> # Plot the histogram of the signal at the current coordinates
>>> s.get_current_signal().get_histogram().plot()
This function computes the real part of the inverse of the discrete
Fourier Transform over the signal axes by means of the Fast Fourier
Transform (FFT) as implemented in numpy.
If None, the shift option will be set to the original status
of the FFT using the value in metadata. If no FFT entry is
present in metadata, the parameter will be set to False.
If True, the origin of the FFT will be shifted to the centre.
If False, the origin will be kept at (0, 0)
(default is None).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
The integration is performed using
Simpson’s rule if
axis.is_binned is False and simple summation over the given axis
if True (along binned axes, the detector already provides
integrated counts per bin).
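The two integration modes can be sketched in plain Python. This is a simplified illustration that assumes a uniform axis and, for Simpson's rule, an odd number of samples; the helper `integrate_axis` and its arguments are hypothetical and do not reproduce the library implementation.

```python
def integrate_axis(values, scale, is_binned):
    """Integrate uniformly spaced samples along one axis."""
    if is_binned:
        # Each value is already an integrated count for its bin.
        return sum(values)
    if len(values) % 2 == 0:
        raise ValueError("this sketch needs an odd number of samples")
    # Composite Simpson's rule: weights 1, 4, 2, 4, ..., 4, 1.
    total = values[0] + values[-1]
    total += 4 * sum(values[1:-1:2])
    total += 2 * sum(values[2:-1:2])
    return scale * total / 3


# y = x**2 sampled on [0, 2] with spacing 0.5; the exact integral is 8/3.
samples = [0.0, 0.25, 1.0, 2.25, 4.0]
simpson = integrate_axis(samples, scale=0.5, is_binned=False)
counts = integrate_axis([3, 5, 2], scale=0.5, is_binned=True)
```

Simpson's rule is exact for polynomials up to third degree, so the unbinned result equals 8/3 up to rounding. The binned result is simply the total count.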
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
hyperspy.axes.DataAxis or hyperspy.axes.FunctionalDataAxis
Axis which replaces the one specified by the axis argument.
If this new axis exceeds the range of the old axis,
a warning is raised that the data will be extrapolated.
Specifies the axis which will be replaced.
The axis can be specified using the index of the
axis in axes_manager or the axis name.
If True, the data of self is replaced by the result and
the axis is changed in place. Otherwise, self is not changed
and a new signal with the changes incorporated is returned.
degree: int, default 1
Specifies the B-Spline degree of the used interpolator.
Apply a function to the signal data at all the navigation
coordinates.
The function must operate on numpy arrays. It is applied to the data at
each navigation coordinate pixel-by-pixel. Any extra keyword arguments
are passed to the function. The keywords can take different values at
different coordinates. If the function takes an axis or axes
argument, the function is assumed to be vectorized and the signal axes
are assigned to axis or axes. Otherwise, the signal is iterated
over the navigation axes and a progress bar is displayed to monitor the
progress.
In general, only navigation axes (order, calibration, and number) are
guaranteed to be preserved.
Any function that can be applied to the signal. This function should
not alter any mutable input arguments or input data. So do not do
operations which alter the input, without copying it first.
For example, instead of doing image *= mask, rather do
image = image * mask. Likewise, do not do image[5, 5] = 10
directly on the input data or arguments, but make a copy of it
first. For example via image = copy.deepcopy(image).
If True, the output will be returned as a lazy signal. This means
the calculation itself will be delayed until either compute() is used,
or the signal is stored as a file.
If False, the output will be returned as a non-lazy signal, this
means the outputs will be calculated directly, and loaded into memory.
If None the output will be lazy if the input signal is lazy, and
non-lazy if the input signal is non-lazy.
Indicates if the results for each navigation pixel are of identical
shape (and/or numpy arrays to begin with). If None,
the output signal will be ragged only if the original signal is ragged.
Set the navigation_chunks argument to a tuple of integers to split
the navigation axes into chunks. This can be useful to enable
using multiple cores with signals which are less than 100 MB.
This argument is passed to rechunk().
Since the size and dtype of the signal dimension of the output
signal can be different from those of the input signal, the output signal
size must be determined. If both output_signal_size
and output_dtype are None, this is done automatically.
However, if for some reason this is not working correctly, they
can be specified via output_signal_size and output_dtype.
The most common reason for this failing is the signal size
being different for different navigation positions. If this is the
case, use ragged=True. None is default.
All extra keyword arguments are passed to the provided function.
Notes
If the function results do not have identical shapes, the result is an
array of navigation shape, where each element corresponds to the result
of the function (of arbitrary object type), called a “ragged array”. As
such, most functions are not able to operate on the result and the data
should be used directly.
This method is similar to Python’s map() that can
also be utilized with a BaseSignal
instance for similar purposes. However, this method has the advantage of
being faster because it iterates the underlying numpy data array
instead of the BaseSignal.
Currently requires a uniform axis.
Examples
Apply a Gaussian filter to all the images in the dataset. The sigma
parameter is constant:
Rotate the two signal dimensions, with different amount as a function
of navigation index. Delay the calculation by getting the output
lazily. The calculation is then done using the compute method.
Rotate the two signal dimensions, with different amount as a function
of navigation index. In addition, the output is returned as a new
signal, instead of replacing the old signal.
If you want some more control over computing a signal that isn’t lazy,
you can always set lazy_output to True and then compute the signal, setting
the scheduler to ‘threading’, ‘processes’, ‘single-threaded’ or ‘distributed’.
Additionally, you can set the navigation_chunks argument to a tuple of
integers to split the navigation axes into chunks. This can be useful if your
signal is less than 100 MB but you still want to use multiple cores.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Each target component is divided by the output of function(target).
The function must return a scalar when operating on numpy arrays and
must have an axis argument.
Each target component is divided by the output of function(target).
The function must return a scalar when operating on numpy arrays and
must have an axis argument.
For multidimensional datasets an optional figure,
the “navigator”, with a cursor to navigate that data is
raised. In any case it is possible to navigate the data using
the sliders. Currently only signals with signal_dimension equal to
0, 1 or 2 can be plotted.
Allowed string values are 'auto', 'slider', and 'spectrum'.
If 'auto':
If navigation_dimension > 0, a navigator is
provided to explore the data.
If navigation_dimension is 1 and the signal is an image
the navigator is a sum spectrum obtained by integrating
over the signal axes (the image).
If navigation_dimension is 1 and the signal is a spectrum
the navigator is an image obtained by stacking all the
spectra in the dataset horizontally.
If navigation_dimension is > 1, the navigator is a sum
image obtained by integrating the data over the signal axes.
Additionally, if navigation_dimension > 2, a window
with one slider per axis is raised to navigate the data.
For example, if the dataset consists of 3 navigation axes “X”,
“Y”, “Z” and one signal axis, “E”, the default navigator will
be an image obtained by integrating the data over “E” at the
current “Z” index and a window with sliders for the “X”, “Y”,
and “Z” axes will be raised. Notice that changing the “Z”-axis
index changes the navigator in this case.
For lazy signals, the navigator will be calculated using the
compute_navigator()
method.
If 'slider':
If navigation_dimension > 0, a window with one slider per
axis is raised to navigate the data.
If 'spectrum':
If navigation_dimension > 0 the navigator is always a
spectrum obtained by integrating the data over all other axes.
Not supported for lazy signals; the 'auto' option will
be used instead.
If None, no navigator will be provided.
Alternatively a BaseSignal (or subclass)
instance can be provided. The navigation or signal shape must
match the navigation shape of the signal to plot or the
navigation_shape + signal_shape must be equal to the
navigator_shape of the current object (for a dynamic navigator).
If the signal dtype is RGB or RGBA this parameter has no effect and
the value is always set to 'slider'.
The function used to normalize the data prior to plotting.
Allowable strings are: 'auto', 'linear', 'log'.
If 'auto', intensity is plotted on a linear scale except when
power_spectrum=True (only for complex signals).
The string must contain any combination of the 'x' and 'v'
characters. If 'x' or 'v' (for values) are in the string, the
corresponding horizontal or vertical axis limits are set to their
maxima and the axis limits will reset when the data or the
navigation indices are changed. Default is 'v'.
Plot factors from blind source separation results. In the case of a 1D
signal axis, each factor line can be toggled on and off by clicking
on its corresponding line in the legend.
If comp_ids is None, maps of all components will be
returned. If it is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
Plot loadings from blind source separation results. In the case of a 1D
navigation axis, each loading line can be toggled on and off by
clicking on its corresponding line in the legend.
If comp_ids=None, maps of all components will be
returned. If it is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
One of: 'all', 'ticks', 'off', or None.
Controls how the axes are displayed on each image;
default is 'all'.
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame will.
Plot the blind source separation factors and loadings.
Unlike plot_bss_factors() and
plot_bss_loadings(),
this method displays one component at a time. Therefore it provides a
more compact visualization than the other two methods.
The loadings and factors are displayed in different windows and each
has its own navigator/sliders to navigate them if they are
multidimensional. The component index axis is synchronized between
the two.
One of: 'smart_auto', 'auto', None, 'spectrum' or a
BaseSignal object.
'smart_auto' (default) displays sliders if the navigation
dimension is less than 3. For a description of the other options,
see the plot() documentation.
Currently HyperSpy cannot plot a signal when the signal dimension is
higher than two. Therefore, to visualize the BSS results when the
factors or the loadings have signal dimension greater than 2,
the data can be viewed as spectra (or images) by setting this
parameter to 1 (or 2). (The default is 2)
If None (default), returns maps of all components, provided
number_of_cluster was defined when
executing cluster. Otherwise it raises a ValueError.
If int, returns maps of cluster labels with ids from 0 to
the given int.
If list of ints, returns maps of cluster labels with ids in
the given list.
The number of plots in each row, when the same_window
parameter is True.
axes_decor: None or str {'all', 'ticks', 'off'}, optional
Controls how the axes are displayed on each image; default is 'all'.
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame will.
Plot cluster labels from a cluster analysis. In the case of a 1D navigation axis,
each loading line can be toggled on and off by clicking on its line
in the legend.
If None (default), returns maps of all components, provided
number_of_cluster was defined when
executing cluster. Otherwise it raises a ValueError.
If int, returns maps of cluster labels with ids from 0 to
the given int.
If list of ints, returns maps of cluster labels with ids in
the given list.
The number of plots in each row, when the same_window
parameter is True.
axes_decor: None or str {'all', 'ticks', 'off'}, default 'all'
Controls how the axes are displayed on each image; default is 'all'.
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame will.
Unlike plot_cluster_labels() and
plot_cluster_signals(), this
method displays one component at a time.
Therefore it provides a more compact visualization than the other
two methods. The labels and centers are displayed in different
windows and each has its own navigator/sliders to navigate them if
they are multidimensional. The component index axis is synchronized
between the two.
Currently HyperSpy cannot plot signals of dimension higher than
two. Therefore, to visualize the clustering results when the
centers or the labels have signal dimension greater than 2,
the data can be viewed as spectra (or images) by setting this
parameter to 1 (or 2).
If None, returns maps of all clusters.
If int, returns maps of clusters with ids from 0 to given
int.
If list of ints, returns maps of clusters with ids in
given list.
Plot factors from a decomposition. In the case of a 1D signal axis, each
factor line can be toggled on and off by clicking on its
corresponding line in the legend.
If comp_ids is None, maps of all components will be
returned if the output_dimension was defined when executing
decomposition(). Otherwise it
raises a ValueError.
If comp_ids is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
If comp_ids is None, maps of all components will be
returned if the output_dimension was defined when executing
decomposition().
Otherwise it raises a ValueError.
If comp_ids is an int, maps of components with ids from 0 to
the given value will be returned. If comp_ids is a list of
ints, maps of components with ids contained in the list will be
returned.
One of: 'all', 'ticks', 'off', or None
Controls how the axes are displayed on each image; default is
'all'
If 'all', both ticks and axis labels will be shown.
If 'ticks', no axis labels will be shown, but ticks/labels will.
If 'off', all decorations and frame will be disabled.
If None, no axis decorations will be shown, but ticks/frame
will.
Unlike plot_decomposition_factors()
and plot_decomposition_loadings(),
this method displays one component at a time. Therefore it provides a
more compact visualization than the other two methods. The loadings
and factors are displayed in different windows and each has its own
navigator/sliders to navigate them if they are multidimensional. The
component index axis is synchronized between the two.
One of: 'smart_auto', 'auto', None, 'spectrum' or a
BaseSignal object.
'smart_auto' (default) displays sliders if the navigation
dimension is less than 3. For a description of the other options,
see the plot() documentation.
Currently HyperSpy cannot plot a signal when the signal dimension is
higher than two. Therefore, to visualize the BSS results when the
factors or the loadings have signal dimension greater than 2,
the data can be viewed as spectra (or images) by setting this
parameter to 1 (or 2). (The default is 2)
Threshold used to determine how many components should be
highlighted as signal (as opposed to noise).
If a float (between 0 and 1), threshold will be
interpreted as a cutoff value, defining the variance at which to
draw a line showing the cutoff between signal and noise;
the number of signal components will be automatically determined
by the cutoff value.
If an int, threshold is interpreted as the number of
components to highlight as signal (and no cutoff line will be
drawn)
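The two interpretations of threshold can be sketched in a few lines of NumPy. This is an illustration under an assumption, not the library's code: the hypothetical helper `count_signal_components` takes the number of signal components for a float threshold to be the count of components whose explained variance ratio lies above the cutoff.

```python
import numpy as np

def count_signal_components(explained_variance_ratio, threshold):
    """Number of components to highlight as signal (hypothetical helper)."""
    if isinstance(threshold, int):
        # An int is taken directly as the number of signal components
        return threshold
    # A float is a variance cutoff: count the components above it
    # (assumed rule, see the note above)
    ratio = np.asarray(explained_variance_ratio)
    return int(np.count_nonzero(ratio > threshold))

ratio = np.array([0.60, 0.25, 0.10, 0.03, 0.02])
n_from_cutoff = count_signal_components(ratio, 0.05)  # float: cutoff value
n_from_count = count_signal_components(ratio, 2)      # int: explicit count
```

With these made-up ratios, the cutoff 0.05 selects three components, while the integer 2 selects two regardless of the variance values.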
hline: {‘auto’, True, False}
Whether or not to draw a horizontal line illustrating the variance
cutoff for signal/noise determination. Default is to draw the line
at the value given in threshold (if it is a float) and not
draw in the case threshold is an int, or not given.
If True, (and threshold is an int), the line will be drawn
through the last component defined as signal.
If False, the line will not be drawn in any circumstance.
vline: bool, default False
Whether or not to draw a vertical line illustrating an estimate of
the number of significant components. If True, the line will be
drawn at the knee or elbow position of the curve indicating the
number of significant components.
If False, the line will not be drawn in any circumstance.
xaxis_type: {'index', 'number'}
Determines the type of labeling applied to the x-axis.
If 'index', axis will be labeled starting at 0 (i.e.
“pythonic index” labeling); if 'number', it will start at 1
(number labeling).
Determines the format of the x-axis tick labels. If 'ordinal',
“1st, 2nd, …” will be used; if 'cardinal', “1, 2,
…” will be used. If None, an appropriate default will be
selected.
Prints the five-number summary statistics of the data, the mean, and
the standard deviation.
Prints the mean, standard deviation (std), maximum (max), minimum
(min), first quartile (Q1), median, and third quartile (Q3). NaNs are
removed from the calculations.
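The same quantities can be reproduced with NumPy's NaN-aware functions. The helper `summary_statistics` below is hypothetical and only mirrors the list of statistics given above. It says nothing about how the method formats its printed output, and the quartiles use NumPy's default linear interpolation, which is an assumption.

```python
import numpy as np

def summary_statistics(data):
    """Return the statistics listed above, ignoring NaNs."""
    values = np.asarray(data, dtype=float)
    q1, median, q3 = np.nanpercentile(values, [25, 50, 75])
    return {
        "mean": float(np.nanmean(values)),
        "std": float(np.nanstd(values)),
        "min": float(np.nanmin(values)),
        "Q1": float(q1),
        "median": float(median),
        "Q3": float(q3),
        "max": float(np.nanmax(values)),
    }

stats = summary_statistics([1.0, 2.0, np.nan, 3.0, 4.0])
```

The NaN entry is skipped, so the statistics are those of the four remaining values.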
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Rebin the signal into a smaller or larger shape, based on linear
interpolation. Specify either new_shape or scale. A scale of 1
means no binning and a scale less than one results in up-sampling.
For each dimension, specify the new:old pixel ratio, e.g. a ratio
of 1 is no binning and a ratio of 2 means that each pixel in the new
spectrum is twice the size of the pixels in the old spectrum.
The length of the list should match the dimension of the
Signal’s underlying data array.
Note: Only one of scale or new_shape should be specified,
otherwise the function will not run.
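For the simplest case, an integer scale along a single axis, the effect of binning can be sketched with a reshape and a sum. This is a plain NumPy illustration with a hypothetical helper `rebin_by_integer`, not the interpolation-based routine described above. It assumes that counts in merged pixels are summed and that the axis length is an exact multiple of the scale.

```python
import numpy as np

def rebin_by_integer(data, scale):
    """Sum blocks of `scale` adjacent pixels of a 1D array.

    Assumes len(data) is an exact multiple of the integer `scale`.
    """
    data = np.asarray(data)
    return data.reshape(-1, scale).sum(axis=1)

spectrum = np.array([1, 2, 3, 4, 5, 6])
binned = rebin_by_integer(spectrum, scale=2)  # each new pixel spans 2 old ones
```

A scale of 2 halves the number of pixels while keeping the total number of counts unchanged.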
Whether or not to crop the resulting rebinned data (default is
True). When binning by a non-integer number of
pixels, it is likely that the final row in each dimension will
contain fewer than the full quota to fill one pixel. For example,
a 5*5 array binned by 2.1 will produce two rows containing
2.1 pixels and one row containing only 0.8 pixels. The choice of
crop=True or crop=False determines whether this
“black” line is cropped from the final binned array.
Please note that if crop=False is used, the final row in each
dimension may appear black if a fractional number of pixels is left
over. It can be removed but has been left to preserve total counts
before and after binning.
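The arithmetic behind the 5*5 example can be checked directly. The helper `rebinned_length` below is hypothetical, and its rounding rule (floor when cropping, ceiling otherwise) is an assumption chosen to match the description, not a statement about the library's internals.

```python
import math

def rebinned_length(old_length, scale, crop=True):
    """Pixels along one dimension after rebinning (hypothetical helper)."""
    ratio = old_length / scale
    # crop=True drops the partially filled last pixel; crop=False keeps it
    return math.floor(ratio) if crop else math.ceil(ratio)

# 5 old pixels binned by 2.1: two full new pixels, about 0.8 old pixels left
full_pixels, leftover = divmod(5, 2.1)
cropped = rebinned_length(5, 2.1, crop=True)
uncropped = rebinned_length(5, 2.1, crop=False)
```

Under this assumption the cropped result has two pixels per dimension and the uncropped one has three, the third being the partially filled row.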
Specify the dtype of the output. If None, the dtype will be
determined by the behaviour of numpy.sum(); if "same",
the dtype will be kept the same. Default is None.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
By default (dtype=None), the dtype is determined by the behaviour of
numpy.sum; in this case, an unsigned integer of the same precision as
the platform integer.
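The promotion rule of numpy.sum mentioned here can be observed on a small unsigned array. The snippet uses only NumPy and made-up data. The second call mimics keeping the input dtype, which is what the "same" option is described as doing.

```python
import numpy as np

counts = np.array([200, 200, 200], dtype=np.uint8)

# dtype=None: NumPy promotes small unsigned integers before summing
promoted = np.sum(counts)

# Keeping the input dtype can overflow silently (600 does not fit in uint8)
same = np.sum(counts, dtype=counts.dtype)
```

`promoted` holds the full total in an unsigned integer as wide as the platform integer, whereas `same` stays `uint8` and wraps around.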
The axis to roll backwards. The axis can be passed directly, or
specified using the index of the axis in the Signal’s axes_manager
or the axis name.
The positions of the other axes do not change relative to one
another.
The axis is rolled until it lies before this other axis. The axis can
be passed directly, or specified using the index of the axis in the
Signal’s axes_manager or the axis name.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
The function gets the format from the specified extension (see
Supported formats in the User Guide for more information):
'hspy' for HyperSpy’s HDF5 specification
'rpl' for Ripple (useful to export to Digital Micrograph)
'msa' for EMSA/MSA single spectrum saving.
'unf' for SEMPER unf binary format.
'blo' for Blockfile diffraction stack saving.
Many image formats such as 'png', 'tiff', 'jpeg'…
If no extension is provided the default file format as defined
in the preferences is used.
Please note that not all formats support saving datasets of
arbitrary dimensions, e.g. 'msa' only supports 1D data, and
blockfiles only support image stacks with a navigation_dimension < 2.
Each format accepts a different set of parameters. For details
see the specific format documentation.
If None (default) and tmp_parameters.filename and
tmp_parameters.folder are defined, the
filename and path will be taken from there. A valid
extension can be provided e.g. 'my_file.rpl'
(see extension parameter).
The extension of the file that defines the file format.
Allowable string values are: {'hspy', 'hdf5', 'rpl',
'msa', 'unf', 'blo', 'emd', and common image
extensions e.g. 'tiff', 'png', etc.}
'hspy' and 'hdf5' are equivalent. Use 'hdf5' if
compatibility with HyperSpy versions older than 1.2 is required.
If None, the extension is determined from the following list in
this order:
HyperSpy, Nexus and EMD NCEM format only. Define chunks used when
saving. The chunk shape should follow the order of the array
(s.data.shape), not the shape of the axes_manager.
If None and lazy signal, the dask array chunking is used.
If None and non-lazy signal, the chunks are estimated automatically
to have at least one chunk per signal space.
If True, the chunking is determined by the h5py guess_chunk
function.
Nexus file only. Option to save hyperspy.original_metadata with
the signal. A Nexus file may contain a large amount of data
when loaded, which you may wish to omit when saving.
Nexus file only. Define the default dataset in the file.
If set to True the signal or first signal in the list of signals
will be defined as the default (following Nexus v3 data rules).
Only for hspy files. If True, write the dataset, otherwise, don’t
write it. Useful to save attributes without having to write the
whole dataset. Default is True.
Set the signal type and convert the current signal accordingly.
The signal_type attribute specifies the type of data that the signal
contains e.g. electron energy-loss spectroscopy data,
photoemission spectroscopy data, etc.
When setting signal_type to a “known” type, HyperSpy converts the
current signal to the most appropriate
BaseSignal subclass. Known signal types are
signal types that have a specialized
BaseSignal subclass associated, usually
providing specific features for the analysis of that type of signal.
HyperSpy ships with a minimal set of known signal types. External
packages can register extra signal types. To print a list of
registered signal types in the current installation, call
print_known_signal_types(), and see
the developer guide for details on how to add new signal_types.
A non-exhaustive list of HyperSpy extensions is also maintained
here: hyperspy/hyperspy-extensions-list.
If no arguments are passed, the signal_type is set to undefined
and the current signal converted to a generic signal subclass.
Otherwise, set the signal_type to the given signal
type or to the signal type corresponding to the given signal type
alias. Setting the signal_type to a known signal type (if one exists)
is highly advisable. If none exists, it is good practice
to set signal_type to a value that best describes the data signal
type.
The split can be defined by giving the number_of_parts, a homogeneous
step size, or a list of customized step sizes. By default ('auto'),
the function is the reverse of stack().
The axis can be passed directly, or specified using the index of
the axis in the Signal’s axes_manager or the axis name.
If 'auto' and if the object has been created with
stack() (and stack_metadata=True),
this method will return the former list of signals (information
stored in metadata._HyperSpy.Stacking_history).
If it was not created with stack(),
the last navigation axis will be used.
Number of parts in which the spectrum image will be split. The
splitting is homogeneous. When the axis size is not divisible
by the number_of_parts the remainder data is lost without
warning. If number_of_parts and step_sizes are 'auto',
number_of_parts equals the length of the axis,
step_sizes equals one, and the axis is suppressed from each
sub-spectrum.
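The loss of the remainder is easy to overlook, so here is a small NumPy sketch of a homogeneous split. `split_evenly` is a hypothetical helper that follows the description above (integer step, trailing data dropped). It is not the method's actual implementation.

```python
import numpy as np

def split_evenly(data, number_of_parts, axis=0):
    """Split `data` into equal parts along `axis` (hypothetical helper).

    Any remainder at the end of the axis is dropped without warning.
    """
    step = data.shape[axis] // number_of_parts
    parts = []
    for i in range(number_of_parts):
        index = [slice(None)] * data.ndim
        index[axis] = slice(i * step, (i + 1) * step)
        parts.append(data[tuple(index)])
    return parts

data = np.arange(10)  # axis of size 10, not divisible by 3
parts = split_evenly(data, number_of_parts=3)
```

Each of the three parts has three elements, so the last element of the original axis does not appear in any of them.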
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If you intend to calculate the numerical integral of an unbinned signal,
please use the integrate1D() function
instead. To avoid misuse of the sum function as an integral,
it raises a warning when working with an unbinned, non-uniform axis.
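The difference between a sum and an integral on a non-uniform axis can be seen with a constant signal. The sketch uses plain NumPy and a hand-written trapezoidal rule chosen for simplicity. It is not meant to reproduce the integration rule that integrate1D() applies.

```python
import numpy as np

# Non-uniform axis: the spacing between points is 1, 2 and 3
axis = np.array([0.0, 1.0, 3.0, 6.0])
values = np.full(axis.shape, 2.0)  # constant signal of height 2

# A plain sum ignores the spacing between points
plain_sum = values.sum()

# Trapezoidal rule written out by hand, weighting by the point spacing
integral = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(axis))
```

The sum only counts the four samples, while the integral accounts for the axis spanning 6 units, so the two numbers differ.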
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
Transfer data array from host to GPU device memory using cupy.asarray.
Lazy signals are not supported by this method, see user guide for
information on how to process data lazily using the GPU.
If True, the location of the data in memory is optimised for the
fastest iteration over the navigation axes. This operation can
cause a peak of memory usage and requires considerable processing
times for large datasets and/or low specification hardware.
See the Transposing (changing signal spaces) section of the HyperSpy user guide
for more information. When operating on lazy signals, if True,
the chunks are optimised for the new axes configuration.
With the exception of both axes parameters (signal_axes and
navigation_axes) getting iterables, generally one has to be None
(i.e. “floating”). The other one specifies either the required number
or explicitly the indices of axes to move to the corresponding space.
If both are iterables, full control is given as long as all axes
are assigned to one space only.
Examples
>>> import numpy as np
>>> import hyperspy.api as hs
>>> # just create a signal with many distinct dimensions
>>> s = hs.signals.BaseSignal(np.random.rand(1, 2, 3, 4, 5, 6, 7, 8, 9))
>>> s
<BaseSignal, title: , dimensions: (|9, 8, 7, 6, 5, 4, 3, 2, 1)>
>>> s.transpose()  # swap signal and navigation spaces
<BaseSignal, title: , dimensions: (9, 8, 7, 6, 5, 4, 3, 2, 1|)>
>>> s.T  # a shortcut for no arguments
<BaseSignal, title: , dimensions: (9, 8, 7, 6, 5, 4, 3, 2, 1|)>
>>> # roll to leave 5 axes in signal space
>>> s.transpose(signal_axes=5)
<BaseSignal, title: , dimensions: (4, 3, 2, 1|9, 8, 7, 6, 5)>
>>> # 3 explicitly defined axes in signal space
>>> s.transpose(signal_axes=[0, 2, 6])
<BaseSignal, title: , dimensions: (8, 6, 5, 4, 2, 1|9, 7, 3)>
>>> # A mix of two lists, but specifying all axes explicitly
>>> # The order of axes is preserved in both lists
>>> s.transpose(navigation_axes=[1, 2, 3, 4, 5, 8], signal_axes=[0, 6, 7])
<BaseSignal, title: , dimensions: (8, 7, 6, 5, 4, 1|9, 3, 2)>
Use this function together with a with statement to have the
signal be unfolded for the scope of the with block, before
automatically refolding when passing out of scope.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.
Either one on its own, or many axes in a tuple can be passed. In
both cases the axes can be passed directly, or specified using the
index in axes_manager or the name of the axis. Any duplicates are
removed. If None, the operation is performed over all navigation
axes (default).
If None, a new Signal is created with the result of the
operation and returned (default). If a Signal is passed,
it is used to receive the output of the operation, and nothing is
returned.
Only has an effect when operating on a lazy signal. Default is False,
which means the chunking structure will be retained. If True,
the data may be automatically rechunked before performing this
operation.