hyperspy.learn package
Submodules
hyperspy.learn.mlpca module

hyperspy.learn.mlpca.mlpca(X, varX, p, convlim=1e-10, maxiter=50000, fast=False)
This function performs maximum likelihood PCA (MLPCA) with missing data.
Parameters:  X (numpy array) – the m x n matrix of observations.
 varX (numpy array) – the m x n matrix of variances associated with X (zeros for missing measurements).
 p (int) – The model dimensionality.
Returns:  U, S, V (numpy array) – the pseudo-SVD parameters.
 Sobj (numpy array) – the value of the objective function.
 ErrFlag ({0, 1}) – indicates the exit condition: 0 = normal termination, 1 = maximum number of iterations exceeded.
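As a rough illustration of what MLPCA optimises, the sketch below fits a rank-p model by alternating weighted least squares, weighting each element by 1 / varX and giving zero weight to entries whose variance is 0 (treated as missing). It minimises the maximum likelihood objective for independent Gaussian errors, but it is not the iteration used by hyperspy's mlpca; the function weighted_low_rank and its arguments are hypothetical.

```python
import numpy as np


def weighted_low_rank(X, varX, p, maxiter=2000, tol=1e-12, seed=0):
    """Rank-p fit of X by alternating weighted least squares.

    Entries with varX > 0 are weighted by 1 / varX; entries with
    varX == 0 are treated as missing (zero weight).
    Returns U (m x p), V (n x p) with X ~ U @ V.T, and the objective.
    """
    varX = np.asarray(varX, dtype=float)
    observed = varX > 0
    # Missing entries get zero weight, so their values are irrelevant.
    X = np.where(observed, np.asarray(X, dtype=float), 0.0)
    m, n = X.shape
    W = np.zeros_like(X)
    W[observed] = 1.0 / varX[observed]
    sqrt_w = np.sqrt(W)
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, p))
    V = np.zeros((n, p))
    prev = np.inf
    obj = np.inf
    for _ in range(maxiter):
        # One weighted least-squares problem per column, U fixed.
        for j in range(n):
            w = sqrt_w[:, j]
            V[j] = np.linalg.lstsq(w[:, None] * U, w * X[:, j], rcond=None)[0]
        # One weighted least-squares problem per row, V fixed.
        for i in range(m):
            w = sqrt_w[i, :]
            U[i] = np.linalg.lstsq(w[:, None] * V, w * X[i, :], rcond=None)[0]
        # Weighted sum of squared residuals: the ML objective for
        # independent Gaussian errors with the given variances.
        obj = np.sum(W * (X - U @ V.T) ** 2)
        if prev - obj <= tol * max(obj, 1.0):
            break
        prev = obj
    return U, V, obj
```

Because missing entries are replaced by 0 before fitting, they may hold NaN in X. With fewer observed entries than p in a row or column, lstsq returns a minimum-norm solution instead of failing.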
hyperspy.learn.mva module

class hyperspy.learn.mva.LearningResults
Bases: object

bss_algorithm = None

bss_factors = None

bss_loadings = None

centre = None

crop_decomposition_dimension(n, compute=False)
Crop the score matrix to the given number of components. This is mainly useful to save memory and reduce storage size.
Parameters:  n (int) – Number of components to keep.
 compute (bool) – If the decomposition results are lazy, also compute the results. Default is False.

decomposition_algorithm = None

explained_variance = None

explained_variance_ratio = None

factors = None

load(filename)
Load the results of a previous decomposition and demixing analysis from a file.
Parameters: filename (string) – name of the file to load.

loadings = None

mean = None

original_shape = None

output_dimension = None

poissonian_noise_normalized = None

save(filename, overwrite=None)
Save the result of the decomposition and demixing analysis.
Parameters:  filename (string) –
 overwrite (None or bool) – If True (False), overwrite (don't overwrite) the file if it exists. If None (default), ask what to do if the file exists.

signal_mask = None

summary()
Print a summary of the decomposition and demixing parameters to stdout.

unfolded = None

unmixing_matrix = None


class hyperspy.learn.mva.MVA
Bases: object
Multivariate analysis capabilities for the Signal1D class.

blind_source_separation(number_of_components=None, algorithm='sklearn_fastica', diff_order=1, diff_axes=None, factors=None, comp_list=None, mask=None, on_loadings=False, pretreatment=None, compute=False, **kwargs)
Blind source separation (BSS) on the result of the decomposition.
Available algorithms: FastICA, JADE, CuBICA, and TDSEP.
Parameters:  number_of_components (int) – number of principal components to pass to the BSS algorithm.
 algorithm ({FastICA, JADE, CuBICA, TDSEP}) –
 diff_order (int) – Sometimes it is convenient to perform the BSS on the derivative of the signal. If diff_order is 0, the signal is not differentiated.
 diff_axes (None or list of ints or strings) – If None, diff_order is greater than 1, and the signal dimension (the navigation dimension when on_loadings is True) is greater than 1, the differences are calculated across all signal (navigation) axes. Otherwise, the axes can be specified in a list.
 factors (Signal or numpy array) – Factors to decompose. If None, the BSS is performed on the factors of a previous decomposition. If a Signal instance, its navigation dimension must be 1 and its size greater than 1.
 comp_list (boolean numpy array) – choose the components to use with a boolean list. This permits choosing non-contiguous components.
 mask (bool numpy array or Signal instance) – If not None, the signal locations marked as True are masked. The mask shape must be equal to the signal shape (navigation shape) when on_loadings is False (True).
 on_loadings (bool) – If True, perform the BSS on the loadings of a previous decomposition. If False, perform it on the factors.
 pretreatment (dict) –
 compute (bool) – If the decomposition results are lazy, compute the BSS components so that they are not lazy. Default is False.
 **kwargs (extra keyword arguments) – Any keyword arguments are passed to the BSS algorithm. For FastICA, the documentation listing further arguments that can be passed as **kwargs is at http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html
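For intuition about what the default FastICA route does, here is a minimal numpy sketch of one-unit FastICA with a log-cosh (tanh) contrast and deflation, applied after centring and whitening. It is a textbook variant written for illustration, not scikit-learn's FastICA or hyperspy's wrapper; whiten and fastica_deflation are hypothetical names.

```python
import numpy as np


def whiten(X):
    """Centre X (n_samples x n_features) and whiten it so that cov(Z) = I."""
    Xc = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    return Xc @ K


def fastica_deflation(Z, n_components, maxiter=200, tol=1e-8, seed=0):
    """One-unit FastICA (log-cosh contrast) with deflation on whitened Z."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, Z.shape[1]))
    for k in range(n_components):
        w = rng.standard_normal(Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(maxiter):
            g = np.tanh(Z @ w)
            # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w
            w_new = (Z * g[:, None]).mean(axis=0) - (1.0 - g ** 2).mean() * w
            # Deflation: remove projections on components already found.
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if done:
                break
        W[k] = w
    return W  # rows are unmixing vectors; estimated sources = Z @ W.T
```

The recovered sources come back in arbitrary order and sign, which is why hyperspy offers reverse_bss_component.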

decomposition(normalize_poissonian_noise=False, algorithm='svd', output_dimension=None, centre=None, auto_transpose=True, navigation_mask=None, signal_mask=None, var_array=None, var_func=None, polyfit=None, reproject=None, return_info=False, **kwargs)
Decomposition with a choice of algorithms.
The results are stored in self.learning_results.
Parameters:  normalize_poissonian_noise (bool) – If True, scale the SI to normalize Poissonian noise.
 algorithm ('svd' | 'fast_svd' | 'mlpca' | 'fast_mlpca' | 'nmf' | 'sparse_pca' | 'mini_batch_sparse_pca' | 'RPCA_GoDec' | 'ORPCA') –
 output_dimension (None or int) – number of components to keep/calculate.
 centre (None | 'variables' | 'trials') – If None, no centring is applied. If 'variables', the centring is performed along the variable axis. If 'trials', the centring is performed along the 'trials' axis. It only has an effect when using the svd or fast_svd algorithms.
 auto_transpose (bool) – If True, automatically transpose the data to boost performance. Only has an effect when using the svd or fast_svd algorithms.
 navigation_mask (boolean numpy array) – The navigation locations marked as True are not used in the decomposition.
 signal_mask (boolean numpy array) – The signal locations marked as True are not used in the decomposition.
 var_array (numpy array) – Array of variances for the maximum likelihood PCA algorithm.
 var_func (function or numpy array) – If a function, it is applied to the dataset to obtain var_array. Alternatively, it can be an array with the coefficients of a polynomial.
 reproject (None | signal | navigation | both) – If not None, the results of the decomposition will be projected in the selected masked area.
 return_info (bool, default False) – The result of the decomposition is stored internally. However, some algorithms generate extra information that is not stored. If True, return any extra information if available.
Returns: (X, E) – If algorithm is 'RPCA_GoDec' or 'ORPCA' and return_info is True, returns the low-rank (X) and sparse (E) matrices from robust PCA.
Return type: (numpy array, numpy array)
See also
plot_decomposition_factors(), plot_decomposition_loadings(), plot_lev()
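The interplay of navigation_mask and reproject can be pictured with a plain SVD: fit the factors on the unmasked pixels only, then project the masked pixels onto those factors. This sketch assumes the data are unfolded to (pixels x channels) and that the model is loadings @ factors.T; it is not hyperspy's code, and masked_svd_decomposition is a hypothetical helper.

```python
import numpy as np


def masked_svd_decomposition(X, navigation_mask, output_dimension):
    """SVD decomposition of X (n_pixels x n_channels) ignoring masked pixels.

    Pixels where navigation_mask is True are excluded from the fit and
    then projected onto the learnt factors afterwards.
    """
    keep = ~navigation_mask
    U, s, Vt = np.linalg.svd(X[keep], full_matrices=False)
    k = output_dimension
    factors = Vt[:k].T                        # (n_channels, k), orthonormal columns
    loadings = np.empty((X.shape[0], k))
    loadings[keep] = U[:, :k] * s[:k]         # scores of the fitted pixels
    # Reprojection: orthonormal factors make X @ factors the best fit.
    loadings[navigation_mask] = X[navigation_mask] @ factors
    return factors, loadings
```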

get_bss_model(components=None)
Return the spectrum generated with the selected number of independent components.
Parameters: components (None, int, or list of ints) – if None, rebuild the SI from all components; if int, rebuild the SI from the components in the range 0 to the given int; if list of ints, rebuild the SI from only the components in the given list.
Return type: Signal instance

get_decomposition_model(components=None)
Return the spectrum generated with the selected number of principal components.
Parameters: components (None, int, or list of ints) – if None, rebuild the SI from all components; if int, rebuild the SI from the components in the range 0 to the given int; if list of ints, rebuild the SI from only the components in the given list.
Return type: Signal instance
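Rebuilding a model from a subset of components amounts to multiplying the selected columns of the loadings and factors. The sketch assumes the layout model = loadings @ factors.T and reads "range 0 to the given int" as Python's range(n), which excludes n; the helper decomposition_model is hypothetical. The same arithmetic applies to get_bss_model with the BSS factors and loadings.

```python
import numpy as np


def decomposition_model(factors, loadings, components=None):
    """Rebuild data from selected components (model = loadings @ factors.T).

    components: None -> all; int n -> range(n); list -> those indices.
    """
    if components is None:
        idx = list(range(factors.shape[1]))
    elif isinstance(components, (int, np.integer)):
        idx = list(range(components))
    else:
        idx = list(components)
    return loadings[:, idx] @ factors[:, idx].T
```

Because the model is a sum of rank-1 terms, rebuilding from a list of components equals the sum of the single-component models.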

get_explained_variance_ratio()
Return the explained variance ratio of the PCA components as a Signal1D.
Returns:  s (Signal1D) – Explained variance ratio.
See also
plot_explained_variance_ratio(), decomposition(), get_decomposition_loadings(), get_decomposition_factors()
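The explained variance ratio is each component's variance divided by the total. A minimal numpy sketch, assuming PCA by SVD of column-centred data, in which the component variances are proportional to the squared singular values; explained_variance_ratio is a hypothetical helper. Applying np.cumsum to its output gives a cumulative curve of the kind plot_cumulative_explained_variance_ratio displays.

```python
import numpy as np


def explained_variance_ratio(data):
    """Explained variance ratio of each PCA component of data (samples x variables)."""
    centred = data - data.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)   # sorted in descending order
    explained_variance = s ** 2 / (data.shape[0] - 1)
    return explained_variance / explained_variance.sum()
```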

normalize_bss_components(target='factors', function=np.sum)
Normalize BSS components.
Parameters:  target ({"factors", "loadings"}) –
 function (numpy universal function, optional, default np.sum) – Each target component is divided by the output of function(target). function must return a scalar when operating on numpy arrays and must accept an axis argument.

normalize_decomposition_components(target='factors', function=np.sum)
Normalize decomposition components.
Parameters:  target ({"factors", "loadings"}) –
 function (numpy universal function, optional, default np.sum) – Each target component is divided by the output of function(target). function must return a scalar when operating on numpy arrays and must accept an axis argument.
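The normalization divides each component (column) of the target by function(target, axis=0). In the sketch below, the other matrix is multiplied by the same coefficients so that the model loadings @ factors.T is unchanged; that compensation is an assumption of the sketch, and normalize_components is a hypothetical name.

```python
import numpy as np


def normalize_components(target, other, function=np.sum):
    """Divide each column of target by function(target, axis=0).

    other is multiplied by the same coefficients so that the product
    other @ target.T (the model) is unchanged.
    """
    coeff = function(target, axis=0)
    return target / coeff, other * coeff
```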

normalize_poissonian_noise(navigation_mask=None, signal_mask=None)
Scale the SI following Surf. Interface Anal. 2004; 36: 203–212 to “normalize” the Poissonian data for decomposition analysis.
Parameters:  navigation_mask (boolean numpy array) –
 signal_mask (boolean numpy array) –
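The scaling from Keenan and Kotula (Surf. Interface Anal. 2004; 36: 203–212) divides each element of the (pixels x channels) count matrix by the square root of the product of its pixel total and its channel total. A minimal sketch with an inverse in the spirit of undo_treatments; the helper names and the guard against empty rows or columns are assumptions, not hyperspy's code.

```python
import numpy as np


def poisson_normalize(D, eps=1e-10):
    """Scale a (pixels x channels) count matrix by 1 / sqrt(row_sum * col_sum).

    Returns the scaled matrix and the two square-root weight vectors
    needed to undo the scaling.
    """
    row_sums = D.sum(axis=1)          # total counts per pixel
    col_sums = D.sum(axis=0)          # total counts per channel
    # Avoid division by zero for empty pixels or channels.
    row_sums = np.where(row_sums < eps, 1.0, row_sums)
    col_sums = np.where(col_sums < eps, 1.0, col_sums)
    root_a = np.sqrt(row_sums)[:, None]
    root_b = np.sqrt(col_sums)[None, :]
    return D / (root_a * root_b), root_a, root_b


def poisson_unnormalize(D_scaled, root_a, root_b):
    """Reverse poisson_normalize."""
    return D_scaled * (root_a * root_b)
```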

plot_cumulative_explained_variance_ratio(n=50)
Plot the cumulative explained variance ratio of the principal components up to the given number.
Parameters: n (int) – number of components to plot.

plot_explained_variance_ratio(n=30, log=True, threshold=0, hline='auto', xaxis_type='index', xaxis_labeling=None, signal_fmt=None, noise_fmt=None, fig=None, ax=None, **kwargs)
Plot the decomposition explained variance ratio vs. index number (scree plot).
Parameters:  n (int or None) – Number of components to plot. If None, all components will be plotted.
 log (bool) – If True, the y axis uses a log scale.
 threshold (float or int) – Threshold used to determine how many components should be highlighted as signal (as opposed to noise). If a float (between 0 and 1), threshold will be interpreted as a cutoff value, defining the variance at which to draw a line showing the cutoff between signal and noise; the number of signal components will be automatically determined by the cutoff value. If an int, threshold is interpreted as the number of components to highlight as signal (and no cutoff line will be drawn).
 hline ({'auto', True, False}) – Whether or not to draw a horizontal line illustrating the variance cutoff for signal/noise determination. The default is to draw the line at the value given in threshold (if it is a float) and not to draw it when threshold is an int or not given. If True (and threshold is an int), the line will be drawn through the last component defined as signal. If False, the line will not be drawn in any circumstance.
 xaxis_type ({'index', 'number'}) – Determines the type of labeling applied to the x-axis. If 'index', the axis will be labeled starting at 0 (i.e. “pythonic index” labeling); if 'number', it will start at 1 (number labeling).
 xaxis_labeling ({'ordinal', 'cardinal', None}) – Determines the format of the x-axis tick labels. If 'ordinal', “1st, 2nd, …” will be used; if 'cardinal', “1, 2, …” will be used. If None, an appropriate default will be selected.
 signal_fmt (dict) – Dictionary of matplotlib formatting values for the signal components.
 noise_fmt (dict) – Dictionary of matplotlib formatting values for the noise components.
 fig (matplotlib figure or None) – If None, a default figure will be created; otherwise the plot is drawn into fig.
 ax (matplotlib ax (subplot) or None) – If None, a default ax will be created; otherwise the plot is drawn into ax.
 **kwargs – remaining keyword arguments are passed to matplotlib.figure().
Example
To generate a scree plot with customized symbols for signal vs. noise components and a modified cutoff threshold value:
>>> s = hs.load("some_spectrum_image")
>>> s.decomposition()
>>> s.plot_explained_variance_ratio(n=40,
...                                 threshold=0.005,
...                                 signal_fmt={'marker': 'v',
...                                             's': 150,
...                                             'c': 'pink'},
...                                 noise_fmt={'marker': '*',
...                                            's': 200,
...                                            'c': 'green'})
Returns: ax
Return type: matplotlib.axes

reverse_bss_component(component_number)
Reverse the independent component.
Parameters: component_number (list or int) – component index or indices.
Examples
>>> s = hs.load('some_file')
>>> s.decomposition(True) # perform PCA
>>> s.blind_source_separation(3) # perform ICA on 3 PCs
>>> s.reverse_bss_component(1) # reverse IC 1
>>> s.reverse_bss_component((0, 2)) # reverse ICs 0 and 2

reverse_decomposition_component(component_number)
Reverse the decomposition component.
Parameters: component_number (list or int) – component index or indices.
Examples
>>> s = hs.load('some_file')
>>> s.decomposition(True) # perform PCA
>>> s.reverse_decomposition_component(1) # reverse component 1
>>> s.reverse_decomposition_component((0, 2)) # reverse components 0 and 2
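Reversing a component flips the sign of the corresponding factor and loading together, so the reconstructed model does not change; this is why the sign of a component is arbitrary. A minimal sketch on plain arrays, assuming model = loadings @ factors.T (reverse_components is a hypothetical helper, not the hyperspy method):

```python
import numpy as np


def reverse_components(factors, loadings, component_number):
    """Flip the sign of the selected components in both matrices."""
    idx = np.atleast_1d(component_number)   # accept an int or a sequence
    factors = factors.copy()
    loadings = loadings.copy()
    factors[:, idx] *= -1
    loadings[:, idx] *= -1
    return factors, loadings
```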

undo_treatments()
Undo normalize_poissonian_noise.


hyperspy.learn.mva.centering_and_whitening(X)

hyperspy.learn.mva.get_derivative(signal, diff_axes, diff_order)
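get_derivative is undocumented here. To illustrate the idea behind diff_order and diff_axes in blind_source_separation, the sketch below takes n-th order finite differences of a stack of 2-D signals along each selected signal axis, flattens them, and concatenates the results. Whether hyperspy combines the differences in exactly this way is an assumption; stacked_differences is a hypothetical name.

```python
import numpy as np


def stacked_differences(data, diff_axes, diff_order=1):
    """Finite differences of a stack of signals along the given signal axes.

    data has shape (n_nav, *signal_shape). For each axis in diff_axes
    (indices into the signal shape), take an n-th order np.diff, flatten
    the signal part and concatenate the results along the last axis.
    """
    n_nav = data.shape[0]
    parts = [np.diff(data, n=diff_order, axis=ax + 1).reshape(n_nav, -1)
             for ax in diff_axes]
    return np.concatenate(parts, axis=1)
```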
hyperspy.learn.onmf module

class hyperspy.learn.onmf.ONMF(rank, lambda1=1.0, kappa=1.0, store_r=False, robust=False)
Bases: object
This class performs Online Robust NMF with missing or corrupted data.

fit()
Learn factors from the given data.

project()
Project the learnt factors on the given data.

finish()
Return the learnt factors and loadings.
Notes
The ONMF code is based on a transcription of the OPGD algorithm MATLAB code obtained from the authors of the following research paper:
Zhao, Renbo, and Vincent Y. F. Tan. “Online nonnegative matrix factorization with outliers.” Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
It has been updated to also include an L2-normalization cost function that is able to deal with sparse corruptions and/or outliers slightly faster (please see the ORPCA implementation for details).

finish()
Return the learnt factors and loadings.

fit(X, batch_size=None)
Learn NMF components from the data.
Parameters:  X ({numpy.ndarray, iterator}) – [nsamples x nfeatures] matrix of observations or an iterator that yields samples, each with nfeatures elements.
 batch_size ({None, int}) – If not None, learn the data in batches, each of batch_size samples or fewer.

project(X, return_R=False)
Project the learnt components on the data.
Parameters:  X ({numpy.ndarray, iterator}) – [nsamples x nfeatures] matrix of observations or an iterator that yields samples, each with nfeatures elements.
 return_R (bool) – If True, also return the sparse error matrix. Otherwise, return only the weights (loadings).
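To illustrate the fit/project split of an online NMF, the sketch below processes one sample at a time: it solves for nonnegative loadings by projected gradient, accumulates sufficient statistics, and takes projected gradient steps on the factors. This is a plain online NMF without the robust outlier term, not the OPGD algorithm of Zhao and Tan or hyperspy's ONMF class; all names are hypothetical.

```python
import numpy as np


def project_nonneg(W, x, n_iter=100):
    """Nonnegative loadings h minimising ||x - W h||^2 by projected gradient."""
    L = np.linalg.norm(W.T @ W, 2) + 1e-12    # Lipschitz constant of the gradient
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        h = np.maximum(h - (W.T @ (W @ h - x)) / L, 0.0)
    return h


def online_nmf(X, rank, n_iter=50, seed=0):
    """Process the rows of X (n_samples x n_features) one at a time."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W = rng.uniform(size=(n_features, rank))  # nonnegative factors
    A = np.zeros((rank, rank))                # running sum of h h^T
    B = np.zeros((n_features, rank))          # running sum of x h^T
    for x in X:
        h = project_nonneg(W, x)              # "project" step for this sample
        A += np.outer(h, h)
        B += np.outer(x, h)
        # Projected gradient steps on the accumulated surrogate
        # 0.5 * tr(W A W^T) - tr(W^T B), keeping W >= 0.
        step = 1.0 / (np.linalg.norm(A, 2) + 1e-12)
        for _ in range(n_iter):
            W = np.maximum(W - step * (W @ A - B), 0.0)
    H = np.array([project_nonneg(W, x) for x in X])   # final loadings
    return W, H
```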


hyperspy.learn.onmf.onmf(X, rank, lambda1=1, kappa=1, store_r=False, project=False, robust=False)
hyperspy.learn.rpca module

class hyperspy.learn.rpca.ORPCA(rank, fast=False, lambda1=None, lambda2=None, method=None, learning_rate=None, init=None, training_samples=None, momentum=None)
Bases: object

finish()

fit(X, iterating=None)

project(X)


hyperspy.learn.rpca.orpca(X, rank, fast=False, lambda1=None, lambda2=None, method=None, learning_rate=None, init=None, training_samples=None, momentum=None)
This function performs Online Robust PCA with missing or corrupted data.
Parameters:  X ({numpy array, iterator}) – [nfeatures x nsamples] matrix of observations or an iterator that yields samples, each with nfeatures elements.
 rank (int) – The model dimensionality.
 lambda1 ({None, float}) – Nuclear norm regularization parameter. If None, set to 1 / sqrt(nsamples).
 lambda2 ({None, float}) – Sparse error regularization parameter. If None, set to 1 / sqrt(nsamples).
 method ({None, 'CF', 'BCD', 'SGD', 'MomentumSGD'}) – ‘CF’: closed-form solver; ‘BCD’: block-coordinate descent; ‘SGD’: stochastic gradient descent; ‘MomentumSGD’: stochastic gradient descent with momentum. If None, set to ‘CF’.
 learning_rate ({None, float}) – Learning rate for the stochastic gradient descent algorithm. If None, set to 1.
 init ({None, 'qr', 'rand', np.ndarray}) – ‘qr’: QR-based initialization; ‘rand’: random initialization; np.ndarray: an initial basis of shape [nfeatures x rank]. If None, set to ‘qr’.
 training_samples ({None, integer}) – Number of training samples to use in the ‘qr’ initialization. If None, set to 10.
 momentum ({None, float}) – Momentum parameter for the ‘MomentumSGD’ method; should be a float between 0 and 1. If None, set to 0.5.
Returns:  Xhat (numpy array) – the [nfeatures x nsamples] low-rank matrix.
 Ehat (numpy array) – the [nfeatures x nsamples] sparse error matrix.
 U, S, V (numpy arrays) – the results of an SVD on Xhat.
Notes
The ORPCA code is based on a transcription of MATLAB code obtained from the following research paper:
Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26, (2013), pp. 404–412.
It has been updated to include a new initialization method based on a QR decomposition of the first n “training” samples of the data. A stochastic gradient descent (SGD) solver is also implemented, along with a MomentumSGD solver for improved convergence and robustness with respect to local minima. More information about the gradient descent methods and choosing appropriate parameters can be found here:
Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747, (2016), http://arxiv.org/abs/1609.04747.
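A compact sketch of a closed-form update in the style of Feng et al., comparable to the ‘CF’ method above: for each sample, alternate a ridge solve for the coefficients with soft-thresholding for the sparse error, then update the basis in closed form from accumulated statistics. It uses a random initial basis instead of the ‘qr’ default and omits the SGD solvers; orpca_cf and soft_threshold are hypothetical names, and the result is not guaranteed to match hyperspy's orpca.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def orpca_cf(X, rank, lambda1=None, lambda2=None, inner_iter=20, seed=0):
    """Single pass over the columns of X (n_features x n_samples)."""
    n_features, n_samples = X.shape
    lambda1 = 1.0 / np.sqrt(n_samples) if lambda1 is None else lambda1
    lambda2 = 1.0 / np.sqrt(n_samples) if lambda2 is None else lambda2
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((n_features, rank))   # basis, random initialisation
    A = np.zeros((rank, rank))                    # running sum of r r^T
    B = np.zeros((n_features, rank))              # running sum of (z - e) r^T
    R = np.zeros((rank, n_samples))
    E = np.zeros_like(X, dtype=float)
    I = np.eye(rank)
    for t in range(n_samples):
        z = X[:, t]
        e = np.zeros(n_features)
        for _ in range(inner_iter):
            # Alternate between the coefficients r and the sparse error e.
            r = np.linalg.solve(L.T @ L + lambda1 * I, L.T @ (z - e))
            e = soft_threshold(z - L @ r, lambda2)
        A += np.outer(r, r)
        B += np.outer(z - e, r)
        L = B @ np.linalg.inv(A + lambda1 * I)    # closed-form basis update
        R[:, t] = r
        E[:, t] = e
    return L @ R, E   # low-rank estimate (using the final basis) and sparse error
```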

hyperspy.learn.rpca.rpca_godec(X, rank, fast=False, lambda1=None, power=None, tol=None, maxiter=None)
This function performs Robust PCA with missing or corrupted data, using the GoDec algorithm.
Parameters:  X (numpy array) – the [nfeatures x nsamples] matrix of observations.
 rank (int) – The model dimensionality.
 lambda1 (None | float) – Regularization parameter. If None, set to 1 / sqrt(nsamples).
 power (None | integer) – The number of power iterations used in the initialization. If None, set to 0 for speed.
 tol (None | float) – Convergence tolerance. If None, set to 1e-3.
 maxiter (None | integer) – Maximum number of iterations. If None, set to 1e3.
Returns:  Xhat (numpy array) – the [nfeatures x nsamples] low-rank matrix.
 Ehat (numpy array) – the [nfeatures x nsamples] sparse error matrix.
 Ghat (numpy array) – the [nfeatures x nsamples] Gaussian noise matrix.
 U, S, V (numpy arrays) – the results of an SVD on Xhat.
Notes
Algorithm based on the following research paper:
Tianyi Zhou and Dacheng Tao, “GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case”, ICML-11, (2011), pp. 33–40.
Code: https://sites.google.com/site/godecomposition/matrix/artifact1
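A GoDec-style alternation can be sketched in a few lines: a truncated SVD gives the low-rank part, soft-thresholding with lambda1 gives the sparse part, and what remains is the dense noise. This is a simplification: the original paper uses bilateral random projections instead of a full SVD and a cardinality constraint on the sparse term. godec_soft is a hypothetical name, not hyperspy's rpca_godec.

```python
import numpy as np


def godec_soft(X, rank, lambda1=None, maxiter=100, tol=1e-7):
    """Split X (n_features x n_samples) into low-rank + sparse + residual."""
    n_samples = X.shape[1]
    lambda1 = 1.0 / np.sqrt(n_samples) if lambda1 is None else lambda1
    S = np.zeros_like(X, dtype=float)
    L = np.zeros_like(X, dtype=float)
    for _ in range(maxiter):
        # Low-rank step: best rank-`rank` approximation of X - S.
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L_new = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: soft-threshold what the low-rank part does not explain.
        T = X - L_new
        S = np.sign(T) * np.maximum(np.abs(T) - lambda1, 0.0)
        change = np.linalg.norm(L_new - L) / max(np.linalg.norm(L_new), 1e-12)
        L = L_new
        if change < tol:
            break
    G = X - L - S   # dense residual, bounded elementwise by lambda1
    return L, S, G
```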
hyperspy.learn.svd_pca module

hyperspy.learn.svd_pca.svd_pca(data, fast=False, output_dimension=None, centre=None, auto_transpose=True)
Perform PCA using SVD.
Parameters:  data (numpy array) – MxN array of input data (M variables, N trials).
 fast (bool) – Whether to use randomized SVD to estimate a limited number of components given by output_dimension.
 output_dimension (int) – Number of components to estimate when fast is True.
 centre (None | 'variables' | 'trials') – If None, no centring is applied. If 'variables', the centring is performed along the variable axis. If 'trials', the centring is performed along the 'trials' axis.
 auto_transpose (bool) – If True, automatically transpose the data to boost performance.
Returns:  factors (numpy array)
 loadings (numpy array)
 explained_variance (numpy array)
 mean (numpy array, or None if centre is None)
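A minimal numpy sketch of PCA by SVD with optional centring, returning the same four quantities. The hypothetical centre_axis argument stands in for centre because this sketch does not assume how 'variables' and 'trials' map to array axes; the explained variance is scaled by the number of rows, which is one convention among several, and the randomized fast path is omitted. pca_svd is a hypothetical name.

```python
import numpy as np


def pca_svd(data, output_dimension=None, centre_axis=None):
    """PCA of a 2-D array via SVD.

    centre_axis: None (no centring), 0 (subtract column means) or
    1 (subtract row means). Returns factors, loadings,
    explained_variance and mean, with data ~ loadings @ factors.T (+ mean).
    """
    data = np.asarray(data, dtype=float)
    mean = None
    if centre_axis is not None:
        mean = data.mean(axis=centre_axis, keepdims=True)
        data = data - mean
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    factors = Vt.T                      # orthonormal columns
    loadings = U * s                    # scores scaled by singular values
    explained_variance = s ** 2 / data.shape[0]
    if output_dimension is not None:
        factors = factors[:, :output_dimension]
        loadings = loadings[:, :output_dimension]
        explained_variance = explained_variance[:output_dimension]
    return factors, loadings, explained_variance, mean
```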