hyperspy.learn.rpca module

class hyperspy.learn.rpca.ORPCA(rank, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)

Bases: object

Performs Online Robust PCA with missing or corrupted data.

The ORPCA code is based on a transcription of MATLAB code from [Feng2013]. It has been updated to include a new initialization method based on a QR decomposition of the first n “training” samples of the data, where n is set by training_samples. A stochastic gradient descent (SGD) solver is also implemented, along with a MomentumSGD solver for improved convergence and robustness with respect to local minima. More information about the gradient descent methods, and about choosing appropriate parameters, can be found in [Ruder2016].
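To make the QR-based initialization concrete, here is a minimal sketch. It assumes (this is not taken from the HyperSpy source) that the initial subspace is the orthonormal basis Q from a reduced QR decomposition of the first training_samples rows of X, truncated to rank columns. The helper qr_init is hypothetical.

```python
import numpy as np


def qr_init(X, rank, training_samples=10):
    """Hypothetical QR-based subspace initialization (illustrative only).

    X is an [n_samples x n_features] array; returns an
    [n_features x rank] matrix with orthonormal columns.
    """
    if training_samples < rank:
        raise ValueError("training_samples must be >= rank")
    # Stack the first training samples as columns: [n_features x training_samples]
    Y = np.asarray(X[:training_samples], dtype=float).T
    # Reduced QR: the columns of Q are orthonormal and span the training samples
    Q, _ = np.linalg.qr(Y)
    # Keep the first `rank` directions as the initial subspace estimate
    return Q[:, :rank]


# Usage: 100 samples lying exactly in a 3-dimensional subspace of R^20
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 20))
L0 = qr_init(X, rank=3, training_samples=10)
print(L0.shape)  # (20, 3)
```

Because the QR factorization only needs a handful of samples, this kind of start is cheap, and for data that really is low-rank the first few columns of Q already span the data subspace.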

Read more in the User Guide.

References

[Feng2013]

Jiashi Feng, Huan Xu and Shuicheng Yuan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26, (2013), pp. 404-412.

[Ruder2016]

Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747, (2016), https://arxiv.org/abs/1609.04747.

Creates an Online Robust PCA instance that can learn a representation.

Parameters:
  • rank (int) – The rank of the representation (number of components/factors).

  • store_error (bool, default False) – If True, stores the sparse error matrix.

  • lambda1 (float, default 0.1) – Nuclear norm regularization parameter.

  • lambda2 (float, default 1.0) – Sparse error regularization parameter.

  • method ({'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD') –

    • ‘CF’ - Closed-form solver

    • ‘BCD’ - Block-coordinate descent

    • ‘SGD’ - Stochastic gradient descent

    • ‘MomentumSGD’ - Stochastic gradient descent with momentum

  • init ({'qr', 'rand', np.ndarray}, default 'qr') –

    • ‘qr’ - QR-based initialization

    • ‘rand’ - Random initialization

    • np.ndarray - An initial subspace of shape [n_features x rank]

  • training_samples (int, default 10) – The number of training samples to use in the ‘qr’ initialization.

  • subspace_learning_rate (float, default 1.0) – Learning rate for the ‘SGD’ and ‘MomentumSGD’ methods. Should be a float > 0.0.

  • subspace_momentum (float, default 0.5) – Momentum parameter for the ‘MomentumSGD’ method. Should be a float between 0 and 1.

  • random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.
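The roles of subspace_learning_rate and subspace_momentum can be illustrated with the standard update rules from [Ruder2016]: a plain gradient step L ← L − η∇f, and the momentum step v ← γv + η∇f, L ← L − v. The sketch below assumes η plays the role of subspace_learning_rate and γ that of subspace_momentum. To keep it checkable, it uses full gradients of a hypothetical toy objective with fixed loadings R rather than the per-sample stochastic gradients ORPCA would use.

```python
import numpy as np


def sgd_step(L, grad, learning_rate):
    # Plain gradient step: L <- L - eta * grad
    return L - learning_rate * grad


def momentum_step(L, velocity, grad, learning_rate, momentum):
    # Momentum step (Ruder 2016): v <- gamma * v + eta * grad, then L <- L - v
    velocity = momentum * velocity + learning_rate * grad
    return L - velocity, velocity


# Hypothetical toy objective: f(L) = 0.5*||Y - L R||_F^2 + 0.5*lam*||L||_F^2
# with fixed loadings R, so that grad f(L) = (L R - Y) R^T + lam * L.
rng = np.random.default_rng(0)
R = rng.standard_normal((3, 50)) / np.sqrt(50)
Y = rng.standard_normal((20, 50))
lam = 0.1


def toy_grad(L):
    return (L @ R - Y) @ R.T + lam * L


L_sgd = np.zeros((20, 3))
L_mom = np.zeros((20, 3))
v = np.zeros_like(L_mom)
for _ in range(300):
    L_sgd = sgd_step(L_sgd, toy_grad(L_sgd), learning_rate=0.5)
    L_mom, v = momentum_step(L_mom, v, toy_grad(L_mom), learning_rate=0.5, momentum=0.5)

# Closed-form minimizer of the toy objective, for comparison
L_star = Y @ R.T @ np.linalg.inv(R @ R.T + lam * np.eye(3))
```

The velocity term accumulates past gradients, which damps oscillations along steep directions. A momentum value close to 1 keeps more history, while 0 reduces the momentum step to the plain gradient step.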

_initialize_subspace(X)

Initialize the subspace estimate.

finish(**kwargs)

Return the learnt factors and loadings.

fit(X, batch_size=None)

Learn RPCA components from the data.

Parameters:
  • X ({numpy.ndarray, iterator}) – [n_samples x n_features] matrix of observations or an iterator that yields samples, each with n_features elements.

  • batch_size ({None, int}) – If not None, learn the data in batches, each of batch_size samples or less.
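The phrase "each of batch_size samples or less" means that the last batch is shorter when the number of samples is not a multiple of batch_size. The hypothetical helper iter_batches below is a minimal sketch of that splitting, assuming it works the same for an array and for an iterator of samples. It is not HyperSpy's internal code.

```python
from itertools import islice

import numpy as np


def iter_batches(samples, batch_size):
    """Hypothetical helper: yield consecutive batches of at most batch_size samples.

    `samples` may be an [n_samples x n_features] array or any iterator of
    samples; the final batch is smaller when n_samples is not a multiple
    of batch_size.
    """
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield np.asarray(batch)


X = np.arange(30.0).reshape(10, 3)  # 10 samples, 3 features
print([b.shape for b in iter_batches(X, 4)])  # [(4, 3), (4, 3), (2, 3)]
# A generator of samples is split the same way
print([b.shape for b in iter_batches((row for row in X), 4)])
```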

project(X, return_error=False)

Project the data onto the learnt components.

Parameters:
  • X ({numpy.ndarray, iterator}) – [n_samples x n_features] matrix of observations or an iterator that yields samples, each with n_features elements.

  • return_error (bool, default False) – If True, also returns the sparse error matrix. Otherwise, returns only the weights (loadings).
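For a single sample z and a fixed subspace L, the per-sample problem in [Feng2013] is min over r and e of 0.5‖z − Lr − e‖² + (λ1/2)‖r‖² + λ2‖e‖₁. Here r holds the weights (loadings) and e is the sparse error. The sketch below solves it by block-coordinate descent, alternating a ridge update for r with soft-thresholding for e. It assumes lambda1 and lambda2 map onto these two penalty terms. The function project_sample is hypothetical and is not HyperSpy's implementation.

```python
import numpy as np


def project_sample(z, L, lambda1=0.1, lambda2=1.0, n_iter=100):
    """Hypothetical per-sample solve of
        min_{r, e} 0.5*||z - L r - e||^2 + 0.5*lambda1*||r||^2 + lambda2*||e||_1
    by block-coordinate descent.
    """
    rank = L.shape[1]
    e = np.zeros_like(z)
    # The ridge system matrix (L^T L + lambda1 I) is fixed for this sample
    A = L.T @ L + lambda1 * np.eye(rank)
    for _ in range(n_iter):
        # Weights: ridge regression of the error-corrected sample on L
        r = np.linalg.solve(A, L.T @ (z - e))
        # Sparse error: soft-threshold the remaining residual
        resid = z - L @ r
        e = np.sign(resid) * np.maximum(np.abs(resid) - lambda2, 0.0)
    return r, e


rng = np.random.default_rng(0)
L, _ = np.linalg.qr(rng.standard_normal((50, 3)))  # orthonormal [n_features x rank] basis
z = L @ rng.standard_normal(3)
z[7] += 10.0  # corrupt a single feature
r, e = project_sample(z, L, lambda1=0.01, lambda2=0.5)
```

The ℓ1 penalty lets the large corruption at feature 7 be absorbed by e instead of distorting the weights r.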

hyperspy.learn.rpca._soft_thresh(X, lambda1)

Soft-thresholding of array X.
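Soft-thresholding is the elementwise shrinkage sign(x)·max(|x| − λ, 0). It is the proximal operator of λ‖X‖₁, which is why it appears in sparse-error updates. The minimal NumPy version below shows the operation. That _soft_thresh is written exactly this way is an assumption.

```python
import numpy as np


def soft_thresh(X, lambda1):
    """Elementwise soft-thresholding: sign(x) * max(|x| - lambda1, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - lambda1, 0.0)


out = soft_thresh(np.array([-3.0, -0.5, 0.0, 0.5, 3.0]), 1.0)
print(out)
```

Entries with magnitude below lambda1 become zero, and larger entries move towards zero by lambda1.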

hyperspy.learn.rpca.orpca(X, rank, store_error=False, project=False, batch_size=None, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None, **kwargs)

Perform online, robust PCA on the data X.

This is a wrapper function for the ORPCA class.

Parameters:
  • X ({numpy array, iterator}) – [n_features x n_samples] matrix of observations or an iterator that yields samples, each with n_features elements.

  • rank (int) – The rank of the representation (number of components/factors).

  • store_error (bool, default False) – If True, stores the sparse error matrix.

  • project (bool, default False) – If True, project the data X onto the learnt model.

  • batch_size ({None, int}, default None) – If not None, learn the data in batches, each of batch_size samples or less.

  • lambda1 (float, default 0.1) – Nuclear norm regularization parameter.

  • lambda2 (float, default 1.0) – Sparse error regularization parameter.

  • method ({'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD') –

    • ‘CF’ - Closed-form solver

    • ‘BCD’ - Block-coordinate descent

    • ‘SGD’ - Stochastic gradient descent

    • ‘MomentumSGD’ - Stochastic gradient descent with momentum

  • init ({'qr', 'rand', np.ndarray}, default 'qr') –

    • ‘qr’ - QR-based initialization

    • ‘rand’ - Random initialization

    • np.ndarray - An initial subspace of shape [n_features x rank]

  • training_samples (int, default 10) – The number of training samples to use in the ‘qr’ initialization.

  • subspace_learning_rate (float, default 1.0) – Learning rate for the ‘SGD’ and ‘MomentumSGD’ methods. Should be a float > 0.0.

  • subspace_momentum (float, default 0.5) – Momentum parameter for the ‘MomentumSGD’ method. Should be a float between 0 and 1.

  • random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.

Returns:

  • If project is True, returns the low-rank factors and loadings only

  • Otherwise, returns the low-rank and sparse error matrices, as well as the results of a singular value decomposition (SVD) applied to the low-rank matrix.

Return type:

numpy arrays
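The SVD returned alongside the low-rank matrix provides orthonormal factors, and it makes the rank explicit: only rank singular values are numerically non-zero. The sketch below builds a low-rank matrix from hypothetical factors L ([n_features x rank]) and R ([n_samples x rank]) as Xhat = L @ R.T. This layout is an assumption for illustration. NumPy returns V transposed (Vt), and whether the V returned by orpca is that matrix or its transpose is not stated here.

```python
import numpy as np

# Hypothetical low-rank factors: L is [n_features x rank], R is [n_samples x rank]
rng = np.random.default_rng(0)
n_features, n_samples, rank = 30, 40, 3
L = rng.standard_normal((n_features, rank))
R = rng.standard_normal((n_samples, rank))

# Low-rank matrix in the [n_features x n_samples] layout
Xhat = L @ R.T

# Thin SVD: Xhat = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xhat, full_matrices=False)

# Count the singular values that are not numerical noise
n_significant = int(np.sum(S > 1e-10 * S[0]))
print(n_significant)  # 3
```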

hyperspy.learn.rpca.rpca_godec(X, rank, lambda1=None, power=0, tol=0.001, maxiter=1000, random_state=None, **kwargs)

Perform Robust PCA with missing or corrupted data, using the GoDec algorithm.

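Decomposes a matrix Y = X + E, where X is low-rank and E is a sparse error matrix. Note that the observed matrix Y is passed in as the argument X; the estimated low-rank and sparse parts are returned as Xhat and Ehat. This algorithm is based on the MATLAB code from [Zhou2011], available at https://sites.google.com/site/godecomposition/matrix/artifact-1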
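The alternating structure of such a decomposition can be sketched in a few lines. The loop alternates a rank-r approximation of the observations minus the current sparse estimate with soft-thresholding of the remaining residual, controlled by lambda1. This is a simplified stand-in, not HyperSpy's code: it uses an exact truncated SVD where GoDec uses randomized projections, and the soft-thresholding sparse step (as in the "semi-soft" GoDec variant) is assumed here. The function godec_sketch is hypothetical.

```python
import numpy as np


def soft_thresh(A, lam):
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)


def godec_sketch(X, rank, lambda1=None, tol=1e-3, maxiter=1000):
    """Hypothetical GoDec-style loop (illustrative, not HyperSpy's code)."""
    if lambda1 is None:
        lambda1 = 1.0 / np.sqrt(X.shape[0])  # mirrors the documented default
    E = np.zeros_like(X)
    for _ in range(maxiter):
        # Low-rank step: best rank-`rank` approximation of X - E (exact SVD here)
        U, S, Vt = np.linalg.svd(X - E, full_matrices=False)
        L = (U[:, :rank] * S[:rank]) @ Vt[:rank]
        # Sparse step: soft-threshold what the low-rank part does not explain
        E_new = soft_thresh(X - L, lambda1)
        converged = np.linalg.norm(E_new - E) <= tol * np.linalg.norm(X)
        E = E_new
        if converged:
            break
    return L, E


# Usage: a rank-2 matrix corrupted by 10 large, sparse spikes
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
spikes = rng.choice(low_rank.size, size=10, replace=False)
rows, cols = np.unravel_index(spikes, low_rank.shape)
X = low_rank.copy()
X[rows, cols] += 10.0
Xhat, Ehat = godec_sketch(X, rank=2, tol=1e-6)
```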

Read more in the User Guide.

Parameters:
  • X (numpy array, shape (n_features, n_samples)) – The matrix of observations.

  • rank (int) – The model dimensionality.

  • lambda1 (None or float) – Regularization parameter. If None, set to 1 / sqrt(n_features).

  • power (int, default 0) – The number of power iterations used in the initialization.

  • tol (float, default 1e-3) – Convergence tolerance.

  • maxiter (int, default 1000) – Maximum number of iterations.

  • random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.
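The power parameter refers to power iterations in a randomized initialization. The sketch below shows a randomized estimate of the row space of a matrix A: a Gaussian test matrix is multiplied by AᵀA, once plus power more times, and then orthonormalized with QR. This follows the spirit of GoDec's random projections. The function random_row_basis and its seed-only random_state handling are hypothetical.

```python
import numpy as np


def random_row_basis(A, rank, power=0, random_state=None):
    """Hypothetical randomized estimate of an orthonormal basis for the row
    space of A ([m x n]), using `power` extra power iterations.

    Returns Q of shape [n x rank]; A @ Q @ Q.T is then a rank-`rank`
    approximation of A. `random_state` is a seed (None or int) here.
    """
    rng = np.random.default_rng(random_state)
    Y = rng.standard_normal((A.shape[1], rank))
    # Each multiplication by A^T A boosts the leading singular directions,
    # which sharpens the estimate when the singular values decay slowly
    for _ in range(power + 1):
        Y = A.T @ (A @ Y)
    Q, _ = np.linalg.qr(Y)
    return Q


# Usage: for an exactly rank-3 matrix the estimated basis captures all of A
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
Q = random_row_basis(A, rank=3, power=1, random_state=0)
```

For noisy data, a few power iterations usually improve the approximation at the cost of extra matrix products.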

Returns:

  • Xhat (numpy array, shape (n_features, n_samples)) – The low-rank matrix

  • Ehat (numpy array, shape (n_features, n_samples)) – The sparse error matrix

  • U, S, V (numpy arrays) – The results of an SVD on Xhat

References

[Zhou2011]

Tianyi Zhou and Dacheng Tao, “GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case”, ICML-11, (2011), pp. 33-40.