hyperspy.learn.rpca module

class hyperspy.learn.rpca.ORPCA(rank, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)

Bases: object

Performs Online Robust PCA with missing or corrupted data.

The ORPCA code is based on a transcription of MATLAB code from [Feng2013]. It has been updated to include a new initialization method based on a QR decomposition of the first n “training” samples of the data. A stochastic gradient descent (SGD) solver is also implemented, along with a MomentumSGD solver for improved convergence and robustness with respect to local minima. More information about the gradient descent methods and choosing appropriate parameters can be found in [Ruder2016].

Read more in the User Guide.

References

Feng2013

Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26, (2013), pp. 404-412.

Ruder2016

Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747, (2016), http://arxiv.org/abs/1609.04747.

Creates an Online Robust PCA instance that can learn a representation.

Parameters
  • rank (int) – The rank of the representation (number of components/factors)

  • store_error (bool, default False) – If True, stores the sparse error matrix.

  • lambda1 (float) – Nuclear norm regularization parameter.

  • lambda2 (float) – Sparse error regularization parameter.

  • method ({'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD') –

    • ‘CF’ - Closed-form solver

    • ‘BCD’ - Block-coordinate descent

    • ‘SGD’ - Stochastic gradient descent

    • ‘MomentumSGD’ - Stochastic gradient descent with momentum

  • init ({'qr', 'rand', np.ndarray}, default 'qr') –

    • ‘qr’ - QR-based initialization

    • ‘rand’ - Random initialization

    • np.ndarray of shape [n_features x rank]

  • training_samples (int) – Specifies the number of training samples to use in the ‘qr’ initialization.

  • subspace_learning_rate (float) – Learning rate for the ‘SGD’ and ‘MomentumSGD’ methods. Should be a float > 0.0

  • subspace_momentum (float) – Momentum parameter for ‘MomentumSGD’ method, should be a float between 0 and 1.

  • random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.
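
A minimal usage sketch of the constructor (assuming the import path hyperspy.learn.rpca shown above, synthetic data, and that finish() returns a (factors, loadings) pair as its docstring below suggests):

import numpy as np
from hyperspy.learn.rpca import ORPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))  # [n_samples x n_features], see fit() below

# Closed-form solver with QR initialization on the first 10 samples
model = ORPCA(rank=4, method='CF', init='qr', training_samples=10,
              random_state=0)
model.fit(X, batch_size=100)

# Assumed to be a (factors, loadings) pair, per the finish() docstring below
factors, loadings = model.finish()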

_initialize_subspace(X)

Initialize the subspace estimate.

finish(**kwargs)

Return the learnt factors and loadings.

fit(X, batch_size=None)

Learn RPCA components from the data.

Parameters
  • X ({numpy.ndarray, iterator}) – [n_samples x n_features] matrix of observations or an iterator that yields samples, each with n_features elements.

  • batch_size ({None, int}) – If not None, learn the data in batches, each of batch_size samples or less.
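
As noted above, X may also be an iterator instead of an array. A short sketch of streaming samples one at a time (synthetic data; import path assumed to match this module):

import numpy as np
from hyperspy.learn.rpca import ORPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))

model = ORPCA(rank=4, method='SGD', subspace_learning_rate=0.5)

# Iterating a 2D array yields one row (sample) at a time, mimicking
# streaming data that do not fit in memory.
model.fit(iter(X))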

project(X, return_error=False)

Project the data onto the learnt components to obtain the loadings (weights).

Parameters
  • X ({numpy.ndarray, iterator}) – [n_samples x n_features] matrix of observations or an iterator that yields samples, each with n_features elements.

  • return_error (bool) – If True, returns the sparse error matrix as well as the loadings. Otherwise, returns only the loadings (weights).
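
A hedged sketch of projecting new observations, assuming project() returns the loadings alone by default and a (loadings, error) pair when return_error=True, as the description above implies:

import numpy as np
from hyperspy.learn.rpca import ORPCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 64))   # [n_samples x n_features]
X_new = rng.normal(size=(50, 64))

model = ORPCA(rank=4, method='BCD')
model.fit(X_train)

loadings = model.project(X_new)                             # loadings only
loadings, error = model.project(X_new, return_error=True)   # assumed (loadings, error) pair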

hyperspy.learn.rpca._soft_thresh(X, lambda1)

Soft-thresholding of array X.
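
This is the standard elementwise proximal operator of the l1 norm, S_lambda(x) = sign(x) * max(|x| - lambda, 0). A NumPy sketch of that operator (the private helper above may differ in minor details):

import numpy as np

def soft_thresh(X, lambda1):
    # Elementwise soft-thresholding: shrink each entry towards zero by
    # lambda1, setting entries with |x| <= lambda1 exactly to zero.
    return np.sign(X) * np.maximum(np.abs(X) - lambda1, 0.0)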

hyperspy.learn.rpca.orpca(X, rank, store_error=False, project=False, batch_size=None, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None, **kwargs)

Perform online robust PCA on the data X.

This is a wrapper function for the ORPCA class.

Parameters
  • X ({numpy.ndarray, iterator}) – [n_features x n_samples] matrix of observations or an iterator that yields samples, each with n_features elements.

  • rank (int) – The rank of the representation (number of components/factors)

  • store_error (bool, default False) – If True, stores the sparse error matrix.

  • project (bool, default False) – If True, project the data X onto the learnt model.

  • batch_size ({None, int}, default None) – If not None, learn the data in batches, each of batch_size samples or less.

  • lambda1 (float) – Nuclear norm regularization parameter.

  • lambda2 (float) – Sparse error regularization parameter.

  • method ({'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD') –

    • ‘CF’ - Closed-form solver

    • ‘BCD’ - Block-coordinate descent

    • ‘SGD’ - Stochastic gradient descent

    • ‘MomentumSGD’ - Stochastic gradient descent with momentum

  • init ({'qr', 'rand', np.ndarray}, default 'qr') –

    • ‘qr’ - QR-based initialization

    • ‘rand’ - Random initialization

    • np.ndarray of shape [n_features x rank]

  • training_samples (int) – Specifies the number of training samples to use in the ‘qr’ initialization.

  • subspace_learning_rate (float) – Learning rate for the ‘SGD’ and ‘MomentumSGD’ methods. Should be a float > 0.0

  • subspace_momentum (float) – Momentum parameter for ‘MomentumSGD’ method, should be a float between 0 and 1.

  • random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.

Returns

  • If project is True, returns the low-rank factors and loadings only

  • Otherwise, returns the low-rank and sparse error matrices, as well as the results of a singular value decomposition (SVD) applied to the low-rank matrix.

Return type

numpy arrays
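
A minimal sketch of calling the wrapper on synthetic data (note the [n_features x n_samples] orientation, and that the structure of the return value depends on project, as described above):

import numpy as np
from hyperspy.learn.rpca import orpca

rng = np.random.default_rng(0)
Y = rng.normal(size=(64, 500))  # [n_features x n_samples]

# Low-rank and sparse error matrices plus an SVD of the low-rank part
results = orpca(Y, rank=4, method='BCD', store_error=True)

# With project=True, only the low-rank factors and loadings are returned
factors_and_loadings = orpca(Y, rank=4, project=True, batch_size=100)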

hyperspy.learn.rpca.rpca_godec(X, rank, lambda1=None, power=0, tol=0.001, maxiter=1000, random_state=None, **kwargs)

Perform Robust PCA with missing or corrupted data, using the GoDec algorithm.

Decomposes a matrix Y = X + E, where X is low-rank and E is a sparse error matrix. This algorithm is based on the MATLAB code from [Zhou2011]. See code here: https://sites.google.com/site/godecomposition/matrix/artifact-1

Read more in the User Guide.

Parameters
  • X (numpy array, shape (n_features, n_samples)) – The matrix of observations.

  • rank (int) – The model dimensionality.

  • lambda1 (None or float) – Regularization parameter. If None, set to 1 / sqrt(n_features)

  • power (int, default 0) – The number of power iterations used in the initialization

  • tol (float, default 1e-3) – Convergence tolerance

  • maxiter (int, default 1000) – Maximum number of iterations

  • random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.

Returns

  • Xhat (numpy array, shape (n_features, n_samples)) – The low-rank matrix

  • Ehat (numpy array, shape (n_features, n_samples)) – The sparse error matrix

  • U, S, V (numpy arrays) – The results of an SVD on Xhat
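
A minimal sketch on synthetic data, unpacking the return values in the documented order (import path assumed to match this module):

import numpy as np
from hyperspy.learn.rpca import rpca_godec

rng = np.random.default_rng(0)
n_features, n_samples, rank = 64, 400, 3

# Synthetic observations Y = X + E: low-rank signal plus sparse corruption
X = rng.normal(size=(n_features, rank)) @ rng.normal(size=(rank, n_samples))
E = np.where(rng.random((n_features, n_samples)) < 0.05,
             rng.normal(scale=10.0, size=(n_features, n_samples)), 0.0)
Y = X + E

Xhat, Ehat, U, S, V = rpca_godec(Y, rank=rank, maxiter=500)
print(Xhat.shape, Ehat.shape)  # (64, 400) (64, 400)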

References

Zhou2011

Tianyi Zhou and Dacheng Tao, “GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case”, ICML-11, (2011), pp. 33-40.