hyperspy.learn.rpca module

class hyperspy.learn.rpca.ORPCA(rank, fast=False, lambda1=None, lambda2=None, method=None, learning_rate=None, init=None, training_samples=None, momentum=None)

Bases: object

finish()
fit(X, iterating=None)
project(X)
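As the signature of fit suggests, X can be supplied either as a full [nfeatures x nsamples] matrix or as an iterator that yields one sample at a time, which is useful when data arrive incrementally and should not all be held in memory. A minimal numpy sketch of such a sample iterator (illustrative only, independent of HyperSpy):

```python
import numpy as np

def sample_stream(X):
    """Yield the columns of an [nfeatures x nsamples] matrix one at a
    time, mimicking the iterator input accepted by ORPCA.fit."""
    for j in range(X.shape[1]):
        yield X[:, j]

X = np.random.default_rng(0).standard_normal((64, 100))  # 64 features, 100 samples
stream = sample_stream(X)
first = next(stream)  # a single length-64 sample vector
```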
hyperspy.learn.rpca.orpca(X, rank, fast=False, lambda1=None, lambda2=None, method=None, learning_rate=None, init=None, training_samples=None, momentum=None)

This function performs Online Robust PCA with missing or corrupted data.

Parameters:
  • X ({numpy array, iterator}) – The [nfeatures x nsamples] matrix of observations, or an iterator that yields samples, each with nfeatures elements.
  • rank (int) – The model dimensionality.
  • lambda1 ({None, float}) – Nuclear norm regularization parameter. If None, set to 1 / sqrt(nsamples).
  • lambda2 ({None, float}) – Sparse error regularization parameter. If None, set to 1 / sqrt(nsamples).
  • method ({None, 'CF', 'BCD', 'SGD', 'MomentumSGD'}) – The solver to use: 'CF' (closed-form), 'BCD' (block-coordinate descent), 'SGD' (stochastic gradient descent) or 'MomentumSGD' (stochastic gradient descent with momentum). If None, set to 'CF'.
  • learning_rate ({None, float}) – Learning rate for the stochastic gradient descent algorithms. If None, set to 1.
  • init ({None, 'qr', 'rand', np.ndarray}) – The initialization method: 'qr' (QR-based), 'rand' (random), or an np.ndarray of shape [nfeatures x rank]. If None, set to 'qr'.
  • training_samples ({None, int}) – Number of training samples to use in the 'qr' initialization. If None, set to 10.
  • momentum ({None, float}) – Momentum parameter for the 'MomentumSGD' method; should be a float between 0 and 1. If None, set to 0.5.
Returns:

  • Xhat (numpy array) – The [nfeatures x nsamples] low-rank matrix.
  • Ehat (numpy array) – The [nfeatures x nsamples] sparse error matrix.
  • U, S, V (numpy arrays) – The results of an SVD of Xhat.

Notes

The ORPCA code is based on a transcription of MATLAB code obtained from the following research paper:

Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26, (2013), pp. 404-412.

It has been updated to include a new initialization method based on a QR decomposition of the first n “training” samples of the data. A stochastic gradient descent (SGD) solver is also implemented, along with a MomentumSGD solver for improved convergence and robustness with respect to local minima. More information about the gradient descent methods and choosing appropriate parameters can be found here:

Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747, (2016), http://arxiv.org/abs/1609.04747.
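To make the per-sample update concrete, here is a simplified numpy sketch of the alternation at the heart of online robust PCA: for each incoming sample, the low-rank coefficients are found by a ridge-regularized projection onto the current basis, and the sparse error is estimated by soft-thresholding the residual. The function names and the fixed iteration count are illustrative assumptions; this is not HyperSpy's implementation.

```python
import numpy as np

def soft_threshold(x, thresh):
    # Elementwise shrinkage: the proximal operator of the l1 penalty,
    # used to estimate the sparse error term.
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def orpca_sample_step(basis, z, lambda1, lambda2, n_iter=20):
    """For one observation z, alternately solve for the low-rank
    coefficients r (ridge regression against the current basis) and
    the sparse error e (soft-thresholded residual)."""
    nfeatures, rank = basis.shape
    e = np.zeros(nfeatures)
    # Precompute the ridge-regularized projector onto the basis.
    proj = np.linalg.solve(basis.T @ basis + lambda1 * np.eye(rank), basis.T)
    for _ in range(n_iter):
        r = proj @ (z - e)                        # fit coefficients
        e = soft_threshold(z - basis @ r, lambda2)  # shrink residual
    return r, e

rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 3))   # current [nfeatures x rank] basis
r_true = rng.standard_normal(3)
z = basis @ r_true
z[5] += 10.0                           # a single corrupted entry
r, e = orpca_sample_step(basis, z, lambda1=0.01, lambda2=0.5)
```

The recovered sparse error e localizes the corruption at index 5, while z - e is well explained by the low-rank basis.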
hyperspy.learn.rpca.rpca_godec(X, rank, fast=False, lambda1=None, power=None, tol=None, maxiter=None)

This function performs Robust PCA with missing or corrupted data, using the GoDec algorithm.

Parameters:
  • X (numpy array) – The [nfeatures x nsamples] matrix of observations.
  • rank (int) – The model dimensionality.
  • lambda1 ({None, float}) – Regularization parameter. If None, set to 1 / sqrt(nsamples).
  • power ({None, int}) – Number of power iterations used in the initialization. If None, set to 0 for speed.
  • tol ({None, float}) – Convergence tolerance. If None, set to 1e-3.
  • maxiter ({None, int}) – Maximum number of iterations. If None, set to 1e3.
Returns:

  • Xhat (numpy array) – The [nfeatures x nsamples] low-rank matrix.
  • Ehat (numpy array) – The [nfeatures x nsamples] sparse error matrix.
  • Ghat (numpy array) – The [nfeatures x nsamples] Gaussian noise matrix.
  • U, S, V (numpy arrays) – The results of an SVD of Xhat.

Notes

Algorithm based on the following research paper:
Tianyi Zhou and Dacheng Tao, “GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case”, ICML-11, (2011), pp. 33-40.

Code: https://sites.google.com/site/godecomposition/matrix/artifact-1
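For intuition, the GoDec-style alternation can be sketched in a few lines of numpy: alternately project X - Ehat onto the set of rank-r matrices (via a truncated SVD) and soft-threshold the residual to update the sparse term, leaving Ghat as the remaining noise. This is a simplified sketch under those assumptions, not HyperSpy's implementation, which uses power iterations and other refinements for speed.

```python
import numpy as np

def godec_sketch(X, rank, lambda1, maxiter=50):
    """Alternate a rank-`rank` projection (truncated SVD) with
    soft-thresholding of the residual, so that X ~ Xhat + Ehat + Ghat."""
    Ehat = np.zeros_like(X)
    for _ in range(maxiter):
        U, s, Vt = np.linalg.svd(X - Ehat, full_matrices=False)
        Xhat = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # low-rank part
        resid = X - Xhat
        # Soft-threshold the residual to update the sparse error term.
        Ehat = np.sign(resid) * np.maximum(np.abs(resid) - lambda1, 0.0)
    Ghat = X - Xhat - Ehat                            # leftover noise
    return Xhat, Ehat, Ghat

rng = np.random.default_rng(1)
low_rank = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 60))
X = low_rank.copy()
X[3, 7] += 8.0                          # sparse corruptions
X[20, 41] -= 8.0
Xhat, Ehat, Ghat = godec_sketch(X, rank=2, lambda1=0.5)
```

After a few iterations the sparse term Ehat picks up the corrupted entries and Xhat returns to an (at most) rank-2 estimate of the clean data.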