hyperspy.learn.rpca module
- class hyperspy.learn.rpca.ORPCA(rank, store_error=False, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None)
Bases: object
Performs Online Robust PCA with missing or corrupted data.
The ORPCA code is based on a transcription of MATLAB code from [Feng2013]. It has been updated to include a new initialization method based on a QR decomposition of the first n “training” samples of the data. A stochastic gradient descent (SGD) solver is also implemented, along with a MomentumSGD solver for improved convergence and robustness with respect to local minima. More information about the gradient descent methods and choosing appropriate parameters can be found in [Ruder2016].
Read more in the User Guide.
References
[Feng2013] Jiashi Feng, Huan Xu and Shuicheng Yan, “Online Robust PCA via Stochastic Optimization”, Advances in Neural Information Processing Systems 26, (2013), pp. 404-412.
[Ruder2016] Sebastian Ruder, “An overview of gradient descent optimization algorithms”, arXiv:1609.04747, (2016), https://arxiv.org/abs/1609.04747.
Creates an Online Robust PCA instance that can learn a representation.
- Parameters:
rank (int) – The rank of the representation (number of components/factors)
store_error (bool, default False) – If True, stores the sparse error matrix.
lambda1 (float) – Nuclear norm regularization parameter.
lambda2 (float) – Sparse error regularization parameter.
method ({'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD') –
‘CF’ - Closed-form solver
‘BCD’ - Block-coordinate descent
‘SGD’ - Stochastic gradient descent
‘MomentumSGD’ - Stochastic gradient descent with momentum
init ({'qr', 'rand', np.ndarray}, default 'qr') –
‘qr’ - QR-based initialization
‘rand’ - Random initialization
np.ndarray - User-supplied initial subspace of shape [n_features x rank]
training_samples (int) – Specifies the number of training samples to use in the ‘qr’ initialization.
subspace_learning_rate (float) – Learning rate for the ‘SGD’ and ‘MomentumSGD’ methods. Should be a float > 0.0
subspace_momentum (float) – Momentum parameter for ‘MomentumSGD’ method, should be a float between 0 and 1.
random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.
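The roles of subspace_learning_rate and subspace_momentum can be illustrated with the classical momentum update described in [Ruder2016]. The sketch below applies that update to a toy quadratic objective; it is not the ORPCA implementation, and the helper name momentum_sgd_step, the toy gradient and the learning rate of 0.1 are assumptions made for illustration.

```python
import numpy as np


def momentum_sgd_step(L, grad, velocity, learning_rate=0.1, momentum=0.5):
    """One classical momentum update (hypothetical helper, not hyperspy API)."""
    # Blend the previous update direction with the current gradient step.
    velocity = momentum * velocity + learning_rate * grad
    # Move the estimate against the accumulated direction.
    return L - velocity, velocity


# Toy problem: minimise 0.5 * ||L - target||^2, whose gradient is (L - target).
target = np.array([[1.0, -2.0], [0.5, 3.0]])
L = np.zeros_like(target)
velocity = np.zeros_like(target)
for _ in range(200):
    L, velocity = momentum_sgd_step(L, L - target, velocity)
```

With momentum=0 the velocity equals the scaled gradient, so the update reduces to a plain SGD step.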
- _initialize_subspace(X)
Initialize the subspace estimate.
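As a rough illustration of the ‘qr’ option, the sketch below builds an orthonormal starting subspace from a QR decomposition of the first training_samples observations. It assumes X is laid out as [n_samples x n_features] and that training_samples is at least rank; the function name qr_init_sketch is hypothetical, and the actual method may differ in detail.

```python
import numpy as np


def qr_init_sketch(X, rank, training_samples=10):
    """Orthonormal [n_features x rank] basis from the first samples (illustrative)."""
    X = np.asarray(X, dtype=float)
    if training_samples < rank or X.shape[1] < rank:
        raise ValueError("need at least `rank` training samples and features")
    # Arrange the training samples as columns: [n_features x training_samples].
    Y = X[:training_samples].T
    # Reduced QR gives orthonormal columns spanning the training samples.
    Q, _ = np.linalg.qr(Y, mode="reduced")
    # Keep the first `rank` columns as the initial subspace estimate.
    return Q[:, :rank]


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 20))  # 50 samples, 20 features
L0 = qr_init_sketch(X_demo, rank=3)
```

The returned columns are orthonormal, so L0.T @ L0 is the identity up to rounding.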
- finish(**kwargs)
Return the learnt factors and loadings.
- fit(X, batch_size=None)
Learn RPCA components from the data.
- Parameters:
X ({numpy.ndarray, iterator}) – [n_samples x n_features] matrix of observations or an iterator that yields samples, each with n_features elements.
batch_size ({None, int}) – If not None, learn the data in batches, each of batch_size samples or less.
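The phrase “batches, each of batch_size samples or less” can be pictured with a small generator that slices an [n_samples x n_features] array along the sample axis. This is only a sketch of the batching idea under that layout assumption; iter_batches is a hypothetical helper, not part of the module.

```python
import numpy as np


def iter_batches(X, batch_size):
    """Yield consecutive row blocks of at most `batch_size` samples (illustrative)."""
    n_samples = X.shape[0]
    for start in range(0, n_samples, batch_size):
        # The final block is shorter when n_samples is not a multiple of batch_size.
        yield X[start:start + batch_size]


X_demo = np.arange(20, dtype=float).reshape(10, 2)  # 10 samples, 2 features
batch_sizes = [len(batch) for batch in iter_batches(X_demo, batch_size=4)]
```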
- project(X, return_error=False)
Project the data onto the learnt components.
- Parameters:
X ({numpy.ndarray, iterator}) – [n_samples x n_features] matrix of observations or an iterator that yields samples, each with n_features elements.
return_error (bool) – If True, returns the sparse error matrix as well. Otherwise, returns only the weights (loadings).
- hyperspy.learn.rpca._soft_thresh(X, lambda1)
Soft-thresholding of array X.
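Soft-thresholding is commonly defined elementwise as sign(x) * max(|x| - lambda1, 0). The following is a minimal NumPy sketch of that standard operator, written under the assumption that _soft_thresh follows the same definition; the name soft_thresh is used here only for illustration.

```python
import numpy as np


def soft_thresh(X, lambda1):
    """Elementwise soft-thresholding: shrink magnitudes by lambda1, floor at zero."""
    X = np.asarray(X, dtype=float)
    # Entries with |x| <= lambda1 become zero; the rest move toward zero by lambda1.
    return np.sign(X) * np.maximum(np.abs(X) - lambda1, 0.0)


shrunk = soft_thresh([-3.0, -0.5, 0.0, 0.5, 3.0], lambda1=1.0)
```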
- hyperspy.learn.rpca.orpca(X, rank, store_error=False, project=False, batch_size=None, lambda1=0.1, lambda2=1.0, method='BCD', init='qr', training_samples=10, subspace_learning_rate=1.0, subspace_momentum=0.5, random_state=None, **kwargs)
Perform online, robust PCA on the data X.
This is a wrapper function for the ORPCA class.
- Parameters:
X ({numpy array, iterator}) – [n_features x n_samples] matrix of observations or an iterator that yields samples, each with n_features elements.
rank (int) – The rank of the representation (number of components/factors)
store_error (bool, default False) – If True, stores the sparse error matrix.
project (bool, default False) – If True, project the data X onto the learnt model.
batch_size ({None, int}, default None) – If not None, learn the data in batches, each of batch_size samples or less.
lambda1 (float) – Nuclear norm regularization parameter.
lambda2 (float) – Sparse error regularization parameter.
method ({'CF', 'BCD', 'SGD', 'MomentumSGD'}, default 'BCD') –
‘CF’ - Closed-form solver
‘BCD’ - Block-coordinate descent
‘SGD’ - Stochastic gradient descent
‘MomentumSGD’ - Stochastic gradient descent with momentum
init ({'qr', 'rand', np.ndarray}, default 'qr') –
‘qr’ - QR-based initialization
‘rand’ - Random initialization
np.ndarray - User-supplied initial subspace of shape [n_features x rank]
training_samples (int) – Specifies the number of training samples to use in the ‘qr’ initialization.
subspace_learning_rate (float) – Learning rate for the ‘SGD’ and ‘MomentumSGD’ methods. Should be a float > 0.0
subspace_momentum (float) – Momentum parameter for ‘MomentumSGD’ method, should be a float between 0 and 1.
random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.
- Returns:
If project is True, returns only the low-rank factors and loadings.
Otherwise, returns the low-rank and sparse error matrices, as well as the results of a singular value decomposition (SVD) applied to the low-rank matrix.
- Return type:
numpy arrays
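To make the relation between factors, loadings and the SVD of the low-rank matrix concrete, the sketch below forms a low-rank matrix from stand-in factors and loadings and decomposes it with NumPy. The shapes ([n_features x rank] for factors, [rank x n_samples] for loadings) and the random stand-in values are assumptions for illustration, not output of orpca.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples, rank = 8, 30, 2

# Hypothetical factors and loadings standing in for a learnt model.
factors = rng.normal(size=(n_features, rank))
loadings = rng.normal(size=(rank, n_samples))

# Low-rank reconstruction of the data.
Xhat = factors @ loadings

# Thin SVD of the low-rank matrix; only the first `rank` singular values
# are non-negligible because Xhat has rank at most `rank`.
U, S, Vh = np.linalg.svd(Xhat, full_matrices=False)
U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]
```

Keeping only the leading rank singular triplets reproduces Xhat up to rounding, since the remaining singular values are numerically zero.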
- hyperspy.learn.rpca.rpca_godec(X, rank, lambda1=None, power=0, tol=0.001, maxiter=1000, random_state=None, **kwargs)
Perform Robust PCA with missing or corrupted data, using the GoDec algorithm.
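Decomposes a matrix Y = X + E, where X is low-rank and E is a sparse error matrix. In the function signature, the observed matrix is passed as X, and the estimated low-rank and sparse parts are returned as Xhat and Ehat. This algorithm is based on the MATLAB code from [Zhou2011]. See code here: https://sites.google.com/site/godecomposition/matrix/artifact-1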
Read more in the User Guide.
- Parameters:
X (numpy array, shape (n_features, n_samples)) – The matrix of observations.
rank (int) – The model dimensionality.
lambda1 (None or float) – Regularization parameter. If None, set to 1 / sqrt(n_features)
power (int, default 0) – The number of power iterations used in the initialization
tol (float, default 1e-3) – Convergence tolerance
maxiter (int, default 1000) – Maximum number of iterations
random_state (None or int or RandomState instance, default None) – Used to initialize the subspace on the first iteration.
- Returns:
Xhat (numpy array, shape (n_features, n_samples)) – The low-rank matrix
Ehat (numpy array, shape (n_features, n_samples)) – The sparse error matrix
U, S, V (numpy arrays) – The results of an SVD on Xhat
References
[Zhou2011] Tianyi Zhou and Dacheng Tao, “GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case”, ICML-11, (2011), pp. 33-40.
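The low-rank plus sparse decomposition can be sketched as an alternating scheme: fit a rank-limited matrix to the data minus the current sparse part, then soft-threshold the residual. The code below is a simplified illustration only. It uses a plain truncated SVD in place of the randomized projections of GoDec, its stopping rule (relative change in the low-rank estimate) is an assumption, and the names godec_sketch and soft_thresh are hypothetical rather than part of the module.

```python
import numpy as np


def soft_thresh(X, lambda1):
    """Elementwise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - lambda1, 0.0)


def godec_sketch(Y, rank, lambda1=None, tol=1e-3, maxiter=1000):
    """Alternating low-rank / sparse split of Y [n_features x n_samples]."""
    Y = np.asarray(Y, dtype=float)
    if lambda1 is None:
        # Default taken from the parameter description: 1 / sqrt(n_features).
        lambda1 = 1.0 / np.sqrt(Y.shape[0])
    Xhat = np.zeros_like(Y)
    Ehat = np.zeros_like(Y)
    for _ in range(maxiter):
        X_prev = Xhat
        # Low-rank step: best rank-`rank` approximation of Y - Ehat.
        U, s, Vh = np.linalg.svd(Y - Ehat, full_matrices=False)
        Xhat = (U[:, :rank] * s[:rank]) @ Vh[:rank]
        # Sparse step: keep only residual entries larger than lambda1.
        Ehat = soft_thresh(Y - Xhat, lambda1)
        # Assumed stopping rule: small relative change in the low-rank part.
        if np.linalg.norm(Xhat - X_prev) <= tol * np.linalg.norm(Y):
            break
    return Xhat, Ehat


# Synthetic data: a rank-2 matrix corrupted by a few large spikes.
rng = np.random.default_rng(0)
Y_demo = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
Y_demo[rng.integers(0, 20, size=10), rng.integers(0, 15, size=10)] += 10.0
Xhat_demo, Ehat_demo = godec_sketch(Y_demo, rank=2, maxiter=50)
```

A truncated SVD keeps the sketch short and deterministic, at the cost of a full decomposition per iteration; randomized projections are the cheaper choice for large matrices.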