hyperspy.misc.machine_learning.import_sklearn module
Import sklearn, fast_svd and randomized_svd from scikit-learn, with support for multiple versions.
hyperspy.misc.machine_learning.import_sklearn.fast_svd(M, n_components, n_oversamples=10, n_iter='auto', power_iteration_normalizer='auto', transpose='auto', flip_sign=True, random_state=0)
Computes a truncated randomized SVD.
Parameters
M (ndarray or sparse matrix) – Matrix to decompose
n_components (int) – Number of singular values and vectors to extract.
n_oversamples (int (default is 10)) – Additional number of random vectors used to sample the range of M so as to ensure proper conditioning. The total number of random vectors used to find the range of M is n_components + n_oversamples. A smaller number can improve speed but can degrade the approximation of the singular vectors and singular values.
n_iter (int or 'auto' (default is 'auto')) –
Number of power iterations. It can be used to deal with very noisy problems. When ‘auto’, it is set to 4, unless n_components is small (< .1 * min(M.shape)), in which case n_iter is set to 7. This improves precision with few components.
Changed in version 0.18.
power_iteration_normalizer ('auto' (default), 'QR', 'LU', 'none') –
Whether the power iterations are normalized with step-by-step QR factorization (the slowest but most accurate), ‘none’ (the fastest but numerically unstable when n_iter is large, e.g. typically 5 or larger), or ‘LU’ factorization (numerically stable but can lose slightly in accuracy). The ‘auto’ mode applies no normalization if n_iter <= 2 and switches to LU otherwise.
New in version 0.18.
transpose (True, False or 'auto' (default)) –
Whether the algorithm should be applied to M.T instead of M. The result should be approximately the same. The ‘auto’ mode will trigger the transposition if M.shape[1] > M.shape[0], since this implementation of randomized SVD tends to be a little faster in that case.
Changed in version 0.18.
flip_sign (boolean (True by default)) – The output of a singular value decomposition is only unique up to a permutation of the signs of the singular vectors. If flip_sign is set to True, the sign ambiguity is resolved by making the largest loadings for each component in the left singular vectors positive.
random_state (int, RandomState instance or None, optional (default=0)) – The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
Notes
This algorithm finds a (usually very good) approximate truncated singular value decomposition, using randomization to speed up the computations. It is particularly fast on large matrices from which you wish to extract only a small number of components. To obtain a further speed-up, n_iter can be set to 2 or less (at the cost of some precision).
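The idea described in these notes can be illustrated with a minimal NumPy sketch. This is not the scikit-learn implementation: the function name randomized_svd_sketch is hypothetical, only QR normalization is shown, the 'auto' settings are not reproduced, and it assumes n_components + n_oversamples does not exceed min(M.shape).

```python
import numpy as np


def randomized_svd_sketch(M, n_components, n_oversamples=10, n_iter=4,
                          random_state=0):
    """Illustrative randomized SVD (hypothetical helper, not sklearn's code)."""
    rng = np.random.RandomState(random_state)
    n_random = n_components + n_oversamples

    # Sample the range of M with Gaussian random vectors.
    Q = rng.normal(size=(M.shape[1], n_random))
    Q, _ = np.linalg.qr(M @ Q)

    # Power iterations, re-orthonormalized with QR at every step.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)

    # Project M onto the small subspace and decompose the small matrix.
    B = Q.T @ M
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat

    # Resolve the sign ambiguity: make the largest-magnitude entry of each
    # left singular vector positive, and flip the matching row of Vt.
    max_rows = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[max_rows, np.arange(U.shape[1])])
    U = U * signs
    Vt = Vt * signs[:, np.newaxis]

    return U[:, :n_components], s[:n_components], Vt[:n_components]


# Example: a 100 x 40 matrix of exact rank 5.
rng = np.random.RandomState(42)
M = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))
U, s, Vt = randomized_svd_sketch(M, n_components=5)
```

Because the example matrix has exact rank 5, the five extracted components should reconstruct it to within floating-point error; on noisy data the result is only an approximation, and its quality depends on n_oversamples and n_iter.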
References
Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. Halko et al., 2009. https://arxiv.org/abs/0909.4061
A randomized algorithm for the decomposition of matrices. Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert.
An implementation of a randomized algorithm for principal component analysis. A. Szlam et al., 2014.
hyperspy.misc.machine_learning.import_sklearn.sklearn_installed = True
hyperspy.misc.machine_learning.import_sklearn.sklearn_version = LooseVersion('0.21.2')
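A module exposing flags like these might be organised around a guarded import, roughly as sketched below. This is an assumption about the general pattern, not HyperSpy's actual source: the fallback values are illustrative, and the version is kept as a plain string here rather than a LooseVersion object.

```python
# Hypothetical sketch of an optional scikit-learn import.
try:
    import sklearn
    from sklearn.utils.extmath import randomized_svd

    fast_svd = randomized_svd  # expose the same routine under both names
    sklearn_installed = True
    sklearn_version = sklearn.__version__
except ImportError:
    # scikit-learn is missing: callers check the flag before using fast_svd.
    randomized_svd = None
    fast_svd = None
    sklearn_installed = False
    sklearn_version = None
```

Calling code can then test sklearn_installed before using fast_svd, instead of repeating the try/except at every call site.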