hyperspy.misc.machine_learning.import_sklearn module

Import sklearn, fast_svd and randomized_svd from scikit-learn, with support for multiple versions.
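As an illustration of what a version-tolerant import of this kind can look like, here is a minimal sketch. It is not HyperSpy's actual source: aliasing fast_svd to randomized_svd and storing the version as a plain string are assumptions made for the example.

```python
# Sketch of a guarded import; the module falls back to None placeholders
# when scikit-learn is not available.
try:
    import sklearn
    from sklearn.utils.extmath import randomized_svd

    # Assumption: expose the older name as an alias of the current function.
    fast_svd = randomized_svd
    sklearn_installed = True
    sklearn_version = sklearn.__version__
except ImportError:
    randomized_svd = None
    fast_svd = None
    sklearn_installed = False
    sklearn_version = None

print("scikit-learn available:", sklearn_installed, sklearn_version)
```

Callers can then check sklearn_installed before using fast_svd, instead of wrapping every call in its own try/except.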

hyperspy.misc.machine_learning.import_sklearn.fast_svd(M, n_components, n_oversamples=10, n_iter='auto', power_iteration_normalizer='auto', transpose='auto', flip_sign=True, random_state=0)

Computes a truncated randomized SVD

Parameters
  • M (ndarray or sparse matrix) – Matrix to decompose

  • n_components (int) – Number of singular values and vectors to extract.

  • n_oversamples (int (default is 10)) – Additional number of random vectors to sample the range of M so as to ensure proper conditioning. The total number of random vectors used to find the range of M is n_components + n_oversamples. A smaller number can improve speed but can degrade the approximation of the singular vectors and singular values.

  • n_iter (int or 'auto' (default is 'auto')) –

    Number of power iterations. It can be used to deal with very noisy problems. When ‘auto’, it is set to 4, unless n_components is small (< 0.1 * min(M.shape)), in which case n_iter is set to 7. This improves precision with few components.

    Changed in version 0.18.

  • power_iteration_normalizer ('auto' (default), 'QR', 'LU', 'none') –

    How the power iterations are normalized: ‘QR’ uses a step-by-step QR factorization (the slowest but most accurate), ‘none’ applies no normalization (the fastest but numerically unstable when n_iter is large, typically 5 or larger), and ‘LU’ uses an LU factorization (numerically stable but can lose slightly in accuracy). The ‘auto’ mode applies no normalization if n_iter <= 2 and switches to LU otherwise.

    New in version 0.18.

  • transpose (True, False or 'auto' (default)) –

    Whether the algorithm should be applied to M.T instead of M. The result should be approximately the same. The ‘auto’ mode triggers the transposition if M.shape[1] > M.shape[0], since this implementation of randomized SVD tends to be a little faster in that case.

    Changed in version 0.18.

  • flip_sign (boolean, (True by default)) – The output of a singular value decomposition is only unique up to a permutation of the signs of the singular vectors. If flip_sign is set to True, the sign ambiguity is resolved by making the largest loadings for each component in the left singular vectors positive.

  • random_state (int, RandomState instance or None, optional (default=0)) – The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
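The ‘auto’ rules described for n_iter, power_iteration_normalizer and transpose can be summarized in a few lines. The helper below is a hypothetical sketch written from the parameter descriptions above; resolve_auto is not part of scikit-learn or HyperSpy.

```python
def resolve_auto(shape, n_components, n_iter="auto",
                 power_iteration_normalizer="auto", transpose="auto"):
    """Resolve 'auto' settings following the documented rules (illustrative only)."""
    if n_iter == "auto":
        # 7 iterations when few components are requested, 4 otherwise.
        n_iter = 7 if n_components < 0.1 * min(shape) else 4
    if power_iteration_normalizer == "auto":
        # No normalization for very few iterations, LU otherwise.
        power_iteration_normalizer = "none" if n_iter <= 2 else "LU"
    if transpose == "auto":
        # Work on M.T when M has more columns than rows.
        transpose = shape[1] > shape[0]
    return n_iter, power_iteration_normalizer, transpose


print(resolve_auto((1000, 200), n_components=5))            # (7, 'LU', False)
print(resolve_auto((50, 400), n_components=20, n_iter=2))   # (2, 'none', True)
```

Values passed explicitly are left unchanged; only the ‘auto’ placeholders are replaced.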

Notes

This algorithm finds a (usually very good) approximate truncated singular value decomposition using randomization to speed up the computations. It is particularly fast on large matrices from which you wish to extract only a small number of components. For a further speedup, n_iter can be set to 2 or less (at the cost of precision).
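To make the idea concrete, the following NumPy-only sketch shows the general shape of a randomized truncated SVD in the spirit of Halko et al.: sample the range of M with random vectors, refine it with power iterations, then run an exact SVD on the small projected matrix. It is a simplified illustration, not the scikit-learn implementation; the function name randomized_svd_sketch, the fixed QR normalization and the seed argument are assumptions of this example.

```python
import numpy as np


def randomized_svd_sketch(M, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate truncated SVD via a randomized range finder (illustrative)."""
    rng = np.random.RandomState(seed)
    n_random = n_components + n_oversamples

    # Sample the range of M with Gaussian random vectors.
    Q = rng.normal(size=(M.shape[1], n_random))
    Q, _ = np.linalg.qr(M @ Q)

    # Power iterations, normalized with QR at every step for stability.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)

    # Project M onto the small orthonormal basis and decompose exactly.
    B = Q.T @ M
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat

    # Resolve the sign ambiguity: make the largest-magnitude entry of each
    # left singular vector positive.
    max_rows = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[max_rows, np.arange(U.shape[1])])
    signs[signs == 0] = 1.0
    U = U * signs
    Vt = Vt * signs[:, np.newaxis]

    return U[:, :n_components], s[:n_components], Vt[:n_components]


# Usage on an exactly rank-5 matrix.
data_rng = np.random.RandomState(1)
A = data_rng.normal(size=(200, 5)) @ data_rng.normal(size=(5, 80))
U, s, Vt = randomized_svd_sketch(A, n_components=5)
print(U.shape, s.shape, Vt.shape)
```

Lowering n_iter reduces the number of matrix products and QR factorizations, which is where the speedup mentioned above comes from; on noisy data with slowly decaying singular values it also reduces accuracy.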

References

  • Halko et al., "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions", 2009. https://arxiv.org/abs/0909.4061

  • Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert, "A randomized algorithm for the decomposition of matrices".

  • A. Szlam et al., "An implementation of a randomized algorithm for principal component analysis", 2014.

hyperspy.misc.machine_learning.import_sklearn.sklearn_installed = True
hyperspy.misc.machine_learning.import_sklearn.sklearn_version = LooseVersion ('0.21.2')