hyperspy.learn.svd_pca module

hyperspy.learn.svd_pca.svd_flip_signs(u, v, u_based_decision=True)

Sign correction to ensure deterministic output from SVD.

Adjusts the columns of u and the rows of v so that, in each column of u, the loading with the largest absolute value is always positive.

Parameters:
  • u (numpy array) – Left singular vectors (as columns), as output by a singular value decomposition.

  • v (numpy array) – Right singular vectors (as rows), as output by the same singular value decomposition.

  • u_based_decision (bool, default True) – If True, use the columns of u as the basis for sign flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm dependent.

Returns:

u, v – Adjusted outputs with same dimensions as inputs.

Return type:

numpy array
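
The sign correction described above can be sketched in plain NumPy. This is an illustrative re-implementation under stated assumptions, not HyperSpy's own code: the name flip_signs_sketch is hypothetical, u is taken to hold components as columns, and v is taken to hold them as rows.

```python
import numpy as np


def flip_signs_sketch(u, v, u_based_decision=True):
    """Hypothetical helper: make the largest-magnitude entries positive."""
    if u_based_decision:
        # Row index of the largest-magnitude entry in each column of u
        max_rows = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[max_rows, np.arange(u.shape[1])])
    else:
        # Column index of the largest-magnitude entry in each row of v
        max_cols = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(v.shape[0]), max_cols])
    # Flipping column k of u together with row k of v leaves the
    # product u @ diag(s) @ v unchanged.
    return u * signs, v * signs[:, np.newaxis]


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U2, Vt2 = flip_signs_sketch(U, Vt)
```

Because each sign is applied to a column of u and to the matching row of v, the reconstruction (U2 * S) @ Vt2 still equals X. The sketch does not handle an all-zero column, where np.sign would return 0.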

hyperspy.learn.svd_pca.svd_pca(data, output_dimension=None, svd_solver='auto', centre=None, auto_transpose=True, svd_flip=True, **kwargs)

Perform PCA using singular value decomposition (SVD).

Read more in the User Guide.

Parameters:
  • data (numpy array) – MxN array of input data (M features, N samples)

  • output_dimension (None or int) – Number of components to keep/calculate

  • svd_solver ({"auto", "full", "arpack", "randomized"}, default "auto") –

    If auto:

    The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient “randomized” method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    If full:

    Run an exact SVD by calling the standard LAPACK solver via scipy.linalg.svd(), then select the components by postprocessing.

    If arpack:

    Use a truncated SVD by calling the ARPACK solver via scipy.sparse.linalg.svds(). This strictly requires 0 < output_dimension < min(data.shape).

    If randomized:

    Use a truncated SVD by calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components.

  • centre ({None, "navigation", "signal"}, default None) –

    • If None, the data is not centered prior to decomposition.

    • If “navigation”, the data is centered along the navigation axis.

    • If “signal”, the data is centered along the signal axis.

  • auto_transpose (bool, default True) – If True, automatically transposes the data to boost performance.

  • svd_flip (bool, default True) – If True, adjusts the signs of the loadings and factors such that the loadings that are largest in absolute value are always positive. See svd_flip_signs() for more details.

Returns:

  • factors (numpy array)

  • loadings (numpy array)

  • explained_variance (numpy array)

  • mean (numpy array, or None if centre is None)
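
As a rough illustration of how PCA can be computed through an SVD, the following NumPy sketch returns the same four kinds of output. It is not HyperSpy's implementation. The name pca_via_svd_sketch, the choice of rows as observations, centring by column means, and dividing the squared singular values by the number of rows are all assumptions made for this example.

```python
import numpy as np


def pca_via_svd_sketch(data, n_components=None, centre=True):
    """Hypothetical helper: PCA of a (rows x columns) array via SVD."""
    data = np.asarray(data, dtype=float)
    if centre:
        mean = data.mean(axis=0)  # one mean per column
        X = data - mean
    else:
        mean = None
        X = data
    # Thin SVD: X = U @ diag(S) @ Vt, singular values in descending order
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    if n_components is not None:
        U, S, Vt = U[:, :n_components], S[:n_components], Vt[:n_components]
    loadings = U * S   # scores of each row on each component
    factors = Vt.T     # one component per column
    # Assumed normalisation: divide by the number of rows
    explained_variance = S**2 / data.shape[0]
    return factors, loadings, explained_variance, mean


rng = np.random.default_rng(1)
data = rng.standard_normal((50, 8))
factors, loadings, explained_variance, mean = pca_via_svd_sketch(data, n_components=8)
reconstruction = loadings @ factors.T + mean
```

With all components kept, the reconstruction matches the input up to floating-point error. Keeping fewer components gives a low-rank approximation instead.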

hyperspy.learn.svd_pca.svd_solve(data, output_dimension=None, svd_solver='auto', svd_flip=True, u_based_decision=True, **kwargs)

Apply singular value decomposition to input data.

Parameters:
  • data (numpy array, shape (m, n)) – Input data array

  • output_dimension (None or int) – Number of components to keep/calculate

  • svd_solver ({"auto", "full", "arpack", "randomized"}, default "auto") –

    If auto:

    The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient “randomized” method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    If full:

    Run an exact SVD by calling the standard LAPACK solver via scipy.linalg.svd(), then select the components by postprocessing.

    If arpack:

    Use a truncated SVD by calling the ARPACK solver via scipy.sparse.linalg.svds(). This strictly requires 0 < output_dimension < min(data.shape).

    If randomized:

    Use a truncated SVD by calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components.

  • svd_flip (bool, default True) – If True, adjusts the signs of the loadings and factors such that the loadings that are largest in absolute value are always positive. See svd_flip_signs() for more details.

  • u_based_decision (bool, default True) – If True, and svd_flip is True, use the columns of u as the basis for sign-flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm dependent.

Returns:

U, S, V – Output of SVD such that X = U*S*V.T

Return type:

numpy array
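
The "auto" solver policy and the "arpack" constraint described in the parameters can be written out as a small decision function. This is only one reading of the documented rule, not the library's code: the helper names are hypothetical, "larger than 500x500" is interpreted as the largest dimension exceeding 500, and a missing output_dimension is assumed to fall back to the full solver.

```python
def choose_solver_sketch(shape, output_dimension=None):
    """Hypothetical helper mirroring the documented 'auto' policy."""
    # Assumption: "larger than 500x500" means the largest dimension exceeds 500
    if max(shape) <= 500:
        return "full"
    # Assumption: with no requested number of components, use the exact solver
    if output_dimension is None:
        return "full"
    # Few components relative to the smallest dimension: randomized is cheaper
    if output_dimension < 0.8 * min(shape):
        return "randomized"
    return "full"


def arpack_dimension_ok(shape, output_dimension):
    """Check the documented constraint 0 < output_dimension < min(data.shape)."""
    return output_dimension is not None and 0 < output_dimension < min(shape)


print(choose_solver_sketch((1000, 2000), output_dimension=10))
print(choose_solver_sketch((200, 300), output_dimension=10))
print(arpack_dimension_ok((100, 50), output_dimension=50))
```

Under these assumptions the first call selects the randomized solver, the second falls back to the full solver because the data is small, and the last check fails because output_dimension must be strictly smaller than the smallest dimension.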