hyperspy.learn.svd_pca module

hyperspy.learn.svd_pca.svd_flip_signs(u, v, u_based_decision=True)

Sign correction to ensure deterministic output from SVD.

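Adjusts the columns of u and the corresponding rows of v so that, in each column of u, the entry with the largest absolute value is positive (or, when u_based_decision is False, so that the largest-magnitude entry in each row of v is positive). Each column of u is flipped together with the matching row of v, so the decomposition they represent is unchanged.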

Parameters

u (numpy array) – Left singular vectors from a singular value decomposition, one per column.
v (numpy array) – Right singular vectors from the same decomposition, one per row.
u_based_decision (bool, default True) – If True, use the columns of u as the basis for sign flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm-dependent.

Returns

u, v – Adjusted outputs with the same dimensions as the inputs.

Return type

numpy array
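The sign convention can be illustrated with a minimal NumPy sketch. It is not hyperspy's source code: the function name flip_signs_sketch is hypothetical, and it assumes u stores one component per column and v one component per row, which is the layout returned by numpy.linalg.svd. Because a column of u and the matching row of v are always flipped together, the reconstructed data is unchanged.

```python
import numpy as np


def flip_signs_sketch(u, v, u_based_decision=True):
    """Hypothetical sketch of SVD sign correction (not hyperspy's code).

    Assumes u has shape (m, k) with components in its columns and
    v has shape (k, n) with components in its rows.
    """
    u = u.copy()  # do not modify the caller's arrays
    v = v.copy()
    k = u.shape[1]
    if u_based_decision:
        # Row index of the largest-magnitude entry in each column of u
        idx = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[idx, np.arange(k)])
    else:
        # Column index of the largest-magnitude entry in each row of v
        idx = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(k), idx])
    # Assumption: leave all-zero components untouched instead of zeroing them
    signs[signs == 0] = 1
    # Flip column j of u and row j of v together
    u *= signs
    v *= signs[:, np.newaxis]
    return u, v
```

Applying the sketch to the output of np.linalg.svd(x, full_matrices=False) makes the result deterministic: negating both u and v before the call yields the same flipped arrays.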

hyperspy.learn.svd_pca.svd_pca(data, output_dimension=None, svd_solver='auto', centre=None, auto_transpose=True, svd_flip=True, **kwargs)

Perform PCA using singular value decomposition (SVD).

Read more in the User Guide.

Parameters

data (numpy array) – MxN array of input data (M features, N samples)
output_dimension (None or int) – Number of components to keep/calculate
svd_solver ({"auto", "full", "arpack", "randomized"}, default "auto") –

If auto:
The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is less than 80% of the smallest dimension of the data, the more efficient "randomized" method is used. Otherwise, the exact full SVD is computed and optionally truncated afterwards.

If full:
Run an exact SVD by calling the standard LAPACK solver via scipy.linalg.svd(), then select the components by post-processing.

If arpack:
Use a truncated SVD by calling the ARPACK solver via scipy.sparse.linalg.svds(). It requires 0 < output_dimension < min(data.shape), with strict inequalities.

If randomized:
Use a truncated SVD by calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components.
centre ({None, "navigation", "signal"}, default None) –
- If None, the data is not centered prior to decomposition.
- If "navigation", the data is centered along the navigation axis.
- If "signal", the data is centered along the signal axis.
auto_transpose (bool, default True) – If True, automatically transposes the data to boost performance.
svd_flip (bool, default True) – If True, adjusts the signs of the loadings and factors such that the loadings that are largest in absolute value are always positive. See svd_flip_signs() for more details.
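The "auto" policy described under svd_solver can be sketched as a small decision function. The helper choose_svd_solver is hypothetical and not part of hyperspy; reading "larger than 500x500" as "the larger dimension exceeds 500" and treating output_dimension=None as "keep every component" (hence "full") are assumptions of this sketch.

```python
def choose_svd_solver(shape, output_dimension=None):
    """Hypothetical helper sketching the "auto" solver policy."""
    n_rows, n_cols = shape
    # Assumption: "larger than 500x500" means the larger dimension exceeds 500.
    if max(n_rows, n_cols) <= 500:
        return "full"
    # Assumption: None means all components are wanted, so exact SVD is used.
    if output_dimension is None:
        return "full"
    # Randomized SVD only pays off when relatively few components are requested.
    if 1 <= output_dimension < 0.8 * min(n_rows, n_cols):
        return "randomized"
    return "full"
```

For example, choose_svd_solver((1000, 2000), 10) returns "randomized", while requesting 900 components from the same shape falls back to "full" because 900 is not below 80% of 1000.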

Returns

factors (numpy array)
loadings (numpy array)
explained_variance (numpy array)
mean (numpy array, or None if centre is None)
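The general technique behind svd_pca (optional centering, an SVD, and singular values converted to variances) can be sketched with NumPy alone. This is not hyperspy's implementation: the function name, the rows-as-samples layout, the scores/components naming (not claimed to map one-to-one onto hyperspy's factors and loadings) and the S**2 / (n_samples - 1) variance normalisation are all assumptions of the sketch.

```python
import numpy as np


def pca_via_svd_sketch(data, output_dimension=None, centre=True):
    """Hypothetical PCA-by-SVD sketch; rows are samples, columns are features."""
    data = np.asarray(data, dtype=float)
    mean = None
    if centre:
        # Subtract the per-feature (column) mean
        mean = data.mean(axis=0)
        data = data - mean
    # Thin SVD: data = u @ diag(s) @ vt, singular values in descending order
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    k = len(s) if output_dimension is None else output_dimension
    u, s, vt = u[:, :k], s[:k], vt[:k, :]
    scores = u * s        # coordinates of each sample on the kept components
    components = vt.T     # one principal axis per column
    # Assumption: sample-variance normalisation (n_samples - 1)
    explained_variance = s**2 / (data.shape[0] - 1)
    return scores, components, explained_variance, mean
```

With centering enabled and the default normalisation, explained_variance matches the eigenvalues of np.cov(data, rowvar=False) in descending order, and scores @ components.T + mean reconstructs the data when all components are kept.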

hyperspy.learn.svd_pca.svd_solve(data, output_dimension=None, svd_solver='auto', svd_flip=True, u_based_decision=True, **kwargs)

Apply singular value decomposition to input data.

Parameters

data (numpy array, shape (m, n)) – Input data array
output_dimension (None or int) – Number of components to keep/calculate
svd_solver ({"auto", "full", "arpack", "randomized"}, default "auto") –

If auto:
The solver is selected by a default policy based on data.shape and output_dimension: if the input data is larger than 500x500 and the number of components to extract is less than 80% of the smallest dimension of the data, the more efficient "randomized" method is used. Otherwise, the exact full SVD is computed and optionally truncated afterwards.

If full:
Run an exact SVD by calling the standard LAPACK solver via scipy.linalg.svd(), then select the components by post-processing.

If arpack:
Use a truncated SVD by calling the ARPACK solver via scipy.sparse.linalg.svds(). It requires 0 < output_dimension < min(data.shape), with strict inequalities.

If randomized:
Use a truncated SVD by calling sklearn.utils.extmath.randomized_svd() to estimate a limited number of components.
svd_flip (bool, default True) – If True, adjusts the signs of the loadings and factors such that the loadings that are largest in absolute value are always positive. See svd_flip_signs() for more details.
u_based_decision (bool, default True) – If True, and svd_flip is True, use the columns of u as the basis for sign flipping. Otherwise, use the rows of v. The choice of which variable to base the decision on is generally algorithm-dependent.

Returns

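U, S, V – Output of the SVD such that X ≈ U @ np.diag(S) @ V, with the right singular vectors stored in the rows of V (consistent with svd_flip_signs(), which adjusts the rows of v).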

Return type

numpy array
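The "full" strategy of svd_solve (exact SVD, truncation to output_dimension, then optional sign correction) can be sketched as follows. The function name is hypothetical, numpy.linalg.svd stands in for scipy.linalg.svd, and in this sketch the rows of vt are the right singular vectors, so data ≈ u @ np.diag(s) @ vt.

```python
import numpy as np


def truncated_svd_sketch(data, output_dimension=None, flip=True):
    """Hypothetical sketch of an exact-then-truncated SVD with sign correction."""
    # Exact thin SVD; singular values come back in descending order
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    k = min(data.shape) if output_dimension is None else output_dimension
    # Keep only the leading k components
    u, s, vt = u[:, :k], s[:k], vt[:k, :]
    if flip:
        # u-based sign correction: largest-magnitude entry of each column of u > 0
        idx = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[idx, np.arange(k)])
        signs[signs == 0] = 1  # leave all-zero components untouched
        u = u * signs
        vt = vt * signs[:, np.newaxis]
    return u, s, vt
```

Keeping the leading components gives the best rank-k approximation in the Frobenius norm (Eckart–Young theorem), so the squared reconstruction error equals the sum of the squared discarded singular values.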