Visualizing results#
HyperSpy includes a number of plotting methods for visualizing the results
of decomposition and blind source separation analyses. All of these methods
begin with plot_.
Scree plots#
Note
Scree plots are only available for the "SVD" and "PCA" algorithms.
PCA will sort the components in the dataset in order of decreasing variance. It is often useful to estimate the dimensionality of the data by plotting the explained variance against the component index. This plot is sometimes called a scree plot. For most datasets, the values in a scree plot will decay rapidly, eventually becoming a slowly descending line.
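To make the quantity on the vertical axis of a scree plot concrete, here is a minimal NumPy sketch that computes an explained variance ratio from the singular values of a data matrix. It is not HyperSpy's implementation: the helper name explained_variance_ratio, the mean-centring step, and the synthetic dataset are all assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio(data):
    """Fraction of the total variance captured by each component.

    `data` has shape (n_observations, n_channels). Hypothetical helper,
    not part of HyperSpy.
    """
    # Assumption: centre each channel before the SVD, as in textbook PCA.
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Squared singular values are proportional to the variance per component.
    variance = singular_values ** 2
    return variance / variance.sum()


# Synthetic data: two strong sources mixed into 10 channels, plus weak noise.
rng = np.random.default_rng(0)
sources = rng.normal(size=(200, 2)) * [5.0, 2.0]
mixing = rng.normal(size=(2, 10))
data = sources @ mixing + 0.01 * rng.normal(size=(200, 10))

ratio = explained_variance_ratio(data)
print(ratio)
```

Because NumPy returns singular values in descending order, the ratios come out already sorted by decreasing variance. In this synthetic case, the first two values dominate and the remaining ones form the slowly descending tail.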
To obtain a scree plot for your dataset, run the plot_explained_variance_ratio() method:
>>> s.plot_explained_variance_ratio(n=20)
The point at which the scree plot becomes linear (often referred to as the “elbow”) is generally judged to be a good estimate of the dimensionality of the data (or, equivalently, of the number of components that should be retained; see below). Components to the left of the elbow are considered part of the “signal”, while components to the right are considered “noise” and thus do not explain any significant features of the data.
If you specify a float value for threshold, a cutoff line is drawn at that
explained variance ratio, and the components above it are styled differently
from the remaining components to show which are considered signal rather than noise.
Alternatively, if you provide an integer value for threshold, the line is
drawn at the specified component (see below).
Note that in the above scree plot, the first component has index 0. This is because
Python uses zero-based indexing. To switch to a “number-based” (rather than
“index-based”) notation, specify the xaxis_type parameter:
>>> s.plot_explained_variance_ratio(n=20, threshold=4, xaxis_type='number')
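The two interpretations of threshold described above can be summarized in a small sketch. The helper count_signal_components and the example ratios are hypothetical and only illustrate the logic. Comparing a float threshold against each component's own ratio (rather than a cumulative sum) is an assumption about the intended behaviour, so check the plot_explained_variance_ratio() documentation for the exact rule.

```python
def count_signal_components(ratios, threshold):
    """Number of components treated as signal (hypothetical helper).

    `ratios` must be sorted in decreasing order, as in a scree plot.
    """
    if isinstance(threshold, int):
        # Integer: the cutoff is placed directly at that component count.
        return threshold
    # Float: components whose ratio lies above the cutoff count as signal.
    return sum(1 for r in ratios if r > threshold)


# Made-up explained variance ratios, for illustration only.
ratios = [0.60, 0.25, 0.08, 0.02, 0.015, 0.012, 0.010, 0.008]

n_from_float = count_signal_components(ratios, 0.05)
n_from_int = count_signal_components(ratios, 4)
print(n_from_float, n_from_int)
```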
The number of significant components can be estimated, and a vertical line
drawn to mark it, by specifying vline=True. In this case, the “elbow”
is found by computing the distance from each point in the variance plot to
the line joining the first and last points of the plot, and then
selecting the point where this distance is largest.
If multiple maxima are found, the index corresponding to the first occurrence
is returned. Because the index of the first component is zero, the number of
significant PCA components is the elbow index + 1. More details
about the elbow-finding technique can be found in
[Satopää2011] and in the documentation for estimate_elbow_position().
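The geometric idea behind the elbow search can be sketched in a few lines of plain Python. This is only an illustration of the distance-to-line rule described above, not the code used by estimate_elbow_position(). The function name elbow_index, the example ratios, and the choice to work on the logarithm of the ratios are assumptions.

```python
import math


def elbow_index(values):
    """Index of the point farthest from the line joining the first and last points."""
    x1, y1 = 0, values[0]
    x2, y2 = len(values) - 1, values[-1]
    norm = math.hypot(x2 - x1, y2 - y1)
    # Perpendicular distance from each point (x, y) to the line through
    # (x1, y1) and (x2, y2).
    distances = [
        abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm
        for x, y in enumerate(values)
    ]
    # list.index returns the first match, so ties resolve to the earliest point.
    return distances.index(max(distances))


# Made-up explained variance ratios, for illustration only.
ratios = [0.60, 0.25, 0.08, 0.02, 0.015, 0.012, 0.010, 0.008]

# Assumption: search on a log scale, which is how scree plots are usually drawn.
log_ratios = [math.log(r) for r in ratios]

elbow = elbow_index(log_ratios)
n_significant = elbow + 1  # convert the zero-based index to a component count
print(elbow, n_significant)
```

For these values, the farthest point is at index 3, which corresponds to four significant components.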
These options, together with many others, can be customized to
produce a figure to your liking. See the documentation of
plot_explained_variance_ratio() for more details.
Sometimes it can be useful to get the explained variance ratio as a spectrum.
For example, to plot several scree plots obtained with
different data pre-treatments in the same figure, you can combine
plot_spectra() with get_explained_variance_ratio().
Decomposition plots#
HyperSpy provides a number of methods for visualizing the factors and loadings
found by a decomposition analysis. To plot everything in a compact form,
use plot_decomposition_results().
You can also plot the factors and loadings separately with plot_decomposition_factors() and plot_decomposition_loadings(). It is recommended that you provide the number of factors or loadings you wish to visualize, since the default is to plot all of them.
Blind source separation plots#
Visualizing blind source separation results works in much the same way as for decomposition.
You can use plot_bss_results() for a compact display,
or plot the factors and loadings separately with plot_bss_factors() and plot_bss_loadings().
Clustering plots#
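Visualizing cluster results works in much the same way as for decomposition.
You can use plot_cluster_results() for a compact display,
or plot the individual parts with plot_cluster_labels(), plot_cluster_signals() and plot_cluster_distances().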