hyperspy.external.astroML.histtools module

Tools for working with distributions

class hyperspy.external.astroML.histtools.KnuthF(data)

Bases: object

Class which implements the function minimized by knuth_bin_width

Parameters: data (array-like, one dimension) – data to be histogrammed

Notes

the function F is given by

$F(M|x,I) = n\log(M) + \log\Gamma(\frac{M}{2}) - M\log\Gamma(\frac{1}{2}) - \log\Gamma(\frac{2n+M}{2}) + \sum_{k=1}^M \log\Gamma(n_k + \frac{1}{2})$

where $\Gamma$ is the Gamma function, $n$ is the number of data points, and $n_k$ is the number of measurements in bin $k$.

See also

knuth_bin_width, astroML.plotting.hist

bins(M)

Return the bin edges for a given number of bins M

eval(M)

Evaluate the Knuth function

Parameters: M (int) – Number of bins
Returns: F – evaluation of the negative Knuth likelihood function: smaller values indicate a better fit.
Return type: float
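The formula in the Notes above can be evaluated directly. The following is a minimal sketch using only the standard library, not the code in this module: the name `knuth_log_posterior`, the equal-width binning over the data range, and the sample values are all assumptions made for illustration.

```python
import math


def knuth_log_posterior(data, M):
    """Evaluate F(M|x,I) for M equal-width bins spanning the data range.

    Hypothetical helper for illustration; not part of histtools.
    """
    x = sorted(data)
    n = len(x)
    lo, hi = x[0], x[-1]
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / M

    # Count the points in each bin; the maximum value goes in the last bin.
    counts = [0] * M
    for v in x:
        k = min(int((v - lo) / width), M - 1)
        counts[k] += 1

    # Term-by-term transcription of F; note (2n + M) / 2 == n + M / 2.
    return (n * math.log(M)
            + math.lgamma(0.5 * M)
            - M * math.lgamma(0.5)
            - math.lgamma(n + 0.5 * M)
            + sum(math.lgamma(nk + 0.5) for nk in counts))


# Illustrative data only.
data = [0.1, 0.2, 0.25, 0.3, 0.9, 1.0, 1.1, 1.15, 2.0, 2.1]
F = knuth_log_posterior(data, 4)
negative_F = -F  # the sign convention described for eval(): smaller is better
```

With a single bin every term cancels, so `knuth_log_posterior(data, 1)` is zero; this gives a quick sanity check on the transcription.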

hyperspy.external.astroML.histtools.dasky_freedman_bin_width(data, return_bins=True)

Dask version of freedman_bin_width

Parameters

data (dask array) – the data
return_bins (bool (optional)) – if True, then return the bin edges

Returns

width (float) – optimal bin width using the Freedman-Diaconis rule
bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal bin width is

$\Delta_b = \frac{2(q_{75} - q_{25})}{n^{1/3}}$

where $q_{N}$ is the $N$th percentile of the data, and $n$ is the number of data points.
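As a worked example with illustrative numbers (not taken from any dataset), $n = 1000$ points with an interquartile range $q_{75} - q_{25} = 1.5$ give

$\Delta_b = \frac{2 \times 1.5}{1000^{1/3}} = \frac{3}{10} = 0.3$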

hyperspy.external.astroML.histtools.dasky_histogram(a, bins=10, **kwargs)

Enhanced histogram for dask arrays. The range keyword is ignored. The data is read at most twice: once to determine the best bins (if required), and a second time to calculate the histogram.

Parameters

a (array_like) – array of data to be histogrammed
bins (int or list or str (optional)) – If bins is a string, then it must be one of:
‘scotts’ : use Scott’s rule to determine bins
‘freedman’ : use the Freedman-Diaconis rule to determine bins
other keyword arguments – described in numpy.histogram()

Returns

hist (array) – The values of the histogram. See normed and weights for a description of the possible semantics.
bin_edges (array of dtype float) – Return the bin edges (length(hist)+1).

hyperspy.external.astroML.histtools.dasky_scotts_bin_width(data, return_bins=True)

Dask version of scotts_bin_width

Parameters

data (dask array) – the data
return_bins (bool (optional)) – if True, then return the bin edges

Returns

width (float) – optimal bin width using Scott’s rule
bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal bin width is:

$\Delta_b = \frac{3.5\sigma}{n^{1/3}}$

where $\sigma$ is the standard deviation of the data, and $n$ is the number of data points.
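As a worked example with illustrative numbers (not taken from any dataset), $n = 8000$ points with $\sigma = 2$ give

$\Delta_b = \frac{3.5 \times 2}{8000^{1/3}} = \frac{7}{20} = 0.35$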

hyperspy.external.astroML.histtools.freedman_bin_width(data, return_bins=False)

Return the optimal histogram bin width using the Freedman-Diaconis rule

Parameters

data (array-like, ndim=1) – observed (one-dimensional) data
return_bins (bool (optional)) – if True, then return the bin edges

Returns

width (float) – optimal bin width using the Freedman-Diaconis rule
bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal bin width is

$\Delta_b = \frac{2(q_{75} - q_{25})}{n^{1/3}}$

where $q_{N}$ is the $N$th percentile of the data, and $n$ is the number of data points.
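The rule above is short enough to sketch with the standard library. This is an illustration of the formula, not this module's implementation: the name `freedman_width` is hypothetical, and using `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics) is an assumption about how the percentiles are computed.

```python
import statistics


def freedman_width(data):
    """Bin width 2 * (q75 - q25) / n**(1/3); hypothetical helper."""
    n = len(data)
    # quantiles(..., n=4) returns the three quartile cut points.
    q25, _median, q75 = statistics.quantiles(data, n=4, method="inclusive")
    return 2 * (q75 - q25) / n ** (1 / 3)


# Illustrative data: the integers 1..8, so q25 = 2.75 and q75 = 6.25.
data = list(range(1, 9))
width = freedman_width(data)  # 2 * 3.5 / 8**(1/3), i.e. about 3.5
```

Because the width depends only on the interquartile range, a few extreme outliers change it far less than they would change a width based on the standard deviation.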

hyperspy.external.astroML.histtools.histogram(a, bins=10, range=None, **kwargs)

Enhanced histogram

This is a histogram function that enables the use of more sophisticated algorithms for determining bins. Aside from the bins argument, which may also be a string that specifies how the bins are computed, the parameters are the same as for numpy.histogram().

Parameters

a (array_like) – array of data to be histogrammed
bins (int or list or str (optional)) – If bins is a string, then it must be one of:
‘blocks’ : use Bayesian blocks for dynamic bin widths
‘knuth’ : use Knuth’s rule to determine bins
‘scotts’ : use Scott’s rule to determine bins
‘freedman’ : use the Freedman-Diaconis rule to determine bins
range (tuple or None (optional)) – the minimum and maximum range for the histogram. If not specified, it will be (a.min(), a.max())
other keyword arguments – described in numpy.histogram()

Returns

hist (array) – The values of the histogram. See normed and weights for a description of the possible semantics.
bin_edges (array of dtype float) – Return the bin edges (length(hist)+1).
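To make the return contract concrete, here is a minimal standard-library sketch of what happens once a rule has produced a bin width: edges are laid out from the minimum, and counts are accumulated per bin. The function `histogram_from_width` and its edge layout are assumptions for illustration, not the code path used by this module.

```python
import math


def histogram_from_width(a, width):
    """Return (hist, bin_edges) for equal-width bins; hypothetical helper."""
    lo, hi = min(a), max(a)
    # Enough bins to cover the range, and at least one.
    nbins = max(1, math.ceil((hi - lo) / width))
    bin_edges = [lo + i * width for i in range(nbins + 1)]

    hist = [0] * nbins
    for v in a:
        # Bins are half-open [left, right); the last bin also takes the maximum.
        k = min(int((v - lo) / width), nbins - 1)
        hist[k] += 1
    return hist, bin_edges


a = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
hist, bin_edges = histogram_from_width(a, 2.0)
# hist is [1, 5, 1, 1] and bin_edges is [2.0, 4.0, 6.0, 8.0, 10.0]
```

As in the Returns description, `bin_edges` has one more element than `hist`.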

hyperspy.external.astroML.histtools.knuth_bin_width(data, return_bins=False)

Return the optimal histogram bin width using Knuth’s rule [1]

Parameters

data (array-like, ndim=1) – observed (one-dimensional) data
return_bins (bool (optional)) – if True, then return the bin edges

Returns

dx (float) – optimal bin width. Bins are measured starting at the first data point.
bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal number of bins is the value M which maximizes the function

$F(M|x,I) = n\log(M) + \log\Gamma(\frac{M}{2}) - M\log\Gamma(\frac{1}{2}) - \log\Gamma(\frac{2n+M}{2}) + \sum_{k=1}^M \log\Gamma(n_k + \frac{1}{2})$

where $\Gamma$ is the Gamma function, $n$ is the number of data points, and $n_k$ is the number of measurements in bin $k$.
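One simple way to find the maximizing M is to evaluate F for every candidate and keep the best. The sketch below does exactly that with the standard library. It is a brute-force scan chosen for clarity, not necessarily the optimizer this module uses; `knuth_F`, `knuth_width_by_scan`, the `max_bins` cap, and the synthetic sample are all assumptions.

```python
import math
import random


def knuth_F(x_sorted, M):
    """F(M|x,I) for M equal-width bins over sorted data; hypothetical helper."""
    n = len(x_sorted)
    lo = x_sorted[0]
    width = (x_sorted[-1] - lo) / M
    counts = [0] * M
    for v in x_sorted:
        counts[min(int((v - lo) / width), M - 1)] += 1
    return (n * math.log(M) + math.lgamma(M / 2) - M * math.lgamma(0.5)
            - math.lgamma(n + M / 2)
            + sum(math.lgamma(nk + 0.5) for nk in counts))


def knuth_width_by_scan(data, max_bins=50):
    """Scan M = 1..max_bins and return (dx, bin_edges) for the best M."""
    x = sorted(data)
    best_M = max(range(1, max_bins + 1), key=lambda M: knuth_F(x, M))
    dx = (x[-1] - x[0]) / best_M
    # Bins start at the first (smallest) data point.
    edges = [x[0] + i * dx for i in range(best_M + 1)]
    return dx, edges


# Synthetic two-cluster sample with a fixed seed, for illustration only.
rng = random.Random(0)
sample = ([rng.gauss(0, 1) for _ in range(200)]
          + [rng.gauss(6, 1) for _ in range(200)])
dx, edges = knuth_width_by_scan(sample)
```

The scan costs one pass over the data per candidate M, which is fine for small samples; for large data a numerical optimizer over M would need fewer evaluations.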

References

[1] Knuth, K.H. “Optimal Data-Based Binning for Histograms”. arXiv:physics/0605197, 2006

hyperspy.external.astroML.histtools.scotts_bin_width(data, return_bins=False)

Return the optimal histogram bin width using Scott’s rule

Parameters

data (array-like, ndim=1) – observed (one-dimensional) data
return_bins (bool (optional)) – if True, then return the bin edges

Returns

width (float) – optimal bin width using Scott’s rule
bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal bin width is

$\Delta_b = \frac{3.5\sigma}{n^{1/3}}$

where $\sigma$ is the standard deviation of the data, and $n$ is the number of data points.
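A minimal standard-library sketch of this formula follows. It is an illustration rather than this module's implementation: the name `scott_width` is hypothetical, and taking $\sigma$ as the population standard deviation (`statistics.pstdev`) is an assumption.

```python
import statistics


def scott_width(data):
    """Bin width 3.5 * sigma / n**(1/3); hypothetical helper."""
    n = len(data)
    sigma = statistics.pstdev(data)  # assumption: population standard deviation
    return 3.5 * sigma / n ** (1 / 3)


# Illustrative data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
width = scott_width(data)  # 3.5 * 2 / 8**(1/3), i.e. about 3.5
```

Since the width scales with $\sigma$, a single large outlier widens every bin; the Freedman-Diaconis rule is the less sensitive choice in that situation.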