hyperspy.external.astroML.histtools module
Tools for working with distributions
-
class hyperspy.external.astroML.histtools.KnuthF(data)
Bases: object
Class which implements the function minimized by knuth_bin_width
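Parameters: data (array-like, one dimension) – data to be histogrammed
Notes
The function F is given by
$$F(M|x,I) = n\log(M) + \log\Gamma\left(\frac{M}{2}\right) - M\log\Gamma\left(\frac{1}{2}\right) - \log\Gamma\left(\frac{2n+M}{2}\right) + \sum_{k=1}^{M} \log\Gamma\left(n_k + \frac{1}{2}\right)$$
where $\Gamma$ is the Gamma function, $n$ is the number of data points, and $n_k$ is the number of measurements in bin $k$.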
See also
knuth_bin_width, astroML.plotting.hist
-
bins(M)
Return the bin edges for M bins
-
eval(M)
Evaluate the Knuth function
Parameters: M (int) – number of bins
Returns: F – evaluation of the negative Knuth likelihood function: smaller values indicate a better fit.
Return type: float
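As an illustration of what eval computes, the following is a minimal standard-library sketch, not the module's own code. The helper names `bin_counts` and `negative_knuth_F` are hypothetical, and the sketch assumes equal-width bins spanning the data range and data that are not all identical.

```python
import math


def bin_counts(data, M):
    """Count points in M equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / M  # assumes hi > lo
    counts = [0] * M
    for x in data:
        # The maximum value is placed in the last bin (right edge closed).
        k = min(int((x - lo) / width), M - 1)
        counts[k] += 1
    return counts


def negative_knuth_F(data, M):
    """Negative of Knuth's F for M bins; smaller values indicate a better fit."""
    n = len(data)
    nk = bin_counts(data, M)
    F = (n * math.log(M)
         + math.lgamma(0.5 * M)
         - M * math.lgamma(0.5)
         - math.lgamma(n + 0.5 * M)   # (2n + M) / 2
         + sum(math.lgamma(c + 0.5) for c in nk))
    return -F
```

With a single bin (M = 1) every term cancels and the value is zero, which makes a convenient sanity check.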
-
-
hyperspy.external.astroML.histtools.dasky_freedman_bin_width(data, return_bins=True)
Dask version of freedman_bin_width
Parameters:
- data (dask array) – the data
- return_bins (bool (optional)) – if True, then return the bin edges
Returns:
- width (float) – optimal bin width using the Freedman-Diaconis rule
- bins (ndarray) – bin edges: returned if return_bins is True
Notes
The optimal bin width is
$$\Delta_b = \frac{2(q_{75} - q_{25})}{n^{1/3}}$$
where $q_N$ is the $N$ percent quartile of the data, and $n$ is the number of data points.
See also
knuth_bin_width(), scotts_bin_width(), astroML.plotting.hist()
-
hyperspy.external.astroML.histtools.dasky_histogram(a, bins=10, **kwargs)
Enhanced histogram for dask arrays. The range keyword is ignored. The data are read at most twice: once to determine the best bins (if required) and once to compute the histogram.
Parameters:
- a (array_like) – array of data to be histogrammed
- bins (int or list or str (optional)) – If bins is a string, then it must be one of:
  ‘scotts’ : use Scott’s rule to determine bins
  ‘freedman’ : use the Freedman-Diaconis rule to determine bins
- **kwargs – other keyword arguments are described in numpy.histogram()
Returns:
- hist (array) – The values of the histogram. See normed and weights for a description of the possible semantics.
- bin_edges (array of dtype float) – the bin edges (length(hist)+1).
See also
numpy.histogram(), astroML.plotting.hist()
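The two-read behaviour described for dasky_histogram can be pictured with a small standard-library sketch. This is not the dask implementation: plain lists stand in for the blocks of a dask array, the function name `two_pass_histogram` is hypothetical, and Scott's rule with a population standard deviation is assumed for the first pass.

```python
import math


def two_pass_histogram(chunks):
    """Histogram chunked data, reading every chunk exactly twice."""
    # Pass 1: gather the statistics that Scott's rule needs.
    n, total, total_sq = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for chunk in chunks:
        for x in chunk:
            n += 1
            total += x
            total_sq += x * x
            lo, hi = min(lo, x), max(hi, x)
    mean = total / n
    # Naive sum-of-squares variance; adequate for a sketch, not for
    # badly scaled data. Assumes the data are not all identical.
    sigma = math.sqrt(max(total_sq / n - mean * mean, 0.0))
    dx = 3.5 * sigma / n ** (1 / 3)
    n_bins = max(1, math.ceil((hi - lo) / dx))
    edges = [lo + dx * i for i in range(n_bins + 1)]

    # Pass 2: count points per bin, accumulating across chunks.
    counts = [0] * n_bins
    for chunk in chunks:
        for x in chunk:
            counts[min(int((x - lo) / dx), n_bins - 1)] += 1
    return counts, edges
```

When the bins are given explicitly, the first pass is unnecessary, which matches the "at most two times" wording above.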
-
hyperspy.external.astroML.histtools.dasky_scotts_bin_width(data, return_bins=True)
Dask version of scotts_bin_width
Parameters:
- data (dask array) – the data
- return_bins (bool (optional)) – if True, then return the bin edges
Returns:
- width (float) – optimal bin width using Scott’s rule
- bins (ndarray) – bin edges: returned if return_bins is True
Notes
The optimal bin width is
$$\Delta_b = \frac{3.5\sigma}{n^{1/3}}$$
where $\sigma$ is the standard deviation of the data, and $n$ is the number of data points.
See also
knuth_bin_width(), freedman_bin_width(), astroML.plotting.hist()
-
hyperspy.external.astroML.histtools.freedman_bin_width(data, return_bins=False)
Return the optimal histogram bin width using the Freedman-Diaconis rule
Parameters:
- data (array-like, ndim=1) – observed (one-dimensional) data
- return_bins (bool (optional)) – if True, then return the bin edges
Returns:
- width (float) – optimal bin width using the Freedman-Diaconis rule
- bins (ndarray) – bin edges: returned if return_bins is True
Notes
The optimal bin width is
$$\Delta_b = \frac{2(q_{75} - q_{25})}{n^{1/3}}$$
where $q_N$ is the $N$ percent quartile of the data, and $n$ is the number of data points.
See also
knuth_bin_width(), scotts_bin_width(), astroML.plotting.hist()
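The Freedman-Diaconis rule can be written out with the standard library alone. The sketch below is illustrative rather than the module's code: `freedman_bin_width_sketch` is a hypothetical name, quartiles use linear interpolation, and the way edges are laid out from the minimum in steps of the width is an assumption.

```python
import math
import statistics


def freedman_bin_width_sketch(data, return_bins=False):
    """Bin width 2 * IQR / n**(1/3); optionally also return bin edges."""
    data = sorted(data)
    n = len(data)
    # method="inclusive" interpolates linearly between order statistics.
    q25, _, q75 = statistics.quantiles(data, n=4, method="inclusive")
    dx = 2 * (q75 - q25) / n ** (1 / 3)
    if not return_bins:
        return dx
    lo, hi = data[0], data[-1]
    # Assumes a non-zero interquartile range, otherwise dx is 0.
    n_bins = max(1, math.ceil((hi - lo) / dx))
    edges = [lo + dx * i for i in range(n_bins + 1)]
    return dx, edges
```

For the integers 1 to 8 the quartiles are 2.75 and 6.25, so the width is 2 × 3.5 / 2 = 3.5.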
-
hyperspy.external.astroML.histtools.histogram(a, bins=10, range=None, **kwargs)
Enhanced histogram
This is a histogram function that enables the use of more sophisticated algorithms for determining bins. Aside from the bins argument accepting a string that specifies how bins are computed, the parameters are the same as numpy.histogram().
Parameters:
- a (array_like) – array of data to be histogrammed
- bins (int or list or str (optional)) – If bins is a string, then it must be one of:
  ‘blocks’ : use Bayesian blocks for dynamic bin widths
  ‘knuth’ : use Knuth’s rule to determine bins
  ‘scotts’ : use Scott’s rule to determine bins
  ‘freedman’ : use the Freedman-Diaconis rule to determine bins
- range (tuple or None (optional)) – the minimum and maximum range for the histogram. If not specified, it will be (a.min(), a.max())
- **kwargs – other keyword arguments are described in numpy.histogram()
Returns:
- hist (array) – The values of the histogram. See normed and weights for a description of the possible semantics.
- bin_edges (array of dtype float) – the bin edges (length(hist)+1).
See also
numpy.histogram(), astroML.plotting.hist()
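To make the relation between hist and bin_edges concrete, here is a small sketch of counting values against a precomputed list of edges. The function name `histogram_from_edges` is hypothetical; it follows the numpy.histogram() convention that every bin is half-open except the last, which includes its right edge.

```python
from bisect import bisect_right


def histogram_from_edges(a, edges):
    """Count the values of `a` that fall in the bins defined by sorted `edges`."""
    counts = [0] * (len(edges) - 1)
    for x in a:
        if x < edges[0] or x > edges[-1]:
            continue  # outside the histogram range
        # Bins are [left, right), except the last one, which is [left, right].
        k = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[k] += 1
    return counts, list(edges)


hist, bin_edges = histogram_from_edges([0, 1, 1, 2, 3, 5], [0, 1, 2, 3])
print(hist)       # [1, 2, 2]; the value 5 lies outside the range
print(bin_edges)  # [0, 1, 2, 3], one more entry than hist
```

Any of the bin-width rules on this page only changes how the edges are chosen; the counting step stays the same.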
-
hyperspy.external.astroML.histtools.knuth_bin_width(data, return_bins=False)
Return the optimal histogram bin width using Knuth’s rule [1]
Parameters:
- data (array-like, ndim=1) – observed (one-dimensional) data
- return_bins (bool (optional)) – if True, then return the bin edges
Returns:
- dx (float) – optimal bin width. Bins are measured starting at the first data point.
- bins (ndarray) – bin edges: returned if return_bins is True
Notes
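The optimal number of bins is the value $M$ which maximizes the function
$$F(M|x,I) = n\log(M) + \log\Gamma\left(\frac{M}{2}\right) - M\log\Gamma\left(\frac{1}{2}\right) - \log\Gamma\left(\frac{2n+M}{2}\right) + \sum_{k=1}^{M} \log\Gamma\left(n_k + \frac{1}{2}\right)$$
where $\Gamma$ is the Gamma function, $n$ is the number of data points, and $n_k$ is the number of measurements in bin $k$.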
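The maximization over M can be pictured with a brute-force scan. This is a swapped-in technique chosen for clarity; this page does not say how the module searches for the optimum. The names `knuth_F` and `knuth_bin_width_sketch` and the `max_bins` cap are hypothetical, and the data are assumed not to be all identical.

```python
import math


def knuth_F(counts, n):
    """Knuth's F for the given per-bin counts and n data points."""
    M = len(counts)
    return (n * math.log(M)
            + math.lgamma(0.5 * M)
            - M * math.lgamma(0.5)
            - math.lgamma(n + 0.5 * M)
            + sum(math.lgamma(c + 0.5) for c in counts))


def knuth_bin_width_sketch(data, max_bins=100):
    """Scan M = 1..max_bins and return (dx, M) for the M that maximizes F."""
    lo, hi = min(data), max(data)
    n = len(data)
    best_M, best_F = 1, -math.inf
    for M in range(1, max_bins + 1):
        width = (hi - lo) / M
        counts = [0] * M
        for x in data:
            counts[min(int((x - lo) / width), M - 1)] += 1
        F = knuth_F(counts, n)
        if F > best_F:
            best_M, best_F = M, F
    # Bins start at the first (smallest) data point and span the data range.
    return (hi - lo) / best_M, best_M
```

The scan costs O(n × max_bins), so it suits small data sets; a numerical optimizer would be the usual choice for larger ones.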
References
[1] Knuth, K.H. “Optimal Data-Based Binning for Histograms”. arXiv:physics/0605197, 2006
-
hyperspy.external.astroML.histtools.scotts_bin_width(data, return_bins=False)
Return the optimal histogram bin width using Scott’s rule
Parameters:
- data (array-like, ndim=1) – observed (one-dimensional) data
- return_bins (bool (optional)) – if True, then return the bin edges
Returns:
- width (float) – optimal bin width using Scott’s rule
- bins (ndarray) – bin edges: returned if return_bins is True
Notes
The optimal bin width is
$$\Delta_b = \frac{3.5\sigma}{n^{1/3}}$$
where $\sigma$ is the standard deviation of the data, and $n$ is the number of data points.
See also
knuth_bin_width(), freedman_bin_width(), astroML.plotting.hist()
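Scott's rule is short enough to spell out directly. The sketch below uses only the standard library; `scotts_bin_width_sketch` is a hypothetical name, and the use of the population standard deviation (rather than the sample one) and the edge layout are assumptions.

```python
import math
import statistics


def scotts_bin_width_sketch(data, return_bins=False):
    """Bin width 3.5 * sigma / n**(1/3); optionally also return bin edges."""
    n = len(data)
    sigma = statistics.pstdev(data)  # population standard deviation (assumed)
    dx = 3.5 * sigma / n ** (1 / 3)
    if not return_bins:
        return dx
    lo, hi = min(data), max(data)
    # Assumes sigma > 0, i.e. the data are not all identical.
    n_bins = max(1, math.ceil((hi - lo) / dx))
    edges = [lo + dx * i for i in range(n_bins + 1)]
    return dx, edges
```

For eight points with a standard deviation of 1, the width is 3.5 / 2 = 1.75.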