# hyperspy.external.astroML package¶

## hyperspy.external.astroML.bayesian_blocks module¶

### Bayesian Block implementation¶

Dynamic programming algorithm for finding the optimal adaptive-width histogram.

Based on Scargle et al 2012 _

References

class hyperspy.external.astroML.bayesian_blocks.Events(p0=0.05, gamma=None)

Fitness for binned or unbinned events

Parameters: p0 (float) – False alarm probability, used to compute the prior on N (see eq. 21 of Scargle 2012). Default prior is for p0 = 0. gamma (float or None) – If specified, then use this gamma to compute the general prior form, p ~ gamma^N. If gamma is specified, p0 is ignored.
fitness(N_k, T_k)
prior(N, Ntot)
class hyperspy.external.astroML.bayesian_blocks.FitnessFunc(p0=0.05, gamma=None)

Bases: object

Base class for fitness functions

Each fitness function class has the following: - fitness(...) : compute fitness function.

Arguments accepted by fitness must be among [T_k, N_k, a_k, b_k, c_k]
• prior(N, Ntot) : compute prior on N given a total number of points Ntot
args
fitness(**kwargs)
gamma_prior(N, Ntot)

Basic prior, parametrized by gamma (eq. 3 in Scargle 2012)

p0_prior(N, Ntot)
prior(N, Ntot)
validate_input(t, x, sigma)

Check that input is valid

class hyperspy.external.astroML.bayesian_blocks.PointMeasures(p0=None, gamma=None)

Fitness for point measures

Parameters: gamma (float) – specifies the prior on the number of bins: p ~ gamma^N if gamma is not specified, then a prior based on simulations will be used (see sec 3.3 of Scargle 2012)
fitness(a_k, b_k)
prior(N, Ntot)
class hyperspy.external.astroML.bayesian_blocks.RegularEvents(dt, p0=0.05, gamma=None)

Fitness for regular events

This is for data which has a fundamental “tick” length, so that all measured values are multiples of this tick length. In each tick, there are either zero or one counts.

Parameters: dt (float) – tick rate for data gamma (float) – specifies the prior on the number of bins: p ~ gamma^N
fitness(T_k, N_k)
validate_input(t, x, sigma)
hyperspy.external.astroML.bayesian_blocks.bayesian_blocks(t, x=None, sigma=None, fitness='events', **kwargs)

Bayesian Blocks Implementation

This is a flexible implementation of the Bayesian Blocks algorithm described in Scargle 2012 _

Parameters: t (array_like) – data times (one dimensional, length N) x (array_like (optional)) – data values sigma (array_like or float (optional)) – data errors fitness (str or object) – the fitness function to use. If a string, the following options are supported: ‘events’ : binned or unbinned event data extra arguments are p0, which gives the false alarm probability to compute the prior, or gamma which gives the slope of the prior on the number of bins. ‘regular_events’ : non-overlapping events measured at multiples of a fundamental tick rate, dt, which must be specified as an additional argument. The prior can be specified through gamma, which gives the slope of the prior on the number of bins. ‘measures’ : fitness for a measured sequence with Gaussian errors The prior can be specified using gamma, which gives the slope of the prior on the number of bins. If gamma is not specified, then a simulation-derived prior will be used. Alternatively, the fitness can be a user-specified object of type derived from the FitnessFunc class. edges – array containing the (N+1) bin edges ndarray

Examples

Event data:

>>> t = np.random.normal(size=100)
>>> bins = bayesian_blocks(t, fitness='events', p0=0.01)


Event data with repeats:

>>> t = np.random.normal(size=100)
>>> t[80:] = t[:20]
>>> bins = bayesian_blocks(t, fitness='events', p0=0.01)


Regular event data:

>>> dt = 0.01
>>> t = dt * np.arange(1000)
>>> x = np.zeros(len(t))
>>> x[np.random.randint(0, len(t), len(t) / 10)] = 1
>>> bins = bayesian_blocks(t, fitness='regular_events', dt=dt, gamma=0.9)


Measured point data with errors:

>>> t = 100 * np.random.random(100)
>>> x = np.exp(-0.5 * (t - 50) ** 2)
>>> sigma = 0.1
>>> x_obs = np.random.normal(x, sigma)
>>> bins = bayesian_blocks(t, fitness='measures')


References

  Scargle, J et al. (2012) http://adsabs.harvard.edu/abs/2012arXiv1207.5578S

astroML.plotting.hist()
histogram plotting function which can make use of bayesian blocks.

## hyperspy.external.astroML.histtools module¶

Tools for working with distributions

class hyperspy.external.astroML.histtools.KnuthF(data)

Bases: object

Class which implements the function minimized by knuth_bin_width

Parameters: data (array-like, one dimension) – data to be histogrammed

Notes

the function F is given by where is the Gamma function, is the number of data points, is the number of measurements in bin .

knuth_bin_width, astroML.plotting.hist

bins(M)

Return the bin edges given a width dx

eval(M)

Evaluate the Knuth function

Parameters: dx (float) – Width of bins F – evaluation of the negative Knuth likelihood function: smaller values indicate a better fit. float
hyperspy.external.astroML.histtools.freedman_bin_width(data, return_bins=False)

Return the optimal histogram bin width using the Freedman-Diaconis rule

Parameters: data (array-like, ndim=1) – observed (one-dimensional) data return_bins (bool (optional)) – if True, then return the bin edges width (float) – optimal bin width using Scott’s rule bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal bin width is where is the percent quartile of the data, and is the number of data points.

knuth_bin_width(), scotts_bin_width(), astroML.plotting.hist()

hyperspy.external.astroML.histtools.histogram(a, bins=10, range=None, **kwargs)

Enhanced histogram

This is a histogram function that enables the use of more sophisticated algorithms for determining bins. Aside from the bins argument allowing a string specified how bins are computed, the parameters are the same as numpy.histogram().

Parameters: a (array_like) – array of data to be histogrammed bins (int or list or str (optional)) – If bins is a string, then it must be one of: ‘blocks’ : use bayesian blocks for dynamic bin widths ‘knuth’ : use Knuth’s rule to determine bins ‘scotts’ : use Scott’s rule to determine bins ‘freedman’ : use the Freedman-diaconis rule to determine bins range (tuple or None (optional)) – the minimum and maximum range for the histogram. If not specified, it will be (x.min(), x.max()) keyword arguments are described in numpy.hist(). (other) – hist (array) – The values of the histogram. See normed and weights for a description of the possible semantics. bin_edges (array of dtype float) – Return the bin edges (length(hist)+1).

numpy.histogram(), astroML.plotting.hist()

hyperspy.external.astroML.histtools.knuth_bin_width(data, return_bins=False)

Return the optimal histogram bin width using Knuth’s rule _

Parameters: data (array-like, ndim=1) – observed (one-dimensional) data return_bins (bool (optional)) – if True, then return the bin edges dx (float) – optimal bin width. Bins are measured starting at the first data point. bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal number of bins is the value M which maximizes the function where is the Gamma function, is the number of data points, is the number of measurements in bin .

References

  Knuth, K.H. “Optimal Data-Based Binning for Histograms”. arXiv:0605197, 2006
hyperspy.external.astroML.histtools.scotts_bin_width(data, return_bins=False)

Return the optimal histogram bin width using Scott’s rule:

Parameters: data (array-like, ndim=1) – observed (one-dimensional) data return_bins (bool (optional)) – if True, then return the bin edges width (float) – optimal bin width using Scott’s rule bins (ndarray) – bin edges: returned if return_bins is True

Notes

The optimal bin width is where is the standard deviation of the data, and is the number of data points.

knuth_bin_width(), freedman_bin_width(), astroML.plotting.hist()