bioverse.hypothesis module

Defines the Hypothesis class as well as two hypotheses used in Bixel & Apai (2021).

class bioverse.hypothesis.Hypothesis(f, bounds, params=(), features=(), labels=(), lnprior_function=None, guess_function=None, tfprior_function=None, log=None, h_null=None, **kwargs)

Bases: object

Describes a Bayesian hypothesis.

Parameters:

f (function) – Function describing the hypothesis. Must be defined as f(theta, X) where theta is a tuple of parameter values and X is a set of independent variables. Returns the calculated values of Y, the set of dependent variables for each entry in X.
bounds (array) – Nx2 array describing the [min, max] limits of each parameter. These are enforced even if a different prior distribution is defined.
params (tuple of str, optional) – Names of the parameter(s) of the hypothesis.
features (tuple of str, optional) – Names of the feature(s) or independent variables.
labels (tuple of str, optional) – Names of the label(s) or dependent variables.
lnprior_function (function, optional) – Used by emcee. Function which returns ln(P_prior), must be defined as prior(theta). If None, assume a (log-)uniform distribution.
guess_function (function, optional) – Used by emcee. Function which guesses valid sets of parameters. Must be defined as guess_function(n), and should return an n x m set of parameter guesses. If None, draw parameters randomly within bounds.
tfprior_function (function, optional) – Used by dynesty. Function which transforms (0, 1) into (min, max) with the appropriate prior probability. If None, assume a (log-)uniform distribution.
log (bool array, optional) – Array of length N specifying which parameters should be sampled by a log-uniform distribution.
kwargs (key, value pairs) – Additional keyword arguments (e.g., boolean switches) for the hypothesis function
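As an illustration of the f(theta, X) convention and the shapes of bounds and log, the following sketch defines a hypothetical two-parameter linear hypothesis. The function name f_linear, the parameter names, and the numerical limits are assumptions made for this example and are not part of bioverse.

```python
import numpy as np

def f_linear(theta, X):
    """Hypothetical hypothesis function: Y = slope * X + intercept."""
    slope, intercept = theta
    X = np.asarray(X, dtype=float)
    return slope * X + intercept

# N x 2 array of [min, max] limits, one row per parameter (N = 2 here).
bounds = np.array([[1e-3, 1e1],    # slope, to be sampled log-uniformly
                   [-5.0, 5.0]])   # intercept, to be sampled uniformly
# One boolean per parameter: True -> log-uniform, False -> uniform.
log = np.array([True, False])

# Names for the parameters, features, and labels (illustrative only).
params = ('slope', 'intercept')
features = ('x',)
labels = ('y',)

# Evaluate the hypothesis for one parameter set.
Y = f_linear((2.0, 1.0), np.array([0.0, 1.0, 2.0]))
```

Following the class signature, these objects would be passed as Hypothesis(f_linear, bounds, params=params, features=features, labels=labels, log=log).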

guess_uniform(n, bounds): Default guess function. Guesses uniformly within self.bounds.
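A minimal sketch of what uniform guessing within bounds could look like, assuming an n x m output as described for guess_function. The name guess_uniform_sketch and the rng argument are hypothetical; this version ignores any log flags and is not the bioverse implementation.

```python
import numpy as np

def guess_uniform_sketch(n, bounds, rng=None):
    """Draw n parameter sets uniformly within the [min, max] bounds."""
    rng = np.random.default_rng() if rng is None else rng
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    # Result has shape (n, m), where m is the number of parameters.
    return rng.uniform(lo, hi, size=(n, len(bounds)))

guesses = guess_uniform_sketch(5, [[0.0, 1.0], [10.0, 20.0]],
                               rng=np.random.default_rng(0))
```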

guess(n): Guesses a set of values for theta, preferably where P(theta) > -inf.

lnprior_uniform(theta): Default (log-)uniform prior distribution, checks that all values are within bounds.
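The bounds check and the (log-)uniform density can be sketched as follows. This is an illustration under stated assumptions: the function name, the explicit bounds and log arguments, and the choice to return a normalized log-density are all hypothetical and may differ from the actual method.

```python
import numpy as np

def lnprior_uniform_sketch(theta, bounds, log=None):
    """Log-prior for independent uniform or log-uniform parameters."""
    theta = np.asarray(theta, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    if log is None:
        log = np.zeros(len(bounds), dtype=bool)
    log = np.asarray(log, dtype=bool)
    lo, hi = bounds[:, 0], bounds[:, 1]

    # Any parameter outside its bounds has zero prior probability.
    if np.any(theta < lo) or np.any(theta > hi):
        return -np.inf

    lnp = np.zeros(len(bounds))
    # Uniform density: p(theta) = 1 / (hi - lo)
    lnp[~log] = -np.log(hi[~log] - lo[~log])
    # Log-uniform density: p(theta) = 1 / (theta * ln(hi / lo))
    lnp[log] = -np.log(theta[log] * np.log(hi[log] / lo[log]))
    return float(lnp.sum())
```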

lnprior(theta): Returns P(theta) (for emcee).

tfprior(u)

tfprior_uniform(u): Transforms the unit cube u into parameters drawn from (log-)uniform prior distributions.
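Nested samplers such as dynesty draw points in the unit cube and rely on a prior transform to map them to parameter values. A minimal sketch of such a transform for (log-)uniform priors is given here; the function name and the explicit bounds and log arguments are assumptions for illustration.

```python
import numpy as np

def tfprior_uniform_sketch(u, bounds, log=None):
    """Map unit-cube samples u to parameters with (log-)uniform priors."""
    u = np.asarray(u, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    if log is None:
        log = np.zeros(len(bounds), dtype=bool)
    log = np.asarray(log, dtype=bool)
    lo, hi = bounds[:, 0], bounds[:, 1]

    # Uniform prior: linear interpolation between min and max.
    theta = lo + u * (hi - lo)
    # Log-uniform prior: interpolate linearly in log space instead.
    theta[log] = lo[log] * (hi[log] / lo[log]) ** u[log]
    return theta

theta_mid = tfprior_uniform_sketch([0.5, 0.5],
                                   [[0.0, 10.0], [1.0, 100.0]],
                                   log=[False, True])
```

With u = 0.5, the uniform parameter lands at the arithmetic midpoint of its bounds and the log-uniform parameter at the geometric midpoint.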

lnlike_binary(theta, x, y, _): Likelihood function L(y | x, theta) if y is binary. The last argument is a placeholder.
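For binary labels, a natural choice is the Bernoulli log-likelihood. The sketch below assumes that the hypothesis function returns, for each x, the probability that y = 1; that assumption, the function name, and the explicit f argument are illustrative and not confirmed by this reference.

```python
import numpy as np

def lnlike_binary_sketch(theta, x, y, f):
    """Bernoulli log-likelihood, assuming f(theta, x) returns P(y = 1 | x)."""
    y = np.asarray(y, dtype=float)
    # Clip to avoid log(0) when the model predicts exactly 0 or 1.
    p = np.clip(f(theta, x), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```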

lnlike_multivariate(theta, x, y, sigma): Likelihood function L(y | x, theta) if y is continuous and has sigma uncertainty.
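For continuous labels with known uncertainties, a common form is the independent Gaussian log-likelihood. The sketch below assumes uncorrelated Gaussian errors with standard deviation sigma; the function name and the explicit f argument are hypothetical.

```python
import numpy as np

def lnlike_gaussian_sketch(theta, x, y, sigma, f):
    """Gaussian log-likelihood with independent uncertainties sigma."""
    y = np.asarray(y, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), y.shape)
    resid = y - f(theta, x)
    # Sum of chi-square terms plus the normalization of each Gaussian.
    return float(-0.5 * np.sum((resid / sigma) ** 2
                               + np.log(2.0 * np.pi * sigma ** 2)))
```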

lnprob(theta, x, y, sigma): Posterior probability function P(theta | x, y).

sample_posterior_dynesty(X, Y, sigma, nlive=100, nburn=None, verbose=False, sampler_results=False): Uses dynesty to sample the parameter posterior distributions and compute the log-evidence.

sample_posterior_emcee(x, y, sigma, nsteps=500, nwalkers=32, nburn=100, autocorr=False): Uses emcee to sample the parameter posterior distributions.

compute_AIC(theta_opt, x, y, sigma): Computes the Akaike information criterion for optimal parameter set theta_opt.

compute_BIC(theta_opt, x, y, sigma): Computes the Bayesian information criterion for optimal parameter set theta_opt.
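Both criteria follow from the maximum log-likelihood, the number of free parameters k, and (for the BIC) the number of data points n. The helper names below are hypothetical, and the sketch uses the standard textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L.

```python
import numpy as np

def aic_sketch(lnlike_max, k):
    """Akaike information criterion: 2k - 2 ln(L_max)."""
    return 2.0 * k - 2.0 * lnlike_max

def bic_sketch(lnlike_max, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L_max)."""
    return k * np.log(n) - 2.0 * lnlike_max

# Difference relative to a null hypothesis (AIC_null - AIC_alt):
# positive values favor the alternative hypothesis.
delta_aic = aic_sketch(-60.0, k=1) - aic_sketch(-50.0, k=2)
```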

get_observed(data): Identifies which planets in the data set have measurements of the relevant features/labels.

get_XY(data): Returns the X (features) and Y (labels) matrices for valid planets. Computes values as needed.

fit(data, nsteps=500, nwalkers=16, nburn=100, nlive=100, return_chains=False, verbose=False, method='dynesty', mw_alternative='greater', return_data=False, sampler_results=False)

Sample the posterior distribution of h(theta | x, y) using a simulated data set, and compare to the null hypothesis via a model comparison metric.

Parameters:

data (Table) – Simulated data set containing the features and labels.
nsteps (int, optional) – Number of steps per MCMC walker.
nwalkers (int, optional) – Number of MCMC walkers.
nburn (int, optional) – Number of burn-in steps for the Monte Carlo walk.
nlive (int, optional) – Number of live points for the nested sampler.
return_chains (bool, optional) – Whether or not to return the Monte Carlo chains.
verbose (bool, optional) – Whether or not to generate extra output during the run.
method (str, optional) – Which sampling method to use. Options: dynesty (default), emcee, mannwhitney.
mw_alternative (str, {'two-sided', 'less', 'greater'}, optional) – Defines the alternative hypothesis for the Mann-Whitney test. The default in the signature above is 'greater'. Let F(u) and G(u) be the cumulative distribution functions of the distributions underlying x and y, respectively. Then the following alternative hypotheses are available:
- 'two-sided': the distributions are not equal, i.e. F(u) ≠ G(u) for at least one u.
- 'less': the distribution underlying x is stochastically less than the distribution underlying y, i.e. F(u) > G(u) for all u.
- 'greater': the distribution underlying x is stochastically greater than the distribution underlying y, i.e. F(u) < G(u) for all u.
return_data (bool) – Whether or not to return the data.
sampler_results (bool) – Whether or not to return the whole results object from dynesty runs.
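The mw_alternative options match the alternative argument of scipy.stats.mannwhitneyu. The sketch below shows the effect of the one-sided alternatives on two synthetic samples; the samples and their parameters are made up for illustration, and how fit() splits a data set into the two samples is not covered here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
# Synthetic samples: x is shifted to larger values than y.
x = rng.normal(loc=1.0, scale=1.0, size=200)
y = rng.normal(loc=0.0, scale=1.0, size=200)

# 'greater' tests whether x is stochastically greater than y.
res_greater = mannwhitneyu(x, y, alternative='greater')
# 'less' tests the opposite direction.
res_less = mannwhitneyu(x, y, alternative='less')

print(res_greater.pvalue, res_less.pvalue)
```

Because x is drawn from a distribution shifted upward, the 'greater' test returns a small p-value while the 'less' test does not.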

Returns:

results –

Dictionary containing the results of the model fit:

'means' : mean value of each parameter's posterior distribution
'stds' : standard deviation of each parameter's posterior distribution
'medians' : median value of each parameter's posterior distribution
'UCIs' : 2-sigma confidence interval above the median
'LCIs' : 2-sigma confidence interval below the median
'CIs' : width of the +- 2 sigma confidence interval about the median
'AIC' : Akaike information criterion compared to the null hypothesis (i.e. AIC_null - AIC_alt)
'BIC' : Bayesian information criterion compared to the null hypothesis
'chains' : full chain of MCMC samples (if return_chains is True)

Return type:

dict
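The summary statistics in the results dictionary can be derived from posterior samples roughly as sketched here. The function name is hypothetical, and the sketch assumes that the 2-sigma interval corresponds to the 2.275th and 97.725th percentiles (the central 95.45% of the samples); the actual implementation may define it differently.

```python
import numpy as np

def summarize_chains(chains):
    """Summarize posterior samples of shape (n_samples, n_params)."""
    chains = np.asarray(chains, dtype=float)
    # Percentiles bounding the central 2-sigma (95.45%) interval.
    lo, med, hi = np.percentile(chains, [2.275, 50.0, 97.725], axis=0)
    return {
        'means': chains.mean(axis=0),
        'stds': chains.std(axis=0),
        'medians': med,
        'UCIs': hi - med,   # interval above the median
        'LCIs': med - lo,   # interval below the median
        'CIs': hi - lo,     # total width of the interval
    }

# Synthetic one-parameter posterior: Gaussian with mean 3 and std 0.5.
samples = np.random.default_rng(1).normal(3.0, 0.5, size=(100000, 1))
summary = summarize_chains(samples)
```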

bioverse.hypothesis.f_null(theta, X): Function for a generic null hypothesis. Returns (theta1, theta2, …) for each element in X.
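A null hypothesis of this kind predicts the same values regardless of X. One way to express that is sketched here; the function name and the output shape (one row per element of X, one column per parameter) are assumptions and may not match the layout that f_null actually returns.

```python
import numpy as np

def f_null_sketch(theta, X):
    """Return the same (theta1, theta2, ...) for every element of X."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    # Repeat theta once per entry in X; X itself is otherwise ignored.
    return np.tile(theta, (len(X), 1))

out = f_null_sketch((0.3, 1.5), np.arange(3))
```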

bioverse.hypothesis.f_HZ(theta, X): Function for the habitable zone hypothesis.

bioverse.hypothesis.f_age_oxygen(theta, X): Function for the age-oxygen correlation hypothesis.

bioverse.hypothesis.magma_ocean_hypo_exp(theta, X)

Define a hypothesis for a magma ocean-adapted radius-sma distribution that follows an exponential decay.

Parameters:

theta (array_like) – Array of parameters for the hypothesis:
    f_magma (float) – Fraction of planets having a magma ocean.
    a_cut (float) – Cutoff effective sma for magma oceans. Defines the position of the exponential decay.
    lambda_a (float) – Decay parameter for the semi-major axis dependence of having a global magma ocean.
X (array_like) – Independent variable. Includes semimajor axis a.

Returns:

Functional form of hypothesis

Return type:

array_like

bioverse.hypothesis.magma_ocean_hypo_step(theta, X)

Define a hypothesis for a magma ocean-adapted radius-sma distribution following a step function. Tests the hypothesis that the average planet size is smaller inside the cutoff effective semi-major axis.

Parameters:

theta (array_like) – Array of parameters for the hypothesis:
    f_magma (float) – Fraction of planets having a magma ocean.
    a_cut (float) – Cutoff effective sma for magma oceans. Defines where the step occurs.
    radius_reduction (float) – The fraction by which a planet's radius is reduced due to a global magma ocean.
    R_avg (float) – Average radius of the planets without magma oceans.
X (array_like) – Independent variable. Includes semimajor axis a.

Returns:

Functional form of hypothesis

Return type:

array_like
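A step-function hypothesis with these four parameters might look like the following sketch. The functional form is an assumption: it supposes that inside a_cut a fraction f_magma of planets shrink by radius_reduction, so the average radius there becomes R_avg * (1 - f_magma * radius_reduction). The function name is hypothetical, and this is not necessarily what magma_ocean_hypo_step computes.

```python
import numpy as np

def step_hypothesis_sketch(theta, X):
    """Average radius as a step function of effective semi-major axis."""
    f_magma, a_cut, radius_reduction, R_avg = theta
    a_eff = np.asarray(X, dtype=float)
    # Assumed average radius inside the cutoff (see lead-in).
    R_inside = R_avg * (1.0 - f_magma * radius_reduction)
    # Inside a_cut: reduced average radius; outside: unchanged average.
    return np.where(a_eff < a_cut, R_inside, R_avg)

radii = step_hypothesis_sketch((0.5, 1.0, 0.2, 2.0), [0.5, 2.0])
```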

bioverse.hypothesis.compute_avg_deltaR_deltaRho(stars_args, planets_args, transiting_only=True, savefile=True)

Compute average radius and bulk density changes of the magma ocean-bearing planets as a function of water-to-rock ratio. This will be used to inform the magma ocean hypothesis function and avoids lengthy computations on each call of the hypothesis.

Parameters:

stars_args (dict) – dictionary containing parameters for star generation. Should contain all non-default arguments for star-related generator modules.
planets_args (dict) – As stars_args, but for planet-related generator modules.
transiting_only (bool) – Whether to consider only transiting planets.
savefile (bool) – Whether to save the data to the file DATA_DIR + 'avg_deltaR_deltaRho.csv'.

Returns:

avg_deltaR_deltaRho – DataFrame containing the average radius/density differences.

Return type:

pandas DataFrame

bioverse.hypothesis.get_avg_deltaR_deltaRho(path=None): Read pre-calculated radius and density differences.

bioverse.hypothesis.magma_ocean_f0(theta, X): Define the null hypothesis that the radius distribution is random and independent of sma.

bioverse.hypothesis.magma_ocean_hypo(theta, X, gh_increase=True, water_incorp=True, simplified=False, diff_frac=-0.1, parameter_of_interest='R', f_dR=None)

Define a hypothesis for a magma ocean-adapted radius-sma distribution following a step function.

Parameters:

theta (array_like) – Array of parameters for the hypothesis:
    S_thresh (float) – Threshold instellation for the runaway greenhouse phase.
    wrr (float) – Water-to-rock ratio. Will be discretized to the grid used in Turbet+2020, with possible values [0, 0.0001, 0.001, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05].
    f_rgh (float) – Fraction of planets within the runaway greenhouse regime that have a runaway greenhouse climate.
    avg (float) – Average planet radius or bulk density outside the runaway greenhouse region.
X (array_like) – Independent variable. Includes effective semimajor axis a_eff.
gh_increase (bool, optional) – Whether or not to consider the radius increase due to the runaway greenhouse effect (Turbet+2020).
water_incorp (bool, optional) – Whether or not to consider water incorporation in the melt of global magma oceans (Dorn & Lichtenberg 2021).
simplified (bool, optional) – change the radii of all runaway greenhouse planets by the same fraction
diff_frac (float, optional) – fractional radius or bulk density change in the simplified case. E.g., diff_frac = -0.10 is a 10% decrease.
parameter_of_interest (str, optional) – ‘label’, i.e. the observable in which to search for the pattern. Can be ‘R’ or ‘rho’.
f_dR (scipy.interpolate.interpolate.interp1d, optional) – function that interpolates in the table containing pre-computed average radius and bulk density differences. If not provided, the values will be computed for a grid of water-to-rock ratios (this might be slow).

Returns:

Functional form of hypothesis

Return type:

array_like
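The discretization of wrr onto the Turbet+2020 grid can be sketched as a nearest-value lookup. The grid values are those listed for wrr above; the helper name and the nearest-neighbor rule are assumptions, since the rounding scheme used by magma_ocean_hypo is not stated here.

```python
import numpy as np

# Water-to-rock ratio grid as listed in the parameter description.
WRR_GRID = np.array([0, 0.0001, 0.001, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05])

def snap_to_grid(wrr, grid=WRR_GRID):
    """Return the grid value closest to wrr (assumed rounding rule)."""
    idx = np.argmin(np.abs(grid - wrr))
    return float(grid[idx])

wrr_discrete = snap_to_grid(0.012)
```

Values outside the grid range are mapped to the nearest end point under this rule.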