bioverse.hypothesis module
Defines the Hypothesis class as well as two hypotheses used in Bixel & Apai (2021).
- class bioverse.hypothesis.Hypothesis(f, bounds, params=(), features=(), labels=(), lnprior_function=None, guess_function=None, tfprior_function=None, log=None, h_null=None, **kwargs)
Bases:
object
Describes a Bayesian hypothesis.
- Parameters:
f (function) – Function describing the hypothesis. Must be defined as f(theta, X) where theta is a tuple of parameter values and X is a set of independent variables. Returns the calculated values of Y, the set of dependent variables for each entry in X.
bounds (array) – Nx2 array describing the [min, max] limits of each parameter. These are enforced even if a different prior distribution is defined.
params (tuple of str, optional) – Names of the parameter(s) of the hypothesis.
features (tuple of str, optional) – Names of the feature(s) or independent variables.
labels (tuple of str, optional) – Names of the label(s) or dependent variables.
lnprior_function (function, optional) – Used by emcee. Function which returns ln(P_prior), must be defined as prior(theta). If None, assume a (log-)uniform distribution.
guess_function (function, optional) – Used by emcee. Function which guesses valid sets of parameters. Must be defined as guess_function(n), and should return an n x m set of parameter guesses. If None, draw parameters randomly within bounds.
tfprior_function (function, optional) – Used by dynesty. Function which transforms (0, 1) into (min, max) with the appropriate prior probability. If None, assume a (log-)uniform distribution.
log (bool array, optional) – Array of length N specifying which parameters should be sampled by a log-uniform distribution.
kwargs (key, value pairs) – Additional keyword arguments (e.g., boolean switches) for the hypothesis function
- guess_uniform(n, bounds)
Default guess function. Guesses uniformly within self.bounds.
- guess(n)
Guesses a set of values for theta, preferably where P(theta) > -inf.
- lnprior_uniform(theta)
Default (log-)uniform prior distribution, checks that all values are within bounds.
- lnprior(theta)
Returns P(theta) (for emcee).
- tfprior(u)
- tfprior_uniform(u)
Transforms the unit cube u into parameters drawn from (log-)uniform prior distributions.
- lnlike_binary(theta, x, y, _)
Likelihood function L(y | x, theta) if y is binary. The last argument is a placeholder.
- lnlike_multivariate(theta, x, y, sigma)
Likelihood function L(y | x, theta) if y is continuous and has sigma uncertainty.
- lnprob(theta, x, y, sigma)
Posterior probability function P(theta | x, y).
- sample_posterior_dynesty(X, Y, sigma, nlive=100, nburn=None, verbose=False, sampler_results=False)
Uses dynesty to sample the parameter posterior distributions and compute the log-evidence.
- sample_posterior_emcee(x, y, sigma, nsteps=500, nwalkers=32, nburn=100, autocorr=False)
Uses emcee to sample the parameter posterior distributions.
- compute_AIC(theta_opt, x, y, sigma)
Computes the Akaike information criterion for optimal parameter set theta_opt.
- compute_BIC(theta_opt, x, y, sigma)
Computes the Bayesian information criterion for optimal parameter set theta_opt.
- get_observed(data)
Identifies which planets in the data set have measurements of the relevant features/labels.
- get_XY(data)
Returns the X (features) and Y (labels) matrices for valid planets. Computes values as needed.
- fit(data, nsteps=500, nwalkers=16, nburn=100, nlive=100, return_chains=False, verbose=False, method='dynesty', mw_alternative='greater', return_data=False, sampler_results=False)
Sample the posterior distribution of h(theta | x, y) using a simulated data set, and compare to the null hypothesis via a model comparison metric.
- Parameters:
data (Table) – Simulated data set containing the features and labels.
nsteps (int, optional) – Number of steps per MCMC walker.
nburn (int, optional) – Number of burn-in steps for the Monte Carlo walk.
nlive (int, optional) – Number of live points for the nested sampler.
return_chains (bool, optional) – Wether or not to return the Monte Carlo chains.
verbose – Wether or not to generate extra output during the run.
method (str, optional) – Which sampling method to use. Options: dynesty (default), emcee, mannwhitney,
mw_alternative (str, {'two-sided', 'less', 'greater'}, optional) –
Defines the alternative hypothesis. Default is ‘two-sided’. Let F(u) and G(u) be the cumulative distribution functions of the distributions underlying x and y, respectively. Then the following alternative hypotheses are available:
’two-sided’: the distributions are not equal, i.e. F(u) ≠ G(u) for at least one u.
’less’: the distribution underlying x is stochastically less than the distribution underlying y, i.e. F(u) > G(u) for all u.
’greater’: the distribution underlying x is stochastically greater than the distribution underlying y, i.e. F(u) < G(u) for all u.
return_data (bool) – Wether or not to return the data
sampler_results (bool) – Wether or not to return the whole results object from dynesty runs
- Returns:
results –
- Dictionary containing the results of the model fit:
’means’ : mean value of each parameter’s posterior distribution ‘stds’ : std dev of each parameter’s posterior distribution ‘medians’ : median value of each parameter’s posterior distribution ‘UCIs’ : 2-sigma confidence interval above the median ‘LCIs’ : 2-sigma confidence interval below the median ‘CIs’ : width of the +- 2 sigma confidence interval about the median ‘AIC’ : Akaike information criterion compared to the null hypothesis (i.e. AIC_null - AIC_alt) ‘BIC’ : Bayesian information criterion compared to the null hypothesis ‘chains’ : full chain of MCMC samples (if return_chains is True)
- Return type:
dict
- bioverse.hypothesis.f_null(theta, X)
Function for a generic null hypothesis. Returns (theta1, theta2, …) for each element in X.
- bioverse.hypothesis.f_HZ(theta, X)
Function for the habitable zone hypothesis.
- bioverse.hypothesis.f_age_oxygen(theta, X)
Function for the age-oxygen correlation hypothesis.
- bioverse.hypothesis.magma_ocean_hypo_exp(theta, X)
Define a hypothesis for a magma ocean-adapted radius-sma distribution that follows an exponential decay.
- Parameters:
theta (array_like) –
Array of parameters for the hypothesis. f_magma : float
fraction of planets having a magma ocean
- a_cut: float
cutoff effective sma for magma oceans. Defines position of the exponential decay.
- lambda_a: float
Decay parameter for the semi-major axis dependence of having a global magma ocean.
X (array_like) – Independent variable. Includes semimajor axis a.
- Returns:
Functional form of hypothesis
- Return type:
array_like
- bioverse.hypothesis.magma_ocean_hypo_step(theta, X)
Define a hypothesis for a magma ocean-adapted radius-sma distribution following a step function. Tests the hypothesis that the average planet size is smaller within the cutoff effective radius.
- Parameters:
theta (array_like) –
Array of parameters for the hypothesis. f_magma : float
fraction of planets having a magma ocean
- a_cut: float
cutoff effective sma for magma oceans. Defines where the step occurs.
- radius_reduction: float
The fraction by which a planet’s radius is reduced due to a global magma ocean.
- R_avgfloat
Average radius of the planets _without_ magma oceans.
X (array_like) – Independent variable. Includes semimajor axis a.
- Returns:
Functional form of hypothesis
- Return type:
array_like
- bioverse.hypothesis.compute_avg_deltaR_deltaRho(stars_args, planets_args, transiting_only=True, savefile=True)
Compute average radius and bulk density changes of the magma ocean-bearing planets as a function of water-to-rock ratio. This will be used to inform the magma ocean hypothesis function and avoids lengthy computations on each call of the hypothesis.
- Parameters:
stars_args (dict) – dictionary containing parameters for star generation. Should contain all non-default arguments for star-related generator modules.
planets_args (dict) – As stars_args, but for planet-related generator modules.
transiting_only (bool) – Consider only transiting planets?
savefile (bool) – Save data to file in DATA_DIR + ‘avg_deltaR_deltaRho.csv’?
- Returns:
avg_deltaR_deltaRho – DataFrame containing the average radius/density differences.
- Return type:
pandas DataFrame
- bioverse.hypothesis.get_avg_deltaR_deltaRho(path=None)
Read pre-calculated radius and density differences.
- bioverse.hypothesis.magma_ocean_f0(theta, X)
Define the null hypothesis that the radius distribution is random and independent of sma.
- bioverse.hypothesis.magma_ocean_hypo(theta, X, gh_increase=True, water_incorp=True, simplified=False, diff_frac=-0.1, parameter_of_interest='R', f_dR=None)
Define a hypothesis for a magma ocean-adapted radius-sma distribution following a step function.
- Parameters:
theta (array_like) –
Array of parameters for the hypothesis. S_thresh : float
threshold instellation for runaway greenhouse phase
- wrrfloat
water-to-rock ratio. Will be discretized to the grid used in Turbet+2020, with possible values [0, 0.0001, 0.001 , 0.005 , 0.01 , 0.02 , 0.03 , 0.04 , 0.05 ].
- f_rghfloat
fraction of planets within the runaway gh regime that have a runaway gh climate
- avgfloat
average planet radius or bulk density outside the runaway greenhouse region
X (array_like) – Independent variable. Includes effective semimajor axis a_eff.
gh_increase (bool, optional) – wether or not to consider radius increase due to runaway greenhouse effect (Turbet+2020)
water_incorp (bool, optional) – wether or not to consider water incorporation in the melt of global magma oceans (Dorn & Lichtenberg 2021)
simplified (bool, optional) – change the radii of all runaway greenhouse planets by the same fraction
diff_frac (float, optional) – fractional radius or bulk density change in the simplified case. E.g., diff_frac = -0.10 is a 10% decrease.
parameter_of_interest (str, optional) – ‘label’, i.e. the observable in which to search for the pattern. Can be ‘R’ or ‘rho’.
f_dR (scipy.interpolate.interpolate.interp1d, optional) – function that interpolates in the table containing pre-computed average radius and bulk density differences. If not provided, the values will be computed for a grid of water-to-rock ratios (this might be slow).
- Returns:
Functional form of hypothesis
- Return type:
array_like