bioverse.hypothesis module

Defines the Hypothesis class as well as two hypotheses used in Bixel & Apai (2021).

class bioverse.hypothesis.Hypothesis(f, bounds, params=(), features=(), labels=(), lnprior_function=None, guess_function=None, tfprior_function=None, log=None, h_null=None, **kwargs)

Bases: object

Describes a Bayesian hypothesis.

Parameters:
  • f (function) – Function describing the hypothesis. Must be defined as f(theta, X) where theta is a tuple of parameter values and X is a set of independent variables. Returns the calculated values of Y, the set of dependent variables for each entry in X.

  • bounds (array) – Nx2 array describing the [min, max] limits of each parameter. These are enforced even if a different prior distribution is defined.

  • params (tuple of str, optional) – Names of the parameter(s) of the hypothesis.

  • features (tuple of str, optional) – Names of the feature(s) or independent variables.

  • labels (tuple of str, optional) – Names of the label(s) or dependent variables.

  • lnprior_function (function, optional) – Used by emcee. Function which returns ln(P_prior), must be defined as prior(theta). If None, assume a (log-)uniform distribution.

  • guess_function (function, optional) – Used by emcee. Function which guesses valid sets of parameters. Must be defined as guess_function(n), and should return an n x m set of parameter guesses. If None, draw parameters randomly within bounds.

  • tfprior_function (function, optional) – Used by dynesty. Function which transforms (0, 1) into (min, max) with the appropriate prior probability. If None, assume a (log-)uniform distribution.

  • log (bool array, optional) – Array of length N specifying which parameters should be sampled by a log-uniform distribution.

  • kwargs (key, value pairs) – Additional keyword arguments (e.g., boolean switches) for the hypothesis function

guess_uniform(n, bounds)

Default guess function. Guesses uniformly within self.bounds.

guess(n)

Guesses a set of values for theta, preferably where P(theta) > -inf.

lnprior_uniform(theta)

Default (log-)uniform prior distribution, checks that all values are within bounds.

lnprior(theta)

Returns P(theta) (for emcee).

tfprior(u)
tfprior_uniform(u)

Transforms the unit cube u into parameters drawn from (log-)uniform prior distributions.

lnlike_binary(theta, x, y, _)

Likelihood function L(y | x, theta) if y is binary. The last argument is a placeholder.

lnlike_multivariate(theta, x, y, sigma)

Likelihood function L(y | x, theta) if y is continuous and has sigma uncertainty.

lnprob(theta, x, y, sigma)

Posterior probability function P(theta | x, y).

sample_posterior_dynesty(X, Y, sigma, nlive=100, nburn=None, verbose=False, sampler_results=False)

Uses dynesty to sample the parameter posterior distributions and compute the log-evidence.

sample_posterior_emcee(x, y, sigma, nsteps=500, nwalkers=32, nburn=100, autocorr=False)

Uses emcee to sample the parameter posterior distributions.

compute_AIC(theta_opt, x, y, sigma)

Computes the Akaike information criterion for optimal parameter set theta_opt.

compute_BIC(theta_opt, x, y, sigma)

Computes the Bayesian information criterion for optimal parameter set theta_opt.

get_observed(data)

Identifies which planets in the data set have measurements of the relevant features/labels.

get_XY(data)

Returns the X (features) and Y (labels) matrices for valid planets. Computes values as needed.

fit(data, nsteps=500, nwalkers=16, nburn=100, nlive=100, return_chains=False, verbose=False, method='dynesty', mw_alternative='greater', return_data=False, sampler_results=False)

Sample the posterior distribution of h(theta | x, y) using a simulated data set, and compare to the null hypothesis via a model comparison metric.

Parameters:
  • data (Table) – Simulated data set containing the features and labels.

  • nsteps (int, optional) – Number of steps per MCMC walker.

  • nburn (int, optional) – Number of burn-in steps for the Monte Carlo walk.

  • nlive (int, optional) – Number of live points for the nested sampler.

  • return_chains (bool, optional) – Wether or not to return the Monte Carlo chains.

  • verbose – Wether or not to generate extra output during the run.

  • method (str, optional) – Which sampling method to use. Options: dynesty (default), emcee, mannwhitney,

  • mw_alternative (str, {'two-sided', 'less', 'greater'}, optional) –

    Defines the alternative hypothesis. Default is ‘two-sided’. Let F(u) and G(u) be the cumulative distribution functions of the distributions underlying x and y, respectively. Then the following alternative hypotheses are available:

    • ’two-sided’: the distributions are not equal, i.e. F(u) ≠ G(u) for at least one u.

    • ’less’: the distribution underlying x is stochastically less than the distribution underlying y, i.e. F(u) > G(u) for all u.

    • ’greater’: the distribution underlying x is stochastically greater than the distribution underlying y, i.e. F(u) < G(u) for all u.

  • return_data (bool) – Wether or not to return the data

  • sampler_results (bool) – Wether or not to return the whole results object from dynesty runs

Returns:

results

Dictionary containing the results of the model fit:

’means’ : mean value of each parameter’s posterior distribution ‘stds’ : std dev of each parameter’s posterior distribution ‘medians’ : median value of each parameter’s posterior distribution ‘UCIs’ : 2-sigma confidence interval above the median ‘LCIs’ : 2-sigma confidence interval below the median ‘CIs’ : width of the +- 2 sigma confidence interval about the median ‘AIC’ : Akaike information criterion compared to the null hypothesis (i.e. AIC_null - AIC_alt) ‘BIC’ : Bayesian information criterion compared to the null hypothesis ‘chains’ : full chain of MCMC samples (if return_chains is True)

Return type:

dict

bioverse.hypothesis.f_null(theta, X)

Function for a generic null hypothesis. Returns (theta1, theta2, …) for each element in X.

bioverse.hypothesis.f_HZ(theta, X)

Function for the habitable zone hypothesis.

bioverse.hypothesis.f_age_oxygen(theta, X)

Function for the age-oxygen correlation hypothesis.

bioverse.hypothesis.magma_ocean_hypo_exp(theta, X)

Define a hypothesis for a magma ocean-adapted radius-sma distribution that follows an exponential decay.

Parameters:
  • theta (array_like) –

    Array of parameters for the hypothesis. f_magma : float

    fraction of planets having a magma ocean

    a_cut: float

    cutoff effective sma for magma oceans. Defines position of the exponential decay.

    lambda_a: float

    Decay parameter for the semi-major axis dependence of having a global magma ocean.

  • X (array_like) – Independent variable. Includes semimajor axis a.

Returns:

Functional form of hypothesis

Return type:

array_like

bioverse.hypothesis.magma_ocean_hypo_step(theta, X)

Define a hypothesis for a magma ocean-adapted radius-sma distribution following a step function. Tests the hypothesis that the average planet size is smaller within the cutoff effective radius.

Parameters:
  • theta (array_like) –

    Array of parameters for the hypothesis. f_magma : float

    fraction of planets having a magma ocean

    a_cut: float

    cutoff effective sma for magma oceans. Defines where the step occurs.

    radius_reduction: float

    The fraction by which a planet’s radius is reduced due to a global magma ocean.

    R_avgfloat

    Average radius of the planets _without_ magma oceans.

  • X (array_like) – Independent variable. Includes semimajor axis a.

Returns:

Functional form of hypothesis

Return type:

array_like

bioverse.hypothesis.compute_avg_deltaR_deltaRho(stars_args, planets_args, transiting_only=True, savefile=True)

Compute average radius and bulk density changes of the magma ocean-bearing planets as a function of water-to-rock ratio. This will be used to inform the magma ocean hypothesis function and avoids lengthy computations on each call of the hypothesis.

Parameters:
  • stars_args (dict) – dictionary containing parameters for star generation. Should contain all non-default arguments for star-related generator modules.

  • planets_args (dict) – As stars_args, but for planet-related generator modules.

  • transiting_only (bool) – Consider only transiting planets?

  • savefile (bool) – Save data to file in DATA_DIR + ‘avg_deltaR_deltaRho.csv’?

Returns:

avg_deltaR_deltaRho – DataFrame containing the average radius/density differences.

Return type:

pandas DataFrame

bioverse.hypothesis.get_avg_deltaR_deltaRho(path=None)

Read pre-calculated radius and density differences.

bioverse.hypothesis.magma_ocean_f0(theta, X)

Define the null hypothesis that the radius distribution is random and independent of sma.

bioverse.hypothesis.magma_ocean_hypo(theta, X, gh_increase=True, water_incorp=True, simplified=False, diff_frac=-0.1, parameter_of_interest='R', f_dR=None)

Define a hypothesis for a magma ocean-adapted radius-sma distribution following a step function.

Parameters:
  • theta (array_like) –

    Array of parameters for the hypothesis. S_thresh : float

    threshold instellation for runaway greenhouse phase

    wrrfloat

    water-to-rock ratio. Will be discretized to the grid used in Turbet+2020, with possible values [0, 0.0001, 0.001 , 0.005 , 0.01 , 0.02 , 0.03 , 0.04 , 0.05 ].

    f_rghfloat

    fraction of planets within the runaway gh regime that have a runaway gh climate

    avgfloat

    average planet radius or bulk density outside the runaway greenhouse region

  • X (array_like) – Independent variable. Includes effective semimajor axis a_eff.

  • gh_increase (bool, optional) – wether or not to consider radius increase due to runaway greenhouse effect (Turbet+2020)

  • water_incorp (bool, optional) – wether or not to consider water incorporation in the melt of global magma oceans (Dorn & Lichtenberg 2021)

  • simplified (bool, optional) – change the radii of all runaway greenhouse planets by the same fraction

  • diff_frac (float, optional) – fractional radius or bulk density change in the simplified case. E.g., diff_frac = -0.10 is a 10% decrease.

  • parameter_of_interest (str, optional) – ‘label’, i.e. the observable in which to search for the pattern. Can be ‘R’ or ‘rho’.

  • f_dR (scipy.interpolate.interpolate.interp1d, optional) – function that interpolates in the table containing pre-computed average radius and bulk density differences. If not provided, the values will be computed for a grid of water-to-rock ratios (this might be slow).

Returns:

Functional form of hypothesis

Return type:

array_like