# Computing statistical power

Consider the “habitable zone hypothesis”, which proposes that habitable planets with atmospheric water vapor will be more common within the semi-major axis range `a_inner`

< `a_eff`

< `a_outer`

(see Section 6 of the paper and Example 1 for more details). In Bioverse, this effect is injected into the simulated sample by the `Example1_water()`

function, and tested using the `h_HZ`

Hypothesis. To test this hypothesis using a LUVOIR-like direct imaging survey:

```
from bioverse.generator import Generator
from bioverse.survey import ImagingSurvey
from bioverse.hypothesis import h_HZ
# Load the Generator and Survey objects
generator = Generator('imaging')
survey = ImagingSurvey('default')
# Generate a set of planetary systems and a simulated dataset as observed by an imaging survey
# Assume 50% of EECs are habitable (f_water_habitable=0.5)
# Assume 1% of non-habitable planets have water vapor (f_water_nonhabitable=0.01)
sample, detected, data = survey.quickrun(generator, f_water_habitable=0.5, f_water_nonhabitable=0.01)
# Test the habitable zone hypothesis from this dataset
results = h_HZ.fit(data)
print("The evidence in favor of the habitable zone hypothesis is {:.1f}.".format(results['dlnZ']))
```

Output: `The evidence in favor of the habitable zone hypothesis is 6.4.`

This result corresponds to a “p-value” of ~1.7E-3. However, this represents only one possible realization of the survey. Due to Poisson uncertainty, another equivalent survey might detect fewer habitable planets and thus be less capable of testing the hypothesis. To capture this, we can repeat the survey several times and average their results. The `analysis`

module enables this through its `test_hypothesis_grid()`

function.

```
from bioverse import analysis
# Repeat the hypothesis test N=30 times with the same assumptions as above
results = analysis.test_hypothesis_grid(h_HZ, generator, survey, N=30, f_water_habitable=0.5,
f_water_nonhabitable=0.01, processes=8)
# Determine the statistical power assuming a significance threshold of dlnZ > 3
power = analysis.compute_statistical_power(results, method='dlnZ', threshold=3)
print("The statistical power of the survey is {:.1f}%".format(100*power))
```

Output: `The statistical power of the survey is 75.0%.`

Under the assumptions that 50% of exo-Earth candidates are habitable **and** 1% of non-habitable planets have H2O in their atmospheres, it is 75% likely that a LUVOIR-like survey would be able to detect the overabundance of H2O in the habitable zone.

## Parameter grids

Of course, those assumptions are highly uncertain, and a more thorough analysis should investigate how this result depends on key model parameters - such as `f_water_habitable`

or `eta_Earth`

. This can be done by passing an array of values for these parameters to the `test_hypothesis_grid()`

function:

```
# Vary the fraction of EECs with water vapor from 1% to 100% (log spacing)
f_water_habitable = np.logspace(-2, 0, 5)
# Vary eta Earth from 7.5% to 30% (linear spacing)
eta_Earth = np.linspace(0.075, 0.3, 5)
# Test the hypothesis N=30 times for each parameter combination
results = analysis.test_hypothesis_grid(h_HZ, generator, survey, N=30, f_water_habitable=f_water_habitable,
eta_Earth=eta_Earth, f_water_nonhabitable=0.01, processes=8)
# Compute the statistical power for each parameter combination
power = analysis.compute_statistical_power(results, method='dlnZ', threshold=3)
```

`power`

will be a 5x5 array containing the statistical power for each parameter combination. The axis order depends on the order in which arguments are passed to `test_hypothesis_grid()`

; in this case, `f_water_habitable`

will correspond to the first axis and `eta_Earth`

to the second.

## Plotting the results

The `plot_power_grid()`

function can be used to plot the statistical power over a 2-dimensional grid. Starting from the above example:

```
from bioverse.plots import plot_power_grid
# Specify which parameters to plot on the x and y axes
axes = ('f_water_habitable', 'eta_Earth')
# Set the axis labels
labels = ('Fraction of EECs that are habitable', 'eta Earth')
# Set log-scale for the x axis
log = (True, False)
# Create the plot
plot_power_grid(results, axes=axes, labels=labels, log=log)
```

The number and percentage values of the contour lines can be set with the `levels`

argument, or set `levels=None`

to disable them. To create a higher resolution plot with smoother contour lines, simply run `test_hypothesis_grid()`

over a finer grid of parameter values.

## Multiprocessing

To compute the statistical power for a 20x20 parameter grid with `N=50`

simulations in each cell requires 20,000 simulations, or approximately 5-6 hours for the example above. Fortunately, these simulations are entirely independent of each other, making parallel processing an effective solution. You can use the `processes`

argument of `test_hypothesis_grid()`

to indicate how many processes to run in parallel. Note that Bioverse can be memory-intensive, so large values of `processes`

(e.g. greater than 10) can have diminishing returns or lead to a crash.