Simulating survey datasets

The Survey class

The output of generate() is a Table containing the values of several parameters for planets within the bounds of the simulation. However, only a subset of these will be detectable by a transit or direct imaging survey. For those planets, only a subset of their properties can be directly probed, and only with a finite level of precision. Module 2 captures these details by simulating the observing limits and measurement precision of a direct imaging or transit spectroscopy survey of the planet population.

The survey simulation module is implemented by the Survey class [1] and its child classes ImagingSurvey and TransitSurvey. The Survey describes several key components of an exoplanet survey, including:

  • diameter: the diameter of the telescope primary in meters (or the area-equivalent diameter for a telescope array)

  • t_slew: slew time between observations, in days

  • ImagingSurvey
    • inner_working_angle and outer_working_angle: IWA/OWA of the coronagraphic imager

    • contrast_limit: log-contrast limit (i.e. faintest detectable planet)

  • TransitSurvey
    • N_obs_max: maximum allowable number of transit observations per target

    • t_max: maximum amount of time across which to combine transit observations, in days

  • T_st_ref, R_st_ref, and d_ref: temperature (Kelvin), radius (\(R_\odot\)), and distance (parsec) of the reference star (see Exposure time calculations)

  • D_ref: diameter of the reference telescope, in meters

Each type of survey “ships” with a default configuration:

from bioverse.survey import ImagingSurvey, TransitSurvey
survey_imaging = ImagingSurvey('default')
survey_transit = TransitSurvey('default')

The default imaging survey is modeled after LUVOIR-A, with a coronagraphic imager and 15-meter primary aperture. The default transit survey is modeled after the Nautilus Space Observatory, with a 50-meter equivalent light-collecting area.

Which planets are detectable?

Given a simulated set of planets to observe, the Survey first determines which of these are detectable. For a TransitSurvey, this set consists of all transiting planets, while for an ImagingSurvey, it consists of all planets within the coronagraphic IWA/OWA and brighter than the limiting contrast. This can be invoked as follows:

detected = survey_imaging.compute_yield(sample)
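The imaging selection rule described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the code behind compute_yield(): the function name, the dictionary keys, the units (milliarcseconds), and all numeric values are assumptions chosen for the example.

```python
def is_detectable_imaging(separation, log_contrast, iwa, owa, log_contrast_limit):
    """Return True if a planet passes the two imaging cuts described in the text.

    separation, iwa, and owa share the same (hypothetical) angular unit;
    log_contrast and log_contrast_limit are log10 planet-to-star contrasts.
    """
    inside_field = iwa <= separation <= owa
    bright_enough = log_contrast >= log_contrast_limit
    return inside_field and bright_enough


# Illustrative planets (values are made up for the example)
planets = [
    {"name": "a", "separation": 100.0, "log_contrast": -9.5},
    {"name": "b", "separation": 20.0, "log_contrast": -9.0},    # inside the IWA
    {"name": "c", "separation": 150.0, "log_contrast": -11.0},  # fainter than the limit
]

detected_names = [
    p["name"]
    for p in planets
    if is_detectable_imaging(p["separation"], p["log_contrast"],
                             iwa=30.0, owa=500.0, log_contrast_limit=-10.6)
]
print(detected_names)  # ['a']
```

For a transit survey the equivalent cut is simply whether the planet transits.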

Conducting measurements

The Survey will conduct a series of measurements on the detectable planet sample, each defined by a Measurement object. A Measurement’s parameters include:

  • key: the name of the planet property to be measured

  • precision: the relative or absolute precision of the measurement (e.g. 10% or 0.1 AU)

  • t_ref: the amount of time in days required to conduct this measurement for a typical target (see below)

  • t_total: the amount of survey time in days allocated toward this measurement

  • wl_eff: the effective wavelength of observation in microns

  • priority: a set of rules describing how targets are prioritized (described below)
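To make the distinction between relative and absolute precision concrete, the sketch below perturbs a true value with random noise. The function name measure_value, the relative flag, and the choice of a Gaussian error model are assumptions made for this example; they are not taken from the Bioverse source.

```python
import random


def measure_value(true_value, precision, relative, rng):
    """Return a noisy measurement of true_value.

    relative=True  -> precision is a fraction of the true value (0.1 means 10%)
    relative=False -> precision is in the same units as the value (e.g. 0.1 AU)
    A Gaussian error model is assumed here purely for illustration.
    """
    sigma = abs(true_value) * precision if relative else precision
    return rng.gauss(true_value, sigma)


rng = random.Random(42)  # seeded so the example is reproducible
print(measure_value(1.5, 0.10, relative=True, rng=rng))   # 10% relative precision
print(measure_value(1.5, 0.10, relative=False, rng=rng))  # 0.1 absolute precision
```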

To conduct these measurements and produce a dataset:

data = survey_imaging.observe(detected)


In total, producing a simulated sample of planets, determining which planets are detectable, and producing a mock dataset requires the following:

from bioverse.generator import Generator
from bioverse.survey import ImagingSurvey

generator = Generator('imaging')
survey = ImagingSurvey('default')

sample = generator.generate(eta_Earth=0.15)
detected = survey.compute_yield(sample)
data = survey.observe(detected)

The last three lines can be combined into the following:

sample, detected, data = survey.quickrun(generator, eta_Earth=0.15)

quickrun() will pass any keyword arguments to the generate() method, and will by default pass transit_mode=True for a TransitSurvey.

Exposure time calculations

Spectroscopic observations of exoplanets are time-consuming, and for some surveys the amount of time required to conduct them will be a limiting factor on sample size. To accommodate this, Bioverse calculates the exposure time \(t_i\) required to conduct the spectroscopic measurement for each planet, then prioritizes each planet according to \(t_i\) as well as its weight parameter (see Target prioritization). In the simulated dataset, planets that could not be observed within the total allotted time t_total will have nan in place of the measured value.

A Measurement’s “reference time”, t_ref, is the exposure time required to perform the measurement for an Earth-like planet (receiving the same flux as Earth) orbiting a typical star (whose properties are defined by the Survey parameters T_st_ref, R_st_ref, and d_ref), with a telescope of diameter D_ref. For the default imaging survey, the typical target orbits a Sun-like star at a distance of 10 pc, while for the transit survey, the host star is a mid-M dwarf.

Bioverse uses t_ref, along with the wavelength of observation wl_eff, to determine the exposure time t_i required for each individual planet with the following equation:

\[\frac{t_i}{t_\text{ref}} = f_i \left(\frac{d_i}{d_\text{ref}}\right)^2 \left(\frac{R_{*,i}}{R_{*, \text{ref}}}\right)^{-2} \left(\frac{B(\lambda_\text{eff},T_{*,i})}{B(\lambda_\text{eff},T_{*, \text{ref}})}\right)^{-1} \left(\frac{D}{D_\text{ref}}\right)^{-2}\]

\(f_i\) encompasses the different factors affecting spectroscopic signal strength in imaging and transit mode:

\[\begin{aligned}f_i^\text{imaging} &= \left(\frac{\zeta_i}{\zeta_\oplus}\right)^{-1}\\f_i^\text{transit} &= \left(\frac{h_{i}}{h_\oplus}\right)^{-2} \left(\frac{R_{p,i}}{R_\oplus}\right)^{-2} \left(\frac{R_{*,i}}{R_{*, \text{ref}}}\right)^4\end{aligned}\]
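The scaling relations above can be evaluated directly. The following is a minimal sketch, not Bioverse's implementation: the function names are hypothetical, B is taken to be the Planck function, and the reference values in the usage lines are illustrative rather than the defaults of any survey.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m / s)
K_B = 1.380649e-23   # Boltzmann constant (J / K)


def planck(wl_um, T):
    """Planck spectral radiance B(lambda, T) for a wavelength in microns."""
    wl = wl_um * 1e-6
    return 2 * H * C**2 / wl**5 / math.expm1(H * C / (wl * K_B * T))


def f_imaging(zeta, zeta_earth):
    """f_i in imaging mode: (zeta_i / zeta_Earth)^-1."""
    return (zeta / zeta_earth) ** -1


def f_transit(h, R_p, R_st, h_earth, R_earth, R_st_ref):
    """f_i in transit mode, following the second equation above."""
    return (h / h_earth) ** -2 * (R_p / R_earth) ** -2 * (R_st / R_st_ref) ** 4


def exposure_ratio(f_i, d, R_st, T_st, D, d_ref, R_st_ref, T_st_ref, D_ref, wl_eff):
    """Return t_i / t_ref for one planet."""
    return (f_i
            * (d / d_ref) ** 2
            * (R_st / R_st_ref) ** -2
            * (planck(wl_eff, T_st) / planck(wl_eff, T_st_ref)) ** -1
            * (D / D_ref) ** -2)


# Illustrative reference target: Sun-like star at 10 pc, 15 m telescope
ref = dict(d_ref=10.0, R_st_ref=1.0, T_st_ref=5777.0, D_ref=15.0, wl_eff=0.7)

same = exposure_ratio(1.0, d=10.0, R_st=1.0, T_st=5777.0, D=15.0, **ref)
far = exposure_ratio(1.0, d=20.0, R_st=1.0, T_st=5777.0, D=15.0, **ref)
print(same, far)  # a target twice as distant needs four times the exposure
```

Multiplying the returned ratio by a Measurement's t_ref gives t_i in the same units as t_ref.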

Importantly, this calculation is conducted separately for each Measurement, each with its own value of t_ref. Therefore, the same planet may have real values for one Measurement and nan for another. This is particularly relevant for the transit survey, where the total number of transiting planets for which, e.g., planet size and orbital period can be measured is much larger than the number that can be spectroscopically characterized. To return just the subset of detected planets that were observed for a given Measurement, use the observed() method:

observed = data.observed('has_O2')
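Conceptually, this selection keeps only the rows whose measured value is not nan. The sketch below shows that idea on a plain list of dictionaries; it is not how Bioverse's Table is implemented, and the helper name observed_rows and the sample values are hypothetical.

```python
import math

# Toy dataset: planet "b" ran out of survey time for the has_O2 measurement
rows = [
    {"planet": "a", "R": 1.1, "has_O2": 1.0},
    {"planet": "b", "R": 1.3, "has_O2": float("nan")},
    {"planet": "c", "R": 0.9, "has_O2": 0.0},
]


def observed_rows(rows, key):
    """Keep the rows whose value for `key` was actually measured (not nan)."""
    return [row for row in rows if not math.isnan(row[key])]


print([row["planet"] for row in observed_rows(rows, "has_O2")])  # ['a', 'c']
print([row["planet"] for row in observed_rows(rows, "R")])       # ['a', 'b', 'c']
```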

Determining t_ref often relies on radiative transfer and instrument noise estimates that are generally not performed within Bioverse. Instead, t_ref can be taken from relevant studies in the literature or estimated with third-party tools such as the Planetary Spectrum Generator. One method of calculating t_ref for the transit survey is demonstrated in Tutorial 3: Calculating exposure times.

Bioverse can calculate t_ref given two simulated spectrum files (one with and one without the targeted absorption feature), both of which contain wavelength, flux, and flux uncertainty as their first three columns. You must also specify the simulated exposure time and the minimum and maximum wavelengths of the absorption feature. The compute_t_ref() function will then determine the exposure time required for a 5-sigma detection (in the same units as the input exposure time).

from bioverse.util import compute_t_ref

# Scales from simulated spectra for a combined 100 hr exposure time, targeting the O3 feature near 0.6 microns.
t_ref = compute_t_ref(filenames=('spectrum_O3.dat', 'spectrum_noO3.dat'), t_exp=100, wl_min=0.4, wl_max=0.8)
print("Required exposure time: {:.1f} hr".format(t_ref))

Output: Required exposure time: 73.9 hr
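One simple way to see how an exposure time can be rescaled to a target significance is to assume that the signal-to-noise ratio grows as the square root of the exposure time. The sketch below applies only that assumption; it does not reproduce the internals of compute_t_ref(), and the function name and inputs are hypothetical.

```python
def time_for_target_snr(t_exp, snr_at_t_exp, snr_target=5.0):
    """Rescale an exposure time to reach snr_target, assuming SNR ~ sqrt(t).

    The result is in the same units as t_exp.
    """
    if snr_at_t_exp <= 0:
        raise ValueError("snr_at_t_exp must be positive")
    return t_exp * (snr_target / snr_at_t_exp) ** 2


# A feature detected at 10 sigma in 100 hr would need a quarter of that time for 5 sigma
print(time_for_target_snr(100.0, 10.0))  # 25.0
```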

Finally, change the t_ref and wl_eff attributes of the associated Measurement object, using units of days and microns respectively:

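from bioverse.survey import TransitSurvey

survey = TransitSurvey('default')
# Convert the 73.9 hr result to days; wl_eff is given in microns
survey.measurements['has_O2'].t_ref = 73.9/24
survey.measurements['has_O2'].wl_eff = 0.6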

Target prioritization

For measurements where t_total is finite and t_ref is non-zero, targets must be prioritized in case there is insufficient time to characterize all of them. In Bioverse, target prioritization depends both on the target’s scientific interest (quantified by the weight parameter w_i) and the amount of time t_i required to properly characterize it. Each target’s priority is calculated as follows:

\(p_i = w_i/t_i\)

Bioverse will observe targets in order of decreasing p_i until t_total has been exhausted. The resulting dataset will fill in nan values for any targets that were not observed.
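The ranking and time budget described above amount to a greedy schedule. The following is a minimal sketch of that idea, assuming that observing stops at the first target that no longer fits in the remaining time; the function name, the tuple layout, and the sample values are hypothetical, and Bioverse's own handling of the final target may differ.

```python
def schedule_targets(targets, t_total):
    """Greedy schedule by decreasing priority p_i = w_i / t_i.

    targets: list of (name, weight, t_i) tuples with t_i > 0 (same units as t_total)
    Returns the names observed and the total time used.
    """
    # Targets with zero weight are excluded outright
    candidates = [t for t in targets if t[1] > 0]
    ranked = sorted(candidates, key=lambda t: t[1] / t[2], reverse=True)

    observed, used = [], 0.0
    for name, weight, t_i in ranked:
        if used + t_i > t_total:
            break  # assumption: stop once the next target does not fit
        observed.append(name)
        used += t_i
    return observed, used


targets = [
    ("A", 1, 2.0),  # p = 0.5
    ("B", 5, 4.0),  # p = 1.25
    ("C", 1, 1.0),  # p = 1.0
    ("D", 0, 0.5),  # excluded (w = 0)
]
print(schedule_targets(targets, t_total=6.0))  # (['B', 'C'], 5.0)
```

Targets left out of the returned list correspond to the nan entries in the simulated dataset.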

By default, w_i = 1 for all targets, but it can be raised or lowered for planets that meet certain criteria. For example, to assign w_i = 5 for targets with radii between 1 and 2 \(R_\oplus\):

m = survey.measurements['has_O2']
m.set_weight('R', weight=5, min=1, max=2)

To exclude a set of targets, set w_i = 0. For example, to restrict a measurement to exo-Earth candidates only:

m.set_weight('EEC', weight=0, value=False)

In transit mode, targets are weighted by \(a/R_*\) to correct the detection bias toward shorter-period planets. To disable this feature:

m.debias = False
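The reason a weight of \(a/R_*\) removes the bias can be checked numerically: the geometric transit probability of a small planet on a circular orbit is approximately \(R_*/a\), so the product of probability and weight is the same for every orbit. The sketch below illustrates this; the function names are hypothetical and the circular-orbit approximation is an assumption of the example.

```python
AU_IN_RSUN = 215.032  # 1 AU expressed in solar radii


def transit_probability(a_au, R_st_rsun):
    """Approximate geometric transit probability, R_* / a (circular orbit)."""
    return R_st_rsun / (a_au * AU_IN_RSUN)


def debias_weight(a_au, R_st_rsun):
    """Weight proportional to a / R_*, as described in the text."""
    return (a_au * AU_IN_RSUN) / R_st_rsun


# Close-in planets transit more often, but the weighted probability is flat
for a_au in (0.05, 0.2, 1.0):
    p = transit_probability(a_au, R_st_rsun=0.3)
    w = debias_weight(a_au, R_st_rsun=0.3)
    print(f"a = {a_au:4.2f} AU  p = {p:.4f}  p * w = {p * w:.1f}")
```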