In a recent study posted to the medRxiv* preprint server, researchers used pre-pandemic information from two UK-based observational population studies – the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK Biobank (UKB) – to investigate predictors of selection into analytic subsamples in observational studies on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and on coronavirus disease 2019 (COVID-19) severity.
Study: Exploring selection bias in COVID-19 research: Simulations and prospective analyses of two UK cohort studies.
They also explored the potential bias arising from these selection mechanisms and from the use of different comparison groups when estimating associations between risk factors and SARS-CoV-2 infection or COVID-19 severity, using body mass index (BMI) as an illustrative example in empirical analyses and simulations.
Selection bias may occur in observational studies of SARS-CoV-2 infection and COVID-19 severity when selection into analytic subsamples is non-random. Misclassification of SARS-CoV-2 infection status is another potential source of bias in these studies. The present study used data from self-reported questionnaires and national registries to explore the potential presence and impact of selection in such studies.
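As a general illustration of why non-random selection matters (a standard epidemiological result, not a formula taken from the preprint), consider a binary exposure and a binary outcome with no misclassification. The symbols below are assumptions introduced here: s_ed is the probability of entering the analytic subsample for exposure level e and outcome level d.

```latex
% s_{ed}: assumed probability of selection into the analytic subsample
% for exposure e (1 = exposed) and outcome d (1 = case)
\[
\mathrm{OR}_{\text{selected}}
  = \mathrm{OR}_{\text{true}} \times \frac{s_{11}\, s_{00}}{s_{10}\, s_{01}}
\]
```

Under these assumptions, the odds ratio in the selected sample matches the true odds ratio only when the selection factor equals 1, for instance when selection depends on the exposure alone, on the outcome alone, or on both in a purely multiplicative way.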
About the study
The multigenerational ALSPAC birth cohort included 14,541 pregnant women (Generation-0 [G0] mothers) who gave birth to 14,062 children (Generation-1 [G1]). Both mothers and children were regularly assessed through questionnaires, anthropometric and physical measurements. The questionnaires were used to collect self-reported information relevant for studies on the COVID-19 pandemic and its consequences involving general health, seasonal symptoms, recent travel, the impact of the pandemic on behaviors, mental health, wellbeing, healthcare/key worker status, and living arrangements during the pandemic.
In the second observational study, UKB recruited 503,317 adults, and data were collected via touch-screen questionnaires, face-to-face interviews, physical measurements, and biological samples. Some participants were followed up with further assessments such as questionnaires, imaging studies, and serology tests for SARS-CoV-2.
The SARS-CoV-2 subsample in UKB comprised all participants with a PCR test result (either positive or negative) for SARS-CoV-2 infection and/or COVID-19 mentioned on their death certificate. Using different comparison groups, the researchers explored through simulation studies how selection and misclassification bias affect the estimated effect of BMI on SARS-CoV-2 infection and death with COVID-19.
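The idea behind such simulations can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the preprint's simulation: BMI is reduced to a hypothetical binary "high BMI" flag, infection is made independent of BMI (so the true odds ratio is 1), tests are assumed perfectly accurate, and the testing probabilities are invented for illustration.

```python
import random


def odds_ratio(table):
    """Odds ratio from counts keyed by (exposed, case), each coded 0 or 1."""
    return (table[1, 1] * table[0, 0]) / (table[1, 0] * table[0, 1])


def simulate(n=200_000, seed=1):
    """Toy selection-bias simulation; every parameter here is an assumption."""
    rng = random.Random(seed)
    keys = [(e, d) for e in (0, 1) for d in (0, 1)]
    full = dict.fromkeys(keys, 0)    # whole cohort, true infection status
    tested = dict.fromkeys(keys, 0)  # tested participants only
    vs_all = dict.fromkeys(keys, 0)  # test-positive vs everyone else

    for _ in range(n):
        high_bmi = int(rng.random() < 0.30)
        infected = int(rng.random() < 0.10)  # independent of BMI: true OR = 1
        # Testing (selection) depends additively on both BMI and infection.
        p_test = 0.05 + 0.20 * high_bmi + 0.40 * infected
        was_tested = rng.random() < p_test

        full[high_bmi, infected] += 1
        if was_tested:
            tested[high_bmi, infected] += 1
        # Untested infections are misclassified as non-cases here.
        is_case = int(was_tested and infected == 1)
        vs_all[high_bmi, is_case] += 1

    return {
        "full_population": odds_ratio(full),
        "tested_pos_vs_tested_neg": odds_ratio(tested),
        "tested_pos_vs_everyone_else": odds_ratio(vs_all),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: OR = {value:.2f}")
```

With these made-up parameters, the expected odds ratio is about 0.29 when test-positives are compared with test-negatives and about 1.48 when they are compared with everyone else, even though the true value is 1; simulated values vary around these figures. The point is only that the same selection mechanism can bias the two comparisons differently. The sizes and directions shown here are not estimates of what happened in ALSPAC or UKB.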
“In agreement with our findings, other studies have reported that higher BMI is associated with higher odds [of] SARS-CoV-2 infection and COVID-19 disease severity.”
The study found that various sociodemographic, behavioral, and health-related variables were associated with being selected into the COVID-19 analytical subsamples in ALSPAC and UKB. However, some factors predicted selection in different directions and/or magnitudes in the two studies, which may be attributed to contrasting data collection mechanisms, characteristics of the target population, or pre-pandemic selection pressures.
Further, the empirical estimates of the potential impact of selection on the association between BMI and COVID-19 outcomes were imprecise in ALSPAC. In UKB, however, the data suggested that higher BMI was associated with higher odds of SARS-CoV-2 infection and death with COVID-19. The estimated association of BMI with SARS-CoV-2 infection in UKB was stronger when using the SARS-CoV-2 (+) versus everyone else definition than when using SARS-CoV-2 (+) versus SARS-CoV-2 (-).
In the study's ‘plausible’ simulation scenario, the researchers found a smaller negative bias in the cases versus everyone else comparison. Furthermore, bias was also induced in the estimated effect of BMI on death with COVID-19 due to the inclusion of all participants who died with COVID-19.
Limitations and conclusion
The analysis involved several assumptions about the data. It did not account for pre-pandemic selection bias arising from non-random recruitment into ALSPAC and UKB and from loss to follow-up. Further, the study considered misclassification of the comparison groups (e.g., infected as non-infected) but not of the case groups (e.g., non-infected as infected), which may be a concern for self-reported COVID-19 data and for cause-of-death records early in the pandemic. In addition, the study mainly focused on the first wave of the COVID-19 pandemic in the UK; however, selection bias may change over time as the pandemic progresses.
In conclusion, non-random selection may cause bias in analyses involving self-reported or national registry COVID-19 data. The magnitude and direction of this bias depend on the outcome definition, the true effect of the risk factor, and the assumed selection mechanism. Knowledge of risk and prognostic factors for COVID-19 will help to identify interventions to reduce the risk of SARS-CoV-2 infection and COVID-19 severity.
*medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.