Misstatements, misperceptions, and mistakes in controlling for covariates in observational research

  1. Xiaoxin Yu
  2. Roger S Zoh (corresponding author)
  3. David A Fluharty
  4. Luis M Mestre
  5. Danny Valdez
  6. Carmen D Tekwe
  7. Colby J Vorland
  8. Yasaman Jamshidi-Naeini
  9. Sy Han Chiou
  10. Stella T Lartey
  11. David B Allison (corresponding author)
  1. Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, United States
  2. Department of Applied Health Science, Indiana University School of Public Health-Bloomington, United States
  3. Department of Statistics and Data Science, Southern Methodist University, United States
  4. University of Memphis, School of Public Health, United States
7 figures and 5 tables

Figures

Agree (a) vs. disagree (b) with the interpretation of Misperception 5a.

Association between body mass index and hazard ratio for death among U.S. adults aged 18–85 years old.

The curve shows a nonlinear, non-monotonic association between body mass index (BMI) and mortality among U.S. adults aged 18–85 years. BMI values of 23–26 kg/m² form the nadir of the curve, where mortality is lowest; persons with BMI below or above that range experienced higher mortality on average. Source: Fontaine et al., 2003.

Causal relationships of health outcome, dietary fat consumption, and the belief that consumption of dietary fat is not dangerous.

Direction of arrows represents causal directions and λA, λB, βA, and βB are structural coefficients.

Causal relationships of outcome, covariate, and potentially biasing covariate (PBC).

Direction of arrows represents causal directions and λz, αz, αx, βz, and βx are structural coefficients. The error terms e1 and e2 have variances chosen so that Y1 and Y2 each have variance 1 (see the Appendices for more details).

Appendix 1—figure 1
Causal relationships of health outcome, dietary fat consumption, and the belief that consumption of dietary fat is not dangerous.

Direction of arrows represents causal directions and λA, λB, βA, and βB are structural coefficients.

Appendix 2—figure 1
Causal relationships of outcome, covariate, and confounding.

Direction of arrows represents causal directions and λz, αz, αx, βz, and βx are structural coefficients.

Appendix 2—figure 2
Possible values of λz for each choice of the pair (a, b).

The green shaded area marks the values of λz for which there is a value of τ that makes Equation 13 equal zero.

Tables

Appendix 1—table 1
Parameters used to generate simulated data for the simulation studies under Misperception 9.
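| Scenario | $\beta_A$ | $\beta_B$ | $\lambda_A$ | $\lambda_B$ | $\sigma_\epsilon^2$ | $\sigma_\eta^2$ | $\sigma_\gamma^2$ |
|---|---|---|---|---|---|---|---|
| I | –0.4 | 0.3 | $\sqrt{3}/2$ | $\sqrt{3}/2$ | 0.93 | 0.25 | 0.25 |
| II | 0.4 | –0.3 | $\sqrt{3}/2$ | $\sqrt{3}/2$ | 0.93 | 0.25 | 0.25 |
| III | –0.5 | 0.24 | 0.8 | 0.6 | 0.8076 | 0.36 | 0.64 |
| IV | 0.5 | –0.24 | 0.8 | 0.6 | 0.8076 | 0.36 | 0.64 |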
Appendix 1—table 2
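Summary of bias when fitting the full model ($M_F$) and the reduced model ($M_R$).

The bias is defined as $\hat{\beta} - \beta_A$, where $\hat{\beta}$ is the least-squares estimate under the corresponding model.

| Scenario | $M_F$ (n=500) | $M_R$ (n=500) | $M_F$ (n=1000) | $M_R$ (n=1000) | $M_F$ (n=2000) | $M_R$ (n=2000) |
|---|---|---|---|---|---|---|
| I | –0.0007 | 0.2248 | 0.0001 | 0.2251 | –0.0001 | 0.2249 |
| II | 0.0005 | –0.2249 | 0.0003 | –0.2249 | 0.0002 | –0.2248 |
| III | –0.0001 | 0.24 | 0.0003 | 0.2405 | –0.0003 | 0.2399 |
| IV | 0.0004 | –0.2396 | –0.0002 | –0.24 | –0.0005 | –0.2402 |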
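The gap between the full-model and reduced-model bias can be illustrated with a small simulation. This is a minimal sketch under an assumed data-generating model, not the authors' code: two predictors `a` and `b` share a standard-normal factor `u` with loadings λA and λB, and the outcome is `y = βA·a + βB·b + ε`. The variable names, the shared-factor structure, the sample size, and the value λA = λB = √3/2 (chosen so that `a` and `b` have unit variance given error variances of 0.25) are all assumptions made for illustration.

```python
import numpy as np

def simulate_bias(beta_a, beta_b, lam_a, lam_b,
                  var_eps, var_eta, var_gamma, n=200_000, seed=0):
    """Return (bias of full model, bias of reduced model) for the slope on `a`.

    Assumed model (hypothetical, for illustration only):
        a = lam_a * u + eta,   b = lam_b * u + gamma,   u ~ N(0, 1)
        y = beta_a * a + beta_b * b + eps
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    a = lam_a * u + rng.normal(0.0, np.sqrt(var_eta), n)
    b = lam_b * u + rng.normal(0.0, np.sqrt(var_gamma), n)
    y = beta_a * a + beta_b * b + rng.normal(0.0, np.sqrt(var_eps), n)

    ones = np.ones(n)
    # Full model: y regressed on a and b.
    full = np.linalg.lstsq(np.column_stack([ones, a, b]), y, rcond=None)[0]
    # Reduced model: b is omitted.
    reduced = np.linalg.lstsq(np.column_stack([ones, a]), y, rcond=None)[0]
    return full[1] - beta_a, reduced[1] - beta_a

# Scenario I parameters, with lambda_A = lambda_B = sqrt(3)/2 as an assumption.
lam = np.sqrt(3) / 2
bias_full, bias_reduced = simulate_bias(-0.4, 0.3, lam, lam, 0.93, 0.25, 0.25)
print(f"full model bias:    {bias_full:+.4f}")
print(f"reduced model bias: {bias_reduced:+.4f}")
```

Under these assumptions the omitted-variable formula gives a reduced-model bias of βB·λA·λB = 0.3 × 0.75 = 0.225, while the full model is approximately unbiased. Note that in this setup `b` is marginally uncorrelated with `y` (0.3 − 0.4 × 0.75 = 0), yet leaving it out still biases the slope on `a`.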
Appendix 2—table 1
The correlation matrix among Z, X, Y2, and Y1 without selecting on Y1.
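| $\Sigma$ | $Z$ | $X$ | $Y_2$ | $Y_1$ |
|---|---|---|---|---|
| $Z$ | 1 | $\lambda_z$ | $\alpha_z+\alpha_x\lambda_z$ | $\beta_z+\beta_x\lambda_z$ |
| $X$ | $\lambda_z$ | 1 | $\alpha_x+\alpha_z\lambda_z$ | $\beta_x+\beta_z\lambda_z$ |
| $Y_2$ | $\alpha_z+\alpha_x\lambda_z$ | $\alpha_x+\alpha_z\lambda_z$ | 1 | $(\alpha_z+\alpha_x\lambda_z)\beta_z+\beta_x(\alpha_x+\alpha_z\lambda_z)$ |
| $Y_1$ | $\beta_z+\beta_x\lambda_z$ | $\beta_x+\beta_z\lambda_z$ | $(\alpha_z+\alpha_x\lambda_z)\beta_z+\beta_x(\alpha_x+\alpha_z\lambda_z)$ | 1 |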
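The entries of the correlation matrix above follow mechanically from the structural coefficients. The sketch below assumes the linear model Y2 = αz·Z + αx·X + e2 and Y1 = βz·Z + βx·X + e1, with standardized Z and X, independent error terms, and error variances set so that Y1 and Y2 have variance 1. The function name and the numeric inputs are hypothetical, chosen only to show the calculation.

```python
import numpy as np

def implied_correlation(lam_z, alpha_z, alpha_x, beta_z, beta_x):
    """Correlation matrix of (Z, X, Y2, Y1) implied by the structural coefficients."""
    # Correlation of the exogenous pair (Z, X).
    phi = np.array([[1.0, lam_z],
                    [lam_z, 1.0]])
    # Rows: coefficients of Y2 and Y1 on (Z, X).
    coef = np.array([[alpha_z, alpha_x],
                     [beta_z, beta_x]])
    # Covariances of (Y2, Y1) with (Z, X).
    cross = coef @ phi
    # Covariance among (Y2, Y1) explained by Z and X; the diagonal is explained variance.
    explained = cross @ coef.T
    if np.any(np.diag(explained) >= 1.0):
        raise ValueError("coefficients leave no room for a positive error variance")
    yy = explained.copy()
    np.fill_diagonal(yy, 1.0)  # error variances top each outcome up to variance 1
    return np.block([[phi, cross.T],
                     [cross, yy]])

# Hypothetical coefficient values, for illustration only.
R = implied_correlation(lam_z=0.3, alpha_z=0.6, alpha_x=0.2, beta_z=0.5, beta_x=0.4)
print(np.round(R, 4))
```

The off-diagonal entry for Y2 and Y1 contains no error term because e1 and e2 are assumed independent; a correlated-error model would add their covariance to that cell.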
Appendix 2—table 2
The squared correlation and slope of regression.
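| Quantity of interest | Without selection on $Y_1$ | With selection on $Y_1$ |
|---|---|---|
| Squared (zero-order) correlation of $X$ and $Y_2$ | $(\lambda_z\alpha_z+\alpha_x)^2$ | $\left(\dfrac{\tilde\sigma_{23}}{\sqrt{\tilde\sigma_{22}\tilde\sigma_{33}}}\right)^2$ |
| Squared (partial) correlation of $X$ and $Y_2$, controlling for $Z$ | $\dfrac{(\tilde\sigma_{23})^2}{\tilde\sigma_{22}\tilde\sigma_{33}}$ | $\left(\dfrac{\tilde\sigma_{23}}{\sqrt{\tilde\sigma_{22}\tilde\sigma_{33}}}\right)^2$ |
| Slope of univariable regression of $Y_2$ on $X$ | $\lambda_z\alpha_z+\alpha_x$ | $\dfrac{\tilde\sigma_{23}}{\tilde\sigma_{22}}$ |
| Partial slope of regression of $Y_2$ on $X$, controlling for $Z$ | $\dfrac{\alpha_x-\alpha_x\lambda_z^2}{1-\lambda_z^2}$ | $\sqrt{\dfrac{\tilde\sigma_{33}}{\tilde\sigma_{22}}}\left(\dfrac{\tilde\rho_{23}-\tilde\rho_{13}\tilde\rho_{12}}{1-\tilde\rho_{12}^2}\right)$ |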
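As a check on the "without selection" column, the partial slope of Y2 on X controlling for Z can be worked out from the correlations among Z, X, and Y2 in the correlation matrix above. The index labels 1 = Z, 2 = X, 3 = Y2 are an assumed convention, and all three variables are taken to have unit variance. The standard two-predictor formula then gives:

```latex
\begin{aligned}
b_{Y_2 X \cdot Z}
  &= \frac{\rho_{23} - \rho_{13}\,\rho_{12}}{1 - \rho_{12}^{2}} \\
  &= \frac{(\alpha_x + \alpha_z\lambda_z) - (\alpha_z + \alpha_x\lambda_z)\,\lambda_z}{1 - \lambda_z^{2}} \\
  &= \frac{\alpha_x - \alpha_x\lambda_z^{2}}{1 - \lambda_z^{2}}
   = \alpha_x .
\end{aligned}
```

So, absent selection, adjusting for Z recovers the structural coefficient αx, whereas the univariable slope λzαz + αx differs from it by λzαz.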
Appendix 2—table 3
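Estimated average bias of αx under various scenarios.

Here τ, βx, and βz are selected to induce a zero correlation between X and Z after selecting on Y1. Results are based on a sample size of n = 50,000 and 1,000 samples obtained from the data-generating model described above.

| a | b | $\lambda_z^{\max}$ | $\beta_x$ | $\beta_z$ | $\lambda_z$ | $\tau$ | All data: $\alpha_{x.z}$ | All data: $\alpha_x$ | Select on $Y_1>\tau$: $\alpha_{x.z}$ | Select on $Y_1>\tau$: $\alpha_x$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.2 | 0.0422 | 0.2000 | 0.1952 | 0.0200 | –0.5774 | –0.0000 | 0.0118 | –0.0002 | –0.0002 |
| 0.2 | 0.2 | 0.0422 | 0.1999 | 0.1946 | 0.0350 | 1.3809 | –0.0001 | 0.0208 | –0.0004 | –0.0008 |
| 0.2 | 0.9 | 0.5519 | 0.1931 | 0.8361 | 0.2700 | 0.4647 | –0.0001 | 0.1620 | –0.0002 | –0.0001 |
| 0.2 | 0.9 | 0.5519 | 0.1857 | 0.8175 | 0.4000 | 1.8276 | –0.0001 | 0.2398 | –0.0002 | –0.0003 |
| 0.9 | 0.2 | 0.2277 | 0.8936 | 0.0683 | 0.1200 | 0.6717 | 0.0001 | 0.0721 | 0.0005 | 0.0009 |
| 0.9 | 0.2 | 0.2277 | 0.8842 | 0.0598 | 0.1900 | 3.0584 | 0.0001 | 0.1140 | –0.0080 | –0.0118 |
| 0.9 | 0.9 | 0.5156 | 0.8710 | 0.2383 | 0.2600 | –0.1627 | –0.0002 | 0.1556 | –0.0001 | –0.0003 |
| 0.9 | 0.9 | 0.5156 | 0.8207 | 0.1818 | 0.4500 | 2.3590 | –0.0002 | 0.2699 | 0.0011 | –0.0005 |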
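The claim that τ, βx, and βz are chosen to remove the X–Z correlation after selection can be checked numerically for a single row. The sketch below assumes that (Z, X, Y1) are jointly normal with unit variances, so that selecting on Y1 > τ changes covariances by the usual truncated-normal factor δ(τ) = h(h − τ), where h is the inverse Mills ratio. The joint-normality assumption and the function names are illustrative and not taken from the article; the inputs are the third row of the table (λz = 0.27, βx = 0.1931, βz = 0.8361, τ = 0.4647).

```python
import math

def selection_shrinkage(tau):
    """delta(tau) = h * (h - tau), with h = phi(tau) / (1 - Phi(tau)).

    For a standard normal Y1, Var(Y1 | Y1 > tau) = 1 - delta(tau).
    """
    pdf = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(tau / math.sqrt(2.0))  # 1 - Phi(tau)
    hazard = pdf / upper_tail
    return hazard * (hazard - tau)

def cov_xz_after_selection(lam_z, beta_x, beta_z, tau):
    """Covariance of X and Z among units with Y1 > tau (assumes joint normality)."""
    cov_z_y1 = beta_z + beta_x * lam_z  # correlation of Z with Y1
    cov_x_y1 = beta_x + beta_z * lam_z  # correlation of X with Y1
    return lam_z - selection_shrinkage(tau) * cov_z_y1 * cov_x_y1

# Third row of the table above.
residual_cov = cov_xz_after_selection(lam_z=0.27, beta_x=0.1931, beta_z=0.8361, tau=0.4647)
print(f"cov(X, Z | Y1 > tau) = {residual_cov:.5f}")
```

A result close to zero indicates that, for this row, the listed τ offsets the pre-selection correlation λz almost exactly, which is the condition the caption describes. Because δ(τ) stays below 1, this cancellation is only possible when λz does not exceed the product of the two correlations with Y1.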

Xiaoxin Yu, Roger S Zoh, David A Fluharty, Luis M Mestre, Danny Valdez, Carmen D Tekwe, Colby J Vorland, Yasaman Jamshidi-Naeini, Sy Han Chiou, Stella T Lartey, David B Allison (2024). Misstatements, misperceptions, and mistakes in controlling for covariates in observational research. eLife 13:e82268. https://doi.org/10.7554/eLife.82268