Methods/Statistics
P-Hacking and Publication Bias in Epidemiology Research: Trends in Derived P-Values from the American Journal of Epidemiology from 2000-2022
Sarah Ackley*, Ruijia Chen, Jingxuan Wang, Grisel Lopes, Peter Buto, Kendra D. Sims, Isabel Elaine Allen, Maria Glymour
Concerns about null-hypothesis testing and reporting of p-values have grown in the last two decades. However, it is unknown whether the movement to reduce reliance on p-values has resulted in changes in p-hacking or publication bias.
We obtained American Journal of Epidemiology (AJE) abstracts from 2000-2022 that reported at least one confidence interval (CI). Two-sided p-values were calculated from scraped estimates and CIs. We contextualized findings using theoretical and simulated p-value distributions, assuming either that all studies are equally powered or that statistical power varies across studies. We evaluated linear trends and determined whether the empirical distribution of p-values changed over this time period. We also evaluated changes in the ratio of p-values just below to those just above 0.05. All reported odds ratios (ORs) correspond to a one-year difference in publication year.
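The derivation of a two-sided p-value from an estimate and its CI can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a ratio measure (for example an OR) with a Wald interval that is symmetric on the log scale and a null value of 1, and the function name `p_from_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Two-sided p-value for H0: ratio = 1, from a ratio estimate and its CI.

    Assumption: the interval is a Wald interval, symmetric on the log scale.
    """
    std_normal = NormalDist()
    # Critical value for the stated confidence level (about 1.96 for 95%).
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Wald z-statistic against the null value log(1) = 0.
    z = math.log(estimate) / se
    return 2 * (1 - std_normal.cdf(abs(z)))


# A CI whose lower bound sits exactly on the null gives p of about 0.05.
example_p = p_from_ratio_ci(1.5, 1.0, 2.25)
```

Because abstracts report rounded estimates and limits, p-values recovered this way are approximate, particularly when the interval is narrow relative to the rounding precision.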
A total of 3743 p-values were extracted from 1517 abstracts. From fits to theoretical p-value distributions, we find excess density between 0.01 and 0.05 and near 1, with a dearth of p-values <0.01 (see figure); this holds for both the early (2000-11) and later (2012-22) time periods. The proportion of small p-values increased over time (p<0.05: OR=1.02, 95% CI: 1.01 to 1.03; p<0.01: OR=1.02, 95% CI: 1.01 to 1.03). Selecting only the smallest p-value for each abstract yielded similar results (p<0.05: OR=1.02, 95% CI: 1.00 to 1.04; p<0.01: OR=1.01, 95% CI: 1.00 to 1.03). Statistically significant changes in the proportion of p-values just below versus just above 0.05 were not detected for the full sample (OR=1.01, 95% CI: 0.99 to 1.03) or when selecting only the smallest p-value per abstract (OR=1.01, 95% CI: 0.98 to 1.04), although the point estimates suggest this ratio is increasing.
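To make the comparison with a theoretical p-value distribution concrete, the sketch below computes how much probability mass a two-sided z-test places below 0.01 and between 0.01 and 0.05 when every study has the same power. It is an illustration under stated assumptions, not the authors' fitted model: the 80% power value is arbitrary, the test statistic is assumed to be normal with unit variance, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_p_below(alpha, delta):
    """P(two-sided p < alpha) when the z-statistic is distributed N(delta, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - delta)  # z above +z_crit
    lower_tail = STD_NORMAL.cdf(-z_crit - delta)     # z below -z_crit
    return upper_tail + lower_tail


def delta_for_power(power, alpha=0.05):
    """Approximate mean of z that gives the target power (ignores the far tail)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)


# Illustrative assumption: all studies have 80% power at alpha = 0.05.
delta = delta_for_power(0.80)
mass_below_001 = prob_p_below(0.01, delta)
mass_001_to_005 = prob_p_below(0.05, delta) - mass_below_001
```

Under this illustrative assumption, about 0.59 of p-values are expected to fall below 0.01 and about 0.21 between 0.01 and 0.05; with `delta = 0` (the null), `prob_p_below(alpha, 0)` reduces to `alpha`, the uniform case. An empirical distribution with more mass in the 0.01 to 0.05 band and less below 0.01 than such a reference would show the pattern described above.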
We find evidence of p-hacking but not publication bias in AJE. Decreasing p-values could reflect larger sample sizes, better-motivated hypotheses, or increased publication bias or p-hacking. We find no evidence that de-emphasizing p-values has reduced publication bias or p-hacking.