Describing Community-level Healthcare Gaps from Incomplete Claims Data by Emulating the Ideal Observational Study

Presenting Author

Taylor McCready

New York University Grossman School of Medicine

Submitting Author

Taylor McCready

Additional Authors

Audrey Renson, Ethan Cohen

Abstract

Background

Policymakers and interventionists often seek to characterize healthcare gaps and disparities within geographically defined communities. However, healthcare data sources in the US are fragmented, limiting the completeness of population-based assessments. We address this by emulating an ideal observational study and treating real-world sources as incomplete or mismeasured realizations of that ideal, using colorectal cancer (CRC) screening adherence in Brooklyn, NY as an example.

Methods

We aimed to estimate the proportion of adults aged 45-75 on January 1, 2022, who ever had an open CRC screening gap during 2022, using a claims dataset from a value-based care network at NYU. Roughly, the ideal cohort is a random sample of Brooklyn residents with linked all-payer claims. Relative to this ideal, our data contain right censoring (due to switching insurance) and missing data (due to non-representativeness). We handled right censoring with inverse probability of censoring weights (IPCW) that included age group, sex, language group, payer type, and tract income quintile. We handled non-representativeness with inverse probability of selection weights (IPSW) using the 2022 American Community Survey.
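The weighting logic can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single categorical stratum in place of the full covariate set, treats censoring as one binary indicator rather than a time-varying process, and uses toy records and hypothetical target shares in place of the claims data and American Community Survey margins. The function name, field names, and stratum labels are all invented for illustration.

```python
from collections import Counter


def weighted_gap_proportion(records, target_shares):
    """Estimate the proportion with a care gap using IPCW x IPSW weights.

    records: list of dicts with keys 'stratum', 'observed' (False if
        censored), and 'gap' (True/False, ignored when censored).
    target_shares: dict mapping stratum -> share of the target population
        (hypothetical values here, standing in for survey margins).
    """
    n = len(records)
    n_by_stratum = Counter(r["stratum"] for r in records)
    observed_by_stratum = Counter(r["stratum"] for r in records if r["observed"])

    numerator = 0.0
    denominator = 0.0
    for r in records:
        if not r["observed"]:
            continue  # censored records contribute only through the weights
        s = r["stratum"]
        # IPCW: inverse of the stratum-specific probability of remaining uncensored
        ipcw = n_by_stratum[s] / observed_by_stratum[s]
        # IPSW: target population share divided by sample share
        ipsw = target_shares[s] / (n_by_stratum[s] / n)
        w = ipcw * ipsw
        numerator += w * r["gap"]
        denominator += w
    return numerator / denominator


# Toy data: stratum "A" is overrepresented and partly censored.
records = (
    [{"stratum": "A", "observed": True, "gap": True}] * 2
    + [{"stratum": "A", "observed": True, "gap": False}] * 2
    + [{"stratum": "A", "observed": False, "gap": None}] * 2
    + [{"stratum": "B", "observed": True, "gap": True}] * 3
    + [{"stratum": "B", "observed": True, "gap": False}] * 1
)
target_shares = {"A": 0.4, "B": 0.6}  # hypothetical population margins

estimate = weighted_gap_proportion(records, target_shares)
unweighted = sum(r["gap"] for r in records if r["observed"]) / sum(
    r["observed"] for r in records
)
```

In this toy example the weighted estimate equals the target-standardized mix of stratum-specific gap rates (0.4 x 0.50 + 0.6 x 0.75), while the unweighted figure is simply the share of uncensored records with a gap. A realistic analysis would typically model both weights with regression on several covariates rather than a single stratum.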

Results

The analysis included 15,034 adults. The claims dataset overrepresented Medicaid (OR 1.98) and Medicare (OR 1.30) vs. commercial insurance, and underrepresented Spanish preferred language (OR 0.78). Using IPCW and IPSW, the estimated probability of ever having an open CRC screening care gap during 2022 was 45.9%, vs. 41.1% unweighted, a difference of 4.8 percentage points, or 480 additional people per 10,000.
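The per-10,000 figure follows directly from the two reported proportions, as this short check shows (variable names are illustrative only):

```python
weighted = 0.459    # estimate with IPCW and IPSW, as reported
unweighted = 0.411  # unweighted estimate, as reported

# Absolute difference expressed as people per 10,000 residents
diff_per_10000 = round((weighted - unweighted) * 10_000)
```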

Conclusion

Our approach guides descriptive, population-based healthcare utilization research with fragmented data sources and may generalize to other administrative data sources. By treating real-world datasets as incomplete or mismeasured versions of an ideal community cohort, it links analytic decisions to explicit assumptions, making estimates easier to interpret and critique.
