Methods/Statistics
Enabling the Analysis of Patient-Level Data Across Jurisdictional Boundaries for Research Use: An Exploration of the Utility of the Likelihood Function
Megan Harmon*, Na Li, Tolulope Sajobi, Jessalyn Holodinsky, Tyler Williamson
Background: The secure and ethical handling of personal health information is a growing global concern, especially when patient-level data are shared across jurisdictions for research use. This study evaluates a novel methodology that uses the likelihood function to enable health data analysis without direct data sharing. For a specified model, the likelihood function carries all the information the data provide about the model parameters. Because observations are independent, the log-likelihood functions computed separately in each jurisdiction under the same model can be summed to give an overall log-likelihood function that is identical to the one obtained from the fully pooled dataset.
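The additivity described above can be written out for a logistic model. The notation here is an assumption for illustration (K jurisdictions, with J_k the set of patients in jurisdiction k, binary outcome y_i and binary exposure x_i); the abstract itself does not define symbols.

```latex
\ell(\beta) \;=\; \sum_{k=1}^{K} \ell_k(\beta),
\qquad
\ell_k(\beta) \;=\; \sum_{i \in J_k} \Big[\, y_i \log p_i(\beta) + (1 - y_i) \log\big(1 - p_i(\beta)\big) \Big],
\qquad
p_i(\beta) \;=\; \frac{1}{1 + \exp\!\big(-(\beta_0 + \beta_1 x_i)\big)}
```

Each jurisdiction only needs to report its own function \ell_k(\beta); the value of \beta that maximizes the sum is the same as the maximum likelihood estimate from the pooled data.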
Methods: We conducted a simulation study to evaluate the likelihood method for inter-jurisdictional health data analysis. A dataset of 200 observations was generated with a binary outcome (prevalence: 51%) and binary exposure (49% exposed), stratified across two jurisdictions. In Jurisdiction 1 (n=100), the outcome prevalence was 54% with 49% exposed; in Jurisdiction 2 (n=100), the outcome prevalence was 47% with 49% exposed. Logistic regression models were fit separately for each jurisdiction and for the combined dataset. Log-likelihoods from the jurisdictional models were aggregated to estimate regression coefficients for the overall population, which were compared to those from the fully pooled model.
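The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the 2x2 cell counts are hypothetical values chosen to be consistent with the reported margins (54% and 47% outcome prevalence, 49% exposed), and the helper names `expand`, `make_site`, and `fit` are invented for this example. Each site exposes only its log-likelihood, gradient, and Hessian at a requested coefficient value; a coordinator sums them and takes Newton-Raphson steps.

```python
import math

def expand(exp_events, exp_n, unexp_events, unexp_n):
    """Turn 2x2 counts into a list of (exposure, outcome) records."""
    return ([(1, 1)] * exp_events + [(1, 0)] * (exp_n - exp_events)
            + [(0, 1)] * unexp_events + [(0, 0)] * (unexp_n - unexp_events))

def make_site(records):
    """Return a function that reveals only likelihood summaries, never records."""
    def summaries(b0, b1):
        ll = g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in records:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
            g0 += y - p              # gradient w.r.t. intercept
            g1 += (y - p) * x        # gradient w.r.t. exposure coefficient
            w = p * (1.0 - p)
            h00 -= w                 # Hessian entries (negative definite)
            h01 -= w * x
            h11 -= w * x * x
        return ll, g0, g1, h00, h01, h11
    return summaries

def fit(sites, iterations=25):
    """Maximize the summed log-likelihood by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Sum each summary component across sites.
        totals = [sum(vals) for vals in zip(*(s(b0, b1) for s in sites))]
        _, g0, g1, h00, h01, h11 = totals
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta - H^{-1} g (2x2 inverse written out).
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    ll = sum(s(b0, b1)[0] for s in sites)
    return b0, b1, ll

# Hypothetical counts (assumption): exposed events, exposed n, unexposed events, unexposed n.
site1 = expand(38, 49, 16, 51)
site2 = expand(8, 49, 39, 51)

fit_site1 = fit([make_site(site1)])
fit_site2 = fit([make_site(site2)])
aggregated = fit([make_site(site1), make_site(site2)])  # sum of site log-likelihoods
pooled = fit([make_site(site1 + site2)])                # records physically combined

print("site 1 coefficient:", round(fit_site1[1], 2))
print("site 2 coefficient:", round(fit_site2[1], 2))
print("aggregated:", round(aggregated[1], 2), round(aggregated[2], 2))
print("pooled:    ", round(pooled[1], 2), round(pooled[2], 2))
```

The aggregated and pooled fits agree up to floating-point rounding because they maximize the same function. In a real deployment each `summaries` function would run inside its own jurisdiction and only the six returned numbers would cross the boundary at each iteration; how those numbers are transported and protected is outside the scope of this sketch.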
Results: High heterogeneity was observed between jurisdictions (Q=50.10; I²=98%). Jurisdiction 1 showed a regression coefficient of 2.02 (SE: 0.46; log-likelihood: -57.82), and Jurisdiction 2 showed -2.81 (SE: 0.51; log-likelihood: -49.63). Aggregating the log-likelihood functions yielded a regression coefficient of -0.28 (log-likelihood: -138.13), exactly matching the fully pooled model.
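As a rough check on the heterogeneity statistics, the sketch below recomputes Cochran's Q and I² from the rounded coefficients and standard errors reported above. It assumes the standard inverse-variance definitions, which the abstract does not state explicitly, so small differences from the reported Q=50.10 are to be expected from rounding of the inputs.

```python
# Jurisdiction-level estimates as reported (rounded to two decimals).
betas = [2.02, -2.81]
ses = [0.46, 0.51]

# Inverse-variance weights (assumed; the abstract does not give the formula).
weights = [1.0 / se ** 2 for se in ses]
weighted_mean = sum(w * b for w, b in zip(weights, betas)) / sum(weights)

# Cochran's Q: weighted squared deviations from the weighted mean.
q = sum(w * (b - weighted_mean) ** 2 for w, b in zip(weights, betas))
df = len(betas) - 1

# I^2: share of total variation attributable to heterogeneity, in percent.
i2 = max(0.0, (q - df) / q) * 100.0

print(f"Q = {q:.2f}, I^2 = {i2:.0f}%")
```

With these rounded inputs Q comes out slightly below the reported value while I² still rounds to 98%, which is consistent with the reported figures having been computed from unrounded estimates.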
Conclusion: In this simulation, the likelihood-based method reproduced the pooled-data estimate exactly, demonstrating its potential as a reliable and effective framework for cross-jurisdictional effect size estimation without sharing patient-level data. The approach offers a scalable solution for advancing health research globally while safeguarding patient privacy.