App Abstracts

Methods/Statistics

Deriving residential histories for population health research: An example from the Sister Study

Patrick Ringwald*, Diane Ng, David Stinchcomb, Deborah Bookwalter, Aimee D’Aloisio, Jennifer Ish, Dale Sandler, Alexandra White

Background: Epidemiologic studies of cancer etiology are increasingly focused on the impact of environmental exposures over the life course. Obtaining complete and accurate residential histories for study participants, however, is often costly or impractical.

Purpose: To apply a method for deriving residential histories in the Sister Study cohort to assess residential mobility from 1980 through enrollment (2003-2009).

Methods: We submitted a file with participant names and identifiers (e.g., date of birth, baseline address) for the full cohort of 50,884 women (ages 35-74 years) to a commercial data vendor (LexisNexis) for address linkage. The returned addresses were combined with existing self-reported addresses and geocoded to build a comprehensive residential history. Addresses outside the 1980-2009 period were excluded; further, we linked the USPS Residential Delivery Indicator to identify and exclude businesses. The cleaned dataset was then processed with free, open-source SAS programs developed by Westat that use an algorithm to match and combine common addresses and to resolve overlaps and gaps between dates. Modifications were made to prioritize participants’ self-reported addresses.

Results: LexisNexis returned at least one address for 93.5% of the cohort. Women older than 64 at enrollment were less likely to be matched than younger women. On average, we received 8 addresses per person for the years of interest. We excluded 15,414 businesses (4% of addresses returned). Based on the available data, we derived complete residential histories covering 1980 through enrollment for 40% of participants; completeness improved with later start years (74% starting in 1985; 96% starting in 1990).

Conclusion: We have demonstrated a practical approach to assessing residential mobility in a large, national prospective study by sourcing commercial address data, using an open-source algorithm, and integrating it with self-reported residence data to generate plausible residential histories.
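To make the processing step in the Methods more concrete, the following is a minimal Python sketch of the general idea: combine records for the same address, let self-reported records win where dates overlap, and flag years with no known address. It is not the Westat SAS algorithm, whose details the abstract does not describe. The `Record` fields, the `"self"`/`"vendor"` source labels, the year-level granularity, the function name `build_history`, and the example addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    address: str   # normalized address key (hypothetical)
    start: int     # first year at the address
    end: int       # last year at the address (inclusive)
    source: str    # "self" or "vendor" (hypothetical labels)


def build_history(records, first_year, last_year):
    """Return (episodes, gap_years) for first_year..last_year inclusive."""
    # Apply self-reported records last so they win any overlap (assumed rule).
    ordered = sorted(records, key=lambda r: (r.source == "self", r.start))
    by_year = {}
    for rec in ordered:
        lo = max(rec.start, first_year)   # clip to the study window
        hi = min(rec.end, last_year)
        for year in range(lo, hi + 1):
            by_year[year] = rec.address

    episodes, gaps = [], []
    for year in range(first_year, last_year + 1):
        address = by_year.get(year)
        if address is None:
            gaps.append(year)             # no address known for this year
        elif episodes and episodes[-1][0] == address and episodes[-1][2] == year - 1:
            # Same address as the previous year: extend the current episode.
            episodes[-1] = (address, episodes[-1][1], year)
        else:
            episodes.append((address, year, year))
    return episodes, gaps


# Made-up example data for one hypothetical participant.
records = [
    Record("12 Oak St", 1978, 1988, "vendor"),
    Record("12 Oak St", 1986, 1992, "vendor"),
    Record("5 Elm Ave", 1991, 2000, "self"),
    Record("9 Pine Rd", 2003, 2006, "vendor"),
]
episodes, gaps = build_history(records, 1980, 2005)
is_complete = not gaps
```

In this made-up example the two vendor records for the same address collapse into one episode, the self-reported address overrides the vendor address in the overlapping years, and the years with no record are reported as gaps, so the history would not count as complete for the 1980 start year.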