Methods/Statistics
A Four Step Iterative Approach in Estimating Prevalence in the Presence of Misclassification Bias in a Population Sample and Selection Bias in a Convenience Sample
Christoffer Dharma*, Peter Smith, Dionne Gesink, Travis Salway, Victoria Landsman, Michael Escobar
Self-reported measures of membership in stigmatized groups or of stigmatized behaviours (e.g., sexual orientation or drug use) are prone to misclassification in population surveys (PS). Community-based convenience samples (CS) typically increase participation and identification among marginalized groups, but they are subject to participation and sampling biases because sampling is non-probabilistic. Using either survey alone to calculate the prevalence of an outcome in a marginalized population can therefore yield a biased estimate. We propose a four-step iterative approach that builds on existing methods and uses information from both surveys to obtain a more informed prevalence estimate.
First, Adjusted Logistic Propensity (ALP) weighting is used to generate pseudoweights for the CS data so that its covariate distribution is more closely aligned with that of the PS data. Second, using Bayes' formula and the ALP-weighted CS data, we calculate the probability of being misclassified for PS respondents who did not report being in the group of interest. Third, the group membership of these respondents is imputed using the misclassification probability calculated in step 2. Finally, prevalence can be calculated using either the PS or the CS. With the PS, multiple imputation is run m times and the prevalence estimates are pooled. With the CS, in each iteration the new combined set of individuals with the characteristic of interest is used to re-apply ALP to the CS, and the resulting estimates are pooled into a single estimate.
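As an illustration only, steps 3 and 4 for the PS could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the record fields (`reported`, `p_misclass`, `outcome`, `weight`), the function names, and the toy data are all hypothetical, the step-2 misclassification probability is taken as given, and the within-imputation variance uses a simple effective-sample-size approximation in place of a design-based variance. Pooling follows Rubin's rules for multiple imputation.

```python
import random
from statistics import mean, variance


def weighted_prevalence(members):
    """Weighted outcome prevalence and a rough variance for one completed data set."""
    total_w = sum(r["weight"] for r in members)
    p = sum(r["weight"] for r in members if r["outcome"]) / total_w
    # Kish effective sample size: a simplification assumed for this sketch,
    # not a design-based variance estimator.
    n_eff = total_w ** 2 / sum(r["weight"] ** 2 for r in members)
    return p, p * (1 - p) / n_eff


def impute_and_pool(records, m=20, seed=0):
    """Impute group membership m times and pool the prevalence estimates.

    Each record is a dict with hypothetical keys:
      reported   (bool)  self-reported membership in the group of interest
      p_misclass (float) step-2 probability that a non-reporter is a member
      outcome    (bool)  outcome of interest
      weight     (float) PS survey weight
    """
    if m < 2:
        raise ValueError("m must be at least 2 to estimate between-imputation variance")
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        # Step 3: keep self-reported members; draw membership for the rest.
        members = [
            r for r in records
            if r["reported"] or rng.random() < r["p_misclass"]
        ]
        if not members:
            raise ValueError("no group members in this imputed data set")
        p, v = weighted_prevalence(members)
        estimates.append(p)
        within.append(v)
    # Step 4 (PS): Rubin's rules, total = within + (1 + 1/m) * between.
    pooled = mean(estimates)
    total_var = mean(within) + (1 + 1 / m) * variance(estimates)
    return pooled, total_var


# Toy records, invented purely for illustration.
toy_ps = [
    {"reported": True, "p_misclass": 0.0, "outcome": True, "weight": 1.2},
    {"reported": True, "p_misclass": 0.0, "outcome": False, "weight": 0.8},
    {"reported": False, "p_misclass": 0.3, "outcome": True, "weight": 1.0},
    {"reported": False, "p_misclass": 0.1, "outcome": False, "weight": 1.5},
]
estimate, total_variance = impute_and_pool(toy_ps, m=50, seed=1)
print(f"pooled prevalence: {estimate:.3f}, variance: {total_variance:.4f}")
```

The CS branch of step 4 would additionally refit the ALP pseudoweights inside each iteration; that refit is omitted here because it requires a weighted logistic regression.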
We provide an example of calculating the prevalence of any existing mental health diagnosis among sexual minority men in Canada. The prevalence estimates from the two surveys were closer together after the four-step process was applied. The unadjusted prevalence was 19.97% in the PS and 25.72% in the CS, a gap of 5.75 percentage points. After adjustment, the prevalence was 18.69% (95% CI: 14.73, 25.02) in the PS and 21.59% (95% CI: 19.22, 23.92) in the CS, a gap of 2.90 percentage points. Advantages and disadvantages of the proposed method will be discussed.