Join Dr. Ana Diez-Roux for SERforum Live!
May 1, 12pm EST


Do you ever find yourself struggling to figure out a question about epidemiologic methods, or other topics in epidemiology, and don’t know who to ask? The SERforum lets individuals ask and answer questions that come up in our daily work on substantive and methodological topics in epidemiology.
All topics may be viewed, but SER membership is required to read and post comments. If you are a member, log in! Not a member? Join us!


Dimension reduction methods that retain interpretability


I'm working on a matched case-control study conducted in Kenya as a GEMS sub-study (all data have already been collected; I'm just analyzing them). Animal contact is the exposure of interest, and moderate-to-severe diarrheal illness is the outcome of interest. All participants are under 5 years old and were matched on age, sex, and location.

There are 73 matched pairs. Exposure was ascertained via an 800-item questionnaire. Most items are binary or continuous; however, several are ordinal categorical, and a few are nominal categorical. All questionnaire items pertain to animal contact and are repeated across many species.

My initial thought is to tackle dimension reduction via an a priori approach (identifying which aspects of animal contact are most important based on prior literature, and collapsing variables where it makes sense to do so); however, it will be difficult to do this systematically with such a large number of variables. I'm also thinking of doing a latent class analysis as a secondary analysis; however, I am interested in two latent variables: "farm" type and degree of animal contact. Any comments on these ideas, or on other methods for dimension reduction that don't (unduly) compromise interpretability, would be very much appreciated.
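One way to make the a priori collapsing step more systematic is to write each collapsing rule once and apply it to every species block. The sketch below assumes, hypothetically, that binary items are named `<species>_<item>` (e.g. `goat_touch`); the naming scheme, item names, and the two summaries chosen (any contact, number of contact types) are illustrative and not taken from the GEMS questionnaire. Ordinal, nominal, and continuous items would each need their own rule.

```python
from collections import defaultdict

def collapse_by_species(record):
    """Collapse binary items named '<species>_<item>' (hypothetical scheme)
    into per-species summaries: any contact (0/1) and number of contact types.
    Assumes species names themselves contain no underscore."""
    counts = defaultdict(int)
    seen = set()
    for name, value in record.items():
        species, _, _item = name.partition("_")
        seen.add(species)
        counts[species] += int(value == 1)  # count "yes" answers only
    summary = {}
    for species in sorted(seen):
        summary[f"{species}_any"] = int(counts[species] > 0)
        summary[f"{species}_n_types"] = counts[species]
    return summary

# One hypothetical participant's responses (1 = yes, 0 = no)
child = {
    "goat_touch": 1, "goat_sleep_near": 0, "goat_feces_in_home": 1,
    "chicken_touch": 0, "chicken_sleep_near": 0,
}
print(collapse_by_species(child))
```

Keeping the rules in code also documents exactly how each summary variable was derived, which helps interpretability later.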

Many thanks,



The latent variable approach would be an interesting adjunct to your a priori plan for data reduction. However, we should keep in mind the assumptions behind latent variable approaches. Is your hypothesis that there is an underlying continuous dimension of contact with animals, and that these variables are all indicators of the individual’s position on this latent dimension? One way to think about it is through an analogy. Suppose we are interested in measuring “depression” as a construct. We believe that there is an underlying phenotype of “depression”, but to measure it, we ask many questions that get at the phenotype: low mood, irritability, restlessness, appetite changes, etc. Together, the correlations among these indicators provide some information on the person’s position on the latent dimension itself. The question is whether you think these indicators of animal contact provide a similar measurement tool for an underlying construct called “animal contact”. The same logic should be applied to farm type.
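The reflective measurement idea in the depression analogy can be made concrete with a small simulation. This is a sketch assuming a single standardized latent variable η ("animal contact") and indicators x_j = λ_j η + ε_j; under that one-factor model the correlation between indicators j and k is λ_j λ_k, so the indicators are correlated only because they share η. The loadings and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
eta = rng.standard_normal(n)                  # latent "animal contact" (never observed)
loadings = np.array([0.8, 0.7, 0.6, 0.5])     # hypothetical standardized loadings
noise_sd = np.sqrt(1 - loadings**2)           # gives each indicator variance 1
X = eta[:, None] * loadings + rng.standard_normal((n, 4)) * noise_sd

observed = np.corrcoef(X, rowvar=False)       # correlations among the 4 indicators
implied = np.outer(loadings, loadings)        # one-factor model: r_jk = lambda_j * lambda_k
np.fill_diagonal(implied, 1.0)
print(np.round(observed, 2))
print(np.round(implied, 2))

score = X.mean(axis=1)                        # crude summary score of the indicators
print(round(np.corrcoef(score, eta)[0, 1], 2))  # how well the score tracks eta
```

If the observed correlations among the animal-contact items look nothing like a pattern of this kind, that is a sign the single-construct assumption may not hold.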

If the underlying theory that guides the choice of latent variable analysis indicates that it is sensible, then I would proceed with exploratory factor analysis to see whether the data support that the indicators do indeed correlate in the ways that you anticipate: two dimensions, one indicative of animal contact and one indicative of farm type. Through the exploratory factor analysis you will likely remove items that do not load well on the hypothesized factors, or that do not load uniquely on them. In the end, you should be left with items that have high, unique loadings on the hypothesized factors. Finally, I would encourage confirmatory factor analysis to assess model fit.
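The item-pruning step can be sketched as a simple rule applied to a rotated loading matrix produced by whatever EFA software is used. The cutoffs (0.40 for the primary loading, 0.30 for cross-loadings) are common rules of thumb rather than fixed standards, and the item names and loadings below are hypothetical.

```python
def screen_items(loadings, primary_min=0.40, cross_max=0.30):
    """Split items into kept and dropped, given {item: [loading on each factor]}.

    An item is kept only if its largest absolute loading is at least
    `primary_min` and all its other absolute loadings are below `cross_max`.
    Kept items map to the index of the factor they load on."""
    keep, drop = {}, {}
    for item, row in loadings.items():
        abs_row = [abs(x) for x in row]
        primary = max(abs_row)
        factor = abs_row.index(primary)
        others = abs_row[:factor] + abs_row[factor + 1:]
        if primary < primary_min:
            drop[item] = "weak primary loading"
        elif any(x >= cross_max for x in others):
            drop[item] = "cross-loads"
        else:
            keep[item] = factor
    return keep, drop

# Hypothetical rotated loadings: factor 0 = animal contact, factor 1 = farm type
loadings = {
    "touches_goats":         [0.72, 0.10],
    "sleeps_near_poultry":   [0.65, 0.05],
    "household_owns_cattle": [0.15, 0.70],
    "grazes_animals":        [0.45, 0.41],
    "has_pet_dog":           [0.20, 0.18],
}
keep, drop = screen_items(loadings)
print(keep)
print(drop)
```

In practice items are usually dropped one or a few at a time and the EFA is refit, because the remaining loadings shift after each removal.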

The fact that these items were all measured differently is going to introduce some measurement invariance, I would imagine. I’m not sure what the latest research is on assessing items together that are all measured differently, but it should definitely be investigated before proceeding, to increase interpretability.
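One aspect of mixing items measured on different scales can be checked numerically: z-scoring each item does not change Pearson correlations, so if the factor analysis is run on a correlation matrix, standardizing first changes nothing. The sketch below uses simulated, hypothetical items (one continuous, one binary, one ordinal) to show this.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical items measured on different scales
hours_with_animals = rng.gamma(2.0, 3.0, n)                                   # continuous
touches_goats = (hours_with_animals + rng.normal(0, 3, n) > 6).astype(float)  # binary
contact_freq = np.digitize(hours_with_animals, [2, 5, 10])                    # ordinal, 0-3
X = np.column_stack([hours_with_animals, touches_goats, contact_freq])

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # z-score each item (column)

R_raw = np.corrcoef(X, rowvar=False)
R_std = np.corrcoef(Z, rowvar=False)
print(np.allclose(R_raw, R_std))              # Pearson r is unchanged by z-scoring
```

What does differ across item types is how well a Pearson correlation represents the association; for binary and ordinal items, factoring tetrachoric or polychoric correlations is a common alternative.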

I hope that helps, and good luck!


Kerry Keyes

Thank you very much for your reply, Dr. Keyes.

I think it's reasonable to think about animal contact as one continuous dimension; however, it would be more appropriate to conceptualize "farm type" as a categorical variable. Are latent class methods (as opposed to latent variable analysis) appropriate for latent categorical variables?
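For a categorical latent variable such as farm type, latent class analysis is the usual model. Below is a minimal sketch of the standard EM algorithm for a latent class model with binary indicators, assuming local independence (items independent given class) and a fixed number of classes. It omits what dedicated software handles, such as polytomous items, multiple random starts, and comparing numbers of classes (e.g. by BIC); the R package poLCA is one established implementation. The data are simulated and hypothetical.

```python
import numpy as np

def lca_em(Y, n_classes, n_iter=200, seed=0):
    """Latent class model for binary items Y (n x J), fit by EM.

    Returns class prevalences pi (K,), item-response probabilities
    theta (K x J), posterior class membership (n x K), and the
    log-likelihood evaluated at the start of each iteration."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, J))
    history = []
    for _ in range(n_iter):
        # E-step: log P(y_i, class k) under local independence
        log_joint = (Y @ np.log(theta).T
                     + (1 - Y) @ np.log(1 - theta).T
                     + np.log(pi))
        log_lik_i = np.logaddexp.reduce(log_joint, axis=1)
        post = np.exp(log_joint - log_lik_i[:, None])
        history.append(log_lik_i.sum())
        # M-step: posterior-weighted class sizes and item proportions
        weights = post.sum(axis=0)
        pi = weights / n
        theta = (post.T @ Y) / weights[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)  # keep the logs finite
    return pi, theta, post, np.array(history)

# Hypothetical data: two "farm types" with different yes-probabilities on 6 items
rng = np.random.default_rng(1)
true_class = rng.random(1000) < 0.3
p = np.where(true_class[:, None], 0.85, 0.15) * np.ones((1, 6))
Y = (rng.random((1000, 6)) < p).astype(float)
pi, theta, post, history = lca_em(Y, n_classes=2)
print(np.round(np.sort(pi), 2))
```

Each participant's row of `post` gives the estimated probability of belonging to each class, which is what would be interpreted as "farm type".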

I have just one more question for you: by "measurement invariance," are you referring to the fact that the individual items were measured on different scales and may need to be standardized before performing factor analysis? Or to the fact that the scale of the animal contact trait may vary between "farm type" groups (the measurement invariance referred to here)? I don't have plans to look at any interactions between these two latent constructs.

Thanks again-- this has been really helpful!