Machine Learning
Ashley Naimi
Epidemiologists and public health practitioners are increasingly asked to interpret, use, or judge the merits of machine learning and artificial intelligence techniques, yet current epidemiology programs offer limited coverage of these methods. (Asterisks indicate “must reads”!)
-
*Seminal work by Breiman on the distinction between the classical statistical modeling approach and the more recent “algorithmic” modeling approach. Leo Breiman. “Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author).” Statist. Sci. 16 (3) 199–231, August 2001.
-
Excellent introduction to concepts and issues in using machine learning for epidemiologists. Bi Q, Goodman KE, Kaminsky J, Lessler J. What is Machine Learning? A Primer for the Epidemiologist. Am J Epidemiol. 2019;188(12):2222-2239.
https://academic.oup.com/aje/article/188/12/2222/5567515?login=true
-
Attempt to demonstrate the fundamentals behind the super learner. Naimi AI, Balzer LB. Stacked generalization: an introduction to super learning. Eur J Epidemiol. 2018;33(5):459-464.
-
Detailed resource on using the super learner in real data settings. Kennedy, C. “Guide to SuperLearner.” (2017).
https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html
-
Important example of some fundamental constraints on using data with algorithms to predict outcomes fairly. Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153-163.
-
Excellent introduction to machine learning (emphasis on econometrics but very useful for epidemiologists). Mullainathan, S. and J. Spiess, Machine learning: an applied econometric approach. Journal of Economic Perspectives, 2017. 31(2): p. 87-106
-
Important example of how ML algorithms can yield very misleading predictions when deeper aspects of the data-modeling complex are not taken into account. Caruana, R., et al. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015. ACM.
Books
-
*Burkov, Andriy. The hundred-page machine learning book. Vol. 1. Canada: Andriy Burkov, 2019.
-
Burkov, Andriy. Machine Learning Engineering. Vol 1. Quebec: True Positives Inc, 2021.
https://www.amazon.com/Machine-Learning-Engineering-Andriy-Burkov/dp/1999579577
-
Kuhn, Max, and Kjell Johnson. Applied predictive modeling. Corrected 5th printing. Springer, 2016.
Conceptual/Theoretical Understanding & Social Issues
-
*Mitchell, Melanie. Artificial intelligence: A guide for thinking humans. Penguin UK, 2019.
https://en.wikipedia.org/wiki/Artificial_Intelligence:_A_Guide_for_Thinking_Humans
-
Broussard, Meredith. Artificial unintelligence: How computers misunderstand the world. MIT Press, 2018.
Advanced Texts
-
Wasserman, Larry. All of nonparametric statistics. Springer Science & Business Media, 2006.
-
Shalev-Shwartz, Shai, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
https://www.cs.huji.ac.il/w~shais/UnderstandingMachineLearning/
-
Efron, Bradley, and Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press, 2017.
https://hastie.su.domains/CASI/
-
James, Gareth, et al. An introduction to statistical learning. Vol. 112. New York: Springer, 2013.