Electronic Health Record (EHR) phenotyping
Raw EHR data require substantial preprocessing before they can be transformed into research-ready datasets that can be statistically analyzed to answer clinically meaningful questions. Our lab develops computational algorithms for defining, validating and ascertaining multi-modal disease phenotypes in EHR data. The resulting phenotypes are stored in an open-access Data Portal.
More information: Atrial fibrillation phenotyping exemplar in PLOS ONE.
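As a rough illustration of what rule-based phenotype ascertainment can look like, the sketch below flags a patient as a case when any of their records carries a code from a per-source code list, and records the earliest qualifying date and the contributing sources. It is a minimal sketch under stated assumptions, not the lab's algorithm: the `Event` structure, the source names and the primary care codes (`PC_AF_1`, `PC_AF_2`) are hypothetical placeholders. The only real code used is ICD-10 I48 (atrial fibrillation and flutter).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical per-source code lists, for illustration only.
CODE_LISTS = {
    "primary_care": {"PC_AF_1", "PC_AF_2"},  # placeholder codes, not a real terminology
    "hospital": {"I48"},                     # ICD-10 I48: atrial fibrillation and flutter
    "death_registry": {"I48"},
}


@dataclass(frozen=True)
class Event:
    """One coded record for one patient in one data source (assumed layout)."""
    patient_id: str
    source: str
    code: str
    event_date: date


def matches(event):
    """True if the event's code starts with any code listed for its source."""
    codes = CODE_LISTS.get(event.source, set())
    # Prefix matching lets a sub-code such as I48.1 match the parent code I48.
    return any(event.code.startswith(c) for c in codes)


def ascertain_cases(events):
    """Return {patient_id: (earliest qualifying date, set of contributing sources)}."""
    cases = {}
    for ev in events:
        if not matches(ev):
            continue
        first, sources = cases.get(ev.patient_id, (ev.event_date, set()))
        cases[ev.patient_id] = (min(first, ev.event_date), sources | {ev.source})
    return cases
```

A validated phenotype would typically add further rules on top of this, such as exclusion codes, time windows or agreement between sources. Those are left out here to keep the sketch short.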
Unsupervised machine learning for sub-phenotype discovery
There is growing evidence from observational and interventional research that complex diseases, such as type 2 diabetes, asthma and chronic obstructive pulmonary disease (COPD), are composed of distinct sub-phenotypes with different risk factor and prognostic profiles. Our lab develops and evaluates data clustering algorithms to identify, describe and evaluate sub-phenotypes of COPD and heart failure. Identifying these disease subtypes can lead to the development of personalized treatments.
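To make the idea of clustering patients into candidate subtypes concrete, here is a minimal sketch of k-means, one common clustering algorithm. The text does not say which algorithms the lab uses, so k-means is an assumption chosen for brevity. The function name `kmeans`, the deterministic initialisation from the first `k` points, and the idea of passing each patient as a tuple of numeric features are likewise illustrative choices.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Centroids are initialised from the first k points so the result is
    deterministic; real analyses usually use several random restarts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In practice the features would be standardised first, and the number of clusters would be chosen and checked using internal validity measures and clinical review. A cluster becomes a meaningful sub-phenotype only if it shows distinct risk factor or prognostic profiles.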
Supervised machine learning for risk stratification
Most traditional risk prediction approaches rely on regression-based statistical methods and may fail to exploit the richness of electronic health record data. Our lab evaluates supervised machine learning approaches for creating accurate and interpretable risk prediction tools. Our recent research focused on predicting death in patients with coronary artery disease.
Data linkage is the process of identifying and linking individuals across heterogeneous data sources. Working with the Federal University of Bahia, our lab is contributing to the development of scalable probabilistic data linkage methods for linking administrative records for over 140 million participants in Brazil and for evaluating the quality of the linkage using supervised machine learning.
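For readers unfamiliar with probabilistic linkage, the sketch below shows the classical Fellegi-Sunter scoring scheme: each compared field adds a log-likelihood-ratio weight depending on whether the two records agree, and the total is compared against two thresholds. This is a textbook illustration, not the method developed in the collaboration. The field names, the m- and u-probabilities and the thresholds are assumed values chosen only to make the example run.

```python
import math

# Assumed (m, u) per field: m = P(agree | same person), u = P(agree | different people).
# These numbers are illustrative, not estimated from any data.
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "birth_date": (0.98, 0.005),
    "municipality": (0.90, 0.10),
}


def match_weight(rec_a, rec_b):
    """Sum of Fellegi-Sunter agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Three-way decision using two hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"
```

Pairs that fall between the two thresholds are the uncertain ones. They are the natural candidates for manual review, or for a supervised classifier trained to assess linkage quality.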