We are a multidisciplinary research lab of data scientists, epidemiologists, software engineers, and clinicians working at the intersection of medicine and computer science. Our research uses real-world data (electronic health records, administrative data, disease audits, and bespoke disease surveillance sources) and data-driven methods to improve human health and healthcare.
Our lab runs CALIBER, a research platform that provides reproducible phenotyping algorithms for electronic health records. We use data from primary care (Clinical Practice Research Datalink), hospital admissions (Hospital Episode Statistics), socioeconomic deprivation information (using the Index of Multiple Deprivation) and cause-specific mortality data (Office for National Statistics) in England for ~15 million individuals. CALIBER enables researchers to recreate the longitudinal pathway of patients through healthcare settings and study disease onset and progression.
Our lab is committed to producing open and reproducible science. We lead the HDR CALIBER Phenotype Library, a comprehensive, open-access resource providing the research community with information, tools and phenotyping algorithms for UK electronic health records data. The Phenotype Library curates approximately 30,000 controlled clinical terminology terms across 350 rule-based phenotyping algorithms that use structured UK EHR data sources. Phenotypes have been extensively validated by generating six layers of evidence: aetiological, prognostic, case-note review, genetic, cross-EHR and cross-country replication.
Raw EHR data require substantial preprocessing before they can be transformed into research-ready datasets that can be statistically analyzed to answer clinically meaningful questions. Our lab develops computational algorithms for defining, validating and ascertaining multi-modal disease phenotypes in EHR data. The resulting phenotypes are stored in an open-access Data Portal.
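As a rough illustration of what a rule-based phenotype does, the sketch below finds the earliest qualifying diagnosis per patient across linked sources. The record layout, the `Event` type, the function name and the codes in `EXAMPLE_CODELIST` are all hypothetical placeholders chosen for this example; they are not taken from the Phenotype Library.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set

# Hypothetical code list, for illustration only. Real phenotypes rely on
# curated controlled clinical terminology terms.
EXAMPLE_CODELIST: Set[str] = {"DX001", "DX002"}


@dataclass(frozen=True)
class Event:
    patient_id: str
    code: str
    source: str  # e.g. "primary_care" or "hospital"
    event_date: date


def first_diagnosis_dates(events: Iterable[Event], codelist: Set[str]) -> Dict[str, date]:
    """Return, per patient, the earliest date of any event whose code is in codelist."""
    onset: Dict[str, date] = {}
    for ev in events:
        if ev.code not in codelist:
            continue  # event is not part of this phenotype definition
        current = onset.get(ev.patient_id)
        if current is None or ev.event_date < current:
            onset[ev.patient_id] = ev.event_date
    return onset


events = [
    Event("p1", "DX001", "primary_care", date(2015, 3, 1)),
    Event("p1", "DX002", "hospital", date(2014, 7, 9)),
    Event("p2", "ZZ999", "primary_care", date(2016, 1, 5)),
]
onset = first_diagnosis_dates(events, EXAMPLE_CODELIST)
```

Here patient `p1` is assigned the earlier hospital date, while `p2` has no qualifying code and is left out. A production algorithm would typically add further rules, such as source priority or exclusion codes.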
There is a growing body of evidence from observational and interventional research suggesting that complex diseases, such as type 2 diabetes, asthma and chronic obstructive pulmonary disease (COPD), are composed of distinct sub-phenotypes with different risk factor and prognostic profiles. Our lab develops and evaluates data clustering algorithms to identify and describe disease subtypes that can lead to the development of personalized treatments.
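To make the idea of subtype discovery concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method, applied to made-up standardized patient features. The choice of k-means, the feature values and the starting centroids are assumptions for illustration; the source does not say which clustering algorithms the lab uses.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centroids: List[Point], iterations: int = 20):
    """Plain k-means from the given starting centroids; returns (labels, centroids)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each patient joins the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave empty clusters in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical standardized features per patient, e.g. (age at onset, BMI).
patients = [
    (-1.0, -0.9), (-0.8, -1.1), (-1.2, -1.0),
    (1.0, 0.9), (0.9, 1.1), (1.1, 1.0),
]
labels, centres = kmeans(patients, [patients[0], patients[3]])
```

With these well-separated points, the first three patients fall in one cluster and the last three in the other. In practice the number of clusters and their clinical validity would need to be evaluated, for example against prognostic outcomes.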
Most traditional risk prediction approaches rely on regression-based statistical methods and may fail to take into account the richness of electronic health record data. Our lab evaluates supervised machine learning approaches for creating accurate and interpretable risk prediction tools.
Data linkage is the process of identifying and linking individuals across heterogeneous data sources. Working with the Federal University of Bahia, our lab is contributing to the development of scalable probabilistic data linkage methods for linking administrative data for over 140 million participants in Brazil, and to evaluating the quality of the linkage using supervised machine learning.
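For readers unfamiliar with probabilistic linkage, the sketch below scores a candidate record pair with classical Fellegi-Sunter agreement weights. This is a textbook formulation used here as an assumption, not a description of the method developed with the Federal University of Bahia, and the field names and m/u probabilities are invented for the example.

```python
from math import log2
from typing import Dict, Optional

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "municipality": (0.90, 0.05),
}

Record = Dict[str, Optional[str]]


def match_weight(a: Record, b: Record) -> float:
    """Sum of log2 likelihood ratios over fields; higher means more likely a match."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a missing value contributes no evidence either way
        if va == vb:
            total += log2(m / u)  # agreement weight (positive)
        else:
            total += log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


rec_a = {"name": "maria silva", "birth_date": "1980-02-11", "municipality": "salvador"}
rec_b = {"name": "maria silva", "birth_date": "1980-02-11", "municipality": "salvador"}
rec_c = {"name": "joao souza", "birth_date": "1975-09-30", "municipality": "recife"}

same_person_score = match_weight(rec_a, rec_b)
different_person_score = match_weight(rec_a, rec_c)
```

Pairs scoring above a chosen upper threshold would be accepted as links and those below a lower threshold rejected, with the band in between sent for review. Comparing every pair is infeasible at the scale of hundreds of millions of records, so real systems also use blocking or indexing to limit the candidate pairs.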
Preserving Antibiotics through Safe Stewardship (PASS) is a research programme that aims to preserve the effectiveness of antibiotics for years to come by developing antibiotic stewardship interventions tailored to general practice, hospitals, care homes, and the community.
Responding to the public health emergency caused by SARS-CoV-2 and the COVID-19 pandemic, our lab has been actively engaged with several national research initiatives:
Gurdasani D. et al. Vaccinating adolescents in England: a risk-benefit analysis. Journal of the Royal Society of Medicine 10.1177/01410768211052589
Media coverage: Daily Mail, The Guardian, BMJ, Royal Society of Medicine, The Times, Financial Times.
Wilde A. H. et al. The association between mechanical ventilator compatible bed occupancy and mortality risk in intensive care patients with COVID-19: a national retrospective cohort study. BMC Medicine 10.1186/s12916-021-02096-0
Media coverage: Vox, Daily Mail, Independent, Evening Express, BBC News, Yahoo! News.
Eyre M. et al. Impact of baseline cases of cough and fever on UK COVID-19 diagnostic testing rates: estimates from the Bug Watch community cohort study. Wellcome Open Research 10.12688/wellcomeopenres.16304.1
Media coverage: BBC News, The Guardian. Research cited in the Academy of Medical Sciences Preparing for a challenging winter 2020-2021 report.
Banerjee A. et al. Estimating excess 1-year mortality associated with the COVID-19 pandemic according to underlying conditions and age: a population-based cohort study. The Lancet 10.1016/S0140-6736(20)30854-0
Interactive online calculator: OurRisk.Cov
Media coverage: FT, Times, Guardian, BBC, Protagon (GR) and elsewhere.
Lai A. et al. Estimated impact of the COVID-19 pandemic on cancer services and excess 1-year mortality in people with cancer and multimorbidity: near real-time data on cancer care, cancer deaths and a population-based cohort study. BMJ Open 10.1136/bmjopen-2020-043828
Media coverage: The Guardian, Daily Mail, Evening Standard, BMJ.
Using large-scale electronic health records in England, we have developed a simple online tool (OurRisk.Cov) that can calculate and visualize excess deaths over one year from the COVID-19 pandemic based on age, sex, and underlying disease-specific estimates.
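The kind of calculation behind an excess mortality estimate can be sketched as follows. The formula, function name and input values are illustrative assumptions (a simple scenario model in which infected individuals carry a relative risk of death over one year); this is not the implementation of OurRisk.Cov.

```python
def excess_deaths(
    population: int,
    baseline_risk: float,
    infection_rate: float,
    relative_risk: float,
) -> float:
    """Estimate excess 1-year deaths under a simple scenario model.

    Assumed model: only the infected fraction of the population has its
    baseline 1-year mortality risk multiplied by relative_risk.
    """
    if not 0 <= baseline_risk <= 1:
        raise ValueError("baseline_risk must be between 0 and 1")
    if not 0 <= infection_rate <= 1:
        raise ValueError("infection_rate must be between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    infected = population * infection_rate
    # Deaths among the infected beyond those expected at baseline risk.
    return infected * baseline_risk * (relative_risk - 1)


# Hypothetical scenario: 100,000 people in one age, sex and condition stratum,
# 5% baseline 1-year mortality, 10% infected, relative risk of 2.
scenario = excess_deaths(100_000, 0.05, 0.10, 2.0)
```

A stratified estimate would repeat this for each combination of age, sex and underlying condition, using that stratum's own baseline risk, and sum the results.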