Denaxas Lab

We are a multidisciplinary research lab of data scientists, epidemiologists, software engineers, and clinicians working at the intersection of medicine and computer science. Our research uses real-world data (electronic health records, administrative data, disease audits, and bespoke disease surveillance sources) and data-driven methods to improve human health and healthcare.

Research

CALIBER

Our lab runs CALIBER, a research platform that provides reproducible phenotyping algorithms for electronic health records. We use data from primary care (Clinical Practice Research Datalink), hospital admissions (Hospital Episode Statistics), socioeconomic deprivation information (using the Index of Multiple Deprivation) and cause-specific mortality data (Office for National Statistics) in England for ~15 million individuals. CALIBER enables researchers to recreate the longitudinal pathway of patients through healthcare settings and study disease onset and progression.

Recent research examples:

UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. Journal of the American Medical Informatics Association, 2019.

Data resource profile: cardiovascular disease research using linked bespoke studies and electronic health records (CALIBER). International Journal of Epidemiology, 2012.

Open science

Our lab is committed to producing open and reproducible science. We lead the HDR CALIBER Phenotype Library, a comprehensive, open-access resource providing the research community with information, tools and phenotyping algorithms for UK electronic health record data. The Phenotype Library curates approximately 30,000 controlled clinical terminology terms across 350 rule-based phenotyping algorithms that use structured UK EHR data sources. Phenotypes have been extensively validated by generating six layers of evidence: aetiological, prognostic, case-note review, genetic, cross-EHR and cross-country replication.

Recent research examples:

A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. The Lancet Digital Health, 2019.


A semi-supervised approach for rapidly creating clinical biomarker phenotypes in the UK Biobank using different primary care EHR and clinical terminology systems. medRxiv, 2020.

Electronic Health Record (EHR) phenotyping

Raw EHR data require substantial preprocessing before they can be transformed into research-ready datasets that can be statistically analyzed to answer clinically meaningful questions. Our lab develops computational algorithms for defining, validating and ascertaining multi-modal disease phenotypes in EHR data. The resulting phenotypes are stored in an open-access Data Portal.
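
As a rough illustration of what a rule-based phenotyping step can look like, the sketch below scans coded events and keeps the earliest matching record per patient. The event layout, the function name `first_diagnosis` and the code list are assumptions made for this example (the Read entry is a placeholder string); it is not a validated CALIBER phenotype definition.

```python
from datetime import date

# Hypothetical code list: (terminology, code) pairs for one phenotype.
# Illustrative only; the READ entry is a placeholder, not a real code.
AF_CODELIST = {("ICD10", "I48"), ("READ", "AF_READ_PLACEHOLDER")}


def first_diagnosis(events, codelist):
    """Return {patient_id: earliest event date} for events matching the code list."""
    onset = {}
    for patient_id, event_date, terminology, code in events:
        if (terminology, code) not in codelist:
            continue  # event is not part of this phenotype
        if patient_id not in onset or event_date < onset[patient_id]:
            onset[patient_id] = event_date
    return onset


# Toy events combining a primary care source and a hospital source.
events = [
    ("p1", date(2010, 5, 1), "READ", "AF_READ_PLACEHOLDER"),
    ("p1", date(2008, 3, 2), "ICD10", "I48"),
    ("p2", date(2011, 1, 9), "ICD10", "I10"),
]

cases = first_diagnosis(events, AF_CODELIST)
print(cases)
```

Keeping the code list separate from the matching logic means the same function can be reused for any phenotype that is defined purely by a set of codes.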

Recent research examples:

Defining disease phenotypes using national linked electronic health records: a case study of atrial fibrillation. PLoS ONE, 2014.


Bleeding in cardiac patients prescribed antithrombotic drugs: electronic health record phenotyping algorithms, incidence, trends and prognosis. BMC Medicine, 2019.

Unsupervised machine learning for sub-phenotype discovery

A growing body of evidence from observational and interventional research suggests that complex diseases, such as type 2 diabetes, asthma and chronic obstructive pulmonary disease (COPD), are composed of distinct sub-phenotypes with different risk factor and prognostic profiles. Our lab develops and evaluates data clustering algorithms to identify, describe and evaluate disease subtypes that can lead to the development of personalized treatments.

Recent research examples:

Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. BMC Medical Informatics and Decision Making, 2019.

Supervised machine learning for risk stratification

Most traditional risk prediction tools rely on regression-based statistical methods and may not take full advantage of the richness of electronic health record data. Our lab evaluates supervised machine learning approaches for creating accurate and interpretable risk prediction tools.

Recent research examples:

Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data. NIPS ML4H: Machine Learning for Health, arXiv preprint arXiv:1811.11005, 2018.


Prognostic models for people with stable coronary artery disease based on 115,500 patients from the CALIBER study. European Heart Journal, 2012.

Data Linkage

Data linkage is the process of identifying and linking individuals across heterogeneous data sources. Working with the Federal University of Bahia, our lab is contributing to the development of scalable probabilistic data linkage methods for linking administrative data on over 140 million participants in Brazil, and to evaluating the quality of the linkage using supervised machine learning.
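
To give a flavour of probabilistic linkage, here is a minimal sketch of the classic Fellegi-Sunter match weight, in which each field adds log2(m/u) when two records agree and log2((1-m)/(1-u)) when they disagree. It is a textbook formulation shown for illustration, not the method used in the Brazilian linkage work; the field names and the m/u probabilities are hypothetical.

```python
import math

# Hypothetical per-field probabilities (assumed values for illustration):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "name": (0.95, 0.01),
    "birth_date": (0.98, 0.005),
    "municipality": (0.90, 0.10),
}


def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum Fellegi-Sunter log2 agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in fields.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


a = {"name": "maria silva", "birth_date": "1980-01-01", "municipality": "Salvador"}
b = {"name": "maria silva", "birth_date": "1980-01-01", "municipality": "Salvador"}
c = {"name": "ana souza", "birth_date": "1975-06-30", "municipality": "Recife"}

same = match_weight(a, b)       # all fields agree
different = match_weight(a, c)  # no field agrees
print(same, different)
```

In practice, candidate pairs scoring above an upper threshold are accepted as links, those below a lower threshold are rejected, and the band in between is reviewed or passed to a classifier.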




Antibiotic stewardship

Preserving Antibiotics through Safe Stewardship (PASS) is an innovative programme of research to preserve the effectiveness of antibiotics for years to come by developing antibiotic stewardship interventions tailored to general practice, hospitals, care homes, and the community.




COVID-19 research initiatives:

Responding to the current public health emergency caused by the SARS-CoV-2 virus and COVID-19 pandemic, our lab has been actively engaged with several national research initiatives:

  • CONVALESCENCE, funded by the NIHR and UKRI, will analyze data from EHR and longitudinal studies to define and understand long COVID and the impact it has on health and social outcomes.

  • COVID-RED is part of the EU Innovative Medicines Initiative and will evaluate the use and performance of a CE-marked wearable device, which uses sensors to measure breathing rate, pulse rate, skin temperature, and heart rate variability for the early detection and monitoring of COVID-19 in general and high-risk populations (read trial protocol for more information).

  • BHF CVD-COVID-UK aims to understand the relationship between COVID-19 and cardiovascular diseases such as heart attack, heart failure, stroke, and blood clots in the lungs through analyses of de-identified, linked, nationally collated healthcare datasets across the four nations of the UK.

  • DECOVID uses detailed and frequently updated health data from hospitals as the COVID-19 pandemic unfolds, to allow clinicians and researchers to generate rapid and robust insights that can lead to more effective clinical treatment strategies, helping patients, healthcare professionals and society.

  • BHF COVIDITY-COHORT complements CVD-COVID-UK and will harness the power of over a dozen large UK cohort studies, where participants have already provided a wealth of information about their cardiovascular health, and general health and wellbeing.

Engagement with policy-makers

Research output

  • Thygesen J. et al. Understanding COVID-19 trajectories from a nationwide linked electronic health record cohort of 56 million people: phenotypes, severity, waves & vaccination. The Lancet Digital Health (accepted), preprint

  • Wood A. et al. Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource. BMJ. 10.1136/bmj.n826

  • Banerjee A. et al. Clinical academic research in the time of Corona: a simulation study in England and a call for action. PLOS ONE 10.1371/journal.pone.0237298

  • Dennis J. M. et al. Type 2 Diabetes and COVID-19-Related Mortality in the Critical Care Setting: A National Cohort Study in England, March-July 2020. Diabetes Care 10.2337/dc20-1444

  • Katsoulis M. et al. Estimating the effect of reduced attendance at emergency departments for suspected cardiac conditions on cardiac mortality during the COVID-19 pandemic. Circulation: Cardiovascular Quality and Outcomes. 10.1161/CIRCOUTCOMES.120.007085

  • CVD-COVID-UK Consortium. The 4C Initiative (Clinical Care for Cardiovascular disease in the COVID-19 pandemic): monitoring the indirect impact of the coronavirus pandemic on services for cardiovascular diseases in the UK. 10.1101/2020.07.10.20151118

  • Banerjee A. et al. Excess deaths in people with cardiovascular diseases during the COVID-19 pandemic. European Journal of Preventive Cardiology. 10.1093/eurjpc/zwaa155

  • Mateen B. et al. A geotemporal survey of hospital bed saturation across England during the first wave of the COVID-19 Pandemic. BMJ Open. 10.1136/bmjopen-2020-042945

  • Katsoulis M. et al. Obesity during the COVID-19 pandemic: cause of high risk or an effect of lockdown? A population-based electronic health record analysis in 1 958 184 individuals. Public Health. 10.1016/j.puhe.2020.12.003

Funding:

  • Awarded funding from the EHDEN COVID-19 Rapid Collaboration Call to transform the UK Biobank to the OHDSI OMOP common data model to facilitate COVID-19 research.
  • Funded by the Innovative Medicines Initiative to evaluate wearables for remote diagnosis and monitoring of COVID-19 through the COVID-RED programme.
  • Part of a large cross-institutional project, the CONVALESCENCE study, funded by the NIHR and UKRI, on defining and understanding Long COVID.

Tools

Online calculator for excess deaths due to COVID-19

Using large-scale electronic health records in England, we have developed a simple online tool (OurRisk.Cov) that can calculate and visualize excess deaths over one year from the COVID-19 pandemic based on age, sex, and underlying disease-specific estimates.

Estimating excess 1-year mortality associated with the COVID-19 pandemic according to underlying conditions and age: a population-based cohort study. The Lancet, 2020.
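
To make the idea of an excess-death estimate concrete, here is a minimal sketch under an assumed multiplicative model: excess deaths = population × infection rate × baseline 1-year mortality × (relative risk − 1). The function name, the parameter names and all input values are hypothetical, and this is not necessarily the exact model implemented in OurRisk.Cov.

```python
def excess_deaths(population, baseline_mortality, relative_risk, infection_rate):
    """Estimate 1-year excess deaths under a simple multiplicative model (assumed)."""
    if not 0 <= baseline_mortality <= 1 or not 0 <= infection_rate <= 1:
        raise ValueError("mortality and infection rate must be proportions in [0, 1]")
    infected = population * infection_rate
    # Deaths expected without the pandemic effect among those infected.
    expected_deaths = infected * baseline_mortality
    # The relative risk scales the baseline; the excess is the part above 1.
    return expected_deaths * (relative_risk - 1)


# Hypothetical inputs for one age, sex and condition stratum.
estimate = excess_deaths(
    population=1000,
    baseline_mortality=0.05,
    relative_risk=1.5,
    infection_rate=0.10,
)
print(estimate)
```

Summing such estimates over strata defined by age, sex and underlying condition would give a population-level figure under the same assumptions.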

People

Academic members

Cai Ytsma

Health Data Scientist

Dr Ana Torralbo

Senior Research Fellow in Health Data Science

Dr Arturo Gonzalez-Izquierdo

Senior Research Associate in Electronic Health Records

Dr Václav Papež

Clinical Data Scientist

Muhammad Qummer Ul Arfeen

Data Manager

Natalie Fitzpatrick

Research Data Coordinator

Natalie Zelenka

Senior Research Fellow in Health Data Science

Spiros Denaxas

Professor of Biomedical Informatics

Visiting researchers

Colin Josephson

Assistant Professor of Neurology and Community Health Sciences (Univ. of Calgary)

Dr Ghazaleh Fatemifar

Honorary Senior Research Fellow

Dr Maria Pikoula

Clinical Data Scientist

Dr Marina Daskalopoulou

Research Associate

Dr Michalis Katsoulis

Associate Professor in Biomedical Statistics

Prof Marcos Barreto

Assistant Professorial Lecturer

Students

Albert Henry

PhD Candidate

Andre Vauvelle

PhD Candidate

Emma Whitfield

PhD Candidate

Contact