OHDSI — Observational Health Data Sciences and Informatics
Overview
OHDSI (pronounced “Odyssey”) is an international open-science collaborative, established in 2014, that generates evidence from observational health data through transparent, reproducible, multi-database analytics. It maintains the OMOP CDM standard, develops the open-source tools that run on it, and coordinates a global network that, as of 2024, spans 300+ data sources covering nearly 1 billion patient records across 30+ countries. This network enables large-scale pharmacoepidemiological and outcomes research. Where OMOP CDM is the data standard, OHDSI is the community and network that makes it operationally useful.
What OHDSI Produces
Open-Source Tools
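OHDSI develops and maintains HADES (Health Analytics Data-to-Evidence Suite), a collection of R packages for observational research on OMOP CDM data, alongside standalone applications such as ATLAS, White Rabbit, and Rabbit in a Hat. The main tools are: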
| Tool | Purpose |
|---|---|
| ATLAS | Web-based cohort definition, characterisation, incidence rates, population-level estimation; no-code interface |
| ACHILLES | Automated data quality and characterisation; generates 170+ statistics on CDM contents |
| CohortDiagnostics | Diagnose phenotype algorithms across databases |
| FeatureExtraction | Extract covariates for patient-level prediction and estimation |
| PatientLevelPrediction | Train and validate ML models for clinical outcomes |
| CohortMethod | Population-level effect estimation (comparative cohort studies) |
| SelfControlledCaseSeries | Self-controlled case series analysis |
| Cyclops | Large-scale regression engine (L1/L2 regularised) |
| DataQualityDashboard | Systematic data quality assessment against the Kahn framework |
| White Rabbit | Source data profiling for ETL design |
| Rabbit in a Hat | Visual ETL mapping tool |
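To make "cohort definition" concrete, here is a minimal sketch of the kind of query a cohort tool runs against CDM tables. It is not ATLAS or HADES code. It assumes a toy in-memory SQLite database with cut-down versions of the `person` and `condition_occurrence` tables, made-up rows, and a concept id used only as a placeholder; `build_cohort` is a hypothetical helper name. The cohort rule is deliberately simple: a person enters on the date of their earliest record for the chosen condition concept.

```python
import sqlite3

# Toy stand-in for two OMOP CDM tables; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
""")

# Placeholder concept id for illustration; real studies pick concepts from the vocabulary.
ILLUSTRATIVE_CONCEPT_ID = 201826

# Made-up rows, not real patient data.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1950), (2, 1962), (3, 1975)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (1, 1, 201826, "2019-03-01"),
        (2, 1, 201826, "2018-06-15"),  # earlier record for the same person
        (3, 2, 999999, "2020-01-10"),  # different concept, so person 2 is excluded
        (4, 3, 201826, "2021-11-30"),
    ],
)


def build_cohort(connection, concept_id):
    """Return (person_id, cohort_start_date) using each person's earliest matching record."""
    return connection.execute(
        """
        SELECT person_id, MIN(condition_start_date) AS cohort_start_date
        FROM condition_occurrence
        WHERE condition_concept_id = ?
        GROUP BY person_id
        ORDER BY person_id
        """,
        (concept_id,),
    ).fetchall()


cohort = build_cohort(conn, ILLUSTRATIVE_CONCEPT_ID)
for person_id, start_date in cohort:
    print(person_id, start_date)
```

Because every site stores its data in the same table and column layout, a query like this can run unchanged against each database. Real cohort definitions usually add observation-period requirements, inclusion criteria, and exit rules that this sketch leaves out.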
Network Studies
OHDSI coordinates multi-database network studies: analytical studies that run identical code at every participating database and produce site-level results, which are then meta-analysed centrally. The OHDSI COVID-19 Studies (2020) characterised 34,000+ hospitalised patients across 34 databases in 13 countries within 3 weeks of the start of the pandemic. LEGEND-T2DM compared the safety of second-line type 2 diabetes treatments across 11 databases and 1.5 million patients. OHDSI also runs neurological disease network studies covering the incidence and treatment patterns of dementia, epilepsy, and Parkinson’s disease.
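The central meta-analysis step can be sketched as follows. This is a simplified illustration, not OHDSI's actual pipeline: it uses fixed-effect inverse-variance pooling, whereas real network studies may use random-effects or other models. The function name `pool_fixed_effect` and the site-level numbers are hypothetical. Each site is assumed to share only an effect estimate on the log scale and its standard error, not patient-level data.

```python
import math


def pool_fixed_effect(estimates):
    """Pool site-level estimates with inverse-variance weights.

    estimates: list of (log_effect, standard_error) tuples, one per site.
    Returns (pooled_log_effect, pooled_standard_error).
    """
    # Each site's weight is the inverse of its variance, so precise sites count more.
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical site-level hazard ratios and standard errors (made-up numbers).
site_results = [
    (math.log(0.80), 0.10),
    (math.log(0.90), 0.20),
    (math.log(0.85), 0.10),
]

pooled_log_hr, pooled_se = pool_fixed_effect(site_results)

# Convert back from the log scale and form an approximate 95% interval.
hazard_ratio = math.exp(pooled_log_hr)
ci_low = math.exp(pooled_log_hr - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"pooled HR {hazard_ratio:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Pooling on the log scale keeps ratio estimates symmetric around the null. The pooled standard error is never larger than the smallest site-level one, which is the statistical payoff of running the same analysis across many databases.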
Phenotype Library
OHDSI maintains an open Phenotype Library of validated ATLAS cohort definitions for hundreds of clinical conditions, so researchers can reuse existing phenotype algorithms for disease cohorts instead of rebuilding them for each study.
Connections
- Maintains: OMOP CDM
- Vocabularies (via Athena): SNOMED CT, LOINC, ICD-10, MedDRA, RxNorm, CCAM, HPO, MONDO
Resources
- https://ohdsi.org
- https://ohdsi.github.io/Hades/ (HADES R packages documentation)
- https://atlas-demo.ohdsi.org (ATLAS public demo instance)
- https://athena.ohdsi.org (Athena — OMOP vocabulary browser)
- https://www.ohdsi.org/data-standardization/ (CDM and tools overview)

