Standards, Data Models and Ontologies

This index covers all data format standards, metadata frameworks, terminologies, and ontologies in the graph.

Research data management

Metadata vocabularies, provenance standards, and persistent identifier schemes that enable FAIR data management across all research domains.

  • DCAT (Data Catalog Vocabulary) is the W3C standard for dataset discoverability.
  • Dublin Core is a 15-element metadata standard used as a base metadata layer across repositories.
  • MIABIS (Minimum Information About BIobank data Sharing) is a metadata standard for describing biobanks and sample collections.
  • OBI (Ontology for Biomedical Investigations) is a formal vocabulary for describing study protocols and experimental designs.
  • PROV-O (Provenance Ontology) is the W3C standard underlying NIDM and DataLad provenance tracking.
  • RRID (Research Resource Identifiers) are persistent identifiers for reagents, software, and core facilities.
  • ROR (Research Organization Registry) provides persistent identifiers for research institutions.
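
As an illustration of what a 15-element base layer means in practice, the sketch below lists the element names of the Dublin Core Metadata Element Set and flags record keys that fall outside it. The helper name `unknown_elements` and the sample record are hypothetical, and real repositories usually namespace these terms (for example with a `dc:` prefix), which this sketch does not handle.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict) -> set:
    """Return the keys of `record` that are not Dublin Core elements."""
    return {key for key in record if key.lower() not in DC_ELEMENTS}


# Hypothetical record, for illustration only.
record = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "doi": "10.1234/example",  # not a Dublin Core element name
}
print(unknown_elements(record))
```

A key such as `doi` would normally be carried in the `identifier` element rather than as a separate field.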

Neuroimaging

Data format standards, metadata frameworks, and annotation vocabularies for brain imaging data.

  • BIDS (Brain Imaging Data Structure) is the community standard for organising neuroimaging datasets.
  • CIFTI (Connectivity Informatics Technology Initiative) is a format that combines cortical surface and subcortical volume data in a single file, developed by the Human Connectome Project.
  • Cognitive Atlas is an ontology of cognitive processes and tasks used for task annotation.
  • DICOM (Digital Imaging and Communications in Medicine) is the standard clinical imaging format, typically converted to NIfTI for analysis.
  • NIfTI (Neuroimaging Informatics Technology Initiative) is the widely adopted file format for neuroimaging data prepared for analysis.
  • NIDM (Neuroimaging Data Model) is a PROV-O-based standard for neuroimaging experiment provenance.
  • Open Brain Consent provides GDPR-compatible model informed consent forms for open data sharing.
  • openMINDS (Open Metadata Initiative for Neuroscience Data Structures) is the metadata framework required for data deposited on EBRAINS.
  • UBERON (Uber Anatomy Ontology) is a cross-species anatomy ontology used for brain region annotation.
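
BIDS organises datasets largely through file naming: a file name is a chain of `key-value` entities separated by underscores, followed by a suffix and an extension. The sketch below splits such a name into its parts. The function name `parse_bids_name` is hypothetical, and the sketch assumes a well-formed name; it does not validate entity order or allowed values, which the BIDS validator does.

```python
def parse_bids_name(filename: str) -> tuple:
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz").
    stem, dot, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix; the rest are key-value pairs.
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, dot + extension


entities, suffix, extension = parse_bids_name("sub-01_ses-02_task-rest_bold.nii.gz")
print(entities, suffix, extension)
```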

Bioimaging

File formats and metadata standards for biological microscopy and bioimaging data.

  • OME File Formats are the Open Microscopy Environment (OME) formats for microscopy data: OME-TIFF for archival use and OME-Zarr for cloud-native datasets.
  • REMBI (Recommended Metadata for Biological Images) is the community metadata framework for bioimaging datasets.
  • SWC is a format for three-dimensional neuronal and glial morphology reconstructions.
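
An SWC file is plain text: comment lines start with `#`, and every other line describes one node with seven whitespace-separated fields (node id, type code, x, y, z, radius, parent id), where a parent id of -1 marks a root. A minimal reader, assuming well-formed input, might look like the sketch below. The names `SwcNode` and `parse_swc` and the two-node sample are hypothetical.

```python
from typing import NamedTuple


class SwcNode(NamedTuple):
    node_id: int
    type_code: int
    x: float
    y: float
    z: float
    radius: float
    parent_id: int  # -1 marks a root node


def parse_swc(text: str) -> list:
    """Parse SWC text into a list of SwcNode, skipping comments and blank lines."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        n, t, x, y, z, r, p = line.split()
        nodes.append(
            SwcNode(int(n), int(t), float(x), float(y), float(z), float(r), int(p))
        )
    return nodes


# Hypothetical two-node reconstruction, for illustration only.
sample = """# example morphology
1 1 0.0 0.0 0.0 5.0 -1
2 3 10.0 0.0 0.0 1.0 1
"""
nodes = parse_swc(sample)
print(len(nodes), nodes[0].parent_id)
```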

Neurophysiology

File formats and annotation standards for electrophysiology, EEG, and computational neuroscience data.

  • BrainVision is the Brain Products three-file EEG format (.vhdr/.vmrk/.eeg).
  • EDF (European Data Format) is a widely used format for clinical EEG, iEEG, and polysomnography.
  • HED (Hierarchical Event Descriptors) provides structured event annotation for time-stamped data.
  • NWB (Neurodata Without Borders) is a community standard for electrophysiology and calcium imaging data.
  • SCORE (Standardized Computer-based Organized Reporting of EEG) is a clinical EEG reporting terminology encoded as a HED library schema.
  • SPARC SDS (SPARC Dataset Structure) is the standard for organising peripheral nervous system datasets.
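
The three-file layout of BrainVision recordings means a header file is only usable alongside its marker and data files. The sketch below derives the expected companion paths from a `.vhdr` path. It assumes all three files share one base name, which is common but not guaranteed: the header itself names its data and marker files, so a robust reader should take them from there. The function name `brainvision_companions` and the example path are hypothetical.

```python
from pathlib import Path


def brainvision_companions(header_path: str) -> dict:
    """Return the header, marker, and data paths for a BrainVision recording.

    Assumption: the .vmrk and .eeg files share the header's base name.
    """
    header = Path(header_path)
    if header.suffix.lower() != ".vhdr":
        raise ValueError(f"expected a .vhdr header file, got {header.name!r}")
    return {
        "header": header,
        "markers": header.with_suffix(".vmrk"),
        "data": header.with_suffix(".eeg"),
    }


files = brainvision_companions("sub-01_task-rest_eeg.vhdr")
print({role: path.name for role, path in files.items()})
```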

Computational neuroscience

Data model standards for describing computational neuroscience models. For morphological reconstruction formats used as model input, see SWC in Bioimaging above.

  • NeuroML (Neural Open Markup Language) is the data model for simulator-independent descriptions of computational neuron and network models.

Genomics and single-cell

Sequencing file formats, variant standards, and single-cell data formats covering the pipeline from raw reads through to annotated expression matrices.

  • AnnData (Annotated Data) is the standard format (h5ad) for single-cell genomics data in the scverse ecosystem.
  • Cell Ontology is the OBO Foundry ontology for cell types, used for single-cell data annotation.
  • FASTQ is the standard format for raw sequencing reads.
  • GO (Gene Ontology) covers biological process, molecular function, and cellular component for gene products.
  • Phenopackets is the GA4GH standard linking clinical phenotypes to genomic data.
  • SAM-BAM-CRAM (Sequence Alignment Map and its binary and compressed forms) are the standard aligned sequencing read formats.
  • VCF (Variant Call Format) is the standard format for genomic variant data, deposited in EVA and dbSNP.
  • VRS (Variation Representation Specification) is the GA4GH standard for precise, globally unique variant identifiers.
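
To make the raw-reads end of this pipeline concrete: a FASTQ record is four lines, namely an `@`-prefixed read name, the sequence, a `+` separator line, and a quality string of the same length as the sequence. The reader below is a minimal sketch under the assumption that sequences are not wrapped across multiple lines. The names `FastqRecord` and `read_fastq` and the sample read are hypothetical, and production pipelines would normally use an established parser instead.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line (unwrapped) FASTQ input."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical single-read input, for illustration only.
sample = "@read1\nACGT\n+\nIIII\n"
records = list(read_fastq(sample.splitlines()))
print(records)
```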

Clinical data models and interoperability

Data models and exchange standards for structuring, querying, and sharing clinical and health data across systems and institutions.

  • CDISC (Clinical Data Interchange Standards Consortium) provides clinical trial data standards for regulatory submissions.
  • HL7 FHIR (Fast Healthcare Interoperability Resources) is the EHDS-mandated standard for EHR exchange.
  • OMOP CDM (Observational Medical Outcomes Partnership Common Data Model) is the OHDSI model for federated observational health research.
  • openEHR (open Electronic Health Record) is a semantic EHR specification built around reusable archetypes and templates.

Clinical classification and coding

Terminologies and classification systems for diagnoses, procedures, observations, and research data coding in clinical and health settings.

  • CCAM (Classification Commune des Actes Médicaux) is the French national procedure classification used in SNDS and AP-HP billing data.
  • ICD-10 (International Classification of Diseases, 10th Revision) is the WHO disease classification.
  • ICD-11 (International Classification of Diseases, 11th Revision) is the updated WHO classification in force since 2022.
  • ICD-O-3 (International Classification of Diseases for Oncology, 3rd Edition) is the WHO/IARC dual-axis tumour classification for cancer registries.
  • LOINC (Logical Observation Identifiers Names and Codes) is the international standard for identifying lab tests and clinical observations.
  • MeSH (Medical Subject Headings) is the NLM controlled vocabulary used for PubMed indexing.
  • OSIRIS is the French national minimum dataset for oncology clinical and genomic data sharing.
  • SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is a comprehensive clinical terminology and a core vocabulary in OMOP CDM and HL7 FHIR.

Drug and chemical terminologies

Controlled vocabularies for drugs, chemicals, and adverse events used in pharmacological research and clinical trials.

  • ATC (Anatomical Therapeutic Chemical classification) is the WHO international drug classification used in drug utilisation research and an OMOP CDM vocabulary.
  • ChEBI (Chemical Entities of Biological Interest) is the EMBL-EBI ontology covering drugs, metabolites, and neurotransmitters.
  • MedDRA (Medical Dictionary for Regulatory Activities) is the international terminology for adverse event coding in regulatory submissions.
  • NCIT (NCI Thesaurus) is the cancer and clinical research terminology used in CDISC submissions.
  • RxNorm is the NLM standard for clinical drug names and the primary drug vocabulary in OMOP CDM.

Disease, phenotype, and variant curation

Ontologies and reference resources for classifying diseases, annotating phenotypes, and curating the clinical significance of genomic variants.

  • ADO (Alzheimer’s Disease Ontology) covers biomarkers, staging, and genetics for Alzheimer’s cohort data annotation.
  • ERN Vocabularies (European Reference Network Vocabularies) are the ERN-RND and ERN-EpiCARE patient registry terminologies.
  • HPO (Human Phenotype Ontology) is the primary vocabulary for phenotypic abnormalities in rare disease genomics.
  • MP (Mammalian Phenotype Ontology) is the OBO Foundry vocabulary for mouse, rat, and other mammalian phenotypes.
  • MONDO (Monarch Disease Ontology) harmonises ICD-10, OMIM, and ORDO into a single disease hierarchy.
  • NBO (Neurobehavior Ontology) describes behavioural phenotypes in humans and model organisms.
  • OMIM (Online Mendelian Inheritance in Man) is a curated compendium of gene-disease relationships.
  • ORDO (Orphanet Rare Disease Ontology) is the European standard classification for rare diseases, including rare neurological diseases.
  • uPheno (Unified Phenotype Ontology) integrates species-specific phenotype ontologies into a single cross-species representation.