About

Hi! What follows is a presentation about why and how this website exists. Feel free to contact me with questions and ideas.

Shaped by principles (1/3)

Researchers & engineers create open knowledge with data standards & data repositories
Lawyers & ethicists protect the individual, creating regulations such as the GDPR and HIPAA
Nations & regions defend security & sovereignty with national & regional laws

graph TD


  r([researchers, engineers]) -->|open knowledge| fair([data standards<br/>& repositories])
  l([lawyers, ethicists]) -->|protect the individual| dp([GDPR, HIPAA,<br/>health codes])
  s([nations, regions]) -->|security & sovereignty| nat([national &<br/>regional laws])

Note

Researchers and engineers need to share knowledge and make data and methods accessible. That gives us FAIR principles, and bottom-up data standards, repositories, and tools for sharing.

Lawyers, ethicists, and data-protection authorities shield people from misuse of their data, especially health and biometric data that cannot be recalled once leaked. That gives us GDPR, HIPAA, consent frameworks, and controlled access.

Nations and regions keep (sensitive) data within trusted borders for security, sovereignty, and the protection of economic interests. This force is increasingly visible, and a reason why parts of the landscape are diverging, for example between the EU and the US.

Shaped by principles (2/3)

Researchers & engineers create open knowledge with data standards & data repositories
Lawyers & ethicists protect the individual, creating regulations such as the GDPR and HIPAA
Nations & regions defend security & sovereignty with national & regional laws

graph TD


  r([researchers, engineers]) -->|open knowledge| fair([data standards<br/>& repositories])
  l([lawyers, ethicists]) -->|protect the individual| dp([GDPR, HIPAA,<br/>health codes])
  s([nations, regions]) -->|security & sovereignty| nat([national &<br/>regional laws])

  fair .-> net([network of rules,<br/>standards & infrastructures])
  dp .-> net
  nat .-> net

Note

You probably agree with all three. However, they sometimes pull in the same direction and sometimes in different ones, leaving a single complex network of standards, tools, and repositories, embedded in rules, regulations, and policies.

Shaped by principles (3/3)

Researchers & engineers create open knowledge with data standards & data repositories
Lawyers & ethicists protect the individual, creating regulations such as the GDPR and HIPAA
Nations & regions defend security & sovereignty with national & regional laws

graph TD

  wg([working groups]) .-> r
  wg .-> l
  wg .-> s
  
  r([researchers, engineers]) -->|open knowledge| fair([data standards<br/>& repositories])
  l([lawyers, ethicists]) -->|protect the individual| dp([GDPR, HIPAA,<br/>health codes])
  s([nations, regions]) -->|security & sovereignty| nat([national &<br/>regional laws])

  fair .-> net([network of rules,<br/>standards & infrastructures])
  dp .-> net
  nat .-> net

Note

Working groups, committees, and joint actions try to align rules, tools, and practices, and translate between domains. Examples are CoSO, EHDS joint actions, and standards bodies such as GA4GH and INCF.

Typical questions

Sharing data: Where, and what, can or must I share, and on what terms?
Reuse data: What data is out there, where can I find it, and how can I use it?
Shape policy: On which rules and practices do we agree, where not, and how can I participate?
Always: What is true, up to date, and what does it mean?

Note

As a result, similar questions face anyone working with neuroscience data.

If you produce data: where, and what, may or must you publish, and on what terms?

If you reuse data: what is out there, where is it, in what format, and under which conditions for use?

If you shape policy: where do we (dis)agree and who is making the decisions? If you want to change something, where do you engage?

And underneath all of them, the hardest one: how do you know that what you are reading is true, current, and that it means what you think it means? Especially when the network keeps shifting, with new tools and new rules. Navigating it with confidence is a challenge.

5 pillars

Curated nodes of actors, standards, resources and governing entities
Explanations in plain readable language for all audiences
Connections showing meaningful relationships with standardized vocabulary
Verified by primary sources and experts
Maintained openly in line with FAIR principles and practices
Editorial perspectives (next slide)

Note

The Open Neuroscience Graph is a response to these challenges: it provides a map, built around five pillars.

It is curated: nodes for the actors who do the research, the standards that encode the data, the resources that store and process it, and the governing entities that regulate it. Not everything, but the entities that matter and connect and are relevant for neuroscience.

Each node is explained in plain language, readable by a newcomer and a policymaker, not only by a specialist already inside one corner of the field.

The nodes are joined by named connections that state a concrete relationship: which repository implements which standard, which mandate routes to which platform, which body governs which standard. You can follow the relations rather than infer them.

It is verified against primary sources and, increasingly, by the experts in each community, so the claims can be trusted and the source is on the node.

It is maintained openly by design: a living map that knows where its own edges are, corrected in the open as the field moves. That is also its honest limit: it is semi-complete on purpose, and it improves by being used and challenged.

On top of them sits an editorial layer that slices the graph into perspectives, which is where we point the reader next.

Editorial perspectives vs. structure

practical, e.g. sharing your data
domain, e.g. genomics, neuroimaging
regional, e.g. France, Europe, Japan
general, e.g. Open access publishing

graph TD
  p1([practice:<br/>sharing your data])
  p2([domain:<br/>genomics])
  p3([region:<br/>France])
  p1 -.-> bids([BIDS])
  p1 -.-> ega([EGA])
  p2 -.-> ega
  p2 -.-> vcf([VCF])
  p3 -.-> ega
  p3 -.-> snds([SNDS])
  p4([general:<br/>open access publishing])
  p4 -.-> hal([HAL])
  p4 -.-> plans([Plan S])

Note

The Open Neuroscience Graph provides curated views on the network that discuss the nodes in the contexts that matter: for example a practical guide such as Sharing your data, a domain such as Genomics or Neuroimaging, or a country or region such as France, Europe, or Japan.

This means that the same node appears in several of them. E.g. EGA sits in the genomics perspective, the France perspective, and the sharing-your-data guide at once, while a node like the VCF variant format belongs only to genomics.

This layer is editorial. The perspective pages can grow, change, or expand as a field develops, without changing the underlying nodes.
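The many-to-many relation between perspectives and nodes can be sketched in a few lines of Python. This is only an illustration, not how the site stores its perspectives: the table is copied from the diagram on this slide, and the names `PERSPECTIVES` and `perspectives_of` are invented for the example.

```python
# Perspective -> nodes it discusses, copied from the diagram on this slide.
# The variable and function names are hypothetical, chosen for illustration.
PERSPECTIVES = {
    "practice: sharing your data": {"BIDS", "EGA"},
    "domain: genomics": {"EGA", "VCF"},
    "region: France": {"EGA", "SNDS"},
    "general: open access publishing": {"HAL", "Plan S"},
}


def perspectives_of(node: str) -> list[str]:
    """Return the perspectives in which a node appears, sorted by name."""
    return sorted(name for name, nodes in PERSPECTIVES.items() if node in nodes)


if __name__ == "__main__":
    print(perspectives_of("EGA"))  # appears in three perspectives
    print(perspectives_of("VCF"))  # appears in one
```

Adding or reshaping a perspective only changes this table. The nodes themselves stay untouched, which is the point of keeping the editorial layer separate.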

Example: Europe as structure

The graph’s edges record dependence: each points up to the authority or standard a node relies on
European open neuroscience has no common root: it is not designed as a whole

graph BT
  zen([Zenodo]) -->|recommendedBy| ec([EC Open Science<br/>policy])
  eosc([EOSC]) -->|implements| fair([FAIR principles])
  fhir([HL7 FHIR]) -->|endorsedBy| ehds([EHDS])
  gdi([GDI]) -->|implements| onemg([1+MG Framework])
  ebrains([EBRAINS]) -->|accepts| bids([BIDS])
  ebrains -->|accepts| nwb([NWB])
  ebrains -->|implements| omind([openMINDS])

Note

The Europe perspective gathers the policies, infrastructures, and bodies shaping the field across the continent.

Start from how the graph actually records relationships. Every edge points from a dependent node up to the authority or standard it relies on:

Zenodo is recommended by the EC open science policy

EOSC implements the FAIR principles

HL7 FHIR is endorsed by the EHDS

GDI implements the 1+MG Framework

EBRAINS accepts the BIDS and NWB data standards and implements the openMINDS metadata framework.

These do not form a single tree but remain disconnected fragments, because European open neuroscience has no common root and each fragment depends on different authorities and standards. Nobody designed the landscape as a whole.
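The claim that these edges fall apart into fragments can be checked mechanically. The sketch below copies the edges from the diagram above and counts connected components while ignoring edge direction. The function name `fragments` is made up for this example, and nothing here reflects how the site itself is implemented.

```python
from collections import defaultdict

# (dependent, label, authority) triples copied from the diagram above.
EDGES = [
    ("Zenodo", "recommendedBy", "EC Open Science policy"),
    ("EOSC", "implements", "FAIR principles"),
    ("HL7 FHIR", "endorsedBy", "EHDS"),
    ("GDI", "implements", "1+MG Framework"),
    ("EBRAINS", "accepts", "BIDS"),
    ("EBRAINS", "accepts", "NWB"),
    ("EBRAINS", "implements", "openMINDS"),
]


def fragments(edges):
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for source, _label, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    for component in fragments(EDGES):
        print(sorted(component))
```

With these seven edges the walk finds five separate components, and only the EBRAINS one holds more than two nodes.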

Example: Europe as narrative

The editorial perspective connects the same entities into one view, and makes it traversable

graph TD
  eu([European open<br/>neuroscience])
  eu -->|open science| ec([EC Open Science<br/>policy])
  eu -->|health sovereignty| ehds([EHDS])
  eu -->|neuro platform| ebrains([EBRAINS])

  ec -->|federated cloud| eosc([EOSC])
  eosc -->|recommended deposit| zen([Zenodo])
  eosc -->|built on| fair([FAIR principles])

  ehds -->|record exchange| fhir([HL7 FHIR])
  ehds -->|federated genomics| gdi([GDI])
  gdi -->|framework| onemg([1+MG Framework])

  ebrains -->|accepts| bids([BIDS])
  ebrains -->|accepts| nwb([NWB])
  ebrains -->|metadata| omind([openMINDS])

Note

The Europe perspective provides a more cohesive view with European open neuroscience as the root, and a path through the tree: open science (the EC policy reaching EOSC, which recommends Zenodo and is built on FAIR), health sovereignty (EHDS endorsing FHIR for record exchange and driving GDI and the 1+MG Framework for federated genomics), and the neuroscience platform (EBRAINS, which accepts the BIDS and NWB data standards and uses the openMINDS metadata framework).

Example: Genomics

Where data goes is set by three axes: pipeline stage (format) × access tier × region
Raw reads → ENA / SRA / DDBJ; human-controlled → EGA / dbGaP; variants → EVA / dbSNP

graph LR
  start([Genomic data])
  start --> eu([EU])
  start --> us([USA])

  eu -->|raw reads| eu_fastq([FASTQ])
  eu -->|aligned reads| eu_bam([SAM-BAM-CRAM])
  eu -->|variants| eu_vcf([VCF])
  eu -->|expression| eu_tsv([TSV])

  us -->|raw reads| us_fastq([FASTQ])
  us -->|aligned reads| us_bam([SAM-BAM-CRAM])
  us -->|variants| us_vcf([VCF])
  us -->|expression| us_tsv([TSV])

  ena([ENA])
  ega([EGA])
  eva([EVA])
  sra([SRA])
  dbgap([dbGaP])
  dbsnp([dbSNP])
  geo([NCBI GEO])

  eu_fastq -->|public| ena
  eu_fastq -->|controlled| ega
  eu_bam -->|public| ena
  eu_bam -->|controlled| ega
  eu_vcf -->|public| eva
  eu_vcf -->|controlled| ega
  eu_tsv -->|controlled| ega

  us_fastq -->|public| sra
  us_fastq -->|controlled| dbgap
  us_bam -->|public| sra
  us_bam -->|controlled| dbgap
  us_vcf -->|public| dbsnp
  us_vcf -->|controlled| dbgap
  us_tsv -->|public| geo
  us_tsv -->|controlled| dbgap
  geo -.->|brokers raw reads| sra
  dbsnp -.->|large variants| dbvar([dbVar])

Note

The Genomics perspective shows the opposite face: not a single policy that converges, but destinations that multiply and diverge. Where genomic data can best be deposited depends on the stage of data processing (raw reads, aligned reads, variants, expression), on the level of sensitivity and therefore the access tier (open, or controlled when re-identification is a risk), and on the region. Raw reads in FASTQ go to the INSDC partners: ENA in Europe, SRA in the US, DDBJ in Japan. Human reads that cannot be openly released go instead to controlled archives, EGA or dbGaP. Variants can often be shared openly, to EVA or dbSNP. Expression matrices go to GEO when open, or to the controlled archives when human.
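The three axes amount to a lookup table. The sketch below copies the routes drawn in the diagram above (EU and USA only) into a dictionary. The names `ROUTES` and `destination` are hypothetical, and the table is a reading aid, not deposit advice.

```python
from typing import Optional

# (region, processing stage, access tier) -> repository, copied from the
# solid arrows in the diagram above. Names are illustrative only.
ROUTES = {
    ("EU", "raw reads", "public"): "ENA",
    ("EU", "raw reads", "controlled"): "EGA",
    ("EU", "aligned reads", "public"): "ENA",
    ("EU", "aligned reads", "controlled"): "EGA",
    ("EU", "variants", "public"): "EVA",
    ("EU", "variants", "controlled"): "EGA",
    ("EU", "expression", "controlled"): "EGA",
    ("USA", "raw reads", "public"): "SRA",
    ("USA", "raw reads", "controlled"): "dbGaP",
    ("USA", "aligned reads", "public"): "SRA",
    ("USA", "aligned reads", "controlled"): "dbGaP",
    ("USA", "variants", "public"): "dbSNP",
    ("USA", "variants", "controlled"): "dbGaP",
    ("USA", "expression", "public"): "NCBI GEO",
    ("USA", "expression", "controlled"): "dbGaP",
}


def destination(region: str, stage: str, access: str) -> Optional[str]:
    """Return the repository for one combination, or None if none is drawn."""
    return ROUTES.get((region, stage, access))


if __name__ == "__main__":
    print(destination("EU", "variants", "public"))
    print(destination("USA", "raw reads", "controlled"))
    print(destination("EU", "expression", "public"))
```

The last call returns `None` because the diagram draws no public EU route for expression data. Keeping the gap explicit is more honest than guessing a default.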

Example: Health data

European regulations are pushing towards coherence across nation states.
Each country is required to designate a Health Data Access Body (HDAB) for secondary use
Two standards are emerging: FHIR for primary exchange, OMOP CDM for secondary reuse

graph TD
  ehds([EHDS regulation])

  ehds -->|France| hdh([Health Data Hub])
  hdh -->|portal over| snds([SNDS])
  hdh -.->|is developing| hdabfr([designated HDAB])
  snds -.-> hdabfr

  ehds -->|Netherlands| hri([Health-RI])
  hri -->|federated holders| lifelines([Lifelines cohort])
  hri -.->|is developing| hdabnl([designated HDAB])
  lifelines -.-> hdabnl

  hdabfr -.-> fromop([OMOP CDM<br/>*secondary reuse*])
  hdabfr <-.-> fhir([FHIR<br/>*primary data exchange*])
  hdabnl <-.-> fhir
  hdabnl -.-> nlomop([OMOP CDM<br/>*secondary reuse*])

  fromop -.-> comb([combined data reuse])
  nlomop -.-> comb

Note

National health systems were built separately and store data differently. The EHDS does not yet mandate a single technical standard, but two have emerged as the pair the ecosystem is converging on: FHIR for primary exchange of records between live systems, and OMOP CDM as the common model for secondary research reuse. The bridging work between them is still ongoing, through the TEHDAS2 joint action.

Each member state must designate a Health Data Access Body (HDAB) as its national access point for secondary use, by 2027. These are still being built (shown dotted). France: the Health Data Hub is the portal over the SNDS and the candidate HDAB. Netherlands: Health-RI coordinates federated holders such as Lifelines and is building the Dutch HDAB. Each HDAB converges on the shared FHIR layer for cross-border exchange, and structures its own data in OMOP for research, which is what allows combined reuse across countries. Different national routes, one shared destination.

Example: Multimodal data & HED

Data can be combined based on meaningful events, such as stimuli or responses
Encoding these events for interoperability requires an event vocabulary
HED provides such a vocabulary, which is supported in community data standards

graph TD
  hed([HED<br/>event vocabulary])
  eeg([EEG study]) -->|annotates with| hed
  fmri([fMRI study]) -->|annotates with| hed
  behav([behavioural study]) -->|annotates with| hed
  hed -->|integrated with| bids([BIDS])
  hed -->|integrated with| nwb([NWB])

Note

BIDS organises a dataset, while HED (Hierarchical Event Descriptors) annotates what happened inside it in a common vocabulary, using HED tags. This allows multimodal data to be synchronized either within a recording or between recordings of the same experiment.
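Why a shared event vocabulary matters can be shown with a toy example. Once two recordings label their events with the same terms, lining them up is a simple grouping step. The event logs below are invented, and the tag strings `stimulus` and `response` are placeholders that have not been checked against the HED schema.

```python
from collections import defaultdict

# Invented event logs: (onset in seconds, event tag).
# The tags are placeholders standing in for terms from a shared vocabulary.
EEG_EVENTS = [(1.0, "stimulus"), (1.4, "response"), (3.0, "stimulus")]
FMRI_EVENTS = [(0.5, "stimulus"), (1.1, "response")]


def onsets_by_tag(recordings):
    """Map each tag to {recording name: [onsets]} across all recordings."""
    index = defaultdict(dict)
    for name, events in recordings.items():
        for onset, tag in events:
            index[tag].setdefault(name, []).append(onset)
    return dict(index)


if __name__ == "__main__":
    aligned = onsets_by_tag({"eeg": EEG_EVENTS, "fmri": FMRI_EVENTS})
    print(aligned["stimulus"])
```

If one recording had written `stim` and the other `stimulus`, the grouping would silently split them. That is the failure a common vocabulary prevents.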

Introduction: Describing data is a shared discipline

Every field must describe its data so others can find and reuse it
The ways of doing this come from information science
Bodies like DCMI and the W3C publish standard ways to describe data
Most fields reuse these standard terms and add domain-specific ones
That documented selection is called an application profile

graph TD
  dcmi([DCMI<br/>standards body])
  w3c([W3C<br/>standards body])
  dcmi -->|publishes| dc([Dublin Core<br/>general descriptive terms])
  w3c -->|publishes| dcat([DCAT<br/>data and catalogue terms])
  dc -->|reused by| ap([application profile<br/>a field's chosen terms])
  dcat -->|reused by| ap
  custom([custom<br/>domain-specific]) -->|added to| ap

Note

One general point before the specific standards. Describing data so others can find, trust, and reuse it is not unique to neuroscience. It is a shared discipline, worked out over decades in library and information science, and the same ideas recur in genomics, health, climate science, and government data.

There are recognised bodies for it. The W3C, the organisation behind the standards of the web, publishes DCAT for describing datasets, alongside older standards such as Dublin Core. These supply ready-made terms for who published a dataset, what it covers, and where to find it.

A field rarely invents its own scheme. It reuses these standard terms and adds domain-specific ones only where nothing standard fits. That documented mix is an application profile, the normal and interoperable way to proceed. What follows are some examples of how neuroscience ends up organising its terms and relations.

Introduction: Vocabularies, ontologies, and data models

A vocabulary defines approved terms
A thesaurus adds broader, narrower, and related links
An ontology adds typed relations
A data model is a set of required (and optional) fields
Metadata is a dataset’s filled-in description

graph TD
  vocab(["vocabulary<br/>approved term: <code>jaguar</code><br/>not <code>panther</code>"])
  thes(["thesaurus<br/>broader: <code>big cat</code><br/>related: <code>Panthera</code>"])
  onto(["ontology<br/><code>jaguar</code> <code>is-a</code> <code>Panthera</code><br/><code>preys-on</code> <code>capybara</code>"])
  vocab -->|adds structure| thes -->|adds structure| onto
  onto -->|supplies terms to| dm(["data model<br/>required: <code>species</code>"])
  dm -->|becomes| meta(["metadata<br/><code>species</code> = <code>jaguar</code>"])

Note

You will hear certain terms used a lot, sometimes seemingly interchangeably, which they are not. We will use the example of describing a jaguar to explain these terms.

A vocabulary is an agreed list of approved words, so everyone writes jaguar and nobody writes panther for the same animal. A thesaurus adds a little structure on top: it records that jaguar sits under big cat and is related to Panthera, without saying exactly how. An ontology does say how: a jaguar is a kind of Panthera and preys on capybara. Because each link has a stated meaning, software can follow it and reason about it. The located in, performs, and acts on links in the diagrams ahead are exactly this kind of stated relationship, which is why those resources are ontologies and not just lists.

The last two words are about description rather than naming. A data model is a form with fields, where each field has to be filled from an approved vocabulary: a species field that only accepts a listed term. A data model is also called a schema (a database schema, a JSON schema), though that word is used loosely elsewhere too. Metadata is that form filled in for one dataset: species = jaguar. So the data model gives the blank fields, vocabularies and ontologies supply the words allowed in them, and the metadata is the finished description.

In practice these really do blur. SNOMED CT has enough structure to be called an ontology by some, MeSH is built as a thesaurus but often used as a plain vocabulary. The order on the slide is the thing to remember: each step adds more structure and more meaning a person or computer can act on.
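The division of labour between vocabulary, data model, and metadata can be sketched as a small validator, using the jaguar example. Everything here is illustrative: the optional `notes` field and the names `DATA_MODEL` and `validate` are assumptions made for the sketch, not part of any real standard.

```python
# Vocabulary: the approved terms for one field (from the jaguar example).
SPECIES_VOCABULARY = {"jaguar"}

# Data model: which fields exist, whether each is required, and which
# vocabulary (if any) its value must come from. "notes" is hypothetical.
DATA_MODEL = {
    "species": {"required": True, "vocabulary": SPECIES_VOCABULARY},
    "notes": {"required": False, "vocabulary": None},
}


def validate(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata is valid."""
    problems = []
    for field, rule in DATA_MODEL.items():
        if field not in metadata:
            if rule["required"]:
                problems.append(f"missing required field: {field}")
            continue
        allowed = rule["vocabulary"]
        if allowed is not None and metadata[field] not in allowed:
            problems.append(f"{field}: '{metadata[field]}' is not an approved term")
    for field in metadata:
        if field not in DATA_MODEL:
            problems.append(f"unknown field: {field}")
    return problems


if __name__ == "__main__":
    print(validate({"species": "jaguar"}))   # the filled-in form is valid
    print(validate({"species": "panther"}))  # term not in the vocabulary
```

The data model supplies the blank form, the vocabulary limits what may go in it, and the dictionary passed to `validate` is the metadata.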

Ontologies: from biology to the clinic

Biology gives rise to traits, traits define diseases, diseases are recorded as diagnoses
A disease can be matched from the genetics that underlie it to the code a clinician enters

graph TD
  bio([biology<br/>cells, anatomy, molecules]) -->|manifests as| trait([traits<br/>phenotypes, behaviour])
  trait -->|grouped into| disease([disease<br/>one identifier])
  disease -->|recorded as| record([clinical record<br/>a coded diagnosis])

Note

Biology (cell types, anatomy, molecular function) gives rise to observable traits (phenotypes and behaviour). Traits define diseases. A disease is recorded in a patient’s record as a diagnosis code. Each step is a different relationship: a trait is a manifestation of biology, a disease is a grouping of traits, a diagnosis code is a record of a disease.

The standard that names each of these maps to the standard that names the next. So a single disease can be matched from the genetics that underlie it to the diagnosis code a clinician enters.

The ontologies for biology and traits, and MONDO and ORDO for disease, follow the shared rules of the OBO Foundry: one term defined in one ontology and reused by the others, under an open licence. That shared discipline is what lets them fit together. The clinical and literature coding systems that appear later (ICD, MeSH, SNOMED CT, and the drug, lab, and procedure codes) are governed separately and do not follow these rules.

Example: Ontologies (1/4) — biology: anatomy, cells, and molecules

Each ontology covers a distinct biological domain and connects to adjacent ones.
Cell type is one aspect of anatomy, performs molecular function, acting on chemicals

graph TD
  cl([Cell Ontology<br/>cell type])
  uberon([UBERON<br/>anatomy])
  go([GO<br/>molecular function])
  chebi([ChEBI<br/>chemical entities])

  cl -->|located in| uberon
  cl -->|performs| go
  go -->|acts on| chebi

Note

Where metadata standards describe how a dataset is structured, ontologies name what it is about: the cell types, brain regions, molecules, traits, and diseases a record refers to. Each ontology covers one kind of thing and links to its neighbours, so a term defined in one can be used consistently in another. All of them are endorsed by the OBO Foundry, which sets shared design rules so the separate ontologies fit together. That endorsement is the same for every ontology, so it is left out of these diagrams.

This first group covers the biological subject matter. A cell type (Cell Ontology) is located in a brain region or other anatomical structure (UBERON) and performs molecular functions (GO), which act on chemical entities (ChEBI) such as drugs and neurotransmitters. This is the biology that the traits, diseases, and diagnoses in the examples that follow rest on.

Example: Ontologies (2/4) — traits: phenotype and behaviour

A phenotype is an observable trait, named separately from the disease behind it
Human and mouse phenotypes use different ontologies, linked by a shared mapping
The mapping lets a trait seen in a mouse be matched to the human equivalent

graph TD
  hpo([HPO<br/>human phenotype])
  mp([MP<br/>mouse, rat<br/>phenotype])
  nbo([NBO<br/>behaviour])

  hpo <-->|co-maintained mapping| mp
  nbo -->|describes behaviour for| hpo
  nbo -->|describes behaviour for| mp

Note

A phenotype is an observable trait, such as a seizure type or a memory deficit, named separately from the disease that produces it (the next group). HPO covers human phenotypes and MP covers the phenotypes of mice, rats, and other model organisms.

The link between them is the vault predicate correspondsWith: two standards that maintain a shared mapping together, where neither is derived from the other. Here it is the Mouse-Human Ontology Mapping Initiative, which lets a trait seen in a mouse model be matched to the human equivalent for disease research. NBO adds behavioural and neurological traits for both. HPO, MP, and NBO are all OBO Foundry ontologies, which is what makes the mapping between them tractable.

Example: Ontologies (3/4) — disease: harmonising the systems

The same disease is coded differently for clinics, genetics, rare disease, and literature
Each system was built for its own purpose, so the codes do not line up
One ontology maps every system to a single research disease identifier

graph BT
  omim([OMIM<br/>Mendelian genetics]) -->|maps to| mondo([MONDO<br/>one research disease ID])
  ordo([ORDO<br/>rare disease]) -->|maps to| mondo
  icd([ICD-10 / ICD-11<br/>clinical coding]) -->|maps to| mondo
  mesh([MeSH<br/>literature]) -->|maps to| mondo

Note

The same disease is recorded in several systems, each built for a different purpose. ICD-10 and its successor ICD-11 are the WHO classifications used for clinical coding, statistics, and billing. OMIM catalogues Mendelian gene-disease relationships. ORDO, produced by Orphanet, is the European rare-disease reference. MeSH indexes the disease literature for PubMed. Because each was built separately, their codes do not line up.

MONDO assigns one identifier per disease and curates a mapping to the matching code in every system, which is why all the arrows point up into it. For example MONDO:0004975 for Alzheimer’s maps to ICD-10 G30, OMIM 104300, and ORPHA:26929. That single set of mappings is what lets a disease be queried across databases that each chose a different classification, whether the starting point is a clinical code, a genetic catalogue, a rare-disease reference, or the literature.

This is also where the OBO Foundry boundary falls. MONDO and ORDO follow OBO’s rules; ICD-10, ICD-11, and MeSH are governed separately and do not. MONDO, an OBO ontology, is the one that reaches across the boundary, mapping out to the clinical and literature systems that sit outside it. So this is not a handover between two separate worlds. An ontology built to OBO’s rules does the work of connecting them.
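How a hub of mappings enables cross-system queries can be sketched with the single example given above. The identifiers are copied from the text. The structure and the names `MONDO_XREFS` and `translate` are assumptions for illustration and do not reflect MONDO's actual file format or API.

```python
# One MONDO identifier with its cross-references, copied from the
# Alzheimer's example in the text above. The layout is hypothetical.
MONDO_XREFS = {
    "MONDO:0004975": {"ICD-10": "G30", "OMIM": "104300", "ORDO": "ORPHA:26929"},
}


def build_reverse_index(xrefs):
    """Index (system, code) pairs back to the MONDO identifier listing them."""
    return {
        (system, code): mondo_id
        for mondo_id, codes in xrefs.items()
        for system, code in codes.items()
    }


def translate(system, code, target_system, xrefs=MONDO_XREFS):
    """Translate a code from one system to another via MONDO, or return None."""
    mondo_id = build_reverse_index(xrefs).get((system, code))
    if mondo_id is None:
        return None
    return xrefs[mondo_id].get(target_system)


if __name__ == "__main__":
    print(translate("ICD-10", "G30", "OMIM"))
    print(translate("OMIM", "104300", "ORDO"))
```

Neither ICD-10 nor OMIM needs to know about the other, because both resolve through the same MONDO identifier. Real mappings are not always one-to-one, so a fuller version would return sets of codes rather than a single value.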

Example: Ontologies (4/4) — the clinic: coding a record

A patient record holds several kinds of fact, each coded in its own terminology
Diagnoses, drugs, labs, procedures, adverse events, and cancer detail each have one
A common data model such as OMOP CDM gives every field a slot and a standard code

graph TD
  omop([OMOP CDM<br/>common data model])

  omop -->|codes with| snomed([SNOMED CT<br/>conditions])
  omop -->|codes with| rxnorm([RxNorm / ATC<br/>drugs])
  omop -->|codes with| loinc([LOINC<br/>measurements])
  omop -->|codes with| ccam([CCAM<br/>procedures])
  omop -->|codes with| meddra([MedDRA<br/>adverse events])
  omop -->|codes with| icdo([ICD-O-3<br/>cancer detail])

Note

A single patient record is not coded by one terminology but by several, one for each kind of fact it holds. Diagnoses use SNOMED CT, drugs use RxNorm (with ATC for drug class), lab results and measurements use LOINC, procedures use CCAM in France, adverse events use MedDRA, and cancer morphology uses ICD-O-3. Each codes one slice of the record, the way the research ontologies each code one kind of biological thing.

This is where the data models come in, and it answers how they relate to the terminologies. A common data model such as the OMOP CDM does not replace these vocabularies. It provides a table for each kind of fact (conditions, drugs, measurements, procedures) and requires each to be coded in a designated standard: conditions in SNOMED CT, drugs in RxNorm, measurements in LOINC. The model is the container, the terminologies are what fill it. CDISC, the clinical-trials model, does the same for trial submissions, drawing its adverse-event coding from MedDRA. The models are covered in the Health and Clinical Trials perspectives.

The same disease runs through all of these examples. A molecular function gives rise to an observable trait; traits define a disease; the disease is recorded as a diagnosis code in a patient’s chart. Because the standard naming each step maps to the one naming the next, a single disease can be matched from the genetics that underlie it to the diagnosis code used in care.

How the graph is built and maintained

A node is a plain text file

Every node originates as one Markdown file in Obsidian
Frontmatter: name, website, status, parent, type/ and domain/ tags, verified
Body: Overview, Connections (labelled edges), Resources
Anyone who can edit text can contribute a node

---
name: Brain Imaging Data Structure
aliases:
  - BIDS
website: https://bids.neuroimaging.io
status: active
founded: 2016
parent_org: BIDS Steering Group
tags:
  - type/datamodel       # directory + graph colour
  - domain/neuroimaging  # research area, used for filtering
verified: true
last_reviewed: 2026-06-01
---

Note

Every node is a single Markdown file, in the Obsidian standard: a YAML frontmatter header and a short body.

The frontmatter in YAML holds name and aliases, website, status and founding year, parent organisation, the type/ and domain/ tags, and a verified flag with the date last checked against primary sources. The type/ tag places the node in one of four families: Actors that do research, Standards that encode data, Resources that store and process it, and Governance that coordinates and regulates it.

The body has three sections:

Overview (a 3-6 sentence plain-language summary)

Connections (the labelled edges, e.g. governedBy: BIDS Steering Group)

Resources (primary-source URLs).
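Because a node is plain text, reading one takes very little code. The sketch below splits a node file into frontmatter and body using only the standard library. It handles just the flat subset shown in the example above (key and value pairs, simple lists, trailing comments) and is no substitute for a real YAML parser. The body text in `EXAMPLE` is a placeholder, and `parse_node` is a hypothetical helper, not part of the site's tooling.

```python
EXAMPLE = """---
name: Brain Imaging Data Structure
aliases:
  - BIDS
status: active
tags:
  - type/datamodel       # directory + graph colour
  - domain/neuroimaging  # research area, used for filtering
verified: true
---
## Overview
Placeholder summary text.
"""


def parse_node(text: str) -> tuple[dict, str]:
    """Split a node file into (frontmatter, body).

    Supports only `key: value` pairs, `- item` lists under a bare `key:`,
    and trailing `# comments`. All values are kept as strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("node file must start with a '---' line")
    end = lines.index("---", 1)  # closing delimiter; ValueError if absent
    frontmatter: dict = {}
    current_key = None
    for raw in lines[1:end]:
        line = raw.split(" #", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if line.lstrip().startswith("- ") and current_key is not None:
            frontmatter[current_key].append(line.lstrip()[2:].strip())
        else:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            frontmatter[current_key] = value if value else []
    body = "\n".join(lines[end + 1:]).strip()
    return frontmatter, body


if __name__ == "__main__":
    meta, body = parse_node(EXAMPLE)
    print(meta["tags"])
    print(body.splitlines()[0])
```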

Nodes and edges

Every entity is a node with labelled connections (edges) to other nodes
The edge is written once, in the dependent node, pointing up to the authority
Labels come from a controlled vocabulary (FAIRsharing, schema.org, and custom)
Reverse connections show automatically as backlinks

graph BT
  on([OpenNeuro]) -->|requires| bids([BIDS])
  nidm([NIDM]) -->|extends| bids
  bids -->|governedBy| sg([BIDS Steering Group])
  bids -->|endorsedBy| incf([INCF])
  bids -.->|backlink, automatic| on

Note

Each node carries labelled connections (edges) to other nodes. A node requires at least one real significant edge: an unconnected node says nothing about how the field is organised.

An edge has a direction and a label. It is written in the more specific or dependent node, pointing up to the authority, standard, or parent it depends on. The authority doesn’t point down to those that depend on it, which prevents edges from accumulating on the authority’s node. However, these backwards connections (backlinks) are shown automatically on the site (and in Obsidian).

The label comes from a controlled vocabulary composed from FAIRsharing (data-flow terms), schema.org, Dublin Core (structural terms), and custom governance terms. Reusing established terms where they exist, and minting our own only where open-neuroscience governance has no standard equivalent, is what makes this an application profile rather than a private vocabulary. The Vocabulary page shows the full list and names each term’s source.
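Writing each edge once and deriving the reverse direction is a simple inversion. The sketch below uses the edges from the diagram above. It illustrates the idea only: on the site the backlinks come from Quartz and Obsidian, not from code like this, and the function name `backlinks` is invented for the example.

```python
from collections import defaultdict

# Each edge is stored once, on the dependent node: (source, label, target).
# Copied from the solid arrows in the diagram above.
EDGES = [
    ("OpenNeuro", "requires", "BIDS"),
    ("NIDM", "extends", "BIDS"),
    ("BIDS", "governedBy", "BIDS Steering Group"),
    ("BIDS", "endorsedBy", "INCF"),
]


def backlinks(edges):
    """Derive, for every target, the list of (source, label) pointing at it."""
    index = defaultdict(list)
    for source, label, target in edges:
        index[target].append((source, label))
    return dict(index)


if __name__ == "__main__":
    print(backlinks(EDGES)["BIDS"])
```

BIDS never lists OpenNeuro or NIDM in its own file, yet both show up as its backlinks. Nothing points at OpenNeuro, so it has no entry at all.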

Inclusion criteria

1. Domain scope: Does the entity operate in a neuroscience data domain?
2. Type-appropriate function: Does it fit a clear type/ tag (see Vocabulary)?
3. Edge generation: Does it have at least one significant (strong) labelable connection?
4. Participation is not enough: Endorsing or belonging is not enough
5. Precedent: Would adding the entity commit the vault to include all of its kind? (if yes, exclude)
6. Removal: Would deleting it leave a dangling link? (if no, exclude)

graph TD
  start([candidate]) --> t1([in scope?])
  t1 -->|no| out([not included])
  t1 -->|yes| t2([existing type?])
  t2 -->|no| out
  t2 -->|yes| t3([1+ edge?])
  t3 -->|no| out
  t3 -->|yes| t4([precedent ok?])
  t4 -->|no| out
  t4 -->|yes| node([node])

Note

A node must pass six tests, in order:

Domain scope: operates in a neuroscience data domain.

Type-appropriate function: clears the bar for its kind (a repository holds data; an institute operates open infrastructure, not just research).

Edge generation: produces at least one labelable connection.

Participation is not enough: endorsing or belonging does not substitute for tests 1-3.

Precedent: would adding it commit the vault to every other entity of the same kind?

Removal: if deleted, would any node be left with a dangling link?
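The six tests can be read as an ordered checklist that stops at the first failure. The sketch below is one possible encoding, under stated assumptions: the field names, the `PARTICIPATION_LABELS` set, and the label strings `endorses` and `memberOf` are hypothetical. In practice each answer is an editorial judgement, not a computed value.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical labels that only express endorsing or belonging (test 4).
PARTICIPATION_LABELS = {"endorses", "memberOf"}


@dataclass
class Candidate:
    """Editorial answers for one candidate node (field names are invented)."""
    name: str
    in_scope: bool
    fits_type: bool
    edge_labels: list = field(default_factory=list)
    sets_precedent: bool = False
    deletion_leaves_dangling_link: bool = True


def first_failed_test(c: Candidate) -> Optional[str]:
    """Apply the six tests in order; return the first failure, or None."""
    if not c.in_scope:
        return "1. domain scope"
    if not c.fits_type:
        return "2. type-appropriate function"
    if not c.edge_labels:
        return "3. edge generation"
    if all(label in PARTICIPATION_LABELS for label in c.edge_labels):
        return "4. participation is not enough"
    if c.sets_precedent:
        return "5. precedent"
    if not c.deletion_leaves_dangling_link:
        return "6. removal"
    return None


if __name__ == "__main__":
    print(first_failed_test(Candidate("OpenNeuro", True, True, ["requires"])))
    print(first_failed_test(Candidate("Some consortium", True, True, ["memberOf"])))
```

Returning the name of the failed test, and not just a yes or no, keeps each exclusion explainable.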

Domain tags

Each data-facing node carries a domain/ tag for its research area
Cross-domain governance and regulatory entities carry none
Tags allow filtering

graph TD
  repo([OpenNeuro])
  t1(["#type/repository"]) -.- repo
  t2(["#domain/neuroimaging"]) -.- repo

Note

Each data-facing node carries a domain/ tag for the research area it serves. Cross-domain governance and regulatory entities carry none. Tags drive filtering: every perspective and domain view is a query over them. Domains: neuroimaging, electrophysiology, genomics, biosamples, bioimaging, behavior, clinical, health, computational, reproducibility.
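A view that is "a query over tags" can be sketched as a set-inclusion filter. The tags for OpenNeuro and BIDS are taken from the examples above. The `type/regulation` tag on EHDS is a guess made for illustration, and `with_tags` is not the site's actual query mechanism.

```python
# Node -> tags. OpenNeuro and BIDS follow the examples above; the type tag
# on EHDS is hypothetical, and it carries no domain/ tag (cross-domain).
NODES = {
    "OpenNeuro": {"type/repository", "domain/neuroimaging"},
    "BIDS": {"type/datamodel", "domain/neuroimaging"},
    "EHDS": {"type/regulation"},
}


def with_tags(nodes, *required):
    """Return the names of nodes carrying every required tag, sorted."""
    wanted = set(required)
    return sorted(name for name, tags in nodes.items() if wanted <= tags)


if __name__ == "__main__":
    print(with_tags(NODES, "domain/neuroimaging"))
    print(with_tags(NODES, "domain/neuroimaging", "type/repository"))
```

Because EHDS has no domain/ tag, it never appears in a domain view. That matches the rule that cross-domain governance entities carry none.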

Built to be found

Registered for discovery: FAIRsharing, bio.tools, w3id, Zenodo DOI
Two targets are themselves nodes (registries the vault catalogues)
The graph’s own discoverability uses the same registeredIn predicate

graph TD
  ong([Open Neuroscience Graph]) -->|registeredIn| fs([FAIRsharing])
  ong -->|registeredIn| bt([bio.tools])
  ong -->|registeredIn| w3([w3id])
  ong -->|registeredIn| zen([Zenodo DOI])

Note

The graph is registered for Findability (FAIR): a FAIRsharing record, a bio.tools entry, a permanent w3id identifier, and a Zenodo DOI for citation. Two targets are themselves nodes (FAIRsharing and bio.tools are registries the vault catalogues), so the graph’s own discoverability uses the same registeredIn predicate it applies to any dataset or identifier.

How it is built and run

Plain text and open tooling, fully transferable by design
Maintainer writes notes in Obsidian, Claude structures and completes them
Maintainer curates in VS Code using Git
Site built with Quartz, hosted on GitLab Pages, with Matomo analytics

graph TD
  maint([Maintainer]) -->|writes raw notes| notes([Obsidian<br/>Raw notes])
  notes -->|scope & evaluate| claude([Claude])
  claude -->|write nodes| nodes([Obsidian<br/>Structured node])
  claude -.->|curate| nodes
  nodes -->|review| vsc([VS Code<br/>any maintainer curates])
  vsc -->|commit| gl([GitLab])
  gl -.->|change log| vsc
  gl -->|build| quartz([Quartz])
  quartz -->|deploy| pages([GitLab Pages])
  pages -->|measure| matomo([Matomo analytics])

Note

The Open Neuroscience Graph is built and maintained by me (Stephen Whitmarsh). I built a system that works for me and allows me to maintain and scale this work, while anticipating future contributions and maintenance by the community. The following is therefore written in the first person, but I explain it because it might be useful for you, as a future contributor, as well.

I use Obsidian when I can, or write directly in TODO.md. I’ve installed Claude MCP, so that it can directly read and write within the vault. I then use it to scope and evaluate new notes and harmonize formatting. The end result is structured in the real vault format, including the YAML frontmatter. Through Git integration in Visual Studio Code, it is easy for me to review, curate, and validate new nodes. From each commit the site is automatically built with Quartz, hosted through GitLab Pages, and measured with Matomo (no cookies). For more details see CONTRIBUTING.md.

Because everything (vault and the build) is shared in Git, any maintainer, on any machine, can clone or pull the vault.

Finally, Quartz is also heavily customized. This is done with custom code in such a way that a rebase of Quartz can be done without problems. This is documented in QUARTZ.md.

Where it lives

Site: openneuroscience.org
Repository: gitlab.com/icm-institute/dac/opensciencegraph
DOI: 10.5281/zenodo.20181900
FAIRsharing: https://fairsharing.org/8243
bio.tools: https://bio.tools/open_neuroscience_graph
w3id: https://w3id.org/openneuroscience/graph
Contact: stephenwhitmarsh@proton.me

Note

The database and code are part of the open neuroscience ecosystem itself, and an effort is made to keep them FAIR. This is important because standards, tools, and actors all change continuously. Contributions, corrections, and suggestions are therefore essential, especially anything that makes this graph (and the field) more inclusive. Editing conventions, inclusion criteria, frontmatter fields, and Dataview query examples are in CONTRIBUTING.md and DATAVIEW.md, or get in touch via the repository or by email.

Citing

Creating and maintaining this resource requires real time and effort, so please acknowledge this in your work by citing: Whitmarsh, S. (2026). Open Neuroscience Graph. Zenodo. https://doi.org/10.5281/zenodo.20181900.
