About
Hi! What follows is a presentation about why and how this website exists. Feel free to contact me with questions and ideas.
Shaped by principles (1/3)
- Researchers & engineers create open knowledge with data standards & data repositories
- Lawyers & ethicists protect the individual, creating regulations such as the GDPR and HIPAA
- Nations & regions defend security & sovereignty with national & regional laws
graph TD r([researchers, engineers]) -->|open knowledge| fair([data standards<br/>& repositories]) l([lawyers, ethicists]) -->|protect the individual| dp([GDPR, HIPAA,<br/>health codes]) s([nations, regions]) -->|security & sovereignty| nat([national &<br/>regional laws])
Note
Researchers and engineers need to share knowledge and make data and methods accessible. That gives us FAIR principles, and bottom-up data standards, repositories, and tools for sharing.
Lawyers, ethicists, and data-protection authorities shield people from misuse of a person’s data, especially with health and biometric data that cannot be recalled once leaked. That gives us GDPR, HIPAA, consent frameworks, and controlled access.
Nations and regions keep (sensitive) data within trusted borders for security, sovereignty, and protecting economic interest. This force is increasingly visible, and a reason why parts of the landscape are diverging, for example between the EU and the US.
Shaped by principles (2/3)
- Researchers & engineers create open knowledge with data standards & data repositories
- Lawyers & ethicists protect the individual, creating regulations such as the GDPR and HIPAA
- Nations & regions defend security & sovereignty with national & regional laws
graph TD r([researchers, engineers]) -->|open knowledge| fair([data standards<br/>& repositories]) l([lawyers, ethicists]) -->|protect the individual| dp([GDPR, HIPAA,<br/>health codes]) s([nations, regions]) -->|security & sovereignty| nat([national &<br/>regional laws]) fair .-> net([network of rules,<br/>standards & infrastructures]) dp .-> net nat .-> net
Note
You probably agree with all three. However, they sometimes pull in the same direction and sometimes in opposing ones, leaving a single complex network of standards, tools, and repositories, embedded in rules, regulations, and policies.
Shaped by principles (3/3)
- Researchers & engineers create open knowledge with data standards & data repositories
- Lawyers & ethicists protect the individual, creating regulations such as the GDPR and HIPAA
- Nations & regions defend security & sovereignty with national & regional laws
graph TD wg([working groups]) .-> r wg .-> l wg .-> s r([researchers, engineers]) -->|open knowledge| fair([data standards<br/>& repositories]) l([lawyers, ethicists]) -->|protect the individual| dp([GDPR, HIPAA,<br/>health codes]) s([nations, regions]) -->|security & sovereignty| nat([national &<br/>regional laws]) fair .-> net([network of rules,<br/>standards & infrastructures]) dp .-> net nat .-> net
Note
Typical questions
- Sharing data: Where and what can or must I share, and on what terms?
- Reuse data: What data is out there, where can I find it, and how can I use it?
- Shape policy: On which rules and practices do we agree, where not, and how can I participate?
- Always: What is true, up to date, and what does it mean?
Note
As a result, similar questions face anyone working with neuroscience data.
If you produce data: where, and what, may or must you publish, and on what terms?
If you reuse data: what is out there, where is it, in what format, and under which conditions for use?
If you shape policy: where do we (dis)agree and who is making the decisions? If you want to change something, where do you engage?
And underneath all of them, the hardest one: how do you know that what you are reading is true, current, and that it means what you think it means? Especially when the network keeps shifting, with new tools and new rules. Navigating it with confidence is a challenge.
5 pillars
- Curated nodes of actors, standards, resources and governing entities
- Explanations in plain readable language for all audiences
- Connections showing meaningful relationships with standardized vocabulary
- Verified by primary sources and experts
- Maintained openly in line with FAIR principles and practices
- Editorial perspectives (next slide)
Note
The Open Neuroscience Graph is a response to these challenges: it provides a map, built around five pillars and an editorial layer.
It is curated: nodes for the actors who do the research, the standards that encode the data, the resources that store and process it, and the governing entities that regulate it. Not everything, but the entities that matter and connect and are relevant for neuroscience.
Each node is explained in plain language, readable by a newcomer and a policymaker, not only by a specialist already inside one corner of the field.
The nodes are joined by named connections that state a concrete relationship: which repository implements which standard, which mandate routes to which platform, which body governs which standard. You can follow the relations rather than infer them.
It is verified against primary sources and, increasingly, by the experts in each community, so the claims can be trusted and the source is on the node.
It is maintained openly by design: a living map that knows where its own edges are, corrected in the open as the field moves. That is also its honest limit: it is semi-complete on purpose, and it improves by being used and challenged.
On top of them sits an editorial layer that slices the graph into perspectives, which is where we point the reader next.
Editorial perspectives vs. structure
- practical, e.g. sharing your data
- domain, e.g. genomics, neuroimaging
- regional, e.g. France, Europe, Japan
- general, e.g. Open access publishing
graph TD p1([practice:<br/>sharing your data]) p2([domain:<br/>genomics]) p3([region:<br/>France]) p1 -.-> bids([BIDS]) p1 -.-> ega([EGA]) p2 -.-> ega p2 -.-> vcf([VCF]) p3 -.-> ega p3 -.-> snds([SNDS]) p4([general:<br/>open access publishing]) p4 -.-> hal([HAL]) p4 -.-> plans([Plan S])
Note
The Open Neuroscience Graph provides curated views on the network that discuss the nodes in the contexts that matter, e.g. a practical guide such as Sharing your data, a domain such as Genomics or Neuroimaging, or a country or region such as France, Europe, or Japan.
This means that the same node can appear in several of them. E.g. EGA sits in the genomics perspective, the France perspective, and the sharing-your-data guide at once, while a node like the VCF variant format belongs only to genomics.
This layer is editorial. The perspective pages can grow, change, or expand as a field develops, without changing the underlying nodes.
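As a minimal sketch of this separation, the perspectives can be modelled as named sets over a shared pool of nodes. The node names are taken from the slide's diagram; the data structure itself is an assumption for illustration, not how the site stores its perspectives.

```python
from collections import defaultdict

# Perspectives as curated views over shared nodes (names from the slide).
perspectives = {
    "practice: sharing your data": {"BIDS", "EGA"},
    "domain: genomics": {"EGA", "VCF"},
    "region: France": {"EGA", "SNDS"},
    "general: open access publishing": {"HAL", "Plan S"},
}


def perspectives_by_node(views):
    """Invert the mapping: for each node, collect the perspectives it appears in."""
    index = defaultdict(set)
    for perspective, nodes in views.items():
        for node in nodes:
            index[node].add(perspective)
    return dict(index)


index = perspectives_by_node(perspectives)
# Nodes that appear in more than one perspective.
shared = sorted(node for node, views in index.items() if len(views) > 1)
print(shared)
```

Adding or rewriting a perspective only changes the sets, never the nodes they point to.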
Example: Europe as structure
- The graph’s edges record dependence: each edge points up to the authority or standard a node relies on
- European open neuroscience has no common root: it is not designed as a whole
graph BT zen([Zenodo]) -->|recommendedBy| ec([EC Open Science<br/>policy]) eosc([EOSC]) -->|implements| fair([FAIR principles]) fhir([HL7 FHIR]) -->|endorsedBy| ehds([EHDS]) gdi([GDI]) -->|implements| onemg([1+MG Framework]) ebrains([EBRAINS]) -->|accepts| bids([BIDS]) ebrains -->|accepts| nwb([NWB]) ebrains -->|implements| omind([openMINDS])
Note
The Europe perspective gathers the policies, infrastructures, and bodies shaping the field across the continent.
Start from how the graph actually records relationships. Every edge points from a dependent node up to the authority or standard it relies on:
- Zenodo is recommended by the EC open science policy
- EOSC implements the FAIR principles
- HL7 FHIR is endorsed (designated) by the EHDS
- GDI implements the 1+MG Framework
- EBRAINS accepts the BIDS and NWB data standards and implements the openMINDS metadata framework.
These do not form a single tree but disconnected fragments, because European open neuroscience has no common root and each fragment depends on different authorities and standards. Nobody designed the landscape as a whole.
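The fragmentation can be checked mechanically. The sketch below transcribes the edges from the slide's diagram and counts connected components with a small union-find; the tuple layout and the function name are illustrative assumptions, not the graph's actual tooling.

```python
# Dependency edges from the Europe slide: (dependent node, label, authority).
edges = [
    ("Zenodo", "recommendedBy", "EC Open Science policy"),
    ("EOSC", "implements", "FAIR principles"),
    ("HL7 FHIR", "endorsedBy", "EHDS"),
    ("GDI", "implements", "1+MG Framework"),
    ("EBRAINS", "accepts", "BIDS"),
    ("EBRAINS", "accepts", "NWB"),
    ("EBRAINS", "implements", "openMINDS"),
]


def connected_components(edge_list):
    """Group nodes into fragments, ignoring edge direction (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, _label, target in edge_list:
        parent[find(source)] = find(target)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


fragments = connected_components(edges)
print(len(fragments), "disconnected fragments")
```

With these seven edges the nodes fall into five separate fragments, which is the "no common root" observation in computational form.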
Example: Europe as narrative
- The editorial perspective connects the same entities into one view, and makes it traversable
graph TD eu([European open<br/>neuroscience]) eu -->|open science| ec([EC Open Science<br/>policy]) eu -->|health sovereignty| ehds([EHDS]) eu -->|neuro platform| ebrains([EBRAINS]) ec -->|federated cloud| eosc([EOSC]) eosc -->|recommended deposit| zen([Zenodo]) eosc -->|built on| fair([FAIR principles]) ehds -->|record exchange| fhir([HL7 FHIR]) ehds -->|federated genomics| gdi([GDI]) gdi -->|framework| onemg([1+MG Framework]) ebrains -->|accepts| bids([BIDS]) ebrains -->|accepts| nwb([NWB]) ebrains -->|metadata| omind([openMINDS])
Note
The Europe perspective provides a more cohesive view with European open neuroscience as the root, and a path through the tree: open science (the EC policy reaching EOSC, which recommends Zenodo and is built on FAIR), health sovereignty (EHDS designating FHIR for record exchange and driving GDI and the 1+MG Framework for federated genomics), and the neuroscience platform (EBRAINS, which accepts the BIDS and NWB data standards and uses the openMINDS metadata framework).
Example: Genomics
- Where data goes is set by three axes: pipeline stage (format) × access tier × region
- Raw reads → ENA / SRA / DDBJ; human-controlled → EGA / dbGaP; variants → EVA / dbSNP
graph LR start([Genomic data]) start --> eu([EU]) start --> us([USA]) eu -->|raw reads| eu_fastq([FASTQ]) eu -->|aligned reads| eu_bam([SAM-BAM-CRAM]) eu -->|variants| eu_vcf([VCF]) eu -->|expression| eu_tsv([TSV]) us -->|raw reads| us_fastq([FASTQ]) us -->|aligned reads| us_bam([SAM-BAM-CRAM]) us -->|variants| us_vcf([VCF]) us -->|expression| us_tsv([TSV]) ena([ENA]) ega([EGA]) eva([EVA]) sra([SRA]) dbgap([dbGaP]) dbsnp([dbSNP]) geo([NCBI GEO]) eu_fastq -->|public| ena eu_fastq -->|controlled| ega eu_bam -->|public| ena eu_bam -->|controlled| ega eu_vcf -->|public| eva eu_vcf -->|controlled| ega eu_tsv -->|controlled| ega us_fastq -->|public| sra us_fastq -->|controlled| dbgap us_bam -->|public| sra us_bam -->|controlled| dbgap us_vcf -->|public| dbsnp us_vcf -->|controlled| dbgap us_tsv -->|public| geo us_tsv -->|controlled| dbgap geo -.->|brokers raw reads| sra dbsnp -.->|large variants| dbvar([dbVar])
Note
The Genomics perspective shows the opposite face: not a single policy that converges, but rather destinations multiplying and diverging. Where genomic data can best be deposited depends on the stage of data processing (raw reads, aligned reads, variants, expression), on the level of sensitivity and therefore the access tier (open, or controlled when re-identification is a risk), and on the region. Raw reads in FASTQ go to the INSDC partners: ENA in Europe, SRA in the US, DDBJ in Japan. Human reads that cannot be openly released go instead to controlled archives, EGA or dbGaP. Variants can often be shared openly, to EVA or dbSNP. Expression matrices go to GEO when open, or to the controlled archives when human.
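The three axes can be read as a lookup table. The sketch below transcribes only the routes drawn in the slide's diagram (EU and US); the key layout and the function name `deposit_target` are assumptions for illustration, and a combination the diagram does not draw returns `None` rather than a guess.

```python
# (region, pipeline stage, access tier) -> archive, as drawn in the diagram.
ROUTES = {
    ("EU", "raw reads", "public"): "ENA",
    ("EU", "raw reads", "controlled"): "EGA",
    ("EU", "aligned reads", "public"): "ENA",
    ("EU", "aligned reads", "controlled"): "EGA",
    ("EU", "variants", "public"): "EVA",
    ("EU", "variants", "controlled"): "EGA",
    ("EU", "expression", "controlled"): "EGA",
    ("US", "raw reads", "public"): "SRA",
    ("US", "raw reads", "controlled"): "dbGaP",
    ("US", "aligned reads", "public"): "SRA",
    ("US", "aligned reads", "controlled"): "dbGaP",
    ("US", "variants", "public"): "dbSNP",
    ("US", "variants", "controlled"): "dbGaP",
    ("US", "expression", "public"): "NCBI GEO",
    ("US", "expression", "controlled"): "dbGaP",
}


def deposit_target(region, stage, access):
    """Return the archive for this combination, or None if the diagram has no route."""
    return ROUTES.get((region, stage, access))


print(deposit_target("EU", "variants", "public"))
print(deposit_target("US", "raw reads", "controlled"))
```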
Example: Health data
- European regulations are pushing towards coherence across nation states.
- Each member state is required to designate a Health Data Access Body (HDAB) for secondary use
- Two standards are emerging: FHIR for primary exchange, OMOP CDM for secondary reuse
graph TD ehds([EHDS regulation]) ehds -->|France| hdh([Health Data Hub]) hdh -->|portal over| snds([SNDS]) hdh -.->|is developing| hdabfr([designated HDAB]) snds -.-> hdabfr ehds -->|Netherlands| hri([Health-RI]) hri -->|federated holders| lifelines([Lifelines cohort]) hri -.->|is developing| hdabnl([designated HDAB]) lifelines -.-> hdabnl hdabfr -.-> fromop([OMOP CDM<br/>*secondary reuse*]) hdabfr <-.-> fhir([FHIR<br/>*primary data exchange*]) hdabnl <-.-> fhir hdabnl -.-> nlomop([OMOP CDM<br/>*secondary reuse*]) fromop -.-> comb([combined data reuse]) nlomop -.-> comb
Note
National health systems were built separately and store data differently. The EHDS does not yet mandate a single technical standard, but two have emerged as the pair the ecosystem is converging on: FHIR for primary exchange of records between live systems, and OMOP CDM as the common model for secondary research reuse. The bridging work between them is still ongoing, through the TEHDAS2 joint action.
Each member state must designate a Health Data Access Body (HDAB) as its national access point for secondary use, by 2027. These are still being built (shown dotted). France: the Health Data Hub is the portal over the SNDS and the candidate HDAB. Netherlands: Health-RI coordinates federated holders such as Lifelines and is building the Dutch HDAB. Each HDAB converges on the shared FHIR layer for cross-border exchange, and structures its own data in OMOP for research, which is what allows combined reuse across countries. Different national routes, one shared destination.
Example: Multimodal data & HED
- Data can be combined based on meaningful events, such as stimuli or responses
- Encoding these events for interoperability requires an event vocabulary
- HED provides such a vocabulary, which is supported in community data standards
graph TD hed([HED<br/>event vocabulary]) eeg([EEG study]) -->|annotates with| hed fmri([fMRI study]) -->|annotates with| hed behav([behavioural study]) -->|annotates with| hed hed -->|integrated with| bids([BIDS]) hed -->|integrated with| nwb([NWB])
Note
BIDS organises a dataset, while HED (Hierarchical Event Descriptors) annotates what happened inside it in a common vocabulary, using HED tags. This allows multimodal data to be synchronized either within a recording or between recordings of the same experiment.
Introduction: Describing data is a shared discipline
- Every field must describe its data so others can find and reuse it
- The ways of doing this come from information science
- Bodies like DCMI and the W3C publish standard ways to describe data
- Most fields reuse these standard terms and add domain-specific ones
- That documented selection is called an application profile
graph TD dcmi([DCMI<br/>standards body]) w3c([W3C<br/>standards body]) dcmi -->|publishes| dc([Dublin Core<br/>general descriptive terms]) w3c -->|publishes| dcat([DCAT<br/>data and catalogue terms]) dc -->|reused by| ap([application profile<br/>a field's chosen terms]) dcat -->|reused by| ap custom([custom<br/>domain-specific]) -->|added to| ap
Note
One general point before the specific standards. Describing data so others can find, trust, and reuse it is not unique to neuroscience. It is a shared discipline, worked out over decades in library and information science, and the same ideas recur in genomics, health, climate science, and government data.
There are recognised bodies for it. The W3C, the organisation behind the standards of the web, publishes DCAT for describing datasets, alongside older standards such as Dublin Core. These supply ready-made terms for who published a dataset, what it covers, and where to find it.
A field rarely invents its own scheme. It reuses these standard terms and adds domain-specific ones only where nothing standard fits. That documented mix is an application profile, the normal and interoperable way to proceed. What follows are some examples of how neuroscience ends up organising its terms and relations.
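A small sketch of what an application profile amounts to in practice: a documented list of terms, most of them reused from standard namespaces and a few added locally. The `dcterms:` and `dcat:` terms below are real Dublin Core and DCAT terms; the `ex:` prefix and its term are hypothetical placeholders for a domain-specific addition.

```python
# A toy application profile: reused standard terms plus one custom term.
PROFILE = [
    "dcterms:title",
    "dcterms:publisher",
    "dcat:landingPage",
    "dcat:keyword",
    "ex:governedBy",  # hypothetical custom term, not from any standard
]

# Prefixes that identify a published standard.
STANDARD_PREFIXES = {"dcterms": "Dublin Core", "dcat": "DCAT"}


def summarise(profile):
    """Group the profile's terms by the source they were taken from."""
    summary = {}
    for term in profile:
        prefix = term.split(":", 1)[0]
        source = STANDARD_PREFIXES.get(prefix, "custom")
        summary.setdefault(source, []).append(term)
    return summary


summary = summarise(PROFILE)
for source, terms in summary.items():
    print(source, terms)
```

The smaller the "custom" group, the more of the description other fields and tools can read without translation.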
Introduction: Vocabularies, ontologies, and data models
- A vocabulary defines approved terms
- A thesaurus adds broader, narrower, and related links
- An ontology adds typed relations
- A data model is a set of required (and optional) fields
- Metadata is a dataset’s filled-in description
graph TD vocab(["vocabulary<br/>approved term: <code>jaguar</code><br/>not <code>panther</code>"]) thes(["thesaurus<br/>broader: <code>big cat</code><br/>related: <code>Panthera</code>"]) onto(["ontology<br/><code>jaguar</code> <code>is-a</code> <code>Panthera</code><br/><code>preys-on</code> <code>capybara</code>"]) vocab -->|adds structure| thes -->|adds structure| onto onto -->|supplies terms to| dm(["data model<br/>required: <code>species</code>"]) dm -->|becomes| meta(["metadata<br/><code>species</code> = <code>jaguar</code>"])
Note
You will hear certain terms used a lot, sometimes seemingly interchangeably, which they are not. We will use the example of describing a jaguar to explain these terms.
A vocabulary is an agreed list of approved words, so everyone writes jaguar and nobody writes panther for the same animal. A thesaurus adds a little structure on top: it records that jaguar sits under big cat and is related to Panthera, without saying exactly how. An ontology does say how: a jaguar is a kind of Panthera and preys on capybara. Because each link has a stated meaning, software can follow it and reason about it. The located in, performs, and acts on links in the diagrams ahead are exactly this kind of stated relationship, which is why those resources are ontologies and not just lists.
The last two words are about description rather than naming. A data model is a form with fields, where each field has to be filled from an approved vocabulary: a species field that only accepts a listed term. A data model is also called a schema (a database schema, a JSON schema), though that word is used loosely elsewhere too. Metadata is that form filled in for one dataset: species = jaguar. So the data model gives the blank fields, vocabularies and ontologies supply the words allowed in them, and the metadata is the finished description.
In practice these really do blur. SNOMED CT has enough structure to be called an ontology by some, MeSH is built as a thesaurus but often used as a plain vocabulary. The order on the slide is the thing to remember: each step adds more structure and more meaning a person or computer can act on.
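The jaguar example can be written out as data to make the distinctions concrete. This is a minimal sketch: the Python structures and the `validate` and `objects_of` helpers are illustrative assumptions, and listing capybara as an approved term is added only so the example has more than one entry.

```python
# Vocabulary: the approved terms (capybara added for illustration).
VOCABULARY = {"jaguar", "capybara"}

# Thesaurus: broader and related links, without stating how terms relate.
THESAURUS = {"jaguar": {"broader": ["big cat"], "related": ["Panthera"]}}

# Ontology: typed relations as (subject, relation, object) triples.
ONTOLOGY = [
    ("jaguar", "is-a", "Panthera"),
    ("jaguar", "preys-on", "capybara"),
]

# Data model: required fields and the vocabulary each field must draw from.
DATA_MODEL = {"required": ["species"], "allowed": {"species": VOCABULARY}}


def validate(metadata, model):
    """Return a list of problems; an empty list means the metadata fits the model."""
    problems = []
    for field in model["required"]:
        if field not in metadata:
            problems.append(f"missing required field: {field}")
    for field, value in metadata.items():
        allowed = model["allowed"].get(field)
        if allowed is not None and value not in allowed:
            problems.append(f"{field}={value!r} is not an approved term")
    return problems


def objects_of(subject, relation, triples):
    """Follow one typed relation from a subject, as software can do with an ontology."""
    return [o for s, r, o in triples if s == subject and r == relation]


# Metadata: the data model filled in for one dataset.
print(validate({"species": "jaguar"}, DATA_MODEL))
print(validate({"species": "panther"}, DATA_MODEL))
print(objects_of("jaguar", "preys-on", ONTOLOGY))
```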
Ontologies: from biology to the clinic
- Biology gives rise to traits, traits define diseases, diseases are recorded as diagnoses
- A disease can be matched from the genetics that underlie it to the code a clinician enters
graph TD bio([biology<br/>cells, anatomy, molecules]) -->|manifests as| trait([traits<br/>phenotypes, behaviour]) trait -->|grouped into| disease([disease<br/>one identifier]) disease -->|recorded as| record([clinical record<br/>a coded diagnosis])
Note
Biology (cell types, anatomy, molecular function) gives rise to observable traits (phenotypes and behaviour). Traits define diseases. A disease is recorded in a patient’s record as a diagnosis code. Each step is a different relationship: a trait is a manifestation of biology, a disease is a grouping of traits, a diagnosis code is a record of a disease.
The standard naming each of these maps to the standard naming the next. So a single disease can be matched from the genetics that underlie it to the diagnosis code a clinician enters.
The ontologies for biology and traits, and MONDO and ORDO for disease, follow the shared rules of the OBO Foundry: one term defined in one ontology and reused by the others, under an open licence. That shared discipline is what lets them fit together. The clinical and literature coding systems that appear later (ICD, MeSH, SNOMED CT, and the drug, lab, and procedure codes) are governed separately and do not follow these rules.
Example: Ontologies (1/4) — biology: anatomy, cells, and molecules
- Each ontology covers a distinct biological domain and connects to adjacent ones.
- A cell type is located in an anatomical structure and performs molecular functions, which act on chemicals
graph TD cl([Cell Ontology<br/>cell type]) uberon([UBERON<br/>anatomy]) go([GO<br/>molecular function]) chebi([ChEBI<br/>chemical entities]) cl -->|located in| uberon cl -->|performs| go go -->|acts on| chebi
Note
Where metadata standards describe how a dataset is structured, ontologies name what it is about: the cell types, brain regions, molecules, traits, and diseases a record refers to. Each ontology covers one kind of thing and links to its neighbours, so a term defined in one can be used consistently in another. All of them are endorsed by the OBO Foundry, which sets shared design rules so the separate ontologies fit together. That endorsement is the same for every ontology, so it is left out of these diagrams.
This first group covers the biological subject matter. A cell type (Cell Ontology) is located in a brain region or other anatomical structure (UBERON) and performs molecular functions (GO), which act on chemical entities (ChEBI) such as drugs and neurotransmitters. This is the biology that the traits, diseases, and diagnoses in the examples that follow rest on.
Example: Ontologies (2/4) — traits: phenotype and behaviour
- A phenotype is an observable trait, named separately from the disease behind it
- Human and mouse phenotypes use different ontologies, linked by a shared mapping
- The mapping lets a trait seen in a mouse be matched to the human equivalent
graph TD hpo([HPO<br/>human phenotype]) mp([MP<br/>mouse, rat<br/>phenotype]) nbo([NBO<br/>behaviour]) hpo <-->|co-maintained mapping| mp nbo -->|describes behaviour for| hpo nbo -->|describes behaviour for| mp
Note
A phenotype is an observable trait, such as a seizure type or a memory deficit, named separately from the disease that produces it (the next group). HPO covers human phenotypes and MP covers the phenotypes of mice, rats, and other model organisms.
The link between them is the vault predicate correspondsWith: two standards that maintain a shared mapping together, where neither is derived from the other. Here it is the Mouse-Human Ontology Mapping Initiative, which lets a trait seen in a mouse model be matched to the human equivalent for disease research. NBO adds behavioural and neurological traits for both. HPO, MP, and NBO are all OBO Foundry ontologies, which is what makes the mapping between them tractable.
Example: Ontologies (3/4) — disease: harmonising the systems
- The same disease is coded differently for clinics, genetics, rare disease, and literature
- Each system was built for its own purpose, so the codes do not line up
- One ontology maps every system to a single research disease identifier
graph BT omim([OMIM<br/>Mendelian genetics]) -->|maps to| mondo([MONDO<br/>one research disease ID]) ordo([ORDO<br/>rare disease]) -->|maps to| mondo icd([ICD-10 / ICD-11<br/>clinical coding]) -->|maps to| mondo mesh([MeSH<br/>literature]) -->|maps to| mondo
Note
The same disease is recorded in several systems, each built for a different purpose. ICD-10 and its successor ICD-11 are the WHO classifications used for clinical coding, statistics, and billing. OMIM catalogues Mendelian gene-disease relationships. ORDO, produced by Orphanet, is the European rare-disease reference. MeSH indexes the disease literature for PubMed. Because each was built separately, their codes do not line up.
MONDO assigns one identifier per disease and curates a mapping to the matching code in every system, which is why all the arrows point up into it. For example MONDO:0004975 for Alzheimer’s maps to ICD-10 G30, OMIM 104300, and ORPHA:26929. That single set of mappings is what lets a disease be queried across databases that each chose a different classification, whether the starting point is a clinical code, a genetic catalogue, a rare-disease reference, or the literature.
This is also where the OBO Foundry boundary falls. MONDO and ORDO follow OBO’s rules; ICD-10, ICD-11, and MeSH are governed separately and do not. MONDO, an OBO ontology, is the one that reaches across the boundary, mapping out to the clinical and literature systems that sit outside it. So this is not a handover between two separate worlds. An ontology built to OBO’s rules does the work of connecting them.
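The hub role of MONDO described in this note can be sketched as a lookup that always passes through the MONDO identifier. The identifiers are the ones quoted for Alzheimer's in the note; the dictionary layout and the function names are assumptions for illustration, not MONDO's actual file format or API.

```python
# One MONDO identifier with its cross-references, as quoted in the note.
MONDO_MAPPINGS = {
    "MONDO:0004975": {
        "ICD-10": "G30",
        "OMIM": "104300",
        "Orphanet": "ORPHA:26929",
    },
}


def to_mondo(system, code, mappings=MONDO_MAPPINGS):
    """Find the MONDO identifier that maps to this code in the given system."""
    for mondo_id, xrefs in mappings.items():
        if xrefs.get(system) == code:
            return mondo_id
    return None


def translate(system, code, target_system, mappings=MONDO_MAPPINGS):
    """Translate a code between two systems by going through MONDO."""
    mondo_id = to_mondo(system, code, mappings)
    if mondo_id is None:
        return None
    return mappings[mondo_id].get(target_system)


print(to_mondo("ICD-10", "G30"))
print(translate("ICD-10", "G30", "OMIM"))
```

Because every system maps only to the hub, adding a new coding system needs one new set of mappings rather than one per existing system.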
Example: Ontologies (4/4) — the clinic: coding a record
- A patient record holds several kinds of fact, each coded in its own terminology
- Diagnoses, drugs, labs, procedures, adverse events, and cancer detail each have one
- A common data model such as OMOP CDM gives every field a slot and a standard code
graph TD omop([OMOP CDM<br/>common data model]) omop -->|codes with| snomed([SNOMED CT<br/>conditions]) omop -->|codes with| rxnorm([RxNorm / ATC<br/>drugs]) omop -->|codes with| loinc([LOINC<br/>measurements]) omop -->|codes with| ccam([CCAM<br/>procedures]) omop -->|codes with| meddra([MedDRA<br/>adverse events]) omop -->|codes with| icdo([ICD-O-3<br/>cancer detail])
Note
A single patient record is not coded by one terminology but by several, one for each kind of fact it holds. Diagnoses use SNOMED CT, drugs use RxNorm (with ATC for drug class), lab results and measurements use LOINC, procedures use CCAM in France, adverse events use MedDRA, and cancer morphology uses ICD-O-3. Each codes one slice of the record, the way the research ontologies each code one kind of biological thing.
This is where the data models come in, and it answers how they relate to the terminologies. A common data model such as the OMOP CDM does not replace these vocabularies. It provides a table for each kind of fact (conditions, drugs, measurements, procedures) and requires each to be coded in a designated standard: conditions in SNOMED CT, drugs in RxNorm, measurements in LOINC. The model is the container, the terminologies are what fill it. CDISC, the clinical-trials model, does the same for trial submissions, drawing its adverse-event coding from MedDRA. The models are covered in the Health and Clinical Trials perspectives.
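The container-and-contents relationship can be sketched as a check that each kind of fact is coded in its designated vocabulary. The three pairings come from the paragraph above; the table names, the record layout, and the placeholder codes are hypothetical and do not reproduce the actual OMOP CDM schema.

```python
# Designated standard vocabulary per kind of fact, as named in the text.
# Table names are illustrative only, not the real OMOP CDM table names.
DESIGNATED = {
    "conditions": "SNOMED CT",
    "drugs": "RxNorm",
    "measurements": "LOINC",
}


def miscoded(record):
    """Return (table, vocabulary, code) for entries not in the designated vocabulary."""
    problems = []
    for table, entries in record.items():
        expected = DESIGNATED.get(table)
        for vocabulary, code in entries:
            if expected is not None and vocabulary != expected:
                problems.append((table, vocabulary, code))
    return problems


# A toy patient record with placeholder codes (not real terminology codes).
record = {
    "conditions": [("SNOMED CT", "example-condition-code")],
    "drugs": [("ATC", "example-drug-code")],
    "measurements": [("LOINC", "example-lab-code")],
}

print(miscoded(record))
```

Here the drug entry is flagged because it is coded in ATC where the sketch expects RxNorm, which is the kind of mismatch a conversion into a common data model has to resolve.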
The same disease runs through all of these examples. A molecular function gives rise to an observable trait; traits define a disease; the disease is recorded as a diagnosis code in a patient’s chart. Because the standard naming each step maps to the one naming the next, a single disease can be matched from the genetics that underlie it to the diagnosis code used in care.
How the graph is built and maintained
A node is a plain text file
- Every node originates as one Markdown file in Obsidian
- Frontmatter: name, website, status, parent, type/ and domain/ tags, verified
- Body: Overview, Connections (labelled edges), Resources
- Anyone who can edit text can contribute a node
---
name: Brain Imaging Data Structure
aliases:
- BIDS
website: https://bids.neuroimaging.io
status: active
founded: 2016
parent_org: BIDS Steering Group
tags:
- type/datamodel # directory + graph colour
- domain/neuroimaging # research area, used for filtering
verified: true
last_reviewed: 2026-06-01
---
Note
Every node is a single Markdown file, in the Obsidian standard: a YAML frontmatter header and a short body.
The frontmatter in YAML holds name and aliases, website, status and founding year, parent organisation, the type/ and domain/ tags, and a verified flag with the date last checked against primary sources. The type/ tag places the node in one of four families: Actors that do research, Standards that encode data, Resources that store and process it, and Governance that coordinates and regulates it.
The body has three sections:
- Overview (a 3-6 sentence plain-language summary)
- Connections (the labelled edges, e.g. governedBy: BIDS Steering Group)
- Resources (primary-source URLs).
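To show how little machinery a plain-text node needs, here is a minimal sketch that splits a node file into frontmatter and body. It handles only the small YAML subset used in the example above (scalar values, simple lists, trailing comments) and keeps every value as a string; a real pipeline would more likely use a full YAML library. The function name `parse_node` and the body text are assumptions for illustration.

```python
NODE_TEXT = """---
name: Brain Imaging Data Structure
aliases:
  - BIDS
website: https://bids.neuroimaging.io
status: active
tags:
  - type/datamodel  # directory + graph colour
  - domain/neuroimaging
verified: true
---
Overview
A standard for organising neuroimaging data.
"""


def parse_node(text):
    """Split a node file into (frontmatter dict, body string)."""
    _, header, body = text.split("---\n", 2)
    meta, current_list = {}, None
    for raw in header.splitlines():
        line = raw.split(" #", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if line.lstrip().startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a key: {raw!r}")
            current_list.append(line.lstrip()[2:].strip())
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            meta[key.strip()] = value
            current_list = None
        else:
            # A key with no value starts a list on the following lines.
            current_list = meta.setdefault(key.strip(), [])
    return meta, body


meta, body = parse_node(NODE_TEXT)
print(meta["name"], meta["tags"])
```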
Nodes and edges
- Every entity is a node with labelled connections (edges) to other nodes
- The edge is written once, in the dependent node, pointing up to the authority
- Labels come from a controlled vocabulary (FAIRsharing, schema.org, and custom)
- Reverse connections show automatically as backlinks
graph BT on([OpenNeuro]) -->|requires| bids([BIDS]) nidm([NIDM]) -->|extends| bids bids -->|governedBy| sg([BIDS Steering Group]) bids -->|endorsedBy| incf([INCF]) bids -.->|backlink, automatic| on
Note
Each node carries labelled connections (edges) to other nodes. A node requires at least one real significant edge: an unconnected node says nothing about how the field is organised.
An edge has a direction and a label. It is written in the more specific or dependent node, pointing up to the authority, standard, or parent it depends on. The authority doesn’t point down to those that depend on it, to prevent accumulation upwards. However, these backwards connections (backlinks) are shown automatically on the site (and in Obsidian).
The label comes from a controlled vocabulary compiled from FAIRsharing (data-flow terms), schema.org, Dublin Core (structural terms), and custom governance terms. Reusing established terms where they exist, and minting our own only where open-neuroscience governance has no standard equivalent, is what makes this an application profile rather than a private vocabulary. The Vocabulary page shows the full list and names each term’s source.
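Writing each edge once and deriving the reverse direction can be sketched as a simple inversion. The edges are those in the slide's diagram; the dictionary layout and the `backlinks` function are assumptions for illustration, since on the site this is handled by the publishing tooling.

```python
from collections import defaultdict

# Each node stores only its own outgoing edges: (label, target).
outgoing = {
    "OpenNeuro": [("requires", "BIDS")],
    "NIDM": [("extends", "BIDS")],
    "BIDS": [("governedBy", "BIDS Steering Group"), ("endorsedBy", "INCF")],
}


def backlinks(graph):
    """Derive incoming edges per node: target -> [(source, label), ...]."""
    incoming = defaultdict(list)
    for source, edges in graph.items():
        for label, target in edges:
            incoming[target].append((source, label))
    return dict(incoming)


incoming = backlinks(outgoing)
print(sorted(incoming["BIDS"]))
```

The BIDS node never lists OpenNeuro or NIDM itself, yet both show up as its backlinks, so an authority's file does not grow with every dependant.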
Inclusion criteria
- 1. Domain scope: Does the entity operate in a neuroscience data domain?
- 2. Type-appropriate function: Does it fit a clear type/ tag (see Vocabulary)?
- 3. Edge generation: Does it have at least one significant (strong) labelable connection?
- 4. Participation is not enough: Endorsing or belonging is not enough
- 5. Precedent: Would adding the entity commit the vault to include all of its kind? (if yes, exclude)
- 6. Removal: Would deleting it leave a dangling link? (if no, exclude)
graph TD start([candidate]) --> t1([in scope?]) t1 -->|no| out([not included]) t1 -->|yes| t2([existing type?]) t2 -->|no| out t2 -->|yes| t3([1+ edge?]) t3 -->|no| out t3 -->|yes| t4([precedent ok?]) t4 -->|no| out t4 -->|yes| node([node])
Note
A node must pass six tests, in order:
- Domain scope: operates in a neuroscience data domain.
- Type-appropriate function: clears the bar for its kind (a repository holds data; an institute operates open infrastructure, not just research).
- Edge generation: produces at least one labelable connection.
- Participation is not enough: endorsing or belonging does not substitute for tests 1-3.
- Precedent: would adding it commit the vault to every other entity of the same kind?
- Removal: if deleted, would any node be left with a dangling link?
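The ordered tests can be sketched as a short pipeline that reports the first test a candidate fails. The `Candidate` fields are hypothetical yes/no answers a curator would fill in by hand; in particular, encoding test 4 as an `only_participates` flag is one possible reading of "participation is not enough", not a rule stated in the source.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    in_scope: bool                      # test 1
    fits_type: bool                     # test 2
    significant_edges: int              # test 3
    only_participates: bool             # test 4 (assumed encoding)
    sets_precedent: bool                # test 5: yes means exclude
    removal_leaves_dangling_link: bool  # test 6: no means exclude


# Tests in the order given in the note; each returns True when passed.
TESTS = [
    ("domain scope", lambda c: c.in_scope),
    ("type-appropriate function", lambda c: c.fits_type),
    ("edge generation", lambda c: c.significant_edges >= 1),
    ("participation is not enough", lambda c: not c.only_participates),
    ("precedent", lambda c: not c.sets_precedent),
    ("removal", lambda c: c.removal_leaves_dangling_link),
]


def first_failed_test(candidate):
    """Return the name of the first failed test, or None if the candidate passes all."""
    for name, passes in TESTS:
        if not passes(candidate):
            return name
    return None


# Hypothetical candidates, for illustration only.
accepted = Candidate("example repository", True, True, 2, False, False, True)
rejected = Candidate("example member lab", True, True, 1, True, False, True)
print(first_failed_test(accepted), first_failed_test(rejected))
```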
Domain tags
- Each data-facing node carries a domain/ tag for its research area
- Cross-domain governance and regulatory entities carry none
- Tags allow filtering
graph TD repo([OpenNeuro]) t1(["#type/repository"]) -.- repo t2(["#domain/neuroimaging"]) -.- repo
Note
Each data-facing node carries a domain/ tag for the research area it serves. Cross-domain governance and regulatory entities carry none. Tags drive filtering: every perspective and domain view is a query over them. Domains: neuroimaging, electrophysiology, genomics, biosamples, bioimaging, behavior, clinical, health, computational, reproducibility.
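Filtering by tag can be sketched as a query over each node's tag list. The OpenNeuro and BIDS tags appear in this presentation; the tags shown for EGA and GDPR, including `type/regulation`, are hypothetical, and the `with_tags` helper stands in for whatever query mechanism the site actually uses.

```python
nodes = [
    {"name": "OpenNeuro", "tags": ["type/repository", "domain/neuroimaging"]},
    {"name": "BIDS", "tags": ["type/datamodel", "domain/neuroimaging"]},
    # The two entries below use assumed tags, for illustration only.
    {"name": "EGA", "tags": ["type/repository", "domain/genomics"]},
    {"name": "GDPR", "tags": ["type/regulation"]},  # cross-domain: no domain/ tag
]


def with_tags(node_list, *required):
    """Return the names of nodes that carry every one of the required tags."""
    return [n["name"] for n in node_list if all(t in n["tags"] for t in required)]


def domains_of(node):
    """List the research areas a node is tagged with (empty for cross-domain nodes)."""
    return [t.split("/", 1)[1] for t in node["tags"] if t.startswith("domain/")]


print(with_tags(nodes, "domain/neuroimaging"))
print(with_tags(nodes, "type/repository", "domain/genomics"))
```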
Built to be found
- Registered for discovery: FAIRsharing, bio.tools, w3id, Zenodo DOI
- Two targets are themselves nodes (registries the vault catalogues)
- The graph’s own discoverability uses the same registeredIn predicate
graph TD ong([Open Neuroscience Graph]) -->|registeredIn| fs([FAIRsharing]) ong -->|registeredIn| bt([bio.tools]) ong -->|registeredIn| w3([w3id]) ong -->|registeredIn| zen([Zenodo DOI])
Note
The graph is registered for Findability (FAIR): a FAIRsharing record, a bio.tools entry, a permanent [[openneuroscience|w3id identifier]], and a Zenodo DOI for citation. Two targets are themselves nodes (FAIRsharing and bio.tools are registries the vault catalogues), so the graph’s own discoverability uses the same registeredIn predicate it applies to any dataset or identifier.
How it is built and run
- Plain text and open tooling, fully transferable by design
- Maintainer writes notes in Obsidian, Claude structures and completes them
- Maintainer curates in VS Code using Git
- Site built with Quartz, hosted on GitLab Pages, with Matomo analytics
graph TD maint([Maintainer]) -->|writes raw notes| notes([Obsidian<br/>Raw notes]) notes -->|scope & evaluate| claude([Claude]) claude -->|write nodes| nodes([Obsidian<br/>Structured node]) claude -.->|curate| nodes nodes -->|review| vsc([VS Code<br/>any maintainer curates]) vsc -->|commit| gl([GitLab]) gl -.->|change log| vsc gl -->|build| quartz([Quartz]) quartz -->|deploy| pages([GitLab Pages]) pages -->|measure| matomo([Matomo analytics])
Note
The Open Neuroscience Graph is built and maintained by me (Stephen Whitmarsh). I built a system that works for me and allows me to maintain and scale this work, while anticipating future contributions and maintenance by the community. The following is therefore written in the first person, but I explain it because it might be useful for you, as a future contributor, as well.
I use Obsidian when I can, or write directly in TODO.md. I’ve installed Claude MCP, so that it can directly read and write within the vault. I then use it to scope and evaluate new notes and harmonize formatting. The end result is structured in the real vault format, including the YAML frontmatter. Through git integration in Visual Studio Code, it is easy for me to review, curate and validate new nodes. From each commit the site is automatically built with Quartz, hosted through GitLab Pages, and measured with Matomo (no cookies). For more details see CONTRIBUTING.md.
Because everything (vault and the build) is shared in Git, any maintainer, on any machine, can clone or pull the vault.
Finally, Quartz is also heavily customized. This is done with custom code in such a way that a rebase of Quartz can be done without problems. This is documented in QUARTZ.md.
Where it lives
- Site: openneuroscience.org
- Repository: gitlab.com/icm-institute/dac/opensciencegraph
- DOI: 10.5281/zenodo.20181900
- FAIRsharing: https://fairsharing.org/8243
- bio.tools: https://bio.tools/open_neuroscience_graph
- w3id: https://w3id.org/openneuroscience/graph
- Contact: stephenwhitmarsh@proton.me
Note
The database and code are part of the open neuroscience ecosystem itself, and an effort is made to keep them FAIR. This is important because standards, tools and actors all change continuously. Contributions, corrections, and suggestions are therefore essential, especially anything that makes this graph (and the field) more inclusive. Editing conventions, inclusion criteria, frontmatter fields, and Dataview query examples are in CONTRIBUTING.md and DATAVIEW.md, or get in touch via the repository or by email.
Citing
Creating and maintaining this resource requires real time and effort, so please acknowledge this in your work by citing: Whitmarsh, S. (2026). Open Neuroscience Graph. Zenodo. https://doi.org/10.5281/zenodo.20181900.

