dbGaP — Database of Genotypes and Phenotypes
Overview
The Database of Genotypes and Phenotypes (dbGaP) is the NIH repository for controlled-access human genomic and phenotypic data. It archives studies that investigate the relationship between genotype and phenotype, particularly genome-wide association studies (GWAS), sequencing studies, and epidemiological cohorts. Operated by NCBI at the NIH, it holds data from many research areas, including major NIH-funded neurological and psychiatric research initiatives.
Access Model
dbGaP uses a controlled-access model managed by NIH, with two tiers. Aggregate-level summary statistics and study metadata are freely accessible without approval. Individual-level genotype and phenotype data require a Data Access Request (DAR), which is submitted through the dbGaP authorized-access system using an NIH eRA Commons login and must be approved by the relevant Data Access Committee (DAC). Approved researchers then download the data with the SRA Toolkit or work with it on cloud platforms.
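As a rough illustration of the download step, the Python sketch below assembles an SRA Toolkit `prefetch` command line for a controlled-access run. It assumes the toolkit is installed and that a repository key (`.ngc` file) has been downloaded for an approved project. The function name `build_prefetch_command`, the run accession `SRR0000000`, and the file name `prj_00000.ngc` are hypothetical placeholders, not values from any real study. The sketch only builds the argument list and does not contact NCBI.

```python
from pathlib import Path
from typing import List, Optional


def build_prefetch_command(
    run_accession: str,
    ngc_key: Optional[Path] = None,
    output_dir: Optional[Path] = None,
) -> List[str]:
    """Build (but do not run) an SRA Toolkit `prefetch` command line.

    run_accession: SRA run identifier (placeholder values in this sketch).
    ngc_key: path to the dbGaP repository key file for an approved project;
             leave as None for open-access runs.
    output_dir: optional directory in which prefetch should store the download.
    """
    if not run_accession.strip():
        raise ValueError("run_accession must not be empty")

    cmd = ["prefetch"]
    if ngc_key is not None:
        # --ngc passes the repository key that authorizes controlled-access data.
        cmd += ["--ngc", str(ngc_key)]
    if output_dir is not None:
        cmd += ["--output-directory", str(output_dir)]
    cmd.append(run_accession)
    return cmd


if __name__ == "__main__":
    # Hypothetical accession and key file, for illustration only.
    command = build_prefetch_command("SRR0000000", ngc_key=Path("prj_00000.ngc"))
    print(" ".join(command))
```

To perform the download, the returned list could be passed to `subprocess.run(command, check=True)`. Returning the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.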
Key Neuroscience Datasets
Notable neurological and psychiatric datasets hosted in dbGaP include:
- ADSP (Alzheimer’s Disease Sequencing Project): WGS and WES, 20,000+ samples
- ABCD Study (Adolescent Brain Cognitive Development): longitudinal, 11,000+ children
- ENIGMA: summary statistics from multiple ENIGMA working groups
- NIMH collections: schizophrenia, bipolar disorder, and ASD GWAS datasets
- UK Biobank: GWAS summary statistics for neuroimaging and cognitive phenotypes
Relationship to Other Repositories
dbGaP is NIH’s primary controlled-access repository. NCBI GEO hosts open-access gene expression data, while dbGaP holds the controlled-access counterpart. EGA (the European Genome-phenome Archive) is the European equivalent for controlled-access genomics. Raw sequencing reads from dbGaP studies are stored in NCBI SRA, which exchanges its public records with ENA and DDBJ through the INSDC framework.
Connections
- Access governance: GA4GH (Beacon, DRS)
- Phenotype annotation: HPO
- Part of: INSDC data sharing ecosystem (with DDBJ, ENA)
Resources
- https://dbgap.ncbi.nlm.nih.gov
- https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login (access request portal)

