ENA — European Nucleotide Archive

Overview

The European Nucleotide Archive (ENA) is Europe's primary open-access repository for nucleotide sequence data, operated by EMBL-EBI. It is one of three partner databases in the International Nucleotide Sequence Database Collaboration (INSDC), alongside SRA (NCBI, US) and DDBJ (Japan). The partners synchronise data daily, so a submission to any one archive becomes accessible via all three. ENA accepts and archives raw sequencing reads (FASTQ, BAM, CRAM), assembled sequences, and annotated sequences from all organisms, and is the usual European deposition point for the raw sequencing data that journals require to accompany publications. It is the open-access companion to EGA, which handles controlled-access human genomic data that cannot be released publicly.
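
Because the INSDC partners mirror each other, an accession found in a paper may have been submitted to any of the three archives. The following is a minimal sketch of telling them apart, assuming the conventional INSDC read-archive accession prefixes (first letter S for NCBI, E for ENA, D for DDBJ; third letter P, S, X, or R for study, sample, experiment, or run). That convention is not stated in the text above, and the helper name `origin_archive` is hypothetical.

```python
import re

# Assumed INSDC convention: the first letter marks the archive that
# originally received the submission.
_ARCHIVE_BY_LETTER = {"S": "SRA (NCBI)", "E": "ENA (EMBL-EBI)", "D": "DDBJ"}

# Assumed INSDC convention: the third letter marks the object type.
_TYPE_BY_LETTER = {"P": "study", "S": "sample", "X": "experiment", "R": "run"}

# e.g. ERR1234567, SRP000001, DRX012345
_ACCESSION = re.compile(r"^([SED])R([PSXR])(\d{6,})$")


def origin_archive(accession):
    """Return (archive, object_type) for a read-archive accession.

    Returns None when the string does not follow the assumed pattern
    (for example BioProject identifiers such as PRJEB...).
    """
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return _ARCHIVE_BY_LETTER[match.group(1)], _TYPE_BY_LETTER[match.group(2)]
```

For example, `origin_archive("ERR1234567")` identifies a run first submitted to ENA. Whatever the prefix, the record should be retrievable from any partner once synchronisation has taken place.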

Scope

ENA archives three broad categories of sequence data. Raw reads include FASTQ, BAM, and CRAM files from all sequencing technologies. Assembled sequences cover genome assemblies, transcriptome assemblies, and metagenomes. Annotated sequences include genes, coding sequences, non-coding RNA, and regulatory elements, linked to protein databases (UniProt) and genome browsers (Ensembl).

Relationship to EGA

ENA and EGA are complementary archives operated by EMBL-EBI serving distinct use cases:

Feature          | ENA                                                    | EGA
-----------------+--------------------------------------------------------+----------------------------------------
Access           | Fully open                                             | Controlled (DAC approval required)
Data sensitivity | Non-sensitive (model organisms, non-human, aggregated) | Sensitive human genomic/phenotypic data
Raw format       | FASTQ, BAM, CRAM                                       | FASTQ, BAM, CRAM (Crypt4GH encrypted)
Governance       | EMBL-EBI (INSDC framework)                             | EMBL-EBI + CRG (Barcelona)

In practice, a human genomics study typically deposits its raw FASTQ/CRAM files in EGA under controlled access, while its non-sensitive processed outputs (normalised expression matrices, variant summaries) may go to open archives such as NCBI GEO or EVA.
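
The routing rule implied by the comparison above can be sketched as a small decision function. This is an illustration only: the `Dataset` type, the `kind` labels, and the function name `suggest_archive` are hypothetical, and the pairing of expression matrices with NCBI GEO and variant summaries with EVA is an assumption about how the two processed-output archives divide the work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    # Hypothetical labels: "raw_reads", "expression_matrix", "variant_summary".
    kind: str
    # True for individual-level human data that cannot be released publicly.
    sensitive_human: bool


def suggest_archive(dataset):
    """Suggest a deposition archive from sensitivity first, then data kind."""
    # Sensitive human data always needs controlled access, whatever its format.
    if dataset.sensitive_human:
        return "EGA"
    if dataset.kind == "raw_reads":
        return "ENA"
    # Assumed split of non-sensitive processed outputs.
    if dataset.kind == "expression_matrix":
        return "NCBI GEO"
    if dataset.kind == "variant_summary":
        return "EVA"
    raise ValueError(f"unrecognised dataset kind: {dataset.kind!r}")
```

Checking sensitivity before data kind mirrors the comparison: access control, not file format, is what separates the two archives, since both accept FASTQ, BAM, and CRAM.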

Connections

  • INSDC partners: SRA (NCBI, US) and DDBJ (Japan); see https://www.insdc.org
  • Controlled-access counterpart: EGA (human sensitive data)
  • Open variant archive: EVA (short variants derived from ENA data)
  • Part of: ELIXIR (core data resource)

Resources