SAM / BAM / CRAM — Sequence Alignment Formats

Overview

SAM (Sequence Alignment Map), BAM (Binary Alignment Map), and CRAM are the standard formats for storing sequencing reads after they have been aligned to a reference genome. Developed originally at the Wellcome Sanger Institute in 2009 and now governed by GA4GH through the hts-specs community, these formats sit between raw read production in FASTQ and downstream analyses that produce VCF variant files or count matrices. SAM is the human-readable text specification. BAM is its compressed binary equivalent used in day-to-day analysis. CRAM uses reference-based compression for archival storage.

Connections

  • governedBy: GA4GH (hts-specs)
  • relatedTo: FASTQ (produced by aligning FASTQ reads to a reference genome)
  • relatedTo: VCF (VCF files are produced downstream from variant calling on SAM/BAM/CRAM alignments)
  • relatedTo: AnnData (SAM/BAM/CRAM alignments are quantified to produce count matrices stored in AnnData format)

Resources