SAM / BAM / CRAM — Sequence Alignment Formats

Overview

SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map), and CRAM are the standard formats for storing sequencing reads after they have been aligned to a reference genome. Originally developed at the Wellcome Sanger Institute in 2009 and now governed by GA4GH through the hts-specs community, these formats sit between raw read production in FASTQ and downstream analyses that produce VCF variant files or count matrices. SAM is the human-readable, tab-delimited text format. BAM is its compressed binary equivalent and is the form used in day-to-day analysis. CRAM applies reference-based compression, storing reads largely as differences from the reference, which makes it smaller still and well suited to archival storage.
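To make the "human-readable" description concrete, here is a minimal sketch of reading one SAM alignment line in plain Python. The eleven mandatory columns follow the SAM specification. The SamRecord class, the parse_sam_line helper, and the example read are hypothetical names and data invented for illustration. In practice a library such as pysam or htslib would normally be used, particularly for BAM and CRAM, which are binary and cannot be parsed this way.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns plus any optional TAG:TYPE:VALUE fields."""
    qname: str   # read (query) name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment operations, e.g. "4M"
    rnext: str   # reference name of the mate / next read
    pnext: int   # position of the mate / next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # ASCII-encoded base qualities (Phred+33)
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single alignment line (not an '@' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=tuple(fields[11:]),
    )


# Invented example record: a 4-base read aligned to chr1 at position 100.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
record = parse_sam_line(example)

# FLAG bit 0x4 marks a read as unmapped.
is_unmapped = bool(record.flag & 0x4)
```

Header lines, which begin with "@", are not alignment records and would need to be skipped before calling a parser like this one.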

Connections

  • Governed by: GA4GH (hts-specs)
  • Upstream format: FASTQ (raw reads from sequencer)
  • Downstream formats: VCF (variant calling), AnnData (single-cell quantification)

Resources