SAM / BAM / CRAM — Sequence Alignment Formats
Overview
SAM (Sequence Alignment Map), BAM (Binary Alignment Map), and CRAM are the standard formats for storing sequencing reads after they have been aligned to a reference genome. Developed originally at the Wellcome Sanger Institute in 2009 and now governed by GA4GH through the hts-specs community, these formats sit between raw read production in FASTQ and downstream analyses that produce VCF variant files or count matrices. SAM is the human-readable text specification. BAM is its compressed binary equivalent used in day-to-day analysis. CRAM uses reference-based compression for archival storage.
Connections
- Governed by: GA4GH (hts-specs)
- Upstream format: FASTQ (raw reads from sequencer)
- Downstream formats: VCF (variant calling), AnnData (single-cell quantification)
Resources
- https://samtools.github.io/hts-specs/ (SAM/BAM/CRAM specification)
- https://samtools.github.io/hts-specs/SAMv1.pdf (SAM format specification)
- https://samtools.github.io/hts-specs/CRAMv3.pdf (CRAM v3 specification)

