SAM / BAM / CRAM — Sequence Alignment Formats
Overview
SAM (Sequence Alignment Map), BAM (Binary Alignment Map), and CRAM are the standard formats for storing sequencing reads after they have been aligned to a reference genome. Developed originally at the Wellcome Sanger Institute in 2009 and now governed by GA4GH through the hts-specs community, these formats sit between raw read production in FASTQ and downstream analyses that produce VCF variant files or count matrices. SAM is the human-readable text specification. BAM is its compressed binary equivalent used in day-to-day analysis. CRAM uses reference-based compression for archival storage.
Connections
- governedBy: GA4GH (hts-specs)
- relatedTo: FASTQ (produced by aligning FASTQ reads to a reference genome)
- relatedTo: VCF (VCF files are produced downstream from variant calling on SAM/BAM/CRAM alignments)
- relatedTo: AnnData (SAM/BAM/CRAM alignments are quantified to produce count matrices stored in AnnData format)
Resources
- https://samtools.github.io/hts-specs/ (SAM/BAM/CRAM specification)
- https://samtools.github.io/hts-specs/SAMv1.pdf (SAM format specification)
- https://samtools.github.io/hts-specs/CRAMv3.pdf (CRAM v3 specification)

