FASTQ — Raw Sequencing Read Format
Overview
FASTQ is the standard text format for storing raw sequencing reads produced by next-generation sequencing instruments. Described by Cock et al. (2010, Nucleic Acids Research), it stores each read as four lines: a header beginning with @, the base calls, a separator beginning with +, and a quality string with one Phred-scaled character per base. FASTQ has no formal governance body, but it is the de facto starting point of most genomics pipelines. In practice, files are almost always gzip-compressed.
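As an illustration of the four-line layout, here is a minimal Python sketch of a reader. It makes several assumptions: records are strictly four lines each (no wrapped sequences), and qualities use the Phred+33 ASCII offset (the Sanger convention, also used by current Illumina output), whereas older files may use an offset of 64. The names FastqRecord, parse_fastq, open_fastq, and error_probability are hypothetical helpers written for this example, not part of any library.

```python
import gzip
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # header line without the leading "@"
    sequence: str         # base calls
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream.

    Assumes an ASCII offset of 33 (Phred+33) unless told otherwise.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between or after records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def error_probability(phred: int) -> float:
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


# Usage with an in-memory example record (made-up data, for illustration only).
example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
records = list(parse_fastq(example))
```

In the example record, the character I (ASCII 73) decodes to a Phred score of 40 and # (ASCII 35) decodes to 2. For real files, established parsers are usually a better choice than a hand-written reader, since they handle format variants this sketch ignores.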
Position in the Genomics Pipeline
FASTQ is the upstream input to alignment, which produces SAM-BAM-CRAM files. The aligned reads are then processed for variant calling (producing VCF) or for expression quantification (producing count matrices, stored in AnnData for single-cell data).
Connections
- Downstream format: SAM-BAM-CRAM (after alignment)
Resources
- https://doi.org/10.1093/nar/gkp1137 (Cock et al. 2010, Nucleic Acids Research)
- https://www.ebi.ac.uk/ena (ENA)
- https://www.ncbi.nlm.nih.gov/sra (NCBI SRA)

