FASTQ — Raw Sequencing Read Format

Overview

FASTQ is the standard format for storing raw sequencing reads from next-generation sequencing instruments. Described by Cock et al. (2010, Nucleic Acids Research), it stores base calls and their Phred-scaled quality scores as plain text, four lines per read: a header line beginning with @, the sequence, a separator line beginning with +, and a quality string with one character per base. FASTQ has no formal governance body, but it is the de facto starting point of most genomics pipelines. In practice, files are almost always gzip-compressed.
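
As an illustration of the four-line layout and Phred scaling, here is a minimal Python sketch of a reader. It is not a reference implementation: the names FastqRecord, read_fastq, open_fastq, and error_probability are hypothetical, and it assumes unwrapped four-line records with the Sanger-style ASCII offset of 33 (older Illumina files used an offset of 64). For real pipelines an established parser is the safer choice.

```python
import gzip
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str              # header line without the leading '@'
    sequence: str          # base calls
    qualities: List[int]   # one Phred score per base


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a text handle, assuming four lines per read.

    offset=33 is an assumption (Sanger / modern Illumina encoding).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in: {header!r}")
        yield FastqRecord(
            name=header[1:].rstrip("\r\n"),
            sequence=seq,
            qualities=[ord(c) - offset for c in qual],
        )


def error_probability(q: float) -> float:
    """Convert a Phred score Q to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

A typical use would be to iterate with "for record in read_fastq(open_fastq(path))", where path is whatever file you are inspecting, and then summarize record.qualities per read. Reading lazily, one record at a time, keeps memory use flat even for very large files.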

Position in the Genomics Pipeline

FASTQ is the upstream input to alignment, which produces SAM, BAM, or CRAM files. Those alignments are then processed either for variant calling (producing VCF) or for expression quantification (producing count matrices, stored in AnnData for single-cell data).

Connections

Resources