Software Heritage

Overview

Software Heritage is a non-profit initiative launched by Inria in 2016 under UNESCO auspices, with the mission to collect, preserve, and share all publicly available software source code. As of 2024 it holds over 16 billion unique source files from more than 250 million projects. These are crawled continuously from code hosting platforms such as GitHub and GitLab, from package registries such as PyPI, npm, and CRAN, and from other sources. For research, it addresses a reproducibility problem: cited software often disappears when repositories are deleted, moved, or renamed, which breaks references in publications. Software Heritage counters this by providing long-term archival and permanent identifiers that remain valid regardless of what happens to the original hosting location.

Persistent identifiers for software

When a piece of software is archived in Software Heritage, it can be referenced by a SWHID (Software Heritage persistent Identifier). A SWHID is a unique, permanent reference computed as a cryptographic hash of the content itself rather than derived from its hosting location. Because the identifier depends only on the content, it stays valid even if the original repository is moved, deleted, or renamed. Researchers can include a SWHID in a publication to identify the exact version of code they used, which keeps their work reproducible long after the original repository has changed. SWHIDs can also be embedded in a CITATION.cff file, a machine-readable citation metadata file placed at the root of a repository that tells others how to cite the software.
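As a minimal sketch of what "computed from the content itself" means, the Python below derives the SWHID of a single file. It assumes the SWHID version 1 rule for file contents (object type `cnt`), where the hash is the SHA-1 of a Git-style blob header followed by the raw bytes, so the result matches what `git hash-object` reports for the same file. Identifiers for directories, revisions, releases, and snapshots (`dir`, `rev`, `rel`, `snp`) hash richer structures and are not covered here. The helper names `content_swhid`, `is_core_swhid`, and `cff_identifiers_block` are hypothetical. The CITATION.cff fragment assumes the Citation File Format 1.2.0 `identifiers` list with `type: swh`.

```python
import hashlib
import re

# Core SWHID syntax: swh:1:<object type>:<40 lowercase hex digits>.
# Qualifiers such as ";origin=..." are deliberately not handled here.
_CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def content_swhid(data: bytes) -> str:
    """Return the core SWHID for one file's raw bytes (object type "cnt").

    Assumption: contents are hashed like Git blobs, i.e. SHA-1 over
    b"blob <length in bytes>\\0" followed by the data itself.
    """
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def is_core_swhid(text: str) -> bool:
    """Check that a string has the core SWHID shape (no qualifiers)."""
    return _CORE_SWHID.fullmatch(text) is not None


def cff_identifiers_block(swhid: str, description: str) -> str:
    """Render a hypothetical CITATION.cff `identifiers` entry for a SWHID."""
    if not is_core_swhid(swhid):
        raise ValueError(f"not a core SWHID: {swhid!r}")
    return (
        "identifiers:\n"
        "  - type: swh\n"
        f'    value: "{swhid}"\n'
        f'    description: "{description}"\n'
    )


if __name__ == "__main__":
    # An empty file hashes to the same value as Git's empty blob.
    swhid = content_swhid(b"")
    print(swhid)  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(cff_identifiers_block(swhid, "Archived file in Software Heritage"), end="")
```

Running the script prints the SWHID of an empty file followed by a YAML fragment that could be merged into a CITATION.cff next to the required `cff-version`, `message`, `title`, and `authors` keys. In practice, a software citation usually points at a directory, release, or snapshot SWHID for the whole project rather than at a single file.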

Integration with French open science

Software Heritage is integrated with HAL, France’s national open access archive. Researchers depositing software via HAL automatically have their source code archived in Software Heritage and receive a persistent SWHID alongside their HAL identifier. This pathway implements the software citation pillar of Ouvrir la Science, which encourages French publicly funded researchers to archive and cite research software as a first-class research output.

Connections

  • Founded by: Inria (under UNESCO auspices)
  • Integrated with: HAL (France's national open access archive; software deposited via HAL is archived in Software Heritage)

Resources