DataLad

Overview

DataLad is a free and open-source data management system, built on Git and git-annex, that brings version control to scientific data. Development began in 2013 as a collaboration between Forschungszentrum Jülich and Dartmouth College; the system was initially designed for neuroscience but has since been adopted across research domains. Git handles dataset versioning and provenance, while git-annex manages the actual file content, so large datasets can be distributed across institutions and linked into modular project structures without requiring centralised storage. DataLad tracks not just what data exists but also the exact commands and parameters that produced it, making analyses re-executable and results reproducible.
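
The idea of recording the command that produced a file, so that it can later be re-executed and checked, can be illustrated with a small standard-library sketch. This is not DataLad's implementation or API: the record format and the helper names (`run_and_record`, `rerun_and_verify`, `demo`) are hypothetical and chosen only for illustration. In DataLad itself, this role is played by the `datalad run` and `datalad rerun` commands, which store the record in the dataset's Git history.

```python
from __future__ import annotations

import hashlib
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_and_record(cmd: list[str], outputs: list[str], workdir: Path) -> dict:
    """Run cmd in workdir and return a provenance record.

    The record layout is a hypothetical one, not DataLad's format.
    """
    subprocess.run(cmd, cwd=workdir, check=True)
    return {
        "cmd": cmd,
        "outputs": {name: file_digest(workdir / name) for name in outputs},
    }


def rerun_and_verify(record: dict, workdir: Path) -> bool:
    """Re-execute the recorded command and compare output digests."""
    subprocess.run(record["cmd"], cwd=workdir, check=True)
    return all(
        file_digest(workdir / name) == digest
        for name, digest in record["outputs"].items()
    )


def demo() -> tuple[dict, bool]:
    """Produce a file, delete it, then regenerate it from the record."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # A deterministic "analysis" step: write a fixed result to a file.
        cmd = [sys.executable, "-c", "open('result.txt', 'w').write('42\\n')"]
        record = run_and_record(cmd, ["result.txt"], workdir)
        (workdir / "result.txt").unlink()  # simulate losing the result
        reproduced = rerun_and_verify(record, workdir)
        return record, reproduced


if __name__ == "__main__":
    record, reproduced = demo()
    print(json.dumps(record, indent=2))
    print("reproduced:", reproduced)
```

The sketch keeps the record in memory and compares content hashes after re-execution. A real system would additionally need to version the inputs and the record itself, which is the part the paragraph above attributes to Git and git-annex.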

Resources