FAIRy: Reproducible Datasets by Default — DSLC Project Club

Data Science Learning Community Project • December 13, 2025

Abstract

Recorded community talk with a live demo. The presentation covers FAIRy's approach to making datasets reproducible by default: the problem it solves, how validation works, the output and report bundles it produces, rulepacks, and who it is for. It shows how FAIRy helps researchers and data stewards confirm that datasets meet repository requirements before submission, reducing back-and-forth with curators.

Download slides (PDF) →

Key takeaways

Dataset submissions often fail because of missing metadata, format issues, or misalignment with repository requirements, which leads to delays and rejections.
FAIRy's local-first validation system checks datasets against configurable rulepacks before submission, catching issues early in the workflow.
FAIRy generates one-page readiness reports (PASS/WARN/FAIL) and attestation bundles with timestamps, file hashes, and rulepack versions that can be attached to deposits as proof of review.
Domain-specific validation rules encode repository requirements (e.g., GEO, ENA, SRA, GBIF IPT/DwC-A) and can be customized for institutional policies, making validation transparent and inspectable.
FAIRy is designed for data stewards, core facilities, collections managers, and researchers who need to confirm that datasets meet repository requirements before submission, reducing back-and-forth with curators.
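
To make the takeaways above concrete, here is a minimal Python sketch of the general idea: check metadata against a rulepack, collapse the findings into a PASS/WARN/FAIL status, and record a timestamp, file hashes, and the rulepack version. It is not FAIRy's actual API or rulepack format. The rulepack structure, rule IDs, field names, and the validate, overall_status, and attest functions are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical rulepack: the name, fields, and severities are illustrative
# only and do not reflect FAIRy's real rulepack schema.
RULEPACK = {
    "name": "example-repository",
    "version": "0.1.0",
    "rules": [
        {"id": "required-title", "field": "title", "severity": "FAIL"},
        {"id": "required-organism", "field": "organism", "severity": "FAIL"},
        {"id": "recommended-license", "field": "license", "severity": "WARN"},
    ],
}


def validate(metadata, rulepack):
    """Return one finding per rule whose field is missing or empty."""
    findings = []
    for rule in rulepack["rules"]:
        value = metadata.get(rule["field"])
        if value is None or str(value).strip() == "":
            findings.append({"rule": rule["id"], "severity": rule["severity"]})
    return findings


def overall_status(findings):
    """Collapse findings into a single PASS/WARN/FAIL readiness status."""
    severities = {f["severity"] for f in findings}
    if "FAIL" in severities:
        return "FAIL"
    if "WARN" in severities:
        return "WARN"
    return "PASS"


def attest(files, rulepack, findings):
    """Build an attestation record: timestamp, file hashes, rulepack version."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rulepack": {"name": rulepack["name"], "version": rulepack["version"]},
        "status": overall_status(findings),
        "findings": findings,
        # SHA-256 of each file's bytes, keyed by file name.
        "files": {
            name: hashlib.sha256(data).hexdigest() for name, data in files.items()
        },
    }


# Example inputs (made up): the license field is present but empty.
metadata = {"title": "Example dataset", "organism": "Homo sapiens", "license": ""}
files = {"samples.tsv": b"sample_id\torganism\nS1\tHomo sapiens\n"}

findings = validate(metadata, RULEPACK)
bundle = attest(files, RULEPACK, findings)
print(bundle["status"])  # WARN: the recommended license field is empty
```

Keeping the rules as plain data rather than code is what makes this kind of validation inspectable: a curator can read the rulepack and see exactly which checks produced a given status.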

Try FAIRy

Get started with FAIRy in 2 minutes. See the Try FAIRy page for installation and quick start instructions.

View fairy-core repository on GitHub →

Interested in learning more or trying FAIRy?

Email hello@datadabra.com or visit our researchers page to get started.

View repository (early access) →