FAIRy

FAIRy: Reproducible Datasets by Default — DSLC Project Club

Data Science Learning Community Project • December 13, 2025

Abstract

Community talk and live demo (recorded). The presentation covers FAIRy's approach to making datasets reproducible by default: the problem it solves, how validation works, the report and attestation bundles it produces, rulepacks, and who it's for. It demonstrates how FAIRy helps researchers and data stewards confirm that datasets meet repository requirements before submission, reducing back-and-forth with curators.


Key takeaways

  • Dataset submission workflows often fail due to missing metadata, format issues, and misalignment with repository requirements, leading to delays and rejected submissions
  • FAIRy's local-first validation system checks datasets against configurable rulepacks before submission, catching issues early in the workflow
  • FAIRy generates one-page readiness reports (PASS/WARN/FAIL) and attestation bundles with timestamps, file hashes, and rulepack versions that can be attached to deposits as proof of review
  • Domain-specific validation rules encode repository requirements (e.g., GEO, ENA, SRA, GBIF IPT/DwC-A) and can be customized for institutional policies, making validation transparent and inspectable
  • FAIRy is designed for data stewards, core facilities, collections managers, and researchers who need to confirm that datasets meet repository requirements before submission, reducing back-and-forth with curators
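
To make the takeaways above concrete, here is a minimal Python sketch of the general pattern: check metadata against a rulepack, collapse the findings into a PASS/WARN/FAIL status, and record a timestamp, file hash, and rulepack version in an attestation record. This is not FAIRy's actual API or rulepack format. The rulepack layout, the rule ids, and the functions validate, overall_status, and attest are all hypothetical names chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical rulepack: the layout, rule ids, and severities are illustrative
# assumptions, not FAIRy's real rulepack format.
RULEPACK = {
    "name": "example-rulepack",
    "version": "0.1.0",
    "rules": [
        {"id": "title-required", "field": "title", "severity": "FAIL"},
        {"id": "organism-required", "field": "organism", "severity": "FAIL"},
        {"id": "license-recommended", "field": "license", "severity": "WARN"},
    ],
}


def validate(metadata, rulepack):
    """Return one finding per rule whose field is missing or empty."""
    findings = []
    for rule in rulepack["rules"]:
        value = metadata.get(rule["field"])
        if value is None or str(value).strip() == "":
            findings.append({"rule": rule["id"], "severity": rule["severity"]})
    return findings


def overall_status(findings):
    """Collapse findings into a single PASS/WARN/FAIL status."""
    severities = {finding["severity"] for finding in findings}
    if "FAIL" in severities:
        return "FAIL"
    if "WARN" in severities:
        return "WARN"
    return "PASS"


def attest(data_bytes, metadata, rulepack):
    """Build an attestation record: timestamp, file hash, rulepack version, status."""
    findings = validate(metadata, rulepack)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "rulepack": {"name": rulepack["name"], "version": rulepack["version"]},
        "status": overall_status(findings),
        "findings": findings,
    }


if __name__ == "__main__":
    # Made-up sample data; the license field is deliberately left out.
    sample_file = b"sample_id,organism\nS1,Homo sapiens\n"
    sample_metadata = {"title": "Demo dataset", "organism": "Homo sapiens"}
    print(json.dumps(attest(sample_file, sample_metadata, RULEPACK), indent=2))
```

Because the rules live in plain data rather than in code, they can be read and reviewed directly, which is the sense in which a rulepack makes validation transparent and inspectable. Recording the rulepack version next to the file hash lets a reviewer tell which rules a given file was checked against.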

Try FAIRy

Get started with FAIRy in 2 minutes. See the Try FAIRy page for installation and quick start instructions.

View fairy-core repository on GitHub →

Interested in learning more or trying FAIRy?

Email hello@datadabra.com or visit our researchers page to get started.