FAIRy: Reproducible Datasets by Default — DSLC Project Club
Data Science Learning Community Project • December 13, 2025
Abstract
Recorded community talk with a live demo. The presentation covers FAIRy's approach to making datasets reproducible by default: the problem it solves, how validation works, the output and report bundles, rulepacks, and the intended audience. It demonstrates how FAIRy helps researchers and data stewards confirm that datasets meet repository requirements before submission, reducing back-and-forth with curators.
Key takeaways
- Dataset submission workflows often fail due to missing metadata, format issues, and misalignment with repository requirements, leading to delays and rejected submissions
- FAIRy's local-first validation system checks datasets against configurable rulepacks before submission, catching issues early in the workflow
- FAIRy generates one-page readiness reports (PASS/WARN/FAIL) and attestation bundles with timestamps, file hashes, and rulepack versions that can be attached to deposits as proof of review
- Domain-specific validation rules encode repository requirements (e.g., GEO, ENA, SRA, GBIF IPT/DwC-A) and can be customized for institutional policies, making validation transparent and inspectable
- FAIRy is designed for data stewards, core facilities, collections managers, and researchers who need to ensure datasets meet repository requirements before submission, reducing back-and-forth with curators
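To make the takeaways above concrete, here is a minimal Python sketch of the general idea: check metadata against a rulepack, derive a PASS/WARN/FAIL status, and record timestamps, file hashes, and the rulepack version in an attestation record. This is not FAIRy's actual API or file format. The rulepack layout, the field names, and the functions `check_metadata` and `build_attestation` are all hypothetical, chosen only for illustration.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical rulepack layout (an assumption, not FAIRy's real format):
# each rule names a required metadata field and the severity if it is missing.
RULEPACK = {
    "name": "example-rulepack",
    "version": "0.1.0",
    "rules": [
        {"field": "title", "severity": "FAIL"},
        {"field": "organism", "severity": "FAIL"},
        {"field": "license", "severity": "WARN"},
    ],
}


def check_metadata(metadata, rulepack):
    """Return (status, findings); status is "PASS", "WARN", or "FAIL"."""
    findings = []
    for rule in rulepack["rules"]:
        value = metadata.get(rule["field"])
        # A field counts as missing if it is absent or blank.
        if value is None or str(value).strip() == "":
            findings.append({"field": rule["field"], "severity": rule["severity"]})

    severities = {finding["severity"] for finding in findings}
    if "FAIL" in severities:
        status = "FAIL"
    elif "WARN" in severities:
        status = "WARN"
    else:
        status = "PASS"
    return status, findings


def build_attestation(files, status, rulepack):
    """Build an attestation record; `files` maps file name -> file bytes."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rulepack": {"name": rulepack["name"], "version": rulepack["version"]},
        "status": status,
        # SHA-256 digests let a reviewer confirm which files were checked.
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        },
    }
```

For example, `check_metadata({"title": "Demo", "organism": "Homo sapiens"}, RULEPACK)` yields a `WARN` status because only the `license` field is missing. Running everything locally on in-memory data mirrors the local-first approach described above, and serializing the record with `json.dumps` would give a file that could accompany a deposit.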
Try FAIRy
Get started with FAIRy in 2 minutes. See the Try FAIRy page for installation and quick start instructions.
Interested in learning more or trying FAIRy?
Email hello@datadabra.com or visit our researchers page to get started.