Make research data usable, not just published.
We believe publicly funded research data is a public good, and getting it out of people's heads, laptops, and lab drives shouldn't be difficult.
How it actually works today
- Labs hand off spreadsheets and folders that "work for us" but don't match what repositories or core facilities require.
- Data stewards and curators spend hours chasing missing metadata, renaming files, and stitching IDs by hand.
- Everyone says "just make it FAIR," but no one hands you a checklist of "fix these 3 things and you're good."
- Repositories bounce submissions for preventable reasons (wrong file naming, missing required fields, ambiguous dates), and that wastes days.
- By the time data is finally accepted, nobody can prove what was actually checked or when.
Where we're going
- Institution-specific rulepacks: Each lab / core / collection has non-negotiable fields. We're making those rules repeatable and enforceable without endless email.
- Submission readiness as a norm: "Attach your FAIRy readiness report with your dataset" becomes the new "fill out this checklist."
- Trustable provenance: The attestation file travels with the dataset and proves what was actually validated, which helps with internal review, journal submission, and grant reporting.
- Repeatable pre-intake checks: FAIRy gives institutions a pre-intake check that produces both a human-readable fix list and a machine-readable attestation, so their data can confidently join larger integrated networks without weeks of one-off curator triage.
- Less time wasted on formatting, more time on actual science and curation.
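To make the "fix list plus attestation" idea concrete, here is a minimal sketch of what a pre-intake check could look like. It is an illustration only, not FAIRy's actual rulepack format, API, or attestation schema: the `RULEPACK` structure, the field names, and the `check_row` and `run_preintake` helpers are all hypothetical.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Hypothetical rulepack: field names and patterns are illustrative assumptions.
RULEPACK = {
    "id": "example-lab-v1",
    "required_fields": ["sample_id", "collected_on", "filename"],
    "filename_pattern": r"[A-Za-z0-9_-]+\.(csv|tsv|fastq\.gz)",
    "date_field": "collected_on",
}


def check_row(row, rulepack):
    """Return a list of human-readable problems for one metadata row."""
    problems = []
    for field in rulepack["required_fields"]:
        if not str(row.get(field, "")).strip():
            problems.append(f"missing required field '{field}'")
    filename = row.get("filename", "")
    if filename and not re.fullmatch(rulepack["filename_pattern"], filename):
        problems.append(f"filename '{filename}' does not match the naming rule")
    date_value = row.get(rulepack["date_field"], "")
    if date_value:
        # Require the unambiguous YYYY-MM-DD shape, then confirm it is a real date.
        shape_ok = re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_value) is not None
        try:
            datetime.strptime(date_value, "%Y-%m-%d")
            valid = shape_ok
        except ValueError:
            valid = False
        if not valid:
            problems.append(f"date '{date_value}' is not in YYYY-MM-DD form")
    return problems


def run_preintake(rows, rulepack):
    """Check every row and return (fix_list, attestation)."""
    fix_list = []
    for index, row in enumerate(rows, start=1):
        for problem in check_row(row, rulepack):
            fix_list.append(f"row {index}: {problem}")
    # Hash a canonical JSON form of the input so the attestation is tied
    # to the exact content that was checked.
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    attestation = {
        "rulepack": rulepack["id"],
        "input_sha256": hashlib.sha256(canonical).hexdigest(),
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "rows_checked": len(rows),
        "problems_found": len(fix_list),
        "ready": not fix_list,
    }
    return fix_list, attestation


rows = [
    {"sample_id": "S001", "collected_on": "2024-03-05", "filename": "S001_reads.fastq.gz"},
    {"sample_id": "", "collected_on": "03/05/24", "filename": "final data (2).csv"},
]
fix_list, attestation = run_preintake(rows, RULEPACK)
print("\n".join(fix_list))                  # the human-readable fix list
print(json.dumps(attestation, indent=2))    # the machine-readable attestation
```

In this sketch the fix list is what a researcher would read and act on, while the attestation records which rules ran, against which exact content, and when. That record contains no raw data, which is the property funders and journals would rely on.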
Who we're accountable to
- People inside institutions who are responsible for accepting data and have to say "no" when required information is missing.
- Researchers who don't want to lose days to unclear submission requirements.
- Collections and core facilities that can't ingest material until identifiers, filenames, and required metadata line up.
- Funders and journals that need traceable evidence of data quality without getting access to the raw data itself.