
FAIRy documentation

Everything you need to know about dataset validation

⚠ Early alpha: Interfaces may change before v1.0

FAIRy is in active development. We're working with early partners to refine features and interfaces. Please report issues or share feedback by emailing hello@datadabra.com or opening an issue on GitHub.

What FAIRy checks today

FAIRy focuses on the common reasons datasets get delayed or rejected during submission. We're modeling these checks on patterns from public repositories like GEO and Zenodo.

GEO (Gene Expression Omnibus)

Checks for missing required fields, filenames that don't follow accession-ID patterns, missing basic platform and sample annotations, and other issues that commonly get GEO submissions sent back.

Zenodo

Flags missing descriptive metadata, unclear licensing, and file organization issues that make it hard to publish a clean record.

Validation categories

  • Metadata completeness: Required fields, data types, and value formats
  • File organization: Naming conventions, directory structure, and file formats
  • Repository-style expectations: Sample/platform annotations, organism/host fields, accession-aware filenames, and other elements commonly required at submission time
  • Data integrity: Checksums, file sizes, and file-format validation
  • Reuse signals: License clarity, contact information, and basic attribution, so a curator (or future user) knows whom to contact and how the data can be shared
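As an illustration of the metadata-completeness category, the sketch below checks that required fields are present, non-empty, and of the expected type. It is a minimal Python sketch, not FAIRy's implementation: the rule table `REQUIRED_FIELDS`, the field names, and the `Finding` record are hypothetical stand-ins for whatever a real rulepack defines.

```python
from dataclasses import dataclass

# Hypothetical rule table: field name -> expected Python type.
# These names are illustrative, not FAIRy's actual rulepack schema.
REQUIRED_FIELDS = {
    "title": str,
    "organism": str,
    "sample_count": int,
}


@dataclass
class Finding:
    name: str
    status: str  # "PASS" or "FAIL"
    message: str


def check_completeness(metadata: dict, rules: dict = REQUIRED_FIELDS) -> list[Finding]:
    """Check that each required field is present, non-empty, and of the expected type."""
    findings = []
    for name, expected_type in rules.items():
        value = metadata.get(name)
        if value is None or value == "":
            findings.append(Finding(name, "FAIL", "required field is missing or empty"))
        elif not isinstance(value, expected_type):
            findings.append(Finding(
                name, "FAIL",
                f"expected {expected_type.__name__}, got {type(value).__name__}",
            ))
        else:
            findings.append(Finding(name, "PASS", "ok"))
    return findings


if __name__ == "__main__":
    example = {"title": "RNA-seq of liver tissue", "sample_count": "12"}
    for f in check_completeness(example):
        print(f.status, f.name, f.message)
```

Run on the example record, it reports `title` as passing, `organism` as missing, and `sample_count` as the wrong type (a string instead of an integer).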

Repository-style expectations

These checks are modeled on common rejection reasons at public repositories like GEO and Zenodo (missing required fields, bad filenames, nonstandard dates). Passing them does not constitute official submission approval from any repository.

FAIRy check                 | GEO-style requirement | Zenodo-style requirement | Status
----------------------------+-----------------------+--------------------------+----------
Metadata completeness       | GEO submission guide  | Zenodo metadata guide    | ✓ Passed
File naming convention      | GEO file naming       | Zenodo file naming       | ⚠ Warning
Date format standardization | GEO date format       | Zenodo date format       | ⚠ Warning
Required fields             | GEO required fields   | Zenodo required fields   | ✗ Failed
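The date-format row can be made concrete with a small check that passes ISO 8601 calendar dates (YYYY-MM-DD), warns with a suggested fix when a date is in a layout that converts unambiguously, and fails everything else. This is a hypothetical Python sketch; `check_date` and the list of convertible layouts are assumptions, not FAIRy's actual rules.

```python
import re
from datetime import datetime

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Hypothetical list of non-ISO layouts that can be converted safely.
# Ambiguous layouts such as 03/04/2024 are deliberately excluded.
CONVERTIBLE_FORMATS = ["%Y/%m/%d", "%d %B %Y", "%B %d, %Y"]


def check_date(value: str) -> tuple[str, str]:
    """Return (status, detail): PASS for ISO 8601 dates, WARN with a suggested
    normalization for convertible dates, FAIL otherwise."""
    value = value.strip()
    if ISO_DATE.match(value):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # rejects impossible dates like 2024-02-30
            return "PASS", value
        except ValueError:
            return "FAIL", f"{value!r} looks like ISO 8601 but is not a real date"
    for fmt in CONVERTIBLE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return "WARN", f"normalize {value!r} to {parsed.strftime('%Y-%m-%d')}"
    return "FAIL", f"unrecognized or ambiguous date {value!r}"
```

For example, `check_date("14 March 2024")` returns a WARN suggesting `2024-03-14`, while slash-separated layouts such as `03/04/2024` fail, because the day/month order can't be inferred safely. Refusing to guess keeps the check from silently introducing wrong dates.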

What's the attestation file?

FAIRy generates an attestation file that documents your validation process. You can attach this file to the dataset bundle when you hand it to a curator, a journal, or a program officer. It's your "we actually checked this" receipt.

Why attestation matters

The attestation file provides documented proof that validation was performed, which is valuable for:

  • Institutions: Demonstrate that you have records of validation performed before submission, reducing administrative back-and-forth.
  • Journals: Show that data quality checks were run using standardized validation rules and versioned rulepacks, as evidence of due diligence.
  • Grant panels: Show that your institution has processes in place to streamline data deposition and reduce friction in data publication.

What the attestation file includes

  • FAIRy version and rulepack used: Documents which validation rules were applied and in which version.
  • Validation timestamp: Records when the validation was performed.
  • Summary of checks performed: Lists what was validated (e.g., dates normalized to ISO 8601, IDs validated, units standardized, ORCIDs present and well-formed).
  • File hashes and manifest information: Provides SHA-256 checksums for data files to verify integrity.
  • Repository dry-run results: Shows whether the dataset passed preflight checks for specific repositories (e.g., GEO, Zenodo).
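To make the list above concrete, here is a minimal Python sketch that assembles an attestation-style record, including a SHA-256 manifest computed by streaming each file through `hashlib`. The key names (`fairy_version`, `rulepack`, `manifest`, and so on) and the helper functions are hypothetical; the real file format may differ, and this sketch produces unsigned JSON.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_attestation(dataset_dir: Path, fairy_version: str, rulepack: str,
                      checks: list[str], dry_runs: dict[str, str]) -> dict:
    """Assemble an unsigned, attestation-style record. All keys are hypothetical."""
    files = sorted(p for p in dataset_dir.rglob("*") if p.is_file())
    return {
        "fairy_version": fairy_version,
        "rulepack": rulepack,
        "validated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "checks_performed": checks,
        "manifest": [
            {
                "path": p.relative_to(dataset_dir).as_posix(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
        "repository_dry_runs": dry_runs,
    }


def write_attestation(record: dict, out_path: Path) -> None:
    """Save the record as pretty-printed JSON."""
    out_path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
```

A call such as `build_attestation(Path("my_dataset"), "0.1", "example-rulepack", ["dates normalized to ISO 8601"], {"zenodo": "PASS"})` returns a dict that `write_attestation` can save; all argument values there are placeholders.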

Sample attestation file

You can download a sample attestation file to see what it looks like:

Download sample attestation file (FAIRy_attestation_example.json)

Learn more about how attestation helps with compliance and due diligence in our institutions documentation.

Note: The sample is illustrative; for institutional deployments, production attestation files use a signed JSON format.

Data handling

FAIRy is designed to respect institutional boundaries, sensitive data, and curator workload.

Local-only processing

All validation runs inside your environment (laptop, lab machine, core facility server, HPC cluster, etc.).

FAIRy does not send your raw data, filenames, sample IDs, coordinates, or metadata to us.

Institution control

You decide where reports are written and who sees them. FAIRy produces:

  • a human-readable readiness sheet (PASS / WARN / FAIL, why it matters, how to fix), and
  • a machine-readable summary.

You can share those internally without sharing the underlying data.
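One way to picture the machine-readable summary is as a roll-up of individual check results: the overall status is the worst one seen, and only the checks that need attention are listed. The Python sketch below assumes a hypothetical finding shape (`check`, `status`, `how_to_fix`) and is not FAIRy's summary schema. It carries check names and fix hints rather than data values, so a summary like this can be shared without exposing the underlying data.

```python
import json
from collections import Counter

# Severity order used to pick the overall status (higher is worse).
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def summarize(findings: list[dict]) -> dict:
    """Roll check results up into one summary; overall = worst status seen."""
    counts = Counter(f["status"] for f in findings)
    overall = max((f["status"] for f in findings), key=lambda s: SEVERITY[s], default="PASS")
    return {
        "overall": overall,
        "counts": {status: counts[status] for status in SEVERITY},
        "to_fix": [
            {"check": f["check"], "status": f["status"], "how_to_fix": f.get("how_to_fix", "")}
            for f in findings
            if f["status"] != "PASS"
        ],
    }


if __name__ == "__main__":
    example = [
        {"check": "required_fields", "status": "PASS", "how_to_fix": ""},
        {"check": "date_format", "status": "WARN", "how_to_fix": "Use YYYY-MM-DD dates."},
    ]
    print(json.dumps(summarize(example), indent=2))
```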

No phoning home

FAIRy does not collect usage analytics or send telemetry.

We don't phone home with filenames, metadata, run logs, or error details.

The only information we receive is what you explicitly choose to send us (for example, if you fill out a pilot interest form).

Data use

We do not use your datasets or metadata to train models or build products.

FAIRy is designed to be run locally so your data stays under your control.

Licensing

FAIRy combines a dual license for the core software with permissive licenses for rulepacks and sample content, balancing open science goals with sustainable development:

FAIRy-core (CLI + validators)

Licensed under AGPL-3.0-only, which keeps the core validator open; organizations that need different terms can obtain a commercial license instead.

Commercial licensing available

Available for organizations that cannot adopt AGPL. Contact hello@datadabra.com for details.

Rulepack schema & example rulepacks

Licensed under CC0 (or CC BY 4.0). This encourages community rulepack sharing and avoids license contamination concerns.

Sample datasets

Licensed under CC BY 4.0, allowing reuse with attribution.

Hosted UI / orchestration

Proprietary or source-available, once released.

For more details, see the FAIRy-core repository or visit our Open Science page.

What we're exploring with early partners

We're scoping next steps with institutions to expand FAIRy's preflight checks to more data types, formats, and repositories.

SRA / NCBI-style preflight

Check that required metadata fields and filenames are present before attempting SRA submission.

Goal: catch missing sample / experiment / run info early so you don’t learn about it during upload.
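A preflight like this largely comes down to checking that sample, experiment, and run records point at each other. The Python sketch below illustrates only that idea; the record layout and the column names `sample_id`, `experiment_id`, `run_id`, and `filename` are hypothetical and are not taken from SRA's submission templates.

```python
def check_sra_links(samples: list[dict], experiments: list[dict], runs: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the links are consistent.
    Column names (sample_id, experiment_id, run_id, filename) are hypothetical."""
    problems = []
    sample_ids = {s["sample_id"] for s in samples}
    experiment_ids = {e["experiment_id"] for e in experiments}

    # Every experiment must point at a known sample.
    for e in experiments:
        if e.get("sample_id") not in sample_ids:
            problems.append(f"experiment {e['experiment_id']}: unknown sample {e.get('sample_id')!r}")

    # Every run must point at a known experiment and list a data file.
    for r in runs:
        if r.get("experiment_id") not in experiment_ids:
            problems.append(f"run {r['run_id']}: unknown experiment {r.get('experiment_id')!r}")
        if not r.get("filename"):
            problems.append(f"run {r['run_id']}: no data file listed")

    # Samples that no experiment uses are usually a sign of missing rows.
    used = {e.get("sample_id") for e in experiments}
    for sid in sorted(sample_ids - used):
        problems.append(f"sample {sid}: no experiment refers to it")
    return problems
```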

FASTA / GFF3 checks

Basic structural checks for common genomics formats (FASTA, GFF3): do the files parse, are the expected fields present, do IDs line up.

Goal: reduce back-and-forth on “the file won’t load” issues.
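The "do IDs line up" question can be sketched in a few lines of Python: collect sequence IDs from the FASTA headers, then confirm that every GFF3 feature line has nine tab-separated columns, sane 1-based coordinates (start ≤ end), and a seqid that appears in the FASTA. This is an illustrative sketch with hypothetical function names, not a full GFF3 validator; it ignores attribute parsing and URL-escaping, and stops reading features at a `##FASTA` directive if one is present.

```python
def fasta_ids(lines) -> set[str]:
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            token = line[1:].split(maxsplit=1)
            if token:
                ids.add(token[0])
    return ids


def check_gff3(lines, known_ids: set[str]) -> list[str]:
    """Basic GFF3 structure checks plus a seqid cross-reference against the FASTA."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 tab-separated columns, found {len(cols)}")
            continue
        seqid, start, end = cols[0], cols[3], cols[4]
        if not (start.isdigit() and end.isdigit()) or not 1 <= int(start) <= int(end):
            problems.append(f"line {n}: invalid coordinates {start}..{end}")
        if seqid not in known_ids:
            problems.append(f"line {n}: seqid {seqid!r} not found in FASTA")
    return problems
```

In practice both functions can be fed open file handles, since iterating a text file yields lines.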

Generalist repository support

Extend preflight checks for generalist repositories (e.g., Dryad and Zenodo-style deposits): make sure descriptive metadata, licensing, and basic organization are present before packaging.

Goal: fewer “what is this / who owns this / can we even share this?” emails.

Multi-dataset runs

Run the same preflight rules across many submissions and get a summary of which ones are missing required fields.

Goal: help cores / data offices apply consistent intake standards across labs.
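Until built-in multi-dataset runs exist, a batch can be approximated by calling the `fairy validate /path/to/dataset --out out/` command once per dataset, each with its own output folder. The Python wrapper below is a hypothetical sketch: `build_commands` and `run_all` are illustrative names, and treating the exit code as a pass/fail signal is an assumption, so the per-dataset reports remain the source of truth.

```python
import subprocess
from pathlib import Path


def build_commands(dataset_dirs, out_root) -> list[list[str]]:
    """One `fairy validate` invocation per dataset, each writing to its own folder."""
    out_root = Path(out_root)
    return [
        ["fairy", "validate", str(d), "--out", str(out_root / Path(d).name)]
        for d in dataset_dirs
    ]


def run_all(dataset_dirs, out_root) -> dict[str, int]:
    """Run every command and record its exit code (meaning of the code is assumed)."""
    results = {}
    for cmd in build_commands(dataset_dirs, out_root):
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results[cmd[2]] = completed.returncode
    return results
```

Datasets that share a folder name would overwrite each other's output in this sketch, so give them unique names or derive the output folder from the full path.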

Core pilot promise

FAIRy’s job is simple:

One run → a human-readable readiness sheet (PASS / WARN / FAIL + how to fix) plus a structured summary you can archive.

fairy validate /path/to/dataset --out out/

Outputs

  • readiness-report.html — what to fix, in plain English
  • validation-summary.json — machine-readable summary
  • attestation file — documented proof that validation was performed (see What's the attestation file?)

How to cite FAIRy

If you use FAIRy in your research, please cite:

APA Style

Slotnick, J. (2025). FAIRy Core (Version 0.1) [Computer software]. Datadabra.
https://github.com/yuummmer/fairy-core

BibTeX

@software{fairy2025,
  author = {Slotnick, Jennifer},
  title = {FAIRy Core},
  year = {2025},
  version = {0.1},
  publisher = {Datadabra},
  url = {https://github.com/yuummmer/fairy-core}
}

Get involved

Help shape FAIRy's development by sharing your feedback, reporting issues, or requesting new features.

hello@datadabra.com