FAIRy

Try the GEO preflight demo

Run FAIRy locally in 2 minutes. No account needed. No uploads.

GEO (NCBI Gene Expression Omnibus) is a public repository for functional genomics data (e.g., RNA-seq). This demo runs FAIRy's preflight checks on a sample bulk-seq submission package and reports whether it is ready to submit.

1. Run these commands

# Create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install FAIRy (engine) from GitHub
pip install -U pip
pip install "git+https://github.com/yuummmer/fairy-core.git@main"
fairy --version

# Clone GEO rulepacks (if rerunning, delete the folder first)
rm -rf fairy-rulepacks-geo
git clone https://github.com/yuummmer/fairy-rulepacks-geo.git
cd fairy-rulepacks-geo

# Run preflight
mkdir -p .tmp
fairy preflight \
  --rulepack rulepacks/geo_bulk_seq/v0_2_0.json \
  --samples  rulepacks/geo_bulk_seq/fixtures/samples_bad.tsv \
  --files    rulepacks/geo_bulk_seq/fixtures/files.tsv \
  --out      .tmp/geo_bulk_seq_report.json

FAIRy runs locally on your machine. After the install and clone steps, all processing happens offline and nothing is uploaded.

2. Open the report

# Open the report
less .tmp/geo_bulk_seq_report.md

The report is saved as Markdown (.md) and JSON. Want a clean run? Swap samples_bad.tsv for samples.tsv in the preflight command.

3. What you'll see

# FAIRy Preflight Report

- **Schema version:** 1.0.0
- **Rulepack:** geo_bulk_seq@0.2.0
- **FAIRy version:** 0.2.2
- **Generated at (UTC):** 2025-12-29T20:29:58.466142Z
- **Dataset ID:** sha256:052c2ab58c6ad35669b47881262e006a9e8b795a9af62f7aae9a9ce48d6c6faf
- **submission_ready:** `False`

## Summary

- FAIL findings: 1 ['GEO.BIO.CONTEXT_MISSING']
- WARN findings: 1 ['CORE.DATE.INVALID_ISO8601']

If `submission_ready` is `True`, FAIRy believes this dataset is ready to submit.

---

## Input provenance

These hashes and dimensions identify the exact files that FAIRy validated.
You can hand this block to a curator or PI as evidence of what was checked.

### samples.tsv

- path: 'rulepacks/geo_bulk_seq/fixtures/samples_bad.tsv'
- sha256: '96ae14a766369c0ab581bf7dc16af186fc732139adc23eb438d8de47ad49e798'
- rows: '2'
- cols: '8'

### files.tsv

- path: 'rulepacks/geo_bulk_seq/fixtures/files.tsv'
- sha256: '3305edf715ad6f1bf9ade6ee48cfc84e6599d25e42cf2a32d8741a32185ed348'
- rows: '4'
- cols: '3'

---

## Results (all current issues)

Level `fail` means "must fix before submission."
Level `warn` means "soft violation / likely curator feedback."
Level `pass` means the rule passed with no violations.

| Level | Rule | Count | Samples |
|-------|------|-------|--------|
| warn | CORE.DATE.INVALID_ISO8601 | 1 | row 1, col collection_date |
| pass | CORE.ID.UNMATCHED_SAMPLE | 0 | (none) |
| fail | GEO.BIO.CONTEXT_MISSING | 1 | row 1 |
| pass | GEO.FILE.PAIRING_MISMATCH | 0 | (none) |
| pass | GEO.REQ.MISSING_FIELD | 0 | (none) |
| pass | GEO.REQ.MISSING_PROCESSED_DATA | 0 | (none) |

### CORE.DATE.INVALID_ISO8601 (warn, 1 sample)

- row 1, column 'collection_date', message: Value '2025/01/15' in collection_date is not ISO8601 (YYYY-MM-DD)., hint: Use format YYYY-MM-DD, e.g. 2025-10-02.

### GEO.BIO.CONTEXT_MISSING (fail, 1 sample)

- row 1, message: Sample 'S1' does not provide tissue/cell_line/cell_type., hint: Fill at least one of: tissue, cell_line, or cell_type.

---

## Resolved since last run

_No baseline from prior run (first run or cache missing)._
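
The two findings in the sample report can be approximated with a quick local pre-check before rerunning FAIRy. The sketch below is not FAIRy's rule logic: `precheck_samples` is a hypothetical helper, and it assumes a tab-separated file with a header row and the column names mentioned in the report (`collection_date`, `tissue`, `cell_line`, `cell_type`). It checks only the shape of the date, not calendar validity, and it numbers data rows from 1, which may differ from FAIRy's row numbering.

```shell
# Hypothetical helper (not part of FAIRy): flag rows in a tab-separated
# samples file that look likely to trip the two rules in the sample report.
precheck_samples() {
  awk -F '\t' '
    NR == 1 {
      # Map header names to column positions.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      row = NR - 1  # data rows counted from 1 (assumption)

      # Date shape check: YYYY-MM-DD only; empty values are skipped.
      d = ("collection_date" in col) ? $(col["collection_date"]) : ""
      if (d != "" && d !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)
        print "row " row ": collection_date \"" d "\" is not YYYY-MM-DD"

      # Biological context check: at least one of the three must be filled.
      has = 0
      n = split("tissue cell_line cell_type", names, " ")
      for (k = 1; k <= n; k++)
        if ((names[k] in col) && $(col[names[k]]) != "") has = 1
      if (!has)
        print "row " row ": no tissue, cell_line, or cell_type"
    }
  ' "$1"
}
```

Run it as `precheck_samples rulepacks/geo_bulk_seq/fixtures/samples_bad.tsv`. It prints nothing when neither condition is found.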

FAIRy generates a Markdown report you can share with contributors, plus a machine-readable JSON attestation file.
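
Anyone who receives the report can recheck the input provenance hashes against their own copy of the files. This is a minimal sketch, assuming the reported `sha256` values are plain SHA-256 digests of the file bytes (the report does not say how they are computed). `file_sha256` and `verify_sha256` are hypothetical helper names, not FAIRy commands.

```shell
# Print the SHA-256 hex digest of a file, using whichever tool is installed
# (sha256sum on most Linux systems, shasum on macOS).
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeed only if the file's digest equals the value copied from the report.
verify_sha256() {
  [ "$(file_sha256 "$1")" = "$2" ]
}
```

For example, `verify_sha256 rulepacks/geo_bulk_seq/fixtures/files.tsv <sha256 from the report> && echo match` would confirm that a local file is the one FAIRy validated, if the assumption above holds.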

4. Watch the walkthrough

2-minute demo showing how FAIRy validates datasets and generates readiness reports.

Need help with your own datasets?

If you're working with your lab's data and need help setting up validation rules or creating custom rulepacks, we're here to help.

Contact hello@datadabra.com

For labs, cores, and institutions looking to implement FAIRy for their data submission workflows.