Repo as System of Record for Research
A research repo can store plans, sources, assumptions, data definitions, artifacts, and verification records in one reviewable history.
Site connection
RAIL makes the repository the contract for research plans, source definitions, ontology schemas, hydration pipelines, artifacts, and agent handoffs.
Visual model
Research state as connected artifacts
Plans, sources, assumptions, tasks, and outputs become connected nodes rather than scattered files.
Research fragments easily across notebooks, dashboards, chats, local CSVs, screenshots, API calls, and half-remembered assumptions. A repo-native system treats those pieces as versioned artifacts with explicit relationships.
Why Git Helps
Git records changes over time. For research, that means assumptions, source definitions, transformations, and results can be reviewed as a commit history rather than reconstructed from memory.
The repo also gives agents a durable workspace: they can resume from files, not from a fading chat transcript.
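As a minimal sketch of what resuming from files could look like, the snippet below writes a handoff record to JSON and reads it back. The `Handoff` class, its field names, and the file location are assumptions made for illustration; the source does not prescribe a handoff format.

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path


@dataclass
class Handoff:
    """Hypothetical handoff record; the field names are illustrative only."""
    task: str
    status: str
    open_questions: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def write_handoff(path: Path, record: Handoff) -> None:
    """Serialize the record as sorted, indented JSON so diffs stay readable."""
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(asdict(record), indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")


def read_handoff(path: Path) -> Handoff:
    """Load a record that a previous session committed to the repo."""
    return Handoff(**json.loads(path.read_text(encoding="utf-8")))
```

Because the record is plain JSON with stable key order, a later session (human or agent) can load it with `read_handoff` and reviewers can see in the diff exactly what changed between handoffs.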
What Belongs in the Repo
Not everything belongs in the repo as raw data. Large or sensitive datasets may stay external, but the repo should store their definitions, access notes, schemas, checksums, query descriptions, and reproducible scripts.
The goal is not to turn Git into a warehouse. The goal is to make the research contract inspectable.
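One way to keep an external dataset inspectable without committing it is to store a small manifest entry alongside a checksum. The sketch below assumes a SHA-256 digest and a hypothetical entry layout (`name`, `location`, `access_note`, `bytes`, `sha256`); none of these names come from the source.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_source_entry(data_path: Path, location: str, access_note: str) -> dict:
    """Describe an external dataset; only this description is committed."""
    return {
        "name": data_path.name,
        "location": location,        # where the real data lives (assumed field)
        "access_note": access_note,  # who can read it and how (assumed field)
        "bytes": data_path.stat().st_size,
        "sha256": sha256_of(data_path),
    }


def matches_entry(data_path: Path, entry: dict) -> bool:
    """Check that a local copy is the same file the manifest describes."""
    return sha256_of(data_path) == entry["sha256"]
```

A hydration script could call `matches_entry` before building analysis tables, so a silently changed upstream file shows up as a failed check instead of an unexplained shift in results.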
| Artifact | Repo role |
|---|---|
| Research plan | Defines question and scope |
| Source note | Explains data provenance and limits |
| Ontology schema | Defines concepts and relationships |
| Hydration pipeline | Recreates local analysis tables |
| Verification record | Documents checks and confidence |
Common Pitfalls
- Committing sensitive raw data by accident.
- Tracking outputs without tracking assumptions.
- Letting notebooks become undocumented black boxes.
- Using agents without durable handoff records.
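The first pitfall can be partly guarded against with a simple path check before committing. The sketch below is an assumption-laden illustration: the blocked suffixes and directory names are hypothetical and would need to match a project's own conventions, and a path check does not inspect file contents.

```python
from pathlib import PurePosixPath

# Hypothetical policy: adjust both sets to the project's own layout.
BLOCKED_SUFFIXES = {".csv", ".parquet", ".xlsx", ".sqlite"}
BLOCKED_DIRS = {"raw", "private"}


def flag_risky_paths(paths: list[str]) -> list[str]:
    """Return the paths that look like raw or sensitive data files."""
    flagged = []
    for p in paths:
        path = PurePosixPath(p)
        has_blocked_suffix = path.suffix.lower() in BLOCKED_SUFFIXES
        # Only parent directories are compared, not the file name itself.
        in_blocked_dir = bool(BLOCKED_DIRS & set(path.parts[:-1]))
        if has_blocked_suffix or in_blocked_dir:
            flagged.append(p)
    return flagged
```

The list of paths could come from `git diff --cached --name-only`, which prints the files currently staged for commit; a hook or CI step could then refuse to proceed when `flag_risky_paths` returns anything.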
Quick check
Quiz
What does 'repo as system of record' mean?
- The repository stores the reviewable research contract and artifacts
- Every dataset must be committed raw
- Git replaces statistical validation
- No documentation is needed
The repo preserves plans, sources, assumptions, artifacts, and verification context.