Research infrastructure · Advanced

Repo as System of Record for Research

A research repo can store plans, sources, assumptions, data definitions, artifacts, and verification records in one reviewable history.

RAIL · Research · Git · Auditability

Site connection

RAIL makes the repository the contract for research plans, source definitions, ontology schemas, hydration pipelines, artifacts, and agent handoffs.

Visual model

Research state as connected artifacts

Plans, sources, assumptions, tasks, and outputs become connected nodes rather than scattered files.

Agent systems are graphs of state, routing, and tool access

  1. User request (input)
  2. Orchestrator (state update)
  3. Search tool (state update)
  4. Study agent (state update)
  5. Answer (output)

Research fragments easily: notebooks, dashboards, chats, local CSVs, screenshots, API calls, and half-remembered assumptions. A repo-native system treats those pieces as versioned artifacts with explicit relationships.

  • Plan: What question is being answered?
  • Sources: Which data and documents are allowed?
  • Artifacts: What outputs were produced?
  • Verification: What checks make the output trustworthy?
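
One way to make those relationships explicit is a small manifest that records what each artifact depends on. The sketch below is illustrative only: the `Artifact` class, the `MANIFEST` entries, and the file paths are hypothetical names chosen for this example, not part of RAIL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """One versioned research artifact and the artifacts it depends on."""
    id: str
    kind: str                           # e.g. "plan", "source", "artifact", "verification"
    path: str                           # repo-relative path (hypothetical layout)
    depends_on: tuple[str, ...] = ()    # ids of upstream artifacts


def dangling_links(artifacts: list[Artifact]) -> list[tuple[str, str]]:
    """Return (artifact id, missing dependency id) pairs."""
    known = {a.id for a in artifacts}
    return [
        (a.id, dep)
        for a in artifacts
        for dep in a.depends_on
        if dep not in known
    ]


def upstream(artifacts: list[Artifact], start: str) -> set[str]:
    """Return every artifact id that `start` depends on, directly or indirectly."""
    by_id = {a.id: a for a in artifacts}
    seen: set[str] = set()
    stack = list(by_id[start].depends_on)
    while stack:
        current = stack.pop()
        if current in seen or current not in by_id:
            continue
        seen.add(current)
        stack.extend(by_id[current].depends_on)
    return seen


# Hypothetical manifest: plan -> source -> output -> verification.
MANIFEST = [
    Artifact("plan", "plan", "plans/question.md"),
    Artifact("survey", "source", "sources/survey.md", ("plan",)),
    Artifact("summary", "artifact", "artifacts/summary.csv", ("survey",)),
    Artifact("check", "verification", "verification/summary.md", ("summary",)),
]
```

With a manifest like this, `upstream(MANIFEST, "check")` lists everything a verification record rests on, and `dangling_links(MANIFEST)` flags references to artifacts that were never recorded.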

Why Git Helps

Git records changes over time. For research, that means assumptions, source definitions, transformations, and results can be reviewed as a history rather than remembered as vibes.

The repo also gives agents a durable workspace: they can resume from files, not from a fading chat transcript.
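
A durable handoff can be as simple as a numbered JSON file that the next agent (or person) reads before starting. This is a minimal sketch under assumptions: the `handoffs` directory name, the record fields, and both function names are hypothetical, not a RAIL convention.

```python
import json
from pathlib import Path
from typing import Optional

HANDOFF_DIR = "handoffs"  # hypothetical directory name inside the repo


def write_handoff(repo_root: Path, task: str, done: list[str], next_steps: list[str]) -> Path:
    """Write the next numbered handoff record and return its path."""
    folder = repo_root / HANDOFF_DIR
    folder.mkdir(parents=True, exist_ok=True)
    index = len(list(folder.glob("*.json"))) + 1
    path = folder / f"{index:04d}.json"
    record = {"task": task, "done": done, "next_steps": next_steps}
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def latest_handoff(repo_root: Path) -> Optional[dict]:
    """Return the most recent handoff record, or None if there is none yet."""
    folder = repo_root / HANDOFF_DIR
    if not folder.is_dir():
        return None
    files = sorted(folder.glob("*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text(encoding="utf-8"))
```

Because the records are ordinary files, they can be committed and reviewed like any other change, and a resumed session starts from `latest_handoff(...)` rather than from chat history.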

What Belongs in the Repo

Not everything belongs in the repo as raw data. Large or sensitive datasets may stay external, but the repo should still store their definitions, access notes, schemas, checksums, query descriptions, and reproducible scripts.

The goal is not to turn Git into a warehouse. The goal is to make the research contract inspectable.
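
For a dataset that stays external, the repo can still hold a small source note with a checksum, so a later reader can confirm they are looking at the same bytes. The following is a sketch, not a prescribed format: the field names and the `build_source_note` helper are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_source_note(name: str, location: str, local_copy: Path, limits: str) -> dict:
    """Describe an external dataset without committing its contents.

    `location` is wherever the data actually lives; `limits` is free text
    on provenance caveats. Both field names are hypothetical.
    """
    return {
        "name": name,
        "location": location,
        "sha256": sha256_of_file(local_copy),
        "size_bytes": local_copy.stat().st_size,
        "limits": limits,
    }


def save_source_note(note: dict, destination: Path) -> None:
    """Write the note as JSON so it can be committed and diffed."""
    destination.write_text(json.dumps(note, indent=2, sort_keys=True), encoding="utf-8")
```

Only the note is committed. The dataset itself stays wherever access controls already apply.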

Artifact              Repo role
Research plan         Defines question and scope
Source note           Explains data provenance and limits
Ontology schema       Defines concepts and relationships
Hydration pipeline    Recreates local analysis tables
Verification record   Documents checks and confidence
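
As an illustration of the last row, a verification record can be generated from explicit checks rather than written from memory. The checks here (non-empty table, no missing values in required columns) are assumed examples, and `verify_table` is a hypothetical helper, not a required set of validations.

```python
from dataclasses import asdict, dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def verify_table(rows: list[dict], required_columns: list[str]) -> dict:
    """Run simple checks on a table and return a record that can be committed."""
    checks = [CheckResult("non_empty", len(rows) > 0, f"{len(rows)} rows")]
    for column in required_columns:
        # Treat None and empty strings as missing values.
        missing = sum(1 for row in rows if row.get(column) in (None, ""))
        checks.append(
            CheckResult(f"no_missing:{column}", missing == 0, f"{missing} missing")
        )
    return {
        "passed": all(check.passed for check in checks),
        "checks": [asdict(check) for check in checks],
    }
```

The returned dictionary can be serialized next to the artifact it describes, so a reviewer sees which checks ran and which failed.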

Common Pitfalls

  • Committing sensitive raw data by accident.
  • Tracking outputs without tracking assumptions.
  • Letting notebooks become undocumented black boxes.
  • Using agents without durable handoff records.
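
The first pitfall can be partly guarded against with a check that runs before each commit. This is a rough sketch under stated assumptions: the suffix list, the size limit, and the `fixtures/` allowlist are hypothetical choices to tune per project, and the check does not inspect file contents.

```python
from pathlib import PurePosixPath

RAW_DATA_SUFFIXES = {".csv", ".parquet", ".xlsx", ".sqlite"}  # assumed list
MAX_BYTES = 5 * 1024 * 1024                                   # assumed 5 MiB limit
ALLOWED_PREFIXES = ("fixtures/",)                             # hypothetical allowlist


def flag_risky_paths(staged: list[tuple[str, int]]) -> list[tuple[str, str]]:
    """Return (path, reason) for staged files that look like raw data.

    `staged` holds (repo-relative path, size in bytes) pairs.
    """
    flagged = []
    for path, size in staged:
        if path.startswith(ALLOWED_PREFIXES):
            continue  # small, deliberately committed test fixtures
        if PurePosixPath(path).suffix.lower() in RAW_DATA_SUFFIXES:
            flagged.append((path, "raw data file type"))
        elif size > MAX_BYTES:
            flagged.append((path, "larger than size limit"))
    return flagged
```

The staged paths could come from `git diff --cached --name-only`, with sizes read from disk. A non-empty result would then abort the commit and prompt a source note in place of the file.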

Quick check

What does 'repo as system of record' mean?
  1. The repository stores the reviewable research contract and artifacts
  2. Every dataset must be committed raw
  3. Git replaces statistical validation
  4. No documentation is needed

Answer: 1. The repo preserves plans, sources, assumptions, artifacts, and verification context.

Sources and Further Reading