Research infrastructure · Advanced

Repo as System of Record for Research

A research repo can store plans, sources, assumptions, data definitions, artifacts, and verification records in one reviewable history.

RAIL · Research · Git · Auditability

Site connection

RAIL makes the repository the contract for research plans, source definitions, ontology schemas, hydration pipelines, artifacts, and agent handoffs.


Visual model

Research state as connected artifacts

Plans, sources, assumptions, tasks, and outputs become connected nodes rather than scattered files.

Interactive diagram: Agent systems are graphs of state, routing, and tool access

1. User request (input)
2. Orchestrator (state update)
3. Search tool (state update)
4. Study agent (state update)
5. Answer (output)

Research fragments easily: notebooks, dashboards, chats, local CSVs, screenshots, API calls, and half-remembered assumptions. A repo-native system treats those pieces as versioned artifacts with explicit relationships.

Plan: What question is being answered?

Sources: Which data and documents are allowed?

Artifacts: What outputs were produced?

Verification: What checks make the output trustworthy?
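One way to make those relationships explicit is a small registry that records what each artifact depends on. The sketch below is illustrative only: the `Artifact` class, the `kind` labels, and every path in `EXAMPLE` are hypothetical names, not part of RAIL or any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    """One versioned research artifact and the artifacts it depends on."""
    path: str   # hypothetical repo-relative path
    kind: str   # assumed labels: "plan", "source", "artifact", "verification"
    depends_on: list[str] = field(default_factory=list)


def missing_links(artifacts: list[Artifact]) -> list[tuple[str, str]]:
    """Return (artifact, dependency) pairs whose dependency is not registered."""
    known = {a.path for a in artifacts}
    return [
        (a.path, dep)
        for a in artifacts
        for dep in a.depends_on
        if dep not in known
    ]


def unverified_outputs(artifacts: list[Artifact]) -> list[str]:
    """Return output artifacts that no verification record points at."""
    verified = {
        dep
        for a in artifacts
        if a.kind == "verification"
        for dep in a.depends_on
    }
    return [
        a.path
        for a in artifacts
        if a.kind == "artifact" and a.path not in verified
    ]


# Illustrative registry; all paths are made up for this example.
EXAMPLE = [
    Artifact("plans/churn.md", "plan"),
    Artifact("sources/crm.md", "source", ["plans/churn.md"]),
    Artifact("outputs/churn_table.csv", "artifact", ["sources/crm.md"]),
    Artifact("outputs/forecast.csv", "artifact", ["sources/billing.md"]),
    Artifact("checks/churn_table.md", "verification", ["outputs/churn_table.csv"]),
]
```

Running `missing_links(EXAMPLE)` surfaces the forecast that cites an unregistered source, and `unverified_outputs(EXAMPLE)` surfaces the same forecast as an output with no verification record.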

Why Git Helps

Git records changes over time. For research, that means assumptions, source definitions, transformations, and results can be reviewed as a history rather than remembered as vibes.

The repo also gives agents a durable workspace: they can resume from files, not from a fading chat transcript.
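A durable workspace can be as simple as a numbered handoff file that each agent writes before it stops. The following is a minimal sketch under stated assumptions: the `handoffs/` directory, the zero-padded file names, and the record fields are hypothetical conventions chosen for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_handoff(repo: Path, step: int, summary: str, next_actions: list[str]) -> Path:
    """Write one handoff record as JSON under the assumed handoffs/ directory."""
    handoff_dir = repo / "handoffs"
    handoff_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded names keep lexicographic order equal to step order (up to 9999).
    path = handoff_dir / f"{step:04d}.json"
    record = {"step": step, "summary": summary, "next_actions": next_actions}
    # Sorted keys and a trailing newline keep Git diffs small and stable.
    path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path


def latest_handoff(repo: Path) -> dict | None:
    """Return the most recent handoff record, or None if none exists yet."""
    handoff_dir = repo / "handoffs"
    if not handoff_dir.is_dir():
        return None
    files = sorted(handoff_dir.glob("*.json"))
    if not files:
        return None
    return json.loads(files[-1].read_text(encoding="utf-8"))
```

An agent that starts a new session would call `latest_handoff(repo)` first and continue from the recorded `next_actions`; committing each file puts the handoff into the same reviewable history as the rest of the research.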

What Belongs in the Repo

Not everything belongs in the repo as raw data. Large or sensitive datasets may stay external, but the repo should store their definitions, access notes, schemas, checksums, query descriptions, and reproducible scripts.

The goal is not to turn Git into a warehouse. The goal is to make the research contract inspectable.
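For a dataset that stays outside the repo, a checksum entry is one inspectable piece of that contract. The sketch below uses Python's standard `hashlib`; the entry fields (`name`, `bytes`, `sha256`, `source_note`) are an assumed layout, not a fixed format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path: Path, source_note: str) -> dict:
    """Describe an external file without copying its contents into the repo."""
    return {
        "name": path.name,
        "bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "source_note": source_note,  # hypothetical pointer to a provenance note
    }


def verify(path: Path, entry: dict) -> bool:
    """Check that a local copy still matches the committed manifest entry."""
    return path.stat().st_size == entry["bytes"] and sha256_of(path) == entry["sha256"]
```

Only the manifest entry is committed. Anyone who later obtains the dataset can call `verify` to confirm they are analyzing the same bytes the original work used.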

Artifact	Repo role
Research plan	Defines question and scope
Source note	Explains data provenance and limits
Ontology schema	Defines concepts and relationships
Hydration pipeline	Recreates local analysis tables
Verification record	Documents checks and confidence

Common Pitfalls

Committing sensitive raw data by accident.
Tracking outputs without tracking assumptions.
Letting notebooks become undocumented black boxes.
Using agents without durable handoff records.
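The first pitfall can be partly guarded against with a check that runs before each commit. This is a rough sketch, not a complete safeguard: the suffix list, the size limit, and the `fixtures/` exemption are assumptions to adapt, and the function expects the caller to supply staged paths and sizes (for example from `git diff --cached --name-only`).

```python
from pathlib import PurePosixPath

# Assumed policy values; adjust them to the project's own rules.
RAW_DATA_SUFFIXES = {".csv", ".parquet", ".xlsx", ".sqlite"}
ALLOWED_PREFIXES = ("fixtures/",)   # hypothetical directory for small test data
MAX_BYTES = 5 * 1024 * 1024         # 5 MiB


def flag_risky(staged: list[tuple[str, int]]) -> list[str]:
    """Return one warning per staged (path, size) pair that looks like raw data."""
    warnings = []
    for path, size in staged:
        if path.startswith(ALLOWED_PREFIXES):
            continue  # small fixtures are exempt by assumption
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in RAW_DATA_SUFFIXES:
            warnings.append(f"{path}: raw data format {suffix}")
        elif size > MAX_BYTES:
            warnings.append(f"{path}: {size} bytes exceeds limit")
    return warnings
```

A hook script could abort the commit whenever `flag_risky` returns a non-empty list. A check like this catches accidents only; it does not detect sensitive content inside files that look harmless.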

Quick check

Quiz

What does 'repo as system of record' mean?

A. The repository stores the reviewable research contract and artifacts
B. Every dataset must be committed raw
C. Git replaces statistical validation
D. No documentation is needed

Answer: A. The repo preserves plans, sources, assumptions, artifacts, and verification context.
