File Structure

ContextKit stores all metadata in a context/ directory at the root of your project, next to the contextkit.config.yaml project config. Files are organized by concern (schema definition, governance, business rules, lineage, glossary, and ownership) so that each file has a single, clear purpose.

Directory layout

contextkit.config.yaml            # Project config and data sources (project root)
context/
  models/
    orders.osi.yaml               # OSI semantic model for "orders"
    customers.osi.yaml
  governance/
    orders.governance.yaml        # Ownership, trust, security, semantic roles
    customers.governance.yaml
    orders.rules.yaml             # Golden queries, business rules, guardrails
    customers.rules.yaml
  lineage/
    orders.lineage.yaml           # Upstream sources for "orders"
    customers.lineage.yaml
  glossary/
    revenue.term.yaml             # Business glossary term
    churn.term.yaml
    arr.term.yaml
  owners/
    data-engineering.owner.yaml   # Team ownership definition
    analytics.owner.yaml
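
The suffix convention in the layout above means a file's role can be read from its name alone. A minimal Python sketch of that idea follows. The function name `classify` and the role labels are illustrative assumptions, not part of ContextKit's API.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Suffix -> role, taken from the layout above. The role labels are
# illustrative; ContextKit itself may name these differently.
SUFFIX_ROLES = {
    ".osi.yaml": "model",
    ".governance.yaml": "governance",
    ".rules.yaml": "rules",
    ".lineage.yaml": "lineage",
    ".term.yaml": "glossary term",
    ".owner.yaml": "owner",
}


def classify(path: str) -> Optional[Tuple[str, str]]:
    """Return (name, role) for a context file, or None if unrecognised."""
    filename = PurePosixPath(path).name
    if filename == "contextkit.config.yaml":
        return ("contextkit", "config")
    for suffix, role in SUFFIX_ROLES.items():
        if filename.endswith(suffix):
            # Strip the suffix to recover the dataset, term, or team name.
            return (filename[: -len(suffix)], role)
    return None
```

For example, `classify("context/models/orders.osi.yaml")` returns `("orders", "model")`, and a file that matches none of the suffixes returns `None`.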

File types

`*.osi.yaml` — OSI semantic model

These files follow the OSI specification and define the schema of your semantic layer: datasets, fields (with expressions and types), relationships between datasets, and metrics. OSI files are the source of truth for what exists.

ContextKit never adds enrichment or governance metadata to OSI files. They may be generated, and later regenerated, by context introspect; everything else goes into companion files. This separation means you can safely regenerate or update OSI files from your warehouse without losing governance work.

`*.governance.yaml` — Governance metadata

Governance files live in the context/governance/ directory and add the metadata needed to trust and correctly use the data:

Ownership — which team or individual is responsible
Trust status — trusted, verified, unverified, deprecated
Security classification — field-level sensitivity (public, internal, confidential, restricted)
Grain — the columns that uniquely identify a row
Semantic roles — what each field represents (identifier, measure, attribute, date, etc.)
Sample values — representative values for each field
Refresh cadence — how often the data updates
Tags — categorical labels for discovery
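
To make the shape of this metadata concrete, here is a rough Python model of what one dataset's governance record might hold. The enumerated values for trust status and security classification come from the list above. Every class name, attribute name, and example column (`order_id`, `customer_email`) is a hypothetical stand-in, and the actual YAML keys ContextKit uses may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TrustStatus(Enum):
    TRUSTED = "trusted"
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    DEPRECATED = "deprecated"


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


@dataclass
class FieldGovernance:
    semantic_role: str            # e.g. "identifier", "measure", "attribute"
    sensitivity: Sensitivity      # field-level security classification
    sample_values: List[str] = field(default_factory=list)


@dataclass
class DatasetGovernance:
    owner: str                    # team or individual responsible
    trust: TrustStatus
    grain: List[str]              # columns that uniquely identify a row
    refresh: str                  # refresh cadence, e.g. "daily"
    tags: List[str] = field(default_factory=list)
    fields: Dict[str, FieldGovernance] = field(default_factory=dict)


# Hypothetical record for the "orders" dataset.
orders = DatasetGovernance(
    owner="data-engineering",
    trust=TrustStatus.VERIFIED,
    grain=["order_id"],
    refresh="daily",
    tags=["sales"],
    fields={
        "order_id": FieldGovernance("identifier", Sensitivity.INTERNAL),
        "customer_email": FieldGovernance("attribute", Sensitivity.RESTRICTED),
    },
)
```

Modelling the two vocabularies as enums means an unknown value such as `TrustStatus("gold")` raises a `ValueError` instead of passing through silently.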

`*.rules.yaml` — Business rules and queries

Rules files sit alongside governance files in context/governance/ and capture the knowledge AI agents need to use data correctly:

Golden queries — tested, approved queries that demonstrate correct usage
Business rules — prose or structured explanations of business logic
Guardrail filters — required filters that must always be applied (e.g., is_deleted = false)
Hierarchies — dimensional drill paths (e.g., country > region > city)
Aggregation rules — default aggregation for measures
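
A guardrail filter is a predicate that every query against the dataset must include. The sketch below shows one way a consuming tool might combine such filters with a caller's own conditions. The function `apply_guardrails` and the column names are hypothetical, and a real implementation would likely work on a parsed query, not on strings.

```python
from typing import List


def apply_guardrails(where_clauses: List[str], guardrails: List[str]) -> str:
    """Combine caller-supplied predicates with required guardrail filters.

    Guardrails always come first and cannot be dropped by the caller;
    exact duplicate predicates are emitted only once.
    """
    combined = list(guardrails)
    for clause in where_clauses:
        if clause not in combined:
            combined.append(clause)
    # Parenthesise each predicate so OR conditions inside one clause
    # cannot weaken the guardrails.
    return " AND ".join(f"({c})" for c in combined)


where = apply_guardrails(["region = 'EMEA'"], ["is_deleted = false"])
# where == "(is_deleted = false) AND (region = 'EMEA')"
```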

`*.lineage.yaml` — Lineage

Lineage files document where data comes from: the upstream sources, transformations, and dependencies that produced each dataset. This is critical for impact analysis and for AI agents to understand data provenance.
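
Impact analysis over lineage amounts to a graph traversal: given each dataset's upstream sources, find everything downstream of a source that changed. A minimal Python sketch, assuming the lineage has already been loaded into a plain dictionary; the dataset names and the function `downstream_of` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# dataset -> its direct upstream sources (hypothetical names)
UPSTREAM = {
    "orders": ["raw.orders", "raw.payments"],
    "customers": ["raw.customers"],
    "revenue_daily": ["orders", "customers"],
}


def downstream_of(source: str, upstream: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that directly or transitively depends on source."""
    # Invert the upstream map into source -> direct dependents.
    dependents: Dict[str, List[str]] = {}
    for dataset, sources in upstream.items():
        for s in sources:
            dependents.setdefault(s, []).append(dataset)

    # Breadth-first walk over the inverted graph.
    affected: Set[str] = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected
```

With the sample map, `downstream_of("raw.payments", UPSTREAM)` returns `{"orders", "revenue_daily"}`: a change to the payments source reaches `orders` directly and `revenue_daily` through it.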

`*.term.yaml` — Glossary terms

Each glossary term gets its own file in the glossary/ directory. A term file defines the business meaning of a concept (e.g., “ARR”, “churn”, “active user”), including its canonical definition, related terms, and which fields map to it.

Fields in governance files link to glossary terms, connecting technical column names to business language.

`*.owner.yaml` — Ownership

Owner files define teams or individuals who are responsible for data assets. They include contact information, escalation paths, and the list of datasets they own.

`contextkit.config.yaml` — Project configuration

The root configuration file for a ContextKit project. It defines:

Data sources — warehouse connections (Snowflake, BigQuery, Postgres, DuckDB, etc.)
Project settings — default schemas, naming conventions, output paths
Enrichment configuration — LLM provider settings for context enrich

Separation of concerns

The most important design principle in ContextKit’s file structure is that OSI files define schema and companion files add governance. This means:

OSI files are portable. They follow an open standard and can be consumed by any tool that supports OSI.
Governance is additive. You can regenerate OSI files from your warehouse without losing trust status, business rules, or semantic roles.
Each concern has one home. Ownership lives in governance files, not scattered across multiple locations. Business rules live in rules files, not embedded in schema definitions.

This separation makes it safe to run context introspect repeatedly as your warehouse evolves — new tables and columns are picked up automatically, while existing governance metadata is preserved.
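
Because schema and governance live in separate files joined only by dataset name, a newly introspected table shows up as an OSI file with no companion yet. The sketch below lists such datasets. It assumes the directory layout described on this page; the function `missing_governance` is a hypothetical helper, not a ContextKit command.

```python
from pathlib import Path
from typing import List


def missing_governance(context_dir: Path) -> List[str]:
    """List datasets that have an OSI model but no governance companion."""
    models = context_dir / "models"
    governance = context_dir / "governance"
    missing = []
    for model in sorted(models.glob("*.osi.yaml")):
        # "orders.osi.yaml" -> "orders"
        name = model.name[: -len(".osi.yaml")]
        if not (governance / f"{name}.governance.yaml").exists():
            missing.append(name)
    return missing
```

Running `missing_governance(Path("context"))` after an introspection pass would give a to-do list of datasets that still need ownership, trust status, and the other governance fields filled in.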
