
File Structure

ContextKit stores all metadata in a context/ directory at the root of your project. Files are organized by concern — schema definition, governance, business rules, lineage, glossary, and ownership — so that each file has a single, clear purpose.

contextkit.config.yaml                # Project config and data sources (project root)
context/
  models/
    orders.osi.yaml                   # OSI semantic model for "orders"
    customers.osi.yaml
  governance/
    orders.governance.yaml            # Ownership, trust, security, semantic roles
    customers.governance.yaml
    orders.rules.yaml                 # Golden queries, business rules, guardrails
    customers.rules.yaml
  lineage/
    orders.lineage.yaml               # Upstream sources for "orders"
    customers.lineage.yaml
  glossary/
    revenue.term.yaml                 # Business glossary term
    churn.term.yaml
    arr.term.yaml
  owners/
    data-engineering.owner.yaml       # Team ownership definition
    analytics.owner.yaml
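
Because each concern has its own file suffix, a script can tell what a file is for from its name alone. The following is a minimal Python sketch of that idea, not part of ContextKit: the suffixes come from the layout above, while the `classify` helper and the concern labels are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Suffix -> concern, taken from the file names in the layout above.
# The concern labels are illustrative, not ContextKit identifiers.
SUFFIX_TO_CONCERN = {
    ".osi.yaml": "schema",
    ".governance.yaml": "governance",
    ".rules.yaml": "rules",
    ".lineage.yaml": "lineage",
    ".term.yaml": "glossary",
    ".owner.yaml": "ownership",
}


def classify(path: str) -> Optional[Tuple[str, str]]:
    """Return (concern, name) for a recognized file name, or None."""
    filename = PurePosixPath(path).name
    for suffix, concern in SUFFIX_TO_CONCERN.items():
        if filename.endswith(suffix):
            # Strip the suffix to recover the dataset, term, or team name.
            return concern, filename[: -len(suffix)]
    return None
```

For example, `classify("context/models/orders.osi.yaml")` yields the pair `("schema", "orders")`, and the root `contextkit.config.yaml` is not classified because it is project configuration rather than a per-concern file.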

OSI files (*.osi.yaml, in context/models/) follow the OSI specification and define the schema of your semantic layer: datasets, fields (with expressions and types), relationships between datasets, and metrics. OSI files are the source of truth for what exists.

ContextKit never writes enrichment or governance metadata into OSI files. They may be generated by context introspect, but once they exist, all enrichment and governance metadata goes into companion files. This separation means you can safely regenerate or update OSI files from your warehouse without losing governance work.

Governance files live in the context/governance/ directory and add the metadata needed to trust and correctly use the data:

  • Ownership — which team or individual is responsible
  • Trust status — trusted, verified, unverified, deprecated
  • Security classification — field-level sensitivity (public, internal, confidential, restricted)
  • Grain — the columns that uniquely identify a row
  • Semantic roles — what each field represents (identifier, measure, attribute, date, etc.)
  • Sample values — representative values for each field
  • Refresh cadence — how often the data updates
  • Tags — categorical labels for discovery
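
The enumerated values in the list above lend themselves to a simple validation pass. Here is a minimal Python sketch, assuming a governance record has already been loaded into memory. The `GovernanceRecord` class and its attribute names are hypothetical and do not reflect ContextKit's actual YAML keys; only the allowed trust and security values are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Allowed values as listed in the text above.
TRUST_STATUSES = {"trusted", "verified", "unverified", "deprecated"}
SECURITY_LEVELS = {"public", "internal", "confidential", "restricted"}


@dataclass
class GovernanceRecord:
    # Hypothetical attribute names, for illustration only.
    owner: str
    trust: str
    grain: List[str]
    # Maps a field name to its security classification.
    security: Dict[str, str] = field(default_factory=dict)


def validate(record: GovernanceRecord) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if record.trust not in TRUST_STATUSES:
        problems.append(f"unknown trust status: {record.trust}")
    if not record.grain:
        problems.append("grain must name at least one column")
    for column, level in record.security.items():
        if level not in SECURITY_LEVELS:
            problems.append(f"{column}: unknown security classification: {level}")
    return problems
```

A record with trust `verified`, grain `["order_id"]`, and `email` classified as `confidential` passes with no problems reported.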

*.rules.yaml — Business rules and queries


Rules files capture the knowledge needed for AI agents to use data correctly:

  • Golden queries — tested, approved queries that demonstrate correct usage
  • Business rules — prose or structured explanations of business logic
  • Guardrail filters — required filters that must always be applied (e.g., is_deleted = false)
  • Hierarchies — dimensional drill paths (e.g., country > region > city)
  • Aggregation rules — default aggregation for measures
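
To illustrate what "must always be applied" means for a guardrail filter, here is a minimal Python sketch of a query builder that always includes the guardrails ahead of any caller-supplied filters. The `build_query` function is hypothetical and is not a ContextKit API; it only shows one way an agent or tool might consume guardrails once they have been read from a rules file.

```python
from typing import List


def build_query(table: str, columns: List[str],
                filters: List[str], guardrails: List[str]) -> str:
    """Build a SELECT in which every guardrail filter is always present."""
    # Guardrails come first; caller filters that duplicate one are dropped.
    conditions = list(guardrails) + [f for f in filters if f not in guardrails]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if conditions:
        # Parenthesize each condition so AND/OR precedence stays predictable.
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    return sql
```

Calling `build_query("orders", ["order_id", "total"], ["total > 100"], ["is_deleted = false"])` produces `SELECT order_id, total FROM orders WHERE (is_deleted = false) AND (total > 100)`. The caller cannot omit the guardrail, because it is supplied separately from the ad hoc filters.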

Lineage files document where data comes from: the upstream sources, transformations, and dependencies that produced each dataset. This is critical for impact analysis and for AI agents to understand data provenance.

Each glossary term gets its own file in the glossary/ directory. A term file defines the business meaning of a concept (e.g., “ARR”, “churn”, “active user”), including its canonical definition, related terms, and which fields map to it.

Fields in governance files link to glossary terms, connecting technical column names to business language.

Owner files define teams or individuals who are responsible for data assets. They include contact information, escalation paths, and the list of datasets they own.

contextkit.config.yaml — Project configuration


The root configuration file for a ContextKit project. It defines:

  • Data sources — warehouse connections (Snowflake, BigQuery, Postgres, DuckDB, etc.)
  • Project settings — default schemas, naming conventions, output paths
  • Enrichment configuration — LLM provider settings for context enrich

The most important design principle in ContextKit’s file structure is that OSI files define schema and companion files add governance. This means:

  1. OSI files are portable. They follow an open standard and can be consumed by any tool that supports OSI.
  2. Governance is additive. You can regenerate OSI files from your warehouse without losing trust status, business rules, or semantic roles.
  3. Each concern has one home. Ownership lives in governance files, not scattered across multiple locations. Business rules live in rules files, not embedded in schema definitions.

This separation makes it safe to run context introspect repeatedly as your warehouse evolves — new tables and columns are picked up automatically, while existing governance metadata is preserved.
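
The effect of this separation can be sketched in a few lines of Python. This is not how context introspect is implemented. The `regenerate_models` function is a hypothetical stand-in that rewrites only the files under context/models/, and the YAML bodies are placeholder strings rather than real OSI or governance content.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def regenerate_models(context_dir: Path, models: Dict[str, str]) -> None:
    """Overwrite only models/*.osi.yaml; companion files are never touched."""
    models_dir = context_dir / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    for name, body in models.items():
        (models_dir / f"{name}.osi.yaml").write_text(body)


def demo() -> Tuple[str, str]:
    """Regenerate a model twice and return (governance text, model text)."""
    with tempfile.TemporaryDirectory() as tmp:
        context_dir = Path(tmp) / "context"
        gov = context_dir / "governance" / "orders.governance.yaml"
        gov.parent.mkdir(parents=True)
        gov.write_text("trust: verified\n")  # placeholder governance content

        # First run, then a second run after the warehouse gained a column.
        regenerate_models(context_dir, {"orders": "fields: [order_id]\n"})
        regenerate_models(context_dir, {"orders": "fields: [order_id, total]\n"})

        model = (context_dir / "models" / "orders.osi.yaml").read_text()
        return gov.read_text(), model
```

After both runs, the model file reflects the newer schema while the governance file still holds its original content, because the two live in different files and only one of them is ever rewritten.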