File Structure
ContextKit stores all metadata in a context/ directory at the root of your project. Files are organized by concern — schema definition, governance, business rules, lineage, glossary, and ownership — so that each file has a single, clear purpose.
Directory layout

contextkit.config.yaml              # Project config and data sources (project root)
context/
  models/
    orders.osi.yaml                 # OSI semantic model for "orders"
    customers.osi.yaml
  governance/
    orders.governance.yaml          # Ownership, trust, security, semantic roles
    customers.governance.yaml
    orders.rules.yaml               # Golden queries, business rules, guardrails
    customers.rules.yaml
  lineage/
    orders.lineage.yaml             # Upstream sources for "orders"
    customers.lineage.yaml
  glossary/
    revenue.term.yaml               # Business glossary term
    churn.term.yaml
    arr.term.yaml
  owners/
    data-engineering.owner.yaml     # Team ownership definition
    analytics.owner.yaml

File types
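The suffix conventions in the layout above map each file to exactly one concern. As a minimal sketch of that mapping (the `classify` helper and the concern labels are illustrative assumptions, not part of ContextKit), a file name can be routed like this:

```python
from pathlib import PurePosixPath
from typing import Optional

# Suffix -> concern, taken from the directory layout in this page.
# The label strings are illustrative, not ContextKit terminology.
SUFFIX_TO_CONCERN = {
    ".osi.yaml": "semantic model",
    ".governance.yaml": "governance",
    ".rules.yaml": "rules",
    ".lineage.yaml": "lineage",
    ".term.yaml": "glossary term",
    ".owner.yaml": "owner",
}


def classify(path: str) -> Optional[str]:
    """Return the concern a file belongs to, or None if the name is unrecognized."""
    name = PurePosixPath(path).name
    if name == "contextkit.config.yaml":
        return "project config"
    for suffix, concern in SUFFIX_TO_CONCERN.items():
        if name.endswith(suffix):
            return concern
    return None
```

Because the concern is encoded in the suffix rather than the directory, the lookup works regardless of which folder a file sits in.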
*.osi.yaml — OSI semantic model
These files follow the OSI specification and define the schema of your semantic layer: datasets, fields (with expressions and types), relationships between datasets, and metrics. OSI files are the source of truth for what exists.
ContextKit never writes enrichment or governance metadata into OSI files. They may be generated by context introspect, but all enrichment and governance metadata goes into companion files. This separation means you can safely regenerate or update OSI files from your warehouse without losing governance work.
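To make the companion-file idea concrete, here is a small sketch that lists every file belonging to a dataset except its OSI model. It assumes only the `<dataset>.<kind>.yaml` naming shown in the directory layout. The `companion_files` helper is hypothetical and not a ContextKit API.

```python
from pathlib import Path


def companion_files(context_dir: Path, dataset: str) -> list[Path]:
    """List every file for `dataset` under context_dir except its OSI model.

    Searches recursively, so it does not depend on which subdirectory
    a given kind of companion file lives in.
    """
    matches = context_dir.rglob(f"{dataset}.*.yaml")
    return sorted(p for p in matches if not p.name.endswith(".osi.yaml"))
```

Calling `companion_files(Path("context"), "orders")` on the layout above would return the governance, rules, and lineage files for orders, which are the files that should survive a regeneration of orders.osi.yaml untouched.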
*.governance.yaml — Governance metadata
Governance files live in the context/governance/ directory and add the metadata needed to trust and correctly use the data:
- Ownership — which team or individual is responsible
- Trust status — trusted, verified, unverified, deprecated
- Security classification — field-level sensitivity (public, internal, confidential, restricted)
- Grain — the columns that uniquely identify a row
- Semantic roles — what each field represents (identifier, measure, attribute, date, etc.)
- Sample values — representative values for each field
- Refresh cadence — how often the data updates
- Tags — categorical labels for discovery
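The trust statuses and security classifications listed above are closed sets, so they lend themselves to a simple validation pass. The sketch below assumes a governance document already parsed into a dictionary. The key names `trust`, `fields`, and `security` are hypothetical placeholders, since the actual governance file schema is not shown here.

```python
# Allowed values as listed in this section.
TRUST_STATUSES = {"trusted", "verified", "unverified", "deprecated"}
SECURITY_LEVELS = {"public", "internal", "confidential", "restricted"}


def validate_governance(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged.

    The keys "trust", "fields", and "security" are assumed for illustration.
    """
    problems = []
    trust = doc.get("trust")
    if trust not in TRUST_STATUSES:
        problems.append(f"trust: {trust!r} is not one of {sorted(TRUST_STATUSES)}")
    for name, field in doc.get("fields", {}).items():
        level = field.get("security")
        if level not in SECURITY_LEVELS:
            problems.append(f"fields.{name}.security: {level!r} is not a known level")
    return problems
```

A check like this catches typos such as "verifed" before an AI agent or a reviewer relies on the trust status.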
*.rules.yaml — Business rules and queries
Rules files capture the knowledge needed for AI agents to use data correctly:
- Golden queries — tested, approved queries that demonstrate correct usage
- Business rules — prose or structured explanations of business logic
- Guardrail filters — required filters that must always be applied (e.g., is_deleted = false)
- Hierarchies — dimensional drill paths (e.g., country > region > city)
- Aggregation rules — default aggregation for measures
*.lineage.yaml — Lineage
Lineage files document where data comes from: the upstream sources, transformations, and dependencies that produced each dataset. This is critical for impact analysis and for AI agents to understand data provenance.
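Impact analysis over lineage amounts to a graph traversal. The sketch below assumes the lineage files have already been loaded into a plain mapping from each dataset to its upstream sources. That mapping shape and the `downstream_of` helper are assumptions for illustration, not ContextKit's data model.

```python
from collections import deque


def downstream_of(source: str, upstream: dict[str, list[str]]) -> set[str]:
    """Return every dataset that depends on `source`, directly or transitively."""
    # Invert "dataset -> its upstream sources" into "source -> its consumers".
    consumers: dict[str, list[str]] = {}
    for dataset, sources in upstream.items():
        for s in sources:
            consumers.setdefault(s, []).append(dataset)

    # Breadth-first walk over consumers, visiting each dataset once.
    affected: set[str] = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected
```

Given a hypothetical table raw.orders feeding orders, which in turn feeds a revenue_daily dataset, `downstream_of("raw.orders", ...)` reports both orders and revenue_daily as affected by a change to the raw table.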
*.term.yaml — Glossary terms
Each glossary term gets its own file in the glossary/ directory. A term file defines the business meaning of a concept (e.g., “ARR”, “churn”, “active user”), including its canonical definition, related terms, and which fields map to it.
Fields in governance files link to glossary terms, connecting technical column names to business language.
*.owner.yaml — Ownership
Owner files define teams or individuals who are responsible for data assets. They include contact information, escalation paths, and the list of datasets they own.
contextkit.config.yaml — Project configuration
The root configuration file for a ContextKit project. It defines:
- Data sources — warehouse connections (Snowflake, BigQuery, Postgres, DuckDB, etc.)
- Project settings — default schemas, naming conventions, output paths
- Enrichment configuration — LLM provider settings for context enrich
Separation of concerns
The most important design principle in ContextKit’s file structure is that OSI files define schema and companion files add governance. This means:
- OSI files are portable. They follow an open standard and can be consumed by any tool that supports OSI.
- Governance is additive. You can regenerate OSI files from your warehouse without losing trust status, business rules, or semantic roles.
- Each concern has one home. Ownership lives in governance files, not scattered across multiple locations. Business rules live in rules files, not embedded in schema definitions.
This separation makes it safe to run context introspect repeatedly as your warehouse evolves — new tables and columns are picked up automatically, while existing governance metadata is preserved.
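One practical consequence of regenerating schema separately from governance is that the two can drift: new columns arrive with no governance entry, and removed columns leave entries behind. The sketch below compares two lists of field names to surface both cases. The `governance_drift` helper and the example field names are hypothetical, and ContextKit may offer its own check that is not described here.

```python
from typing import Iterable


def governance_drift(osi_fields: Iterable[str], governed_fields: Iterable[str]) -> dict[str, list[str]]:
    """Compare field names in a regenerated model against its governance entries."""
    osi = set(osi_fields)
    governed = set(governed_fields)
    return {
        # In the schema but with no governance metadata yet.
        "ungoverned": sorted(osi - governed),
        # Governance metadata whose field no longer exists in the schema.
        "orphaned": sorted(governed - osi),
    }
```

Running this after each regeneration gives a short worklist: fields that still need a semantic role or security classification, and stale entries that can be reviewed for removal.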