OSI Specification
The Open Semantic Interchange (OSI) specification is an open standard for describing semantic models in a vendor-neutral format. OSI v1.0 is backed by Snowflake, dbt Labs, Salesforce, and other industry participants. ContextKit reads and validates OSI files natively.
For the full specification, see the OSI GitHub repository.
Why OSI matters
Before OSI, every semantic layer tool used its own proprietary format. Moving metadata between tools meant writing custom adapters. OSI provides a shared language: if your metadata is in OSI format, any compliant tool can read it.
ContextKit uses OSI as the foundation layer. All schema information — datasets, fields, relationships, and metrics — lives in *.osi.yaml files. Governance, business rules, and enrichment live in separate companion files, so the OSI files remain portable and standard-compliant.
Core schema
An OSI file contains a semantic_model with four main sections:

    semantic_model:
      name: orders
      description: Order transactions from the e-commerce platform

      datasets:
        - name: orders
          description: One row per order
          schema: analytics
          table: fct_orders
          fields:
            - name: order_id
              expression: order_id
              description: Unique order identifier
            - name: revenue
              expression: order_total
              description: Total order amount in USD
            - name: order_date
              expression: created_at
              description: Date the order was placed

      relationships:
        - name: orders_to_customers
          from:
            dataset: orders
            columns:
              - customer_id
          to:
            dataset: customers
            columns:
              - customer_id

      metrics:
        - name: total_revenue
          expression: SUM(revenue)
          description: Sum of all order revenue
          ai_context: Revenue is recognized at shipment, not at order placement.

Datasets and fields
A dataset maps to a table or view in your warehouse. Each dataset declares its schema, table name, and a list of fields.
A field represents a column or derived expression. Fields are defined with:
| Property | Description |
|---|---|
| name | The semantic name used in queries and references. |
| expression | The SQL expression that produces this field. Can include dialect-specific syntax. |
| description | Human-readable description of what the field represents. |
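
To make the name/expression split concrete, here is a minimal sketch of how a consuming tool might turn a dataset into a SELECT statement that exposes each field under its semantic name. The dict mirrors the core schema example as it might look after YAML parsing. The helper name `select_statement` and the rendering strategy are assumptions for illustration, not part of OSI or ContextKit.

```python
# A parsed OSI model as a plain dict (in practice this would come from a YAML loader).
MODEL = {
    "semantic_model": {
        "name": "orders",
        "datasets": [
            {
                "name": "orders",
                "schema": "analytics",
                "table": "fct_orders",
                "fields": [
                    {"name": "order_id", "expression": "order_id"},
                    {"name": "revenue", "expression": "order_total"},
                    {"name": "order_date", "expression": "created_at"},
                ],
            }
        ],
    }
}


def select_statement(dataset: dict) -> str:
    """Render a SELECT that aliases each field expression to its semantic name.

    Hypothetical helper: OSI only describes the metadata, not how to query it.
    """
    columns = ", ".join(
        f'{field["expression"]} AS {field["name"]}' for field in dataset["fields"]
    )
    return f'SELECT {columns} FROM {dataset["schema"]}.{dataset["table"]}'


orders = MODEL["semantic_model"]["datasets"][0]
print(select_statement(orders))
```

The point of the sketch is that queries refer to `revenue`, while the warehouse column `order_total` stays an implementation detail of the dataset.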
Dialect support in expressions
Field expressions can reference raw column names for simple mappings, or use SQL expressions for derived fields:

    fields:
      - name: full_name
        expression: "CONCAT(first_name, ' ', last_name)"
        description: Customer full name

      - name: order_year
        expression: "EXTRACT(YEAR FROM order_date)"
        description: Year the order was placed

When expressions use dialect-specific SQL, the consuming tool is responsible for translating them to the target warehouse’s syntax.
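
A consuming tool may want to separate plain column references from derived expressions first, since only the latter can contain dialect-specific SQL. The sketch below does that with a regular expression. It assumes that a "simple mapping" is an expression made of exactly one unquoted identifier; the function name is hypothetical and the rule is deliberately conservative.

```python
import re

# Assumption: a simple mapping is a single unquoted identifier such as "order_total".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_simple_mapping(expression: str) -> bool:
    """Return True when the expression is just a column name, with no SQL around it."""
    return _IDENTIFIER.fullmatch(expression.strip()) is not None


for expr in ("order_total", "EXTRACT(YEAR FROM order_date)"):
    kind = "column reference" if is_simple_mapping(expr) else "derived expression"
    print(f"{expr}: {kind}")
```

Anything classified as a derived expression would then go through whatever translation step the tool provides for the target warehouse.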
Dimension and label
Fields can be annotated with type hints that indicate whether they are dimensions (used for grouping and filtering) or labels (human-readable display values):

    fields:
      - name: status_code
        expression: status
        description: Order status code
        dimension: true

      - name: status_label
        expression: status_name
        description: Human-readable order status
        label: true

Relationships
Relationships define how datasets join together. OSI uses a directional model:

- from: the many-side of the relationship (the table with the foreign key)
- to: the one-side of the relationship (the table being referenced)
- columns (listed under both from and to): the columns that form the join condition

    relationships:
      - name: order_items_to_orders
        from:
          dataset: order_items
          columns:
            - order_id
        to:
          dataset: orders
          columns:
            - order_id

      - name: order_items_to_products
        from:
          dataset: order_items
          columns:
            - product_id
        to:
          dataset: products
          columns:
            - product_id

Relationships are critical for AI agents. Without them, an agent has to guess how tables join, which is a common source of hallucinated queries. With explicit relationships, the agent knows exactly which columns to use and which side has many rows.
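
As an illustration of why explicit relationships remove the guesswork, the following sketch derives a JOIN clause directly from a relationship entry. It assumes the columns under from and to are paired by position and that dataset names can be used as table aliases. Both are assumptions made for this example, and `join_clause` is a hypothetical helper rather than an OSI or ContextKit API.

```python
def join_clause(relationship: dict) -> str:
    """Build a JOIN clause from a relationship with from/to dataset and columns.

    Assumption: columns are paired positionally (first with first, and so on).
    """
    source, target = relationship["from"], relationship["to"]
    if len(source["columns"]) != len(target["columns"]):
        raise ValueError(f'{relationship["name"]}: column lists differ in length')
    conditions = " AND ".join(
        f'{source["dataset"]}.{left} = {target["dataset"]}.{right}'
        for left, right in zip(source["columns"], target["columns"])
    )
    return f'JOIN {target["dataset"]} ON {conditions}'


relationship = {
    "name": "order_items_to_orders",
    "from": {"dataset": "order_items", "columns": ["order_id"]},
    "to": {"dataset": "orders", "columns": ["order_id"]},
}
print(join_clause(relationship))
```

Because the from side is the many-side, an agent reading this relationship also knows that joining order_items to orders will not multiply order_items rows.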
Metrics
Metrics are named, reusable calculations defined at the semantic model level. They reference fields from datasets and declare the aggregation logic centrally.
| Property | Description |
|---|---|
| name | The metric’s identifier. |
| expression | The SQL aggregation expression (e.g., SUM(revenue), COUNT(DISTINCT customer_id)). |
| description | Human-readable explanation of what the metric measures. |
| ai_context | Instructions for AI agents on how to interpret and use this metric. |
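
Since a metric expression refers to fields by their semantic names, a consumer has to resolve those names to the underlying field expressions before running SQL. The sketch below shows one naive way to do that, using the revenue field from the core schema example. The function `expand_metric` is hypothetical, and the whole-word substitution is an assumption: it does not parse SQL and would also rewrite matching words inside string literals.

```python
import re


def expand_metric(expression: str, fields: dict) -> str:
    """Replace each whole-word field name with that field's SQL expression.

    Naive sketch: words that are not field names (SUM, DISTINCT, ...) are kept as-is.
    """
    def substitute(match: re.Match) -> str:
        word = match.group(0)
        return fields.get(word, word)

    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, expression)


# Field name -> field expression, taken from the orders dataset example.
FIELDS = {"revenue": "order_total", "order_date": "created_at"}

print(expand_metric("SUM(revenue)", FIELDS))
```

Keeping the aggregation in the metric and the column mapping in the field means a renamed warehouse column only has to be updated in one place.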

    metrics:
      - name: average_order_value
        expression: AVG(revenue)
        description: Average revenue per order
        ai_context:
          instructions: >
            Use this metric for order-level analysis only.
            For customer-level averages, use customer_lifetime_aov instead.
          synonyms:
            - AOV
            - avg order size
          examples:
            - "What is the AOV this quarter?"
            - "Show me average order value by region"

The ai_context property
The ai_context property can appear on metrics, fields, and datasets. It accepts two formats:
String format
A simple string with instructions for the AI:

    ai_context: Revenue is recognized at shipment. Do not confuse with gross_merchandise_value which includes cancelled orders.

Object format
A structured object with specific guidance:

    ai_context:
      instructions: >
        This field represents net revenue after returns and discounts.
        Always pair with order_date for time-series analysis.
      synonyms:
        - net sales
        - revenue
        - sales amount
      examples:
        - "What was total revenue last month?"
        - "Show revenue by product category"

The object format is preferred for Gold-tier metadata because it gives AI agents structured, unambiguous guidance. The synonyms list helps agents match user questions to the correct field even when the user uses informal language. The examples list provides concrete query patterns the agent can follow.
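
A tool that reads ai_context has to handle both formats. One possible approach, sketched below, is to normalize the string format into the object shape and then use the synonyms list for lookups. The function names and the case-insensitive exact matching are assumptions for illustration; a real agent would likely use fuzzier matching.

```python
def normalize_ai_context(value) -> dict:
    """Return ai_context in object form, whether it was a string, an object, or absent."""
    if value is None:
        return {"instructions": "", "synonyms": [], "examples": []}
    if isinstance(value, str):
        # String format: the whole value is the instruction text.
        return {"instructions": value.strip(), "synonyms": [], "examples": []}
    return {
        "instructions": str(value.get("instructions", "")).strip(),
        "synonyms": list(value.get("synonyms", [])),
        "examples": list(value.get("examples", [])),
    }


def find_by_synonym(term: str, items: list) -> list:
    """Return names of items whose name or synonyms equal the term, ignoring case."""
    wanted = term.strip().lower()
    matches = []
    for item in items:
        context = normalize_ai_context(item.get("ai_context"))
        candidates = [item["name"]] + context["synonyms"]
        if wanted in (candidate.lower() for candidate in candidates):
            matches.append(item["name"])
    return matches


METRICS = [
    {
        "name": "average_order_value",
        "ai_context": {"synonyms": ["AOV", "avg order size"]},
    },
    {
        "name": "total_revenue",
        "ai_context": "Revenue is recognized at shipment, not at order placement.",
    },
]

print(find_by_synonym("aov", METRICS))
```

Normalizing up front keeps the rest of the code free of string-versus-object checks, and it shows why the string format gives an agent less to work with: it carries no synonyms or examples.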
ContextKit and OSI
ContextKit treats OSI files as read-only schema definitions. When you run context introspect, ContextKit generates OSI files from your warehouse. When you run context enrich, ContextKit writes governance and rules to companion files and never to the OSI file itself.
This means your OSI files remain compliant with the specification and portable across tools. Any enrichment ContextKit adds is stored separately and can be stripped away without affecting the core schema definition.