---
title: "Google Cloud's Open Knowledge Format Is a Standard, Not a Product: A Deep Dive Into OKF v0.1"
description: "On June 12, 2026, Google Cloud published the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format: a directory of markdown files with YAML frontmatter, one required field (type), five recommended ones, and zero required tooling. The tweet from Google Cloud Tech on June 16 drove 117,000 views in 24 hours and made the spec the most-discussed knowledge-format launch of the year. This long read walks through the v0.1 spec section by section, the design choices that make it deliberately minimal, what Google is shipping alongside it (an enrichment agent for BigQuery, a static HTML visualizer, three sample bundles, and a native BigQuery Knowledge Catalog integration), and the open question every AI agent builder and data platform team should be tracking over the next six months."
date: 2026-06-17
image: "/images/heroes/2026-06-17--google-cloud-okf-open-knowledge-format-deep-dive.png"
author: lschvn
tags: ["ai", "tooling", "ecosystem"]
tldr:
  - "OKF v0.1 was published on June 12, 2026 in the GoogleCloudPlatform/knowledge-catalog repo (Apache-2.0, currently 3.3k stars) and promoted via a Google Cloud Tech tweet on June 16. The spec is a directory of markdown files with YAML frontmatter, one required field (type), five recommended ones (title, description, resource, tags, timestamp), and two reserved filenames (index.md, log.md). There is no schema registry, no central authority, and no required SDK."
  - "The 'LLM-wiki pattern' OKF formalizes is the same pattern that has been reappearing under different names for a year: Karpathy's LLM Wiki gist, Obsidian vaults wired to coding agents, the AGENTS.md / CLAUDE.md family of convention files, repos full of index.md and log.md artifacts that agents consult, and 'metadata as code' repositories inside data teams. OKF pins down the small set of conventions needed to make these instances interoperable without dictating tooling."
  - "Google shipped four reference artifacts in the same release: an enrichment agent built on the Google Agent Development Kit with Gemini as the backend, a static HTML visualizer that renders any OKF bundle as a self-contained interactive force-directed graph, three sample bundles (GA4 e-commerce, Stack Overflow, Bitcoin public datasets), and a BigQuery Knowledge Catalog integration that ingests OKF natively. The format is the contribution; the tools are proofs of concept."
  - "What changes for builders: the BigQuery Knowledge Catalog can now serve a corpus of OKF markdown to agents, which means the spec is not just a format for wikis but also a data-catalog export format, and the export target is where much of the most important enterprise knowledge about SQL tables, metrics, and runbooks actually lives. The interesting strategic question is whether OKF becomes the lingua franca or whether a competing spec emerges in the next six months."
faq:
  - question: "What is the Open Knowledge Format (OKF)?"
    answer: "OKF v0.1 is an open specification published by Google Cloud on June 12, 2026 under Apache-2.0. It defines a directory of markdown files with YAML frontmatter as a portable, interoperable format for representing the metadata, context, and curated knowledge that AI agents need. The format is intentionally minimal: one required frontmatter field (type), five recommended ones (title, description, resource, tags, timestamp), two reserved filenames (index.md, log.md), standard markdown cross-links, and a permissive consumption model where unknown fields and broken links are tolerated by design. The reference implementation lives at github.com/GoogleCloudPlatform/knowledge-catalog under the okf/ directory."
  - question: "Is OKF a Google product, and will adopting it lock us in to BigQuery?"
    answer: "No on both counts. OKF is published as an open specification with the spec text under Apache-2.0, and Google has explicitly framed it as 'vendor-neutral, agent- and human-friendly.' The spec text does not depend on BigQuery, Gemini, or any Google product; BigQuery appears only in passing, as example type values such as BigQuery Table. The BigQuery enrichment agent is a reference producer and the BigQuery Knowledge Catalog integration is a Google-specific consumer; neither is part of the format. You can produce OKF in Notion, edit it in Obsidian, render it with Hugo or MkDocs, ship it as a tarball, mount it on a filesystem, or write a producer in any language. The lock-in question is the one the spec is designed to dissolve."
  - question: "What is the LLM-wiki pattern that OKF formalizes?"
    answer: "The LLM-wiki pattern is the practice of storing curated knowledge as markdown files with YAML frontmatter, organized in a directory tree, and letting agents (rather than search engines) traverse the structure. The pattern has been reappearing under different names for at least a year: Andrej Karpathy's LLM Wiki gist (which argues that LLMs are better at the bookkeeping that causes humans to abandon personal wikis), Obsidian vaults wired to Claude Code or Cursor, the AGENTS.md and CLAUDE.md family of convention files, repos full of index.md and log.md artifacts, and 'metadata as code' repositories inside data teams. Each instance is bespoke; OKF pins down the small set of conventions that lets these instances cooperate."
  - question: "How is OKF different from a RAG pipeline, a vector database, or a metadata catalog?"
    answer: "OKF is a file format, not a retrieval system. There is no embedding model, no chunking strategy, no query interface, and no storage layer specified. A bundle of OKF markdown files can be ingested by a RAG pipeline (each file is a natural chunk), indexed in a vector database (the frontmatter and body are both embeddable text), exported from a metadata catalog (the BigQuery enrichment agent is the example), or read directly by an agent into its context window. The format is the contract; the tooling at each end is independently swappable. This is the part the spec calls 'producer/consumer independence' and it is the design choice that distinguishes OKF from a RAG framework."
  - question: "What do I actually need to do to get started with OKF?"
    answer: "Three things. First, read the spec: roughly 1,000 lines, short enough to read in one sitting. Second, create a directory with a few markdown files, give each a YAML frontmatter block with at least a type field, and start writing the concepts you want an agent to be able to find. Third, run the reference visualizer (python -m enrichment_agent visualize --bundle ./my_bundle) to see your bundle rendered as an interactive graph. The reference enrichment agent works against BigQuery; for other sources, the Source interface is the documented extension point, and a non-BigQuery producer can be written in any language in a few hundred lines."
  - question: "How does OKF relate to Karpathy's LLM Wiki, Obsidian, AGENTS.md, CLAUDE.md, and Hugo?"
    answer: "OKF is downstream of all five and is intentionally compatible with the shape of the artifacts they produce. Karpathy's LLM Wiki is the conceptual source: 'LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass.' Obsidian vaults already use markdown + frontmatter + cross-links, so an Obsidian vault can be turned into an OKF bundle with a small mapping layer. AGENTS.md and CLAUDE.md are the convention-file pattern that agents consult before doing real work; OKF generalizes this from one file to a directory. Hugo, MkDocs, and Jekyll already render markdown + frontmatter, so an OKF bundle is browsable as a static site out of the box. The spec's 'Relationship to other formats' section (SPEC.md §10) is explicit about this lineage."
  - question: "What does Google get out of publishing OKF as an open spec?"
    answer: "Three things, in order of strategic weight. First, distribution. If OKF becomes the lingua franca for representing enterprise knowledge, the BigQuery Knowledge Catalog becomes the natural ingestion target for any team that already runs BigQuery, which is the part of the cloud data market Google has been most focused on. Second, the BigQuery enrichment agent becomes a default producer that any BigQuery customer can run with one CLI command, which makes BigQuery the easiest place to author an OKF bundle. Third, an open spec is a hedge against any one of the dozen competing knowledge-format proposals becoming the de facto standard; by publishing the spec under Apache-2.0 and welcoming alternative implementations, Google lowers the temperature of a format war it would prefer not to fight."
  - question: "What is the license of OKF and what does that allow?"
    answer: "OKF is published under the Apache License 2.0, the same license used by the broader knowledge-catalog repository. The license is permissive: anyone can use, modify, and distribute the spec and any derivative implementation, including for commercial purposes, with attribution and a notice of changes. There is no patent license beyond the explicit grant built into Apache-2.0 itself, no copyleft, and no field-of-use restrictions. The choice of Apache-2.0 over MIT is itself a signal: Apache-2.0 includes the explicit patent grant that MIT lacks, which is the reason most enterprise-friendly open-source projects (Kubernetes, TensorFlow, Swift) use Apache-2.0 rather than MIT."
---

On the evening of June 16, 2026, the Google Cloud Tech account posted a single tweet that drove 117,000 views, 1,800 likes, and 1,800 bookmarks in 24 hours, the highest-engagement knowledge-format announcement of the year by an order of magnitude. The tweet introduced the [Open Knowledge Format (OKF)](https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf), an open specification that "formalizes the LLM-wiki pattern into a portable, interoperable format." The post linked to the [spec on GitHub](https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf) and to a [blog post on cloud.google.com](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing) by Sam McVeety (Tech Lead, Data Analytics) and Amir Hormati (Tech Lead, BigQuery). The blog post was published four days before the tweet, on June 12, the same day the reference implementation was first pushed to the GoogleCloudPlatform/knowledge-catalog repository.

> [@GoogleCloudTech](https://x.com/GoogleCloudTech/status/2067012903337664886) - 22:34 · June 16, 2026
>
> Introducing the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format. AI is only as smart as the context we give it. As we build more advanced, agentic AI systems, they need accurate metadata and context to […] organizations, that context is locked inside fragmented data catalogs, isolated wikis, scattered code comments, or the minds of senior engineers. Every time a new AI agent is built, teams are forced to solve the exact same context-assembly problem from scratch.
>
> To solve this, we've announced OKF, a vendor-neutral, open specification that formalizes the "LLM-wiki pattern" into a portable, interoperable format. It provides a standardized way to represent the enterprise knowledge that modern AI systems rely on.
>
> — Just markdown: readable in any editor, renderable on GitHub, indexable by any search tool
> — Just files: shippable as a tarball, hostable in any git repo, mountable on any filesystem
> — Just YAML frontmatter: for the small set of structured fields that need to be queryable: type, title, description, resource, tags, and timestamp
>
> We've also shipped reference implementations to help you hit the ground running, including an enrichment agent for BigQuery, a static HTML visualizer, and live sample bundles on GitHub.

The substance of the announcement is best understood as four distinct moves: the spec itself, the reference implementations, the BigQuery integration, and the positioning move. Each is doing a different job. The spec is a 1,000-line document that defines a directory of markdown files with YAML frontmatter, one required field, and two reserved filenames. The reference implementations are three artifacts: a BigQuery enrichment agent, a static HTML visualizer, and three sample bundles. The BigQuery integration is a native ingest path in the [Knowledge Catalog](https://cloud.google.com/bigquery) that turns an OKF bundle into a queryable, agent-servable surface. The positioning move is the claim that the LLM-wiki pattern is a category, not a one-off, and that the format is the contribution, not the tooling.

## The framing correction, in one paragraph

The point worth making up front is that OKF is a format, not a product, and that the most common misreading of the announcement is to treat it as a Google Cloud product launch. The spec text does not depend on BigQuery, Gemini, or any Google product; BigQuery appears only in passing, as example `type` values. The reference implementations are deliberately called "proofs of concept" in the README, and the spec's "Relationship to other formats" section (SPEC.md §10) is explicit that OKF is downstream of patterns the community has been building for at least a year: [Karpathy's LLM Wiki gist](https://gist.github.com/karpathy), Obsidian vaults wired to coding agents, the [AGENTS.md and CLAUDE.md family](/articles/2026-03-23-claude-code-rise-ai-coding-tool-2026) of convention files, and the "metadata as code" repositories inside data teams. What Google has done, in the cleanest reading, is pin down the small set of conventions that lets these instances cooperate, publish the result under Apache-2.0, and ship one reference implementation per end of the producer/consumer axis. The lock-in question is the one the spec is designed to dissolve.

The corollary is the part that matters for builders. If OKF becomes the lingua franca, the [BigQuery Knowledge Catalog](https://cloud.google.com/bigquery/docs/knowledge-catalog) becomes the natural ingestion target for any team that already runs BigQuery, the part of the cloud data market Google has been most focused on. That is the strategic move, and the open-spec framing lets Google make it without inviting the regulatory and competitive scrutiny a closed product launch would. The format is the contribution, and the format is also the wedge.

## The spec, in one screen

The v0.1 spec, [SPEC.md](https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md) in the repository, is roughly 1,000 lines of markdown with 11 sections, a conformance appendix, and a single minimal example bundle. The structure of a bundle is the part that anchors everything else, and it is small enough to reproduce in full:

```
sales/
├── index.md                      # Optional. Directory listing for progressive disclosure.
├── log.md                        # Optional. Chronological history of updates.
├── <concept>.md                  # A concept at the bundle root.
└── <subdirectory>/               # Subdirectories organize concepts into groups.
    ├── index.md
    ├── <concept>.md
    └── <subdirectory>/
        └── …
```

A bundle is a directory tree. The directory structure is independent of the domain, and producers organize concepts however makes sense for the knowledge being captured. A bundle MAY be distributed as a git repository (the recommended form, because git provides history, attribution, and diffs), as a tarball or zip archive, or as a subdirectory within a larger repository. The reserved filenames, `index.md` and `log.md`, have defined meaning at any level of the hierarchy and MUST NOT be used for concept documents. All other `.md` files in the tree are concept documents.
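
To make the split between concept documents and reserved files concrete, here is a minimal Python sketch of a consumer-side bundle walk that uses only the standard library. The function name `classify_bundle` and the `sales/` layout in the demo are illustrative assumptions, not part of the spec or of the reference implementation.

```python
import tempfile
from pathlib import Path

RESERVED = {"index.md", "log.md"}  # reserved at every level of the tree


def classify_bundle(root: Path) -> dict[str, list[str]]:
    """Split every .md file under `root` into concept documents and reserved files.

    Paths are returned relative to the bundle root with forward slashes, so they
    line up with the bundle-relative link form used inside concept bodies.
    """
    concepts: list[str] = []
    reserved: list[str] = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        (reserved if path.name in RESERVED else concepts).append(rel)
    return {"concepts": concepts, "reserved": reserved}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "sales"
        (root / "tables").mkdir(parents=True)
        for rel in ["index.md", "log.md", "revenue.md", "tables/index.md", "tables/orders.md"]:
            (root / rel).touch()
        print(classify_bundle(root))
```

Non-markdown files are simply not part of the walk, which matches the spec's rule that every `.md` file other than the two reserved names is a concept document.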

A concept is a single UTF-8 markdown file with two parts: a YAML frontmatter block delimited by `---` lines at the top of the file, and a markdown body containing free-form content. The frontmatter is where the structural work happens, and the spec is opinionated about exactly one thing in the frontmatter: the `type` field is required, must be non-empty, and is the only field that consumers are allowed to require. Everything else is producer-defined.

```yaml
---
type: <Type name>                  # REQUIRED
title: <Optional display name>
description: <Optional one-line summary>
resource: <Optional canonical URI for the underlying asset>
tags: [<tag>, <tag>, …]            # Optional
timestamp: <ISO 8601 datetime>     # Optional last-modified time
# … other producer-defined key/value pairs
---
```

The recommended fields, in priority order, are `title` (human-readable display name, derived from the filename if absent), `description` (a single-sentence summary used by index generators, search snippets, and previews), `resource` (a canonical URI for the underlying asset the concept describes, omitted for concepts that describe abstract ideas rather than physical resources), `tags` (a YAML list of short strings for cross-cutting categorization), and `timestamp` (ISO 8601 datetime of last meaningful change). Producers MAY include any additional keys, and consumers SHOULD preserve unknown keys when round-tripping and SHOULD NOT reject documents with unrecognized fields. The tolerance is intentional: OKF is meant to remain useful as bundles grow, get refactored, and are partially generated by agents.
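
A sketch of how a consumer might read one concept document under these rules: keep every key it finds, including unknown producer-defined ones, and fall back to a filename-derived title when `title` is absent. The exact title-derivation rule, the `owner_team` key in the demo, and the hand-rolled parser (flat keys and inline lists only, where a real consumer would use a full YAML library) are assumptions made to keep the example dependency-free.

```python
from pathlib import PurePosixPath


def _scalar(raw: str) -> str:
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]  # drop matching surrounding quotes
    return raw


def parse_concept(text: str) -> tuple[dict[str, object], str]:
    """Split a concept file into (frontmatter, body).

    Simplified on purpose: understands flat `key: value` lines, YAML comment
    lines, and inline `[a, b]` lists only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("concept has no frontmatter block")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter block is not closed")
    meta: dict[str, object] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unsupported frontmatter line: {line!r}")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [_scalar(v) for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = _scalar(value)  # unknown keys are kept, never rejected
    return meta, "\n".join(lines[end + 1:])


def display_title(path: str, meta: dict[str, object]) -> str:
    """Prefer `title`; otherwise derive one from the filename (derivation rule assumed)."""
    title = meta.get("title")
    if isinstance(title, str) and title:
        return title
    return PurePosixPath(path).stem.replace("_", " ").replace("-", " ").capitalize()


if __name__ == "__main__":
    doc = "---\ntype: Metric\ntags: [engagement, kpi]\nowner_team: growth\n---\n# Examples\n"
    meta, body = parse_concept(doc)
    print(meta, display_title("metrics/weekly_active_users.md", meta))
```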

The body is standard markdown. There are no required body sections, but the spec names three conventional section headings that producers SHOULD use when applicable: `# Schema` (a structured description of an asset's columns or fields, typically as a markdown table), `# Examples` (concrete usage examples, often as fenced code blocks), and `# Citations` (external sources backing claims in the body, with numbered references). The body is where most of the value lives for an agent consumer, and the structural guidance is the part that distinguishes an OKF document from a personal-notes file in the same directory.

The type field is the part that is doing the most work. Type values are not registered centrally, and the spec gives the example list as `BigQuery Table`, `BigQuery Dataset`, `API Endpoint`, `Metric`, `Playbook`, `Reference`. Producers SHOULD pick values that are descriptive and self-explanatory; consumers MUST tolerate unknown types gracefully, typically by treating them as generic concepts. The tolerance is the same as the frontmatter-extensions tolerance: an agent that encounters a concept of type `Sev1 Incident Runbook` from a vendor it has never seen should not refuse the bundle; it should render the concept as a generic document and let the producer's prose describe the semantics. The permissive consumption model is the design choice that makes OKF viable as a community-maintained format rather than a Google-controlled schema.
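
In code, the permissive type model amounts to a lookup with a generic fallback. The sketch below assumes a consumer that happens to have a specialised view for `BigQuery Table`; the registry, renderer names, and output strings are hypothetical, and the only hard check is the one the spec allows, a non-empty `type`.

```python
from typing import Callable

Renderer = Callable[[dict, str], str]


def render_table(meta: dict, body: str) -> str:
    # A specialised view for a type this consumer happens to know about.
    return f"[table] {meta.get('title', '')} -> {meta.get('resource', 'no resource')}"


def render_generic(meta: dict, body: str) -> str:
    # Fallback for any unrecognised type: show a plain document and let the
    # producer's prose carry the semantics.
    return f"[{meta['type']}] {meta.get('title', '')}\n{body.strip()}"


# Hypothetical consumer-side registry; OKF itself has no central type registry.
RENDERERS: dict[str, Renderer] = {"BigQuery Table": render_table}


def render(meta: dict, body: str) -> str:
    type_name = meta.get("type")
    if not isinstance(type_name, str) or not type_name.strip():
        # The one requirement a consumer may enforce: a non-empty `type`.
        raise ValueError("concept is missing a non-empty `type`")
    return RENDERERS.get(type_name, render_generic)(meta, body)


if __name__ == "__main__":
    print(render({"type": "Sev1 Incident Runbook", "title": "Checkout outage"},
                 "Page the payments on-call first."))
```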

## The spec, in detail: linking, indexing, logging, conformance

The interesting parts of the spec are the four mechanisms that turn a directory of files into a navigable, version-controllable knowledge corpus. They are small, they compose, and each one is the answer to a question that any team that has tried to maintain a wiki of more than a few hundred pages will recognize.

Cross-linking is the part that turns the directory into a graph. Concepts MAY link to other concepts using standard markdown links, in two forms. The first is bundle-relative (absolute within the bundle): a link like `[customers table](/tables/customers.md)` is resolved relative to the bundle root, so it stays valid when the document containing it is moved to another directory. The second is relative, the standard markdown `./other.md` form. The spec recommends the absolute form because it survives reorganization. Link semantics are deliberately untyped: a link from concept A to concept B asserts a relationship, and the specific kind of relationship (parent/child, references, joins-with, depends-on) is conveyed by the surrounding prose, not by the link itself. Consumers that build a graph view typically treat all links as directed edges of an untyped relationship, and the untyped model is what keeps the spec from sliding into a schema. Consumers MUST tolerate broken links: a link whose target does not exist is not malformed, since it may simply represent not-yet-written knowledge, and that tolerance is what makes the format usable while a bundle is still being built.
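
A minimal sketch of the graph-building step, assuming concept bodies are already loaded into a dict keyed by bundle-relative path. It resolves both link forms, skips external URLs and same-page anchors, and reports broken links instead of rejecting them. The regex handles inline links only, and the function names are illustrative.

```python
from __future__ import annotations

import posixpath
import re

# Inline markdown links: [text](target). Reference-style links are ignored here.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")  # http:, https:, mailto:, ...


def resolve_link(source: str, target: str) -> str | None:
    """Map a link inside `source` (a bundle-relative path) to a bundle-relative path.

    Returns None for external URLs and same-page anchors, which are not concept edges.
    """
    if SCHEME_RE.match(target):
        return None
    target = target.split("#", 1)[0]
    if not target:
        return None
    if target.startswith("/"):
        # Bundle-relative ("absolute within the bundle"): anchored at the bundle root.
        return posixpath.normpath(target.lstrip("/"))
    # Plain relative link: anchored at the linking document's directory.
    return posixpath.normpath(posixpath.join(posixpath.dirname(source), target))


def extract_edges(docs: dict[str, str]) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (edges, broken) as untyped, directed (source, target) pairs.

    Links to missing targets are collected, not rejected: they may point at
    knowledge that has not been written yet.
    """
    edges: list[tuple[str, str]] = []
    broken: list[tuple[str, str]] = []
    for source, body in docs.items():
        for target in LINK_RE.findall(body):
            resolved = resolve_link(source, target)
            if resolved is None:
                continue
            if resolved in docs:
                edges.append((source, resolved))
            else:
                broken.append((source, resolved))
    return edges, broken
```

The untyped model falls out naturally here: an edge is only a `(source, target)` pair, and whatever the relationship means stays in the prose around the link.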

Indexing is the part that supports progressive disclosure. An `index.md` file MAY appear in any directory, including the bundle root, and it enumerates the directory's contents so that a human or agent can see what is available before opening individual documents. Index files contain no frontmatter, with one exception: an optional version declaration in the bundle-root `index.md`, covered under versioning below. The body uses one or more sections, each grouping concepts under a heading, with bullet-list entries that link to the concept's relative URL and pull in the concept's description from its frontmatter. Producers MAY generate `index.md` automatically; consumers MAY synthesize one on the fly when none is present. The progressive-disclosure pattern is what makes OKF workable for large corpora: an agent can read the root `index.md`, decide which subdirectory is relevant, read that `index.md`, decide which concept to open, and read the concept, without ever loading the entire bundle into context. For a 10,000-concept bundle, the pattern is the difference between a usable corpus and an unusable one.
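
Generating an `index.md` is mechanical once frontmatter is parsed. This sketch groups one directory's concepts into one section per `type` and emits bullet entries carrying each concept's description. Grouping by type, the `##` heading level, the `Other` fallback, and the `: description` separator are all assumptions, since the spec leaves section layout to the producer.

```python
from collections import defaultdict


def build_index(entries: dict[str, dict[str, str]]) -> str:
    """Render an index.md body for one directory, one section per concept type.

    `entries` maps each concept's path (relative to this directory) to its
    frontmatter. No frontmatter is emitted: index files are plain markdown.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for rel_path in sorted(entries):
        groups[entries[rel_path].get("type", "Other")].append(rel_path)
    out: list[str] = []
    for type_name in sorted(groups):
        out += [f"## {type_name}", ""]
        for rel_path in groups[type_name]:
            meta = entries[rel_path]
            # Fall back to the filename when a concept has no title.
            title = meta.get("title") or rel_path.rsplit("/", 1)[-1].removesuffix(".md")
            item = f"- [{title}]({rel_path})"
            if meta.get("description"):
                item += f": {meta['description']}"
            out.append(item)
        out.append("")
    return "\n".join(out)
```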

Logging is the part that supports version-control workflows. A `log.md` file MAY appear at any level of the hierarchy to record the history of changes to that scope. The format is a flat list of date-grouped entries, newest first, with date headings in ISO 8601 `YYYY-MM-DD` form. The log entries are prose, with a leading bold word (`**Update**`, `**Creation**`, `**Deprecation**`) as a convention rather than a requirement. The pattern is borrowed directly from the changelog conventions that the open-source community has been using for a decade, and keeping the format prose rather than structured is what makes the log readable by humans, parseable by agents, and writable in a git commit message. The log is not authoritative (git is the source of truth); it is a denormalized, read-optimized view of the history that a bundle curator can produce on every release.
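
Because the log is plain prose under date headings, a curator script can maintain it with simple text handling. A sketch, assuming `## YYYY-MM-DD` headings and one bullet per entry (the spec fixes neither the heading level nor the bullet form); `add_log_entry` is a hypothetical helper.

```python
import re
from datetime import date

DATE_HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2})\s*$")


def add_log_entry(log_text: str, day: date, kind: str, message: str) -> str:
    """Insert an entry into a log.md body, keeping date groups newest first."""
    heading = f"## {day.isoformat()}"
    entry = f"- **{kind}** {message}"
    lines = log_text.splitlines()
    # A group for this date already exists: put the new entry at its top.
    for i, line in enumerate(lines):
        if line.strip() == heading:
            lines.insert(i + 1, entry)
            return "\n".join(lines) + "\n"
    # Otherwise open a new group just before the first older date heading.
    # ISO dates compare correctly as strings.
    for i, line in enumerate(lines):
        match = DATE_HEADING.match(line)
        if match and match.group(1) < day.isoformat():
            lines[i:i] = [heading, entry, ""]
            return "\n".join(lines) + "\n"
    # No older group: append at the end, separated by a blank line.
    if lines and lines[-1].strip():
        lines.append("")
    lines += [heading, entry]
    return "\n".join(lines) + "\n"
```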

Conformance is the part that pins down what it means to be OKF-compatible. A bundle is conformant with v0.1 if three conditions hold: every non-reserved `.md` file in the tree contains a parseable YAML frontmatter block, every frontmatter block contains a non-empty `type` field, and every reserved filename follows the structure described in §6 and §7 when present. Consumers SHOULD treat all other constraints as soft guidance, and consumers MUST NOT reject a bundle because of missing optional frontmatter fields, unknown `type` values, unknown additional frontmatter keys, broken cross-links, or missing `index.md` files. The permissive consumption model is the design choice that distinguishes OKF from a stricter schema-driven format (Protocol Buffers, Avro, OpenAPI), and the choice is deliberate: the value of a knowledge format is in how many parties speak it, and a format that rejects bundles for missing optional fields is a format that gets forked.
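
The three hard conditions are small enough to check in a few dozen lines. The sketch below is partial and rests on assumptions: it detects frontmatter by its `---` delimiters and a top-level `type:` line rather than running a full YAML parser, and it checks only one reserved-file rule (frontmatter in `index.md` is allowed only at the bundle root) while skipping `log.md` structure. Everything the spec treats as soft guidance, such as unknown types or broken links, is deliberately not reported.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def frontmatter_lines(text: str) -> list[str] | None:
    """Lines between the opening and closing `---`, or None if absent or unclosed."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i]
    return None


def has_type(frontmatter: list[str]) -> bool:
    # Simplified: a top-level `type:` key whose value is non-empty after unquoting.
    for line in frontmatter:
        key, sep, value = line.partition(":")
        if sep and key == "type" and value.strip().strip("'\""):
            return True
    return False


def conformance_errors(root: Path) -> list[str]:
    """Hard v0.1 violations only; soft issues are intentionally not reported."""
    errors: list[str] = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        fm = frontmatter_lines(path.read_text(encoding="utf-8"))
        if path.name == "index.md":
            if fm is not None and rel != "index.md":
                errors.append(f"{rel}: frontmatter is only allowed in the bundle-root index.md")
        elif path.name == "log.md":
            continue  # log.md structure checks are omitted in this sketch
        elif fm is None:
            errors.append(f"{rel}: missing or unclosed frontmatter block")
        elif not has_type(fm):
            errors.append(f"{rel}: frontmatter has no non-empty `type`")
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "orders.md").write_text("---\ntype: BigQuery Table\n---\n", encoding="utf-8")
        (root / "notes.md").write_text("No frontmatter here.\n", encoding="utf-8")
        print(conformance_errors(root))
```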

The versioning story is the part that determines whether OKF v0.1 is a starting point or a finished standard. The spec is versioned in the form `<major>.<minor>`, with a minor version bump introducing backward-compatible additions (new optional fields, new conventional section headings) and a major version bump reserved for breaking changes (renaming required fields, changing reserved filenames). Bundles MAY declare the OKF version they target by including `okf_version: "0.1"` in a bundle-root `index.md` frontmatter block, which is the only place frontmatter is permitted in an `index.md`. Consumers that do not understand the declared version SHOULD attempt best-effort consumption rather than refusing the bundle. The major/minor convention is a long-established one in versioned protocols (HTTP/1.0 versus HTTP/1.1 is the familiar example), and being explicit about version semantics is what makes OKF a candidate for actual standardization rather than a one-off format.
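
The consumer side of the versioning rule fits in one function that never returns a refusal. In this sketch, `consumption_mode` and its two result strings are hypothetical names, and reading a bundle with no `okf_version` declaration as the consumer's own version is an assumption that the spec text quoted here does not settle.

```python
from __future__ import annotations

SUPPORTED_VERSION = (0, 1)  # the OKF version this hypothetical consumer targets


def consumption_mode(declared: str | None, supported: tuple[int, int] = SUPPORTED_VERSION) -> str:
    """Decide how to read a bundle given its `okf_version`; never refuses.

    "full": same major version and a minor version we already know.
    "best-effort": newer minor (unknown additions), other major, or unparseable.
    """
    if declared is None:
        return "full"  # assumption: undeclared bundles are read as our own version
    try:
        major_text, minor_text = declared.strip().strip("'\"").split(".")
        major, minor = int(major_text), int(minor_text)
    except ValueError:
        return "best-effort"
    if major == supported[0] and minor <= supported[1]:
        return "full"
    return "best-effort"


if __name__ == "__main__":
    for declared in ['"0.1"', "0.2", "1.0", None, "latest"]:
        print(declared, "->", consumption_mode(declared))
```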

## The LLM-wiki pattern, named

The most useful thing the announcement does is give a name to a pattern that has been quietly emerging across the AI ecosystem for at least a year. The pattern is the practice of storing curated knowledge as markdown files with YAML frontmatter, organized in a directory tree, and letting agents (rather than search engines) traverse the structure. The pattern has been reappearing under different names, in different communities, for different reasons, and OKF is the first spec to claim the category rather than just one of the instances.

[Andrej Karpathy's LLM Wiki gist](https://gist.github.com/karpathy) is the conceptual source, and the blog post quotes it directly: "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." The argument is that the bookkeeping that causes humans to abandon personal wikis (updating cross-references, fixing dead links, restructuring hierarchies) is exactly what LLMs are good at. The same insight appears in different forms across the AI tooling ecosystem. Obsidian vaults wired to coding agents are the most visible instance: a developer maintains a vault of notes about their codebase, an agent reads the vault before doing real work, the agent updates the vault as it learns, and the vault becomes a living memory of the project that grows more useful over time. The [AGENTS.md and CLAUDE.md](/articles/2026-03-23-claude-code-rise-ai-coding-tool-2026) family of convention files is the same pattern in single-file form: a single markdown file at the repository root that an agent consults before doing real work, and that the agent updates as it learns the project conventions. The pattern also shows up in data teams as "metadata as code" repositories, where SQL table documentation, metric definitions, and runbooks live in version-controlled markdown files rather than in a separate metadata registry.

The reason the pattern has been reappearing is that it solves a problem that has become acute as agents have become the primary consumers of enterprise knowledge. Search-based retrieval (find me a document that contains the string `weekly_active_users`) breaks down when the agent needs to assemble a context from ten different documents, with cross-references between them, in a specific order, with a specific level of confidence in each piece. A directory of structured markdown files lets the agent navigate the structure: read the root index, decide which subdirectory is relevant, read the subdirectory index, decide which concept to open, read the concept, follow the cross-links. The navigation is explicit in the file system and explicit in the frontmatter, which is the part that makes it both human-readable and agent-navigable.

The reason the pattern has been a mess until now is that each instance is bespoke. Karpathy's wiki and your team's wiki and a vendor's catalog export may all look alike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer to what fields every document should carry, or what filenames mean what. As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built. OKF is the answer to the question "what is the smallest set of conventions that lets all of these instances cooperate," and the answer is small enough to fit in a 1,000-line spec.

The thing to notice about the pattern is that it is downstream of three older ideas, and the lineage is part of the reason the spec is so minimal. The first is the static-site-generator pattern (Hugo, Jekyll, MkDocs, Docusaurus), which has been rendering markdown + frontmatter for over a decade. The second is the personal-knowledge-management pattern (Obsidian, Notion, Roam, Logseq), which has been maintaining markdown + frontmatter + cross-links for personal use for almost as long. The third is the documentation-as-code pattern (docs as markdown in the same repo as the code, deployed on every release), which is now the default for almost every developer-facing product. OKF is the combination of all three, with the agent as the new consumer, and the spec is minimal because the underlying patterns are already minimal. The work was always there; OKF is the part that names it.

## What Google is shipping: four reference artifacts, one strategic integration

The reference implementations are deliberately called "proofs of concept" in the README, and the framing is important: the format is the contribution, and the tools exist to make the format tangible at both ends of the producer/consumer axis. Google shipped four distinct artifacts in the same release, and each one is the answer to a specific question about how the format works in practice.

The enrichment agent is the producer end. It is a Python package (`enrichment_agent`, installed via `python3.13 -m venv .venv && .venv/bin/pip install -e .[dev]`) built on the [Google Agent Development Kit](https://adk.dev/) with Gemini as the model backend. The agent runs in two passes. The first pass walks a BigQuery dataset, reads the schema and metadata for every table and view, and writes one OKF concept document per concept the source advertises. The second pass runs the LLM as its own crawler: it receives a list of seed URLs, fetches the seeds, and decides which outbound links are worth following based on whether they look like authoritative documentation for the existing concepts. For each page it fetches, the agent chooses to enrich one or more existing concept docs, mint a standalone `references/<slug>` doc, or skip. A hard `--web-max-pages` cap and a same-domain allowed-hosts filter are enforced inside the tool, so the agent cannot exceed either limit. The CLI is one command:

```
.venv/bin/python -m enrichment_agent enrich \
    --source bq \
    --dataset <project>.<dataset> \
    --web-seed-file <path/to/seeds.txt> \
    --out ./bundles/<name>
```
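
The crawl guard described above can be illustrated with a bounded breadth-first walk in which the page budget and the host allow-list are enforced outside the decision-maker. This is a hypothetical sketch of the pattern, not the enrichment agent's code: `fetch_links` stands in for fetching a page, `should_follow` stands in for the model's judgment, and `max_pages` plays the role of `--web-max-pages`.

```python
from collections import deque
from typing import Callable
from urllib.parse import urljoin, urlparse


def bounded_crawl(
    seeds: list[str],
    fetch_links: Callable[[str], list[str]],
    should_follow: Callable[[str], bool],
    max_pages: int,
) -> list[str]:
    """Breadth-first crawl visiting at most `max_pages` pages on the seeds' hosts.

    The budget and the host allow-list are enforced by this loop, not by
    `should_follow`, so nothing it decides can push the crawl past either limit.
    """
    allowed_hosts = {urlparse(seed).hostname for seed in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            absolute = urljoin(url, link).split("#", 1)[0]
            if absolute in seen or urlparse(absolute).hostname not in allowed_hosts:
                continue  # already handled, or off the allowed hosts
            seen.add(absolute)
            if should_follow(absolute):
                queue.append(absolute)
    return visited
```

Keeping the limits in plain code rather than in the prompt is the design point: the model can only choose among links the loop has already admitted.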

The Source interface is designed to grow. BigQuery is the first source implementation; Dataplex, Unity Catalog, Collibra, and a database walker are all named in the README as targets, and the interface is the documented extension point for any new source.
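
The Source interface itself is not reproduced in this article, so what follows is a hypothetical Python shape for it, not the real signature. It pairs a `Source` protocol with a toy database walker over SQLite (a database walker being one of the targets the README lists) and writes one concept file per table with a `# Schema` table in the body. `Concept`, `SqliteSource`, `write_bundle`, and the `SQLite Table` type name are all illustrative.

```python
import sqlite3
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterable, Protocol


@dataclass
class Concept:
    """One concept document to write into a bundle (hypothetical shape)."""
    path: str                                               # bundle-relative, e.g. "tables/orders.md"
    type: str                                               # the one required frontmatter field
    extra: dict[str, object] = field(default_factory=dict)  # title, description, resource, ...
    body: str = ""


class Source(Protocol):
    """Hypothetical producer interface; the real enrichment_agent Source may differ."""

    def concepts(self) -> Iterable[Concept]: ...


def render_concept(concept: Concept) -> str:
    # Simplified serializer: flat scalar values and lists of strings only.
    lines = ["---", f"type: {concept.type}"]
    for key, value in concept.extra.items():
        if isinstance(value, list):
            value = "[" + ", ".join(value) + "]"
        lines.append(f"{key}: {value}")
    lines += ["---", concept.body]
    return "\n".join(lines) + "\n"


class SqliteSource:
    """Toy database walker: one concept per table in a SQLite database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def concepts(self) -> Iterable[Concept]:
        tables = [row[0] for row in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
            columns = self.conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            rows = ["| Column | Type |", "| --- | --- |"]
            rows += [f"| {col[1]} | {col[2]} |" for col in columns]
            yield Concept(path=f"tables/{table}.md", type="SQLite Table",
                          extra={"title": table}, body="# Schema\n\n" + "\n".join(rows))


def write_bundle(source: Source, out: Path) -> list[str]:
    """Write every concept the source yields; return the bundle-relative paths."""
    written: list[str] = []
    for concept in source.concepts():
        target = out / concept.path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_concept(concept), encoding="utf-8")
        written.append(concept.path)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    with tempfile.TemporaryDirectory() as tmp:
        print(write_bundle(SqliteSource(conn), Path(tmp)))
```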

The static HTML visualizer is the consumer end. It is a `visualize` subcommand on the same Python package, and it renders any OKF bundle as a self-contained interactive HTML file: one file, no backend, no install on the viewing side. The viewer shows a force-directed graph of every concept in the bundle, with colored nodes by type (datasets, tables, references) and directed edges drawn from each cross-link in the markdown bodies. A detail panel for the selected concept shows its frontmatter (description, resource link, tags) and its rendered markdown body, with internal links rewired to navigate within the viewer instead of following the path. A "Cited by" backlinks list is computed from the reverse of the link graph. A search box matches title, concept id, and tags, a type filter narrows the visible nodes, and switchable graph layouts (cose, concentric, breadth-first, circle, grid) let a curator explore the corpus in different ways. The visualization is generated by cytoscape.js, embedded in a single self-contained HTML file, and the file can be opened locally, shared as an artifact, or hosted on a static file server.
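
The "Cited by" list, the search box, and the type filter are simple derived views over the link graph and the frontmatter. Here is a sketch of that logic in Python. The shipped viewer runs cytoscape.js in the browser, and its actual code may differ. The sketch assumes edges are `(source, target)` path pairs and that a concept id is its path without the `.md` suffix.

```python
from __future__ import annotations

from collections import defaultdict


def backlinks(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """'Cited by' lists: every directed (source, target) edge, reversed and deduplicated."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for source, target in edges:
        cited_by[target].add(source)
    return {target: sorted(sources) for target, sources in cited_by.items()}


def search(concepts: dict[str, dict], query: str, type_filter: str | None = None) -> list[str]:
    """Case-insensitive substring match on title, concept id, and tags, with an optional type filter."""
    needle = query.lower()
    hits: list[str] = []
    for path, meta in concepts.items():
        if type_filter is not None and meta.get("type") != type_filter:
            continue
        concept_id = path.removesuffix(".md")  # assumption: id = path without extension
        fields = [str(meta.get("title", "")), concept_id, *map(str, meta.get("tags", []))]
        if any(needle in text.lower() for text in fields):
            hits.append(path)
    return sorted(hits)
```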

The three sample bundles are the live examples. The first is the [GA4 e-commerce bundle](https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf/bundles/ga4), generated from the GA4 BigQuery Export public dataset and seeded with the canonical GA4 documentation URLs. The second is the [Stack Overflow bundle](https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf/bundles/stackoverflow), generated from the Stack Exchange Data Dump public dataset and seeded with the community's canonical schema references; it exercises multi-concept enrichment from cross-cutting docs pages. The third is the [Bitcoin bundle](https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf/bundles/crypto_bitcoin), generated from the `bitcoin-etl` pipeline and seeded with the Bitcoin protocol documentation; it exercises cross-table foreign-key relationships in prose, the kind of knowledge that is hard to represent in a relational schema and easy to represent in a few sentences of markdown. Each sample pairs a recipe (the seed URLs and the exact `enrich` command, in `samples/<name>/`) with the produced bundle (in `bundles/<name>/`), so a developer can reproduce the bundle on their own machine with one command. The sample bundles are also the strongest evidence that the format works at non-trivial scale: the GA4 bundle has hundreds of concept documents and a richly connected link graph, and the visualizer renders it without performance issues.

The BigQuery Knowledge Catalog integration is the strategic move. The [Knowledge Catalog](https://cloud.google.com/bigquery/docs/knowledge-catalog) is BigQuery's built-in metadata service, the surface that powers SQL autocomplete, schema discovery, and the integration with Vertex AI agents. As of the OKF launch, the Knowledge Catalog can ingest OKF bundles natively: a team can run the enrichment agent against their BigQuery dataset, produce an OKF bundle, and push it to the catalog without writing any custom integration code. The catalog then serves the OKF content to agents that query BigQuery, with the frontmatter becoming the indexable metadata and the body becoming the agent-readable context. The integration is documented separately, in the BigQuery documentation, rather than in the OKF spec, and the separation is deliberate: the spec is platform-neutral, and the integration is a Google-platform-specific convenience on top of it.
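
The split the catalog performs, frontmatter as indexable metadata and body as agent-readable context, is easy to picture in code. Below is a minimal Python sketch under two loud assumptions: the record shape (`id`, `metadata`, `context`) is hypothetical and is not the Knowledge Catalog's ingestion format, and the parser handles only a tiny YAML subset (flat `key: value` pairs and inline lists) so that it runs on the standard library alone. A real consumer should use a full YAML parser.

```python
# Deliberately tiny YAML subset: flat "key: value" pairs and inline lists.
# Nested mappings and block lists raise ValueError instead of misparsing.
def split_concept(text: str) -> tuple[dict, str]:
    """Split an OKF-style document into (frontmatter dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()  # no frontmatter block at all
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter block") from None
    meta: dict = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and YAML comments
        if line[0].isspace():
            raise ValueError(f"nested YAML not supported here: {line!r}")
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = value[1:-1].split(",")
            meta[key.strip()] = [v.strip().strip("'\"") for v in items if v.strip()]
        else:
            meta[key.strip()] = value.strip("'\"")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def to_catalog_record(path: str, text: str) -> dict:
    """Hypothetical ingestion record: metadata is indexed, context is served to agents."""
    meta, body = split_concept(text)
    if "type" not in meta:
        raise ValueError(f"{path}: missing required 'type' field")
    return {"id": path, "metadata": meta, "context": body}


doc = """---
type: table
title: events_*
tags: [ga4, events]
---
# events_*

One row per GA4 event.
"""
record = to_catalog_record("tables/events.md", doc)
print(record["metadata"])  # {'type': 'table', 'title': 'events_*', 'tags': ['ga4', 'events']}
```

Rejecting documents without `type` at this boundary mirrors the spec's one hard requirement; everything else in the frontmatter passes through as searchable metadata.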

The four artifacts together are doing a different job than they appear to be doing. The enrichment agent is not a product; it is a reference implementation. The visualizer is not a product; it is a reference implementation. The sample bundles are not products; they are reference examples. The BigQuery integration is the only one of the four that is a product, and even it is positioned as a value-add for BigQuery customers rather than as a separate product launch. The playbook is the same one Google has used for Kubernetes, TensorFlow, and gRPC: publish the open spec, ship a high-quality reference implementation, and let product integrations follow spec adoption. The strategy is well tested, and it positions OKF to become the default without ever being the only option.

## The precedent: who already does this, and what changes

The interesting question for the next six months is not whether OKF is a good format: the spec is good, the format is minimal, and the reference implementations work. The interesting question is whether the pattern OKF names, the LLM-wiki pattern, was already coalescing into a community convention before Google published the spec, and whether the spec accelerates that convergence or fragments it.

The pattern was already coalescing, and the evidence is the list of communities that have independently arrived at the same shape. The [Hugo](https://gohugo.io/) community has been doing markdown + frontmatter + cross-links for static sites since 2013. The [Obsidian](https://obsidian.md/) community has been doing the same for personal knowledge management since 2020, with a plugin ecosystem that already includes OKF-shaped exports. The [Docusaurus](https://docusaurus.io/) and [Astro](https://astro.build/) communities have been doing markdown + frontmatter for documentation sites since 2017 and 2021 respectively. The data-engineering community has been doing "metadata as code" in various forms (dbt docs, LookML views with descriptions, BigQuery INFORMATION_SCHEMA + hand-written markdown) for at least five years. The AI tooling community has been doing AGENTS.md / CLAUDE.md / context.md for the past eighteen months. The [Karpathy LLM Wiki gist](https://gist.github.com/karpathy) crystallized the pattern as a recommendation, and the [Anthropic context engineering blog post](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) formalized the principles in late 2025. The pattern is the convergence point, and OKF is the first spec to claim it.

What changes with the spec is the answer to the question "what is the smallest set of conventions that lets all of these instances cooperate?" Before OKF, the answer was "nothing; they all just look kind of similar." After OKF, the answer is "the v0.1 spec," a single document of roughly 1,000 lines. The format itself does not change, because the format was always there. What changes is the social fact: there is now a published spec that anyone can point to, implement against, or fork if they disagree with its design choices. The spec is a coordination point, and coordination points are the most valuable artifact in any ecosystem that has more than three participants.

The competitive question, the one that will play out over the next six months, is whether OKF becomes the lingua franca or whether a competing spec emerges from a different corner of the AI ecosystem. The most credible competing candidates are Anthropic's context.md work (if it gets formalized beyond the engineering blog post), the [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) community's resource format (which is JSON-RPC-based rather than markdown), and the [Cognee](https://www.cognee.ai/) / [LlamaIndex](https://www.llamaindex.ai/) graph-store formats (which are JSON-based and database-shaped rather than file-shaped). Each of these has a constituency, and each solves a slightly different problem: MCP is for runtime tool-calling, the graph-store formats are for vector-indexed retrieval, and OKF is for human-readable, version-controlled, agent-navigable corpora. The problems overlap but are not identical, so the most likely outcome is coexistence rather than consolidation: the same team that uses MCP for tool-calling and LlamaIndex for retrieval will use OKF for the documentation the agent reads to understand its context.

The strategic question for Google is whether the BigQuery Knowledge Catalog integration is enough to make OKF the default, or whether the ecosystem needs additional moats. The integration is a strong moat for BigQuery, but it does nothing for teams that run on Snowflake, Databricks, or Redshift. The spec is deliberately platform-neutral, but the only high-quality producer in the reference set (the enrichment agent) is BigQuery-only in v0.1. The bet is that adoption will start with teams that already use BigQuery, that BigQuery customers will produce the most useful OKF bundles, and that good bundles will pull in non-BigQuery consumers. The bet is plausible, and the timeline is the part to watch: if Snowflake or Databricks publishes a competing spec within six months, the format war has begun; if no competing spec emerges within a year, OKF has won by default.

What makes the bet plausible is the openness of both the licensing and the design. OKF is published under Apache-2.0, the license behind a large share of the modern open-source ecosystem (Kubernetes, TensorFlow, Swift, and the Apache Software Foundation's projects). The permissive license makes the spec forkable, makes alternative implementations welcome, and makes the spec a candidate for adoption by teams that are philosophically allergic to anything that looks like a Google-controlled standard. It also weakens the case for a hostile fork: since anyone can already implement and extend the spec, a competing spec would have to be meaningfully different, not just differently branded, and that threshold is high when the underlying spec is already this minimal.

## What to watch

The next week and the next 24 weeks each have specific signals worth tracking. OKF v0.1 is a starting point, and the signals that will decide whether it becomes a category-defining spec or a one-off format are all observable.

For the next week:

- **The repo's first community PRs.** The repository is at 3,300 stars, 213 forks, and 22 open issues as of publication, and the first community PRs will set the tone for the spec's evolution. A PR that adds a new optional frontmatter field is a healthy signal (the spec is being extended in the way it was designed to be extended). A PR that adds a competing required field is a warning sign (the community is already forking the design). A PR that adds a new reference implementation for a non-BigQuery source (Snowflake, Databricks, Postgres) is the strongest possible signal of ecosystem traction.
- **The first non-Google OKF bundle.** The three sample bundles are all Google-produced. The first non-Google OKF bundle, posted to GitHub or Hacker News in the next seven days, is the most direct test of whether the spec is being adopted beyond the BigQuery ecosystem. The most likely source is a developer who has been maintaining a Karpathy-style LLM wiki and has been looking for a way to share the format; the most valuable source is a data team at a non-BigQuery shop that has decided to standardize on OKF.
- **The first third-party visualizer.** The reference visualizer is a single self-contained HTML file generated by the Google Python package. A third-party visualizer, written in a different language (TypeScript, Go, Rust) or using a different visualization library (d3, vis.js, sigma.js), is the strongest signal of consumer-side adoption. The first one to show up is likely to be a TypeScript port that uses vis-network or react-flow, because the developer audience for visualization libraries is heavily TypeScript.
- **The first OKF mention from a non-Google AI tooling team.** The blog post and the tweet both come from Google. The first non-Google mention, in a Cursor blog post, a Claude Code release note, a LangChain changelog, a LlamaIndex announcement, or an OpenAI cookbook, is the first signal that the spec has crossed from "Google project" to "community standard." The most likely venue is a Cursor or Claude Code post about how the team is using OKF as the format for the tool's internal knowledge base.
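
Whether a third-party bundle actually conforms is mechanical enough to script against the spec's few hard rules. Below is a minimal Python sketch, not an official OKF validator: it treats a missing `type` field as an error, reports missing recommended fields (`title`, `description`, `resource`, `tags`, `timestamp`) as warnings, and, as an assumption, skips any file named `index.md` or `log.md` wherever it appears rather than applying whatever rules the spec sets for those reserved files.

```python
from pathlib import Path
from typing import Optional

# Assumption: any file with a reserved name is skipped, wherever it sits.
RESERVED = {"index.md", "log.md"}
RECOMMENDED = ("title", "description", "resource", "tags", "timestamp")


def frontmatter_keys(text: str) -> Optional[set]:
    """Top-level keys of a leading '---' block; None if absent or unterminated."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only unindented "key: ..." lines count as top-level keys.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return None


def check_bundle(root: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for every non-reserved .md file under root."""
    base = Path(root)
    errors, warnings = [], []
    for path in sorted(base.rglob("*.md")):
        if path.name in RESERVED:
            continue
        rel = path.relative_to(base).as_posix()
        keys = frontmatter_keys(path.read_text(encoding="utf-8"))
        if keys is None:
            errors.append(f"{rel}: no frontmatter block")
        elif "type" not in keys:
            errors.append(f"{rel}: missing required field 'type'")
        else:
            missing = [k for k in RECOMMENDED if k not in keys]
            if missing:
                warnings.append(f"{rel}: missing recommended {', '.join(missing)}")
    return errors, warnings
```

Calling `check_bundle` on a bundle directory separates files that break the one hard rule from files with thinner-than-recommended frontmatter, which is roughly the line between "not OKF" and "OKF, but sparse."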

For the next 24 weeks:

- **An OKF v0.2 release.** v0.1 is a draft, and the first v0.2 release is the first signal of how the spec is being shaped by community feedback. The v0.2 changelog is the most important read: which optional fields were added, which conventional section headings were promoted, which conformance criteria were tightened. A v0.2 that adds too many fields is a sign that the spec is losing its minimalism; a v0.2 that adds too few is a sign that the spec is being under-curated.
- **A non-BigQuery source implementation.** The enrichment agent is BigQuery-only in v0.1. The first implementation for another source (Snowflake, Databricks, Postgres, MySQL, a generic SQL walker) is the first signal that the format is being adopted as a multi-vendor standard. The most likely origin is a community PR to the existing repository rather than a new repository, because the format name is more discoverable when everything lives in one place.
- **A BigQuery competitor's OKF-compatible product.** Snowflake Cortex, Databricks Unity Catalog, Redshift Spectrum, or any other data-warehouse-adjacent product shipping an OKF-compatible ingestion path is the first signal that the spec has become a competitive battleground. The most likely timeline is Q4 2026 or Q1 2027, because the data-warehouse vendors have historically been slow to adopt open standards that were initiated by competitors.
- **Anthropic's response.** Anthropic has its own context engineering work, MCP, and a strong interest in the LLM-wiki pattern. The first Anthropic post that mentions OKF, either positively (as a useful complement to MCP) or negatively (as a format that does not meet Anthropic's needs), is the first signal of how one of the largest AI labs is positioning itself relative to the spec. A positive mention would be a major endorsement; a negative mention would be the first shot in a format war.
- **The first OKF bundle with more than 10,000 concepts.** The largest reference bundle has hundreds of concepts. The first bundle with more than 10,000 concepts will be the first real test of whether the format works at enterprise scale. The most likely source is a data team that exports an entire BigQuery INFORMATION_SCHEMA corpus to OKF, which is the natural use case for the BigQuery enrichment agent. The visualizer's performance on a bundle that size will show whether the format is viable in practice for genuinely large corpora.
- **An OKF v1.0 release with breaking changes.** v0.1 is explicitly a draft. The v1.0 release, with whatever breaking changes the curation process has accumulated, will show whether OKF follows the IETF/W3C pattern of slow, careful, backward-compatible evolution or the Kubernetes pattern of frequent releases with versioned, deprecation-managed breaking changes. The breaking changes v1.0 introduces will also reveal which design choices the community is willing to defend and which ones it is willing to revise.
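
The "generic SQL walker" in the list above is the easiest of these to prototype, because the mechanical half of a concept document falls out of the database's own catalog. Below is a minimal Python sketch that uses SQLite only because `sqlite3` ships with Python. It writes one OKF-style document per table with the required `type` field and the five recommended fields, a column list, and foreign keys rendered as cross-links. The `sqlite:///` resource URI, the placeholder description, and the `Columns`/`Relationships` headings are illustrative assumptions, not spec requirements, and the sketch does none of the documentation-driven enrichment the reference agent performs.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def table_to_concept(conn: sqlite3.Connection, table: str, db_name: str) -> str:
    """Render one SQLite table as an OKF-style markdown concept document."""
    # Table names come from sqlite_master (see walk_database), not user input.
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
    cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match).
    fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "---",
        "type: table",
        f"title: {table}",
        f"description: Table {table} in {db_name} (placeholder, not enriched).",
        f"resource: sqlite:///{db_name}#{table}",  # hypothetical URI scheme
        "tags: [sqlite, generated]",
        f"timestamp: {stamp}",
        "---",
        f"# {table}",
        "",
        "## Columns",
        "",
    ]
    for _cid, name, col_type, notnull, _default, pk in cols:
        flags = [label for label, on in (("primary key", pk), ("not null", notnull)) if on]
        suffix = f" ({', '.join(flags)})" if flags else ""
        lines.append(f"- `{name}` {col_type or 'untyped'}{suffix}")
    if fks:
        lines += ["", "## Relationships", ""]
        for fk in fks:
            ref_table, src_col, dst_col = fk[2], fk[3], fk[4]
            # A NULL "to" column means the FK targets the referenced primary key.
            target = dst_col or "primary key"
            # Each foreign key becomes an ordinary cross-link to the sibling document.
            lines.append(f"- `{src_col}` references `{target}` in [{ref_table}]({ref_table}.md)")
    return "\n".join(lines) + "\n"


def walk_database(conn: sqlite3.Connection, db_name: str, out_dir: str) -> list[str]:
    """Write one concept document per user table into out_dir; return table names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    tables = [name for (name,) in rows]
    for table in tables:
        doc = table_to_concept(conn, table, db_name)
        (out / f"{table}.md").write_text(doc, encoding="utf-8")
    return tables
```

Foreign keys turning into plain markdown links is the part that matters: a bundle produced this way should appear as a connected graph in any OKF viewer that follows cross-links, which is exactly the kind of relationship-in-prose the Bitcoin sample bundle exercises.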

