On the evening of June 16, 2026, the Google Cloud Tech account posted a single tweet that drove 117,000 views, 1,800 likes, and 1,800 bookmarks in 24 hours, the highest-engagement knowledge-format announcement of the year by an order of magnitude. The tweet introduced the Open Knowledge Format (OKF), an open specification that "formalizes the LLM-wiki pattern into a portable, interoperable format." The post linked to the spec on GitHub and to a blog post on cloud.google.com by Sam McVeety (Tech Lead, Data Analytics) and Amir Hormati (Tech Lead, BigQuery). The blog post was published four days before the tweet, on June 12, the same day the reference implementation was first pushed to the GoogleCloudPlatform/knowledge-catalog repository.
@GoogleCloudTech - 22:34 · June 16, 2026
Introducing the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format. AI is only as smart as the context we give it. As we build more advanced, agentic AI systems, they need accurate metadata and context. Within organizations, that context is locked inside fragmented data catalogs, isolated wikis, scattered code comments, or the minds of senior engineers. Every time a new AI agent is built, teams are forced to solve the exact same context-assembly problem from scratch.
To solve this, we've announced OKF, a vendor-neutral, open specification that formalizes the "LLM-wiki pattern" into a portable, interoperable format. It provides a standardized way to represent the enterprise knowledge that modern AI systems rely on.
- Just markdown: readable in any editor, renderable on GitHub, indexable by any search tool
- Just files: shippable as a tarball, hostable in any git repo, mountable on any filesystem
- Just YAML frontmatter: for the small set of structured fields that need to be queryable: type, title, description, resource, tags, and timestamp
We've also shipped reference implementations to help you hit the ground running, including an enrichment agent for BigQuery, a static HTML visualizer, and live sample bundles on GitHub.
The substance of the announcement is best understood as four distinct moves: the spec itself, the reference implementations, the BigQuery integration, and the positioning move. Each is doing a different job. The spec is a 1,000-line document that defines a directory of markdown files with YAML frontmatter, one required field, and two reserved filenames. The reference implementations are three artifacts: a BigQuery enrichment agent, a static HTML visualizer, and three sample bundles. The BigQuery integration is a native ingest path in the Knowledge Catalog that turns an OKF bundle into a queryable, agent-servable surface. The positioning move is the claim that the LLM-wiki pattern is a category, not a one-off, and that the format is the contribution, not the tooling.
The framing correction, in one paragraph
The framing worth making up front is that OKF is a format, not a product, and that the most common misreading of the announcement is to treat it as a Google Cloud product launch. Apart from illustrative type values such as BigQuery Table, the spec text does not depend on BigQuery, Gemini, or any Google product. The reference implementations are deliberately called "proofs of concept" in the README, and the spec's "Relationship to other formats" section (SPEC.md §10) is explicit that OKF is downstream of patterns the community has been building for at least a year: Karpathy's LLM Wiki gist, Obsidian vaults wired to coding agents, the AGENTS.md and CLAUDE.md family of convention files, and the "metadata as code" repositories inside data teams. What Google has done, in the cleanest reading, is pin down the small set of conventions that lets these instances cooperate, publish the pin-down under Apache-2.0, and ship one reference implementation per end of the producer/consumer axis. The lock-in question is the one the spec is designed to dissolve.
The corollary is the part that matters for builders. If OKF becomes the lingua franca, the BigQuery Knowledge Catalog becomes the natural ingestion target for any team that already runs BigQuery, which is a large part of the cloud data market. That is the strategic move, and the open-spec framing is the way the move is being made without triggering the kind of regulatory and competitive scrutiny a closed product launch would invite. The format is the contribution, and the format is also the wedge.
The spec, in one screen
The v0.1 spec, SPEC.md in the repository, is roughly 1,000 lines of markdown with 11 sections, a conformance appendix, and a single minimal example bundle. The structure of a bundle is the part that anchors everything else, and it is small enough to reproduce in full:
sales/
├── index.md               # Optional. Directory listing for progressive disclosure.
├── log.md                 # Optional. Chronological history of updates.
├── <concept>.md           # A concept at the bundle root.
└── <subdirectory>/        # Subdirectories organize concepts into groups.
    ├── index.md
    ├── <concept>.md
    └── <subdirectory>/
        └── …
A bundle is a directory tree. The directory structure is independent of the domain, and producers organize concepts however makes sense for the knowledge being captured. A bundle MAY be distributed as a git repository (the recommended form, because git provides history, attribution, and diffs), as a tarball or zip archive, or as a subdirectory within a larger repository. The reserved filenames, index.md and log.md, have defined meaning at any level of the hierarchy and MUST NOT be used for concept documents. All other .md files in the tree are concept documents.
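The file-classification rule is small enough to sketch in a few lines. The following is an illustration only, not code from the reference implementation: the helper name `classify_bundle_paths` and the in-memory list of paths are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Reserved filenames named by the spec; they keep their meaning at any depth.
RESERVED = {"index.md", "log.md"}


def classify_bundle_paths(paths):
    """Split bundle-relative paths into reserved files, concepts, and other files.

    `paths` is an iterable of POSIX-style paths relative to the bundle root.
    (Hypothetical helper for illustration; not part of the OKF tooling.)
    """
    result = {"reserved": [], "concepts": [], "other": []}
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name in RESERVED:
            result["reserved"].append(raw)
        elif path.suffix == ".md":
            # Every non-reserved .md file is a concept document.
            result["concepts"].append(raw)
        else:
            result["other"].append(raw)
    return result


example = classify_bundle_paths([
    "index.md",
    "log.md",
    "orders.md",
    "tables/index.md",
    "tables/customers.md",
    "assets/diagram.png",
])
print(example["concepts"])
```

Because the reserved names apply at every level, the check is on the filename alone, not on the full path.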
A concept is a single UTF-8 markdown file with two parts: a YAML frontmatter block delimited by --- lines at the top of the file, and a markdown body containing free-form content. The frontmatter is where the structural work happens, and the spec is opinionated about exactly one thing in the frontmatter: the type field is required, must be non-empty, and is the only field that consumers are allowed to require. Everything else is producer-defined.
---
type: <Type name> # REQUIRED
title: <Optional display name>
description: <Optional one-line summary>
resource: <Optional canonical URI for the underlying asset>
tags: [<tag>, <tag>, …] # Optional
timestamp: <ISO 8601 datetime> # Optional last-modified time
# … other producer-defined key/value pairs
---
The recommended fields, in priority order, are title (human-readable display name, derived from the filename if absent), description (a single-sentence summary used by index generators, search snippets, and previews), resource (a canonical URI for the underlying asset the concept describes, omitted for concepts that describe abstract ideas rather than physical resources), tags (a YAML list of short strings for cross-cutting categorization), and timestamp (ISO 8601 datetime of last meaningful change). Producers MAY include any additional keys, and consumers SHOULD preserve unknown keys when round-tripping and SHOULD NOT reject documents with unrecognized fields. The tolerance is intentional: OKF is meant to remain useful as bundles grow, get refactored, and are partially generated by agents.
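A consumer that follows these rules has to do three things: split the frontmatter from the body, insist on a non-empty type, and carry every other key through untouched. The sketch below shows one way that might look. It is deliberately simplified: it handles only flat `key: value` lines and keeps values as raw strings, where a real consumer would hand the block to a full YAML parser. The function names are hypothetical.

```python
def split_frontmatter(text):
    """Return (fields, body) for a concept document.

    Simplification: only flat `key: value` lines are understood, and
    values are kept as raw strings rather than parsed as YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter block")
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("unterminated frontmatter block")
    fields = {}
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"cannot parse frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return fields, body


def load_concept(text):
    """Parse a concept and enforce the single required field."""
    fields, body = split_frontmatter(text)
    if not fields.get("type"):
        raise ValueError("frontmatter must contain a non-empty type field")
    return fields, body


def dump_concept(fields, body):
    """Write the document back out, keeping every key, known or not."""
    header = "\n".join(f"{key}: {value}" for key, value in fields.items())
    return f"---\n{header}\n---\n{body}"


DOC = """---
type: Metric
title: Weekly active users
tags: [growth, engagement]
owner: growth-team
---
Counts distinct users active in a trailing 7-day window.
"""

fields, body = load_concept(DOC)
print(dump_concept(fields, body))
```

The `owner` key is not one of the recommended fields, and it survives the round trip because nothing in the code looks at key names other than `type`.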
The body is standard markdown. There are no required body sections, but the spec names three conventional section headings that producers SHOULD use when applicable: # Schema (a structured description of an asset's columns or fields, typically as a markdown table), # Examples (concrete usage examples, often as fenced code blocks), and # Citations (external sources backing claims in the body, with numbered references). The body is where most of the value lives for an agent consumer, and the structural guidance is the part that distinguishes an OKF document from a personal-notes file in the same directory.
The type field is the part that is doing the most work. Type values are not registered centrally, and the spec gives the example list as BigQuery Table, BigQuery Dataset, API Endpoint, Metric, Playbook, Reference. Producers SHOULD pick values that are descriptive and self-explanatory; consumers MUST tolerate unknown types gracefully, typically by treating them as generic concepts. The tolerance is the same as the frontmatter-extensions tolerance: an agent that encounters a concept of type Sev1 Incident Runbook from a vendor it has never seen should not refuse the bundle; it should render the concept as a generic document and let the producer's prose describe the semantics. The permissive consumption model is the design choice that makes OKF viable as a community-maintained format rather than a Google-controlled schema.
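In code, tolerating unknown types usually comes down to a lookup with a fallback. The sketch below is one possible shape, with an invented renderer table; the spec does not define a registry, and the function names are assumptions for illustration.

```python
def render_table(fields, body):
    """Specialized rendering for a type this consumer happens to know."""
    return f"[table] {fields.get('title', '(untitled)')}"


def render_generic(fields, body):
    """Fallback used for every type the consumer has never seen."""
    return f"[{fields['type']}] {fields.get('title', '(untitled)')}"


# Illustrative only: a consumer-local mapping, not a central registry.
RENDERERS = {"BigQuery Table": render_table}


def render_concept(fields, body):
    # Unknown types fall through to the generic renderer instead of failing.
    renderer = RENDERERS.get(fields["type"], render_generic)
    return renderer(fields, body)


print(render_concept({"type": "Sev1 Incident Runbook", "title": "Checkout outage"}, ""))
```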
The spec, in detail: linking, indexing, logging, conformance
The interesting parts of the spec are the four mechanisms that turn a directory of files into a navigable, version-controllable knowledge corpus. They are small, they compose, and each one is the answer to a question that any team that has tried to maintain a wiki of more than a few hundred pages will recognize.
Cross-linking is the part that turns the directory into a graph. Concepts MAY link to other concepts using standard markdown links, in two forms. The first is bundle-relative (absolute within the bundle): a link like [customers table](/tables/customers.md) is resolved relative to the bundle root, which keeps it stable when the linking document is moved to another directory. The second is relative, the standard markdown ./other.md form. The spec recommends the absolute form because it survives reorganization. Link semantics are deliberately untyped: a link from concept A to concept B asserts a relationship, and the specific kind of relationship (parent/child, references, joins-with, depends-on) is conveyed by the surrounding prose, not by the link itself. Consumers that build a graph view typically treat all links as directed edges of an untyped relationship, and the untyped model is what keeps the spec from sliding into a schema. Consumers MUST tolerate broken links: a link whose target does not exist is not malformed, since it may simply represent not-yet-written knowledge. That tolerance is what makes the format usable while a bundle is being built.
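The two link forms and the broken-link rule can be sketched as a small resolver. This is a hedged illustration, not the reference implementation: it assumes an in-memory mapping of bundle-relative paths to markdown bodies, handles only inline links, and uses hypothetical function names.

```python
import posixpath
import re

# Matches inline markdown links such as [text](target). Images and
# reference-style links are ignored in this sketch.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def resolve_link(source_path, target):
    """Resolve a link target to a bundle-relative path, or None if external."""
    if "://" in target or target.startswith(("mailto:", "#")):
        return None
    target = target.split("#", 1)[0]  # drop any fragment
    if target.startswith("/"):
        # Bundle-relative form: resolved against the bundle root.
        return posixpath.normpath(target.lstrip("/"))
    # Relative form: resolved against the linking document's directory.
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(source_path), target)
    )


def build_edges(documents):
    """Return (edges, broken) for a {path: markdown_body} mapping."""
    edges, broken = [], []
    for source, body in documents.items():
        for target in LINK_RE.findall(body):
            resolved = resolve_link(source, target)
            if resolved is None:
                continue
            # A missing target is recorded, never treated as an error.
            bucket = edges if resolved in documents else broken
            bucket.append((source, resolved))
    return edges, broken


documents = {
    "tables/orders.md": (
        "Joins with the [customers table](/tables/customers.md) "
        "and [line items](./line_items.md)."
    ),
    "tables/customers.md": (
        "See [orders](orders.md) and the [docs](https://example.com/docs)."
    ),
}
edges, broken = build_edges(documents)
print(edges)
print(broken)
```

The link to line_items.md lands in `broken` rather than raising, which mirrors the idea that a dangling link may just point at a concept nobody has written yet.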
Indexing is the part that supports progressive disclosure. An index.md file MAY appear in any directory, including the bundle root, and it enumerates the directory's contents to support a human or agent seeing what is available before opening individual documents. Index files contain no frontmatter, with one exception: a bundle-root index.md may carry the okf_version declaration. The body uses one or more sections, each grouping concepts under a heading, with bullet-list entries that link to the concept's relative URL and pull in the concept's description from its frontmatter. Producers MAY generate index.md automatically; consumers MAY synthesize one on the fly when none is present. The progressive-disclosure pattern is the part that makes OKF workable for large corpora: an agent can read the root index.md, decide which subdirectory is relevant, read that index.md, decide which concept to open, and read the concept, without ever loading the entire bundle into context. For a 10,000-concept bundle, the pattern is the difference between a usable corpus and an unusable one.
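Synthesizing an index on the fly is mostly string assembly. The sketch below builds one from already-parsed frontmatter. Grouping sections by type and the choice of heading levels are assumptions made for the example, since the description above only calls for sections with bullet entries; the function name is hypothetical.

```python
from collections import defaultdict


def synthesize_index(directory_title, concepts):
    """Build index.md text from (filename, fields) pairs for one directory.

    Grouping by `type` is a choice made for this example, not something
    the format requires.
    """
    groups = defaultdict(list)
    for filename, fields in concepts:
        groups[fields["type"]].append((filename, fields))

    lines = [f"# {directory_title}", ""]
    for type_name in sorted(groups):
        lines.append(f"## {type_name}")
        lines.append("")
        for filename, fields in sorted(groups[type_name], key=lambda item: item[0]):
            # Fall back to the filename stem when no title is given.
            stem = filename[:-3] if filename.endswith(".md") else filename
            title = fields.get("title", stem)
            entry = f"- [{title}]({filename})"
            description = fields.get("description", "")
            if description:
                entry += f": {description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


index_text = synthesize_index("tables", [
    ("orders.md", {"type": "BigQuery Table", "title": "orders",
                   "description": "One row per order."}),
    ("customers.md", {"type": "BigQuery Table"}),
])
print(index_text)
```

An agent reading this output sees one line per concept, which is the whole point: it can decide what to open without loading any concept body.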
Logging is the part that supports version-control workflows. A log.md file MAY appear at any level of the hierarchy to record the history of changes to that scope. The format is a flat list of date-grouped entries, newest first, with date headings in ISO 8601 YYYY-MM-DD form. The log entries are prose, with a leading bold word (**Update**, **Creation**, **Deprecation**) as a convention rather than a requirement. The pattern is borrowed directly from the changelog conventions that the open-source community has been using for a decade, and the choice to keep the format prose rather than structured is the part that makes the log readable by humans, parseable by agents, and writable in a git commit message. The log is not authoritative (git is the source of truth); it is a denormalized, read-optimized view of the history that a bundle curator can produce on every release.
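Keeping a log newest-first is easy to get wrong by hand, so a small helper is a natural thing for a curator to script. The sketch below assumes `## YYYY-MM-DD` headings with bullet entries directly beneath them. That heading level and the bullet layout are guesses made for illustration, not details taken from the spec, and the function name is hypothetical.

```python
import datetime


def add_log_entry(log_text, day, kind, message):
    """Insert an entry under the heading for `day`, keeping newest first.

    Assumes `## YYYY-MM-DD` headings with bullet entries beneath them,
    which is one plausible rendering of the log format.
    """
    heading = f"## {day.isoformat()}"
    entry = f"- **{kind}** {message}"
    lines = log_text.splitlines()
    if heading in lines:
        # Same-day entry: place it directly under the existing heading.
        lines.insert(lines.index(heading) + 1, entry)
    else:
        # New day: insert before the first heading that is older than `day`.
        # ISO dates compare correctly as plain strings.
        position = len(lines)
        for i, line in enumerate(lines):
            if line.startswith("## ") and line[3:] < day.isoformat():
                position = i
                break
        lines[position:position] = [heading, entry, ""]
    return "\n".join(lines) + "\n"


log = "## 2026-06-12\n- **Creation** Initial bundle.\n"
updated = add_log_entry(log, datetime.date(2026, 6, 16), "Update", "Added refunds table.")
again = add_log_entry(updated, datetime.date(2026, 6, 16), "Deprecation", "Retired legacy_orders.")
print(again)
```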
Conformance is the part that pins down what it means to be OKF-compatible. A bundle is conformant with v0.1 if three conditions hold: every non-reserved .md file in the tree contains a parseable YAML frontmatter block, every frontmatter block contains a non-empty type field, and every reserved filename follows the structure described in §6 and §7 when present. Consumers SHOULD treat all other constraints as soft guidance, and consumers MUST NOT reject a bundle because of missing optional frontmatter fields, unknown type values, unknown additional frontmatter keys, broken cross-links, or missing index.md files. The permissive consumption model is the design choice that distinguishes OKF from a stricter schema-driven format (Protocol Buffers, Avro, OpenAPI), and the choice is deliberate: the value of a knowledge format is in how many parties speak it, and a format that rejects bundles for missing optional fields is a format that gets forked.
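A rough conformance check for the first two conditions fits in a page. The sketch below is an approximation under stated assumptions: it only verifies that a delimited frontmatter block exists and that it contains a non-empty `type:` line, without fully parsing the YAML, and it skips the third condition (the structure of index.md and log.md) entirely. The function name and the in-memory bundle are hypothetical.

```python
import re
from pathlib import PurePosixPath

RESERVED = {"index.md", "log.md"}
# A block opened by --- on the first line and closed by a later --- line.
FRONTMATTER_RE = re.compile(r"\A---\r?\n(.*?)\r?\n---\s*(?:\r?\n|\Z)", re.DOTALL)
TYPE_RE = re.compile(r"^type:[ \t]*(\S.*)$", re.MULTILINE)


def check_conformance(files):
    """Return a list of (path, problem) pairs for a {path: text} bundle.

    Covers the first two conditions only, and approximates "parseable
    YAML" by checking for the delimiters.
    """
    problems = []
    for path, text in sorted(files.items()):
        name = PurePosixPath(path).name
        if not path.endswith(".md") or name in RESERVED:
            continue
        match = FRONTMATTER_RE.match(text)
        if match is None:
            problems.append((path, "no frontmatter block"))
            continue
        type_match = TYPE_RE.search(match.group(1))
        value = type_match.group(1).strip().strip("'\"") if type_match else ""
        if not value:
            problems.append((path, "missing or empty type field"))
    return problems


bundle = {
    "index.md": "# Sales\n",
    "orders.md": "---\ntype: BigQuery Table\n---\nBody\n",
    "notes.md": "Just notes\n",
    "metrics/wau.md": "---\ntitle: WAU\n---\nBody\n",
    "extra.md": "---\ntype: Metric\nowner: growth\n---\n[gone](/missing.md)\n",
}
problems = check_conformance(bundle)
print(problems)
```

Note what the check does not flag: extra.md has an unknown key and a broken link, and neither counts against it.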
The versioning story is the part that determines whether OKF v0.1 is a starting point or a finished standard. The spec is versioned in the form <major>.<minor>, with a minor version bump introducing backward-compatible additions (new optional fields, new conventional section headings) and a major version bump reserved for breaking changes (renaming required fields, changing reserved filenames). Bundles MAY declare the OKF version they target by including okf_version: "0.1" in a bundle-root index.md frontmatter block, which is the only place frontmatter is permitted in an index.md. Consumers that do not understand the declared version SHOULD attempt best-effort consumption rather than refusing the bundle. The pattern is the same one the IETF, W3C, and WHATWG have been using for two decades, and the choice to be explicit about the version semantics is the part that makes OKF a candidate for actual standardization rather than a one-off format.
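The best-effort rule translates into a version check that can downgrade but never refuse. The sketch below is one hypothetical way a consumer might express that; the mode names and the `SUPPORTED` constant are invented for the example.

```python
SUPPORTED = (0, 1)  # the version this hypothetical consumer was written for


def parse_okf_version(value):
    """Parse a '<major>.<minor>' string into a tuple of ints."""
    major, _, minor = str(value).strip().strip("'\"").partition(".")
    return int(major), int(minor or 0)


def consumption_mode(declared):
    """Decide how to read a bundle; never refuse it outright."""
    if declared is None:
        # No declaration: treat the bundle as the version we know.
        return "assume-supported"
    try:
        version = parse_okf_version(declared)
    except ValueError:
        return "best-effort"
    if version[0] == SUPPORTED[0] and version[1] <= SUPPORTED[1]:
        return "full"
    # Newer minor versions only add optional things; newer majors may
    # break, but a best-effort read is still preferred over refusal.
    return "best-effort"


for declared in ("0.1", "0.2", "1.0", None):
    print(declared, consumption_mode(declared))
```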
The LLM-wiki pattern, named
The most useful thing the announcement does is give a name to a pattern that has been quietly emerging across the AI ecosystem for at least a year. The pattern is the practice of storing curated knowledge as markdown files with YAML frontmatter, organized in a directory tree, and letting agents (rather than search engines) traverse the structure. The pattern has been reappearing under different names, in different communities, for different reasons, and OKF is the first spec to claim the category rather than just one of the instances.
Andrej Karpathy's LLM Wiki gist is the conceptual source, and the blog post quotes it directly: "LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." The argument is that the bookkeeping that causes humans to abandon personal wikis (updating cross-references, fixing dead links, restructuring hierarchies) is exactly what LLMs are good at. The same insight appears in different forms across the AI tooling ecosystem. Obsidian vaults wired to coding agents are the most visible instance: a developer maintains a vault of notes about their codebase, an agent reads the vault before doing real work, the agent updates the vault as it learns, and the vault becomes a living memory of the project that grows more useful over time. The AGENTS.md and CLAUDE.md family of convention files is the same pattern in single-file form: a single markdown file at the repository root that an agent consults before doing real work, and that the agent updates as it learns the project conventions. The pattern also shows up in data teams as "metadata as code" repositories, where SQL table documentation, metric definitions, and runbooks live in version-controlled markdown files rather than in a separate metadata registry.
The reason the pattern has been reappearing is that it solves a problem that has become acute as agents have become the primary consumers of enterprise knowledge. Search-based retrieval (find me a document that contains the string weekly_active_users) breaks down when the agent needs to assemble a context from ten different documents, with cross-references between them, in a specific order, with a specific level of confidence in each piece. A directory of structured markdown files lets the agent navigate the structure: read the root index, decide which subdirectory is relevant, read the subdirectory index, decide which concept to open, read the concept, follow the cross-links. The navigation is explicit in the file system and explicit in the frontmatter, which is the part that makes it both human-readable and agent-navigable.
The reason the pattern has been a mess until now is that each instance is bespoke. Karpathy's wiki and your team's wiki and a vendor's catalog export may all look alike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer to what fields every document should carry, or what filenames mean what. As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built. OKF is the answer to the question "what is the smallest set of conventions that lets all of these instances cooperate," and the answer is small enough to fit in a 1,000-line spec.
The thing to notice about the pattern is that it is downstream of three older ideas, and the lineage is part of the reason the spec is so minimal. The first is the static-site-generator pattern (Hugo, Jekyll, MkDocs, Docusaurus), which has been rendering markdown + frontmatter for over a decade. The second is the personal-knowledge-management pattern (Obsidian, Notion, Roam, Logseq), which has been maintaining markdown + frontmatter + cross-links for personal use for almost as long. The third is the documentation-as-code pattern (docs as markdown in the same repo as the code, deployed on every release), which is now the default for almost every developer-facing product. OKF is the combination of all three, with the agent as the new consumer, and the spec is minimal because the underlying patterns are already minimal. The work was always there; OKF is the part that names it.
What Google is shipping: three reference artifacts, one strategic integration
The reference implementations are deliberately called "proofs of concept" in the README, and the framing is important: the format is the contribution, and the tools exist to make the format tangible at both ends of the producer/consumer axis. Google shipped four distinct artifacts in the same release, and each one is the answer to a specific question about how the format works in practice.
The enrichment agent is the producer end. It is a Python package (enrichment_agent, installed via python3.13 -m venv .venv && .venv/bin/pip install -e .[dev]) built on the Google Agent Development Kit with Gemini as the model backend. The agent runs in two passes. The first pass walks a BigQuery dataset, reads the schema and metadata for every table and view, and writes one OKF concept document per concept the source advertises. The second pass runs the LLM as its own crawler: it receives a list of seed URLs, fetches the seeds, and decides which outbound links are worth following based on whether they look like authoritative documentation for the existing concepts. For each page it fetches, the agent chooses to enrich one or more existing concept docs, mint a standalone references/<slug> doc, or skip. A hard --web-max-pages cap and a same-domain allowed-hosts filter are enforced inside the tool, so the agent cannot overrun. The CLI is one command:
.venv/bin/python -m enrichment_agent enrich \
--source bq \
--dataset <project>.<dataset> \
--web-seed-file <path/to/seeds.txt> \
--out ./bundles/<name>
The Source interface is designed to grow. BigQuery is the first source implementation; Dataplex, Unity Catalog, Collibra, and a database walker are all named in the README as targets, and the interface is the documented extension point for any new source.
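The two guardrails on the web pass, a hard page cap and a same-domain filter, are worth seeing in isolation. The sketch below is not the agent's code. It is a plain breadth-first crawl where `max_pages` plays the role of the page cap, the allowed hosts are derived from the seeds, and a caller-supplied `fetch` function stands in for both the HTTP request and the model's judgment about which links to follow. All names here are hypothetical.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def bounded_crawl(seeds, fetch, max_pages):
    """Breadth-first crawl limited by a page cap and a same-host filter.

    `fetch(url)` returns the list of outbound hrefs found on a page.
    """
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch(url):
            link = urljoin(url, href)
            if urlparse(link).netloc not in allowed_hosts:
                continue  # off-domain links are never followed
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A fake site so the example runs without network access.
SITE = {
    "https://docs.example.com/a": ["/b", "https://other.example.org/x", "/c"],
    "https://docs.example.com/b": ["/c", "/d"],
    "https://docs.example.com/c": [],
    "https://docs.example.com/d": [],
}


def fake_fetch(url):
    return SITE.get(url, [])


capped = bounded_crawl(["https://docs.example.com/a"], fake_fetch, max_pages=3)
print(capped)
```

Enforcing both limits inside the crawl loop, rather than trusting the caller, is the design point: whatever decides which links look promising cannot exceed the budget.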
The static HTML visualizer is the consumer end. It is a visualize subcommand on the same Python package, and it renders any OKF bundle as a self-contained interactive HTML file: one file, no backend, no install on the viewing side. The viewer shows a force-directed graph of every concept in the bundle, with nodes colored by type (datasets, tables, references) and directed edges drawn from each cross-link in the markdown bodies. A detail panel for the selected concept shows its frontmatter (description, resource link, tags) and its rendered markdown body, with internal links rewired to navigate within the viewer instead of following the path. A "Cited by" backlinks list is computed from the reverse of the link graph. A search box matches title, concept id, and tags, a type filter narrows the visible nodes, and switchable graph layouts (cose, concentric, breadth-first, circle, grid) let a curator explore the corpus in different ways. The visualization is rendered with cytoscape.js, embedded in a single self-contained HTML file, and the file can be opened locally, shared as an artifact, or hosted on a static file server.
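The "Cited by" list is just the link graph read backwards. As a minimal sketch, assuming edges are available as (source, target) pairs of bundle-relative paths, the inversion could look like this; the function name is hypothetical and this is not the visualizer's own code.

```python
from collections import defaultdict


def backlinks(edges):
    """Invert (source, target) edges into a {target: [sources]} mapping."""
    cited_by = defaultdict(list)
    for source, target in edges:
        # Several links between the same pair count as one citation.
        if source not in cited_by[target]:
            cited_by[target].append(source)
    return {target: sorted(sources) for target, sources in cited_by.items()}


link_edges = [
    ("tables/orders.md", "tables/customers.md"),
    ("metrics/wau.md", "tables/customers.md"),
    ("tables/orders.md", "tables/customers.md"),
]
cited = backlinks(link_edges)
print(cited)
```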
The three sample bundles are the live examples. The first is the GA4 e-commerce bundle, generated from the GA4 BigQuery Export public dataset and seeded with the canonical GA4 documentation URLs. The second is the Stack Overflow bundle, generated from the Stack Exchange Data Dump public dataset and seeded with the community's canonical schema references; it exercises multi-concept enrichment from cross-cutting docs pages. The third is the Bitcoin bundle, generated from the bitcoin-etl pipeline and seeded with the Bitcoin protocol documentation; it exercises cross-table foreign-key relationships in prose, the kind of knowledge that is hard to represent in a relational schema and easy to represent in a few sentences of markdown. Each sample pairs a recipe (the seed URLs and the exact enrich command, in samples/<name>/) with the produced bundle (in bundles/<name>/), so a developer can reproduce the bundle on their own machine with one command. The sample bundles are also the strongest evidence that the format works at non-trivial scale: the GA4 bundle has hundreds of concept documents and a richly connected link graph, and the visualizer renders it without performance issues.
The BigQuery Knowledge Catalog integration is the strategic move. The Knowledge Catalog is BigQuery's built-in metadata service, the surface that powers SQL autocomplete, schema discovery, and the integration with Vertex AI agents. As of the OKF launch, the Knowledge Catalog can ingest OKF bundles natively: a team can run the enrichment agent against their BigQuery dataset, produce an OKF bundle, and push it to the catalog without writing any custom integration code. The catalog then serves the OKF content to agents that query BigQuery, with the frontmatter becoming the indexable metadata and the body becoming the agent-readable context. The integration is documented separately, in the BigQuery documentation, rather than in the OKF spec, and the separation is deliberate: the spec is platform-neutral, and the integration is a Google-platform-specific convenience on top of it.
The four artifacts together are doing a different job than they look like they are doing. The enrichment agent is not a product, it is a reference implementation. The visualizer is not a product, it is a reference implementation. The sample bundles are not products, they are reference examples. The BigQuery integration is the only one of the four that is a product, and it is the part that is positioned as a value-add for BigQuery customers rather than a separate product launch. The architecture is the same one Google has used for Kubernetes, TensorFlow, and gRPC: publish the open spec, ship a high-quality reference implementation, and let the product integrations follow from the spec adoption. The strategy is well-tested, and the spec is well-positioned to be the kind of thing that becomes the default without ever being the only option.
The precedent: who already does this, and what changes
The interesting question for the next six months is not whether OKF is a good format: the spec is good, the format is minimal, and the reference implementations work. The interesting question is whether the pattern OKF names, the LLM-wiki pattern, was already coalescing into a community convention before Google published the spec, and whether the spec accelerates the convergence or fragments it.
The pattern was already coalescing, and the evidence is the list of communities that have independently arrived at the same shape. The Hugo community has been doing markdown + frontmatter + cross-links for static sites since 2013. The Obsidian community has been doing the same for personal knowledge management since 2020, with a plugin ecosystem that already includes OKF-shaped exports. The Docusaurus and Astro communities have been doing markdown + frontmatter for documentation sites since 2017 and 2021 respectively. The data-engineering community has been doing "metadata as code" in various forms (dbt docs, LookML views with descriptions, BigQuery INFORMATION_SCHEMA + hand-written markdown) for at least five years. The AI tooling community has been doing AGENTS.md / CLAUDE.md / context.md for the past eighteen months. The Karpathy LLM Wiki gist crystallized the pattern as a recommendation, and the Anthropic context engineering blog post formalized the principles in mid-2025. The pattern is the convergence point, and OKF is the first spec to claim it.
What changes with the spec is the answer to the question "what is the smallest set of conventions that lets all of these instances cooperate." Before OKF, the answer was "nothing, they all just look kind of similar." After OKF, the answer is "the v0.1 spec, which is 1,000 lines and fits in a single file." The change is not in the format itself, which was always there. The change is in the social fact that there is now a published spec that anyone can point to, that anyone can implement against, and that anyone can fork if they disagree with the design choices. The spec is a coordination point, and coordination points are the most valuable artifact in any ecosystem that has more than three participants.
The competitive question, the one that will play out over the next six months, is whether OKF becomes the lingua franca, or whether a competing spec emerges from a different corner of the AI ecosystem. The most credible competing spec candidates are Anthropic's context.md work (if it gets formalized beyond the engineering blog post), the MCP (Model Context Protocol) community's resource format (which is JSON-RPC based and not markdown), and the Cognee / LlamaIndex graph-store formats (which are JSON-based and database-shaped rather than file-shaped). Each of these has a constituency, and each of them is solving a slightly different problem: MCP is for runtime tool-calling, the graph-store formats are for vector-indexed retrieval, and OKF is for human-readable, version-controlled, agent-navigable corpora. The problems overlap but are not identical, and the most likely outcome is coexistence rather than consolidation. The reason is that the three formats solve three different problems, and the same team that uses MCP for tool-calling and LlamaIndex for retrieval will use OKF for the documentation that the agent reads to understand the context.
The strategic question for Google is whether the BigQuery Knowledge Catalog integration is enough to make OKF the default, or whether the ecosystem needs additional moats. The integration is a strong moat for BigQuery customers, but it does not help teams that run on Snowflake, Databricks, or Redshift. The reference implementations are deliberately platform-neutral, but the high-quality producer in the reference set (the enrichment agent) is BigQuery-only in v0.1. The bet is that the spec adoption will come from teams that already use BigQuery, that the BigQuery customers will produce the most useful OKF bundles, and that the bundles will pull in non-BigQuery consumers because the bundles are good. The bet is plausible, and the timeline is the part to watch: if Snowflake or Databricks publishes a competing spec within six months, the format war has begun; if no competing spec emerges within a year, OKF has won by default.
The thing that makes the bet plausible is the openness of the licensing and the design. OKF is published under Apache-2.0, the same license that has been the foundation of the modern open-source ecosystem (Kubernetes, TensorFlow, Swift, Apache Software Foundation projects). The permissive license is the part that makes the spec forkable, the part that makes alternative implementations welcome, and the part that makes the spec a candidate for adoption by teams that are philosophically allergic to anything that looks like a Google-controlled standard. The permissive license is also the part that makes the spec a poor fit for a hostile fork: a competing spec would have to be meaningfully different, not just differently branded, and the meaningful-difference threshold is high when the underlying spec is already this minimal.
What to watch
The next week and the next 24 weeks each have specific signals worth tracking. OKF v0.1 is a starting point, and the signals that decide whether it becomes a category-defining spec or a one-off format are all observable.
For the next week:
- The repo's first community PRs. The repository is at 3,300 stars, 213 forks, and 22 open issues as of the article date, and the first community PRs will set the tone for the spec's evolution. A PR that adds a new optional frontmatter field is a healthy signal (the spec is being extended in the way it was designed to be extended). A PR that adds a competing required field is a warning sign (the community is already forking the design). A PR that adds a new reference implementation in a non-BigQuery source (Snowflake, Databricks, Postgres) is the strongest possible signal of ecosystem traction.
- The first non-Google OKF bundle. The three sample bundles are all Google-produced. The first non-Google OKF bundle, posted to GitHub or Hacker News in the next seven days, is the most direct test of whether the spec is being adopted beyond the BigQuery ecosystem. The most likely source is a developer who has been maintaining a Karpathy-style LLM wiki and has been looking for a way to share the format; the most valuable source is a data team at a non-BigQuery shop that has decided to standardize on OKF.
- The first third-party visualizer. The reference visualizer is a single self-contained HTML file generated by the Google Python package. A third-party visualizer, written in a different language (TypeScript, Go, Rust) or using a different visualization library (d3, vis.js, sigma.js), is the strongest signal of consumer-side adoption. The first one to show up is likely to be a TypeScript port that uses vis-network or react-flow, because the developer audience for visualization libraries is heavily TypeScript.
- The first OKF mention in a non-Google AI tooling blog post. The blog post and the tweet are both Google. The first non-Google mention, in a Cursor blog post, a Claude Code release note, a LangChain changelog, a LlamaIndex announcement, or an OpenAI cookbook, is the first signal that the spec has crossed from "Google project" to "community standard." The most likely venue is a Cursor or Claude Code post about how the team is using OKF as the format for the tool's internal knowledge base.
For the next 24 weeks:
- An OKF v0.2 release. v0.1 is a draft, and the first v0.2 release is the first signal of how the spec is being shaped by community feedback. The v0.2 changelog is the most important read: which optional fields were added, which conventional section headings were promoted, which conformance criteria were tightened. A v0.2 that adds too many fields is a sign that the spec is losing its minimalism; a v0.2 that adds too few is a sign that the spec is being under-curated.
- A non-BigQuery source implementation. The enrichment agent is BigQuery-only in v0.1. The first non-BigQuery source implementation (Snowflake, Databricks, Postgres, MySQL, a generic SQL walker) is the first signal that the format is being adopted as a multi-vendor standard. The most likely source is a community PR to the existing repository, not a new repository, because the format name is more discoverable when it lives in one place.
- A BigQuery competitor's OKF-compatible product. Snowflake Cortex, Databricks Unity Catalog, Redshift Spectrum, or any other data-warehouse-adjacent product shipping an OKF-compatible ingestion path is the first signal that the spec has become a competitive battleground. The most likely timeline is Q4 2026 or Q1 2027, because the data-warehouse vendors have historically been slow to adopt open standards that were initiated by competitors.
- Anthropic's response. Anthropic has its own context engineering work, the Model Context Protocol, and a strong interest in the LLM-wiki pattern. The first Anthropic post that mentions OKF, either positively (as a useful complement to MCP) or negatively (as a format that does not meet Anthropic's needs), is the first signal of how a major AI lab is positioning relative to the spec. A positive mention would be a major endorsement; a negative mention would be the first shot in a format war.
- The first OKF bundle with more than 10,000 concepts. The reference bundles have hundreds of concepts. The first OKF bundle with more than 10,000 concepts is the first signal that the format works at the scale of a real enterprise. The most likely source is a data team that has decided to export an entire BigQuery INFORMATION_SCHEMA corpus to OKF, which is the natural use case for the BigQuery enrichment agent. The performance of the visualizer on a 10,000-concept bundle is the first signal of whether the format is viable for genuinely large corpora.
- An OKF v1.0 release with breaking changes. v0.1 is explicitly a draft. The first v1.0 release, with whatever breaking changes the curation process has accumulated, is the first signal of whether OKF is going to follow the IETF/W3C pattern of slow, careful, backward-compatible evolution, or the Kubernetes pattern of frequent, careful, backward-incompatible evolution. The breaking changes that v1.0 introduces are the first signal of which design choices the community is willing to defend and which ones the community is willing to revise.



