Benchmarks are only useful if they drive real improvements. And a benchmark that rewards engines for optimizing specifically for itself — rather than for actual applications — becomes counterproductive over time.
That's the core problem JetStream 3 solves. Released last week by engineers from WebKit, Google, and Mozilla, it's the first major revision of the JetStream suite since 2019. The web has changed dramatically in seven years, and the old benchmark had started showing its age in ways that actually hurt performance progress.
The Microbenchmark Trap
JetStream 2 scored WebAssembly in two phases: a single-iteration Startup measurement and a longer Runtime measurement. The idea was reasonable when the benchmark was designed. Early Wasm adopters were compiling large C and C++ applications (games, codecs), and users would tolerate a one-time startup cost in exchange for sustained throughput.
But engines got fast. Really fast. WebKit optimized the Wasm instantiation path so aggressively that, for smaller workloads, startup time effectively hit zero. JetStream 2 timed with Date.now(), which only has whole-millisecond resolution, so anything that finished within the same millisecond tick registered as 0ms. The scoring formula Score = 5000 / Time then divided by zero, which in JavaScript yields Infinity.
The team patched this by clamping the score to 5000, but it was a clear signal that the benchmark methodology had outgrown its subject matter. An "infinite" score tells you nothing useful about how an engine handles real workloads. More importantly, a near-zero startup number says nothing about what happens after instantiation: the actual work your application does.
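The failure mode is easy to reproduce. The TypeScript sketch that follows stands in for Date.now() with Math.floor, since both drop sub-millisecond precision. The helper names are hypothetical, and this is not JetStream 2's harness code.

```typescript
// Sketch only: shows how a whole-millisecond clock turns a fast startup into a
// division by zero. Not JetStream 2's actual harness code.
const SCORE_CONSTANT = 5000; // from Score = 5000 / Time

// Stand-in for Date.now(): the fractional part of the true time is dropped.
function millisecondClock(trueTimeMs: number): number {
  return Math.floor(trueTimeMs);
}

function naiveScore(elapsedMs: number): number {
  return SCORE_CONSTANT / elapsedMs; // 5000 / 0 evaluates to Infinity
}

function clampedScore(elapsedMs: number): number {
  // The patch described above: never report more than 5000.
  return Math.min(naiveScore(elapsedMs), SCORE_CONSTANT);
}

// A 0.3 ms instantiation that starts and finishes inside the same millisecond tick.
const startMs = 1000.2;
const endMs = 1000.5;
const elapsedMs = millisecondClock(endMs) - millisecondClock(startMs);

console.log(elapsedMs, naiveScore(elapsedMs), clampedScore(elapsedMs)); // 0 Infinity 5000
```

If the same 0.3 ms startup happens to straddle a tick boundary (say 1000.9 to 1001.2), it measures 1 ms and scores exactly 5000. With the clamp in place, every startup shorter than a millisecond therefore scores 5000, so the metric can no longer tell a fast engine from a faster one.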
Unified Scoring for Wasm
JetStream 3 retires the split startup/runtime model and adopts the same methodology used for JavaScript benchmarks. Every Wasm workload now runs across multiple iterations, capturing:
- First Iteration — compilation and initial setup
- Worst Case Iterations — jank, GC pauses, and tiering spikes
- Average Case Iterations — sustained throughput
These are geometrically averaged into a single subtest score, which feeds into the geometric mean of the full benchmark. Engines are now incentivized to optimize the entire lifecycle of a Wasm instance, not just instantiation.
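The three-part scoring can be sketched in TypeScript as follows. The details are assumptions, not JetStream 3's published harness: each phase is scored with the 5000 / Time formula, "worst case" is taken as the mean of the four slowest iterations after the first (WORST_COUNT is a hypothetical parameter), and "average case" as the mean of all iterations after the first.

```typescript
// Sketch of a JetStream-style subtest score. Assumptions, not confirmed for
// JetStream 3: each phase score is 5000 / time, "worst case" is the mean of the
// WORST_COUNT slowest iterations after the first, and "average case" is the
// mean of all iterations after the first.
const SCORE_CONSTANT = 5000;
const WORST_COUNT = 4; // hypothetical

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// exp(mean(log x)) equals the nth root of the product but avoids overflow.
function geometricMean(values: number[]): number {
  return Math.exp(mean(values.map((v) => Math.log(v))));
}

interface SubtestScore {
  first: number;
  worst: number;
  average: number;
  total: number;
}

function subtestScore(iterationTimesMs: number[]): SubtestScore {
  if (iterationTimesMs.length < 2) {
    throw new Error("need a first iteration plus at least one more");
  }
  const [firstTime, ...rest] = iterationTimesMs;
  const descendingTimes = [...rest].sort((a, b) => b - a); // slowest first
  const worstTime = mean(descendingTimes.slice(0, WORST_COUNT));
  const averageTime = mean(rest);

  const first = SCORE_CONSTANT / firstTime;
  const worst = SCORE_CONSTANT / worstTime;
  const average = SCORE_CONSTANT / averageTime;
  return { first, worst, average, total: geometricMean([first, worst, average]) };
}

// The overall score combines subtest totals the same way.
function benchmarkScore(subtestTotals: number[]): number {
  return geometricMean(subtestTotals);
}

// A slow first iteration (compilation), one spike (GC or tiering), otherwise steady.
const result = subtestScore([50, 10, 10, 10, 10, 10, 20]);
console.log(result.first, result.worst); // 100 400
```

Using a geometric mean makes improvements relative: doubling any one subtest's score raises the total by the same factor no matter which subtest it is, so no single long-running workload dominates the result.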
Real Languages, Real Toolchains
JetStream 3's Wasm workloads are compiled from five source languages: C++, C#, Dart, Java, and Kotlin. This reflects how Wasm is actually used in production: not just C++ game engines, but garbage-collected languages like Dart (the language behind Flutter) and Kotlin running on WasmGC, alongside languages such as Rust for performance-critical modules.
The new workloads exercise Wasm features that JetStream 2 barely touched:
- WasmGC — garbage-collected heap allocations (structs, arrays) enabling idiomatic patterns from high-level languages
- SIMD — single instruction, multiple data for parallel data processing
- Exception Handling — structured exception throwing and catching
JavaScript coverage was updated too: Promises and async functions, modern RegExp features, and public/private class fields. Several asm.js workloads were removed — the technology has been superseded by WebAssembly and was distorting the scoring.
The Engineering Behind WebKit's Gains
The WebKit team published a detailed breakdown of their JavaScriptCore optimizations targeting JetStream 3. The results are substantial:
GC allocation inlining: WasmGC programs create millions of small objects, and the original JSC implementation called a C++ function for every allocation. Two changes delivered a ~40% improvement on the WasmGC subtests:
- Changed object layout so structs and arrays store field data inline after the header in a single allocation, eliminating the second allocation and pointer indirection
- Inlined the allocation fast path directly into generated machine code — a short instruction sequence that bumps a pointer, writes the header, and returns without leaving generated code
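To make the two changes concrete, here is a TypeScript sketch of a bump-pointer allocator over a fixed block. Everything in it is hypothetical (BumpBlock, the one-word header holding a type id, word-indexed addresses). It models the idea, not JSC's heap or the machine code its compilers emit. Fields live directly after the header, and the fast path is just a capacity check, a pointer bump, and a header write.

```typescript
// Hypothetical model of inline-field layout plus a bump-pointer fast path.
// A real engine emits a few machine instructions for this; here an Int32Array
// stands in for a heap block and an object's "address" is a word index.
const HEADER_WORDS = 1; // assumption: one header word holding a type id

class BumpBlock {
  private readonly words: Int32Array;
  private top = 0; // index of the next free word

  constructor(capacityWords: number) {
    this.words = new Int32Array(capacityWords);
  }

  // Fast path: check space, bump the pointer, write the header, return.
  // The fields follow the header in the same allocation, so there is no
  // second allocation and no pointer to a separate field buffer.
  allocateStruct(typeId: number, fieldCount: number): number {
    const sizeWords = HEADER_WORDS + fieldCount;
    if (this.top + sizeWords > this.words.length) {
      return this.slowPath();
    }
    const address = this.top;
    this.top += sizeWords;
    this.words[address] = typeId;
    return address;
  }

  typeIdOf(address: number): number {
    return this.words[address];
  }

  // Field i sits at a fixed offset from the object's start: one load, no indirection.
  getField(address: number, index: number): number {
    return this.words[address + HEADER_WORDS + index];
  }

  setField(address: number, index: number, value: number): void {
    this.words[address + HEADER_WORDS + index] = value;
  }

  // Where generated code would call into the runtime (collect garbage, grab a
  // fresh block). Not modeled in this sketch.
  private slowPath(): never {
    throw new RangeError("block exhausted: slow path not modeled");
  }
}

const block = new BumpBlock(8);
const point = block.allocateStruct(7, 2); // header + 2 fields = 3 words
block.setField(point, 0, 3);
block.setField(point, 1, 4);
console.log(point, block.typeIdOf(point), block.getField(point, 1)); // 0 7 4
```

Under the earlier layout, each struct would have needed a second allocation for its field data plus a pointer to it from the header, and every field access would have paid for that extra load.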
Type display inlining: WasmGC languages rely heavily on runtime type checks (casts, instanceof tests, indirect function calls). WebKit implemented Cohen's type display algorithm and inlined it into both their baseline (BBQ) and optimizing (OMG) compilers. They also embedded the first six display entries directly in each type record so shallow hierarchies require no pointer indirection and stay within a single cache line.
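Cohen's display works here because a WasmGC type declares at most one direct supertype, so every type has a single chain of ancestors. Each type records its depth in that chain and a "display" listing its ancestors by depth, so testing "is T a subtype of S?" becomes one bounds check and one comparison. The TypeScript sketch that follows is a hypothetical model: TypeInfo, INLINE_DISPLAY, and the inline/overflow split are illustrative, not JSC's data structures. It mirrors the six inline entries by keeping depths 0 to 5 in the record and deeper ancestors in a separate array.

```typescript
// Hypothetical model of Cohen's display-based subtype check; not JSC code.
const INLINE_DISPLAY = 6; // entries stored in the type record itself

class TypeInfo {
  readonly depth: number;
  private readonly inlineDisplay: TypeInfo[]; // ancestors at depths 0..5
  private readonly overflowDisplay: TypeInfo[]; // depths 6+, one extra indirection

  constructor(readonly name: string, parent?: TypeInfo) {
    // The display is the parent's display plus this type at the next depth.
    const display = parent ? [...parent.fullDisplay(), this] : [this];
    this.depth = display.length - 1;
    this.inlineDisplay = display.slice(0, INLINE_DISPLAY);
    this.overflowDisplay = display.slice(INLINE_DISPLAY);
  }

  displayAt(depth: number): TypeInfo {
    return depth < INLINE_DISPLAY
      ? this.inlineDisplay[depth]
      : this.overflowDisplay[depth - INLINE_DISPLAY];
  }

  fullDisplay(): TypeInfo[] {
    return [...this.inlineDisplay, ...this.overflowDisplay];
  }
}

// Constant time: t is a subtype of s exactly when s appears in t's display
// at s's own depth.
function isSubtype(t: TypeInfo, s: TypeInfo): boolean {
  return t.depth >= s.depth && t.displayAt(s.depth) === s;
}

const root = new TypeInfo("Object");
const shape = new TypeInfo("Shape", root);
const circle = new TypeInfo("Circle", shape);
console.log(isSubtype(circle, shape), isSubtype(shape, circle)); // true false
```

Types at depth 6 or deeper still work; checking against a deep ancestor just costs one extra load into the overflow array, which is why keeping the common shallow cases inline pays off.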
Eliminating GC destructor overhead: Previously, every WasmGC object held a reference-counted pointer to its type definition, so each object needed a destructor when the collector swept it, and that destructor had to decrement the reference count under a global lock. Restructuring type information to use the garbage collector's existing Structure mechanism eliminated these destructors entirely, delivering another ~40% on the Dart-flute-wasm subtest.
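A toy cost model makes the difference concrete. The TypeScript sketch that follows is not JSC code: Cell, RefCountedTypeDef, GcStructure, and the global-lock counter are hypothetical stand-ins, and it only models the destructor-time decrement described above. It counts how many global-lock acquisitions the collector needs to sweep a batch of dead objects under each design.

```typescript
// Toy cost model with hypothetical names, not JSC source. JavaScript is
// single-threaded, so the "lock" is only a counter standing in for a
// process-wide mutex acquisition.
let globalLockAcquisitions = 0;

function withGlobalLock(action: () => void): void {
  globalLockAcquisitions++;
  action();
}

// Base class for heap objects; only some kinds define a destructor.
class Cell {
  destroy?(): void;
}

// Before: each object keeps its type definition alive with a reference count.
class RefCountedTypeDef {
  refCount = 0;
}

class ObjectWithDestructor extends Cell {
  constructor(readonly typeDef: RefCountedTypeDef) {
    super();
    typeDef.refCount++;
  }

  // Runs when the collector finds the object dead.
  destroy(): void {
    withGlobalLock(() => {
      this.typeDef.refCount--;
    });
  }
}

// After: type information lives in a GC-managed cell (a "Structure"), kept
// alive by tracing, so dead objects need no per-object cleanup.
class GcStructure extends Cell {}

class ObjectWithStructure extends Cell {
  constructor(readonly structure: GcStructure) {
    super();
  }
}

// Sweeping calls the destructor only on cells that define one and reports
// how many global-lock acquisitions that cost.
function sweep(dead: Cell[]): number {
  const before = globalLockAcquisitions;
  for (const cell of dead) cell.destroy?.();
  return globalLockAcquisitions - before;
}
```

In this model, sweeping a million dead objects costs a million lock acquisitions in the first design and none in the second.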
Why This Matters for JavaScript Developers
Browser benchmarks sound like browser-engineer trivia, but they have direct practical consequences. When a benchmark's workloads are representative, the optimizations engines make for it are real and carry over to the applications those workloads stand in for.
JetStream 3's shift away from microbenchmarks toward larger, longer-running workloads means the optimizations engines pursue are more likely to be the ones that matter in production. A 40% improvement on a WasmGC subtest should make Flutter web apps, Kotlin-to-Wasm tools, and other applications that use Wasm for computation-intensive work run faster in Safari, though the real-world gain depends on how closely an app's hot paths resemble the subtests.
The collaboration between the three major engines is also notable. JetStream 3 uses an open governance model, with contributions pooled in a shared GitHub repository. The goal is a benchmark that all engines have incentive to optimize for honestly — which is ultimately what makes it useful to developers.
JetStream 3 is available now at browserbench.org.