JetStream 3: The Benchmark That Actually Reflects How Modern Web Apps Run

lschvn · 6 min read

Benchmarks are only useful if they drive real improvements. And a benchmark that rewards engines for optimizing specifically for itself — rather than for actual applications — becomes counterproductive over time.

That's the core problem JetStream 3 solves. Released last week by engineers from WebKit, Google, and Mozilla, it's the first major revision of the JetStream suite since 2019. The web has changed dramatically in seven years, and the old benchmark had started showing its age in ways that actually hurt performance progress.

The Microbenchmark Trap

The original JetStream 2 scored WebAssembly in two phases: a single-iteration Startup measurement and a longer Runtime measurement. The idea was reasonable when the benchmark was designed — early Wasm adopters were compiling large C and C++ applications (games, codecs) where users would tolerate a one-time startup cost for sustained throughput.

But engines got fast. Really fast. WebKit optimized the Wasm instantiation path so aggressively that for smaller workloads, startup time effectively hit zero milliseconds. And because JetStream 2 used Date.now() for timing — which rounds down — sub-millisecond times registered as 0ms. The scoring formula Score = 5000 / Time then produced infinity.

The team patched this by clamping the score to 5000, but it was a clear signal: the benchmark methodology had outgrown its subject matter. An "infinite" score tells you nothing useful about how an engine handles real workloads. More importantly, a zero startup time in a microbenchmark ignores what happens after instantiation — the actual work your application does.
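The arithmetic of the failure is easy to reproduce in plain JavaScript (function names here are illustrative, not JetStream's actual source):

```javascript
// JetStream 2 timed each phase with Date.now(), which returns whole
// milliseconds — so a sub-millisecond run measures as 0 ms.
function rawScore(timeMs) {
  return 5000 / timeMs; // Score = 5000 / Time; Infinity when timeMs === 0
}

// The patch: clamp so a 0 ms measurement caps out at 5000 instead of
// propagating Infinity into the overall score.
function clampedScore(timeMs) {
  return Math.min(rawScore(timeMs), 5000);
}
```

The clamp keeps the scoring arithmetic finite, but every engine that hits it gets the identical capped score, which is exactly why the measurement stopped being informative.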

Unified Scoring for Wasm

JetStream 3 retires the split startup/runtime model and adopts the same methodology used for JavaScript benchmarks. Every Wasm workload now runs across multiple iterations, capturing:

  • First Iteration — compilation and initial setup
  • Worst Case Iterations — jank, GC pauses, and tiering spikes
  • Average Case Iterations — sustained throughput

These are geometrically averaged into a single subtest score, which feeds into the geometric mean of the full benchmark. Engines are now incentivized to optimize the entire lifecycle of a Wasm instance, not just instantiation.
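Geometric averaging means no single phase can be sacrificed to inflate another: a near-zero component drags the whole subtest score toward zero. A minimal sketch (the component values are made up for illustration):

```javascript
// Geometric mean: the nth root of the product of n scores, computed in
// log space to avoid overflow on larger inputs.
function geomean(scores) {
  const logSum = scores.reduce((sum, s) => sum + Math.log(s), 0);
  return Math.exp(logSum / scores.length);
}

// Hypothetical component scores for a single Wasm subtest:
const subtestScore = geomean([
  120, // First Iteration (compilation and initial setup)
  80,  // Worst Case Iterations (jank, GC pauses, tiering spikes)
  200, // Average Case Iterations (sustained throughput)
]);
```

Because the result always falls between the smallest and largest component, an engine can only raise its subtest score by improving the lifecycle as a whole.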

Real Languages, Real Toolchains

JetStream 3's Wasm workloads are compiled from five source languages: C++, C#, Dart, Java, and Kotlin. This reflects how Wasm is actually used in production — not just C++ game engines and codecs, but high-level garbage-collected languages compiled via WasmGC, the path used by modern web frameworks such as Flutter.

The new workloads exercise Wasm features that JetStream 2 barely touched:

  • WasmGC — garbage-collected heap allocations (structs, arrays) enabling idiomatic patterns from high-level languages
  • SIMD — single instruction, multiple data for parallel data processing
  • Exception Handling — structured exception throwing and catching

JavaScript coverage was updated too: Promises and async functions, modern RegExp features, and public/private class fields. Several asm.js workloads were removed — the technology has been superseded by WebAssembly and was distorting the scoring.
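A toy snippet touching each newly covered area — async functions over Promises, named RegExp capture groups, and private class fields (purely illustrative, not an actual JetStream workload):

```javascript
// Private class fields: #count is inaccessible outside the class body.
class Counter {
  #count = 0;
  increment() { return ++this.#count; }
}

// Modern RegExp: named capture groups.
const match = "2019-09-27".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);

// async/await layered over Promises.
async function double(x) {
  return (await Promise.resolve(x)) * 2;
}
```

Each of these features has its own fast paths in modern engines, so covering them in the benchmark rewards optimizations that real application code exercises daily.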

The Engineering Behind WebKit's Gains

The WebKit team published a detailed breakdown of their JavaScriptCore optimizations targeting JetStream 3. The results are substantial:

GC allocation inlining: WasmGC programs create millions of small objects. The original JSC implementation called a C++ function for every allocation. Two changes delivered ~40% improvement on WasmGC subtests:

  1. Changed object layout so structs and arrays store field data inline after the header in a single allocation, eliminating the second allocation and pointer indirection
  2. Inlined the allocation fast path directly into generated machine code — a short instruction sequence that bumps a pointer, writes the header, and returns without leaving generated code
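The fast path in step 2 is a bump allocator. A rough JavaScript model of the idea (the real JSC version is inlined machine code; all names and sizes here are illustrative):

```javascript
// Model the allocation fast path: a contiguous buffer, a bump pointer,
// and a slow-path fallback when the current block is exhausted.
const HEAP_SIZE = 1024;
const HEADER_BYTES = 4; // illustrative header size
const heap = new Uint8Array(HEAP_SIZE);
let bumpPtr = 0;

function allocate(payloadBytes) {
  const total = HEADER_BYTES + payloadBytes;
  // Fast path: one bounds check, bump the pointer, write the header.
  if (bumpPtr + total <= HEAP_SIZE) {
    const addr = bumpPtr;
    bumpPtr += total;
    heap[addr] = 0x01; // header tag (illustrative); fields follow inline
    return addr;
  }
  // Slow path: a real engine calls out of generated code into the
  // runtime to fetch a fresh block or trigger a collection.
  throw new Error("slow path: block exhausted");
}
```

The point of the optimization is that the fast path is a handful of straight-line instructions, so the per-object cost of a C++ runtime call disappears for the common case.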

Type display inlining: WasmGC languages rely heavily on runtime type checks (casts, instanceof tests, indirect function calls). WebKit implemented Cohen's type display algorithm and inlined it into both their baseline (BBQ) and optimizing (OMG) compilers. They also embedded the first six display entries directly in each type record so shallow hierarchies require no pointer indirection and stay within a single cache line.
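In Cohen's scheme, each type carries an array of its ancestors indexed by inheritance depth, so a subtype check is one bounds check plus one load-and-compare, regardless of hierarchy depth. A minimal sketch (names are illustrative):

```javascript
// Each type stores its ancestor chain (its "display") indexed by depth:
// display[0] is the root, display[type.depth] is the type itself.
function makeType(name, parent) {
  const type = { name, depth: parent ? parent.depth + 1 : 0 };
  type.display = parent ? [...parent.display, type] : [type];
  return type;
}

// sub <: sup iff sup appears at sup's own depth in sub's display —
// a bounds check plus a single pointer comparison.
function isSubtype(sub, sup) {
  return sup.depth <= sub.depth && sub.display[sup.depth] === sup;
}

const animal = makeType("Animal", null);
const cat = makeType("Cat", animal);
const dog = makeType("Dog", animal);
```

Embedding the first six display entries in the type record, as WebKit did, means this comparison usually needs no extra pointer chase at all.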

Eliminating GC destructor overhead: Previously, every WasmGC object held a reference to its type definition and ran a destructor on destruction — which had to decrement the reference count under a global lock. Restructuring type information to use the garbage collector's existing Structure mechanism eliminated destructors entirely, delivering another ~40% on the Dart-flute-wasm subtest.

Why This Matters for JavaScript Developers

Browser benchmarks sound like browser-engineer trivia, but they have direct practical consequences. When engines optimize for benchmarks, all web applications benefit — the optimizations are real, the workloads are just a proxy.

JetStream 3's shift away from microbenchmarks toward larger, longer-running workloads means the optimizations engines pursue will be the ones that matter in production applications. A 40% improvement on a WasmGC subtest means Flutter web apps, Kotlin-to-Wasm tools, and any application using Wasm for computation-intensive tasks will run faster in Safari.

The collaboration between the three major engines is also notable. JetStream 3 uses an open governance model, with contributions pooled in a shared GitHub repository. The goal is a benchmark that all engines have incentive to optimize for honestly — which is ultimately what makes it useful to developers.

JetStream 3 is available now at browserbench.org.
