
Parsing the USN Journal in the Browser with Rust + WebAssembly

How we ship a full NTFS USN Journal parser to your browser as 105 KB of WebAssembly — and why "parse it client-side" is the only acceptable answer for forensic artefacts.


USN journals are inherently sensitive. A 100 MB $J file off a corporate workstation contains, record by record, the entire recent file history of that machine: every Word document the user touched, every executable that ran, every browser cache eviction. Uploading that to a SaaS to "analyse it for you" is something a forensic professional should never agree to, and something we never want to offer.

So we built usnparser.com the other way around: the parser runs in your browser. The file is read from disk into JavaScript, handed to a WebAssembly module, and the records come back without a single byte ever leaving your machine.

This post is a technical walkthrough of how that works.

The Rust crate

We didn't reinvent the wheel: the actual parsing logic comes from usnrs, Airbus CERT's clean implementation of USN_RECORD_V2. It already accepted any Read + Seek source, which is exactly what we needed: std::io::Cursor<Vec<u8>> implements both traits, so we can feed it raw bytes copied out of a JS Uint8Array.
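
To make the Read + Seek requirement concrete, here is a minimal, self-contained sketch (plain std, no usnrs; the function name is ours) of wrapping a byte buffer in a Cursor and seeking into it, the same way the wrapper crate feeds journal bytes to the parser:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Any `Read + Seek` source works; a Cursor over owned bytes is the
// simplest one, and it is what we build from the incoming Uint8Array.
fn read_u32_le_at(src: &mut (impl Read + Seek), offset: u64) -> std::io::Result<u32> {
    src.seek(SeekFrom::Start(offset))?;
    let mut buf = [0u8; 4];
    src.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn main() -> std::io::Result<()> {
    // Pretend these bytes arrived from JS as a Uint8Array.
    let bytes = vec![0x00, 0x00, 0x60, 0x00, 0x00, 0x00, 0x02, 0x00];
    let mut cursor = Cursor::new(bytes);
    // Read the little-endian u32 at offset 4 (0x00020000 = 131072).
    let value = read_u32_le_at(&mut cursor, 4)?;
    println!("{value}");
    Ok(())
}
```

A real USN record starts with exactly this kind of fixed-offset little-endian header (record length, major/minor version), which is why an in-memory seekable reader is all the parser needs.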

The wrapper crate is around 60 lines of Rust:

#[wasm_bindgen(js_name = parseUsn)]
pub fn parse_usn(
  usn_bytes: &[u8],
  mft_bytes: Option<Box<[u8]>>,
) -> Result<JsValue, JsValue> {
  // Wrap the journal bytes in an in-memory Read + Seek source.
  let usn = Cursor::new(usn_bytes.to_vec());
  // $MFT is optional; when present it enables full-path resolution.
  let mft = mft_bytes
    .map(|b| MftParser::from_buffer(b.into_vec()))
    .transpose()?;
  let iter = Usn::new(mft, usn, None)?;
  let records: Vec<UsnRecord> = iter.map(into_record).collect();
  // serde_wasm_bindgen's error converts into a JsValue, so `?` works here.
  Ok(serde_wasm_bindgen::to_value(&records)?)
}

It accepts the journal bytes and, optionally, the $MFT bytes for full-path resolution. The whole thing compiles to ~105 KB of .wasm after wasm-opt.

Three Cargo gotchas

Getting usnrs plus its transitive deps to compile cleanly for wasm32-unknown-unknown took three small workarounds:

  1. getrandom needs the js feature on wasm32. The rand crate (pulled in by mft) depends on it transitively, and without the JS backend the wasm build fails with "no available getrandom backend". We force it in our Cargo.toml:
    [target.'cfg(target_arch = "wasm32")'.dependencies]
    getrandom = { version = "0.2", features = ["js"] }
    
  2. chrono needs wasmbind when the clock feature is enabled, otherwise it tries to call time(2). We add it via our direct dep declaration.
  3. Disabling mft's default features turned out not to be necessary in this case: the mft_dump feature only adds optional CLI deps that still cross-compile.
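
Putting the first two fixes together, the relevant Cargo.toml sections look roughly like this (version numbers are illustrative; use whatever your lockfile resolves):

```toml
# Gotcha 1: force the JS backend for getrandom on wasm32.
[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { version = "0.2", features = ["js"] }

# Gotcha 2: chrono's clock feature needs wasmbind on wasm32,
# otherwise reading the current time fails at runtime.
[dependencies]
chrono = { version = "0.4", features = ["clock", "wasmbind"] }
```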

The browser-side glue

We compile with wasm-pack build --target web --out-dir public/wasm, which produces a small ES-module JS shim plus the .wasm binary. Both live under /public/wasm/ so they are served as static assets at known URLs.

The parser lives in a Web Worker, so the main thread stays responsive while it chews through a million records:

// public/workers/parse.js
// Must be spawned as a module worker (new Worker(url, { type: "module" }))
// so that static imports and top-level await are available.
import init, { parseUsn } from "/wasm/usn_wasm.js";

await init();
self.onmessage = (event) => {
  const { usnBytes, mftBytes } = event.data;
  const records = parseUsn(
    new Uint8Array(usnBytes),
    mftBytes ? new Uint8Array(mftBytes) : null,
  );
  self.postMessage({ type: "result", records });
};

This is the only place wasm and the worker meet. Neither the worker nor the wasm goes through Next.js's bundler, which means no webpack/Turbopack config.

Numbers

A representative 60 MB $J from a Windows 11 workstation:

  • Parse time: ~1.4 s on a recent MacBook
  • Memory: transient, freed when the worker is terminated
  • Records produced: ~720k
  • Wire bytes leaving your machine: 0

The UI then virtualises the table with TanStack Virtual, so even a million-row result feels instant.

What about huge journals?

For journals north of 500 MB we'd switch to a streaming API that yields batches of records rather than accumulating them all in one Vec. The change is small: Usn is already an Iterator, so we'd just expose a next_batch(n) on the wasm side. We haven't shipped it because we don't yet have feedback that anyone needs it. If you do, open an issue.
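
The batching idea is ordinary iterator plumbing. A hedged sketch in plain Rust (the Batched type and next_batch name are ours; the real wasm binding would hold the Usn iterator and return each batch as a JsValue):

```rust
// Generic batcher: pull up to `n` items from any iterator per call.
// On the wasm side this struct would own the `Usn` iterator between calls.
struct Batched<I: Iterator> {
    inner: I,
}

impl<I: Iterator> Batched<I> {
    fn new(inner: I) -> Self {
        Self { inner }
    }

    // Returns an empty Vec once the underlying iterator is exhausted,
    // which the JS side can use as its "done" signal.
    fn next_batch(&mut self, n: usize) -> Vec<I::Item> {
        self.inner.by_ref().take(n).collect()
    }
}

fn main() {
    let mut batches = Batched::new(0..7);
    assert_eq!(batches.next_batch(3), vec![0, 1, 2]);
    assert_eq!(batches.next_batch(3), vec![3, 4, 5]);
    assert_eq!(batches.next_batch(3), vec![6]);
    assert!(batches.next_batch(3).is_empty());
}
```

Because each call returns at most n records, peak memory stays proportional to the batch size instead of the journal size, at the cost of one JS-to-wasm round trip per batch.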

Why this matters

Forensic tooling has historically been split between heavy desktop suites (X-Ways, EnCase) and Python scripts you have to install and trust with sensitive data. There's a third lane: open, inspectable, runs entirely client-side. WebAssembly is fast enough now that there's no performance excuse to leave that lane unbuilt.