An SBOM workflow that would have caught log4j in 45 minutes

The Friday afternoon Apache published the original log4j advisory (CVE-2021-44228), the median enterprise spent the following 72 hours doing two things: running grep -r "log4j" . against unknown checkouts, and assembling a spreadsheet of which engineering team owned which Java service. We know this because, in the four years since, we have interviewed twenty-seven security teams about that weekend.

Almost all of them have since told board-level leadership that they have "an SBOM programme." Almost none of them, when pressed, can answer the operational question that matters: given a brand-new CVE in any indirect dependency, how long does it take you to produce a confidence-graded list of every running container that contains the affected code, including the exact image digest and the responsible service owner?

CipherGuard has spent the last six months sitting alongside two organisations that can answer that question — a mid-tier US bank and a European logistics platform — and the surprising finding is that the pipeline is not particularly hard. It just requires giving up three popular SBOM beliefs.

Belief 1: "Generating the SBOM is the hard part"

It is not. Generating an SBOM has been a solved problem since Syft 0.40 shipped in late 2022. The hard part is making the SBOM deterministic and addressable.

A typical CI pipeline runs syft packages dir:. -o cyclonedx-json > sbom.json and uploads the artefact alongside the build. That is fine for compliance theatre and useless for incident response. To be useful, the SBOM has to be:

Reproducible from the source revision alone: given a git SHA, regenerating the SBOM must produce a byte-identical document. This requires fixing tool versions (Syft, the scanner, the JSON serialiser), pinning the platform fingerprint, and stripping non-deterministic fields such as timestamps and the randomly generated CycloneDX serialNumber from the output.
Bound to a specific OCI image digest, not a tag — sha256:bf8e9a… not v1.7.3 or latest. The bank we worked with uses in-toto attestations signed with Sigstore to bind the two. Crucially, this binding survives any retag, mirror, or copy.
Stored in a queryable graph, not a blob store: every SBOM gets ingested into a graph database (the logistics platform uses Dgraph, the bank uses Neo4j) keyed by (image_digest, package_purl, version, license). This is the single non-negotiable that everyone skips, and it is the one thing that makes a 45-minute response possible.
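To make the first two properties concrete, here is a minimal Python sketch, assuming CycloneDX JSON input. It treats `serialNumber` and `metadata.timestamp` as the only per-run fields and sorts `components` by purl (both assumptions; extend them for whatever your generator emits), hashes the canonical form, and wraps it in an in-toto v1 Statement whose subject is the image digest. The function names are hypothetical, and signing is left to a tool such as Sigstore's `cosign attest`.

```python
import hashlib
import json

# Assumption: these are the only fields that differ between identical runs.
VOLATILE_TOP_LEVEL = ("serialNumber",)
VOLATILE_METADATA = ("timestamp",)


def normalise_sbom(sbom: dict) -> bytes:
    """Canonical bytes for a CycloneDX JSON document: volatile fields dropped, keys and components sorted."""
    doc = json.loads(json.dumps(sbom))  # deep copy so the caller's dict is untouched
    for key in VOLATILE_TOP_LEVEL:
        doc.pop(key, None)
    for key in VOLATILE_METADATA:
        doc.get("metadata", {}).pop(key, None)
    # Generators may emit components in any order; sort by purl, then name and version.
    doc.get("components", []).sort(
        key=lambda c: (c.get("purl", ""), c.get("name", ""), c.get("version", ""))
    )
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sbom_fingerprint(sbom: dict) -> str:
    """SHA-256 of the canonical form; equal across rebuilds if generation is deterministic."""
    return hashlib.sha256(normalise_sbom(sbom)).hexdigest()


def intoto_statement(image_name: str, image_digest: str, sbom: dict) -> dict:
    """Wrap the normalised SBOM in an in-toto v1 Statement whose subject is the image digest."""
    algo, _, hex_digest = image_digest.partition(":")
    if algo != "sha256" or len(hex_digest) != 64:
        raise ValueError("expected sha256:<64 hex chars>, not a tag such as v1.7.3")
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": image_name, "digest": {"sha256": hex_digest}}],
        "predicateType": "https://cyclonedx.org/bom",
        "predicate": json.loads(normalise_sbom(sbom)),
    }
```

Because the statement names a digest rather than a tag, it remains valid however the image is later retagged or mirrored.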

The third point is where most programmes stall. A graph store with 4 million package nodes and 22 million DEPENDS_ON edges is not exotic — it is the steady-state output of a CI fleet with 600–800 active services. But the FedRAMP/SOC 2/PCI tooling marketed at security teams almost never talks about it, because vendors prefer to sell a UI that reads an artefact bucket on demand.
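The ingestion side can be sketched without committing to Dgraph or Neo4j. The Python below is a hypothetical in-memory stand-in for the graph store: it flattens each CycloneDX document into rows of (image_digest, package_purl, version, license) and indexes them by package name, which is the lookup the real store has to answer in seconds. The naive purl parsing and the first-licence-wins rule are simplifying assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class PackageRow:
    image_digest: str
    package_purl: str
    version: str
    license: str


def purl_name(purl: str) -> str:
    """Package name from a purl, e.g. pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1 -> log4j-core."""
    path = purl.split("#", 1)[0].split("?", 1)[0].split("@", 1)[0]
    return path.rsplit("/", 1)[-1]


def first_license(component: dict) -> str:
    # CycloneDX allows {"license": {"id"|"name": ...}} or {"expression": ...}; keep the first (simplification).
    for entry in component.get("licenses", []):
        if "expression" in entry:
            return entry["expression"]
        lic = entry.get("license", {})
        if lic.get("id") or lic.get("name"):
            return lic.get("id") or lic.get("name")
    return "NOASSERTION"


def rows_from_cyclonedx(image_digest: str, sbom: dict) -> List[PackageRow]:
    """Flatten one CycloneDX document into (image_digest, package_purl, version, license) rows."""
    return [
        PackageRow(image_digest, c["purl"], c.get("version", ""), first_license(c))
        for c in sbom.get("components", [])
        if c.get("purl")  # components without a purl cannot be matched against advisories
    ]


class PackageIndex:
    """Hypothetical in-memory stand-in for the graph store, indexed by package name."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Set[PackageRow]] = defaultdict(set)

    def ingest(self, image_digest: str, sbom: dict) -> None:
        for row in rows_from_cyclonedx(image_digest, sbom):
            self._by_name[purl_name(row.package_purl)].add(row)

    def images_with(self, name: str) -> Set[str]:
        return {row.image_digest for row in self._by_name.get(name, set())}
```

In a real deployment the `ingest` call would sit behind the registry webhook, so every pushed digest is queryable moments after it exists.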

Belief 2: "Coverage is what matters"

It is not. Triage latency is what matters.

When CVE-2024-6387 (regreSSHion, OpenSSH unauth RCE) dropped in July 2024, the bank's first response was a single SQL-style query against the graph:

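```sparql
# Property names are illustrative; adapt them to your graph's schema.
PREFIX pkg: <https://purl.io/>
SELECT ?digest ?service ?version WHERE {
  ?img pkg:digest   ?digest ;
       pkg:contains ?p ;
       pkg:runs_in  ?service .
  ?p   pkg:name     "openssh" ;
       pkg:version  ?version .
  # CVE-2024-6387 affects upstream 8.5p1 up to, but not including, 9.8p1.
  FILTER (?version IN ("8.5p1", "8.6p1", "8.7p1", "8.8p1", "8.9p1", "9.0p1", "9.1p1",
                       "9.2p1", "9.3p1", "9.3p2", "9.4p1", "9.5p1", "9.6p1", "9.7p1"))
}
```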
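Enumerating versions in an `IN` list is brittle. A sketch of the same test as a version-window predicate, in Python: it parses upstream OpenSSH versions of the form `X.YpZ` and applies the ranges from the Qualys advisory (8.5p1 up to but not including 9.8p1, plus versions before 4.4p1, which are affected unless patched for CVE-2006-5051 and CVE-2008-4109; the code cannot see such patches and flags them all). The handling of epochs and distro suffixes is an assumption, and because distributions backport fixes without changing the upstream version, a match means "needs triage", not "confirmed vulnerable".

```python
import re
from typing import Tuple

# Matches the upstream part of strings like "9.6p1" or Debian-style "1:9.2p1-2+deb12u3"
# (optional epoch, then X.YpZ). Anything after the pN suffix is ignored.
_UPSTREAM = re.compile(r"(?:\d+:)?(\d+)\.(\d+)p(\d+)")


def parse_openssh(version: str) -> Tuple[int, int, int]:
    m = _UPSTREAM.match(version)
    if m is None:
        raise ValueError(f"unrecognised OpenSSH version: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def regresshion_candidate(version: str) -> bool:
    """True if the upstream version falls in a CVE-2024-6387 affected range."""
    v = parse_openssh(version)
    # Before 4.4p1: affected unless patched for CVE-2006-5051 / CVE-2008-4109.
    # 8.5p1 up to (not including) 9.8p1: affected by the regression.
    return v < (4, 4, 1) or (8, 5, 1) <= v < (9, 8, 1)
```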

Result: 1,142 image digests across 287 services, returned in 11 seconds. Add in the ownership join from the engineering directory: 1,142 digests, 287 services, 41 owning teams, full results in 19 seconds.

This number is meaningful. The bank's old workflow — a Tenable scan triggered by hand against a representative subset of running hosts — took 6 hours to produce a 78%-complete list of affected hosts, with no service or owner attribution. The new workflow took 19 seconds to produce a 100%-complete list at the image-digest level, with full attribution.

The total turn-around to the first patched image hitting production was 45 minutes. That is not because the patch was applied to 1,142 things in 45 minutes — most of them were base images that did not need to be patched until the next rebuild cycle. It is because the team could state, with documentary evidence, exactly which services genuinely had sshd exposed and which were running OpenSSH only because some Debian base image included it.

Belief 3: "VEX will save us"

The Vulnerability Exploitability eXchange (VEX) format is excellent in theory and a disaster in current practice. The theory is that vendors will publish machine-readable assertions like "CVE-2024-6387 does not affect product X because the vulnerable function is never called." The practice is that almost no vendor publishes VEX at all, and the handful who do are inconsistent about whether not_affected means "we proved it" or "we are guessing because it's Friday."
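One way to stop `not_affected` from meaning "we are guessing" is to refuse to emit it without a justification. A minimal Python sketch that builds an OpenVEX-style document, assuming the v0.2.0 context URL, status values and justification labels from the OpenVEX specification (verify them against the version you target). The function names are hypothetical, and the sketch is deliberately stricter than the spec, which also accepts a free-text `impact_statement` in place of a justification.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Status and justification labels as listed in the OpenVEX specification.
STATUSES = {"not_affected", "affected", "fixed", "under_investigation"}
JUSTIFICATIONS = {
    "component_not_present",
    "vulnerable_code_not_present",
    "vulnerable_code_not_in_execute_path",
    "vulnerable_code_cannot_be_controlled_by_adversary",
    "inline_mitigations_already_exist",
}


def vex_statement(cve: str, product: str, status: str,
                  justification: Optional[str] = None,
                  action: Optional[str] = None) -> dict:
    """One OpenVEX-style statement; rejects a bare not_affected."""
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    stmt = {"vulnerability": {"name": cve}, "products": [{"@id": product}], "status": status}
    if status == "not_affected":
        # Policy choice: "we proved it" must be stated as a recognised justification.
        if justification not in JUSTIFICATIONS:
            raise ValueError("not_affected needs a recognised justification")
        stmt["justification"] = justification
    elif status == "affected":
        if not action:
            raise ValueError("affected needs an action statement")
        stmt["action_statement"] = action
    return stmt


def vex_document(doc_id: str, author: str, statements: List[dict]) -> dict:
    return {
        "@context": "https://openvex.dev/ns/v0.2.0",
        "@id": doc_id,
        "author": author,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": 1,
        "statements": statements,
    }
```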

The working substitute is to maintain your own internal VEX: a database of triage decisions, one row per (purl, cve) recording decision, rationale, decided_by and decided_at, that is consulted at query time. Each row has a TTL, typically 90 days for not_affected decisions and 30 days for under_investigation. When a new CVE arrives, the triage process is:

1. Run the graph query → list of affected image digests.
2. Join against the internal VEX → strip out anything already triaged within the TTL.
3. Bucket the remaining rows by service owner → notify, with a 24-hour SLA for critical and 7-day for high.
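The join and bucketing steps can be sketched in Python as below. `Decision` mirrors the internal-VEX row described above; `Hit` (a graph-query result already joined to its owning team) and `triage` are hypothetical names. The TTLs and SLAs are the figures from the text, and decisions with no TTL, such as `affected`, deliberately never suppress a notification.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Figures from the text. Severities without an SLA here raise KeyError.
TTL = {"not_affected": timedelta(days=90), "under_investigation": timedelta(days=30)}
SLA = {"critical": timedelta(hours=24), "high": timedelta(days=7)}


@dataclass(frozen=True)
class Decision:
    purl: str
    cve: str
    decision: str
    rationale: str
    decided_by: str
    decided_at: datetime

    def is_live(self, now: datetime) -> bool:
        # Decisions without a TTL (e.g. "affected") never suppress a notification.
        ttl = TTL.get(self.decision)
        return ttl is not None and now - self.decided_at < ttl


@dataclass(frozen=True)
class Hit:
    """One graph-query result, already joined to its owning team (hypothetical shape)."""
    image_digest: str
    purl: str
    owner: str


def triage(cve: str, severity: str, hits: Iterable[Hit],
           decisions: Iterable[Decision], now: datetime) -> Dict[str, Tuple[datetime, List[str]]]:
    """Drop hits triaged within their TTL, then bucket the rest by owner with a due time."""
    live = {d.purl for d in decisions if d.cve == cve and d.is_live(now)}
    due = now + SLA[severity]
    buckets: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        if hit.purl not in live:
            buckets[hit.owner].append(hit.image_digest)
    return {owner: (due, sorted(digests)) for owner, digests in buckets.items()}
```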

The internal-VEX file is in git, reviewed in pull requests, and signed. It is the most boring artefact in the entire programme, and it is the most important one.

What this would have done in December 2021

We back-tested the bank's current pipeline against a snapshot of their image inventory from December 9, 2021 (the day before the log4j advisory).

| Step | Time |
|---|---|
| Graph query for `org.apache.logging.log4j:log4j-core` in version range `[2.0,2.15)` | 8 sec |
| Join to service-owner directory | 11 sec |
| Initial triage assignment, 38 affected services across 12 teams | 14 min |
| First patched container deployed to production (a customer-facing API) | 45 min |
| Full fleet remediation (all 38 services) | 11 hours |
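The first step's filter, `log4j-core` in the Maven range `[2.0,2.15)`, means 2.0 ≤ version < 2.15. A hedged Python sketch of that test over package URLs, assuming plain dotted numeric versions: it does not implement Maven's full version-ordering rules, so a pre-release such as 2.0-beta9 (also affected by CVE-2021-44228) raises an error instead of matching, and a pure range check would also flag the later 2.12.2 and 2.3.1 backport releases that fix the issue. The function names are illustrative.

```python
from typing import Optional, Tuple


def maven_purl(purl: str) -> Optional[Tuple[str, str, str]]:
    """Split pkg:maven/<group>/<artifact>@<version> into its parts; None for other purl types."""
    if not purl.startswith("pkg:maven/"):
        return None
    path = purl[len("pkg:maven/"):].split("#", 1)[0].split("?", 1)[0]
    coords, _, version = path.partition("@")
    group, _, artifact = coords.rpartition("/")
    return group, artifact, version


def numeric(version: str) -> Tuple[int, ...]:
    # Assumes dotted numeric release versions such as "2.14.1"; raises ValueError on "2.0-beta9".
    return tuple(int(part) for part in version.split("."))


def in_half_open_range(version: str, low: str, high: str) -> bool:
    """Maven range [low,high): low <= version < high, comparing zero-padded numeric tuples."""
    v, lo, hi = numeric(version), numeric(low), numeric(high)
    width = max(len(v), len(lo), len(hi))

    def pad(t: Tuple[int, ...]) -> Tuple[int, ...]:
        return t + (0,) * (width - len(t))

    return pad(lo) <= pad(v) < pad(hi)


def affected_by_log4shell(purl: str) -> bool:
    parts = maven_purl(purl)
    if parts is None:
        return False
    group, artifact, version = parts
    return ((group, artifact) == ("org.apache.logging.log4j", "log4j-core")
            and in_half_open_range(version, "2.0", "2.15"))
```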

The actual December 2021 timeline at the same bank was: first detection, 6 hours; first patched container, 38 hours; declared remediation, 19 days. The intervening period included an incident in which two services were "patched" to 2.14.0, a still-vulnerable version a colleague had pulled from a cache, and then had to be patched again. A graph-backed pipeline would have flagged both rebuilt images as still affected as soon as their SBOMs were ingested.

What to build, in order

If you are starting from a typical 2026 enterprise — Jenkins / GitHub Actions, Harbor or ECR, some flavour of Kubernetes, an existing Tenable or Wiz deployment — the build order is:

1. Standardise SBOM generation in CI, signed with Sigstore and attested to the image digest. (Two weeks.)
2. Stand up the graph store and an ingestion worker that consumes the registry's webhook. (Three weeks.)
3. Write three queries (by package, by CVE, by license) and a CLI that runs them. (One week.)
4. Build the internal-VEX repo with the decision schema (purl, cve, decision, rationale, decided_by, decided_at). (One week.)
5. Connect to ownership data from your engineering directory / IDP. (Variable; usually one to four weeks of political work.)
6. Run a tabletop against a real recent CVE and time the response. (One day.)
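For the three-query CLI, a hypothetical `sbomq` tool might take the shape of this Python sketch. The sample rows and the CVE-to-package map are made-up stand-ins for the graph store and an advisory feed (OSV.dev, listed under the source documents, is one feed queryable by purl); the point is only the argparse layout of one subcommand per query.

```python
import argparse
from typing import Dict, List, NamedTuple, Optional, Set


class Row(NamedTuple):
    image_digest: str
    package: str
    version: str
    license: str


# Hypothetical sample data; a real CLI would query the graph store instead.
ROWS = [
    Row("sha256:aaa", "log4j-core", "2.14.1", "Apache-2.0"),
    Row("sha256:bbb", "openssh-server", "9.6p1", "BSD-2-Clause"),
    Row("sha256:ccc", "log4j-core", "2.17.1", "Apache-2.0"),
]
# Hypothetical CVE -> {package: affected versions} map, e.g. built from an advisory feed.
ADVISORIES: Dict[str, Dict[str, Set[str]]] = {
    "CVE-2021-44228": {"log4j-core": {"2.14.1"}},
}


def by_package(name: str) -> List[str]:
    return sorted({r.image_digest for r in ROWS if r.package == name})


def by_cve(cve: str) -> List[str]:
    affected = ADVISORIES.get(cve, {})
    return sorted({r.image_digest for r in ROWS if r.version in affected.get(r.package, set())})


def by_license(spdx_id: str) -> List[str]:
    return sorted({r.image_digest for r in ROWS if r.license == spdx_id})


QUERIES = {"package": by_package, "cve": by_cve, "license": by_license}


def main(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(prog="sbomq", description="Query the SBOM graph.")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in QUERIES:
        sub.add_parser(name, help=f"image digests matching a {name}").add_argument("value")
    args = parser.parse_args(argv)
    results = QUERIES[args.command](args.value)
    for digest in results:
        print(digest)
    return results


if __name__ == "__main__":
    main()
```

Usage would be along the lines of `sbomq cve CVE-2021-44228`, printing one image digest per line so the output can be piped into the ownership join.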

The total budget at the bank was 11 weeks of one senior engineer's time. The total budget at the logistics platform was 7 weeks, because they already had ownership data in good shape. Either is a fraction of the cost of the next log4j-class incident.

Source documents

CycloneDX 1.6 specification — preferred format; SPDX 2.3 also works
CISA SBOM minimum elements — useful baseline
in-toto attestation framework v1.0 — for binding SBOMs to image digests
Sigstore Cosign 2.x — keyless signing via OIDC
OSV.dev — federated vulnerability database, queryable by purl, no API key required
MITRE ATT&CK T1195.002 (Compromise Software Supply Chain) — the threat model

Reporting on this story was made possible by access provided under non-disclosure to the two organisations described. No client names are used at their request. CipherGuard retained full editorial control over the published findings.
