Methodology

How DEPLOY tracks safety incidents and recalls: Verified evidence behind every safety claim

Safety claims are the highest-stakes layer of autonomous-systems coverage. The framework partitions the claim space into primary-source filings, manufacturer cause-claims, recall actions, and pending litigation, so operators evaluating "is X service safe" can match the answer to the evidence anchor each claim actually clears.

"Is this robot safe" is a question with three structurally distinct answers. There is the regulator-filed answer (NHTSA's recall record, NTSB's investigation finding, a state utility commission's suspension order). There is the manufacturer-claimed answer (the maker's published mitigation, the cause attribution in a corporate statement, the safety record the maker emphasizes in investor communication). And there is the pending-claim answer (allegations preserved in litigation, contested counterparty attribution, unresolved investigation status). Each answer is editorially meaningful. None is interchangeable with the others, and the framework's value is naming which answer a given safety claim actually anchors against.

The methodology rests on the constructs defined at /methodology/what-verified-means: the verified-vs-claimed boundary, source discipline, and cap-flag-as-trust-signal. Safety claims apply the same constructs at the highest-stakes claim layer; the consequences of vocabulary collapse here are the largest across DEPLOY's coverage, which is why source discipline is most adversarial to marketing-language safety framing at this layer.

Primary-source incident reports

DEPLOY's verification anchor for safety claims is the primary-source filing layer. In the United States, four regulator surfaces produce the load-bearing artifacts. NHTSA's Part 573 defect and noncompliance recall records anchor recall scope, defect description, remedy status, and ongoing reporting obligations; each recall carries a campaign number and a public investigation trail downstream consumers can verify against. The NTSB anchors accident-investigation findings for events that meet the board's jurisdiction. State utility commissions (California's CPUC for robotaxi services) anchor permit-action suspension orders. State law enforcement and municipal traffic-injury records anchor on-road incident reports outside the federal investigation framework.

A safety claim that anchors to one of these surfaces is a verified safety claim; the regulator's published record is an evidence anchor that does not depend on the maker's narrative. The Waymo 3,791-vehicle flooding recall is the canonical worked example: Waymo reported the defect to NHTSA on May 1, 2026; NHTSA acknowledged the recall in a letter dated May 11, 2026; the federal framework required eight consecutive quarterly status reports followed by three annual reports. The recall scope, the defect description, and the remedy status are all anchored to the primary regulatory surface; DEPLOY cites the regulator surface rather than the maker's communication as the verification anchor.

Manufacturer cause-claims versus independent investigation findings

Makers publish cause-claims and proposed mitigations alongside the regulatory filings. The cause-claim is editorially significant; the maker's diagnosis informs how the deployment will operate going forward. The cause-claim is not the same artifact as an independent investigation finding; the maker's diagnosis is one input to the regulator's evaluation, not a substitute for the regulator's evaluation. The framework treats the two as structurally distinct.

DEPLOY's incident coverage separates the filing layer from the cause-claim layer. NHTSA's published defect description sits at the filing layer. The maker's statement about why the defect occurred and what mitigation is planned sits at the cause-claim layer. When the regulator has recorded a final remedy, the remedy moves to verified status. When the regulator has flagged the remedy as outstanding, the maker's "mitigations in place" framing is a cause-claim, not a verified remedy. The Waymo recall signal preserves this discipline explicitly: the interim software update and the tighter weather restrictions are surfaced as Waymo's shipped interim mitigation, and NHTSA's flag that no permanent fix has been provided is surfaced separately as regulatory state. Both are accurate; they are not interchangeable claims.
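The remedy rule above can be sketched in a few lines of Python. This is a minimal illustration, not DEPLOY's implementation: the class, field, and function names are hypothetical, and the only logic encoded is the rule stated in the text (a remedy is verified only once the regulator has recorded it as final).

```python
from dataclasses import dataclass
from enum import Enum


class RemedyStatus(Enum):
    VERIFIED = "verified remedy"
    CAUSE_CLAIM = "cause-claim"


@dataclass
class RecallSignal:
    # Hypothetical fields for illustration; not DEPLOY's registry schema.
    maker_mitigation: str         # what the maker says it has shipped
    regulator_final_remedy: bool  # True once the regulator records a final remedy


def remedy_status(signal: RecallSignal) -> RemedyStatus:
    """Classify a remedy by the regulator's record, never by the maker's framing."""
    if signal.regulator_final_remedy:
        return RemedyStatus.VERIFIED
    return RemedyStatus.CAUSE_CLAIM


# The Waymo flooding recall as described in the text: an interim mitigation
# has shipped, but the regulator has flagged that no permanent fix is provided.
waymo_flooding = RecallSignal(
    maker_mitigation="interim software update and tighter weather restrictions",
    regulator_final_remedy=False,
)
status = remedy_status(waymo_flooding)
```

Under these assumptions the maker's mitigation text never influences the classification; only the regulator-side flag does, which mirrors the separation the two layers are meant to preserve.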

Recall versus service suspension distinction

A recall is a remedial action under the federal recall framework. The maker has reported a defect; the regulator has acknowledged the report; the affected vehicles are subject to manufacturer compliance with the remedy process. The recall record is durable; the campaign number and the reporting obligations persist regardless of whether the affected vehicles are currently in service. A service suspension is an operational decision. The maker has paused operations in a market or across a deployment envelope for reasons that may include software fixes, weather-related caution, or regulatory action; the suspension is potentially temporary, with documented conditions for resumption.

The two layers can coincide on the same event but are editorially distinct. Waymo's response to the flooding defect included both an NHTSA-anchored recall (recall framework, campaign number, mandatory reporting cadence) and an operational suspension in Atlanta and San Antonio (service-decision framework, expected resumption pending software fix). The two surfaces coexist; an operator evaluating "is the flooding issue resolved" reads the recall record for the regulator's remedy status and the service-suspension record for the operational reopening status. Similarly, Waymo's freeway-service halt across four markets is an operational suspension without (at time of writing) a corresponding NHTSA recall record: the operational layer carries the suspension, and the federal recall layer is silent. DEPLOY surfaces the two layers separately so the operator query lands at the layer that actually carries the answer.
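One way to picture the two-layer separation is a record that holds each layer independently, so one can be present while the other is absent. The sketch below is illustrative only; `SafetyEvent` and `layers_with_a_record` are hypothetical names, and the example values paraphrase the two Waymo events described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SafetyEvent:
    # Hypothetical fields for illustration; not DEPLOY's registry schema.
    name: str
    recall_record: Optional[str] = None       # regulator-anchored recall, if one exists
    service_suspension: Optional[str] = None  # operational suspension, if one exists


def layers_with_a_record(event: SafetyEvent) -> List[str]:
    """Return the layers that actually carry a record for this event."""
    layers = []
    if event.recall_record is not None:
        layers.append("recall")
    if event.service_suspension is not None:
        layers.append("service suspension")
    return layers


flooding = SafetyEvent(
    name="Waymo flooding defect",
    recall_record="NHTSA recall, 3,791 vehicles",
    service_suspension="operations paused in Atlanta and San Antonio",
)
freeway_halt = SafetyEvent(
    name="Waymo freeway-service halt",
    service_suspension="freeway service halted across four markets",
)

flooding_layers = layers_with_a_record(flooding)
freeway_layers = layers_with_a_record(freeway_halt)
```

Keeping the layers as separate optional fields, rather than a single status value, lets a query about remedy status and a query about reopening status each read from its own layer.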

Pending lawsuit allegations as preserved claims

Civil litigation against autonomous-systems operators produces a third claim layer: allegations preserved in court filings that may or may not survive adjudication. DEPLOY's framework treats lawsuit allegations as preserved claims, not as verified findings. The plaintiff's allegation is editorially meaningful; the allegation frames a potential safety concern, names a specific deployment context, and produces a public record the adversarial process will evaluate. The allegation is not the same artifact as an adjudicated finding; until a court has ruled, the allegation is a claim, not a verified finding.

DEPLOY's incident coverage preserves lawsuit allegations with explicit not-adjudicated framing. The cyclist-injury lawsuit against Avride (Jersey City, October 2025) is a worked example of the discipline: the allegation is preserved in the registry's incident record, the plaintiff's specific claims are surfaced, and the adjudication status is flagged as pending. Subsequent court action moves the allegation from preserved claim to adjudicated finding (or to dismissed claim, which is itself a different editorial artifact from an ongoing allegation). The framework distinguishes the three states so operators encountering the incident's coverage know where the verification surface currently stands.
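The three states and the transitions between them can be sketched as a small state machine. This is a hedged illustration rather than DEPLOY's registry logic: the enum, the transition table, and the `advance` helper are all hypothetical, and treating the adjudicated and dismissed states as terminal is a simplifying assumption (it ignores appeals, for example).

```python
from enum import Enum


class AllegationState(Enum):
    PRESERVED = "preserved claim (not adjudicated)"
    ADJUDICATED = "adjudicated finding"
    DISMISSED = "dismissed claim"


# Assumed transition table: only court action moves a preserved claim,
# and the two outcome states are treated as terminal in this sketch.
ALLOWED_TRANSITIONS = {
    AllegationState.PRESERVED: {AllegationState.ADJUDICATED, AllegationState.DISMISSED},
    AllegationState.ADJUDICATED: set(),
    AllegationState.DISMISSED: set(),
}


def advance(current: AllegationState, new: AllegationState) -> AllegationState:
    """Move an allegation to a new state, rejecting transitions the table does not allow."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {new.name}")
    return new


# A pending lawsuit starts, and stays, in the preserved state until a court rules.
pending = AllegationState.PRESERVED
after_dismissal = advance(pending, AllegationState.DISMISSED)
```

Because a preserved claim can never be relabeled as a finding without an explicit transition, a record in this sketch cannot drift from "alleged" to "verified" by default.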

Cap-flag application and recursive discipline

Some safety incidents reach DEPLOY's coverage threshold but do not anchor at the source-discipline floor. A collision incident with single-outlet reporting, where adjacent trade press has not picked up the story and no regulatory filing has surfaced, sits in cap-flag territory: the event is editorially significant enough to surface (the deployment occurred, the collision occurred, the event is on the public record), but the verification surface is thinner than DEPLOY's standard threshold. The cap-flag is the published verification posture: incident surfaced, verification surface thin, source-set named. The December 2024 Waymo Serve incident in West Hollywood is a worked example: TechCrunch's reporting is the sole originator, no NHTSA filing landed, and DEPLOY surfaces the incident with the three-source set explicitly cap-flagged rather than padding to clear the threshold or omitting the incident entirely.
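The cap-flag decision can be sketched as a simple predicate. The text does not state DEPLOY's actual source threshold, so `min_originators` below is a placeholder assumption, as are the function name and the choice to treat any regulatory filing as clearing the floor.

```python
def needs_cap_flag(
    has_regulator_filing: bool,
    independent_originators: int,
    min_originators: int = 2,  # placeholder value; the real threshold is not given in the text
) -> bool:
    """Return True when an incident should be surfaced with a cap-flag.

    Assumption: a regulatory filing anchors the claim on its own; otherwise
    the incident is cap-flagged when too few outlets originated the reporting.
    """
    if has_regulator_filing:
        return False
    return independent_originators < min_originators


# The West Hollywood incident as described in the text:
# one originating outlet and no NHTSA filing.
west_hollywood_flagged = needs_cap_flag(
    has_regulator_filing=False,
    independent_originators=1,
)
```

Note that the predicate counts originating outlets, not total citations: outlets that repeat a single originator's reporting would not raise `independent_originators` in this sketch.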

Applying the framework recursively to DEPLOY's own safety-coverage discipline produces the load-bearing credibility surface. Wave 4 of the source-depth campaign deepened incident coverage across multiple makers and regulatory surfaces; the cleanup discipline applies the same verified-vs-claimed boundary to DEPLOY's own incident records as to the maker claims the framework evaluates. The Waymo recall signal's inline editorial transparency footer is the most explicit demonstration: the signal's current published version corrects an earlier scaffold-state version that stated the recall scope as 1,200 vehicles rather than the actual 3,791 per NHTSA. The correction is surfaced inline; the earlier error is preserved as editorial record; the discipline that caught the error (cross-link integrity audit + adjacent placeholder-content audit) is named. DEPLOY's safety-coverage discipline therefore includes its own correction surface: the verified-vs-claimed framework that operates on maker safety claims also operates on DEPLOY's own safety records.

Where to go next

Operators evaluating safety records for a specific maker should start at the registry's incident surface (verified incidents), which catalogs incidents per maker with the verification anchor, regulator surface, manufacturer cause-claim, remedy status, and adjudication state preserved per incident. Primary regulatory references include NHTSA's Automated Vehicle safety framework and FMCSA's autonomous and connected vehicles framework for commercial motor vehicles. DEPLOY's consumer surfaces will provide entity-level safety overviews aggregating incident records, recall actions, and pending litigation per maker once published; the safety-claim constructs defined here govern what surfaces as verified, claimed, cap-flagged, or pending across those pages.

What 'verified' means at DEPLOY → the canonical reference for verified-vs-claimed boundary, source discipline, and cap-flag-as-trust-signal.
How DEPLOY verifies deployment status → the same framework applied to active, paused, and ended deployment states; pairs closely with safety because service suspensions often follow safety events.
How DEPLOY verifies capability claims → demo-versus-deployment, single-task versus general-purpose, lab versus real-world envelopes.
How DEPLOY tracks pricing claims → projected versus available, pre-order versus production, at-scale versus near-term.
DEPLOY's verification framework: the four anchors → counterparty, operating envelope, absence of human-in-loop, repeatability.