Methodology

How DEPLOY tracks deployment lifecycle state

Four maturity stages with exact definitions. State transitions verified against primary-source evidence. Ended and paused states preserved honestly.

The four maturity stages

Most coverage in physical AI collapses every deployment into a single word. Deployed. Commercial. Live. The single word hides structurally different operating states. We separate them.

Research means lab-only sales, study, or development. The platform exists. Customers do not yet operate it for real economic work. Units may ship to researchers, universities, or internal development teams. The deployment record carries the customer relationship if there is one; the maturity stage stays research until the customer puts the platform into work that pays for itself.

Pilot means limited trials. The maker is operating the platform with a customer of record under a defined trial scope. The trial has a start, a stated success criterion, and an expected outcome (expand, retain, or end). A pilot is verified when the customer of record or the operator discloses a pilot relationship; the trial scope is part of the verified state.

Commercial means real economic work. The customer pays for output the platform delivers, on terms equivalent to or better than the non-autonomous baseline. Commercial deployments generate revenue against a defined unit (per ride, per pick, per mile, per procedure). A commercial deployment is verified when the customer of record confirms ongoing paid operation, when a regulator records commercial authorization, or when an SEC filing discloses commercial revenue from the deployment.

Production means mass-produced. The platform ships at volume on a defined production line. Manufacturing cadence is regular. The platform is generally available to customers who meet the maker's acceptance criteria. A production stage is verified against the maker's published manufacturing record, regulator clearance scope at production volume, or SEC disclosures of unit volume shipped.

State transitions

Maturity is the stage. State is what is happening right now. The two move independently.

A pilot can be announced, active, paused, or ended. A commercial deployment can be active, paused, or ended. A production stage can be active, paused (a line halted for retooling, a recall sweep, a regulator order), or ended (the product is wound down).

We track maturity and state as separate fields. A reader on a deployment page sees both. A pilot that paused after a regulator order is pilot stage plus paused state plus the regulator order as the primary source. A commercial deployment that the operator ended after a customer pivot is commercial stage plus ended state plus the wind-down evidence preserved in the source history.
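The separation of the two fields can be sketched as a small data model. This is a minimal illustration in Python, not the registry's actual schema: the names Maturity, State, ALLOWED_STATES, and DeploymentRecord are hypothetical, and the allowed states for the research stage are an assumption, since the text above lists states only for pilot, commercial, and production.

```python
from dataclasses import dataclass
from enum import Enum


class Maturity(Enum):
    RESEARCH = "research"
    PILOT = "pilot"
    COMMERCIAL = "commercial"
    PRODUCTION = "production"


class State(Enum):
    ANNOUNCED = "announced"
    ACTIVE = "active"
    PAUSED = "paused"
    ENDED = "ended"


# States each stage may carry, following the text above.
# The RESEARCH row is an assumption: the methodology does not enumerate it.
ALLOWED_STATES = {
    Maturity.RESEARCH: {State.ACTIVE, State.PAUSED, State.ENDED},
    Maturity.PILOT: {State.ANNOUNCED, State.ACTIVE, State.PAUSED, State.ENDED},
    Maturity.COMMERCIAL: {State.ACTIVE, State.PAUSED, State.ENDED},
    Maturity.PRODUCTION: {State.ACTIVE, State.PAUSED, State.ENDED},
}


@dataclass(frozen=True)
class DeploymentRecord:
    """Hypothetical record shape: stage and state are separate fields."""
    name: str
    maturity: Maturity
    state: State
    primary_source: str

    def __post_init__(self):
        # Reject combinations the methodology does not describe,
        # for example a commercial deployment in the "announced" state.
        if self.state not in ALLOWED_STATES[self.maturity]:
            raise ValueError(
                f"{self.maturity.value} stage cannot carry state {self.state.value}"
            )


# A pilot paused after a regulator order: stage, state, and source together.
paused_pilot = DeploymentRecord(
    name="example pilot",
    maturity=Maturity.PILOT,
    state=State.PAUSED,
    primary_source="regulator order",
)
```

Keeping the two enums separate means a state change (active to paused) never forces a stage change, which matches the claim that the two move independently.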

Worked examples

The Tesla Optimus deployments at Tesla factories sit at research stage. Optimus units perform battery-cell handling and basic logistics tasks on Tesla manufacturing lines. The customer of record is Tesla itself, performing the work internally. The deployment is editorially significant; the maturity stage is honest about what is happening.

Apptronik Apollo across Mercedes-Benz, GXO, and Jabil sits at pilot stage. Three enterprise customers, three trial scopes, three customer-of-record confirmations. Per-deployment throughput data is cap-flagged pending evidence; the pilot stage is verified at the contract layer, with the throughput surface still open.

Bot Auto's Houston-to-Dallas humanless commercial truckload on April 29, 2026 sits at commercial stage at first-event scope. A 231-mile run booked through Ryan Transportation's brokerage with no safety driver, no in-cab observer, and no low-latency remote teleoperation backup. The commercial stage is verified at the customer of record; repeatability will be anchored by sustained subsequent operation along the corridor.

Agility Robotics' Digit at GXO Flowery Branch sits at commercial stage at sustained scope. The 100,000-tote milestone is verified by the customer of record. The multi-year Robots-as-a-Service contract is the verification surface.

Figure 02 at BMW Spartanburg sits at commercial stage. 30,000 BMW X3 vehicles built across an 11-month live production deployment, with chassis assembly work accepted into BMW's normal quality-control process. The commercial stage is verified at end-product OEM acceptance.

Intuitive Surgical da Vinci sits at production stage. 11,395 da Vinci systems plus 1,041 Ion systems are installed across customer sites per SEC 10-Q. The production stage is verified at SEC disclosure of installed unit volume.

Cruise's San Francisco service sits at commercial stage plus ended state. Operations were active under CPUC permit through October 2023, paused after the pedestrian-dragging incident, and subsequently ended through GM's corporate decision to retire the robotaxi business. The state transitioned from active to paused to ended along documented evidence at each step.

Waymo's Atlanta and San Antonio service sits at commercial stage plus paused state. Service paused after vehicles repeatedly drove into flooded roads. Waymo published the software-update work and the conditional resumption plan, anchoring the paused state with both a documented reason and an expected resumption.

Source discipline

Research stage is verified against the maker's own surface plus the customer relationship when one exists. We do not treat research-stage work as commercial because a maker frames it that way.

Pilot stage is verified against a customer of record. The customer's own disclosure (a press release, an earnings-call mention, an investor-disclosure line item, an on-site journalist report) anchors the pilot. The maker's announcement of a pilot is editorially significant; it does not anchor the pilot at customer-of-record verification depth on its own.

Commercial stage is verified against the customer of record confirming ongoing paid operation, a regulator recording commercial authorization, or an SEC filing disclosing revenue from the deployment. Trade-press coverage that names a deployment as commercial without one of these primary sources is treated as claim.

Production stage is verified against the maker's published manufacturing record, regulator clearance at production scope, or SEC disclosures of unit volume shipped. Production claims without one of these primary sources are claimed at the maker surface, not verified at production scope.
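The per-stage rules above reduce to a lookup from maturity stage to the kinds of evidence that can anchor it. A minimal sketch in Python, assuming hypothetical evidence-kind labels (customer_disclosure, regulator_authorization, sec_filing, and so on) chosen for illustration; they are not the registry's own taxonomy, and the function name classify is likewise invented here.

```python
# Evidence kinds that can anchor each stage, paraphrasing the rules above.
# All labels are hypothetical names chosen for this sketch.
ANCHORING_EVIDENCE = {
    "research": {"maker_surface"},
    "pilot": {"customer_disclosure"},
    "commercial": {
        "customer_confirmation",
        "regulator_authorization",
        "sec_filing",
    },
    "production": {
        "manufacturing_record",
        "regulator_clearance",
        "sec_filing",
    },
}


def classify(stage: str, evidence_kinds: set) -> str:
    """Return 'verified' if any supplied evidence anchors the stage, else 'claimed'."""
    if evidence_kinds & ANCHORING_EVIDENCE[stage]:
        return "verified"
    return "claimed"


# A maker's announcement alone does not anchor a pilot.
pilot_status = classify("pilot", {"maker_announcement"})

# Trade-press coverage alone does not anchor a commercial stage.
press_status = classify("commercial", {"trade_press"})

# An SEC filing disclosing revenue does.
filing_status = classify("commercial", {"trade_press", "sec_filing"})
```

The set intersection encodes "one of these primary sources": extra non-anchoring evidence neither helps nor hurts.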

For the cross-cohort source-quality discipline, see /methodology/what-verified-means and the 9-tier source-quality rubric.

Cap-flag application

A deployment record without current evidence is cap-flagged at stale. The original anchor remains visible. The cap-flag names the date the evidence last resolved. We do not treat stale evidence as current state.

A deployment that announced a pilot but has not surfaced any customer confirmation since the announcement is cap-flagged at announced-but-not-confirmed. The maker's announcement is surfaced as claim; the pilot is not verified at customer-of-record depth until the customer confirms.

A commercial deployment that paused without a documented reason and without an expected resumption is cap-flagged at paused-without-disclosure. Some operational pauses are not announced; the deployment carries the cap-flag until the operator or a primary source resolves the state.

A deployment that wound down without a documented ending is cap-flagged at ended-without-disclosure. The state is recorded as ended; the absence of a documented cause is named.
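The four cap-flags above can be sketched as a single function over a record. This is an illustration under stated assumptions, not the registry's implementation: the field names, the 180-day staleness window, and the order in which the flags are checked are all hypothetical, since the text states no threshold and no precedence.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical staleness window; the methodology does not state a threshold.
STALE_AFTER = timedelta(days=180)


@dataclass
class Record:
    stage: str                      # "research", "pilot", "commercial", "production"
    state: str                      # "announced", "active", "paused", "ended"
    last_evidence: date             # date the evidence last resolved
    customer_confirmed: bool = False
    documented_reason: bool = False
    expected_resumption: bool = False


def cap_flag(r: Record, today: date) -> Optional[str]:
    """Return the cap-flag for a record, or None if no flag applies.

    The check order is an assumption made for this sketch.
    """
    if r.stage == "pilot" and not r.customer_confirmed:
        return "announced-but-not-confirmed"
    if (
        r.stage == "commercial"
        and r.state == "paused"
        and not r.documented_reason
        and not r.expected_resumption
    ):
        return "paused-without-disclosure"
    if r.state == "ended" and not r.documented_reason:
        return "ended-without-disclosure"
    if today - r.last_evidence > STALE_AFTER:
        return "stale"
    return None


today = date(2026, 6, 1)  # illustrative date
unconfirmed = cap_flag(Record("pilot", "announced", date(2026, 5, 1)), today)
quiet_pause = cap_flag(Record("commercial", "paused", date(2026, 5, 1)), today)
old_anchor = cap_flag(Record("commercial", "active", date(2025, 1, 1)), today)
```

In this sketch a pause with a documented reason, or with an expected resumption, carries no pause flag, mirroring the "without a documented reason and without an expected resumption" wording.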

For the canonical cap-flag triggers, see /methodology/what-verified-means.

The framework applies to us

State drift is the default failure mode at registry scale. A deployment is announced. The announcement becomes the verification anchor. Months later, the deployment has expanded, contracted, paused, or ended. The original anchor is no longer current. Default behavior at most data sources is to leave the anchor in place. This is how registry entries accumulate stale claims that read as verified state when they no longer are.

We refresh evidence anchors on a recurring cycle. The source-depth campaign deepens deployment records against current sources. Wave 3 of the campaign produced eleven stale-status corrections across seventy-nine deepened entities. The corrections include ended-misregistered-as-active and paused-misregistered-as-active patterns, plus active-but-evidence-stale entries re-anchored to current sources.

The Waymo Driver Austin deployment is the recursive worked example. The deployment launched via Waymo's partnership with Uber in March 2025 and has remained active and expanding through May 2026. The registry entry drifted into a paused-state assertion during a period when the evidence anchor was not refreshed. Wave 3 source-depth research re-verified the entry against current evidence and surfaced an active-misregistered-as-paused correction; the deployment state was corrected to verified-active with refreshed evidence anchors.

The correction is logged at /corrections. The original anchor is preserved in the source history. A reader can see what we knew when, and what changed.
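Preserving the original anchor alongside the correction amounts to an append-only history. A minimal sketch in Python, assuming hypothetical names (Anchor, SourceHistory) and placeholder dates and sources; it illustrates the idea and does not describe how the registry stores its data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Anchor:
    as_of: date      # date the evidence was resolved
    state: str       # state asserted on that date
    source: str      # description of the evidence


@dataclass
class SourceHistory:
    anchors: List[Anchor] = field(default_factory=list)

    def record(self, anchor: Anchor) -> None:
        """Append a new anchor; earlier anchors are never overwritten."""
        self.anchors.append(anchor)

    def current(self) -> Anchor:
        """Return the most recently dated anchor."""
        return max(self.anchors, key=lambda a: a.as_of)

    def state_on(self, day: date) -> Optional[str]:
        """Return the state the history asserted as of a given day."""
        known = [a for a in self.anchors if a.as_of <= day]
        if not known:
            return None
        return max(known, key=lambda a: a.as_of).state


# Placeholder dates and sources, for illustration only.
history = SourceHistory()
history.record(Anchor(date(2025, 3, 1), "active", "launch announcement"))
history.record(Anchor(date(2025, 9, 1), "paused", "unrefreshed assertion"))
history.record(Anchor(date(2026, 5, 1), "active", "re-verified against current evidence"))
```

Because a correction is recorded as a new anchor, state_on answers "what we knew when" and current answers "what changed".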

Where to go next

The methodology canon: /methodology/what-verified-means for the core verified-vs-claimed framework, /methodology/how-we-verify-deployment-status for the parallel framework on the active / paused / ended state field, /methodology/operating-envelope-precision for envelope precision in autonomous freight.

The applied work: registry.deploy.report/deployments for the deployment records with verified maturity stage and state per entity, registry.deploy.report/verified-deployments for the verified-only cut.
