Ongoing management of AI systems: beyond the pilot
Most AI failures happen not at launch but quietly, months later, after the project team has moved on. Ongoing management is the unglamorous discipline that prevents them.
An AI deployment usually passes its launch review. The pilot showed value. The documentation existed. Someone signed it off. Then the project team disbanded, the champion moved on, and the system continued running with no clear owner. Six months later, customer complaints, regulator enquiries or a quietly worsening business metric reveal that the model has drifted, the vendor has changed a default, or the workflow around the model has degraded.
The pattern is consistent and avoidable. It comes from treating AI deployment as a project rather than a service. Projects end; services have to be managed.
Anything that changes can cause an incident. For an AI system in production, four things change continuously, three of them often invisibly to the customer.
The data. The distribution of inputs to the model is rarely stable. Customer language shifts, products are added, seasonality bites. A model trained on last year's data sees this year's reality through a slightly distorted lens. This is covariate shift; left unmanaged it produces a slow, invisible decay in performance (Quiñonero-Candela et al. 2009).
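One way to make this decay visible is to compare the mix of inputs the model sees now with the mix it was trained on. The sketch below uses the Population Stability Index (PSI), a common drift score that this article does not itself prescribe. The topic labels and sample sizes are invented for illustration. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but that threshold is a convention, not a guarantee.

```python
import math
from collections import Counter

def psi(reference, current, epsilon=1e-6):
    """Population Stability Index between two samples of a categorical feature."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_counts, cur_counts = Counter(reference), Counter(current)
    total = 0.0
    for category in set(reference) | set(current):
        # Floor each share at epsilon so a category missing on one side
        # does not produce log(0) or a division by zero.
        p = max(ref_counts[category] / len(reference), epsilon)
        q = max(cur_counts[category] / len(current), epsilon)
        total += (q - p) * math.log(q / p)
    return total

# Hypothetical data: support tickets by topic, training period vs. this month.
last_year = ["billing"] * 50 + ["delivery"] * 40 + ["returns"] * 10
this_month = ["billing"] * 30 + ["delivery"] * 30 + ["returns"] * 40

stable_score = psi(last_year, last_year)   # identical samples score zero
drift_score = psi(last_year, this_month)   # a shifted mix scores higher
```

The same idea extends to numeric inputs by binning them first. The useful part is less the exact statistic than having a number that can be plotted week after week.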
The model. Hosted models are updated by the vendor, sometimes without a version bump that the customer notices. Self-hosted models are retrained, re-quantised, or moved between hardware. Each change is a potential regression.
The prompts and configuration. The instructions, the system prompt, the temperature, the retrieval index — these are usually edited freely by the team that owns the application, often without version control or testing. They are the most common cause of regression in practice.
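A lightweight guard against unnoticed edits is to fingerprint everything that shapes the model's behaviour and compare the running fingerprint with the last approved one. The sketch below is a minimal illustration using only the Python standard library. The field names, the index name and the idea of storing an approved fingerprint are assumptions, not a description of any particular tool.

```python
import hashlib
import json

def config_fingerprint(config):
    """Return a short, stable hash of the settings that shape model behaviour."""
    # sort_keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

# Hypothetical configuration that passed the last review.
approved = {
    "system_prompt": "You are a support assistant. Answer only from the knowledge base.",
    "temperature": 0.2,
    "retrieval_index": "kb-2024-05",  # hypothetical index name
}

# The configuration actually running, after someone edited a setting in place.
running = dict(approved, temperature=0.7)

approved_id = config_fingerprint(approved)
changed = config_fingerprint(running) != approved_id  # True: a change slipped in
```

Logging the fingerprint alongside every model response also makes it possible to tie a later regression to the change that caused it.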
The surrounding workflow. The humans, queues, escalation paths and downstream systems that consume the model's output. These are organic and change continuously without anyone calling it a release.
Traditional IT governance leans on periodic review, quarterly or annually. For catching AI failures this is the wrong frequency. Failure modes are statistical: error rates creep up, hallucination categories shift, confidence calibration drifts. By the time a quarterly review notices, the damage is done and all that remains of it is a number on a report.
Effective ongoing management defines a small set of operational metrics — task-level quality, refusal rates, latency, complaint rates, override rates — and watches them weekly. The point is not statistical perfection; it is early warning.
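The weekly check can be very simple. The sketch below compares the latest weekly value of each metric with its own recent history and flags anything that sits more than a chosen number of standard deviations away. The metric names, the figures and the threshold of 2.0 are illustrative assumptions. A real deployment would tune them and might prefer a more robust statistic.

```python
from statistics import mean, stdev

def early_warnings(history, latest, threshold=2.0):
    """Return metrics whose latest weekly value sits far outside their recent range."""
    flagged = {}
    for name, past_values in history.items():
        baseline, spread = mean(past_values), stdev(past_values)
        if spread == 0:
            # A perfectly flat baseline gives no scale to measure against;
            # this sketch skips it rather than dividing by zero.
            continue
        z = (latest[name] - baseline) / spread
        if abs(z) > threshold:
            flagged[name] = round(z, 1)
    return flagged

# Hypothetical six-week history and the newest weekly reading.
history = {
    "override_rate": [0.04, 0.05, 0.04, 0.05, 0.04, 0.05],
    "refusal_rate": [0.02, 0.03, 0.02, 0.03, 0.02, 0.03],
}
latest = {"override_rate": 0.09, "refusal_rate": 0.03}

warnings = early_warnings(history, latest)  # flags override_rate only
```

Crude as it is, a check like this turns "the numbers look a bit off" into a specific metric to discuss in the weekly meeting.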
Regulators are increasingly explicit that high-risk AI must be subject to meaningful human oversight (European Parliament 2024). In practice, oversight is often theoretical. The reviewer sees the model's output, ticks a box and ships it. The signal that the human added is zero.
Meaningful oversight requires three conditions: the reviewer has the time to form an independent view; they have the information needed (inputs, model version, confidence indicators, comparable examples); and they have the standing to overrule the model without career consequences. Designing oversight without these three is theatre.
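Whether reviewers actually have the time and the standing to disagree leaves traces in the review log. The sketch below is a rough proxy, not a test of the three conditions themselves. It assumes a hypothetical log in which each review records how long it took and whether the reviewer overrode the model, and it flags the combination of no overrides and very short review times. The 30-second floor is an arbitrary placeholder.

```python
from statistics import median

def oversight_signal(reviews, min_seconds=30):
    """Summarise a review log and flag a pattern that suggests box-ticking."""
    if not reviews:
        raise ValueError("no reviews to summarise")
    overrides = sum(1 for r in reviews if r["overridden"])
    summary = {
        "override_rate": overrides / len(reviews),
        "median_seconds": median(r["seconds"] for r in reviews),
    }
    # Zero overrides combined with very fast reviews is a prompt to look
    # closer, not proof that the oversight is theatre.
    summary["looks_like_rubber_stamp"] = (
        overrides == 0 and summary["median_seconds"] < min_seconds
    )
    return summary

# Hypothetical log: four reviews, none overridden, each done in seconds.
reviews = [
    {"seconds": 5, "overridden": False},
    {"seconds": 8, "overridden": False},
    {"seconds": 6, "overridden": False},
    {"seconds": 7, "overridden": False},
]
signal = oversight_signal(reviews)
```

A low override rate is not bad in itself, since a good model should rarely need correcting. It becomes a warning only alongside evidence that nobody had time to look.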
The most reliable pattern I have seen across mature AI operations is a three-cadence rhythm.
Weekly: the small group that owns the AI system reviews operational metrics and any incidents. Most weeks this is a fifteen-minute meeting that concludes nothing. That is the point. The rare week when something has moved is what the rhythm exists to catch.
Monthly: a wider review covering change control (what shipped, what regressed), supplier updates, data quality and a sample of outputs reviewed by an independent reviewer.
Quarterly: recertification. The system's risk classification, control set, and human-oversight design are revisited. Anything that no longer fits is fixed before the next cycle.
ISO/IEC 42001 formalises this rhythm through clauses 9 (performance evaluation, which includes management review in clause 9.3) and 10 (improvement) (ISO/IEC 2023). The standard does not invent the discipline; it gives it a vocabulary that auditors can test against.