Axelspire

Certificate Revocation at Operational Scale

Revocation is the operational function for invalidating a certificate before its natural expiry. It is the function most enterprises haven't tested, can't execute quickly, and discover is broken at exactly the moment it matters — during a security incident or a key compromise.

Part of: Enterprise PKI Operating Model — the pillar page for the operations library.

The operational reality of revocation diverges sharply from the technical reality. Technically, revocation is straightforward: the CA publishes the revocation through CRL or OCSP, clients check, and the certificate is rejected. Operationally, revocation is a service-replacement event with cross-team coordination, change-management implications, and a blast radius that scales with how many services share the certificate or its key.

Why revocation is operationally hard

Most clients don't actually check revocation. Browser behaviour varies: Chrome's CRLSets cover only a subset of revocations. Some clients fail open if the CRL or OCSP responder is unreachable. Many internal services don't check revocation at all. This means a revoked certificate is not necessarily a non-functional certificate, and an attacker presenting a revoked certificate to a verifier that doesn't check may still authenticate. Revocation is a control that depends on the verifier as much as the issuer.
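To make the verifier dependency concrete, here is a simplified model. It is a sketch, not a description of any particular client: the three policies (`NO_CHECK`, `SOFT_FAIL`, `HARD_FAIL`) are hypothetical labels for common behaviours, and the certificate is assumed to be otherwise valid (chain, name, dates).

```python
from enum import Enum


class Status(Enum):
    """Outcome of a revocation lookup (CRL or OCSP) as seen by one verifier."""
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"          # responder answered but does not know the cert
    UNREACHABLE = "unreachable"  # CRL/OCSP endpoint could not be contacted


class Policy(Enum):
    """Hypothetical verifier behaviours, simplified for illustration."""
    NO_CHECK = "no_check"    # never consults revocation data
    SOFT_FAIL = "soft_fail"  # accepts when revocation data is unavailable
    HARD_FAIL = "hard_fail"  # rejects unless status is positively good


def accepts(status: Status, policy: Policy) -> bool:
    """Return True if a verifier with `policy` would accept the certificate.

    Only the revocation decision is modelled; everything else is assumed valid.
    """
    if policy is Policy.NO_CHECK:
        return True  # revocation is invisible to this verifier
    if status is Status.REVOKED:
        return False
    if status is Status.GOOD:
        return True
    # UNKNOWN or UNREACHABLE: the verifier's policy decides the outcome.
    return policy is Policy.SOFT_FAIL
```

Under this model a revoked certificate is still accepted by a `NO_CHECK` verifier, and by a `SOFT_FAIL` verifier whenever the status lookup fails, for example because an attacker in the network path blocks the OCSP responder.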

Revocation requires replacement first. Revoking a certificate that is currently in use breaks the service it protects. The mature pattern is replace-then-revoke: deploy a new certificate, verify it is working, then revoke the old one. The immature pattern is revoke-then-scramble — and it is depressingly common when teams discover a key compromise and act without coordination.

The blast radius is invisible until it's exercised. The certificate protected one service that you knew about. It also protected three services that other teams configured to use it years ago; those teams have since moved on, and no-one now associates those services with this certificate. Revoking the certificate breaks all of them. The pre-revocation discovery exercise is part of the revocation workflow, not an optional extra.
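Because each discovery source (CLM inventory, load-balancer configuration, backup and DR catalogues, key stores) sees only part of the estate, the blast radius is the union of what they all find. A minimal sketch, assuming a hypothetical `Location` record that carries an owner field; the `"unknown"` owner convention is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One place holding the certificate or a copy of its private key."""
    kind: str   # e.g. "service", "load_balancer", "backup", "dr", "dev_copy"
    name: str
    owner: str  # accountable team, or "unknown" if nobody claims it


def blast_radius(*discovery_sources: set[Location]) -> set[Location]:
    """Union every discovery source into a single blast radius."""
    radius: set[Location] = set()
    for source in discovery_sources:
        radius |= source
    return radius


def unowned(radius: set[Location]) -> list[Location]:
    """Locations with no current owner: the ones most likely to break unnoticed."""
    return sorted((loc for loc in radius if loc.owner == "unknown"),
                  key=lambda loc: loc.name)
```

The unowned locations are exactly the "three services nobody associates with this certificate": they need an owner assigned before revocation, not after.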

Revocation propagation is asynchronous. CRLs are typically updated every few hours. OCSP responses are cached. Even when revocation succeeds at the CA, downstream verifiers may still accept the certificate for some period. The operating model has to account for this propagation lag in incident response timelines.
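For incident timelines it helps to turn the propagation lag into a number per CA. A minimal sketch, assuming two values you would take from the CA's documentation or your own measurements (both are assumptions here, not standard parameters): how long a revocation takes to appear in newly published CRL or OCSP data, and the longest validity period of the status data the CA publishes.

```python
from datetime import datetime, timedelta


def worst_case_acceptance_end(
    revoked_at: datetime,
    publish_delay: timedelta,
    max_status_validity: timedelta,
) -> datetime:
    """Latest time a revocation-checking verifier might still accept the cert.

    publish_delay: longest time before the revocation appears in newly
        published status data (e.g. the CRL publication interval).
    max_status_validity: longest validity of published status data
        (e.g. CRL nextUpdate - thisUpdate, or OCSP response validity).

    Stale (pre-revocation) data is produced no later than
    revoked_at + publish_delay, and a verifier that honours its validity
    stops using it at most max_status_validity later, so the sum is an
    upper bound. Verifiers that never check revocation are not bounded.
    """
    return revoked_at + publish_delay + max_status_validity
```

For example, a revocation logged at 09:00 against a CA that publishes within 6 hours and issues status data valid for 24 hours may still be accepted by a revocation-checking verifier until 15:00 the following day.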

When revocation is actually required

Five scenarios trigger revocation. Each has different urgency and different operational handling.

Key compromise. The private key has been exposed — copied from a compromised host, leaked in a code repository, recovered by an attacker. This is the highest-urgency scenario. The certificate must be revoked because the holder of the private key can use it to impersonate the service. Replace-then-revoke is still preferred, but the replacement must happen at incident speed.

Certificate misissuance. The certificate was issued in violation of policy — wrong identity, wrong validity, wrong CA, wrong purpose. The certificate is functioning correctly but should not exist. Revocation is required for governance compliance, with normal change-management timing because the certificate is not actively dangerous.

Service decommissioning. The service the certificate protects is being shut down. The certificate is no longer needed. Revocation is good practice but not urgent — the certificate will simply expire. Many organisations skip revocation in this scenario, which is a defensible decision but introduces a small amount of revocation hygiene debt.

CA distrust events. The issuing CA has been distrusted by browsers or operating systems; the Entrust distrust by Chrome and Mozilla in 2024 is a recent example, with effective dates in 2024–2025 depending on certificate purpose. Affected certificates must be replaced before the distrust takes effect to avoid client errors, and ideally revoked once the replacements are in place. This is a planned migration, not an emergency, but it has a hard deadline.

Compliance-driven revocation. Regulatory or contractual requirements that mandate revocation under specific conditions: the service has changed ownership, the certificate has changed purpose, an audit finding requires it. Revocation timing is dictated by the compliance requirement.

The operating model defines the response pattern for each scenario. The patterns differ on urgency, on coordination requirements, and on the level of approval required.
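One way to keep the scenario patterns out of people's heads is to hold them as data the runbook and tooling both read. A minimal sketch: urgency and pattern follow the five scenarios described above, the key-compromise approver follows the decision authority described in the key-compromise section, and every other approver role name is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Urgency(Enum):
    INCIDENT_SPEED = "incident speed"
    HARD_DEADLINE = "planned migration with a hard deadline"
    NORMAL_CHANGE = "normal change-management timing"
    NOT_URGENT = "good practice, not urgent"
    SET_BY_REQUIREMENT = "timing dictated by the compliance requirement"


@dataclass(frozen=True)
class ResponsePattern:
    urgency: Urgency
    pattern: str
    approver: str  # hypothetical role names, except for key compromise


RESPONSE_PATTERNS: dict[str, ResponsePattern] = {
    "key_compromise": ResponsePattern(
        Urgency.INCIDENT_SPEED,
        "revoke immediately or replace-then-revoke, decided per incident",
        "incident commander with the service's executive owner"),
    "misissuance": ResponsePattern(
        Urgency.NORMAL_CHANGE, "replace-then-revoke", "PKI policy owner"),
    "decommissioning": ResponsePattern(
        Urgency.NOT_URGENT, "remove; revoke as hygiene or let expire", "service owner"),
    "ca_distrust": ResponsePattern(
        Urgency.HARD_DEADLINE, "replace before the distrust date, then revoke",
        "PKI programme owner"),
    "compliance": ResponsePattern(
        Urgency.SET_BY_REQUIREMENT, "replace-then-revoke", "compliance owner"),
}


def pattern_for(scenario: str) -> ResponsePattern:
    """Look up the agreed pattern; an unmapped scenario is itself a finding."""
    try:
        return RESPONSE_PATTERNS[scenario]
    except KeyError:
        raise ValueError(f"no response pattern defined for {scenario!r}") from None
```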

The replace-then-revoke pattern

For all scenarios except true key compromise, the replace-then-revoke pattern is the right operational approach. The five steps:

Figure 1. The replace-then-revoke pattern. Five sequential steps with verification (step 4) as the critical gate — revocation does not proceed until verification confirms the replacement is in use at every identified location. The immature anti-pattern is revoke-then-scramble.

1. Identify all uses of the certificate. The discovery exercise: every service, host, load balancer, configuration, trust store that references this certificate or its key. The discovery scope must include locations the certificate's private key may have been copied to — backup systems, disaster recovery environments, development copies, key escrow stores. The blast radius is the union of all these locations.

2. Generate a replacement. Issue a new certificate using the standard issuance workflow. The replacement uses a new key pair (do not reuse the old key — the point of revocation is often that the old key is no longer trustworthy). The replacement certificate is held until step 3 is complete.

3. Deploy the replacement to all identified locations. Coordinate the deployment across all teams holding copies. For internal services, this is a change-management exercise; for external partners, it may require notification and a coordination window. Deployment is not complete until verification (step 4).

4. Verify replacement deployment. Active verification at every location: the new certificate is presented, the service is healthy, the configuration is current. Verification has to be specific — confirming the new certificate fingerprint, not just that a valid certificate is in place.

5. Revoke the original certificate. With replacements deployed and verified, the original certificate is revoked through the issuing CA. Revocation is logged with the reason code (key compromise, superseded, cessation of operation, privilege withdrawn, affiliation changed, certificate hold). The reason code matters for compliance and downstream consumers.
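The step 4 gate lends itself to automation, because a fingerprint comparison is unambiguous where "a valid certificate is in place" is not. A minimal sketch using only the Python standard library; the function names are hypothetical, and how each location's presented certificate is collected is left out (for TLS endpoints, `ssl.get_server_certificate((host, port))` returns the PEM certificate the server presents).

```python
import hashlib
import ssl


def sha256_fingerprint(pem: str) -> str:
    """SHA-256 fingerprint (lowercase hex) of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def unverified_locations(observed_pem: dict[str, str], new_fingerprint: str) -> list[str]:
    """Locations not yet presenting the replacement certificate.

    observed_pem maps each location found in step 1 to the PEM it currently
    presents; an empty string (probe failed) counts as unverified.
    """
    return sorted(
        loc for loc, pem in observed_pem.items()
        if not pem or sha256_fingerprint(pem) != new_fingerprint
    )


def may_revoke(blast_radius: set[str], observed_pem: dict[str, str],
               new_fingerprint: str) -> bool:
    """Step 5 is allowed only when every blast-radius location is verified."""
    never_observed = blast_radius - set(observed_pem)
    return not never_observed and not unverified_locations(observed_pem, new_fingerprint)
```

A location that was identified in step 1 but never probed blocks revocation just as firmly as one still presenting the old certificate.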

The whole process can take minutes (cloud-native estate, single team, automated everything) or weeks (regulated estate, multiple teams, manual coordination). The operating model has to support both.

The key-compromise emergency pattern

When the private key is genuinely exposed, the calculus changes. The risk of an attacker exploiting the compromised key during the replace-then-revoke window may exceed the risk of service disruption from immediate revocation. The decision is contextual:

Revoke immediately if the compromise is confirmed, the attacker is known to have the key, the service exposed by the certificate is high-value, and brief service disruption is preferable to continued exposure.

Replace-then-revoke if the compromise is suspected but unconfirmed, the certificate is widely deployed, the immediate impact of revocation would be substantial, and the additional exposure window is bounded.

The decision is typically made by the incident commander in consultation with the executive owner of the affected service. The operating model defines who has authority to make this call, what information they need, and what the post-decision process looks like. It does not pre-decide the answer because the right answer depends on the specifics of the compromise.
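The two profiles above can be written down as decision support, which helps ensure the incident commander is handed the right facts. A minimal sketch; the field names are hypothetical, and any combination that matches neither profile is deliberately escalated rather than decided, consistent with the operating model not pre-deciding the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompromiseFacts:
    confirmed: bool               # compromise confirmed rather than suspected
    attacker_has_key: bool        # attacker known to hold the private key
    high_value_service: bool
    outage_preferable: bool       # brief disruption preferable to continued exposure
    widely_deployed: bool
    revocation_impact_high: bool  # immediate revocation would cause substantial impact
    window_bounded: bool          # replace-then-revoke exposure window has a known end


def recommend(f: CompromiseFacts) -> str:
    """Match the facts against the two profiles; anything mixed goes to a human."""
    if f.confirmed and f.attacker_has_key and f.high_value_service and f.outage_preferable:
        return "revoke immediately"
    if (not f.confirmed and f.widely_deployed
            and f.revocation_impact_high and f.window_bounded):
        return "replace-then-revoke"
    return "escalate to incident commander"
```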

Revocation versus removal

Revocation and removal are different operations and they get conflated.

Revocation is the act of marking a certificate as no longer trusted, through the CA's revocation infrastructure (CRL, OCSP). The certificate physically still exists on the deployed services, but verifiers that check revocation will reject it. Revocation is the cryptographic statement “this certificate is no longer valid”.

Removal is the act of taking the certificate off the service that uses it. The certificate file is deleted, the service configuration is updated to use a different certificate, the trust store is cleaned. Removal is the operational statement “this certificate is no longer in use”.

A certificate can be revoked but not removed (the typical state during the revocation propagation window). It can be removed but not revoked (a certificate that was decommissioned without going through the revocation workflow). The mature operating model tracks both states and recognises that they require different operational actions.
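Tracking the two states separately is straightforward once they are recorded as independent fields. A minimal sketch with a hypothetical `CertState` record; the action wording is illustrative.

```python
from dataclasses import dataclass


@dataclass
class CertState:
    """Revocation (cryptographic status) and removal (deployment) are independent."""
    revoked: bool
    removed: bool
    expired: bool = False


def next_actions(state: CertState) -> list[str]:
    """Operational actions still owed for one certificate."""
    actions: list[str] = []
    if not state.removed:
        # Still deployed somewhere, e.g. revoked but not yet removed.
        actions.append("remove from every deployed location and trust store")
    if not state.revoked and not state.expired:
        # Still valid and unrevoked, e.g. removed without the revocation workflow.
        actions.append("revoke with an explicit reason code, or record why not")
    return actions
```

The second action deliberately allows "record why not", since skipping revocation for a decommissioned certificate can be a defensible, documented decision.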

Where revocation breaks

No tested process. Revocation is exercised so rarely in most organisations that the process is theoretical. The first time it is needed at incident speed, the team discovers the playbook references tools that have changed, contacts that have left, and approval paths that no longer exist. The fix is to tabletop the revocation process at least annually, treating it as a first-class incident response exercise.
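Parts of the staleness check can run continuously between exercises. A minimal sketch, assuming a hypothetical runbook record holding the date of the last exercise and the named contacts, plus a set of current staff exported from your directory; the 365-day default mirrors the annual tabletop and does not replace the exercise itself.

```python
from datetime import date, timedelta
from typing import Optional


def runbook_findings(
    last_exercised: Optional[date],
    today: date,
    runbook_contacts: set[str],
    current_staff: set[str],
    max_age: timedelta = timedelta(days=365),
) -> list[str]:
    """Flag a revocation runbook that is overdue for exercise or names departed staff."""
    findings: list[str] = []
    if last_exercised is None:
        findings.append("never exercised")
    elif today - last_exercised > max_age:
        findings.append(f"last exercised {(today - last_exercised).days} days ago")
    # Contacts named in the runbook who no longer appear in the directory export.
    for person in sorted(runbook_contacts - current_staff):
        findings.append(f"contact no longer in directory: {person}")
    return findings
```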

Incomplete blast radius identification. The team revoked the certificate believing it protected a single service. Three other services using copies of the same certificate are now broken, and no-one knew those services existed. This is a discovery failure surfaced during revocation. The fix is to make blast-radius identification an explicit step in the revocation runbook, supported by the quality of the upstream discovery data: the radius can only be identified in advance if discovery has surfaced the dependencies.
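Where network scan data exists, the inventory can be reconciled against what endpoints actually present before any revocation. A minimal sketch, assuming a hypothetical scan export mapping each endpoint to the SHA-256 fingerprint it presented; it only finds network-facing consumers, so key copies in backups, DR and escrow still depend on the inventory.

```python
def undiscovered_consumers(scan: dict[str, str], fingerprint: str,
                           inventory: set[str]) -> set[str]:
    """Endpoints presenting the certificate that discovery never recorded.

    scan maps endpoint -> observed SHA-256 fingerprint (hex, any case,
    colons allowed); inventory is the set of endpoints the discovery data
    associates with this certificate.
    """
    def norm(fp: str) -> str:
        return fp.replace(":", "").lower()

    target = norm(fingerprint)
    presenting = {ep for ep, fp in scan.items() if norm(fp) == target}
    return presenting - inventory
```

A non-empty result means the blast radius is incomplete and revocation should wait.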

Revocation propagation assumed instantaneous. The team revoked the certificate and assumed the revocation took effect immediately. CRLs and OCSP responses propagate over hours; cached copies may persist longer. Verifiers that don't check revocation never reflect the change. The fix is to align operational expectations with the technical reality — document the propagation window per CA, and treat revocation as effective only after propagation is verified, not at the moment the revocation is logged.
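Treating revocation as effective only after propagation is verified implies a concrete check. A minimal sketch, assuming hypothetical probes (one per network zone or verifier population) that periodically query the CRL or OCSP endpoint the way local verifiers would and report what they see.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    vantage_point: str    # hypothetical probe, e.g. one per network zone
    status: str           # "good", "revoked" or "unknown", as the probe saw it
    observed_at: datetime


def propagation_verified(
    observations: list[Observation], revoked_at: datetime, expected: set[str]
) -> bool:
    """True only if every expected vantage point's latest observation since the
    revocation was logged reports "revoked"."""
    latest: dict[str, Observation] = {}
    for obs in observations:
        if obs.observed_at < revoked_at:
            continue  # pre-revocation data says nothing about propagation
        current = latest.get(obs.vantage_point)
        if current is None or obs.observed_at > current.observed_at:
            latest[obs.vantage_point] = obs
    return all(vp in latest and latest[vp].status == "revoked" for vp in expected)
```

Using the latest observation per probe, rather than any "revoked" sighting, catches responders behind a load balancer that answer inconsistently.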

No reason-code discipline. Every revocation gets the default reason code (often “unspecified”) because no-one trained the team on the codes. Compliance auditors flag this. Downstream consumers (browser vendors, operating systems, federation partners) cannot interpret the revocation appropriately. The fix is to make the RFC 5280 reason codes part of the operational standard — train the team, encode the choice in the revocation runbook, and audit reason-code usage as part of the revocation review.
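Encoding the codes in tooling removes the temptation to default. A minimal sketch: the enumeration values are the CRLReason codes defined in RFC 5280, while the audit helper, its `(team, reason)` input format and its threshold are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class CRLReason(IntEnum):
    """CRLReason codes from RFC 5280, section 5.3.1 (value 7 is not used)."""
    unspecified = 0
    keyCompromise = 1
    cACompromise = 2
    affiliationChanged = 3
    superseded = 4
    cessationOfOperation = 5
    certificateHold = 6
    removeFromCRL = 8
    privilegeWithdrawn = 9
    aACompromise = 10


def teams_defaulting(log: list[tuple[str, CRLReason]], threshold: float = 0.2) -> list[str]:
    """Teams whose share of 'unspecified' revocations exceeds the threshold.

    The 20% default threshold is an arbitrary example, not a standard.
    """
    totals = Counter(team for team, _ in log)
    defaulted = Counter(team for team, reason in log if reason == CRLReason.unspecified)
    return sorted(team for team in totals if defaulted[team] / totals[team] > threshold)
```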

Revocation-only response to suspected compromise. The team revoked the certificate but didn't rotate the private key — and didn't investigate how the suspected compromise occurred. Revocation is a control on the certificate; key rotation is a control on the underlying secret; incident investigation is the control on the threat. The fix is a compromise-response runbook that requires all three controls in sequence — revocation, key rotation, and root-cause investigation — rather than treating revocation as the complete response.
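The runbook requirement can be enforced at incident closure. A minimal sketch with hypothetical names: the incident record lists which of the three controls are complete, and closure is refused until all three are.

```python
from enum import Enum


class Control(Enum):
    """The three controls a compromise response needs, in runbook order."""
    REVOKE_CERTIFICATE = "revocation"                    # control on the certificate
    ROTATE_KEY = "key rotation"                          # control on the underlying secret
    INVESTIGATE_ROOT_CAUSE = "root-cause investigation"  # control on the threat


def outstanding_controls(completed: set[Control]) -> list[Control]:
    """Controls still missing, in the order the runbook lists them."""
    return [c for c in Control if c not in completed]


def can_close_incident(completed: set[Control]) -> bool:
    """Revocation alone never closes a suspected-compromise incident."""
    return not outstanding_controls(completed)
```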

Maturity progression for revocation

The five-level PKI operational maturity model introduced in the pillar page maps onto the revocation domain as follows.

Level 1 — Ad-hoc. No documented revocation process. When revocation is needed — an incident, an audit finding, a CA distrust event — someone improvises by reading the CA's documentation in real time. Reason codes are not used or consistently applied. The blast radius is discovered during the revocation, not before.

Level 2 — Tooled. A CLM exists with a revocation function. Some team members know how to use it. The technical mechanics work but the operational process is undocumented. Most actual revocations are decommissioning revocations done casually; the high-stakes scenarios (key compromise, mis-issuance) have never been executed and the team is not confident they would work.

Level 3 — Operationalised. Revocation runbooks exist. Replace-then-revoke is the documented pattern with explicit verification gates. Reason codes are part of the standard process. The team understands the workflow and the difference between revocation and removal. The blast-radius identification is connected to the discovery function.

Level 4 — Integrated. The replace-then-revoke runbook is exercised annually via tabletop or live drill. The operational team knows the playbook because they have run it. Revocation events are integrated with incident management, change management, and the broader incident response process. The decision authority for emergency revocation is documented and the named people know they hold it.

Level 5 — Intelligent. Revocation patterns produce operational intelligence. Recurring blast-radius surprises identify discovery gaps. Recurring revocation reasons identify policy weaknesses upstream. Revocation becomes rare not because the process is avoided but because the operating model prevents the conditions that require it — fewer mis-issuances, faster compromise detection, better key hygiene. When revocation does happen, it is fast and uneventful.

Most enterprises sit between levels 1 and 2 on revocation. The progression to level 3 takes three to six months once the runbook is documented. The progression to level 4 requires the discipline of annual tabletop exercises — the structural change that most organisations underestimate, because exercising a process you hope never to need is hard to justify against current operational pressure. It is justified anyway, because the alternative is discovering the process is broken when it matters.
