What does a PKI support model define?

A PKI support model defines five components. Team structure: who is on the team, their roles, and how they relate to adjacent teams. RACI per operational domain: who is responsible, accountable, consulted, and informed for each operational responsibility. Coverage model: when the team is available and what happens outside their hours — business hours only, business hours plus on-call, or follow-the-sun. Escalation path: who the team escalates to when an issue exceeds their authority or capability. Joiner/leaver process: how new team members get the access, training, and authority they need, and how leaving members hand over their work and have access revoked. Without explicit definition of these five components, every other domain in the operating model fails in the same predictable way: work that has no named owner doesn't get done.

How do you size a PKI operations team?

Team size depends on the workload and the skill level it requires. The volume drivers are the number of distinct platforms integrated, the number of distinct service teams supported, the number of incidents per quarter, the number of normal-change events per quarter, and the number of new onboardings per year. Certificate count alone is a poor sizing metric. Typical patterns: fewer than 1,000 certificates with mature automation needs 0.5–1 FTE; 1,000–10,000 certificates needs a dedicated team of 1–3 FTE; 10,000–50,000 certificates needs 3–8 FTE with specialisation by domain; 50,000+ certificates needs 8+ FTE, typically split between operations and engineering. The variable these ratios miss is the percentage of work that is incident-driven rather than steady-state: a team with 60 percent incident-driven work needs roughly 1.5× the FTE of a pure steady-state team, because incident response is bursty and capacity-intensive.

How do you build a useful PKI RACI?

Producing a useful RACI is a half-day exercise that most organisations have never done. The format is a grid where columns are roles (named functions, not named people) and rows are operational responsibilities. Cells contain R (responsible — does the work), A (accountable — owns the outcome), C (consulted — provides input), or I (informed — told what happened). Rules: exactly one Accountable per row (multiple A means accountability is not actually defined), multiple Responsibles are fine, empty rows indicate gaps the operating model has not addressed. The half-day exercise produces the artefact; the harder work is socialising it with the named functions and getting agreement. Most organisations have a draft RACI that has never been agreed by all parties — the agreement is what makes the document operational.

What are the three PKI coverage models?

Business hours only with general escalation: the team works business hours; incidents outside hours go to the general infrastructure on-call team, who escalate to certificate operations the next business day. This works for estates where an after-hours certificate incident can acceptably wait until morning. Business hours plus on-call: the team works business hours with an on-call rotation for after-hours incidents; this is common for medium-to-large estates. Follow-the-sun coverage: the team is distributed across time zones for continuous coverage; this fits global organisations with very large estates. The coverage decision is parameterised against estate size, geographic distribution, business criticality, and the cost-benefit of after-hours response. Most organisations under-invest in coverage relative to the risk.

Why is the joiner/leaver process critical for PKI operations?

Mundane and consequential: most operational drift originates here. The joiner process should be checklist-driven and cover access to the CLM, CAs, and platforms; membership in team channels; knowledge transfer on the operating model and runbooks; inclusion in on-call after a documented qualification period; and a documented role and responsibilities. The leaver process is the one that fails most often: access remains active after departure, knowledge held only by the leaver is lost, and certificates they were the named owner of become orphaned. The fix is making the leaver process a defined exit gate, where completion of the checklist is required before access is finally removed. The checklist covers knowledge transfer of work-in-progress, documentation of tribal knowledge they alone held, removal from on-call, revocation of CLM/CA/platform access, closure of privileged access accounts, and reassignment of named ownership.

What is the single point of failure risk in PKI operations teams?

One engineer holds essential knowledge that no-one else has. They are unavailable; the team is paralysed. This is the most common operational risk in mature PKI teams because as the team grows, individuals naturally develop deep specialisation in particular platforms, CAs, or workflows. The fix has three parts: documented runbooks (knowledge that cannot be lost has to be in writing, not in one person's head); shared incident response (incidents are run with at least two engineers so knowledge transfers organically); and rotation of who handles which kinds of work (preventing accidental specialisation that becomes single-point-of-failure).

PKI Support Model and RACI

The support model is the part of the operating model that turns process into action. The eleven domains described elsewhere in this library define what needs to happen; the support model defines who does it, when, and through what channels. Without an explicit support model, every other domain in the operating model fails in the same predictable way: work that has no named owner doesn't get done.

Part of: Enterprise PKI Operating Model — the pillar page for the operations library.

The support model is also the most context-dependent part of the operating model. A 200-engineer organisation cannot run the structure that fits a 50,000-employee bank, and neither fits a small specialist team. Parameterisation matters more here than anywhere else. The principles are universal; the implementation depends on the organisation.


What a support model defines

Five components, each of which has to be answered explicitly.

Figure 1. The support-model topology. Five components, each addressing a distinct operational question, all interdependent. Team structure is the foundation; RACI translates structure into responsibility; coverage extends responsibility across time; escalation extends authority across hierarchy; joiner/leaver extends responsibility across personnel changes. Gaps in any component degrade the others.

Team structure. Who is on the team, what their roles are, how they relate to adjacent teams (security, infrastructure, application teams, compliance). The team can be a single person, a small dedicated team, or a federated function spread across multiple groups — the structure depends on volume and complexity.

RACI per operational domain. For each of the eleven domains in the operating model, who is responsible (does the work), who is accountable (owns the outcome), who is consulted (provides input), who is informed (told what happened). The RACI is the artefact that makes the support model concrete.

Coverage model. When is the team available, and what happens outside their hours? On-call rotation, follow-the-sun coverage, or business-hours-only with escalation to general infrastructure support outside hours. The coverage model has to match the operational risk profile of the certificate estate.

Escalation path. When the operations team cannot resolve an issue, who do they escalate to? When their authority is exceeded, who do they consult? The escalation path is where governance meets operations — the steering function described in the governance spoke is typically at the top of this path.

Joiner/leaver process. How do new team members get the access, training, and authority they need to operate? How do leaving members hand over their work and have their access revoked? The joiner/leaver process is mundane but it is the most common source of operational drift — access and knowledge accumulate without explicit transition discipline.

Sizing the team

The single question that determines team size: how much certificate operations work does the estate generate, and at what skill level?

Volume drivers. The number of certificates under management is one driver, but not the dominant one. The dominant drivers are:

Number of distinct platforms integrated (each platform requires platform-specific operational knowledge).
Number of distinct service teams the operations team supports (each team is a relationship that needs maintenance).
Number of incidents per quarter (incident response is bursty and capacity-intensive).
Number of normal-change events per quarter (each requires review, scheduling, coordination).
Number of new onboardings per year (onboarding is project-shaped work that interrupts steady-state operations).

Skill-level drivers. Certificate operations work splits into routine operational tasks (where automation reduces human effort), specialist work (incident response, design decisions, complex onboarding), and consulting (helping service teams understand certificate operations). The team has to have capacity at each skill level.

Typical sizing patterns. The patterns we see across enterprise estates:

Sub-1,000 certificates with mature automation: 0.5–1 FTE (often a portion of a security engineer's time).
1,000–10,000 certificates: 1–3 FTE dedicated team.
10,000–50,000 certificates: 3–8 FTE dedicated team with specialisation by domain.
50,000+ certificates: 8+ FTE team, typically split between operations and engineering functions.

These are starting points, not prescriptions. Estates with higher complexity (multi-cloud, regulated, high incident rate) need more capacity per certificate than the simple ratios suggest. Estates with strong automation need less. The right number for a specific organisation depends on the actual workload, which the operating model surfaces through measurement.

The dominant variable that the simple ratios miss is the percentage of work that is incident-driven versus steady-state. A team whose work is 80% steady-state can be sized against the steady-state volume — automation handles most of it, the team handles the exceptions. A team whose work is 60% incident-driven needs roughly 1.5× the FTE of a pure steady-state team because incident response is bursty and capacity-intensive: the team has to be sized against the peaks, not the average. Mature operating models systematically reduce the incident-driven percentage by fixing the operational gaps that cause incidents.
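The sizing reasoning above can be sketched in a few lines of Python. This is a rough illustration, not a calibrated model. The bands are copied from the typical sizing patterns; treating them as a steady-state baseline, assigning boundary values such as exactly 10,000 certificates to the higher band, and scaling the incident multiplier linearly through the single data point in the text (60% incident-driven work needs roughly 1.5× the FTE) are all assumptions. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Bands from the typical sizing patterns:
# (exclusive upper bound on certificate count, low FTE, high FTE).
# None marks the open-ended top band.
BASELINE_BANDS = [
    (1_000, 0.5, 1.0),
    (10_000, 1.0, 3.0),
    (50_000, 3.0, 8.0),
    (None, 8.0, None),
]


def baseline_fte_band(certificates: int) -> Tuple[float, Optional[float]]:
    """Return the (low, high) FTE band for an estate of the given size."""
    for upper, low, high in BASELINE_BANDS:
        if upper is None or certificates < upper:
            return (low, high)
    raise AssertionError("unreachable: the last band is open-ended")


def incident_multiplier(incident_fraction: float) -> float:
    """Assumed linear scaling: 0% incident-driven -> 1.0x, 60% -> 1.5x."""
    if not 0.0 <= incident_fraction <= 1.0:
        raise ValueError("incident_fraction must be between 0 and 1")
    return 1.0 + incident_fraction * 0.5 / 0.6


def sized_fte_band(
    certificates: int, incident_fraction: float
) -> Tuple[float, Optional[float]]:
    """Scale the baseline band by the incident multiplier, to one decimal."""
    low, high = baseline_fte_band(certificates)
    multiplier = incident_multiplier(incident_fraction)
    scaled_high = None if high is None else round(high * multiplier, 1)
    return (round(low * multiplier, 1), scaled_high)
```

For example, `sized_fte_band(5_000, 0.6)` scales the 1–3 FTE band by 1.5. The result is only a starting range; the platform, service-team, and onboarding drivers listed earlier are deliberately left out of this sketch and still have to be weighed by hand.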


The RACI exercise

Producing a useful RACI for PKI is a half-day exercise that most organisations have never done. The format: a grid where the columns are roles (named functions, not named people) and the rows are operational responsibilities. The cells contain R, A, C, or I (or are blank, indicating no involvement).

The roles are typically:

Certificate operations team lead. Accountable for the operations function.
Certificate operations engineers. Responsible for execution.
PKI executive owner. Accountable for the overall PKI function.
Service owners. Responsible for their service's certificate state, with varying levels of automation support.
Security operations centre. Responsible for security monitoring and response.
Compliance and audit. Consulted on policy, informed on operations.
Identity and access management. Consulted on identity-related certificate use.
Infrastructure and platform teams. Consulted on platform-specific integrations.
Change advisory board. Accountable for change governance, consulted or informed depending on classification.

The responsibilities, derived from the operational domains:

Policy definition and updates.
Certificate issuance (per workflow type).
Certificate renewal (per service class).
Certificate revocation (routine and emergency).
Discovery operations.
Anomaly remediation.
Platform onboarding.
Trust-store management.
Incident response (per incident category).
Change execution (per classification).
Reporting and KPIs.
Joiner/leaver process.

A useful RACI has exactly one Accountable for each row. Multiple Responsibles are fine; multiple Accountables means the accountability is not actually defined. Empty rows (no Accountable) are gaps the operating model has not addressed.

The half-day exercise produces the RACI artefact. The harder work is socialising it with the named functions and getting agreement that the assignments are correct. Most organisations have a draft RACI that has never been agreed by all parties; the agreement is what makes the document operational.
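The one-Accountable rule is easy to check mechanically once the grid exists in any structured form. The sketch below assumes the RACI is held as a dictionary mapping each responsibility to a dictionary of role and letter; that representation and the function name are hypothetical, and no particular tool is implied.

```python
from typing import Dict, List

VALID_LETTERS = {"R", "A", "C", "I"}


def raci_problems(raci: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return rule violations per responsibility; an empty dict means clean.

    Rules checked: every cell holds R, A, C, or I, and every row has
    exactly one Accountable. Multiple Responsibles are allowed.
    """
    problems: Dict[str, List[str]] = {}
    for responsibility, assignments in raci.items():
        issues: List[str] = []
        invalid = sorted(set(assignments.values()) - VALID_LETTERS)
        if invalid:
            issues.append("invalid letters: " + ", ".join(invalid))
        accountable = sorted(
            role for role, letter in assignments.items() if letter == "A"
        )
        if not accountable:
            issues.append("no Accountable")
        elif len(accountable) > 1:
            issues.append("multiple Accountables: " + ", ".join(accountable))
        if issues:
            problems[responsibility] = issues
    return problems
```

A check like this only confirms that the grid is well-formed. It says nothing about whether the named functions have agreed to their assignments, which remains the harder part.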


Coverage and on-call

Certificate incidents, particularly expiry incidents, can occur outside business hours. Renewal pipeline failures that happen on a Friday evening, if undetected, can produce a Monday morning outage. The coverage model has to match the risk.

Three patterns:

Business-hours only with general escalation. The certificate operations team works business hours. Incidents outside hours are handled by the general infrastructure on-call team, who escalate to certificate operations the next business day. This pattern works for estates where an after-hours certificate incident can acceptably wait until morning.

Business-hours plus on-call. The certificate operations team works business hours, with an on-call rotation for after-hours incidents. The on-call engineer responds to certificate-specific incidents directly. This pattern is common for medium-to-large estates where the impact justifies on-call capacity.

Follow-the-sun coverage. The certificate operations team is distributed across time zones, providing continuous coverage. This pattern fits global organisations with very large estates and high operational tempo.

The coverage decision is parameterised against estate size, geographic distribution, business criticality, and the cost-benefit calculation for after-hours response. Most organisations under-invest in coverage relative to the risk; certificate-driven incidents have outsize customer impact and the cost of after-hours response is small compared to the cost of a Monday-morning outage.
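The cost-benefit part of that decision can be written down as simple arithmetic. The sketch below is one possible framing, not a prescribed method: the function name and all four inputs are hypothetical, and an organisation would have to estimate them from its own incident history.

```python
def annual_net_benefit_of_on_call(
    after_hours_incidents_per_year: float,
    avg_cost_if_wait_until_morning: float,
    avg_cost_with_on_call_response: float,
    annual_on_call_cost: float,
) -> float:
    """Estimated yearly saving from adding an on-call rotation.

    A positive result suggests on-call pays for itself; a negative result
    suggests business-hours-only coverage is cheaper on these estimates.
    """
    avoided_per_incident = (
        avg_cost_if_wait_until_morning - avg_cost_with_on_call_response
    )
    avoided_per_year = after_hours_incidents_per_year * avoided_per_incident
    return avoided_per_year - annual_on_call_cost
```

The calculation covers only the cost-benefit input. Estate size, geographic distribution, and business criticality still shape the choice between the three patterns.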

The joiner/leaver process

Mundane and consequential. Most operational drift originates here.

Joiner process. A new team member needs:

Access to the CLM (with role-appropriate permissions).
Access to the CAs (typically read access for most engineers, write access for specific roles).
Access to the platforms the team supports (cloud accounts, on-premise systems, ticketing systems).
Membership in the team's communication channels.
Knowledge transfer on the operating model, the runbooks, the team's history with key services.
Inclusion in the on-call rotation (after a documented qualification period).
Documentation of their role and responsibilities.

The process should be checklist-driven so that nothing is missed and the new team member is genuinely productive within their first weeks rather than discovering missing access and knowledge gaps over months.

Leaver process. A departing team member needs:

Knowledge transfer of any work-in-progress.
Documentation of any tribal knowledge they have been the sole holder of.
Removal from the on-call rotation.
Revocation of access to the CLM, CAs, platforms, and channels.
Closure of any privileged access accounts (CyberArk or equivalent).
Reassignment of any specific certificates or services they were the named owner of.

The leaver process is the one that fails most often. Access remains active long after the person has left. Knowledge they held alone is lost. Certificates they were the named owner of become orphaned. The fix is making the leaver process a defined exit gate — completion of the checklist is required before the leaver's access is finally removed.
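An exit gate of this kind can be sketched as a check that blocks final access removal until the hand-over items are complete. Splitting the leaver checklist into gate items (hand-over work that needs the leaver's access) and final steps (the access removal itself) is an interpretation made for this sketch, and all names in it are hypothetical.

```python
from typing import Iterable, List

# Hand-over items that must be complete before access is finally removed.
GATE_ITEMS = (
    "knowledge transfer of work-in-progress",
    "tribal knowledge documented",
    "removed from on-call rotation",
    "named ownership reassigned",
)

# Steps performed once the gate has been passed.
FINAL_STEPS = (
    "revoke CLM, CA, platform, and channel access",
    "close privileged access accounts",
)


def outstanding_items(completed: Iterable[str]) -> List[str]:
    """Return the gate items not yet completed, in checklist order."""
    done = set(completed)
    return [item for item in GATE_ITEMS if item not in done]


def may_remove_final_access(completed: Iterable[str]) -> bool:
    """The gate: final access removal proceeds only when nothing is outstanding."""
    return not outstanding_items(completed)
```

The gate is not a reason to leave access active indefinitely. The hand-over items have to be scheduled before the leaver's last day so that the final steps can run on time.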

Where the support model breaks

Implicit ownership. No-one was ever explicitly assigned as the owner of the certificate operations function. The team that handles it has formed organically. When that team is reorganised, the function is re-discovered by the next team after the next incident. The fix is explicit assignment, documented in the org chart and the operating model.

RACI as documentation, not practice. The RACI document exists. No-one has been told what their role on it is. When work needs to happen, the named-Responsible is unaware and the named-Accountable is not paying attention. The fix is socialisation — every named role on the RACI is explicitly briefed on what they have agreed to and signs off on the assignment.

Coverage that doesn't match risk. The team is staffed for business-hours work. After-hours incidents happen. The general infrastructure on-call team handles them, badly, because they don't have certificate-specific expertise. The fix is honest assessment of after-hours risk and appropriate coverage investment.

Joiner training that doesn't survive past the first week. The new team member completed the onboarding checklist. They are still missing significant operational knowledge that the team's senior members assume they have. They make their first mistake six weeks in. The fix is structured ongoing knowledge transfer for the first three to six months, with explicit checkpoints.

Single-point-of-failure team members. One engineer holds essential knowledge that no-one else has. They are unavailable; the team is paralysed. The fix is documented runbooks, shared incident response, and rotation of who handles which kinds of work. Knowledge that cannot be lost has to be in writing, not in one person's head.
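Single-point-of-failure risk can be made visible with a simple coverage map of who is able to handle each kind of work. The sketch below assumes the team maintains such a map by hand; the function name, the area names, and the threshold of two engineers are illustrative assumptions.

```python
from typing import Dict, Iterable, List


def single_points_of_failure(
    coverage: Dict[str, Iterable[str]], minimum_engineers: int = 2
) -> Dict[str, List[str]]:
    """Return knowledge areas covered by fewer than `minimum_engineers` people.

    `coverage` maps a knowledge area to the engineers who can handle it
    without help. Areas with nobody listed are reported as well.
    """
    at_risk: Dict[str, List[str]] = {}
    for area, engineers in coverage.items():
        distinct = sorted(set(engineers))
        if len(distinct) < minimum_engineers:
            at_risk[area] = distinct
    return at_risk
```

The areas it reports are the natural candidates for the next runbook, the next shared incident, or the next rotation of work.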

Maturity progression for the support model

The five-level PKI operational maturity model introduced in the pillar maps onto the support model domain as follows.

Level 1 — Ad-hoc. No-one is explicitly assigned as the owner of the certificate operations function. Whoever happens to know about the CLM handles requests. There is no RACI, no documented coverage model, no joiner/leaver process specific to PKI. New team members learn by accident; departing team members take their knowledge with them.

Level 2 — Tooled. A team exists, identifiable by name, with a team lead. A draft RACI may exist as an internal document but has not been agreed by the named functions. Coverage is “whatever the team can do during business hours” with general infrastructure on-call as fallback. Joiner/leaver processes exist but are checklist-driven only for access — knowledge transfer is informal.

Level 3 — Operationalised. Team structure is documented in the operating model. The RACI exists and assigns roles to operational responsibilities. Coverage model matches the operational risk — typically business-hours plus on-call for medium estates. Joiner/leaver checklist covers access, channels, and knowledge transfer. Single-point-of-failure risk is recognised but not yet systematically mitigated.

Level 4 — Integrated. Two structural elements operate together: the RACI is socialised and signed by all named roles (it represents agreed allocation of responsibility, not unilateral assignment), and joiner/leaver process is exercised with rigour — every joiner is fully equipped before declaring onboarding complete; every leaver passes the exit gate before final access removal. The team structure has been deliberately designed to avoid single-point-of-failure roles through documentation discipline and rotation.

Level 5 — Intelligent. Access provisioning runs through an automated joiner/leaver workflow connected to identity and access management — joiner triggers automatically grant the catalogued access set; leaver triggers automatically revoke. The RACI is a living document, updated as the operating model evolves and re-signed at the named-role level on a quarterly cadence. Team capacity is data-driven from the operational metrics, not estimated from headcount ratios.

Most enterprises sit at level 1 or 2 because the support model is treated as an HR matter rather than an operational discipline. Progression to level 3 takes a quarter of focused work to produce the RACI and document the coverage model and the joiner/leaver process. Progression to level 4 requires the discipline of socialising the RACI with the named functions, getting it signed, and treating joiner/leaver as exit gates rather than checklists. Progression to level 5 typically requires investment in identity-and-access automation that organisations defer because it crosses team boundaries.