AxelSpire

Designing PKI for the National Payment Utility: Architecture Options for Non-Bank PSPs

Issuer-key custody, ceremony rehearsal, and OCSP availability tied to payment SLAs are the items SARB and counterparties see first.

The SARB Authorisation Framework and Exemption Notice published alongside the Payments Ecosystem Modernisation (PEM) Programme open the National Payment System to non-bank Payment Service Providers (PSPs) at parity with banks. The trade-off: cybersecurity, AML, client-fund, and operational-resilience expectations apply to non-banks at full regulator standard. Non-bank PSPs that historically rode on a sponsor bank's PKI now need their own.

This page is for architects and engineering leads at non-bank PSPs preparing for direct NPU participation. It walks through the six PKI problems that determine authorisation outcomes, with vendor-neutral guidance on each.

TL;DR

If you are a non-bank PSP preparing for NPU participation under SARB's activity-based Authorisation Framework, the PKI work splits cleanly into six problems:

  1. CA topology. Build, buy, or hybrid. Most non-bank PSPs should start with managed PKI plus a small internal CA for service mesh.
  2. Key custody. FIPS 140-3 Level 3 HSMs for any signing key the NPU or counterparties will trust. Documented and rehearsed key ceremony.
  3. Lifecycle automation. ACME (RFC 8555) for everything that supports it; ARI (RFC 9773) for renewal coordination. The CA/Browser Forum SC-081v3 trajectory toward 47-day validity makes this non-optional. High-availability OCSP/CRL infrastructure tied to payment-rail SLAs.
  4. Crypto-agility. Algorithm-agnostic pipelines now. PQC algorithms (FIPS 203/204/205) when counterparties accept them.
  5. POPIA-aligned audit and governance. Issuance events logged, key custody documented, subject minimisation enforced.
  6. Validation, vendor governance, ongoing assurance. Sandbox cutover testing, managed-PKI vendor due diligence, KPIs tied to payment SLAs, named succession for ceremony roles.

This page covers each in turn, with a reference architecture and the operational gotchas Axelspire sees most often in SA engagements.

If you are still working through which trust model applies to which initiative, read PKI trust models for SA's federated payment and identity ecosystem first.

Architecture risk for a mid-stage non-bank PSP

Overall risk: 11 / 18 (MEDIUM). Deal-breakers: 0. Framework: SARB Authorisation Framework.

Per-axis breakdown

  • CA topology: LOW. Hybrid (private internal + managed public for external): internal CA for service-to-service, managed public for external-facing.
  • HSM custody and key ceremony: MEDIUM. Cloud HSM with documented ceremony: AWS CloudHSM / Azure Dedicated HSM / GCP Cloud HSM, ceremony documented and rehearsed annually.
  • Lifecycle automation: MEDIUM. Partial ACME on a subset: ACME on public TLS for some properties; private CA still manual or scripted.
  • Crypto-agility readiness: MEDIUM. Inventory in place, substitution untested: algorithm-aware inventory exists; hybrid issuance untested.
  • POPIA, audit, and governance: MEDIUM. Partial (policy exists, evidence weak): CP/CPS exist; audit cadence informal; POPIA mapping partial.
  • Validation, vendor governance, ongoing assurance: MEDIUM. Sandbox exists but parity is partial: it does not match production CA topology, HSM, or revocation services.

Recommendations

Extend ACME coverage and add ARI (MEDIUM)

Partial ACME leaves manual exception paths that erode operationally. Extend coverage to the long tail and add ARI integration on the CA side so renewal urgency can be broadcast through the protocol.
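The overall 11 / 18 figure in the example profile is consistent with a simple additive scheme in which each of the six axes scores 1 for LOW, 2 for MEDIUM and 3 for HIGH: one LOW plus five MEDIUM gives 11 of a possible 18. That weighting is our assumption, not a documented method of the assessment; the sketch only shows the arithmetic.

```python
# ASSUMED weights: the assessment's real scoring method is not documented here.
WEIGHTS = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}

# Per-axis ratings from the example profile.
profile = {
    "CA topology": "LOW",
    "HSM custody and key ceremony": "MEDIUM",
    "Lifecycle automation": "MEDIUM",
    "Crypto-agility readiness": "MEDIUM",
    "POPIA, audit, and governance": "MEDIUM",
    "Validation, vendor governance, ongoing assurance": "MEDIUM",
}

def overall_score(ratings: dict) -> tuple:
    """Return (score, maximum) under the assumed 1/2/3 weighting."""
    score = sum(WEIGHTS[level] for level in ratings.values())
    maximum = WEIGHTS["HIGH"] * len(ratings)
    return score, maximum

score, maximum = overall_score(profile)
print(f"{score} / {maximum}")  # 11 / 18
```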


The new regulatory frame

The SARB Authorisation Framework and Exemption Notice published alongside the PEM Programme adopt an activity-based regulatory model: the activity is regulated, not the entity, so non-banks that meet the bar participate directly in the National Payment System at parity with banks. The corollary is that governance, AML, cybersecurity, and client-fund segregation rules apply to non-banks at the same level as to banks.

The PEM Programme defines three foundational components: digital identity (with e-signatures and KYC), digital payments (PayShap and successors), and a data-sharing infrastructure layer. PKI is load-bearing across all three.

For the architect, this means the scope of PKI is broader than "TLS for our APIs." It includes:

  • ISO 20022 message signing for PayShap and NPU clearing.
  • Mutual TLS to NPU clearing infrastructure.
  • Code signing for any client SDK or mobile artefact distributed to customers or merchants.
  • KYC / e-signature signing keys, where the PSP issues signed evidence of customer onboarding.
  • Internal machine identity for service-to-service authentication, audit log integrity, and database encryption.
  • Where the PSP also issues verifiable credentials (PEMKey-aligned products, employer-issued credentials), a separate issuer signing infrastructure.

The problem statements below address each of these. In Axelspire's engagement experience, almost every non-bank PSP in SA has gaps in at least three of them.

Problem 1 — CA topology

There are three defensible options.

Option A: Internal CA only

A self-operated CA hierarchy (commonly EJBCA, AD CS, or smallstep) gives full policy control and the lowest per-certificate cost at scale. The cost is operational: HSM custody, root key ceremony, audit trail, vulnerability management, and personnel. For a non-bank PSP entering the NPU with a small operations team, this is usually a poor first choice. It becomes correct as the organisation matures or where regulator policy explicitly requires it.

Option B: Managed PKI / public CA only

A managed PKI service (DigiCert ONE, GlobalSign, Sectigo Managed PKI) or a public CA (Let's Encrypt, Google Trust Services, Sectigo public) covers most certificate types with strong automation and minimal internal operations. The constraints: public CAs cannot issue arbitrary internal certificates, ACME endpoints from the managed PKI may not match every internal protocol, and lifecycle is governed by the provider's policy (which is generally aligned with the WebPKI consensus, including SC-081v3).

Option C: Hybrid (recommended for most non-bank PSPs)

The architecture Axelspire ends up recommending in 80% of SA engagements:

                    ┌─────────────────────────────┐
                    │  Public CA (managed)        │
                    │  - External TLS             │
                    │  - Customer-facing APIs     │
                    │  - WebPKI-trusted certs     │
                    └─────────────────────────────┘
                                  │
                                  │ ACME
                                  ▼
                    ┌─────────────────────────────┐
                    │  Internal CA (small, well-  │
                    │  governed)                  │
                    │  - Service mesh             │
                    │  - Internal mutual TLS      │
                    │  - Code signing for         │
                    │    internal artefacts       │
                    └─────────────────────────────┘
                                  │
                                  │ HSM-backed signing
                                  ▼
                    ┌─────────────────────────────┐
                    │  Dedicated signing keys     │
                    │  (HSM)                      │
                    │  - ISO 20022 / PayShap      │
                    │  - NPU clearing             │
                    │  - Code signing (released)  │
                    │  - VC issuer keys           │
                    └─────────────────────────────┘

The dedicated signing keys are governed differently from the rest. They have a key ceremony, restricted personnel access, hardware-bound custody, and strict rotation policy. This is the layer where regulator and counterparty trust is built — it deserves disproportionate operational investment relative to its small certificate count.

The structural weakness this design tends to develop: each layer ends up with its own audit trail, its own renewal automation, and its own monitoring. Within 12–18 months you have three control planes. This is where 3AM applies — a unified policy and logging surface across multiple CA backends so that the topology stays clean even as the operational footprint grows.
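The hybrid split can be expressed as one routing and policy function in front of the three layers, writing to one audit log. A minimal sketch, assuming hypothetical purpose labels, service names, and an allowlist for the dedicated signing layer; it illustrates the single-policy-surface idea, not 3AM's actual interface.

```python
from dataclasses import dataclass, field
from enum import Enum

class Backend(Enum):
    PUBLIC_CA = "public-ca"      # managed public CA via ACME: external TLS
    INTERNAL_CA = "internal-ca"  # small internal CA: mesh, internal mTLS
    HSM_SIGNING = "hsm-signing"  # dedicated HSM keys: ISO 20022, NPU, releases, VC

# Hypothetical certificate purposes mapped to the three layers of Option C.
ROUTES = {
    "external-tls": Backend.PUBLIC_CA,
    "customer-api": Backend.PUBLIC_CA,
    "service-mesh": Backend.INTERNAL_CA,
    "internal-mtls": Backend.INTERNAL_CA,
    "internal-code-signing": Backend.INTERNAL_CA,
    "iso20022-signing": Backend.HSM_SIGNING,
    "npu-clearing": Backend.HSM_SIGNING,
    "release-code-signing": Backend.HSM_SIGNING,
    "vc-issuer": Backend.HSM_SIGNING,
}

# The signing layer is governed more tightly: only named callers (hypothetical).
HSM_ALLOWLIST = {"clearing-svc", "release-pipeline", "vc-issuer-svc"}

@dataclass
class ControlPlane:
    audit: list = field(default_factory=list)  # one log for every backend

    def route(self, purpose: str, requester: str) -> Backend:
        backend = ROUTES.get(purpose)
        allowed = backend is not None and (
            backend is not Backend.HSM_SIGNING or requester in HSM_ALLOWLIST
        )
        # Every decision, allowed or denied, lands in the same audit trail.
        self.audit.append({
            "purpose": purpose,
            "requester": requester,
            "backend": backend.value if backend is not None else None,
            "decision": "allow" if allowed else "deny",
        })
        if not allowed:
            raise PermissionError(f"{requester!r} may not obtain {purpose!r}")
        return backend

plane = ControlPlane()
print(plane.route("service-mesh", "payments-api"))      # Backend.INTERNAL_CA
print(plane.route("iso20022-signing", "clearing-svc"))  # Backend.HSM_SIGNING
```

Adding a CA backend then means adding routes and policy in one place, rather than growing a separate audit trail and monitoring stack per layer.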

Problem 2 — HSMs and key ceremony

The signing keys for ISO 20022, NPU participation, and any externally-trusted issuer role need hardware-bound custody. The reference points:

  • FIPS 140-3 Level 3 is the conservative baseline. Most SA banks already operate to this level for SAMOS and SWIFT connectivity.
  • Common Criteria EAL4+ is the European reference equivalent.
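  • Cloud HSMs (AWS CloudHSM, Azure Dedicated HSM, GCP Cloud HSM) are typically validated to FIPS 140-2 Level 3, with newer generations moving to FIPS 140-3 Level 3, and are increasingly accepted. Check the validation certificate of the specific module, since a FIPS 140-2 validation is not the same as the FIPS 140-3 baseline above. Confirm with SARB and your sponsor bank or counterparty whether residency and key-custody arrangements meet their authorisation expectations.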

A documented key ceremony is the artefact that demonstrates governance to regulators and counterparties. The minimum content:

  1. Pre-ceremony preparation: HSM provisioning, attestation of firmware and configuration.
  2. Roles: at least three witnesses (security officer, custodian, auditor), preferably with separation between SARB liaison, internal security, and external audit.
  3. Generation: keys generated inside the HSM, never extracted. Public key components extracted and signed.
  4. Activation: M-of-N quorum for activation (typically 3-of-5 or 2-of-3), with smartcards or quorum tokens distributed to separated custodians.
  5. Recording: video, signed attestation, key fingerprints recorded in tamper-evident storage.
  6. Rotation plan: defined rotation cadence (typically annual for signing CAs, longer for offline roots) and emergency rotation procedure.
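HSM quorum activation is enforced by the vendor's firmware and tokens, not by code you write. The property it depends on is that any M of N shares reconstruct a secret while fewer reveal nothing about it, which is what Shamir secret sharing provides. The textbook sketch below works over a prime field and is for illustration only: it is not a substitute for the HSM's own M-of-N mechanism, and whether a given HSM uses Shamir internally is vendor-specific.

```python
import secrets

# Mersenne prime 2^127 - 1; the field must be larger than any secret shared.
PRIME = 2**127 - 1

def split(secret: int, m: int, n: int) -> list:
    """Split `secret` into n shares (x, y); any m of them reconstruct it."""
    if not 0 <= secret < PRIME or not 1 <= m <= n:
        raise ValueError("invalid parameters")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares: list) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

# 3-of-5 quorum: any three custodians' shares recover the activation secret.
activation_secret = secrets.randbelow(PRIME)
shares = split(activation_secret, m=3, n=5)
assert combine(shares[:3]) == activation_secret
assert combine([shares[0], shares[2], shares[4]]) == activation_secret
```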

Axelspire has run key ceremonies for SA financial-services clients across the last decade. The single most common mistake: treating the ceremony as a one-time event and never rehearsing the rotation. When the rotation event finally arrives — usually in the middle of a regulatory deadline — the original participants have moved on and the documentation is incomplete.

Problem 3 — Lifecycle automation

ACME (RFC 8555) is the certificate enrolment protocol the entire WebPKI is moving to. ARI (RFC 9773) is the renewal-information protocol that lets CAs hint to consumers when to renew, including in mass-revocation scenarios.

Why this is now non-optional

The CA/Browser Forum ballot SC-081v3 was adopted in 2025, and its first reduction took effect on 15 March 2026, lowering the maximum validity of publicly trusted TLS certificates to 200 days. This is a CA/Browser Forum decision, not a SARB initiative; the misattribution is worth correcting wherever it appears in vendor documents. The phasing continues:

  Certificates issued        Maximum public TLS validity
  Until 14 March 2026        398 days
  From 15 March 2026         200 days
  From 15 March 2027         100 days
  From 15 March 2029         47 days

A 47-day validity period implies effective renewal at roughly 30 days. For an organisation with 1,000 public certificates, that is 33 renewals per day on average. Manual operations cannot serve this. ACME automation is the only path.
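The same arithmetic applies at each step of the schedule. A minimal sketch, assuming an evenly spread estate and clients that renew once two thirds of a certificate's lifetime has elapsed (a common client default, but an assumption here); under that rule a 47-day certificate renews after about 31 days, close to the rounder 30 days used above.

```python
def renewal_interval(validity_days: float, renew_fraction: float = 2 / 3) -> float:
    """Days between renewals if each certificate is replaced after
    `renew_fraction` of its lifetime (2/3 is an assumption, not a rule)."""
    return validity_days * renew_fraction

def renewals_per_day(cert_count: int, interval_days: float) -> float:
    """Average daily renewals for an estate whose expiries are evenly spread."""
    return cert_count / interval_days

# Maximum public TLS validity at each step of the SC-081v3 schedule (days).
for validity in (398, 200, 100, 47):
    rate = renewals_per_day(1000, renewal_interval(validity))
    print(f"{validity:>3}-day certificates: {rate:4.1f} renewals/day")

# The figure in the text uses a flat 30-day renewal interval instead.
print(f"30-day interval: {renewals_per_day(1000, 30):.1f} renewals/day")
```

At 1,000 certificates the daily load grows from about 4 renewals to about 32 across the schedule.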

For internal certificates not subject to public CA policy, the same operational logic applies for different reasons: shorter-lived certificates reduce the blast radius of a key compromise, and uniform automation across internal and external certificates is operationally cheaper than maintaining two regimes.

Minimum ACME architecture

  Workload                    ACME client                 CA
  (Nginx, Envoy,              (cert-manager,              (public or internal)
   service mesh,              Caddy, certbot,
   etc.)                      step-ca, etc.)
     │                           │                          │
     │ requests cert             │                          │
     ├──────────────────────────▶│                          │
     │                           │ /new-order               │
     │                           ├─────────────────────────▶│
     │                           │ ◀────── challenge ───────│
     │                           │ /challenge response      │
     │                           ├─────────────────────────▶│
     │                           │ ◀────── certificate ─────│
     │ ◀────── deployed ─────────│                          │
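The diagram compresses the order lifecycle RFC 8555 defines. Between the challenge and the certificate, the client also submits a CSR to the order's finalize URL, and the order moves through a fixed set of states: pending, ready, processing, valid, or invalid on failure. A minimal client-side sketch of those transitions, useful for spotting an order that is stuck or regressing; it models status values only and makes no network calls.

```python
from enum import Enum

class OrderStatus(Enum):
    PENDING = "pending"        # authorizations outstanding (challenges not yet passed)
    READY = "ready"            # all authorizations valid; client may finalize
    PROCESSING = "processing"  # CSR submitted to the finalize URL; CA is issuing
    VALID = "valid"            # certificate available at the order's certificate URL
    INVALID = "invalid"        # authorization failure, error, or expiry

# Allowed transitions per the RFC 8555 order state diagram.
TRANSITIONS = {
    OrderStatus.PENDING: {OrderStatus.READY, OrderStatus.INVALID},
    OrderStatus.READY: {OrderStatus.PROCESSING, OrderStatus.INVALID},
    OrderStatus.PROCESSING: {OrderStatus.VALID, OrderStatus.INVALID},
    OrderStatus.VALID: set(),
    OrderStatus.INVALID: set(),
}

def advance(current: OrderStatus, reported: str) -> OrderStatus:
    """Accept a status reported by the CA only if it is a legal next state.
    Re-reporting the current state (normal while polling) is allowed."""
    new = OrderStatus(reported)
    if new != current and new not in TRANSITIONS[current]:
        raise ValueError(f"illegal order transition {current.value} -> {new.value}")
    return new

# Happy path: new-order, challenge passed, finalize with CSR, certificate issued.
state = OrderStatus.PENDING
for reported in ["pending", "ready", "processing", "valid"]:
    state = advance(state, reported)
print(state)  # OrderStatus.VALID
```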

For Kubernetes environments, cert-manager is the de facto orchestrator. For traditional environments, certbot and step-ca cover most cases. Service meshes (Linkerd, Istio, Consul Connect) mostly issue workload identities from a built-in mesh CA or through SPIRE, with that mesh CA chained to the internal CA.

Adding ARI

ARI is ACME Renewal Information, specified in RFC 9773 (IETF, June 2025). It is a renewal-coordination protocol. It is not, as some recent vendor and analyst documents incorrectly state, an "Automated Renewal/Rotation Infrastructure" or a crypto-agility mechanism. That invented expansion has propagated through several South African 2026 advisory documents; if you see it in a procurement document or RFP, correct it before the document goes external.

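ARI gives the ACME client a server-side hint about when to renew. The CA advertises a renewalInfo URL in its ACME directory; for each certificate, the client queries that endpoint, receives a suggestedWindow with start and end times, and schedules renewal inside it. In a mass-revocation scenario the CA can move that window into the past, signalling clients to renew immediately without out-of-band coordination. A correctly configured ACME client polls ARI periodically, at the interval the server's Retry-After header suggests, rather than only at its own scheduled renewal time.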

The operational value: when a CA detects compromise or mass-misissuance, ARI lets it broadcast "renew now" through the same protocol clients already use. For PSPs running thousands of certificates, this is the difference between a coordinated rotation and a Sunday-night incident. Certbot 4.1+, Lego, simple-acme, and step-ca all support ARI in current versions; acme.sh does not at time of writing.
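The client side of ARI is small. A minimal sketch of two pieces RFC 9773 defines: the certificate identifier the client appends to the server's renewalInfo URL (base64url of the Authority Key Identifier's keyIdentifier, a period, then base64url of the serial number's DER content octets, padding stripped), and the rule for acting on the returned suggestedWindow (pick a uniform random time inside it; if that time is already past, renew now). HTTP fetching, Retry-After handling, and certificate parsing are left out, and the AKI bytes and serial below are illustrative inputs.

```python
import base64
import json
import random
from datetime import datetime, timedelta, timezone

def _b64url(data: bytes) -> str:
    """base64url without '=' padding, as RFC 9773 identifiers require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def ari_cert_id(aki_key_id: bytes, serial: int) -> str:
    """Certificate identifier: AKI keyIdentifier, '.', serial number.

    The serial is encoded as DER INTEGER content octets (no tag or length):
    minimal big-endian bytes, with a leading 0x00 when the top bit is set.
    """
    serial_der = serial.to_bytes((serial.bit_length() + 8) // 8, "big")
    return f"{_b64url(aki_key_id)}.{_b64url(serial_der)}"

def _parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def renewal_time(renewal_info: str, now: datetime, rng: random.Random) -> datetime:
    """Uniform random time inside suggestedWindow; a past time means renew now."""
    window = json.loads(renewal_info)["suggestedWindow"]
    start, end = _parse(window["start"]), _parse(window["end"])
    offset = rng.uniform(0, (end - start).total_seconds())
    return max(start + timedelta(seconds=offset), now)

aki = bytes.fromhex("69885B6B87464041E1B37B847BA0AE2CDE01C8D4")
print(ari_cert_id(aki, 0x87654321))  # aYhba4dGQEHhs3uEe6CuLN4ByNQ.AIdlQyE

now = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
past_window = json.dumps({"suggestedWindow": {
    "start": "2026-05-01T00:00:00Z", "end": "2026-05-02T00:00:00Z"}})
print(renewal_time(past_window, now, random.Random(7)) == now)  # True: renew now
```

A window the CA has moved into the past, as in a mass revocation, collapses to an immediate renewal; a future window spreads renewals across the estate instead of bunching them.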

The CA/Browser Forum SC-081v3 phasing toward 47-day public TLS validity by 2029 makes ACME automation non-optional. Manual operations cannot serve a 33-renewals-per-day cadence at 1,000-certificate scale.

What this looks like for ISO 20022 / PayShap signing keys specifically

Message-signing keys are typically longer-lived than TLS certificates, but rotation must still be coordinated with all counterparties. The architecture pattern that works:

  • Signing key issued with a defined validity (e.g., 12–24 months) and a rotation calendar published to counterparties in advance.
  • Overlap window during which both the outgoing and incoming key are accepted, to avoid a hard cutover.
  • Counterparty notification through whatever channel the NPU operator publishes (typically a directory service or out-of-band feed).

This is not ACME territory — ACME is for the volume side of certificate operations. Message-signing key rotation is a coordinated industry event and should be modelled as such.
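On the verifier side, the overlap window reduces to a simple rule: accept any published key whose acceptance window covers the day in question. A minimal sketch with hypothetical key IDs and dates, assuming 24-month keys with a one-month overlap; a real verifier would also check the signature itself and the counterparty's directory entry.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SigningKey:
    key_id: str         # identifier published to counterparties (hypothetical)
    accept_from: date   # first day signatures with this key are accepted
    accept_until: date  # last day (inclusive) they are accepted

def accepted_keys(registry: list, on: date) -> set:
    """Key IDs a verifier accepts on a given day; two during the overlap."""
    return {k.key_id for k in registry if k.accept_from <= on <= k.accept_until}

# Two 24-month keys overlapping for March 2027 (hypothetical dates).
registry = [
    SigningKey("psp-iso20022-2025", date(2025, 4, 1), date(2027, 3, 31)),
    SigningKey("psp-iso20022-2027", date(2027, 3, 1), date(2029, 2, 28)),
]

for day in (date(2027, 2, 15), date(2027, 3, 15), date(2027, 4, 1)):
    print(day, sorted(accepted_keys(registry, day)))
```

In this model the signer can switch to the incoming key at any point inside the overlap, so no single day requires every participant to change at once.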

Revocation and status infrastructure

Certificate validation in real-time payments is on the critical path. OCSP, CRL, or equivalent revocation infrastructure must meet the same availability and latency requirements as the payment rail it supports. Practical baseline:

  • OCSP responder availability at four-nines minimum, with multi-region failover. A single-region OCSP outage during a PayShap business window has the same operational consequence as the payment rail itself going down.
  • OCSP stapling at participant edge to reduce verifier dependence on the responder being live for every transaction.
  • CRL publication cadence aligned to the validity floor — for short-lived certificates, CRLs need to be published frequently enough that revoked certificates do not remain trusted past their useful life.
  • Status-list infrastructure for verifiable credentials (PEMKey-aligned issuance) is a separate concern. The W3C status-list specification and OpenID4VC status patterns differ from X.509 OCSP/CRL — credential issuers need to plan their revocation infrastructure as a distinct capability, not as a bolt-on to the existing PKI.
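Four nines is a tight budget for a responder on the payment critical path. A quick calculation of the unavailability each availability target allows:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60
MINUTES_PER_30_DAYS = 30 * 24 * 60

def downtime_budget(availability: float) -> tuple:
    """Allowed unavailability, in minutes per year and per 30-day month."""
    unavailable = 1 - availability
    return unavailable * MINUTES_PER_YEAR, unavailable * MINUTES_PER_30_DAYS

for target in (0.999, 0.9999, 0.99999):
    per_year, per_month = downtime_budget(target)
    print(f"{target:.3%} available: {per_year:6.1f} min/year, {per_month:5.2f} min/month")
```

At 99.99%, an OCSP responder can be down for about 52.6 minutes a year, or about 4.3 minutes in a 30-day month; a single regional outage can consume that, which is why multi-region failover is part of the baseline.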

Where Axelspire sees this go wrong: revocation infrastructure designed for nine-to-five operations and inherited into a 24/7 payment context without re-architecture. The first weekend incident exposes the gap.

Problem 4 — Crypto-agility

The PQC migration horizon is real but slower than the marketing implies.

What is standardised

NIST has published the following PQC standards as FIPS:

  • FIPS 203 — ML-KEM (key encapsulation, replaces RSA-OAEP and ECDH for key transport).
  • FIPS 204 — ML-DSA (digital signature, replaces RSA and ECDSA).
  • FIPS 205 — SLH-DSA (stateless hash-based signature, alternative to ML-DSA).

These are the algorithms the global ecosystem will eventually require.

What the mandates actually say

The relevant US federal instruments are:

  • NSM-10 (National Security Memorandum on Promoting United States Leadership in Quantum Computing) — direction-setting, 2022.
  • OMB M-23-02 — federal agency PQC migration planning, 2022.
  • EO 14144 — strengthening national cybersecurity, including PQC migration, January 2025.
  • Quantum Computing Cybersecurity Preparedness Act (signed 2022) — federal IT inventory and migration planning.

The horizon for full migration of high-value systems is 2035. Note: NIST SP 800-208 is not the migration mandate — it is a stateful hash-based signature standard. Confusing the two is a recurring error in PQC discussions.

What this means for SA non-bank PSPs

SARB has not published explicit PQC guidance as of May 2026. The international financial messaging ecosystem (SWIFT, ISO 20022, EMV) is tracking NIST closely and will move when global readiness allows.

The architectural priority is not racing to deploy ML-DSA tomorrow. It is ensuring your CA, HSM, ACME pipeline, and message-signing infrastructure can switch algorithms without re-architecture. Practical implications:

  • HSM firmware that supports ML-KEM, ML-DSA, SLH-DSA. Most major vendors have shipped support across 2024–2025.
  • CA software that can issue certificates with PQC algorithms. EJBCA, smallstep, and the major managed PKIs all support the FIPS 203/204/205 family in current versions.
  • Hybrid certificates (classical + PQC) for the migration period — supported by the CA and accepted by relying parties.
  • Application code that does not hardcode algorithm identifiers (RSA-2048, ECDSA-P256) and instead reads from configuration.

The crypto-agility problem at the application layer is where most non-bank PSPs underestimate the work. SDKs, mobile apps, payment integrations, and partner middleware tend to embed algorithm assumptions deep in code paths. CertBridge, Axelspire's protocol abstraction platform, addresses this directly: a consistent interface across classical, hybrid, and PQC algorithms so that protocol upgrades become a configuration change rather than a code release. Where the structural weakness is application-layer crypto rigidity rather than infrastructure rigidity, this is the leverage point.
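A minimal sketch of the configuration-driven pattern described above, assuming a hypothetical `signing_algorithm` configuration key and a registry of backends. The two registered backends are HMAC stand-ins chosen only so the example runs on the Python standard library; they are not signatures and not CertBridge's interface. In a real deployment each registry entry would wrap an HSM-backed classical, hybrid, or ML-DSA signer.

```python
import hashlib
import hmac
from typing import Callable, Protocol

class Signer(Protocol):
    """What application code depends on: a name and a sign() method."""
    algorithm: str
    def sign(self, message: bytes) -> bytes: ...

class _HmacStandIn:
    """Placeholder backend so the sketch runs on the standard library.
    HMAC is a symmetric MAC, NOT a signature scheme; real deployments would
    register HSM-backed RSA, ECDSA, ML-DSA or hybrid signers here instead."""
    def __init__(self, name: str, digest: Callable, key: bytes):
        self.algorithm, self._digest, self._key = name, digest, key

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._key, message, self._digest).digest()

# Registry keyed by the algorithm names that appear in configuration.
_REGISTRY = {
    "demo-hmac-sha256": lambda key: _HmacStandIn("demo-hmac-sha256", hashlib.sha256, key),
    "demo-hmac-sha3-256": lambda key: _HmacStandIn("demo-hmac-sha3-256", hashlib.sha3_256, key),
}

def signer_from_config(config: dict, key: bytes) -> Signer:
    """Resolve the configured algorithm; callers never name one themselves."""
    name = config["signing_algorithm"]
    factory = _REGISTRY.get(name)
    if factory is None:
        raise ValueError(f"signing algorithm {name!r} is not registered")
    return factory(key)

# Swapping algorithms is a configuration change, not a code change.
signer = signer_from_config({"signing_algorithm": "demo-hmac-sha3-256"}, b"demo-key")
print(signer.algorithm, signer.sign(b"example payment message").hex()[:16])
```

An unregistered name fails loudly at start-up rather than silently falling back to a hardcoded default, which keeps the algorithm inventory and the running configuration in step.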

Problem 5 — POPIA, audit, and governance

POPIA (Act 4 of 2013) constrains how personal information is processed across all PSP operations. For PKI specifically:

  • Subject minimisation. Certificate subjects should not contain personal information beyond what is functionally required. Avoid embedding ID numbers, full names, or contact details where a stable opaque identifier suffices.
  • Audit logging. Issuance, renewal, revocation, and key-access events must be logged. Logs themselves are processing of personal information and must be governed under POPIA — retention, access controls, and lawful basis for processing all apply.
  • Key custody documentation. Where keys protect personal information (e.g., database encryption keys), key custody arrangements form part of the security safeguards POPIA requires the responsible party to maintain.
  • Data minimisation in credentials. Where the PSP issues verifiable credentials (e.g., PEMKey-aligned products), credential schemas should support selective disclosure so that holders disclose the minimum required for each verification.
  • Cross-border considerations. Where keys, audit logs, or credential infrastructure are operated outside South Africa, POPIA's cross-border transfer rules apply.
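For the audit-logging point, one common way to make issuance, revocation and key-access records tamper-evident is to hash-chain them, so that each record commits to the one before it. A minimal stdlib sketch, assuming JSON records with opaque subject references (the field names are hypothetical). Anyone who can rewrite the whole log can also recompute the chain, so the latest hash should be anchored outside the log's own storage, for example in WORM storage or with an external timestamp.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_event(log: list, event: str, subject_ref: str, actor: str,
                 at: Optional[datetime] = None) -> dict:
    """Append a record whose hash covers its content and the previous hash.

    subject_ref is an opaque identifier (not an ID number or a name), so the
    log itself holds as little personal information as possible.
    """
    record = {
        "event": event,  # e.g. issue, renew, revoke, key-access
        "subject_ref": subject_ref,
        "actor": actor,
        "at": (at or datetime.now(timezone.utc)).isoformat(),
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    record["hash"] = _digest(record)
    log.append(record)
    return record

def verify(log: list) -> bool:
    """Recompute the chain. Detects edited, removed or reordered records,
    but not truncation of the newest ones: anchor the latest hash elsewhere."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True

log: list = []
append_event(log, "issue", "svc-7f3a", "acme-client")
append_event(log, "key-access", "hsm-slot-2", "custodian-b")
print(verify(log))  # True
log[0]["actor"] = "someone-else"
print(verify(log))  # False
```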

The Vision 2030+ readiness checklist maps each PKI area to the corresponding POPIA principle and SARB Authorisation Framework expectation.

Problem 6 — Validation, vendor governance, and ongoing assurance

The five problems above describe the architecture. This sixth problem is what separates an architecture that survives Authorisation Framework review from one that does not.

Sandbox and pre-production validation

PKI changes — algorithm rotation, certificate-shape changes, ACME endpoint migrations, HSM firmware updates, validity-floor adjustments — all need to be validated in a representative environment before they touch production payment flows. Practical baseline:

  • A sandbox environment that mirrors production CA topology, certificate types, and ACME automation, populated with realistic ISO 20022 message volumes and PEMKey-aligned credential issuance flows.
  • Certificate-rotation rehearsals at least quarterly, with named participants and timed runbooks.
  • A pre-production cutover window of at least one full validity cycle for any new certificate type before it carries production traffic.
  • Rollback procedures that have been executed in anger, not just documented.

The pattern Axelspire sees most often: sandbox environments that exist on paper but do not match production in terms of CA chain, ACME client version, or HSM configuration. The first rotation event reveals the divergence.

Vendor and managed PKI due diligence

Where the organisation relies on a managed PKI service, public CA, HSM-as-a-service, or third-party verifiable-credential issuer, the controls those vendors operate are part of the organisation's PKI posture. SARB's Authorisation Framework outsourcing expectations apply.

The minimum due-diligence inventory:

  • CA operations: SOC 2 Type II, WebTrust for CAs, or equivalent independent attestation. Audit reports reviewed annually. Right-of-audit clause in the contract for material services.
  • Key custody: HSM standards (FIPS 140-3 Level 3 baseline), key-ceremony documentation, custodian separation, and the vendor's own rotation cadence.
  • Lifecycle and protocol: ACME (RFC 8555) with ARI (RFC 9773) support; algorithm-agility roadmap; PQC migration plan with named timeline rather than aspirational language.
  • Incident handling: breach notification SLA, mass-revocation cooperation procedure, and the vendor's track record on past incidents (Google Trust Services and others publish operational status pages worth reviewing before contract).
  • Residency and POPIA alignment: where audit logs or key material are operated outside South Africa, the cross-border transfer basis is documented and contractual data-handling commitments are specific.

The structural risk that comes up in SA engagements: organisations procure a managed PKI service for the certificates they consume, but the vendor's controls over the issuer keys fail audit because the contract was scoped to certificate issuance rather than to root-of-trust operations. The two are different services with different governance expectations.

Metrics, monitoring, and incident response

PKI is an operational system. It needs operational telemetry tied to the SLAs it underpins.

Minimum KPI set:

  • Issuance success rate and latency, broken down by CA backend.
  • Renewal success rate, with leading indicators for pending expiries (aged buckets at 60, 30, 14, 7 days).
  • Revocation propagation time (revocation issued → CRL/OCSP visible to relying parties).
  • Key-ceremony and custodian-access events (rare events, but the absence of them in the log is itself a finding).
  • Algorithm distribution across the certificate estate (input to crypto-agility planning).
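The renewal-bucket and algorithm-distribution KPIs fall out of a certificate inventory directly. A minimal sketch, assuming hypothetical inventory rows with a name, an algorithm label and a notAfter date; in practice the rows would come from CA exports or a discovery scan.

```python
from collections import Counter
from datetime import date

THRESHOLDS = (7, 14, 30, 60)  # days-to-expiry buckets from the KPI list

def expiry_bucket(not_after: date, today: date) -> str:
    """Tightest bucket a certificate's remaining lifetime falls into."""
    days_left = (not_after - today).days
    if days_left < 0:
        return "expired"
    for limit in THRESHOLDS:
        if days_left <= limit:
            return f"<= {limit}d"
    return "> 60d"

def estate_kpis(inventory: list, today: date) -> tuple:
    """Expiry-bucket counts and algorithm distribution across the estate."""
    buckets = Counter(expiry_bucket(c["not_after"], today) for c in inventory)
    algorithms = Counter(c["algorithm"] for c in inventory)
    return buckets, algorithms

# Hypothetical inventory rows (name, algorithm label, notAfter date).
inventory = [
    {"name": "api.example.com", "algorithm": "ECDSA-P256", "not_after": date(2026, 5, 9)},
    {"name": "mesh/payments", "algorithm": "ECDSA-P256", "not_after": date(2026, 5, 20)},
    {"name": "iso20022-signing", "algorithm": "RSA-3072", "not_after": date(2027, 3, 31)},
    {"name": "legacy-batch", "algorithm": "RSA-2048", "not_after": date(2026, 4, 30)},
]

buckets, algorithms = estate_kpis(inventory, today=date(2026, 5, 4))
print(dict(buckets))     # {'<= 7d': 1, '<= 30d': 1, '> 60d': 1, 'expired': 1}
print(dict(algorithms))  # {'ECDSA-P256': 2, 'RSA-3072': 1, 'RSA-2048': 1}
```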

Alert routing to named owners, not shared inboxes. Escalation paths defined in advance, not improvised during an outage. Post-incident review for every certificate-related operational event, including near-misses.

Skills, training, and succession

Key ceremonies require trained custodians. ACME operations require engineers who understand the protocol, not just the tooling. Verifiable-credential issuance requires a different skill set again. Most SA non-bank PSPs entering NPU participation have one or two people who hold the institutional PKI knowledge — a single point of failure that compounds with the operational load Vision 2030+ implies.

Documented competency framework, named succession for each ceremony role, and at least one independent trained custodian outside the immediate operations team are the structural mitigations.

Reference architecture summary

A defensible non-bank PSP PKI architecture for 2026:

                                ┌────────────────────────────┐
                                │ Policy & Audit Control     │
                                │ Plane                      │
                                │ (unified across CAs;       │
                                │ POPIA-compliant logging)   │
                                └─────────────┬──────────────┘
                                              │
              ┌───────────────────────┬───────┴────────┬────────────────────┐
              ▼                       ▼                ▼                    ▼
      ┌──────────────┐       ┌───────────────┐ ┌──────────────┐  ┌───────────────────┐
      │ Public CA    │       │ Internal CA   │ │ Signing-key  │  │ VC Issuer keys    │
      │ (ACME)       │       │ (small,       │ │ HSM          │  │ (HSM, separate    │
      │ External TLS │       │  governed)    │ │ (ISO 20022,  │  │ governance)       │
      │              │       │ Service mesh  │ │  PayShap,    │  │                   │
      │              │       │ Internal mTLS │ │  NPU)        │  │ For PEMKey-       │
      │              │       │ Code signing  │ │              │  │ aligned products  │
      └──────────────┘       └───────────────┘ └──────────────┘  └───────────────────┘
              │                       │                │                    │
              └───────────────────────┴────────────────┴────────────────────┘
                                              │
                                  ACME / ARI for everything
                                  that supports it

The unified policy and audit control plane is the structural element that prevents fragmentation. Without it, each CA backend grows its own operations and the organisation ends up with three or four parallel PKI silos. With it, the topology can evolve as Vision 2030+ implementation details become clearer without rebuilding the operational layer.

This is the role 3AM serves in Axelspire engagements: a serverless multi-PKI control plane built on AWS KMS that integrates with public CAs, AD CS, EJBCA, and verifiable-credential issuers behind a single policy and compliance-logging surface. For non-bank PSPs that need to demonstrate governance to SARB without operating a heavyweight on-premises PKI estate, it is the most direct path to the architecture above.

What to do next

If you are at the start of NPU authorisation work, the sequence Axelspire usually recommends:

  1. Inventory. Map every certificate, key, and CA the organisation depends on. Most SA non-bank PSPs underestimate this by 5–10x.
  2. Risk-rank. Identify the small set of signing keys that warrant central-bank-grade governance and separate them from the long tail.
  3. Automate the long tail. ACME everywhere it works. ARI as a baseline expectation.
  4. Govern the high-value keys. HSMs, key ceremony, documented rotation.
  5. Plan crypto-agility. Algorithm-agnostic pipelines. Hybrid-capable CA. Plan, do not deploy.
  6. Stand up sandbox and validation. Production-mirror sandbox; quarterly rotation rehearsals; rollback procedures executed in anger.
  7. Run vendor due diligence. SOC 2 / WebTrust evidence; right-of-audit; ACME+ARI confirmed; PQC roadmap with named timeline.
  8. Map to POPIA and the Authorisation Framework. Use the readiness checklist.

For a working session on any of the above, contact Axelspire.




Authored by Dan C. (Axelspire). Vendor-neutral PKI advisory; engagement experience across SA financial services and post-doctoral cryptography research at the University of Cambridge.