cert-manager for Kubernetes: ACME Integration and Production Ops
cert-manager is the standard Kubernetes controller for X.509 lifecycle: Certificate, Issuer, ClusterIssuer, ACME challenges, and Secret updates. This guide focuses on ACME configuration, solver choice, renewal + ARI, and production failure modes—aligned with ACME and short public TLS lifetimes.
TL;DR: Use ClusterIssuer for shared ACME config; prefer DNS-01 for wildcards and private clusters; enable ARI (ExperimentalOptions) for CA-driven renewal windows; watch Prometheus cert-manager metrics and ingress-nginx + pathType interactions on newer cert-manager.
Overview
cert-manager installs a controller, webhook, and cainjector into the cert-manager namespace. The controller talks to RFC 8555 ACME endpoints (public or private). For private ACME (Vault PKI 1.14+, step-ca), the YAML shape is the same: swap server and add caBundle for private roots. Direct Vault API issuance (non-ACME) uses the Vault issuer; see HashiCorp Vault PKI.
Version note: Release trains move quickly (e.g. 1.17.x–1.19.x in 2026); pin versions in GitOps and read upstream release notes—project stewardship has shifted toward CyberArk (Venafi) for the core OSS project.
Problem Statement
- HTTP-01 breaks behind NAT, split DNS, or ingress that is not reachable on port 80; teams burn days on “challenge pending.”
- DNS-01 depends on API tokens; expired tokens become a top outage class as certificate lifetimes shrink (automation readiness).
- Let’s Encrypt rate limits punish testing in production without a staging run first.
- subPath Secret mounts never see rotated certs, a silent expiry risk.
- Shared ACME account keys across clusters cause order races.
- The 47-day public maximum validity tightens renewal cadence; every link in solver → Secret → workload reload must be reliable.
Failure scenario: cert-manager renews at 2/3 of lifetime. The CA moves its ARI window for incident response, but without ARI the cluster keeps its static schedule; the resulting batch renewals hit rate limits, and a critical Ingress serves an expired cert while its Certificate flips to Ready=False.
Architecture and components
cert-manager installs three primary components into your cluster, all running in the cert-manager namespace:
cert-manager controller: The core component. Watches for Certificate resources, creates CertificateRequest objects, completes ACME challenges, and updates Kubernetes Secrets with issued certificates. This is the component that communicates with ACME servers (Let’s Encrypt, step-ca, Vault PKI, or any RFC 8555-compliant CA).
cert-manager webhook: A validating and mutating admission webhook that ensures cert-manager custom resources are correctly structured before they’re accepted by the Kubernetes API server. The webhook requires a TLS certificate of its own, which cert-manager self-provisions.
cert-manager cainjector: Injects CA bundles into webhook configurations, CRDs, and other resources that need to trust cert-manager’s internal certificates. Required for the webhook to function.
ACME issuer configuration
The ACME issuer is cert-manager’s most-used issuer type. It implements the full ACME protocol (RFC 8555) including account registration, order creation, challenge completion, and certificate retrieval.
ClusterIssuer for Let’s Encrypt (production)
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: pki-team@yourcompany.com
        privateKeySecretRef:
          name: letsencrypt-prod-account-key
        solvers:
        - http01:
            ingress:
              ingressClassName: nginx

Critical fields: The email field is the contact address the CA may use for urgent renewal and security notices; use a team distribution list, not an individual’s address. Let’s Encrypt ended its expiration notification emails in June 2025, so treat this as a contact field rather than an expiry alarm. The privateKeySecretRef stores the ACME account private key; if this Secret is deleted, a new ACME account is created and existing authorizations are lost. The server field points to the ACME directory URL.
Staging first, always
Always test with Let’s Encrypt’s staging environment before configuring production. Staging has relaxed rate limits and issues certificates that browsers won’t trust, but it validates that your entire pipeline (DNS, ingress, challenge completion) works.
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-staging
    spec:
      acme:
        server: https://acme-staging-v02.api.letsencrypt.org/directory
        email: pki-team@yourcompany.com
        privateKeySecretRef:
          name: letsencrypt-staging-account-key
        solvers:
        - http01:
            ingress:
              ingressClassName: nginx

Using a private ACME server (Vault PKI or step-ca)
For internal certificates, point the ACME directory at your private CA. The configuration is identical except for the server URL and an optional caBundle for trusting the private CA’s root certificate.
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: internal-acme
    spec:
      acme:
        server: https://vault.internal:8200/v1/pki/acme/directory
        # For step-ca: https://ca.internal/acme/acme/directory
        email: pki-team@yourcompany.com
        privateKeySecretRef:
          name: internal-acme-account-key
        caBundle: <base64-encoded-root-CA-PEM>
        solvers:
        - http01:
            ingress:
              ingressClassName: nginx

See private CA comparison for Vault vs step-ca positioning.
Challenge solvers: HTTP-01 vs DNS-01
HTTP-01 solver
The HTTP-01 solver is the simplest to configure. When a challenge is issued, cert-manager creates a temporary Pod (the acmesolver), a Service, and an Ingress (or Gateway API resource) in the Certificate’s namespace. The ACME server connects to your ingress on port 80 and retrieves a challenge token from /.well-known/acme-challenge/{token}.
Requirements: The ACME server must be able to reach your ingress from the public internet on port 80. Your ingress controller must route the challenge path correctly. Your network policy must allow traffic from the internet to the acmesolver Pod via the ingress.
Known issue (cert-manager 1.18.0+): The default Ingress pathType changed from ImplementationSpecific to Exact. This is incompatible with ingress-nginx versions that enable strict-validate-path-type by default (v1.12.0+). The fix: either disable the ACMEHTTP01IngressPathTypeExact feature gate in cert-manager, or disable strict-validate-path-type in ingress-nginx.
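The two workarounds above could be expressed as Helm values, roughly as follows. This is a sketch that assumes both components are installed from their upstream Helm charts; the file names are hypothetical, and you would apply only one of the two fragments.

```yaml
# cert-manager-values.yaml (hypothetical file name)
# Option A: restore the pre-1.18 ImplementationSpecific pathType by
# turning off the feature gate on the cert-manager controller.
featureGates: "ACMEHTTP01IngressPathTypeExact=false"
---
# ingress-nginx-values.yaml (hypothetical file name)
# Option B: keep cert-manager's default and relax ingress-nginx instead.
# Entries under controller.config end up in the controller ConfigMap.
controller:
  config:
    strict-validate-path-type: "false"
```

Option A keeps the change inside cert-manager; Option B loosens path validation for every Ingress the controller serves, so it deserves a wider review.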
Limitation: HTTP-01 cannot validate wildcard domains. If you need *.example.com, you must use DNS-01.
DNS-01 solver
The DNS-01 solver proves domain control by creating a TXT record at _acme-challenge.yourdomain.com. cert-manager supports DNS providers including Cloudflare, Route53, Google Cloud DNS, Azure DNS, and many others via webhook solvers.
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod-dns
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: pki-team@yourcompany.com
        privateKeySecretRef:
          name: letsencrypt-prod-account-key
        solvers:
        - dns01:
            cloudflare:
              apiTokenSecretRef:
                name: cloudflare-api-token
                key: api-token
          selector:
            dnsZones:
            - "example.com"

Requirements: DNS provider API credentials must be stored as a Kubernetes Secret. The cert-manager controller Pod must have network egress to your DNS provider’s API and to external DNS resolvers for self-check queries.
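The solver above reads its token from a Secret named cloudflare-api-token with key api-token. A minimal sketch of that Secret follows; it assumes cert-manager runs with its default cluster resource namespace (cert-manager), which is where a ClusterIssuer looks up referenced Secrets, and the token value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager   # assumption: default cluster resource namespace
type: Opaque
stringData:
  # Placeholder value; scope the real token to DNS edits on the zone only.
  api-token: "<cloudflare-api-token>"
```

In practice this value would usually come from a secrets manager or sealed Secret rather than a plain manifest in Git.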
Axelspire recommendation: Use DNS-01 as your primary solver for production. It supports wildcards, works regardless of whether your cluster is publicly accessible, and doesn’t depend on ingress controller configuration. The only external dependency is DNS provider API availability — and that dependency exists anyway for your domain infrastructure.
More patterns: DNS-01 challenge validation.
Combining solvers
You can configure multiple solvers with selectors, for example DNS-01 for everything in a zone (including its wildcards) and HTTP-01 for everything else:
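    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: api-token
      selector:
        dnsZones:
        - "example.com"
    - http01:
        ingress:
          ingressClassName: nginx

cert-manager picks the most specific matching solver, not simply the first one listed: a dnsNames match beats a dnsZones match, the longest matching zone wins among dnsZones, and more matching matchLabels wins after that. List order only breaks remaining ties, in favor of the earlier solver. A solver with no selector, like the http01 entry here, is the fallback for names nothing else matches.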
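If the goal is DNS-01 only for the wildcard name and HTTP-01 for every other name, a dnsNames selector is narrower than a dnsZones one. The fragment below is a sketch of the solvers list under that assumption, reusing the Secret and ingress class names from the examples above.

```yaml
solvers:
# Matches only the wildcard name; other example.com hosts skip this solver.
- dns01:
    cloudflare:
      apiTokenSecretRef:
        name: cloudflare-api-token
        key: api-token
  selector:
    dnsNames:
    - "*.example.com"
# No selector: used for any name that no other solver matches.
- http01:
    ingress:
      ingressClassName: nginx
```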
Certificate resources and renewal
Requesting a certificate
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: api-tls
      namespace: production
    spec:
      secretName: api-tls-secret
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      dnsNames:
      - api.example.com
      - api-v2.example.com
      renewBefore: 360h # 15 days before expiry

cert-manager creates the api-tls-secret Secret containing tls.crt and tls.key, plus ca.crt when the issuer supplies a CA certificate. The Secret is updated in place on renewal, so any Pod mounting the Secret will see the new certificate (subject to kubelet’s Secret refresh interval, typically 60 seconds).
Ingress annotation shortcut
For simple cases, annotating an Ingress resource triggers cert-manager to create a Certificate resource automatically:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: api-ingress
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      tls:
      - hosts:
        - api.example.com
        secretName: api-tls-secret
      rules:
      - host: api.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80

Renewal lifecycle
By default, cert-manager renews a certificate once approximately two-thirds of its lifetime has elapsed. Setting renewBefore replaces that default: renewal then starts renewBefore ahead of expiry. For a 90-day Let’s Encrypt certificate with the default settings, renewal happens at approximately 60 days (30 days before expiry); with renewBefore: 360h it happens at about day 75.
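The timing can be made explicit in the Certificate itself. The fragment below is a sketch of the same api-tls Certificate with the renewal arithmetic written out in comments; the 90-day and 47-day lifetimes are assumed to be set by the CA, not by this spec.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - api.example.com
  # With renewBefore unset, renewal starts once 2/3 of the lifetime has elapsed:
  #   90-day cert -> renew near day 60 (30 days left)
  #   47-day cert -> renew near day 31 (about 16 days left)
  # With renewBefore: 360h (15 days), renewal starts 15 days before expiry:
  #   90-day cert -> renew near day 75
  #   47-day cert -> renew near day 32
  renewBefore: 360h
```

A fixed renewBefore that is comfortable at 90 days leaves roughly the same margin at 47 days as the default does, so it is worth revisiting this value as lifetimes shrink.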
With ARI enabled (cert-manager 1.15+), the CA’s suggestedWindow overrides the static threshold. The controller polls the renewalInfo endpoint and schedules renewal within the CA’s suggested window. This is critical for mass revocation scenarios — the CA can signal “renew now” and ARI-enabled cert-manager responds within its polling interval.
Enabling ARI:
    # In cert-manager Helm values
    # (confirm the gate name against the feature-gate list for your version)
    config:
      featureGates:
        ExperimentalOptions: true

See certificate automation readiness for client-level ARI context.
Production operations
Monitoring
cert-manager exposes Prometheus metrics on port 9402. The critical metrics for certificate operations:
certmanager_certificate_expiration_timestamp_seconds: Unix timestamp of certificate expiration. Alert when this minus current time drops below your threshold (e.g., 7 days for 90-day certificates).
certmanager_certificate_ready_status: Ready condition of each certificate, exposed as one series per condition value. Alert on any certificate whose condition="False" series stays at 1 for more than 30 minutes.
certmanager_controller_sync_call_count: Number of sync() calls made by each controller. A sustained jump for one controller points at a retry loop. For CA communication failures, watch certmanager_http_acme_client_request_count broken down by status.
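The two alert conditions described above could be written as Prometheus alerting rules. This is a sketch that assumes the Prometheus Operator's PrometheusRule CRD is installed; the rule names, namespace, severities, and thresholds are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-certificates   # hypothetical name
  namespace: cert-manager
spec:
  groups:
  - name: cert-manager.certificates
    rules:
    - alert: CertificateExpiringSoon
      # Seconds until expiry has dropped below 7 days.
      expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
      for: 10m
      labels:
        severity: critical
    - alert: CertificateNotReady
      # The condition="False" series is 1 while the Certificate is not Ready.
      expr: certmanager_certificate_ready_status{condition="False"} == 1
      for: 30m
      labels:
        severity: warning
```

At 47-day lifetimes a 7-day expiry threshold is nearly half of the retry margin, so the threshold should be revisited alongside renewBefore.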
Common failure modes
DNS propagation delay: DNS-01 challenges fail because the TXT record hasn’t propagated before cert-manager’s self-check queries the record. cert-manager performs a self-check before telling the ACME server to validate; if the self-check fails, the challenge is retried. Set --dns01-recursive-nameservers to specific DNS resolvers and add --dns01-recursive-nameservers-only so the self-check uses them rather than the cluster’s default.
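In a Helm-based install, the resolver flags for the DNS-01 self-check can be passed to the controller through the chart's extraArgs value. A minimal sketch, assuming the public resolvers 1.1.1.1 and 8.8.8.8 are reachable from the cluster (substitute your own):

```yaml
# cert-manager Helm values
extraArgs:
  # Use only the resolvers listed below for the DNS-01 self-check.
  - --dns01-recursive-nameservers-only
  # Comma-separated host:port list; these addresses are an assumption.
  - --dns01-recursive-nameservers=1.1.1.1:53,8.8.8.8:53
```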
Ingress controller misconfiguration: HTTP-01 challenges fail because the ingress controller doesn’t route /.well-known/acme-challenge/ to the acmesolver Pod. Verify that your ingress controller’s class matches the ingressClassName in your solver configuration. The cert-manager 1.18.0 pathType: Exact change broke ingress-nginx configurations with strict path validation.
Rate limits: Let’s Encrypt enforces rate limits — 50 certificates per registered domain per week, 5 duplicate certificates per week, 300 new orders per account per 3 hours. In clusters with many subdomains, hitting rate limits is easy. Use the staging environment for testing, and consider ARI (which exempts renewals from rate limits) for production.
Stale ACME accounts: If the ACME account Secret is deleted or corrupted, cert-manager registers a new account. Existing authorizations under the old account are lost, and in-flight orders fail. Back up ACME account Secrets.
Secret not updating in Pods: After renewal, the Secret is updated, but Pods using subPath volume mounts won’t see the update (a known Kubernetes limitation). Pods using standard Secret volume mounts will see the new certificate within the kubelet sync period (default: 60 seconds). For immediate propagation, consider a sidecar that watches the Secret and triggers a reload.
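For reference, a mount that does pick up rotations looks like the sketch below: the whole Secret is mounted as a directory, with no subPath. The Deployment name, image, and mount path are hypothetical; the Secret name matches the Certificate example earlier.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                # hypothetical workload
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: registry.example.com/api:1.0   # placeholder image
        volumeMounts:
        - name: tls
          # Directory mount: kubelet refreshes tls.crt and tls.key here.
          # Adding subPath to this entry would freeze the files at Pod start.
          mountPath: /etc/tls
          readOnly: true
      volumes:
      - name: tls
        secret:
          secretName: api-tls-secret
```

The process still has to re-read the files; the mount only guarantees that the new bytes reach disk.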
Scaling cert-manager
For clusters managing thousands of certificates, cert-manager controller resource requirements increase linearly. Each Certificate resource adds reconciliation overhead. In Axelspire’s experience, the most common scaling bottleneck is not cert-manager itself but the ACME server: either rate limits (Let’s Encrypt) or throughput limits (self-hosted ACME servers like Vault PKI under high concurrent issuance).
Multi-cluster patterns: Each cluster runs its own cert-manager instance with its own ACME account. Do not share ACME account keys across clusters — this creates race conditions on order management. Use separate ClusterIssuers per cluster, each with their own privateKeySecretRef.
Issuer types beyond ACME
cert-manager supports multiple issuer types. For completeness within the Axelspire vault context:
Vault issuer: Connects directly to HashiCorp Vault’s PKI secrets engine API, bypassing ACME. Useful when Vault is your internal CA and you want direct API-level integration rather than ACME.
CA issuer: Uses a local CA key pair stored in a Kubernetes Secret. Suitable for development and testing environments where you want a self-signed CA without an external CA server.
Venafi issuer: Integrates with Venafi TLS Protect (now CyberArk Certificate Manager) for enterprise certificate lifecycle management.
step-ca issuer (Smallstep): A dedicated cert-manager issuer for step-ca, supporting JWK and OIDC provisioners in addition to ACME.
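As an illustration of the simplest non-ACME type, a CA issuer needs only a reference to a Secret holding the CA key pair. A sketch, assuming a Secret named dev-ca-key-pair (hypothetical) with tls.crt and tls.key exists in the cert-manager namespace:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: dev-ca   # hypothetical name
spec:
  ca:
    # Secret of type kubernetes.io/tls containing the CA certificate and key.
    secretName: dev-ca-key-pair
```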
The 47-day certificate intersection
Under SC-081v3’s phased reduction (200 days now, 100 days in 2027, 47 days in 2029), cert-manager’s renewal automation becomes progressively more critical. At 47-day lifetimes, the default renewal point (2/3 of lifetime, about day 31) means cert-manager issues a new certificate roughly every 31 days and has only about 16 days of retry margin before expiry. Every component in the chain (ACME solver, DNS provider API, ingress configuration, Secret propagation) must work reliably at this cadence. cert-manager’s ARI support ensures that when a CA needs emergency renewal, the Kubernetes certificate infrastructure responds without human intervention.
Operational checklist
- Staging issuer tested before prod; email is a distribution list.
- DNS-01 credentials rotated on calendar; least privilege on DNS API.
- ARI feature gate on for CAs that expose renewalInfo.
- Alerts on certmanager_certificate_ready_status and expiry metrics.
- No subPath for TLS Secrets, or reload sidecar documented.
- Separate ACME account keys per cluster; GitOps pins cert-manager version.
- ingress-nginx pathType / feature gate compatibility validated after upgrades.
Related documentation
- ACME protocol — RFC 8555 semantics
- ACME protocol implementation — Enterprise rollout patterns
- Certificate automation readiness — ARI, clients, DCV
- 47-day TLS certificates — SC-081v3 timeline
- Renewal automation — Renewal SLOs
- DNS-01 challenge validation — DNS proof patterns
- ACME clients index — Broader client ecosystem
- HashiCorp Vault PKI — Vault issuer + scaling
- Private CA comparison — step-ca, Vault, EJBCA
- Multi-cloud PKI — cert-manager across clouds
- mTLS on Kubernetes — Mesh vs cert-manager TLS
- Service mesh certificates — Istio, Linkerd, cert-manager