
cert-manager for Kubernetes: ACME Integration and Production Ops

cert-manager is the standard Kubernetes controller for X.509 lifecycle: Certificate, Issuer, ClusterIssuer, ACME challenges, and Secret updates. This guide focuses on ACME configuration, solver choice, renewal + ARI, and production failure modes—aligned with ACME and short public TLS lifetimes.

TL;DR: Use ClusterIssuer for shared ACME config; prefer DNS-01 for wildcards and private clusters; enable ARI (ExperimentalOptions) for CA-driven renewal windows; watch Prometheus cert-manager metrics and ingress-nginx + pathType interactions on newer cert-manager.

cert-manager installs a controller, webhook, and cainjector into the cert-manager namespace. The controller talks to RFC 8555 ACME endpoints (public or private). For private ACME (Vault PKI 1.14+, step-ca), the YAML shape matches—swap server and add caBundle for private roots. Deep Vault API issuance (non-ACME) uses the Vault issuer—see HashiCorp Vault PKI.

Version note: Release trains move quickly (e.g. 1.17.x–1.19.x in 2026); pin versions in GitOps and read upstream release notes—project stewardship has shifted toward CyberArk (Venafi) for the core OSS project.

  • HTTP-01 breaks behind NAT, split DNS, or ingress that is not reachable on port 80; teams burn days on “challenge pending.”
  • DNS-01 depends on API tokens; expired tokens become a top outage class as lifetimes shrink (automation readiness).
  • Rate limits (Let’s Encrypt) punish testing in production without a staging pass first.
  • subPath Secret mounts never see rotated certs, which is a silent expiry risk.
  • Shared ACME account keys across clusters cause order races.
  • 47-day public max validity tightens renewal cadence; every link in solver → Secret → workload reload must be reliable.

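Failure scenario: cert-manager renews at a static 2/3 of lifetime. The CA moves its ARI window forward for incident response, but without ARI enabled the controller never sees the change. The resulting batch of late renewals hits rate limits, and a critical Ingress serves an expired certificate while its Certificate flips to Ready=False.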

cert-manager installs three primary components into your cluster, all running in the cert-manager namespace:

cert-manager controller: The core component. Watches for Certificate resources, creates CertificateRequest objects, completes ACME challenges, and updates Kubernetes Secrets with issued certificates. This is the component that communicates with ACME servers (Let’s Encrypt, step-ca, Vault PKI, or any RFC 8555-compliant CA).

cert-manager webhook: A validating and mutating admission webhook that ensures cert-manager custom resources are correctly structured before they’re accepted by the Kubernetes API server. The webhook requires a TLS certificate of its own, which cert-manager self-provisions.

cert-manager cainjector: Injects CA bundles into webhook configurations, CRDs, and other resources that need to trust cert-manager’s internal certificates. Required for the webhook to function.

The ACME issuer is cert-manager’s most-used issuer type. It implements the full ACME protocol (RFC 8555) including account registration, order creation, challenge completion, and certificate retrieval.

ClusterIssuer for Let’s Encrypt (production)

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: pki-team@yourcompany.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

Critical fields: The email field is the ACME account contact for notices from the CA; use a team distribution list, not an individual’s address. Let’s Encrypt stopped sending expiration reminder emails in June 2025, so treat this as a contact address rather than an expiry monitor. The privateKeySecretRef stores the ACME account private key; if this Secret is deleted, a new ACME account is created and existing authorizations are lost. The server field points to the ACME directory URL.

Always test with Let’s Encrypt’s staging environment before configuring production. Staging has relaxed rate limits and issues certificates that browsers won’t trust — but it validates that your entire pipeline (DNS, ingress, challenge completion) works.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: pki-team@yourcompany.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
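
A staging smoke test can be sketched as a throwaway Certificate that points at the staging issuer above. The Certificate name, namespace, and hostname here are hypothetical placeholders, not values from this guide:

```yaml
# Hypothetical smoke-test Certificate; adjust namespace and hostname to your cluster.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: staging-smoke-test
  namespace: default
spec:
  secretName: staging-smoke-test-tls
  issuerRef:
    name: letsencrypt-staging   # the staging ClusterIssuer defined above
    kind: ClusterIssuer
  dnsNames:
    - smoke-test.example.com    # must resolve to your ingress for HTTP-01
```

Once it reports Ready, delete the Certificate and its Secret before switching workloads to the production issuer, since staging certificates are not trusted by browsers.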

Using a private ACME server (Vault PKI or step-ca)


For internal certificates, point the ACME directory at your private CA. The configuration is identical except for the server URL and an optional caBundle for trusting the private CA’s root certificate.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-acme
spec:
  acme:
    server: https://vault.internal:8200/v1/pki/acme/directory
    # For step-ca: https://ca.internal/acme/acme/directory
    email: pki-team@yourcompany.com
    privateKeySecretRef:
      name: internal-acme-account-key
    caBundle: <base64-encoded-root-CA-PEM>
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

See private CA comparison for Vault vs step-ca positioning.

The HTTP-01 solver is the simplest to configure. When a challenge is issued, cert-manager creates a temporary Pod (the acmesolver), a Service, and an Ingress (or Gateway API resource) in the Certificate’s namespace. The ACME server connects to your ingress on port 80 and retrieves a challenge token from /.well-known/acme-challenge/{token}.

Requirements: The ACME server must be able to reach your ingress from the public internet on port 80. Your ingress controller must route the challenge path correctly. Your network policy must allow traffic from the internet to the acmesolver Pod via the ingress.
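
For clusters that route through Gateway API rather than Ingress, the solver stanza might look like the fragment below. It is a sketch that assumes Gateway API support is enabled in your cert-manager install and that a Gateway with a port-80 listener exists; the Gateway name and namespace are hypothetical:

```yaml
# Fragment of spec.acme in an Issuer or ClusterIssuer.
solvers:
  - http01:
      gatewayHTTPRoute:
        parentRefs:
          - name: public-gateway        # hypothetical Gateway name
            namespace: gateway-system   # hypothetical namespace
            kind: Gateway
```

cert-manager then creates a temporary HTTPRoute attached to that Gateway instead of a solver Ingress.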

Known issue (cert-manager 1.18.0+): The default Ingress pathType changed from ImplementationSpecific to Exact. This is incompatible with ingress-nginx versions that enable strict-validate-path-type by default (v1.12.0+). The fix: either disable the ACMEHTTP01IngressPathTypeExact feature gate in cert-manager, or disable strict-validate-path-type in ingress-nginx.
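
The two workarounds above can be sketched as Helm values. They are alternatives that belong in separate values files (one per chart), and the exact keys should be checked against the chart versions you run:

```yaml
# Option A: cert-manager chart values.
# Disabling the gate reverts solver Ingresses to pathType ImplementationSpecific.
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  featureGates:
    ACMEHTTP01IngressPathTypeExact: false
---
# Option B: ingress-nginx chart values.
# Relaxes path validation in the controller ConfigMap.
controller:
  config:
    strict-validate-path-type: "false"
```

Option A keeps ingress-nginx's stricter validation in place for all other Ingresses, which is usually the smaller change.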

Limitation: HTTP-01 cannot validate wildcard domains. If you need *.example.com, you must use DNS-01.

The DNS-01 solver proves domain control by creating a TXT record at _acme-challenge.yourdomain.com. cert-manager supports DNS providers including Cloudflare, Route53, Google Cloud DNS, Azure DNS, and many others via webhook solvers.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: pki-team@yourcompany.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
        selector:
          dnsZones:
            - "example.com"

Requirements: DNS provider API credentials must be stored as a Kubernetes Secret. The cert-manager controller Pod must have network egress to your DNS provider’s API and to external DNS resolvers for self-check queries.
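
As a minimal sketch, the credential Secret referenced by the Cloudflare solver could look like this. It assumes the default cluster resource namespace (cert-manager), which is where a ClusterIssuer looks up its Secrets; the token value is a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager      # assumption: default cluster resource namespace
type: Opaque
stringData:
  api-token: <cloudflare-api-token>   # placeholder; scope it to DNS edit on the zone only
```

In GitOps setups the token would normally come from a sealed or externally managed secret rather than a plain manifest.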

Axelspire recommendation: Use DNS-01 as your primary solver for production. It supports wildcards, works regardless of whether your cluster is publicly accessible, and doesn’t depend on ingress controller configuration. The only external dependency is DNS provider API availability — and that dependency exists anyway for your domain infrastructure.

More patterns: DNS-01 challenge validation.

You can configure multiple solvers with selectors, for example DNS-01 for every name in the example.com zone (wildcards included) and HTTP-01 for everything else:

solvers:
  - dns01:
      cloudflare:
        apiTokenSecretRef:
          name: cloudflare-api-token
          key: api-token
    selector:
      dnsZones:
        - "example.com"
  - http01:
      ingress:
        ingressClassName: nginx

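cert-manager picks the most specific matching solver for each DNS name rather than simply the first in the list: a dnsNames match beats a dnsZones match, the longest matching zone wins among dnsZones entries, and a larger number of matching matchLabels breaks remaining ties. Only when solvers are otherwise equal does the one listed earlier win, so a solver with no selector acts as the catch-all.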
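
To send only a wildcard name through DNS-01 and leave every other name on HTTP-01, a dnsNames selector is one option. This is a sketch reusing the Cloudflare Secret names from earlier in this guide:

```yaml
# Fragment of spec.acme in an Issuer or ClusterIssuer.
solvers:
  - dns01:
      cloudflare:
        apiTokenSecretRef:
          name: cloudflare-api-token
          key: api-token
    selector:
      dnsNames:
        - "*.example.com"   # exact-name match; only the wildcard uses DNS-01
  - http01:                 # no selector: catch-all for all other names
      ingress:
        ingressClassName: nginx
```

A Certificate that requests both *.example.com and api.example.com would then solve the first name via DNS-01 and the second via HTTP-01.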

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.example.com
    - api-v2.example.com
  renewBefore: 360h # 15 days before expiry

cert-manager creates the api-tls-secret Secret containing tls.crt and tls.key, plus ca.crt when the issuer supplies a CA certificate (ACME issuers typically leave it empty). The Secret is updated in place on renewal, so any Pod mounting the Secret will see the new certificate (subject to kubelet’s Secret refresh interval, typically 60 seconds).

For simple cases, annotating an Ingress resource triggers cert-manager to create a Certificate resource automatically:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80

By default, cert-manager renews a certificate once approximately two-thirds of its lifetime has elapsed. Setting renewBefore replaces that default with a fixed lead time before expiry. For a 90-day Let’s Encrypt certificate with the default settings, renewal happens at approximately day 60 (30 days before expiry); with renewBefore: 360h it happens at day 75 (15 days before expiry).

With ARI enabled (cert-manager 1.15+), the CA’s suggestedWindow overrides the static threshold. The controller polls the renewalInfo endpoint and schedules renewal within the CA’s suggested window. This is critical for mass revocation scenarios — the CA can signal “renew now” and ARI-enabled cert-manager responds within its polling interval.

Enabling ARI:

# In cert-manager Helm values
config:
  featureGates:
    ExperimentalOptions: true

See certificate automation readiness for client-level ARI context.

cert-manager exposes Prometheus metrics on port 9402. The critical metrics for certificate operations:

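certmanager_certificate_expiration_timestamp_seconds: Unix timestamp of certificate expiration. Alert when this minus the current time drops below your threshold (e.g., 7 days for 90-day certificates).

certmanager_certificate_ready_status: One series per Certificate and condition value (True, False, Unknown), set to 1 for the current state. Alert on any certificate where condition="False" holds for more than 30 minutes.

certmanager_controller_sync_call_count: Number of sync (reconcile) calls per controller. A sustained spike on the orders or challenges controllers usually points to retry loops caused by CA communication failures or challenge solver problems; certmanager_http_acme_client_request_count, broken down by status, shows ACME-side errors more directly.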
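
The two alerts described above can be sketched as a PrometheusRule. This assumes the Prometheus Operator CRDs are installed; the rule name, namespace, severities, and the 7-day threshold are illustrative choices, not recommendations from the cert-manager project:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-certificates   # hypothetical name
  namespace: monitoring             # hypothetical namespace
spec:
  groups:
    - name: cert-manager.certificates
      rules:
        - alert: CertificateExpiringSoon
          # Seconds until expiry is below 7 days.
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            # Depending on scrape config, the label may be exported_namespace.
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 7 days"
        - alert: CertificateNotReady
          # The series for condition="False" is 1 while the Certificate is not Ready.
          expr: certmanager_certificate_ready_status{condition="False"} == 1
          for: 30m
          labels:
            severity: critical
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} has not been Ready for 30 minutes"
```

With shorter certificate lifetimes, the expiry threshold should shrink along with the renewal lead time so the alert only fires when renewal has actually failed.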

DNS propagation delay: DNS-01 challenges fail because the TXT record hasn’t propagated before cert-manager’s self-check queries the record. cert-manager performs a self-check before telling the ACME server to validate; if the self-check fails, the challenge is retried. Set --dns01-recursive-nameservers to specific resolvers, together with the boolean --dns01-recursive-nameservers-only flag, so the self-check uses those resolvers rather than the cluster’s default.
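
In Helm terms, the resolver flags can be passed through the chart's extraArgs value. The resolver addresses below are examples only; pick resolvers that see the same DNS view as the ACME server:

```yaml
# cert-manager chart values: controller arguments for DNS-01 self-checks.
extraArgs:
  - --dns01-recursive-nameservers-only
  - --dns01-recursive-nameservers=1.1.1.1:53,8.8.8.8:53   # example resolvers
```

This is particularly relevant with split-horizon DNS, where the cluster's internal resolver may never see the public TXT record.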

Ingress controller misconfiguration: HTTP-01 challenges fail because the ingress controller doesn’t route /.well-known/acme-challenge/ to the acmesolver Pod. Verify that your ingress controller’s class matches the ingressClassName in your solver configuration. The cert-manager 1.18.0 pathType: Exact change broke ingress-nginx configurations with strict path validation.

Rate limits: Let’s Encrypt enforces rate limits — 50 certificates per registered domain per week, 5 duplicate certificates per week, 300 new orders per account per 3 hours. In clusters with many subdomains, hitting rate limits is easy. Use the staging environment for testing, and consider ARI (which exempts renewals from rate limits) for production.

Stale ACME accounts: If the ACME account Secret is deleted or corrupted, cert-manager registers a new account. Existing authorizations under the old account are lost, and in-flight orders fail. Back up ACME account Secrets.

Secret not updating in Pods: After renewal, the Secret is updated, but Pods using subPath volume mounts won’t see the update (a known Kubernetes limitation). Pods using standard Secret volume mounts will see the new certificate within the kubelet sync period (default: 60 seconds). For immediate propagation, consider a sidecar that watches the Secret and triggers a reload.
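
A mount that does pick up rotated certificates can be sketched as follows. The Deployment name, labels, image, and mount path are hypothetical; the relevant detail is that the Secret is mounted as a whole directory with no subPath:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                 # hypothetical workload
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0.0   # hypothetical image
          volumeMounts:
            - name: tls
              mountPath: /etc/tls    # tls.crt and tls.key appear here
              readOnly: true
              # No subPath: kubelet refreshes the files after the Secret changes.
      volumes:
        - name: tls
          secret:
            secretName: api-tls-secret
```

The application still has to re-read the files; a process that loads the certificate once at startup needs a reload signal or restart even when the mount is correct.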

For clusters managing thousands of certificates, cert-manager controller resource requirements increase linearly. Each Certificate resource adds reconciliation overhead. In Axelspire’s experience, the most common scaling bottleneck is not cert-manager itself but the ACME server — either rate limits (Let’s Encrypt) or throughput limits (self-hosted ACME servers like Vault PKI under high concurrent issuance).

Multi-cluster patterns: Each cluster runs its own cert-manager instance with its own ACME account. Do not share ACME account keys across clusters — this creates race conditions on order management. Use separate ClusterIssuers per cluster, each with their own privateKeySecretRef.

cert-manager supports multiple issuer types. For completeness within the Axelspire vault context:

Vault issuer: Connects directly to HashiCorp Vault’s PKI secrets engine API, bypassing ACME. Useful when Vault is your internal CA and you want direct API-level integration rather than ACME.

CA issuer: Uses a local CA key pair stored in a Kubernetes Secret. Suitable for development and testing environments where you want a self-signed CA without an external CA server.

Venafi issuer: Integrates with Venafi TLS Protect (now CyberArk Certificate Manager) for enterprise certificate lifecycle management.

step-ca issuer (Smallstep): A dedicated cert-manager issuer for step-ca, supporting JWK and OIDC provisioners in addition to ACME.
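
Of the issuer types above, the CA issuer is the quickest to stand up for development. The usual bootstrap is a SelfSigned issuer that signs a CA Certificate, which then backs the CA issuer. All resource names here are hypothetical:

```yaml
# 1. Bootstrap issuer used only to sign the CA certificate.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}
---
# 2. The CA key pair, stored as a Secret in the cert-manager namespace.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dev-root-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: dev-root-ca
  secretName: dev-root-ca-key-pair
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
---
# 3. CA issuer that signs workload certificates from that key pair.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: dev-ca
spec:
  ca:
    secretName: dev-root-ca-key-pair
```

Workload Certificates then reference dev-ca in issuerRef exactly as they would reference an ACME ClusterIssuer.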

Under SC-081v3’s phased reduction (200 days now, 100 days in 2027, 47 days in 2029), cert-manager’s renewal automation becomes progressively more critical. At 47-day lifetimes, the default renewal point (2/3 of lifetime, about day 31) leaves only about 16 days of slack before expiry, and every certificate is reissued roughly every 31 days. Every component in the chain (ACME solver, DNS provider API, ingress configuration, Secret propagation) must work reliably at this cadence. cert-manager’s ARI support ensures that when a CA needs emergency renewal, the Kubernetes certificate infrastructure responds without human intervention.
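
One way to keep the renewal lead time proportional as lifetimes shrink is a percentage-based threshold instead of a fixed renewBefore duration. The sketch below assumes a cert-manager release that supports the renewBeforePercentage field (1.16 or newer, to the best of our knowledge); verify the field against your installed CRDs before relying on it:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: production
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - api.example.com
  # Renew when about 40% of the lifetime remains:
  #   90-day cert -> about 36 days before expiry
  #   47-day cert -> about 19 days before expiry
  # Assumption: mutually exclusive with renewBefore; set only one of the two.
  renewBeforePercentage: 40
```

A fixed renewBefore: 360h, by contrast, would consume almost a third of a 47-day lifetime and needs revisiting each time the maximum validity drops.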

  • Staging issuer tested before prod; email is a distribution list.
  • DNS-01 credentials rotated on calendar; least privilege on DNS API.
  • ARI feature gate on for CAs that expose renewalInfo.
  • Alerts on certmanager_certificate_ready_status and expiry metrics.
  • No subPath for TLS Secrets—or reload sidecar documented.
  • Separate ACME account keys per cluster; GitOps pins cert-manager version.
  • ingress-nginx pathType / feature gate compatibility validated after upgrades.