Certbot Auto-Renewal: Setup, Cron Jobs & Failure Troubleshooting
TL;DR: Certbot renewal automation eliminates manual certificate management overhead for Let's Encrypt and other ACME CAs. Proper implementation requires deployment hooks, multi-server coordination, monitoring integration, and fallback procedures. This guide covers enterprise-grade renewal patterns from single-server deployments to multi-region architectures.
Deploy Hooks: Reload Services After Renewal
The most common question: "How do I reload nginx/apache after renewal?"
Deploy hooks run commands automatically after successful renewal, enabling zero-downtime certificate updates.
Basic Service Reload
Nginx:
Apache:
Multiple Services:
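The command examples for these three cases did not survive in the text above. A minimal sketch of what they typically look like, assuming systemd-managed services and the Debian unit name apache2 (httpd on RHEL-family systems). The helper renew_command is hypothetical and only prints the commands; it does not run them:

```shell
#!/bin/bash
# renew_command (hypothetical helper): print the certbot command whose
# deploy hook reloads the given systemd units. Nothing is executed.
renew_command() {
  local hook="systemctl reload $*"
  printf 'certbot renew --deploy-hook "%s"\n' "$hook"
}

renew_command nginx                  # Nginx
renew_command apache2                # Apache (unit name assumed; httpd on RHEL)
renew_command nginx postfix dovecot  # Multiple services
```

Passing --deploy-hook once is usually enough: Certbot records the hook in the certificate's renewal configuration, so later scheduled runs reuse it.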
Advanced Deploy Hook Examples
Copy certificates to load balancer:
/usr/local/bin/sync-certs.sh:
#!/bin/bash
# Sync renewed certificates to load balancers
# -L copies the real files; entries under live/ are symlinks into ../archive/
rsync -avzL /etc/letsencrypt/live/ lb1.example.com:/etc/nginx/certs/
rsync -avzL /etc/letsencrypt/live/ lb2.example.com:/etc/nginx/certs/
ssh lb1.example.com "systemctl reload nginx"
ssh lb2.example.com "systemctl reload nginx"
Update CDN with new certificate:
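# Without --certificate-arn, each run imports a new ACM certificate instead of replacing the existing one
certbot renew --deploy-hook "aws acm import-certificate \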
--certificate fileb:///etc/letsencrypt/live/example.com/cert.pem \
--private-key fileb:///etc/letsencrypt/live/example.com/privkey.pem \
--certificate-chain fileb:///etc/letsencrypt/live/example.com/chain.pem"
Notify monitoring system:
certbot renew --deploy-hook "curl -X POST https://monitoring.example.com/webhook \
-d '{\"event\":\"cert_renewed\",\"domain\":\"example.com\"}'"
Deploy Hook Best Practices
- Make scripts executable: chmod +x /usr/local/bin/sync-certs.sh
- Test hooks manually: Run script before adding to Certbot
- Log hook execution: Use >> /var/log/certbot-hooks.log 2>&1
- Handle failures gracefully: Use || true to prevent renewal failures
- Use environment variables: Certbot provides $RENEWED_DOMAINS, $RENEWED_LINEAGE
Example with environment variables:
#!/bin/bash
# /usr/local/bin/deploy-hook.sh
echo "[$(date)] Renewed domains: $RENEWED_DOMAINS" >> /var/log/certbot-deploy.log
echo "[$(date)] Certificate path: $RENEWED_LINEAGE" >> /var/log/certbot-deploy.log
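for domain in $RENEWED_DOMAINS; do
  echo "[$(date)] Deploying certificate for: $domain" >> /var/log/certbot-deploy.log
done
# Reload once, however many domains the certificate covers
systemctl reload nginx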
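The logging and failure-handling practices listed above can be combined in one small wrapper. This is a sketch, not a Certbot feature: run_step, the HOOK_LOG override, and WEBHOOK_URL are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Hypothetical log location; override with HOOK_LOG when experimenting.
LOG_FILE="${HOOK_LOG:-/var/log/certbot-hooks.log}"

# run_step NAME CMD [ARGS...]: run one hook step, append its output to the
# log, and record whether it succeeded. Returns non-zero if the step failed.
run_step() {
  local name="$1"; shift
  if "$@" >> "$LOG_FILE" 2>&1; then
    echo "[$(date)] OK: $name" >> "$LOG_FILE"
  else
    echo "[$(date)] FAILED: $name" >> "$LOG_FILE"
    return 1
  fi
}

# Example hook body (commented out so the sketch has no side effects):
# run_step "reload nginx" systemctl reload nginx                  # critical step
# run_step "notify" curl -fsS -X POST "$WEBHOOK_URL" || true     # optional step
```

Appending || true only to optional steps keeps a failed notification from masking a failed reload, which is the failure that needs attention.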
Test Renewal Before It Matters (--dry-run)
ALWAYS test renewal configuration before certificates expire. Failed renewals at 2 AM are no fun.
Basic Dry Run Test
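Running certbot renew --dry-run simulates renewal against the Let's Encrypt staging environment without:
- Replacing production certificates
- Hitting Let's Encrypt production rate limits
- Running deploy hooks
It still exercises the parts that usually break:
- Validating configuration
- Testing challenge validation
- Checking permissions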
Expected output (success):
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/example.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert not due for renewal, but simulating renewal for dry run
** DRY RUN: simulating 'certbot renew' close to cert expiry
** (The test certificates below have not been saved.)
Congratulations, all simulated renewals succeeded:
/etc/letsencrypt/live/example.com/fullchain.pem (success)
Test Deploy Hooks
Deploy hooks DO NOT run during --dry-run (by design). To test hooks:
# Run hook script manually
/usr/local/bin/sync-certs.sh
# Or use --force-renewal (WARNING: uses rate limit quota)
certbot renew --force-renewal --cert-name example.com
When to Run --dry-run
- Before first renewal setup: Verify configuration works
- After infrastructure changes: New firewall rules, DNS changes
- Monthly: Catch configuration drift
- After Certbot updates: Ensure compatibility
- Before production deployment: Test in staging first
Common Renewal Failures & Fixes
| Error | Cause | Fix |
|---|---|---|
| "The client lacks sufficient authorization" | Validation failed (HTTP-01/DNS-01) | Check web server config, DNS records, firewall |
| "too many certificates already issued" | Rate limit hit (50 certs/week) | Wait 7 days OR use --staging for testing |
| "deploy-hook command failed" | Hook script error/permissions | Check script: bash -x /path/to/hook.sh |
| "Cert not yet due for renewal" | Renewal attempted too early | Certbot renews at 30 days remaining (normal) |
| "timeout during connect" | Firewall blocking port 80/443 | Open firewall, check iptables/cloud security groups |
| "wrong status code '404'" | Webroot path incorrect | Verify --webroot-path matches DocumentRoot |
| "Connection refused" | Web server not running | Start web server: systemctl start nginx |
| "An unexpected error occurred" | Permissions, disk space, or bugs | Check /var/log/letsencrypt/letsencrypt.log |
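The "Cert not yet due for renewal" row comes down to a threshold comparison. A sketch of that arithmetic, assuming the 30-day threshold from the table; the function names days_left and is_due are hypothetical, and Certbot's own logic may differ in detail:

```shell
#!/bin/bash
# days_left EXPIRY_EPOCH NOW_EPOCH: whole days between two Unix timestamps.
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# is_due EXPIRY_EPOCH NOW_EPOCH [THRESHOLD_DAYS]: succeed when the remaining
# lifetime is below the threshold (30 days assumed, per the table above).
is_due() {
  local threshold="${3:-30}"
  [ "$(days_left "$1" "$2")" -lt "$threshold" ]
}

now=1700000000                   # fixed example timestamp
expiry=$(( now + 45 * 86400 ))   # a certificate with 45 days left
if is_due "$expiry" "$now"; then
  echo "due for renewal"
else
  echo "not yet due ($(days_left "$expiry" "$now") days left)"
fi
```

With a 90-day certificate this means roughly the first 60 days of every lifetime produce the "not yet due" message, which is expected behavior rather than an error.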
Debugging Failed Renewals
1. Check renewal configuration:
2. View detailed logs:
3. Test renewal manually:
4. Force renewal (uses rate limit):
5. Check certificate expiration:
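The commands for these five steps are not shown above. The sketch below prints one commonly used command per step for a given certificate name, so they can be reviewed before anything is run. The helper name debug_steps is hypothetical, and the paths assume Certbot's default directory layout:

```shell
#!/bin/bash
# debug_steps CERT_NAME: print one suggested command per debugging step.
# The commands are printed only; none of them is executed here.
debug_steps() {
  local name="${1:?usage: debug_steps <cert-name>}"
  cat <<EOF
1. cat /etc/letsencrypt/renewal/${name}.conf
2. tail -n 100 /var/log/letsencrypt/letsencrypt.log
3. certbot renew --dry-run --cert-name ${name}
4. certbot renew --force-renewal --cert-name ${name}
5. certbot certificates --cert-name ${name}
EOF
}

debug_steps example.com
```

Step 4 is the only one that consumes production rate limit quota, so it belongs last and only after step 3 passes.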
Overview: From Manual Renewals to Production Automation
Certificate renewal automation represents the operational reality of running ACME at scale. Let's Encrypt certificates expire every 90 days—intentionally short to encourage automation and limit compromise windows. Manual renewal of even 10 certificates becomes unsustainable; at enterprise scale (100+ certificates), automation isn't optional, it's existential.
The renewal challenge: Certificate renewal seems simple in tutorials but production deployments face coordination challenges across load-balanced servers, zero-downtime deployment requirements, validation method conflicts with existing infrastructure, and integration with deployment pipelines and monitoring systems.
Why This Belongs in ACME Client Operations
The ACME Protocol defines the renewal process; this guide addresses renewal operations. Understanding protocol flows doesn't prepare you for:
- Multi-server deployments: How to renew certificates on a primary server and distribute to 20 web servers without service interruption
- Deployment hooks: Automatically reloading services, updating load balancer configurations, syncing to CDNs
- Validation conflicts: Managing port 80/443 conflicts between running web servers and ACME validation
- Failure recovery: Handling renewal failures, rollback procedures, backup certificate sources
- Monitoring integration: Detecting failed renewals before certificates expire
Real-world scenario: Your organization runs a load-balanced web application with certificates on 15 servers. Certbot's default behavior would issue 15 separate certificates (hitting rate limits) and require coordinated deployment across all servers. This guide shows you how to issue once, distribute efficiently, and validate the deployment.
Related Documentation
This page is part of the Operating ACME Clients section:
- Operating ACME Clients Overview - Section introduction and navigation
- X.509 Certificate Verification - Validating ACME-issued certificates
- Certbot Renewal Automation (this page) - Production renewal patterns
- ACME Client Configuration (coming) - Certbot, acme.sh, cert-manager configuration
- Multi-Environment ACME (coming) - Development, staging, production patterns
For broader automation context:
- Renewal Automation - Platform-agnostic renewal strategies
- Certificate Lifecycle Management - Complete lifecycle operations
- Monitoring and Alerting - Certificate monitoring frameworks
For ACME protocol understanding:
- ACME Protocol - Protocol specification and RFC 8555
- ACME Protocol Implementation - Building ACME servers
Problem Statement
Manual certificate renewal creates operational overhead and introduces security risks through potential service interruptions. Enterprise Certbot deployments face specific challenges:
- Service Continuity: Avoiding downtime during renewal processes while maintaining SLA requirements
- Multi-Server Coordination: Preventing duplicate certificate issuance across load-balanced environments (rate limit consumption)
- Validation Method Conflicts: Managing port 80/443 conflicts between web servers and ACME standalone challenges
- Automated Deployment: Ensuring renewed certificates are properly deployed to all services (web servers, mail servers, load balancers, CDNs)
- Zero-Trust Validation: Verifying renewed certificates before deploying to production
- Failure Recovery: Handling renewal failures gracefully without manual intervention during business hours
Common failure scenario: Certbot renewal succeeds but nginx reload fails due to syntax error in configuration. Old certificate expires while new certificate sits unused in /etc/letsencrypt/live/. Service outage occurs because deployment hook didn't validate before reload.
Architecture
Single Server Architecture
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   Certbot   │───▶│ Let's Encrypt│───▶│ Web Server  │
│   Client    │    │      CA      │    │  (Nginx/    │
│ (scheduled) │    │    (ACME)    │    │   Apache)   │
└──────┬──────┘    └──────────────┘    └──────┬──────┘
       │                                      │
       └──────────── Deploy Hook ─────────────┘
                  (validate + reload)
When to use: Single server deployments, development environments, proof-of-concept
Limitations: No redundancy, single point of failure, difficult to scale
Multi-Server Architecture (Enterprise)
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   Primary   │───▶│ Let's Encrypt│    │    Load     │
│ Cert Server │    │      CA      │    │  Balancer   │
│  (Certbot)  │    │    (ACME)    │    │  (HAProxy)  │
└──────┬──────┘    └──────────────┘    └──────┬──────┘
       │                                      │
       │           ┌─────────────┐            │
       └──────────▶│   Shared    │◀───────────┤
         deploy    │   Storage   │  consume   │
          hook     │  (NFS/S3)   │            │
                   └──────┬──────┘            │
                          │                   │
            ┌─────────────┼─────────────┐     │
            │             │             │     │
      ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────▼─┐
      │Web Server │ │Web Server │ │ Web Server  │
      │    #1     │ │    #2     │ │     #3      │
      │  (sync)   │ │  (sync)   │ │   (sync)    │
      └───────────┘ └───────────┘ └─────────────┘
When to use: Production load-balanced deployments, high-availability requirements
Benefits: Single renewal point, coordinated deployment, rate limit efficiency
High-Availability Architecture
┌─────────────┐         ┌──────────────┐
│   Primary   │────────▶│ Let's Encrypt│
│ Cert Server │         │      CA      │
└──────┬──────┘         └──────────────┘
       │ heartbeat
       │
┌──────▼──────┐
│  Secondary  │ (standby)
│ Cert Server │
└──────┬──────┘
       │
       ▼
(takeover on failure)
When to use: Critical infrastructure requiring 99.99% uptime
Considerations: Requires distributed locking to prevent duplicate issuance
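One simple way to approximate the locking requirement is a lock directory on storage both certificate servers can see. This is a sketch under stated assumptions, not a full distributed lock: with_renewal_lock and the example path are hypothetical, a crashed holder leaves a stale lock behind, and the atomicity of mkdir on network filesystems depends on the server and protocol version. A lease-based coordination service is the more robust choice.

```shell
#!/bin/bash
# with_renewal_lock LOCK_DIR CMD [ARGS...]
# Run CMD only if LOCK_DIR can be created. mkdir either creates the
# directory or fails, so two callers cannot both acquire the lock.
with_renewal_lock() {
  local lock_dir="$1"; shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "lock held ($lock_dir); skipping renewal" >&2
    return 75
  fi
  local rc=0
  "$@" || rc=$?
  rmdir "$lock_dir"      # release the lock whether CMD succeeded or not
  return "$rc"
}

# Example (hypothetical shared path):
# with_renewal_lock /mnt/certificates/renew.lock certbot renew --quiet
```

The standby server runs the same wrapper; whichever server creates the directory first performs the renewal and the other skips that cycle.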
Implementation
Basic Renewal Setup
Install Certbot (distribution packages often lag the current release; the snap tracks it)
# Ubuntu/Debian
sudo apt update && sudo apt install certbot python3-certbot-nginx
# CentOS/RHEL 8+
sudo dnf install certbot python3-certbot-nginx
# From snap (recommended by EFF)
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
Verify Installation
Standard Renewal Command
# Renew all certificates (dry run first)
sudo certbot renew --dry-run
# Actual renewal
sudo certbot renew
# Renew specific certificate
sudo certbot renew --cert-name example.com
# Force renewal (testing/emergency - counts against rate limits)
sudo certbot renew --force-renewal --cert-name example.com
Advanced Renewal Configuration
High-Security Certificate Renewal
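# Flags used below:
#   --must-staple        request OCSP Must-Staple (confirm your CA still supports it;
#                        Let's Encrypt has announced the end of its OCSP service)
#   --key-type rsa       needed for --rsa-key-size to apply (Certbot 2.0+ defaults to ECDSA)
#   --rsa-key-size 4096  4096-bit RSA key (2048 is the RSA default)
#   --no-eff-email       opt out of EFF communications
sudo certbot certonly \
  --force-renewal \
  --must-staple \
  --key-type rsa \
  --rsa-key-size 4096 \
  --cert-name production.example.com \
  --nginx \
  --email [email protected] \
  --agree-tos \
  --no-eff-email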
Non-Interactive Renewal with Hooks
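# --non-interactive: never prompt for input
# --pre-hook:    runs before a renewal attempt (here: stop nginx)
# --post-hook:   runs after the attempt, whether it succeeded or failed (here: start nginx)
# --deploy-hook: runs once for each successfully renewed certificate
sudo certbot renew \
  --agree-tos \
  --non-interactive \
  --deploy-hook "/etc/letsencrypt/deploy-hook.sh" \
  --pre-hook "systemctl stop nginx" \
  --post-hook "systemctl start nginx"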
Renewal with Custom Configuration
# /etc/letsencrypt/renewal/example.com.conf (excerpt)
# Certbot generates this file. Edit it in place rather than overwriting it:
# the cert, privkey, chain, and fullchain path entries above this section must stay.
[renewalparams]
authenticator = nginx
installer = nginx
account = a1b2c3d4e5f6
server = https://acme-v02.api.letsencrypt.org/directory
renew_hook = /etc/letsencrypt/renewal-hooks/deploy/reload-services.sh
# Renew using configuration
sudo certbot renew --cert-name example.com
Deployment Hook Implementation
Create Production-Grade Deploy Hook (/etc/letsencrypt/deployment/deploy-hook.sh)
#!/bin/bash
set -euo pipefail # Exit on error, undefined vars, pipe failures
# Certbot environment variables available in hooks:
# $RENEWED_DOMAINS - space-separated list of renewed domains
# $RENEWED_LINEAGE - path to renewal directory
DOMAIN="$RENEWED_DOMAINS"
CERT_PATH="$RENEWED_LINEAGE"
LOG_FILE="/var/log/certbot/deploy-$(date +%Y%m%d-%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "=== Certificate Deployment Started: $(date) ==="
echo "Renewed domains: $DOMAIN"
echo "Certificate path: $CERT_PATH"
# Validate certificate before deployment
openssl x509 -in "$CERT_PATH/cert.pem" -noout -checkend 86400 || {
echo "ERROR: Certificate expires within 24 hours"
exit 1
}
# Verify certificate matches private key by comparing public keys
# (works for RSA and ECDSA keys; a modulus comparison is RSA-only)
cert_pubkey=$(openssl x509 -in "$CERT_PATH/cert.pem" -noout -pubkey | openssl sha256)
key_pubkey=$(openssl pkey -in "$CERT_PATH/privkey.pem" -pubout | openssl sha256)
if [ "$cert_pubkey" != "$key_pubkey" ]; then
  echo "ERROR: Certificate and private key do not match"
  exit 1
fi
# Copy certificates to application directories
cp "$CERT_PATH/fullchain.pem" /etc/ssl/certs/
cp "$CERT_PATH/privkey.pem" /etc/ssl/private/
# Set proper permissions
chmod 644 /etc/ssl/certs/fullchain.pem
chmod 600 /etc/ssl/private/privkey.pem
chown root:ssl-cert /etc/ssl/private/privkey.pem
# Test nginx configuration before reload
nginx -t || {
echo "ERROR: Nginx configuration test failed"
exit 1
}
# Reload services (order matters - nginx first, then mail servers)
systemctl reload nginx
systemctl reload postfix
systemctl reload dovecot
# Verify nginx is still running after reload
sleep 2
systemctl is-active --quiet nginx || {
echo "ERROR: Nginx failed after reload"
systemctl status nginx
exit 1
}
# Optional: Sync to remote servers
# rsync -av --delete /etc/ssl/ backup-server:/etc/ssl/
# Optional: Update load balancer
# curl -X POST "https://lb.example.com/api/reload-ssl" \
# -H "Authorization: Bearer $LB_API_TOKEN"
# Log successful deployment
logger -t certbot-deploy "Certificate renewed and deployed for $DOMAIN"
# Send success notification
curl -X POST "https://monitoring.example.com/webhook" \
-H "Content-Type: application/json" \
-d "{
\"event\": \"cert_renewed\",
\"domain\": \"$DOMAIN\",
\"timestamp\": \"$(date -Iseconds)\",
\"cert_path\": \"$CERT_PATH\"
}" || echo "Warning: Failed to send notification"
echo "=== Certificate Deployment Completed: $(date) ==="
exit 0
Make Script Executable
sudo chmod +x /etc/letsencrypt/deployment/deploy-hook.sh
# Create log directory
sudo mkdir -p /var/log/certbot
Automated Renewal with Systemd Timer
Modern approach using systemd timer (recommended over cron)
Create systemd service (/etc/systemd/system/certbot-renewal.service)
[Unit]
Description=Certbot Renewal
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot renew --quiet --deploy-hook /etc/letsencrypt/deployment/deploy-hook.sh
PrivateTmp=true
Create systemd timer (/etc/systemd/system/certbot-renewal.timer)
[Unit]
Description=Certbot Renewal Timer
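# No Requires= here: the timer activates certbot-renewal.service by name, and
# Requires= would also start the service every time the timer unit is started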
[Timer]
# Run twice daily at 2:30 AM and 2:30 PM
OnCalendar=*-*-* 02,14:30:00
# Add randomization to avoid thundering herd
RandomizedDelaySec=3600
Persistent=true
[Install]
WantedBy=timers.target
Enable and start timer
sudo systemctl daemon-reload
sudo systemctl enable certbot-renewal.timer
sudo systemctl start certbot-renewal.timer
# Verify timer is active
sudo systemctl list-timers certbot-renewal.timer
Alternative: Traditional Cron Setup
Setup Cron Job
Add Renewal Entry
# Run twice daily at 2:30 AM and 2:30 PM
30 2,14 * * * /usr/bin/certbot renew --quiet --deploy-hook /etc/letsencrypt/deployment/deploy-hook.sh >> /var/log/certbot/renewal-cron.log 2>&1
# Alternative: Run weekly on Monday at 3 AM with logging
0 3 * * 1 /usr/bin/certbot renew --quiet --deploy-hook /etc/letsencrypt/deployment/deploy-hook.sh >> /var/log/certbot/renewal-$(date +\%Y\%m\%d).log 2>&1
Enterprise Multi-Server Implementation
Primary Certificate Server Setup
# Install Certbot on primary server
sudo apt install certbot python3-certbot-nginx
# Obtain certificate (one time)
sudo certbot certonly \
--nginx \
--cert-name shared-certificate \
--domains example.com,www.example.com,api.example.com \
--email [email protected] \
--agree-tos
# Create distribution script
sudo tee /usr/local/bin/cert-distribute.sh << 'EOF'
#!/bin/bash
set -euo pipefail
CERT_DIR="/etc/letsencrypt/live"
SERVERS=("web1.internal" "web2.internal" "web3.internal")
CERT_NAME="shared-certificate"
LOGFILE="/var/log/cert-distribute.log"
echo "=== Certificate Distribution Started: $(date) ===" | tee -a "$LOGFILE"
for server in "${SERVERS[@]}"; do
echo "Syncing to $server..." | tee -a "$LOGFILE"
  # Sync certificate files
  # (-L copies the files the live/ symlinks point to, not the symlinks themselves)
  rsync -avL --delete \
    "$CERT_DIR/$CERT_NAME/" \
    "deploy@$server:/tmp/letsencrypt-sync/" \
    --rsync-path="sudo rsync" || {
echo "ERROR: Failed to sync to $server" | tee -a "$LOGFILE"
continue
}
  # Copy into the per-certificate directory nginx reads from, then reload (remote execution)
  ssh "deploy@$server" "sudo cp -r /tmp/letsencrypt-sync/* /etc/ssl/letsencrypt/$CERT_NAME/ && \
    sudo nginx -t && \
    sudo systemctl reload nginx" || {
echo "ERROR: Failed to deploy on $server" | tee -a "$LOGFILE"
continue
}
echo "Successfully deployed to $server" | tee -a "$LOGFILE"
done
echo "=== Certificate Distribution Completed: $(date) ===" | tee -a "$LOGFILE"
EOF
sudo chmod +x /usr/local/bin/cert-distribute.sh
# Test distribution
sudo /usr/local/bin/cert-distribute.sh
Secondary Server Configuration
# Create certificate directory structure
sudo mkdir -p /etc/ssl/letsencrypt/shared-certificate
sudo chown -R deploy:deploy /etc/ssl/letsencrypt
# Configure nginx to use shared certificates
sudo tee /etc/nginx/snippets/ssl-shared.conf << 'EOF'
# Shared Let's Encrypt certificate configuration
ssl_certificate /etc/ssl/letsencrypt/shared-certificate/fullchain.pem;
ssl_certificate_key /etc/ssl/letsencrypt/shared-certificate/privkey.pem;
# Modern SSL configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
ssl_prefer_server_ciphers off;
# Session configuration
ssl_session_timeout 1d;
ssl_session_cache shared:MozTLS:10m;
ssl_session_tickets off;
# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/ssl/letsencrypt/shared-certificate/chain.pem;
resolver 8.8.8.8 8.8.4.4 valid=300s;
resolver_timeout 5s;
EOF
# Use in nginx site configuration
sudo tee -a /etc/nginx/sites-available/example.com << 'EOF'
server {
listen 443 ssl http2;
server_name example.com;
include snippets/ssl-shared.conf;
# ... rest of configuration
}
EOF
# Test and reload
sudo nginx -t && sudo systemctl reload nginx
Common Pitfalls
1. Port Conflicts with Standalone Mode
Problem: Certbot standalone mode requires ports 80/443, conflicts with running web servers
# WRONG - causes port binding errors
sudo certbot renew --standalone
# Error: Problem binding to port 80: Could not bind to IPv4 or IPv6
Solution: Use native web server plugins or webroot mode
# CORRECT - use nginx plugin
sudo certbot renew --nginx
# Or use webroot mode (no service disruption)
sudo certbot renew --webroot -w /var/www/html
# For Apache
sudo certbot renew --apache
2. Multi-Server Certificate Duplication
Problem: Running Certbot renewal on every server causes duplicate issuance, rate limit exhaustion
# WRONG - each server issues separate certificate
# On web1.example.com:
sudo certbot renew
# On web2.example.com:
sudo certbot renew
# On web3.example.com:
sudo certbot renew
# Result: 3 separate certificates, 3x rate limit consumption
Solution: Centralized renewal with distribution
# CORRECT - primary server only
# On primary-cert.example.com:
sudo certbot renew --deploy-hook /usr/local/bin/cert-distribute.sh
# All other servers: passive consumers via sync/pull
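On the consuming side, a pull-based server only needs to act when the certificate actually changed. A minimal sketch of that check, assuming the synced files land in a staging directory first; install_if_changed and the example paths are hypothetical:

```shell
#!/bin/bash
# install_if_changed SRC DST RELOAD_CMD [ARGS...]
# Copy SRC over DST and run the reload command only when the contents differ.
install_if_changed() {
  local src="$1" dst="$2"; shift 2
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "unchanged: $dst"
    return 0
  fi
  cp "$src" "$dst"
  "$@"
}

# Example on a web server (paths are assumptions):
# install_if_changed /tmp/letsencrypt-sync/fullchain.pem \
#   /etc/ssl/letsencrypt/shared-certificate/fullchain.pem \
#   systemctl reload nginx
```

In practice the private key should be copied in the same step as the chain, before the reload, so nginx never loads a mismatched pair.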
3. Missing Service Reloads
Problem: Renewed certificates exist but services still use old certificates
# WRONG - certificates renewed but not loaded
sudo certbot renew
# New cert: /etc/letsencrypt/live/example.com/fullchain.pem (updated)
# Nginx still serving: old certificate from memory
Solution: Always include deployment hooks
# CORRECT - reload services after successful renewal
sudo certbot renew --deploy-hook "systemctl reload nginx apache2 postfix"
# Better: use comprehensive deploy script
sudo certbot renew --deploy-hook /etc/letsencrypt/deployment/deploy-hook.sh
4. Insufficient Permissions in Deploy Hooks
Problem: Deploy hooks fail due to permission restrictions
# WRONG - Certbot started as an unprivileged user; the hook inherits that user's privileges
certbot renew --deploy-hook "systemctl reload nginx"
# Error: Failed to reload nginx: Access denied
Solution: Run Certbot with sudo, ensure hook has proper permissions
# CORRECT - run with appropriate privileges
sudo certbot renew --deploy-hook "systemctl reload nginx"
# The hook script runs as the same user as Certbot, so when Certbot is started
# with sudo the script already has root privileges and needs no sudo of its own
#!/bin/bash
systemctl reload nginx
chmod 600 /etc/ssl/private/*.pem
5. Missing Renewal Configuration After Manual Certificate Request
Problem: Manually obtained certificate doesn't renew automatically
# Initial certificate request (one-time)
sudo certbot certonly --manual -d example.com
# Later: renewal fails
sudo certbot renew
# Skipping example.com: manual authenticator not supported for renewal
Solution: Use automated authenticators (nginx, apache, webroot, dns)
# CORRECT - use automated authenticator
sudo certbot certonly --nginx -d example.com
# For wildcard certificates, use DNS plugin
# Quote the wildcard so the shell does not expand it against local filenames
sudo certbot certonly --dns-route53 -d '*.example.com' -d example.com
6. Rate Limit Exhaustion from Force Renewals
Problem: Testing with --force-renewal in production hits rate limits
# WRONG - repeated force renewals
sudo certbot renew --force-renewal # Testing
sudo certbot renew --force-renewal # Oops, failed
sudo certbot renew --force-renewal # Try again
sudo certbot renew --force-renewal # Still failing
# Result: Rate limit exceeded (5 duplicate certs per week)
Solution: Use staging environment and dry-run
# CORRECT - test with staging first
sudo certbot renew --dry-run --server https://acme-staging-v02.api.letsencrypt.org/directory
# Production: only force renew when necessary
sudo certbot renew --force-renewal --cert-name example.com
Best Practices
1. Security Hardening
Use Strong Cryptographic Parameters
# 4096-bit RSA for high-security environments
# (--key-type rsa is needed on Certbot 2.0+, where ECDSA is the default key type)
sudo certbot certonly --key-type rsa --rsa-key-size 4096 -d example.com
# Enable OCSP Must-Staple
sudo certbot certonly --must-staple -d example.com
# Use ECDSA certificates (smaller, faster)
sudo certbot certonly --key-type ecdsa --elliptic-curve secp384r1 -d example.com
Restrict Private Key Permissions
# In deploy hook
chmod 600 /etc/letsencrypt/live/*/privkey.pem
chown root:ssl-cert /etc/letsencrypt/live/*/privkey.pem
# Verify permissions
find /etc/letsencrypt -name 'privkey.pem' -exec ls -la {} \;
Implement Certificate Pinning for Critical Applications
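# Application-level certificate pinning
# Note: a hash of the whole certificate changes at every renewal, so the pin
# list must be updated before each renewed certificate is deployed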
import ssl
import hashlib
def verify_cert_pinning(cert_der, expected_pins):
"""Verify certificate matches expected pin."""
sha256_pin = hashlib.sha256(cert_der).hexdigest()
return sha256_pin in expected_pins
# Expected certificate pins (backup + current)
EXPECTED_PINS = [
'a1b2c3d4...', # Current certificate
'e5f6g7h8...' # Backup certificate
]
2. Operational Excellence
Monitor Certificate Expiration with External Tools
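# Prometheus scrape config for blackbox_exporter probes
- job_name: 'certificate-expiry'
  metrics_path: /probe
  params:
    module: [tls_connect]
  static_configs:
    - targets:
        - example.com:443
        - api.example.com:443
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    # Assumed exporter address; point this at your blackbox_exporter
    - target_label: __address__
      replacement: 127.0.0.1:9115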
Test Renewal Process in Staging
# Staging environment renewal test
# (--dry-run skips deploy hooks and saves no certificates, so no hook is passed here)
sudo certbot renew \
  --dry-run \
  --server https://acme-staging-v02.api.letsencrypt.org/directory
# Inspect the certificate currently served by the staging host
openssl s_client -connect staging.example.com:443 -servername staging.example.com </dev/null | \
  openssl x509 -noout -text
Implement Rollback Procedures
#!/bin/bash
# Rollback script: /usr/local/bin/cert-rollback.sh
DOMAIN="$1"
BACKUP_DIR="/etc/letsencrypt/backup"
# Copy previous certificate back
cp "$BACKUP_DIR/$DOMAIN/fullchain.pem" "/etc/ssl/certs/fullchain.pem"
cp "$BACKUP_DIR/$DOMAIN/privkey.pem" "/etc/ssl/private/privkey.pem"
# Reload services
systemctl reload nginx
logger -t cert-rollback "Rolled back certificate for $DOMAIN"
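The rollback script assumes the backup directory already holds the previous certificate, which nothing shown so far creates. A sketch of a backup step that could run in the deploy hook before the new files are copied into place. The function name backup_cert and its argument order are hypothetical; the default paths in the example mirror the ones used above.

```shell
#!/bin/bash
# backup_cert DOMAIN CERT_FILE KEY_FILE BACKUP_DIR
# Save the currently deployed certificate and key under BACKUP_DIR/DOMAIN
# using the file names the rollback script expects.
backup_cert() {
  local domain="$1" cert="$2" key="$3" backup_dir="$4"
  local dest="$backup_dir/$domain"
  mkdir -p "$dest"
  chmod 700 "$dest"
  # -L follows symlinks so the backup holds real files, not links
  cp -L "$cert" "$dest/fullchain.pem"
  cp -L "$key" "$dest/privkey.pem"
  chmod 600 "$dest/privkey.pem"
}

# Example, run before the deploy hook overwrites the deployed files:
# backup_cert example.com /etc/ssl/certs/fullchain.pem \
#   /etc/ssl/private/privkey.pem /etc/letsencrypt/backup
```

Only one generation is kept in this sketch; a dated subdirectory per backup would allow rolling back further.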
Log All Renewal Activities
# Configure Certbot logging
sudo tee -a /etc/letsencrypt/cli.ini << 'EOF'
# Logging configuration
max-log-backups = 30
logs-dir = /var/log/letsencrypt
# Email notifications
email = [email protected]
EOF
# Monitor logs
sudo tail -f /var/log/letsencrypt/letsencrypt.log
3. High Availability
Use Shared Storage for Certificate Distribution
# Mount NFS share for certificates
sudo mkdir -p /mnt/certificates
sudo mount -t nfs nfs.example.com:/exports/certificates /mnt/certificates
# Symlink Certbot directory
sudo ln -s /mnt/certificates/letsencrypt /etc/letsencrypt
# Update fstab for persistence
echo "nfs.example.com:/exports/certificates /mnt/certificates nfs defaults 0 0" | \
sudo tee -a /etc/fstab
Implement Health Checks Post-Renewal
#!/bin/bash
# Health check in deploy hook
DOMAIN="example.com"
# Check certificate validity
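# </dev/null closes stdin so s_client exits after the handshake instead of waiting for input
EXPIRY=$(openssl s_client -connect "$DOMAIN:443" -servername "$DOMAIN" </dev/null 2>/dev/null | \
  openssl x509 -noout -enddate | cut -d= -f2)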
EXPIRY_EPOCH=$(date -d "$EXPIRY" +%s)
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( ($EXPIRY_EPOCH - $NOW_EPOCH) / 86400 ))
if [ $DAYS_LEFT -lt 30 ]; then
echo "WARNING: Certificate expires in $DAYS_LEFT days"
exit 1
fi
# Check HTTPS connectivity
curl -sS --fail https://$DOMAIN || {
echo "ERROR: HTTPS health check failed"
exit 1
}
echo "Health check passed: $DAYS_LEFT days until expiry"
Configure Backup Certificate Sources
# Fallback to cached certificate if renewal fails
if ! certbot renew; then
echo "Renewal failed, using cached certificate"
cp /var/cache/letsencrypt/example.com/* /etc/ssl/
systemctl reload nginx
fi
Automate Certificate Validation After Deployment
#!/bin/bash
# Validate deployed certificate
DOMAIN="example.com"
# Check certificate subject matches
# </dev/null closes stdin so s_client exits after the handshake
SUBJECT=$(openssl s_client -connect "$DOMAIN:443" -servername "$DOMAIN" </dev/null 2>/dev/null | \
  openssl x509 -noout -subject | sed 's/subject=//')
if ! echo "$SUBJECT" | grep -q "$DOMAIN"; then
echo "ERROR: Certificate subject mismatch"
exit 1
fi
# Verify certificate chain
openssl s_client -connect "$DOMAIN:443" -servername "$DOMAIN" -CApath /etc/ssl/certs </dev/null 2>/dev/null | \
  grep -q "Verify return code: 0" || {
echo "ERROR: Certificate chain validation failed"
exit 1
}
echo "Certificate validation successful"
4. Monitoring Integration
Comprehensive Monitoring Deploy Hook
#!/bin/bash
# /etc/letsencrypt/deployment/monitoring-hook.sh
DOMAIN="$RENEWED_DOMAINS"
CERT_PATH="$RENEWED_LINEAGE"
# Extract certificate details
EXPIRY=$(openssl x509 -enddate -noout -in "$CERT_PATH/cert.pem" | cut -d= -f2)
# The issuer string itself contains "=", so strip only the leading label
ISSUER=$(openssl x509 -issuer -noout -in "$CERT_PATH/cert.pem" | sed 's/^issuer=//')
SERIAL=$(openssl x509 -serial -noout -in "$CERT_PATH/cert.pem" | cut -d= -f2)
# Calculate days until expiry
EXPIRY_EPOCH=$(date -d "$EXPIRY" +%s)
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( ($EXPIRY_EPOCH - $NOW_EPOCH) / 86400 ))
# Send metrics to monitoring system
curl -X POST "https://monitoring.example.com/api/metrics" \
-H "Content-Type: application/json" \
-d "{
\"metric\": \"certificate_renewed\",
\"domain\": \"$DOMAIN\",
\"expiry\": \"$EXPIRY\",
\"days_left\": $DAYS_LEFT,
\"issuer\": \"$ISSUER\",
\"serial\": \"$SERIAL\",
\"timestamp\": $(date +%s)
}"
# Send Slack notification
curl -X POST "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
-H "Content-Type: application/json" \
-d "{
\"text\": \"🔒 Certificate renewed for \`$DOMAIN\`\",
\"attachments\": [{
\"color\": \"good\",
\"fields\": [
{\"title\": \"Domain\", \"value\": \"$DOMAIN\", \"short\": true},
{\"title\": \"Days Left\", \"value\": \"$DAYS_LEFT\", \"short\": true},
{\"title\": \"Issuer\", \"value\": \"$ISSUER\", \"short\": false}
]
}]
}"
# Update status page
curl -X PATCH "https://status.example.com/api/components/ssl" \
-H "Authorization: Bearer $STATUS_API_KEY" \
-H "Content-Type: application/json" \
-d "{\"status\": \"operational\", \"message\": \"Certificate renewed: $DOMAIN\"}"
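The JSON payloads above are built by string interpolation, which breaks if a value such as the issuer contains a double quote or backslash. A minimal escaping helper in plain bash, offered as a sketch: json_escape is a hypothetical name, and control characters such as newlines are not handled. A dedicated JSON tool would be more thorough.

```shell
#!/bin/bash
# json_escape STRING: escape backslashes and double quotes so the value can
# be placed between double quotes in a JSON document. Control characters
# (newlines, tabs) are not handled in this sketch.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}     # backslash -> two backslashes (must run first)
  s=${s//\"/\\\"}     # double quote -> backslash + double quote
  printf '%s' "$s"
}

issuer='O = "Example CA", CN = R3'     # made-up issuer string
printf '{"issuer": "%s"}\n' "$(json_escape "$issuer")"
```

In the hook, values like $ISSUER and $DOMAIN would be passed through the helper before being interpolated into the -d payloads.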
5. Documentation and Runbooks
Create Renewal Runbook
# Certificate Renewal Runbook
## Normal Operations
- Automatic renewal via systemd timer (twice daily)
- Deploy hook distributes to all servers
- Monitoring alerts on failures
## Manual Renewal (Emergency)
1. SSH to primary-cert.example.com
2. Run: `sudo certbot renew --force-renewal --cert-name example.com`
3. Distribute: `sudo /usr/local/bin/cert-distribute.sh`
4. Validate: `curl -v https://example.com`
## Rollback Procedure
1. SSH to affected server
2. Run: `sudo /usr/local/bin/cert-rollback.sh example.com`
3. Verify: `openssl s_client -connect example.com:443`
## Troubleshooting
- Renewal logs: `/var/log/letsencrypt/`
- Deploy logs: `/var/log/certbot/`
- Service status: `systemctl status certbot-renewal.timer`
Operational Checklist
Before deploying Certbot renewal automation to production:
- [ ] Install Certbot and required plugins (nginx/apache/dns)
- [ ] Test renewal with --dry-run flag
- [ ] Create and test deployment hook script
- [ ] Validate certificate after deployment in hook
- [ ] Configure systemd timer or cron job for automated renewal
- [ ] Set up monitoring and alerting for renewal failures
- [ ] Document manual renewal procedures in runbook
- [ ] Test multi-server distribution mechanism
- [ ] Verify rollback procedures work
- [ ] Configure rate limit monitoring (50 certs/week for production)
- [ ] Set up external certificate expiration monitoring
- [ ] Test renewal failure scenarios and recovery
- [ ] Ensure logs are retained and monitored
- [ ] Configure notifications (Slack, PagerDuty, email)
- [ ] Document emergency contacts and escalation paths
Related Documentation
ACME Operations:
- Operating ACME Clients Overview - Section navigation
- X.509 Certificate Verification - Certificate validation
- ACME Challenge Validation (coming) - HTTP-01, DNS-01, TLS-ALPN-01 patterns
Broader Operations:
- Renewal Automation - Platform-agnostic renewal strategies
- Certificate Lifecycle Management - Complete lifecycle
- Monitoring and Alerting - Monitoring frameworks
Troubleshooting:
- Expired Certificate Outages - Incident response
- Common Misconfigurations - Configuration issues
Protocol:
- ACME Protocol - Protocol specification
- TLS Protocol - TLS and certificates
This comprehensive guide provides enterprise-grade Certbot renewal automation patterns that ensure reliable, secure, and scalable certificate management across diverse production environments.