Part of the PKI Implementation Guide
Cloud HSM vs On-Premise HSM: Pricing, FIPS 140-2 and the Decision That Matters
The HSM deployment decision is not a technology question. It's a private key protection and business risk question — and the wrong answer costs between $200K and $500K to fix.
Most organisations approach this decision backwards. They compare feature lists and headline hourly rates — AWS CloudHSM at $1.50/hour, Azure Dedicated HSM at around $1.45/hour — choose a deployment model, then discover the organisational and compliance implications eighteen months later during an audit or an incident.
This guide gives you the decision framework first. The cryptographic security of a properly implemented cloud HSM and a properly implemented on-premise HSM is equivalent — both achieve FIPS 140-2 Level 3 certification. The differences that matter are control, total cost over time, compliance narrative, and operational accountability — all of which are business decisions, not engineering ones.
The Decision in Plain Terms
An HSM (Hardware Security Module) is the physical device responsible for private key protection at the highest level of assurance — your root CA private key, your payment signing keys, your code signing infrastructure. If that key material is compromised, you don't have a certificate problem. You have a business continuity problem.
The deployment question is: where does that hardware live, and who is responsible for it?
On-premise: The hardware — typically a Thales Luna Network HSM or Entrust nShield — sits in your data centre. You own it, you manage it, you are accountable for it. Full control, full responsibility.
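Cloud HSM: The hardware is in AWS, Azure, or GCP's data centre. AWS CloudHSM and Azure Dedicated HSM both provide dedicated, single-tenant hardware. Azure Dedicated HSM is built on Thales Luna devices; AWS CloudHSM has used different HSM vendors across generations, so confirm the current hardware and its FIPS certificate before you commit. You control the key operations, and the service is designed so that the cloud provider cannot access your key material in plaintext. The FIPS 140-2 Level 3 validation of the hardware is the evidence behind that design. But the provider manages the physical hardware, firmware updates, and facilities.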
Both models meet FIPS 140-2 Level 3 certification requirements. Neither is inherently more secure than the other when properly implemented. The decision comes down to four variables: control requirements, time horizon, organisational capability, and compliance narrative.
Before You Compare Deployment Models: Get the HSM Type Right
There is a fundamental distinction that gets skipped in most HSM procurement processes, and discovering it after purchase is an expensive problem: there are two categories of HSM, and they are not interchangeable.
Financial HSMs — Thales payShield, Atalla, and similar — are purpose-built for payment cryptography. They implement the specific command sets defined by payment schemes: PIN block processing, EMV transaction authorisation, card personalisation. They are excellent at what they do. They are not designed for PKI operations and typically do not support the certificate operations, key types, or CA interfaces that a PKI infrastructure requires.
General purpose HSMs — Thales Luna, Entrust nShield, and their cloud equivalents — support the full range of cryptographic operations including the certificate signing, key generation algorithms, and CA integration that PKI demands. This is the category your PKI infrastructure needs.
The confusion arises because both carry FIPS 140-2 certification, both are called HSMs, and in organisations that operate both payment and PKI infrastructure, teams sometimes assume the hardware already on site is suitable for the PKI use case. It often isn't. If you are procuring HSMs specifically for a PKI programme, confirm explicitly that the device supports PKCS#11 interfaces and the key types your CA requires — do not assume based on FIPS certification alone.
This distinction applies equally to cloud HSM offerings. Verify the specific HSM type and supported operations before committing to a provider for PKI use.
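One way to make the "confirm explicitly" step concrete is to compare the mechanisms a device reports against the ones your CA needs. The sketch below is illustrative only: the mechanism names are real PKCS#11 identifiers, but the required set is an assumption for a typical RSA and ECDSA issuing CA, and the supported list would in practice come from the vendor's documentation or from a PKCS#11 `C_GetMechanismList` call.

```python
# Assumed minimum for an RSA + ECDSA issuing CA; adjust to your CA software.
REQUIRED_FOR_CA = frozenset({
    "CKM_RSA_PKCS_KEY_PAIR_GEN",  # generate RSA key pairs on the device
    "CKM_SHA256_RSA_PKCS",        # RSA signatures with SHA-256
    "CKM_EC_KEY_PAIR_GEN",        # generate EC key pairs on the device
    "CKM_ECDSA",                  # ECDSA signatures
})


def missing_mechanisms(supported, required=REQUIRED_FOR_CA):
    """Return the required mechanisms the device does not report, sorted."""
    return sorted(set(required) - set(supported))


# Hypothetical mechanism list for a payment-oriented device.
payment_device = {"CKM_DES3_CBC", "CKM_DES3_MAC"}
print(missing_mechanisms(payment_device))
```

An empty result does not prove suitability on its own, since key sizes, curves, and CA integration still need checking, but a non-empty one is a clear signal before purchase.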
Cloud HSM Pricing vs On-Premise: What You Actually Pay
The headline numbers are deceptive. Here is what each option actually costs.
Cloud HSM pricing per hour (dedicated hardware):
- AWS CloudHSM: ~$1.50/hour per HSM = approximately $13,140/year per device
- Azure Dedicated HSM: ~$1.45–$1.88/hour depending on region = approximately $12,700–$16,500/year per device
- Google Cloud HSM: Key operations priced per use rather than per dedicated device — lower upfront but costs scale with volume and you do not get dedicated hardware
A minimum viable HA deployment requires at least two HSMs in a primary region and ideally one in a DR region. At AWS CloudHSM rates, that is approximately $39,420/year in device costs alone before network, connectivity, or operational overhead.
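The per-device figures above are just the hourly rate multiplied out over a year. A minimal sketch of that arithmetic, assuming a 365-day year and the headline rates quoted in this guide (the function name is ours, not a provider API):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def annual_device_cost(hourly_rate, devices=1):
    """Device cost per year for always-on HSMs at a flat hourly rate."""
    return round(hourly_rate * HOURS_PER_YEAR * devices, 2)


print(annual_device_cost(1.50))     # one device at $1.50/hour -> 13140.0
print(annual_device_cost(1.50, 3))  # two primary plus one DR -> 39420.0
```

This covers device hours only. Network, data transfer, and staff time sit on top of it.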
On-premise pricing: A Thales Luna Network HSM typically costs $20,000–$50,000 per device depending on model and throughput requirements. An HA deployment (two devices) plus a DR standby is three devices, so $60,000–$150,000 in hardware. Annual support contracts add 15–20% of hardware value per year.
A realistic on-premise HA deployment (two HSMs for redundancy, one for DR) runs approximately $145,000 in year one — hardware, setup, data centre costs — and roughly $50,000 per year ongoing for support contracts, staff time, and maintenance. Over five years: approximately $345,000. Over ten years, including a hardware refresh: approximately $695,000.
A cloud HSM HA deployment across two regions runs approximately $60,000–$130,000 per year once you include network connectivity costs, data transfer, and operational overhead — not just the headline hourly rate. Over five years: $300,000–$650,000. Over ten years: $600,000–$1,300,000.
The honest conclusion: cloud HSM is usually cheaper for deployments under three years. On-premise is usually cheaper for deployments over five years. The crossover point depends on your specific volume, network costs, and vendor negotiations.
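To see where the crossover falls for a given cloud run rate, the figures in this section can be dropped into a small model. This is a sketch under the assumptions stated above ($145,000 in year one, $50,000 per year after that, one $100,000 refresh at year ten); the function names and the 15-year search horizon are illustrative choices, not part of any vendor tooling.

```python
def on_prem_cumulative(years, year_one=145_000, ongoing=50_000,
                       refresh=100_000, refresh_year=10):
    """Cumulative on-premise cost; models a single hardware refresh."""
    total = year_one + ongoing * (years - 1)
    if years >= refresh_year:
        total += refresh
    return total


def cloud_cumulative(years, annual):
    """Cumulative cloud cost at a flat annual run rate."""
    return annual * years


def crossover_year(cloud_annual, horizon=15):
    """First year in which on-premise is cheaper overall, or None."""
    for year in range(1, horizon + 1):
        if on_prem_cumulative(year) < cloud_cumulative(year, cloud_annual):
            return year
    return None


for annual in (60_000, 95_000, 130_000):
    print(annual, crossover_year(annual))
```

With these inputs the crossover moves a long way depending on the cloud run rate, which is why the range of $60,000–$130,000 per year needs to be narrowed to your own number before the comparison means anything.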
What executives consistently underestimate in on-premise TCO: hardware refresh at year seven to ten ($100K+), emergency replacement outside warranty ($20K–$50K), and the opportunity cost of senior staff time on HSM operations rather than higher-value work.
What executives consistently underestimate in cloud TCO: network connectivity costs that can run $12,000–$60,000 per year, vendor price increases of 3–5% annually with no negotiating leverage, and migration costs if you need to leave the provider ($50K–$200K in re-architecture).
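The effect of annual price increases is easy to underestimate because it compounds. A minimal sketch, assuming the increase applies once per year to the whole run rate (a simplification; real contracts vary):

```python
def cumulative_cloud_cost(base_annual, annual_increase, years):
    """Total spend over `years` when the yearly cost rises by a fixed rate."""
    total = 0.0
    yearly = base_annual
    for _ in range(years):
        total += yearly
        yearly *= 1 + annual_increase
    return total


flat = cumulative_cloud_cost(60_000, 0.00, 5)
rising = cumulative_cloud_cost(60_000, 0.05, 5)
print(f"flat: {flat:,.0f}  with 5% increases: {rising:,.0f}")
```

Run the same function over ten years and at your own run rate to see how much of the long-term total comes from increases rather than the starting price.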
A critical distinction when comparing cloud HSM costs: not all cloud HSM offerings give you the equivalent of a dedicated physical device. Some providers — particularly lower-tier or managed-key offerings — give you a logical partition on a shared HSM rather than dedicated hardware. A partition shares the underlying physical device with other tenants, which has implications for performance predictability, security isolation, and the compliance story you can tell auditors. AWS CloudHSM and Azure Dedicated HSM provide dedicated hardware. Other cloud HSM or managed-key services may not — AWS KMS, for example, is a managed key service backed by HSMs but does not give you a dedicated device or the same level of control. When comparing quotes, confirm explicitly whether you are purchasing dedicated hardware or a partition, and whether the API gives you full PKCS#11 access or a limited subset. The price difference is real, and so are the differences in what you get.
If you're evaluating this decision, our certificate cost management guide provides a fuller framework for mapping total operational burden across your PKI estate.
Compliance: What Auditors Actually Ask
Both deployment models can satisfy SOC 2, PCI DSS, and HIPAA requirements. The difference is in the narrative you have to construct and maintain — and how much of the compliance burden falls on your team versus the vendor.
SOC 2 (General Enterprise)
SOC 2 auditors care about key protection, access controls, audit logging, and backup procedures. Both deployment models address these, but the evidence package differs.
With on-premise HSMs, you're presenting your own physical security procedures, your own access control documentation, and your own DR test results. More documentation, more audit prep time, but you control the narrative entirely.
With cloud HSMs, you lean on the provider's SOC 2 Type II report for physical security and hardware management. Your auditors still need to see your operational procedures, backup testing, and cross-region failover documentation — the shared responsibility model means you're responsible for more than most people assume.
The most common SOC 2 finding with cloud HSMs is insufficient understanding of shared responsibility: organisations assume the provider's certification covers their procedures. It doesn't.
PCI DSS (Financial Services, Payments)
PCI DSS permits cloud HSMs. The key questions auditors will ask:
- Are private keys stored in FIPS 140-2 Level 3 certified hardware?
- Are key operations logged and attributable?
- Is access to HSM operations controlled and auditable?
- Can you demonstrate tested DR procedures?
The vendor access question comes up consistently: "Can AWS access your keys?" The correct answer is that a FIPS 140-2 Level 3 validated HSM is designed so that the provider cannot access key material in plaintext. The tamper-resistant, tamper-responsive design and the cryptographic boundary mean the provider's physical access to the hardware does not translate to access to key material. This is what Level 3 validation tests, and it is the same hardware assurance you get from an on-premise deployment. This is a nuanced answer that auditors may push on, and you need to be prepared to explain it clearly.
For payment systems specifically, we've built and operated HSM infrastructure at Barclays, Deutsche Bank, and TSB Bank. The compliance narrative is manageable with cloud HSMs, but the documentation burden is real.
HIPAA (Healthcare)
HIPAA treats certificate expiration monitoring as a technical safeguard and requires documented justification for private key access controls. Both deployment models satisfy these requirements. Cloud deployments require a Business Associate Agreement (BAA) with the provider — AWS, Azure, and GCP all offer this, but it must be in place before deployment.
The HIPAA-specific risk with cloud HSMs is data residency: if your data must remain within a specific jurisdiction, verify that the cloud provider's HSM infrastructure in that region meets the requirement. This is verifiable and manageable, but it's a step that gets skipped.
The Control Question: Private Key Protection vs Operational Flexibility
Control requirements are the clearest differentiator between deployment models, and they're often driven by regulation rather than preference.
You need on-premise if:
- You require air-gapped operations. An offline root CA with no network connectivity is best practice for root CA private key protection, and it cannot be achieved in a cloud deployment.
- You have explicit data sovereignty requirements that prohibit third-party physical access to hardware.
- Your regulatory environment prohibits cloud services for this category of data.
Cloud HSM works if:
- The keys you are protecting can be online, as with intermediate CA operations and issuance infrastructure.
- You're cloud-native and your HSM, whether AWS CloudHSM or Azure Dedicated HSM, integrates with existing cloud infrastructure.
- You need geographic distribution across multiple regions without managing physical hardware in each location.
A hybrid approach is valid and common: root CA private keys on-premise and air-gapped for maximum protection, intermediate CA keys in cloud HSM for operational flexibility and geographic distribution. This is the pattern we implemented at TSB Bank's PKI separation from Banco Sabadell — maximum control for the most sensitive keys, operational convenience for everything else.
Disaster Recovery: Where Organisations Get Hurt
DR is the area where the gap between what organisations think they have and what they actually have is largest — in both deployment models.
Cloud HSMs handle hardware failures transparently. If a physical HSM fails, the provider replaces it without any action required from your team. This is a genuine operational advantage.
What cloud HSMs do not handle automatically: regional disasters. If your cloud region goes down and you have not deployed HSM infrastructure in a second region with tested failover procedures, you have no DR. The most expensive cloud HSM incidents we've seen involve organisations who assumed "cloud equals DR" and discovered during a regional outage that it does not.
On-premise HSMs require you to manage hardware failures yourself — typically 4–8 hours if you have tested spare hardware and backup procedures. DR across facilities is your architecture problem to solve.
In both models, the critical failure mode is the same: untested backup procedures. An HSM backup that has never been restored is not a backup. Test a restore quarterly. This is not optional.
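A quarterly restore test is only useful if someone notices when it lapses. The check below is a minimal sketch of that idea: it flags an HSM whose last successful restore is missing or older than a threshold. The 92-day window is our approximation of "quarterly", and where the dates come from (a ticketing system, a runbook log) is left to you.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=92)  # roughly one quarter; an assumption


def restore_test_overdue(last_successful_restore, today):
    """True if no restore has ever succeeded or the last one is too old."""
    if last_successful_restore is None:
        return True  # a backup that was never restored counts as untested
    return today - last_successful_restore > MAX_AGE


print(restore_test_overdue(date(2024, 1, 1), date(2024, 3, 1)))  # False
print(restore_test_overdue(date(2024, 1, 1), date(2024, 6, 1)))  # True
print(restore_test_overdue(None, date(2024, 6, 1)))              # True
```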
For a fuller treatment of PKI resilience, see our High Availability and Disaster Recovery guide.
Decision Framework
Answer these five questions. The answers determine your deployment model.
1. Do you require air-gapped operations? If your root CA must have no network connectivity, you need on-premise. This is non-negotiable — cloud HSMs are always online.
2. What is your deployment horizon? Under three years, or uncertain: cloud HSM. Over five years with committed infrastructure: on-premise is likely cheaper. Three to five years: model your specific TCO.
3. Do you have HSM-competent staff? On-premise HSM operations require staff who understand HSM administration, firmware management, FIPS 140-2 compliance procedures, and key ceremony execution. If you don't have this capability, on-premise operational costs are much higher than the hardware cost implies, and operational risk is elevated. Cloud HSMs — whether AWS CloudHSM or Azure Dedicated HSM — reduce (but do not eliminate) this requirement. You still need staff who understand private key protection practices and HSM operational procedures.
4. Do you need geographic distribution? Serving users across multiple regions with low latency, or needing multi-region HA without physical data centre presence? Cloud HSM provides this significantly more simply than deploying and managing hardware globally.
5. What is your budget model? CAPEX available for upfront hardware investment: on-premise is viable. OPEX preferred with no large upfront spend: cloud HSM fits the model better.
If your answers point in different directions — control requirements favour on-premise but budget model favours cloud — a hybrid deployment is likely the right architecture. It adds complexity, but it's better than compromising on either control or operational sustainability.
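The five questions can be written down as a simple rule set, which is a useful way to check that a team is answering them consistently. This is a sketch of the framework above, not a substitute for it: the field names are ours, each answer counts as one signal with no weighting, and available CAPEX is treated as neutral because the text only says it makes on-premise viable.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    needs_air_gap: bool       # Q1: must the root CA be offline?
    horizon_years: float      # Q2: expected deployment horizon
    has_hsm_staff: bool       # Q3: HSM-competent staff in house?
    needs_multi_region: bool  # Q4: geographic distribution required?
    prefers_opex: bool        # Q5: OPEX preferred over upfront CAPEX?


def recommend(a):
    """Map the five answers to a deployment model, unweighted."""
    on_prem, cloud = [], []
    if a.needs_air_gap:
        on_prem.append("air-gapped root CA")
    if a.horizon_years < 3:
        cloud.append("short or uncertain horizon")
    elif a.horizon_years > 5:
        on_prem.append("long horizon")
    if not a.has_hsm_staff:
        cloud.append("no HSM-competent staff")
    if a.needs_multi_region:
        cloud.append("multi-region distribution")
    if a.prefers_opex:
        cloud.append("OPEX budget model")
    if on_prem and cloud:
        return "hybrid"
    if on_prem:
        return "on-premise"
    if cloud:
        return "cloud"
    return "undecided: model your TCO"


print(recommend(Answers(True, 10, True, False, True)))
```

The example call has control requirements pointing to on-premise and the budget model pointing to cloud, so it returns the hybrid recommendation.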
What Goes Wrong (and How Much It Costs)
The expensive mistakes in HSM deployment are consistent across deployment models and organisations.
Single HSM with no redundancy. Saving $50,000 on a second on-premise HSM is a common cost-cutting decision that creates a single point of failure worth multiples of that saving in potential downtime. Minimum deployment: two HSMs for HA. Three if you want a DR standby.
Untested backup procedures. The backup exists. The restore procedure has never been executed. Hardware fails. The backup can't be restored due to firmware version mismatch or a backup encryption key stored on the failed device. This failure mode has cost organisations $500,000 or more in remediation and downtime. Test your restore quarterly. Document the result.
Assuming cloud DR is automatic. The provider handles hardware. You handle application-level failover across regions. If you haven't provisioned HSM infrastructure in a second region and tested the cutover, you have no regional DR. Discover this during planning, not during an incident.
No exit strategy for cloud deployments. Key export from cloud HSMs is constrained by design — this is a security feature, but it creates vendor dependency. Before committing to a cloud HSM deployment, document how you would migrate to a different provider if required. The answer should involve a tested procedure, not a theoretical process.
Procuring a partition instead of a dedicated device. Some cloud HSM and managed-key offerings give you a logical partition on shared hardware rather than a dedicated physical HSM. This distinction matters for performance under load, security isolation guarantees, and how you answer auditor questions about multi-tenancy. It also affects which API capabilities are available — some partition-based services expose only a limited subset of HSM operations, which can surface as integration failures late in a PKI deployment. Verify what you're buying before you commit.
Making the Decision
The right HSM deployment model is the one that matches your private key protection requirements, control posture, time horizon, organisational capability, and compliance obligations — not the one with the lowest per-hour rate or the most features.
Neither model is universally better. AWS CloudHSM, Azure Dedicated HSM, and on-premise Thales Luna deployments are all used successfully by sophisticated organisations in regulated industries. The failures we've been called in to remediate were not caused by the wrong deployment model — they were caused by inadequate operational procedures, untested DR, misunderstood shared responsibility, and the wrong HSM type being procured in the first place.
If you're making this decision as part of a broader PKI implementation, read the PKI trust models guide before finalising HSM architecture — your trust hierarchy determines which keys require the most stringent private key protection and therefore which deployment model applies to each tier.
Further Reading
- PKI Implementation Guide — Strategic framework for enterprise PKI programmes
- PKI Trust Models Explained — How your trust hierarchy shapes HSM requirements at each tier
- High Availability and Disaster Recovery for PKI — DR architecture for HSM-backed PKI infrastructure
- Keyfactor vs Venafi — How CLM platform choice interacts with HSM deployment model
- Certificate Cost Management — Full TCO framework for PKI and certificate operations
Talk to Someone Who's Done This
We've implemented HSM infrastructure at Barclays, Deutsche Bank, TSB Bank, and Sky. On-premise, cloud, and hybrid. In regulated and unregulated environments. We know which patterns work and which ones generate $500K remediation projects.
If you're making this decision and want an independent view — no vendor relationships, no commissions — get in touch.