For the latest version, please use Certificate Lifecycle Manager 6.8.1!
On-Premises HA - Two-Location Cold Standby
Overview
This page describes a two-location, high-availability (HA) architecture for a Public Key Infrastructure (PKI) platform. The architecture provides redundancy across two geographically separated sites using a cold-standby, active/passive failover model.
Architecture Components
This architecture spans two geographically separated locations and consists of the following core components:
- Two CARA VMs, one per location, running independently and unaware of each other.
- Two HSMs, clustered with each other and accessed by the CARA VMs via PKCS#11.
- Two CLM VMs, one per location, running independently and unaware of each other.
- Two database nodes forming an active/passive cluster.
The design also supports network segmentation to isolate the individual components from one another, such as CLM (RA), CARA (CA), and the database.
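One way to reason about such segmentation is as a directional allow-list of flows between zones. The sketch below derives the flows from the connections described under Data Flow and General Considerations. The zone names ("clients", "clm", "cara", "db", "hsm") are hypothetical labels, and placing the HSMs in their own zone is an assumption, not something this page prescribes.

```python
# Hypothetical zone names; adapt them to your own network design.
# Each entry is (initiating zone, destination zone).
ALLOWED_FLOWS = frozenset({
    ("clients", "clm"),   # clients and administrators reach CLM directly
    ("clients", "cara"),  # ... and CARA directly
    ("clm", "cara"),      # CLM (RA) connects to CARA (CA)
    ("clm", "db"),        # CLM connects to its local database node
    ("cara", "db"),       # CARA connects to its local database node
    ("cara", "hsm"),      # CARA accesses the HSM via PKCS#11
})


def is_flow_allowed(src_zone: str, dst_zone: str) -> bool:
    """Return True if a connection initiated from src_zone to dst_zone is permitted."""
    return (src_zone, dst_zone) in ALLOWED_FLOWS


if __name__ == "__main__":
    print(is_flow_allowed("clm", "cara"))
    print(is_flow_allowed("clm", "hsm"))
```

The flows are directional because firewall rules are usually written from the initiating side; with stateful filtering, return traffic does not need its own entry.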
CARA VMs
The CARA VMs consist of the following core components:
- Webserver / Reverse Proxy → Performs TLS termination and forwards traffic to the local CARA services.
- CARA Services → Run locally behind the reverse proxy.
HSMs
The HSMs are clustered, with each HSM connected only to the CARA instance at its own location. The specifics of cluster configuration and replication are vendor-dependent and are not covered on this page.
Data Flow and General Considerations
Clients and administrators connect directly to the CLM and CARA instances. Traffic can be steered to the active site via static BGP routing or, for example, by switching CNAME records in DNS.
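For the CNAME variant, the switch amounts to repointing a fixed set of aliases from one site's hostnames to the other's. The following is a minimal sketch, assuming hypothetical hostnames and site labels; it only computes the desired records, because pushing them to a DNS server or provider is vendor-specific and not described on this page.

```python
from dataclasses import dataclass

# Hypothetical names: the aliases clients resolve and the per-site
# hostnames behind them. Replace with your own DNS names.
SITE_TARGETS = {
    "site-a": {
        "clm.pki.example.com": "clm.site-a.example.com",
        "cara.pki.example.com": "cara.site-a.example.com",
    },
    "site-b": {
        "clm.pki.example.com": "clm.site-b.example.com",
        "cara.pki.example.com": "cara.site-b.example.com",
    },
}


@dataclass(frozen=True)
class CnameRecord:
    alias: str   # name that clients and administrators resolve
    target: str  # site-specific host the alias points to


def desired_cnames(active_site: str) -> list[CnameRecord]:
    """Return the CNAME records that steer all client traffic to one site."""
    if active_site not in SITE_TARGETS:
        raise ValueError(f"unknown site: {active_site!r}")
    return [CnameRecord(alias, target)
            for alias, target in sorted(SITE_TARGETS[active_site].items())]


if __name__ == "__main__":
    # Print the records in zone-file-like notation for review before applying them.
    for record in desired_cnames("site-b"):
        print(f"{record.alias}. CNAME {record.target}.")
```

Keeping the aliases identical across sites means clients never need reconfiguration; only the targets change. DNS record TTLs determine how quickly clients follow the switch.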
- Each CARA VM connects only to the HSM at its own location.
- Each CLM VM connects only to the CARA VM at its own location.
- Both CARA and CLM VMs connect directly to the database node at their own location.
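The rules above share one property: every application connection stays within a single location. A minimal sketch of a check for that property follows, assuming a hypothetical inventory of (source, destination) pairs. HSM cluster and database replication traffic, which crosses sites by design, is deliberately not part of the inventory.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    component: str  # "clm", "cara", "hsm" or "db"
    site: str       # hypothetical site labels, e.g. "site-a"


def cross_site_connections(
    connections: list[tuple[Endpoint, Endpoint]],
) -> list[tuple[Endpoint, Endpoint]]:
    """Return every (source, destination) pair whose endpoints sit at different locations.

    Under the data-flow rules, CLM, CARA, HSM and database connections never leave
    their own location, so a non-empty result points to a misconfiguration.
    """
    return [(src, dst) for src, dst in connections if src.site != dst.site]


# Hypothetical inventory of one site's application connections.
EXAMPLE_CONNECTIONS = [
    (Endpoint("clm", "site-a"), Endpoint("cara", "site-a")),
    (Endpoint("cara", "site-a"), Endpoint("hsm", "site-a")),
    (Endpoint("cara", "site-a"), Endpoint("db", "site-a")),
    (Endpoint("clm", "site-a"), Endpoint("db", "site-b")),  # misconfigured: remote DB node
]

if __name__ == "__main__":
    for src, dst in cross_site_connections(EXAMPLE_CONNECTIONS):
        print(f"cross-site: {src.component}@{src.site} -> {dst.component}@{dst.site}")
```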
In this architecture, failures must be detected by a monitoring system and handled manually by the operations team. Failover to the standby site is a conscious decision and requires manual action.
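The division of responsibilities can be summarized as: monitoring raises alerts, people decide. The class below is a hypothetical model of that rule, not a product API. Health reports only record alerts, and the active and standby sites swap roles only when an operator explicitly approves the failover.

```python
from dataclasses import dataclass, field


@dataclass
class ColdStandbyPair:
    """Hypothetical model of a two-site cold-standby setup (not a product API)."""
    active_site: str
    standby_site: str
    alerts: list[str] = field(default_factory=list)

    def report_health(self, site: str, healthy: bool) -> None:
        # Monitoring only records an alert; it never changes the active site.
        if not healthy:
            self.alerts.append(f"{site} unhealthy: operator decision required")

    def fail_over(self, approved_by: str) -> None:
        # Failover happens only as an explicit, attributable operator action.
        if not approved_by.strip():
            raise PermissionError("failover requires a named operator approval")
        self.active_site, self.standby_site = self.standby_site, self.active_site


if __name__ == "__main__":
    pair = ColdStandbyPair("site-a", "site-b")
    pair.report_health("site-a", healthy=False)
    print(pair.active_site, pair.alerts)  # still site-a: nothing switched automatically
    pair.fail_over(approved_by="ops-oncall")
    print(pair.active_site)
```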
Setting up a hot-standby (automatic) failover mechanism across only two locations is strongly discouraged: without a third location, there is no proper quorum, so the inherent risk of split-brain is high.
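The quorum problem can be illustrated with the common strict-majority rule, under which a partition may act on its own only if it holds more than half of all votes. This is a generic illustration assuming one vote per location; actual HSM and database clusters implement their own, vendor-specific quorum mechanisms.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict-majority rule: act alone only with more than half of all votes."""
    return 2 * reachable_votes > total_votes


# Two locations, one vote each, inter-site link lost: each side sees only itself.
print([has_quorum(1, 2), has_quorum(1, 2)])  # [False, False]

# Three locations, one of them isolated: the remaining pair keeps a majority.
print([has_quorum(2, 3), has_quorum(1, 3)])  # [True, False]
```

With two locations, an isolated site can never hold a majority. An automatic failover therefore either cannot proceed at all or, if configured to proceed without quorum, may let both sites become active at the same time, which is the split-brain case.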
Conclusion
This two-location cold-standby architecture provides a geo-redundant design for a PKI platform: a standby site is available to resume operations if the primary site fails.
Traffic can be directed to the active site via static BGP routes or CNAME records. However, failover requires manual intervention by the operations team, so proper monitoring is essential for detecting failures and coordinating recovery.
Overall, this architecture provides resilience against site-level failures for deployments where a third location for automatic failover is not feasible. Because failover is a deliberate manual step rather than an automatic one, it avoids the split-brain risk of a two-site hot standby, which protects data integrity and keeps traffic flows predictable.