Single-Tier High Availability Architecture for On-Premises PKI Platform
Overview
This page describes a single-tier, high-availability (HA) architecture for a Public Key Infrastructure (PKI) platform. The system is designed with redundancy and failover mechanisms to ensure continuous operation and resilience.
Architecture Components
The architecture consists of two main locations and an arbitrator:
- Location A (Primary)
- Location B (Secondary)
- Location C (Arbitrator)
Each location has distinct roles to provide a fault-tolerant PKI environment.
Location A (Primary)
- Keepalived (VRRP) for high-availability networking.
- Keycloak for authentication and identity management.
- ACME for automated certificate issuance.
- OCSP / HTTP CRL for certificate revocation services.
- EST, CLM, and SCEP for certificate management.
- MariaDB Galera Node for database clustering.
- CARA Admin and CARA for certificate authority and administration.
- All of the services above run on a single Virtual Machine (VM).
- HSM (Hardware Security Module) for key management and cryptographic operations.
High-Availability Mechanisms
- VRRP (Keepalived): Used to provide floating virtual IP addresses for frontend and backend services, ensuring seamless failover.
- MariaDB Galera Cluster: Multi-master database cluster that ensures data consistency and failover between primary and secondary locations.
- HSM Synchronization: Ensures cryptographic operations remain consistent across locations.
- Quorum-based Failover: The arbitrator node prevents split-brain situations by participating in voting for database cluster integrity.
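The quorum rule behind the arbitrator can be illustrated with a minimal sketch. It assumes every Galera member (including the arbitrator) carries one equal vote and that a partition stays writable only when it holds a strict majority of the votes; the function name `has_quorum` is hypothetical and is not part of any product API.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Return True when a partition holds a strict majority of all votes.

    Assumption: every cluster member has equal weight (one vote each).
    """
    if total_votes <= 0 or not 0 <= reachable_votes <= total_votes:
        raise ValueError("vote counts must satisfy 0 <= reachable <= total, total > 0")
    return 2 * reachable_votes > total_votes


if __name__ == "__main__":
    # Without an arbitrator (A and B only), a network split leaves each
    # side with 1 of 2 votes, so neither side can safely keep writing.
    print("A alone, no arbitrator:", has_quorum(1, 2))

    # With the arbitrator at Location C, the side that can still reach C
    # holds 2 of 3 votes and stays primary; the isolated side does not.
    print("A + C after losing B:", has_quorum(2, 3))
    print("B isolated from A and C:", has_quorum(1, 3))
```

Because exactly one side of any split can hold more than half of three votes, the two data sites can never both accept writes at the same time, which is the split-brain case the arbitrator is there to prevent.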
Data Flow and Failover Scenarios
- Normal Operation:
  - Location A handles primary PKI operations.
  - Location B remains in sync, ready to take over.
  - Location C acts as an arbitrator for database quorum.
- Failure at Location A:
  - Location B automatically takes over using VRRP failover mechanisms.
  - Database operations remain available due to Galera Cluster.
- Failure at Location B:
  - Location A continues normal operations.
  - Location C ensures database cluster stability.
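- Failure at Location C:
  - No immediate impact unless a second location fails.
  - The Galera nodes at the primary and secondary sites keep quorum and continue replicating. If one of them then fails while the arbitrator is still down, the surviving node no longer holds a majority and cannot continue serving the database on its own.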
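The scenarios above can be checked with a small model. This is a sketch under stated assumptions, not a description of how the product decides failover: the VRRP priorities (150 for A, 100 for B) are hypothetical values, each scenario starts from a healthy cluster of three equal-weight database voters, and Keepalived health tracking and Galera state transfer are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Database voters: Galera nodes at A and B plus the arbitrator at C.
DB_VOTERS = frozenset({"A", "B", "C"})

# Hypothetical VRRP priorities; only A and B can hold the virtual IP.
VRRP_PRIORITY = {"A": 150, "B": 100}


@dataclass(frozen=True)
class Outcome:
    vip_holder: str | None   # site holding the floating IP, if any
    db_has_quorum: bool      # True if surviving voters form a strict majority


def evaluate(failed: set[str]) -> Outcome:
    """Evaluate one failure scenario, starting from a healthy cluster."""
    candidates = [site for site in VRRP_PRIORITY if site not in failed]
    # The surviving site with the highest priority takes the virtual IP.
    vip_holder = max(candidates, key=VRRP_PRIORITY.get, default=None)
    surviving_voters = DB_VOTERS - failed
    has_quorum = 2 * len(surviving_voters) > len(DB_VOTERS)
    return Outcome(vip_holder, has_quorum)


if __name__ == "__main__":
    scenarios = [set(), {"A"}, {"B"}, {"C"}, {"A", "C"}]
    for failed in scenarios:
        label = ", ".join(sorted(failed)) or "none"
        print(f"failed: {label:<5} -> {evaluate(failed)}")
```

In this model any single-site failure leaves both a virtual IP holder and a database majority, while losing a data site together with the arbitrator leaves the remaining site holding the virtual IP but without database quorum.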
Conclusion
This architecture provides a robust HA design for a PKI platform, ensuring minimal downtime, database integrity, and secure cryptographic operations. By leveraging VRRP, MariaDB Galera Cluster, and HSM synchronization, the system achieves redundancy, failover readiness, and secure key management across multiple locations.