Single-Tier High Availability Architecture for On-Premises PKI Platform

Overview

This page describes a single-tier, high-availability (HA) architecture for an on-premises Public Key Infrastructure (PKI) platform. Two fully equipped locations and a third arbitrator location provide redundancy, so the platform keeps operating when any single location fails.

Architecture Components

The architecture consists of two main locations and an arbitrator:

  1. Location A (Primary)

  2. Location B (Secondary)

  3. Location C (Arbitrator)

Each location has distinct roles to provide a fault-tolerant PKI environment.

Location A (Primary)

  • Keepalived (VRRP) for high-availability networking.

  • Keycloak for authentication and identity management.

  • ACME for automated certificate issuance.

  • OCSP / HTTP CRL for certificate revocation services.

  • EST and SCEP for certificate enrollment, and CLM for certificate lifecycle management.

  • MariaDB Galera Node for database clustering.

  • CARA and CARA Admin for certificate authority operations and administration.

  • Runs on a single Virtual Machine (VM).

  • HSM (Hardware Security Module) for key management and cryptographic operations.

Location B (Secondary)

  • Mirrors the setup of Location A to provide failover capability.

  • Participates in the HA clustering for frontend and backend services.

  • Maintains database replication with the primary site via MariaDB Galera Cluster.

  • HSM is also present and synchronized with the primary site.
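
Whether a database node is actually in sync with its peers can be read from Galera's wsrep status variables (for example via SHOW GLOBAL STATUS LIKE 'wsrep_%'). The following is a minimal sketch of how such values might be interpreted. The function name and the choice of which three variables to check are assumptions made for illustration, not part of the product.

```python
def galera_node_in_sync(status):
    """Return True if the given wsrep status values describe a node that is
    part of the primary component, ready for queries, and fully synced.

    `status` is a dict of variable name -> value, as returned by
    SHOW GLOBAL STATUS LIKE 'wsrep_%' (how it is fetched is left open here).
    """
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state_comment") == "Synced"
    )


# Example values for illustration only.
synced_node = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_size": "3",
}
isolated_node = {
    "wsrep_cluster_status": "Non-Primary",
    "wsrep_ready": "OFF",
    "wsrep_local_state_comment": "Initialized",
    "wsrep_cluster_size": "1",
}

print(galera_node_in_sync(synced_node))
print(galera_node_in_sync(isolated_node))
```

A check like this could feed a monitoring probe or a Keepalived tracking script, so that a node that has dropped out of the primary component does not keep serving traffic.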

Location C (Arbitrator)

  • Ensures quorum for MariaDB Galera Cluster.

  • Runs Keepalived (VRRP) to participate in HA networking.

  • Acts as either a full MariaDB Galera node or a Galera Arbitrator (garbd), supplying the third vote that prevents split-brain scenarios.

  • Runs on a single VM.

High-Availability Mechanisms

  1. VRRP (Keepalived): Provides floating virtual IP addresses for frontend and backend services, so clients keep using the same address when the active node changes.

  2. MariaDB Galera Cluster: Multi-master database cluster with virtually synchronous replication, which keeps data consistent between the primary and secondary locations and lets either one serve database traffic.

  3. HSM Synchronization: Keeps the HSMs at both locations aligned so that cryptographic operations give consistent results whichever location performs them.

  4. Quorum-based Failover: The arbitrator node prevents split-brain situations by casting the third vote in the database cluster, so that at most one side of a network partition can hold a majority.
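
The quorum rule behind the fourth mechanism can be illustrated with a short sketch. It assumes one vote per location and a fixed membership of three, which is a simplification: Galera itself supports weighted votes and computes quorum relative to the last known primary component. The names VOTERS and has_quorum are hypothetical.

```python
# Assumption: one vote per location, matching the three sites on this page.
VOTERS = frozenset({"A", "B", "C"})


def has_quorum(reachable, voters=VOTERS):
    """Return True if the reachable locations form a strict majority."""
    visible = set(reachable) & voters
    return 2 * len(visible) > len(voters)


# A partition that isolates Location A from B and C:
print(has_quorum({"B", "C"}))  # True: B and C hold two of three votes
print(has_quorum({"A"}))       # False: A alone must stop accepting writes
```

Because the majority must be strict, two disjoint groups of locations can never both pass this check, which is what rules out split-brain.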

Data Flow and Failover Scenarios

  1. Normal Operation:

    • Location A handles primary PKI operations.

    • Location B remains in sync, ready to take over.

    • Location C acts as an arbitrator for database quorum.

  2. Failure at Location A:

    • Location B automatically takes over the virtual IP addresses through VRRP failover.

    • Database operations remain available because Locations B and C still form a Galera quorum (two of three votes).

  3. Failure at Location B:

    • Location A continues normal operations.

    • Locations A and C still form a Galera quorum (two of three votes), so the database cluster stays available.

  4. Failure at Location C:

    • No immediate impact unless a second location fails.

    • Locations A and B keep quorum and continue replicating, but an unplanned loss of either one before C is restored leaves the survivor without a majority.
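
The four scenarios above can be walked through with a small simulation. This is a sketch under stated assumptions, not a description of the product's behavior: the VRRP priorities are invented, only Locations A and B are treated as candidates for the service virtual IP, and database availability is reduced to a simple majority of three votes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical VRRP priorities; the location with the higher value wins.
VRRP_PRIORITY = {"A": 150, "B": 100}
# Assumption: one database vote per location.
DB_VOTERS = frozenset({"A", "B", "C"})


@dataclass(frozen=True)
class ClusterState:
    vip_owner: Optional[str]  # location holding the virtual IP, if any
    db_available: bool        # True if the surviving nodes hold a majority


def evaluate(failed):
    """Return the resulting state when the given set of locations is down."""
    alive = DB_VOTERS - set(failed)
    candidates = [loc for loc in VRRP_PRIORITY if loc in alive]
    vip_owner = max(candidates, key=VRRP_PRIORITY.__getitem__, default=None)
    db_available = 2 * len(alive) > len(DB_VOTERS)
    return ClusterState(vip_owner, db_available)


for failed in (set(), {"A"}, {"B"}, {"C"}, {"A", "C"}):
    print(sorted(failed), evaluate(failed))
```

The last case shows the double failure mentioned under scenario 4: with A and C both down, B still holds the virtual IP but no longer has a database majority.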

Conclusion

This architecture is designed to tolerate the loss of any single location. VRRP moves the virtual IP addresses to the surviving site, MariaDB Galera Cluster keeps the database consistent and quorate, and HSM synchronization keeps cryptographic operations consistent across the primary and secondary locations.