On-Premises HA - Three-Location Load Balancing

Overview

This page describes a three-location, high-availability (HA) architecture concept for a Public Key Infrastructure (PKI) platform. The architecture provides redundancy across three geographically separated sites using active/active load balancing, enabling efficient hardware utilization and horizontal scalability.

Due to the complexity of geographically distributed deployments and the wide range of infrastructure and compliance requirements, this architecture should be considered a highly theoretical concept. It illustrates how such a setup could be achieved but intentionally omits several critical considerations that are highly use-case specific and, therefore, outside the scope of this page.

To achieve geo-redundancy, high availability, load balancing and horizontal scalability, the use of Kubernetes as a hosting platform for MTG ERS® is strongly recommended.
Figure: Overview Diagram of the Three-Location Setup

Architecture Components

This architecture spans three geographically separated locations and consists of the following core components:

  1. Three load balancers distributing traffic across all three locations.

  2. Three CARA VMs, each running independently and unaware of the others.

  3. Three HSMs, clustered and connected via PKCS#11.

  4. Three CLM VMs, each running independently and unaware of the others.

  5. Three MariaDB Nodes forming an active/active Galera Cluster.

The design also supports network segmentation to isolate different components, such as the load balancers, CLM (RA), CARA (CA) and the database.

Load Balancers

The load balancer VMs consist of the following core components:

  • Keepalived with VRRP → Manages a floating IP for high availability and failover.

  • Load balancer service → Distributes incoming traffic across all three locations.

It is also possible to use a cloud-based edge load balancer instead of self-managed load balancer VMs.
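
For the self-managed variant, a common pattern is to let Keepalived run an external track script and release the floating IP when the check fails. The following minimal Python sketch illustrates such a check; the host, port and timeout are illustrative assumptions, not values prescribed by this architecture.

```python
#!/usr/bin/env python3
"""Minimal liveness check for the local load balancer service.

Intended as the kind of script Keepalived can invoke as a track
script: exit code 0 keeps this node eligible for the floating IP,
a non-zero exit code signals Keepalived to fail over.
"""
import socket
import sys

CHECK_HOST = "127.0.0.1"  # local load balancer service (assumed)
CHECK_PORT = 443          # fronted port (assumed)
TIMEOUT_S = 2.0


def service_is_up(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    sys.exit(0 if service_is_up(CHECK_HOST, CHECK_PORT, TIMEOUT_S) else 1)
```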

CARA VMs

The CARA VMs consist of the following core components:

  • Webserver / Reverse Proxy → Performs TLS termination and forwards traffic to the local CARA services.

  • CARA Services → Run locally behind the reverse proxy.
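
To make the TLS termination concrete, the sketch below completes a TLS handshake against one CARA entry point and prints the certificate presented by the reverse proxy. The hostname is a placeholder, not part of this architecture, and must be replaced with the deployment's actual address.

```python
import socket
import ssl

# Placeholder hostname for one CARA reverse proxy (assumed).
PROXY_HOST = "cara1.example.internal"
PROXY_PORT = 443

# Complete a TLS handshake and inspect the certificate the proxy
# presents; this is the certificate terminating client TLS traffic.
context = ssl.create_default_context()
with socket.create_connection((PROXY_HOST, PROXY_PORT), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=PROXY_HOST) as tls:
        print("TLS version:", tls.version())
        print("Certificate subject:", tls.getpeercert().get("subject"))
```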

HSMs

The HSMs are clustered, with each HSM connected only to the CARA instance at its own location to minimize latency across separated environments. The specifics of cluster configuration and replication are vendor-dependent and are not covered on this page.
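
As a rough illustration of the PKCS#11 connectivity, the sketch below uses the third-party python-pkcs11 package to load a vendor module and list the tokens visible from one location; in this architecture, each CARA instance would only see the HSM at its own site. The module path is vendor-specific and purely illustrative.

```python
import pkcs11  # third-party package: python-pkcs11

# Path to the vendor's PKCS#11 module (vendor-specific, assumed).
MODULE_PATH = "/usr/lib/pkcs11/vendor-hsm.so"

# Load the module and enumerate the tokens reachable from this host.
lib = pkcs11.lib(MODULE_PATH)
for slot in lib.get_slots(token_present=True):
    token = slot.get_token()
    print(f"slot {slot.slot_id}: token label {token.label!r}")
```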

CLM VMs

The core components of the CLM VMs are the following:

  • Webserver / Reverse Proxy → Performs TLS termination and forwards traffic to the local CLM services.

  • CLM Services → Run locally behind the reverse proxy.

MariaDB Nodes

Three MariaDB nodes form an active/active Galera Cluster, hosting the databases for both CLM and CARA applications across all three locations. If stronger isolation of CLM and CARA application data is desired, two separate three-node Galera clusters can be deployed instead.
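
Galera exposes its replication health through wsrep_* status variables, which makes a basic cluster check straightforward from any node. The sketch below, assuming the third-party PyMySQL driver and illustrative credentials, verifies cluster size, status and readiness on the local node.

```python
import pymysql  # third-party driver; any MariaDB/MySQL client works

# Connection parameters are illustrative; point them at the local node.
conn = pymysql.connect(host="127.0.0.1", user="monitor", password="secret")
try:
    with conn.cursor() as cur:
        # Galera reports replication health via wsrep_* status variables.
        cur.execute(
            "SHOW GLOBAL STATUS WHERE Variable_name IN "
            "('wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_ready')"
        )
        for name, value in cur.fetchall():
            print(f"{name} = {value}")
        # A healthy three-node cluster reports wsrep_cluster_size = 3,
        # wsrep_cluster_status = 'Primary' and wsrep_ready = 'ON'.
finally:
    conn.close()
```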

Data Flow and General Considerations

Clients and administrators connect to the load balancers via a floating IP. The load balancers then distribute traffic to backend services across all three locations.

  • Each load balancer connects to all CARA and CLM VMs.

  • Each CARA VM connects only to the HSM at its own location.

  • Each CLM VM connects only to the CARA VM at its own location.

  • Both CARA and CLM VMs connect directly to the database node at their own location.

This architecture assumes reliable inter-site connectivity and consistent latency characteristics between locations. Operational considerations such as quorum behavior, latency-sensitive replication and compliance constraints must be carefully evaluated before implementing such a setup in production.
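
Since Galera replicates writes virtually synchronously, commit latency is influenced by the slowest inter-site link. A rough way to sanity-check those links is to measure TCP connect round-trip times from each location to the others, as in the sketch below; the hostnames are placeholders for the actual per-site database nodes.

```python
import socket
import time

# Placeholder endpoints for the three database nodes (assumed).
SITES = {
    "site-a": ("db-a.example.internal", 3306),
    "site-b": ("db-b.example.internal", 3306),
    "site-c": ("db-c.example.internal", 3306),
}


def tcp_connect_rtt_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Measure a single TCP connect round trip in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return (time.perf_counter() - start) * 1000.0


for site, (host, port) in SITES.items():
    try:
        print(f"{site}: {tcp_connect_rtt_ms(host, port):.1f} ms")
    except OSError as exc:
        print(f"{site}: unreachable ({exc})")
```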

Conclusion

This three-location, load-balanced architecture concept provides a high degree of geo-redundancy for a PKI platform, along with horizontal scalability and efficient hardware utilization.

By distributing services across multiple sites and leveraging active/active components, the architecture aims to provide resilience against site-level failures and automated traffic redistribution. A three-node Galera Cluster supports data consistency across locations. However, the feasibility and stability of such a configuration depend heavily on network characteristics and operational maturity.

Overall, this architecture illustrates how a globally distributed, highly available PKI platform could be designed, but it should be evaluated carefully and adapted to specific infrastructure, latency and compliance requirements before being considered for real-world deployment.