I am going to spend August and September highlighting Java EE and JBoss EAP in the cloud. I will focus on deploying Java EE applications on JBoss EAP running on OpenShift Online (Public PaaS), OpenShift Enterprise (Private PaaS), and OpenStack (Private IaaS). In fact, I am planning to run OpenShift Enterprise on Red Hat Cloud Infrastructure in a dedicated lab at Red Hat. More on that later.
I am starting with OpenShift.
OpenShift is a next generation platform as a service (PaaS) that makes it easy to develop, deploy, and scale applications in the cloud – public or private. A typical deployment works like this:
- The developer pushes code to the application’s Git repository.
- OpenShift builds the application (optionally with Jenkins).
- OpenShift provisions the application stack (e.g. JBoss EAP + PostgreSQL).
- OpenShift deploys the application.
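That workflow maps to a handful of client commands. A hypothetical session, assuming the `rhc` client tools and illustrative cartridge names (`jbosseap-6`, `postgresql-9.2`) that vary by release:

```shell
# Create a scaled JBoss EAP application (the -s flag enables the
# HAProxy cartridge for load balancing).
rhc app create myapp jbosseap-6 -s

# Add a PostgreSQL cartridge to the application.
rhc cartridge add postgresql-9.2 -a myapp

# Push code to the application's Git repository; OpenShift builds and
# deploys on the gear (or hands the build off to a Jenkins cartridge
# if one is configured).
git push
```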
My intention is to highlight the internal architecture from a Java EE architect point of view.
Traditional Middleware Infrastructure
JBoss EAP instances run on one or more nodes, with a node being a physical server or virtual machine. One or more Apache HTTP Server instances with the mod_cluster plugin proxy and load balance HTTP requests to the JBoss EAP instances.
JBoss EAP instances, by default, rely on UDP multicast messages for intra-cluster communication.
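For example, JBoss EAP 6 ships a default JGroups UDP stack in standalone-ha.xml. A trimmed sketch (the schema version and socket bindings vary by release):

```xml
<subsystem xmlns="urn:jboss:domain:jgroups:1.1" default-stack="udp">
    <stack name="udp">
        <!-- Cluster members discover each other and replicate state
             over UDP multicast via the jgroups-udp socket binding. -->
        <transport type="UDP" socket-binding="jgroups-udp"/>
        <protocol type="PING"/>
        <protocol type="MERGE2"/>
        ...
    </stack>
</subsystem>
```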
OpenShift Middleware Architecture
JBoss EAP instances run on one or more gears on one or more nodes, with a node being a physical server or virtual machine. An Apache HTTP Server instance proxies HTTP requests to a single HAProxy instance per application. That HAProxy instance load balances HTTP requests to the JBoss EAP instances via a local HAProxy instance on each node.
JBoss EAP instances rely on a local HAProxy instance per node and TCP unicast messages to communicate with each other.
JBoss EAP instances do not communicate with each other directly.
My initial reaction can best be described with a certain three letter acronym. However, after discussing it with Red Hat engineers, I realized that there is in fact a method to the madness.
- Apache HTTP Server is run as a reverse proxy.
- There are multiple gears running per node with the same external ip:port but with different hostnames. It is the responsibility of the Apache HTTP Server instance to map the external ip:port and a hostname to an internal ip:port with virtual hosts and mod_proxy. This is not possible with HAProxy.
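A minimal sketch of that mapping, assuming mod_proxy and illustrative addresses (the real configuration is generated by OpenShift):

```apache
# Requests arrive on the node's external ip:port; the Host header selects
# the virtual host, which proxies to the gear's internal loopback ip:port.
<VirtualHost *:80>
    ServerName myapp-mydomain.example.com
    ProxyPass        / http://127.1.2.3:8080/
    ProxyPassReverse / http://127.1.2.3:8080/
</VirtualHost>
```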
- The HAProxy service (per node) is run as a TCP port proxy.
- The HAProxy cartridge (per application) is run as a load balancer.
- There is one HAProxy cartridge per application to support multitenancy (security and resource management) and to ensure that the proxy, for example via a restart, does not impact multiple applications.
Why does OpenShift use HAProxy for load balancing?
- Lower Overhead
- Better Performance
- Better Statistics (for Auto Scaling)
The HAProxy cartridge is configured to use the least connection algorithm for load balancing (link).
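A minimal haproxy.cfg sketch of that algorithm, with illustrative gear addresses:

```haproxy
# Send each request to the gear with the fewest active connections
# (a better fit than round robin when request durations vary).
backend gears
    balance leastconn
    server gear1 127.1.2.3:8080 check
    server gear2 127.4.5.6:8080 check
```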
- SSL terminates at the Apache HTTP Server instances.
- The Apache HTTP Server instances are configured to use the prefork MPM.
- The HAProxy instances, both the service and the cartridge, rely on the -sf command line option to reload the configuration without dropping requests (link).
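The reload looks roughly like this (paths are illustrative):

```shell
# Start a new HAProxy process with the updated configuration; -sf hands
# it the old PIDs, which finish serving in-flight requests and then exit.
haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
    -sf $(cat /var/run/haproxy.pid)
```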
Recognized Areas of Improvement
- Support multiple load balancers (i.e. cartridges) per application.
- Support pluggable load balancers (i.e. alternatives to HAProxy).
- Retire the HAProxy service (port proxy) in favor of iptables.
- Upgrade Apache HTTP Server (from 2.2 to 2.4).
  - OpenShift is no longer limited to the 2.2 release.
I found Luke Meyer’s attempt to diagram the request flow in OpenShift to be quite helpful (link).
In addition, here are two networking diagrams provided by the OpenShift team.
Gears of War? No. Well, yes. The war on old school middleware infrastructure.
OpenShift relies on namespaces and control groups (i.e. process containers), similar to but distinct from Linux Containers (LXC) or Solaris Zones, for resource management and isolation at the operating system level (link), instead of full virtualization (link) or paravirtualization (link) with a hypervisor. The result is a userspace container with full resource isolation and control for an application. In addition, OpenShift relies on SELinux for security within a multitenant environment.
Lightweight virtualization results in less overhead than standard virtualization. This allows OpenShift to run many JBoss EAP instances per node efficiently.
LXC supports running applications or operating systems within userspace containers. OpenShift does not run an operating system within userspace containers.
This is an informative article comparing containers to hypervisors (link).
What about JBoss EAP?
JBoss EAP is deployed to one or more gears in the form of cartridges. There are cartridges for application servers / languages and databases (e.g. PostgreSQL). The services provided by cartridges are accessed via environment variables. Just as applications run within containers (LXC), cartridges run within gears (OpenShift).
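For example, an application might read the PostgreSQL cartridge’s connection details from the environment. A minimal sketch, assuming the OPENSHIFT_POSTGRESQL_DB_HOST/PORT variables exported by the PostgreSQL cartridge; the database name and local fallback values here are illustrative:

```java
// Sketch: build a JDBC URL from OpenShift cartridge environment variables,
// falling back to illustrative local-development defaults when unset.
public class CartridgeEnv {

    static String jdbcUrl() {
        String host = getenvOr("OPENSHIFT_POSTGRESQL_DB_HOST", "127.0.0.1");
        String port = getenvOr("OPENSHIFT_POSTGRESQL_DB_PORT", "5432");
        return "jdbc:postgresql://" + host + ":" + port + "/mydb";
    }

    // Return the environment variable's value, or the fallback if it is unset.
    static String getenvOr(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl());
    }
}
```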
The applications running in gears are bound to a loopback interface. As a result, they are not directly accessible by other gears. That is why there is an HAProxy service running on each node. It is the responsibility of this service to proxy requests arriving at one of the node’s external addresses to the internal address of one of the node’s gears.
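That node-level proxy is a plain TCP forward. A sketch with illustrative ports and addresses:

```haproxy
# Forward an externally reachable port to the loopback address the
# application is actually bound to (mode tcp: no HTTP parsing).
listen gear-proxy
    bind *:35561
    mode tcp
    server gear 127.1.2.3:8080
```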
What about the build and deploy process?
OpenShift applications include Git repositories. In addition, a Jenkins cartridge can be deployed for building enterprise applications.
The major OpenShift components:

- Nodes – Host gears.
- Brokers – Provide management and provisioning via a REST API.
- Message Bus (AMQP) – Handles broker / node communication.
- MCollective (on nodes) – Executes broker requests.
- MongoDB – Stores persistent state.
There is a nice diagram of the components on the OpenShift architecture page (link) under System Components.
A minimal deployment:

- One host for all components.

A simple distributed deployment:

- One host for a single broker and ActiveMQ.
- One or more hosts for nodes.

A redundant deployment:

- Multiple hosts for load balanced brokers.
- One host for ActiveMQ.
- Multiple hosts for MongoDB (replicated).
- One or more hosts for nodes.
The documentation is great: