Java EE in the Cloud: OpenShift Architecture

I am going to spend August and September highlighting Java EE and JBoss EAP in the cloud. I will focus on deploying Java EE applications on JBoss EAP running on OpenShift Online (Public PaaS), OpenShift Enterprise (Private PaaS), and OpenStack (Private IaaS). In fact, I am planning to run OpenShift Enterprise on Red Hat Cloud Infrastructure in a dedicated lab at Red Hat. More on that later.


I am starting with OpenShift.

OpenShift is a next generation platform as a service (PaaS) that makes it easy to develop, deploy, and scale applications in the cloud – public or private.

[Figure: overview-paas]

  1. Developer
    1. Pushes code to the application’s Git repository.
  2. OpenShift
    1. Builds the application with Jenkins.
    2. Provisions the application stack (e.g. JBoss EAP + PostgreSQL).
    3. Deploys the application.

My intention is to highlight the internal architecture from a Java EE architect point of view.

Traditional Middleware Infrastructure

JBoss EAP instances run on one or more nodes with a node being a physical server or virtual machine. One or more Apache HTTP Server instances with mod_cluster plugins will proxy and load balance HTTP requests to the JBoss EAP instances.

[Figure: trad_client]

JBoss EAP instances, by default, rely on UDP multicast messages for intra-cluster communication.
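The multicast-based clustering can be sketched with a plain `MulticastSocket`: every node joins the same group and hears every other node's messages. This is a minimal illustration, not EAP's actual clustering stack (which uses JGroups); the group address and port are chosen to resemble EAP 6's jgroups-udp defaults, but treat them as placeholders.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class MulticastSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative group address and port, resembling jgroups-udp defaults.
        InetAddress group = InetAddress.getByName("230.0.0.4");
        try (MulticastSocket socket = new MulticastSocket(45688)) {
            socket.setSoTimeout(2000);        // don't block forever waiting
            socket.joinGroup(group);
            // Announce ourselves to the group; every member (including us) receives it.
            byte[] msg = "node-1: join".getBytes("UTF-8");
            socket.send(new DatagramPacket(msg, msg.length, group, 45688));
            byte[] buf = new byte[256];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            socket.receive(in);               // hears its own datagram via loopback
            System.out.println(new String(in.getData(), 0, in.getLength(), "UTF-8"));
            socket.leaveGroup(group);
        }
    }
}
```

The point of the next sections is that this style of discovery assumes nodes share a multicast-capable network segment, which is exactly the assumption OpenShift does away with.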

[Figure: trad_server]

OpenShift Middleware Architecture

JBoss EAP instances run on one or more gears on one or more nodes with a node being a physical server or virtual machine. An Apache HTTP Server instance will proxy HTTP requests to a single HAProxy instance per application. The HAProxy instance will load balance HTTP requests to the JBoss EAP instances via a local HAProxy instance per node.

[Figure: os_client]

JBoss EAP instances rely on a local HAProxy instance per node and TCP unicast messages to communicate with each other.

[Figure: os_server]

JBoss EAP instances do not communicate with each other directly.

???

My initial reaction can best be described with a certain three letter acronym. However, after discussing it with Red Hat engineers, I realized that there is in fact a method to the madness.

  • Apache HTTP Server is run as a reverse proxy.
    • There are multiple gears running per node with the same external ip:port but different hostnames. It is the responsibility of the Apache HTTP Server instance to map the external ip:port plus a hostname to an internal ip:port using virtual hosts and mod_proxy. This is not possible with HAProxy.
  • The HAProxy service (per node) is run as a TCP port proxy.
  • The HAProxy cartridge (per application) is run as a load balancer.
    • There is one HAProxy cartridge per application to support multitenancy (security and resource management) and to ensure that the proxy, for example via a restart, does not impact multiple applications.
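The hostname-to-gear mapping that Apache performs can be pictured as a lookup table keyed on the Host header. This is a toy model, not OpenShift code; the hostnames and internal addresses below are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of what Apache's virtual hosts + mod_proxy do on a node:
// many gears share one external ip:port, so the request's Host header
// decides which internal ip:port the request is forwarded to.
public class VhostRouter {
    private final Map<String, String> vhosts = new HashMap<>();

    void addVhost(String hostname, String internalAddress) {
        vhosts.put(hostname, internalAddress);
    }

    String route(String hostHeader) {
        String target = vhosts.get(hostHeader);
        if (target == null) {
            throw new IllegalArgumentException("no vhost for: " + hostHeader);
        }
        return target;
    }

    public static void main(String[] args) {
        VhostRouter router = new VhostRouter();
        // Hypothetical application hostnames mapped to internal gear addresses.
        router.addVhost("app1-demo.rhcloud.com", "127.1.2.1:8080");
        router.addVhost("app2-demo.rhcloud.com", "127.1.3.1:8080");
        System.out.println(router.route("app1-demo.rhcloud.com")); // 127.1.2.1:8080
    }
}
```

HAProxy can only dispatch on ip:port, which is why the hostname-aware front door has to be Apache.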

Why does OpenShift use HAProxy for load balancing?

  • Lower Overhead
  • Better Performance
  • Better Statistics (for Auto Scaling)

The HAProxy cartridge is configured to use the least connection algorithm for load balancing (link).
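The least connection strategy is simple to sketch: each request goes to the backend with the fewest active connections. This is an illustrative model of the algorithm, not HAProxy's implementation, and the gear names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of "least connection" balancing: route each request
// to whichever backend currently has the fewest active connections.
public class LeastConnBalancer {
    static class Backend {
        final String name;
        int activeConnections;
        Backend(String name) { this.name = name; }
    }

    private final List<Backend> backends = new ArrayList<>();

    void addBackend(String name) { backends.add(new Backend(name)); }

    // Pick the least loaded backend and account for the new connection.
    Backend route() {
        Backend chosen = backends.get(0);
        for (Backend b : backends) {
            if (b.activeConnections < chosen.activeConnections) chosen = b;
        }
        chosen.activeConnections++;
        return chosen;
    }

    public static void main(String[] args) {
        LeastConnBalancer lb = new LeastConnBalancer();
        lb.addBackend("gear-1");
        lb.addBackend("gear-2");
        System.out.println(lb.route().name); // gear-1
        System.out.println(lb.route().name); // gear-2
        System.out.println(lb.route().name); // gear-1 again
    }
}
```

Unlike round robin, this adapts to requests of unequal duration, which also explains why the connection counts make useful input for auto scaling decisions.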

I found this post on inter-gear communication by Bill DeCoste to be helpful (link), as was this discussion on the OpenShift mailing list (link).

  • SSL terminates at the Apache HTTP Server instances.
  • The Apache HTTP Server instances are configured to use the prefork MPM.
  • The HAProxy instances, both the service and the cartridge, rely on the -sf command line option to reload the configuration without dropping requests (link).

Recognized Areas of Improvement

  • Support multiple load balancers (i.e. cartridges) per application.
  • Support pluggable load balancers (i.e. alternatives to HAProxy).

Under Consideration

  • Retire the HAProxy service (port proxy) in favor of iptables.
  • Upgrade Apache HTTP Server (from 2.2 to 2.4).
    • OpenShift is no longer limited to the 2.2 release.

FYI
I found Luke Meyer’s attempt to diagram the request flow in OpenShift to be quite helpful (link).

In addition, here are two networking diagrams provided by the OpenShift team.

[Figures: GearNetworking-1-v04, GearNetworking-2-v04]

Gears

Gears of War? No. Well, yes. The war on old school middleware infrastructure.

OpenShift relies on kernel namespaces and control groups (i.e. process containers), similar to, but distinct from, Linux Containers (LXC) and Solaris Zones, for resource management and isolation at the operating system level (link) instead of full virtualization (link) or paravirtualization (link) with a hypervisor. The result is a userspace container with full resource isolation and control for an application. In addition, OpenShift relies on SELinux for security within a multitenant environment.

Bottom Line
Lightweight virtualization results in less overhead than standard virtualization. This allows OpenShift to run many JBoss EAP instances per node efficiently.

[Figure: lxc_virt]

Note
LXC supports running applications or operating systems within userspace containers. OpenShift does not run an operating system within userspace containers.

FYI
This is an informative article comparing containers to hypervisors (link).

What about JBoss EAP?

JBoss EAP is deployed to one or more gears in the form of cartridges. There are cartridges for application servers / languages and databases (e.g. PostgreSQL). The services provided by cartridges are accessed via environment variables. Just as applications run within containers (LXC), cartridges run within gears (OpenShift).
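An application discovers a cartridge's services by reading those environment variables at runtime. Here is a minimal sketch; the `OPENSHIFT_POSTGRESQL_DB_*` names follow the pattern the PostgreSQL cartridge uses, but the fallback values and database name are placeholders for local testing.

```java
// Hypothetical lookup of a PostgreSQL cartridge's connection details
// via the environment variables the cartridge exports to the gear.
public class CartridgeEnv {
    static String env(String name, String fallback) {
        String value = System.getenv(name);
        return (value != null) ? value : fallback;
    }

    public static void main(String[] args) {
        String host = env("OPENSHIFT_POSTGRESQL_DB_HOST", "127.0.0.1");
        String port = env("OPENSHIFT_POSTGRESQL_DB_PORT", "5432");
        // Build a JDBC URL from the cartridge-provided address;
        // "appdb" is a made-up database name.
        String url = "jdbc:postgresql://" + host + ":" + port + "/appdb";
        System.out.println(url); // e.g. jdbc:postgresql://127.0.0.1:5432/appdb when unset
    }
}
```

Because the address and port come from the environment rather than being hard coded, the same application archive runs unchanged on any gear.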

The applications running in gears are bound to a loopback interface. As a result, they are not directly accessible by other gears. That is why there is an HAProxy service running on each node. It is the responsibility of this service to proxy incoming requests on one of the node's external addresses to the internal address of one of the node's gears.
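That port-proxy role is conceptually just a TCP relay: accept on an externally reachable port and shuttle bytes to a loopback-bound gear. A minimal sketch, assuming made-up port numbers (the real service is HAProxy, not Java):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Toy TCP port proxy: relays bytes between an external port and a
// gear's loopback-bound address, in the role of the per-node HAProxy service.
public class PortProxy {
    // Copy bytes one-way between two sockets on a background thread.
    static void pipe(Socket in, Socket out) {
        Thread t = new Thread(() -> {
            try (InputStream src = in.getInputStream();
                 OutputStream dst = out.getOutputStream()) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = src.read(buf)) != -1) {
                    dst.write(buf, 0, n);
                    dst.flush();
                }
            } catch (Exception ignored) {}
        });
        t.setDaemon(true);
        t.start();
    }

    public static void serve(int listenPort, String gearHost, int gearPort) throws Exception {
        try (ServerSocket server = new ServerSocket(listenPort)) {
            while (true) {
                Socket client = server.accept();
                Socket gear = new Socket(gearHost, gearPort);
                pipe(client, gear); // client -> gear
                pipe(gear, client); // gear -> client
            }
        }
    }

    public static void main(String[] args) {
        // Forward an illustrative external port to a gear on loopback.
        Thread t = new Thread(() -> {
            try { serve(35531, "127.0.0.1", 8080); } catch (Exception e) { e.printStackTrace(); }
        });
        t.setDaemon(true);
        t.start();
        System.out.println("proxying 0.0.0.0:35531 -> 127.0.0.1:8080");
    }
}
```

The real service also has to manage many such mappings per node, one per externally reachable gear port, but the relay itself is this simple.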

What about the build and deploy process?

OpenShift applications include Git repositories. In addition, a Jenkins cartridge can be deployed for building enterprise applications.


OpenShift Components

  • Nodes – Host gears.
  • Brokers – Provide management and provisioning via a REST API.
  • Message Bus (AMQP) – Handles broker / node communication.
    • ActiveMQ
    • MCollective (on Nodes)
  • MongoDB – For persistent state.

There is a nice diagram of the components on the OpenShift architecture page (link) under System Components.

Topologies

  • Beginner
    • One host for all components.
  • Intermediate
    • One host for a single broker and ActiveMQ.
    • One or more hosts for nodes.
  • Advanced
    • Multiple hosts for load balanced brokers.
    • One host for ActiveMQ.
    • Multiple hosts for MongoDB (replicated).
    • One or more hosts for nodes.

Documentation

The documentation is great:

  • How it Works (link)
  • Documentation (link)
  • System Architecture Guide (link)

About Shane K Johnson

Technical Marketing Manager, Red Hat Inc.
