We, Data Grid

October 25, 2012

Data Grid, NOSQL

This post gives an overview of the standard features, functionality, and configuration options of a data grid. It does so by establishing a baseline with two data grids: JBoss Data Grid (6.0.1) and Oracle Coherence (3.7.1). It begins with core concepts and proceeds directly to the intermediate and advanced concepts implemented by both data grids.

  • Core
  • Distributed
  • Concurrency
  • Processing
  • Remote
  • Storage
  • Management

Core

TOPOLOGY

Feature        JBoss Data Grid   Oracle Coherence
Local          Y                 Y
Invalidation   Y                 Y
Replicated     Y                 Y
Distributed    Y                 Y

BASIC

Feature        JBoss Data Grid   Oracle Coherence
Eviction       Y                 Y
Expiration     Y                 Y

Distributed

A data grid is a distributed system with a single system image (SSI); it presents itself as a single system. Behind that image, a data grid should balance the data load by distributing entries to highly available (HA) partitions on multiple nodes via dynamic partitioning. When a node is added, an HA partition is created. As a result, a data grid should support state transfer: the process of transferring ownership of a subset of the entries within the grid to the new node. While the data is distributed, it may be preferable to ensure that related entries are stored on the same node. This is known as data affinity. For example, it may be desirable to ensure that all employees within a department are stored on the same node. A data grid is elastic; nodes can be added or removed on demand. Further, a data grid should support node discovery and failure detection. Because a data grid is elastic and because the data is distributed, it scales linearly: when a node is added, its full capacity is added to the data grid.
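To make data affinity concrete, here is a minimal sketch in plain Java. It does not use the API of either product; `AffinityKey`, `Partitioner`, and the partition count are hypothetical names and values chosen for illustration. The idea is only that placement is computed from a shared group id rather than from the full key.

```java
import java.util.Objects;

/** Hypothetical key: entries are placed by affinityId, not by the full key. */
final class AffinityKey {
    final String id;          // unique entry id, e.g. "emp-1"
    final String affinityId;  // group id used for placement, e.g. "dept-7"

    AffinityKey(String id, String affinityId) {
        this.id = id;
        this.affinityId = affinityId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof AffinityKey)) return false;
        AffinityKey k = (AffinityKey) o;
        return id.equals(k.id) && affinityId.equals(k.affinityId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, affinityId);
    }
}

/** Maps keys to partitions and partitions to nodes (illustrative only). */
final class Partitioner {
    private final int partitionCount;

    Partitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    /** Only the affinity id is hashed, so related entries share a partition. */
    int partitionOf(AffinityKey key) {
        return Math.floorMod(key.affinityId.hashCode(), partitionCount);
    }

    /** Simplistic assignment: partitions are spread over nodes by modulo. */
    int nodeOf(AffinityKey key, int nodeCount) {
        return partitionOf(key) % nodeCount;
    }
}
```

With this sketch, `new AffinityKey("emp-1", "dept-7")` and `new AffinityKey("emp-2", "dept-7")` always resolve to the same partition and therefore the same node. The modulo assignment is an assumption made for brevity; it would move most partitions when a node is added, which is exactly what state transfer in a real data grid tries to minimize.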

A data grid should support both broadcast and peer-to-peer communication. It should support both UDP/IP and TCP/IP. In addition, a data grid should support both multicast and unicast messaging. While node discovery, node failure detection via heartbeats, and group-wide communication rely on multicast messaging, peer-to-peer communication relies on unicast messaging. JBoss Data Grid relies on JGroups for its group membership and communication protocol, whereas Oracle Coherence relies on the Tangosol Cluster Management Protocol (TCMP).

Feature                              JBoss Data Grid   Oracle Coherence
Single System Image                  Y                 Y
Data Load Balancing                  Y                 Y
Distributed Data                     Y                 Y
Highly Available Partitions          Y                 Y
Dynamic Partitioning                 Y                 Y
State Transfer                       Y                 Y
Data Affinity                        Y                 Y
Elastic                              Y                 Y
Node Discovery & Failure Detection   Y                 Y
UDP/IP & TCP/IP                      Y                 Y
Multicast & Unicast Messaging        Y                 Y
Peer-to-Peer Communication           Y                 Y
Linear Scaling                       Y                 Y

Concurrency

A data grid should support transactions with ACID properties and concurrent access. It should be able to participate in a Java Transaction API (JTA) compliant transaction. In addition, it should be able to participate in an XA (distributed) transaction to guarantee transaction consistency and support transaction recovery.

A data grid should support distributed locking, and it should support Multi Version Concurrency Control (MVCC) to provide transactions with ACID properties. Distributed locking ensures that concurrent transactions do not write to the same entry at the same time. When a transaction locks an entry for writing, other transactions writing to that same entry fail or are blocked. MVCC ensures that transactions that are reading an entry do not block transactions that are writing that same entry and vice versa; reads are not blocked by writes, and writes are not blocked by reads.

A data grid should support both optimistic and pessimistic locking. When a transaction is configured with pessimistic locking, the lock is acquired immediately (before the entry is written). When a transaction is configured with optimistic locking, lock acquisition is deferred (until the prepare phase). In addition, a data grid should support explicit locking. With explicit locking, the entry is locked manually via the API. Finally, a data grid should support deadlock detection in the event that concurrent transactions block each other.
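The difference in when the lock is taken can be sketched in plain Java on a single JVM. This is not how either product implements locking; `LockingSketch` and its methods are hypothetical, and the conflict check in the optimistic path is an added assumption (a real grid would compare entry versions rather than values).

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

final class LockingSketch {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    String get(String key) {
        return data.get(key);
    }

    /** Pessimistic: the lock is acquired first and held while the work runs. */
    void pessimisticUpdate(String key, UnaryOperator<String> work) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            data.put(key, work.apply(data.get(key)));
        } finally {
            lock.unlock();
        }
    }

    /**
     * Optimistic: the work runs without a lock; the lock is acquired only at
     * "prepare" time. Returns false if the entry changed in the meantime.
     */
    boolean optimisticUpdate(String key, UnaryOperator<String> work) {
        String read = data.get(key);         // no lock held while working
        String updated = work.apply(read);
        ReentrantLock lock = lockFor(key);
        lock.lock();                         // deferred lock acquisition
        try {
            if (!Objects.equals(read, data.get(key))) {
                return false;                // someone else wrote first
            }
            data.put(key, updated);
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The trade-off the sketch tries to show: the pessimistic path never fails but blocks other writers for the whole duration of the work, while the optimistic path holds the lock only briefly and may have to be retried by the caller.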

A data grid should support the following isolation levels: read committed, repeatable read. With read committed, a transaction always sees the latest version of an entry. If a transaction reads an entry and a second transaction then updates and commits that same entry, the first transaction will see the updated version if it reads that same entry again before committing. With repeatable read, a transaction will always see the same version of that entry regardless of whether or not it has been updated by a separate transaction before it has committed.
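The two isolation levels can be illustrated with a small read-only transaction in plain Java. This is a sketch under simplifying assumptions, not either product's transaction API: `Isolation` and `ReadTx` are hypothetical names, and the shared map stands in for the committed state of the grid.

```java
import java.util.HashMap;
import java.util.Map;

enum Isolation { READ_COMMITTED, REPEATABLE_READ }

/** Hypothetical read-only transaction over a shared map of committed entries. */
final class ReadTx {
    private final Map<String, String> committed;                   // shared committed state
    private final Isolation isolation;
    private final Map<String, String> firstSeen = new HashMap<>(); // per-transaction

    ReadTx(Map<String, String> committed, Isolation isolation) {
        this.committed = committed;
        this.isolation = isolation;
    }

    String read(String key) {
        if (isolation == Isolation.READ_COMMITTED) {
            return committed.get(key);       // always the latest committed version
        }
        // REPEATABLE_READ: remember the first version seen and keep returning it
        if (!firstSeen.containsKey(key)) {
            firstSeen.put(key, committed.get(key));
        }
        return firstSeen.get(key);
    }
}
```

If a second transaction commits a new value for a key between two reads, a `READ_COMMITTED` instance returns the new value on the second read, while a `REPEATABLE_READ` instance keeps returning the value it saw first.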

Feature                             JBoss Data Grid   Oracle Coherence
Java Transaction API                Y                 Y
ACID Properties                     Y                 Y
XA Compliant Resource               Y                 Y
Transaction Consistency             Y                 Y
Distributed Locking                 Y                 Y
Optimistic / Pessimistic Locking    Y                 ?
Explicit Locking                    Y                 Y
Deadlock Detection                  Y                 Y
Multi Version Concurrency Control   Y                 Y
Read Committed Isolation            Y                 Y
Repeatable Read Isolation           Y                 Y

Note

I am unable to state whether or not optimistic and / or pessimistic locking can be configured with Oracle Coherence. The locking strategy is described in neither the NamedCache documentation nor the OptimisticNamedCache documentation. While the interface name OptimisticNamedCache implies optimistic locking, there is no reference to pessimistic locking in the transaction documentation.

Processing

A data grid should support distributed tasks. It should be able to execute a task on some or all of the nodes in parallel. Rather than passing data to the task, a data grid passes the task to the data. To that end, a data grid should be able to determine which nodes the task should be passed to.
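The "pass the task to the data" idea can be sketched in plain Java by simulating nodes inside a single JVM. `GridNode` and `GridTasks` are hypothetical names, and a thread pool stands in for the network; neither product's task API is shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical in-process stand-in for a node holding a slice of the data. */
final class GridNode {
    final Map<String, Integer> localEntries = new ConcurrentHashMap<>();
}

final class GridTasks {
    /** Runs the task against each node's local entries in parallel and sums the results. */
    static int sumOverNodes(List<GridNode> nodes,
                            Function<Map<String, Integer>, Integer> task) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (GridNode node : nodes) {
                // The task travels to the node; only the small result travels back.
                Callable<Integer> localRun = () -> task.apply(node.localEntries);
                partials.add(pool.submit(localRun));
            }
            int total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for nodes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed on a node", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass a function such as `m -> m.values().stream().mapToInt(Integer::intValue).sum()`; each node computes a partial sum over its own entries and the partial results are combined at the caller.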

Feature               JBoss Data Grid   Oracle Coherence
Distributed Tasks     Y                 Y
Parallel Processing   Y                 Y
Grid Processing       Y                 Y

Remote

A data grid should support remote access via a Java API and it should include a Java client. In addition, it should provide an HTTP / REST API to support remote access from clients written in languages other than Java (e.g. Ruby / Python).

Feature       JBoss Data Grid   Oracle Coherence
Java API      Y                 Y
Java Client   Y                 Y
REST API      Y                 Y

Storage

A data grid should support reading data from a cache loader and both reading data from and writing data to a cache store. While a cache loader may be used for a read-only data grid, a cache store may be used for a read / write data grid. A data grid should support writing entries to a cache store, and it should support reading entries from a cache loader and / or cache store when they are not in the data grid. A cache loader and / or cache store may be implemented with a file system or a database.

In addition, a data grid should support both write-through and write-behind persistence with a cache store. With write-through persistence, the write call does not return until the entry has been written to both the data grid and the cache store. The entry is written to both the data grid and the cache store in the same call. Whereas write-through persistence is synchronous, write-behind persistence is asynchronous. With write-behind persistence, the write call returns after the entry has been written to the data grid. The entry is written to the cache store afterwards.
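The two persistence modes can be contrasted in a short plain-Java sketch. `SimpleCacheStore` and `PersistentCache` are hypothetical names, and a single background thread stands in for the write-behind queue; the configuration of either product is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical store abstraction; a real one would wrap a file system or a database. */
interface SimpleCacheStore {
    void write(String key, String value);
}

final class PersistentCache {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final SimpleCacheStore store;
    private final boolean writeBehind;
    // Single daemon thread so queued writes reach the store in order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "write-behind");
        t.setDaemon(true);
        return t;
    });

    PersistentCache(SimpleCacheStore store, boolean writeBehind) {
        this.store = store;
        this.writeBehind = writeBehind;
    }

    void put(String key, String value) {
        memory.put(key, value);
        if (writeBehind) {
            writer.execute(() -> store.write(key, value)); // returns immediately
        } else {
            store.write(key, value);                       // caller waits for the store
        }
    }

    String get(String key) {
        return memory.get(key);
    }

    /** Drains pending write-behind operations; returns false on timeout. */
    boolean flushAndClose(long timeoutMillis) {
        writer.shutdown();
        try {
            return writer.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In write-through mode the latency of the store is added to every `put`. In write-behind mode `put` is fast, but entries queued at the moment of a crash would be lost in this sketch, which is the usual trade-off between the two modes.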

Finally, a data grid should support activation and passivation via a cache store. Passivation occurs when an entry is evicted from the data grid: it is written to the cache store and deleted from the data grid. Activation occurs when an entry is read after having been evicted: it is read from the cache store and written back to the data grid.
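Passivation and activation can be sketched with a small least-recently-used cache in plain Java. `PassivatingCache` is a hypothetical class, a `HashMap` stands in for the cache store, and removing the entry from the store on activation is an assumption of this sketch. It is not thread-safe.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class PassivatingCache {
    private final Map<String, String> store = new HashMap<>(); // stand-in for a cache store
    private final LinkedHashMap<String, String> memory;

    PassivatingCache(final int maxEntries) {
        // Access-ordered map: the eldest entry is the least recently used one.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() <= maxEntries) {
                    return false;
                }
                store.put(eldest.getKey(), eldest.getValue()); // passivate
                return true;                                    // evict from memory
            }
        };
    }

    void put(String key, String value) {
        store.remove(key);       // the in-memory copy is now the current one
        memory.put(key, value);
    }

    String get(String key) {
        String value = memory.get(key);
        if (value == null && store.containsKey(key)) {
            value = store.remove(key);  // activate: read from the store...
            memory.put(key, value);     // ...and write back to memory
        }
        return value;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }

    boolean inStore(String key) {
        return store.containsKey(key);
    }
}
```

With a limit of two entries, putting `a`, `b`, and `c` passivates `a`; a later `get("a")` activates it again and, because memory is full, passivates `b` in turn.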

Feature                            JBoss Data Grid   Oracle Coherence
File System Cache Loader / Store   Y                 Y
Database Cache Loader / Store      Y                 Y
Read-Through                       Y                 Y
Write-Through                      Y                 Y
Write-Behind                       Y                 Y

Management

A data grid should support management and monitoring options via the Java Management Extensions (JMX) API with applications such as JConsole or VisualVM.

In addition, a data grid should support an eventing model. For example, when an entry is created, updated, or deleted, the data grid should be able to treat the call as an event. Further, applications should be able to register both synchronous and asynchronous listeners with a data grid so that they can be notified of events. If a synchronous listener has been registered with the data grid, a call (e.g. put / delete) will not return until the listener has been notified. If an asynchronous listener has been registered with the data grid, a call (e.g. put / delete) will return before the listener has been notified. The listener will be notified afterwards.
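The difference between the two kinds of listener can be sketched in plain Java. `ObservableCache`, `addListener`, and `drain` are hypothetical names, and listeners are modeled as a simple `BiConsumer` of key and value rather than either product's listener interfaces.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BiConsumer;

final class ObservableCache {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> syncListeners = new CopyOnWriteArrayList<>();
    private final List<BiConsumer<String, String>> asyncListeners = new CopyOnWriteArrayList<>();
    private final ExecutorService notifier = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-notifier");
        t.setDaemon(true);
        return t;
    });

    void addListener(BiConsumer<String, String> listener, boolean synchronous) {
        (synchronous ? syncListeners : asyncListeners).add(listener);
    }

    void put(String key, String value) {
        data.put(key, value);
        for (BiConsumer<String, String> l : syncListeners) {
            l.accept(key, value);                         // put() waits for this listener
        }
        for (BiConsumer<String, String> l : asyncListeners) {
            notifier.execute(() -> l.accept(key, value)); // put() does not wait
        }
    }

    /** Blocks until all queued asynchronous notifications have been delivered. */
    boolean drain(long timeoutMillis) {
        try {
            notifier.submit(() -> { }).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        }
    }
}
```

A slow synchronous listener slows down every `put`, whereas a slow asynchronous listener only delays its own notifications. The `drain` helper exists only so that callers of the sketch can wait for pending notifications.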

Feature                                  JBoss Data Grid   Oracle Coherence
Distributed Java Management Extensions   Y                 Y
Events / Notification                    Y                 Y
Synchronous Listeners                    Y                 Y
Asynchronous Listeners                   Y                 Y

This post is the second in a series introducing the concepts of data grids and JBoss Data Grid itself.

  1. Data Grid – Cache Evolved
  2. We, Data Grid
  3. Data Grid, JBoss Data Grid

About Shane K Johnson

Technical Marketing Manager, Red Hat Inc.

8 Comments on “We, Data Grid”

  1. RK Says:

    How about performance numbers?

  2. guest Says:

    I’m also interested in this question, RK. JDG lacks references that could corroborate its performance against Coherence. Coherence, on the other hand, has a lot of public cases that show how scalable, reliable and fast it is. It is the world’s first in-memory computing platform, so this blog doesn’t offer credibility at all, mostly because Shane is a marketing guy from Red Hat.

    He’s just using a old marketing technique to improve reliability of their offers product, comparing it with another one which is leader in its industry, like Coherence. Comparing with Coherence would pass the idea of “JDG is so good just like Coherence, so instead of buying Coherence, buy from Red Hat” but in fact it is not true. JDG should implement A LOT OF features to be comparable with Coherence.

    • Shane K Johnson Says:

      I published the results of a performance test (JDG 6.0.1) last December (link). I have written a technical white paper that includes the results of a number of performance tests (JDG 6.0.1). However, it is awaiting publication. I expect it to be made available via the Red Hat Customer Portal. In addition, I will be publishing the results of a few performance tests (JDG 6.1) executed on better hardware to How to JBoss within the next two weeks.

      I executed the performance tests with RadarGun (link), an open source project for data grid performance testing. When I published the results, I provided both the RadarGun and the JDG configuration files. The best way for an organization to select a data grid based on performance and reliability is to configure and execute their own performance tests based on their own requirements in their own controlled environment.

      Tangosol Coherence was an innovative product in its day, but that day was several years ago. I do not question that it remains reliable. However, there have been a number of advancements in distributed systems over the past few years. JBoss Data Grid brings together the reliability of the previous generation of data grids and the innovation of the next generation of data grids.

      Red Hat public references include both Chicago Board Options Exchange (CBOE) and Cisco, and they have both presented at Red Hat Summit / JBoss World. I can’t think of an environment with higher demands for both performance and reliability than financial trading. The Pentaho BI Platform / Server includes a plugin for Infinispan (link). There is no Oracle Coherence plugin.

      Are you stating that because the company you work for (Oracle) productized (well, acquired) a data grid before the company I work for (Red Hat) and that my role is now in marketing, I lack credibility in the data grid domain? I would advise against such a statement. My technical knowledge of data grids is second to none, and it is not derived from my role in marketing. I have worked with a number of enterprise organizations in the financial, telecommunication, and media sectors in a developer / architect capacity in my previous role to integrate data grids in demanding environments.

      How am I “improving the reliability” of JBoss Data Grid by identifying the functionality that both JBoss Data Grid and Oracle Coherence provide? Do you not believe that JBoss Data Grid has implemented A LOT OF features? The functionality described in this post represents nearly all of the features and benefits listed in the Oracle Coherence data sheet (link). JBoss Data Grid lacks a few features provided by Oracle Coherence. Oracle Coherence lacks a few features provided by JBoss Data Grid. Would you say that Oracle Coherence has not implemented A LOT OF features because it lacks a few features provided by JBoss Data Grid?

      • guest Says:

        “I published the results of a performance test (JDG 6.0.1) last December…”

        Those results compares JDG against Terracotta from Software AG, not with Coherence from Oracle. You cannot say at all that JDG is better than Coherence because you’ve never tested. Again, not reliable statements coming from you. You’ve tried to use a Terracotta comparison to generalize JDG performance results. Lets call Oracle, VMware, IBM, Gigaspaces and TIBCO to participate of the tests.

        “Tangosol Coherence was an innovative product in its day, but that day was several years ago”

        For some unique features of Coherence like its non-blocking I/O TCP/IP network based on TCMP, which allows it to achieve better results with distributed transactions, fail-over detection (the fastest in the industry), WAN replication with latency issues due to geographical distribution, and the HTTP Session offload from AppServers. Not mentioning integration with A LOT OF AppServers like WebLogic, GlassFish, Websphere, Tomcat, IIS, Resin and even your JBoss AS. JDG only gives support for what is from Red Hat. What a nice example of being “open”, huh?! :)

        “Red Hat public references include both Chicago Board Options Exchange (CBOE) and Cisco, and they have both presented at Red Hat Summit / JBoss World”

        Only this? Coherence has thousands of customer references, including mission critical ones that for years NEVER, I mean, NEVER restarted their servers. Come on, you can do better than this. Red Hat (you) should be a little bit more humble when talking about leaders like Oracle. Someday Red Hat will be a huge company, I don’t doubt that, but that didn’t happened so far and will take some time.

        “I have worked with a number of enterprise organizations…”

        Oh yes? Give me examples of data grid technologies you’ve worked with, scenarios of data partitioning and JVM tuning you’ve implemented for, entity domain versioning strategies you’ve designed it, hashCode algorithms strategies that you’ve proposed for a complex based key node, examples of KPIs that you retrieved from JMX and from the DG, and of course, examples of the following DG scenarios: average latency less than 600 microseconds, 5k TPS or higher considering a transaction with a minimum of 15KB of size, client applications both based on Java, C++, .NET and “the rest of world” that could be accessed with REST or SOAP, projects with more than 20K hours of duration (real one projects) instead of stupid POCs, usage of at least three serious data grids technologies including Coherence, GemFire, Websphere eXtreme Scale, Gigapaces, TIBCO ActiveSpaces, etc.

        “Do you not believe that JBoss Data Grid has implemented A LOT OF features?”

        No! It just had integrated a couple open-source existing technologies into a new ecosystem and productized in a minimum level to take some money from the customers with subscriptions. Nothing really new, innovated, creative or respectable. The type of thing Red Hat likes to do: take existing technologies, combine them and make some money.

        “nearly all of the features and benefits listed in the Oracle Coherence data sheet…”

        You really knows to play with words, starting with the usage of the word “nearly” :)

        You forgot some key features that only Coherence has like: Elastic Data (off-heap and SSD storage of data), distributed GC against any type of storage and cache layout, ability to handle thousands of GB being able to handle even terabytes of data. Don’t came say to me that with on-heap allocation and regular JVM like HotSpot (or OpenJDK which is even worse) you could allocate terabytes of data. Native SDKs for C/C++ and .NET, Continuous Queries, support for many AppServers rather than only JBoss, integration with Java EE 6 using @Resource annotation, monitoring and management capabilities both integrated with the product and with other external tools like Enterprise Manager, integration with CEP world to enrich events and being the clustering enabled mechanism to handle fail-over scenarios, security features that could deal with scenarios of authentication, authorization, SSL and load-balancers (Eg: BigIP) integration. Pre-built filters and a powerful query language that could make easier for the developers to interact with the cache instead of force them to write Java code, support for Hibernate, Toplink, EclipseLink, GoldenGate, etc. Thousands of pre-implemented scenario patterns in the product and externally with the incubator strategy started by Tangosol and now owned by Oracle. Oh and of course: support for a high performance serialization strategy and a highly scalable TCP/IP implementation like TCMP. Not mentioning that support for InfiniBand based networks.

        “Oracle Coherence lacks a few features provided by JBoss Data Grid. Would you say that Oracle Coherence has not implemented A LOT OF features because it lacks a few features provided by JBoss Data Grid?”

        All of the “unique” features provided by JDG are not considered by real customers, independent analysts like Gartner, Forrester and IDC as really important. Are features that just align with the Red Hat strategy to force its entrance in the Big Data world, which on the other hand is a terrible strategy because to a real Big Data strategy Red Hat lacks A LOT OF technology stacks compared with real Big Data vendors like Oracle, EMC and IBM. Just an example, even Oracle does not consider Coherence as its Big Data strategy. When Oracle talk about Coherence, they’re talking about caching, grid and in-memory computing scenarios, which fits perfectly to elastic data grid technologies.

  3. Satish Kale Says:

    Awesome Shane. It’s amazing how the JBoss Data Grid provides feature parity at a fraction of the cost.

  4. Anonymous Says:

    Hi Shane,

    You mention JBoss Data Grid supporting distributed tasks and parallel processing but I can’t find this in the docs. Is this supported now or is it an Infinispan only feature?

    Steve

    • Shane K Johnson Says:

      Steve, they are technical preview with 6.0.1. However, they are supported in 6.1 and the beta is available for download.
