Faster Big Data

August 29, 2013

Big Data, Data Grid, NOSQL

Volume is one thing. Velocity is another.

Apache Hadoop
It is the big data platform for storing and processing lots of data, later.

What about storing and processing a subset of that data, now, by…

  • reducing the length of time it takes to store the data.
  • reducing the length of time it takes to process the data.

Data Grid

Store the data in a data grid, in memory.
Process the data within a data grid, in memory.

Big Data Stack

The data can be stored in both a data grid and a big data platform. Alternatively, the data can be stored in the data grid alone, and the data grid can persist it to the big data platform. It can do so asynchronously (write-behind) to maintain performance, or synchronously (write-through) to guarantee consistency.
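The trade-off between the two persistence modes can be sketched in a few lines of Python. This is a toy illustration, not the API of any particular data grid: the `DataGrid` class, its `backing_store` dictionary (standing in for the big data platform), and the `flush` method are all hypothetical names.

```python
import queue
import threading


class DataGrid:
    """Toy in-memory grid that persists entries to a backing store (hypothetical)."""

    def __init__(self, backing_store, synchronous):
        self.memory = {}
        self.backing_store = backing_store  # stands in for the big data platform
        self.synchronous = synchronous
        self._pending = queue.Queue()
        if not synchronous:
            # Write-behind: a background thread drains the queued writes.
            threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        self.memory[key] = value
        if self.synchronous:
            # Write-through: the caller waits until the store has the entry.
            self.backing_store[key] = value
        else:
            # Write-behind: return immediately and persist later.
            self._pending.put((key, value))

    def _drain(self):
        while True:
            key, value = self._pending.get()
            self.backing_store[key] = value
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the backing store."""
        self._pending.join()
```

With `synchronous=True`, the backing store is up to date as soon as `put` returns. With `synchronous=False`, `put` returns sooner, but the backing store may lag behind the grid until the queue drains.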

The data grid can rely on expiration to keep its data set bounded. For example, the data grid can store current data (e.g. the last five business days) while the big data platform stores the historic data.
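Expiration can be sketched as follows. The `ExpiringGrid` class and its `archive` dictionary (standing in for the big data platform) are hypothetical, and the lifespan is expressed in calendar time for simplicity; a real deployment would need a business-day calendar to express "five business days".

```python
from datetime import datetime, timedelta


class ExpiringGrid:
    """Toy grid that keeps only recent entries in memory (hypothetical)."""

    def __init__(self, lifespan, archive):
        self.lifespan = lifespan  # how long an entry stays in memory
        self.archive = archive    # stands in for the big data platform
        self.entries = {}         # key -> (stored_at, value)

    def put(self, key, value, stored_at):
        # Every entry goes to both the grid and the archive.
        self.entries[key] = (stored_at, value)
        self.archive[key] = value

    def expire(self, now):
        """Evict entries older than the lifespan; the archive keeps them."""
        cutoff = now - self.lifespan
        expired = [k for k, (stored_at, _) in self.entries.items()
                   if stored_at < cutoff]
        for key in expired:
            del self.entries[key]
        return expired
```

After `expire` runs, the grid holds only current data, while the archive still holds everything.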

If an analyst needs to analyze current data…
the analyst can submit a map / reduce job to the data grid.

If an analyst needs to analyze historic data…
the analyst can schedule a map / reduce job with the big data platform.
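The point of the two paths above is that the same job can run against either data set. Here is a minimal, single-process sketch, assuming records are plain dictionaries. The `map_reduce` helper and the `quantity_by_symbol` mapper are illustrative names, not the API of a data grid or of Hadoop.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Run a single-process map / reduce over an iterable of records."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: fold the values collected for each key.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def quantity_by_symbol(record):
    """Hypothetical mapper: emit the traded quantity keyed by symbol."""
    yield record["symbol"], record["quantity"]


def add(a, b):
    return a + b


# Current data would live in the data grid; historic data in the platform.
current_data = [
    {"symbol": "RHT", "quantity": 100},
    {"symbol": "RHT", "quantity": 50},
]
historic_data = [
    {"symbol": "RHT", "quantity": 500},
    {"symbol": "IBM", "quantity": 200},
]

current_totals = map_reduce(current_data, quantity_by_symbol, add)
historic_totals = map_reduce(historic_data, quantity_by_symbol, add)
```

The job definition is identical in both cases. Only the data set, and therefore the latency, differs.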

[Figure: data grid and big data platform stack]

Use Case

Investment Banking & Risk Management

The goal is to analyze portfolio data at the close of business, every day.

The working set is portfolio data for the day. It does not include, for example, portfolio data for the year. The sooner the data is analyzed, the better. As a result, it does not make sense to analyze all of the portfolio data in a big data platform or to schedule a map / reduce job with batch processing. It is more efficient to limit analysis to the working set, in memory, and to submit a map / reduce job for near real-time processing.
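As a rough sketch of this use case, the example below restricts the analysis to one day's positions and totals the market value per portfolio. The record fields (`portfolio`, `as_of`, `quantity`, `price`) and both function names are assumptions made for illustration, and market value is only a stand-in for a real risk measure.

```python
from collections import defaultdict
from datetime import date


def working_set(positions, business_day):
    """Keep only the positions recorded for the given business day."""
    return [p for p in positions if p["as_of"] == business_day]


def value_by_portfolio(positions):
    """Total market value (quantity * price) per portfolio."""
    totals = defaultdict(float)
    for position in positions:
        totals[position["portfolio"]] += position["quantity"] * position["price"]
    return dict(totals)


positions = [
    {"portfolio": "A", "as_of": date(2013, 8, 29), "quantity": 100, "price": 10.0},
    {"portfolio": "A", "as_of": date(2013, 8, 29), "quantity": 50, "price": 20.0},
    {"portfolio": "B", "as_of": date(2013, 8, 29), "quantity": 10, "price": 5.0},
    {"portfolio": "A", "as_of": date(2013, 1, 2), "quantity": 999, "price": 1.0},
]

today = working_set(positions, date(2013, 8, 29))
totals = value_by_portfolio(today)
```

The position from January never enters the computation, which is what keeps the end-of-day analysis small enough to run in memory.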


About Shane K Johnson

Technical Marketing Manager, Red Hat Inc.

