Big Data and the Flying V

January 29, 2013

Big Data, Data Grid, NOSQL

Big Data in Theory

What is it? It’s big data. Right?

I’m not sure if I like the term Big Data. I think it’s right up there with the term Cloud.

I do, however, like the framework created by Doug Laney: Volume, Velocity, and Variety. It’s the de facto description of Big Data, and it predates the Big Data phenomenon. That, and I like both alliteration and the KISS principle. Who doesn’t?

Here is my interpretation of the 3Vs, albeit a short one.

Volume – More data.
Velocity – Data (in), faster. Information (out), faster.
Variety – More data sources and / or formats.

What about the Flying V?

Thinking about the 3Vs reminded me of the Flying V.

Then it occurred to me…

The Flying V worked in The Mighty Ducks. Yes, I watched The Mighty Ducks. It did not work in D2. Yes, I watched the sequel. No, I did not watch D3. I can only hope that it did not do to The Mighty Ducks what Alien 3 did to Alien.

Update: It’s come to my attention that not everyone has seen The Mighty Ducks. The Ducks are a youth ice hockey team. I’ve been told that ice hockey is not the only hockey. Really? The Flying V is their trick play. It worked in the first film but not in the sequel, much like the option offense works in college football (NCAA) but not in professional football (NFL).

The 3Vs are a valid description of Big Data in theory, but they are not a valid description of Big Data in practice. Perhaps it is because they state the obvious, hint at the problem, and do not mention the solution.

Big Data in Practice

Volume

Volume is addressed with distributed storage using a shared nothing architecture on commodity hardware.

Examples

  • Distributed File System – Red Hat Storage, Hadoop Distributed File System
  • NoSQL – MongoDB
  • In-Memory Data Grid – JBoss Data Grid
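To make the shared nothing idea concrete, here is a minimal sketch in Python. It is not how any of the products above are implemented. The `Node` and `Cluster` classes and the key names are hypothetical, and the placement rule (hash of the key modulo the node count) is an assumption chosen for brevity.

```python
import hashlib


class Node:
    """One storage node. It owns its data and shares nothing with its peers."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class Cluster:
    """Hypothetical cluster that spreads keys across nodes by hash."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def owner(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).store[key] = value

    def get(self, key):
        return self.owner(key).store.get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(1000):
    cluster.put(f"user:{i}", {"id": i})

# Each node holds only its own slice of the data.
sizes = {node.name: len(node.store) for node in cluster.nodes}
```

More volume is handled by adding nodes. One caveat: with plain modulo placement, changing the node count remaps most keys, which is why real systems tend to use something like consistent hashing instead.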

Velocity

Outgoing information is generated faster with parallel processing in the form of batch processing (e.g. map / reduce), near real-time processing (e.g. distributed tasks), and real-time processing (e.g. stream processing).

Examples

  • Map / Reduce Tasks – JBoss Data Grid, NoSQL, Hadoop MapReduce
  • Distributed Tasks – JBoss Data Grid
  • Stream Processing – Storm / S4
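The batch flavor (map / reduce) can be sketched in a few lines of Python. This is an illustration of the pattern only, not the Hadoop MapReduce or JBoss Data Grid API. The function names are hypothetical, and a thread pool stands in for a cluster of worker nodes (in CPython, threads will not actually speed up this CPU-bound work).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(chunk):
    """Map: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(lines, workers=3):
    # Split the input into one chunk per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    # Run the map phase on every chunk in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_phase, chunks))
    # Fold the partial results into a single answer.
    return reduce(reduce_phase, partials, Counter())


lines = ["big data", "more data faster", "big volume"]
totals = word_count(lines)
```

The map phase is where the parallelism lives: each chunk is processed independently, so more workers can process more chunks at once.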

Data Locality

Volume and velocity are often two sides of the same coin. Incoming data is stored faster using distributed storage. Outgoing information is generated faster with parallel processing, often in conjunction with distributed storage via data locality: the parallel processes are executed on the storage nodes that hold the data.

Examples

  • Apache Hadoop (HDFS + MapReduce)
  • JBoss Data Grid
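Here is a rough Python sketch of data locality, under the assumption that each node can run a function against the records it stores. The `StorageNode` class and `run_local` method are hypothetical and do not correspond to the Hadoop or JBoss Data Grid APIs.

```python
class StorageNode:
    """Hypothetical node that both stores records and runs tasks on them."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def run_local(self, task):
        # The task runs where the data lives. Only the result leaves the node.
        return task(self.records)


nodes = [StorageNode("node-a"), StorageNode("node-b"), StorageNode("node-c")]

# Distribute 100 readings (1..100) across the nodes, round robin.
for i, value in enumerate(range(1, 101)):
    nodes[i % len(nodes)].records.append(value)


def local_sum_and_count(records):
    """Summarize one node's records as a small (sum, count) pair."""
    return (sum(records), len(records))


# Ship the function to the data, not the data to the function.
partials = [node.run_local(local_sum_and_count) for node in nodes]

total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
mean = total / count
```

Three small tuples cross the network instead of 100 records. That ratio is the whole point, and it improves as the data grows.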

Variety

Variety is addressed with NoSQL for structured / semi-structured data and distributed file systems for unstructured data.

Examples

  • Key / Value Store – JBoss Data Grid
  • Document Store – MongoDB
  • Column Oriented Store – Apache HBase (Hadoop)
  • Hierarchical Store – ModeShape
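To show how the four store types differ, here is the same record shaped four ways in Python. Plain dictionaries and lists stand in for the stores. None of this is the API of JBoss Data Grid, MongoDB, HBase, or ModeShape, and the order record, key names, and paths are made up for illustration.

```python
import json

order = {
    "id": "1001",
    "customer": "ada",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Key / value: the store sees one opaque value behind one key.
kv_store = {f"order:{order['id']}": json.dumps(order)}

# Document: the nested structure is kept, so fields can be queried.
documents = [order]
by_customer = [doc for doc in documents if doc["customer"] == "ada"]

# Column oriented: row key, then "family:qualifier" columns.
columns = {
    order["id"]: {
        "info:customer": order["customer"],
        "items:A1": 2,
        "items:B7": 1,
    }
}

# Hierarchical: every piece of the record is a node addressed by path.
tree = {
    "/orders/1001": {"customer": "ada"},
    "/orders/1001/items/A1": {"qty": 2},
    "/orders/1001/items/B7": {"qty": 1},
}
children = sorted(path for path in tree if path.startswith("/orders/1001/items/"))
```

The trade-off is visibility. The key / value form is the simplest to distribute, but the store cannot see inside the value. The other three expose progressively more structure to query against.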

Additional Thoughts

It’s true. I liked The Mighty Ducks. I was a kid. That being said, it’s not The Goonies. If The Goonies is on television, I watch it for the nth time. If The Mighty Ducks is on television, I put in Serenity (BD) and watch it for the nth time.

Alien and Aliens are two of the greatest films ever. Period.

About Shane K Johnson

Technical Marketing Manager, Red Hat Inc.


2 Comments on “Big Data and the Flying V”

  1. rettori Says:

    I’ve recently gone through an article from IBM which mentioned a 4th V, Veracity.
    Bottom line, it is about how much you can trust certain data, and it also deals with data that constantly changes, such as weather forecasts.
    It caught my attention.

    https://www14.software.ibm.com/webapp/iwm/web/signup.do?source=csuite-NA&S_PKG=Q412IBVBigData

    “Veracity, the fourth “V”
    Some data is inherently uncertain, for example: sentiment and truthfulness in humans; GPS sensors bouncing among the skyscrapers of Manhattan; weather conditions; economic factors; and the future. When dealing with these types of data, no amount of data cleansing can correct for it. Yet despite uncertainty, the data still contains valuable information. The need to acknowledge and embrace this uncertainty is a hallmark of big data.”
