The Structure of Big Data

January 31, 2013

Big Data

First things first, all data is more or less structured. That being said, there is…

  • Structured Data
  • Semi-Structured Data
  • Unstructured Data

I tend to think of it as: data, composite or simple, with or without content. In that context, email is structured composite data (from, to, subject, date) with unstructured content (message body). The composite data is structured; the content is unstructured. Simple data, however, may or may not be structured. The ‘subject’ field is unstructured, while the ‘to’ field is structured: an address is composed of a local-part (username) and a domain.

Although content is unstructured, it may have an implied structure.
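To make the email example concrete, here is a minimal sketch using Python's standard-library `email` package. The message is invented for illustration. The header fields come back as named values (structured composite data), the ‘to’ address splits cleanly into a local-part and a domain, and the body comes back as a plain string with no enforced structure.

```python
from email import message_from_string
from email.utils import parseaddr

# A hypothetical message, invented for illustration.
RAW = """\
From: Alice <alice@example.com>
To: Bob <bob@example.org>
Subject: Quarterly numbers
Date: Thu, 31 Jan 2013 09:00:00 -0500

Hi Bob, the numbers look good. Let's talk Friday.
"""


def split_email(raw):
    msg = message_from_string(raw)
    # Composite data: named header fields with a fixed meaning.
    headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    # The 'to' field is itself structured: local-part @ domain.
    _, address = parseaddr(msg["To"])
    local_part, _, domain = address.rpartition("@")
    # Content: the body is free text; nothing enforces its shape.
    body = msg.get_payload()
    return headers, (local_part, domain), body


headers, (local_part, domain), body = split_email(RAW)
print(headers["Subject"])  # Quarterly numbers
print(local_part, domain)  # bob example.org
```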

So what is the difference between structured data, semi-structured data, and unstructured data?

Structured Data

The structure is externally enforced. Scope: data.

  • The data is stored in a database.
    • Relational
      • Transaction
    • XML
      • Catalog

The data itself is not structured; the structure is defined by the database. The data would be semi-structured if it were exported and transformed into JSON or XML.
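A minimal sketch of that export, assuming an in-memory SQLite database and a hypothetical `txn` table. Inside the database a row is a bare tuple and the schema enforces its shape (the `NOT NULL` constraint rejects an incomplete row); once exported to JSON, each record carries its own field names.

```python
import json
import sqlite3

# Hypothetical transactions table: the schema, not the rows, carries the structure.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (id INTEGER PRIMARY KEY, account TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO txn (account, amount) VALUES (?, ?)",
    [("ACME-001", 125.50), ("ACME-002", -40.00)],
)

# The database enforces the structure: a row without an amount is rejected.
try:
    conn.execute("INSERT INTO txn (account, amount) VALUES (?, ?)", ("ACME-003", None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Inside the database a row is a bare tuple; column names live in the schema.
cursor = conn.execute("SELECT id, account, amount FROM txn ORDER BY id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(rows)  # [(1, 'ACME-001', 125.5), (2, 'ACME-002', -40.0)]

# Exported to JSON, every record carries its own field names: semi-structured.
exported = json.dumps([dict(zip(columns, row)) for row in rows], indent=2)
print(exported)
conn.close()
```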

Semi-Structured Data

The structure is self-defined. Scope: data and / or content.

  • The data is stored as text.
    • JSON
      • User Profile
    • XML
      • Application / Form
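By contrast, semi-structured text describes itself. In this sketch (the profiles and the form are hypothetical), the JSON records carry their own keys and need not share the same fields, and the XML element names describe the content they wrap; no external schema is consulted to read either one.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical user profiles. Field names travel with the values,
# and the two records do not have to share the same fields.
profiles = json.loads("""
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "twitter": "@bob", "interests": ["music", "hiking"]}
]
""")
for p in profiles:
    print(sorted(p))  # each record describes its own shape

# A hypothetical application form in XML; element names describe the content.
form = ET.fromstring(
    "<application><applicant>Alice</applicant>"
    "<position>Engineer</position>"
    "<comments>Available in March.</comments></application>"
)
print({child.tag: child.text for child in form})
```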

Unstructured Data

The structure of the data is externally defined. Scope: metadata and content.

  • The data is stored in a binary format and / or document.
    • Media container (e.g. Ogg)
    • Video (e.g. Theora)
    • Audio (e.g. Vorbis)
    • Image (e.g. PNG).
    • Microsoft Word
    • Adobe PDF

The structure is defined by the file format. The data is composed of structured metadata and unstructured content. That being said, even the content has a low-level structure: a video is composed of frames and an image is composed of pixels.
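As an illustration of structure defined by a file format, the sketch below reads the metadata that the PNG specification places at fixed offsets: an 8-byte signature followed by an `IHDR` chunk holding width, height, bit depth, and color type. To keep it self-contained, the hypothetical helper `make_png_header` builds only those header bytes; a real file would continue with the compressed pixel data (`IDAT` chunks), which is the content.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png_header(width, height):
    """Hypothetical helper: build only the signature + IHDR chunk (no pixels)."""
    # IHDR fields: width, height, bit depth, color type, compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    chunk = struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr
    # The chunk CRC covers the chunk type and data, not the length field.
    crc = struct.pack(">I", zlib.crc32(b"IHDR" + ihdr) & 0xFFFFFFFF)
    return PNG_SIGNATURE + chunk + crc


def read_png_metadata(data):
    """Read the structured metadata the PNG format defines at fixed offsets."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk must come first")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


print(read_png_metadata(make_png_header(640, 480)))
# {'width': 640, 'height': 480, 'bit_depth': 8, 'color_type': 6}
```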

  • The data is stored as plain text.
    • Log

The structure is defined by a pattern in the logging configuration file. The data is composed of structured metadata (e.g. severity) and unstructured content (log message).
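A minimal sketch of how a layout pattern turns into parseable structure. It assumes a hypothetical log4j-style pattern, `%d{ISO8601} %-5p [%c] %m%n`, and an invented log line; each conversion specifier maps to a named regex group, so the timestamp, severity, and category come back as fields while the message remains free text.

```python
import re

# Hypothetical layout pattern from a logging configuration (log4j-style):
#   %d{ISO8601} %-5p [%c] %m%n
# Each conversion specifier becomes a named group below.
LOG_LINE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<severity>[A-Z]+)\s+"      # %-5p pads the level, so allow extra spaces
    r"\[(?P<category>[^\]]+)\] "
    r"(?P<message>.*)"              # %m: the unstructured content
)


def parse(line):
    match = LOG_LINE.match(line)
    if match is None:
        return None  # the line does not follow the configured pattern
    return match.groupdict()


record = parse(
    "2013-01-31 10:15:02,117 WARN  [org.example.Cache] "
    "eviction took 250 ms, consider a larger heap"
)
print(record["severity"])  # WARN
print(record["message"])   # eviction took 250 ms, consider a larger heap
```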

  • The data is user generated.
    • Status (Facebook)
    • Tweet (Twitter)
    • Comment (WordPress)

The structure is defined by the application / form. The data is composed of structured metadata (e.g. user ID) and unstructured content (user message).
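The sketch below models a status update with a hypothetical `Status` class: the user ID and timestamp are metadata fixed by the application, while the text is whatever the user typed. It also illustrates the implied structure mentioned earlier: Twitter-style conventions such as `#hashtags` and `@mentions` can be pulled out of the free text, though nothing enforces them.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Status:
    # Structured metadata, defined by the application / form.
    user_id: int
    created_at: datetime
    # Unstructured content, typed by the user.
    text: str

    def implied_structure(self):
        """Recover conventions users follow inside free text (hashtags, mentions)."""
        return {
            "hashtags": re.findall(r"#(\w+)", self.text),
            "mentions": re.findall(r"@(\w+)", self.text),
        }


post = Status(user_id=42, created_at=datetime(2013, 1, 31, 12, 0),
              text="Reading about #bigdata and #nosql, thanks @alice!")
print(post.implied_structure())
# {'hashtags': ['bigdata', 'nosql'], 'mentions': ['alice']}
```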

Update

I would say that the structure of content is user defined and thus interpreted. However, content is often a component of data (unstructured, externally defined). That said, if I typed up this post in gedit and saved it as a text file, that might constitute content independent of data.

With all that said, the structure of data is not exactly black and white.

About Shane K Johnson

Technical Marketing Manager, Red Hat Inc.


4 Comments on “The Structure of Big Data”

  1. Rick Wagner Says:

    Red Hat (through long-time expatriates from a donor entity) has known about Big Data since before it was called Big Data. Several of our engineers contributed to the design and construction of what is arguably the largest demographics database in the world.

    Rick, GSS engineer and credited contributor to “Data Engineering: Mining, Information and Intelligence (International Series in Operations Research & Management Science)”


