Today we’re going to talk about Big Data, and what to do with it. Basically, Big Data is all the thousands or millions of facts and figures about your business that you’re starting to collect. Clicks on ads, retweets, the fact that the blue socks sell better on Tuesday evenings – all those factoids add up.

Hadoop is a software framework for dealing with all of that data. It’s an open-source project from the Apache Software Foundation, and its original components were created by Doug Cutting, based on a 2004 paper describing Google’s MapReduce data processing system.

Hadoop is good for storing and retrieving large amounts of data in parallel across distributed (read: cloud) storage devices. So, it’s great for storing your customers’ non-sensitive profile information, all of your transaction histories, etc. It’s also great for cloud computing: it seamlessly handles data that’s spread across multiple physical storage devices, and it’s designed to keep chugging along even if a significant number of its servers are knocked out. It’s not so great for performing complex calculations on small amounts of data, and it’s definitely not great for things like financial transactions, which can get messed up if they’re executed in parallel. But it can store all the information related to those financial transactions, so you can do things like track spending patterns to identify fraudulent purchases.
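To make the MapReduce idea a little more concrete, here’s a minimal sketch in plain Python. It doesn’t use Hadoop or any of its APIs; it just imitates the three steps (map, shuffle, reduce) on a handful of made-up transactions. The customer names, the amounts, and the function names are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, amount_in_dollars).
TRANSACTIONS = [
    ("alice", 20.0),
    ("bob", 15.0),
    ("alice", 30.0),
    ("bob", 25.0),
    ("alice", 50.0),
]


def map_phase(record):
    """Turn one raw record into (key, value) pairs."""
    customer, amount = record
    yield customer, amount


def shuffle(pairs):
    """Group every value under its key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(customer, amounts):
    """Collapse all values for one key into a single result."""
    return customer, sum(amounts)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(TRANSACTIONS))
```

Each call to `map_phase` looks at only one record, and each call to `reduce_phase` looks at only one customer, so in principle the work can be split across many machines. Here everything runs in a single process, which is fine for five records and nothing like a real cluster.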

So, good stuff! Venturing into the world of Big Data? Keep track of it all with Hadoop.

Head over to Apache’s website for all the technical details.

By Sharon Campbell