Big Top Data and the Yellow Elephant

Introducing "big data": what it is, how it works and what it could mean to you

by Dan Dubriwny

It’s a circus out there, and almost everyone is talking about big data.

The term "big data" implies that, until now, all of our data was small, which is utterly ridiculous. Big data has been around for a long time; what is changing are the tools. Big data platforms give us the capability to ingest, store and query more data than ever before, with data sets ranging from a few terabytes to hundreds of petabytes.


Big data, weird data

Not only is the data big, it's weird. It isn't nice and neat, or easily transformed into rows and columns. It's messy and noisy, and it doesn't play well in standard data environments. It can come from all kinds of sources, such as Web clickstreams, networks, sensors, videos, blogs and social media. How do you make sense of it? How do you store it, search it and combine it? It's a three-ring circus. Perhaps we should call it Big Top Data!


In the summer of 2010, I started learning about Hadoop (the Yellow Elephant). People thought I was nuts. That opinion hasn't changed much. I'm still nuts, but now I can tell you that Hadoop was created by Doug Cutting, chairman of the Apache Software Foundation, who named it after his son's toy elephant. The name was also thought to be easy to pronounce and simple to find using a search engine.


The first time I said "Hadoop" in a presentation, a friend of mine said "bless you." It's not so funny anymore. Hadoop is a framework for massively parallel storage and processing that runs on clusters of inexpensive commodity hardware, replicating data across machines so that a single hardware failure does not lose it, and it is catching on. IBM is extending Hadoop with the capabilities that enterprise deployments require, reducing risk for customers who are executing a big data strategy. More importantly, we are focused on analytics that run directly in Hadoop. It is those analytics that will turn insights that never existed before into business value.


Just three Vs

How can you tell if a project would benefit from big data technology? Use the concept of the three Vs: Volume, Velocity and Variety.


  • Volume is obvious. If a project requires hundreds of terabytes or multiple petabytes, big data should be considered.
  • Velocity is not defined by real time alone; what matters is how quickly answers are needed and how long they remain relevant.
  • Variety is the messy and weird nature of big data. If the data is unstructured and/or not derived from traditional online transaction processing (OLTP), it is most likely a candidate for big data.


If a project exhibits one of the Vs, you should consider a big data solution. If it exhibits two, the Big Top Yellow Elephant should certainly be on your short list.
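The three-V rule of thumb is simple enough to write down as a checklist. The sketch below is one hypothetical way to encode it in Python. The thresholds (100 TB for Volume, 60 seconds for Velocity) and the field names are illustrative assumptions chosen for the example, not figures from any IBM product or methodology.

```python
from dataclasses import dataclass

# Illustrative assumptions: the article only says "hundreds of terabytes" for
# Volume and gives no number for Velocity, so both cutoffs are hypothetical.
VOLUME_THRESHOLD_TB = 100.0
VELOCITY_THRESHOLD_SECONDS = 60.0


@dataclass
class ProjectProfile:
    """Hypothetical description of a candidate project."""
    data_volume_tb: float            # total data the project must handle, in terabytes
    answer_relevance_seconds: float  # how long an answer stays useful once produced
    mostly_unstructured: bool        # text, video, sensor feeds, clickstreams, etc.
    from_oltp: bool                  # derived from traditional transaction processing?


def count_vs(p: ProjectProfile) -> int:
    """Count how many of the three Vs a project exhibits."""
    volume = p.data_volume_tb >= VOLUME_THRESHOLD_TB
    # Velocity: answers lose their value quickly, so they must be produced fast.
    velocity = p.answer_relevance_seconds <= VELOCITY_THRESHOLD_SECONDS
    # Variety: unstructured and/or not derived from OLTP systems.
    variety = p.mostly_unstructured or not p.from_oltp
    return sum([volume, velocity, variety])


def recommendation(p: ProjectProfile) -> str:
    """Apply the rule: one V means consider it, two or more means short-list it."""
    n = count_vs(p)
    if n >= 2:
        return "short-list Hadoop"
    if n == 1:
        return "consider big data"
    return "traditional tools likely suffice"


if __name__ == "__main__":
    social_listening = ProjectProfile(500, 30, True, False)
    monthly_ledger = ProjectProfile(2, 86_400, False, True)
    print(recommendation(social_listening))  # short-list Hadoop
    print(recommendation(monthly_ledger))    # traditional tools likely suffice
```

A real assessment would weigh each V with judgment rather than a hard cutoff; the point of the sketch is only that the rule counts independent signals instead of requiring all three.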


Big data in application: social media

To illustrate a big data application, let's examine how social media can be used to monitor and predict consumer behavior. Every 24 hours there are 12 terabytes of tweets and 10 terabytes of Facebook entries. AT&T pumps 24 petabytes of data through its network on a daily basis. Would you bet that big value can be derived from that information? In a lab in a not-so-secret location in California, a big data platform is analyzing just a fraction of the social media data from Twitter, Facebook, blogs and forums. All of the information is public, so no private information is mined. After the system had run for a few months, analyzing the text of each entry, it had created more than 65 million profiles in North America alone. The methods used to extract those profiles can be tuned for just about any industry.


Add real-time analysis, and the system can predict consumer behavior, from intent to commit fraud to intent to purchase. The profiles are also used to identify who is influential within their social networks, which helps predict who will bring pre-referenced prospects to just about any product or service. The same technology and methods can be used to monitor reputational risk. The key is fast processing of vast amounts of information, not just to determine "likes" and "dislikes," but to understand why consumers think and act for or against a brand.


Waking up and asking questions

Retailers are waking up to social media in a big way. Augmenting the information that credit card processors already have with social media profile extensions and behavioral analysis is a business expansion area that will not be ignored. Retailers want this information now; either someone will step in to provide it, or they will build it themselves.


Another example: a credit card processor knows when a retail store credit card is close to its limit. With a big data platform, that credit card processor can also predict customer loyalty. Listening to and analyzing social media provides those insights quickly enough to act on them. Banks that issue prepaid cards are facing new restrictions and regulations. New insights from social media could help them target the right customers, and credit card processors could be the hub of this new analysis.


Exploring how social media impacts your business should be a priority. The key word is "exploring." Big data systems offer a place for exactly that: exploring seemingly unrelated and impenetrable data with new techniques, processes and technology to find insights that turn into unique value. That exploration will be the basis for new and innovative businesses. What prompted head scratching and chin rubbing only months ago is now becoming a strategic initiative for an ever-increasing number of organizations, especially retailers, banks and other financial institutions. Those initiatives will improve customer service, increase profitability and lay a foundation for responding to competitive demands.


Is it time to join the circus?

About the Author

Dan Dubriwny has more than 31 years of experience in the computer industry and is a member of the IBM Information Management World Wide Big Data Tiger Team, which is responsible for spreading the Big Data story and driving new business solutions with Big Data technology. Dan has deep experience in Business Intelligence, Data Warehousing and Analytical Applications.