Organizations have been collecting, processing and storing data in a variety of formats for over fifty years.  Early on, computing resources were scarce, complex and expensive.  Over time, Moore’s Law proved to be correct, and processors, memory and storage all became bigger, faster and cheaper every two years or so.  As storage became cheaper, many organizations built data warehouses (central repositories of data), so they could begin to evaluate their business based upon what happened yesterday.  These data warehouses have typically been based upon the facts and figures (quantitative data) captured in various siloed applications.  They have been expensive to build and hard to maintain, and they require specialized resources to implement.

Here's where it starts getting interesting.  Consumers have spent the last twenty-five years purchasing desktops, laptops, tablets, MP3 players and video set-top boxes so they could connect them to the Internet and store more and more data.  They buy every next-generation smartphone to stay connected to work, family and friends 24/7, 365 days a year.  People have no problem facebooking their latest selfies and tweeting about their love for their latest purchase (qualitative data).  And companies are retaining all of it.

Data Hoarders

Businesses now collect and keep data on everything, and they never delete it!  What if A&E Hoarders walked into your business and told you they were doing a story on data hoarding; how would you respond?

Where are businesses going with all this data?  Technology that was scarce, complex and expensive has become plentiful, simple and cheap.  Data sets that are related to one another can be loosely coupled and integrated so Data Scientists or Business Analysts can discover relationships they didn’t know existed.  New technologies have emerged and matured over the last decade that can combine quantitative information and qualitative information into extremely beneficial insights which are predictive and actionable.

No Slow Down in Sight

The amount of data businesses capture and store doesn’t look like it’s slowing down anytime soon.  In fact, it continues to gain momentum.  The latest Internet marketing buzz, the "Internet of Things" (IoT), suggests tens of billions of things (embedded devices, including smart objects) will be connected via Internet Protocol by 2020.  Additionally, these things will transfer information back to an infrastructure at specifically defined intervals.

Blockbuster: Thingy – The Movie

To tie all these devices on the IoT together, let’s create a fictitious movie called Thingy.  The movie is based on the shenanigans of the main character, Thingy, and Thingy's friends Red Fox, Red Robin and Blue Fish.

The movie is the box office success of the year.  Consumers (parents and kids) are tweeting their likes and dislikes about the characters.  Based upon these tweets, the movie studio learns that all consumers like Thingy, parents like Red Fox and Red Robin, and kids like Blue Fish.  To the business, this information provides insights into sales and marketing campaigns (e.g., which characters will sell faster) and can be integrated into the overall supply chain.

Thingy is loved by all, and parents are prepared to purchase Thingy products for their kids.  Blue Fish has a rough side, though, and parents are tweeting about Blue Fish's over-the-top antics.  They won’t support their kids having Blue Fish items.  All of this data is stored in various legacy and modern applications and has been integrated to provide these types of insights.  Data Scientists will translate this simple example into various forms of predictive analytics, drawing on what they already know about the life cycle of their existing products and services.  These insights could predict how much blue dye will be needed to mold specific Blue Fish products, for example, and how many more Thingys should be kept in inventory.
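
To make the tweet example a little more concrete, here is a minimal sketch in Python.  Everything in it is made up for illustration: the tweets are hand-labeled tuples rather than real Twitter data, the names net_sentiment and likely_sellers are hypothetical, and the rule "parents are the buyers" is an assumption taken from the story above, not a real forecasting method.

```python
from collections import defaultdict

# Hypothetical, hand-labeled tweets: (audience, character, sentiment),
# where sentiment is +1 for a like and -1 for a dislike.
TWEETS = [
    ("parent", "Thingy", 1),
    ("kid", "Thingy", 1),
    ("parent", "Red Fox", 1),
    ("parent", "Red Robin", 1),
    ("kid", "Blue Fish", 1),
    ("kid", "Blue Fish", 1),
    ("parent", "Blue Fish", -1),
    ("parent", "Blue Fish", -1),
]


def net_sentiment(tweets):
    """Sum the sentiment for each (character, audience) pair."""
    totals = defaultdict(int)
    for audience, character, sentiment in tweets:
        totals[(character, audience)] += sentiment
    return dict(totals)


def likely_sellers(tweets, buyer="parent"):
    """Characters with positive net sentiment among the buying audience."""
    totals = net_sentiment(tweets)
    return sorted(
        character
        for (character, audience), score in totals.items()
        if audience == buyer and score > 0
    )


if __name__ == "__main__":
    print(likely_sellers(TWEETS))  # ['Red Fox', 'Red Robin', 'Thingy']
```

Blue Fish drops off the list because the parents, who hold the wallet in this story, net out negative on him even though the kids like him.  A real pipeline would score sentiment automatically and feed the result into inventory planning, but the shape of the question is the same.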

Let’s add one more twist to the characters.  Thingy was sold with an embedded, Internet-enabled radio frequency identification (RFID) tag and a couple of sensors inside the plastic body.  Over Thingy’s life span, a bunch of data is collected: where he was purchased, when he was first turned on, how often he was picked up, how many times he got wet, what time of day he was played with most, where he is located, where he has visited...  The amount of information goes on and on.  Every bit of this data is extremely useful to the business as they evaluate their next toy: which features were important and which ones weren’t.  This information also becomes very useful to toy collectors 100 years from now.  This toy has a story, and it can be correlated with the stories of all the other toys, so an even bigger story can be told and predictions can be made about the future.
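
As a rough sketch of what Thingy's data might look like once it lands, here is a tiny Python example that rolls one toy's event log up into a few of the facts listed above.  The event names, the log format and the summarize function are all assumptions made for illustration; a real toy would report whatever its sensors and firmware support.

```python
from collections import Counter

# Hypothetical event log for one toy: (event_type, hour_of_day).
EVENTS = [
    ("powered_on", 9),
    ("picked_up", 9),
    ("picked_up", 16),
    ("got_wet", 16),
    ("picked_up", 16),
    ("picked_up", 19),
]


def summarize(events):
    """Count pickups and soakings, and find the hour with the most pickups."""
    counts = Counter(event for event, _ in events)
    pickup_hours = Counter(hour for event, hour in events if event == "picked_up")
    # most_common(1) returns a list like [(hour, count)]; empty if no pickups.
    busiest_hour = pickup_hours.most_common(1)[0][0] if pickup_hours else None
    return {
        "pickups": counts["picked_up"],
        "times_wet": counts["got_wet"],
        "busiest_hour": busiest_hour,
    }


if __name__ == "__main__":
    print(summarize(EVENTS))
    # {'pickups': 4, 'times_wet': 1, 'busiest_hour': 16}
```

Multiply this by every toy sold and every day each one is in use, and it is easy to see how the volume adds up.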

Data Science and The Big Data Deal

Instead of evaluating business functions based upon what happened yesterday, businesses should link and integrate their data sources, amassing information over time.  They should use legacy siloed applications and new applications together, so Data Scientists or Business Analysts can see what’s happening in real time and predict what’s going to happen in the business tomorrow.  Predictive analytics is here to stay.

No matter how you look at it, that’s the Big Data deal!

Big Data is surely a big deal. We are definitely seeing an increase in activity as companies respond to the impact big data has had on their business. For companies of any size, getting meaningful insights from data analytics is an important priority. LexisNexis has open sourced its HPCC Systems big data platform, which represents more than a decade of internal research and development in the big data analytics field. Designed by data scientists, the platform includes built-in libraries for Machine Learning and BI integration, providing a complete integrated solution from data ingestion and data processing to data delivery.