Many business decision makers hear stories about how Big Data analysis can improve business results, yet hesitate to join the party. It may be that they don’t fully grasp the potential benefits of handling massive amounts of structured and unstructured data, which simply cannot be managed effectively with traditional databases and software. Or expense trepidation may be a factor; perhaps they fear that Big Data must be code for “Big Cost”.
The reality is that companies of all sizes can use Big Data analytics to make better decisions, control risk, discover new product opportunities, and improve customer service – just to name a few. And the expense of getting started need not be large. Part of the TDK consulting process begins by urging clients to start small and focus on results.
“Big Data management is about collecting, processing, and utilizing data from various domains so it can be analyzed to uncover patterns and trends,” said Larry Steele, TDK Director of Data Solutions. “Then repeatable algorithms are created that look at how data is being utilized across various solutions to proactively predict and act on it in an automated way. You can also simply streamline basic reporting for up-to-the-minute information on revenue and costs to make informed decisions that quickly affect bottom-line performance.”
Options for Getting Started with Big Data Analysis
Starting small is possible because Big Data can be stored and accessed efficiently using commodity-style hardware and inexpensive open source software like Hadoop. Once Hadoop is in place, established analytics tools such as SAS and Business Objects make predictive analytics across vast and varied data types possible.
The first step in using Hadoop is to understand the infrastructure cost options, including data storage. There are three main data storage options, outlined below along with the basic elements of an entry-level approach suitable for nearly any market segment.
- On-premises infrastructures involve an upfront capital investment of roughly $7,500 to $12,000 and require enough on-site space for the data, using the following off-the-shelf hardware:
- 4 servers minimum
- 6 terabytes (TB) of local storage space on each server
- 1 gigabit (Gb) network interface card
- High performance switch
- High performance firewall
- A data security strategy so users and programs have appropriate data access
- A well-trained staff to support the infrastructure and Hadoop
- Cloud-based infrastructures have the same or similar configuration as on-premises set-ups and are favored by many integrated data solutions experts due to their quick availability, monthly operational expenses of $4,000 to $10,000 (depending on the cloud provider), and lack of long-term commitments. Overall, this is a good option for proof-of-concept or short-term projects.
- Managed hosting infrastructures also have similar configurations. However, the equipment and data are dedicated to the customer, stored at an off-site facility, and managed by the hosting company. Costs run approximately $10,000 to $15,000 per month on a fixed 36-month term, which essentially outsources all of the infrastructure, hardware, and software with 24x7 support.
“In the short term, to get off the ground, the cloud may be the most economical way to go,” Steele said. “But if you are going to leave data in the cloud over a long period of time and add sophistication, bringing data in house may prove to be more economical than continuously paying cloud fees.”
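Steele’s point can be sketched with simple arithmetic using the midpoints of the cost ranges above. This is an illustration only: it ignores on-premises staffing, power, and maintenance costs, which in practice push the real crossover further out.

```python
# Back-of-the-envelope break-even: cumulative cloud fees vs. the
# on-premises upfront investment, using the midpoint of each range
# quoted above. (Illustrative only; omits ongoing on-prem costs.)

onprem_upfront = (7_500 + 12_000) / 2   # midpoint of the $7,500-$12,000 capex range
cloud_monthly = (4_000 + 10_000) / 2    # midpoint of the $4,000-$10,000 monthly range

months = 0
while months * cloud_monthly < onprem_upfront:
    months += 1

print(f"Cumulative cloud fees pass the on-prem upfront cost in month {months}")
```

Because cloud fees never stop while the on-premises purchase is one-time, the longer the project runs (and the more the on-prem figure amortizes), the stronger the case for bringing data in house becomes.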
The Impact of a Big Data Investment
Investing in a foundation for Big Data analysis opens the door to new business insights by providing a way to analyze both large amounts and many kinds of data. A McKinsey & Company report showed that in 15 of the U.S. economy’s 17 sectors, companies with 1,000 or more employees store, on average, more than 235 terabytes of data. This includes market segments like healthcare, banking, financial services, manufacturing, construction, and retail. Those 235 terabytes represent more information than is contained in the entire Library of Congress! A Big Data infrastructure provides a pathway for businesses to “do something” with all that information.
The different types of data can be classified in the following way:
- Structured data is located in a fixed field within a record or file (database, mobile apps, RFID, voice/video/e-mail)
- Unstructured data is information that lives outside traditional rows and columns (images, web code, documents, Twitter)
- Third-party data (GPS, weather, stock market updates)
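A minimal sketch of why these classes are handled differently, using invented sample records: structured fields can be read directly by name, while unstructured text must be parsed before it yields anything analyzable, and third-party feeds arrive in whatever format the provider chooses.

```python
import re

# Structured data: values live in fixed, named fields and can be read directly.
order = {"customer_id": 1042, "amount": 59.95, "region": "NE"}
print(order["amount"])  # direct field access: 59.95

# Unstructured data: meaning must be extracted before analysis is possible.
tweet = "Loving the new espresso maker from @AcmeBrew! #coffee #happycustomer"
hashtags = re.findall(r"#\w+", tweet)
print(hashtags)  # ['#coffee', '#happycustomer']

# Third-party data: an external feed in the provider's own layout.
weather_feed = "2014-06-01,Boston,72F,sunny"
date, city, temp, sky = weather_feed.split(",")
print(city, temp)  # Boston 72F
```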
Before Big Data, analysis was confined to reviewing structured data in relational databases. But with a Big Data solution in place, information that can’t be placed neatly inside the rows and columns of a database with clearly defined relationships can still be analyzed to provide useful insight. For example, Hadoop can be configured with programs capable of reviewing unstructured data from new media sources like Twitter to make predictions about behaviors that drive customer revenue.
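As a rough illustration of the Twitter example (the tweets and hashtags below are invented), a Hadoop-style analysis boils down to a map step that extracts features from raw text and a reduce step that aggregates them. The sketch below runs that same map/reduce logic locally in plain Python; under Hadoop Streaming, equivalent scripts would read from stdin and write to stdout across a cluster.

```python
import re
from collections import Counter

def map_hashtags(tweet_line):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    return [(tag.lower(), 1) for tag in re.findall(r"#\w+", tweet_line)]

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each hashtag."""
    totals = Counter()
    for tag, count in pairs:
        totals[tag] += count
    return totals

# Hypothetical raw tweets; in production these would stream in from a feed.
tweets = [
    "Just tried the new menu #yum #foodie",
    "Great service today! #happycustomer #yum",
    "Long wait at checkout #fail",
]

pairs = [pair for line in tweets for pair in map_hashtags(line)]
totals = reduce_counts(pairs)
print(totals.most_common(2))  # '#yum' surfaces as the most frequent tag
```

The same pattern scales from three tweets on a laptop to millions of tweets on a Hadoop cluster, which is precisely what makes unstructured sources analyzable at all.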
The best approach to handling the differences in data types is to be selective. Choosing the right data and loosely coupling it to existing systems can have a profound impact. Companies that have done so are beginning to organize research & development, product execution, sales, and other business functions around data.
“Companies that have taken a foray into harnessing and using data have seen results to the point where business strategies are being changed. That’s a pretty radical shift,” Steele said. “It’s been interesting to hear the dialogue and watch the transformation of people who have been doing this for 20-plus years. They used to believe that only information in databases with pre-defined relationships could be analyzed. But it doesn’t need to be that way. The world works differently now.”
Setting up the infrastructure and then extracting, transforming and loading data into Hadoop can seem daunting. Don’t let that keep you from investing in the infrastructure for a Big Data management plan. It is possible to start small. And businesses that compile varying types and amounts of information reap the benefits of analyzing the data to improve profitability.