Big Data is a “Wild Wild West” movement, with companies in every industry trying to figure out whether, and how, to invest in these supercharged analytics projects. The CEO wants the game-changing results touted in the press. Yet the investment in IT and human resources carries significant risk. If the CIO gets it wrong, a company could be out millions of dollars.
First, CIOs and IT managers should step back, evaluate the broader goals of the Big Data program, and then determine whether and how new technology can help meet those goals. Here are some questions to consider when designing a Big Data system:
Real-time or not?
Do you need real-time responses (on the order of seconds) from the Big Data system, or insights that are “batch processed” and reported at defined intervals, anywhere from several minutes to hours? Do you need both?
Although many vendors say that their system will deliver results in “real time,” CIOs need to validate this claim under realistic system conditions: while data is being loaded, while many concurrent jobs are running, and while many users are accessing the system at once.
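One way to check a “real time” claim is to time the same query while a growing number of simulated users hit the system at once. The following is a minimal Python sketch under stated assumptions, not a benchmark tool: placeholder_query is a hypothetical stand-in for a call to whatever system is being evaluated, and the two-second budget and the user counts are illustrative values only.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(query_fn):
    """Run one query and return its latency in seconds."""
    start = time.perf_counter()
    query_fn()
    return time.perf_counter() - start


def measure_latencies(query_fn, concurrent_users, queries_per_user):
    """Issue queries from many simulated users at once and collect latencies."""
    total = concurrent_users * queries_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(timed_call, query_fn) for _ in range(total)]
        return [f.result() for f in futures]


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    index = max(0, math.ceil(fraction * len(sorted_values)) - 1)
    return sorted_values[index]


def summarize(latencies, real_time_budget_s):
    """Report median, 95th percentile and worst case against a latency budget."""
    ordered = sorted(latencies)
    p95 = percentile(ordered, 0.95)
    return {
        "count": len(ordered),
        "p50_s": percentile(ordered, 0.50),
        "p95_s": p95,
        "max_s": ordered[-1],
        "meets_budget": p95 <= real_time_budget_s,
    }


def placeholder_query():
    # Hypothetical stand-in: replace with a real query against the
    # system under evaluation.
    time.sleep(0.01)


if __name__ == "__main__":
    # Illustrative load levels; repeat the run while data loads and
    # batch jobs are active to cover the other conditions.
    for users in (1, 10, 50):
        latencies = measure_latencies(placeholder_query, users, 5)
        print(users, summarize(latencies, real_time_budget_s=2.0))
```

Comparing the 95th percentile rather than the average across load levels shows whether response times stay within the budget for most users as concurrency grows, which is usually where a “real time” claim breaks down.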
What is the end goal?
Do you need the ability to continuously iterate on the analytic models and perform exploratory investigation of data as the business comes up with new questions? Or is the business looking to obtain well-understood metrics, such as clickstream analysis? The former will require a more flexible, customized system and trained data scientists, whereas the latter is something that a packaged solution or service provider can fulfill.
Do I need a Big Data Platform or a Big Data Solution?
Typically, if business leaders are looking for information about a specific area, such as customer retention or clickstream analysis, you can purchase a Big Data solution that bundles the platform and the analytics into one integrated system. This saves the time and money you would otherwise spend building and integrating the pieces yourself.
If the business is looking for an exploratory analytic system, you’ll need to invest in a Big Data platform that offers the tools to store and process the data but doesn’t provide insights out of the box. Hadoop is one such platform: it gives you parallel analytic processing and distributed, redundant storage, but you still need to build (or buy) and integrate the other components, such as ETL, analytics and reporting.
Here’s another way to look at the technology choices:
Best of Breed, aka Platform: A company can acquire each technology component from the best vendor in its category and build the applications itself. This is a costly but highly flexible and powerful option for a company that is serious about Big Data.
Packaged applications: A company can purchase a pre-integrated Big Data suite from a single technology vendor. This is a cost-effective choice when the business is asking for insights into one area, such as online customer behavior.
Software as a Service: SaaS vendors will collect and store the data, house the infrastructure and do all the heavy-lifting analytics and data management for you, avoiding the hefty capital investment required in hardware, software and staff.
These aren’t the only questions to consider, of course. Staffing needs, process changes and metrics programs are also critical. But building the right technology foundation from the get-go will help CIOs build a framework for success with any Big Data project that comes their way.
Molly Stamos is Director of Products at Boundary.