Cloud computing helps organizations store, manage, share, and analyze their Big Data in an affordable and easy-to-use way. Today’s cloud Infrastructure-as-a-Service (IaaS) providers such as Microsoft, GoGrid, Amazon, Google, Rackspace and Slicehost, supported by the on-demand analytics solution vendors, make Big Data analytics very affordable.

Most corporate enterprises don’t fully leverage their data. Data is usually locked in multiple databases and processing systems throughout the enterprise. However, an aggregate view of all the data is sometimes needed to answer tough questions from customers or analysts. Most importantly, by analyzing their Big Data for trends, statistics, and other actionable information, companies can uncover insights that help them decide on their next move and grow their business.

Take, for example, Google, whose success can be attributed primarily to its ability to analyze large amounts of data. Google developed a software framework called MapReduce to support the processing of large distributed data sets on clusters of computers. MapReduce has the advantage of handling both structured and unstructured data. A paper published by Google engineers, “MapReduce: Simplified Data Processing on Large Clusters,” describes how MapReduce works. As a result of this paper, many open source implementations of MapReduce have emerged from 2004 to the present. One example is Hadoop, an infrastructure that helps in the construction of reliable, scalable, distributed systems.

So how do Hadoop, MapReduce, and cloud computing come together to solve the Big Data problem?

Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using MapReduce. Think of MapReduce as the engine that brings speed and agility to the Hadoop platform. With MapReduce, developers can create programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. Hadoop allows enterprises to easily explore this complex data using custom analyses tailored to their information and questions.
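To make the MapReduce idea concrete, here is a minimal single-machine sketch of the programming model in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are made up for illustration. A real cluster would run the map and reduce steps in parallel on many servers, while this sketch runs them one after another.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for one record of unstructured text.
documents = ["big data in the cloud", "the cloud scales", "Big clusters"]
counts = word_count(documents)
print(counts)
```

Because each call to `map_phase` looks at only one record and each call to `reduce_phase` looks at only one key, the work can be split across many machines without the pieces needing to talk to each other, which is what lets a framework like Hadoop spread it over a cluster.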

Hadoop runs on a collection of commodity, shared-nothing servers. Hadoop self-restores, meaning you can add or remove servers in a Hadoop cluster at will; the system detects and compensates for hardware or system problems on any server. It can deliver data — and run large-scale, high-performance processing jobs — in spite of system changes or failures.

Many tools are built using Hadoop as the foundation, including open source support tools such as Thrift and Clojure and dozens of commercial solutions such as Appistry, Cloudera, Goto Metrics, Karmasphere, and Talend. In addition, the three main database vendors (IBM, Microsoft, and Oracle) all support Hadoop interaction in different ways.

However, the Big Data problem is not just about the size of the data; it is also about performance and how fast the data can be processed.

Businesses also have another path to Big Data analytics: the cloud. Cloud services for Big Data are popping up, offering platforms and tools to perform analytics quickly and efficiently.

Take, for example, a cloud computing platform like Amazon EC2: you can rent virtual Linux servers and then install the open source Hadoop on them to establish the cloud computing framework.

In this setup, Amazon EC2 plays the role of the IaaS layer and provides users with virtualized hosts. IaaS is the leasing of infrastructure as a service, with specific quality-of-service constraints, that can run particular operating systems and software. Platform-as-a-Service (PaaS) focuses on the software framework or services that expose APIs for computing on that infrastructure. Hadoop plays the role of the PaaS layer and is built on the virtualized hosts as the cloud computing platform. However, Hadoop is not restricted to VMs hosted by a particular vendor; you can also deploy it on a standard Linux OS on physical machines.

In conclusion, as costs fall and companies think of new ways to correlate and analyze data, Big Data analytics will become more common. Small businesses will especially benefit, since they can now manage and analyze Big Data at low cost. Recall that Google and Facebook were both once small companies that leveraged their data to grow significantly. No wonder many of the foundations of Big Data came from the methods these businesses developed.
