A recent report from research firm IDC shows that companies combine Hadoop with other databases to perform big data analysis. A significant proportion of survey respondents said they use Hadoop to replace traditional data warehouse technologies. Uses range from analyzing raw data, whether operational data or data from machines, terminals, or point-of-sale systems, to analyzing customer-behavior data collected by e-commerce retail systems.
Hadoop was designed specifically for analyzing large data sets and for building scalable, distributed applications. The Apache Software Foundation's announcement of Hadoop 2.2 last October marks the first stable release of the Hadoop 2 line. It moves the platform even deeper into organizations as they pursue unstructured and semi-structured data.
To manage big data, Hadoop implements the MapReduce paradigm described by Google. Under this model, a job is divided into many small tasks, and each task can run on a different node of the cluster.
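The map, shuffle, and reduce flow behind this paradigm can be illustrated with the classic word count. What follows is a minimal single-process Python simulation, not Hadoop's Java API. Threads stand in for cluster nodes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce task: combine all values collected for one key."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Each split is handled by an independent map task; the thread pool is a
    # stand-in for the separate nodes a real cluster would use.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_phase, splits)
        intermediate = [pair for chunk in mapped for pair in chunk]
    grouped = shuffle(intermediate)
    # Reduce each key independently; sorting keeps the output deterministic.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = ["big data big clusters", "data moves to code", "big ideas"]
    print(word_count(splits))
    # {'big': 3, 'clusters': 1, 'code': 1, 'data': 2, 'ideas': 1, 'moves': 1, 'to': 1}
```

In real Hadoop, the framework rather than user code performs the shuffle and sort between the two phases. The developer supplies only the map and reduce logic.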
One of the most important components in Hadoop 2.0 is YARN (Yet Another Resource Negotiator), which is often referred to as MapReduce 2.0 or MRv2. Compared to MapReduce 1.0, YARN has the advantage of separating cluster resource management from the actual processing algorithm. As a result, MapReduce becomes a plug-in that runs on top of YARN rather than a built-in part of the platform. This is considered a milestone in Hadoop's development from a simple tool into a complete operating system for big data.
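The separation between resource management and processing logic can be shown with a toy Python model. `ToyResourceManager`, `ToyApplication`, and `run_all` are hypothetical names invented for this sketch and are not YARN's actual Java API. They only borrow YARN's vocabulary of a resource manager, containers, and a per-application master. The resource manager hands out capacity and knows nothing about the computation, so a MapReduce-style word count is just one plug-in sharing the cluster with an interactive-style query.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for YARN's ResourceManager. It tracks only cluster
    capacity (abstract "containers") and knows nothing about processing logic."""
    total_containers: int
    allocations: Dict[str, int] = field(default_factory=dict)

    def free(self) -> int:
        return self.total_containers - sum(self.allocations.values())

    def request(self, app_id: str, containers: int) -> int:
        """Grant up to `containers` to an application and return how many were granted."""
        granted = min(containers, self.free())
        self.allocations[app_id] = self.allocations.get(app_id, 0) + granted
        return granted

    def release(self, app_id: str) -> None:
        """Return all of an application's containers to the shared pool."""
        self.allocations.pop(app_id, None)


@dataclass
class ToyApplication:
    """Hypothetical application plug-in: it brings its own processing logic,
    loosely the role YARN assigns to a per-application ApplicationMaster."""
    app_id: str
    containers_wanted: int
    logic: Callable[[List[str]], object]


def run_all(rm: ToyResourceManager, apps: List[ToyApplication], data: List[str]) -> Dict[str, object]:
    # Allocate first, so every application holds containers on the cluster at the same time.
    grants = {app.app_id: rm.request(app.app_id, app.containers_wanted) for app in apps}
    results: Dict[str, object] = {}
    for app in apps:
        if grants[app.app_id] > 0:  # an app that received no capacity does not run
            results[app.app_id] = app.logic(data)
    for app in apps:
        rm.release(app.app_id)
    return results


if __name__ == "__main__":
    rm = ToyResourceManager(total_containers=8)
    apps = [
        # Batch, MapReduce-style job: count words across all lines.
        ToyApplication("wordcount", 5, lambda lines: dict(Counter(w for line in lines for w in line.split()))),
        # Interactive-style query: return the lines that mention a term.
        ToyApplication("query", 2, lambda lines: [line for line in lines if "yarn" in line]),
    ]
    data = ["hadoop yarn", "hdfs stores data", "yarn schedules containers"]
    print(run_all(rm, apps, data))
    print(rm.free())  # 8: every container is back in the pool
```

The design choice mirrors the point of the paragraph. Swapping in a new processing model means writing a new `ToyApplication`, while the resource manager stays unchanged.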
Analysts at research firm Gartner say Hadoop 2 marks a significant step forward for the open-source project that passionate developers in the Apache community have built together. Their main goal was to make the data platform easier to use and more stable. The new architecture centered on YARN allows multiple applications to run simultaneously on HDFS, Hadoop's distributed file system, while providing better monitoring of data throughout its lifecycle.
During a recent Gigaom Research webinar, an analyst and researcher for Gigaom Research said that, according to Hortonworks, YARN is more of a framework. It can analyze not only batch processes but also data streams and interactive queries. Companies like Amazon Web Services, Cloudera, Hortonworks, IBM, Intel, MapR Technologies, Pivotal Software, Twitter, Facebook and others are framing their big data messaging around Apache Hadoop technology and offering insight into where the market is headed. NASA relies on Hadoop to handle large volumes of data in projects such as the Square Kilometer Array, a radio telescope project for observing the sky. Hadoop partners Cloudera and Hortonworks have already endorsed the new version and adapted their products to the new YARN framework.
Hadoop 2.2 changes not only the way applications for the big data platform are written. It also makes possible entirely new data-crunching methods that were previously unthinkable because of architectural limitations.