Recently the Hortonworks announced that its solution for Big Data the Hortonworks Data Platform (HDP) is certified in Google Cloud Platform. From now on, the Hortonworks Data Platform can be used with the cloud infrastructure of Google, using the resources of Google Compute Engine and Google Cloud Platform to store, search and analyze massive data sets.
Users can now quickly perform data analysis, among other integration thanks to the Spark in-memory processing framework and Apache Kafka technology. Hadoop on Google Cloud Platform is a set of setup scripts and software libraries optimized for Google infrastructure, bringing ease of use and high performance to data processing with Hadoop.
Hortonworks in a blog post said that with this new certification, enterprises worldwide can dynamically provision HDP clusters on Google Compute Engine and Google Cloud Storage to store, discover and analyze a unified collection of structured and unstructured information assets. With Google Cloud Platform and Hortonworks Data Platform, enterprises benefit from limitless scalability and an enterprise-grade platform backed by community driven open source innovation.
The engineering collaboration work include integrating “bdutil”(command-line script used to manage Apache Hadoop instances on Google Compute Engine) with the Apache Ambari (Hadoop management project) plugin to provision and manage infrastructure, and a Google Cloud Storage connector for HDP.
The team also worked with Apache Ambari Blueprints API to deliver a simple and streamlined provisioning experience for the end user,” says the company. Key highlights of the joint solution include source code available for use & open contribution on GitHub with Apache License v2 and Google’s “bdutil” with Apache Ambari plugin to provision infrastructure and fully configure the Hadoop cluster.
In addition, Hortonworks also announced the creation of the Data Governance Initiative (DGI). DGI will develop an extensible foundation that addresses enterprise requirements for comprehensive data governance. The company will partner with the other DGI members to build a completely open data governance foundation that meets enterprise requirements. DGI members will work with the open source community to deliver a comprehensive solution; offering fast, flexible and powerful metadata services, deep audit store and an advanced policy rules engine.
According to an earlier report published by Allied Market Research, the worldwide market for Hadoop set to grow to $50.2 billion by 2020 from $2.0 billion in 2013 with a CAGR of 58.2% between 2013 and 2020. The Hadoop market is driven by the exponential growth in the volume of unstructured data from big data analysis and the ability to access data at high speed with reduced costs compared to traditional RDBMSs.
In October last year, Microsoft added support of real-time analytics for Apache Hadoop in Azure HDInsight, Apache Storm and new machine learning capabilities in the Azure Marketplace. Hortonworks announced that the next version of the HDP will include a hybrid data connection, which will allow customers to extend their Hadoop-deployment in Azure and use the cloud for backup, testing and scaling.