Big data platform vendor MapR has just introduced version 5.0 of its Hadoop distribution. The release is based on version 2.7 of the open source framework for processing very large volumes of data, and it adds support for Docker containers. MapR 5.0 also relies on the YARN resource manager.
This version strengthens the platform's real-time operational capabilities. In particular, it extends the highly reliable data transport framework used by the MapR-DB table replication feature (which allows replication between multiple data centers) so that data can be delivered to external compute engines and kept synchronized in real time.
Compared with other Hadoop distributions, MapR extends the framework in security (data protection, user authentication, disaster recovery) as well as in high availability and performance. Version 5.0 brings further governance improvements: comprehensive auditing of data access, recorded in JSON, and support for Apache Drill views to provide secure access to the data being analyzed.
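Because the audit trail is recorded as JSON, it can be summarized with a few lines of ordinary code. The following is a minimal sketch, not MapR's actual log schema: the one-record-per-line layout and the field names (timestamp, uid, operation, status) are assumptions made for illustration, and the sample records are invented.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line.
# The field names and values are assumptions, not a documented MapR format.
SAMPLE_LOG = """\
{"timestamp": "2015-06-09T10:00:00Z", "uid": 1001, "operation": "READ", "status": 0}
{"timestamp": "2015-06-09T10:00:05Z", "uid": 1002, "operation": "WRITE", "status": 0}
{"timestamp": "2015-06-09T10:00:09Z", "uid": 1001, "operation": "READ", "status": 13}
"""


def summarize(log_text):
    """Count accesses per (uid, operation) and collect records with a non-zero status."""
    counts = Counter()
    failures = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[(record["uid"], record["operation"])] += 1
        if record["status"] != 0:
            failures.append(record)
    return counts, failures


if __name__ == "__main__":
    counts, failures = summarize(SAMPLE_LOG)
    for (uid, operation), n in sorted(counts.items()):
        print(uid, operation, n)
    print("failed accesses:", len(failures))
```

In practice the same kind of summary could be produced with a SQL query over the JSON files, which is the role the article attributes to Apache Drill.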
More and more companies deploy multiple applications on the same Hadoop cluster. In this context, the latest MapR release automates the synchronization of storage, databases and search indexes.
To make Hadoop clusters easier to deploy, the vendor has also included new auto-provisioning templates that set up a cluster as if it were an appliance, without requiring specific hardware. These templates can be deployed using the MapR installer. The available configurations include data lake services, data exploration (interactive SQL with Apache Drill) and operational data analysis (with the MapR-DB NoSQL database).
The Apache project helps with the analysis and running of batch processes and their pipelines, using fast, large-scale computation. The newly announced distribution automatically synchronizes storage, databases and search indexes to enable complex real-time applications. It also adds new auditing capabilities.
MapR Technologies intends to continue its growth in the big data and analytics segment. To that end, the MapR database can now use table replication to synchronize data in real time and make it available to external compute engines. The first supported use case is Elasticsearch, the search platform based on Lucene, which makes it possible to keep full-text search indexes synchronized automatically.
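The idea of pushing table changes to an external search index can be illustrated with a small in-memory sketch. This is not MapR-DB's replication mechanism or the Elasticsearch API: the ReplicatedTable and SearchIndex classes below are hypothetical stand-ins that only show the pattern of forwarding every put and delete to a subscriber so the index stays in step with the table.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index standing in for an external search engine (hypothetical)."""

    def __init__(self):
        self.docs = {}                 # key -> indexed text
        self.terms = defaultdict(set)  # term -> keys containing it

    def apply(self, op, key, text=None):
        # Drop the old postings for this key first, so updates and deletes stay consistent.
        old = self.docs.pop(key, None)
        if old is not None:
            for term in old.lower().split():
                self.terms[term].discard(key)
        if op == "put":
            self.docs[key] = text
            for term in text.lower().split():
                self.terms[term].add(key)

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))


class ReplicatedTable:
    """Toy table that forwards each change to its subscribers (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def put(self, key, text):
        self.rows[key] = text
        self._publish("put", key, text)

    def delete(self, key):
        self.rows.pop(key, None)
        self._publish("delete", key)

    def _publish(self, op, key, text=None):
        for callback in self.subscribers:
            callback(op, key, text)


if __name__ == "__main__":
    table = ReplicatedTable()
    index = SearchIndex()
    table.subscribe(index.apply)
    table.put("r1", "Hadoop cluster audit")
    table.put("r2", "Real-time search on Hadoop")
    table.delete("r1")
    print(index.search("hadoop"))
```

A real deployment would deliver the changes asynchronously and over the network; the sketch calls subscribers synchronously only to keep the example short.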
Last year, MapR and Apache Spark integrated their technologies to offer users round-the-clock support for Spark, so that the solution and related projects can be developed faster and can incorporate more innovative changes. In addition, both sides are working together on rapid development of the software and on complementary new features. This should pay off for MapR customers and the Hadoop community over the coming years.
Recently, Oracle released a new software product designed to help meet big data demands. The product, called Oracle Big Data Spatial and Graph, provides new analytical capabilities for Hadoop and NoSQL. Oracle built it to process data natively on Hadoop and in parallel on MapReduce using in-memory structures.