9 Open Source Big Data Technologies Set to Change the Web

Big Data is booming these days, as more and more companies realize the benefit of storing data and leveraging it for useful insights. At the forefront of this Big Data revolution is open source technology, since the majority of Big Data companies prefer it over closed source technology. Here are nine open source Big Data technologies that you should keep an eye on:

Apache Hadoop

Apache Hadoop was originally created by Doug Cutting to support his work on Nutch, an open source Web search engine. Hadoop combines a MapReduce facility with a distributed file system, and was initially designed to meet Nutch’s multimachine processing requirements. The basic principle behind Hadoop is that it splits big data into pieces and distributes them over a series of nodes running on commodity hardware.
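To make the split-and-distribute idea more concrete, here is a minimal sketch of the MapReduce pattern in plain Python. It does not use Hadoop itself: the "nodes" are simulated as list chunks in a single process, and every function name (split_into_chunks, map_phase, shuffle, reduce_phase, word_count) is a hypothetical helper chosen for illustration.

```python
from collections import defaultdict


def split_into_chunks(records, num_nodes):
    """Deal the input records out into one chunk per simulated node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in this node's chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_nodes=3):
    """Run the split -> map -> shuffle -> reduce pipeline on the records."""
    chunks = split_into_chunks(records, num_nodes)
    mapped = [map_phase(chunk) for chunk in chunks]  # one map call per "node"
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    lines = ["big data", "big web", "open data"]
    print(word_count(lines, num_nodes=2))
```

In a real cluster the chunks would live on different machines and the map calls would run in parallel; the sketch only shows how the work is divided and then recombined.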

R

Designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in 1993, R is an open source programming language that has become the de facto standard for statistical analysis of very large data sets, as it was designed specifically with statistical computing and visualization in mind.

Cascading

Cascading is an open source abstraction layer for Hadoop that serves as an alternative to writing MapReduce jobs directly. It allows data processing workflows to be built and executed in any JVM-based language, with the goal of concealing the inherent complexity of MapReduce from people who don’t need or don’t want to bother with the nitty-gritty of jobs such as log file analysis, bioinformatics, and machine learning.

Scribe

Developed by social media giant Facebook and released in 2008, Scribe was designed to aggregate log data streamed in real time from a large number of servers. Its original purpose was to handle Facebook’s own scaling problems. So far, Scribe has been successful and is currently handling tens of billions of messages a day.

ElasticSearch

ElasticSearch is an open source search server developed by Shay Banon and based on Apache Lucene. ElasticSearch’s main selling point is that it works without special configuration and scales horizontally while still supporting near real-time search and multitenancy. It is currently used by a number of high-profile companies, notably Mozilla and StumbleUpon.

Apache HBase

Designed to run on top of the Hadoop Distributed File System (HDFS), Apache HBase is an open source, non-relational, column-oriented distributed database modeled after Google’s Bigtable. HBase’s most notable user is Facebook, which adopted the platform in 2010 for use in its messaging service.

Apache Cassandra

Another one of Facebook’s aces, Apache Cassandra was originally developed as a NoSQL data storage solution to power the social network’s Inbox Search feature. Facebook has since abandoned Cassandra in favor of HBase, but it is still used by a number of high-profile companies such as Netflix, particularly as a back-end database for its streaming services. Cassandra is currently available under the Apache License 2.0.

MongoDB

MongoDB is a popular open source NoSQL data store that keeps structured data as JSON-like documents with dynamic schemas, in a format called BSON (Binary JSON). Created by the founders of DoubleClick, MongoDB is currently used by several large enterprises such as Craigslist, Disney Interactive Media Group, Etsy, The New York Times, and MTV Networks.
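As a rough illustration of what "JSON-like documents with dynamic schemas" means, the sketch below keeps two differently shaped documents in the same collection. It is plain Python, not the MongoDB driver: the articles list, its field values, and the find helper are all hypothetical stand-ins, and text JSON is used here in place of the binary BSON encoding.

```python
import json

# A hypothetical in-memory "collection": a plain list of documents.
# The two documents share some fields but not others, which is what
# a dynamic schema permits.
articles = [
    {"_id": 1, "title": "Intro to Hadoop", "tags": ["hadoop", "mapreduce"]},
    {"_id": 2, "title": "Scaling Search", "author": {"name": "Sam"}, "views": 120},
]


def find(collection, **criteria):
    """Return every document whose top-level fields match all criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # Documents serialize naturally to JSON text.
    print(json.dumps(articles[1], indent=2))
    # Querying on a field that only some documents have is fine.
    print(find(articles, views=120))
```

The point of the example is that no table definition has to be changed before the second document can carry an author or a view count; each document describes its own structure.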

Apache CouchDB

Yet another open source NoSQL database, CouchDB uses a blend of JSON, JavaScript, MapReduce, and HTTP to store and query data. The platform was originally created in 2005 by former IBM developer Damien Katz as a storage system for large-scale objects. One of CouchDB’s more popular users is the British Broadcasting Corporation, which uses it for its dynamic content platforms.
