Hadoop Best Practice: Facebook

Hadoop is a free Java framework for distributed applications and intensive data management. It enables applications to work with thousands of nodes and petabytes of data.

With Hadoop, companies discover and put into practice new techniques for analyzing and retrieving data, techniques that were previously impossible to implement for reasons of performance, cost and technology. As a result, Hadoop is gaining popularity as an option for processing, storing and analyzing large volumes of raw, semi-structured or unstructured data from the most disparate sources.

Facebook's more than one billion users generate about 2,500 million updates and about 300 million images and links. As a result, the social network processes more than 500 terabytes of content daily.

To manage such a vast amount of data, Facebook uses several open source solutions in its data warehouses: Apache Hadoop, Apache Hive, Apache HBase, Apache Thrift and its in-house Scribe tool.

At Facebook, Hadoop is used in three different ways: as a data warehouse for web analytics, as storage for a distributed database, and for backups of MySQL database servers.

Facebook probably maintains the largest Hadoop cluster in the world, handling over 105 terabytes every 30 minutes, including data related to 2.7 billion Likes and 2.5 billion content items shared per day by more than a billion users. In addition, Hadoop is used for thousands of simultaneous requests, mining operations, social analysis, and management of resources.
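To put these volumes in perspective, the short sketch below converts the two figures quoted in this article (500 terabytes of content daily and 105 terabytes every 30 minutes) into sustained rates. It assumes decimal terabytes (10^12 bytes), which the article does not specify, and the function name is purely illustrative.

```python
# Back-of-the-envelope conversion of the article's figures into sustained rates.
# Assumption: 1 TB = 10**12 bytes (the article does not say TB vs. TiB).
TB = 10**12


def bytes_per_second(terabytes: float, seconds: float) -> float:
    """Average rate in bytes per second for a volume moved over a time window."""
    return terabytes * TB / seconds


# 500 TB of content per day.
daily_content_rate = bytes_per_second(500, 24 * 60 * 60)

# 105 TB every 30 minutes.
half_hourly_rate = bytes_per_second(105, 30 * 60)

# The half-hourly figure extrapolated to a full day (48 half-hour windows).
half_hourly_per_day_tb = 105 * 48

print(f"500 TB/day     = {daily_content_rate / 1e9:.2f} GB/s")
print(f"105 TB/30 min  = {half_hourly_rate / 1e9:.2f} GB/s")
print(f"105 TB/30 min  = {half_hourly_per_day_tb} TB/day")
```

Extrapolated to a full day, the half-hourly figure is roughly ten times the daily content figure, so the two numbers most likely measure different things, for example data read by queries versus new content stored. The article does not say which, so treat that reading as a guess.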

The main strength of Hadoop lies in its scalability. Hadoop allows Facebook to exploit its current hardware. The framework supports the processing of all types of data (structured, semi-structured or unstructured), and its scalability allows Facebook developers to extend it with more specialized functionality for a wide range of applications.

Hadoop does not replace existing systems. Instead, Hadoop increases their power by allowing additional processing of large volumes of data so that existing systems can focus on what they do best. Facebook's Prism project plays an important role in helping the company harness the potential of Hadoop in a hybrid environment, taking advantage of the unique benefits of each technology and maximizing the performance of the whole environment.

One of Hadoop's major limitations is that, for things to work, the servers have to be next to each other. Facebook is taking Hadoop to the next level with its Prism project, which is a way to run a Hadoop cluster across multiple data centers around the world. It automatically replicates and moves data to where it is needed on a vast network of computers.
