Every day, more companies handle enormous amounts of information, measured in petabytes. Add the management of data from social networks such as Facebook, Twitter and LinkedIn, and the concept of Big Data comes into the picture.
Facebook revealed some big, big stats on big data to a few reporters at its headquarters. Today, its system processes 2.5 billion pieces of content and 500+ terabytes of data each day. It handles 2.7 billion "Like" actions and 300 million photos per day, and it scans roughly 210 terabytes of data every hour while executing 70,000 queries a day.
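To put these figures in perspective, the short calculation below converts the daily numbers quoted above into average rates. It is only a back-of-envelope sketch: it assumes the load is spread evenly over 24 hours, which understates peak traffic, and its only inputs are the figures from the paragraph.

```python
# Back-of-envelope rates derived from the daily figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
HOURS_PER_DAY = 24

def per_second(count_per_day: float) -> float:
    """Average rate, assuming the load is spread evenly across the day."""
    return count_per_day / SECONDS_PER_DAY

content_per_second = per_second(2.5e9)      # about 28,935
likes_per_second = per_second(2.7e9)        # exactly 31,250
photos_per_second = per_second(300e6)       # about 3,472
queries_per_hour = 70_000 / HOURS_PER_DAY   # about 2,917
tb_scanned_per_day = 210 * HOURS_PER_DAY    # 5,040 TB, roughly 5 PB

print(f"Content items/s: {content_per_second:,.0f}")
print(f"Likes/s:         {likes_per_second:,.0f}")
print(f"Photos/s:        {photos_per_second:,.0f}")
print(f"Queries/hour:    {queries_per_hour:,.0f}")
print(f"TB scanned/day:  {tb_scanned_per_day:,}")
```

Even averaged out, that is more than 31,000 Likes every second and about 5 petabytes of data scanned per day.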
Jay Parikh, VP of Engineering at Facebook, explained why this is so important to Facebook: "Big data really is about having insights and making an impact on your business. If you aren't taking advantage of the data you're collecting, then you just have a pile of data, you don't have big data. By processing data within minutes, Facebook can roll out new products, understand user reactions, and modify designs in near real-time."
Big Data Management
Facebook also revealed that over 100 petabytes of data are stored on disk in a single Hadoop cluster, and Parikh noted, "We think we operate the single largest Hadoop system in the world."
To cope with the growing demand of big data, the company has set up an initiative known internally as Project Prism. Prism aims to take what is today a single monolithic store and split it physically across locations while keeping a single view of the data. The goal is a distributed, decentralized infrastructure that is no longer so dependent on one data center. To do this, Parikh said, the company stores almost all information at the same level so that any engineer can use the data as quickly as possible.
"This project will allow us to take this monolithic store and physically separate it, while maintaining a centralized view of the data," Facebook said.
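The "physically separate, single view" idea behind Prism can be illustrated with a toy catalog: engineers ask for a table by name, and a lookup layer resolves which data center actually holds it. The sketch below is a hypothetical illustration of that general idea only; the names `DataCenter` and `LogicalWarehouse` are invented here, and nothing in it reflects how Prism is actually implemented.

```python
from dataclasses import dataclass, field

@dataclass
class DataCenter:
    """A physical location holding some tables (hypothetical model)."""
    name: str
    tables: dict[str, list[dict]] = field(default_factory=dict)

class LogicalWarehouse:
    """One logical namespace over tables spread across data centers.

    A toy illustration of 'physically separate, single view';
    it is not Prism's actual design.
    """

    def __init__(self, data_centers: list[DataCenter]):
        self._catalog: dict[str, DataCenter] = {}  # table name -> location
        for dc in data_centers:
            for table in dc.tables:
                if table in self._catalog:
                    raise ValueError(f"table {table!r} is defined in more than one data center")
                self._catalog[table] = dc

    def locate(self, table: str) -> str:
        """Return the name of the data center that physically holds a table."""
        return self._catalog[table].name

    def read(self, table: str) -> list[dict]:
        # The caller names only the table; the catalog resolves where it lives.
        return self._catalog[table].tables[table]

east = DataCenter("dc-east", {"likes": [{"user": 1, "post": 10}]})
west = DataCenter("dc-west", {"photos": [{"user": 2, "photo": 7}]})
warehouse = LogicalWarehouse([east, west])
print(warehouse.locate("photos"))
print(warehouse.read("likes"))
```

The point of the design is that code reading `likes` never needs to know it lives in `dc-east`, so data can be moved between locations by updating the catalog rather than every query.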
Today, technologies such as Hadoop, Mahout, Cassandra and others, combined with the cloud, let organizations run more advanced analytics, such as natural language processing, machine learning, semantic analysis and cluster analysis of large datasets.
Analyzing the 2.7 zettabytes of digital data expected to be stored globally in 2012 opens new business opportunities for companies. Companies like Facebook now have a new generation of technologies and architectures that improve organizational efficiency and help extract economic value from capturing and managing these large volumes of data.
According to research firm Gartner, big data will play a major role in technologies such as In-Memory Database Management Systems, Cloud Computing, and Column-Store Database Management Systems over the next five years.
The data is not just helpful for Facebook; it also passes the benefits on to its advertisers. Facebook now tracks how ads perform across different user dimensions on the site, such as gender, age, and interests. When an ad performs better in a particular region, Facebook can show more ads of that kind there to make the campaign more successful.
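Tracking "how ads are doing" across user dimensions is, at its core, a group-by aggregation. The sketch below assumes click-through rate (clicks divided by impressions) as the performance measure and uses a small invented set of impression records; the article does not say which metric Facebook uses, and the helper names `ctr_by` and `best_ad_per_region` are hypothetical.

```python
from collections import defaultdict

# Invented impression records: which user segment saw the ad and whether they clicked.
impressions = [
    {"ad": "A", "region": "US", "gender": "f", "age": "18-24", "clicked": True},
    {"ad": "A", "region": "US", "gender": "m", "age": "25-34", "clicked": False},
    {"ad": "B", "region": "US", "gender": "f", "age": "18-24", "clicked": False},
    {"ad": "B", "region": "US", "gender": "m", "age": "25-34", "clicked": False},
    {"ad": "A", "region": "IN", "gender": "m", "age": "18-24", "clicked": False},
    {"ad": "B", "region": "IN", "gender": "f", "age": "25-34", "clicked": True},
]

def ctr_by(records, *dims):
    """Click-through rate of each ad, grouped by the given user dimensions."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for r in records:
        key = (r["ad"],) + tuple(r[d] for d in dims)
        shown[key] += 1
        clicked[key] += r["clicked"]  # True counts as 1, False as 0
    return {key: clicked[key] / shown[key] for key in shown}

def best_ad_per_region(records):
    """Pick the ad with the highest click-through rate in each region."""
    best = {}
    for (ad, region), rate in ctr_by(records, "region").items():
        if region not in best or rate > best[region][1]:
            best[region] = (ad, rate)
    return {region: ad for region, (ad, _) in best.items()}

print(ctr_by(impressions, "gender"))
print(best_ad_per_region(impressions))
```

Here ad A wins in the US and ad B in India, so a system following the logic described above would favor showing more of A in the US and more of B in India. Passing other dimensions, such as `"age"` or `"gender", "age"`, slices the same records differently.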
Because all of this data processing happens within minutes, Mark Zuckerberg's team can better understand customer reactions, launch new products and modify designs in near real time.