How Does Facebook Manage Its Big Data?

Ashutosh Kulkarni
4 min read · Mar 7, 2021
Facebook connectivity of users across the globe

Have you ever seen one of the videos on Facebook that shows a “flashback” of posts, likes, or images — like the ones you might see on your birthday or on the anniversary of becoming friends with someone? If so, you have seen examples of how Facebook uses Big Data.

A report from McKinsey & Co. stated that by 2009, companies with more than 1,000 employees already had more than 200 terabytes of data about their customers stored. Add that startling amount of stored data to the rapid growth of data flowing into social media platforms since then: there are trillions of tweets, billions of Facebook likes, and other platforms like Snapchat, Instagram, and Pinterest only add to this flood of social media data.

The convergence of social media and big data gives birth to a whole new level of technology.

Facebook has become one of the world’s largest repositories of personal data, with an ever growing range of potential uses. That’s why the monetization of data in the social network has become paramount.

Can you imagine how many active users there are on Facebook?

Guess a number. Almost everyone seems to be connected, so what does that number come down to?

Well, let’s ask Google itself!

That’s a huge number, isn’t it? Almost 3 billion people actively use Facebook every day!

That’s almost half of Earth’s population.

More than half of the people prefer Facebook over other social media platforms.

That sounds pretty interesting, right? With so many users, imagine that on Christmas everybody on Facebook posts a “Merry Christmas” wish, whether as text, an image, audio, or video.

How much data can 3 billion people generate on that particular day?

500+ terabytes of data each day!

Facebook pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour.
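To put those rates in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted above (the per-day and per-second breakdowns are simple arithmetic, not additional Facebook statistics):

```python
# Back-of-the-envelope scale check using the figures quoted above.

scan_tb_per_half_hour = 105        # ~105 TB scanned every 30 minutes
half_hours_per_day = 24 * 2        # 48 half-hour windows in a day
scanned_tb_per_day = scan_tb_per_half_hour * half_hours_per_day
print(f"Data scanned per day: {scanned_tb_per_day} TB")  # 5040 TB, about 5 PB

photos_per_day = 300_000_000       # 300 million photos uploaded daily
seconds_per_day = 24 * 60 * 60
photos_per_second = photos_per_day / seconds_per_day
print(f"Photos per second: {photos_per_second:.0f}")     # roughly 3472
```

So the "105 terabytes every half hour" figure works out to around 5 petabytes scanned per day, and the photo uploads alone average several thousand per second.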

That number alone makes you go wow! But Facebook is not only storing this data; it is also processing it in real time.

How is it even possible to manage data that enormous?

Well, this shouldn’t come as a surprise: Facebook has an insane amount of data that grows every moment, and it has built an infrastructure to manage such an ocean of data.

Facebook Data Center in Virginia
Internal view of Data Center

And what do they run on these systems to manage the data?

There is an open-source framework that helps manage this enormous amount of data, released under the Apache License: Apache Hadoop.

Apache Hadoop

Hadoop uses the Master-Slave architecture of an HDFS cluster to manage big data.

What is HDFS?

HDFS is a distributed file system that handles large data sets running on commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop.

In a Master-Slave cluster,

Data flows from the Facebook front-end application to the Master Node (NameNode) of the HDFS cluster whenever a user posts, comments, or uploads. The data is split into equal-sized chunks (blocks), which are stored and processed on the Slave Nodes, also called Data Nodes.
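The splitting step above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not real HDFS code: the 128 MB block size and replication factor of 3 are HDFS defaults, the node names are made up, and the round-robin placement is a simplification of HDFS's actual rack-aware placement policy.

```python
# Toy sketch: how an HDFS master node (NameNode) splits a file into
# fixed-size blocks and assigns each block, with replicas, to data nodes.
# Illustrative only; real HDFS placement is rack-aware and more complex.

BLOCK_SIZE = 128 * 1024 * 1024   # HDFS default block size: 128 MB
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the list of block sizes for a file of the given size."""
    blocks = []
    remaining = file_size_bytes
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

def place_blocks(blocks, data_nodes, replication=REPLICATION):
    """Assign each block to `replication` data nodes, round-robin."""
    placement = {}
    for i, size in enumerate(blocks):
        targets = [data_nodes[(i + r) % len(data_nodes)]
                   for r in range(replication)]
        placement[f"block-{i}"] = {"size": size, "nodes": targets}
    return placement

# A 1 GB upload split across four (hypothetical) data nodes:
nodes = ["datanode-1", "datanode-2", "datanode-3", "datanode-4"]
blocks = split_into_blocks(1024 * 1024 * 1024)
print(len(blocks))  # 8 blocks of 128 MB each
for name, info in place_blocks(blocks, nodes).items():
    print(name, info["nodes"])
```

Keeping three copies of every block on different nodes is what lets the cluster survive individual machine failures while still serving reads from whichever replica is closest.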

Distributed computing

That’s how Facebook manages such a huge amount of data.
