Facebook outage was caused by maintenance error, company says


Dallas, Texas (US): An error during routine maintenance on Facebook Inc.’s network of data centers caused Monday’s collapse of its global system for more than six hours, leading to a torrent of problems that delayed the repairs, the company said.

The Facebook outage was the largest that Downdetector, a web monitoring firm, said it had ever seen. It blocked access to apps for billions of users of Facebook, Instagram and WhatsApp, further intensifying weeks of scrutiny for the nearly $1-trillion company.

At a US Senate hearing on Tuesday, a former employee turned whistleblower accused Facebook of putting profits before people’s safety, which the company denies.

In a blog post, Facebook Vice President of Engineering Santosh Janardhan explained that the company’s engineers had issued a command that unintentionally disconnected Facebook data centers from the rest of the world.

Facebook’s systems are designed to audit commands to prevent mistakes, but the audit tool had a bug and failed to stop the command that caused the outage, the company said.

The outage was not caused by malicious activity, it added.


While users lost access to one of the world’s most popular messaging apps—WhatsApp has more than 2 billion users—employees were also blocked from internal tools. The outage knocked out tools that engineers would normally use to investigate and repair such outages, making the task even more difficult, Facebook said. The company said it sent a team of engineers to the location of its data centers to try to debug and restart the systems.

However, it took the company extra time to get engineers inside to work on the servers because of the high physical and system security in place. And even after network connectivity was restored to the data centers, Facebook said it worried a surge in traffic would cause its websites and apps to crash.

But because the company had run drills to prepare for such situations, access to its services returned relatively quickly. “Every failure like this is an opportunity to learn and get better,” Janardhan wrote. “From here on out, our job is to…make sure events like this happen as rarely as possible.”


