GitHub Says Database Issues Caused This Week’s Outage and Performance Problems

A database migration gone awry caused the outage and poor availability that GitHub customers experienced this week.

In a lengthy blog post today, GitHub’s Jesse Newland apologized for the outage, saying the availability customers experienced this week was far below the company’s standards.

The problem traces back to a database replacement performed last month. During that maintenance, the GitHub team replaced its aging pair of DRBD-backed MySQL servers with a three-node cluster. The new infrastructure is designed so the MySQL database can run on all nodes at all times. This means a failover “simply moves the appropriate virtual IP between nodes after flushing transactions and appropriately changing the read_only MySQL variable.”

Under the old setup, failing over from one database server to another required a cold start of MySQL. With the new architecture, MySQL stays running on every server, so a failover amounts to shifting the writable role and the virtual IP.
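For readers unfamiliar with this style of failover, here is a minimal sketch of the sequence Newland describes, assuming a Python script driving two hosts over SSH. The host names, credentials, virtual IP, and helper functions are illustrative assumptions, not GitHub’s actual tooling.

    # Hypothetical failover sketch -- not GitHub's actual tooling.
    # Assumes the pymysql library and SSH access to both database hosts.
    import subprocess
    import pymysql

    OLD_PRIMARY = "db1.example.internal"   # assumed host names
    NEW_PRIMARY = "db2.example.internal"
    VIRTUAL_IP = "10.0.0.100/24"           # assumed virtual IP used by app servers
    NIC = "eth0"

    def connect(host):
        return pymysql.connect(host=host, user="failover", password="secret", autocommit=True)

    def demote(host):
        """Stop accepting writes on the outgoing primary and flush pending work."""
        conn = connect(host)
        with conn.cursor() as cur:
            cur.execute("SET GLOBAL read_only = ON")   # refuse new writes
            cur.execute("FLUSH TABLES")                # stand-in for the "flushing transactions" step
        conn.close()

    def promote(host):
        """Allow writes on the incoming primary."""
        conn = connect(host)
        with conn.cursor() as cur:
            cur.execute("SET GLOBAL read_only = OFF")
        conn.close()

    def move_vip(old_host, new_host):
        """Reassign the virtual IP that application servers point at."""
        subprocess.run(["ssh", old_host, f"ip addr del {VIRTUAL_IP} dev {NIC}"], check=True)
        subprocess.run(["ssh", new_host, f"ip addr add {VIRTUAL_IP} dev {NIC}"], check=True)

    if __name__ == "__main__":
        demote(OLD_PRIMARY)
        promote(NEW_PRIMARY)
        move_vip(OLD_PRIMARY, NEW_PRIMARY)

Because MySQL is already warm on the target node, the handover is limited to refusing writes on one host, permitting them on another, and repointing the shared IP, which is what avoids the cold-start delay of the old DRBD arrangement.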

It sounds straightforward enough, but as often happens, seemingly insignificant events can cascade into a series of problems. As a result, GitHub is now taking a closer look at how it manages failovers and how it runs its new cluster environment overall.

On Monday, the outage stemmed from what GitHub called an innocuous database migration. It produced higher-than-expected loads that the GitHub operations team had not previously seen during these sorts of migrations, and that led to a cascading series of errors that resulted in the downtime.

On Tuesday, a cluster partition occurred that caused some customers to see events from other users’ dashboards. In addition, some repositories created during this window were incorrectly routed. Newland said the company has removed all of the leaked events and performed an audit of all repositories that were incorrectly routed.


Newland said 16 of these repositories were private. For seven minutes they were accessible to people outside of the repository’s list of collaborators or team members. Newland said all of the owners of these repositories were contacted about the problem.

Newland said in summary that three primary events contributed to the downtime of the past few days:

  • Several failovers of the “active” database role happened when they shouldn’t have.
  • A cluster partition occurred that resulted in incorrect actions being performed by its cluster-management software.
  • The failovers triggered by the first two events impacted performance and availability more than they should have.

GitHub’s problems stem from a change to its database stack, and they illustrate an issue facing growing online communities: MySQL simply does not scale very well. When problems occur at that layer, they can ripple outward and affect the entire organization, its customers, and overall perception of the service.
