From 17:39 to 18:12 UTC, GitHub was unavailable in parts of North America, particularly the US East Coast, and in South America.
GitHub takes measures to ensure that we have redundancy in our systems for various disaster scenarios. We have been working on adding redundancy for an earlier single point of failure in our network architecture by building a second Internet edge facility. This second Internet edge facility was completed in January and has been actively routing production traffic since then. Today we were performing a live failover test to validate that we could in fact use this second Internet edge facility if the primary were to fail. Unfortunately, during this failover we inadvertently caused a production outage.
The test exposed a network pathing configuration issue at the secondary site that prevented it from functioning properly as the primary facility. This disrupted Internet connectivity to GitHub, ultimately resulting in an outage. Our monitoring and alerting notified us of the issue immediately. Within two minutes of being alerted, we reverted the change and brought the primary facility back online. Once it was online, it took time for traffic to be rebalanced and for our border routers to reconverge, which restored public connectivity to the affected GitHub systems.
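As an illustration only, the shift, detect, and revert sequence described above can be sketched as a failover drill with an automatic rollback guard. This is a minimal simulation, not GitHub's tooling: the `FailoverDrill` class, the `shift_traffic` and `probe` callables, and the `max_failed_probes` threshold are all hypothetical names chosen for the example, and the "secondary" site is simulated as having broken pathing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverDrill:
    """Hypothetical failover drill with an automatic rollback guard."""

    shift_traffic: Callable[[str], None]  # points traffic at the named site
    probe: Callable[[], bool]             # True if external connectivity works
    max_failed_probes: int = 3            # consecutive failures before rollback
    log: List[str] = field(default_factory=list)

    def run(self, primary: str, secondary: str, probes: int = 10) -> bool:
        """Shift to the secondary site; roll back if connectivity probes fail.

        Returns True if the secondary carried traffic for the whole drill.
        """
        self.shift_traffic(secondary)
        self.log.append(f"shifted to {secondary}")
        failures = 0
        for _ in range(probes):
            if self.probe():
                failures = 0  # a healthy probe resets the failure streak
                continue
            failures += 1
            if failures >= self.max_failed_probes:
                # Revert the change and bring the primary back.
                self.shift_traffic(primary)
                self.log.append(f"rolled back to {primary}")
                return False
        self.log.append("drill passed")
        return True


# Simulated environment: only the primary site has working pathing.
active = {"site": "primary"}


def shift(site: str) -> None:
    active["site"] = site


def probe() -> bool:
    return active["site"] == "primary"


drill = FailoverDrill(shift_traffic=shift, probe=probe)
passed = drill.run("primary", "secondary")
```

In this simulation the drill fails and traffic returns to the primary site after three consecutive failed probes. A real rollback would still need time for traffic to rebalance and for routers to reconverge, which the sketch does not model.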
This failover test helped expose the configuration issue, and we are addressing the gaps in both our configuration and our failover testing, which will help make GitHub more resilient. We recognize the severity of this outage and apologize for the impact it had on our customers.
We have recovered and are operating normally.
The root cause has been mitigated and most services have fully recovered. We are still monitoring for full recovery.
We are continuing to investigate this issue.
We are continuing to see recovery and are continuing to monitor.
We are starting to see recovery and are continuing to monitor as we mitigate.
We have identified the root cause of the outage and are working toward mitigation.
We are currently experiencing an outage of GitHub products and are investigating.
This incident affected: Git Operations, API Requests, Webhooks, Issues, Pull Requests, Actions, Packages, Pages, Codespaces, and Copilot.