On 2024-01-21 from 02:05 UTC to 06:19 UTC, GitHub Hosted Runners experienced increased error rates from our main cloud service provider. The errors were initially limited to a single region and we were able to route around the issue by transparently failing over to other regions. However, errors gradually expanded across all regions we deploy to and led to our available compute capacity being exhausted.
During the incident, up to 35% of Actions jobs using Larger Runners and 2% of Actions jobs using GitHub Hosted Runners overall may have experienced intermittent delays in starting. Once the issue was resolved by our cloud service provider, our systems made a full recovery without intervention.
We’re working closely with our service provider to understand the cause of the outage and mitigations we can put in place. We’re also working to increase our resilience to outages of this nature by expanding the regions we deploy to beyond the existing set, especially for Larger Runners.