Incident with multiple GitHub services

Incident Report for GitHub

Resolved

On April 23, 2026, between 16:03 UTC and 17:27 UTC, multiple GitHub services experienced elevated error rates and degraded performance due to DNS resolution failures originating from our DNS infrastructure in our VA3 datacenter. Approximately 5–7% of overall traffic was affected during the impact window:

- Webhooks: ~0.35% of API requests returned 5xx (peak ~0.39%). ~0.88% of requests exceeded 3s latency; at peak, >3s responses represented ~10% of Webhooks API traffic.

- Copilot Metrics: ~9% of Copilot Insights dashboard requests returned 5xx.

- Copilot cloud agents: ~10% of cloud agent sessions failed.

- Octoshift: 0.88% of active repository migrations failed, and 79% took longer than usual (avg. 5.2 min) during this period.

- Git Operations: the error rate averaged 1.25% over the duration of the incident, with a peak of 2.07%.

- Actions: Workflow run status updates were delayed by up to ~8s during the incident window.

Our DNS infrastructure in VA3 entered a degraded state and began intermittently returning NXDOMAIN responses and timing out on lookups for both internal service discovery and external endpoints. This caused a cascading impact across the dependent services listed above.
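
For illustration only, the failure mode described above (intermittent NXDOMAIN answers and lookup timeouts) can be sketched from the point of view of a dependent client. This is a minimal sketch, not GitHub's implementation: the helper name `resolve_with_retry`, the retry budget, and the choice to treat NXDOMAIN as retryable are all assumptions. NXDOMAIN is normally a definitive answer, so retrying it only makes sense when resolvers are known to return it spuriously.

```python
import socket
import time

# Assumption: both "temporary failure" and "name not found" are treated as
# transient here, because the incident involved spurious NXDOMAIN answers.
RETRYABLE = {socket.EAI_AGAIN, socket.EAI_NONAME}


def resolve_with_retry(host, port=443, attempts=3, base_delay=0.1,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Hypothetical helper: resolve `host`, retrying transient DNS errors.

    `resolver` and `sleep` are injectable so the logic can be exercised
    without real DNS traffic or real waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror as err:
            if err.errno not in RETRYABLE:
                raise  # not a failure class this sketch considers transient
            last_error = err
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

A small, bounded retry budget like this can mask brief resolver blips, but it also adds latency to every failed lookup, which is consistent with the elevated response times reported for some services above.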

We identified a specific load pattern under which our DNS resolvers began failing. The evidence points to a recently introduced traffic-balancing mechanism, rolled out progressively to support our growth, as the root cause. We have since reverted this change.

We are immediately prioritizing investments in a more controlled rollout and validation process, including a dedicated environment to safely shadow production DNS traffic and detect these failure modes before they can affect production.
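
As a rough illustration of what shadowing DNS traffic could look like, the sketch below replays the same queries against a production resolver and a candidate resolver and records where the answers differ. It is a simplified sketch under stated assumptions, not a description of the environment mentioned above: `compare_shadow`, `_safe_lookup`, and the resolver callables are hypothetical names, and a real setup would mirror live traffic rather than iterate over a list.

```python
from collections import Counter


def _safe_lookup(resolver, name):
    """Normalize a lookup into a comparable value, capturing failures."""
    try:
        # Order of returned addresses is ignored for comparison purposes.
        return ("ok", frozenset(resolver(name)))
    except Exception as err:
        return ("error", type(err).__name__)


def compare_shadow(queries, primary, candidate):
    """Hypothetical helper: compare candidate answers against primary answers.

    Only `primary` would serve real clients; `candidate` results are
    observed and compared, never returned to callers.
    """
    outcomes = Counter()
    mismatches = []
    for name in queries:
        expected = _safe_lookup(primary, name)
        observed = _safe_lookup(candidate, name)
        if expected == observed:
            outcomes["match"] += 1
        else:
            outcomes["mismatch"] += 1
            mismatches.append((name, expected, observed))
    return outcomes, mismatches
```

Tracking the mismatch count under production-like load is one way a change such as a new traffic-balancing mechanism could be validated before it serves real lookups.
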
Posted Apr 23, 2026 - 17:30 UTC

Update

Webhooks is operating normally.
Posted Apr 23, 2026 - 17:10 UTC

Update

Many services have been mitigated, and we are validating the remaining services.
Posted Apr 23, 2026 - 17:04 UTC

Update

The degradation affecting Actions and Copilot has been mitigated. We are monitoring to ensure stability.
Posted Apr 23, 2026 - 17:03 UTC

Update

We have identified the root problem and are working on mitigation.
Posted Apr 23, 2026 - 16:52 UTC

Update

Actions is experiencing degraded performance. We are continuing to investigate.
Posted Apr 23, 2026 - 16:34 UTC

Update

We are investigating multiple unavailable services.
Posted Apr 23, 2026 - 16:19 UTC

Investigating

We are investigating reports of degraded availability for Copilot and Webhooks.
Posted Apr 23, 2026 - 16:12 UTC
This incident affected: Webhooks, Actions, and Copilot.