
Incident with Git Operations, Issues, Pull Requests, Actions, API Requests, Codespaces, Packages, Pages and Webhooks

Incident Report for GitHub

Resolved

A performance and resilience optimization to the authorization microservice contained a memory leak that was exposed under high traffic, causing some pages to return 404 errors when they should not have. Testing the build in our canary ring did not expose the service to enough traffic to reveal the leak, so the change graduated to production at 6:37 PM UTC. Under high load, the leak caused pods to crash repeatedly starting at 6:42 PM UTC, and authorization checks failed. These failures triggered alerts at 6:44 PM UTC. Rolling back the authorization service change was delayed because parts of the deployment infrastructure relied on the authorization service, so the rollback required manual intervention to complete. The rollback completed at 7:08 PM UTC, and all impacted GitHub features recovered once pods came back online. We are evaluating three areas of improvement: changes to our rollout strategy so that issues like this are detected sooner and with less impact, removal of the dependency between the authorization services and deployment rollback, and incident response improvements to reduce the overall time to recover.
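The timeline in the summary can be read as simple interval arithmetic. The sketch below is illustrative only: the event labels and the helper name minutes_between are our own, every timestamp is taken from the summary above (Nov 03, 2023, UTC), and rollback completion at 7:08 PM UTC is used as the end point because the summary gives no exact recovery time. The status updates show most components returning to normal between 19:13 and 19:15 UTC.

```python
from datetime import datetime

# Timestamps (UTC, Nov 03, 2023) copied from the incident summary.
# The dictionary keys are illustrative labels, not GitHub terminology.
events = {
    "deployed_to_production": "18:37",
    "pods_crashing": "18:42",
    "alerts_fired": "18:44",
    "rollback_completed": "19:08",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes elapsed between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


intervals = {
    "deploy -> first crashes": minutes_between(
        events["deployed_to_production"], events["pods_crashing"]),
    "first crashes -> alerts": minutes_between(
        events["pods_crashing"], events["alerts_fired"]),
    "alerts -> rollback complete": minutes_between(
        events["alerts_fired"], events["rollback_completed"]),
    "first crashes -> rollback complete": minutes_between(
        events["pods_crashing"], events["rollback_completed"]),
}

for label, minutes in intervals.items():
    print(f"{label}: {minutes} min")
```

Run as written, this prints 5, 2, 24, and 26 minutes. Most of the roughly 26 minutes between the first crashes and rollback completion elapsed after alerting, which matches the summary's account that the delay came from the rollback's dependency on the authorization service rather than from detection.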



This incident is unrelated to the Slack integration incident. The two incidents' status updates were combined because of a limitation in our status reporting tooling, and we recognize the confusion this creates. Work to address this was already in progress and will be complete this month.
Posted Nov 03, 2023 - 19:21 UTC

Update

Slack notifications have recovered.
Posted Nov 03, 2023 - 19:17 UTC

Update

Webhooks is operating normally.
Posted Nov 03, 2023 - 19:15 UTC

Update

Pull Requests is operating normally.
Posted Nov 03, 2023 - 19:15 UTC

Update

Issues is operating normally.
Posted Nov 03, 2023 - 19:15 UTC

Update

Git Operations is operating normally.
Posted Nov 03, 2023 - 19:15 UTC

Update

API Requests is operating normally.
Posted Nov 03, 2023 - 19:15 UTC

Update

Actions is operating normally.
Posted Nov 03, 2023 - 19:15 UTC

Update

Pages is operating normally.
Posted Nov 03, 2023 - 19:14 UTC

Update

Codespaces is operating normally.
Posted Nov 03, 2023 - 19:14 UTC

Update

Packages is operating normally.
Posted Nov 03, 2023 - 19:13 UTC

Update

We have completed the rollback and are monitoring recovery.
Posted Nov 03, 2023 - 19:10 UTC

Update

We’re in the process of rolling back an authorization-related change that is causing 404s and other errors.
Posted Nov 03, 2023 - 19:09 UTC

Update

Packages is experiencing degraded availability. We are continuing to investigate.
Posted Nov 03, 2023 - 19:08 UTC

Update

Pages is experiencing degraded availability. We are continuing to investigate.
Posted Nov 03, 2023 - 19:07 UTC

Update

Actions is experiencing degraded availability. We are continuing to investigate.
Posted Nov 03, 2023 - 19:01 UTC

Update

Packages is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 19:00 UTC

Update

Codespaces is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 18:59 UTC

Update

API Requests is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 18:58 UTC

Update

Actions is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 18:56 UTC

Update

Pull Requests is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 18:55 UTC

Update

Issues is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 18:55 UTC

Update

Git Operations is experiencing degraded performance. We are continuing to investigate.
Posted Nov 03, 2023 - 18:55 UTC

Update

The delayed Slack notifications should be fully processed in about 30 minutes.
Posted Nov 03, 2023 - 18:25 UTC

Update

Delayed Slack notifications are processing and the queue is expected to clear in just over an hour.
Posted Nov 03, 2023 - 17:53 UTC

Update

Users may see delayed Slack notifications coming through as the queue is processed.
Posted Nov 03, 2023 - 17:25 UTC

Update

A fix has been deployed, and Slack integrations are recovering.
Posted Nov 03, 2023 - 17:22 UTC

Update

We are testing a fix for Slack integrations.
Posted Nov 03, 2023 - 16:56 UTC

Update

We are aware of issues with Slack integration and are working on resolving the problem.
Posted Nov 03, 2023 - 16:11 UTC

Investigating

We are currently investigating this issue.
Posted Nov 03, 2023 - 16:10 UTC
This incident affected: Git Operations, Webhooks, API Requests, Issues, Pull Requests, Actions, Packages, Pages, and Codespaces.