Processing delays to some Issues, Pull Requests and Webhooks

Incident Report for GitHub

Resolved

On Sep 13, 2024, between 05:03 UTC and 07:13 UTC, the Webhooks and Actions services were degraded, resulting in delayed processing of Webhooks and Actions runs for some customers. During the incident, 0.5% of Webhook deliveries were delayed by more than 2 minutes, and 15% of Actions runs started between 05:03 UTC and 05:24 UTC saw run start delays or failures. At 05:24 UTC, we implemented a mitigation to shift traffic to healthy infrastructure, and new Actions runs resumed normal operations. For the rest of the incident window, Actions runs started before 05:24 UTC continued to see delays publishing logs or job results. No Actions runs or Webhook deliveries were lost, only delayed.

We mitigated the incident by immediately shifting traffic to a healthy cluster while investigating. The incident was caused by an erroneous configuration change on our eventing platform. A permanent fix was deployed at 06:22 UTC, after which services began to recover and burn down their backed-up queues, with full recovery by 07:13 UTC.

We are working to reduce our time to detection and develop test automation to prevent issues like this one in the future.

Posted Sep 13, 2024 - 07:13 UTC

Update

We are seeing improvements in telemetry and are monitoring the delivery of delayed Webhooks and Actions job statuses.

Posted Sep 13, 2024 - 06:49 UTC

Update

We've applied a mitigation for the delays affecting some webhook deliveries and the delayed reporting of outcomes for some running Actions jobs. We are monitoring for full recovery.

Posted Sep 13, 2024 - 06:23 UTC

Update

Actions is experiencing degraded performance. We are continuing to investigate.

Posted Sep 13, 2024 - 05:59 UTC

Investigating

We are investigating reports of degraded performance for Issues, Pull Requests and Webhooks.

Posted Sep 13, 2024 - 05:42 UTC

This incident affected: Webhooks, Issues, Pull Requests, and Actions.