On July 31, 2024, between 07:05 UTC and 09:01 UTC, the Actions service experienced degradation that prevented it from processing API requests and executing jobs, in particular Pages builds. On average, 2% of jobs run during the incident window were affected. This was due to connectivity issues affecting some nodes of one of our partner services in the East US2 region. We mitigated the incident by failing over the impacted service and re-routing the service’s traffic out of that region.
We are working to improve our monitoring and failover processes to reduce our time to detection and mitigation of issues like this one in the future.
Posted Jul 31, 2024 - 09:20 UTC
Update
Actions is operating normally.
Posted Jul 31, 2024 - 09:20 UTC
Update
We are continuing to see improvements in queuing and running Actions jobs and are monitoring for full recovery.
Posted Jul 31, 2024 - 09:13 UTC
Update
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
Posted Jul 31, 2024 - 08:28 UTC
Update
Actions is experiencing degraded performance. We are continuing to investigate.
Posted Jul 31, 2024 - 08:07 UTC
Update
We are investigating reports of degraded performance in some Redis clusters.