On March 1, 2024, between 17:00 UTC and 17:42 UTC, we saw elevated failure rates (between 1% and 10%) across various APIs for Copilot, Actions, Pages, and Git.
This incident was triggered by a newly discovered failure mode in the deployment pipeline for one of our compute clusters, in which the pipeline could not write a specific configuration file. This caused a drop in the amount of resources available in this cluster, which was mitigated by a redeployment.
We have addressed the specific scenario to ensure resources are properly written and retrieved, and we have added safeguards to ensure the deployment does not proceed if there is an issue of this type. We are also reviewing our systems to more effectively route traffic toward healthy clusters during an outage and adding more safeguards on cluster resource adjustments.
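A safeguard of this kind can be sketched as follows. This is a minimal illustration and not the actual pipeline: the names `write_config_verified`, `deploy`, `ConfigWriteError`, and `apply_resources`, and the choice of a JSON configuration file, are all assumptions made for the example. The idea is to write the file atomically, read it back to confirm it can be retrieved, and refuse to adjust cluster resources if either step fails.

```python
import json
import os
import tempfile


class ConfigWriteError(RuntimeError):
    """Raised when the configuration file cannot be written or verified."""


def write_config_verified(path, config):
    """Write config as JSON to path atomically, then read it back to verify.

    Hypothetical helper: assumes config is JSON-serializable with string keys.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = None
    try:
        # Write to a temporary file in the same directory, then rename it
        # into place so readers never observe a partially written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(config, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
        tmp_path = None  # The temporary file has been renamed into place.

        # Read the file back and confirm it matches what was intended.
        with open(path) as handle:
            if json.load(handle) != config:
                raise ConfigWriteError(f"{path} does not match the intended config")
    except (OSError, ValueError) as exc:
        raise ConfigWriteError(f"could not write {path}: {exc}") from exc
    finally:
        # Clean up the temporary file if the rename never happened.
        if tmp_path is not None and os.path.exists(tmp_path):
            os.remove(tmp_path)


def deploy(path, config, apply_resources):
    """Apply resources only after the config is written and verified.

    apply_resources is a hypothetical callable standing in for the step
    that adjusts cluster resources. Returns True if the deployment ran.
    """
    try:
        write_config_verified(path, config)
    except ConfigWriteError:
        # Halt here: leave the cluster's current resources untouched.
        return False
    apply_resources(config)
    return True
```

In this sketch, a failed write causes `deploy` to return `False` before `apply_resources` is ever called, so the cluster keeps its existing resources instead of being scaled from a missing or partial configuration.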
Posted Mar 01, 2024 - 17:42 UTC
Update
Git Operations is operating normally.
Posted Mar 01, 2024 - 17:42 UTC
Update
Actions and Pages are operating normally.
Posted Mar 01, 2024 - 17:41 UTC
Update
Copilot is operating normally.
Posted Mar 01, 2024 - 17:36 UTC
Update
Pages is experiencing degraded performance. We are continuing to investigate.
Posted Mar 01, 2024 - 17:34 UTC
Update
One of our clusters is experiencing problems, and we are working on restoring the cluster at this time.
Posted Mar 01, 2024 - 17:34 UTC
Investigating
We are investigating reports of degraded performance for API Requests, Copilot, Git Operations, and Actions.
Posted Mar 01, 2024 - 17:30 UTC
This incident affected: Git Operations, API Requests, Actions, Pages, and Copilot.