7 Sept 2022 - RCA
Started 7 Sep at 07:24pm UTC.
Problem Description, Impact, and Resolution
At 16:25 UTC on September 7, 2022, we observed intermittent issues with the API and Elements services, as well as the Portal, in parts of the US. As a result, some requests from customer applications in those regions were unable to reach the service. The issue was caused by an incident with a provider for our CDN.
We pushed a fix to route around the provider for the API and Elements at 17:22 UTC. This fully resolved the issue for the API, but Elements continued to see intermittent issues. We made an additional change to the Elements service to route around the issue, and it was fully resolved at 18:35 UTC. Additional changes to the Portal fully resolved the incident at 19:11 UTC.
Mitigation Steps and Future Preventative Measures
To prevent a recurrence, we have updated our runbook for routing around this service so that the issue can be resolved quickly for the Elements service and the Admin Portal. We will also be reviewing our dependence on this service.