Incidents | Basis Theory
Incidents reported on the status page for Basis Theory: https://status.basistheory.com/

Intermittent 500 Errors https://status.basistheory.com/incident/768929 Thu, 20 Nov 2025 00:37:00 -0000 https://status.basistheory.com/incident/768929#d950831d9381f03403ce16d996e2472d2ed27b74050e7f017009fb609b5a6c5e

# **Problem Description, Impact, and Resolution**

At approximately **11:30 UTC on November 18, 2025**, customers began experiencing failures when calling any Basis Theory service due to a **global outage at our edge provider, Cloudflare**. The outage impacted DNS resolution, edge routing, and WAF processing across Cloudflare’s network, preventing customer requests from reaching our public endpoints. During the peak of the event, Cloudflare returned 500-series errors for **100% of inbound traffic**, resulting in a complete outage across all products, including Vault, Card Management Services, Elements, and 3DS. Customers reliant on Basis Theory for payment collection or card processing were unable to complete those operations during the outage window. As Cloudflare’s network partially recovered at various points, some customers experienced intermittent success, leading to inconsistent behavior across geographies and increased retry traffic.

Our internal systems, including Vault and all supporting services, remained fully healthy and available throughout the incident, with no degradation in performance or capacity. However, because Cloudflare serves as the first-hop ingress for all customer traffic, the global failure of their DNS and routing layers prevented any requests from reaching our infrastructure. During the incident, we followed our disaster recovery playbook to bypass Cloudflare by disabling Cloudflare edge processing and routing traffic directly to our AWS load balancers. These changes were unsuccessful, as Cloudflare’s degraded services appeared to prevent DNS updates from propagating externally. This left no viable alternative path to reroute production traffic during the outage.

Cloudflare’s global services began to gradually recover starting at **14:30 UTC**, leading to a steady decline in error rates. By **17:30 UTC**, Cloudflare’s network had fully stabilized, all traffic to Basis Theory was routing normally, and request success rates had returned to 100%. No additional corrective actions were required on our side once Cloudflare restored its global network, and no further customer impact was observed.

To prevent this type of outage in the future, we are redesigning our edge architecture to eliminate Cloudflare as a single point of failure, strengthening our operational playbooks for faster edge bypass during vendor outages, and enhancing our monitoring to detect regional or intermittent failures more promptly.

# **Detailed Timeline of Events**

2025-11-18

- 11:30 (UTC) - Cloudflare begins having issues routing and serving requests across all Basis Theory domains (js.basistheory.com, api.basistheory.com, 3ds.basistheory.com).
- 11:57 (UTC) - First internal page fired due to synthetic test failures.
- 12:07 (UTC) - Calls began sporadically succeeding across all products (intermittent recovery).
- 12:23 (UTC) - Vault traffic degraded; ~50% success rate observed.
- 12:47 (UTC) - Calls again began sporadically succeeding across all products (intermittent recovery).
- 13:00 (UTC) - Full global outage; 100% of customer traffic failing. The team initiates disaster recovery efforts under the assumption that Cloudflare’s load balancer and routing layers are failing. A plan is established to bypass Cloudflare entirely.
- 13:40 (UTC) - Attempted mitigation: updated Cloudflare DNS for api.basistheory.com to route traffic directly to AWS (a sketch of this type of change appears after the timeline). The assumption was that this would restore ~80% of US traffic; no improvement was observed. The change was deemed to have had **no effect**, likely because Cloudflare’s DNS changes were not fully propagating due to their degraded edge.
- 14:08 (UTC) - Cloudflare routing re-enabled; error rates spike due to system retries and customer retry logic.
- 14:30 (UTC) - Cloudflare systems began to recover. Error rates drop below 50%.
- 14:40 (UTC) - Error rates drop below 15% and continue declining over the next 40 minutes.
- 17:30 (UTC) - All customer requests succeeding; service fully restored.
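For illustration, the kind of edge-bypass change referenced in the 13:40 entry above, disabling the Cloudflare proxy on a DNS record so the hostname resolves directly to an origin load balancer, can be made through Cloudflare’s DNS records API. The following is a minimal sketch only, not our actual playbook; the zone ID, record ID, API token, and origin hostname are placeholders.

```python
# Minimal sketch: disable Cloudflare proxying on a DNS record so the hostname
# resolves directly to the origin. Zone ID, record ID, token, and origin
# hostname are placeholders, not real values.
import os
import requests

CF_API = "https://api.cloudflare.com/client/v4"
ZONE_ID = os.environ["CF_ZONE_ID"]      # placeholder: Cloudflare zone for basistheory.com
RECORD_ID = os.environ["CF_RECORD_ID"]  # placeholder: DNS record for api.basistheory.com
TOKEN = os.environ["CF_API_TOKEN"]      # placeholder: scoped API token

def disable_proxy(origin_target: str) -> None:
    """Point the record at the origin and turn off Cloudflare's proxy layer."""
    resp = requests.patch(
        f"{CF_API}/zones/{ZONE_ID}/dns_records/{RECORD_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "type": "CNAME",
            "name": "api.basistheory.com",
            "content": origin_target,  # e.g. an AWS load balancer DNS name
            "proxied": False,          # bypass Cloudflare's edge for this record
            "ttl": 60,                 # short TTL so the change can be reverted quickly
        },
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json().get("success"))

if __name__ == "__main__":
    disable_proxy("example-origin-alb.us-east-1.elb.amazonaws.com")  # placeholder origin
```

As the timeline notes, during this incident a change of this kind could not take effect because Cloudflare’s own DNS control plane was degraded, which is why the prevention work below focuses on ingress paths that do not depend on a single provider.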
# **Root Cause Explanation**

Cloudflare experienced a **global network failure** affecting DNS resolution, edge routing, WAF processing, and global load balancing. Cloudflare’s incident summary is available at https://blog.cloudflare.com/18-november-2025-outage/, although the root cause of their outage does not change the fact that we had a single point of failure in our edge routing. Below is a description of why this outage caused a Basis Theory outage.

Basis Theory’s architecture relies on Cloudflare for:

- **Public DNS**
- **Global CDN**
- **WAF**
- **Traffic steering & routing**
- **Load balancing**

Because Cloudflare is the exclusive ingress path for all customer traffic, their global outage made all Basis Theory products unreachable, even though our AWS infrastructure remained fully healthy. Efforts to bypass Cloudflare through DNS changes were unsuccessful because Cloudflare’s internal DNS services were also degraded and unable to propagate changes. This prevented traffic from being routed directly to AWS. As a result, **Cloudflare became a critical single point of failure**, and Basis Theory had no alternative routing path available during the incident.

# **What Worked and What Didn’t**

### **What Worked**

- Internal alerting correctly detected failures across all products
- Core infrastructure remained healthy and fully operational
- Internal traffic and health checks confirmed backend health throughout

### **What Didn’t Work**

- Synthetic monitoring provided false confidence early in the event due to intermittent Cloudflare recoveries
- DNS proxy-disable/fallback routing could not propagate during Cloudflare’s global degradation
- No independent DNS provider or edge-bypass path existed to re-route traffic during the Cloudflare failure
- Cloudflare’s full ownership of DNS, routing, and proxying created complete ingress lock-in
- Basis Theory’s status page was also impacted, delaying communication to customers about the impact on our systems

# **Future Prevention & Next Steps**

To ensure an outage of this scale cannot recur, we are implementing several improvements across our operational processes, monitoring strategy, and edge architecture.

1. **Edge Architecture Redesign** - We are re-architecting our ingress and edge routing strategy to eliminate Cloudflare as a single point of failure. This includes introducing multi-provider redundancy, improving automated failover capabilities, and enforcing service-level objectives that trigger autonomous routing changes when error or latency thresholds are exceeded.
2. **Operational Readiness & Response Improvements** - We have updated our Support and Operations action plans to include clearer escalation paths and explicit procedures for cases where our status page or other external communication systems are degraded. We have also validated and refined our disaster-recovery playbooks to ensure we can rapidly execute Cloudflare edge bypass procedures.
3. **Monitoring and Alerting Enhancements** - We are reconfiguring our monitors to shorten detection windows and improve sensitivity to intermittent, region-specific failures. This will enable us to identify partial outages more quickly and make more informed decisions earlier during edge instability (an illustrative sketch of this kind of check follows this report).
4. **Customer-Controlled Bypass Path** - We are developing a fully supported, customer-accessible bypass option that can be activated during severe edge degradation. This will provide customers with a direct path to our infrastructure when needed, even if an edge provider is impaired, and will serve as a last-resort mechanism to ensure traffic can still reach us if other mitigation steps fail.

These changes aim to significantly reduce dependency concentration, accelerate detection and mitigation, and ensure reliable access to Basis Theory services even in the event of large-scale external vendor outages.
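To make item 3 above concrete, the sketch below shows one way a synthetic monitor can be tuned for intermittent, partial failures rather than only hard outages: probe each public endpoint several times per short window and alert on the failure rate, not just on consecutive total failures. This is an illustrative sketch under assumed thresholds and success criteria, not a description of our actual monitoring configuration.

```python
# Illustrative sketch only: a synthetic probe that alerts on partial failure
# rates over short windows. Thresholds, probe counts, and the definition of
# "success" (any non-5xx response) are assumptions, not Basis Theory's
# actual monitoring configuration.
import time
import requests

ENDPOINTS = [
    "https://api.basistheory.com/",
    "https://js.basistheory.com/",
    "https://3ds.basistheory.com/",
]
PROBES_PER_WINDOW = 10
WINDOW_SECONDS = 60
FAILURE_RATE_ALERT = 0.2  # alert when more than 20% of probes in a window fail

def probe(url: str) -> bool:
    """Return True if the endpoint answers with a non-5xx status in time."""
    try:
        return requests.get(url, timeout=5).status_code < 500
    except requests.RequestException:
        return False

def run_window() -> None:
    for url in ENDPOINTS:
        failures = sum(not probe(url) for _ in range(PROBES_PER_WINDOW))
        rate = failures / PROBES_PER_WINDOW
        if rate > FAILURE_RATE_ALERT:
            # Placeholder for a real pager/alerting integration.
            print(f"ALERT: {url} failing {rate:.0%} of probes in the last window")

if __name__ == "__main__":
    while True:
        run_window()
        time.sleep(WINDOW_SECONDS)
```

A rate-based alert over a short window would have flagged the 12:07 to 12:47 intermittent period as degraded rather than recovered, which is the false-confidence gap called out under "What Didn’t Work."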
Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 17:13:00 -0000 https://status.basistheory.com/incident/768929#36e1dfeeb0574189bf8cec857108b0a72dce1445d69b64ca5ea9aece62e5196e

Services have been fully restored. A full RCA will be provided within 24 hours.

Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 16:25:00 -0000 https://status.basistheory.com/incident/768929#6a640498efede76a54852cd38f560d937805410d36b3567525332da99ef61890

Nearly all traffic has been restored. Less than 2% of traffic is still seeing intermittent 500 errors; our edge service provider is continuing to restore full traffic.
Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 15:08:00 -0000 https://status.basistheory.com/incident/768929#b94f4bfea746bfd5be0e26d84f273264e9e50cadc1fd98735cc3c475f670ec8e

We are seeing a large portion of our requests resolve and return successful responses. We will continue to monitor the situation closely.

Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 14:34:00 -0000 https://status.basistheory.com/incident/768929#ae2259f925eb7215cf3afd2e48ceac489b240fe21f78d821fcba3fd05900fe7f

We are seeing a large portion of our requests resolve and return successful responses. We will continue to monitor the situation closely.
Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 13:10:00 -0000 https://status.basistheory.com/incident/768929#4f26aa5468d4b7b76f0d25a3dce3aa878cc0ec285f039919bb7a146b1629767b

Customers are still seeing elevated 500 errors. We are continuing to investigate alternatives to the edge routing outage.

Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 12:52:00 -0000 https://status.basistheory.com/incident/768929#096b22a035f202b3faaac69392a069624fcf14580342ad7e733a0238eac456ca

Our systems are still experiencing active degradation, and some customers are encountering intermittent issues. We are monitoring the situation and actively working on a bypass.

Intermittent 500 Errors https://status.basistheory.com/incident/768929 Tue, 18 Nov 2025 11:37:00 -0000 https://status.basistheory.com/incident/768929#839f543678299a1b5aa3397d8d4efed0e10cba91972b31797572e35f433628d4

We are experiencing intermittent failures at our edge service provider, resulting in an increased number of 500 errors. We are monitoring the situation and will report back as soon as we have an update. Currently, we are seeing a stable platform.
API Degraded https://status.basistheory.com/incident/710086 Thu, 21 Aug 2025 20:06:00 -0000 https://status.basistheory.com/incident/710086#8829a2c6019223af0bb28b8712ecd887f46b4327ffac60d20372eca1487c13fb

After investigation, we confirmed the issue stemmed from a network performance problem with one of our edge cloud providers in the Ashburn, VA (IAD) region. This impacted a portion of traffic routed through this area, but our core infrastructure and services remained unaffected. The issue has been fully resolved as of 3:48 PM ET, and all traffic is now operating normally. We are continuing to monitor for stability. If you experience any issues or have questions, please contact us at support@basistheory.com.

API Degraded https://status.basistheory.com/incident/710086 Thu, 21 Aug 2025 18:29:00 -0000 https://status.basistheory.com/incident/710086#0f96ef8521e14fd546403c26f2fc3a6abb4039c837d3eaba423415a1dbce9e25

At approximately 2:00 PM ET, we identified an issue where some HTTP requests may have failed to connect to Basis Theory. Our team is actively investigating the matter to determine the root cause and ensure a swift resolution. At this time, we believe this is a network issue affecting edge service routing at certain regional colos, not our core services or infrastructure.

API - 503 Service Unavailable https://status.basistheory.com/incident/534944 Wed, 26 Mar 2025 21:57:00 -0000 https://status.basistheory.com/incident/534944#0c3f734a8fd468d4a7ca2a9f70f64d77677b347d25302fbae5919f9f9c9d9c66

# Problem Description, Impact, and Resolution

At 14:22 UTC on March 26th, 2025, the API in our primary region became unavailable for both the US and EU services. After a few seconds of unavailability, our services correctly redirected traffic to another region and successfully served traffic until 14:24 UTC, when these services also became unavailable. This resulted in customers receiving a `503 Service Unavailable` status code, preventing them from interacting with our APIs.
The issue was caused by a policy change that removed access to an asset before it had been entirely de-provisioned from the health check of the underlying service; this caused the health check to fail and our system to remove the service from rotation, leading to a cascading failure in our API. We rolled back the policy change at 14:25 UTC, and the issue was resolved at 14:32 UTC.

# Mitigation Steps and Future Preventative Measures

We have fixed the issue with the policy change and successfully released the platform update. However, to ensure a similar issue does not occur again, we are actively working on the following immediate fixes within our platform:

1. In addition to our existing automated smoke testing release process, we will ensure these policy changes have specific smoke tests in our secondary region before being promoted to our primary region (a minimal sketch of such a gate appears after these lists). We will also review our entire portfolio of services and ensure any missing smoke tests are added to remove the chance of complete regional cascading failures in the future.
2. We are reducing our primary dependency on this service to reduce the impact and blast radius it can expose to our API.
3. In addition to the automated steps above, we will review and update our code review processes and procedures to ensure manual checks are in place for policy changes.

We will also be investigating and prioritizing the following changes over the next few months:

1. Introducing a new environment that will further stage changes within our deployment process and create another smoke test gate within our release lifecycle.
2. Simplifying coordination of service and policy deployment into a single deployable.
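As referenced in item 1 of the immediate fixes above, a secondary-region smoke test gate can be as simple as running a small set of read-path checks against the secondary region and blocking promotion to the primary region unless every check passes. The sketch below is illustrative only; the endpoint paths, region URL, and promotion hook are assumptions, not our actual release pipeline.

```python
# Illustrative sketch of a regional smoke test gate. The endpoint paths,
# region URL, and promotion hook are placeholders, not the actual
# Basis Theory release pipeline.
import sys
import requests

SECONDARY_REGION_BASE = "https://api-secondary.example.internal"  # placeholder URL

SMOKE_CHECKS = [
    ("liveness", "/health"),         # assumed health endpoint
    ("token read path", "/tokens"),  # assumed read path; an auth error is fine, a 5xx is not
]

def run_smoke_checks(base_url: str) -> bool:
    """Return True only if every check responds without a 5xx error."""
    ok = True
    for name, path in SMOKE_CHECKS:
        try:
            status = requests.get(base_url + path, timeout=5).status_code
            passed = status < 500
        except requests.RequestException:
            passed = False
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
        ok = ok and passed
    return ok

if __name__ == "__main__":
    # Gate: promotion to the primary region only proceeds when the secondary
    # region passes after the policy change has been applied there.
    if not run_smoke_checks(SECONDARY_REGION_BASE):
        sys.exit("Smoke tests failed in secondary region; blocking promotion.")
    print("Secondary region healthy; promotion to primary may proceed.")
```

Treating any 5xx or connection failure as a gate failure keeps the check conservative: an authentication error from the read path still proves the region is serving traffic, while a health-check-driven removal from rotation would not.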
API - 503 Service Unavailable https://status.basistheory.com/incident/534944 Wed, 26 Mar 2025 14:33:00 -0000 https://status.basistheory.com/incident/534944#4aaff324b294babf0bea6fed7eb20740eb9e070cb7e1796fc510baa10f25e490

This has been resolved. We will follow up with an RCA on this issue within the next 24 hours.

API - 503 Service Unavailable https://status.basistheory.com/incident/534944 Wed, 26 Mar 2025 14:27:00 -0000 https://status.basistheory.com/incident/534944#e42f044f3c3c7f30b0f469f7f86a571bf863126829165af23556d5ee283da0f0

API is currently having issues serving requests.

JS Elements CDN Intermittent Loading Issues https://status.basistheory.com/incident/516835 Fri, 21 Feb 2025 04:00:00 -0000 https://status.basistheory.com/incident/516835#934e26e80ef97295228ce929533133ad9a43e0dfbf7f1e511b3dc1604b856132

# Problem Description, Impact, and Resolution

At 8:30 p.m. CST on February 20, 2025, we observed increased errors in loading Basis Theory Elements in our `js.basistheory.com` service. This prevented customers from loading Elements when using the CDN-hosted version. The issue was caused by an incorrect base URL configuration when building the static version of `basis-theory-js` for the `js.basistheory.com` CDN version, which was deployed at 4:00 p.m. CST and slowly rolled out to customers over the next 3 hours as the cache was invalidated. This resulted in Elements not loading correctly for customers relying on the CDN version. By 9:00 p.m. CST, we determined that the problem affected all customers using the CDN version. We deployed a fix at 9:57 p.m. CST, and by 10:00 p.m. CST errors had dropped sharply as our CDN cache and browser caches began to retrieve the new version.

# Mitigation Steps and Future Preventative Measures

To prevent this issue from recurring, we have already implemented versioning for our packaged Elements ([web-elements](https://developers.basistheory.com/docs/sdks/web/web-elements/) and [react-elements](https://developers.basistheory.com/docs/sdks/web/react/)), and we now publish all versions (down to patch) of our new [web-elements](https://developers.basistheory.com/docs/sdks/web/web-elements/) package in the CDN, enabling customers to pin to a specific version of Elements from our CDN. Additionally, we will immediately make the following monitoring and testing improvements:

- Enhance our automated smoke tests for [`js.basistheory.com`](http://js.basistheory.com) Elements in our pre-deploy steps.
- Implement enhanced monitoring for failures to load files from `js.basistheory.com`.
- Increase visibility on pinning specific patch versions of `web-elements` in the [developer documentation](https://developers.basistheory.com) and add a check for version pinning to our [production checklist](https://developers.basistheory.com/docs/guides/production-checklist).

JS Elements CDN Intermittent Loading Issues https://status.basistheory.com/incident/516835 Fri, 21 Feb 2025 02:30:00 -0000 https://status.basistheory.com/incident/516835#c0744c32bedb3470ad15b87d775886ae1bd059ba4ee4cacab0acc5c5a29151bd

At 8:30 p.m. CST on February 20, 2025, we observed increased errors in loading Basis Theory Elements in our [js.basistheory.com](http://js.basistheory.com/) service. This prevented customers from loading Elements when using the CDN-hosted version. NOTE: This does not affect customers using our NPM modules to load Elements.

High Latencies https://status.basistheory.com/incident/417729 Thu, 22 Aug 2024 18:40:00 -0000 https://status.basistheory.com/incident/417729#e2fdcb9706d9143afa698c34efbdb509370b9e64237e0505944fd3ecef421ac7

Latencies have returned to normal.

High Latencies https://status.basistheory.com/incident/417729 Thu, 22 Aug 2024 18:28:00 -0000 https://status.basistheory.com/incident/417729#ebc67eb9a8e4afa43d057ac49bb51934b248069c2c45dcabd9f19aaa9695e5b8

Some customers are experiencing high latencies while creating tokens.

Global Portal Maintenance https://status.basistheory.com/incident/403217 Fri, 26 Jul 2024 02:30:00 -0000 https://status.basistheory.com/incident/403217#f3ce7ffeab926633db7a18040475a08730ffab1491c0a331dc2870b0d222ea95

Maintenance completed.

Global Portal Maintenance https://status.basistheory.com/incident/403217 Fri, 26 Jul 2024 02:00:00 -0000 https://status.basistheory.com/incident/403217#c1f1ffcb61ad2578567b63ba596d5dcd16801d3bb74f7532d327dfc21bb65ab0

Routine maintenance is scheduled for 10 PM EDT on July 25th, 2024. During this time, the Portal may be intermittently inaccessible, and users may experience logouts. The maintenance is expected to last 10-15 minutes.

High Request Latencies in US https://status.basistheory.com/incident/294848 Fri, 01 Dec 2023 21:39:00 -0000 https://status.basistheory.com/incident/294848#8a6ba053fdfdbd361f4a93dcf736555b84cc67451494c1991e0d0569575ed655

### Problem Description, Impact, and Resolution

At 18:08 UTC on November 29th, customers in our US region started experiencing higher-than-normal latencies, at points reaching as high as 5 seconds, for all requests, and some requests were failing with `504` or `499` status codes. During this incident, 7.1% of all requests failed. The immediate cause of the incident was a drastic increase in CPU usage on our API’s primary database in the US region: an issue with data indexing led to some queries requiring increased CPU and memory. At 18:33 UTC, we deployed a maintenance solution that optimized the indexes and saw an immediate decrease in CPU usage and latencies, though it took until 19:29 UTC for the fix to be fully rolled out across the US region and for the incident to be fully resolved with latencies back in normal ranges. The root cause of the incident was not immediately apparent, so we prioritized a process to regularly optimize indexes as system load and volume increase, and we put in place additional monitoring and alert triggers for CPU-level thresholds in our databases.
The following day, on November 30th at 18:10 UTC, a similar increase in CPU usage occurred, although the optimized indexes and monitoring in place enabled us to take mitigating actions and ensure that there was no impact on customers. Following the second occurrence, we determined the root cause of this database behavior was an upgrade of our client library on November 29th, which resulted in queries that our database could not optimize for larger-volume customers. We immediately rolled back the upgrade and have not seen a recurrence of the issue.

### Mitigation Steps and Future Preventative Measures

**To ensure this issue does not occur again, we have:**

- Increased the volume of data our automated tests use to ensure we catch any query degradations
- Regularly scheduled index maintenance

**To ensure similar issues do not arise in the future, we will be:**

- Updating our load and acceptance testing plans with significantly more aggressive loads and data volumes.
- Increasing visibility and monitoring of low-level metrics that may indicate similar problems.
- Increasing monitoring and alerting around degraded latencies experienced by our customers.

High Request Latencies in US https://status.basistheory.com/incident/294848 Wed, 29 Nov 2023 19:22:00 -0000 https://status.basistheory.com/incident/294848#4e30ae2e429a480c63a8ee52357b83a6aad7d69a640024e8853f2ad159f07f09

Systems are fully operational.

High Request Latencies in US https://status.basistheory.com/incident/294848 Wed, 29 Nov 2023 18:08:00 -0000 https://status.basistheory.com/incident/294848#596e73b7de6687e58504fc71d37f166fa624727918df5672327a98d2a315d0da

We're experiencing instability and are investigating.

Reactor Networking https://status.basistheory.com/incident/289959 Fri, 17 Nov 2023 20:00:00 -0000 https://status.basistheory.com/incident/289959#2bf8068d16d477de34044adf38bb368568db14b1f1b7eb1377aa1400b114c6ff

# Problem Description, Impact, and Resolution

At 19:18 UTC on Nov 17, 2023, we observed networking issues in the Reactors service in the US region, which resulted in intermittent 404 errors on token retrievals. The issue was caused by an internal, private DNS route update that temporarily broke the traffic route for the `bt`-instantiated Basis Theory SDK in Reactors. We immediately updated the DNS record to route all traffic to the correct destination, and the issue was fully resolved at 20:00 UTC.

# Mitigation Steps and Future Preventative Measures

To mitigate the recurrence of this issue, we have decommissioned the specific route configuration within our Reactor services. This action will prevent any potential disruptions from future DNS changes when leveraging the `bt` property within Reactors.
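As an illustration of the kind of check that can catch a broken internal route like the one described above, the sketch below resolves a hostname and compares the answer to an expected set of targets before and after a DNS change is rolled out. The hostname and expected addresses are placeholders; this is not our actual validation tooling.

```python
# Illustrative sketch: verify that a hostname resolves to an expected set of
# targets before/after a DNS route change. The hostname and expected addresses
# are placeholders, not real internal Basis Theory routes.
import socket

HOSTNAME = "internal-route.example.internal"       # placeholder internal hostname
EXPECTED_ADDRESSES = {"10.0.12.34", "10.0.12.35"}  # placeholder expected targets

def resolved_addresses(hostname: str) -> set[str]:
    """Return the set of addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}

if __name__ == "__main__":
    actual = resolved_addresses(HOSTNAME)
    if actual != EXPECTED_ADDRESSES:
        print(f"DNS route mismatch for {HOSTNAME}: got {sorted(actual)}, "
              f"expected {sorted(EXPECTED_ADDRESSES)}")
    else:
        print(f"{HOSTNAME} resolves to the expected targets.")
```

Running such a check immediately after a route update, and again once record TTLs expire, catches this class of issue before it reaches customer traffic.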
United States regional maintenance https://status.basistheory.com/incident/285044 Mon, 13 Nov 2023 04:00:00 -0000 https://status.basistheory.com/incident/285044#4a8b51890d8942f6fa3dbec361430addd7f1dc7575177baafdac50a3adf70b3f Maintenance completed United States regional maintenance https://status.basistheory.com/incident/285044 Mon, 13 Nov 2023 01:00:00 -0000 https://status.basistheory.com/incident/285044#d5dd56f0216021dd117cff7fb7b1b9c1fbd5c663d6a2f0b51023e6165e47d8f2 Basis Theory will be performing a scheduled system upgrade on Monday, November 13, 2023, from 01:00 UTC to 04:00 UTC. (Sunday, November 12th, 2023 from 8:00 PM ET to 11:00 PM ET) We do not anticipate any outage as a part of this upgrade, although a brief increase in latency and errors may occur for a few seconds. Planned System Upgrades https://status.basistheory.com/incident/214366 Fri, 02 Jun 2023 03:00:00 -0000 https://status.basistheory.com/incident/214366#5c09f790b6b79507c13e530b16680b1598ceb74d3b54f19ecda91ac0ea9de133 Maintenance completed Planned System Upgrades https://status.basistheory.com/incident/214366 Fri, 02 Jun 2023 02:00:00 -0000 https://status.basistheory.com/incident/214366#635aaf622f9ce047507613a7481930c26149077572fa7add7577f36ea31b15b1 Basis Theory will be performing a scheduled system upgrade on Friday, June 2, 2023, from 0200 UTC to 0300 UTC. (Thursday, June 1st, 2023 from 2200 ET to 2300 ET) We do not anticipate any outage or service interruption as a part of this upgrade. Intermittent issues with Web Elements https://status.basistheory.com/incident/188429 Thu, 23 Mar 2023 18:29:00 -0000 https://status.basistheory.com/incident/188429#a8739f50ccab6ab9ab36d269d9e69ede128d445a4bc8eae52745a7dcdc375b51 Between 14:50 UTC and 15:19 UTC some customers experienced intermittent errors resulting in Web Elements being slow to load for some of their users. This issue has been resolved, but we will investigate further and release an RCA. Database maintenance https://status.basistheory.com/incident/129595 Sat, 29 Oct 2022 15:00:00 -0000 https://status.basistheory.com/incident/129595#b3581bf999eef79c6b6fa14126052afaf82f31bb65a2418afaa7a9fdfaf2aa95 Maintenance completed Database maintenance https://status.basistheory.com/incident/129595 Sat, 29 Oct 2022 13:00:00 -0000 https://status.basistheory.com/incident/129595#b6d97aed5c900fb4a962ebef5439a35a5c40927e96a3479f61e017e3e06129cd Basis Theory will be performing a scheduled system upgrade on Saturday, October 29, 2022 from 1300 UTC to 1600 UTC. We do not anticipate any outage or service interruption as a part of this upgrade.
Intermittent connection issues https://status.basistheory.com/incident/128191 Thu, 20 Oct 2022 21:12:00 -0000 https://status.basistheory.com/incident/128191#25da18c8edf93a140370ddfb6ab0bba63753c0dc7c4011a63522c5daa9d7f7e1 # Problem Description, Impact, and Resolution At 20:05 UTC on October 20, 2022 we observed 500 errors when creating Tokens in all regions, which resulted in intermittent failures for some customers creating and detokenizing tokens. The issue was caused by a manual configuration change during a deployment of new production infrastructure. At 20:30 UTC we identified the manual configuration change as the cause, and the issue was fully resolved at 20:35 UTC. # Mitigation Steps and Future Preventative Measures We have updated our checklist for manual changes to production systems to prevent this type of configuration error and ensure that a similar issue does not occur in the future. Intermittent connection issues https://status.basistheory.com/incident/128191 Thu, 20 Oct 2022 20:34:00 -0000 https://status.basistheory.com/incident/128191#88b60501eeeecce69e68a2f45e3454020d28457773e0e54a78c272b65b369765 We saw intermittent connection issues that impacted creating Tokens. The issue has now been resolved. 7 Sept 2022 - RCA https://status.basistheory.com/incident/116207 Wed, 07 Sep 2022 19:24:00 -0000 https://status.basistheory.com/incident/116207#fa9cbbec872d6527e552bde71373730efbbf0a061ab4b8c5cae373318d480d35 Problem Description, Impact, and Resolution At 16:25 UTC on September 7, 2022 we observed intermittent issues in the API and Elements services as well as in the Portal in parts of the US, which resulted in some requests from customer applications being unable to access the service in those regions. The issue was caused by an incident with a provider for our CDN. We pushed a fix to route around the provider for the API and Elements at 17:22 UTC and saw that the issue was fully resolved for the API, though there continued to be intermittent issues with Elements. We made an additional change to the Elements service to route around the issue, and it was fully resolved at 18:35 UTC. Additional changes to the Portal were made to fully resolve the incident at 19:11 UTC. Mitigation Steps and Future Preventative Measures To ensure this issue does not occur again we have updated our runbook for routing around this service to ensure it can be quickly resolved for the Elements service and the Admin Portal. We will also be reviewing our dependence on this service.
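One common way to implement the "route around the provider" step referenced in this runbook is a DNS change that repoints the affected hostnames to an alternate path, kept fast by low TTLs. The sketch below illustrates that general pattern; the DNS provider API, hostnames, and environment variable are hypothetical placeholders rather than Basis Theory's actual tooling.

```typescript
// Illustrative runbook step: repoint a public hostname from a degraded CDN provider to an
// origin endpoint. The DNS provider API, endpoint, payload, and DNS_API_TOKEN variable are
// hypothetical placeholders, not Basis Theory's actual tooling.

interface RecordUpdate {
  name: string;    // hostname to repoint, e.g. "elements.example.com"
  type: "CNAME";
  content: string; // where the hostname should now resolve
  ttl: number;     // keep low so the swap (and the later swap back) propagates quickly
}

async function routeAroundCdn(update: RecordUpdate): Promise<void> {
  const response = await fetch("https://dns.example-provider.test/v1/records", {
    method: "PUT",
    headers: {
      Authorization: `Bearer ${process.env.DNS_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(update),
  });
  if (!response.ok) {
    throw new Error(`DNS update failed with status ${response.status}`);
  }
}

// Example: send Elements traffic directly to an origin host instead of the CDN.
routeAroundCdn({
  name: "elements.example.com",
  type: "CNAME",
  content: "origin.example.com",
  ttl: 60,
}).catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Keeping a change like this scripted, with TTLs lowered ahead of time, is what allows a runbook step of this kind to be executed in minutes during an incident.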
Intermittent connection issues https://status.basistheory.com/incident/116186 Wed, 07 Sep 2022 19:09:00 -0000 https://status.basistheory.com/incident/116186#597735dcb9626c183b640ace51155aba558b35c897cf28089fab247f84d676a7 Our cloud service provider was experiencing intermittent issues at 4:25 PM UTC. We have restored all services at 7:00 PM UTC. We will follow up with a post-mortem in the next 24 hours on status.basistheory.com. Intermittent connection issues https://status.basistheory.com/incident/116186 Wed, 07 Sep 2022 18:36:00 -0000 https://status.basistheory.com/incident/116186#8c482a64a3dfb3ebac85cd4d70044f494818d1b5d01f34a1b2a43c2d29f5190b Elements connectivity issues have been resolved.
Intermittent connection issues https://status.basistheory.com/incident/116186 Wed, 07 Sep 2022 17:49:00 -0000 https://status.basistheory.com/incident/116186#62b49bff3176a8c31885d661dd12b04f6ac282a547c44db4f99d16a7acf4f03f The fix for Elements connectivity only partially resolved the underlying connectivity issues, and we are still seeing intermittent regional issues. Intermittent connection issues https://status.basistheory.com/incident/116186 Wed, 07 Sep 2022 16:53:00 -0000 https://status.basistheory.com/incident/116186#471c3aacb1f5c622432df2cdfbe8299eb5199e631ff120b9dc60fe5588b777db We have resolved issues with the API and Elements but are still seeing connectivity issues with the Web Portal and are working with a downstream service provider. 8 June 2022 - RCA https://status.basistheory.com/incident/93727 Thu, 09 Jun 2022 19:03:00 -0000 https://status.basistheory.com/incident/93727#494336e7f9b2cb09bfb531e34a4a9a9597682f490ef8dec3177d87048d0297aa Problem Description, Impact, and Resolution At 6:10 PM UTC on June 8, 2022 we were alerted that requests to all services had begun failing: requests were not resolving the Basis Theory hostnames, producing DNS resolution errors for all connections to Basis Theory services. A global DNS outage at our service provider caused the incident. While the incident was ongoing, we were unable to route traffic around the affected service, and the provider resolved the incident before we were able to implement the mitigation. The service was restored in North America at 6:56 PM UTC and globally at 7:01 PM UTC. Mitigation Steps and Future Preventative Measures To ensure this issue does not occur again we have created a process and runbook to route around this service and limit the impact on customers.
By the end of Q2 2022, we will have a long-term plan to eliminate these service impacts, with implementation to be completed by the end of Q3 2022. Issue resolved https://status.basistheory.com/incident/93465 Wed, 08 Jun 2022 19:30:00 -0000 https://status.basistheory.com/incident/93465#48ebf799a6a232f103455bb4f324d47dbfe7e48446306bb5ca22ef8483b515b9 Our cloud service provider experienced a global DNS outage at 6:10 PM UTC. Services were restored in North America at 6:56 PM UTC and globally at 7:01 PM UTC. • Basis Theory API • Basis Theory Portal • Basis Theory Elements We will follow up with a post-mortem in the next 24 hours on status.basistheory.com.
Vendor unavailability https://status.basistheory.com/incident/93452 Wed, 08 Jun 2022 18:40:00 -0000 https://status.basistheory.com/incident/93452#81a45a54d583f8aaa17a4f1da62c910258f4b0d67cfca04df0ba64c910a4467a At 6:39 PM UTC, Basis Theory’s cloud service provider experienced a global outage, rendering the following services unresponsive and unavailable to customers. • Basis Theory API • Basis Theory Portal • Basis Theory Elements Basis Theory will post updates on status.basistheory.com every 15 minutes.