Dear TeleSign Customers and Partners,
Below you will find the Root Cause Analysis for the incident that occurred on Monday, October 7th, 2019.
Original Reported Subject: TeleSign Delivery Delay
Date: Monday, October 7th, 2019
Start Time: 07:50 AM PDT
End Time: 07:58 AM PDT
On Monday, October 7th, customers experienced API response latency or HTTP errors for all products. The incident started at 07:50 AM PDT and lasted until 07:58 AM PDT. During this time, impacted customers may have received any of the following HTTP errors: 404, 500, 503, and 504.
Root Cause Analysis:
A hardware failure in one data center triggered the expected failover mechanism; however, latency persisted within the API workflow and eventually resulted in HTTP errors. TeleSign noticed the latency and, as an additional measure, moved all traffic away from the impacted data center at 07:50 AM PDT. Customers honoring TeleSign’s TTL recommendations then stopped experiencing latency and HTTP errors at 07:58 AM PDT. TeleSign’s network team did not move traffic back to the impacted data center until the hardware was replaced and confirmed operational.
To minimize the risk of this issue recurring, TeleSign’s Tech OPS team has taken the following actions:
• The failed DIMM was replaced and confirmed fully operational at 02:29 PM PDT
• The incident has been escalated to the hardware vendor for further review and analysis of the firmware failover mechanism
We apologize for the inconvenience this may have caused you. Should you have any questions, please don’t hesitate to contact us at email@example.com