System Status


This page offers API uptime information for Telesign services, as well as incident history reports and service issue alerts.


SMS Delivery Delay - Resolved
Incident Report for Telesign system status
Below you will find the Root Cause Analysis for the incident that occurred on Saturday, October 5th, 2019.

Original Reported Subject: TeleSign Delivery Delay
Date: Saturday, October 5th, 2019
Start Time: 10:52 AM PDT
End Time: 11:20 AM PDT


Summary:

As part of TeleSign’s continuous work to scale our platform and improve performance, our Network Operations Team has been upgrading our Elasticsearch infrastructure. This upgrade was released to production two weeks before the incident, and it introduced a new ELK feature that puts indices into read-only mode once a disk watermark threshold is exceeded. As a result, logs queued up on a few virtual machines in one of our data centers, slowing down SMS processing there and delaying a portion of SMS messages. As soon as the delay was detected, engineers worked to unlock the affected index and restore normal service. No further SMS delays were detected after 11:20 AM PDT.
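
For readers unfamiliar with the mechanism: Elasticsearch compares each node's disk usage against three watermarks, and once the flood-stage watermark is exceeded it applies an `index.blocks.read_only_allow_delete` block to every index that has a shard on that node. The minimal Python sketch below models that behavior using Elasticsearch's documented default thresholds (85% low, 90% high, 95% flood stage). TeleSign's actual cluster settings are not stated in this report, and the node and index names in the example are hypothetical. The sketch also shows the settings body that clears the block once disk space has been freed.

```python
import json

# Elasticsearch's documented default disk watermarks, as percent of disk used.
# Assumption: the affected cluster used these defaults; the report does not say.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0
FLOOD_STAGE = 95.0


def watermark_stage(used_percent: float) -> str:
    """Return the highest watermark a node's disk usage has exceeded."""
    if used_percent > FLOOD_STAGE:
        return "flood_stage"
    if used_percent > HIGH_WATERMARK:
        return "high"
    if used_percent > LOW_WATERMARK:
        return "low"
    return "ok"


def indices_to_block(disk_used: dict[str, float],
                     shard_nodes: dict[str, set[str]]) -> set[str]:
    """Return indices with at least one shard on a node past the flood stage.

    These are the indices that would receive the read-only block.
    """
    flooded = {node for node, used in disk_used.items()
               if watermark_stage(used) == "flood_stage"}
    return {index for index, nodes in shard_nodes.items() if nodes & flooded}


# Settings body that removes the block by setting it to null, for use with
# the index settings API once disk usage is back under the watermarks.
RELEASE_BLOCK = json.dumps({"index.blocks.read_only_allow_delete": None})

if __name__ == "__main__":
    # Hypothetical nodes and indices, for illustration only.
    disk_used = {"node-a": 96.2, "node-b": 71.0}
    shard_nodes = {"logs-sms-2019.10.05": {"node-a", "node-b"},
                   "logs-api-2019.10.05": {"node-b"}}
    print(indices_to_block(disk_used, shard_nodes))
    print(RELEASE_BLOCK)
```

In Elasticsearch 7.4 and later the block is released automatically once disk usage drops below the high watermark; on earlier versions it has to be cleared manually, for example by sending the body above in a `PUT /<index>/_settings` request.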


Preventive measures:

To prevent similar incidents in the future, TeleSign’s Operations and Development Teams are working diligently on the following remediation actions:
*Adding storage to the affected nodes to increase the available disk space.
*Reviewing the Logstash setup on the new ELK cluster to improve its performance and efficiency.
*Updating alerting thresholds for the new ELK nodes so that the team can respond more quickly to any future events.


We apologize for the inconvenience this may have caused you. Should you have any questions, please don’t hesitate to let us know.
Posted Oct 05, 2019 - 11:20 PDT