Some of you may be familiar with the Butterfly Effect, a theory that a small change in one thing can have a dramatic impact on later outcomes. The metaphorical example of this is that a tornado was formed, in part, because a butterfly flapped its wings weeks earlier.
That same metaphorical chain of cause and effect could be applied to March 15, 2018, when a chain of events originating from a wireless carrier’s failed network switch affected nationwide message delivery.
11:20 am: In the Eye of the Storm
It all started a little after 11:20 in the morning, Central Time, when our customer care center started to receive several reports that subscribers were receiving multiple copies of the same text message(s) to their phones.
Our customer care team members quickly reviewed the message logs and confirmed that our platform had generated only a single copy of each message for delivery. At the same time, a technical operations team member checked the log of actual deliveries to the carrier and verified that each message was attempted and confirmed just one time.
With these initial checks completed in about 10 minutes, we noticed that all of the affected phone numbers were with the same carrier. At that point, our technical operations team opened a ticket with the carrier to report the issue.
11:55 am: Going to the Source
The carrier was not aware of the issue, and confirmed that Vibes was the first provider to report it. Within a few minutes, the carrier’s triage team confirmed that they were receiving one copy of the message from Vibes and that they also successfully forwarded it one time to their network service for delivery. They escalated it to their network team for further investigation.
We continued to work with the carrier’s support and network teams. After about 20 minutes, the carrier’s team was able to confirm for a few of the initial phone numbers that the messages were being sent to the phone and not acknowledged, meaning that delivery would be attempted again and again.
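To make that failure mode concrete, here is a minimal sketch of how a retry loop turns a lost acknowledgment into duplicate texts. It is only an illustration: the function name, the callback, and the cap of five attempts are our own assumptions, not the carrier's actual retry logic.

```python
def copies_received(ack_reaches_network, max_attempts=5):
    """Count how many copies of one message a handset ends up with.

    ack_reaches_network(attempt) -> bool is a hypothetical callback that says
    whether the handset's acknowledgment for that attempt gets back to the
    network. max_attempts is an assumed cap, not a real carrier setting.
    """
    copies = 0
    for attempt in range(1, max_attempts + 1):
        copies += 1  # the handset receives this delivery attempt
        if ack_reaches_network(attempt):
            break  # acknowledged, so the network stops retrying
    return copies


healthy = copies_received(lambda attempt: True)   # acknowledgment arrives
faulty = copies_received(lambda attempt: False)   # acknowledgment never arrives
print(healthy, faulty)  # prints "1 5"
```

In both cases the sender handed over exactly one message, which is why our logs and the carrier's intake logs each showed a single copy.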
At that point, the carrier had confirmed the problem, but they could not yet determine its scope: where it was occurring or how many people were affected. Meanwhile, Vibes’ customer care team continued to receive reports that subscribers were receiving duplicate messages, with reports ranging from Texas, across the South, and into parts of the East Coast. Given that information, our business operations team decided to pause delivery of messages for this carrier.
12:00 pm: Lightning-Fast Response Team
The Vibes Mobile Engagement Platform allows us to change, pause, and configure routing on the fly, and pausing Mobile Terminated message delivery for one carrier is a proverbial flip of the switch. With delivery paused, the platform continued to accept and queue messages for eventual delivery once the service was re-enabled.
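For readers who want to picture the mechanics, the pause-and-queue behavior can be sketched roughly as follows. This is not the Vibes platform's code; the class, its method names, and the carrier labels are hypothetical, and a real system would persist the queue rather than hold it in memory.

```python
from collections import defaultdict, deque


class CarrierRouter:
    """Hypothetical per-carrier router: paused carriers queue instead of sending."""

    def __init__(self, send):
        self._send = send                  # callable(carrier, message)
        self._paused = set()               # carriers with delivery paused
        self._queues = defaultdict(deque)  # carrier -> FIFO backlog

    def pause(self, carrier):
        """Stop delivery for one carrier; other carriers are unaffected."""
        self._paused.add(carrier)

    def submit(self, carrier, message):
        """Accept a message; deliver it now, or queue it if the carrier is paused."""
        if carrier in self._paused:
            self._queues[carrier].append(message)
        else:
            self._send(carrier, message)

    def resume(self, carrier):
        """Re-enable delivery and drain the backlog in arrival order."""
        self._paused.discard(carrier)
        backlog = self._queues.pop(carrier, deque())
        while backlog:
            self._send(carrier, backlog.popleft())


delivered = []
router = CarrierRouter(lambda carrier, message: delivered.append((carrier, message)))
router.pause("carrier-a")
router.submit("carrier-a", "msg 1")  # queued
router.submit("carrier-b", "msg 2")  # delivered immediately
router.submit("carrier-a", "msg 3")  # queued
router.resume("carrier-a")           # backlog drains in order
print(delivered)
```

The point of the design is that pausing is scoped to a single carrier and nothing is dropped: messages keep being accepted and go out in their original order once delivery resumes.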
The carrier’s operations team continued to investigate and drill further into the issue. They were eventually able to isolate the affected subscribers to a single network switch in the suburbs of Houston, Texas. The mobile directory numbers reported from other areas turned out to belong to travelers who were in that same area.
Given that information, the Vibes team decided to re-enable message delivery and resume message flow. Again, this was a simple change on the platform, and the queued messages were quickly delivered until the backlog was drained and normal processing resumed. Later, the carrier confirmed that the issue with the switch was resolved and that no more consumers were impacted, and we communicated that to our customers.
The Benefit of Working with a Storm Chaser Like Vibes
As a provider, it can be difficult to determine exactly what is going on, since you sit in the middle of the messaging chain and don’t have direct visibility into either the carrier (that metaphorical butterfly) or consumer (tornado) ends of the chain. All you can do is work from the information that’s being reported and try to make sense of it as best you can.
It would have been great if we (or any other provider) could have identified and isolated the specific phone numbers experiencing the issue, but since it was at the wireless switch, there was no way to reliably determine that.
At the end of the day, having an available and dedicated customer support framework, a highly skilled and responsive investigative team, a flexible platform that gives us operational control, and strong relationships with our carrier providers allowed us to identify, triage, escalate, and mitigate the issue quickly and efficiently.
I don’t know if there’s another way to detect a butterfly flapping its wings, but if there is, I wouldn’t trade Vibes’ platform for it, because with the platform we’re able to avoid the impacts of the tornado, no matter where in the chain they occur.
About the Author
Steven Mastandrea is the Sr. Director of Software Engineering at Vibes. He has 18 years of professional experience building highly scalable software, recruiting agile teams, establishing engineering processes and crafting technology solutions.