April offers a look at the October 2019 woes

From Thursday, October 24th through Sunday, October 27th, 2019, Second Life encountered a rolling set of issues which finally came to a head on the Sunday. The issues affected many Second Life users and services, from logging in through to inventory / asset handling.

As has become the case with these matters, April Linden, the Second Life Operations Manager, has provided a post-mortem blog post on the issue and her team’s work in addressing the problems. And as always, her post provides insight into the complexities of keeping a platform such as Second Life running.

In short, the root cause of the weekend’s upsets lay not with any of the Second Life services but with one of the Lab’s network providers. Matters were exacerbated by the fact that the first couple of times the problem occurred, on Thursday and Friday, it appeared to correct itself before the Lab could fully identify the root cause.

April Linden

On Sunday, the problems started up again, but fortunately April’s team were able to pin down the issue and commence work with their provider – which meant that getting Second Life back on an even keel was largely in the hands of a third party rather than being fully under the Lab’s control.

Our stuff was (and still is) working just fine, but we were getting intermittent errors and delays on traffic that was routed through one of our providers. We quickly opened a ticket with the network provider and started engaging with them. That’s never a fun thing to do because these are times when we’re waiting on hold on the phone with a vendor while Second Life isn’t running as well as it usually does.

After several hours trying to troubleshoot with the vendor, we decided to swing a bigger hammer and adjust our Internet routing. It took a few attempts, but we finally got it, and we were able to route around the problematic network. We’re still trying to troubleshoot with the vendor, but Second Life is back to normal again.

– Extract from April Linden’s blog post

As a result of the problems, April’s team is working on moving some of the Lab’s services to make Second Life more resilient to similar incidents.

During the issues, some speculated whether the problems were a result of the power outages being experienced in California at the time. As April notes, this was not the case – while Linden Lab’s head office is in San Francisco, the core servers and services are located in Arizona. However, efforts to resolve the issues from California were affected by the outages, again as April notes in her post.

It’s something I’ve noted before, and will likely state again: feedback like this from April, laying out what happened when SL encounters problems, is always an educational / invaluable read, not only explaining the issue itself, but also providing worthwhile insight into the complexities of Second Life.