2020 SUG meeting week #48: Uplift update

Time Remains, October 2020 – blog post

The following notes were taken from the Tuesday, November 24th Simulator User Group meeting.

Server Updates

Please also refer to the week’s server deployment thread.

  • On Tuesday, November 24th, the RC channels were updated to simulator version 552571, comprising “some bug fixes and internal tweaks”. One of these fixes should resolve the inability to set permissions (e.g. edit rights) for friends.
  • There is no planned SLS Main channel deployment for week #48.

Week #49

If all goes to plan at the Lab, week #49 (commencing Monday, November 30th) will see a daily series of rolling restarts across the grid. Starting on the Monday, regions will be restarted in batches, with the restarts staggered to avoid too much disruption. If everything proceeds smoothly, each region should only be restarted once during the week.

Commenting on the restarts, Mazidox Linden stated:

I think it will be something like at most 14-16 hours between rolls. Though as Rider says unless something changes we don’t plan to roll the same region in two successive rolls. And we’re going to do our best to avoid times with tons of people on-line if we can.

On that basis, the rolls are liable to occur at intervals of up to 16 hours.

Uplift Status

As per my blog post from week #48 (see: LL confirms Second Life regions now all on AWS), whilst all regions are now running on AWS services, the work in transitioning all of the Second Life back-end services is not complete, and LL are still “operating with one foot in either camp” – and this may be exacerbating the problems currently being experienced by some.

Another factor could be the different communications routes between viewers and servers following the move from operating out of the Lab’s co-lo facility in Arizona to the Amazon centre in Oregon. For some, this has definitely resulted in a noticeable increase in basic ping times to / from the servers, although for others, this has barely changed.

Commenting on the general state of play, Mazidox Linden observed:

We are not yet at what I would call “Final uplift performance” (that is to say, without any explicit attempts at tuning performance and behaviour of system communication). There is still plenty of stuff making that round trip over hundreds of miles to the data centre, slowing things down.

Some of the issues people are noticing at the moment may therefore be down to the fact that LL haven’t as yet started fine-tuning things, and are unlikely to do so until all services are running via AWS. In this respect, Oz Linden noted:

We’re much more focused right now on getting things other than the simulators uplifted and fixing anything that really breaks. Performance problems are a step down in priority until that’s done, but we won’t forget about them.

SL Viewer

The start of the week has seen no change to the current crop of official viewers, leaving them as follows:

  • Current release viewer version, formerly Cachaça Maintenance RC viewer promoted on November 12 – No change
  • Release channel cohorts:
    • Custom Key Mappings project viewer, version, November 12.
  • Project viewers:
    • Simple Cache project viewer, version, issued on November 12.
    • Project Jelly project viewer (Jellydoll updates), version, November 2.
    • Legacy Profiles viewer, version, October 26.
    • Copy / Paste viewer, version, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version, November 22, 2019.
    • 360 Snapshot project viewer, version, July 16, 2019.

In Brief

  • Group Chat: there has been an update to the Group chat servers, which the Lab hopes will help alleviate the issues of the last couple of months. Things should be somewhat better as a result, although it is acknowledged they are “not perfectly solid” as yet.
  • Map Tiles: there is a known issue with in-world Map tiles failing to update. At the time of writing, there was no ETA on when a fix will be implemented.
  • Teleports: people are still reporting teleport failures, although data collected by the Lab using additional logging apparently shows the overall level of teleport failures as being back to “normal” after the recent spike.
  • Textures: people are reporting slower than usual texture loading. Why isn’t currently understood (given textures are among the data coming to users via the CDN, and so are not directly a part of the AWS transition). The speculation offered by Oz Linden is that texture messaging may not be going as fast as LL would like.
  • Scripts: there have also been reports of some regions initially showing improved script performance, only to apparently drop back to “pre-uplift” levels of processing. Commenting on this, Mazidox Linden stated:
If you’re seeing changes to scripts run the likely explanation is that there is contention for shared system resources beneath the simulator layer. That is something we had only mild control over before and have even less control over now. That said, it’s on our radar. I can’t promise that even when someone gets time to look at it there will be anything we can *do*, but we are aware, and we’re not ignoring it. …
I mean, it is almost possible that we’re calculating that number wrong Lucia, because we have certainly changed the hardware the simulators are running on in ways that the people who made that statistic probably never imagined. I’m not going to swear that is or isn’t happening, but it will certainly be one of the many things we look at.