2020 SL project updates week #14: TPVD summary

The Muse – The Library, February 2020 – blog post

The following notes are taken from the TPV Developer meeting held on Friday, April 3rd, 2020. These meetings are generally held every other week, unless otherwise noted in any given summary. The embedded video is provided by Pantera – my thanks to her for recording and providing it. Time stamps included with the notes will open the video at the point(s) where a specific topic is discussed.

This was a short meeting – less than 20 minutes.

SL Viewer News

[0:13-2:43]

There were no viewer updates in week #14, leaving the official viewer pipelines as follows:

  • Current Release version 6.3.8.538264, dated March 12, promoted March 18th. Formerly the Premium RC viewer – No change.
  • Release channel cohorts:
    • Camera Presets RC viewer, version 6.3.9.538729, March 25.
    • Love Me Render RC viewer, version 6.3.9.538760, March 25.
    • EEP RC viewer updated to version 6.4.0.538823, March 20.
    • Zirbenz Maintenance RC viewer, version 6.3.9.538719, issued March 19.
  • Project viewers:
    • Copy / Paste viewer, version 6.3.5.533365, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version 6.4.0.532999, November 22, 2019.
    • Legacy Profiles viewer, version 6.3.2.530836, September 17, 2019. Covers the re-integration of Viewer Profiles.
    • 360 Snapshot project viewer, version 6.2.4.529111, July 16, 2019.

General Viewer Notes

  • EEP is now extremely close to release. The hope is to have the final RC version available for users in week #15 (commencing Monday, April 6th).
    • Provided no major issues are encountered with that version, and allowing for it to gain sufficient user hours as an RC cohort, it will then be promoted to de facto release status.
    • This viewer still has one of the lowest crash rates of any official viewer, described as being “dramatically lower” than that of the current release viewer.
  • The Love Me Render (LMR) and Camera Presets RC viewers are both getting close to the point where they could be released at some time after EEP.
  • Tools update (Visual Studio 2017 and a more recent version of Xcode): the first full viewer build is ready to be issued, so an RC could be appearing in week #15. If so, it may be fast-tracked to release status behind EEP and ahead of other RCs.
  • There is still work to be done on the Copy / Paste and Legacy Profiles viewers to get them up to RC status.
  • Work is also continuing on the mesh uploader viewer, a version of which has yet to be made available to users as a compiled viewer.

Server / Simulator News

[3:44-6:42]

  • The server team believe they have fixes for the issue of off-line inventory offers from objects being lost (see: BUG-227179 “All offline inventory offers from scripted objects are STILL lost”).
  • These fixes should be going to a simulator RC release in week #15, and no viewer-side updates are required for either of the fixes (UDP and HTTP).
  • TPVs have been asked to confirm the HTTP fix works, and if so, to switch to that mechanism (if they have not already done so), rather than continuing to rely on UDP messaging for off-line inventory offers, so that path can be deprecated.
  • Details on where the fixes can be tested will be made available to TPVs through the Open-Source Dev mailing list.
  • Apologies have been offered for the time it took LL to find and fix the underlying causes.

Lab Gab 20 summary: Second Life cloud uplift & more

via Linden Lab

The 20th edition of Lab Gab live streamed on Friday, April 3rd, featuring Oz Linden, the Lab’s Vice President of Engineering and a member of the company’s management team, and April Linden, the Lab’s Systems Engineering Manager. They appeared primarily to discuss the work of transitioning Second Life to commercial cloud environments. Ekim Linden had also been scheduled to appear, but was unable to do so.

The official video of the segment is available via YouTube, and is embedded at the end of this article. The following is a summary of the key topics discussed and responses to questions asked. Note that the first half of the video is related to the cloud uplift, and the second half to broader engineering-related questions.

April Linden has some 20 years of experience in systems engineering, and is genuinely passionate about Second Life. She first became involved in the platform in 2006 as a resident (and is still extremely active as a resident). She joined the Lab in 2013. She worked within the systems engineering team, and was promoted to her current position of Systems Engineering Manager, Operations, some 18 months ago. For her, the great attraction of the platform has been, and remains, the empowerment it gives people to express themselves positively.

Oz Linden joined the company in 2010 specifically to take on the role of managing the open-source aspects of the Second Life viewer and managing the relationship with third-party viewers, a role that fully engaged him during his first two years at the Lab. His role then started expanding to encompass more and more of the engineering side of Second Life, leading to his current senior position within the company.

Both are genuinely passionate and enthusiastic about Second Life and its users.

The bunny and the wizard who bring us Second Life: April Linden (Systems Engineering Manager, Operations) and Oz Linden (Vice President, Second Life Engineering)

The Cloud Uplift

What is It?

[5:40-9:45]

  • Cloud Uplift is the term Linden Lab use for transitioning all of Second Life’s server-based operations and services from their own proprietary systems and services housed within a single co-location data centre in Tucson, Arizona, to commercial cloud services provided by Amazon Web Services (AWS) and Google.
  • The process of moving individual services to the cloud is called “lift and shift”: taking each element of software, making the required adjustments so it can run within a cloud computing environment, then relocating it to cloud infrastructure and hardware in a manner that avoids disruptions that may impact users, so that it continues to run exactly as it did prior to the transfer.
  • The current plan is to have all of this work – up to and including moving all of the SL region simulators – to cloud services by the end of 2020.
  • Numerous services have been transitioned to date.
    • The Lab generally prefers not to discuss which specific services have been moved, to prevent users seeing the move as a placebo reason for issues they may be encountering, thus biasing their bug reports.
    • However, one service that is known to have moved is the inventory (asset) database, so that all users’ inventories are obtained via the cloud, and not from a dedicated asset cluster within the Lab’s co-lo facility.
  • With the services that have moved, the Lab has seen noticeable improvements in performance, partially as a result of cloud services using more recent / more powerful hardware configurations than the Lab can run without making a major new capital expenditure in equipment (which the uplift is intended to avoid).
  • A practical advantage of cloud operations is the ability for LL to scale services to meet demand. The recent increase in users logging-in to SL, for example, placed a strain on the services that feed the CDNs that in turn deliver the majority of asset data to users (mesh data, textures, sounds, gestures, clothing, etc.). These services were then able to dynamically scale to an increased number of nodes to handle the load, something LL would not have been able to do without first sourcing, installing and configuring the required hardware.
Oz and April with Strawberry Linden (c)

What Improvements Might Users See from the Uplift?

[9:48-14:42]

  • Between now and the end of 2020, no appreciable difference should be observable to users.
  • The move is initially being made to a single AWS centre, so things like ping times to regions (once they are moved) shouldn’t change.
  • In terms of reducing simulator-side lag, the answer is unclear, as simulators have yet to be tested – this is due to start with simulators internal to the Lab Soon™. This will enable the Lab to begin to get real numbers in terms of simulator performance.
    • It is believed that simply moving simulators to the more recent, more powerful hardware used by cloud services should on its own result in a modest improvement in simulator performance.
    • That said, the outcome of performance adjustments in distributed environments is “really, really hard to predict”.
  • Longer-term, as the Lab is able to start exploiting the advantages of being in the cloud, there is confidence performance will improve in various areas.
    • For example, if simulators can be distributed in accordance with the geographical locations of their primary audiences (e.g. simulators that tend to get the majority of their audience from South America being located in South America), then this could reduce network time in connecting to them for those audiences, and so help boost performance as seen by those users.
    • While this is a longer-term goal for the cloud migration (it’s not going to be there from “day 1”), it is a part of the motivation to make the transition.

How will the Lab Handle Costs?

[14:45-18:40]

Sidebar note: cloud services typically bill based on demand and usage. This has given rise in some quarters to concerns / beliefs that LL could find themselves facing unexpected large bills for hosting.

  • There are two answers: the first is that nothing is ever certain.
  • The second is, the Lab, with April and Ekim in particular leading the effort, put a lot of work into modelling their likely operations and costs when using cloud services and infrastructure.
    • This work involved a lot of assumptions on how LL anticipated their costs would look based on how they planned to operate SL in the cloud.
    • This model was then put to both AWS and to an independent, outside consultancy with expertise in advising clients on the use of cloud-based service provisioning, both of whom gave positive feedback on the approach the Lab would be taking and the likely costs involved.
  • Further, SL isn’t a service that dynamically expands under use. All of its services operate 24/7, so the costs can be readily calculated and are pretty much consistent; therefore, the dynamic surges that can lead to high service bills don’t actually apply.
  • While there are some back-end services that can leverage dynamic hardware use in times of heavy load, these are in the minority (all of SL’s back-end services account for only 15% of its server fleet), so again, dynamic increases in hardware use for those services that can leverage it are not going to be massively excessive.
  • As such, and allowing for answer (1), the Lab isn’t overly concerned about costs spiralling.

Will There Be Cost Savings that Can Be Passed to Users?

[18:41-19:54]

  • Unfortunately, the engineering teams are not responsible for determining fees charged to users.
  • More practically, it is not going to be possible to make any informed judgements on costs to users until the Lab has had the opportunity to see how actual operating costs compare with their predicted cost model.
  • Further, it is not anticipated that any cost savings will be made in the first 1-2 years of cloud uplift, so any decisions on if and where to reduce costs to users won’t be made for a while to come, and those involved in making such decisions are not in the engineering teams.