The 20th edition of Lab Gab live streamed on Friday, April 3rd, featuring Oz Linden, the Lab’s Vice President of Engineering and a member of the company’s management team, and April Linden, the Lab’s Systems Engineering Manager. They were appearing to primarily discuss the work in transitioning Second Life to commercial cloud environments. Ekim Linden had also been scheduled to appear, but was unable to do so.
The official video of the segment is available via YouTube, and is embedded at the end of this article. The following is a summary of the key topics discussed and responses to questions asked. Note that the first half of the video is related to the cloud uplift, and the second half to broader engineering-related questions.
April Linden has some 20 years of experience in systems engineering, and is genuinely passionate about Second Life. She first became involved in the platform in 2006 as a resident (and is still extremely active as a resident). She joined the Lab in 2013. She worked within the systems engineering team, and was promoted to her current position of Systems Engineering Manager, Operations, some 18 months ago. For her, the great attraction of the platform has been, and remains, the empowerment it gives people to express themselves positively.
Oz Linden joined the company in 2010 specifically to take on the role of managing the open-source aspects of the Second Life viewer and managing the relationship with third-party viewers, a role that fully engaged him during the first two years of his time at the Lab. His role then started expanding to encompass more and more of the engineering side of Second Life, leading to his current senior position within the company.
Both are genuinely passionate and enthusiastic about Second Life and its users.
The Cloud Uplift
What is It?
- Cloud Uplift is the term Linden Lab use for transitioning all of Second Life’s server-based operations and services from their own proprietary systems and services housed within a single co-location data centre in Tucson, Arizona, to commercial cloud services provided by Amazon Web Services (AWS) and Google.
- The process of moving individual services to the cloud is called “lift and shift”: take each element of software, make the adjustments required for it to run within a cloud computing environment, then relocate it to cloud infrastructure and hardware in a manner that avoids disruptions that may impact users, so that it continues to run exactly as it did prior to the transfer.
- The current plan is to have all of this work – up to and including moving all of the SL region simulators to cloud services – completed by the end of 2020.
- Numerous services have been transitioned to date.
- The Lab generally prefers not to discuss which specific services have been moved, to avoid users assuming the move is the cause of any issues they may be encountering, and so biasing their bug reports.
- However, one service that is known to have moved is the inventory (asset) database, so that all users’ inventories are obtained via the cloud, and not from a dedicated asset cluster within the Lab’s co-lo facility.
- With the services that have moved, the Lab has seen noticeable improvements in performance, partially as a result of cloud services using more recent / more powerful hardware configurations than the Lab can run without making a major new capital expenditure in equipment (which the uplift is intended to avoid).
- A practical advantage of cloud operations is the ability for LL to scale services to meet demand. The recent increase in users logging in to SL, for example, placed a strain on the services that feed the CDNs that in turn deliver the majority of asset data to users (mesh data, textures, sounds, gestures, clothing, etc.). These services were then able to dynamically scale to an increased number of nodes to handle the load, something LL would not have been able to do without first sourcing, installing and configuring the required hardware.
What Improvements Might Users See from the Uplift?
- Between now and the end of 2020, no appreciable difference should be observable to users.
- The move is initially being made to a single AWS centre, so things like ping times to regions (once they are moved) shouldn’t change.
- In terms of reducing simulator-side lag, the answer is unclear, as simulators have yet to be tested – this is due to start with simulators internal to the Lab Soon™. This will enable the Lab to begin to get real numbers in terms of simulator performance.
- It is believed that simply moving simulators to the more recent, more powerful hardware used by cloud services should on its own result in a modest improvement in simulator performance.
- That said, the outcome of performance adjustments in distributed environments is “really, really hard to predict”.
- Longer-term, as the Lab is able to start exploiting the advantages of being in the cloud, there is confidence performance will improve in various areas.
- For example, if simulators can be distributed in accordance with the geographical locations of their primary audiences (e.g. simulators that tend to get the majority of their audience from South America being located in South America), then this could reduce network time in connecting to them for those audiences, and so help boost performance as seen by those users.
- While this is a longer-term goal for the cloud migration (it’s not going to be there from “day 1”), it is a part of the motivation to make the transition.
How will the Lab Handle Costs?
Sidebar note: cloud services typically bill based on demand and usage. This has given rise in some quarters to concerns / beliefs that LL could find themselves facing unexpected large bills for hosting.
- Two answers: the first is nothing is ever certain.
- The second is, the Lab, with April and Ekim in particular leading the effort, put a lot of work into modelling their likely operations and costs when using cloud services and infrastructure.
- This work involved a lot of assumptions on how LL anticipated their costs would look based on how they planned to operate SL in the cloud.
- This model was then put to both AWS and to an independent, outside consultancy with expertise in advising clients on the use of cloud-based service provisioning, both of whom gave positive feedback on the approach the Lab would be taking and the likely costs involved.
- Further, SL isn’t a service that dynamically expands under use. All of its services are operating 24/7, so the costs can be readily calculated and are pretty much consistent; the dynamic surges that can lead to high service bills therefore don’t really apply.
- While there are some back-end services that can leverage dynamic hardware use in times of heavy load, these are in the minority (all of SL’s back-end services account for only 15% of its server fleet), so again, dynamic increases in hardware use for those services that can leverage it are not going to be massively excessive.
- As such, and allowing for answer (1), the Lab isn’t overly concerned about costs spiralling.
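The reasoning about surges can be illustrated with a little arithmetic. The following is only a rough sketch, not the Lab's cost model: it assumes (purely for illustration) that every server costs the same and that the bill is proportional to server count. The function name and the surge factors are hypothetical; the 15% figure is the back-end share of the fleet quoted above, used here as an upper bound on the portion that can scale dynamically.

```python
def bill_increase(elastic_share: float, surge_factor: float) -> float:
    """Fractional rise in the total bill when only the elastic part of the
    fleet grows by surge_factor and the rest runs at a constant size.

    Assumption (illustrative only): cost is proportional to server count.
    """
    if not 0.0 <= elastic_share <= 1.0:
        raise ValueError("elastic_share must be between 0 and 1")
    if surge_factor < 0.0:
        raise ValueError("surge_factor must be non-negative")
    return elastic_share * (surge_factor - 1.0)


# Upper bound taken from the discussion: back-end services are 15% of the fleet.
BACK_END_SHARE = 0.15

if __name__ == "__main__":
    # Hypothetical surge factors, chosen only to show the scale of the effect.
    for surge in (1.5, 2.0, 3.0):
        rise = bill_increase(BACK_END_SHARE, surge)
        print(f"elastic services at {surge:.1f}x load: total bill up by at most {rise:.1%}")
```

Under these assumptions, even if every back-end service doubled in size during a surge, the overall bill would rise by no more than about 15%, which is consistent with the view that dynamic increases would not be massively excessive.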
Will There Be Cost Savings that Can Be Passed to Users?
- Unfortunately, the engineering teams are not responsible for determining fees charged to users.
- More practically, it is not going to be possible to make any informed judgements on costs to users until the Lab has had the opportunity to see how actual operating costs compare with their predicted cost model.
- Further, it is not anticipated that any cost savings will be made in the first 1-2 years of cloud uplift, so any decisions on if and where to reduce costs to users won’t be made for a while to come, and those involved in making such decisions are not in the engineering teams.