On Friday, November 6th, 2020, Lab Gab, the live-streamed chat show hosted by Strawberry Linden on all things Second Life, returned to the subject of the work to transition all Second Life services to Amazon Web Services (AWS) and away from running on the Lab’s proprietary hardware and infrastructure.
The session came some 7 months after the last Lab Gab to focus on this work in April 2020 with Oz Linden and April Linden (see Lab Gab 20 summary: Second Life cloud uplift & more), and this time, Oz Linden sat in the hot seat alongside Mazidox Linden.
The official video of the segment is available via YouTube, and is embedded at the end of this article. The following is a summary of the key topics discussed and responses to questions asked.
Mazidox Linden is a relative newcomer to the Linden Lab team, having joined the company in 2017 – although like many Lab staff, he’s been a Second Life resident for considerably longer, having first signed-up in 2005.
He is the lead QA engineer for everything simulator related, which means his work not only encompasses the simulator and simhost code itself, but also touches on almost all of the back-end services the simulator software communicates with. For the last year he has been specifically focused on QA work related to transitioning the simulator code to AWS services. He took his name from the Mazidox pesticide and combined it with the idea of a bug spray to create his avatar, to visualise the idea of QA work being about finding and removing bugs.
Oz Linden joined the company in 2010 specifically to take on the role of managing the open-source aspects of the Second Life viewer and managing the relationship with third-party viewers, a role that fully engaged him during the first two years of his time at the Lab. His role then started expanding to encompass more and more of the engineering side of Second Life, leading to his current senior position within the company.
What is the “Cloud Uplift”?
- Cloud Uplift is the term Linden Lab use for transitioning all of Second Life’s server-based operations and services from their own proprietary systems and services housed within a single co-location data centre to commercial cloud services.
- The work involves not only the visible aspects of SL – the simulators and web pages, etc., but also all the many back-end services operated as a part of the overall Second Life product, not all of which may be known to users.
- The process of moving individual services to the cloud is called “lift and shift”: taking each element of software, making the required adjustments so it can run within a cloud computing environment, then relocating it to AWS infrastructure and hardware in a manner that allows it to keep running exactly as it did prior to the transfer, while avoiding disruptions that may impact users.
- The current plan is to have all of the transitional work completed before the end of 2020.
- However, this does not mean all the work related to operating SL in the cloud will have been completed: there will be further work on things like optimising how the various services run on AWS, etc.
Why is it Important?
- It allows Second Life to run on hardware that is a lot more recent than the servers the Lab operates, and allows the Lab to evolve SL to run on newer and newer hardware as it becomes available a lot faster than is currently the case.
- In particular, up until now, the route to upgrading hardware has involved the Lab reviewing, testing and selecting hardware options, then making a large capital expenditure to procure the hardware, implement it, test it, then port their services over to the hardware and test, then implement – all of which could take up to 18 months to achieve.
- By leveraging AWS services, all of the initial heavy lifting of reviewing, testing, selecting and implementing new server types is managed entirely by Amazon, leaving the Lab with just the software testing / implementation work.
- A further benefit is that when SL was built, the capabilities to manage large-scale distributed systems didn’t exist, so LL had to create their own. Today, such tools and services are a core part of product offerings like AWS, allowing the Lab to leverage them and move away from having to run (and manage / update) dedicated software.
- Two practical benefits of the move are:
- Regions running on AWS can run more scripts / script events in the same amount of time than can be achieved on non-AWS regions.
- The way in which simulators are now managed means that LL can more directly obtain logs for a specific region, filter logs by criteria to find information, etc., and the entire process is far less manually intensive.
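
As a rough illustration of the kind of per-region log filtering described above, here is a minimal Python sketch. It is not Linden Lab’s actual tooling: the record fields (`region`, `level`, `message`), the sample region names and the `filter_logs` function are all hypothetical, chosen only to show the idea of narrowing logs to one region and then by further criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical log entry; the real simulator log format is not public here."""
    region: str
    level: str
    message: str


def filter_logs(
    records: Iterable[LogRecord],
    region: str,
    level: Optional[str] = None,
    contains: Optional[str] = None,
) -> Iterator[LogRecord]:
    """Yield records for one region, optionally narrowed by level and substring."""
    for rec in records:
        if rec.region != region:
            continue
        if level is not None and rec.level != level:
            continue
        if contains is not None and contains not in rec.message:
            continue
        yield rec


# Made-up sample data, purely for demonstration.
sample = [
    LogRecord("Region A", "INFO", "script event queue drained"),
    LogRecord("Region A", "ERROR", "script timeout in object 42"),
    LogRecord("Region B", "ERROR", "script timeout in object 7"),
]

region_a_errors = list(filter_logs(sample, "Region A", level="ERROR"))
```

Here `region_a_errors` holds only the single error record from the first region; the other region’s error and the informational line are filtered out.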
How Secure is SL User Data on AWS?
- It has always been LL’s policy when dealing with third-party vendors (which is what AWS is) not to expose SL user data to those vendors, beyond what is absolutely necessary for the Lab to make use of the vendor’s service(s).
- This means that while SL user data is stored on AWS machines, it is not stored in a manner Amazon could read, and is further safeguarded by strict contractual requirements that deny a company like Amazon the right to use any of the information, even if they were to be able to read it.
- In fact, in most cases, user-sensitive data is effectively “hidden” from Amazon.
- LL is, and always has been, very sensitive to the need to protect user data, even from internal prying.
- In terms of the simulators, a core part of testing by Mazidox’s team is to ensure that where user data is being handled (e.g. account / payment information, etc.), it cannot even be reached internally by the Lab, and certainly not through things like scripted enquiries, malicious intent or prying on the part of third-party vendors.
- [54:30-55:18] Taken as a whole, SL on AWS will be more secure, as Amazon provide additional protections against hacking, and these have been combined with significant changes LL have made to their services in the interest of security.
Why is Uplift Taking So Long?
- The biggest challenge has been continuing to offer SL as a 24/7 service to users without taking it down, or at least with minimal impact on users.
- This generally requires a lot of internal testing beforehand to reach a point of confidence to transition a service, then make the transition and then step back and wait to see if anything goes dramatically wrong, or users perceive a degraded service, etc.
- An example of this is that extensive study, testing, etc., allowed LL to switch over inventory management from their own systems to being provisioned via AWS relatively early on in the process, and with no announcement it had been done – and users never noticed the difference.
- Another major challenge has been to investigate the AWS service offerings and determine how they might best be leveraged by SL services.
- As many of the SL services overlap one another (e.g. simulators utilise the inventory service, the group services, the IM services, etc.), a further element has been determining a methodical manner in which services can be transitioned without impacting users or interrupting dependencies on them that may exist elsewhere.
- The technology underpinning Second Life is a lot more advanced and recent within the AWS environment, and this means LL have had to change how they go about certain aspects of managing SL. This has in turn required experimentation, perhaps the deployment of new tools and / or the update / replacement of code, etc.
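
To give a sense of what working out a methodical transition order across interdependent services can look like, the following Python sketch orders a small dependency map so that each service appears only after the services it relies on. This is an assumption-laden illustration, not the Lab’s actual process: the only dependencies taken from the discussion are that simulators use the inventory, group and IM services; the `DEPENDS_ON` map, the `migration_order` function and the “dependencies first” rule are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each service lists the services it relies on.
# Only the simulator's three dependencies come from the discussion itself.
DEPENDS_ON = {
    "simulator": {"inventory", "groups", "im"},
    "inventory": set(),
    "groups": set(),
    "im": set(),
}


def migration_order(depends_on):
    """Return the services ordered so that dependencies come before dependents.

    TopologicalSorter raises graphlib.CycleError if services depend on each
    other in a loop, which would signal that they need to move together.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = migration_order(DEPENDS_ON)
print(order)
```

With this map the simulator always comes last, while the relative order of the three independent services is not significant. A real plan would also have to weigh testing effort and user impact, which a simple ordering like this does not capture.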
Will Running on AWS Lower Operating Costs?
- During the transitional period it has been “significantly” more expensive to operate SL, inasmuch as LL is paying both to continue operating its proprietary systems and services within their co-lo facility and to run services via AWS.
- Even after the need to continue paying for operating the co-lo facility has ended, it is unlikely that the shift to AWS will start to immediately reduce costs.
- However, the belief is that moving to AWS will, in the longer term, reduce operating costs.
- Whether reduced operating costs lead to reduced costs to users, or whether the savings will be re-invested in making further improvements to the service lay outside of this discussion.
- Right now the focus is not on driving down costs or making the service significantly better, but solely on the work of getting everything transitioned. Lowering costs and making more efficient use of the underpinning capabilities provided by AWS will come after the migration work has been completed.
What Happens to the Old Hardware / Facility, Post-Uplift?
- Several years ago, LL consolidated all of their hardware and infrastructure into a single co-location data centre in Arizona.
- Most of the hardware in that facility is now so old it has depreciated in value to a point where it is pretty much worthless.
- A specialist company has therefore been contracted to clear-out the Lab’s cage(s) at the co-lo facility and dispose of the hardware.
- As a demonstration of LL’s drive to protect user data, all drives on the servers will be removed under inspection and physically destroyed via grinding them up on-site.