LL confirms Second Life regions now all on AWS

Logos ©, ™ and ® Linden Lab and Amazon Inc

On Thursday, November 19th, after several months of very hard work to manage things in as orderly and non-disruptive a manner as possible, the last remaining regions on Agni (the Second Life main grid) were successfully transitioned to running on Amazon Web Services (AWS), thus placing the entire grid “in the cloud”.

The announcement came first via Twitter, from April Linden, the Lab’s Systems Engineering Manager, Operations, who announced:

April Linden’s announcement

The Lab actually started transitioning regions several weeks ago, and without fanfare, first moving a number of regions only accessible to Linden personnel, and then carefully widening things to include selected public regions on the Mainland, and – subject to the estate owners initially keeping quiet as well – private regions that experience assorted loads.

These initial transitions were more about testing certain aspects of simulator operations than marking the outright start of any region migration process; the Lab wanted to gather data on simulator / region performance on AWS and investigate how simulators with a wide mix of avatar / content loads behaved.

However, these initial moves quickly gave April and her team, the QA team under Mazidox Linden and the simulator development team, the confidence to start broadening the “uplift” process further, extending things first to the simulator release candidate deployment channels (RC channels) and then, in the last couple of weeks, the bulk of the regions as they sit on the SLS “Main” channel.

While there have been hiccups along the way – most notably with teleport problems and group chat / IM failures, together with some performance degradation in other areas – on the whole, the entire transition of the grid has been remarkably smooth and problem-free.

However, this does not mean all of the work is over: as LL would be quick to point out themselves, there are still a number of back-end systems to transition to AWS, and after that, there will inevitably be a period of “bedding in” everything to get things running, before work can start on the “fine tuning” of all the various services. (There are also some regions still running in the Lab’s co-location facility in Arizona to help people with workarounds for specific issues, but these are perhaps just a handful, including a couple of public regions – Debug1 and Debug2.)

Soft Linden on the AWS transition

Nevertheless, this is a huge achievement, and marks a significant milestone in what has thus far been a roughly 3-year project to get all of Second Life safely transitioned over to AWS, so congratulations to all of those at the Lab who have been working very hard to make this happen, and without causing widespread upset or issues.

2020 SUG meeting week #47: uplift

Paradise on Sea, October 2020 – blog post

The following notes were taken from the Tuesday, November 17th Simulator User Group meeting.

Server Updates and Cloud Uplift

Please also refer to the week’s server deployment thread.

  • On Tuesday November 17th, the AWS RC channels were updated to simulator version 552183, which includes internal configuration changes, and the outcome of this deployment is being monitored.
  • On Wednesday, November 18th, the rest of the SLS Main channel may be migrated to running on AWS, with simulators on that channel also running server update package 551942, which will mean all main grid (Agni) regions will be running via AWS. However, this is currently fluid – check the deployment thread for updates.
  • There may be a further deployment on Thursday, November 19th. Again, check the server deployment thread for updates.

Additional Notes

  • Due to known issues with regions running on AWS, the Lab will continue to run Debug1 and Debug2 from their co-location facility, for residents who need to use them for workarounds to these issues.
  • It is hoped that the configuration changes will help improve the recent TP failure and group chat issues many have been experiencing – however, this is dependent on the above-noted monitoring of the simulator update.

SL Viewer

The start of the week has seen no change to the current crop of official viewers, leaving them as follows:

  • Current release viewer version 6.4.11.551711, formerly Cachaça Maintenance RC viewer promoted on November 12 – NEW.
  • Release channel cohorts:
    • Custom Key Mappings project viewer, version 6.4.12.552100, November 12.
  • Project viewers:
    • Simple Cache project viewer, version 6.4.11.551403, issued on November 12.
    • Project Jelly project viewer (Jellydoll updates), version 6.4.11.551213, November 2.
    • Legacy Profiles viewer, version 6.4.11.550519, October 26.
    • Copy / Paste viewer, version 6.3.5.533365, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version 6.4.0.532999, November 22, 2019.
    • 360 Snapshot project viewer, version 6.2.4.529111, July 16, 2019.

 

2020 SUG meeting week #46: uplift

Still Waters, September 2020 – blog post

The following notes were taken from the Tuesday, November 10th Simulator User Group meeting.

Server Updates and Cloud Uplift

Please refer to the server deployment thread for the latest updates.

  • On Tuesday, November 10th, the uplifted AWS RC channels were updated with simulator release 551942. This version includes some cloud configuration changes that may improve some of the performance metrics, but otherwise should not include anything user-visible.
  • On Wednesday, November 11th, around 50% of the SLS channel will be transitioned to AWS services, also running simulator version 551942.

SL Viewer

The start of the week has seen no change to the current crop of official viewers, leaving them as follows:

  • Current release viewer version 6.4.10.549686, formerly the Mesh Uploader RC promoted on October 14 – No Change.
  • Release channel cohorts:
    • Cachaça Maintenance RC viewer updated to version 6.4.11.551711, on November 6.
  • Project viewers:
    • Project Jelly project viewer (Jellydoll updates), version 6.4.11.551213, November 2.
    • Custom Key Mappings project viewer, version 6.4.10.549685, November 2.
    • Legacy Profiles viewer, version 6.4.11.550519, October 26.
    • Copy / Paste viewer, version 6.3.5.533365, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version 6.4.0.532999, November 22, 2019.
    • 360 Snapshot project viewer, version 6.2.4.529111, July 16, 2019.

Teleport Issues

The teleport issues – particularly AWS-to-AWS regions – are still proving problematic for some.

There are reports from some AWS-hosted regions of issues with TPs manifesting with other issues – rezzing problems, errors trying to add items to object contents. All seem to be rectified by a region restart (hardly the best solution), before things start going awry once more.

The Lab have added more logging to the simulator so they can further analyse the problem(s).

Lab Gab November 6th: Cloud Uplift update

via Linden Lab

On Friday, November 6th, 2020, Lab Gab, the live streamed chat show hosted by Strawberry Linden on all things Second Life, returned to the subject of the work to transition all Second Life services to Amazon Web Services (AWS) and away from running on the Lab’s proprietary hardware and infrastructure.

The session came some 7 months after the last Lab Gab to focus on this work in April 2020 with Oz Linden and April Linden (see Lab Gab 20 summary: Second Life cloud uplift & more), and this time, Oz Linden sat in the hot seat alongside Mazidox Linden.

The official video of the segment is available via YouTube, and is embedded at the end of this article. The following is a summary of the key topics discussed and responses to questions asked.

Mazidox Linden is a relative newcomer to the Linden Lab team, having joined the company in 2017 – although like many Lab staff, he’s been a Second Life resident for considerably longer, having first signed up in 2005.

He is the lead QA engineer for everything simulator related, which means his work not only encompasses the simulator and simhost code itself, but also touches on almost all of the back-end services the simulator software communicates with. For the last year he has been specifically focused on QA work related to transitioning the simulator code to AWS services. He took his name from the Mazidox pesticide and combined it with the idea of a bug spray to create his avatar, to visualise the idea of QA work being about finding and removing bugs.

Oz Linden joined the company in 2010 specifically to take on the role of managing the open-source aspects of the Second Life viewer and managing the relationship with third-party viewers, a role that fully engaged him during the first two years of his time at the Lab. His role then started expanding to encompass more and more of the engineering side of Second Life, leading to his current senior position within the company.

“The Bugspray” Mazidox Linden (l) and Oz Linden joined Strawberry Linden for the Friday, November 6th segment of Lab Gab to discuss the cloud migration work

What is the “Cloud Uplift”?

[3:25-5:55]

  • Cloud Uplift is the term Linden Lab use for transitioning all of Second Life’s server-based operations and services from their own proprietary systems and services housed within a single co-location data centre to commercial cloud services.
  • The work involves not only the visible aspects of SL – the simulators and web pages, etc. – but also all the many back-end services operated as a part of the overall Second Life product, not all of which may be known to users.
  • The process of moving individual services to the cloud is called “lift and shift” – take each element of software, make the required adjustments so it can run within a cloud computing environment, then relocate it to AWS infrastructure and hardware in a manner that allows it to keep running exactly as it did prior to the transfer, while avoiding disruptions that may impact users.
  • The current plan is to have all of the transitional work completed before the end of 2020.
  • However, this does not mean all the work related to operating SL in the cloud will have been completed: there will be further work on things like optimising how the various services run on AWS, etc.

Why is it Important?

[5:56-12:12]

  • It allows Second Life to run on hardware that is a lot more recent than the servers the Lab operates, and allows the Lab to evolve SL to run on newer and newer hardware as it becomes available a lot faster than is currently the case.
    • In particular, up until now, the route to upgrading hardware has involved the Lab reviewing, testing and selecting hardware options, then making a large capital expenditure to procure the hardware, install and test it, and then port their services over to it, test again and finally deploy – all of which could take up to 18 months to achieve.
    • By leveraging AWS services, all of the initial heavy lifting of reviewing, testing, selecting and implementing new server types is managed entirely by Amazon, leaving the Lab with just the software testing / implementation work.
  • A further benefit is that when SL was built, the capabilities to manage large-scale distributed systems didn’t exist, so LL had to create their own. Today, such tools and services are a core part of product offerings like AWS, allowing the Lab to leverage them and move away from having to run (and manage / update) dedicated software.
  • Two practical benefits of the move are:
    • Regions running on AWS can run more scripts / script events in the same amount of time than can be achieved on non-AWS regions.
    • The way in which simulators are now managed means that LL can more directly obtain logs for a specific region, filter logs by criteria to find information, etc., and the entire process is far less manually intensive.

How Secure is SL User Data on AWS?

[12:20-15:43]

  • It has always been LL’s policy when dealing with third-party vendors (which is what AWS is) not to expose SL user data to those vendors, beyond what is absolutely necessary for the Lab to make use of the vendor’s service(s).
  • This means that while SL user data is stored on AWS machines, it is not stored in a manner Amazon could read, and is further safeguarded by strict contractual requirements that deny a company like Amazon the right to use any of the information, even if they were to be able to read it.
  • In fact, in most cases, user-sensitive data is effectively “hidden” from Amazon.
  • LL is, and always has been, very sensitive to the need to protect user data, even from internal prying.
  • In terms of the simulators, a core part of testing by Mazidox’s team is to ensure that where user data is being handled (e.g. account / payment information, etc.), it cannot even be reached internally by the Lab, and certainly not through things like scripted enquiries, malicious intent or prying on the part of third-party vendors.
  • [54:30-55:18] Taken as a whole, SL on AWS will be more secure, as Amazon provide additional protections against hacking, and these have been combined with significant changes LL have made to their services in the interest of security.

Why is Uplift Taking So Long?

[15:48-19:20]

  • The biggest challenge has been continuing to offer SL as a 24/7 service to users without taking it down, or at least with minimal impact on users.
    • This generally requires a lot of internal testing beforehand to reach a point of confidence to transition a service, then make the transition and then step back and wait to see if anything goes dramatically wrong, or users perceive a degraded service, etc.
    • An example of this is that extensive study, testing, etc., allowed LL to switch over inventory management from their own systems to being provisioned via AWS relatively early on in the process, and with no announcement it had been done – and users never noticed the difference.
  • Another major challenge has been to investigate the AWS service offerings and determine how they might best be leveraged by SL services.
  • As many of the SL services overlap one another (e.g. simulators utilise the inventory service, the group services, the IM services, etc.), a further element has been determining a methodical manner in which services can be transitioned without impacting users or interrupting dependencies on them that may exist elsewhere.
  • The technology underpinning Second Life is a lot more advanced and recent within the AWS environment, and this means LL have had to change how they go about certain aspects of managing SL. This has in turn required experimentation, perhaps the deployment of new tools and / or the update / replacement of code, etc.
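The point about transitioning overlapping services in a methodical order can be illustrated with a small dependency-ordering sketch. This is not Linden Lab's actual tooling or process: the service names and the dependency map are hypothetical, loosely based on the example given above (simulators using the inventory, group and IM services), and the code simply shows one way to derive an order in which every service is moved only after the services it relies on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (an assumption for illustration only):
# each service maps to the set of services it relies on.
DEPENDS_ON = {
    "simulator": {"inventory", "group", "im"},
    "inventory": set(),
    "group": set(),
    "im": set(),
}

def migration_order(depends_on):
    """Return services ordered so each comes after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())

order = migration_order(DEPENDS_ON)
print(order)
```

With this map, the simulator always comes last, because it depends on the other three services; the relative order of the independent services is not significant.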

Will Running on AWS Lower Operating Costs?

[19:33-23:00]

  • During the transitional period it has been “significantly” more expensive to operate SL, inasmuch as LL is paying to continue to operate its proprietary systems and services within their co-lo facility while also paying for running services via AWS.
  • Even after the need to continue paying for operating the co-lo facility has ended, it is unlikely that the shift to AWS will start to immediately reduce costs.
  • However, the belief is that moving to AWS will, in the longer term, reduce operating costs.
  • Whether reduced operating costs lead to reduced costs to users, or whether the savings will be re-invested in making further improvements to the service lay outside of this discussion.
  • Right now the focus is not on driving down costs or making the service significantly better, but solely on the work of getting everything transitioned. Lowering costs and making more efficient use of the underpinning capabilities provided by AWS will come after the migration work has been completed.

What Happens to the Old Hardware / Facility, Post-Uplift?

[23:09-25:15]

  • Several years ago, LL consolidated all of their hardware and infrastructure into a single co-location data centre in Arizona.
  • Most of the hardware in that facility is now so old it has depreciated in value to a point where it is pretty much worthless.
  • A specialist company has therefore been contracted to clear out the Lab’s cage(s) at the co-lo facility and dispose of the hardware.
    • As a demonstration of LL’s drive to protect user data, all drives on the servers will be removed under inspection and physically destroyed via grinding them up on-site.

Don’t forget: Lab Gab, November 6th: cloud update

via Linden Lab

Lab Gab returns on Friday, November 6th, 2020, with a cloud migration update.

As most are aware, the work to transition Second Life to operating via Amazon Web Services (AWS) has now progressed to a point where regions on the main grid (called Agni) are starting to be transitioned. In fact, by the time the Lab Gab show live streams, approximately one-third of all Agni regions will be operating via AWS services.

At the same time, as per my November 2020 Web User Group summary, the Web teams are hopeful that all web properties will be running via AWS by early December, placing the Lab on course to achieve its target of completing the migration (referred to as Project Uplift) by the end of 2020 (although there will likely be more work related to it to follow in early 2021).

This being the case, the Lab Gab segment will feature Oz Linden, the Lab’s Vice President of Engineering (and the man pretty much in overall charge of the engineering / technical aspect of the work) and Mazidox Linden, the Lab’s senior QA Engineer who has been particularly involved in the migration work, testing the simulator code in reference to the migration work, and who describes the project as “the largest change to the simulator [software] ever.”

“The Bugspray” Mazidox Linden (l) and Oz Linden will be joining Strawberry Linden on the Friday, November 6th segment of Lab Gab to discuss the cloud migration work

As usual, the programme will be streamed via YouTube, Facebook, or Periscope, at 10:00 SLT, and if all goes according to plan, I’ll have a summary of the video (and the video itself) available soon after the broadcast, for those unable to watch live.

For those who may have questions on the migration work, there is still time to submit them via the Lab Gab Google form. In addition, if there is time, questions may also be taken from the chat feeds associated with the live stream channels.

2020 SUG meeting week #45: further uplift update

A Thousand Windows, September 2020 – blog post

The following notes were taken from the November 3rd Simulator User Group meeting.

Server Updates and Cloud Uplift

Please refer to the server deployment thread for the latest updates.

  • There are no planned deployments to the simulators running on the Lab’s core SLS channel.
  • RC deployments are planned as follows:
    • On Wednesday, 4th November all simulators on the LeTigre and BlueSteel RC channels should become AWS hosted.
    • On Thursday, 5th November all simulators on the Magnum RC channel should also become AWS hosted.
    • However, at the time of the meeting, it was not clear if all of the RC channels would be running the same version of simulator software.

The migration work has progressed to the point that, as per Private Regions Available in Limited Quantity (via Linden Lab), private regions are once more being made available.

SL Viewer

The start of the week saw the following viewer updates on Monday, November 2nd:

  • The Jellydoll project viewer updated to version 6.4.11.551213.
  • Custom Key Mappings project viewer updated to version 6.4.10.549685.

The rest of the official viewers in the pipelines remain as follows:

  • Current release viewer version 6.4.10.549686, formerly the Mesh Uploader RC promoted on October 14 – No Change.
  • Release channel cohorts:
    • Cachaça Maintenance RC viewer, version 6.4.11.551139, issued October 27.
  • Project viewers:
    • Legacy Profiles viewer, version 6.4.11.550519, October 26.
    • Copy / Paste viewer, version 6.3.5.533365, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version 6.4.0.532999, November 22, 2019.
    • 360 Snapshot project viewer, version 6.2.4.529111, July 16, 2019.

In Brief

  • Group Chat: LL deployed updates to the group chat service in an attempt to relieve at least some of the issues that groups have been experiencing over the last several weeks. Testing has suggested the group chat sessions should be faster and more reliable than has been experienced within some groups (notably those with large memberships). However, the issue remains open pending further observation / feedback.
  • TP failures continue, and are being noted by the Lab, although not at the rates at which users appear to be experiencing them.
    • However, the nature of the beast means that at present, correlation of all the logs involved in a teleport has to be done manually, and this is impacting the Lab’s ability to arrive at a potential root cause (or causes).
    • Once the majority of cloud migration work has been completed, and if the matter hasn’t been resolved, Simon Linden may set up another round of TP testing by users, as we’ve seen with past teleport issues.
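To give a sense of what correlating the logs involved in a teleport means in practice, here is a minimal sketch. Everything about the record format is an assumption made for illustration: the field names (teleport_id, ts, region, event) and the event labels are hypothetical and do not reflect the Lab's real log schema. The idea is simply to gather entries from different regions that belong to the same teleport, order them by time, and flag teleports that never logged a final event.

```python
from collections import defaultdict

def correlate(records):
    """Group log records by teleport ID and sort each group by timestamp."""
    by_teleport = defaultdict(list)
    for rec in records:
        by_teleport[rec["teleport_id"]].append(rec)
    return {tp: sorted(recs, key=lambda r: r["ts"])
            for tp, recs in by_teleport.items()}

def incomplete(timelines, final_event="arrived"):
    """Return IDs of teleports whose last logged event is not the final one."""
    return [tp for tp, recs in timelines.items()
            if recs[-1]["event"] != final_event]

# Hypothetical records, as if collected from two different region logs.
records = [
    {"teleport_id": "tp-1", "ts": 2.0, "region": "B", "event": "arrived"},
    {"teleport_id": "tp-1", "ts": 1.0, "region": "A", "event": "requested"},
    {"teleport_id": "tp-2", "ts": 1.5, "region": "A", "event": "requested"},
]

timelines = correlate(records)
failed = incomplete(timelines)
print(failed)
```

Here tp-2 is flagged because its only entry is the request from the source region, with no matching arrival logged by a destination.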