Lab Gab Feb 26 summary: AWS update and a farewell to Oz

via Linden Lab

On Friday, February 26th, Lab Gab, the live-streamed chat show hosted by Strawberry Linden on all things Second Life, returned with a show of two halves.

Featuring guests Grumpity Linden, the Lab’s Vice President of Product, and Oz Linden, the Lab’s Vice President of Engineering, the first part of the show looked at the latest work to migrate Second Life and all its services to Amazon Web Services (AWS) hardware and infrastructure, and addressed questions forwarded to the Lab by Second Life users.


The show was also an opportunity to say “farewell” to one of the leading lights at the Lab – Oz himself, who is retiring from the Lab, and from full-time work as a whole, after more than 11 years with the company.

The official video of the segment is available via YouTube, and is embedded at the end of this article. The following is a summary of the key topics discussed and responses to questions asked.

Please be aware that as some topics were touched on more than once during the conversation, the notes below collect references together rather than presenting them in chronological order. However, where relevant, time-stamps are provided.

Strawberry Linden (l), Oz Linden and Grumpity Linden (wearing an Oz ‘tache and goatee in his honour)

On the Status of the AWS Migration and the Future

Current Status

[1:43-2:50]

  • All of the services related to Second Life were transitioned to running on AWS hardware and infrastructure by the end of December 2020.
  • The last aspect of the core work was the removal of all of the Lab’s own hardware and equipment from the Arizona co-location facility that had been hosting Second Life, which included the shredding of 10,588 hard drives and solid-state drives to ensure data security.
  • The majority of the work went a lot more smoothly than had been anticipated; however, some services have given rise to problems that are still being resolved.
  • Chief among the latter is the Land Store, which was turned back on and ready for use on Thursday, February 24th.
  • Map tile generation has also been an issue since the migration, but work is progressing on fixing it.
    • [9:09-11:34] A core issue with the Map tile generation failure lay in the fact that the code had not been touched in a “long, long time” – so long, in fact, that it is not geared to rendering mesh objects, which is why mesh can look so abstract on a map tile.
    • In terms of the current problems, the code made a lot of assumptions about the architectural environment in which it was running – assumptions that are no longer true with the move to AWS.
    • The current work is focused purely on getting the service to generate Map tiles once more, without making any additional changes to the code to account for things like rendering mesh objects correctly or addressing other bugs.
    • Most of this work is now thought to be complete, and Map tiles are being generated as they should; however, there is some work still to be completed on stitching tile images together when a user zooms out on the Map (see the sketch below).
    • There is a project to improve the overall appearance of Map tiles; it was set aside in favour of the AWS migration, but will hopefully be picked up again at some point in the future.
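
The world map is served as a pyramid of fixed-size tile images, with each zoomed-out tile covering the area of four tiles at the level below, so the “stitching” step amounts to compositing four child tiles and downscaling the result. A minimal sketch of that step, assuming 256×256 px tiles and a hypothetical file-naming scheme – the Lab’s actual tile pipeline is not public:

```python
# A minimal quadtree tile-stitching sketch using Pillow. The tile size and
# the "tile_<x>_<y>_<zoom>.png" naming scheme are assumptions for illustration.
from PIL import Image

TILE_SIZE = 256

def stitch_parent_tile(x, y, zoom):
    """Composite the four child tiles at zoom+1 into one tile at zoom."""
    canvas = Image.new("RGB", (TILE_SIZE * 2, TILE_SIZE * 2))
    for dx in (0, 1):
        for dy in (0, 1):
            child = Image.open(f"tile_{2 * x + dx}_{2 * y + dy}_{zoom + 1}.png")
            # Each child occupies one quadrant of the double-size canvas.
            canvas.paste(child, (dx * TILE_SIZE, dy * TILE_SIZE))
    # Downscale the composite back to the standard tile size.
    return canvas.resize((TILE_SIZE, TILE_SIZE), Image.LANCZOS)

stitch_parent_tile(512, 1024, 6).save("tile_512_1024_6.png")
```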

What is Next?

[2:54-5:45]

  • While the physical migration of Second Life services from a proprietary environment to AWS is complete, the Uplift Project work is not, and so will continue to be a focus of engineering efforts.
  • In particular, the immediate focus is on optimisation work, which encompasses:
    • Optimising the performance of the various services on the new hardware / infrastructure.
    • Optimising (for the Lab) the cost involved in running within an AWS environment.
    • Fine tuning systems and operations within the new environment.
    • Working to leverage the new hardware options and infrastructure presented by AWS to favour Second Life as a product running in that environment.
  • In this, it should be remembered that the initial migration work was devoted purely to taking all of the SL services – front-end simulators, back-end services, middleware, web properties and services, supporting tools, etc. – from the proprietary environment in which they had always run and simply getting them running on AWS in what was called a “lift and shift” operation, whilst making as few changes to any of the services as possible.
  • With the “lift and shift” aspect of the work completed, the engineering team has turned its attention to gathering data on exactly how the various services are running in the new environment and understanding where opportunities for making the improvements noted above may lie, and how they might best achieve them.
  • In this, the Lab now has much improved service-monitoring tools at its disposal, and these are allowing the initial work on tuning the performance of key services to begin.
  • Two practical benefits of the move are:
    • Regions running on AWS can run more scripts / script events in the same amount of time than could be achieved on non-AWS regions.
    • The way in which simulators are now managed means that LL can more directly obtain logs for a specific region, filter logs by criteria to find information, etc., and the entire process is far less manually intensive – see the log-query sketch after this list.
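
The Lab has not said which log tooling it now uses, but centralised log services are a standard part of AWS, and the kind of per-region query described above looks something like the following sketch. It uses boto3 against CloudWatch Logs; the log group name and message format are hypothetical:

```python
# A hedged sketch of querying recent ERROR lines for a single region via
# CloudWatch Logs. The log group name and filter terms are assumptions.
import time

import boto3

logs = boto3.client("logs")

def region_errors(region_name, hours=1):
    """Return recent log messages mentioning ERROR and the region name."""
    now_ms = int(time.time() * 1000)
    resp = logs.filter_log_events(
        logGroupName="/secondlife/simulator",       # hypothetical group
        filterPattern=f'"ERROR" "{region_name}"',   # both terms must match
        startTime=now_ms - hours * 3_600_000,
        endTime=now_ms,
    )
    return [event["message"] for event in resp["events"]]

for line in region_errors("Debug1"):
    print(line)
```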

Will the Migration Mean Lower Prices for Users?

[5:46-9:02]

  • Short answer, for the foreseeable future: no.
  • There has been an idea circulating among users that running SL on AWS is “dramatically cheaper” for Linden Lab; this is not accurate.
  • Prior to the migration, all of SL and its services had been running on LL’s own hardware, for which there had been no capital expenditure for years, and which had completely depreciated.
  • The move to AWS represents something of a new capex spend, increasing the Lab’s costs [although it is not unfair to say that the capex involved is liable to be significantly less over time than repeatedly buying-in new server clusters to allow SL to run on more modern systems].
  • Rather than presenting LL with reduced costs, the move to AWS is designed to:
    • Present the company with far broader options for delivering a more performant and capable service to users – although as noted above, it will take time for all of this to be delivered.
    • Improve the overall longevity of the Second Life service through the noted performance improvements and access to better hardware and infrastructure services.

Second Life Mobile App Status

[19:20-20:39]

  • Mobile has taken longer than expected to bring to release, for two primary reasons:
  • The first is that while the initial release will be more of a communications tool, considerable foundational work has been put into ensuring the app can encompass a lot more functionality than that in the future.
  • The second has been that as a result of testing by Apple, the Lab has been forced to make changes to the way in which chat works.
    • These changes will, in time, be filtering through into the viewer as well.
    • They should actually make chat more reliable in the future.
  • No commitment as to when the app may be more widely available.

Other Technical Questions

  • [11:38-17:47] There have been numerous niggling issues of late: further issues with search (e.g. avatars failing to show in search), profile issues, etc. When are these likely to be addressed? Should users report bugs they find?
    • Whilst the majority of the migration process did go smoothly, there have been glitches, and the Lab is working to address them alongside the performance and optimisation work mentioned above.
    • There are a lot of aspects of SL built on old technology, so there is an expectation that, over time, and as things can be looked at, not only will niggles go away, but software and capabilities as a whole can be made a lot more stable and resilient.
    • Bugs should most definitely be reported using the SL Jira, which also provides information on how to file bug reports (and feature requests).
  • [17:55-19:18] Will capabilities that were being worked on some time ago (e.g. 360 snapshot viewer) ever be completed?
    • The migration work has demonstrated what can be achieved with a tightly defined set of goals and teams focused solely on those goals.
    • This is an approach Grumpity would like to carry forward, with a commitment to review current and past projects to determine what might be required to bring them to completion (input, time, resources, etc.), and then make decisions from there.

About the Lab’s New Owners

[20:50-24:06]

Looking Back at Oz’s Time at the Lab

[24:20-46:20]

The latter half of the programme looks back over Oz’s time at the Lab and provides him with the opportunity to discuss what attracted him to Linden Lab, the nature of his work, and why he regards his time with LL as potentially the best job he’s ever had, as well as to outline his post-retirement plans and answer various questions.

Rather than offer a summary of this part of the show, I encourage people to listen to it in full, as it really is informative and enlightening, particularly if you’re not familiar with Oz’s work, his teams, or the Lab as a whole.

Lindens say “farewell” to Oz

[46:23-end]

The end of the show sees Strawberry teleport Oz to a special in-world retirement party, where the teams reporting into him and other LL staff have gathered to wish him well. It is again a touching and moving tribute that says so much about Oz and the high regard in which he is rightfully held, and should be seen without input from the likes of myself.

For my part – and because Oz has been both a direct and indirect influence in my SL time – I’d like to just repeat what I wrote a few days ago on reading of his upcoming retirement:

For my part, I cannot claim to know Oz as well as I would like to – but I’ve always found his enthusiasm for Second Life to be nothing less than totally honest and infectious, and his high regard for users utterly genuine and sincere.
As such … I’d like to take this opportunity to offer him a personal and public “thank you” for all the times he’s provided me with insight and / or encouraged me to get involved in various projects; all of it has been greatly appreciated. I am, and will be, genuinely saddened to see him leave the Lab; we are all losing something in his departure, and the void left will not be easy for the management team to fill.

Don’t forget: Friday Feb 26th – Lab Gab AWS update and a farewell

via Linden Lab

Just a quick reminder to folks who may not have caught the official announcement at the start of the week.

The latest edition of the Lab’s chat show series hosted by Strawberry Linden, Lab Gab, streams at 11:00am SLT on Friday, February 26th. And it’s a special show.

As most are aware, the work to transition Second Life to operating via Amazon Web Services (AWS) was completed at the end of December 2020, and the Lab has completely moved out of its former co-location facility in Arizona.

Since then, work has continued on tweaks and updates, both to get those services that didn’t make the transition as smoothly as hoped (perhaps most notably to most people, Map tile updates) once again running as they should, and to fine-tune things, with the Lab looking to better optimise its services to take full advantage of the hardware and infrastructure provided by AWS.

Given all this, the show will feature Grumpity and Oz Linden, respectively the Lab’s Vice President of Product and Vice President of Engineering, providing an update on how things are going.

Oz and Grumpity Linden, with Strawberry Linden between them, will be appearing on Lab Gab on Friday, February 26th, at 11:00am SLT. Image courtesy of Linden Lab

In addition, Friday, February 26th marks the end of an era. As he recently announced, Oz Linden is retiring from the Lab as of today, and so the show marks one of his last public appearances as a member of the Lab’s management team – and indeed as a Linden.

In his time at the Lab – which amounts to something over ten years – he has achieved and overseen a lot, and has been responsible, both directly and indirectly, for making Second Life a much more capable platform and for building a solid and fruitful relationship with both third-party viewer developers and the open-source community in Second Life; he also makes no secret of the fact that he has enjoyed his time at the Lab immensely.

To mark the fact that this is potentially the last time users will get to hear from Oz, the show will also look back over his time at the Lab – so be sure not to miss it, and hear from him on a personal level.

You can catch it through the Lab’s streaming outlets on YouTube, Facebook, or Periscope, and I’ll more than likely have a summary of the show out within 24 hours of it airing.

LL confirms Second Life regions now all on AWS

Logos ©, ™ and ® Linden Lab and Amazon Inc.

On Thursday, November 19th, and after several months of very hard work to manage things in as orderly and non-disruptive a manner as possible, the last remaining regions on Agni (the Second Life main grid) were successfully transitioned to running on Amazon Web Services (AWS), thus placing the entire grid “in the cloud”.

The announcement came first via Twitter, and from April Linden, the Lab’s Systems Engineering Manager, Operations, who announced:

April Linden’s announcement

The Lab actually started transitioning regions several weeks ago, and without fanfare: first moving a number of regions only accessible to Linden personnel, then carefully widening things to include selected public regions on the Mainland and – subject to the estate owners initially keeping quiet as well – private regions that experience assorted loads.

These initial transitions were more about testing certain aspects of simulator operations than marking the outright start of any region migration process; the Lab wanted to gather data on simulator / region performance on AWS and investigate how simulators with a wide mix of avatar / content loads behaved.

However, these initial moves quickly gave April and her team, the QA team under Mazidox Linden, and the simulator development team the confidence to broaden the “uplift” process further, extending things first to the simulator release candidate deployment channels (RC channels) and then, in the last couple of weeks, to the bulk of the regions sitting on the SLS “Main” channel.

While there have been hiccups along the way – most notably teleport problems and group chat / IM failures, together with some performance degradation in other areas – on the whole, the transition of the grid has been remarkably smooth and problem-free.

However, this does not mean all of the work is over: as LL would be quick to point out themselves, there are still a number of back-end systems to transition to AWS, and after that there will inevitably be a period of “bedding in” before work can start on the fine-tuning of all the various services. (There are also some regions still running in the Lab’s co-location facility in Arizona to help people with workarounds for specific issues, but these are perhaps just a handful, including a couple of public regions – Debug1 and Debug2.)

Soft Linden on the AWS transition

Nevertheless, this is a huge achievement, marking a hugely significant milestone in what has thus far been around a three-year project to get all of Second Life safely transitioned to AWS, so congratulations to all of those at the Lab who have worked very hard to make this happen without causing widespread upset or issues.

2020 SUG meeting week #47: uplift

Paradise on Sea, October 2020 – blog post

The following notes were taken from the Tuesday, November 17th Simulator User Group meeting.

Server Updates and Cloud Uplift

Please also refer to the week’s server deployment thread.

  • On Tuesday, November 17th, the AWS RC channels were updated to simulator version 552183, which includes internal configuration changes; the outcome of this deployment is being monitored.
  • On Wednesday, November 18th, the rest of the SLS Main channel may be migrated to running on AWS, with simulators on that channel also running server update package 551942, which will mean all main grid (Agni) regions are running via AWS. However, this is currently fluid – check the deployment thread for updates.
  • There may be a further deployment on Thursday, November 19th. Again, check the server deployment thread for updates.

Additional Notes

  • Due to known issues with regions running on AWS, the Lab will continue to run Debug1 and Debug2 from their co-location facility, for residents who need them for workarounds to those issues.
  • It is hoped that the configuration changes will help improve the recent TP failure and group chat issues many have been experiencing – however, this is dependent on the above-noted monitoring of the simulator update.

SL Viewer

The start of the week has seen no change to the current crop of official viewers, leaving them as follows:

  • Current release viewer version 6.4.11.551711, formerly Cachaça Maintenance RC viewer promoted on November 12 – NEW.
  • Release channel cohorts:
    • Custom Key Mappings project viewer, version 6.4.12.552100, November 12.
  • Project viewers:
    • Simple Cache project viewer, version 6.4.11.551403, issued on November 12.
    • Project Jelly project viewer (Jellydoll updates), version 6.4.11.551213, November 2.
    • Legacy Profiles viewer, version 6.4.11.550519, October 26.
    • Copy / Paste viewer, version 6.3.5.533365, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version 6.4.0.532999, November 22, 2019.
    • 360 Snapshot project viewer, version 6.2.4.529111, July 16, 2019.


2020 SUG meeting week #46: uplift

Still Waters, September 2020 – blog post

The following notes were taken from the November 12th Simulator User Group meeting.

Server Updates and Cloud Uplift

Please refer to the server deployment thread for the latest updates.

  • On Tuesday, November 10th, the uplifted AWS RC channels were updated with simulator release 551942. This version includes some cloud configuration changes that may improve some of the performance metrics, but otherwise should not include anything user-visible.
  • On Wednesday, November 11th, around 50% of the SLS channel will be transitioned to AWS services, also running simulator version 551942.

SL Viewer

The start of the week has seen no change to the current crop of official viewers, leaving them as follows:

  • Current release viewer version 6.4.10.549686, formerly the Mesh Uploader RC promoted on October 14 – No Change.
  • Release channel cohorts:
    • Cachaça Maintenance RC viewer updated to version 6.4.11.551711, on November 6.
  • Project viewers:
    • Project Jelly project viewer (Jellydoll updates), version 6.4.11.551213, November 2.
    • Custom Key Mappings project viewer, version 6.4.10.549685, November 2.
    • Legacy Profiles viewer, version 6.4.11.550519, October 26.
    • Copy / Paste viewer, version 6.3.5.533365, December 9, 2019.
    • Project Muscadine (Animesh follow-on) project viewer, version 6.4.0.532999, November 22, 2019.
    • 360 Snapshot project viewer, version 6.2.4.529111, July 16, 2019.

Teleport Issues

The teleport issues – particularly between AWS-hosted regions – are still proving problematic for some.

There are reports from some AWS-hosted regions of TP issues manifesting alongside other problems – rezzing failures and errors when trying to add items to object contents. All seem to be rectified by a region restart (hardly the best solution), before things start going awry once more.

The Lab have added more logging to the simulator so they can further analyse the problem(s).
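
For illustration only: the simulator is not written in Python and its actual log format is not public, but “more logging” for cross-region teleport analysis typically means tagging every hop of a teleport with a shared correlation ID, so lines from the source and destination simulators can be matched later. A minimal sketch of the pattern, with hypothetical names:

```python
# Correlation-ID logging sketch for teleport diagnosis. Purely
# illustrative; all names and the log format are assumptions.
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("teleport")

def begin_teleport(agent_id, src_region, dst_region):
    # One shared ID lets both simulators' log lines be joined afterwards.
    tp_id = uuid.uuid4().hex
    log.info("tp=%s agent=%s src=%s dst=%s state=requested",
             tp_id, agent_id, src_region, dst_region)
    return tp_id

tp_id = begin_teleport("agent-1234", "Debug1", "Debug2")
log.info("tp=%s state=arrived", tp_id)
```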

Lab Gab November 6th: Cloud Uplift update

via Linden Lab

On Friday, November 6th, 2020, Lab Gab, the live-streamed chat show hosted by Strawberry Linden on all things Second Life, returned to the subject of the work to transition all Second Life services to Amazon Web Services (AWS) and away from running on the Lab’s proprietary hardware and infrastructure.

The session came some seven months after the last Lab Gab to focus on this work, in April 2020, with Oz Linden and April Linden (see Lab Gab 20 summary: Second Life cloud uplift & more); this time, Oz Linden sat in the hot seat alongside Mazidox Linden.

The official video of the segment is available via YouTube, and is embedded at the end of this article. The following is a summary of the key topics discussed and responses to questions asked.

Mazidox Linden is a relative newcomer to the Linden Lab team, having joined the company in 2017 – although like many Lab staff, he’s been a Second Life resident for considerably longer, having first signed up in 2005.


He is the lead QA engineer for everything simulator-related, which means his work not only encompasses the simulator and simhost code itself, but also touches on almost all of the back-end services the simulator software communicates with. For the last year he has been specifically focused on QA work related to transitioning the simulator code to AWS services. He took his name from the Mazidox pesticide and combined it with the idea of a bug spray to create his avatar, visualising the idea of QA work being about finding and removing bugs.

Oz Linden joined the company in 2010, specifically to take on the role of managing the open-source aspects of the Second Life viewer and the relationship with third-party viewers, a role that fully engaged him during the first two years of his time at the Lab. His role then started expanding to encompass more and more of the engineering side of Second Life, leading to his current senior position within the company.

“The Bugspray” Mazidox Linden (l) and Oz Linden joined Strawberry Linden for the Friday, November 6th segment of Lab Gab to discuss the cloud migration work

What is the “Cloud Uplift”?

[3:25-5:55]

  • Cloud Uplift is the term Linden Lab use for transitioning all of Second Life’s server-based operations and services from their own proprietary systems and services, housed within a single co-location data centre, to commercial cloud services.
  • The work involves not only the visible aspects of SL – the simulators and web pages, etc. – but also all the many back-end services operated as a part of the overall Second Life product, not all of which may be known to users.
  • The process of moving individual services to the cloud is called “lift and shift”: taking each element of software, making the required adjustments so it can run within a cloud computing environment, then relocating it to AWS infrastructure and hardware in a manner that allows it to keep running exactly as it did prior to the transfer, while avoiding disruptions that may impact users.
  • The current plan is to have all of the transitional work completed before the end of 2020.
  • However, this does not mean all of the work related to operating SL in the cloud will have been completed: there will be further work on things like optimising how the various services run on AWS.

Why is it Important?

[5:56-12:12]

  • It allows Second Life to run on hardware that is a lot more recent than the servers the Lab operates, and allows the Lab to evolve SL to run on newer hardware as it becomes available, a lot faster than is currently the case.
    • In particular, up until now the route to upgrading hardware has involved the Lab reviewing, testing and selecting hardware options, then making a large capital expenditure to procure the hardware, implementing and testing it, then porting their services over to it and testing again – all of which could take up to 18 months.
    • By leveraging AWS services, all of the initial heavy lifting of reviewing, testing, selecting and implementing new server types is managed entirely by Amazon, leaving the Lab with just the software testing / implementation work (see the sketch after this list).
  • A further benefit is that when SL was built, the capabilities to manage large-scale distributed systems didn’t exist, so LL had to create their own. Today, such tools and services are a core part of offerings like AWS, allowing the Lab to leverage them and move away from having to run (and manage / update) dedicated software.
  • Two practical benefits of the move are:
    • Regions running on AWS can run more scripts / script events in the same amount of time than could be achieved on non-AWS regions.
    • The way in which simulators are now managed means that LL can more directly obtain logs for a specific region, filter logs by criteria to find information, etc., and the entire process is far less manually intensive.
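
To make the hardware point above concrete: on AWS, “reviewing and selecting new server types” becomes a query rather than a procurement cycle. A sketch using boto3 to list current-generation x86_64 instance types; purely illustrative, as the Lab has not said which instance families Second Life uses:

```python
# List current-generation x86_64 EC2 instance types with their vCPU and
# memory figures -- the kind of hardware survey that once required buying
# and racking physical servers.
import boto3

ec2 = boto3.client("ec2")

pages = ec2.get_paginator("describe_instance_types").paginate(Filters=[
    {"Name": "current-generation", "Values": ["true"]},
    {"Name": "processor-info.supported-architecture", "Values": ["x86_64"]},
])

for page in pages:
    for itype in page["InstanceTypes"]:
        vcpus = itype["VCpuInfo"]["DefaultVCpus"]
        mem_gib = itype["MemoryInfo"]["SizeInMiB"] // 1024
        print(f'{itype["InstanceType"]}: {vcpus} vCPU, {mem_gib} GiB')
```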

How Secure is SL User Data on AWS?

[12:20-15:43]

  • It has always been LL’s policy when dealing with third-party vendors (which is what AWS is) not to expose SL user data to those vendors, beyond what is absolutely necessary for the Lab to make use of the vendor’s service(s).
  • This means that while SL user data is stored on AWS machines, it is not stored in a manner Amazon could read, and is further safeguarded by strict contractual requirements that deny a company like Amazon the right to use any of the information, even if they were able to read it.
  • In fact, in most cases, user-sensitive data is effectively “hidden” from Amazon (see the sketch after this list).
  • LL is, and always has been, very sensitive to the need to protect user data, even from internal prying.
  • In terms of the simulators, a core part of testing by Mazidox’s team is to ensure that where user data is being handled (e.g. account / payment information, etc.), it cannot be reached internally by the Lab, and certainly not through things like scripted enquiries, malicious intent, or prying on the part of third-party vendors.
  • [54:30-55:18] Taken as a whole, SL on AWS will be more secure, as Amazon provides additional protections against hacking, and these have been combined with significant changes LL have made to their services in the interest of security.
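
A minimal sketch of the principle described above: data is encrypted before it reaches third-party storage, so the vendor only ever holds ciphertext. This is illustrative, not the Lab’s actual scheme; the bucket and key names are hypothetical, and a real deployment would use managed key storage rather than an in-process key:

```python
# Client-side encryption before storage: the vendor stores only ciphertext.
from cryptography.fernet import Fernet
import boto3

key = Fernet.generate_key()   # in practice, kept in LL-controlled key storage
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"user-sensitive payload")

# Only ciphertext is written to the vendor's storage; without the key,
# the storage provider cannot read it. Bucket name is hypothetical.
boto3.client("s3").put_object(
    Bucket="example-sl-user-data",
    Key="records/agent-1234",
    Body=ciphertext,
)
```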

Why is Uplift Taking So Long?

[15:48-19:20]

  • The biggest challenge has been continuing to offer SL as a 24/7 service without taking it down, or at least with minimal impact on users.
    • This generally requires a lot of internal testing beforehand to reach a point of confidence to transition a service, then making the transition and stepping back to see if anything goes dramatically wrong or users perceive a degraded service, etc.
    • An example of this: extensive study and testing allowed LL to switch inventory management over from their own systems to being provisioned via AWS relatively early on in the process, and with no announcement that it had been done – users never noticed the difference (a sketch of this kind of gradual cut-over follows this list).
  • Another major challenge has been to investigate the AWS service offerings and determine how they might best be leveraged by SL services.
  • As many SL services overlap one another (e.g. simulators utilise the inventory service, the group services, the IM services, etc.), a further element has been determining a methodical order in which services can be transitioned without impacting users or interrupting dependencies that may exist elsewhere.
  • The technology underpinning Second Life is a lot more advanced and recent within the AWS environment, and this means LL have had to change how they go about certain aspects of managing SL. This has in turn required experimentation, perhaps the deployment of new tools and / or the updating / replacement of code, etc.
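
The inventory cut-over described above follows a familiar pattern: route a small, stable fraction of traffic to the uplifted backend, watch for regressions, and raise the fraction as confidence grows. A hedged sketch of that pattern, with all names and the 5% figure hypothetical:

```python
# Gradual, reversible backend cut-over via stable per-agent bucketing.
# Purely illustrative; service names and the fraction are assumptions.
import hashlib

CLOUD_FRACTION = 0.05   # start small, raise as monitoring stays clean

def use_cloud_backend(agent_id: str) -> bool:
    """Hash the agent ID so each user consistently sees one backend."""
    digest = hashlib.sha256(agent_id.encode()).digest()
    return digest[0] / 255.0 < CLOUD_FRACTION

# Hypothetical stand-ins for the legacy and uplifted inventory services.
def legacy_fetch(agent_id):
    return f"inventory for {agent_id} (co-lo)"

def cloud_fetch(agent_id):
    return f"inventory for {agent_id} (AWS)"

def fetch_inventory(agent_id: str):
    backend = cloud_fetch if use_cloud_backend(agent_id) else legacy_fetch
    return backend(agent_id)

print(fetch_inventory("agent-1234"))
```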

Will Running on AWS Lower Operating Costs?

[19:33-23:00]

  • During the transitional period it has been “significantly” more expensive to operate SL, inasmuch as LL has been paying both to continue operating its proprietary systems and services within the co-lo facility and to run services via AWS.
  • Even after the need to continue paying for the co-lo facility has ended, it is unlikely that the shift to AWS will immediately reduce costs.
  • However, the belief is that moving to AWS will, in the longer term, reduce operating costs.
  • Whether reduced operating costs lead to reduced costs for users, or whether the savings will be re-invested in making further improvements to the service, lies outside the scope of this discussion.
  • Right now the focus is not on driving down costs or making the service significantly better, but solely on getting everything transitioned. Lowering costs and making more efficient use of the underpinning capabilities provided by AWS will come after the migration work has been completed.

What Happens to the Old Hardware / Facility, Post-Uplift?

[23:09-25:15]

  • Several years ago, LL consolidated all of their hardware and infrastructure into a single co-location data centre in Arizona.
  • Most of the hardware in that facility is now so old it has depreciated in value to a point where it is pretty much worthless.
  • A specialist company has therefore been contracted to clear out the Lab’s cage(s) at the co-lo facility and dispose of the hardware.
    • As a demonstration of LL’s drive to protect user data, all drives on the servers will be removed under inspection and physically destroyed via grinding them up on-site.
