HTTP and Group Services updates

There are a number of projects underway at the moment to improve various aspects of Second Life performance. Some of these have been reported on as a part of the Shining Project; others are being dealt with elsewhere and are reported on through the likes of the SL Scripting User Group and the fortnightly TPV/Developer meetings.

The following is by way of a brief update on the ongoing HTTP Library and Group Management projects with information taken from the most recent TPV/Developer meeting (recording link).

HTTP Library

The focus of this aspect of the Shining Project is to improve the underpinning HTTP messaging that is crucial to simulator / simulator and simulator / viewer communications, and it is under the management of Monty Linden.

Discussion on progress with the project commences at 36:36 into the recording.

The project code (textures only) is with the Linden Lab QA team and is expected to be in the 3.4.1 viewer once it has been released by QA. In the meantime, the HTTP project viewer was updated at the end of July. Many people are noticing improvements in viewer performance that go beyond initial texture loading, although there have been reports of other aspects of the viewer which use HTTP apparently being “slower” to use. This latter issue is most likely a false impression, with Monty commenting at the August 24th meeting that, “Most parts shouldn’t be affected. It’s competitive, when you’re doing both texture downloading and some of that work … but other things aren’t being cheated if you’re not downloading textures at the time.”

An issue has been noted on older MacBook Pro systems (late 2007 into 2008 dual-core systems, although the span of the problem isn’t clear) using nVidia drivers, wherein the speed-up with cached data that can be seen on other systems isn’t occurring. Monty is still investigating this. Overall, however, feedback on this project has been positive.

Group Management Functions

Large group loading: a familiar problem

Baker Linden has been working to resolve this problem, and his plan is also to go the HTTP route, which will require changes on both the server and the viewer sides of the equation. His comments on progress start at 42:53 into the TPV/Developer meeting recording.

The server-side code for an initial implementation of the solution has been passed to LL’s QA and is expected to be rolled to selected regions on the Beta (Aditi) grid soon.

In terms of the viewer, the plan is to develop a Project Viewer, which will be made available in the near future for people to use with the Aditi test regions. How soon this viewer is likely to appear is open to question – the code will initially need to be passed by LL’s QA (who may have received it on the 24th August) prior to the viewer being built. Once in the project viewer repository, the code will also be available for TPVs to produce test viewers of their own.

How long the testing period will last is also open to question and dependent upon feedback / issues arising. However, the plan is to follow the usual pattern for roll-outs, in that once the code has been tested on Aditi and the necessary updates made, it will be rolled to a main grid RC for more involved testing. This is important, as there is a significant difference in the number and sizes of groups operating on the two grids. For example, the largest group on Aditi numbers some 40,000 members; on the main grid the largest group is about 112K, and there are many more groups with between 40K and 112K members.

One thing that has been made clear is that there will be no attempt at backward compatibility with V1-based viewers on the Lab’s part; the new code will be aimed solely at the V3 code base. However, V1-based viewers will still be able to use the UDP protocols for group management, although the LL servers will limit UDP access to groups with 10K members or fewer, so V1-viewers will have some more code backporting on their hands.

There will also initially be some issues around the new HTTP protocol. For example, in the first implementation, the data will be uncompressed. This means that a 40K member group is around 5Mb in size, which can take up to a few minutes to download, depending on someone’s connection speed, so some frustrations are liable to continue. While data compression will eventually be used, this is not planned for the initial implementation.
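To put the figures above in context, here is a rough back-of-envelope sketch. It rests on assumptions that are mine rather than the Lab's: that "5Mb" means roughly 5,000,000 bytes, that the payload scales linearly with member count (about 125 bytes per member), and that protocol overhead and latency can be ignored. The function names are purely illustrative.

```python
# Back-of-envelope estimate only. Assumptions (not from Linden Lab):
# - "5Mb" for a 40K member group is read as ~5,000,000 bytes
# - size scales linearly with member count
# - no protocol overhead, latency or compression
ASSUMED_BYTES_PER_MEMBER = 5_000_000 // 40_000  # 125 bytes per member


def estimate_payload_bytes(members, bytes_per_member=ASSUMED_BYTES_PER_MEMBER):
    """Estimated uncompressed size of a group member list, in bytes."""
    return members * bytes_per_member


def estimate_download_seconds(payload_bytes, link_bits_per_second):
    """Idealised transfer time for a payload over a link of the given speed."""
    return payload_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    for members in (40_000, 112_000):
        size = estimate_payload_bytes(members)
        for label, bps in (("256 kbit/s", 256_000), ("1 Mbit/s", 1_000_000)):
            secs = estimate_download_seconds(size, bps)
            print(f"{members} members, {label}: ~{secs:.0f} s")
```

On these assumptions, a 40K member group would take around two and a half minutes on a 256 kbit/s connection, which fits the "up to a few minutes" quoted above. The largest main grid group would take proportionally longer.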

The discussion also touched on providing an option to routinely clear down group lists based on members’ last log-in date, or on their not having logged in for a (group owner specified) number of days. However, LL are not going to implement such a feature, on the grounds that it could lead to mistakes being made and people being accidentally removed from a group.

Time Scale and Implementation

As mentioned above, there is no definitive time scale for this work to be completed. Testing is liable to take several weeks at the very least, so it is unlikely the new group management capabilities will be rolled out on a widespread basis for at least another month, and possibly longer.

However, as with the upcoming new avatar bake service, once the server code is available on the grid, the switch-over will be transparent. If a viewer has the code to use the new group management HTTP service, it will do so; if it has not been updated, it will continue to use the UDP service (with the aforementioned 10K “cap”) until such time as that capability is “retired” from the grid.
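The switch-over behaviour described here can be sketched as a simple selection rule. This is an illustration only, not code from any viewer or from the Lab: the function names, the flag names and the constant are all hypothetical, and the only figure taken from the meeting is the 10K member limit on UDP access.

```python
# Illustrative sketch only; every name here is hypothetical.
UDP_MEMBER_CAP = 10_000  # "10K members or fewer", as stated at the meeting


def choose_group_transport(viewer_supports_http, server_offers_http,
                           udp_retired=False):
    """Pick how a viewer would fetch group data under the described plan."""
    if viewer_supports_http and server_offers_http:
        return "http"  # updated viewers use the new service automatically
    if udp_retired:
        return None    # no route left once UDP is retired from the grid
    return "udp"       # older viewers carry on with the existing protocol


def can_load_member_list(transport, member_count):
    """Whether a member list of this size can be fetched over the transport."""
    if transport == "http":
        return True
    if transport == "udp":
        return member_count <= UDP_MEMBER_CAP
    return False


if __name__ == "__main__":
    legacy = choose_group_transport(viewer_supports_http=False,
                                    server_offers_http=True)
    print(legacy, can_load_member_list(legacy, 40_000))
```

The point of the sketch is that nothing needs to be selected by the user: the outcome depends only on what the viewer and the server each support.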

Lab seeks to improve how TPV support issues are addressed

As mentioned in the TPV/Developer meeting of the 24th August, Oz Linden has been taking steps to try to improve how issues are addressed by the company’s support teams when providing support to users who are using a TPV as their viewer of choice.

That TPVs are collectively more popular than the official SL viewer is not that surprising. However, a lot of people still turn to Linden Lab for help when they encounter issues. As a result of this, LL have come in for criticism as to how they handle users who report that they are using TPVs, and it is this that has prompted Oz to try to improve how matters can be handled and addressed.

Identifying the Problem

The first part of dealing with any problem is correctly diagnosing whether it is in fact viewer-related or server-related. This isn’t as easy as it sounds because there are many parts of SL where the problem could reside either within the viewer or on the server-side of things (inventory issues being a good example) – hence why LL often get the call when things go wrong.

Because of this complexity, and in order to help improve the initial viewer issue / server issue diagnosis, Oz is working with LL’s support teams to put together a better set of heuristics for use in support staff training and guidance in identifying where a particular problem may reside. To help with this work, he has asked the TPVs to supply lists of issues they have encountered which they know are not viewer issues, and how to recognise them. These lists can then be added to the information supplied to LL support staff to both speed the initial diagnosis of a problem and reduce the chances of a problem being mis-diagnosed from the outset.

It’s a Viewer Problem – But Can it be Reproduced on the LL Viewer?

When it comes to trying to resolve what appears to be a viewer issue, LL support staff will ask a) whether the user is using the official LL viewer; and b) if they have tried to reproduce the issue using the official LL viewer. These questions are often taken to mean LL’s support staff “do not want to help” with the problem if it appears to be TPV related.

However, this is not the case; the question is a perfectly valid part of trying to diagnose a problem because:

  • If the problem can be reproduced using the official viewer, there is a chance support staff may be able to provide SL-viewer based assistance to resolve the issue
  • If the problem cannot be reproduced on the official viewer, then it at least helps point to the problem potentially being related to the TPV itself.

Obviously, if the problem does appear to be viewer-related but only manifests in a TPV, LL’s support personnel are unlikely to be able to give detailed help (simply because it is unfair to expect LL’s support personnel to be intimately versed in how to resolve issues occurring with all of the TPVs used to access SL). As such, they are going to pass the matter back to the user. When this happens, it can lead to frustrations and a feeling that LL “aren’t interested” in solving the problem.

To avoid this in the future, Oz is working with TPVs to ensure LL’s support staff are better placed to provide onward guidance rather than leaving users feeling they “don’t want to help”. This is being done by each TPV listed in the TPV Directory being asked to:

  • Add the details of any in-world support group(s) they operate to their Directory listing if they haven’t already done so
  • Use a new field in the Directory to give details of any additional locations where help on a specific TPV might be obtained (e.g. a website, a support forum, etc.)

Thus, should an issue appear to be related to a specific viewer which LL staff cannot help resolve, they will at least be able to point the user concerned in one or more directions where they can receive more focused assistance in order to resolve the problem.

Asking People to Complete the Survey

During the discussion, Oz reiterated that every support issue dealt with by LL staff should trigger a follow-up e-mail to the user concerned. While this might not happen until up to four days after the event itself, the e-mail does include a customer satisfaction survey. This is important for two reasons:

  • All survey responses are reviewed by a Linden Lab staffer; they are not farmed out to a third-party survey company or ignored or handled by an automated process
  • They are seen as a primary mechanism for determining how well support is identifying and dealing with issues to the satisfaction of LL’s users.

As such, Oz emphasised the importance for feedback to be given, particularly where there is strong evidence to show that support have failed to provide the correct assistance. While completing the survey may not help in resolving the issue itself, it may help pin-point errors within the support process, particularly if a number of surveys are received highlighting the same fault.

The current process by which support issues – particularly those with TPV problems reported to LL – are handled doesn’t always run smoothly, and there are times when issues do get mis-directed. However, Oz’s response to concerns raised during recent TPV developer meetings demonstrates that steps are being taken to address them. It has been suggested that LL post a blog entry on the initiatives explained here (particularly on the need for TPV users to understand why LL do ask about reproducing issues encountered using the official viewer). In lieu of that happening, I hope this piece will serve to inform.