Difference between revisions of "Brainstorming"

From Second Life Wiki
Jump to navigation Jump to search
Line 48: Line 48:
* For all the mentioned permissions the protocol should provide a means of specifying these capabilities. If you want to sell derivations the other agent domain also needs to support it and needs to be trusted. Possible new permissions could be 'sell copies' and 'franchise' with additional minimum percentage the creator defines (modifiable upwards by the next owner, but not downwards). If object is copy and transfer, seller's copies get deleted with a transfer, if object is set to sell copies, seller's copies are not deleted. Maybe the definition of a minimum price might be interesting, too.
* For all the mentioned permissions the protocol should provide a means of specifying these capabilities. If you want to sell derivations the other agent domain also needs to support it and needs to be trusted. Possible new permissions could be 'sell copies' and 'franchise' with additional minimum percentage the creator defines (modifiable upwards by the next owner, but not downwards). If object is copy and transfer, seller's copies get deleted with a transfer, if object is set to sell copies, seller's copies are not deleted. Maybe the definition of a minimum price might be interesting, too.
* Access controls beyond the walls of a single managed protection domain like the Linden grid are actually doomed to failure, for numerous reasons detailed by security and cryptography heavyweights who have explored DRM.  Nevertheless, extending existing SL object permissions to "trusted" external sites is still perfectly feasible ... but it would be a mistake to believe that such access restrictions would be anything but temporary.  ''This fact will have to be made clear to asset creators to avoid legal problems and discontent later on,'' as a social measure.  As a limited technical measure, a Local_Distribution_Only permissions bit will probably be required as well.
* Access controls beyond the walls of a single managed protection domain like the Linden grid are actually doomed to failure, for numerous reasons detailed by security and cryptography heavyweights who have explored DRM.  Nevertheless, extending existing SL object permissions to "trusted" external sites is still perfectly feasible ... but it would be a mistake to believe that such access restrictions would be anything but temporary.  ''This fact will have to be made clear to asset creators to avoid legal problems and discontent later on,'' as a social measure.  As a limited technical measure, a Local_Distribution_Only permissions bit will probably be required as well.
** To put it bluntly, unless you can create a system which either allows no objects without full perms to leave a particular trusted group of agent domains, or create a system which guarantees that permissions will be respected, you will get a massive shit-storm from content creators, which no amount of explanations about the ultimate pointlessness of DRM will negate. Create a grid system which doesn't respect the current system or permissions, and content creators will not use it. --[[User:Ian Betteridge|Ian Betteridge]] 08:49, 2 November 2007 (PDT)


=== Identity ===
=== Identity ===

Revision as of 07:49, 2 November 2007

Architecture Working Group main Project Motivation Proposed Architecture Use Cases In World Chatlogs

This page is all about brainstorming about the upcoming architecture. Add your thoughts here in no particular format. Can be use cases, requirements, scenarios. Maybe shouldn't be too long but long enough to get your idea across. Can also be implementation details maybe but in the lower section.


Usage examples and requirements

General architecture

  • always keep the scary numbers of Project_Motivation in mind, ie. design for scalability.
  • allow to run a small grid on my laptop.
  • allow big-iron sites (eg. LL) to run as proxy cache for small but authoritative private grid.
    • I don't understand this. Zero Linden 11:27, 16 October 2007 (PDT)
  • allow plug-ins as a general principle for extensibility by attachment; this should apply at all levels.
    • We must be careful here: plug-ins often lead to balkanization. Zero Linden 11:27, 16 October 2007 (PDT)
  • Talk to other online virtual worlds to discuss interconnectivity, at least to some degree. For instance, with one online ID and perhaps registration with each virtual world, residents could inhabit seamlessly Second Life then possibly World of Warcraft or any other virtual world, in the same way online surfers can go from one website to another. In short, let there eventually be one huge interconnected 3d environment, with each world as distinctive as a website on the net.
    • Other industry groups may be tackling this soon. We shouldn't loose our focus on extending our virtual world to Internet scale. Zero Linden 11:27, 16 October 2007 (PDT)
  • Integrate within Second Life a platform to allow residents to create their own virtual world which can then be connected via a personal server to the grid. If this is not entirely possible then create a download so a world can be created offline to be later uploaded to a personal server and then be added to the grid. The development of such software could be a challenge to present to the many resident coders. People with technical knowledge should not be the only ones able to add their own sim to the grid. This should be developed in a way to enable everyone to do so.
    • Enable short-term sandboxes launched from within a viewer - think like a traditional LAN party for a multiplayer game, this would do for short social events etc, for 24/7 sims allow hosting by 3rd parties and those 3rd parties (like web hosts) charge for their server space and administration.
  • Make Second Life more democratic. Rather than having rules and regulations imposed on residents, allow democratically elected members to represent residents wishes and aspirations, and even let such individuals govern regions. Essentially, Linden Labs should let go of its junta like hold of Second life and allow residents to govern themselves. In its present state Second Life does not represent a free or democratic entity and this lack of freedom is becoming ever more tangible to residents.
    • Perhaps this is a better way to say this: Ensure other entities hosting land can manage those regions as they choose. This may include land managed as LL does now, land managed by elected leaders, or land run under very strict control. Zero Linden 11:27, 16 October 2007 (PDT)

Interoperability

  • allow different formats for objects or assets in general. You might need different viewers for that (e.g. WoW and SL or EVE are quite different in their structure) but you should at least be able to share an identity. You might need different agents though as e.g. in games you want to store game information in the profile and the profile is attached to an agent. (See Viewer Refactoring below to help with this.)
  • for interoperability, objects at the protocol level must be self-describing to allow transformation to the local format.
  • allow different types of regions and region formats. Again as an example, EVE and WoW have different understanding on how to partition the world and how to implement it.
  • look at things like Multiverse or Metaplace to see how these can be interconnected.
  • Use some common IM format to IM between agents. Jabber comes to mind but should probably also be pluggable. Maybe integrating existing multi-format clients (of open source) is an option.

Commerce

Ensure that the entities used in commerce are supported

  • currencies or other forms of payment accepted (object trading, external payment systems)
  • exchanging rates and such
  • timestamped transaction records (this was exchanged for that)
  • unforgeable documents for contracts

Objects and Assets

This section has been moved to its own wikipage: Asset

Permissions

  • What does a creator need to define? It should be possible for the creator to e.g. define that an object is only valid within a certain grid. The question here might be what makes up a grid. Probably it means a certain agent domain. What does that mean to the user though? If the user bought that object I'd think that he/she should really own it and be able to copy it to other agent domains and other grids. The problem might be what happens if these have lesser security policies and it got copied? (of course in general there is no way to prevent it anyway, I am talking here about different policies for different grids). Should it be some new permission which say "copy to other grids"? When buying such an object you can decide then whether it's worth it or not.
  • We already discussed DRM in general and the opinion was that it should be not hardwired into the protocol. Of course it should be possible to plug it in if people think it makes sense.
  • What about being able to link from objects. E.g. it is annoying right now that an object cannot be mod,copy,trans but "trans" meaning here that all the copied will be deleted once you give it to another person. Because of that one permission is usually missing. Either you cannot copy it or you cannot transfer it. So here is how it might be
   * Agent buys object which is full perm but it should not be allowed to sell multiple copies of it, only the one you bought.
   * Agent rezzes multiple copies and eventually modifies them
   * Agent give the object to another agent (or sells it). When he does this all the copies should be deleted.
It could be implemented as links to the original object or the object itself is more or less the group of all rezzed items. It makes sure the user can do whatever he wants but cannot just make money from selling multiple copies if not bought.
  • Make derivations possible, like e.g. IMVU has it. You buy some object and modify it. You then can resell it but the original creator still gets some fraction for each sold copy.
  • For all the mentioned permissions the protocol should provide a means of specifying these capabilities. If you want to sell derivations the other agent domain also needs to support it and needs to be trusted. Possible new permissions could be 'sell copies' and 'franchise' with additional minimum percentage the creator defines (modifiable upwards by the next owner, but not downwards). If object is copy and transfer, seller's copies get deleted with a transfer, if object is set to sell copies, seller's copies are not deleted. Maybe the definition of a minimum price might be interesting, too.
  • Access controls beyond the walls of a single managed protection domain like the Linden grid are actually doomed to failure, for numerous reasons detailed by security and cryptography heavyweights who have explored DRM. Nevertheless, extending existing SL object permissions to "trusted" external sites is still perfectly feasible ... but it would be a mistake to believe that such access restrictions would be anything but temporary. This fact will have to be made clear to asset creators to avoid legal problems and discontent later on, as a social measure. As a limited technical measure, a Local_Distribution_Only permissions bit will probably be required as well.
    • To put it bluntly, unless you can create a system which either allows no objects without full perms to leave a particular trusted group of agent domains, or create a system which guarantees that permissions will be respected, you will get a massive shit-storm from content creators, which no amount of explanations about the ultimate pointlessness of DRM will negate. Create a grid system which doesn't respect the current system or permissions, and content creators will not use it. --Ian Betteridge 08:49, 2 November 2007 (PDT)

Identity

This section has been moved to its own wikipage: Identity

Privacy and control

  • Motivation
  • The single-grid SL implemented privacy and control in a manner not untypical for centralized, managed systems, effectively denying privacy as a concept and exercising control (lighthanded, but nevertheless control) over world inhabitants. The policy also lead to implicit responsibility for the actions of inhabitants, on the basis of "if you police it, then you are responsible for any infractions".
  • In a worldwide, distributed, and massively scaled universe of interconnected 3rd party worlds and grids, this current approach is no longer tenable, for a number of reasons:
  • RL identity verification is neither generally feasible nor in many instances desireable (see Identity, above).
  • Attempts to enforce strong guarantees suffer from the same fundamental weakness as DRM, and lead to a similar arms race.
  • The laws of California are not generally of interest in the rest of the world.
  • The operators of individual grids will not in general have control rights elsewhere, let alone legal jurisdiction.
  • In some countries more than others, citizens place extremely high value on individual privacy, often protected by law.
  • Privacy is relevant to personal security as well, thwarting stalkers and even being helpful for domestic security.
  • Social mores vary immensely across the world, so imposing those of one society on another is misplaced and generally not desireable.
  • Open source denies the possibility of adding covert wiretaps and similar measures, so distributed monitoring is infeasible.
  • Policing and assuming responsibility for the actions of others is itself a recipe for discontent and ultimately failure.
  • The above suggests that attempting to implement the privacy and control policies of the original SL in the new architecture would not be an appropriate goal. Instead, privacy and control might better be focussed in a different direction, as below.
  • Requirements
  • The following requirements could be well suited to a universe of multiple distributed domains and high cultural diversity:
  • Implement strong(er) privacy guarantees
Although "evil" attached worlds and grids will unavoidably be able to subvert privacy measures within their own domains, by default there should be a presumption of privacy to cater for those societies where privacy is valued. This includes:
  • Built-in volumetric barriers, for example Argent Stonecutter's remarkably simple [Parcel Basements].
  • End-to-end encryption for non-vicinity communication channels.
  • Encryption of client-server traffic to defeat in-transit monitoring.
  • Disclaim control / gain immunity of common carrier
"Common carrier" immunity is historically only recognized for PTTs, and has been slow in moving into the realm of ISPs and higher level services providers. Nevertheless, commonsense offers that you cannot control what you cannot see, and you cannot reasonably be held accountable for what you do not control. Barriers to visibility added for the sake of privacy can therefore also deliver a useful defence and tangible business benefits, particularly in domains where personal freedoms are expected.
  • Limit the amount of logging, and keep permanent records for financial transactions alone.
  • Replace "guardian responsibility" by "personal responsibility":
  • Increase the power of parcel and region owners to detect the sources of abuse and remove it.
  • Increase the power of individuals to make unwanted effects/objects disappear from client visibility.
  • Add arbitrary owner-defined land tags to provide flexible classification, and land entry barriers.
  • Add personal interest tagging, to trigger land barriers and avoid unwanted surprises.
  • While the above are technical measures relevant to architecture, many coercive forces are arrayed against privacy and towards stronger control over virtual citizens, so be prepared for hard times and pressure.

Viewer

  • allow all sorts of viewers, from 3D to cellphone to web sites
    • Create a viewer with the absolute minimum functionality possible (i.e. core client functions) and then make it optionally extensible based on client capability.
      • Put core protocol functionality (login/logout, movement etc) into a library
      • Make the server only send client data it has asked for (client sends "i want prims" message and get prims etc)
      • Should be trivial to integrate with other systems if this is done correctly - IRC gateways for IMs for example - note: there should be thoughts about SPAM prevention from the very beginning. A "We can think about that later" wont work here!
    • Allow the client to reassign controls and common functions to a Human Interface Device (gamepad, joystick, customized keyboard, a Bluetooth-connected Wii Remote, etc.)
  • Client-side Script runtime
    • In keeping with the idea of the core protocol functionality being a mere library, it could be possible to make the rest of the viewer very modular in this sense and use a scripting language as the glue, or to enable the scripts to execute library calls directly depending on configuration
    • scriptable chat & movements avatar (bot)
    • advanced graphical hud
      • This needs clarification - it seems a good idea to allow 2D sprites rather than only 3D objects built on prims to be mapped onto the HUD, it also seems a good idea to empower the scripts with better capabilities for drawing on such a 2D plane
  • Functional refactoring of viewer
    • The current SL viewer implements functions from a large number of domains, many of them concerned with maintaining avatar presence (login, avatar state and behaviour, communications/protocol and event handling, etc), and many others concerned with the actual viewing (managing and rendering a graphic 2D UI and 3D world representation). It is large, monolithic and growing, and quite inflexible in this state, because new clients can be supported only by full code replication, and only one can run at a time per presence because of this. Large-scale refactoring is required here if we desire something better.
  • To cater for the stated requirement of allowing all sorts of viewers and running client-side scripting in a flexible manner, then at least the presence-related functions need to be extracted from current graphics code. This would then permit the following scenarios, for example:
[Use cases]
  • An intermittent mobile viewer could periodically manipulate or monitor an avatar presence currently logged in from home.
  • Within the home, secondary screens running on home theatre equipment would offer a very appealing view of the world.
  • A scripted presence could run without any 3D viewer at all, for example a shop transactions handler.
  • An audio-only mobile viewer could attend an SL live music concert and still pay the musician's tip jar.
  • An event-only viewer could send events to the presence corresponding to RL events, eg. from musical instruments.
  • The machinima community could run multiple cameras observing a single presence for great flexibility.
  • Under crowd conditions (eg. most live music events), a low-detail viewer could be switched in without relogging.
  • SL-based games and sports would be able to run their own bespoke viewer and UI, without reworking presence code.
  • A client could be used for remote changing/editing 'information'. Think of a 'blog editor' for SL, for text displayed by hovertext, prim displays and future SVG or HTML displays.
  • To achieve this, the "viewer" needs to be redesigned as a view attached to a presence (more precisely, to the client-side presence handler, because the real presence is more properly considered to be server-side), in an N:1 relationship. Client-side scripting would then also attach to the presence handler in a similar manner, for UI control, as a proxy for human input, and also for distributed computation at the request of servers.
  • Or, as another way of looking at it, the presence handler (currently a part of the monolithic viewer) needs to become an independent multiplexer process, with graphics viewers and other applications and scripts attaching to it as needed, or not at all if no view is required. The development of many other diverse and quite minimalist viewers then becomes significantly easier.

Agents

This section has been to its own wikipage: Agent

Regions

This section has been moved its own wikipage: Region

Virtualization of regions

This section has been moved to its own wikipage: Virtualization

IM

Scalability of Instant Messaging (IM) is required along all the dimensions given in Project_Motivation, with added requirements:

  • Messaging must be extended to the distributed worldspace of attached regions and identities
  • Messaging fanout capability needs to grow automatically with world population to avoid endemic lag
  • Alongside increases to IM extent, we also need recipient-end facilities for IM denial and filtering
  • Local vicinity messaging requires additional controls to tame crowd spam at very large events

The magnitude of this task, as shown by the inability of the current IM system to handle even current population levels, suggests that it may be best not to reinvent the wheel of IM, but merely to interface to existing open-source messaging systems.

Implementation thoughts

Regions

This section has been moved its own wikipage: Region

Currency

  • how will virtual currency be handled in a distributed grid architecture?
    • Will LL still support L$ in future, or will it be phased out? (perhaps a virtual currency should have no special place in the grid at all - just as there is no special currency on the web ?)
    • L$ exists as "a limited license right" within Second Life and therefore only makes sense within the official grid(s) owned by LL.
      • Privately-owned grids could be responsible for issuing their own licenses, to throttle their clients' use of those private resources.
        • Allowing private grids to issue their own limited license rights for use of their hardware makes it possible to disconnect them from many if not all centralized systems.
          • Remember that "resources" also includes "design and creation talent". Simply because there is no cost of reproduction of objects doesn't mean that it takes zero resources to make them. It takes the time, effort and knowledge of the designer. This is classical economics - labour, capital, and the means of production :) --Ian Betteridge 08:37, 2 November 2007 (PDT)
        • Such disconnections may allow us to implement a fully localized grid for use on private intranets or on a standalone system.
      • By their nature, private licenses would be mutually incompatible with the official "L$" licenses.
        • However it may be possible to setup automated gateways to convert between a 3rd-party currency and L$ or any other arbitary currency - something like a cross between paypal and the current popups for when one's L$ balance is too low
          • This is possible, but who gets to do the arbitrage? My guess is that, even if such systems are not created, there will emerge individuals and corporations who will exchange one currency for another for popular grid groupings. Consider, for example, the gold sellers in WoW. --Ian Betteridge 08:37, 2 November 2007 (PDT)
      • Alternative servers (or local grids) with accounts linked to LL servers might be able to purchase L$ to be issued to its members for use on LL servers.
  • allow for secure transactions other than in L$ via PayPal, credit cards, etc.
  • It is possible to maintain a single payment mechanism across multiple grids in a decentralized way using a social-network credit, or Hawala. Actual payment systems using this method exist (for example Ripple) and allow transparent convertibility between different currencies, some of them also allow transfers from and to Paypal and other online payment systems.
  • Replace currency with cryptographically signed documents with the electronic equivalent of an IOU?

Assets

This section has been moved its own wikipage: Asset

Protocols and interaction patterns

I've added this bucket to discuss the related topics of protocols and interaction patterns.

We have several building blocks proposed from Linden Labs, in the form of [certified http], REST, and [capabilties].

We do not have much described from Linden in two other major areas, namely how to choreograph calls into sets of calls which achieve an end (interaction patterns) and how to manage situations where we wish to setup an ongoing stream of interaction between multiple components. This last case is especially cogent when we discuss how region servers interact to provide the illusion of seamlass land.

Zha Ewry - 9/25/07

Capabilities

The capabilities of a domain, region or client do not have to mesh perfectly, but we should set about making sure it doesn't choke.

3D Web

3D Web simply put is forcing the multidimensional flat data of the internet into a single continuous three dimensional space. In the past it has largely failed because of a lack of hardware and attainable social pressure. SL has the potential to fill the 3D Web niche if it can at the region & domain level be integrated with traditional web applications.

Limited Capability Clients

When a client such as a cellphone or any limited capability client (LCC) connects, many of the more advanced features could be turned off for them but if there were a way to automate some of them, then the LCC could still participate.

There are a number of features that would enable supporting LCCs.

  • Scriptable capability deficiency handler
    • This could allow the world to automate features not supported by the client so the client could still participate.
      • A cellphone user goes to a show to listen to the audio stream, when they connect to the audio stream the region automatically seats them in a vacant chair.
  • Scriptable Auto-navigation waypoint system (choices would be present as menus, etc)
    • This could allow a user who can't see the world navigate it by menu.
  • Embedded web pages from prims made the primary focus so they can broken out and interacted with.


There is no requirement stated here that they be scripted in LSL. I don't think LSL would necessarily be appropriate -- Strife Onizuka 17:07, 27 September 2007 (PDT)

Capabilities: The path to 3D Web

With a framework in place to negotiate and script deficiencies in supported features, the ability to support an LCC that is in fact a web browser becomes possible. More importantly, the transition from Web to full 3D Web can be transitioned one feature at a time.

A website could be described as something like an art gallery. Information is organized into rooms and it typically line the walls. The passages between the rooms allow traversal from one page to another. This analogy is nowhere from perfect.

With a dynamic capabilities system, the client could drop down to dumb mode and browse the art gallery like a webpage or it could explore it as a 3D space with all the features turned on.

One interesting question for a migration/incremental move between static 3d content and a fully immersive environment, is when you can, and how you can, bring the space from static traditional content to a space where multiple avatars can interact. When we look at this space, it is interesting to notice that we're really doing several things at once in a space like SecondLife. There is the presentation of 3d content, there is the melding of multiple presences with the 3d content and the presentation of that content, the shared space to the viewers interacting in the space. Note that a 3d web, that is to say our current web with 3d content, would be a very static place, there isn't much notion of shared presence in the 2d web. -- Zha
If the Capabilities Framework is designed properly, it can allow for static 3d spaces with no user interaction. They won't be much fun, but thats another issue altogether. IMHO 3D Web by itself is doomed but when you tie it into the SL grid it has a chance of not being a total flop. LL is in an interesting position, they could be the next Network Solutions but for the 3D Web. -- Strife Onizuka 23:49, 27 September 2007 (PDT)


ANALYSIS: Region Subdivision as a scaling method

  • A recent AWGroupies discussion examined the issue of sim and region scalability in the direction of increasing the processing power available to regions. One mechanism for achieving scalability that was proposed was region subdivision of statically-bounded region spaces. This section first provides a rough picture of conditions under such scaling, and then examines the problems that exist with that approach which impose inherent limits to its scalability.

Crowd conditions under projected scaling -- a few numbers

  • The scalability for events target for the normal use case has an event scaling factor requirement of 200 (20,000/100) --- this is the factor by which the number of people who wish to attend a maxed-out-sim event today will increase under Zero's total-population projections. (The massively higher viewer numbers possible for the SecondlifeTube use cases are not examined numerically for now, except in discussing event headroom.) Event scalability projections were calculated in the discussion on Project Motivation - Scaling for events.
  • For this event scaling factor requirement of 200, the region subdivision method subdivides today's 65,536 m^2 of land per region into subregions of (65,536/200) = 327.68 m^2 under a policy of equal-size subdivision, which is a square of 18.1m per side and 25.6m diagonal. Each subdivision then holds 100 people, each with a personal space of 3.28 m^2 (a square of 1.81m per side). Thus, this represents a lighter packing density than normal crowd packing in real life (RL), which approaches 1 person per m^2, and hence represents a less onerous scaling than would a true crowd simulation. Visually then, this packing density would provide a normal and untroubling experience for any person used to popular standup venues in RL.
  • Assuming simple rectangular NS/EW subdivision, the maximum rate of sim handover would be experienced every 18.1m of linear travel when aligned with a NS/EW grid, and the minimum rate when travelling diagonally every 25.6m. Determining the rate of handovers is hard without performing a statistical study of typical avatar movement patterns, so this is not attempted here. However, under rectangular spacing, a square subregion containing 100 people has 10 of them along each side bordering on the adjacent subregion, which provides a lower bound on handover for near-zero travel and therefore bears examination.
  • At each such boundary, any outbound movement by anyone from within their subregion causes handover (this is reduced by hysteresis, see below), and what's more it occurs in both directions across a boundary concurrently (by symmetry with the adjacent subregion), within any cross-section of an event crowd.
Therefore the maximum rate of handovers at a single boundary without any contribution from deeper within a subregion is somewhere between zero (nobody moving) and 2*10 = 20 (10 on each side of a boundary moving outbound). Since there are 4 adjacent subregions per subregion, this gives us a maximum of 20*4 = 80 possible handovers at the boundaries of any given subregion, under conditions of zero hysteresis, nobody coming out from deeper within their subregion, nobody getting closer than 1.8m to anyone else, and any movement whatsoever that is greater than zero distance (since hysteresis == 0). The average would be half of this or 40 handovers, since each of those 10 might be moving deeper into their own subregion rather than outbound.
  • Note that this is not a handover rate as such (since movement rates are unknown), but a bounding coefficient governing maximum handover rates for minimum travel. It doesn't establish an actual target for handover rate handling, but just gives us some idea of the magnitudes involved --- in other words, it shows that discussing thousands of handovers per second per subregion is not relevant, but a few tens of handovers might be. It offers an initial suggestion of possible viability.

Potential problems with Region Subdivision

The above scenario gives some idea of operational conditions under the normal (ie. current) use case, which are complicated by the following observations:

Max-scaling conditions apply under very small scaling (density-based subdivision is required)

  • First of all, it needs to be borne in mind that the above conditions do not apply only to the end projection of a 2bn user population, because crowds at events almost always congregate on the primary focus of attention, for example at the stage of a live music event. This would apply right now if regions and clients were scalable, because the top musicians already max out their sims for well-advertised concerts, even if they perform more than once a day. (Check out any [Komuso Tokugawa] concert, for example.) Live music is of course not the only type of event that maxes out sims in SL today, even without considering one-off special events like the recent SL4B celebration that attracted many thousands concurrently but was not able to handle them.
  • Because (i) popular events easily exceed the 100-av subregion regularly today, (ii) will do so even more with each passing day, and (iii) audiences draw together around an attraction, the described subdivision policy of equal-size subdivision is not viable. Instead, we require an equal-density subdivision policy, because without it we will not be able to scale events in the immediate future, let alone far ahead. The equal-size policy was described only to provide a rough idea of the numbers that govern handover densities, but the actual policy to be employed must be based on local participant density or it will not accomplish its goal, because otherwise subregions at the event focus would be massively oversubscribed while outer ones would be empty.
This is clearly more complex from a design and implementation standpoint, but it is inevitable under the region subdivision method otherwise the premise of limiting subregions to handling a maximum of 100 avs is exceeded very early on in the population growth, or even right now.
An alternative to equal-density subdivision that might address the problem is modulus subdivision (the region is sliced into sections each containing rather less than the maximum number a typical CPU can handle, plus one more section to carry the remainder). Geometric land partitioning however is not an option, because agent density is highly non-uniform.
  • In other words, we could not design for equal-size subdivision initially and then evolve to equal-density subdivision, but instead we would require a density-based subdivision right from the start: equal-size subdivision does not accomplish the goal, which is to handle loading from high avatar densities. And that introduces numerous difficulties, examined later.

Hard ceiling on scalability (non-existence of scalability headroom)

  • The 18.1m per side square of the simple example subregion (which is not actually viable because equal-density subdivision would be needed as above, but is nevertheless still useful for analysis) seems at first glance to be a viable extent for a subregion. Unfortunately, this would not be the minimum size of a subregion under the figures of our Project_Motivation. The reasons why subregions would have to be far smaller than 18.1m include the following:
  • Prim count expectations will rise, inevitably. This includes land-placed prims in the event region, but these are not the primary concern, since the biggest prim-related loading is generated by avatar attachments in all event-scaling scenarios. While it is hoped that sculpties will reduce the need for very high-prim attachments, a knowledge of people suggests that the opposite will occur, and that ever-more detailed attachments will increase instead of, in parallel with, or on top of sculptie-based designs. Region loading from prims will therefore be higher than forecasts based only on population growth.
  • Scripting load is highly likely to increase, for three reasons:
  • The number of scripts never decreases, and if the ceiling on land prim counts is raised then script numbers will rise too. More importantly however, scripts numbers in attachments have no user-explicit ceiling, and will be expected to grow not only alongside normal population growth but also because people will want facilities for crowd management.
  • As the population grows, new residents experience the pleasure of LSL programming, so the volume of available scripted objects always heads upwards. One additional factor that adds to the concurrent scripting load is that as the worldwide number of free or cheap scripts increases, the initial expense barrier to event participants running quality scripted products falls away. Since exponential growth of the SL population creates an ever-larger percentage of newcomers compared to well-funded oldies, this effect results in an ever more rapid takeup of scripted objects by residents. The onset of usage of scripted attachments is therefore no longer deferred while they earn some money, but becomes ever more immediate.
  • Mono is coming. This will speed up scripts markedly, and means that more scripts will be runnable in a region for any fixed amount of CPU resource, and therefore inevitably also means that more scripts will be run. It also means that more efficient scripts will access backend assets more rapidly, which compounds loading still further.
Adding these 3 points together suggests that region loading from scripts will therefore be higher than forecasts based only on population growth.
  • The SecondlifeTube use cases are highly likely to become massively popular, potentially becoming the "new TV" of the second decade of this millennium. The impact of these uses cases on server-side scalability is nothing short of ultra scary, and since the client-side changes for these use cases are quite trivial, the pressure will be on for massive server-side upscaling for events.
  • Factoring in the effect of more prims, more scripts, and a future which includes SecondlifeTube-type clients, immediately indicates that the 18x18m minimum-size subregions that we thought might be sufficient for scalability by region subdivision are wholly inappropriate. Indeed, the analysis is out by some orders of magnitude (log10(~millions/20,000)), whereas not even a single order of magnitude improvement is available since 18m/10 is roughly the size of an avatar, and other factors add loading too.
In case the impact of this is not immediately obvious: subregions would have to be smaller than an avatar to scale fully by this method. In other words, the method is completely non-scalable to many use cases of high interest, because region handover makes no sense when regions are smaller than avatars.
  • Or, as another way of putting it, the region subdivision proposal contains an immoveable ceiling or cap on region scalability. What little headroom exists for possible reduction of subregions below 18m per side can never go below the size of an avatar because of handover, so the ceiling cannot be raised further. Simple use cases that are expected to be popular require scalability beyond that.

Adjacent region view non-scalability

  • When a region is subdivided into a large number of small subregions, the 3D view from any given position will require object data to be gathered from all the region servers whose land or objects are visible within the observer's field of view, or possibly more for speculative caching. This process entails a lot of distributed activity. While that is not in itself a problem in the first instance, it gets progressively worse as the subregions become smaller and smaller, since the number then involved in almost any view will then grow larger and larger.
What's more, the intended goal of letting other machines assist by working on their own subregion is lost, because the physical proximity of subregions to other subregions forces additional object viewing traffic on them. The separation of domains which gives tiling its good performance is lost when those domains are so close that they become local. This is a fundamental design flaw in Region Subdivision.
  • The effect is pathological. As subregion size is reduced, the interconnection topology tends towards total connectivity.
  • In other words, region object loading which used to be primarily local now becomes global as regions are subdivided, because 3D viewing distances are not reduced in proportion. This appears to be a recipe for large additional machine loading which wastes CPU and network resources by coupling region land extent to the (desired) distribution of computing.

Increasing grid workload from subdivision (non-scalable internal workload)

  • Region subdivision always increases the overall workload. This stems from the simple observation that, for any given area, area subdivision adds new interaction points within that area.
  • As an example, consider a square of 100 units per side and containing a uniform density of objects (or other reasons for boundary interactions to occur), and assume that each unit of distance also generates a unit of boundary interaction workload:
  • S/1: The original 100x100 square then features (4 * 100) = 400 units of workload, ie. 100 per side.
  • S/2: Splitting the square down the middle results in 2*((2 * 100) + (2 * 50)) = 600 units of workload.
  • S/4: Splitting each half into half again results in 4*((2 * 50) + (2 * 50)) = 800 units of workload.
  • S/8: Splitting each quarter into halves again results in 8*((2 * 50) + (2 * 25)) = 1200 units of workload.
  • S/16: More directly, dividing the original square into 16 results in 16*(4 * 100/(16/4)) = 1600 units of workload.
  • And so on. For every 4-way split, the boundary interaction workload doubles, ie. exponential in 2^(2N).
  • This then is a design for non-scalability, since in the limit it tends towards infinite internal workload as the subdivision tends towards zero size, and since the trend is exponential, it is the exact opposite of a scalable trend.

Hysteresis zone size shrinks to unusable

  • Real systems which employ the concept of handover at region boundaries do not usually trigger handover at the actual boundary lines, because this would generate pathological behaviour for agents sitting exactly on the boundary, moving along the boundary line, or making infinitessimal movements while at the boundary. All of these cases can result in 'handover thrashing, ie. continuous switching back and forth between regions, potentially at very high rates.
  • To avoid this, handover is usually tamed by placing handover bands that add [hysteresis] to handover at boundaries. This can be done in many ways, but the general goal is to introduce delay into the handover process (this ties into normal control theory for damping uncontrolled behaviour through negative feedback with a delay in the feedback loop). The handover band does not necessarily have a physical extent (it's more of a time parameter), but it can be associated with a physical extent if the velocity of an agent perpendicular to the boundary is considered.
  • As the conceptual handover band shrinks in size, hysteresis is reduced, and handover behaviour draws closer to the pathologcal condition that it was designed to avoid.
  • When regions are repeatedly subdivided as in this proposal, retaining existing sim behaviour will require reducing the size of handover bands correspondingly, otherwise in due course handover bands will overlap and the current handover design is then no longer viable since the handover process loses its normal 1:1 mapping. If the handover bands are reduced sufficiently, this then leads to handover thrashing.
  • In the limit of subdivision, handover cannot work at all without continuous thrashing, although in practice the bag of problems is so full by then (eg. regions smaller than avatars) that it hardly matters.
  • In summary, the concept of handover by region adjacency becomes progressively less manageable and usable as subregions shrink in size, and would need to be replaced by some form of non-adjacent region handover in order to retain the important principle of hysteresis.

Adjacency-based power contribution is flawed

  • Region Subdivision is an attempt to retain static mapping between land and servers while sharing the workload of highly loaded adjacent regions. In other words, load sharing requires land sharing (through subdivision), in this model. This constraint on load sharing is deeply flawed, as a very simple example illustrates:
  • Imagine a grid split down the middle, in which all servers in the right half are 100% idle, and all servers in the left half are 100% busy.
  • Now replace the server right in the middle of the left half by one that is running at 80% CPU capacity, that knows that a major event is scheduled in 5 minutes' time, and that is desperate for help to avoid total collapse.
  • None of the adjacent servers in the left half can help, as they have no spare capacity. None of the extremely bored servers on the right half can help because their land is not adjacent to that of the ailing server.
  • In 5 minutes' time, that server collapses.
  • While this is not a real scenario, it illustrates well that the concept of land adjacency is not helpful for power sharing. In real scenarios, exactly the same considerations apply: the ability of a server to accept a share of a region's workload should have nothing at all to do with land adjacency, extent, or location. Land should be a virtual concept, and not tied to the CPU resourcing model.

Dynamic subregioning vs stateless access to virtual regions

  • As a little analysis revealed above, some key aspects of Region Subdivision have turned out to require dynamic handling:
  • The actual subdivision policy cannot be equal-size subdivision, otherwise workload sharing is not effective and some subregions suffer overload beyond viable limits while others in the partitioning are relatively idle. Instead, a policy of equal-density subdivision is required, and this is a dynamic policy.
  • Hysteresis for handover stability requires handover band reduction proportional to subregion shrinkage to retain 1:1 handovers, or else requires that the handover mechanism be replaced entirely by one that is not based on subregion adjacency. Such handling becomes highly dynamic.
  • As subregions reduce in size to cope with increased region load, each contributing CPU is required to handle object fanout for more and more viewers because 3D views do not shrink in step with subregion reduction. Consequently subregion shrinkage tends towards serving the needs of everyone in the locality, which is the same goal that Virtualization of regions achieves without any handover overhead.
  • When the design of region mechanisms starts to embrace changing sizes and adaptive handover systems, it is no longer a simple statically tiled grid architecture. Instead, it is halfway to a dynamic architecture in which region parameters are entirely virtualized, but instead of benefitting from that new freedom, it retains the disadvantages of the old design and introduces many new problems, as highlighted in the analysis.
  • This tends to suggest that, if the static grid functionality is being redesigned to have dynamic features, then those dynamic features should operate independently of those static constraints, in order not to introduce new disadvantages by distorting the old model.
  • The latter is the approach suggested in Virtualization of regions, which works alongside the old static resource mapping, merely offering any unused local resources to all other participants in a parallel dynamic infrastructure.
  • It is worth noting that virtualizing regions inherently results in stateless communication with the servers that perform operations on virtual regions, and stateless communication is a central tennet of REST. The principle of virtualizing regions is therefore inherently consistent with massively scalable HTTP-based loadbalanced access to virtual resources, of which a virtual region is an archetypal case.

Scalability vs scaling

  • When the proposal to use Region Subdivision was described and discussed in the AWGroupies discussion, the impact of the scary numbers of Project_Motivation on the proposal was not considered, and was in fact dismissed, on the grounds that scaling would occur in an evolutionary manner and therefore only a smaller number would initially be relevant. This is flawed reasoning, because it confuses scalability with degree of scaling.
  • If the goal is to design a system that can handle (say) a population of N users, then in general there will be any number of different designs that can fulfil that requirement. These designs will typically differ in their scalability: one may support only N users and no more, one may be able to handle some small multiple of N, and another may be able to support a vastly larger population. However, because they all fulfil the short term goal of supporting N users, these differences are not apparent if the only design case that has been considered is the scaling to N. If a system implements a design that is viable only up to N users, then if that system needs to be scaled up further, the only way forward will entail a redesign, and if that new design is not evolutionary from the earlier one then this will also entail a reimplementation. This is clearly not satisfactory if the envisaged population is hugely greater than N right from the start, even if N is the initial target.
  • In respect of the Region Subdivision proposal then, it is not sound scalability analysis to ignore the scary numbers of Project_Motivation just because some smaller number is envisaged as a short term scaling target (a specific implementation). All candidate designs need to be assessed against the scary numbers to ensure that they remain viable at the projected limits of scalability, regardless of the magnitude of short term physical scaling targets. Without that, gradual evolution is not assured.

Illusion of scalability (abstracting resources into clouds)

  • First, a general observation on allegedly designing for scalability without actually doing so (also known as designing with clouds):
You cannot put a scalable, massively parallelized HTTP access mechanism on the front of a non-scalable resource and magically claim that you have a scalable resource. All you really have is a scalable access mechanism. If the resource is non-scalable, then the scalable access mechanism is wasted. This means that if we put the resource in a cloud labelled "Scalability to be left to implementors" then we have not actually achieved resource scalability. We have not even produced a design that makes the resource scalable, let alone actually achieved scaling. We *know* that stateless HTTP front ends and load balancers give us a scalable access mechanism --- it's been normal industry practice for 10-15 years now (the author scaled a national ISP from 64k users to 2+ million over 6 years using that approach, and it was severely limited in scope). That's not the hard task that the AWG is tackling, it's the easy part of the hard task.
So, if your project goal is to design scalable resources, you have to do exactly that. Anything else is self-delusion.
  • Now to place this in the current context: it has been suggested that we might abstract scalability of regions by placing the internal design of regions inside a design cloud, and to claim scalability because we have placed a scalable HTTP-based access method in front of that cloud. That suggestion is not designing for region scalability, it is brushing the whole issue under the carpet. It does not achieve scalability of regions in the current SL, nor in the projected SL, neither in design nor in implementation. It is simply ignoring the issue of region scalability entirely.
Given that scalability for events is needed extremely badly right now, and that the nil scalability for events of the current grid has been causing intense dissatisfaction among event participants for far longer than the limits along any other dimension of scalability (well over 3 years), ignoring this element of scalability while pretending that it is addressed is not a attitude of responsibility towards SL residents. It should not be even considered. (This paragraph is not technical, but is relevant in the sense that design engineering and the work of the AWG ultimately has a social purpose.)
  • We are not designing for a future SL in which 99.5% to 99.995% of the residents who wish to attend a given event have to stay at home --- see Talk:Project_Motivation. Region scalability for events is a primary goal, as well as an urgent requirement.

Summary of observations

  • The Region Subdivision approach suffers from several difficulties and restrictions which will result in high overheads if scaled even to the lower reaches of the numbers offered in Project_Motivation, and high overheads generally reduce actual ability to scale to far less than predicted figures. Even more worrying though is that the proposed approach contains an inherent ceiling on achievable scalability (let alone actual scaling) for predictably popular use cases which are already implemented in other systems.
  • Given that such difficulties are already identified at this early design stage, Region Subdivision would be a poor design choice for a new architecture.


"This concludes the results of the Norwegion Jury." Not quite nil points, but close.


ANALYSIS: Per-resident subdivision of the Grid as a scaling method

Observation:

Any SL scalability study aims at ramping up the processing and network capacity of a given location in Second Life with the number of avatars present at this location in the most linear fashion attainable: the logical solution is to tie this performance to the avatars and not the land. Because any given object or avatar can only be at one place at any given time, it also makes sense that avatars and objects belonging to a given player be simulated by the same server. Because each resident's grid usage is roughly comparable to another's, distributing the load per resident also makes sense. Because the load induced by a given resident is generally not expected to jump up and down widely in the short term, even an initially static distribution of server performance across players is a good start. Because a resident's performance usage in the grid is highly dependent on the number of inworld objects they possess, ramping up the performance alloted to a given player according to their land possession / prim allotment is a good solution, and is also fair considering landowners pay for the SL servers in the first place.

Principle:

Instead of subdivising the simulation load geographically among static regions, subdivise this load across the owners of the avatars and objects. Your avatar and all your prims, the physics and scripts of your objects, etc. are simulated and served by your one allocated simulator gridwide, regardless of where they are in the Grid. An avatar UUID is paired to a simulator, much like a web domain name is tied to an IP address through DNS.
A client's viewer or an object is then understood as a presence, encoded into the ownerkey / avatar UUID, and simulators only have to keep track of this presence on your parcel, which is a lot less work than receiving data representing the session of the visiting avatar, their attachments, states of the running scripts inside, etc. When your avatar moves and comes into view of a new parcel, your viewer gets the ID of the landowner, which in turn indicates which simulator (the "local simulator") runs that person's content. Your simulator connects to this server, provided this parcel is accessible for you. Then this "local" simulator provides your own simulator with a presence list, which is nothing more than a list of the ownerkeys of content or avatars that are present on this parcel. Your viewer opens a connection for each ownerkey/simulator on this list, and they then send you relevant information about the content they each run, that is in view for you. This distributes the load between multiple simulators. Your viewer simply aggregates all this information incoming from multiple sources in a single 3D renderview.
  • Each resident UID becomes tied, possibly dynamically, to a share of a specific server's processing and network capacity. Basically, the sqm of SL land becomes the unit for measuring raw grid simulating power, alloted on a per-person basis.
  • A mechanism for coupling UUIDs of land/content/avatar owners with their simulator's address is required, and replaces the current coupling of region and sim address.
  • Such a mechanism (a form of DNS) could include some load-balancing method. Having multiple simulators running a given resident's content could help scalability, too.
  • This multiplies the number of concurrent simulators serving the local content (especially avatars and attachments of the many visitors), managing physics, running scripts, etc. in crowded locations, distributing much of the supplementary load induced by the crowd's presence.
  • This also makes all of one's content gridwide easily accessible from one server, wherever it is inworld, just like the inventory allows to do with your non-rezzed content.
  • Object and avatar entry is controlled by the local simulator reporting or not reporting the presence of objects and avatars entering from the outside.
  • This means people and objects can move anywhere freely, without restraint, but they can be viewed as "being there" and interact with things and people present there only if the local land owner allows it. No more bumping into red lines, but in exchange you don't see any of the people and content that are on the land you're not allowed in, and no one there can see you or any of your objects, nor interact with you. That's the "ghost snub" way of access restriction, in a sense.
  • Basically, one can fly anywhere uninterrupted in the Grid, including empty ocean, and the landmass and sea look seamless, but parcels that are not accessible appear entirely void of any avatar or content, your viewer not even receiving any data for them. This is an appreciable improvement in both privacy and travel ability.
  • The land topology, parcel information, wind, sun, clouds, etc. is seperated from the "regular job" of simulating content and can be transferred to yet another cluster of servers/databases.
  • I figure the land textures would either be served by this "Land domain" ; alternatively, if serving land textures is kept as a job for the sim, viewers would display the default "grey" texture on the ground while it attempts to load it.
  • It would allow per-parcel ground texture customisation.
  • It means the landmass can vastly outgrow the "inhabited" zones of SL: land with no content or owner can be added independantly, coasts and plains and oceans can be added transparently between the islands and continents. Because such a content is extremely static, it can be neatly cached for long periods of time, maybe even downloaded as a standalone addition.
  • However, no owner for the land means no simulation, so you cannot meet people if you're there, you're lost in your own empty alternate dimension and not connected to the grid except for fetching heightmap, sun, wind and clouds information.
  • This allows for non-uniform server types for simulators, that can be just evaluated by total sqm capacity, and whose "slices" of such capacity get allocated among SL content owners according to their gridwide total land possessions.
  • Employing more than one full server worth of simulation capacity would require a mechanism for balancing the load between the same-owner simulators.
  • Such a mechanism already exists: geographic proximity.
  • A resident's parcels could be joined across different regions, fusing prim capacity and total visitor capacity gridwide.
  • They would only fuse up to a full server worth.
  • Right now a single server (machine) runs four standard regions, a small but appreciable improvement potential. Also, if a sim can be occasionnally transferred from one server to another, and considering the possible non-uniformity of servers mentionned above, any sufficiently big land acquisition could get your personal sim transferred to a bigger server that would fit better. So the "full server" limit could scale, too. A dynamic mechanism for moving the "slabs of sqm" to properly sized servers would allow some load-balancing in advance.
  • How would a viewer know which of multiple servers running a single person's land or content (in the cass of over a full server's worth of sqm) to connect to ?
  • One solution is that each of the multiple servers run all the land and content (memory overhead ?) and viewers that connect are distributed among them (amounts to sharding). Another is to have a mechanism that forwards a viewer from one of those sims to the relevant one based on geographic proximity (preferred).
  • This rids us of the region crossing lag, since the physics of your avatar stay tied to your own simulator which does not change for the whole session.
  • However this method requires some serious thought on collision handling, which still has to be done by the simulator running the parcel where the avatar or physical object is located. Either physics remain simulated entirely by the local simulator, and scalability using this method is significantly reduced ; or collisions can be handled with a small delay and be disabled automatically at the location if the number of physical objects and avatars becomes too high ; or collision info (relevant boundary boxes) can be exchanged between simulators, and possibly be cached. This last method is preferred, it would come as a replacement for the transfers of object and avatar data that currently occur upon region-crossing.
  • A lazier, simpler approach could prove sufficient: have your own simulator handle collisions for your avatar and objects in a best effort kind of way, assuming no collision unless it receives boundary box info from the local and present simulators, and detects a collision. Each sim would be responsible for respecting collisions as much as they can and not care for the behaviour of other sims in this regard, save for providing boundary boxes info when asked.
  • Your own simulator still manages the presence of content and avatars on your parcel(s), though this function is radically simplified by only having to keep track of single ownerkeys and simulators associated to them, instead of whole individual objects and avatar sessions.
  • That's one simulator per resident instead of one per region. Let's consider overhead waste ?
  • How many of the current millions of residents registered have land and inworld content that needs a simulator ? This is a decisive question.
  • Empty land frees processing power from simulators, which helps directly with heavy-loaded places. Additionnally, the non-geographic allocation of simulator capacity helps spread the average load across the entire fleet of servers, because new users' content can be tied in priority to less-used servers. The less content and visitors in one place, the less processing power spent on it, down to the bare minimum of managing an empty presence list.
  • The performance impact of basic account residents can be better controlled by LindenLab, as their simulator usage becomes seperated from premium account residents - an usage (and cost) that can be measured and regulated directly by the number and capacity of simulators dedicated to them. This also allows fine balancing between the two populations' Grid usage. Same with the nationality of players: grid servers can be better distributed across the world.
  • I'd say one simulator for basic accounts for each simulator dedicated to premium accounts (logical servers, not physical ones) could be a good starting basis.
  • Public land is still possible, as one can allocate a share of their sqm to run content from other people, be they anonymous visitors or specific residents of their choice.
  • This implies I'd have to choose, when creating a prim or rezzing something on such land that allows building, between counting this prim/object against my own sqm or that of the local landowner. That means an additional mechanism to implement.
  • Empty land does not free much processing power in this scheme, actually: with one sim per resident that has land and content, we could have a hundred sims running concurrently per server. One crowded place would benefit only from unused performance on this hundred "smaller" sims, at most. On machines with only low-usage sims running, there is still waste.
  • This is why a dynamic pool of allotable sim capacity is to be preferred.
  • This requires a mechanism for scaling up or down a whole simulator process (max prims, max physicals and max running scripts, mainly). Maybe at startup of the sim process, though it means land buying would require a personal sim reboot ? Memory overhead ?
  • Let's think differently: one simulator software running per machine, but capable of opening or closing or transferring single VMs corresponding to a given resident, much like it is capable today of running many scripts, stopping, starting, throttling and transferring them. This makes sense for mass distributed processing.
  • This method makes it possible to filter content by ownerkey from your viewer, so the avatar and/or his objects don't even render on your client nor use up your bandwidth.
  • Chat, inventory transfers and IMs can be decentralized the same way (per owner), and filtered the same way too, at the sim level.
  • Login could be simplified: only one connection to your alloted simulator is required to access the Grid. Because residents are distributed across all the simulators, spreading the load, login can be decentralized, with login info stored in a backend database accessible only by the sims (and possibly cached for a few hours ?).
  • Script runtime load is more just, since your scripts only take up your own land's processing power.
  • This method breaks region coordinates for scripts, unless the llGetPos() function (and its equivalent using PrimParams) is modified to apply (modulo 256) to the global coordinates before returning them. The llGetRegionCorner() function would just have to return clamped global coordinates.
  • Sim crashes would affect residents in a different way: instead of everyone being logged out, only the person(s) whose sim crashed would be logged out, their land (or some of it, up to a single region) would become inaccessible and their objects would disappear from the whole grid during the outage. Having an alt would allow some redundancy.
  • Movement information from the viewer would have to be replicated for each simulator the viewer or the is connected to.
  • That's true, although this replicated information does not have to be kept as up to date as possible. A given simulator only needs to keep track of viewers it is connected to (for maintaining a presence list), which is simply done by checking whether the connection is still up or not. Your avatar's movement is handled by your alloted simulator, so a high frequency update is important in this case (I think the current frequency for this is 20Hz). Your viewer maintains the connections to the "non-local" simulators according to the local presence list, and receives updates for content that is in view based on your position, in this case a much lower update frequency is sufficient.
  • Alternatively, your dedicated simulator can handle this replication for you and keep the simulators of the presence list informed of your avatar's position, so they can send relevant updates to your viewer. The same problem and solutions apply to travelling objects instead of avatars.
  • In practice there are very few different owners for the content present on most parcels of land in SL, but crowded areas would have a very long presence list. Instead of cutting off people from a crowded place (or the whole region), the viewers themselves would simply max out on the number of different simulators they manage connections with: instead of having 100 avatars in the same place at most, each of them lagging the same and capable of viewing each other, there could be thousands of avatars piled up, but each of them would only see a fraction of all the persons present and it would look like each of them is lagging in variable amounts. This is an interesting effect, a sort of automatic sharding of crowded space, in a decentralized way.
  • The ultimate aim of this method of subdivision is to allow players to host and serve their own content easily. Once a simulator is tied only to a resident, there is no security issues if they run it on their own computer instead of in the Grid, decentralizing the Grid completely except for the land heightmaps, clouds, sun and parcel info: the Grid could still remain one.
  • One big issue here: scripts. If they run on a third-party machine, they're as good as public-domain. Same with restricted textures, no-mod objects, etc...
  • I personnally think copyright is an abuse of property rights with no coherent justification, that once you divulge some information to someone, it is theirs just as well as it is yours, so I'd view this kind of objection as a non-issue. However one possible compromise with the copyright crowd is that a sim process may not be "outsourced" if it contains copyrighted data. This should stamp out any potential legal issue.
  • Problem: the provision of parcel information (who owns which parcel of land somewhere) is still done centrally. Although it is a very light and static load likely to be cached and scaled up, it does not scale the same way as the rest of the proposed model. It may not be a problem with 2000 concurrent avatars but may prove so with a couple orders of magnitude more.
  • True, but this provision of parcel information can be decentralized as well, either by being region-subdivised as it is currently in SL, or by being integrated into the per-resident sims. In the latter, the parcel info is distributed across the simulators - each simulator retaining the parcel info of its N immediate neighbours - and obtained through a distributed protocol like Tapestry.
  • And in the case of conlficts between multiple simulators insisting that a given parcel of land is theirs ?
  • Existing distributed services like AnoNet solve this by allowing the local simulator (the onlooking viewer's) make an authoritative choice at its own level. This takes care of malicious injection of false parcel information, and even lets multiple persons own the same location, but in different, partially exclusive interpretations of the Grid (one could call these "parallel universes"). In simpler words, you can make it look like you evicted your neighbour in your own client but cannot impose this view on anyone else.
  • What about heightmaps ?
  • One solution for this would be to have a fixed global heightmap source, whatever form it takes (preferably a static download with each client software). On top of this local land simulators could apply heightmap offsets of their own to override it, for terraforming. On occasions existing overrides for the heightmaps could be accrued into the common global source, though it would not necessarily save bandwidth. The world map can then be extended by adding heightmaps for further regions at anytime.
  • Another solution is simply to have the simulators stream their own heightmaps, and assume "out of world" or empty ocean for where the client has no such data. But then it makes it impossible to have a landmass where no one owns the land. This corresponds to what we currently have in SL.
  • It is possible to seperate the region-related load and agent-related load, as in the LL proposal with distinct Agent Domain and Region Domain. As an aside it allows gracefully handling the difference between basic and premium account residents, as well as limiting the number of concurrent simulators running, down to only the number of residents who hold land - so that it might be much more workable. This implies seperating Grid machines in two groups, for running simulators on one hand and running agent sessions on the other. Preferably, the agent ones would run in the clients and profit from the massive computer power that residents can contribute to running the Grid. This means identity would not be certified, just as an IP address does not certify a web browser's origin, but that's fine because the content would be safe in the region servers.
  • This means that asset servers are no more autoritative, and instead the sims take precedence over them: the asset databases then only serve as backups and not sources, while the sims are no more caches, but the real thing.
  • It also solves the problem of knowing on which server to run content newly created: on the individual sim if it has spare running capacity (prim allotment, script capacity, etc.) or on the local land sim if there's no land, spare space and if the local land allows it. That means a landless resident would host his or her content on a specific person's sim, on an individual basis. A mechanism for allotting a given sim space to a specific resident might be needed.

A 'per-resident sim' as a finite state automaton ?

"one simulator software running per machine, but capable of opening or closing or transferring single VMs corresponding to a given resident"

That's one more trail of thought to pursue: how far can sim processes be parallelized ? The per-resident subdivision of the Grid might allow for a considerable independance of sim processes, that could in turn make it possible to run simulators as interchangeable instances. Thoughts, suggestions ?

Customer processing power contribution in exchange for sqm

One easy way to solve the problem of landless residents having no real simulator of their own or parasiting other people's sqm allotments: allow them to contribute their computer's processing power and net access bandwidth to the Grid, and earn sqm in return, so they can rez objects and run scripts. Make it possible to resell or rent those sqm, and the Grid will be self-growing.

ANALYSIS: Scalability through reverse proxies: the paravirtual grid

This section is now located on its own page: AWG:Scalability_through_reverse_proxies:_the_paravirtual_grid