AWG Scalability through per-resident subdivision of the Grid

From Second Life Wiki

Observation

Any SL scalability study aims to scale the processing and network capacity of a given location in Second Life with the number of avatars present there, as linearly as possible: the logical solution is to tie this capacity to the avatars rather than to the land. Because any given object or avatar can only be in one place at any given time, it also makes sense for the avatars and objects belonging to a given player to be simulated by the same server. Because residents' grid usage is roughly comparable from one to another, distributing the load per resident also makes sense. Because the load induced by a given resident is generally not expected to fluctuate widely in the short term, even an initially static distribution of server capacity across players is a good start. Because a resident's usage of grid performance depends heavily on the number of inworld objects they own, scaling the performance allotted to a given player with their land holdings / prim allotment is a good solution, and it is also fair, considering that landowners pay for the SL servers in the first place.
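
A minimal sketch of the proportional allotment described above, assuming each server's capacity is measured in abstract integer units and each resident's weight is simply their prim allotment. The function name and the capacity unit are hypothetical, not part of any SL software.

```python
def allocate_capacity(server_capacity: int, prim_allotments: dict[str, int]) -> dict[str, int]:
    """Split one server's capacity among residents in proportion to prim allotment.

    Uses integer division plus largest-remainder rounding, so the shares
    always add up exactly to server_capacity.
    """
    total = sum(prim_allotments.values())
    if total == 0:
        return {owner: 0 for owner in prim_allotments}
    shares: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for owner, prims in prim_allotments.items():
        shares[owner], remainders[owner] = divmod(server_capacity * prims, total)
    # Hand the units lost to rounding down to the largest fractional parts.
    leftover = server_capacity - sum(shares.values())
    for owner in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[owner] += 1
    return shares
```

With hypothetical allotments of 15000, 3750 and 1250 prims on a 100-unit server, the residents receive 75, 19 and 6 units. Largest-remainder rounding is one choice among several; it simply guarantees that no capacity is left unassigned.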

Principle:

Instead of subdividing the simulation load geographically among static regions, subdivide this load among the owners of the avatars and objects. Your avatar and all your prims, the physics and scripts of your objects, etc. are simulated and served by the single simulator allocated to you gridwide, regardless of where they are in the Grid. An avatar UUID is paired with a simulator, much like a web domain name is tied to an IP address through DNS.
A client's viewer or a task is then understood as a presence, identified by its ownerkey (avatar UUID), and simulators only have to keep track of this presence on your parcel. This is far less work than receiving data representing the visiting avatar's session, their attachments, the states of the scripts running inside them, etc. When your avatar moves and a new parcel comes into view, your viewer gets the ID of the landowner, which in turn indicates which simulator (the "local simulator") runs that person's content. Your simulator connects to this server, provided the parcel is accessible to you. The local simulator then provides your simulator with a presence list, which is nothing more than a list of the ownerkeys of the content or avatars present on this parcel. Your viewer opens a connection to the simulator of each ownerkey on this list, and each of those simulators sends you the relevant information about the content it runs that is in view for you. This distributes the load among multiple simulators. Your viewer simply aggregates all the information coming in from these multiple sources into a single 3D render view.
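
The lookup and fan-out described above can be sketched as follows. This is an illustrative model only: `Simulator`, `simulators_to_connect`, the plain dict standing in for the DNS-like directory, and the `banned` set used for access control are hypothetical names, and the hop in which the presence list is relayed through your own simulator is collapsed into a direct call.

```python
from dataclasses import dataclass, field


@dataclass
class Simulator:
    """A per-resident simulator (hypothetical model, not an SL API)."""
    address: str
    # parcel id -> ownerkeys of the avatars/content currently present on that parcel
    presence: dict[str, set[str]] = field(default_factory=dict)
    # ownerkeys the landowner has denied access ("ghost snub": they get no data at all)
    banned: set[str] = field(default_factory=set)

    def presence_list(self, parcel_id: str) -> set[str]:
        return set(self.presence.get(parcel_id, set()))


def simulators_to_connect(viewer_key: str, parcel_owner: str, parcel_id: str,
                          directory: dict[str, str],
                          sims: dict[str, Simulator]) -> set[str]:
    """Addresses of the simulators a viewer should hold connections to.

    directory maps ownerkey (avatar UUID) -> simulator address, like DNS.
    """
    own = {directory[viewer_key]}          # your own sim always runs your avatar
    local = sims[directory[parcel_owner]]  # the "local simulator" for this parcel
    if viewer_key in local.banned:
        return own                         # parcel looks empty: no presence list at all
    present = local.presence_list(parcel_id)
    return own | {local.address} | {directory[key] for key in present}
```

In the illustration below, Jesrad's viewer at the store would end up connected to his own simulator, the shopkeeper's and Mustbe's. If the shopkeeper banned him, he would only keep the connection to his own simulator and the store would look empty.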

Illustration: Jesrad Seraph meets, and chats with, Mustbe Barmy at a third party's store, while receiving IMs from James Samiam.

[Image: PRS-Client-view.png (Purple, then Green, then Blue)]

As can be seen above, the client receives content from three different simulators instead of just one, and each of those simulators bears a load that varies only to a small extent wherever the agent goes.

In this scheme, concurrency (being present in the same place) is encoded in the state of the connections between simulators and clients: an open connection indicates local concurrency to both sides, whereas the absence of a connection indicates that the two entities are not concurrent so far. This is useful for the scalability of things such as chat, because the work of determining distances and limits is then already largely pre-calculated: only entities with an open connection need to be considered.
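
As an illustration of why open connections pre-filter chat, here is a sketch in which a simulator relays a line of chat only to viewers it already holds a connection with, and then applies the distance limit to that short list. The 20 m figure is SL's normal chat ("say") range; the data layout and the names are assumptions.

```python
import math

SAY_RANGE_M = 20.0  # SL's normal chat range


def chat_recipients(speaker_pos: tuple[float, float, float],
                    connected_viewers: dict[str, tuple[float, float, float]]) -> set[str]:
    """Viewers that should hear a line of normal chat.

    connected_viewers: viewer id -> avatar position (x, y, z), for viewers that
    already hold an open connection to this simulator. Anyone without a
    connection is not concurrent and is never even considered.
    """
    return {viewer for viewer, pos in connected_viewers.items()
            if math.dist(speaker_pos, pos) <= SAY_RANGE_M}
```

The distance test runs over the handful of connected viewers rather than over every avatar in the Grid; the connection state has already done the coarse filtering.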

Workflow of basic operations under Per-Resident Subdivision: apart from the distribution of the simulation load among multiple simulators, these operations follow steps similar to those of the current Grid.

[Image: PRS-Flow.png]

Quick analysis:

  • Pros:
  • SL scales much better
  • Your inworld content is accessible gridwide just like your inventory
  • No more region-crossing lag, because the whole Grid is your region
  • You can make someone and all their belongings disappear entirely from your Second Life, and not waste a kilobyte of bandwidth or disk space on them anymore
  • You can at long last enjoy perfect privacy in SL, even on main continents
  • You can travel freely anywhere, even off-world, or over (or beneath) the oceans
  • No more red banlines: the places you are not allowed in simply appear empty, save for the bare ground
  • No more autoreturn: your stuff can stay anywhere indefinitely as long as you have the corresponding land somewhere in the Grid. There is also no more need for autoreturn, because other people can make your stuff disappear from their own view with a single click, and it does not count against anyone else's prim count
  • No more need for "no-script" zones, since your scripts run on your land and not the host's
  • Your prim count is global: it stays valid anywhere in the Grid
  • How little you lag depends on what you can afford: being in a crowded place makes you lag more, but you can cap that effect
  • Makes it possible, with some more work, to decentralize the Grid entirely and host your own or other people's content
  • You may customize your ground textures per parcel
  • You don't collide with non-rezzed stuff anymore
  • Login can be made faster
  • The Grid, or part of the Grid, can follow a pure P2P model by integrating the individual sim and session into the client
  • Cons:
  • You have a limit on the number of running scripts
  • In crowded places you may not be able to see your present friends at all if you max out your client's capacity
  • Stuff may take longer to start rezzing
  • Shooting (projectiles) may work less well, or not at all, unless you're shooting your own stuff*
  • Sensors may be a bit slower*
  • Collision events may be a bit slower, or broken for static objects, except when triggered by your own objects*

* unless the interacting owners have sims that run on the same machine or on closely-networked machines

Brainstorming:

  • Each resident UUID becomes tied, possibly dynamically, to a share of a specific server's processing and network capacity. Basically, the prim allotment/capacity becomes the unit for measuring raw grid simulating power, attributed on a per-person basis.
  • A mechanism for coupling UUIDs of land/content/avatar owners with their simulator's address is required, and replaces the current coupling of region and sim address.
  • Such a mechanism (a form of DNS) could include some load-balancing method. Having multiple simulators running a given resident's content could help scalability, too.
  • This multiplies the number of concurrent simulators serving the local content (especially avatars and attachments of the many visitors), managing physics, running scripts, etc. in crowded locations, distributing much of the supplementary load induced by the crowd's presence.
  • This also makes all of one's content gridwide easily accessible from one server, wherever it is inworld, just as the inventory does for your non-rezzed content.
  • Object and avatar entry is controlled by the local simulator reporting or not reporting the presence of objects and avatars entering from the outside.
  • This means people and objects can move anywhere freely, without restraint, but they can be viewed as "being there" and interact with things and people present there only if the local land owner allows it. No more bumping into red lines, but in exchange you don't see any of the people and content that are on the land you're not allowed in, and no one there can see you or any of your objects, nor interact with you. That's the "ghost snub" way of access restriction, in a sense.
  • Basically, one can fly anywhere uninterrupted in the Grid, including empty ocean, and the landmass and sea look seamless, but parcels that are not accessible appear entirely void of any avatar or content, your viewer not even receiving any data for them. This is an appreciable improvement in both privacy and travel ability.
  • The land topology, parcel information, wind, sun, clouds, etc. are separated from the "regular job" of simulating content and can be transferred to yet another cluster of servers/databases.
  • I figure the land textures would be served by this "Land domain"; alternatively, if serving land textures remains the sim's job, viewers would display the default grey texture on the ground while attempting to load them.
  • It would allow per-parcel ground texture customisation.
  • It means the landmass can vastly outgrow the "inhabited" zones of SL: land with no content or owner can be added independently, and coasts, plains and oceans can be added transparently between the islands and continents. Because such content is extremely static, it can be cached for long periods of time, maybe even downloaded as a standalone addition.
  • However, no owner for the land means no simulation, so you cannot meet people if you're there, you're lost in your own empty alternate dimension and not connected to the grid except for fetching heightmap, sun, wind and clouds information.
  • This allows for non-uniform server types for simulators, that can be just evaluated by total prim capacity, and whose "slices" of such capacity get allocated among SL content owners according to their gridwide total land possessions.
  • Employing more than one full server worth of simulation capacity would require a mechanism for balancing the load between the same-owner simulators.
  • Such a mechanism already exists: geographic proximity.
  • A resident's parcels could be joined across different regions, fusing prim capacity and total visitor capacity gridwide.
  • They would only fuse up to a full server worth.
  • Right now a single server (machine) runs four standard regions, so there is small but appreciable potential for improvement. Also, if a sim can occasionally be transferred from one server to another, and considering the possible non-uniformity of servers mentioned above, any sufficiently big land acquisition could get your personal sim transferred to a bigger server that fits it better. So the "full server" limit could scale, too. A dynamic mechanism for moving the "slabs of prim allotment" to properly sized servers would allow some load-balancing in advance.
  • How would a viewer know which of the multiple servers running a single person's land or content (in the case of more than a full server's worth of prims) to connect to?
  • One solution is that each of the multiple servers run all the land and content (memory overhead ?) and viewers that connect are distributed among them (amounts to sharding). Another is to have a mechanism that forwards a viewer from one of those sims to the relevant one based on geographic proximity (preferred).
  • This rids us of region-crossing lag, since the physics of your avatar stay tied to your own simulator, which does not change for the whole session.
  • However, this method requires some serious thought on collision handling, which still has to be done by the simulator running the parcel where the avatar or physical object is located. Either physics remain simulated entirely by the local simulator, and the scalability gained by this method is significantly reduced; or collisions can be handled with a small delay and be disabled automatically at the location if the number of physical objects and avatars becomes too high; or collision info (relevant bounding boxes) can be exchanged between simulators, and possibly cached. This last method is preferred: it would replace the transfers of object and avatar data that currently occur upon region crossing.
  • A lazier, simpler approach could prove sufficient: have your own simulator handle collisions for your avatar and objects on a best-effort basis, assuming no collision unless it receives bounding box info from the local and present simulators and detects a collision. Each sim would be responsible for respecting collisions as much as it can and would not care about the behaviour of other sims in this regard, save for providing bounding box info when asked.
  • Sensors would be problematic.
  • In this scheme sensors are replaced by the presence list management mechanism; more accurately, part of the detection is done by the client itself, through the connections it requests to present simulators (for avatar/agent detection), and the other part is done by the simulator, through its selection of which agents to inform of content updates.
  • Better yet: the land object may simply become just another form of prim, and the topology service could concentrate on pure volume or area management. This way terraforming would become more like building (more possibilities, especially in the domain of texturing - and think about what scripting the actual land could allow).
  • Your own simulator still manages the presence of content and avatars on your parcel(s), though this function is radically simplified by only having to keep track of single ownerkeys and simulators associated to them, instead of whole individual objects and avatar sessions.
  • That's one simulator per resident instead of one per region. What about overhead and waste?
  • How many of the current millions of residents registered have land and inworld content that needs a simulator ? This is a decisive question.
  • Empty land frees processing power from simulators, which helps directly with heavily loaded places. Additionally, the non-geographic allocation of simulator capacity helps spread the average load across the entire fleet of servers, because new users' content can preferentially be tied to less-used servers. The less content and the fewer visitors in one place, the less processing power spent on it, down to the bare minimum of managing an empty presence list.
  • The performance impact of basic account residents can be better controlled by Linden Lab, as their simulator usage becomes separated from that of premium account residents: a usage (and cost) that can be measured and regulated directly through the number and capacity of simulators dedicated to them. This also allows fine balancing between the two populations' Grid usage. The same goes for players' nationalities: grid servers can be better distributed across the world.
  • I'd say one simulator for basic accounts for each simulator dedicated to premium accounts (logical servers, not physical ones) could be a good starting basis.
  • Public land is still possible, as one can allocate a share of their prim count to run content from other people, be they anonymous visitors or specific residents of their choice.
  • This implies I'd have to choose, when creating a prim or rezzing something on such land that allows building, between counting this prim/object against my own prim capacity or that of the local landowner. That means an additional mechanism to implement.
  • Empty land does not actually free much processing power in this scheme: with one sim per resident who has land and content, we could have a hundred sims running concurrently per server. One crowded place would benefit, at most, only from the unused performance of these hundred "smaller" sims. On machines running only low-usage sims, there is still waste.
  • This is why a dynamic pool of allotable sim capacity is to be preferred.
  • This requires a mechanism for scaling a whole simulator process up or down (mainly max prims, max physical objects and max running scripts). Maybe at startup of the sim process, though that would mean buying land requires a personal sim reboot? Memory overhead?
  • Let's think differently: one simulator software running per machine, but capable of opening or closing or transferring single VMs corresponding to a given resident, much like it is capable today of running many scripts, stopping, starting, throttling and transferring them. This makes sense for mass distributed processing.
  • This method makes it possible to filter content by ownerkey from your viewer, so the avatar and/or his objects don't even render on your client nor use up your bandwidth.
  • Chat, inventory transfers and IMs can be decentralized the same way (per owner), and filtered the same way too, at the sim level.
  • Login could be simplified: only one connection to your allotted simulator is required to access the Grid. Because residents are distributed across all the simulators, spreading the load, login can be decentralized, with login info stored in a backend database accessible only by the sims (and possibly cached for a few hours?).
  • Script runtime load is fairer, since your scripts only take up your own land's processing power.
  • This method breaks region coordinates for scripts, unless llGetPos() (and its equivalent, llGetPrimitiveParams() with PRIM_POSITION) is modified to apply modulo 256 to the global x and y coordinates before returning them. llGetRegionCorner() would just have to return the global coordinates clamped down to the nearest multiple of 256.
  • Sim crashes would affect residents in a different way: instead of everyone being logged out, only the person(s) whose sim crashed would be logged out, their land (or some of it, up to a single region) would become inaccessible and their objects would disappear from the whole grid during the outage. Having an alt would allow some redundancy.
  • Movement information from the viewer would have to be replicated for each simulator the viewer is connected to.
  • That's true, although this replicated information does not have to be kept as up to date as possible. A given simulator only needs to keep track of the viewers connected to it (to maintain a presence list), which is simply done by checking whether the connection is still up. Your avatar's movement is handled by your allotted simulator, so a high update frequency matters in this case (I think the current frequency for this is 20Hz). Your viewer maintains the connections to the "non-local" simulators according to the local presence list, and receives updates for the content in view from your position; here a much lower update frequency is sufficient.
  • Alternatively, your dedicated simulator can handle this replication for you and keep the simulators of the presence list informed of your avatar's position, so they can send relevant updates to your viewer. The same problem and solutions apply to travelling objects instead of avatars.
  • In practice there are very few different owners for the content present on most parcels of land in SL, but crowded areas would have a very long presence list. Instead of cutting off people from a crowded place (or the whole region), the viewers themselves would simply max out on the number of different simulators they manage connections with: instead of having 100 avatars in the same place at most, each of them lagging the same and capable of viewing each other, there could be thousands of avatars piled up, but each of them would only see a fraction of all the persons present and it would look like each of them is lagging in variable amounts. This is an interesting effect, a sort of automatic sharding of crowded space, in a decentralized way.
  • The ultimate aim of this method of subdivision is to allow players to host and serve their own content easily. Once a simulator is tied only to one resident, there are no security issues if they run it on their own computer instead of in the Grid, which decentralizes the Grid completely except for the land heightmaps, clouds, sun and parcel info: the Grid could still remain one.
  • One big issue here: scripts. If they run on a third-party machine, they're as good as public-domain. Same with restricted textures, no-mod objects, etc...
  • I personally think copyright is an abuse of property rights with no coherent justification, and that once you divulge some information to someone, it is theirs just as much as it is yours, so I'd view this kind of objection as a non-issue. However, one possible compromise with the copyright crowd is that a sim process may not be "outsourced" if it contains copyrighted data. This should stamp out any potential legal issue.
  • Problem: the provision of parcel information (who owns which parcel of land somewhere) is still done centrally. Although it is a very light and static load likely to be cached and scaled up, it does not scale the same way as the rest of the proposed model. It may not be a problem with 2000 concurrent avatars but may prove so with a couple orders of magnitude more.
  • True, but this provision of parcel information can be decentralized as well, either by being subdivided by region as it currently is in SL, or by being integrated into the per-resident sims. In the latter case, the parcel info is distributed across the simulators, each simulator retaining the parcel info of its N immediate neighbours, and obtained through a distributed protocol like Tapestry.
  • And in the case of conflicts between multiple simulators insisting that a given parcel of land is theirs?
  • Existing distributed services like AnoNet solve this by letting the local simulator (the onlooking viewer's) make an authoritative choice at its own level. This takes care of malicious injection of false parcel information, and even lets multiple persons own the same location, but in different, partially exclusive interpretations of the Grid (one could call these "parallel universes"). In simpler words, you can make it look like you evicted your neighbour in your own client, but you cannot impose this view on anyone else.
  • What about heightmaps?
  • One solution would be to have a fixed global heightmap source, whatever form it takes (preferably a static download bundled with the client software). On top of it, local land simulators could apply heightmap offsets of their own to override it, for terraforming. Occasionally, existing heightmap overrides could be merged into the common global source, though this would not necessarily save bandwidth. The world map can then be extended at any time by adding heightmaps for further regions.
  • Another solution is simply to have the simulators stream their own heightmaps, and assume "out of world" or empty ocean for where the client has no such data. But then it makes it impossible to have a landmass where no one owns the land. This corresponds to what we currently have in SL.
  • This makes it impossible to have a landmass visible to other people, actually. By allowing multiple sources for local content, this model allows very simple integration of multiple grids: the client only has to refer to more than one land register. This way, a land mass can be seen as "no owner" by one, and being owned (and have an active presence service) in another.
  • It is possible to separate the region-related load from the agent-related load, as in the LL proposal with distinct Agent Domain and Region Domain. As an aside, this allows gracefully handling the difference between basic and premium account residents, as well as limiting the number of concurrently running simulators to only the number of residents who hold land, so that it might be much more workable. This implies separating Grid machines into two groups: those running simulators on one hand, and those running agent sessions on the other. Preferably, the agent ones would run in the clients and profit from the massive computing power that residents can contribute to running the Grid. This means identity would not be certified, just as an IP address does not certify a web browser's origin, but that's fine because the content would be safe in the region servers.
  • This means that asset servers are no longer authoritative; instead, the sims take precedence over them: the asset databases then only serve as backups and not sources, while the sims are no longer caches, but the real thing.
  • It also solves the problem of knowing on which server to run newly created content: on the individual sim if it has spare running capacity (prim allotment, script capacity, etc.), or on the local land sim if the creator has no land or no spare capacity and the local land allows it. That means a landless resident would host his or her content on a specific person's sim, on an individual basis. A mechanism for allotting a given amount of sim space to a specific resident might be needed.
  • Physics, especially those of avatar movement, would be extremely simplified if the client was authoritative on the avatar's position instead of the simulator. Such a change of authority would also be possible because of how dramatically the scheme improves privacy in SL.
  • Scalability of the updates between simulators is problematic: with each sim responsible for one avatar, in a crowded place of N people (+1 for the location owner), each sim sends N updates, so processing and bandwidth usage scale with N squared. Update output should therefore be capped and throttled according to immediate visibility: for example, above half of the maximum total update output, the update frequency is halved, starting with the most distant or least visible sim(s). As the number of concurrently present sims grows, more of the visible sims receive half the updates (any new sim joining the pool receives half the frequency, and the most distant or least visible sim not yet halved is halved as well); the frequency then drops to 1/4 when output reaches 75% of the maximum, then to 1/8 at 87.5%, and so on.
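
For the "lazier, simpler" collision approach in the brainstorming list above, a sketch of the best-effort test a simulator might run: it assumes no collision unless an axis-aligned bounding box received from the local or present simulators overlaps one of its own. The `AABB` type and the dict-of-boxes exchange format are assumptions, not an existing SL protocol.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box (hypothetical format for exchanged box info)."""
    lo: Vec3
    hi: Vec3

    def overlaps(self, other: "AABB") -> bool:
        # Two boxes overlap only if their extents overlap on every axis.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi))


def best_effort_collisions(own: dict[str, AABB],
                           received: dict[str, AABB]) -> list[tuple[str, str]]:
    """Assume no collision unless a box received from another sim overlaps one of ours."""
    return [(own_id, other_id)
            for own_id, own_box in own.items()
            for other_id, other_box in received.items()
            if own_box.overlaps(other_box)]
```

Anything for which no box has arrived yet is simply treated as non-colliding, which matches the "best effort" stance: each sim enforces what it knows about and ignores what it does not.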
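
The coordinate fix proposed above for llGetPos() and llGetRegionCorner() is plain arithmetic on global coordinates. The Python below only mirrors that arithmetic (it is not LSL, and how a modified server would expose it is an assumption): x and y are taken modulo 256 for the region-local position, and rounded down to a multiple of 256 for the region corner, with z left unchanged or set to 0.

```python
import math

REGION_WIDTH = 256.0  # metres per region side in SL


def region_corner(global_pos: tuple[float, float, float]) -> tuple[float, float, float]:
    """What llGetRegionCorner() would return: x, y rounded down to a multiple of 256, z = 0."""
    x, y, _ = global_pos
    return (math.floor(x / REGION_WIDTH) * REGION_WIDTH,
            math.floor(y / REGION_WIDTH) * REGION_WIDTH,
            0.0)


def region_local_pos(global_pos: tuple[float, float, float]) -> tuple[float, float, float]:
    """What llGetPos() would return: x and y taken modulo 256, z unchanged."""
    x, y, z = global_pos
    return (x % REGION_WIDTH, y % REGION_WIDTH, z)
```

For a global position of (256128.5, 256064.0, 30.0), the corner is (256000.0, 256000.0, 0.0) and the region-local position is (128.5, 64.0, 30.0), so existing scripts that add the two back together still recover the global position.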
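
The heightmap proposal above (a fixed global source plus per-simulator offsets, occasionally merged back into the global source) can be sketched as two small functions. The sample grid keyed by integer (x, y) points and the `OCEAN_FLOOR` default used where no data exists are hypothetical choices made for illustration.

```python
OCEAN_FLOOR = 0.0  # hypothetical height assumed where no heightmap data exists

Point = tuple[int, int]


def effective_height(point: Point, base: dict[Point, float],
                     offsets: dict[Point, float]) -> float:
    """Global heightmap sample plus the local land simulator's terraforming offset."""
    return base.get(point, OCEAN_FLOOR) + offsets.get(point, 0.0)


def accrue(base: dict[Point, float], offsets: dict[Point, float]) -> dict[Point, float]:
    """Fold a simulator's offsets into a new copy of the global heightmap."""
    merged = dict(base)
    for point, delta in offsets.items():
        merged[point] = merged.get(point, OCEAN_FLOOR) + delta
    return merged
```

After accruing, the merged map with no offsets yields the same terrain as the old map plus offsets; extending the world is just a matter of adding new points to the base map.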
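
A sketch of the update-throttling idea from the last brainstorming item, under stated assumptions: distance stands in for "least visible", every peer simulator starts at the same base frequency, and the thresholds follow the halving pattern 1 - 2^-k of the maximum output (50%, 75%, 87.5%, ...), with divisor 2^k applied to the farthest peers first. The function and parameter names are hypothetical.

```python
def throttle(distances: dict[str, float], base_hz: float, max_output: float,
             max_level: int = 8) -> dict[str, float]:
    """Per-peer update frequencies under a farthest-first halving scheme.

    distances: peer sim -> distance (a stand-in for "least visible")
    base_hz: full update frequency sent to each peer
    max_output: cap on total updates per second from this sim
    """
    divisor = {peer: 1 for peer in distances}
    farthest_first = sorted(distances, key=distances.get, reverse=True)

    def total_output() -> float:
        return sum(base_hz / d for d in divisor.values())

    for level in range(1, max_level + 1):
        threshold = max_output * (1 - 2 ** -level)  # 50%, 75%, 87.5%, ...
        for peer in farthest_first:
            if total_output() <= threshold:
                return {p: base_hz / d for p, d in divisor.items()}
            # Every peer is at 2**(level - 1) here; drop this one a level further.
            divisor[peer] = 2 ** level
    return {p: base_hz / d for p, d in divisor.items()}
```

With four peers at 20Hz and a cap of 100 updates/s, the three farthest drop to 10Hz and total output settles at 50. With ten peers, all are halved first, then the five farthest are quartered, settling at 75. Newly arriving peers simply enter the sorted list, so the farthest ones absorb the reduction first.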