Virtualization of Regions
Note: The original version of this page was once the AWG document section Brainstorming#Virtualization of regions. The goal of this separate page is to focus ideas about region virtualization that periodically emerge from AW Groupies discussions.
The original Project Motivation document upon which AWG was founded created a very ambitious scalability requirement. Unfortunately, the existing statically tiled region architecture cannot deliver region scaling at all (regions currently descale as the population grows), let alone scaling to the figures implied by that motivational document.
As a result, alternative architectures that provide region scalability for SL or SL-interoperable worlds are regularly proposed and examined. This page discusses some of the relevant issues, and links to some of the relevant documents.
The decision to map virtual regions to physical servers statically in the first version of Second Life has hardwired a number of implementation choices that impact on performance, resourcing and scalability. Roughly, and very simplified:
- Passive land acreage (and cost) became limited by (and linked to) local disk storage.
- Available prim counts became limited by local CPU power, and hence are non-scalable.
- Concurrent scripting load became limited by local CPU power, and hence is non-scalable.
- Server-side scalability for events became limited by local CPU power and bandwidth, and hence is virtually blocked.
- Hardware utilization and overall scalability plummeted, as unused local resources cannot assist elsewhere.
- The need for region handover at sim boundaries created numerous complexities and opportunities for problems to arise.
- Asset serving and many other services could not scale automatically with population growth nor with grid expansion.
- Local resourcing means that improvements made to a large grid cannot benefit its residents when visiting an attached grid.
The new focus on decentralization and scalability in the AWG project offers an opportunity to overcome all of these issues, and provides the motivation to do so. Achieving this, however, requires relinquishing attachment to the conceptually simple but inherently flawed approach of statically mapping virtual resource usage to physical resource provision. Given the magnitude of the task, such changes are not expected within the context of the current SL, but they nevertheless need to be examined if the desired scalability is to be achieved.
It is worth noting that region subdivision schemes that extend static tiling to smaller land extents just compound the problem further.
The excellent proposal for per-resident subdivision of the grid distributes load very effectively, although it doesn't actually provide regions with extra scalability beyond relieving them of load.
Fortunately, creating a dynamic infrastructure is not inconsistent with the current implementation. The following observations highlight this:
- A dynamic server farm still requires some static physical topology, and a 2D grid is not horribly suboptimal at current levels of scaling.
- Additional networking components are easy to attach to an existing network if reduced latency is desired for dynamic services.
- Existing grid servers running local sim services could trivially offer their unused resources for non-local use.
- Very busy sims would degrade an additional dynamic infrastructure gracefully, simply by not contributing to it.
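The last two observations can be illustrated with a minimal sketch. It assumes that each grid server can report the fraction of its CPU consumed by its own sim, and that it holds back a fixed reserve for local use; the names `SimServer`, `spare_capacity`, `pool_capacity` and the 20% reserve are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SimServer:
    """A grid server running a local sim (hypothetical model)."""
    name: str
    local_load: float  # fraction of CPU used by the local sim, 0.0 to 1.0

    def spare_capacity(self, reserve: float = 0.2) -> float:
        """Capacity offered to the shared pool after keeping a local reserve.

        A very busy sim offers nothing, so it never harms its own region;
        it simply does not contribute to the dynamic infrastructure.
        """
        return max(0.0, 1.0 - self.local_load - reserve)


def pool_capacity(servers, reserve: float = 0.2) -> float:
    """Total capacity the static grid could lend to dynamic services."""
    return sum(s.spare_capacity(reserve) for s in servers)
```

Under these assumptions a mostly idle sim lends most of its server to the pool, while an overloaded sim lends nothing, so the pool shrinks gracefully as local load rises.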
This suggests that an evolutionary transition to a dynamic infrastructure is possible, alongside normal operation of the static grid. It is hard to predict how issues like economics might fare alongside infrastructure that is evolving towards less-constrained dynamic resourcing, but then nothing at all can be predicted anyway about the future in a system that is being scaled to the scary numbers offered in Project_Motivation.
In some ways, a dynamic resourcing architecture is simpler than a static one, because resource boundary issues do not need to be considered as they are in the static design. More importantly, however, a bigger simplification occurs a little down the road, because static systems simply do not scale along all dimensions and their evolution therefore rapidly hits a brick wall. This means that the simpler static system does not really address the right problem at all once all-axis scalability is added to the requirements. In contrast, dynamic systems are inherently expandable, even along unforeseen dimensions, because new tasks are handled as easily as old ones.
The underlying principles for a dynamic design are relatively simple, and not hard to implement: (*)
- Make all work requests stateless (a region for example becomes merely a request parameter).
- Throw all work requests into a virtual funnel (a.k.a. a multi-priority task buffer and serializer).
- Place all servers in a worker pool, ready to take tasks out of the funnel as they appear.
- Hold all persistent world state in a distributed object store which is cached on all servers.
- Workers run a task, update world state if required, and send events which may create more work.
Although it is a departure from the static design, there is nothing really radical in such a scheme, and many standard software components can be used to implement it, including the very efficient and easily managed web-type services.
(*) Note that these are only principles. In an actual system design, this kind of scheme is repeated at many levels, in breadth and depth.
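As a rough illustration of the five principles, the following single-process sketch may help. It is not a design: the class names (`Task`, `Funnel`, `Worker`), the task kinds (`move_avatar`, `notify_neighbours`) and the use of a plain dictionary as a stand-in for the distributed object store are all assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """A stateless work request: the region is merely a parameter."""
    kind: str
    region: str
    payload: dict = field(default_factory=dict)


class Funnel:
    """Multi-priority task buffer: lower number first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker, preserves arrival order

    def put(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def get(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


class Worker:
    """A pooled server; any worker can run any task for any region."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # stand-in for the distributed object store
        self.done = 0

    def run(self, task, funnel):
        if task.kind == "move_avatar":
            avatars = self.store.setdefault((task.region, "avatars"), {})
            avatars[task.payload["avatar"]] = task.payload["pos"]
            # Updating world state raises an event, which creates more work.
            funnel.put(1, Task("notify_neighbours", task.region,
                               {"avatar": task.payload["avatar"]}))
        elif task.kind == "notify_neighbours":
            key = (task.region, "notifications")
            self.store[key] = self.store.get(key, 0) + 1
        self.done += 1


def drain(funnel, workers):
    """Hand tasks to workers in turn until the funnel is empty."""
    pool = itertools.cycle(workers)
    while (task := funnel.get()) is not None:
        next(pool).run(task, funnel)


if __name__ == "__main__":
    store = {}
    funnel = Funnel()
    workers = [Worker("w1", store), Worker("w2", store)]
    for i in range(3):
        funnel.put(0, Task("move_avatar", "mid-ocean",
                           {"avatar": f"resident{i}", "pos": (i, 0, 0)}))
    drain(funnel, workers)
    print(store)
```

Note that no worker is tied to the region named in a task, which is the point of the first principle. In a real system the round-robin loop would be replaced by workers pulling tasks concurrently, and the dictionary by a replicated, cached store.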
A number of important improvements are gained immediately from such a dynamic infrastructure:
- Regions scale arbitrarily for events, up to the limit of resource exhaustion of the entire grid.
- The limits to desired scalability in any dimension are determined by policy, not by local limitations.
- Server death does not bring down a region, and the overall grid suffers only graceful degradation.
- Adding/upgrading grid server resources benefits the entire grid rather than favouring one region.
- Regions have no physical edges, so complex functionality is avoided and no sim handover problems can occur.
- Regions can have any size or shape of land whatsoever, and land cost (not price) drops to that of disk storage.
- Available prim counts are no longer tied to land acreage, allowing both highly dense and highly sparse regions.
- Events can be held in previously unused areas of the world (e.g. regattas in mid-ocean) without prior resourcing.
- Small remote worlds and subgrids can be visited by large-grid residents, because local caching still operates.
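The claim that server death does not bring down a region follows from the statelessness of work requests, and can be sketched as below. The sketch assumes that a worker fails before it has written anything to the shared state, so that re-queuing its task is safe; the names `WorkerDied`, `make_worker` and `run_with_failover` are hypothetical.

```python
from collections import deque


class WorkerDied(Exception):
    """Raised when a pooled server fails while holding a task."""


def make_worker(name, alive=True):
    """Build a worker function; a dead one fails before touching any state."""
    def work(task, store):
        if not alive:
            raise WorkerDied(name)
        # Record one unit of completed work against the task's region.
        store[task["region"]] = store.get(task["region"], 0) + 1
    return work


def run_with_failover(tasks, workers, store):
    """Process tasks round-robin, re-queuing any task whose worker dies.

    Returns the number of tasks left unprocessed, which is non-zero only
    if every worker in the pool has died.
    """
    queue = deque(tasks)
    live = list(workers)
    while queue and live:
        task = queue.popleft()
        try:
            live[0](task, store)
            live.append(live.pop(0))   # rotate to the next worker
        except WorkerDied:
            live.pop(0)                # remove the dead server from the pool
            queue.appendleft(task)     # the stateless task is simply retried
    return len(queue)
```

Because the task carries everything needed to run it, the surviving workers absorb the load and the region continues at reduced capacity rather than going offline.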
- AWG Scalability through per-resident subdivision of the Grid -- a very good proposal that distributes load with minimal architectural changes
- ANALYSIS: Region Subdivision as a scaling method (showing why the idea is fundamentally flawed)