User:Dzonatas Sol/AWG Virtualization

From Second Life Wiki

Motivation

The decision to map virtual regions to physical servers statically in the first version of Second Life has hardwired a number of implementation choices that impact on performance, resourcing and scalability. Roughly, and very simplified:
  • Passive land acreage (and cost) became limited by (and linked to) local disk storage.
  • Available prim counts became limited by local CPU power, and hence are non-scalable.
  • Concurrent scripting load became limited by local CPU power, and hence is non-scalable.
  • Server-side scalability for events became limited by local CPU power and bandwidth, and hence is virtually blocked.
  • Hardware utilization and overall scalability plummeted, as unused local resources cannot assist elsewhere.
  • The need for region handover at sim boundaries created numerous complexities and opportunities for problems to arise.
  • Asset serving and many other services could not scale automatically with population growth nor with grid expansion.
  • Local resourcing means that improvements made to a large grid cannot benefit its residents when visiting an attached grid.
The new focus on decentralization and scalability offers an opportunity to overcome all of these issues, and provides the motivation to do so. However, achieving this requires relinquishing attachment to the conceptually simple but inherently flawed approach of statically mapping virtual resource usage to physical resource provision.

Evolution

Fortunately, creating a dynamic infrastructure is not inconsistent with the current implementation. The following observations highlight this:
  • A dynamic server farm still requires some static physical topology, and a 2D grid is not horribly suboptimal at current levels of scaling.
  • Additional networking components are easy to attach to an existing network if reduced latency is desired for dynamic services.
  • Existing grid servers running local sim services could trivially offer their unused resources for non-local use.
  • Very busy sims would degrade an additional dynamic infrastructure gracefully, simply by not contributing to it.
This suggests that an evolutionary transition to a dynamic infrastructure is possible, alongside normal operation of the static grid. It is hard to predict how issues like economics might fare alongside infrastructure that is evolving towards less-constrained dynamic resourcing, but then nothing at all can be predicted about the future of a system that is being scaled to the scary numbers offered in Project_Motivation.
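The last two observations (idle sim servers lending capacity, busy ones simply withholding it) can be illustrated with a minimal sketch. Everything here is hypothetical: the `SimServer` class, its fields, the capacity units and the fixed reserve are assumptions made for illustration, not part of any existing grid software.

```python
from dataclasses import dataclass


@dataclass
class SimServer:
    """Hypothetical grid server that runs a local sim and may lend spare capacity."""
    name: str
    capacity: int      # total capacity, in arbitrary units (assumption)
    local_load: int    # units currently consumed by the server's own sim
    reserve: int = 20  # units always held back for the local sim (assumption)

    def offered(self) -> int:
        """Units contributed to the shared pool; a busy server contributes zero."""
        return max(0, self.capacity - self.reserve - self.local_load)


def pool_capacity(servers) -> int:
    """Total capacity the dynamic infrastructure can draw on right now."""
    return sum(server.offered() for server in servers)


servers = [
    SimServer("quiet-sim", capacity=100, local_load=10),
    SimServer("busy-sim", capacity=100, local_load=95),
]
print(pool_capacity(servers))
```

In this sketch the busy server degrades the pool gracefully: its contribution drops to zero while its own sim keeps its full capacity.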

Design principles

In some ways, a dynamic resourcing architecture is simpler than a static one, because resource boundary issues do not need to be considered as in the static design. More importantly however, a bigger simplification occurs a little down the road, because static systems simply do not scale along all dimensions and therefore evolution rapidly hits a brick wall. This means that the conceptually simpler static system does not really address the right problem at all, once all-axis scalability is added to the requirements. In contrast, dynamic systems are inherently expandable, even along unforeseen dimensions, because new tasks are handled as easily as old ones.
The underlying principles for a dynamic design are relatively simple, and not hard to implement: (*)
  1. Make all work requests stateless (a region for example becomes merely a request parameter).
  2. Throw all work requests into a virtual funnel (a.k.a. multi-priority task buffer and serializer).
  3. Place all servers in a worker pool, ready to take tasks out of the funnel as they appear.
  4. Hold all persistent world state in a distributed object store which is cached on all servers.
  5. Workers run a task, update world state if required, and send events which may create more work.
Although it is a departure from the static design, there is nothing really radical in such a scheme, and many standard software components can be used to implement it, including the very efficient and easily managed web-type services.
(*) Note that these are only principles. In an actual system design, this kind of scheme is repeated at many levels, in breadth and depth.
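The five principles above can be sketched in a few dozen lines. This is only a single-process illustration under stated assumptions: threads stand in for servers, an in-memory dictionary stands in for the distributed object store, and the names `Task`, `Funnel`, `WorldStore`, `run_task` and `run_grid`, the task kinds and the region names are all hypothetical.

```python
import itertools
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Principle 1: a stateless work request; the region is just a parameter."""
    kind: str
    region: str
    payload: int = 0


class WorldStore:
    """Principle 4: stand-in for the distributed, cached object store."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


class Funnel:
    """Principle 2: multi-priority task buffer (lower number runs first)."""
    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority
        self._lock = threading.Lock()

    def put(self, task, priority=1):
        with self._lock:
            self._queue.put((priority, next(self._seq), task))

    def get(self):
        return self._queue.get()[2]

    def task_done(self):
        self._queue.task_done()

    def join(self):
        self._queue.join()


def run_task(task, store):
    """Principle 5: run a task, update world state, return follow-up work."""
    if task.kind == "rez_prims":
        store.add(("prims", task.region), task.payload)
        return [Task("notify", task.region)]  # the state change raises an event
    if task.kind == "notify":
        store.add(("events", task.region), 1)
    return []


def worker(funnel, store):
    """Principle 3: a pooled worker that takes whatever task appears next."""
    while True:
        task = funnel.get()
        try:
            if task is None:  # shutdown sentinel
                return
            for follow_up in run_task(task, store):
                funnel.put(follow_up, priority=0)
        finally:
            funnel.task_done()


def run_grid(tasks, n_workers=4):
    funnel, store = Funnel(), WorldStore()
    threads = [threading.Thread(target=worker, args=(funnel, store))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for task in tasks:
        funnel.put(task)
    funnel.join()                      # wait until all work, including follow-ups, is done
    for _ in threads:
        funnel.put(None, priority=9)   # one sentinel per worker
    for thread in threads:
        thread.join()
    return store


store = run_grid([
    Task("rez_prims", "region-a", 10),
    Task("rez_prims", "region-a", 5),
    Task("rez_prims", "mid-ocean", 3),
])
print(store.get(("prims", "region-a")), store.get(("events", "region-a")))
```

Because every task carries its region as a parameter, no worker is tied to a region; changing `n_workers` changes throughput but not the resulting world state.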

Benefits

A number of important improvements are gained immediately from such a dynamic infrastructure:
  • Regions scale arbitrarily for events, up to the limit of resource exhaustion of the entire grid.
  • The limits to desired scalability in any dimension are determined by policy, not by local limitations.
  • Server death does not bring down a region, and the overall grid suffers only graceful degradation.
  • Adding/upgrading grid server resources benefits the entire grid rather than favouring one region.
  • Regions have no physical edges, so complex functionality is avoided and no sim handover problems can occur.
  • Regions can have any size or shape of land whatsoever, and land cost (not price) drops to that of disk storage.
  • Available prim counts are no longer tied to land acreage, allowing both highly dense and highly sparse regions.
  • Events can be held in previously unused areas of the world (e.g. regattas in mid-ocean) without prior resourcing.
  • Small remote worlds and subgrids can be visited by large-grid residents, because local caching still operates.