Talk:AWG Scalability through reverse proxies: the paravirtual grid

From Second Life Wiki
Revision as of 07:26, 23 October 2007 by Dzonatas Sol (Talk | contribs)


Analysing scalability of the paravirtual grid

Call me confused, but I don't see much scalability in the paravirtualization of the Grid... The reverse proxies are called quasi-static caches, but the problem is that simulation data, as it stands in SL today, is not static enough to make such caching practical or useful. As I understand it, this proposal is a form of sharding of the regions; it does not solve high-concurrency issues, for two reasons:

  • The subdivision method exemplified (US, Europe, Asia) is useless in current use cases: the different continental populations already log in to SL and interact with each other at roughly the same hours of the day. Thousands of people connecting from the same geographical origin still face asset lag and simulator cutoff. This means the proposal does not really address the concurrency problem at all, or at best addresses only a marginal aspect of it.
  • Even if we set the above problem aside by redirecting same-origin excess load to other reverse proxies, the data streams are still mixed as high up the chain as the origin; that is, the load arising from the interaction and intervisibility of the crowd still has to be processed by the source, and we are just moving the bottleneck from the simulator to this interconnection mechanism.
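To make the quasi-static caching objection concrete, here is a minimal sketch (all names invented, not from the proposal) of a TTL-based reverse-proxy cache. When simulation state mutates faster than the staleness a client can tolerate, every request falls through to the origin and the proxy offloads nothing:

```python
class QuasiStaticCache:
    """Toy reverse-proxy cache: an entry is reused until its TTL expires."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, time cached)
        self.hits = 0
        self.misses = 0

    def get(self, key, fetch_from_origin, now):
        entry = self.store.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            self.hits += 1
            return entry[0]
        # Stale or absent: the request must go back to the origin simulator.
        self.misses += 1
        value = fetch_from_origin(key)
        self.store[key] = (value, now)
        return value


cache = QuasiStaticCache(ttl_seconds=0.5)

def fetch(key):
    return "state-of-" + key

# Avatar state is re-read once per second, but mutates so often that only
# 0.5 s of staleness is tolerable: every single read is a cache miss.
for t in range(5):
    cache.get("avatar-42", fetch, now=float(t))
print(cache.hits, cache.misses)  # -> 0 5
```

With a longer TTL the hit rate rises, but only at the price of serving stale simulation state, which is exactly what a dynamic world cannot afford.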

Even if there is a way to separate the data streams at the source and keep them separate all the way down (which implies separating people into separate grids where they cannot see each other), this proposal does not actually offer any mechanism for merging those streams back together at the "top" of the architecture: there is no method proposed for concurrent modification of the central databases, which therefore still face the same scalability issues known today. Besides, bottlenecks are still present inside the proposed architecture: they appear in the flow of data between the source, be it a reverse proxy or the original simulator, and the clients. This is bound to happen with any form of static subdivision (geographic or not), and dynamic subdivision can only mitigate the effect: the distribution of the load in SL is not uniform in any globally-determinable way, so there is no correct way to distribute that load top-down; it must be done bottom-up, from the clients.
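As a toy illustration of that last point (the session data below is entirely invented), a top-down split by client geography can look balanced while the actual load stays concentrated on a single destination region:

```python
from collections import Counter

# Hypothetical session log: (client origin, destination region).
sessions = ([("US", "club_x")] * 70
            + [("EU", "club_x")] * 60
            + [("ASIA", "club_x")] * 40
            + [("US", "quiet_sim")] * 5)

# Top-down view: splitting by client origin spreads sessions across proxies...
by_origin = Counter(origin for origin, _ in sessions)
# ...but the bottom-up view shows one region absorbing nearly everything.
by_region = Counter(region for _, region in sessions)

print(by_origin)   # Counter({'US': 75, 'EU': 60, 'ASIA': 40})
print(by_region)   # Counter({'club_x': 170, 'quiet_sim': 5})
```

A geographic proxy layer balances the left-hand tally, but the right-hand tally is what actually determines where the simulation work lands, and it can only be discovered from where clients choose to go.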

In other words, I think approaching concurrency issues in SL through a macroscopic view, thinking of the population load in global aggregates, is not the correct method. The bottom line is that the elementary design of Grid processes determines everything else, including the global result. I'm afraid LL might be making the same mistake in their own proposal here: in many common use cases it does not, in fact, scale the way they hope it will. Simply throwing more machines at the growing Grid load does not relieve it; the fundamental origins of the scalability constraints, within the design of the Grid processes, as identified months ago, have to be addressed first.
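A back-of-the-envelope model (numbers invented) of why adding machines doesn't help while the crowd stays intervisible: the origin's interaction work grows with the square of the total crowd size, however many proxies fan the results out.

```python
def origin_interaction_load(avatars_per_proxy, num_proxies):
    """Pairwise updates per tick the origin must compute when every avatar
    in one region can see every other, however the fan-out is proxied."""
    n = avatars_per_proxy * num_proxies
    return n * (n - 1)

print(origin_interaction_load(50, 1))  # -> 2450 pairwise updates per tick
print(origin_interaction_load(50, 2))  # -> 9900: doubling proxies ~quadruples it
```

The proxies absorb the per-client delivery cost, but the quadratic interaction term stays at the source, which is the bottleneck relocation described above.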

I hope this can help improve the proposal. Reverse proxies definitely have their place in a well-designed Grid architecture; finding out exactly where is an important task.

This analysis feels about right. I've been chewing on some of this on these two pages:

AWG: state melding exploration
AWG: Core simulator exploration

which try to look, at two levels of abstraction, at what the core work of a region simulator is. (Rough draft warning)