
Overview Graphic of Multi-Process Client

This is a pre-draft of the draft

This is an early draft, so scope and focus are still fairly open. Please add comments to the Talk:Multi-Process Client VAG -- draft page if you have slightly different concerns, so that we can try to converge on a common viewpoint. This discussion could also expose other, similar VAGs that are needed in this area.

Summary

A new client architecture is described, in which client facilities are highly decoupled into processes that communicate over sockets. This architecture is contrasted with that of the monolithic client. It is proposed both as a reference client for testing the forthcoming AWG architecture and as a generic client platform for further client evolution. This work is undertaken by the Multi-Process Client VAG.

Purpose

The Multi-Process Client Viewpoint Advocacy Group (VAG) exists to provide input to the overall system architecture design process in those areas where client-side functionality needs to evolve to track server-side changes.

More specifically, the Multi-Process Client VAG is concerned with identifying the key server-side changes that impact on the client, proposing at least one reference design for the client that can address all the expected changes, and assisting with the design and implementation of such a reference client. This work is expected to generate feedback in the opposite direction as well: a range of client-based Use_Cases will be examined in this VAG and used to generate requirements for the system architecture, particularly in the areas of scalability and interoperability.

As is evident from its name, this Multi-Process Client VAG has predefined one specific aspect of the required client, namely the change from the monolithic process structure of the original Linden client to a multi-process one. This will be discussed and justified in detail below, but it is not exclusionary: other client-oriented viewpoint groups may proceed along different lines.

This is a technical VAG.

See the Architecture Working Group and the Viewpoint Advocacy Groups for more information.

Areas addressed by this viewpoint

Server-side APIs visible from the client.
Scalability of the combined client and server systems from a client-side perspective.
Interoperability of grids and worlds from a client-side perspective.
Client diversity and its impact on client structure.
Designing clients for extensibility.
Use of clients as test tools.
Harnessing the power of community for rapid client evolution.

Areas not addressed by this viewpoint

Server-side details that are not visible from the client.

Multi-Process Client VAG Glossary

The linked glossary defines the following terms:

Backend, Busy waiting, Client Script, Facility, Facility API, Facility Optimization, Facility Socket, Frontend, GUI, LCC (Limited Capability Client), Mediator, Monolithic Client, Multi-Process Client, Multicore, Plug-in, Process, Regression Testing, Renderer, TDD (Test-Driven Design), Test driver, Test harness, UI, Viewer (deprecated).

See Architecture Working Group Glossary for terms not defined here.

Source of Viewpoint

No existing sources for this viewpoint have (yet) been sought. This viewpoint is however highly reusable, so such sources very likely exist and should be investigated.

Translated into mortalspeak: other applications probably use the same structure, so we ought to check them out, even if they don't claim an IEEE_1471 viewpoint.

General concerns addressed by this viewpoint

These are expressed in the form of a Rationale for a Multi-Process Client.

Specific concerns within this viewpoint

These are presented in detail in the section Proposals and Analysis.

Rationale for a Multi-Process Client

Problems with the Monolithic Client

The original SL client performs its intended role as a fixed-function viewer for the self-contained Second Life service reasonably well, but it does not meet the requirements of a platform client, for the following reasons:

The platform view of an extended, highly scalable, and widely interoperable Second Life embraces third-party worlds, grids, assets, and contributions, which implies client diversity and flexibility which are absent in the monolithic client.
The fixed functions of the original client represent only a small subset of the functions envisaged for that extended platform.
Many Limited Capability Clients (LCCs) will have hardware constraints which will make them unable to run the original client, for example cellphones.
Many LCCs will have user-interface constraints which require a radically different client structure, for example portable audio-only clients.
The monolithic structure of the original client does not allow wholesale omission of large and inappropriate subsystems (at best, the capabilities can only be disabled).
The original client does not support user choice of local facilities unless they have been pre-defined by Linden Lab. This is inappropriate in a platform client that needs to interoperate with third-party systems and assets.
Any change of facilities outside of those envisaged by Linden Lab for the specific Second Life service would require a complete client rebuild and release. The manpower implications of this alone would be immense, let alone the impact on client stability and interoperability.
The monolithic client is a very large and complex application. Large and complex applications run counter to good software engineering practice, for numerous well proven reasons.
Monolithic clients cannot make good use of the new generation of multicore CPUs unless they are extensively threaded internally. Such threading can be very successful, but in general this hardwires the client-side architecture to a fixed multithreading structure and very often leads to development restrictions. In addition, the lack of thread address space separation in a threaded application tends to reduce stability and harms resilience.

Benefits of a Multi-Process Client

The above concerns are addressed quite effectively in a multi-process client architecture, as follows:

Plugging together alternative modules provides the needed diversity and flexibility that turns the client into a generic platform. It makes it extensible to the larger feature set of the distributed supergrid, and also allows it to be scaled down for Limited Capability Clients.
With a multi-process architecture, 3rd parties are no longer dependent on the Linden design process to include their extensions, nor do they need to perform a complete client build and release. Extensions can be deployed incrementally.
Facility modules or extensions are inherently small and of limited complexity, because they address a specific function. This benefits stability and ease of development.
Facilities can be programmed in any language whatsoever as long as it can use network sockets. This eliminates computing language barriers between developers and allows the greatest possible number of them to contribute and collaborate.
Network sockets act as license barriers. Hence, like all client-server program pairs on the Internet, Facility programs can be covered by arbitrary licenses and still interoperate freely. This eliminates balkanization of the developer community by license.
The multi-process client architecture harnesses multicore CPUs naturally and automatically, while still working on single-core CPUs through normal multitasking.

Use Cases

QA tool: Test harness / Regression test driver

Motivation

Historically, this use case provided the initial motivation for the Multi-Process Client.

The new architecture being designed under the auspices of the Architecture Working Group is expected to deliver reference implementations of key REST services fairly rapidly, whereas development of the monolithic SL client moves slowly because of conflicting requirements and insufficient manpower. Of particular concern here is that the official build repository is not open to community merging, which means that alternative community client-code repositories are continually chasing taillights and having to re-merge their changes. This is not conducive to rapid progress, and hence an alternative client platform that would be free of such constraints was sought.

A Multi-Process Client as defined here would be even more free of constraints than a monolithic client built from an open community SVN repository. Because of the mix'n'match attachment of Facility modules to meet personal requirements, as well as the license barrier created by network sockets, and also the ability to write Facility programs in any language whatsoever that supports sockets, such a client would in practice have fewer developer constraints than almost any other application in existence. This then is a very powerful approach to the problem.

Use case: QA tool

Only those Facilities required in a QA tool would need to be developed. These include:

Scripting system
State machine engine
Test logger
Regression test manager
Performance instrumentation
Test documentation generator
An ever-expanding suite of individual unit tests
Dynamic workload generators
Scaling trend visualizers

The early candidates for testing are expected to be Login and REST Services. Unit tests would therefore be created for those first.
Since a Facility can trivially create a network-accessible endpoint, the QA tool could in principle be network-accessible and driven by the community wherever it is located.
The most common uses of the "Multi-Process Client as QA tool" are expected to include:

Individual unit testing of the new unified Login service
Individual unit testing of REST services, including early experimental designs
Test-driven development framework for reference or third-party implementations
Workload generation for testing the scalability trends of REST services
Regression testing of all client-visible subsystems after rebuilding
Pressure testing of service robustness when under maximum load

Limited Capability Client (LCC)

stub only, please fill in

Use case: LCC

stub only, please fill in

Full client

The first rule of "Full client" is not to talk about "Full client".

Joking aside, because users attach Facilities themselves, the Multi-Process Client is completely open-ended even without developer involvement. A "Full client" is therefore merely one with a lot of Facilities attached, and even that set would vary according to need.

A "Normal client" can easily be defined for any given local grid, but in a global supergrid of interoperating grids such a concept is somewhat limiting as well.

A more appropriate model might be one in which a "Minimal normal client" is packaged with commonly used Facilities, and the user downloads additional Facilities from a suitable repository as needed ... in other words, the model employed by modern web browsers.

Use case: Minimal normal client

Taking the capabilities of the current monolithic client as the basis for required functionality, a "Minimal normal client" would include the following Facility modules:

Login session facility
Events subsystem facility
Suite of 2D UI facilities
Suite of world-rendering facilities
Suite of media handling facilities
Suite of common I/O facilities
Suite of resident communication facilities

Usage: as per current monolithic client.

Multi-Process Client Architecture

The Multi-Process Client architecture detailed below represents only one possible factoring of the client into a suite of communicating processes. This initial factoring provides a concrete basis for analysis and discussion, but is very likely to change if a better factoring is found.

It is proposed that the client be structured as follows at the level of operating system processes and communications flows:

A frontend Mediator process should handle all communications with virtual world servers.
This Mediator process should provide a dynamic number of network Facility Sockets.
Several backend Facility processes should attach to these Facility Sockets to implement Facilities.
The functions of the client are performed through concurrent operation of the Mediator and the set of attached Facility processes.

Processes and Communication Flows. This diagram is representative of the general client architecture, rather than describing a specific design. The Facilities illustrated are a subset of those possible, and the list of candidates is inherently open-ended because the model is extensible. No Facility Optimization is shown; such optimization may introduce additional high-bandwidth channels between Facilities, or merge some Facilities together.

Multi-Process Client Mediator

The Mediator is the communications hub of the client. As such, it is expected to benefit from being as fast as possible and as small as possible (for CPU cache locality). To this end, and in keeping with the client design architecture, all non-communication functions should be placed in attached processes, even if those functions relate to the Mediator alone. For example, Mediator configuration and subsequent control should be carried out by a Facility, not in the Mediator itself.

The Mediator is expected to comprise the following elements:

Dynamic backend socket interface able to accept an arbitrary number of connections from Facility processes. Note that any processes that might be considered logically "frontend" in the sense of lying between VW servers and the Mediator are actually implemented as Facilities as well --- at the level of process abstraction, there is no distinction.
Identity information for each Facility to allow message/event routing.
Extensive runtime instrumentation, with all captured data passed asynchronously to Facilities for processing.
Facility message handler/dispatcher optimized for Mediator functions only.

Multi-Process Client Facilities

Each of the following Facilities is implemented as a process which attaches to a Facility Socket on the Mediator process. Two or more performance-critical Facilities may be combined if the engineering tradeoff is deemed acceptable. The list includes some Facilities that are large subsystems which have not yet been refactored into components (e.g. Complete 3D Engine).

As noted earlier, any processes that are considered logically "frontend" in the sense of lying between VW servers and the Mediator are actually implemented as Facilities as well, despite these being primarily "backend". At the level of process abstraction and setup, there is no distinction. The only difference is that actual frontend processes also employ direct connections to remote servers. An example of this kind of Facility could be the libsecondlife-based LibSecondlife Gateway (if used). Direct external connections are considered a Facility Optimization within this client design model, since communication with remote parties could have been routed through the Mediator instead.

Identity Manager
Login Manager
Configuration Manager
Object cache
3D Scenegraph
Standard 3D Renderer
Extended 3D Renderers
Vertex and Pixel Shaders
Complete 3D Engine
Event Subsystem
Logging Subsystem
2D UI Controller
Input Handlers
Output Handlers
Audio player
Audio DSPs
Text-To-Speech
MIDI Interface
VoIP interface
Encryption Modules
Local Scripting VMs
Generic State Machine
Unit Test Framework
Fuzzing Tester
Regression Test Manager
Private Gateways
LibSecondlife Gateway
Private Asset Database
Proxy for Monolithic Client

Note that Facilities are implemented as full processes, not as DLL/.so plugins. As a result, they can be run standalone for testing in a reduced capacity, or connected to an alternative fuzzing Mediator to check their robustness. They are not dependent on the Mediator process's internal symbols nor on those of any other Facility.
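As an illustration of the robustness check mentioned above, the following minimal Python sketch shows the kind of input a fuzzing Mediator might throw at a standalone Facility. It is only a sketch under stated assumptions: the handle_message entry point, the reply dictionaries, and the use of JSON as the message format are all hypothetical, since the data transfer format has not been chosen.

```python
import json
import random


def handle_message(raw: bytes) -> dict:
    """Hypothetical Facility entry point that must never raise on bad input."""
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return {"status": "rejected", "reason": "malformed"}
    if not isinstance(message, dict) or "service" not in message:
        return {"status": "rejected", "reason": "missing service"}
    return {"status": "ok", "service": message["service"]}


def fuzz(handler, rounds=1000, seed=1234):
    """Feed random byte strings to the handler and collect every reply."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    replies = []
    for _ in range(rounds):
        length = rng.randrange(0, 64)
        raw = bytes(rng.randrange(256) for _ in range(length))
        replies.append(handler(raw))
    return replies


replies = fuzz(handle_message)
print(len(replies))  # 1000
print(handle_message(b'{"service": "login"}'))
```

Because the Facility is an ordinary process with no dependency on Mediator symbols, the same handler could be exercised this way before any real Mediator exists.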

Depending on the function implemented, some Facilities may elect to open higher-bandwidth channels to other Facilities, mediated through the Mediator Facility API. This falls under Facility Optimization. Some Facilities may even merge with others, when performance constraints leave no alternative (rare).

Where appropriate, multiple Facilities of the same type may be attached concurrently: for example, both a normal workstation renderer and an HDTV renderer could operate simultaneously, or dual Wiimotes could control a pair of katanas, or multiple MIDI instruments might be employed during a musical performance.

Facility-Mediator Communications

Facility Attachment Protocol

Please note that this is a preliminary and tentative specification.

Facility and Mediator processes are normal TCP clients and server, respectively. All ports mentioned here are TCP server/listener ports. Normal TCP dynamically-assigned originator ports are present at the other end of each connection and are not mentioned here.
The Mediator listens permanently on a predefined Facility Request Port (a TCP port) for Facility attachment requests.
A Facility process requests attachment by opening a TCP connection to the Mediator on the predefined Facility Request Port and writing a standard set of details on the connection.
The Mediator responds to the attachment request (if it is granted) by returning a Facility Attachment Port number and closing the request connection.
The Mediator listens for a period on the granted Facility Attachment Port (a TCP port), awaiting an attachment from the corresponding Facility.
The Facility process that was granted attachment connects to the specified Facility Attachment Port.
The Mediator adds the new Facility to its routing table and creates an empty dispatch table for its services.
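
The attachment steps above can be sketched in Python as follows. This is a tentative illustration, not a specification: the Mediator class, the attach_facility function, and the one-line text messages are assumptions made for the example, and the "standard set of details" is reduced to a single Facility name. The demo also lets the operating system choose free ports, whereas the text assumes a predefined Facility Request Port.

```python
import socket
import threading

HOST = "127.0.0.1"  # assumption: Mediator and Facilities run on one machine


class Mediator:
    """Toy Mediator that implements only the attachment handshake."""

    def __init__(self):
        self.routing_table = {}    # Facility name -> attached connection
        self.dispatch_tables = {}  # Facility name -> services (empty at first)
        # Facility Request Port. Port 0 asks the OS for any free port so the
        # demo cannot collide; the text assumes a predefined port instead.
        self.request_listener = socket.create_server((HOST, 0))
        self.request_port = self.request_listener.getsockname()[1]

    def serve_one_attachment(self, timeout=5.0):
        request, _ = self.request_listener.accept()
        with request, request.makefile("r") as reader:
            # The "standard set of details" is reduced to one name line here.
            name = reader.readline().strip()
            # Start listening on the Facility Attachment Port before replying,
            # so the Facility can never connect too early.
            attach_listener = socket.create_server((HOST, 0))
            attach_port = attach_listener.getsockname()[1]
            request.sendall(f"{attach_port}\n".encode("ascii"))
        # Leaving the block closed the request connection. Now wait for a
        # limited period for the Facility to attach.
        attach_listener.settimeout(timeout)
        with attach_listener:
            connection, _ = attach_listener.accept()
        self.routing_table[name] = connection
        self.dispatch_tables[name] = {}
        return name


def attach_facility(name, request_port):
    """Facility side: request attachment, then connect to the granted port."""
    with socket.create_connection((HOST, request_port)) as request:
        request.sendall(f"{name}\n".encode("ascii"))
        with request.makefile("r") as reader:
            attach_port = int(reader.readline())
    return socket.create_connection((HOST, attach_port))


mediator = Mediator()
worker = threading.Thread(target=mediator.serve_one_attachment)
worker.start()
link = attach_facility("login-manager", mediator.request_port)
worker.join()
print(sorted(mediator.routing_table))  # ['login-manager']

# Tidy up the demo sockets.
link.close()
mediator.routing_table["login-manager"].close()
mediator.request_listener.close()
```

A real Mediator would loop over attachment requests and handle refusals and timeouts; the sketch serves exactly one Facility to keep the handshake visible.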

Notes:

SCTP may be considered as a replacement for TCP in this application later, as its message-oriented delivery has benefits for messaging.
The main reason for keeping request ports and attachment ports separate is for clarity and ease of monitoring/diagnosis during development. It does not preclude attachment ports being configured to use the same TCP port as used for requests.
The above can easily be seen as a TCP-oriented description of services setup similar to that in standard pub/sub practice. Subsequent to the setup stage, the Multi-Process Client becomes a classic event-driven application.

Facility Definition Protocol

aka. How does a Facility tell the Mediator what services it has to offer?

TBD

IMPORTANT: What is the absolute minimum that the Mediator needs to know about the services offered by a Facility? Can we reduce it down to nothing, other than a service name/number? The less the Mediator knows, the faster it can be (at least on a simplistic level), although whether this translates into a faster client overall is debatable.
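
To make the question concrete, here is a minimal Python sketch of the "service name only" extreme. Everything in it is hypothetical (the MinimalDispatcher class, the service names, and the use of plain callables in place of Facility Sockets); it simply shows that routing is possible when the Mediator treats payloads as opaque bytes and knows nothing but a name.

```python
class MinimalDispatcher:
    """Hypothetical Mediator core that knows services by name only."""

    def __init__(self):
        self.providers = {}  # service name -> list of delivery callables

    def register(self, service, deliver):
        """Record that a Facility offers a service; nothing else is stored."""
        self.providers.setdefault(service, []).append(deliver)

    def route(self, service, payload: bytes) -> int:
        """Forward an opaque payload to every provider of the service.

        Returns the number of Facilities the payload was delivered to.
        """
        targets = self.providers.get(service, [])
        for deliver in targets:
            deliver(payload)  # the payload is never parsed by the Mediator
        return len(targets)


# Two renderers attached concurrently under the same service name.
workstation_frames, hdtv_frames = [], []
dispatcher = MinimalDispatcher()
dispatcher.register("render", workstation_frames.append)
dispatcher.register("render", hdtv_frames.append)

print(dispatcher.route("render", b"opaque scene update"))  # 2
print(dispatcher.route("unknown-service", b"x"))           # 0
```

The cost of this minimalism is that all validation and interpretation moves into the Facilities, which is one reason the overall speed benefit remains debatable.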

Facility Operation Protocol

aka. 1) How does one Facility use the services of another Facility?

2) How does the Mediator manage the communication between Facilities?

See External Links for some background on event-driven architecture.

TBD

Facility Data Transfer Format

The options available and the advantages/disadvantages of each in this application are currently being examined.

A number of well-known options exist:

XML is an obvious candidate since it already lies at the core of AWG's REST services.
JSON has several benefits, including clarity.
YAML is a superset of JSON, with added benefits.
Lua is a lightweight programming language often used for structured data transfer.
Use a plug-in (DLL/.so) approach so that the data structuring technology is not hardwired into the design. [Rejected: this would balkanize Facilities badly.]
Use an even more lightweight ad hoc textual encapsulation for data transfer.
Dispense with textual formats altogether and use direct binary communications instead.
Use the "best" of the above without regard to overheads, and once usage has stabilized create a binary equivalent form to handle the most common cases more efficiently, e.g. XML+binary. This approach could be used at the framework level (Facility -> Mediator -> Facility), or only in direct Facility -> Facility communications as a Facility Optimization, or in both. This has good evolutionary properties.

A choice needs to be made but there is no single obvious candidate. All have advantages and disadvantages, and these are not the same ones as apply in AWG server-side. Additional input (lots of it!) is essential here. Trialing alternatives may be required as well.
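
As a small aid to that discussion, the following Python sketch encodes the same event once as JSON text and once in a fixed binary layout, in the spirit of the "textual form first, binary equivalent later" option. The event fields and the binary layout are invented for illustration and imply nothing about the eventual format.

```python
import json
import struct

# Hypothetical event; the field names are assumptions for this example only.
event = {"facility_id": 7, "event_code": 42, "x": 128.5, "y": 64.25, "z": 20.0}

# Textual form: self-describing and easy to inspect, but larger.
text_form = json.dumps(event, separators=(",", ":")).encode("utf-8")

# Binary form: network byte order, two unsigned 16-bit integers followed by
# three 32-bit floats. Both ends must agree on this layout in advance.
LAYOUT = struct.Struct("!HHfff")
binary_form = LAYOUT.pack(
    event["facility_id"], event["event_code"],
    event["x"], event["y"], event["z"],
)

print(len(text_form), len(binary_form))
print(json.loads(text_form) == event)  # True
print(LAYOUT.unpack(binary_form))      # (7, 42, 128.5, 64.25, 20.0)
```

The binary form here is 16 bytes and round-trips exactly only because the chosen coordinates are representable as 32-bit floats; the textual form carries its own field names, which is what makes it easier to evolve.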

Facilities in Detail

This section focuses on those Facilities which individual stakeholders consider relevant to their particular viewpoint. This is a very limited set at present, as it derives from the expected use of the client as a QA and measurement tool. The list of Facilities described here will expand as the Multi-Process Client evolves towards a full virtual-worlds client.

Login Manager

See Hegemons Login Analysis for background.

Event Subsystem

2D UI Controller

Input Handlers

Output Handlers

Local Scripting VMs

State Machine Facility (SMF)

Numerous operations in a complex client require event-stepped sequencing through defined states. Rather than spreading ad hoc implementations of state sequencing throughout the client where sequencing is needed, we propose to implement a generic state machine framework just once, in its own dedicated Facility process which manages as many state machine instances as needed. Other Facilities can then employ this as a service.

In the envisaged usage scenario, a Facility (such as the Login manager) which requires event-driven sequencing will request the creation of a state machine, populate it with states, define per-state data, specify the trigger events which cause state transitions, and add pre- and/or post-transition callbacks for each possible transition. When enabled, relevant events (from any source) would be routed by the Mediator to the state machine, which would then perform the appropriate state transitions and dispatch actions anywhere within the client.

Although state machines would normally be implemented within the SMF process and therefore callback activation would involve inter-Facility communication, Facility Optimization would be possible in the usual manner for high-speed event switching, namely by relocating key sections to their point of use.
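
The usage scenario described above can be sketched in Python as follows. This is a single-process illustration under stated assumptions: the StateMachine class, its method names, and the login states and events are all hypothetical, and in the proposed design the handle calls would arrive as events routed by the Mediator rather than as direct function calls.

```python
class StateMachine:
    """One state machine instance, of which the SMF might manage many."""

    def __init__(self, initial_state):
        self.state = initial_state
        self.state_data = {initial_state: {}}  # per-state data
        self.transitions = {}  # (state, event) -> (target, pre, post)

    def add_state(self, name, **data):
        self.state_data[name] = data

    def add_transition(self, source, event, target, pre=None, post=None):
        self.transitions[(source, event)] = (target, pre, post)

    def handle(self, event):
        """Apply one event; return True if it caused a transition."""
        entry = self.transitions.get((self.state, event))
        if entry is None:
            return False  # event is not a trigger in the current state
        target, pre, post = entry
        if pre is not None:
            pre(self.state, target)   # pre-transition callback
        previous, self.state = self.state, target
        if post is not None:
            post(previous, target)    # post-transition callback
        return True


# A Facility such as the Login Manager populating a machine (names invented).
log = []
login = StateMachine("idle")
login.add_state("authenticating", retries_left=3)
login.add_state("logged_in")
login.add_transition("idle", "credentials_entered", "authenticating",
                     pre=lambda src, dst: log.append(f"leaving {src}"))
login.add_transition("authenticating", "auth_ok", "logged_in",
                     post=lambda src, dst: log.append(f"entered {dst}"))

# The first event is ignored because it is not a trigger while idle.
for event in ["auth_ok", "credentials_entered", "auth_ok"]:
    login.handle(event)

print(login.state)  # logged_in
print(log)          # ['leaving idle', 'entered logged_in']
```

Keeping the transition table as plain data is what would allow a Facility to define it remotely; the callbacks are the part that Facility Optimization might relocate to their point of use.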

Unit Test Framework

See Unit testing for background.

Proxy for Monolithic Client

Organization

Joining

Anyone with an interest in this Viewpoint is welcome to join. You should join the AW_Groupies group in Second Life.

In world meetings

This group is currently active on a continuous basis, communicating over group and friend/calling-card IM and via wiki. We will also endeavor to meet at least once a week in-world, or more frequently if desired.

Members are active on the wiki and in the SLDEV mailing list.

Meetings Schedule:

Meeting Agendas

Chat Logs

Architectural Descriptions/Views used to express this viewpoint

This VAG expresses a client-oriented viewpoint, and hence is concerned only with those Architectural Descriptions/Views that impact on the client. This section identifies:

the general form or representation of ADVs required to express the viewpoint of this VAG
the elements within such ADVs which will be used to express the viewpoint
how these elements within such ADVs map to the concerns of this viewpoint
the traceability to viewpoint concerns required for conformance with the viewpoint.

None decided.

Tools employed by this VAG

Normal wiki textual and graphic representations are expected to be sufficient for group communications in this VAG. Proprietary formats should be avoided.
It is the main expected task of this VAG to develop the key elements of a Multi-Process Client, which will be a general QA tool for assisting the Quality Assurance VAG and Scalability VAG with their respective concerns. The same tool can be used for self-testing and test-driven development of the client itself.
Interface testing and performance measurement for the above VAGs is likely to require visualization-oriented output. This same output is expected to provide visualization for client operations as well, which will be of particular interest in multi-grid applications.

References and Resources

SVG Overview Graphic of Multi-Process Client -- source file
IEEE_1471 -- recommended practice for architecture description
Monolithic client architecture
Hegemons Login Analysis
Client-server protocol and messaging

External Links

SLDev Mailing list and archives -- High traffic list spanning all SL developer topics
libsecondlife -- useful library and protocol reference
OpenSim simulator and wiki -- a likely seed for 3rd party grids
Event loop, event-driven programming, liboop -- some event-oriented resources.

Members (Stakeholders)

Please note that the Multi-Process Client VAG is a non-hierarchical VAG in every respect, without exception. Any tags supplied with names are purely informational. Stakeholder ordering is alphabetical.

Day Oh 09:03, 11 November 2007 (PST)

Goldie Katsu 12:13 11 November 2007 (PDT)

Morgaine Dinova 11:00, 7 November 2007 (PDT)

Saijanai Kuhn 11:00, 7 November 2007 (PDT)

Multi-Process Client VAG -- draft