Linden Lab Official:Media Rendering Plugin System Technical Overview
Overview
Goal: To facilitate easier integration of external media into the Second Life Viewer.
This document presents two separate (although related) designs:
- A general-purpose plugin architecture for the Second Life Viewer.
- Specific use of the plugin architecture for media rendering plugins within the Second Life Viewer.
Plugins of different types can use the plugin architecture. Although the initial implementation will be used only for media plugins, the architecture could potentially also be used for, for example:
- Input devices
- Image decoding (JPEG2000)
- Voice services
Media rendering plug-ins will be run as separate processes to:
- Improve the stability of the Second Life client: if the plugin crashes, the Viewer is able to continue, albeit with invalid (or generic) media data. Currently, a small but significant portion of Viewer crashes occurs when a media component crashes.
- Eliminate build-time dependencies, making it easier to author plugins.
- Ease plugin development:
- Plugin developers will not need to rebuild the entire Viewer, a complex and time-consuming task.
- Plugin developers will be able to use different compiler versions, perhaps even different languages.
The asynchronous nature of the architecture will:
- Take advantage of multi-core systems (generally speaking).
- Facilitate support for multiple media sources per parcel or per prim.
In the future, developers may wish to load plugins directly into the Viewer's address space. This design will support that mode of operation, but the initial implementation will load all plugins as external processes.
Why?
Implementing rendering for media types in separate plugins will make each one easier to create, modify and maintain.
Given the creativity and ingenuity of Residents, an increased selection of robust media implementations in Second Life will enable more interesting applications and use cases, for example, shared desktops, native document display, third-party application integration, and so on.
This redesign will also:
- Facilitate moving towards a more fine-grained architecture to ease maintenance and testing of the Second Life Viewer.
- Enable similar changes to filter through to other parts of the Second Life code base; for example, image support (J2K image modules) and user input devices (for example Space Navigator and VR Goggles).
- Allow for multiple implementations per media type and/or per platform (Mozilla/IE/WebKit for the embedded browser for example).
This design does not address a system for distributing plugins automatically, for example in the way that Firefox add-ons work. Clearly, something similar will be required eventually but is beyond the scope of this document.
Terminology
In this document "media" refers to QuickTime media and the in-Viewer web browser (currently implemented by LLMedia and its subclasses), as well as other media types which may be added in the future.
The term "client" refers to code in the Viewer that uses the services of a plugin.
Code
Latest branch: https://svn.secondlife.com/svn/linden/branches/2009/plugin-api/
After checking out the branch, run develop.py normally, then build/run the "media_plugin_test" target in the SecondLife project.
Technical overview
Each plugin implementation runs as a separate process. A plugin and the Viewer exchange pixel data using shared memory.
Communication between a plugin and the Viewer is:
- Bi-directional via a local TCP channel.
- Message-based and completely asynchronous.
Typical data flow when the Viewer wants to render a media URI:
- The Viewer asks the media manager to create a media implementation that can render media at the given URI.
- The plugin manager retrieves the MIME type of the resource at this URI via an asynchronous HTTP HEAD request. If there is no MIME type associated with this URI (a VNC session, for example), then one is assigned to it by specifying a type or subtype in the "x-" or "vnd." range.
- Later, when the MIME type is known, the type and subtype fields are used to determine which media implementation will be used to render it via a lookup in an XML file.
- The plugin manager creates an "instance" of the relevant plugin.
- When the plugin is fully created and initialized, it informs the plugin manager that it is ready.
- The plugin manager tells the Viewer that the media can now be rendered.
- Sometime later, the Viewer asks the plugin manager to tell the plugin to start rendering data.
- Sometime later, the plugin receives the message and starts to render the media.
- When the media data being rendered changes, the plugin tells the media manager, which in turn tells the Viewer that something changed, and the Viewer updates itself accordingly.
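The MIME-type lookup step in the flow above can be sketched as follows. This is only an illustration in Python: the XML schema, the attribute names, and the plugin names (`media_plugin_quicktime` and so on) are assumptions made for the example, since the document does not define the mapping file's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; the real schema is not specified in this document.
MAPPING_XML = """
<mimetypes>
  <mimetype type="video" subtype="quicktime" impl="media_plugin_quicktime"/>
  <mimetype type="text" subtype="html" impl="media_plugin_webkit"/>
  <mimetype type="application" subtype="x-vnc" impl="media_plugin_vnc"/>
</mimetypes>
"""


def parse_mime(mime):
    """Split 'type/subtype; params' into a lower-cased (type, subtype) pair."""
    essence = mime.split(";", 1)[0].strip().lower()
    mtype, sep, subtype = essence.partition("/")
    if not sep or not mtype or not subtype:
        raise ValueError(f"not a MIME type: {mime!r}")
    return mtype, subtype


def load_mapping(xml_text):
    """Build a {(type, subtype): implementation} table from the XML text."""
    root = ET.fromstring(xml_text)
    return {
        (e.get("type"), e.get("subtype")): e.get("impl")
        for e in root.iter("mimetype")
    }


def find_implementation(mapping, mime):
    """Return the implementation name for a MIME type, or None if unmapped."""
    return mapping.get(parse_mime(mime))
```

A caller would load the mapping once and then call `find_implementation(mapping, "text/html; charset=utf-8")` whenever the HEAD request completes; an unmapped type returns `None`, which the Viewer could treat as "no plugin available".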
Plugin Architecture
Plugin Architecture Overview
At the lowest level, a "plugin" will be a loadable piece of object code in platform-native format (.DLL, .so, .dylib). Since these types of interfaces are inherently fragile, we will keep this part of the interface as simple as possible, so that it won't have to change over time.
Although the architecture provides for plugins to load directly into the Viewer process (for example for certain plugins developed by Linden Lab), in general, media rendering plugins will run as a separate process to enhance stability. How the plugin runs does not affect how you write it. This document will focus on plugins that run as separate processes, since in general that will be the case for third-party plugins.
The plugin loader shell loads plugins, provides for communication with the Viewer process, and acts as a host for the plugin.
Plugins communicate with the Viewer through messages. Messages consist of a message class and name (both human-readable) and a collection of key-value pairs (values with human-readable names). Messages are represented with LLSD, so values may be rich data types such as arrays or other containers.
Individual messages are unidirectional, and messages will be sent in both directions (from the host to the plugin, and from the plugin to the host). Sending a message to a plugin _may_ (depending on the contents of the message) cause it to send a message back to the host as a response. Plugins may also send unsolicited messages to the host (i.e. messages that aren't responses to a particular message the plugin received).
Messages will be passed to and from the plugin across a control channel. The control channel will be a stream -- the current plan is to use a TCP socket, but a unix domain socket or pipe would also work on platforms that support them. Since messages will be passed over the control channel, they may need to be serialized. They may also be received in a different process space, so passing pointers in a message is not generally useful.
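Because the control channel is a byte stream, each end needs some way to find message boundaries. The sketch below shows one common approach, a length prefix in front of each serialized message. Both the 4-byte big-endian prefix and the use of JSON in place of LLSD are assumptions made for illustration; the document does not specify the wire format.

```python
import json
import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length


def encode_frame(message):
    """Serialize one message (JSON stands in for LLSD) with a length prefix."""
    payload = json.dumps(message).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles whole messages from arbitrary chunks read off a stream."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Add bytes from the stream; return every complete message, in order."""
        self._buffer.extend(data)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the rest of this message has not arrived yet
            payload = self._buffer[HEADER.size:end].decode("utf-8")
            messages.append(json.loads(payload))
            del self._buffer[:end]
        return messages
```

The decoder keeps partial data between calls, so it does not matter how the stream happens to split the bytes; messages come out in the order they were sent.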
It's important to note that messages are completely asynchronous. They may be queued for an arbitrary amount of time before they're delivered, so it's not possible to do a direct query of a plugin and get an answer back. The command sets for specific plugin types need to take this into account -- instead of having commands to query for the state of a plugin, the plugin should send unsolicited messages to the host when its state changes in meaningful ways, so that the host can maintain a local shadow of that state from which it will answer queries. This logic, as well as the logic for message encoding/decoding, will be handled by a proxy object -- an instance of a C++ class in the viewer that is tailored for a particular type of plugin.
Plugins can negotiate (via specific messages) for the host to set up segments of shared memory between the plugin and the viewer. When plugins are hosted by the plugin loader shell, these will be interprocess shared memory segments. (When directly loaded by the viewer, they will be simple pointers, but they will still negotiate setup in the same way, and the pointer will be owned by the plugin manager.) This is especially useful for media plugins, since they need to share large amounts of data with the viewer (specifically, pixel data containing rendered media).
Plugin Manager
The plugin manager will launch the plugin loader shell that hosts each plugin.
The plugin manager monitors the plugin process and can detect if it crashes. Some form of "heartbeat" from the plugin may additionally be implemented to detect if the process hangs. If the plugin dies, or hangs and needs to be killed, the plugin manager alerts the client.
The media plugin proxies may use this information to display a generic "media is not available" message in place of the rendered media.
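The crash and hang detection described above could look roughly like the following sketch. The class name `PluginWatchdog`, the status strings, and the timeout policy are all hypothetical; the document only says that some form of heartbeat may be implemented.

```python
import time


class PluginWatchdog:
    """Tracks whether a plugin process has exited or stopped sending heartbeats."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_beat = clock()
        self._exited = False

    def heartbeat(self):
        """Record that a heartbeat message arrived from the plugin."""
        self._last_beat = self._clock()

    def process_exited(self):
        """Record that the plugin process is no longer running."""
        self._exited = True

    def status(self):
        """Return 'crashed', 'hung', or 'ok'."""
        if self._exited:
            return "crashed"
        if self._clock() - self._last_beat > self._timeout:
            return "hung"
        return "ok"
```

A plugin manager could poll `status()` once per frame and, on `"crashed"` or `"hung"`, notify the client so that the proxy can show its "media is not available" placeholder.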
The plugin manager will create and own all proxy object instances. Clients that need to have a plugin instantiated will make those requests through the plugin manager, and proxy objects will use the plugin manager internally to communicate with their plugins.
Plugin Loader Shell
The plugin loader shell is the executable that is responsible for loading and executing the code for a plugin as an external process.
Making this a single executable (instead of having a separate executable for each plugin) will make navigating through the Windows firewall (and other programs which ask the user about processes that make outgoing network connections) less painful - it will only need to be unblocked once.
Code in the plugin loader shell will handle certain messages (such as certain parts of shared memory setup and process priority control) instead of passing them through to the plugin.
When the plugin loader shell is initially launched, it won't know which plugin it is expected to load. This information will be supplied via internal messages once it establishes the control channel.
The plugin loader will manage relative priorities and CPU usage of plugins by setting the operating system priority of the process and throttling messages to the plugin.
In the future, the plugin loader may be responsible for querying a central authority for a certificate of authenticity for a plugin it has been asked to load. Ultimately, we will have different levels of signing so users can differentiate between plugins from Linden Lab, plugins from partner companies, and plugins from third-party providers. Users will be able to set a security "zone" so that no plugin outside of their zone is ever loaded. The default setting is probably obvious. None of this will be implemented in this phase of the project; it is only mentioned here in case it's appropriate to lay some groundwork earlier on.
Control Channel
The plugin manager and plugin will use the control channel to pass messages back and forth.
To establish the control channel, the plugin manager listens on a local TCP socket, and passes the socket's port to the plugin loader shell as a command-line argument on launch. The plugin loader shell connects back to the indicated TCP port on localhost.
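The connect-back handshake can be sketched with standard sockets. In this illustration a thread stands in for the separately launched plugin loader shell so that the example is self-contained; the function names and the `hello` greeting are assumptions, not part of the design.

```python
import socket
import threading


def open_listener():
    """Plugin-manager side: listen on an OS-assigned localhost port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    return listener, listener.getsockname()[1]


def loader_shell_main(argv):
    """Stand-in for the loader shell; argv[1] is the port to connect back to."""
    port = int(argv[1])
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(b"hello\n")  # hypothetical first message


def run_handshake():
    """Open the listener, 'launch' the shell, and return its first line."""
    listener, port = open_listener()
    # A real plugin manager would start a separate process here and pass the
    # port on its command line; a thread keeps this sketch self-contained.
    shell = threading.Thread(
        target=loader_shell_main, args=(["loader_shell", str(port)],)
    )
    shell.start()
    listener.settimeout(5)
    conn, _addr = listener.accept()
    data = b""
    with conn:
        conn.settimeout(5)
        while not data.endswith(b"\n"):
            chunk = conn.recv(64)
            if not chunk:
                break
            data += chunk
    shell.join()
    listener.close()
    return data
```

Binding to `127.0.0.1` rather than all interfaces keeps the control channel local to the machine, which matches the "local TCP socket" wording above.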
To keep latency low, the control channel will not be used for bulk data transfer. If large amounts of data need to be passed between the plugin host and a plugin, that data should not be embedded in messages sent across the control channel. Instead use either:
- Shared memory segments.
- Separate streams negotiated via messages.
Messages
All communication with the plugin will be via messages. At the binary/function call level, the only thing plugins can do is send and receive messages. This simplifies the binary interface to plugins, and eliminates reasons to change the interface.
A message consists of a message class and message name (human-readable strings) and zero or more key-value pairs. A key-value pair is a name (also human-readable) and some data. Messages are represented using LLSD, and are serialized/deserialized using the existing mechanism for LLSD serialization.
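As a rough illustration of this message shape, the sketch below models a message as a class name, a message name, and a dictionary of named values. JSON is used here purely as a stand-in for LLSD serialization, and the `media` / `load_uri` names in the usage note are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PluginMessage:
    """A message class, a message name, and zero or more key-value pairs."""

    message_class: str
    name: str
    params: dict = field(default_factory=dict)

    def serialize(self):
        """Encode the message as text (JSON stands in for LLSD here)."""
        return json.dumps(
            {"class": self.message_class, "name": self.name, "params": self.params},
            sort_keys=True,
        )

    @classmethod
    def deserialize(cls, text):
        """Rebuild a message from its serialized text."""
        raw = json.loads(text)
        return cls(raw["class"], raw["name"], raw.get("params", {}))
```

For example, `PluginMessage("media", "load_uri", {"uri": "http://example.com/", "size": [640, 480]})` survives a serialize/deserialize round trip unchanged, including the nested list value.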
Messages must be self-contained and cannot pass pointers, since they may need to be serialized, and may be passed across process boundaries. There may be a couple of exceptions to this rule, for messages that are sent from the plugin loader shell to the plugin itself when setting up shared memory regions.
Individual messages are unidirectional, asynchronous, and may be queued for an unspecified amount of time before they're delivered. Messages ARE guaranteed to be delivered in the same order they were sent. If a plugin crashes, any clients with an interface to that plugin will be notified, and must assume that any messages sent before the notification may have been lost.
Sending a message to a plugin may or may not induce it to send back a response. Whether it does or not depends on the semantics of the particular message. Plugins can also send to the host unsolicited messages (messages that are not a direct response to another message). Such messages are used to keep the plugin's proxy object's cache of the plugin's state up to date, among other things.
Sets of messages will map to a logical hierarchy of plugin types. The initial hierarchy will look something like this:
- Internal messages: for communication between the viewer and the plugin loader shell. These should never be delivered to a plugin.
- Base messages: all plugins must implement a set of messages used for initialization, connection management, and shared memory setup.
- Media messages: media plugins will implement this set, which will cover negotiating a pixel buffer, handling updates of subsections of it, specifying a source URL for media, and passing mouse and keyboard events.
- Time-based media messages: play, pause, stop, seek, etc.
- Browser-like media messages: forward, back, reload, etc.
Messages will need to be categorized logically. For example, it may make more sense to break out "interactive media" messages (mouse and keyboard interaction) into a separate class.
Part of a plugin's initialization will include it telling the plugin host which sets of messages it supports. This will include versioning information, so that if we have to change the semantics of messages over time, backwards compatibility can be maintained.
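One way to picture this negotiation is sketched below: each side lists the versions it supports per message class, and the host settles on the highest version both share. The data layout, the `(major, minor)` version tuples, and the class names are assumptions for illustration only.

```python
def negotiate(host_versions, plugin_versions):
    """Return {message_class: version} using the highest version both sides support.

    Both arguments map a message class name to the versions that side
    understands, e.g. {"media": [(1, 0), (1, 1)]}. Classes without a common
    version are left out, so the host knows not to send those messages.
    """
    agreed = {}
    for message_class, plugin_supported in plugin_versions.items():
        common = set(plugin_supported) & set(host_versions.get(message_class, ()))
        if common:
            agreed[message_class] = max(common)
    return agreed
```

With a host that understands `media` versions (1, 0) and (1, 1) and an older plugin that reports only (1, 0), the result selects (1, 0) for `media`, which is how backwards compatibility would be preserved in this sketch.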
Incoming messages will be queued by each end, and delivered to the plugin and code in the Viewer serially. There will be a simple mechanism for both the plugin and the plugin client to put their input queues "on hold", to avoid potential concurrency issues. When the input queue is on hold, no new messages will be delivered to that end's message handler. By default, the queue will be on hold for the duration of the call to the message handler.
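The hold behavior can be sketched as a queue with a hold counter. The class and method names here are hypothetical; the point is only to show serial delivery, the automatic hold while the handler runs, and explicit hold/release.

```python
from collections import deque


class InputQueue:
    """Delivers messages one at a time; delivery pauses while on hold."""

    def __init__(self, handler):
        self._handler = handler  # called as handler(queue, message)
        self._pending = deque()
        self._holds = 0

    def hold(self):
        """Put the queue on hold; messages keep queuing but are not delivered."""
        self._holds += 1

    def release(self):
        """Undo one hold() and deliver anything that queued up meanwhile."""
        self._holds -= 1
        self._deliver()

    def post(self, message):
        """Queue a message and deliver it if the queue is not on hold."""
        self._pending.append(message)
        self._deliver()

    def _deliver(self):
        while self._pending and self._holds == 0:
            message = self._pending.popleft()
            self._holds += 1  # on hold for the duration of the handler call
            try:
                self._handler(self, message)
            finally:
                self._holds -= 1
```

Because the queue is on hold while the handler runs, a message posted from inside the handler is not delivered re-entrantly; it is delivered after the handler returns.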
The plugin manager will contain code which can set up shared memory segments that are shared between the viewer and plugin processes. Shared memory segments will be identified by a name (which should be unique within each instance of the plugin). Memory segments can be created, resized (a.k.a. reallocated), and destroyed.
Shared memory segment setup uses message semantics -- when you make a request to the plugin manager to modify a segment, nothing actually happens until you receive a message telling you it completed.
The plugin will interact with the plugin host through messages to set up and tear down shared memory segments, but some of those messages may be handled (or partially handled and changed) by the plugin host shell -- they will not go directly over the control channel.
We will create native implementations for the shared memory segment with a lightweight platform abstraction. This would use the CreateFileMapping API on Windows and either mmap() or shm_open() on Mac and Linux. The details of this implementation will be hidden from plugin authors, since it will reside in the viewer and the plugin loader shell.
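The idea of a named segment that two parties attach to can be illustrated with Python's standard `multiprocessing.shared_memory` module. This is a stand-in for the native abstraction described above, not the Viewer's implementation; the segment name and the 16-byte size are arbitrary, and both "sides" run in one process here to keep the sketch self-contained.

```python
import uuid
from multiprocessing import shared_memory


def demo():
    """Create a named segment, attach to it by name, and read back one pixel."""
    # Hypothetical naming scheme; a random suffix avoids clashes between runs.
    name = f"plugin_demo_{uuid.uuid4().hex[:8]}"
    host_side = shared_memory.SharedMemory(name=name, create=True, size=16)
    try:
        # The plugin side only needs the name, which would arrive in a message.
        plugin_side = shared_memory.SharedMemory(name=name)
        try:
            plugin_side.buf[:4] = b"\x10\x20\x30\xff"  # plugin "renders" one pixel
            return bytes(host_side.buf[:4])  # host sees the same bytes
        finally:
            plugin_side.close()
    finally:
        host_side.close()
        host_side.unlink()  # destroy the segment once both sides are done
```

Note that the operating system may round the segment up to a page boundary, so code using a segment should rely on the size it requested rather than the size it observes.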
Proxy Object
A proxy object is an instance of a C++ class in the viewer which has detailed knowledge of the behaviors and messages of a particular type of plugin. Each instance of a plugin being managed by the plugin manager will have a corresponding instance of a proxy object in the viewer.
Parts of the viewer code that need to use the services of a plugin will do so by interacting with the plugin's proxy object. The proxy object will provide any necessary interfaces for interacting with the plugin, and will mirror any necessary state information about the plugin that may need to be queried by clients of that plugin.
The proxy object instance may persist for some time after the plugin instance goes away, if necessary (i.e. so that all clients of the plugin can be cleanly notified). If a plugin crashes and is relaunched, the proxy object need not be destroyed, although any clients using the proxy will be notified that the plugin's state has been reset.
The proxy object may mirror some of the state of the plugin. This can be maintained by the plugin sending unsolicited state messages when its state changes in ways meaningful to its clients. State queries can then be answered directly by the proxy object without having to do a round-trip query with messages.
The hierarchy of proxy object classes may mirror the hierarchy of message types where it makes sense to do so. For example, the original plan called for different proxy object classes for browser-like and time-based media. The implementation started out this way, but the browser-like and time-based proxy objects were eventually merged into a single class, LLPluginClassMedia.
Each proxy object will have member functions that can be called which will cause messages to be sent to the plugin for which it is a proxy. Likewise, each proxy object will have a mechanism (probably a listener-type interface) that will allow other code to sign up to receive relevant messages from the plugin.
The functions in the proxy object that clients use will have a conventional function-argument signature. The message will be built up (for outgoing messages) or decoded (for incoming messages) inside the proxy object's implementation, so that the details of message formats aren't exposed to code that doesn't need to know them. If the semantics of messages need to change over time, this is the level where compatibility with older plugins/message formats will be maintained.
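The proxy pattern described in this section can be sketched as follows. `MediaProxy`, the `load_uri` and `status_changed` message names, and the dictionary message layout are all hypothetical; the sketch only shows the three roles: ordinary methods that build messages, a shadow copy of plugin state, and a listener interface.

```python
class MediaProxy:
    """Viewer-side stand-in for one plugin instance."""

    def __init__(self, send):
        self._send = send  # callable that queues a message dict for the plugin
        self._listeners = []
        self._state = {"status": "none"}  # local shadow of the plugin's state

    def add_listener(self, callback):
        """Register callback(name, params) for messages from the plugin."""
        self._listeners.append(callback)

    def load_uri(self, uri):
        """Conventional method signature; the message format stays in here."""
        self._send({"class": "media", "name": "load_uri", "params": {"uri": uri}})

    def status(self):
        """Answered from the shadow copy, with no round trip to the plugin."""
        return self._state["status"]

    def receive(self, message):
        """Called by the plugin manager for each message from the plugin."""
        if message["name"] == "status_changed":
            self._state["status"] = message["params"]["status"]
        for callback in self._listeners:
            callback(message["name"], message["params"])
```

Client code calls `proxy.load_uri(...)` and `proxy.status()` without ever seeing a message; if the message format changes in a later version, only the proxy's internals need to adapt.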
Media Plugins
Media plugins can render in one of two modes: continuous and one-shot.
In continuous mode, the plugin instance is dedicated to a particular media stream. A shared memory buffer is allocated to the stream, and it renders into it continuously.
In one-shot mode, a message is sent to the plugin requesting that it render a particular media URL into a shared memory buffer. It renders the media once, then sends a message indicating the render has completed. This may be used when rendering large numbers of media streams at lower priorities (such as when there are a large number of streams in the user's view, but most of them are far away).
- The internet media type (aka MIME type) defines the media implementation used to render a given type.
- Internet media type is not the only factor that determines which plugin renders a given media type. While a plugin specifies the types of media it can render, the Viewer determines which plugin to use based on other factors (beyond the scope of this document), such as user preferences, the presence of other plugins, and so on. Thus, while you as a plugin developer indicate what media your plugin is capable of rendering, you cannot guarantee that a given media type will be rendered by the plugin.
- An XML file specifies which internet media type maps to which media implementation. TBD - Is the file stored online and cached by the Viewer, or part of Viewer installation?
- This component will contain a handler that observes the control channel and catches messages from plugins. (This is subject to a design decision about how messages are routed - via the plugin manager or directly from plugin to client). It is possible that this handler should be a standalone component that is shared with the plugin code since they both have to observe the control channel and act on messages.
Plugin implementation
- A shared library that contains the code to render media
- One shared library per media implementation.
- One invocation of the plugin per media source (per URI).
- However, consideration should be given to a system that allows multiple media sources (URIs) to share a plugin invocation. For example, for rendering one-shot Web pages, it might be perfectly acceptable for a part of the client to wait a short time before displaying a Web page whilst a different page is rendered.
- In addition, it may be optimal to cache one or more plugin invocations - for example, a significant amount of content in the Second Life client uses the embedded Web Browser so when the login page is closed and the user logs in, there is probably no point in destroying the embedded Web browser media implementation.
- Ideally, there should be a way to disable plugins remotely. If as discussed elsewhere, the mapping from MIME type to implementation is stored in a remote file, this will be straightforward to implement. If not, some other "kill switch" mechanism should be considered.
- Each plugin should support a default "Loading" image that is displayed during the init/load process. There should be a static one built into the plugin itself and perhaps the ability to specify one via a local or remote URI.
- Plugins should be able to deal with media size changes over the life of the stream. For example, this happens frequently in a QuickTime RTSP stream or when a Web-based UI is resizable.
- Needless to say, plugins must be able to deal with invalid media data or network failures and not crash. At the very least they should revert to their "Loading..." screen and ideally report back on what happened.
- Each plugin will contain a handler that observes the control channel and catches messages from the plugin manager. (This is subject to a design decision about how messages are routed - via the plugin manager or directly from plugin to client). It is possible that this handler should be a standalone component that is shared with the plugin manager code since they both have to observe the control channel and act on messages.
- An addition to the design that allows the plugin implementation shared libraries to be loaded directly by the plugin manager into the same process space is being considered. There are some circumstances when this might be preferred and whilst this functionality may not be needed immediately, it will be useful in the future. For example, in a world where many simultaneous media sources are required, having a large number of separate processes may consume a lot of system resources. If a plugin has been shown to be reliable, loading it into the same process may be reasonable. Certainly from the plugin author's point of view (and ideally for most of the system), the interface should be the same for both use cases.
Buffer size negotiation
The size and format of the pixel buffer which media plugins draw into is negotiated between the plugin and the host via the texture_params, size_change_request, and size_change messages.
The pixel size and format used for the buffer are determined entirely by the plugin, according to what it sends in the texture_params message. Indexed pixel modes and pixels which use fewer than 8 bits per channel are not currently supported. Recommended modes are 24-bit RGB or 32-bit ARGB, or variants thereof (BGRA, etc).
Size negotiation is a bit more complicated:
- the host initially sets up the dimensions of the pixel buffer
- if the plugin doesn't have any special sizing requirements, it can just draw to the buffer and never send a size_change_request message
- if the plugin sends a size_change_request message, the host takes that as the "native size" of the media
- media is allowed to change sizes multiple times during playback, such as with streaming QuickTime movies
- If the media is playing as parcel media and the "auto scale" option is set, the draw dimensions may be increased to the next power-of-two
- if the plugin sets the allow_downsample field in the texture_params message and the media's priority has been lowered, the host may reduce the requested draw dimensions
- the plugin should only set allow_downsample if it can draw the media at different scales without changing the overall look of the media, such as scaling down a video stream
- media which looks different when drawn at different pixel scales (such as a web browser where font sizes depend on pixel scale) should NOT set allow_downsample
- if it's more expensive to draw media downsampled than at native resolution, the plugin should NOT set allow_downsample -- it's supposed to be a way to reduce load on the system, and if it doesn't accomplish that it shouldn't be used
- if the plugin sets the 'padding' field in the texture_params message, the buffer's byte width will be padded to the requested alignment
- note that it's possible to request pixel size/alignment combinations that don't come out even when using 3-byte RGB pixels. If the host detects this, it will disable padding.
- the host sends the requested drawing dimensions, the actual size of the buffer, and the name of the shared memory segment to the plugin in a size_change_response message
- the plugin must draw within the specified dimensions, and should NOT send a size_change_request in response to the size_change_response
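Two of the calculations in the list above, power-of-two scaling and row padding, can be sketched as below. The exact rules are assumptions: here "next power-of-two" is taken to mean the smallest power of two that is not less than the size, and padding is treated as "not coming out even" when the padded row is not a whole number of pixels.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (assumed meaning of 'next')."""
    power = 1
    while power < n:
        power *= 2
    return power


def padded_row_bytes(width, bytes_per_pixel, padding):
    """Return (row_bytes, padding_applied) for one row of the pixel buffer.

    Rows are rounded up to a multiple of `padding` bytes. If the padded row
    is not a whole number of pixels (possible with 3-byte RGB), padding is
    disabled and the unpadded row size is returned instead.
    """
    row_bytes = width * bytes_per_pixel
    if padding <= 1:
        return row_bytes, False
    padded = -(-row_bytes // padding) * padding  # round up to a multiple
    if padded % bytes_per_pixel != 0:
        return row_bytes, False
    return padded, True
```

For instance, a 10-pixel row of 3-byte RGB padded to 4 bytes would be 32 bytes, which is not divisible by 3, so this sketch falls back to the unpadded 30 bytes.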
When changing buffer sizes, the new size may require a larger or smaller memory buffer than the old size. In this case, the shared memory segment name received with the size_change_response may be different from the one the plugin was previously drawing to. If the plugin receives a size_change_response with a shared memory segment name it hasn't seen yet, it MUST NOT draw to the old shared memory segment using the new parameters, or it may overrun the end of the buffer and crash.
In this case, the plugin should expect to receive a shm_added message for the new shared memory segment either before or after the size_change_response, and must wait until it has received both before drawing to the new buffer.
Likewise, when a plugin receives a shm_remove message and it's drawing to the shared memory buffer with that name, it needs to stop doing so, since the memory buffer will be deallocated once the message handler returns.
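The ordering rules above amount to a small state machine on the plugin side. The sketch below tracks which segments have been announced and which one the latest size_change_response names, and only allows drawing when both are known. The message names come from this document; the class, method, and parameter names are hypothetical.

```python
class PluginBufferState:
    """Plugin-side bookkeeping for when it is safe to draw."""

    def __init__(self):
        self.segments = set()  # segment names announced by shm_added
        self.target = None     # (segment_name, width, height) from the host

    def on_shm_added(self, name):
        """A new shared memory segment is now mapped and usable."""
        self.segments.add(name)

    def on_shm_remove(self, name):
        """The segment is about to be deallocated; stop using it."""
        self.segments.discard(name)

    def on_size_change_response(self, name, width, height):
        """The host has chosen new dimensions and names the buffer to use."""
        self.target = (name, width, height)

    def can_draw(self):
        """True only when the named target segment has actually been added."""
        return self.target is not None and self.target[0] in self.segments
```

Because `can_draw()` checks both conditions, it gives the same answer whether shm_added arrives before or after the size_change_response, and it turns false again as soon as shm_remove arrives for the buffer in use.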