User:Infinity Linden/Distributed Object Concepts
This is a brief discussion of how we (or at least some of us) here at Linden Lab view distributed applications. We spend a lot of our day knee-deep in code and specific networking issues, but I thought it might be good to take a breather, come up for air and talk about concepts. You may disagree with us on a number of points, but we figured it was a good idea to communicate how we see the world of distributed applications.
Concept 1 : Applications Can Be Represented as Collections of Objects
This may sound a bit simple for most people reading this... OO systems are a pervasive part of software development these days. Whether you program in C++, Ruby, JavaScript or even PHP, you've probably run into objects and classes.
But we figured it was worth re-iterating. We like objects, especially when they're coded as... well... objects. It's important to note that using an object-oriented programming language doesn't guarantee object-oriented code; it's still possible to write code that eschews the objectives of object orientation. Conversely, it's possible to write code that is OO-ish in non-OO languages; it's just a bit harder.
What we're talking about with this concept is that it's possible to model an application as a collection of cooperating objects. The figure to the left is meant to represent such an application. Rather than representing the application's state in a global data structure, state is distributed across a collection of smaller objects. Each object is intended to represent a conceptual entity manipulated by the application. Rather than writing routines to manipulate global data structures, we write methods that manipulate the state of an object in response to messages from other objects. Objects maintain references to other objects; the arrows in the figure at the left represent these references. Objects send messages to each other via these references, and it's these messages that trigger code to be executed to manipulate the state of our object friends. Okay... so now that I've completely bored you with an extremely basic discussion that's so vague it has about zero utility... let me conclude with the "important bits."
It's easy to get lost in the details of XML parsing and the wild flurry of UDP messages the SL client emits, but at the end of the day, there's this "application" thing that is independent of TCP or UDP or HTTP or whatever. It's a collection of cooperating objects (the mathematically inclined might call it an "object graph"). The object graph represents the "application logic" (escapees from the web 2.0 era may also occasionally use the term "business logic"). It represents the core abstractions of the application, independent of network transport, graphics rendering technology or even functional system architecture. We like OO development environments as they generally support concepts such as Abstraction, Encapsulation and Decoupling that make maintenance of an evolving system less painful.
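For the code-minded, here's a minimal sketch of what "a collection of cooperating objects" might look like. The class names (`Avatar`, `Region`) and the method names are made up for illustration and are not taken from any Linden code; the point is only that state lives in the objects, references connect them, and method calls play the role of messages.

```python
class Avatar:
    """A business object: holds its own state and reacts to messages."""

    def __init__(self, name):
        self.name = name
        self.position = (0, 0, 0)
        self.observers = []          # references to other objects (the arrows)

    def move_to(self, position):
        # A "message" from another object arrives as a method call.
        self.position = position
        for observer in self.observers:
            observer.avatar_moved(self, position)


class Region:
    """Another object in the graph; it remembers where avatars were last seen."""

    def __init__(self):
        self.last_known = {}

    def avatar_moved(self, avatar, position):
        self.last_known[avatar.name] = position


region = Region()
avatar = Avatar("Infinity")
avatar.observers.append(region)      # wire up a reference
avatar.move_to((128, 128, 25))       # send a message along it
print(region.last_known)             # {'Infinity': (128, 128, 25)}
```

Notice there's no socket, parser or renderer anywhere in there. That's the "application thing" this section is talking about.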
Concept 2 : Object Graphs Abound
Still with me? Great! I was worried I was boring you.
So the next concept is pretty straightforward as well: you can represent a server application as an object graph as easily as you can represent a client application with an object graph. The figure to the right is supposed to represent this concept.
You could go on and say that this concept does not require client-server communications, and you would be right. It would be possible to replace every instance of client and server with the word peer and you would still have a very workable conceptual framework. We talk about client-server a lot at Linden because:
- we make extensive use of HTTP in our system which uses the Request-Response pattern, thus implying a client and a server, and
- people sometimes confuse the term "network peer" with the term "peer to peer" and freak out, thinking we're supporting illegal file-sharing or opening an interface that allows "bad guys" to illicitly copy digital assets in contravention of asset permission meta-data. For the record, we go to great lengths to prevent such illicit asset copying.
Concept 3 : Distributed Applications May Be Modeled as a Collection of Object Graphs
And here's where it gets interesting. In the figure to the left we see four different machines, each with its own object graph. But if you look closely you'll see there are references across machine boundaries. This is what makes this application a "distributed application" that uses a "distributed object graph". It has nothing to do with which programming language was used to code it or which networking protocols carry messages between machines; what matters is that the design treats remote and local objects the same.
C++ programmers should be aware that a reference held by a C++ object is generally a pointer to a location in memory. They should probably also know that it's impossible for one CPU to directly access memory connected to another CPU (okay... well... it's impossible in a non-multi-processor system that isn't using NUMA clustering technologies like SCI).
So, assuming we don't all have a Convex Exemplar or an SGI Altix system on our desktops, we're going to have an application that uses external references that know how to marshal an object request and send it to its intended target on the network.
For what it's worth, if you hear discussion of CORBA, DSOM, or DCOM in Architecture_Working_Group meetings, it's usually related to this concept. CORBA, et al. were technologies intended to make referencing remote objects "easy" for software developers. It is a matter of fierce debate as to whether or not they achieved this objective. Ultimately, most software engineers responded negatively to the perceived complexity of such systems and either rolled their own distributed transaction system or hand-coded methods implementing remote message passing over HTTP(S). Developers familiar with Linden's Second Life Viewer will see evidence of both approaches.
But... I digress... The concept I'm trying to communicate here is that from the high level design perspective, we can view a distributed application like Second Life as a big distributed object graph where some objects live on one machine and other objects live on another. Messages that need to pass from one machine to another use some form of request marshaling. The OGP specification attempts to define remote objects in terms of REST-like resources explicitly to enable this type of abstraction.
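Here's a rough sketch of an "external reference" that marshals a request instead of dereferencing a pointer. Everything in it is an assumption made for illustration: the `RemoteRef`, `Machine` and `Inventory` classes, the `inventory/42` identifier and the JSON message layout are invented here and are not the OGP wire format. The "network" is just a direct function call so the example runs on its own.

```python
import json


class Inventory:
    """A plain object that lives on the 'server' machine."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return len(self.items)


class Machine:
    """Stand-in for a remote host: owns objects and un-marshals requests."""

    def __init__(self):
        self.objects = {}

    def handle(self, wire_bytes):
        request = json.loads(wire_bytes.decode("utf-8"))
        target = self.objects[request["object"]]
        result = getattr(target, request["method"])(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class RemoteRef:
    """External reference: looks like a local object, marshals each call."""

    def __init__(self, machine, object_id):
        self._machine = machine
        self._object_id = object_id

    def __getattr__(self, method):
        # Only reached for names not found normally, i.e. the remote methods.
        def send(*args):
            wire = json.dumps(
                {"object": self._object_id, "method": method, "args": list(args)}
            ).encode("utf-8")
            reply = self._machine.handle(wire)   # a real system would cross the network here
            return json.loads(reply.decode("utf-8"))["result"]
        return send


server = Machine()
server.objects["inventory/42"] = Inventory()

inventory = RemoteRef(server, "inventory/42")   # the caller holds only a reference
count = inventory.add("Bazooka 2000")           # reads like a local call
print(count)                                    # 1
```

The calling code never learns whether `inventory` is local or remote, which is the whole idea behind treating the application as one big distributed object graph.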
Concept 4 : Method Invocation on Remote Objects Should be Independent of Transport or Serialization Format
Okay... open the image to the right in another window... you'll want to look back and forth often.
One of the things fans of Object Oriented Analysis yammer on about incessantly is this whole concept of clearly separating "business logic" from "presentation logic." If you're like me, you would rather spend eternity eating double-edged razor blades than have another person explain why Separation of Concerns is a good thing. So, I'll not mention the motivation behind such architectural issues, only note that a lot of people (some of whom are considerably smarter than you or me) think it's a good idea.
Remember we talked about "business logic" before? Up at the top of the diagram we have our "business objects." They represent the state of the important concepts like where your avatar is, what she's dressed in and how many times she's been shot with the Bazooka 2000 in the last 15 seconds. They do not represent the state of a TCP/IP connection to a remote machine or the number of seconds since the last packet was received.
The arrows between the white circles in the "Client Object Graph" represent references to objects. Links between these objects and the colored circles below represent references to "remote" objects. "Remote objects" live on other machines, and the mythical "remote object manager" is responsible for making the business logic objects think they're local by receiving message sends, marshaling the request, sending it over the wire and optionally receiving a response.
Below the proxy objects you should note multiple message serialization and transport methods. The fact that there are several of them implies there are several ways to communicate a business object state change to remote peers. Not shown in this diagram is the remote object manager on the peer system which un-marshals the message on the remote system.
So... the important take-aways here are "it's okay to have multiple transports" and "honestly, don't hand code message parsers, please" and "your business logic shouldn't care one whit which transport you use." (In the real world, there'll probably be a little bit of hinting by the business object like... "oh... for the next several messages it would be really nice if they went over RTP, kthxbai," but we'll get to that again later.)
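To make those take-aways a bit more concrete, here's a small sketch of a remote object manager that picks the serializer and transport so the business logic doesn't have to. All names here (`RemoteObjectManager`, `LoopbackTransport`, `AvatarProxy`, the `hint` method) are hypothetical, the transports only record bytes rather than sending them, and pairing the "rtp" route with `pickle` is an arbitrary choice to show a second format, not a claim about how RTP traffic is encoded.

```python
import json
import pickle


class LoopbackTransport:
    """Hypothetical transport: 'delivers' bytes by recording them."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)


def json_serializer(message):
    return json.dumps(message, sort_keys=True).encode("utf-8")


def pickle_serializer(message):
    return pickle.dumps(message)


class RemoteObjectManager:
    """Chooses how a message is serialized and shipped; callers never do."""

    def __init__(self, routes, default):
        self.routes = routes          # route name -> (serializer, transport)
        self.preferred = default

    def hint(self, route_name):
        # Business objects may express a preference, nothing more.
        if route_name in self.routes:
            self.preferred = route_name

    def send(self, object_id, method, *args):
        serializer, transport = self.routes[self.preferred]
        message = {"object": object_id, "method": method, "args": list(args)}
        transport.send(serializer(message))


class AvatarProxy:
    """What the business logic holds: no transport details in sight."""

    def __init__(self, manager, object_id):
        self._manager = manager
        self._object_id = object_id

    def move_to(self, x, y, z):
        self._manager.send(self._object_id, "move_to", x, y, z)


http = LoopbackTransport("http")
rtp = LoopbackTransport("rtp")
manager = RemoteObjectManager(
    {"http": (json_serializer, http), "rtp": (pickle_serializer, rtp)},
    default="http",
)
avatar = AvatarProxy(manager, "avatar/1")
avatar.move_to(1, 2, 3)        # goes out as JSON on the "http" route
manager.hint("rtp")            # "it would be really nice if..."
avatar.move_to(4, 5, 6)        # same call, different serializer and transport
print(len(http.sent), len(rtp.sent))   # 1 1
```

The `move_to` call is identical both times; only the manager knows that the route changed. Swapping in a new transport means adding an entry to `routes`, not touching `AvatarProxy`.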