Revision as of 12:24, 6 September 2007

AKA Certified HTTP.

Goals

The basic goal of certified http (chttp) is to have some kind of delivery and receipt guarantees for messages.

From a standard http client perspective, if the client reads a whole response, then it knows for certain the server handled the request. However, for all other failure modes, the client can't be sure whether the server did or did not perform the requested function. On the server side, the server can never know whether the client got the answer. For some operations, we need the ability for the client to perform a data-altering operation and be insistent that it occur. In particular, if the client isn't certain that the operation happened, then it must be able to try again safely.

The bigger picture goal is to make a simple way to conduct reliable delivery and receipt such that general adoption by the wider community is possible. This means that we have to respect HTTP where we can to take advantage of existing tools and methodology and to never contradict common conventions in a REST world.

competitive comparisons

I believe that all competitors to this fall into one of three categories: specialized message queues, reliable logic tunneled through HTTP, and delivery over HTTP with setup charges.

traditional message queue technology

This includes products like MQ, MSMQ, and ActiveMQ.

These technologies are useful, but provide a number of hurdles:

No common standards.
Integrates a new technology with unknown performance characteristics.
Requires significant operational overhead.

Because of this, we have opted to use HTTP as a foundation of the technology.

reliable application logic in body

This includes technologies like httpr and ws-reliable.

These tend to be thoroughly engineered protocol specifications which regrettably repeat the mistake of the nearly defunct XMLRPC and of SOAP, which is soon to join it: treating services as function calls. It is an understandable approach, and probably the most obvious one to the engineers working on the problem. Both of the examples follow that path and package a traditional message queue body into an HTTP body sent via POST. Treating web services as function calls severely limits the expressive nature of HTTP and should be avoided.

Consumers of web services prefer REST APIs rather than function calls over HTTP. I believe this is because REST is inherently more comprehensible to consumers since all of the data is data the consumer requested. In one telling data point, Amazon has both SOAP and REST interfaces to their web services, and 85% of their usage is of the REST interface[1]. I believe ceding power to the consumer is the only path to make wide adoption possible. In doing so, we must drop the entire concept of burying data the consumer wants inside application data.

reliable delivery after setup

There appear to be a few proposals of this nature around the web and a shining example can be found in httplr.

These are http reliability mechanisms which require a round trip to begin the reliable transfer and then follow through with some permutation of acknowledging the transfer. This setup can be cheap since the setup does not have the same reliability constraints as long as all clients correctly handle 404. Also, some of this overhead can be optimized away by batching the setup requests.

However, a protocol which requires setup for all messaging will always be more expensive than other options.

Requirements

chttp is based on http, and can use any facility provided by http 1.1 where not otherwise contradicted.
- this includes the use of https, pipelining, chunked encoding, proxies, redirects, caches, and headers such as Accept, Accept-Encoding, Expect, and Content-Type.
- any normal http verb appropriate to context should be accepted, e.g. POST, PUT, GET, DELETE
- unless otherwise specified, the http feature set in use is orthogonal and effectively transparent to chttp
in any complete chttp exchange the client and server will agree on success or failure of delivery
any message will be effectively received once and only once or not at all
the content of the http body must be opaque to chttp
the URI of the original request must be opaque to chttp
chttp enabled clients and servers can integrate with unreliable tools
- the chttp server can differentiate reliable requests and respond without reliability guarantees
- chttp clients can differentiate reliable responses and handle unreliable servers
the client will persist the local time of sending
if the request has a body, the client must either persist the outgoing body or be able to regenerate the exact same body on the fly
the server will persist the local time of message receipt
the server must persist the response body or have a mechanism to idempotently generate the same response to the same request
all chttp urls are effectively idempotent for all uniquely identified messages
- the server will keep receipt of messages for a minimum of 31 days
- the client can retry steadily over a period of 15 days
- the client and server are assumed to almost always be running
- over that window of opportunity, the same message will always get the same response
all persisted data on a single host is ACID

requirements on top of http

if the body of the request is non-zero length, the client MUST include a Content-Length header, unless prohibited by section 4.4 of RFC 2616
- the server will look for \r\n\r\n and content length body to consider the request complete
- an incomplete request will result in 4XX status code
if the body of the response is non-zero length, the server must include a Content-Length header
- the client will look for \r\n\r\n and content length body to consider the response complete
- the client will retry on an incomplete response
Messages cannot be terminated by closing the connection -- only positive expressions of message termination (such as Content-Length) can guarantee that a full message is received.
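
The completeness rule above can be sketched roughly as follows. This is a minimal illustration, not part of the proposal: the function name is_complete is hypothetical, and treating a missing Content-Length as a zero-length body is an assumption made here for brevity.

```python
HEADER_END = b"\r\n\r\n"


def is_complete(raw: bytes) -> bool:
    """Return True once raw holds the full header block plus
    Content-Length bytes of body (hypothetical helper)."""
    head, sep, body = raw.partition(HEADER_END)
    if not sep:
        return False  # header terminator not seen yet
    length = 0  # assumption: no Content-Length means a zero-length body
    for line in head.split(b"\r\n")[1:]:  # skip the request/status line
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"content-length":
            length = int(value.strip())  # raises ValueError if malformed
    return len(body) >= length
```

A receiver that sees the connection close while is_complete is still False would treat the message as incomplete: the server answers with a 4XX, and the client retries.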

assumptions

The client and server will not have any significant time discontinuity, i.e., the clock will not be reset to a different day or several hours from the current setting
The client and server will not have clock drift more than 1 hour per day
Both parties exchanging messages are Reliable Hosts (defined below)
Generating a globally unique message ID is inexpensive
It is possible to store a large amount of 'small' data (such as UUIDs, urls, and date/timestamps) for a "long" time.
Any transaction or data handled by this system will become useless well before a "long" time has elapsed. The "long" time represents a time that is orders of magnitude larger than the longest downtime we expect to see in the system and the largest clock skew we expect to see. The purpose of the "long" time is for garbage collection -- chttp is expected to generate a small amount of bookkeeping data for each message, and there needs to be a way to expire this data in a way that doesn't involve additional negotiation between servers and clients. The server should keep data for a "long" time, and clients should not retry messages that are older than ("long" time)/2 or similar, so that there is at least a half of the "long" time for a Reliable Host to recover from errors, and half of a "long" time for clock skew. Therefore a half of a "long" time should represent a generous estimate of the time it takes for a not-so-alert engineering team to notice a problem and correct it. We're guessing that 7 days will be enough for our engineering team, so 15 days is the "long" time for now.
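
The arithmetic behind the "long" time might be expressed as below. This is only a sketch: the names LONG_TIME, RETRY_WINDOW, may_retry and server_may_expire are hypothetical, and the 15 day figure is the provisional value quoted above.

```python
from datetime import datetime, timedelta

LONG_TIME = timedelta(days=15)   # the provisional "long" time from the text
RETRY_WINDOW = LONG_TIME / 2     # clients stop retrying after half of it


def may_retry(sent_at: datetime, now: datetime) -> bool:
    """Client side: retry only while the message is younger than long/2."""
    return now - sent_at <= RETRY_WINDOW


def server_may_expire(received_at: datetime, now: datetime) -> bool:
    """Server side: bookkeeping data may be dropped after the full long time."""
    return now - received_at > LONG_TIME
```

Both checks use only the local clock of the host that persisted the timestamp, which is why the clock drift and discontinuity assumptions matter.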

Reliable Host: A reliable host has the following properties:

The host has a durable data store of finite size, which can store data in such a way that it is guaranteed to be recoverable even in the face of a certain number of hardware failures.
The host will not be down forever. Either a clone will be brought up on different hardware, or the machine itself will reappear within a day or so.
The host can perform ACID operations on the data it contains.

Implementation

Requiring an opaque body and URI pretty much requires either negotiating a URL beforehand or adding new headers. Since the former was discarded earlier (reliable delivery after setup), we will focus on adding request and response headers.

This is a suggested implementation:

chttp enabled servers will behave idempotently with a unique message id on the request
the server can request acknowledgment from the client if the request consumes server resources

Request

A client will start a reliable message with a GET, POST, or PUT to a url on a reliable http server. The url may be known in advance or returned as part of an earlier application protocol exchange. The message itself will be uniquely identified by a new header for the message id.

X-Message-ID

A combination of client host identifier with a uuid or sequence number which specifies the reliability client ID for the request. When a server detects the X-Message-ID, it can guarantee reliable message delivery.

Since the message id must be unique, implementations should generate an id of the form $random_uuid@$client_host where the uuid is from a cryptographically secure generator and client_host is the name of the host.
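
One possible way to build an id of that form is shown below. The function name new_message_id is hypothetical, and the sketch assumes that Python's uuid.uuid4 (which in CPython draws its bytes from os.urandom) is an acceptable source of randomness and that socket.gethostname gives a suitable host name.

```python
import socket
import uuid
from typing import Optional


def new_message_id(client_host: Optional[str] = None) -> str:
    """Build an id of the form $random_uuid@$client_host (hypothetical helper)."""
    host = client_host or socket.gethostname()
    return f"{uuid.uuid4()}@{host}"
```

The client would persist this id together with the outgoing body and send it unchanged in the X-Message-ID header on every retry.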

Sending a message with the same message id but a different body has an undefined result.
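
Server-side handling keyed on the message id could look something like the sketch below. Everything in it is illustrative: the class IdempotentHandler is hypothetical, the dictionary stands in for the durable ACID store the requirements call for, and rejecting a reused id with a different body is just one choice this sketch makes for the case the text leaves undefined.

```python
import hashlib
from typing import Callable, Dict, Tuple


class IdempotentHandler:
    """Runs the application handler at most once per message id (sketch)."""

    def __init__(self, handler: Callable[[bytes], bytes]) -> None:
        self._handler = handler
        # message id -> (body digest, stored response); in-memory stand-in
        # for a durable store
        self._seen: Dict[str, Tuple[str, bytes]] = {}

    def handle(self, message_id: str, body: bytes) -> bytes:
        digest = hashlib.sha256(body).hexdigest()
        if message_id in self._seen:
            stored_digest, stored_response = self._seen[message_id]
            if stored_digest != digest:
                # Undefined by the text; this sketch chooses to reject.
                raise ValueError("message id reused with a different body")
            return stored_response  # replay: same response, no new work
        response = self._handler(body)
        self._seen[message_id] = (digest, response)
        return response
```

A retried request with the same id and body gets the stored response back without the handler running a second time.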

Response

Generally, when the client gets the 2XX back from the server, the message has been delivered. If the server has a response body, then the client will need to acknowledge receipt of the entire body. If the entire body is not read, the client can safely resubmit the exact same request. The server will include a message url in the headers if the client specified a message id on request and the body requires persistence by the server.

X-Message-URL

This will be a unique url on the server for the client message id which the client will DELETE upon receipt of the entire response to the request. This will only exist if there is a non-null response body or the server is consuming significant resources to maintain that url. When the client performs the delete, the server can flush all persisted resources other than the fact that the message was sent at some point.

If the client performs a GET to the message url prior to DELETE, the server must return the same body as the original response, including the X-Message-URL.
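
The lifecycle of a message url might be modelled as follows. This is a sketch under assumptions: the class MessageUrlStore and its method names are hypothetical, headers such as X-Message-URL are left out, and an in-memory dictionary again stands in for durable storage.

```python
from typing import Dict, Optional, Set


class MessageUrlStore:
    """Tracks persisted response bodies per message url (sketch)."""

    def __init__(self) -> None:
        self._bodies: Dict[str, bytes] = {}
        self._sent: Set[str] = set()

    def record(self, message_url: str, body: bytes) -> None:
        """Persist the response body when the original response is produced."""
        self._bodies[message_url] = body
        self._sent.add(message_url)

    def get(self, message_url: str) -> Optional[bytes]:
        """GET before DELETE returns the same body as the original response."""
        return self._bodies.get(message_url)

    def delete(self, message_url: str) -> bool:
        """DELETE flushes the body but keeps the fact that it was sent."""
        self._bodies.pop(message_url, None)
        return message_url in self._sent

    def was_sent(self, message_url: str) -> bool:
        return message_url in self._sent
```

Because delete only drops the body, a repeated DELETE for the same url still succeeds, so the client can safely retry its acknowledgment.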
