Difference between revisions of "LLSD"
Zero Linden (talk | contribs) |
Poppy Linden (talk | contribs) |
Revision as of 14:01, 14 May 2010
Summary
Linden Lab Structured Data (LLSD) provides a flexible data system similar to the data facilities of dynamic languages like Perl and Python. It supports exchange of structured data between loosely-coupled systems (those not compiled into the same module).
Data in such exchanges must be highly tolerant of:
- Recompilation
- Implementation in a different language
- Addition of extra parameters
- Execution of older versions (with fewer parameters)
To this end, LLSD strives to be easy to use, and to default to "the right thing" wherever possible. It is extremely tolerant of errors and unexpected situations.
The fundamental class is LLSD. LLSD is a value holding object. It holds one value that is either undefined, one of the scalar types, or a map or an array. LLSD objects have value semantics (copying them copies the value, though it can be considered efficient, due to sharing), and are mutable.
Undefined is the singular value given to LLSD objects that are not initialized with any data.
The scalar data types are:
- Boolean - true or false
- Integer - a 32 bit signed integer
- Real - a 64 bit IEEE 754 floating point value
- UUID - a 128 bit unique value
- String - a sequence of zero or more Unicode characters
- Date - an absolute point in time, UTC, with resolution to the second
- URI - a String that is a URI
- Binary - a sequence of zero or more octets (unsigned bytes)
A map is a dictionary mapping String keys to LLSD values. The keys are unique within a map, and have only one value (though that value could be an LLSD array).
An array is a sequence of zero or more LLSD values.
Scalar Accessors
Function: Fetch a scalar value, converting if needed and possible.
Conversion among the basic types, Boolean, Integer, Real and String, is fully defined. Each type can be converted to another with a reasonable interpretation. These conversions can be used as a convenience even when you know the data is in one format, but you want it in another. Of course, many of these conversions lose information.
Note: These conversions are not the same as Perl's. In particular, when converting a String to a Boolean, only the empty string converts to false. Converting the String "0" to Boolean results in true.
Conversion to and from UUID, Date, and URI is only defined to and from String. Conversion is defined to be information preserving for valid values of those types. These conversions can be used when one needs to convert data to or from another system that cannot handle these types natively, but can handle strings.
Conversion to and from Binary isn't defined.
Conversion of the Undefined value to any scalar type results in a reasonable null or zero value for the type.
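The conversion rules above can be illustrated with a minimal Python sketch. It is not the C++ accessor implementation; the names `UNDEF`, `as_boolean` and `as_integer` are hypothetical, Python's `None` stands in for the Undefined value, and the handling of a leading sign in strings is an assumption.

```python
import re

# Hypothetical stand-in for the LLSD Undefined value in this sketch.
UNDEF = None


def as_boolean(value):
    """Convert Undefined or a basic scalar to a Boolean."""
    if value is UNDEF:
        return False  # Undefined yields the zero value of the target type
    if isinstance(value, str):
        # Only the empty string is false; the String "0" converts to true.
        return value != ""
    # Boolean, Integer and Real: exactly zero is false, anything else is true.
    return value != 0


def as_integer(value):
    """Convert Undefined or a basic scalar to an Integer."""
    if value is UNDEF:
        return 0
    if isinstance(value, bool):  # must be tested before int: bool is an int subclass
        return 1 if value else 0
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        # Nearest integer; NaN and infinities are not handled in this sketch.
        return int(round(value))
    if isinstance(value, str):
        # Assumption: read an optional sign and the leading digits, else zero.
        match = re.match(r"[+-]?\d+", value)
        return int(match.group()) if match else 0
    raise TypeError("no conversion defined for %r" % type(value))
```

Note how `as_boolean("0")` is true here, which is the difference from Perl called out above.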
Automatic Cast Protection
These are not implemented on purpose. Without them, C++ can perform some conversions that are clearly not what the programmer intended.
If you get a linker error about these being missing, you have made a mistake in your code. DO NOT IMPLEMENT THESE FUNCTIONS as a fix.
All of these problems stem from trying to support char* in LLSD or in std::string. There are too many automatic casts that will lead to using an arbitrary pointer or scalar type to std::string.
Attributes and Data
Attributes are only used for encoding parser and formatting instructions. The data in the elements is always data.
Root Element
The root element is llsd. The root must have only one child element which can be any container or atomic type.
Atomic Types
Each atomic type represents one value with type information. An atomic does not have a name, but may have attributes to specify format or processing considerations for the parser. Consumers of atomics are encouraged to massage the data into the preferred native representation, but further serialization should honor the original type information if possible.
The default value for any type is represented in XML as <type />. For example, <integer /> represents zero (0).
Datatype | Description | Default Value | Examples |
---|---|---|---|
Undefined | A placeholder for a value. | undef | undef |
Boolean | A true or false value. | false | true |
Integer | A signed 32-bit integer value; any integer between -2147483648 and 2147483647 inclusive. | 0 | <integer>289343</integer> |
Real | A signed 64-bit floating point number. | 0.0 | -0.28334 |
String | A sequence of zero or more Unicode characters intended to be human readable. See the IETF draft for details. | "" (the empty string) | The quick brown fox, etc. |
UUID | A universally unique identifier: an unsigned 128-bit integer. | 00000000-0000-0000-0000-000000000000 | d7f4aeca-88f1-42a1- |
Date | A specific point in time, UTC. Values range from January 1, 1970 through at least January 1, 2038. The serialization and parser only understand ISO-8601 numeric encoding in UTC. | The "epoch," Midnight, January 1, 1970 (UTC). | 2006-02-01T14:29:53.43Z |
URI | A Uniform Resource Identifier, specifying a link to an external resource and conforming to RFC 2396. | The empty URI. | http://sim956.agni.lindenlab.com:12035/runtime/agents |
Binary | A sequence of zero or more octets. Serialization format may include base64 (the default), base16, and base85. | A sequence of zero octets | base64: cmFuZG9t |
undefined
The undefined type is a placeholder to indicate something is there, but it has no value, and cannot be converted to any other atomic type. Though limited in this way, an undefined is still considered a first-class atomic, and is expected to behave like any other atomic structured data type at runtime.
Serialization example
<undef />
boolean
A true or false value.
Conversion
type | rules | |
boolean | unity | |
integer | true => 1, false => 0 | |
real | true => 1.0, false => 0.0 | |
uuid | n/a | |
string | 'true', 'false' | |
binary | one byte us-ascii where true => 1, false => 0 | |
date | n/a | |
uri | n/a |
Serialization examples
<!-- true -->
<boolean>1</boolean>
<boolean>true</boolean>
<!-- false -->
<boolean>0</boolean>
<boolean>false</boolean>
<boolean />
integer
A signed integer value with a representation of 32 bits.
Conversion
type | rules | |
boolean | 0 => false, all other values => true | |
integer | unity | |
real | closest representable number | |
uuid | n/a | |
string | human readable string | |
binary | 4 byte network byte order representation | |
date | seconds since epoch | |
uri | n/a |
Serialization examples
<integer>289343</integer>
<integer>-3</integer>
<integer /> <!-- zero -->
real
A 64 bit double as defined by IEEE.
Conversion
type | rules | |
boolean | exactly 0 => false, all other values => true | |
integer | rounded to closest representable number | |
real | unity | |
uuid | n/a | |
string | human readable string | |
binary | 8 byte network byte order representation | |
date | seconds since epoch | |
uri | n/a |
Serialization examples
<real>-0.28334</real>
<real>2983287453.3848387</real>
<real /> <!-- exactly zero -->
uuid
A 128 bit unsigned integer.
Conversion
type | rules | |
boolean | null uuid => false, all other values => true | |
integer | n/a | |
real | n/a | |
uuid | unity | |
string | standard 8-4-4-4-12 serialization format | |
binary | 16 byte raw representation | |
date | n/a | |
uri | n/a |
Serialization examples
<uuid>d7f4aeca-88f1-42a1-b385-b9db18abb255</uuid>
<uuid /> <!-- null uuid '00000000-0000-0000-0000-000000000000' -->
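As an illustration of the uuid conversions, here is a small Python sketch built on the standard `uuid` module. The helper names are hypothetical; the rule that anything other than a valid 8-4-4-4-12 string becomes the null uuid is taken from the string conversion table in this document.

```python
import re
import uuid

# The null uuid, 00000000-0000-0000-0000-000000000000.
NULL_UUID = uuid.UUID(int=0)

# Strict 8-4-4-4-12 hexadecimal layout.
_UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def string_to_uuid(text):
    """A valid 8-4-4-4-12 string becomes a uuid; anything else the null uuid."""
    if _UUID_PATTERN.fullmatch(text):
        return uuid.UUID(text)
    return NULL_UUID


def uuid_to_boolean(value):
    """The null uuid is false; every other uuid is true."""
    return value != NULL_UUID


def uuid_to_binary(value):
    """The 16 byte raw representation."""
    return value.bytes
```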
string
A simple string of any character data which is intended to be human comprehensible.
Strings in the system that hold text a user might see or enter (chat, IM, notecards, AV names, region names,... basically almost everything!) should move to using a consistent set of acceptable characters. This set is:
- Unicode code points U+20 through U+10FFFD
- except U+D800 through U+DFFF
- except U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF ... U+10FFFE, U+10FFFF
- except U+FDD0 through U+FDEF
- U+9 (tab, '\t')
- U+A (newline or line feed, '\n')
- U+D (carriage return, '\r')
Strings may be sequences of zero or more of these characters. Strings *may* be normalized by mapping line ending sequences to U+A. Do not rely on differences in strings that normalize to the same string.
These choices of valid strings are chosen from Unicode 4.0 which defines the following valid code points:
- Unicode code points U+0 through U+10FFFD
- except U+D800 through U+DFFF (the UTF-16 surrogate pair range)
- except U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF ... U+10FFFE, U+10FFFF
- except U+FDD0 through U+FDEF (some historical screw up with Arabic)
The choice for special characters < U+20 is because XML defines acceptable text as all valid Unicode code points >= U+20, and U+9, U+A and U+D. The normalization is because XML defines that all line ending sequences are normalized to U+A.
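The acceptable character set can be checked mechanically. The following Python sketch encodes the ranges listed above; the function names are hypothetical, and the normalization treats CR LF and a lone CR as the line ending sequences, which is an assumption based on XML's rules.

```python
def is_acceptable_code_point(cp):
    """Return True if the code point is in the acceptable set listed above."""
    if cp in (0x9, 0xA, 0xD):          # tab, line feed, carriage return
        return True
    if cp < 0x20 or cp > 0x10FFFD:     # outside U+20 through U+10FFFD
        return False
    if 0xD800 <= cp <= 0xDFFF:         # UTF-16 surrogate range
        return False
    if 0xFDD0 <= cp <= 0xFDEF:
        return False
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):  # U+FFFE, U+FFFF, U+1FFFE, ...
        return False
    return True


def is_acceptable_string(text):
    """Return True if every character of text is acceptable."""
    return all(is_acceptable_code_point(ord(ch)) for ch in text)


def normalize_line_endings(text):
    """Map CR LF and lone CR to U+A (assumed line ending sequences)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```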
See: Unicode In 5 Minutes for a brief introduction to Unicode.
String escaping
Strings delimited with single or double quotes require escaping when they contain non-printable characters. If a single quote delimited string contains single quotes, those must be escaped. If a double quote delimited string contains double quotes, the double quotes must be escaped.
To escape the delimiter character, prefix a backslash. Backslashes must always be escaped with another backslash.
"And then he said, \"I have nothing more to say on the subject.\""
'Look in "C:\\linden\\"'
The most generic escaping is to specify a hex value of the byte after a literal backslash and character 'x'. This can be used for any character and is required for all non-printable characters which do not have an abbreviation. For example:
\x0C
Serialized strings should only contain UTF-8 characters, so non-printable characters other than tab, newline, and carriage return should be avoided. However, common non-printable characters have short-hand abbreviations.
character | value | serialization |
---|---|---|
alert/bell | 0x7 | \a |
backspace | 0x8 | \b |
form feed | 0xc | \f |
newline | 0xa | \n |
carriage return | 0xd | \r |
horizontal tab | 0x9 | \t |
vertical tab | 0xb | \v |
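The escaping rules and the abbreviation table can be sketched in Python as follows. The function name `escape_notation_string` is hypothetical, and treating U+7F as non-printable is an assumption of this sketch rather than something stated above.

```python
# Short-hand abbreviations from the table above, keyed by character value.
SHORT_ESCAPES = {
    0x07: "\\a", 0x08: "\\b", 0x0C: "\\f", 0x0A: "\\n",
    0x0D: "\\r", 0x09: "\\t", 0x0B: "\\v",
}


def escape_notation_string(text, delimiter="'"):
    """Return text as a quoted, escaped string using the given delimiter."""
    out = [delimiter]
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")            # backslashes are always escaped
        elif ch == delimiter:
            out.append("\\" + ch)         # only the active delimiter is escaped
        elif code in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[code])
        elif code < 0x20 or code == 0x7F:
            out.append("\\x%02X" % code)  # generic hex escape
        else:
            out.append(ch)
    out.append(delimiter)
    return "".join(out)
```

Choosing the delimiter that does not occur in the text avoids most escapes, which is why both quote styles are allowed.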
Conversion
Type | Rules |
---|---|
boolean | Empty => false, all other values => true |
integer | A simple conversion of the initial characters to an integer |
real | A simple conversion of the initial characters to a real number |
uuid | A valid 8-4-4-4-12 is converted to a uuid, all other values => null uuid |
string | Unity |
binary | Raw representation of the characters |
date | An interpretation of the string as a date |
uri | An interpretation of the string as a link |
Serialization examples
<string>The quick brown fox jumped over the lazy dog.</string>
<string>540943c1-7142-4fdd-996f-fc90ed5dd3fa</string>
<string /> <!-- empty string -->
binary data
A chunk of binary data. The serialization format is allowed to specify an encoding. Parsers must support base64 encoding. Parsers may support base16 and base85.
Conversion
type | rules | |
boolean | empty => false, all other values => true | |
integer | len < 4 => 0, otherwise first four bytes are interpreted as a network byte order integer | |
real | len < 8 => 0, otherwise first eight bytes are interpreted as a network byte order double | |
uuid | len < 16 => null uuid, otherwise first sixteen bytes are interpreted as the raw binary uuid | |
string | the raw binary data interpreted as utf-8 character data | |
binary | unity | |
date | n/a | |
uri | the raw binary data interpreted as a utf-8 serialized link |
Serialization examples
<binary encoding="base64">cmFuZG9t</binary> <!-- base 64 encoded binary data -->
<binary>dGhlIHF1aWNrIGJyb3duIGZveA==</binary> <!-- base 64 encoded binary data is default -->
<binary /> <!-- empty binary blob -->
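For illustration, the payloads in these examples can be decoded with Python's standard `base64` module. The helper names are hypothetical. Base85 is left out of the sketch because this document does not say which base85 variant is meant.

```python
import base64
import struct


def decode_binary(text, encoding="base64"):
    """Decode the character data of a binary element into bytes."""
    if encoding == "base64":
        return base64.b64decode(text)
    if encoding == "base16":
        return base64.b16decode(text, casefold=True)
    raise ValueError("unsupported encoding in this sketch: " + encoding)


def binary_to_integer(data):
    """len < 4 => 0, otherwise the first four bytes in network byte order."""
    if len(data) < 4:
        return 0
    return struct.unpack("!i", data[:4])[0]
```

With these helpers, `decode_binary("cmFuZG9t")` yields the six octets of the ASCII text "random".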
date
A specific point in time. Intervals or relative dates are not supported. The serialization and parser only understand ISO-8601 numeric encoding in UTC. The time may be omitted which will be interpreted as midnight at the start of the day.
Conversion
type | rules | |
boolean | n/a | |
integer | seconds since epoch | |
real | seconds since epoch | |
uuid | n/a | |
string | standard serialization format | |
binary | n/a | |
date | unity | |
uri | n/a |
Serialization examples
<date>2006-02-01T14:29:53.43Z</date>
<date /> <!-- epoch -->
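A minimal Python sketch of date parsing and the "seconds since epoch" conversion, using only the standard `datetime` module. The function names are hypothetical, and the set of accepted formats (with fractional seconds, without them, and date only) is an assumption drawn from the description above.

```python
from datetime import datetime, timezone

# The default date value: midnight, January 1, 1970 (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed accepted layouts, tried in order.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def parse_llsd_date(text):
    """Parse the character data of a date element; empty text is the epoch."""
    if not text:
        return EPOCH
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("not a recognized date: %r" % text)


def date_to_real(value):
    """Seconds since the epoch, as used for the integer and real conversions."""
    return (value - EPOCH).total_seconds()
```

A date given without a time, such as `2006-02-01`, parses to midnight at the start of that day.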
uri
A link to an external resource. The data is expected to conform to rfc 2396 for interpretation, meaning, serialization, and deserialization.
Conversion
type | rules | |
boolean | n/a | |
integer | n/a | |
real | n/a | |
uuid | n/a | |
string | standard serialization format | |
binary | n/a | |
date | n/a | |
uri | unity |
Serialization examples
<uri>http://sim956.agni.lindenlab.com:12035/runtime/agents</uri>
<uri /> <!-- an empty link -->
Containers
Containers are special data types that can contain any other data type, including other containers.
map
A map of key and value pairs where key ordering is unspecified and keys are unique. The key is always interpreted as a character string, and any character string is acceptable. If there are any elements in the map, each is serialized as a key followed by an atomic or container value. For every key, there must be exactly one value. Well-formed and valid serialized maps may nonetheless contain non-unique keys. When such a map is deserialized, the implementation should choose one of the value objects, but that choice is not specified.
Serialization example
<map>
  <key>foo</key>
  <string>bar</string>
  <key>agent info</key>
  <map>
    <key>agent_id</key>
    <uuid>93c73b16-cd86-434d-8b4a-76e12eee950a</uuid>
    <key>name</key>
    <string>testtest tester</string>
  </map>
</map>
array
An ordered collection of data members. Any member can be any atomic or container type.
Serialization example
<array>
  <real>7343.0194</real>
  <string>foo</string>
  <array>
    <map>
      <key>offset</key>
      <integer>9847</integer>
    </map>
    <string>da boom</string>
  </array>
</array>
XML Serialization
MIME type: application/llsd+xml
When possible, prefer using us-ascii or UTF-8 XML encoding.
XML is the "standard" serialization format, being future-proof and readable by a wide variety of tools. The XML serialization should be preferred unless profiling reveals that the binary serialization provides an essential performance benefit. All the serialization examples in the above sections are of the XML serialization.
DTD
<!DOCTYPE llsd [
<!ELEMENT llsd (DATA)>
<!ELEMENT DATA (ATOMIC|map|array)>
<!ELEMENT ATOMIC (undef|boolean|integer|real|uuid|string|date|uri|binary)>
<!ELEMENT KEYDATA (key,DATA)>
<!ELEMENT key (#PCDATA)>
<!ELEMENT map (KEYDATA*)>
<!ELEMENT array (DATA*)>
<!ELEMENT undef (EMPTY)>
<!ELEMENT boolean (#PCDATA)>
<!ELEMENT integer (#PCDATA)>
<!ELEMENT real (#PCDATA)>
<!ELEMENT uuid (#PCDATA)>
<!ELEMENT string (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT uri (#PCDATA)>
<!ELEMENT binary (#PCDATA)>
<!ATTLIST string xml:space (default|preserve) 'preserve'>
<!ATTLIST binary encoding (base64|base16|base85) 'base64'>
]>
Example XML Output
This is a sample from a recently running sim (indentation added for readability):
<?xml version="1.0" encoding="UTF-8"?>
<llsd>
<map>
  <key>region_id</key>
  <uuid>67153d5b-3659-afb4-8510-adda2c034649</uuid>
  <key>scale</key>
  <string>one minute</string>
  <key>simulator statistics</key>
  <map>
    <key>time dilation</key><real>0.9878624</real>
    <key>sim fps</key><real>44.38898</real>
    <key>pysics fps</key><real>44.38906</real>
    <key>agent updates per second</key><real>nan</real>
    <key>lsl instructions per second</key><real>0</real>
    <key>total task count</key><real>4</real>
    <key>active task count</key><real>0</real>
    <key>active script count</key><real>4</real>
    <key>main agent count</key><real>0</real>
    <key>child agent count</key><real>0</real>
    <key>inbound packets per second</key><real>1.228283</real>
    <key>outbound packets per second</key><real>1.277508</real>
    <key>pending downloads</key><real>0</real>
    <key>pending uploads</key><real>0.0001096525</real>
    <key>frame ms</key><real>0.7757886</real>
    <key>net ms</key><real>0.3152919</real>
    <key>sim other ms</key><real>0.1826937</real>
    <key>sim physics ms</key><real>0.04323055</real>
    <key>agent ms</key><real>0.01599029</real>
    <key>image ms</key><real>0.01865955</real>
    <key>script ms</key><real>0.1338836</real>
  </map>
</map>
</llsd>
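A document like the sample above can be read with a few lines of Python using the standard `xml.etree.ElementTree` module. This is only a sketch, not the llbase parser: the function names are hypothetical, dates and URIs are left as plain strings, only the default base64 encoding is handled for binary, and for duplicate map keys the last value wins (the choice is unspecified, so this is one arbitrary option).

```python
import base64
import uuid
import xml.etree.ElementTree as ET


def parse_element(node):
    """Convert one LLSD XML element into a plain Python value."""
    tag = node.tag
    text = (node.text or "").strip()
    if tag == "undef":
        return None
    if tag == "boolean":
        return text in ("1", "true")          # empty element is false
    if tag == "integer":
        return int(text) if text else 0
    if tag == "real":
        return float(text) if text else 0.0   # float() also accepts "nan"
    if tag == "uuid":
        return uuid.UUID(text) if text else uuid.UUID(int=0)
    if tag in ("string", "date", "uri"):
        return node.text or ""                # kept as strings in this sketch
    if tag == "binary":
        if node.get("encoding", "base64") != "base64":
            raise ValueError("only base64 is handled in this sketch")
        return base64.b64decode(text)
    if tag == "array":
        return [parse_element(child) for child in node]
    if tag == "map":
        children = list(node)
        result = {}
        # Children alternate key, value, key, value, ...
        for key, value in zip(children[0::2], children[1::2]):
            result[key.text or ""] = parse_element(value)
        return result
    raise ValueError("unknown element: " + tag)


def parse_llsd_xml(document):
    """Parse a whole document: an llsd root with exactly one child."""
    root = ET.fromstring(document)
    if root.tag != "llsd" or len(root) != 1:
        raise ValueError("root must be llsd with exactly one child element")
    return parse_element(root[0])
```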
Binary Serialization
MIME type: application/llsd+binary
Binary serialization and deserialization are also supported in C++ and Python. The binary format is useful where optimal parse time is necessary. Binary LLSD is the binary llsd prefix followed by a single LLSD element of any type.
<?llsd/binary?>\n
type | serialization | notes |
---|---|---|
undef | '!' | |
true | '1' | |
false | '0' | |
integer | 'i' + htonl(value) | |
real | 'r' + htond(value) | |
uuid | 'u' + uuid | uuid is 16 bytes |
binary | 'b' + htonl(binary.size()) + binary | |
string | 's' + htonl(string.size()) + string | notation serialization is considered valid |
uri | 'l' + htonl(uri.size()) + uri | |
date | 'd' + htond(seconds_since_epoch) | |
array | '[' + htonl(array.length()) + (child0, child1, ...) + ']' | order is always preserved |
map | '{' + htonl(map.length()) + ((key0,value0), (key1, value1), ...)+ '}' | order is not always preserved. |
size() is a byte count.
length() is a child count.
htonl() is a function to generate a 4 byte network byte order integer.
htond() is a function to generate an 8 byte network byte order double. htond is not a standard system call, but you can find a C implementation in indra/llcommon/llsdserialize.cpp.
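The table above maps fairly directly onto Python's `struct` module, where the `!` prefix selects network byte order and so plays the role of htonl() and htond(). The sketch below is illustrative only and is not the llbase implementation. The name `format_binary` is hypothetical, Python's `None` stands in for undef, and dates, URIs and maps are omitted because the table does not spell out how map keys are encoded and plain Python values cannot distinguish dates and URIs from reals and strings.

```python
import struct
import uuid

# The binary llsd prefix that precedes the single top-level element.
BINARY_PREFIX = b"<?llsd/binary?>\n"


def format_binary(value):
    """Serialize a Python value as one binary LLSD element (partial sketch)."""
    if value is None:
        return b"!"
    if value is True:
        return b"1"
    if value is False:
        return b"0"
    if isinstance(value, int):
        return b"i" + struct.pack("!i", value)        # 4 bytes, signed
    if isinstance(value, float):
        return b"r" + struct.pack("!d", value)        # 8 bytes, IEEE double
    if isinstance(value, uuid.UUID):
        return b"u" + value.bytes                     # 16 raw bytes
    if isinstance(value, bytes):
        return b"b" + struct.pack("!I", len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")                  # size() is a byte count
        return b"s" + struct.pack("!I", len(data)) + data
    if isinstance(value, list):
        body = b"".join(format_binary(child) for child in value)
        return b"[" + struct.pack("!I", len(value)) + body + b"]"
    raise TypeError("type not covered by this sketch: %r" % type(value))
```

A complete message would be `BINARY_PREFIX + format_binary(value)`.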
Notation Serialization
The (deprecated?) notation serialization format is meant for human readability. It is similar to, but different from, JSON. See the comparison notes below.
Parsing and formatting are currently available in C++ (indra/llcommon/llsd.h) and in Python (llbase.llsd). Notation LLSD is the notation llsd prefix followed by a single LLSD element of any type.
<?llsd/notation?>\n
type | serialization | notes |
---|---|---|
undef | '!' | |
true | '1' or 't' or 'T' or 'true' or 'TRUE' | |
false | '0' or 'f' or 'F' or 'false' or 'FALSE' | |
integer | 'i' str(value) | |
real | 'r' str(value) | |
uuid | 'u' str(uuid) | |
binary | 'b(' str(size) ')"' raw_data '"' or 'b' base '"' encoded_data '"' | Base 16 and 64 encodings are supported. |
string | " escaped_string " or ' escaped_string ' or 's(' str(size) ')"' raw_string '"' | When using single quotes, double quotes do not need escaping and vice versa. |
uri | 'l"' escaped_uri '"' | See RFC 1738 for encoding rules. |
date | 'd"' YYYY-MM-DD 'T' HH:MM:SS [.FF] 'Z"' | Fractional seconds are optional |
array | '[' object0 ',' object1 ',' ... ']' | order is always preserved |
map | '{' string0:object0 ',' string1:object1 ',' ... '}' | order is not always preserved. The string is any supported string serialization format |
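As a rough illustration of the table, the following Python sketch formats a subset of types as notation. It is not the llbase formatter: the name `format_notation` is hypothetical, `None` stands in for undef, strings always use single quotes with only backslash and quote escaping, and binary, URI and date values are left out.

```python
import uuid

# The notation llsd prefix that precedes the single top-level element.
NOTATION_PREFIX = "<?llsd/notation?>\n"


def format_notation(value):
    """Serialize a Python value as one notation LLSD element (partial sketch)."""
    if value is None:
        return "!"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, int):
        return "i%d" % value
    if isinstance(value, float):
        return "r%r" % value
    if isinstance(value, uuid.UUID):
        return "u%s" % value
    if isinstance(value, str):
        # Non-printable characters are not escaped in this sketch.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    if isinstance(value, list):
        return "[" + ",".join(format_notation(child) for child in value) + "]"
    if isinstance(value, dict):
        pairs = (format_notation(str(key)) + ":" + format_notation(child)
                 for key, child in value.items())
        return "{" + ",".join(pairs) + "}"
    raise TypeError("type not covered by this sketch: %r" % type(value))
```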
Example Notation Output
This is an excerpt from an agent request to enter a region serialized as notation:
[
  {'destination':'http://secondlife.com'},
  {'version':i1},
  {
    'agent_id':u3c115e51-04f4-523c-9fa6-98aff1034730,
    'session_id':u2c585cec-038c-40b0-b42e-a25ebab4d132,
    'circuit_code':i1075,
    'first_name':'Phoenix',
    'last_name':'Linden',
    'position':[r70.9247,r254.378,r38.7304],
    'look_at':[r-0.043753,r-0.999042,r0],
    'granters':[ua2e76fcd-9360-4f6d-a924-000000000003],
    'attachment_data':
    [
      {
        'attachment_point':i2,
        'item_id':ud6852c11-a74e-309a-0462-50533f1ef9b3,
        'asset_id':uc69b29b1-8944-58ae-a7c5-2ca7b23e22fb
      },
      {
        'attachment_point':i10,
        'item_id':uff852c22-a74e-309a-0462-50533f1ef900,
        'asset_id':u5868dd20-c25a-47bd-8b4c-dedc99ef9479
      }
    ]
  }
]
[
  {
    'creation-date':d"2007-03-15T18:30:18Z",
    'creator-id':u3c115e51-04f4-523c-9fa6-98aff1034730
  },
  s(10)"0123456789",
  "Where's the beef?",
  'Over here.',
  b(160)"default { state_entry() { llSay(0, "Hello, Avatar!"); } touch_start(integer total_number) { llSay(0, "Touched."); } }",
  b64"AABAAAAAAAAAAAIAAAA//wAAP/8AAADgAAAA5wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AABkAAAAZAAAAAAAAAAAAAAAZAAAAAAAAAABAAAAAAAAAAAAAAAAAAAABQAAAAEAAAAQAAAAAAAA AAUAAAAFAAAAABAAAAAAAAAAPgAAAAQAAAAFAGNbXgAAAABgSGVsbG8sIEF2YXRhciEAZgAAAABc XgAAAAhwEQjRABeVAAAABQBjW14AAAAAYFRvdWNoZWQuAGYAAAAAXF4AAAAIcBEI0QAXAZUAAEAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
]
Notation versus JSON
The plan is to eventually move to JSON, after its spec in ECMAScript 5th edition is finalized. There are three reasons this is not a current priority:
- The binary and XML serialization formats work for our use cases, so there's no driving need for JSON.
- The notation format has not been useful in our experience and we expect JSON to fill a similar use niche.
- JSON was not in wide, common use when LLSD was invented.
Guidelines
Questions & Things To Do
Would Binary be more convenient as unsigned char* buffer semantics?
Should Binary be convertible to/from String, and if so how?
- as UTF-8 encoded strings (making it unlike UUID<->String)
- as Base64 or Base96 encoded (making it like UUID<->String)
Conversions to std::string and LLUUID do not result in easy assignment to std::string, LLString or LLUUID due to non-unique conversion paths.