Revision as of 14:01, 14 May 2010

Summary

Linden Lab Structured Data (LLSD) provides a flexible data system similar to the data facilities of dynamic languages like Perl and Python. It supports exchange of structured data between loosely-coupled systems (those not compiled into the same module).

Data in such exchanges must be highly tolerant of:

Recompilation
Implementation in a different language
Addition of extra parameters
Execution of older versions (with fewer parameters)

To this end, LLSD strives to be easy to use, and to default to "the right thing" wherever possible. It is extremely tolerant of errors and unexpected situations.

The fundamental class is LLSD. LLSD is a value-holding object. It holds one value that is either undefined, one of the scalar types, or a map or an array. LLSD objects have value semantics (copying them copies the value, though the copy is cheap because the underlying data is shared), and are mutable.

Undefined is the singular value given to LLSD objects that are not initialized with any data.

The scalar data types are:

Boolean - true or false
Integer - a 32 bit signed integer
Real - a 64 bit IEEE 754 floating point value
UUID - a 128 bit unique value
String - a sequence of zero or more Unicode characters
Date - an absolute point in time, UTC, with resolution to the second
URI - a String that is a URI
Binary - a sequence of zero or more octets (unsigned bytes)

A map is a dictionary mapping String keys to LLSD values. The keys are unique within a map, and have only one value (though that value could be an LLSD array).

An array is a sequence of zero or more LLSD values.

Scalar Accessors

Function: Fetch a scalar value, converting if needed and possible.

Conversion among the basic types, Boolean, Integer, Real and String, is fully defined. Each type can be converted to another with a reasonable interpretation. These conversions can be used as a convenience even when you know the data is in one format, but you want it in another. Of course, many of these conversions lose information.

Note: These conversions are not the same as Perl's. In particular, when converting a String to a Boolean, only the empty string converts to false. Converting the String "0" to Boolean results in true.

Conversion to and from UUID, Date, and URI is only defined to and from String. Conversion is defined to be information preserving for valid values of those types. These conversions can be used when one needs to convert data to or from another system that cannot handle these types natively, but can handle strings.

Conversion to and from Binary isn't defined.

Conversion of the Undefined value to any scalar type results in a reasonable null or zero value for the type.
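
The Boolean conversion rules described in this section can be sketched in Python. This is only an illustration under stated assumptions: the function name `to_boolean` is hypothetical and not part of any LLSD library, Python's `None` stands in for Undefined, and Python `bool`, `int`, `float`, and `str` stand in for the LLSD Boolean, Integer, Real, and String types.

```python
def to_boolean(value):
    """Convert a Python stand-in for an LLSD scalar to a Boolean."""
    if value is None:
        # Undefined converts to the zero value for the target type.
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Only exactly zero is false.
        return value != 0
    if isinstance(value, str):
        # Unlike Perl, only the empty string is false; "0" is true.
        return value != ""
    # Binary and other types: conversion is not defined in this sketch.
    raise TypeError("conversion to Boolean is not defined for this type")
```

For example, `to_boolean("0")` gives `True`, while `to_boolean("")` and `to_boolean(None)` both give `False`.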

Automatic Cast Protection

These are deliberately left unimplemented. Without them, C++ can perform some conversions that are clearly not what the programmer intended.

If you get a linker error about these being missing, you have made a mistake in your code. DO NOT IMPLEMENT THESE FUNCTIONS as a fix.

All of these problems stem from trying to support char* in LLSD or in std::string. There are too many automatic casts that would convert an arbitrary pointer or scalar type to std::string.

Attributes and Data

Attributes are only used for encoding parser and formatting instructions. The data in the elements is always data.

Root Element

The root element is llsd. The root must have only one child element which can be any container or atomic type.

Atomic Types

Each atomic type represents one value with type information. An atomic does not have a name, but may have attributes to specify format or processing considerations for the parser. Consumers of atomics are encouraged to massage the data into the preferred native representation, but further serialization should honor the original type information if possible.

The default value for any type is represented in XML as <type />. For example, <integer /> represents zero (0).
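
As an illustration of the empty-element rule, the following Python sketch reads a few atomic elements with the standard `xml.etree.ElementTree` module. The helper name `parse_atomic` and the `DEFAULTS` table are hypothetical, chosen only for this example, and only four of the atomic types are covered.

```python
import xml.etree.ElementTree as ET

# Default value returned when an element of the given type is empty.
DEFAULTS = {"boolean": False, "integer": 0, "real": 0.0, "string": ""}

def parse_atomic(xml_text):
    """Parse one atomic element, applying the default for empty elements."""
    elem = ET.fromstring(xml_text)
    if elem.tag not in DEFAULTS:
        raise ValueError("type not handled in this sketch: " + elem.tag)
    text = elem.text
    if elem.tag == "string":
        # Strings keep their content verbatim; empty means "".
        return text if text is not None else ""
    if text is None or text.strip() == "":
        return DEFAULTS[elem.tag]
    text = text.strip()
    if elem.tag == "boolean":
        return text in ("1", "true")
    if elem.tag == "integer":
        return int(text)
    return float(text)
```

With this sketch, `parse_atomic("<integer />")` returns `0` and `parse_atomic("<boolean />")` returns `False`.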

Datatype	Description	Default Value	Examples
Undefined	A placeholder for a value.	`undef`	`undef`
Boolean	A true or false value.	False	`true` `0`
Integer	A signed 32-bit integer value; any integer between -2147483648 and 2147483647 inclusive.	0	`<integer>289343</integer>` `-3`
Real	A signed 64-bit floating point number.	0.0	`-0.28334` `2983287453.3848387`
String	A sequence of zero or more Unicode characters intended to be human readable. See the IETF draft for details.	"" (the empty string)	`The quick brown fox, etc.` `foo`
UUID	A universally unique identifier: an unsigned 128-bit integer.	00000000-0000-0000-0000-000000000000	`d7f4aeca-88f1-42a1-b385-b9db18abb255`
Date	A specific point in time, UTC; Values range between January 1, 1970, through at least January 1, 2038. The serialization and parser only understand ISO-8601 numeric encoding in UTC.	The "epoch," Midnight, January 1, 1970 (UTC).	`2006-02-01T14:29:53.43Z`
URI	A Uniform Resource Identifier, specifying a link to an external resource and conforming to RFC 2396.	The empty URI.	`http://sim956.agni.lindenlab.com:12035/runtime/agents`
Binary	A sequence of zero or more octets. Serialization format may include base64 (the default), base16, and base85.	A sequence of zero octets	base64: `cmFuZG9t` `dGhlIHF1aWNrIGJyb3duIGZveA==`

undefined

The undefined type is a placeholder to indicate something is there, but it has no value, and cannot be converted to any other atomic type. Though limited in this way, an undefined is still considered a first-class atomic, and is expected to behave like any other atomic structured data type at runtime.

Serialization example

<undef />

boolean

A true or false value.

Conversion

type	rules
boolean	unity
integer	true => 1, false => 0
real	true => 1.0, false => 0.0
uuid	n/a
string	'true', 'false'
binary	one byte us-ascii where true => 1, false => 0
date	n/a
uri	n/a

Serialization examples

<!-- true -->
<boolean>1</boolean>
<boolean>true</boolean>

<!-- false -->
<boolean>0</boolean>
<boolean>false</boolean>
<boolean />

integer

A signed integer value with a representation of 32 bits.

Conversion

type	rules
boolean	0 => false, all other values => true
integer	unity
real	closest representable number
uuid	n/a
string	human readable string
binary	4 byte network byte order representation
date	seconds since epoch
uri	n/a

Serialization examples

<integer>289343</integer>
<integer>-3</integer>
<integer /> <!-- zero -->

real

A 64 bit double as defined by IEEE 754.

Conversion

type	rules
boolean	exactly 0 => false, all other values => true
integer	rounded to closest representable number
real	unity
uuid	n/a
string	human readable string
binary	8 byte network byte order representation
date	seconds since epoch
uri	n/a

Serialization examples

<real>-0.28334</real>
<real>2983287453.3848387</real>
<real /> <!-- exactly zero -->

uuid

A 128 bit unsigned integer.

Conversion

type	rules
boolean	null uuid => false, all other values => true
integer	n/a
real	n/a
uuid	unity
string	standard 8-4-4-4-12 serialization format
binary	16 byte raw representation
date	n/a
uri	n/a

Serialization examples

<uuid>d7f4aeca-88f1-42a1-b385-b9db18abb255</uuid>
<uuid /> <!-- null uuid '00000000-0000-0000-0000-000000000000' -->

string

A simple string of any character data which is intended to be human comprehensible.

Strings in the system that hold text a user might see or enter (chat, IM, notecards, AV names, region names,... basically almost everything!) should move to using a consistent set of acceptable characters. This set is:

Unicode code points U+20 through U+10FFFD
- except U+D800 through U+DFFF
- except U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF ... U+10FFFE, U+10FFFF
- except U+FDD0 through U+FDEF
U+9 (tab, '\t')
U+A (newline or line feed, '\n')
U+D (carriage return, '\r')

Strings may be sequences of zero or more of these characters. Strings *may* be normalized by mapping line ending sequences to U+A. Do not rely on differences in strings that normalize to the same string.
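
The acceptable character set listed above can be expressed as a small Python check. This is a sketch only; the function names `is_acceptable_code_point` and `is_acceptable_string` are hypothetical.

```python
def is_acceptable_code_point(cp):
    """Return True if code point cp is in the acceptable set listed above."""
    if cp in (0x9, 0xA, 0xD):
        # Tab, line feed, and carriage return are explicitly allowed.
        return True
    if cp < 0x20 or cp > 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:
        return False
    if 0xFDD0 <= cp <= 0xFDEF:
        return False
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
        # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFE, U+10FFFF
        return False
    return True

def is_acceptable_string(s):
    """Return True if every character of s is acceptable."""
    return all(is_acceptable_code_point(ord(ch)) for ch in s)
```

For instance, `is_acceptable_string("foo\tbar\n")` is true, while a string containing U+0 is rejected.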

These choices of valid strings are chosen from Unicode 4.0 which defines the following valid code points:

Unicode code points U+0 through U+10FFFD
- except U+D800 through U+DFFF (the UTF-16 surrogate pair range)
- except U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF ... U+10FFFE, U+10FFFF
- except U+FDD0 through U+FDEF (some historical screw up with Arabic)

The choice for special characters < U+20 is because XML defines acceptable text as all valid Unicode code points >= U+20, and U+9, U+A and U+D. The normalization is because XML defines that all line ending sequences are normalized to U+A.

See: Unicode In 5 Minutes for a brief introduction to Unicode.

String escaping

Strings delimited with single or double quotes require escaping when they contain non-printable characters or their own delimiter. If a single-quoted string contains single quotes, those must be escaped. If a double-quoted string contains double quotes, those must be escaped.

To escape the delimiter character, prefix a backslash. Backslashes must always be escaped with another backslash.

"And then he said, \"I have nothing more to say on the subject.\""

'Look in "C:\\linden\\"'

The most generic escaping is to specify a hex value of the byte after a literal backslash and character 'x'. This can be used for any character and is required for all non-printable characters which do not have an abbreviation. For example:

 \x0C

Serialized strings should only contain UTF-8 characters, so non-printable characters other than tab, newline, and carriage return should be avoided. However, common non-printable characters have short-hand abbreviations.

Notation abbreviations
character	value	serialization
alert/bell	0x7	\a
backspace	0x8	\b
form feed	0xc	\f
newline	0xa	\n
carriage return	0xd	\r
horizontal tab	0x9	\t
vertical tab	0xb	\v
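
The escaping rules and the abbreviation table can be combined into a short Python sketch. The function name `escape_notation_string` is hypothetical, and the choice to emit `\xNN` for every remaining character below U+20 (and for U+7F) is an assumption about what counts as non-printable.

```python
# Short-hand abbreviations from the table above.
ABBREVIATIONS = {
    "\a": "\\a", "\b": "\\b", "\f": "\\f", "\n": "\\n",
    "\r": "\\r", "\t": "\\t", "\v": "\\v",
}

def escape_notation_string(s, delimiter="'"):
    """Return s wrapped in the delimiter, with escapes applied."""
    out = []
    for ch in s:
        if ch == "\\":
            out.append("\\\\")           # backslashes are always escaped
        elif ch == delimiter:
            out.append("\\" + ch)        # only the active delimiter is escaped
        elif ch in ABBREVIATIONS:
            out.append(ABBREVIATIONS[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\x%02X" % ord(ch))  # generic hex escape
        else:
            out.append(ch)
    return delimiter + "".join(out) + delimiter
```

Because only the active delimiter is escaped, a single-quoted result leaves double quotes untouched, and vice versa.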

Conversion

Type	Rules
boolean	Empty => false, all other values => true
integer	A simple conversion of the initial characters to an integer
real	A simple conversion of the initial characters to a real number
uuid	A valid 8-4-4-4-12 is converted to a uuid, all other values => null uuid
string	Unity
binary	Raw representation of the characters
date	An interpretation of the string as a date
uri	An interpretation of the string as a link
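
A few rows of the string conversion table can be sketched in Python. The function names are hypothetical. The integer rule is interpreted here, as an assumption, in the manner of C's `atoi`: optional leading whitespace and sign, then as many digits as are present, with 0 when there are none. Clamping to the 32-bit range is not handled.

```python
import re
import uuid

NULL_UUID = uuid.UUID(int=0)
_UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_INT_RE = re.compile(r"\s*([+-]?\d+)")

def string_to_boolean(s):
    """Empty => false, all other values => true."""
    return s != ""

def string_to_integer(s):
    """Convert the initial characters to an integer (assumed atoi-like)."""
    m = _INT_RE.match(s)
    return int(m.group(1)) if m else 0

def string_to_uuid(s):
    """A valid 8-4-4-4-12 string becomes a uuid; anything else is null."""
    if _UUID_RE.fullmatch(s):
        return uuid.UUID(s)
    return NULL_UUID
```

Note that `string_to_boolean("0")` is true, in line with the note on Perl earlier in this page.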

Serialization examples

<string>The quick brown fox jumped over the lazy dog.</string>
<string>540943c1-7142-4fdd-996f-fc90ed5dd3fa</string>
<string /> <!-- empty string -->

binary data

A chunk of binary data. The serialization format is allowed to specify an encoding. Parsers must support base64 encoding. Parsers may support base16 and base85.

Conversion

type	rules
boolean	empty => false, all other values => true
integer	len < 4 => 0, otherwise first four bytes are interpreted as a network byte order integer
real	len < 8 => 0, otherwise first eight bytes are interpreted as a network byte order double
uuid	len < 16 => null uuid, otherwise first sixteen bytes are interpreted as the raw binary uuid
string	the raw binary data interpreted as utf-8 character data
binary	unity
date	n/a
uri	the raw binary data interpreted as a utf-8 serialized link
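
The length-guarded conversions in this table map naturally onto Python's `struct` module. The sketch below uses hypothetical function names and assumes the integer is interpreted as signed, matching the signed 32-bit Integer type.

```python
import struct
import uuid

def binary_to_integer(data):
    """len < 4 => 0, otherwise the first four bytes, network byte order."""
    if len(data) < 4:
        return 0
    return struct.unpack(">i", data[:4])[0]

def binary_to_real(data):
    """len < 8 => 0, otherwise the first eight bytes as a big-endian double."""
    if len(data) < 8:
        return 0.0
    return struct.unpack(">d", data[:8])[0]

def binary_to_uuid(data):
    """len < 16 => null uuid, otherwise the first sixteen bytes."""
    if len(data) < 16:
        return uuid.UUID(int=0)
    return uuid.UUID(bytes=bytes(data[:16]))
```

Trailing bytes beyond the required length are ignored in each case.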

Serialization examples

<binary encoding="base64">cmFuZG9t</binary> <!-- base 64 encoded binary data -->
<binary>dGhlIHF1aWNrIGJyb3duIGZveA==</binary> <!-- base 64 encoded binary data is default -->
<binary /> <!-- empty binary blob -->
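
Decoding these elements might look like the following Python sketch, which uses the standard `base64` and `xml.etree.ElementTree` modules. The function name `decode_binary_element` is hypothetical. Base85 is left out because this page does not say which base85 variant is intended.

```python
import base64
import xml.etree.ElementTree as ET

def decode_binary_element(xml_text):
    """Decode a <binary> element into bytes; base64 is the default."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "binary":
        raise ValueError("expected a <binary> element")
    encoding = elem.get("encoding", "base64")
    # Drop any whitespace or line breaks inside the encoded text.
    text = "".join((elem.text or "").split())
    if encoding == "base64":
        return base64.b64decode(text)
    if encoding == "base16":
        return base64.b16decode(text, casefold=True)
    raise ValueError("encoding not handled in this sketch: " + encoding)
```

Applied to the examples above, the first element decodes to the bytes of "random" and the second to "the quick brown fox".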

date

A specific point in time. Intervals or relative dates are not supported. The serialization and parser only understand ISO-8601 numeric encoding in UTC. The time may be omitted which will be interpreted as midnight at the start of the day.

Conversion

type	rules
boolean	n/a
integer	seconds since epoch
real	seconds since epoch
uuid	n/a
string	standard serialization format
binary	n/a
date	unity
uri	n/a

Serialization examples

<date>2006-02-01T14:29:53.43Z</date>
<date /> <!-- epoch -->
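
Converting a serialized date to seconds since the epoch can be sketched with Python's `datetime` module. The function name `date_to_seconds` is hypothetical, and the three accepted layouts (with fractional seconds, without them, and date only) are an assumption about which ISO-8601 forms a parser would see.

```python
from datetime import datetime, timezone

def date_to_seconds(text):
    """Seconds since the epoch for an LLSD date string; empty means epoch."""
    text = text.strip()
    if text == "":
        return 0.0
    formats = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")
    for fmt in formats:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # The serialization is always UTC; a date with no time is midnight.
        return parsed.replace(tzinfo=timezone.utc).timestamp()
    raise ValueError("not a recognized date: " + text)
```

A date with the time omitted, such as `1970-01-02`, is treated as midnight at the start of that day.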

uri

A link to an external resource. The data is expected to conform to RFC 2396 for interpretation, meaning, serialization, and deserialization.

Conversion

type	rules
boolean	n/a
integer	n/a
real	n/a
uuid	n/a
string	standard serialization format
binary	n/a
date	n/a
uri	unity

Serialization examples

<uri>http://sim956.agni.lindenlab.com:12035/runtime/agents</uri>
<uri /> <!-- an empty link -->

Containers

A container is a special data type that can contain any other data type, including other containers.

map

A map of key and value pairs where key ordering is unspecified and keys are unique. The key is always interpreted as a character string, and any character string is acceptable. If there are any elements in the map, each is serialized as a key followed by an atomic or container value. For every key, there must be exactly one value. Well-formed and valid serialized maps may nevertheless contain non-unique keys. When such a map is deserialized, the implementation should choose one of the value objects, but which one is not specified.

Serialization example

<map>
 <key>foo</key>
 <string>bar</string>
 <key>agent info</key>
 <map>
  <key>agent_id</key>
  <uuid>93c73b16-cd86-434d-8b4a-76e12eee950a</uuid>
  <key>name</key>
  <string>testtest tester</string>
 </map>
</map>

array

An ordered collection of data members. Any member can be any atomic or container type.

Serialization example

<array>
 <real>7343.0194</real>
 <string>foo</string>
 <array>
  <map>
   <key>offset</key>
   <integer>9847</integer>
  </map>
  <string>da boom</string>
 </array>
</array>
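
The container rules can be illustrated with a small recursive reader in Python, again using `xml.etree.ElementTree`. The function names are hypothetical and only a handful of atomic types are handled. For duplicate keys this sketch keeps the last value seen, which is just one of the choices the text above permits.

```python
import xml.etree.ElementTree as ET

def parse_value(elem):
    """Convert one LLSD XML element into a Python value."""
    if elem.tag == "map":
        children = list(elem)
        if len(children) % 2 != 0:
            raise ValueError("every key must have exactly one value")
        result = {}
        # Children alternate: <key>, value, <key>, value, ...
        for key_elem, value_elem in zip(children[0::2], children[1::2]):
            if key_elem.tag != "key":
                raise ValueError("expected <key>, found <%s>" % key_elem.tag)
            result[key_elem.text or ""] = parse_value(value_elem)
        return result
    if elem.tag == "array":
        return [parse_value(child) for child in elem]
    text = elem.text or ""
    if elem.tag == "integer":
        return int(text) if text.strip() else 0
    if elem.tag == "real":
        return float(text) if text.strip() else 0.0
    if elem.tag == "string":
        return text
    raise ValueError("type not handled in this sketch: " + elem.tag)

def parse_llsd_xml(xml_text):
    """Parse a document whose root is <llsd> with exactly one child."""
    root = ET.fromstring(xml_text)
    if root.tag != "llsd" or len(root) != 1:
        raise ValueError("root must be <llsd> with exactly one child")
    return parse_value(root[0])
```

Wrapping the array example above in an `<llsd>` root and passing it to `parse_llsd_xml` yields a nested Python list containing a float, a string, and an inner list.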

XML Serialization

MIME type: application/llsd+xml

When possible, prefer using us-ascii or UTF-8 XML encoding.

XML is the "standard" serialization format, being future-proof and readable by a wide variety of tools. The XML serialization should be preferred unless profiling reveals that the binary serialization provides an essential performance benefit. All the serialization examples in the above sections are of the XML serialization.

DTD

<!ATTLIST string xml:space (default|preserve) 'preserve'>
<!ATTLIST binary encoding (base64|base16|base85) 'base64'>
]>

Example XML Output

This is a sample from a recently running sim (indentation added for readability):

 <key>region_id</key>
   <uuid>67153d5b-3659-afb4-8510-adda2c034649</uuid>
 <key>scale</key>
   <string>one minute</string>
 <key>simulator statistics</key>
 <map>
   <key>time dilation</key><real>0.9878624</real>
   <key>sim fps</key><real>44.38898</real>
   <key>pysics fps</key><real>44.38906</real>
   <key>agent updates per second</key><real>nan</real>
   <key>lsl instructions per second</key><real>0</real>
   <key>total task count</key><real>4</real>
   <key>active task count</key><real>0</real>
   <key>active script count</key><real>4</real>
   <key>main agent count</key><real>0</real>
   <key>child agent count</key><real>0</real>
   <key>inbound packets per second</key><real>1.228283</real>
   <key>outbound packets per second</key><real>1.277508</real>
   <key>pending downloads</key><real>0</real>
   <key>pending uploads</key><real>0.0001096525</real>
   <key>frame ms</key><real>0.7757886</real>
   <key>net ms</key><real>0.3152919</real>
   <key>sim other ms</key><real>0.1826937</real>
   <key>sim physics ms</key><real>0.04323055</real>
   <key>agent ms</key><real>0.01599029</real>
   <key>image ms</key><real>0.01865955</real>
   <key>script ms</key><real>0.1338836</real>
 </map>

</map>
</llsd>

Binary Serialization

MIME type: application/llsd+binary

We also have support for binary serialization and deserialization in C++ and Python. The binary format is useful where optimal parse time is necessary. Binary LLSD is the binary llsd prefix followed by a single LLSD element of any type.

<?llsd/binary?>\n

Binary element serialization
type	serialization	notes
undef	'!'
true	'1'
false	'0'
integer	'i' + htonl(value)
real	'r' + htond(value)
uuid	'u' + uuid	uuid is 16 bytes
binary	'b' + htonl(binary.size()) + binary
string	's' + htonl(string.size()) + string	notation serialization is considered valid
uri	'l' + htonl(uri.size()) + uri
date	'd' + htond(seconds_since_epoch)
array	'[' + htonl(array.length()) + (child0, child1, ...) + ']'	order is always preserved
map	'{' + htonl(map.length()) + ((key0,value0), (key1, value1), ...)+ '}'	order is not always preserved.

size() is a byte count.

length() is a child count.

htonl() is a function to generate a 4 byte network byte order integer.

htond() is a function to generate an 8 byte network byte order double. htond is not a standard system call, but you can find a C implementation in indra/llcommon/llsdserialize.cpp.
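
The element table can be sketched in Python with the `struct` module, where the format `">i"` or `">I"` plays the role of htonl() and `">d"` the role of htond(). The names `format_binary` and `BINARY_PREFIX` are hypothetical. The mapping from Python types to LLSD types and the use of UTF-8 for string bytes are assumptions. Maps, dates, and URIs are omitted here; in particular the table does not spell out how a map key is encoded.

```python
import struct
import uuid

BINARY_PREFIX = b"<?llsd/binary?>\n"

def format_binary(value):
    """Serialize one value following the element table above."""
    if value is None:
        return b"!"
    if value is True:
        return b"1"
    if value is False:
        return b"0"
    if isinstance(value, int):
        return b"i" + struct.pack(">i", value)       # 4 bytes, big-endian
    if isinstance(value, float):
        return b"r" + struct.pack(">d", value)       # 8 bytes, big-endian
    if isinstance(value, uuid.UUID):
        return b"u" + value.bytes                    # 16 raw bytes
    if isinstance(value, bytes):
        return b"b" + struct.pack(">I", len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")                 # size() is a byte count
        return b"s" + struct.pack(">I", len(data)) + data
    if isinstance(value, list):
        body = b"".join(format_binary(item) for item in value)
        return b"[" + struct.pack(">I", len(value)) + body + b"]"
    raise TypeError("type not handled in this sketch")
```

A complete document would then be `BINARY_PREFIX + format_binary(value)`.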

Notation Serialization

The notation serialization format (previously marked deprecated, a status now in question) is meant for human readability. It is similar to, but different from, JSON. See the comparison notes below.

Parsing and formatting are currently available in C++ (indra/llcommon/llsd.h) and in Python (llbase.llsd). Notation LLSD is the notation llsd prefix followed by a single LLSD element of any type.

<?llsd/notation?>\n

Notation element serialization
type	serialization	notes
undef	'!'
true	'1' | 't' | 'T' | 'true' | 'TRUE'
false	'0' | 'f' | 'F' | 'false' | 'FALSE'
integer	'i' str(value)
real	'r' str(value)
uuid	'u' str(uuid)
binary	'b(' str(size) ')"' raw_data '"' | 'b' base '"' encoded_data '"'	Base 16 and 64 encodings are supported.
string	" escaped_string " | ' escaped_string ' | 's(' str(size) ')"' raw_string '"'	When using single quotes, double quotes do not need escaping and vice versa.
uri	'l"' escaped_uri '"'	See rfc 1738 for encoding rules.
date	'd"' YYYY-MM-DD 'T' HH:MM:SS [.FF] 'Z"'	Fractional seconds are optional
array	'[' object0 ',' object1 ',' ... ']'	order is always preserved
map	'{' string0:object0 ',' string1:object1 ',' ... '}'	order is not always preserved. The string is any supported string serialization format
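
A formatter for a subset of this table can be sketched in Python. The function name `format_notation` is hypothetical, the mapping from Python types to LLSD types is an assumption, and strings are always written single-quoted with only backslash and single-quote escaping. Non-printable characters, binary, URI, and date values are not handled.

```python
import uuid

def format_notation(value):
    """Format one value in notation form, following the table above."""
    if value is None:
        return "!"
    if isinstance(value, bool):        # must precede the int check
        return "true" if value else "false"
    if isinstance(value, int):
        return "i" + str(value)
    if isinstance(value, float):
        return "r" + repr(value)
    if isinstance(value, uuid.UUID):
        return "u" + str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    if isinstance(value, list):
        return "[" + ",".join(format_notation(v) for v in value) + "]"
    if isinstance(value, dict):
        # Keys are assumed to be strings, as LLSD maps require.
        pairs = (format_notation(k) + ":" + format_notation(v)
                 for k, v in value.items())
        return "{" + ",".join(pairs) + "}"
    raise TypeError("type not handled in this sketch")
```

For example, `format_notation({"version": 1})` produces `{'version':i1}`.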

Example Notation Output

This is an excerpt from an agent request to enter a region serialized as notation:

[
  {'destination':'http://secondlife.com'}, 
  {'version':i1}, 
  {
    'agent_id':u3c115e51-04f4-523c-9fa6-98aff1034730, 
    'session_id':u2c585cec-038c-40b0-b42e-a25ebab4d132, 
    'circuit_code':i1075, 
    'first_name':'Phoenix', 
    'last_name':'Linden',
    'position':[r70.9247,r254.378,r38.7304], 
    'look_at':[r-0.043753,r-0.999042,r0], 
    'granters':[ua2e76fcd-9360-4f6d-a924-000000000003],
    'attachment_data':
    [
      {
        'attachment_point':i2,
        'item_id':ud6852c11-a74e-309a-0462-50533f1ef9b3,
        'asset_id':uc69b29b1-8944-58ae-a7c5-2ca7b23e22fb
      },
      {
        'attachment_point':i10, 
        'item_id':uff852c22-a74e-309a-0462-50533f1ef900,
        'asset_id':u5868dd20-c25a-47bd-8b4c-dedc99ef9479
      }
    ]
  }
]

[
  {
    'creation-date':d"2007-03-15T18:30:18Z", 
    'creator-id':u3c115e51-04f4-523c-9fa6-98aff1034730
  },
  s(10)"0123456789",
  "Where's the beef?",
  'Over here.',  
  b(160)"default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer total_number)
    {
        llSay(0, "Touched.");
    }
}",
  b64"AABAAAAAAAAAAAIAAAA//wAAP/8AAADgAAAA5wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AABkAAAAZAAAAAAAAAAAAAAAZAAAAAAAAAABAAAAAAAAAAAAAAAAAAAABQAAAAEAAAAQAAAAAAAA
AAUAAAAFAAAAABAAAAAAAAAAPgAAAAQAAAAFAGNbXgAAAABgSGVsbG8sIEF2YXRhciEAZgAAAABc
XgAAAAhwEQjRABeVAAAABQBjW14AAAAAYFRvdWNoZWQuAGYAAAAAXF4AAAAIcBEI0QAXAZUAAEAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" 
]

Notation versus JSON

The plan is to eventually move to JSON, after its spec in ECMAScript 5th edition is finalized. There are three reasons this is not a current priority:

The binary and XML serialization formats work for our use cases, so there's no driving need for JSON.
The notation format has not been useful in our experience and we expect JSON to fill a similar use niche.
JSON was not in wide, common use when LLSD was invented.

Guidelines

Questions & Things To Do

Would Binary be more convenient with unsigned char* buffer semantics?

Should Binary be convertible to/from String, and if so how?

as UTF-8 encoded strings (making it unlike UUID<->String)
as Base64 or Base96 encoded (making it like UUID<->String)

Conversions to std::string and LLUUID do not result in easy assignment to std::string, LLString or LLUUID due to non-unique conversion paths.

LLSD: Difference between revisions
