Difference between revisions of "LLSD"

From Second Life Wiki

Revision as of 16:15, 25 February 2010

Summary

Linden Lab Structured Data (LLSD) provides a flexible data system similar to the data facilities of dynamic languages like Perl and Python. It supports exchange of structured data between loosely-coupled systems (those not compiled into the same module).

Data in such exchanges must be highly tolerant of:

  • Recompilation
  • Implementation in a different language
  • Addition of extra parameters
  • Execution of older versions (with fewer parameters)

To this end, LLSD strives to be easy to use, and to default to "the right thing" wherever possible. It is extremely tolerant of errors and unexpected situations.

The fundamental class is LLSD. LLSD is a value holding object. It holds one value that is either undefined, one of the scalar types, or a map or an array. LLSD objects have value semantics (copying them copies the value, though it can be considered efficient, due to sharing), and are mutable.

Undefined is the singular value given to LLSD objects that are not initialized with any data.

The scalar data types are:

  • Boolean - true or false
  • Integer - a 32 bit signed integer
  • Real - a 64 bit IEEE 754 floating point value
  • UUID - a 128 bit unique value
  • String - a sequence of zero or more Unicode characters
  • Date - an absolute point in time, UTC, with resolution to the second
  • URI - a String that is a URI
  • Binary - a sequence of zero or more octets (unsigned bytes)

A map is a dictionary mapping String keys to LLSD values. The keys are unique within a map, and have only one value (though that value could be an LLSD array).

An array is a sequence of zero or more LLSD values.

Scalar Accessors

Function: Fetch a scalar value, converting if needed and possible.

Conversion among the basic types, Boolean, Integer, Real and String, is fully defined. Each type can be converted to another with a reasonable interpretation. These conversions can be used as a convenience even when you know the data is in one format, but you want it in another. Of course, many of these conversions lose information.

Note: These conversions are not the same as Perl's. In particular, when converting a String to a Boolean, only the empty string converts to false. Converting the String "0" to Boolean results in true.

Conversion to and from UUID, Date, and URI is only defined to and from String. Conversion is defined to be information preserving for valid values of those types. These conversions can be used when one needs to convert data to or from another system that cannot handle these types natively, but can handle strings.

Conversion to and from Binary isn't defined.

Conversion of the Undefined value to any scalar type results in a reasonable null or zero value for the type.
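
As an illustration of the rules above, here is a minimal Python sketch of conversion to Boolean. The function name as_boolean is hypothetical and is not part of any LLSD library; Python's None stands in for Undefined, and mapping Undefined to false is an assumption based on the "null or zero value" rule.

```python
def as_boolean(value):
    """Convert a basic LLSD-style value to a Boolean (illustrative only)."""
    if value is None:
        # Undefined converts to the "zero" value of the target type.
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Exactly zero is false; every other number is true.
        return value != 0
    if isinstance(value, str):
        # Only the empty string is false, so "0" is true (unlike Perl).
        return value != ""
    raise TypeError("no Boolean conversion sketched for %r" % type(value))
```

Note that as_boolean("0") is true under these rules, which is the difference from Perl called out above.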


Automatic Cast Protection

These are not implemented on purpose. Without them, C++ can perform some conversions that are clearly not what the programmer intended.

If you get a linker error about these being missing, you have made a mistake in your code. DO NOT IMPLEMENT THESE FUNCTIONS as a fix.

All of these problems stem from trying to support char* in LLSD or in std::string. There are too many automatic casts that would convert an arbitrary pointer or scalar type to std::string.

Attributes and Data

Attributes are only used for encoding parser and formatting instructions. The data in the elements is always data.

Root Element

The root element is llsd. The root must have only one child element which can be any container or atomic type.

Atomic Types

Each atomic type represents one value with type information. An atomic does not have a name, but may have attributes to specify format or processing considerations for the parser. Consumers of atomics are encouraged to massage the data into the preferred native representation, but further serialization should honor the original type information if possible.

The default value for any type is represented in XML as <type />. For example, <integer /> represents zero (0).

Datatype | Description | Default Value | Examples
Undefined | A placeholder for a value. | undef | undef
Boolean | A true or false value. | False | true; 0
Integer | A signed 32-bit integer value; any whole number between -2147483648 and 2147483647 inclusive. | 0 | <integer>289343</integer>; -3
Real | A signed 64-bit floating point number. | 0.0 | -0.28334; 2983287453.3848387
String | A sequence of zero or more Unicode characters intended to be human readable. See the IETF draft for details. | "" (the empty string) | The quick brown fox, etc.; foo
UUID | A universally unique identifier: an unsigned 128-bit integer. | 00000000-0000-0000-0000-000000000000 | d7f4aeca-88f1-42a1-b385-b9db18abb255
Date | A specific point in time, UTC; values range from January 1, 1970, through at least January 1, 2038. The serialization and parser only understand ISO-8601 numeric encoding in UTC. | The "epoch," Midnight, January 1, 1970 (UTC). | 2006-02-01T14:29:53.43Z
URI | A Uniform Resource Identifier, specifying a link to an external resource and conforming to RFC 2396. | The empty URI. | http://sim956.agni.lindenlab.com:12035/runtime/agents
Binary | A sequence of zero or more octets. Serialization format may include base64 (the default), base16, and base85. | A sequence of zero octets | base64: cmFuZG9t; base64: dGhlIHF1aWNrIGJyb3duIGZveA==


undefined

The undefined type is a placeholder to indicate something is there, but it has no value, and cannot be converted to any other atomic type. Though limited in this way, an undefined is still considered a first-class atomic, and is expected to behave like any other atomic structured data type at runtime.

Serialization example

<undef />

boolean

A true or false value.

Conversion

type rules
boolean unity
integer true => 1, false => 0
real true => 1.0, false => 0.0
uuid n/a
string 'true', 'false'
binary one byte us-ascii where true => 1, false => 0
date n/a
uri n/a

Serialization examples

<!-- true -->
<boolean>1</boolean>
<boolean>true</boolean>

<!-- false -->
<boolean>0</boolean>
<boolean>false</boolean>
<boolean />

integer

A signed integer value with a representation of 32 bits.

Conversion

type rules
boolean 0 => false, all other values => true
integer unity
real closest representable number
uuid n/a
string human readable string
binary 4 byte network byte order representation
date seconds since epoch
uri n/a

Serialization examples

<integer>289343</integer>
<integer>-3</integer>
<integer /> <!-- zero -->

real

A 64 bit double as defined by IEEE.

Conversion

type rules
boolean exactly 0 => false, all other values => true
integer rounded to closest representable number
real unity
uuid n/a
string human readable string
binary 8 byte network byte order representation
date seconds since epoch
uri n/a

Serialization examples

<real>-0.28334</real>
<real>2983287453.3848387</real>
<real /> <!-- exactly zero -->

uuid

A 128 bit unsigned integer.

Conversion

type rules
boolean null uuid => false, all other values => true
integer n/a
real n/a
uuid unity
string standard 8-4-4-4-12 serialization format
binary 16 byte raw representation
date n/a
uri n/a

Serialization examples

<uuid>d7f4aeca-88f1-42a1-b385-b9db18abb255</uuid>
<uuid /> <!-- null uuid '00000000-0000-0000-0000-000000000000' -->

string

A simple string of any character data which is intended to be human comprehensible.

Strings in the system that hold text a user might see or enter (chat, IM, notecards, AV names, region names,... basically almost everything!) should move to using a consistent set of acceptable characters. This set is:

  • Unicode code points U+20 through U+10FFFD
    • except U+D800 through U+DFFF
    • except U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF ... U+10FFFE, U+10FFFF
    • except U+FDD0 through U+FDEF
  • U+9 (tab, '\t')
  • U+A (newline or line feed, '\n')
  • U+D (carriage return, '\r')

Strings may be sequences of zero or more of these characters. Strings *may* be normalized by mapping line ending sequences to U+A. Do not rely on differences in strings that normalize to the same string.
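
The acceptable character set and the optional normalization described above can be sketched in Python as follows. The function names are hypothetical, and treating "line ending sequences" as CR LF and bare CR is an assumption borrowed from XML's normalization rule.

```python
def is_acceptable_code_point(cp):
    """Return True if the code point is in the acceptable set listed above."""
    if cp in (0x9, 0xA, 0xD):
        return True                      # tab, line feed, carriage return
    if cp < 0x20 or cp > 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:
        return False                     # surrogate range
    if 0xFDD0 <= cp <= 0xFDEF:
        return False
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
        return False                     # U+FFFE, U+FFFF, U+1FFFE, ...
    return True


def is_acceptable_string(text):
    """Return True if every character of text is acceptable."""
    return all(is_acceptable_code_point(ord(ch)) for ch in text)


def normalize_line_endings(text):
    """Map CR LF and bare CR to LF (assumed meaning of 'line ending sequences')."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```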

These choices are drawn from Unicode 4.0, which defines the following valid code points:

  • Unicode code points U+0 through U+10FFFD
    • except U+D800 through U+DFFF (the UTF-16 surrogate pair range)
    • except U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF ... U+10FFFE, U+10FFFF
    • except U+FDD0 through U+FDEF (some historical screw up with Arabic)

The choice for special characters < U+20 is because XML defines acceptable text as all valid Unicode code points >= U+20, and U+9, U+A and U+D. The normalization is because XML defines that all line ending sequences are normalized to U+A.

See: Unicode In 5 Minutes for a brief introduction to Unicode.

String escaping

Strings delimited with single or double quotes require escaping when they contain non-printable characters or the delimiter itself. If a single quote delimited string contains single quotes, those must be escaped. If a double quote delimited string contains double quotes, the double quotes must be escaped.

To escape the delimiter character, prefix a backslash. Backslashes must always be escaped with another backslash.

"And then he said, \"I have nothing more to say on the subject.\""
'Look in "C:\\linden\\"'

The most generic escaping is to specify a hex value of the byte after a literal backslash and character 'x'. This can be used for any character and is required for all non-printable characters which do not have an abbreviation. For example:

 \x0C

Serialized strings should only contain UTF-8 characters, so non-printable characters other than tab, newline, and carriage return should be avoided. However, common non-printable characters have short-hand abbreviations.

Notation abbreviations
character value serialization
alert/bell 0x7 \a
backspace 0x8 \b
form feed 0xc \f
newline 0xa \n
carriage return 0xd \r
horizontal tab 0x9 \t
vertical tab 0xb \v
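
The escaping rules and abbreviations above might be applied as in the following Python sketch. The function name escape_string is hypothetical. Falling back to a \x escape for the remaining characters below 0x20 and for 0x7F is an assumption about which characters count as non-printable, and characters outside ASCII are passed through unchanged.

```python
# Short-hand abbreviations from the table above, keyed by character value.
_ABBREVIATIONS = {
    0x07: "\\a", 0x08: "\\b", 0x0C: "\\f", 0x0A: "\\n",
    0x0D: "\\r", 0x09: "\\t", 0x0B: "\\v",
}


def escape_string(text, delimiter="'"):
    """Return text wrapped in the delimiter, escaped per the rules above."""
    if delimiter not in ("'", '"'):
        raise ValueError("delimiter must be a single or double quote")
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")              # backslashes are always escaped
        elif ch == delimiter:
            out.append("\\" + ch)           # only the active delimiter is escaped
        elif code in _ABBREVIATIONS:
            out.append(_ABBREVIATIONS[code])
        elif code < 0x20 or code == 0x7F:
            out.append("\\x%02X" % code)    # generic hex escape
        else:
            out.append(ch)
    return delimiter + "".join(out) + delimiter
```

Applied to the path example above with a single quote delimiter, the double quotes are left alone and each backslash is doubled.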

Conversion

Type Rules
boolean Empty => false, all other values => true
integer A simple conversion of the initial characters to an integer
real A simple conversion of the initial characters to a real number
uuid A valid 8-4-4-4-12 is converted to a uuid, all other values => null uuid
string Unity
binary Raw representation of the characters
date An interpretation of the string as a date
uri An interpretation of the string as a link
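
Two of the String conversions in the table above, sketched in Python. The function names are hypothetical. Returning 0 when the string does not start with a number is an assumption (it mirrors the behavior of C's atoi), and the sketch does not clamp the result to 32 bits.

```python
import re
import uuid

_INT_PREFIX = re.compile(r"\s*[+-]?\d+")
_UUID_8_4_4_4_12 = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def string_to_integer(text):
    """Convert the initial characters of text to an integer, else 0 (assumed)."""
    match = _INT_PREFIX.match(text)
    return int(match.group()) if match else 0


def string_to_uuid(text):
    """Convert a valid 8-4-4-4-12 string to a UUID; anything else is the null UUID."""
    if _UUID_8_4_4_4_12.fullmatch(text):
        return uuid.UUID(text)
    return uuid.UUID(int=0)
```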

Serialization examples

<string>The quick brown fox jumped over the lazy dog.</string>
<string>540943c1-7142-4fdd-996f-fc90ed5dd3fa</string>
<string /> <!-- empty string -->

binary data

A chunk of binary data. The serialization format is allowed to specify an encoding. Parsers must support base64 encoding. Parsers may support base16 and base85.

Conversion

type rules
boolean empty => false, all other values => true
integer len < 4 => 0, otherwise first four bytes are interpreted as a network byte order integer
real len < 8 => 0, otherwise first eight bytes are interpreted as a network byte order double
uuid len < 16 => null uuid, otherwise first sixteen bytes are interpreted as the raw binary uuid
string the raw binary data interpreted as utf-8 character data
binary unity
date n/a
uri the raw binary data interpreted as a utf-8 serialized link
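
The length checks in the table above can be sketched with Python's struct and uuid modules. The function names are hypothetical; ">i" and ">d" select big-endian (network byte order) 32-bit signed integers and 64-bit doubles.

```python
import struct
import uuid


def binary_to_integer(data):
    """Fewer than 4 bytes gives 0; otherwise read the first four bytes."""
    if len(data) < 4:
        return 0
    return struct.unpack(">i", data[:4])[0]


def binary_to_real(data):
    """Fewer than 8 bytes gives 0.0; otherwise read the first eight bytes."""
    if len(data) < 8:
        return 0.0
    return struct.unpack(">d", data[:8])[0]


def binary_to_uuid(data):
    """Fewer than 16 bytes gives the null UUID; otherwise use the first sixteen."""
    if len(data) < 16:
        return uuid.UUID(int=0)
    return uuid.UUID(bytes=data[:16])
```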

Serialization examples

<binary encoding="base64">cmFuZG9t</binary> <!-- base 64 encoded binary data -->
<binary>dGhlIHF1aWNrIGJyb3duIGZveA==</binary> <!-- base 64 encoded binary data is default -->
<binary /> <!-- empty binary blob -->
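
To see what the examples above contain, the element text can be decoded with Python's standard base64 module. The helper name decode_binary is hypothetical. base85 is left out because this page does not say which base85 variant is meant.

```python
import base64


def decode_binary(text, encoding="base64"):
    """Decode the text content of a <binary> element into raw bytes."""
    compact = "".join(text.split())      # drop whitespace and line breaks
    if encoding == "base64":
        return base64.b64decode(compact)
    if encoding == "base16":
        return base64.b16decode(compact, casefold=True)
    raise ValueError("unsupported encoding in this sketch: %s" % encoding)
```

With this helper, the first example decodes to the bytes of "random" and the second to "the quick brown fox".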

date

A specific point in time. Intervals or relative dates are not supported. The serialization and parser only understand ISO-8601 numeric encoding in UTC. The time may be omitted which will be interpreted as midnight at the start of the day.

Conversion

type rules
boolean n/a
integer seconds since epoch
real seconds since epoch
uuid n/a
string standard serialization format
binary n/a
date unity
uri n/a

Serialization examples

<date>2006-02-01T14:29:53.43Z</date>
<date /> <!-- epoch -->
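
A minimal Python sketch of turning the serialized form into seconds since the epoch. The function name parse_date and the exact set of accepted layouts are assumptions; a real parser may accept more ISO-8601 variants.

```python
from datetime import datetime, timezone

# Layouts tried in order: fractional seconds, whole seconds, date only.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d")


def parse_date(text):
    """Return seconds since the epoch (UTC) for an LLSD date string."""
    text = text.strip()
    if not text:
        return 0.0                        # <date /> is the epoch
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # The value is always UTC; a date-only value becomes midnight.
        return parsed.replace(tzinfo=timezone.utc).timestamp()
    raise ValueError("unrecognized date: %r" % text)
```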

uri

A link to an external resource. The data is expected to conform to rfc 2396 for interpretation, meaning, serialization, and deserialization.

Conversion

type rules
boolean n/a
integer n/a
real n/a
uuid n/a
string standard serialization format
binary n/a
date n/a
uri unity

Serialization examples

<uri>http://sim956.agni.lindenlab.com:12035/runtime/agents</uri>
<uri /> <!-- an empty link -->

Containers

A container is a special data type that can contain any other data type, including other containers.

map

A map of key and value pairs where key ordering is unspecified and keys are unique. The key is always interpreted as a character string and any character string is acceptable. If there are any elements in the map, it is serialized as a key followed by an atomic or container value. For every key, there must be one value. A well-formed and valid serialized map may nevertheless contain duplicate keys. When such a map is deserialized, the implementation should choose one of the value objects, but which one is not specified.

Serialization example

<map>
 <key>foo</key>
 <string>bar</string>
 <key>agent info</key>
 <map>
  <key>agent_id</key>
  <uuid>93c73b16-cd86-434d-8b4a-76e12eee950a</uuid>
  <key>name</key>
  <string>testtest tester</string>
 </map>
</map>

array

An ordered collection of data members. Any member can be any atomic or container type.

Serialization example

<array>
 <real>7343.0194</real>
 <string>foo</string>
 <array>
  <map>
   <key>offset</key>
   <integer>9847</integer>
  </map>
  <string>da boom</string>
 </array>
</array>
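
The map and array examples above can be read into native Python values with a small recursive parser. This is only a sketch built on the standard xml.etree.ElementTree module, not the Linden Lab implementation. The function names are hypothetical, dates and URIs are kept as plain strings, binary is assumed to be base64, and a repeated key simply keeps the last value seen (the choice is unspecified).

```python
import base64
import uuid
import xml.etree.ElementTree as ET


def parse_element(element):
    """Convert one LLSD XML element into a Python value."""
    tag = element.tag
    text = (element.text or "")
    if tag == "undef":
        return None
    if tag == "boolean":
        return text.strip() in ("1", "true")
    if tag == "integer":
        return int(text) if text.strip() else 0
    if tag == "real":
        return float(text) if text.strip() else 0.0
    if tag == "uuid":
        return uuid.UUID(text.strip()) if text.strip() else uuid.UUID(int=0)
    if tag in ("string", "uri", "date"):
        return text
    if tag == "binary":
        return base64.b64decode("".join(text.split()))
    if tag == "array":
        return [parse_element(child) for child in element]
    if tag == "map":
        children = list(element)
        result = {}
        # Children alternate: <key>, value, <key>, value, ...
        for key, value in zip(children[0::2], children[1::2]):
            result[key.text or ""] = parse_element(value)
        return result
    raise ValueError("unknown LLSD element: %s" % tag)


def parse_llsd_xml(document):
    """Parse a complete <llsd> document holding exactly one child element."""
    root = ET.fromstring(document)
    if root.tag != "llsd" or len(root) != 1:
        raise ValueError("expected <llsd> with exactly one child")
    return parse_element(root[0])
```

For instance, wrapping the array example above in an llsd root element yields a Python list containing a float, a string, and a nested list.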

XML Serialization

MIME type: application/llsd+xml

When possible, prefer using us-ascii or UTF-8 XML encoding.

XML is the "standard" serialization format, being future-proof and readable by a wide variety of tools. The XML serialization should be preferred unless profiling reveals that the binary serialization provides an essential performance benefit. All the serialization examples in the above sections are of the XML serialization.

DTD

<!DOCTYPE llsd [
<!ELEMENT llsd (DATA)>
<!ELEMENT DATA (ATOMIC|map|array)>
<!ELEMENT ATOMIC (undef|boolean|integer|real|uuid|string|date|uri|binary)>
<!ELEMENT KEYDATA (key,DATA)>
<!ELEMENT key (#PCDATA)>
<!ELEMENT map (KEYDATA*)>
<!ELEMENT array (DATA*)>
<!ELEMENT undef (EMPTY)>
<!ELEMENT boolean (#PCDATA)>
<!ELEMENT integer (#PCDATA)>
<!ELEMENT real (#PCDATA)>
<!ELEMENT uuid (#PCDATA)>
<!ELEMENT string (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT uri (#PCDATA)>
<!ELEMENT binary (#PCDATA)>

<!ATTLIST string xml:space (default|preserve) 'preserve'>
<!ATTLIST binary encoding (base64|base16|base85) 'base64'>
]>

Example XML Output

This is a sample from a recently running sim (indented for readability):

<?xml version="1.0" encoding="UTF-8"?>
<llsd>
<map>
 <key>region_id</key>
   <uuid>67153d5b-3659-afb4-8510-adda2c034649</uuid>
 <key>scale</key>
   <string>one minute</string>
 <key>simulator statistics</key>
 <map>
   <key>time dilation</key><real>0.9878624</real>
   <key>sim fps</key><real>44.38898</real>
   <key>pysics fps</key><real>44.38906</real>
   <key>agent updates per second</key><real>nan</real>
   <key>lsl instructions per second</key><real>0</real>
   <key>total task count</key><real>4</real>
   <key>active task count</key><real>0</real>
   <key>active script count</key><real>4</real>
   <key>main agent count</key><real>0</real>
   <key>child agent count</key><real>0</real>
   <key>inbound packets per second</key><real>1.228283</real>
   <key>outbound packets per second</key><real>1.277508</real>
   <key>pending downloads</key><real>0</real>
   <key>pending uploads</key><real>0.0001096525</real>
   <key>frame ms</key><real>0.7757886</real>
   <key>net ms</key><real>0.3152919</real>
   <key>sim other ms</key><real>0.1826937</real>
   <key>sim physics ms</key><real>0.04323055</real>
   <key>agent ms</key><real>0.01599029</real>
   <key>image ms</key><real>0.01865955</real>
   <key>script ms</key><real>0.1338836</real>
 </map>
</map>
</llsd>

Binary Serialization

MIME type: application/llsd+binary

We also support binary serialization and deserialization in C++ and Python. The binary format is useful where fast parse time is necessary. Binary LLSD is the binary llsd prefix followed by a single LLSD element of any type.

<?llsd/binary?>\n
Binary element serialization
type serialization notes
undef '!'
true '1'
false '0'
integer 'i' + htonl(value)
real 'r' + htond(value)
uuid 'u' + uuid uuid is 16 bytes
binary 'b' + htonl(binary.size()) + binary
string 's' + htonl(string.size()) + string notation serialization is considered valid
uri 'l' + htonl(uri.size()) + uri
date 'd' + htond(seconds_since_epoch)
array '[' + htonl(array.length()) + (child0, child1, ...) + ']' order is always preserved
map '{' + htonl(map.length()) + ((key0,value0), (key1, value1), ...)+ '}' order is not always preserved.

size() is a byte count.

length() is a child count.

htonl() is a function to generate a 4 byte network byte order integer.

htond() is a function to generate an 8 byte network byte order double. htond is not a standard system call, but you can find a C implementation in indra/llcommon/llsdserialize.cpp.
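
The table above can be sketched in Python with the standard struct module, where ">i" and ">I" play the role of htonl() and ">d" the role of htond(). The function names are hypothetical. Maps, URIs, and dates are left out of this sketch: the table does not spell out how a map key is encoded, and Python has no distinct native types for the other two. Strings are assumed to be sized by their UTF-8 byte count.

```python
import struct
import uuid

BINARY_PREFIX = b"<?llsd/binary?>\n"


def format_binary(value):
    """Serialize one value using the element rules in the table above."""
    if value is None:
        return b"!"
    if isinstance(value, bool):          # check bool before int
        return b"1" if value else b"0"
    if isinstance(value, int):
        return b"i" + struct.pack(">i", value)   # raises if outside 32 bits
    if isinstance(value, float):
        return b"r" + struct.pack(">d", value)
    if isinstance(value, uuid.UUID):
        return b"u" + value.bytes                # 16 raw bytes
    if isinstance(value, bytes):
        return b"b" + struct.pack(">I", len(value)) + value
    if isinstance(value, str):
        encoded = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(encoded)) + encoded
    if isinstance(value, list):
        children = b"".join(format_binary(child) for child in value)
        return b"[" + struct.pack(">I", len(value)) + children + b"]"
    raise TypeError("type not covered by this sketch: %r" % type(value))


def serialize_binary(value):
    """Prefix plus a single serialized element."""
    return BINARY_PREFIX + format_binary(value)
```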

Notation Serialization

The (deprecated) notation serialization format is meant for human readability. It is similar to, but different from, JSON. See the comparison notes below.

Parsing and formatting are currently only available in C++. Notation LLSD is the notation llsd prefix followed by a single LLSD element of any type.

<?llsd/notation?>\n
Notation element serialization
type serialization notes
undef '!'
true '1' | 't' | 'T' | 'true' | 'TRUE'
false '0' | 'f' | 'F' | 'false' | 'FALSE'
integer 'i' str(value)
real 'r' str(value)
uuid 'u' str(uuid)
binary 'b(' str(size) ')"' raw_data '"' | 'b' base '"' encoded_data '"' Base 16 and 64 encodings are supported.
string " escaped_string " | ' escaped_string ' | 's(' str(size) ')"' raw_string '"' When using single quotes, double quotes do not need escaping and vice versa.
uri 'l"' escaped_uri '"' See rfc 1738 for encoding rules.
date 'd"' YYYY-MM-DD 'T' HH:MM:SS [.FF] 'Z"' Fractional seconds are optional
array '[' object0 ',' object1 ',' ... ']' order is always preserved
map '{' string0:object0 ',' string1:object1 ',' ... '}' order is not always preserved. The string is any supported string serialization format
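
For illustration only, a small Python formatter that follows the table above. The function names are hypothetical. It always writes strings with single quotes, writes Booleans as true and false, escapes only backslashes and the delimiter, and does not cover binary, URI, or date values.

```python
import uuid


def _quote(text):
    """Single-quote a string, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def format_notation(value):
    """Serialize a Python value as one notation element."""
    if value is None:
        return "!"
    if isinstance(value, bool):          # check bool before int
        return "true" if value else "false"
    if isinstance(value, int):
        return "i%d" % value
    if isinstance(value, float):
        return "r" + repr(value)
    if isinstance(value, uuid.UUID):
        return "u" + str(value)
    if isinstance(value, str):
        return _quote(value)
    if isinstance(value, list):
        return "[" + ",".join(format_notation(item) for item in value) + "]"
    if isinstance(value, dict):
        pairs = (_quote(key) + ":" + format_notation(item)
                 for key, item in value.items())
        return "{" + ",".join(pairs) + "}"
    raise TypeError("type not covered by this sketch: %r" % type(value))
```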


Example Notation Output

This is an excerpt from an agent request to enter a region serialized as notation:

[
  {'destination':'http://secondlife.com'}, 
  {'version':i1}, 
  {
    'agent_id':u3c115e51-04f4-523c-9fa6-98aff1034730, 
    'session_id':u2c585cec-038c-40b0-b42e-a25ebab4d132, 
    'circuit_code':i1075, 
    'first_name':'Phoenix', 
    'last_name':'Linden',
    'position':[r70.9247,r254.378,r38.7304], 
    'look_at':[r-0.043753,r-0.999042,r0], 
    'granters':[ua2e76fcd-9360-4f6d-a924-000000000003],
    'attachment_data':
    [
      {
        'attachment_point':i2,
        'item_id':ud6852c11-a74e-309a-0462-50533f1ef9b3,
        'asset_id':uc69b29b1-8944-58ae-a7c5-2ca7b23e22fb
      },
      {
        'attachment_point':i10, 
        'item_id':uff852c22-a74e-309a-0462-50533f1ef900,
        'asset_id':u5868dd20-c25a-47bd-8b4c-dedc99ef9479
      }
    ]
  }
]
[
  {
    'creation-date':d"2007-03-15T18:30:18Z", 
    'creator-id':u3c115e51-04f4-523c-9fa6-98aff1034730
  },
  s(10)"0123456789",
  "Where's the beef?",
  'Over here.',  
  b(160)"default
{
    state_entry()
    {
        llSay(0, "Hello, Avatar!");
    }

    touch_start(integer total_number)
    {
        llSay(0, "Touched.");
    }
}",
  b64"AABAAAAAAAAAAAIAAAA//wAAP/8AAADgAAAA5wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AABkAAAAZAAAAAAAAAAAAAAAZAAAAAAAAAABAAAAAAAAAAAAAAAAAAAABQAAAAEAAAAQAAAAAAAA
AAUAAAAFAAAAABAAAAAAAAAAPgAAAAQAAAAFAGNbXgAAAABgSGVsbG8sIEF2YXRhciEAZgAAAABc
XgAAAAhwEQjRABeVAAAABQBjW14AAAAAYFRvdWNoZWQuAGYAAAAAXF4AAAAIcBEI0QAXAZUAAEAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" 
]

Notation versus JSON

The plan is to eventually move to JSON, after its spec in ECMAScript 5th edition is finalized. There are three reasons this is not a current priority:

  1. The binary and XML serialization formats work for our use cases, so there's no driving need for JSON.
  2. The notation format has not been useful in our experience and we expect JSON to fill a similar use niche.
  3. JSON was not in wide, common use when LLSD was invented.

Guidelines

Questions & Things To Do

Would Binary be more convenient with unsigned char* buffer semantics?

Should Binary be convertible to/from String, and if so how?

  • as UTF8 encoded strings (making it unlike UUID<->String)
  • as Base64 or Base96 encoded (making it like UUID<->String)

Conversions to std::string and LLUUID do not result in easy assignment to std::string, LLString or LLUUID due to non-unique conversion paths.