User:LepreKhaun Resident/Workaround4Escaped Chars within JsonText 0ld

From Second Life Wiki
Revision as of 14:45, 23 September 2013 by Kireji Haiku (talk | contribs) (readability++)

Workaround for Escaped Characters within Json Text ***Old***

NOTE: This is an archived page showing earlier thoughts on this problem. It is kept only for reference, to show how one can initially choose the wrong approach to a problem and worry it to death before the light bulb goes on. Please refer to this page for the final, much more elegant workaround.

[ETA: The reasoning here was correct; it was just the approach that was faulty. I kept thinking it had to be kludged. :=)]


Because of the way LSL handles strings (we have no "raw strings", which are taken as written and not messed with), an escape sequence such as \t is converted to 4 spaces as soon as it is encountered. Trying to encode \t by escaping the escape character (using \\t) results in \\t being placed (incorrectly) within your Json text. The same goes for newlines, \n.

And it's worse when you try to encode something like \"Stop!\" he shouted. or She said \"No\". And a Unicode escape such as \u7650 is perfectly valid within a Json text but is elusive to obtain using LSL strings.
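A minimal sketch for checking this behaviour in-world. It uses only llStringLength(), llList2Json() and llOwnerSay(); the comments restate what this page reports rather than a guaranteed result.

<lsl>
default
{
    state_entry()
    {
        // "\t" is expanded to four spaces, so this literal is 4 characters long
        llOwnerSay((string)llStringLength("\t"));
        // "\\t" is a backslash followed by the letter t: 2 characters
        llOwnerSay((string)llStringLength("\\t"));
        // Per the paragraph above, the Json encoder then escapes that backslash again
        llOwnerSay(llList2Json(JSON_ARRAY, ["\\t"]));
    }
}
</lsl>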

Here's the only workaround I've been able to work out. A kludge, granted, but at least it allows one to encode something like this:

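<lsl>
// Target Json text:
// {
//    "A": "\b\f\t\r \n aba \u0000",
//    "B": "\"he\"",
//    "C": "\t"
// }

default
{
    state_entry()
    {
        // URL-escape each value so the Json functions never see a backslash
        string i = llEscapeURL("\\b\\f\\t\\r \\n aba \\u0000");
        string j = llEscapeURL("\\\"he\\\"");

        string jText = llList2Json(JSON_OBJECT, ["A", i, "B", j]);
        jText = llJsonSetValue(jText, ["C"], llEscapeURL("\\t"));

        // Unescape the whole text at the end to restore the backslashes
        jText = llUnescapeURL(jText);
        // jText is now {"A":"\b\f\t\r \n aba \u0000","B":"\"he\"","C":"\t"}
    }
}
</lsl>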


[ETA: 9/13/2013]

So, I was working on the LSL string mishandling of the escape character (\) and, looking deeper into it, have not only found another workaround but made an exciting discovery: LSL does have "raw strings" of a sort, and they are the JSON_STRING! First the alternate workaround:

<lsl>
// NOTE: Deprecated 9/19/2013 and replaced by
// uList2Json() and uJsonSetValue()

// Global constants
integer QUOTE = 0; // '\"' (Double Quote)
integer SLOSH = 1; // '\\' (Reverse Solidus)
integer SLASH = 2; // '\/' (Solidus)
integer BP = 3;    // '\b' (Backspace)
integer FF = 4;    // '\f' (Form Feed)
integer NL = 5;    // '\n' (New Line)
integer CR = 6;    // '\r' (Carriage Return)
integer TAB = 7;   // '\t' (Tab)
integer U_ = 8;    /* '\u' (Unicode Prefix- MUST immediately precede
                      a string of 4 Hex digits, 0-F, sans '0x') */

// Optional, included for completeness only
integer CRLF = 9;  // '\r\n' (Windows end-of-line)

//////////////////////////////
// function string uList2JsonStringSafe (list jasonStringParts)
// This function takes a list, jasonStringParts,
// of the parts of the Json string one wishes and
// returns a LSL string within double quotes ("")
// with embedded escape characters within it that
// correctly encodes as a Json string using either
// llList2Json() or llJsonSetValue().
//
// NOTE: Deprecated 9/19/2013 and replaced by
// uList2Json() and uJsonSetValue()
// Version 1.0 by LepreKhaun 9/9/2013
// May be freely used, modified and distributed with this header intact.
// Compiled Size = 2,088 bytes
///////////////////////////////
string uList2JsonStringSafe (list jasonStringParts) {

   list escapeCodes = ["%5C%22", "%5C%5C", "%5C/", "%5Cb", "%5Cf", "%5Cn", "%5Cr", "%5Ct", "%5Cu", "%5Cr%5Cn"];
   integer iter = llGetListLength(jasonStringParts);
   // rString must be enclosed with escaped double quotes
   // to keep the LSL String "enhanced features" out of play
   string rString = "\"";
   // build return string 'backwards'
   while (~--iter) 
   {
       if(llGetListEntryType(jasonStringParts, iter) == TYPE_INTEGER)
       {
           // substitute encoding for integer constants
           rString = llList2String(escapeCodes, llList2Integer(jasonStringParts, iter)) + rString;
       }
       else
       {
           // escape String chunks to preserve them properly
           rString = llEscapeURL(llList2String(jasonStringParts, iter)) + rString;
       }
   }
   return llUnescapeURL("\"" + rString);

}

///////////
// Example encodings showing usage
///////////

default {

   touch_end(integer i)
   {
       string jsonString;
       string jsonText;
       // To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
       jsonString = uList2JsonStringSafe([QUOTE, "Go!", QUOTE, " he yelled.", NL, "She replied ", QUOTE, "No!", QUOTE]);
       jsonText = llList2Json(JSON_OBJECT, ["A", jsonString]);
       jsonString = uList2JsonStringSafe([SLOSH, "escaped ", SLOSH, " slosh", SLOSH]);
       jsonText = llJsonSetValue(jsonText, ["Z"], jsonString);
       llOwnerSay(jsonText);
       // To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
       jsonString = uList2JsonStringSafe([BP, CR, FF, NL, TAB, " and Windows uses ", CRLF, " for EOL"]);
       jsonText = llList2Json(JSON_OBJECT, ["Control Chars", jsonString]);
       jsonString = uList2JsonStringSafe([U_, "00A9"]);
       jsonText = llJsonSetValue(jsonText, ["©"], jsonString);
       llOwnerSay(jsonText);
       // To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
       jsonString = uList2JsonStringSafe(["http:", SLASH, SLASH, "my.com", SLASH, "ask.php?what%20is%20it"]);
       jsonText = llList2Json(JSON_ARRAY, ["WebSite", jsonString]);
       jsonText = llJsonSetValue(jsonText, [JSON_APPEND], uList2JsonStringSafe([TAB]));
       llOwnerSay(jsonText);
   }

}</lsl>

How and why this approach works is based on an earlier observation of mine: Json text (LSL strings enclosed within '{}' or '[]') was being handled differently from other LSL strings, in that its enclosed escape codes (such as \t) were not being translated (to %09 or %20%20%20%20), a "feature" LSL strings have.

I then noticed a difference in definitions between RFC 4627 and JSON.org. The RFC defines a Json text to be either an array or an object but at json.org it's defined as any Json Value, including the JSON_STRING. And a JSON_STRING is defined, of course, as being enclosed within double quotes (""). So I began experimenting with that type of LSL string and found the same exception to "enhanced features" was afforded!

But then another problem surfaced: the LSL functions llJsonGetValue() and llJson2List() extract a JSON_STRING as a regular LSL string, resulting in these escaped character sequences being "enhanced" by translation (in other words, \t becomes %09, which is further "enhanced" to %20%20%20%20 when chatted, and \u23B5 becomes u23B5). Grrrrr.... This wasn't good for further processing; we needed a string that preserves these sequences after the extraction.
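A small sketch for observing this extraction behaviour. Passing the extracted value through llEscapeURL() makes otherwise invisible characters show up in chat; this page reports %09 for the \t case.

<lsl>
default
{
    state_entry()
    {
        // Json text {"C":"\t"}: the value is the two characters backslash + t
        string jText = "{\"C\":\"\\t\"}";
        // Extraction hands back a regular LSL string
        string value = llJsonGetValue(jText, ["C"]);
        // URL-escape the result so any control character is visible when chatted
        llOwnerSay(llEscapeURL(value));
    }
}
</lsl>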

And that led to the development of uJsonGetValueSafe(), which returns the requested Value explicitly enclosed within double quotes (""), just as it appears within the Json text...

And, of course, this was complicated by the RFC stating:

Insignificant whitespace is allowed before or after any of the six structural characters.

  ws = *(
            %x20 /            ; Space
            %x09 /            ; Horizontal tab
            %x0A /            ; Line feed or New line
            %x0D                ; Carriage return
        )

Hooboy!
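To give a feel for what that rule means for any hand-rolled scanning of Json text, here is a rough sketch of skipping the four whitespace characters. The function name uSkipJsonWhitespace() is hypothetical (it is not part of this page's library or of LSL), and the whitespace set is built with llUnescapeURL() because LSL string literals cannot express a real tab or carriage return.

<lsl>
// Hypothetical helper: returns the index of the first character at or
// after pos that is not one of the RFC's four whitespace characters.
integer uSkipJsonWhitespace(string text, integer pos)
{
    // space, horizontal tab, line feed, carriage return
    string ws = llUnescapeURL("%20%09%0A%0D");
    integer len = llStringLength(text);
    // LSL evaluates both sides of &&, so the bounds test only gates the result
    while (pos < len && ~llSubStringIndex(ws, llGetSubString(text, pos, pos)))
    {
        ++pos;
    }
    return pos;
}

default
{
    state_entry()
    {
        // Json text padded with all four whitespace characters in front
        string padded = llUnescapeURL("%20%09%0A%0D") + "{\"A\":1}";
        // Reports the index of the opening brace
        llOwnerSay((string)uSkipJsonWhitespace(padded, 0));
    }
}
</lsl>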



See LepreKhaun Resident's page for more JSON tips, tricks and coding examples.