Difference between revisions of "User:LepreKhaun Resident/Workaround4Escaped Chars within JsonText 0ld"

From Second Life Wiki
m (readability++)
(Undo revision 1181785 (since this is an old, archived personal page, leave it as is, please))

Revision as of 16:22, 23 September 2013

Workaround for Escaped Characters within Json Text ***Old***

NOTE: This is an archived page showing earlier thoughts on this problem. It is kept only for reference, to show how one can initially choose the wrong approach to a problem and worry it to death before the light bulb goes on. Please refer to this page for the final, much more elegant workaround.

[ETA: The reasoning here was correct, it was just the approach that was faulty, kept thinking it had to be kludged. :=)]


Because of the way LSL handles strings (we have no "raw strings", which are taken as written and not messed with), escape sequences such as "\t" are interpreted as 4 spaces for us as soon as they are encountered. Trying to encode "\t" by escaping the escape character (using "\\t") results in (incorrectly) placing '\\t' within your Json text. Same for newlines, "\n".

And it's worse when you try to encode something like "\"Stop!\" he shouted." or "She said \"No\"". And UTF encoding such as "\u7650" is perfectly valid within a Json text but is elusive to obtain using LSL strings.
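A quick way to watch this happen is the minimal sketch below. It is not part of the original scripts; it only uses stock LSL functions and prints what the compiler and llList2Json() actually produce, so you can compare the results yourself rather than take them on faith.

<lsl>
default
{
    state_entry()
    {
        // By the time this runs, the compiler has already replaced \t
        string tab = "\t";
        // Here the backslash itself is escaped, leaving backslash + 't'
        string escaped = "\\t";

        // Compare the lengths of the two strings as LSL stores them
        llOwnerSay("length of \\t literal:  " + (string)llStringLength(tab));
        llOwnerSay("length of \\\\t literal: " + (string)llStringLength(escaped));

        // See what ends up inside the Json text in each case
        // (llEscapeURL() makes every character visible as %XX)
        llOwnerSay(llEscapeURL(llList2Json(JSON_OBJECT, ["C", tab])));
        llOwnerSay(llEscapeURL(llList2Json(JSON_OBJECT, ["C", escaped])));
    }
}
</lsl>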

Here's the only workaround I've been able to come up with. A kludge, granted, but at least it allows one to encode something like this:

{
	"A": "\b\f\t\r \n aba \u0000",
	"B": "\"he\"", 
	"C": "\t"
}

        string jText;
        string i = llEscapeURL("\\b\\f\\t\\r \\n aba \\u0000");
        string j = llEscapeURL("\\\"he\\\"");
        
        jText = llList2Json(JSON_OBJECT, ["A", i, "B", j]);
        jText = llJsonSetValue(jText, ["C"], llEscapeURL("\\t"));
        jText = llUnescapeURL(jText);

jText == {"A":"\b\f\t\r \n aba \u0000","B":"\"he\"","C":"\t"}


[ETA: 9/13/2013]

So, I was working on the LSL string mis-handling of the escape character ("\") and, looking deeper into it, have not only found another workaround but made an exciting discovery: LSL does have "raw strings" of a sort, and they are the JSON_STRING! First, the alternate workaround:

<lsl>
// NOTE: Deprecated 9/19/2013 and replaced by
// uList2Json() and uJsonSetValue()

// Global constants
integer QUOTE = 0; // '\"' (Double Quote)
integer SLOSH = 1; // '\\' (Reverse Solidus)
integer SLASH = 2; // '\/' (Solidus)
integer BP    = 3; // '\b' (Backspace)
integer FF    = 4; // '\f' (Form Feed)
integer NL    = 5; // '\n' (New Line)
integer CR    = 6; // '\r' (Carriage Return)
integer TAB   = 7; // '\t' (Tab)
integer U_    = 8; /* '\u' (Unicode Prefix- MUST immediately precede
                      a string of 4 Hex digits, 0-F, sans '0x') */
// Optional, included for completeness only
integer CRLF  = 9; // '\r\n' (Windows end-of-line)

//////////////////////////////
// function string uList2JsonStringSafe (list jasonStringParts)
// This function takes a list, jasonStringParts,
// of the parts of the Json string one wishes and
// returns a LSL string within double quotes ("")
// with embedded escape characters within it that
// correctly encodes as a Json string using either
// llList2Json() or llJsonSetValue().
//
// NOTE: Deprecated 9/19/2013 and replaced by
// uList2Json() and uJsonSetValue()
// Version 1.0 by LepreKhaun 9/9/2013
// May be freely used, modified and distributed with this header intact.
// Compiled Size = 2,088 bytes
///////////////////////////////
string uList2JsonStringSafe (list jasonStringParts)
{
    // URL-encoded escape sequences, indexed by the constants above
    list escapeCodes = ["%5C%22", "%5C%5C", "%5C/", "%5Cb", "%5Cf", "%5Cn", "%5Cr", "%5Ct", "%5Cu", "%5Cr%5Cn"];

    integer iter = llGetListLength(jasonStringParts);

    // rString must be enclosed with escaped double quotes
    // to keep the LSL String "enhanced features" out of play
    string rString = "\"";

    // build return string 'backwards'
    while (~--iter)
    {
        if (llGetListEntryType(jasonStringParts, iter) == TYPE_INTEGER)
        {
            // substitute encoding for integer constants
            rString = llList2String(escapeCodes, llList2Integer(jasonStringParts, iter)) + rString;
        }
        else
        {
            // escape String chunks to preserve them properly
            rString = llEscapeURL(llList2String(jasonStringParts, iter)) + rString;
        }
    }
    return llUnescapeURL("\"" + rString);
}

///////////
// Example encodings showing usage
///////////

default
{
    touch_end(integer i)
    {
        string jsonString;
        string jsonText;

        // To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
        jsonString = uList2JsonStringSafe([QUOTE, "Go!", QUOTE, " he yelled.", NL, "She replied ", QUOTE, "No!", QUOTE]);
        jsonText = llList2Json(JSON_OBJECT, ["A", jsonString]);
        jsonString = uList2JsonStringSafe([SLOSH, "escaped ", SLOSH, " slosh", SLOSH]);
        jsonText = llJsonSetValue(jsonText, ["Z"], jsonString);
        llOwnerSay(jsonText);

        // To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
        jsonString = uList2JsonStringSafe([BP, CR, FF, NL, TAB, " and Windows uses ", CRLF, " for EOL"]);
        jsonText = llList2Json(JSON_OBJECT, ["Control Chars", jsonString]);
        jsonString = uList2JsonStringSafe([U_, "00A9"]);
        jsonText = llJsonSetValue(jsonText, ["©"], jsonString);
        llOwnerSay(jsonText);

        // To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
        jsonString = uList2JsonStringSafe(["http:", SLASH, SLASH, "my.com", SLASH, "ask.php?what%20is%20it"]);
        jsonText = llList2Json(JSON_ARRAY, ["WebSite", jsonString]);
        jsonText = llJsonSetValue(jsonText, [JSON_APPEND], uList2JsonStringSafe([TAB]));
        llOwnerSay(jsonText);
    }
}
</lsl>

How and why this approach works comes from an earlier observation of mine: Json text (LSL strings enclosed within '{}' or '[]') was being handled differently from other LSL strings, in that its enclosed escape codes (such as '\t') were not being translated (to '%09' or '%20%20%20%20'), a "feature" ordinary LSL strings have.

I then noticed a difference in definitions between RFC 4627 and JSON.org. The RFC defines a Json text to be either an array or an object but at json.org it's defined as any Json Value, including the JSON_STRING. And a JSON_STRING is defined, of course, as being enclosed within double quotes (""). So I began experimenting with that type of LSL string and found the same exception to "enhanced features" was afforded!
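To repeat that experiment, one can simply ask the Json parser how it classifies a bare double-quoted string. This is only a minimal sketch using the stock llJsonValueType() function with an empty specifier list; the variable name is made up for illustration, and the script reports whatever the parser says rather than assuming an answer.

<lsl>
default
{
    state_entry()
    {
        // An LSL string holding the 5 characters "abc", double quotes included
        string quoted = "\"abc\"";

        // An empty specifier list asks about the top-level Value itself
        if (llJsonValueType(quoted, []) == JSON_STRING)
            llOwnerSay("quoted form is reported as JSON_STRING");
        else
            llOwnerSay("quoted form is reported as something else");
    }
}
</lsl>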

But then another problem surfaced: the LSL functions llJsonGetValue() and llJson2List() extract a JSON_STRING as a regular LSL String, resulting in these escaped character sequences being "enhanced" by translation (in other words, '\t' becomes '%09', which is further "enhanced" to '%20%20%20%20' when chatted, and '\u23B5' becomes 'u23B5'). Grrrrr.... This wasn't good for further processing; we needed a String that preserves these after the extraction.
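The extraction problem can be inspected with a few lines like the following. This is a minimal sketch, not the deprecated function above or its replacements: it builds a small Json text by hand, extracts a Value with llJsonGetValue(), and passes both through llEscapeURL() so that every character is visible as %XX for comparison.

<lsl>
default
{
    state_entry()
    {
        // Json text {"C":"\t"} written as an LSL literal:
        // each \" yields a double quote and \\ yields one backslash
        string jText = "{\"C\":\"\\t\"}";

        // Pull the Value out as an ordinary LSL string
        string extracted = llJsonGetValue(jText, ["C"]);

        // llEscapeURL() makes otherwise invisible characters visible
        llOwnerSay("whole text: " + llEscapeURL(jText));
        llOwnerSay("extracted:  " + llEscapeURL(extracted));
    }
}
</lsl>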

And that led to the development of uJsonGetValueSafe(), which returns the requested Value explicitly enclosed within double quotes (""), just as it appears within the Json text...

And, of course, this was complicated by the RFC stating:

 Insignificant whitespace is allowed before or after any of the six
 structural characters.

    ws = *(
            %x20 /              ; Space
            %x09 /              ; Horizontal tab
            %x0A /              ; Line feed or New line
            %x0D                ; Carriage return
        )

Hooboy!
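In practice this means the same Json text can arrive tightly packed or padded with spaces, tabs and line breaks, and any hand-rolled extraction has to cope with both. The minimal sketch below (variable names are made up for illustration) feeds a compact and a padded version of the same object to llJsonGetValue() so the results can be compared side by side.

<lsl>
default
{
    state_entry()
    {
        // Same object twice: once compact, once with whitespace
        // around the structural characters
        string compact = "{\"A\":1,\"B\":[2,3]}";
        string padded  = "{ \"A\" : 1 ,\n    \"B\" : [ 2 , 3 ] }";

        // Extract the second element of the array stored under "B"
        llOwnerSay("compact: " + llJsonGetValue(compact, ["B", 1]));
        llOwnerSay("padded:  " + llJsonGetValue(padded, ["B", 1]));
    }
}
</lsl>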



== More Json Tips, Tricks and Coding Examples ==