User:LepreKhaun Resident/Workaround4Escaped Chars within JsonText

From Second Life Wiki
Revision as of 11:39, 13 September 2013

Workaround for Escaped Characters within Json Text

Because of the way LSL handles strings (it has no "raw strings", which are taken exactly as written and not messed with), escape sequences in string literals are translated as soon as they are encountered: "\t", for example, becomes 4 spaces. Trying to encode "\t" by escaping the escape character ("\\t") doesn't help either, because the backslash then gets escaped in turn, (incorrectly) placing '\\t' within your Json text. The same goes for newlines, "\n".

It's worse still when you try to encode something like "\"Stop!\" he shouted." or "She said \"No\"". And a Unicode escape such as "\u7650" is perfectly valid within a Json text but is elusive to obtain using LSL strings.
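A quick way to see this for yourself is a throwaway script like the one below. It is a sketch that assumes nothing beyond standard LSL functions: it passes a few literals through llEscapeURL() so that otherwise invisible characters show up in chat as %XX codes. The messages are illustrative only.

<lsl>default
{
    state_entry()
    {
        // "\t" is translated by the compiler itself (to spaces, per the text above)
        llOwnerSay("\\t literal -> " + llEscapeURL("\t"));
        // "\\t" is a real backslash followed by 't', i.e. two characters
        llOwnerSay("\\\\t literal -> " + llEscapeURL("\\t") + " (length "
            + (string)llStringLength("\\t") + ")");
        // llUnescapeURL() can produce a genuine tab character, which no literal can
        llOwnerSay("%09 unescaped -> " + llEscapeURL(llUnescapeURL("%09")));
        // the same trick yields the six characters \u7650 of a Json Unicode escape
        string uEscape = llUnescapeURL("%5Cu7650");
        llOwnerSay("\\u7650 sequence length = " + (string)llStringLength(uEscape));
    }
}</lsl>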

Here's the only workaround I've been able to come up with. It's a kludge, granted, but at least it allows one to encode something like this:

{
	"A": "\b\f\t\r \n aba \u0000",
	"B": "\"he\"", 
	"C": "\t"
}

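        default
        {
            state_entry()
            {
                // URL-escape each value so llList2Json() leaves the backslashes alone
                string i = llEscapeURL("\\b\\f\\t\\r \\n aba \\u0000");
                string j = llEscapeURL("\\\"he\\\"");
                string jText = llList2Json(JSON_OBJECT, ["A", i, "B", j]);
                jText = llJsonSetValue(jText, ["C"], llEscapeURL("\\t"));
                // Decode the whole text to restore the single-backslash escapes
                jText = llUnescapeURL(jText);
                llOwnerSay(jText);
            }
        }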

jText == {"A":"\b\f\t\r \n aba \u0000","B":"\"he\"","C":"\t"}
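To check that text built this way is actually accepted by LSL's own Json parser, a small sketch (assuming only standard LSL functions) can ask llJsonValueType() about one of its values; that function returns JSON_INVALID when it cannot parse the text. One caveat worth keeping in mind: because llUnescapeURL() runs over the whole text, it would also decode any other %XX sequence sitting in a key or in a value that was not passed through llEscapeURL() first.

<lsl>default
{
    state_entry()
    {
        // Same workaround on a single value: escape it, build, then unescape everything
        string jText = llList2Json(JSON_OBJECT, ["C", llEscapeURL("\\t")]);
        jText = llUnescapeURL(jText);

        // A well-formed result should report JSON_STRING for key "C"
        if (llJsonValueType(jText, ["C"]) == JSON_STRING)
            llOwnerSay("Parsed OK: " + jText);
        else
            llOwnerSay("LSL could not parse: " + jText);
    }
}</lsl>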


[ETA: 9/13/2013]

So, I was working on LSL's mishandling of the escape character ("\") and, looking deeper into it, have not only found another workaround but made an exciting discovery: LSL does have "raw strings" of a sort, and they are the JSON_STRING! First, the alternate workaround:

<lsl>// Global constants
integer QUOTE = 0; // '\"' (Double Quote)
integer SLOSH = 1; // '\\' (Reverse Solidus)
integer SLASH = 2; // '\/' (Solidus)
integer BP    = 3; // '\b' (Backspace)
integer FF    = 4; // '\f' (Form Feed)
integer NL    = 5; // '\n' (New Line)
integer CR    = 6; // '\r' (Carriage Return)
integer TAB   = 7; // '\t' (Tab)
integer U_    = 8; /* '\u' (Unicode Prefix- MUST immediately precede
                      a string of 4 Hex digits, 0-F, sans '0x') */
// Optional, included for completeness only
integer CRLF  = 9; // '\r\n' (Windows end-of-line)

//////////////////////////////
// function string uList2JsonStringSafe (list jasonStringParts)
// This function takes a list, jasonStringParts,
// of the parts of the Json string one wishes and
// returns a LSL string within double quotes ("")
// with embedded escape characters within it that
// correctly encodes as a Json string using either
// llList2Json() or llJsonSetValue().
//
// Version 1.0 by LepreKhaun 9/9/2013
// May be freely used, modified and distributed with this header intact.
// Compiled Size = 2,088 bytes
///////////////////////////////
string uList2JsonStringSafe (list jasonStringParts)
{
    list escapeCodes = ["%5C%22", "%5C%5C", "%5C/", "%5Cb", "%5Cf", "%5Cn", "%5Cr", "%5Ct", "%5Cu", "%5Cr%5Cn"];

    integer iter = llGetListLength(jasonStringParts);

    // rString must be enclosed with escaped double quotes
    // to keep the LSL String "enhanced features" out of play
    string rString = "\"";

    // build return string 'backwards'
    while (~--iter)
    {
        if (llGetListEntryType(jasonStringParts, iter) == TYPE_INTEGER)
        {
            // substitute encoding for integer constants
            rString = llList2String(escapeCodes, llList2Integer(jasonStringParts, iter)) + rString;
        }
        else
        {
            // escape String chunks to preserve them properly
            rString = llEscapeURL(llList2String(jasonStringParts, iter)) + rString;
        }
    }
    return llUnescapeURL("\"" + rString);
}

///////////
// Example encodings showing usage
///////////

default
{
    touch_end(integer i)
    {
        string jsonString;
        string jsonText;

        // To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
        jsonString = uList2JsonStringSafe([QUOTE, "Go!", QUOTE, " he yelled.", NL, "She replied ", QUOTE, "No!", QUOTE]);
        jsonText = llList2Json(JSON_OBJECT, ["A", jsonString]);
        jsonString = uList2JsonStringSafe([SLOSH, "escaped ", SLOSH, " slosh", SLOSH]);
        jsonText = llJsonSetValue(jsonText, ["Z"], jsonString);
        llOwnerSay(jsonText);

        // To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
        jsonString = uList2JsonStringSafe([BP, CR, FF, NL, TAB, " and Windows uses ", CRLF, " for EOL"]);
        jsonText = llList2Json(JSON_OBJECT, ["Control Chars", jsonString]);
        jsonString = uList2JsonStringSafe([U_, "00A9"]);
        jsonText = llJsonSetValue(jsonText, ["©"], jsonString);
        llOwnerSay(jsonText);

        // To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
        jsonString = uList2JsonStringSafe(["http:", SLASH, SLASH, "my.com", SLASH, "ask.php?what%20is%20it"]);
        jsonText = llList2Json(JSON_ARRAY, ["WebSite", jsonString]);
        jsonText = llJsonSetValue(jsonText, [JSON_APPEND], uList2JsonStringSafe([TAB]));
        llOwnerSay(jsonText);
    }
}</lsl>

Why this approach works rests on an earlier observation of mine: Json text (LSL strings enclosed within '{}' or '[]') is handled differently from other LSL strings, in that the escape codes it encloses (such as '\t') are not translated (to '%09' or '%20%20%20%20'), a "feature" other LSL strings have.

I then noticed a difference in definitions between RFC 4627 (http://tools.ietf.org/html/rfc4627) and JSON.org (http://www.json.org/). The RFC defines a Json text to be either an array or an object, but json.org defines it as any Json value, including the JSON_STRING. And a JSON_STRING is, of course, enclosed within double quotes (""). So I began experimenting with LSL strings of that form and found they were afforded the same exemption from the "enhanced features"!
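A minimal sketch of that experiment, assuming the 2013 behaviour described above (later server versions may well treat quoted values differently): it hands llList2Json() the same backslash-t sequence once bare and once wrapped in double quotes, and chats both results so they can be compared.

<lsl>default
{
    state_entry()
    {
        string bare   = "\\t";     // backslash + 't' (2 characters)
        string quoted = "\"\\t\""; // the same, wrapped in double quotes (4 characters)

        // Per the observation above, the bare value should have its backslash
        // escaped, while the quoted one should be taken as an existing JSON_STRING
        llOwnerSay("bare:   " + llList2Json(JSON_ARRAY, [bare]));
        llOwnerSay("quoted: " + llList2Json(JSON_ARRAY, [quoted]));
    }
}</lsl>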

But then another problem surfaced: the LSL functions llJsonGetValue() and llJson2List() extract a JSON_STRING as a regular LSL string, so these escape sequences get "enhanced" by translation: '\t' becomes '%09' (which is further "enhanced" to '%20%20%20%20' when chatted), and '\u23B5' becomes 'u23B5'. Grrrrr.... This wasn't good for further processing; we needed a string that preserves these sequences after extraction.
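The extraction problem is easy to reproduce with a sketch like this one, which assumes only standard LSL functions. The Json text is written directly as a literal (no llList2Json() involved), and llEscapeURL() is applied to whatever comes back so the difference between a decoded tab (%09) and the original two-character escape (%5Ct) is visible in chat.

<lsl>default
{
    state_entry()
    {
        // A Json text whose value holds the escape \t (backslash, 't')
        string jText = "{\"C\":\"\\t\"}";

        // llJsonGetValue() returns the value as a plain LSL string
        string value = llJsonGetValue(jText, ["C"]);
        llOwnerSay("llJsonGetValue: " + llEscapeURL(value));

        // llJson2List() returns [key, value, ...] for an object; index 1 is the value
        llOwnerSay("llJson2List:    " + llEscapeURL(llList2String(llJson2List(jText), 1)));
    }
}</lsl>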

And that led to the development of uJsonGetValueSafe() (wiki page LepreKhaun_Resident/Json_Get_Value_Safe), which returns the requested value explicitly enclosed within double quotes (""), just as it appears within the Json text...
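uJsonGetValueSafe() lives on its own page and is not reproduced here. As a rough illustration of the opposite direction, turning an already extracted LSL string back into a quoted Json string, here is a hypothetical sketch. uReplaceAll() and uJsonQuote() are names made up for this example; it assumes llEscapeURL() emits upper-case hex digits, and it only re-escapes backslash, double quote, newline, carriage return and tab (other control characters are left alone).

<lsl>// Hypothetical helper: replace every occurrence of pattern in src
string uReplaceAll(string src, string pattern, string replacement)
{
    return llDumpList2String(llParseStringKeepNulls(src, [pattern], []), replacement);
}

// Hypothetical helper (not the author's uJsonGetValueSafe): re-encode a plain
// LSL string as a Json string enclosed in double quotes
string uJsonQuote(string value)
{
    // After llEscapeURL() only letters, digits and %XX triples remain,
    // so each %XX pattern below can only match a whole escaped character
    string s = llEscapeURL(value);
    // Backslash first, so the backslashes added below are not doubled again
    s = uReplaceAll(s, "%5C", "%5C%5C");
    s = uReplaceAll(s, "%22", "%5C%22");
    s = uReplaceAll(s, "%0A", "%5Cn");
    s = uReplaceAll(s, "%0D", "%5Cr");
    s = uReplaceAll(s, "%09", "%5Ct");
    // Wrap in double quotes and decode back to ordinary characters
    return llUnescapeURL("%22" + s + "%22");
}

default
{
    state_entry()
    {
        string jText = "{\"C\":\"\\t\"}";
        string value = llJsonGetValue(jText, ["C"]);
        llOwnerSay(uJsonQuote(value));
    }
}</lsl>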

And, of course, this was complicated by the RFC stating:

   Insignificant whitespace is allowed before or after any of the six
   structural characters.

      ws = *(
                %x20 /              ; Space
                %x09 /              ; Horizontal tab
                %x0A /              ; Line feed or New line
                %x0D                ; Carriage return
            )

Hooboy!
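If values may arrive padded with that insignificant whitespace, one option is to strip it before further processing. Below is a hypothetical sketch; uJsonTrimWs() is an illustrative name, not an existing function. Note that the whitespace set has to be built with llUnescapeURL(), because the literal "\t" would give spaces rather than a tab character.

<lsl>// Hypothetical helper: strip RFC 4627 insignificant whitespace
// (space, tab, line feed, carriage return) from both ends of s
string uJsonTrimWs(string s)
{
    string ws = llUnescapeURL("%20%09%0A%0D");
    integer first = 0;
    integer last = llStringLength(s) - 1;
    // Advance past leading whitespace
    while (first <= last && ~llSubStringIndex(ws, llGetSubString(s, first, first)))
        ++first;
    // Back up past trailing whitespace
    while (last >= first && ~llSubStringIndex(ws, llGetSubString(s, last, last)))
        --last;
    if (first > last)
        return "";
    return llGetSubString(s, first, last);
}

default
{
    state_entry()
    {
        string padded = llUnescapeURL("%0A%09%20%22abc%22%20%0D%0A");
        llOwnerSay("[" + uJsonTrimWs(padded) + "]");
    }
}</lsl>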



== More Json Tips, Tricks and Coding Examples ==