Difference between revisions of "User:LepreKhaun Resident/Workaround4Escaped Chars within JsonText 0ld"
(Created page with "=== Workaround for Escaped Characters within Json Text ***Old*** === '''NOTE: This is an archived page showing earlier thoughts to this problem and is kept only for reference to…")
Kireji Haiku (talk | contribs) m (readability++)
Revision as of 14:45, 23 September 2013
Workaround for Escaped Characters within Json Text ***Old***
NOTE: This is an archived page showing earlier thoughts on this problem, kept only for reference to show how one can initially choose the wrong approach to a problem and worry it to death before the light bulb goes on. Please refer to this page for the final, much more elegant workaround.
[ETA: The reasoning here was correct, it was just the approach that was faulty, kept thinking it had to be kludged. :=)]
Because of the way LSL handles strings (we have no "raw strings", which are taken as written and not messed with), escape sequences such as <code>\t</code> are interpreted as 4 spaces for us as soon as they are encountered. Trying to encode <code>\t</code> by escaping the escape character (using <code>\\t</code>) results in (incorrectly) placing <code>\\t</code> within your Json text. The same goes for newlines, <code>\n</code>.
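A minimal sketch for checking this in-world, assuming a fresh script in any prim. Nothing here is special to the workaround; it only reports what the compiler made of each literal and what <code>llList2Json()</code> then does with a backslash, URL-escaped so nothing is hidden by chat.

<lsl>
default
{
    state_entry()
    {
        // Length of the literal "\t" after the compiler has handled it
        llOwnerSay("\\t literal length: " + (string)llStringLength("\t"));

        // Length of the literal "\\t" (a backslash followed by the letter t)
        llOwnerSay("\\\\t literal length: " + (string)llStringLength("\\t"));

        // What ends up in the Json text when that backslash is encoded,
        // shown URL-escaped so every character is visible
        llOwnerSay(llEscapeURL(llList2Json(JSON_ARRAY, ["\\t"])));
    }
}
</lsl>

Comparing the two lengths and the escaped output against the description above is a quick way to confirm the behaviour on the current server version.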
And it's worse when you try to encode something like <code>\"Stop!\" he shouted.</code> or <code>She said \"No\"</code>. And a Unicode escape such as <code>\u7650</code> is perfectly valid within a Json text but is elusive to obtain using LSL strings.
Here's the only workaround I've been able to work out. A kludge, granted, but at least it allows one to encode something like this:
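<lsl>
// Target Json text:
// {
//     "A": "\b\f\t\r \n aba \u0000",
//     "B": "\"he\"",
//     "C": "\t"
// }

default
{
    state_entry()
    {
        // URL-escape each value so its backslashes travel as %5C
        string i = llEscapeURL("\\b\\f\\t\\r \\n aba \\u0000");
        string j = llEscapeURL("\\\"he\\\"");

        string jText;

        jText = llList2Json(JSON_OBJECT, ["A", i, "B", j]);
        jText = llJsonSetValue(jText, ["C"], llEscapeURL("\\t"));

        // Unescape the whole Json text at the very end
        jText = llUnescapeURL(jText);
        // intended result: {"A":"\b\f\t\r \n aba \u0000","B":"\"he\"","C":"\t"}
    }
}
</lsl>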
[ETA: 9/13/2013]
So, I was working on the LSL string mis-handling of the escape character (<code>\</code>) and, looking deeper into it, have not only found another workaround but made an exciting discovery: LSL does have "raw strings" of a sort, and they are the JSON_STRING! First the alternate workaround:
<lsl>
// NOTE: Deprecated 9/19/2013 and replaced by
// uList2Json() and uJsonSetValue()

// Global constants
integer QUOTE = 0; // '\"' (Double Quote)
integer SLOSH = 1; // '\\' (Reverse Solidus)
integer SLASH = 2; // '\/' (Solidus)
integer BP    = 3; // '\b' (Backspace)
integer FF    = 4; // '\f' (Form Feed)
integer NL    = 5; // '\n' (New Line)
integer CR    = 6; // '\r' (Carriage Return)
integer TAB   = 7; // '\t' (Tab)
integer U_    = 8; /* '\u' (Unicode Prefix- MUST immediately precede
                      a string of 4 Hex digits, 0-F, sans '0x') */
// Optional, included for completeness only
integer CRLF  = 9; // '\r\n' (Windows end-of-line)

//////////////////////////////
// function string uList2JsonStringSafe (list jasonStringParts)
// This function takes a list, jasonStringParts,
// of the parts of the Json string one wishes and
// returns a LSL string within double quotes ("")
// with embedded escape characters within it that
// correctly encodes as a Json string using either
// llList2Json() or llJsonSetValue().
//
// NOTE: Deprecated 9/19/2013 and replaced by
// uList2Json() and uJsonSetValue()
// Version 1.0 by LepreKhaun 9/9/2013
// May be freely used, modified and distributed with this header intact.
// Compiled Size = 2,088 bytes
///////////////////////////////
string uList2JsonStringSafe (list jasonStringParts)
{
    // URL-escaped escape sequences, indexed by the constants above
    list escapeCodes = ["%5C%22", "%5C%5C", "%5C/", "%5Cb", "%5Cf", "%5Cn", "%5Cr", "%5Ct", "%5Cu", "%5Cr%5Cn"];
    integer iter = llGetListLength(jasonStringParts);

    // rString must be enclosed with escaped double quotes
    // to keep the LSL String "enhanced features" out of play
    string rString = "\"";

    // build return string 'backwards'
    while (~--iter)
    {
        if (llGetListEntryType(jasonStringParts, iter) == TYPE_INTEGER)
        {
            // substitute encoding for integer constants
            rString = llList2String(escapeCodes, llList2Integer(jasonStringParts, iter)) + rString;
        }
        else
        {
            // escape String chunks to preserve them properly
            rString = llEscapeURL(llList2String(jasonStringParts, iter)) + rString;
        }
    }
    return llUnescapeURL("\"" + rString);
}

///////////
// Example encodings showing usage
///////////
default
{
    touch_end(integer i)
    {
        string jsonString;
        string jsonText;

        // To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
        jsonString = uList2JsonStringSafe([QUOTE, "Go!", QUOTE, " he yelled.", NL, "She replied ", QUOTE, "No!", QUOTE]);
        jsonText = llList2Json(JSON_OBJECT, ["A", jsonString]);
        jsonString = uList2JsonStringSafe([SLOSH, "escaped ", SLOSH, " slosh", SLOSH]);
        jsonText = llJsonSetValue(jsonText, ["Z"], jsonString);
        llOwnerSay(jsonText);

        // To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
        jsonString = uList2JsonStringSafe([BP, CR, FF, NL, TAB, " and Windows uses ", CRLF, " for EOL"]);
        jsonText = llList2Json(JSON_OBJECT, ["Control Chars", jsonString]);
        jsonString = uList2JsonStringSafe([U_, "00A9"]);
        jsonText = llJsonSetValue(jsonText, ["©"], jsonString);
        llOwnerSay(jsonText);

        // To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
        jsonString = uList2JsonStringSafe(["http:", SLASH, SLASH, "my.com", SLASH, "ask.php?what%20is%20it"]);
        jsonText = llList2Json(JSON_ARRAY, ["WebSite", jsonString]);
        jsonText = llJsonSetValue(jsonText, [JSON_APPEND], uList2JsonStringSafe([TAB]));
        llOwnerSay(jsonText);
    }
}
</lsl>
How and why this approach works goes back to an earlier observation of mine: Json text (LSL strings enclosed within '{}' or '[]') was being handled differently than other LSL strings, in that the enclosed escape codes (such as <code>\t</code>) were not being translated (to <code>%09</code> or <code>%20%20%20%20</code>), a "feature" LSL strings have.
I then noticed a difference in definitions between RFC 4627 and JSON.org. The RFC defines a Json text to be either an array or an object but at json.org it's defined as any Json Value, including the JSON_STRING. And a JSON_STRING is defined, of course, as being enclosed within double quotes (""). So I began experimenting with that type of LSL string and found the same exception to "enhanced features" was afforded!
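One way to see which definition LSL follows is to ask it directly. The sketch below assumes only the standard <code>llJsonValueType()</code> function with an empty specifier list, which reports the type of the whole text; the sample string is made up for illustration.

<lsl>
default
{
    state_entry()
    {
        // An LSL string whose content is a double-quoted Json string
        string quoted = "\"plain text\"";

        // An empty specifier list asks for the type of the entire text
        if (llJsonValueType(quoted, []) == JSON_STRING)
        {
            llOwnerSay("Treated as a JSON_STRING (json.org style Json text)");
        }
        else
        {
            llOwnerSay("Not treated as a JSON_STRING");
        }
    }
}
</lsl>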
But then another problem surfaced: the LSL functions llJsonGetValue() and llJson2List() extract a JSON_STRING as a regular LSL String, resulting in these escaped character sequences being "enhanced" by translation (in other words, <code>\t</code> becomes <code>%09</code>, which is further "enhanced" to <code>%20%20%20%20</code> when chatted, and <code>\u23B5</code> becomes <code>u23B5</code>). Grrrrr.... This wasn't good for further processing; we needed a String that preserves these after the extraction.
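The extraction problem can be observed with a sketch like the following. It assumes the Json text <code>{"C":"\t"}</code> is built by unescaping a hand-written URL-escaped string (so the backslash survives), and it URL-escapes the extracted value before chatting it so that chat itself does not alter what is shown.

<lsl>
default
{
    state_entry()
    {
        // %7B = {   %22 = "   %3A = :   %5C = \   %7D = }
        // Unescapes to the Json text {"C":"\t"}
        string jText = llUnescapeURL("%7B%22C%22%3A%22%5Ct%22%7D");

        // Extract the value as a regular LSL string
        string value = llJsonGetValue(jText, ["C"]);

        // Show exactly which characters came back
        llOwnerSay("Extracted: " + llEscapeURL(value));
    }
}
</lsl>

If the output no longer contains <code>%5Ct</code>, the escape sequence was translated during extraction, which is the situation described above.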
And that led to the development of uJsonGetValueSafe(), which returns the requested Value explicitly enclosed within double quotes (""), just as it appears within the Json text...
And, of course, this was complicated by the RFC stating:
<pre>
   Insignificant whitespace is allowed before or after any of the six
   structural characters.

      ws = *(
                %x20 /              ; Space
                %x09 /              ; Horizontal tab
                %x0A /              ; Line feed or New line
                %x0D                ; Carriage return
            )
</pre>
Hooboy!
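To get a feel for what that whitespace rule means in practice, here is a small sketch that feeds a Json text padded with spaces around its structural characters to <code>llJsonGetValue()</code>. The sample text is an assumption made up for illustration, and only spaces are used, since tabs and line breaks in LSL literals bring their own complications.

<lsl>
default
{
    state_entry()
    {
        // Spaces before and after { } [ ] : and ,
        string jText = " { \"A\" : [ 1 , 2 ] } ";

        // Ask for the second element of the array stored under "A"
        llOwnerSay("Value: " + llJsonGetValue(jText, ["A", 1]));

        // Round-trip through a list to see how the padding is treated
        llOwnerSay("Rebuilt: " + llList2Json(JSON_OBJECT, llJson2List(jText)));
    }
}
</lsl>

Any code that hunts for a value's surrounding double quotes by character position has to allow for this padding.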