Difference between revisions of "User:LepreKhaun Resident/Workaround4Escaped Chars within JsonText"

From Second Life Wiki
(Added User Function uList2JsonStringSafe and observations)
(Complete rewrite. Thank you Maestro for the hint!)
Line 1: Line 1:
===Workaround for Escaped Characters within Json Text===
=== uList2Json() and uJsonSetValue()===


Because of the way LSL handles strings (we have no "raw strings", which are taken as written and not messed with), escape sequences such as "\t" are interpreted as 4 spaces for us as soon as they are encountered. Trying to encode "\t" by escaping the escape character (using "\\t") results in (incorrectly) placing '\\t' within your Json text. Same for newlines, "\n".
As many of you may be aware, LSL has the habit of "enhancing" Strings. This is regarded as a "feature" of the language and usually works out for the best, giving one the option of formatting chatted text by using "\t" and "\n". Unfortunately, one didn't have a way to opt out of this behavior. Put in computerese, LSL simply lacked "raw strings".


And it's worse when you try to encode something like "\"Stop!\" he shouted." or "She said \"No\"". And UTF encoding such as "\u7650" is perfectly valid within a Json text but is elusive to obtain using LSL strings.
This has bedeviled those working with Json text, either for web communications or developing other uses for it, because some strings just wouldn't encode properly. That is to say, these are all perfectly valid Json strings that simply couldn't be directly formed with llList2Json() and llJsonSetValue():
*"\"Go!\" he yelled.\n"
*"She replied \"No!\""
*"Copyright symbol is \u00A9"
*"oops]"
*"Control characters are \t\n\r\f\b"


Here's the only work around I've been able to work out. A kludge, granted, but at least it allows one to encode something like this:
I spent a few weeks studying the problem, [[User:LepreKhaun_Resident/Workaround4Escaped_Chars_within_JsonText_0ld|most of it going about it the wrong way]], but then had an epiphany. A one-line addition Maestro Linden made to [[Json_usage_in_LSL|Json Usage in LSL]] on the 10th ("LSL strings which both begin and end with "\"" are interpreted literally as JSON strings, while those without are parsed when converted into JSON.") confirmed what I had begun to surmise: a Json String (an LSL String that is further enclosed within double quotes) is a "raw string"! Once I had that in hand, the following two functions practically wrote themselves.


<pre>{
"A": "\b\f\t\r \n aba \u0000",
"B": "\"he\"",
"C": "\t"
}


        string jText;
<pre>//////////////////////////////
        string i = llEscapeURL("\\b\\f\\t\\r \\n aba \\u0000");
// function string uList2Json (string type, list values)
        string j = llEscapeURL("\\\"he\\\"");
// This function takes the exact same parameters as
       
// llList2Json() but correctly encodes all possible strings
        jText = llList2Json(JSON_OBJECT, ["A", i, "B", j]);
// including those with escape characters within them.
        jText = llJsonSetValue(jText, ["C"], llEscapeURL("\\t"));
//
        jText = llUnescapeURL(jText);</pre>
// Initial strings must escape all instances of the
// desired escape character itself
// (ie "\\t" => '\t', "\\\\" => '\\', "\\/" => '\/')
// as well as any double quotes ("\\\"" => '\"')
//
// Version 1.0 by LepreKhaun 9/19/2013
// May be freely used, modified and distributed with this header intact.
///////////////////////////////
string uList2Json (string type, list values)
{


'''jText == {"A":"\b\f\t\r \n aba \u0000","B":"\"he\"","C":"\t"}'''
integer iter = -1;
 
integer listLength = llGetListLength(values);
----
[ETA: 9/13/2013]
// Step through list, hitting every other item if JSON_OBJECT
 
while ((iter = ++iter + (type == JSON_OBJECT)) < listLength)
So, I was working on the LSL string mishandling of the escape character ("\") and, looking deeper into it, have not only found another workaround but made an exciting discovery: LSL does have "raw strings" of a sort, and they are the JSON_STRING! First the alternate workaround:
// necessary so we don't choke on next if test
if (llGetListEntryType(values, iter) == TYPE_STRING)
// make sure it is not a JSON_* Value or a Number
if (llJsonValueType(llList2String(values, iter), []) == JSON_INVALID)
values = llListReplaceList(values, ["\"" + llList2String(values, iter) + "\""], iter, iter);


<lsl>// Global constants
return llList2Json(type, values);
integer QUOTE = 0; // '\"' (Double Quote)
}
integer SLOSH = 1; // '\\' (Reverse Solidus)
integer SLASH = 2; // '\/' (Solidus)
integer BP = 3; // '\b' (Backspace)
integer FF = 4; // '\f' (Form Feed)
integer NL = 5; // '\n' (New Line)
integer CR = 6; // '\r' (Carriage Return)
integer TAB = 7; // '\t' (Tab)
integer U_ = 8; /* '\u' (Unicode Prefix; MUST immediately precede
  a string of 4 Hex digits, 0-F, sans '0x') */
// Optional, included for completeness only
integer CRLF = 9; // '\r\n' (Windows end-of-line)


//////////////////////////////
//////////////////////////////
// function string uList2JsonStringSafe (list jasonStringParts)
// function string uJsonSetValue ( string json, list specifiers, string value )
// This function takes a list, jasonStringParts,
// This function takes the exact same parameters as
// of the parts of the Json string one wishes and
// llJsonSetValue() but correctly encodes all possible strings
// returns a LSL string within double quotes ("")
// including those with escape characters within them.
// with embedded escape characters within it that
//
// correctly encodes as a Json string using either
// Initial strings must escape all instances of the
// llList2Json() or llJsonSetValue().
// desired escape character itself
// (ie "\\t" => '\t', "\\\\" => '\\', "\\/" => '\/')
// as well as any double quotes ("\\\"" => '\"')
//
//
// Version 1.0 by LepreKhaun 9/9/2013
// NOTE: To encode a Float or Integer as a String
// within the Json text, enclose it with escaped quotes
// (ie '"3"' => '3' BUT '"\"3\""' => '"3"')
//
// Version 1.0 by LepreKhaun 9/19/2013
// May be freely used, modified and distributed with this header intact.
// May be freely used, modified and distributed with this header intact.
// Compiled Size = 2,088 bytes
///////////////////////////////
///////////////////////////////
string uList2JsonStringSafe (list jasonStringParts)
string uJsonSetValue(string json, list specifiers, string value)
{
{
list escapeCodes = ["%5C%22", "%5C%5C", "%5C/", "%5Cb", "%5Cf", "%5Cn", "%5Cr", "%5Ct", "%5Cu", "%5Cr%5Cn"];
// We don't want to change the string representation of
// an integer, a float or any Json Value Type
if (llJsonValueType(value, []) == JSON_INVALID)
value = "\"" + value + "\"";
return llJsonSetValue(json, specifiers, value);
}


integer iter = llGetListLength(jasonStringParts);
// rString must be enclosed with escaped double quotes
// to keep the LSL String "enhanced features" out of play
string rString = "\"";
// build return string 'backwards'
while (~--iter)
{
if(llGetListEntryType(jasonStringParts, iter) == TYPE_INTEGER)
{
// substitute encoding for integer constants
rString = llList2String(escapeCodes, llList2Integer(jasonStringParts, iter)) + rString;
}
else
{
// escape String chunks to preserve them properly
rString = llEscapeURL(llList2String(jasonStringParts, iter)) + rString;
}
}
return llUnescapeURL("\"" + rString);
}


///////////
///////////
// Example encodings showing usage
// Examples showing usage
///////////
///////////
 
default
default
{
{
Line 94: Line 84:
// To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
// To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
jsonString = uList2JsonStringSafe([QUOTE, "Go!", QUOTE, " he yelled.", NL, "She replied ", QUOTE, "No!", QUOTE]);
jsonText = uList2Json (JSON_OBJECT, [
jsonText = llList2Json(JSON_OBJECT, ["A", jsonString]);
"A", "\\\"Go!\\\" he yelled.\\nShe replied \\\"No!\\\"",  
jsonString = uList2JsonStringSafe([SLOSH, "escaped ", SLOSH, " slosh", SLOSH]);
"Z", "\\\\escaped \\\\ slosh\\\\"
jsonText = llJsonSetValue(jsonText, ["Z"], jsonString);
]);
llOwnerSay(jsonText);
llOwnerSay(jsonText);
// To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
// To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
jsonString = uList2JsonStringSafe([BP, CR, FF, NL, TAB, " and Windows uses ", CRLF, " for EOL"]);
jsonText = uList2Json(JSON_OBJECT, [
jsonText = llList2Json(JSON_OBJECT, ["Control Chars", jsonString]);
"Control Chars", "\\b\\r\\f\\n\\t and Windows uses \\r\\n for EOL",
jsonString = uList2JsonStringSafe([U_, "00A9"]);
"©", "\\u00A9"
jsonText = llJsonSetValue(jsonText, ["©"], jsonString);
]);
llOwnerSay(jsonText);
llOwnerSay(jsonText);
// To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
// To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
jsonString = uList2JsonStringSafe(["http:", SLASH, SLASH, "my.com", SLASH, "ask.php?what%20is%20it"]);
jsonText = uList2Json(JSON_ARRAY, [
jsonText = llList2Json(JSON_ARRAY, ["WebSite", jsonString]);
"WebSite",
jsonText = llJsonSetValue(jsonText, [JSON_APPEND], uList2JsonStringSafe([TAB]));
"http:\\/\\/my.com\\/ask.php?what%20is%20it",
"\\t"
]);
llOwnerSay(jsonText);
 
// Make a Json object...
string temp = uList2Json(JSON_OBJECT, [
"A", 99,
"Z", "88]",
"C", JSON_TRUE
]);
// ... add it to end of the array ...
jsonText = uJsonSetValue(jsonText, [JSON_APPEND], temp);
// ... change our web address ...
jsonText = uJsonSetValue(jsonText, [1], "http:\\/\\/www.google.com");
// ... change that TAB in the third spot to PI
jsonText = uJsonSetValue(jsonText, [2], (string)PI);
// ... and add a new "Key":Value pair to our object
jsonText = uJsonSetValue(jsonText, [3, "New"], ((string)PI + "\\n"));
//  ["WebSite","http:\/\/www.google.com",3.141593,{"A":99,"C":true,"New":"3.141593\n","Z":"88]"}]
llOwnerSay(jsonText);
llOwnerSay(jsonText);
 
}
}
}</lsl>
}</pre>
 
How and why this approach works rests on an earlier observation of mine: Json text (LSL strings enclosed within '{}' or '[]') was being handled differently from other LSL strings, in that its enclosed escape codes (such as '\t') were not being translated (to '%09' or '%20%20%20%20'), a "feature" LSL strings have.
 
I then noticed a difference in definitions between [http://tools.ietf.org/html/rfc4627 RFC 4627] and [http://www.json.org/ JSON.org]. The RFC defines a Json text to be either an array or an object but at json.org it's defined as any Json Value, including the JSON_STRING. And a JSON_STRING is defined, of course, as being enclosed within double quotes (""). So I began experimenting with that type of LSL string and found the same exception to "enhanced features" was afforded!
 
But then another problem surfaced: the LSL functions llJsonGetValue() and llJson2List() extract a JSON_STRING as a regular LSL String, resulting in these escaped character sequences being "enhanced" by translation (in other words, '\t' becomes '%09', which is further "enhanced" to '%20%20%20%20' when chatted, and '\u23B5' becomes 'u23B5'). Grrrrr.... This wasn't good for further processing; we needed a String that preserves these after the extraction.
 
And that led to the development of [[LepreKhaun_Resident/Json_Get_Value_Safe|uJsonGetValueSafe()]], which returns the requested Value explicitly enclosed within double quotes (""), just as it appears within the Json text...
 
And, of course, this was complicated by the RFC stating:
<pre> Insignificant whitespace is allowed before or after any of the six
  structural characters.
 
  ws = *(
            %x20 /   ; Space
            %x09 /   ; Horizontal tab
            %x0A /   ; Line feed or New line
            %x0D     ; Carriage return
        )
</pre>
Hooboy!
 


Now, if I can just get the retrieval worked out as simply... ;=)
----
----


<center>== [[User:LepreKhaun_Resident|'''More Json Tips, Tricks and Coding Examples''']] ==</center>
<center>== [[User:LepreKhaun_Resident|'''More Json Tips, Tricks and Coding Examples''']] ==</center>

Revision as of 15:28, 20 September 2013

uList2Json() and uJsonSetValue()

As many of you may be aware, LSL has the habit of "enhancing" Strings. This is regarded as a "feature" of the language and usually works out for the best, giving one the option of formatting chatted text by using "\t" and "\n". Unfortunately, one didn't have a way to opt out of this behavior. Put in computerese, LSL simply lacked "raw strings".

This has bedeviled those working with Json text, either for web communications or developing other uses for it, because some strings just wouldn't encode properly. That is to say, these are all perfectly valid Json strings that simply couldn't be directly formed with llList2Json() and llJsonSetValue():

  • "\"Go!\" he yelled.\n"
  • "She replied \"No!\""
  • "Copyright symbol is \u00A9"
  • "oops]"
  • "Control characters are \t\n\r\f\b"

I spent a few weeks studying the problem, most of it going about it the wrong way, but then had an epiphany. A one-line addition Maestro Linden made to Json Usage in LSL on the 10th ("LSL strings which both begin and end with "\"" are interpreted literally as JSON strings, while those without are parsed when converted into JSON.") confirmed what I had begun to surmise: a Json String (an LSL String that is further enclosed within double quotes) is a "raw string"! Once I had that in hand, the following two functions practically wrote themselves.
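
Before the two functions, here is a minimal sketch for checking the behavior the quoted note describes. It assumes it is compiled as a complete script in its own object, and the variable names (bare, quoted) are purely illustrative. It hands the same backslash-t text to llList2Json() twice, once as-is and once enclosed in double quotes, so the two encodings can be compared in chat. No expected output is listed, since the result depends on the simulator's Json implementation.

```lsl
default
{
    state_entry()
    {
        // In LSL source, "\\t" is the two characters backslash and t.
        string bare = "\\t";

        // The same two characters, enclosed in double quotes so that the
        // LSL string both begins and ends with a double-quote character.
        string quoted = "\"\\t\"";

        // Per the note quoted above, the bare string is parsed when
        // converted, while the quoted one is taken literally as a Json string.
        llOwnerSay("bare:   " + llList2Json(JSON_ARRAY, [bare]));
        llOwnerSay("quoted: " + llList2Json(JSON_ARRAY, [quoted]));
    }
}
```
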


//////////////////////////////
// function string uList2Json (string type, list values)
// This function takes the exact same parameters as
// llList2Json() but correctly encodes all possible strings
// including those with escape characters within them.
//
// Initial strings must escape all instances of the
// desired escape character itself 
// (ie "\\t" => '\t', "\\\\" => '\\', "\\/" => '\/')
// as well as any double quotes ("\\\"" => '\"')
//
// Version 1.0 by LepreKhaun 9/19/2013
// May be freely used, modified and distributed with this header intact.
///////////////////////////////
string uList2Json (string type, list values)
{

	integer iter = -1;
	integer listLength = llGetListLength(values);
	
	// Step through list, hitting every other item if JSON_OBJECT
	while ((iter = ++iter + (type == JSON_OBJECT)) < listLength)
		// necessary so we don't choke on next if test
		if (llGetListEntryType(values, iter) == TYPE_STRING)
			// make sure it is not a JSON_* Value or a Number
			if (llJsonValueType(llList2String(values, iter), []) == JSON_INVALID)
				values = llListReplaceList(values, ["\"" + llList2String(values, iter) + "\""], iter, iter);

	return llList2Json(type, values);
}	

//////////////////////////////
// function string uJsonSetValue ( string json, list specifiers, string value )
// This function takes the exact same parameters as
// llJsonSetValue() but correctly encodes all possible strings
// including those with escape characters within them.
//
// Initial strings must escape all instances of the
// desired escape character itself 
// (ie "\\t" => '\t', "\\\\" => '\\', "\\/" => '\/')
// as well as any double quotes ("\\\"" => '\"')
//
// NOTE: To encode a Float or Integer as a String
// within the Json text, enclose it with escaped quotes
// (ie '"3"' => '3' BUT '"\"3\""' => '"3"')
//
// Version 1.0 by LepreKhaun 9/19/2013
// May be freely used, modified and distributed with this header intact.
///////////////////////////////
string uJsonSetValue(string json, list specifiers, string value)
{
	// We don't want to change the string representation of 
	// an integer, a float or any Json Value Type
	if (llJsonValueType(value, []) == JSON_INVALID)
		value = "\"" + value + "\"";
	return llJsonSetValue(json, specifiers, value);
}


///////////
// Examples showing usage
///////////
 
default
{
	touch_end(integer i)
	{
		string jsonText;
		
		// To encode '{"A":"\"Go!\" he yelled.\nShe replied \"No!\"","Z":"\\escaped \\ slosh\\"}'
		jsonText = uList2Json (JSON_OBJECT, [
				"A", "\\\"Go!\\\" he yelled.\\nShe replied \\\"No!\\\"", 
				"Z", "\\\\escaped \\\\ slosh\\\\"
				]);
		llOwnerSay(jsonText);
		
		// To encode '{"Control Chars":"\b\r\f\n\t and Windows uses \r\n for EOL","©":"\u00A9"}'
		jsonText = uList2Json(JSON_OBJECT, [
				"Control Chars", "\\b\\r\\f\\n\\t and Windows uses \\r\\n for EOL", 
				"©", "\\u00A9"
				]);
		llOwnerSay(jsonText);
		
		// To encode '["WebSite","http:\/\/my.com\/ask.php?what%20is%20it","\t"]'
		jsonText = uList2Json(JSON_ARRAY, [
				"WebSite",
				"http:\\/\\/my.com\\/ask.php?what%20is%20it",
				"\\t"
				]);
		llOwnerSay(jsonText);

		// Make a Json object...
		string temp = uList2Json(JSON_OBJECT, [
				"A", 99,
				"Z", "88]",
				"C", JSON_TRUE
				]);
		// ... add it to end of the array ...
		jsonText = uJsonSetValue(jsonText, [JSON_APPEND], temp);
		// ... change our web address ...
		jsonText = uJsonSetValue(jsonText, [1], "http:\\/\\/www.google.com");
		// ... change that TAB in the third spot to PI
		jsonText = uJsonSetValue(jsonText, [2], (string)PI);
		// ... and add a new "Key":Value pair to our object
		jsonText = uJsonSetValue(jsonText, [3, "New"], ((string)PI + "\\n"));
		
		//  ["WebSite","http:\/\/www.google.com",3.141593,{"A":99,"C":true,"New":"3.141593\n","Z":"88]"}]
		llOwnerSay(jsonText);
			   
	}
}
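
The NOTE in the uJsonSetValue() header (a bare "3" is stored as a number, while "\"3\"" is stored as a string) can be checked with a short sketch like the one below. It repeats uJsonSetValue() so that it compiles on its own. The key names asNumber and asString are hypothetical, chosen only for illustration. If the header note holds, both comparisons should report 1.

```lsl
// Copy of uJsonSetValue() from above, repeated so this sketch compiles on its own.
string uJsonSetValue(string json, list specifiers, string value)
{
    if (llJsonValueType(value, []) == JSON_INVALID)
        value = "\"" + value + "\"";
    return llJsonSetValue(json, specifiers, value);
}

default
{
    state_entry()
    {
        // Start from an empty Json object.
        string json = "{}";

        // A bare "3" is a valid Json number, so it is passed through unchanged.
        json = uJsonSetValue(json, ["asNumber"], "3");

        // Enclosed in escaped quotes, it is already a Json string and is
        // likewise passed through unchanged.
        json = uJsonSetValue(json, ["asString"], "\"3\"");

        llOwnerSay(json);

        // Ask LSL which Json type each value ended up with.
        llOwnerSay("asNumber is JSON_NUMBER: "
            + (string)(llJsonValueType(json, ["asNumber"]) == JSON_NUMBER));
        llOwnerSay("asString is JSON_STRING: "
            + (string)(llJsonValueType(json, ["asString"]) == JSON_STRING));
    }
}
```
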

Now, if I can just get the retrieval worked out as simply... ;=)


== More Json Tips, Tricks and Coding Examples ==