Difference between revisions of "User:LindaB Helendale/UTF8StringLength"
Revision as of 14:35, 10 March 2011
UTF8StringLength : returns the number of bytes a string takes in UTF-8 coding.
Channel communication, llOwnerSay, notecards, email, HTTP calls, etc. use UTF-8 encoding, with one character typically taking one, two or three bytes (UTF-8 uses four bytes for characters outside the Basic Multilingual Plane). Message length limits are defined in bytes (e.g. llEmail 4500 bytes, llSay 1024 bytes), so it may not be obvious how many bytes a string takes.
Explanation of the formula:

L is the string length in UTF-8 bytes that we want.
N is the length of the string escaped by llEscapeURL.

In the escaped string, UTF-8 characters of [1, 2, 3] bytes map to strings of [1, 6, 9] plain ASCII characters, with each byte of a multi-byte character written as a triplet of the form %XX. Let P be the number of '%' characters in the escaped string, and n1, n2 and n3 the numbers of 1-, 2- and 3-byte characters. Then

 L = n1 + 2 n2 + 3 n3
 N = n1 + 6 n2 + 9 n3
 P = 2 n2 + 3 n3

Substituting P into N gives

 N = n1 + 3 P  =>  n1 = N - 3 P

and substituting that into L gives

 L = (N - 3 P) + P = N - 2 P

Another, more intuitive way to derive the formula: in the escaped string every '%' starts a triplet %XX, which corresponds to one byte of the UTF-8 encoding but takes three characters in the escaped string. Subtracting 2 * (number of '%' characters) from the escaped string length therefore gives the number of bytes. This argument also covers single-byte characters that llEscapeURL itself escapes, such as a space (%20): each contributes one '%' and three characters, so it still counts as one byte, and the formula holds regardless of how many bytes each character takes.
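As a quick check of the formula, here is a worked example for the four-character string "a é€" (letter a, space, é, €). It assumes that llEscapeURL leaves the letter unescaped and writes the space as %20; the byte values for é and € are their standard UTF-8 encodings.

 escaped string:  a%20%C3%A9%E2%82%AC
 N = 1 + 3 + 6 + 9 = 19   (characters in the escaped string)
 P = 0 + 1 + 2 + 3 = 6    (number of '%' characters)
 L = N - 2 P = 19 - 12 = 7

This agrees with a direct count of the UTF-8 bytes: 1 (a) + 1 (space) + 2 (é) + 3 (€) = 7.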
You may use this script any way you wish. (c) LindaB Helendale
<lsl>
integer UTF8StringLength(string str) {
    // UTF8StringLength : returns the number of bytes a string takes in UTF-8 coding.
    // Useful in guarding against limits in communication to avoid clipped messages.
    // LindaB Helendale, permission to use this script in any way granted.
    string strEscaped = llEscapeURL(str);
    integer N = llStringLength(strEscaped);
    // Splitting on "%" yields P + 1 pieces, so the list length minus one is the number of '%' characters.
    integer P = llGetListLength(llParseStringKeepNulls(strEscaped, ["%"], [])) - 1;
    return N - 2 * P;
}
</lsl>
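A minimal usage sketch, assuming you want to guard llSay against the 1024-byte limit quoted above. The names MAX_SAY_BYTES and SaySafely are hypothetical helpers introduced only for this illustration and are not part of the original script; the function UTF8StringLength is repeated so that the example compiles on its own.

<lsl>
// Hypothetical constant: the llSay byte limit quoted in the text above.
integer MAX_SAY_BYTES = 1024;

// Same function as above, repeated so this script is self-contained.
integer UTF8StringLength(string str)
{
    string strEscaped = llEscapeURL(str);
    integer N = llStringLength(strEscaped);
    integer P = llGetListLength(llParseStringKeepNulls(strEscaped, ["%"], [])) - 1;
    return N - 2 * P;
}

// Hypothetical helper: say the message on public chat only if it fits
// within MAX_SAY_BYTES; otherwise report its size to the owner.
SaySafely(string msg)
{
    integer bytes = UTF8StringLength(msg);
    if (bytes <= MAX_SAY_BYTES)
    {
        llSay(0, msg);
    }
    else
    {
        llOwnerSay("Message not sent: " + (string)bytes + " bytes exceeds the "
            + (string)MAX_SAY_BYTES + "-byte limit.");
    }
}

default
{
    touch_start(integer total_number)
    {
        // Non-ASCII characters take more than one byte each.
        SaySafely("Grüße, 世界");
    }
}
</lsl>

Checking the byte count before sending lets a script split or shorten a message itself instead of having it clipped silently. The same pattern should apply to other byte-limited calls by swapping in the relevant limit.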