Talk:LlStringLength
Latest revision as of 14:15, 3 March 2014
Anyone have any idea if LL is working on a built-in function to determine the number of bytes in a string? Considering functions like llHTTPRequest have limitations based on bytes, not just characters, this would be handy.
- Convert the string to base64 (llStringToBase64) and count the characters. -- Strife (talk|contribs) 00:25, 31 March 2009 (UTC)
Anyone know a good method to get the number of characters excluding spaces and new lines? Even better, a method to exclude any particular characters?
- llStringLength((string)llParseString2List(input, [" ", "\n"], [])) -- Strife (talk|contribs) 00:25, 31 March 2009 (UTC)
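The idea behind that answer is that llParseString2List drops the separators (and any empty entries), and casting the list to string concatenates what is left, so its length is the count without those characters. LSL only runs in-world, so the sketch below is a Python model of that idea, not LSL. `count_excluding` is a hypothetical helper, and its `excluded` parameter covers the "any particular characters" case. Keep in mind that llParseString2List only uses the first 8 separators, so a larger exclusion set would need more than one pass in LSL.

```python
import re


def count_excluding(text: str, excluded=(" ", "\n")) -> int:
    """Hypothetical helper modelling
    llStringLength((string)llParseString2List(text, excluded, []))."""
    # Split on any of the excluded strings, like llParseString2List separators.
    pattern = "|".join(re.escape(sep) for sep in excluded)
    # LSL drops empty entries; the (string) cast joins the rest with nothing.
    pieces = [p for p in re.split(pattern, text) if p]
    return len("".join(pieces))


print(count_excluding("hello world\nbye"))          # 13
print(count_excluding("a-b-c", excluded=("-",)))    # 3
```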
I just got bitten by the fact that llDialog() has a string length limit in *bytes*, while llStringLength tells me the length of a string in *characters*. I'm trying to figure out how to limit the length of a string in bytes. Strife's suggestion above seems incomplete (the number of characters in a base 64 string does not correspond one-to-one with the number of bytes in the original string, does it?). The getStringBytes snippet on the main page does seem like it does the job, but I don't understand why, and the explanation beneath it is opaque to me. Can someone offer a clearer explanation of what this is doing and why that is the right solution? (Possibly my confusion stems from not understanding how SL represents characters in bytes, or not being quite sure what llEscapeURL actually does - I have looked but not found a clear explanation of either; a link to such an explanation might help.) Thanks! Brattle Resident 10:26, 4 February 2014 (PST)
- Don't feel like you are alone; I had to drag my mind through it when I wrote the comments to the function. It was the only way I could prove that it worked. To expand upon those comments: -- Strife (talk|contribs) 21:08, 4 February 2014 (PST)
- You can think of msg as the sum of characters that will be escaped (by llEscapeURL) and those that will not (escapable_characters + unescapable_characters)
- Likewise the output of this function will be bytes, which is equal to unescaped_bytes plus escaped_bytes.
- Characters that are not escaped always use exactly 1 byte. So unescapable_characters equals unescaped_bytes.
- Next we escape msg and store the result in s, so the number of characters in s equals unescapable_characters plus 3 times escapable_bytes (the same quantity as escaped_bytes).
- llEscapeURL leaves ASCII letters and digits alone and escapes every other character. When we escape an escapable character, each byte of its UTF-8 encoding is represented by three characters in the output string, "%##", where "#" is a hex digit.
- Next we remove every "%" from s and store the result in t. So the number of characters in t equals unescapable_characters plus 2 times escapable_bytes.
- This is done by splitting the string up into a list so that "%" is used as a separator (not included in the list values). Then we turn it back into a string.
- Now the goal at this point is to get the result to be equal to unescaped_bytes plus escaped_bytes.
- First of all, neither s nor t gives us unescaped_bytes directly, but fortunately it is equal to unescapable_characters.
- So we need the output to be unescapable_characters plus escaped_bytes.
- Here is where you really need to understand vectors, because what we are describing here are 2D vectors. For the sake of brevity I'm going to set unescapable_characters == x and escapable_bytes == y.
- s = <x, 3 * y>
- t = <x, 2 * y>
- return = <x, y>
- The x and y values are the same across the three vectors, so now we just need to write return in terms of s and t, where s and t can each be multiplied by a constant scalar. Let's use two scalar values, a and b.
- So return = a * s + b * t. This is basically solving a system of linear equations: matching the x components gives a + b = 1, and matching the y components gives 3 * a + 2 * b = 1. Solving them gives a == -1, b == 2.
- return = -1 * s + 2 * t or return = 2 * t - s.
- return = -1 * <x, 3 * y> + 2 * <x, 2 * y>
- return = 2 * <x, 2 * y> - <x, 3 * y>
- return = <2 * x, 4 * y> - <x, 3 * y>
- return = <2 * x - x, 4 * y - 3 * y>
- return = <x, y>
- To sum things up, it's a very smart and clever solution to a complex problem and despite being correct it doesn't look like it should be. jinko-san is a true master scripter and it's concise solutions like this that I strive towards (other solutions aren't as concise). -- Strife (talk|contribs) 21:08, 4 February 2014 (PST)
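To make the steps above concrete, here is a Python model of the calculation (LSL itself only runs in-world). `escape_url` is a hypothetical stand-in for llEscapeURL, assuming it passes ASCII letters and digits through and turns every other character into one "%XX" triple per UTF-8 byte; `string_bytes` is likewise a hypothetical name. This is a sketch of the technique described, not a copy of the getStringBytes function on the main page.

```python
def escape_url(msg: str) -> str:
    """Hypothetical stand-in for llEscapeURL (assumed behaviour):
    ASCII letters and digits pass through; every other character becomes
    one "%XX" triple per byte of its UTF-8 encoding."""
    out = []
    for ch in msg:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def string_bytes(msg: str) -> int:
    """Hypothetical name: UTF-8 byte count via the escape-and-strip trick.
    x = unescapable_characters, y = escapable_bytes."""
    s = escape_url(msg)          # len(s) = x + 3*y
    t = "".join(s.split("%"))    # len(t) = x + 2*y, every "%" removed
    return 2 * len(t) - len(s)   # = x + y, the number of bytes


# Compare with Python's own UTF-8 byte count; the two numbers should match.
for sample in ["hello", "a b", "a\u00e9!", "\u65e5\u672c"]:
    print(repr(sample), string_bytes(sample), len(sample.encode("utf-8")))
```

A literal "%" in msg is itself escaped to "%25", so every "%" in s comes from an escape and removing them never loses information.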
- P.S. To count bytes with llStringToBase64, you trim off the trailing "=" padding. It works but it's ugly, as you need to exploit integer division to get the 4/3 ratio right. The one advantage of the Base64 version is that it is less likely to crash the script on long strings (list overhead). -- Strife (talk|contribs) 21:15, 4 February 2014 (PST)
- I've added a Base64 version to llStringToBase64; its one advantage is that it is less likely to crash the script if you are dealing with really long international strings. I suspect it will be faster but I have not tested it. -- Strife (talk|contribs) 12:18, 5 February 2014 (PST)
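The Base64 approach can be sketched as follows: strip the "=" padding, then scale the remaining length by 3/4 with integer division. This is a Python model, assuming (as with llStringToBase64) that the Base64 is taken over the string's UTF-8 bytes; `base64_byte_count` is a hypothetical name and this is not the code posted on the llStringToBase64 page.

```python
import base64


def base64_byte_count(text: str) -> int:
    # Base64 of the UTF-8 bytes, standing in for llStringToBase64.
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # Drop the "=" padding. Each full group of 4 characters carries 3 bytes,
    # and a trailing group of 2 or 3 characters carries 1 or 2 bytes;
    # integer division by 4 rounds those partial groups down correctly.
    unpadded = b64.rstrip("=")
    return len(unpadded) * 3 // 4


for sample in ["", "a", "ab", "abc", "abcd", "\u65e5\u672c"]:
    print(repr(sample), base64_byte_count(sample), len(sample.encode("utf-8")))
```

For n bytes the unpadded length is ceil(4n/3), and multiplying that by 3 then dividing by 4 (rounding down) returns exactly n in all three cases n mod 3 = 0, 1, 2.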
- Thank you, that makes considerably more sense to me, especially when I rethought it using non-vector algebra. In case anyone else finds vector algebra as unintuitive as I do, here's my work. Capital letters are KNOWN values; lowercase are unknowns:
- llStringLength(msg) = (unescapable_characters + escapable_characters)
- Each unescapable character is represented by one byte, and by one character in the escaped string.
- Each escapable character is represented by an unknown number B of bytes, and by 3*B characters in the escaped string, B of which are "%", so the number of "%" in the escaped string is the number of bytes occupied by escapable characters.
- ub = uc
- LU = llStringLength(msg) = uc + ec = ub + ec -> ec = LU - ub
- LE = llStringLength(s) = uc + 3*eb = ub + 3*eb
- L% = llStringLength(s) - llStringLength(t) = eb (the number of "%" removed)
- LE = ub + 3*L%
- ub = LE - 3*L%
- ub + eb = (LE - 3*L%) + L% = LE - 2*L% = 2*llStringLength(t) - llStringLength(s)
- And that is the value we want - the total number of unescaped (ub) plus escaped (eb) bytes.
- Whew. Brattle Resident 14:15, 3 March 2014 (PST)
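As a worked check of the algebra above, take msg = "aé!" and assume llEscapeURL leaves only ASCII letters and digits unescaped. Then "a" is one unescaped byte, while "é" (UTF-8 C3 A9) and "!" (21) give eb = 3, so s = "a%C3%A9%21" and t = "aC3A921". Here LT is shorthand for llStringLength(t) and L% is the number of "%" removed, llStringLength(s) - llStringLength(t); LT is introduced just for this example.

```latex
\begin{aligned}
ub &= uc = 1, \qquad eb = 2 + 1 = 3\\
LE &= \mathrm{llStringLength}(s) = ub + 3\,eb = 1 + 9 = 10\\
LT &= \mathrm{llStringLength}(t) = ub + 2\,eb = 1 + 6 = 7\\
L_{\%} &= LE - LT = 10 - 7 = 3 = eb\\
ub + eb &= LE - 2\,L_{\%} = 10 - 6 = 4 = 2\,LT - LE
\end{aligned}
```

The result, 4, matches the UTF-8 size of "aé!" (1 + 2 + 1 bytes) and agrees with Strife's 2 * t - s.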