Difference between revisions of "Talk:ParseString2List"

Revision as of 07:31, 15 December 2010

Hi Strife,

You did a really good job with this contribution. I just added a description of this script to the 'LSL Script Library' section in the category 'LSL Library', but stumbled over this entry's name.

Why didn't you just name it 'ParseString2List' according to the standard naming conventions for functions?

Greetz --Huney Jewell 04:02, 18 September 2007 (PDT)

Not really sure, was pretty tired by the time I made that contribution. The function naming standard is a bit informal. Seemed like a good idea at the time... which was after having spent many hours on a jet. I wrote it and posted it while I should have been packing for my trip last week, thanks for putting it in the library. -- Strife Onizuka 21:22, 18 September 2007 (PDT)

Strife, maybe you misunderstood. What I meant concerns only the name of this entry (not its function): it is named 'Parse String To List' instead of 'ParseString2List', the form usual for LSL functions.

Greetz --Huney Jewell 01:39, 19 September 2007 (PDT)

Ah. In that light it seems like a good idea to sync them. I'll move the pages. -- Strife Onizuka 11:04, 19 September 2007 (PDT)

Optimized

I wrote an optimized version for kicks. It should work, but I haven't tested it yet. -- Strife Onizuka

Tested and posting on main page. -- Strife Onizuka 16:29, 19 September 2007 (PDT)

How it works

Here is a breakdown of how this function works (the source is highly optimized and obscures the methods used).

Steps

1. Set the initial value of index to the first character in the string.
2. Generate a list that contains the first occurrence of each separator and spacer (along with a pointer back to the spacer/separator).
3. Sort the list by first occurrence.
4. Pop the first item off the list. If the occurrence is before index goto Step 7
5. Add the preceding non-spacer/separator text to the output list. If it is a spacer, add the spacer to the output list too.
6. Advance the index pointer to the first character after the spacer/separator.
7. Find the next occurrence of the spacer/separator and if it exists push it onto the far end of the list.
8. Goto step 3 if spacer/separator occurrence list is not empty.
9. Return the output list with the addition of any remaining text
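
The steps above can be sketched in Python, which is easier to run and test outside Second Life than LSL. This is a minimal sketch, not the code from the main page: the name `parse_string_to_list` is hypothetical, and it assumes that the earliest match in the string wins and that ties go to separators first and then to list order.

```python
def parse_string_to_list(text, separators, spacers):
    """Split text on separators (dropped) and spacers (kept), keeping empty strings."""
    # Separators come first so that they win ties against spacers (assumption).
    tokens = [(s, False) for s in separators if s] + [(s, True) for s in spacers if s]
    index = 0   # step 1: start of the text that has not been consumed yet
    out = []
    # Step 2: first occurrence of each token, with a pointer back to the token.
    pending = []
    for order, (tok, _) in enumerate(tokens):
        pos = text.find(tok)
        if pos != -1:
            pending.append((pos, order))
    while pending:                        # step 8: loop until the list is empty
        pending.sort()                    # step 3: earliest first, ties by list order
        pos, order = pending.pop(0)       # step 4: pop the first item
        tok, is_spacer = tokens[order]
        if pos >= index:                  # a stale occurrence skips steps 5 and 6
            out.append(text[index:pos])   # step 5: text before the token
            if is_spacer:
                out.append(tok)           # spacers are kept in the output
            index = pos + len(tok)        # step 6: advance past the token
        nxt = text.find(tok, index)       # step 7: next occurrence of this token
        if nxt != -1:
            pending.append((nxt, order))
    out.append(text[index:])              # step 9: whatever text remains
    return out
```

Because every token is tracked separately, the sketch has no limit on the number of separators or spacers; re-sorting on every pass is the simple but slow part.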

Optimizations

Instead of using a strided list containing the spacer/separator and an integer we use just a single integer. The low bits are the index of the spacer/separator, the high bits are the occurrence position. This has the added benefit that it solves the problem of ensuring the proper behavior when collisions happen. The low bits restore the original order when collisions occur.
While it would be easy to add a special loop for the purpose of checking the following entries in the list to see if their occurrence indexes are now invalid (due to being less than index) it would require more code. The runtime speed gained needs to be balanced against the size requirements of the code.
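
The packing trick from the first point can be illustrated with a short Python sketch. The names `pack` and `unpack` and the 8-bit width of `INDEX_BITS` are assumptions made for illustration; the function on the main page may divide the bits differently, and LSL integers are limited to 32 bits while Python integers are not.

```python
INDEX_BITS = 8                      # hypothetical width of the low field
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack(position, token_index):
    """High bits: occurrence position. Low bits: index of the spacer/separator."""
    return (position << INDEX_BITS) | token_index


def unpack(packed):
    """Return (position, token_index) from a packed integer."""
    return packed >> INDEX_BITS, packed & INDEX_MASK


# Sorting the plain integers orders by position first; when two tokens
# collide at the same position, the low bits restore the original list order.
entries = [pack(5, 0), pack(2, 3), pack(2, 1)]
ordered = [unpack(e) for e in sorted(entries)]
```

A single integer per entry also means a plain integer sort is enough, with no stride to maintain.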

-- Strife (talk|contribs) 01:56, 2 May 2010 (UTC)

Simpler solution?

Haven't had a chance to really compare, but is there a particular advantage for the function on the main page compared to something simpler like this?
<lsl>
list ParseStringKeepNullsSafe(string str, list separators, list spacers) {
    list ret = [str];
    {
        integer x = separators != [];
        integer y = spacers != [];

        integer l;
        while ((x > 0) && (y > 0)) {
            l = ret != [];

            while ((--l) >= 0)
                ret = llListReplaceList(
                    (ret = []) + ret,
                    llParseStringKeepNulls(llList2String(ret, l), separators, spacers),
                    l, l
                );

            if ((x -= 8) > 0) separators = llDeleteSubList((separators = []) + separators, 0, 7);
            if ((y -= 8) > 0) spacers = llDeleteSubList((spacers = []) + spacers, 0, 7);
        }
    }
    return (ret = []) + ret;
}
</lsl>
-- Haravikk (talk|contribs) 18:38, 14 December 2010 (UTC)

Yes, the one on the main page handles the edge cases properly. Separators always take precedence over spacers; this simplification changes that ordering, and it also changes the order of precedence within both separators and spacers. For example, if you have the string "ABBA" and you try to split it by "BB" and "AB", but "AB" gets processed in the second call to llParseStringKeepNulls, then "BB" will have caused the split to be ["A","BB","A"] instead of ["AB","BA"]. The edge cases are where the problem is. The implementation on the main page is without a doubt mildly insane and less than optimal, but it was designed for LSO, where strings had to be copied each time they were used. To implement it in the same way it is likely done in the VM would vastly increase script execution time. -- Strife (talk|contribs) 23:09, 14 December 2010 (UTC)
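
The "ABBA" case can be reproduced with a small Python model. This is a sketch under stated assumptions, not the behaviour of the actual LSL functions: `split_once` and `split_in_passes` are hypothetical names, only spacers are modelled, and the scan assumes that the leftmost match wins, with list order breaking ties. Empty strings are kept, so the single-pass result carries a leading "" that the lists quoted above leave out.

```python
def split_once(text, spacers):
    """One left-to-right pass: the leftmost match wins, list order breaks ties."""
    out, start, i = [], 0, 0
    while i < len(text):
        hit = next((s for s in spacers if s and text.startswith(s, i)), None)
        if hit is None:
            i += 1
            continue
        out += [text[start:i], hit]   # text before the spacer, then the spacer
        i += len(hit)
        start = i
    out.append(text[start:])          # remaining text (may be empty)
    return out


def split_in_passes(text, spacer_chunks):
    """Apply each chunk of spacers in turn to the pieces of the previous pass."""
    pieces = [text]
    for chunk in spacer_chunks:
        pieces = [part for piece in pieces for part in split_once(piece, chunk)]
    return pieces


single = split_once("ABBA", ["BB", "AB"])            # both spacers in one call
chunked = split_in_passes("ABBA", [["BB"], ["AB"]])  # "AB" only in the second call
```

Under this model `single` is `["", "AB", "BA"]` while `chunked` is `["A", "BB", "A"]`: once "BB" has been cut out in the first pass, no "AB" is left for the second pass to find.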

Well, it's just that I can't help but feel that there must be an easier way to do it. Hmm, if separators take precedence over spacers, then calling llParseStringKeepNulls() with only the separators until that list is consumed should preserve that behaviour; then you can do as you please with spacers. Now that I think about it, spacers do require slightly different handling anyway, as once a spacer has been ripped out it shouldn't be processed again. Sorry, it's something I've revisited recently, since I never normally have more than 8 spacers or separators. My particular case still has a known limit, and it's separators only, so it is easily enough fixed with a pair of calls, but naturally the more general case gets stuck in my head, so I've been investigating it =)
-- Haravikk (talk|contribs) 14:31, 15 December 2010 (UTC)
