Talk:ParseString2List

From Second Life Wiki

Hi Strife,

You did a really good job with this contribution. I just added a description of this script to the 'LSL Script Library' section in the 'LSL Library' category, but stumbled over this entry's name.

Why didn't you just name it 'ParseString2List' according to the standard naming conventions for functions?

Greetz --Huney Jewell 04:02, 18 September 2007 (PDT)

Not really sure, was pretty tired by the time I made that contribution. The function naming standard is a bit informal. Seemed like a good idea at the time... which was after having spent many hours on a jet. I wrote it and posted it while I should have been packing for my trip last week, thanks for putting it in the library. -- Strife Onizuka 21:22, 18 September 2007 (PDT)

Strife, maybe you misunderstood. What I meant concerns only the name of this entry (not its function): 'Parse String To List' instead of 'ParseString2List', as is usual for LSL functions.

Greetz --Huney Jewell 01:39, 19 September 2007 (PDT)

Ah. In that light it seems like a good idea to sync them. I'll move the pages. -- Strife Onizuka 11:04, 19 September 2007 (PDT)

Optimized

I wrote an optimized version for kicks; it should work, but I haven't tested it yet. -- Strife Onizuka

Tested and posting on main page. -- Strife Onizuka 16:29, 19 September 2007 (PDT)

How it works

Here is a breakdown of how this function works (the source is highly optimized, which obscures the methods used).

Steps
  1. Set the initial value of index to the first character in the string.
  2. Generate a list that contains the first occurrence of each separator and spacer (along with a pointer back to the spacer/separator).
  3. Sort the list by first occurrence.
  4. Pop the first item off the list. If its occurrence is before index, go to step 7.
  5. Add the preceding non-spacer/separator text to the output list. If it is a spacer, add the spacer to the output list too.
  6. Advance the index pointer to the first character after the spacer/separator.
  7. Find the next occurrence of the spacer/separator and if it exists push it onto the far end of the list.
  8. Go to step 3 if the spacer/separator occurrence list is not empty.
  9. Return the output list with any remaining text appended.
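
The steps above can be sketched as a small model. This is Python rather than LSL so that it can be run outside Second Life, and it is only an illustration of the nine steps, not the optimized source on the main page. The name parse_string_to_list and the (position, token number) tuples are assumptions made for this sketch.

```python
def parse_string_to_list(src, separators, spacers):
    """Illustrative model of steps 1-9 (not the optimized LSL source)."""
    # Separators are numbered before spacers, so they win position ties.
    tokens = list(separators) + list(spacers)
    first_spacer = len(separators)
    out = []
    index = 0                                   # step 1
    pending = []                                # step 2
    for n, tok in enumerate(tokens):
        if tok:                                 # skip empty tokens (assumption)
            pos = src.find(tok)
            if pos != -1:
                pending.append((pos, n))
    while pending:                              # step 8
        pending.sort()                          # step 3
        pos, n = pending.pop(0)                 # step 4
        tok = tokens[n]
        if pos >= index:
            out.append(src[index:pos])          # step 5: preceding text
            if n >= first_spacer:
                out.append(tok)                 # spacers are kept in the output
            index = pos + len(tok)              # step 6
        nxt = src.find(tok, index)              # step 7
        if nxt != -1:
            pending.append((nxt, n))
    out.append(src[index:])                     # step 9
    return out


print(parse_string_to_list("a,b;c", [","], [";"]))  # ['a', 'b', ';', 'c']
```

Sorting (position, token number) tuples gives the same ordering that the packed integer described under Optimizations gives: earliest occurrence first, with ties broken by the token's place in the input lists.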
Optimizations
  • Instead of using a strided list containing the spacer/separator and an integer we use just a single integer. The low bits are the index of the spacer/separator, the high bits are the occurrence position. This has the added benefit that it solves the problem of ensuring the proper behavior when collisions happen. The low bits restore the original order when collisions occur.
  • While it would be easy to add a special loop for the purpose of checking the following entries in the list to see if their occurrence indexes are now invalid (due to being less than index) it would require more code. The runtime speed gained needs to be balanced against the size requirements of the code.
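
The single-integer encoding from the first bullet can be illustrated as follows. This is a Python sketch, not the LSL source. The 11-bit width is taken from the ">> 11" shift quoted further down this page, and the names pack, unpack, INDEX_BITS and INDEX_MASK are assumptions made for the illustration.

```python
INDEX_BITS = 11                      # low bits: index of the spacer/separator
INDEX_MASK = (1 << INDEX_BITS) - 1   # 0x7FF


def pack(position, token_index):
    """Combine an occurrence position and a token index into one integer."""
    return (position << INDEX_BITS) | token_index


def unpack(packed):
    """Recover (position, token_index) from a packed integer."""
    return packed >> INDEX_BITS, packed & INDEX_MASK


# Two tokens collide at position 2; a plain integer sort puts the
# lower token index first, restoring the original list order.
entries = sorted([pack(5, 0), pack(2, 3), pack(2, 1)])
print([unpack(e) for e in entries])  # [(2, 1), (2, 3), (5, 0)]
```

Because the position sits in the high bits, an ordinary ascending sort orders entries by occurrence, and the low bits act only as a tie-breaker.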

-- Strife (talk|contribs) 01:56, 2 May 2010 (UTC)

Simpler solution?

Haven't had a chance to really compare, but is there a particular advantage for the function on the main page compared to something simpler like this?
<lsl>list ParseStringKeepNullsSafe(string str, list separators, list spacers) {
    list ret = [str];
    {
        integer x = separators != [];
        integer y = spacers != [];

        integer l;
        while ((x > 0) && (y > 0)) {
            l = ret != [];

            while ((--l) >= 0)
                ret = llListReplaceList(
                    (ret = []) + ret,
                    llParseStringKeepNulls(llList2String(ret, l), separators, spacers),
                    l, l
                );

            if ((x -= 8) > 0)
                separators = llDeleteSubList((separators = []) + separators, 0, 7);
            if ((y -= 8) > 0)
                spacers = llDeleteSubList((spacers = []) + spacers, 0, 7);
        }
    }
    return (ret = []) + ret;
}</lsl>
-- Haravikk (talk|contribs) 18:38, 14 December 2010 (UTC)

Yes, the one on the main page handles the edge cases properly. Separators always have precedence over spacers; this simplification will change that ordering, and it will change the order of precedence within both separators and spacers. For example, if you have the string "ABBA" and you try to split it by "BB" and "AB" but "AB" gets processed in the second call to llParseStringKeepNulls, "BB" will have caused the split to be ["A","BB","A"] instead of ["AB","BA"]. The edge cases are where the problem is. The implementation on the main page is without a doubt mildly insane and less than optimal, but it was designed for LSO, where strings had to be copied each time they were used. To implement it in the same way it is likely done in the VM would vastly increase script execution time. -- Strife (talk|contribs) 23:09, 14 December 2010 (UTC)
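
The "ABBA" case can be reproduced with a small model. This is a Python sketch, not LSL, and it treats both "BB" and "AB" as spacers, as the quoted output suggests. The helper names split_once and split_in_batches are hypothetical. split_once stands in for a single parsing call and split_in_batches for the approach of calling it repeatedly with one group of spacers at a time.

```python
def split_once(text, spacers):
    """One left-to-right pass; the earliest match wins, ties go to list order."""
    out, start, i = [], 0, 0
    while i < len(text):
        for sp in spacers:
            if sp and text.startswith(sp, i):
                out += [text[start:i], sp]   # text before the spacer, then the spacer
                i += len(sp)
                start = i
                break
        else:
            i += 1                           # no spacer starts here
    out.append(text[start:])
    return out


def split_in_batches(text, batches):
    """Re-split every piece with each batch of spacers in turn."""
    pieces = [text]
    for batch in batches:
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(split_once(piece, batch))
        pieces = next_pieces
    return pieces


print(split_once("ABBA", ["BB", "AB"]))            # ['', 'AB', 'BA']
print(split_in_batches("ABBA", [["BB"], ["AB"]]))  # ['A', 'BB', 'A']
```

In the single pass, "AB" matches at position 0 before "BB" can match at position 1, and the model keeps the empty string that precedes it. In the batched version "BB" is consumed first, so "AB" never gets a chance to match.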
Well it's just that I can't help but feel that there must be an easier way to do it. Hmm, if separators take precedence over spacers, then calling llParseStringKeepNulls() with only the separators until that list is consumed should preserve that behaviour, then you can do as you please with spacers. Now I think about it spacers do require slightly different handling anyway as once a spacer has been ripped out it shouldn't be processed again anyway. Sorry, it's something I've revisited recently since I never normally have more than 8 spacers or separators. My particular case still has a known limit, and it's separators only so easily enough fixed with a pair of calls, but naturally the more general case gets stuck in my head so I've been investigating it =)
-- Haravikk (talk|contribs) 14:31, 15 December 2010 (UTC)
You can use your technique if you know your separators and spacers share no overlaps. Eliminate the edge, and you can simplify the code. Unfortunately I do not know of a simple way to detect overlapping rules. -- Strife (talk|contribs) 23:11, 15 December 2010 (UTC)

Indeterminate Typo

The line

if(offset <= (i = ((r = llList2Integer(separators = llListSort(separators, 1, TRUE), 0))) >> 11)))

has one too many closing parentheses, but I can't decide which one to remove. You can remove one of the group of 3 after the '0', or you can remove one from the very end of the line. Utterly bizarrely, both versions not only compile, but give identical run time results! Surely that shouldn't be the case. Omei Qunhua 03:07, 23 December 2013 (PST)

I've decided it was the very end ) that was surplus, otherwise there would have been a surplus ( and ) pair. I really don't understand your reticence to using lslEditor, Strife, so it's a trifle slow (not that I notice) but it's got to be better than putting up scripts that don't compile. The few anomalies relative to Mono are quickly learned. (And, we do always have the option of pointing them out to those working on newer releases of lslEditor, for everyone's benefit). You have admitted its usefulness elsewhere on the Wiki, so I was surprised to learn you weren't using any tools but were just putting up 'blind' code. :( Omei Qunhua 16:20, 23 December 2013 (PST)

I'm not surprised about the parentheses not mattering. I do not like having to think about order of operations, so I put in way too many. Moreover, there are times when an expression without parentheses may be equivalent in mathematics but falls apart when dealing with the edge cases (though I can't think of any such situation now). I think it's more an issue of not wanting to have to hunt down order-of-operations-induced bugs. It should be the same. It's: (i = (r >> 11)) vs (i = (r) >> 11); the first is the one I intended. The other reason for doing this is that different languages do things differently; it eliminates having to keep multiple languages' OoO straight.
My dev machine was in the middle of upgrading to Windows 8.1 (not that I use 8 for dev, I still boot to 7 for everything), so I was using the laptop instead. I had recently reinstalled the laptop, so there was nothing on it. I tried to run LSLEditor but lacked the correct .NET version. After installing 3 different versions I finally found the correct one, but I had to restart several times in the process and I really wanted to get this out first. Much fun was had. I really really do agree with you and wish I wouldn't do it either. <.< I used to give people grief for the very same practice. I'll try to do better.
I only have one gripe with LSLEditor (other than slow), it does not have the same execution order as SL, so you have to be really careful about being too clever. That and some of the functions are implemented differently (sometimes better sometimes not), my SHA bug was due to that (it really was as simple as changing the macro being used). I will not be implementing SHA-3, too complicated to write a fast enough implementation. -- Strife (talk|contribs) 19:30, 23 December 2013 (PST)
The main reason for this update was that I saw an opportunity to speed up execution. In its current state it will be a bit faster when you have overlapping separators/spacers (it skips a list operation) and a bit slower when you don't (and when I say a bit, I mean two low-cost instructions, 6 bytes in LSO). It would be possible to make it auto-balancing, which would increase the size of the data stream you could split (because you are unlikely to ever need 1024 of either, let alone all 2048 when you combine both). You could also add in a dynamic stream splitter to overcome the length limitation (basically you would have it stop splitting when it got within n characters of the end of the slice, where n is the length of the longest piece; you would then add those n characters onto the next slice and repeat). Mind you, all of this would increase the number of variables. What I really wanted to do was have a specialized loop just for dealing with multiple overlapping delimiters, but the savings would come at the cost of nontrivial code bloat. -- Strife (talk|contribs) 20:08, 23 December 2013 (PST)
Thank you for taking my admonition in good spirit. Nice to see some grown-up behaviour on the Wiki :) Omei Qunhua 09:47, 24 December 2013 (PST)
My hands are truly tied. The grief I gave is immortalized in several places. If I flipped out, you could dig it up and it would prove me, beyond all reasonable doubt, to be an asshole, and that goes against my goal of appearing enigmatic. And shit, I've been doing this almost 10 years, if I got all wound up over every little thing, I wouldn't be here, they would have banned me. Sure I was energized and hot headed during the beginning, now it's more just fun. I try to indulge my OCD in areas which nobody notices (eg the subtype project which is what is causing me to put in so many edits currently, the goal of which is to augment the parameter descriptions with their own articles). -- Strife (talk|contribs) 01:07, 25 December 2013 (PST)