Separate Words

From Second Life Wiki
Revision as of 19:23, 27 September 2007 by Ppaatt Lynagh (talk | contribs) (begin the demo with the quote of its input, end the demo with a print of its input)

Function: list separateWords(string src,list separators,list spacers);

Returns the words of the src string: keeps each spacer as a word of its own, discards the separators, and returns the words found in between.

Parameters:

• string src source string
• list separators separators to be discarded
• list spacers spacers to be kept

This separateWords function works like llParseString2List but accepts more than 8 spacers and more than 8 separators. If your code began with llParseString2List and has since grown to need more than 8 spacers or separators, you might want to call separateWords, keepSpacers, and/or discardSeparators in place of llParseString2List.
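As a rough illustration of the point above, the sketch below passes nine separators in a single call. It assumes the separateWords, keepSpacers, discardSeparators, and keepSpacersDiscardSeparators functions from the Implementation section of this page are pasted above it in the same script. The name nineSeparators and the sample string are made up for this example.

```lsl
// Sketch only: compiles when pasted below the Implementation section of this page.
default
{
    state_entry()
    {
        // Nine separators, one more than the 8 the surrounding text mentions.
        // (nineSeparators is a hypothetical name used only in this example.)
        list nineSeparators = [" ", ",", ";", ":", "|", "!", "?", "&", "#"];

        // No spacers are requested, so nothing but plain words comes back.
        list words = separateWords("a,b;c:d|e!f?g&h#i j", nineSeparators, []);

        // Expected to say: a b c d e f g h i j
        llOwnerSay(llDumpList2String(words, " "));
    }
}
```

The first eight separators are applied in one pass and the ninth ("#") in a second pass, which is why "h#i" still ends up split into two words.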

Like llParseString2List, this separateWords function does not return the empty words that arguably exist between adjacent separators and spacers (i.e., it does not return "any null values generated"). The llParseStringKeepNulls function does return such words.
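The difference can be seen with the two built-in functions alone. This is a minimal sketch with a made-up input string, not part of the original page:

```lsl
default
{
    state_entry()
    {
        string src = "a,,b"; // two adjacent separators

        // The empty word between the commas is dropped: "a" and "b".
        list dropped = llParseString2List(src, [","], []);

        // The empty word between the commas is kept: "a", "" and "b".
        list kept = llParseStringKeepNulls(src, [","], []);

        llOwnerSay((string) llGetListLength(dropped)); // says 2
        llOwnerSay((string) llGetListLength(kept));    // says 3
    }
}
```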

Preconditions to avoid confusion:

  1. Provide lists of strings as the separators and spacers, not mixed lists of strings and floats and such.
  2. Don't let any spacer contain or equal a separator, and don't let any separator contain or equal a spacer.
  3. List each spacer and separator only once.
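The second precondition can be checked mechanically before calling separateWords. The helper below is a hypothetical sketch, not part of the original page: the name disjointTokens is invented here, and it checks neither for mixed types (precondition 1) nor for duplicates (precondition 3).

```lsl
// Hypothetical helper: returns TRUE when no spacer contains or equals a
// separator and no separator contains or equals a spacer.
integer disjointTokens(list separators, list spacers)
{
    integer i;
    integer j;
    integer separating = llGetListLength(separators);
    integer spacing = llGetListLength(spacers);
    for (i = 0; i < separating; ++i)
    {
        string separator = llList2String(separators, i);
        for (j = 0; j < spacing; ++j)
        {
            string spacer = llList2String(spacers, j);
            // llSubStringIndex returns -1 when the pattern does not occur;
            // equal strings count as one containing the other.
            if (llSubStringIndex(separator, spacer) != -1) return FALSE;
            if (llSubStringIndex(spacer, separator) != -1) return FALSE;
        }
    }
    return TRUE;
}

default
{
    state_entry()
    {
        // "-" and "--" overlap, so this pair violates precondition 2.
        llOwnerSay((string) disjointTokens(["--"], ["-"]));     // says 0
        // A space and a hyphen do not overlap.
        llOwnerSay((string) disjointTokens([" ", ","], ["-"])); // says 1
    }
}
```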

Caveats:

Like llParseString2List, this function returns a list of strings, not a mixed list of strings and floats and such. Cast an entry of the list to the type you need, e.g., { list words = separateWords(...); integer value = (integer) llList2String(words, index); }. Remember that the LSL cast to integer works better than llList2Integer, e.g., the cast to integer understands hexadecimal integer literals such as 0x2A.
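A short sketch of the cast described above. It uses llParseString2List directly so that it stands alone; the same pattern applies to the list returned by separateWords. The input string is made up for this example.

```lsl
default
{
    state_entry()
    {
        // Every entry of the parsed list is a string, including "0x2A".
        list words = llParseString2List("0x2A 7", [" "], []);

        // Fetch the entry as a string, then cast it: the cast understands
        // the hexadecimal literal.
        integer value = (integer) llList2String(words, 0);

        llOwnerSay((string) value); // says 42
    }
}
```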

Implementation

// http://wiki.secondlife.com/wiki/Separate_Words

// Keep the spacers, discard the separators, and get the words in between, within
// astonishing limits described at http://wiki.secondlife.com/wiki/llParseString2List

list keepSpacersDiscardSeparators(list sources, list separators, list spacers)
{
    list words = [];

    integer index;
    integer sourcing = llGetListLength(sources);

    for (index = 0; index < sourcing; ++index)
    {
        string source = llList2String(sources, index);
        // Each call is subject to the llParseString2List limits linked above.
        words += llParseString2List(source, separators, spacers);
    }
    return words;
}

// Keep the spacers and get the words in between.

list keepSpacers(list sources, list spacers)
{
    list words = sources;
    integer index;
    integer spacing = llGetListLength(spacers);
    // Pass the spacers along 8 at a time, reparsing the words after each pass.
    for (index = 0; index < spacing; index += 8)
    {
        list someSpacers = llList2List(spacers, index, index + 8 - 1);
        words = keepSpacersDiscardSeparators(words, [], someSpacers);
    }
    return words;
}

// Discard the separators but get the words in between.

list discardSeparators(list sources, list separators)
{
    list words = sources;
    integer index;
    integer separating = llGetListLength(separators);
    // Pass the separators along 8 at a time, reparsing the words after each pass.
    for (index = 0; index < separating; index += 8)
    {
        list someSeparators = llList2List(separators, index, index + 8 - 1);
        words = keepSpacersDiscardSeparators(words, someSeparators, []);
    }
    return words;
}

// Keep the spacers and discard the separators and get the words in between.

list separateWords(string src, list separators, list spacers)
{
    return discardSeparators(keepSpacers([src], spacers), separators);
}

Demo

Asking to keep the spacers, discard the separators, and get the words between out of this src:

42 0.99 "00000000-0000-0000-0000-000000000000" [abc, def] "xyz\\"zyx ijk" <0, 1, 2, 3> // source literals

says:
0: 42
1: 0.99
2: "
3: 00000000
4: -
5: 0000
6: -
7: 0000
8: -
9: 0000
10: -
11: 000000000000
12: "
13: [
14: abc
15: def
16: ]
17: "
18: xyz
19: \
20: \
21: "
22: zyx
23: ijk
24: "
25: <
26: 0
27: 1
28: 2
29: 3
30: >
31: /
32: /
33: source
34: literals
42 0.99 "00000000-0000-0000-0000-000000000000" [abc, def] "xyz\\"zyx ijk" <0, 1, 2, 3> // source literals
OK


// Demo keeping the spacers, discarding the separators, and getting the words in between.

string src()
{
    return "42 0.99 \"00000000-0000-0000-0000-000000000000\"" +
        " [abc, def] \"xyz\\\\\"zyx ijk\" <0, 1, 2, 3> // source literals";
}

string LF = "\n";
string DQUOTE = "\""; // double quote
string ESCAPE = "\\";

list spacers = [DQUOTE, "(", ")", "<", ">", "[", "]", "/", "+", "-", "*", "%", ESCAPE];

list separators()
{
    string TAB = llUnescapeURL("%09"); // != "\t"
    string CR = llUnescapeURL("%0D"); // != "\r"
    return [TAB, LF, CR, " ", ",", ";"];
}

ownerSayStrings(list strings)
{
    integer stringing = llGetListLength(strings);
    integer index;
    for (index = 0; index < stringing; ++index)
    {
        llOwnerSay((string) index + ": " + llList2String(strings, index));
    }
}

default
{
    state_entry()
    {
        list words = separateWords(src(), separators(), spacers);
        ownerSayStrings(words);
        llOwnerSay(src());
        llOwnerSay("OK");
    }
}