User:rhet0rica Resident/glob match

From Second Life Wiki
< User:Rhet0rica Resident
Revision as of 16:06, 3 July 2024 by Rhet0rica Resident (talk | contribs) (Created page with "{{LSLC|User-Defined Functions}} {{LSL_Function |func=glob |func_desc=Ported by rhet0rica from [https://github.com/torvalds/linux/blob/master/lib/gl...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Summary

Function: integer glob( string pat, string str );

Ported by rhet0rica from the Linux kernel, this function takes a pattern string and a query string as input and returns TRUE or FALSE depending on whether the query satisfies the pattern. This is the same method used by POSIX systems to match wildcards in filenames. License: Dual MIT/GPL.
integer glob_match(string pat, string str)
{
	/*
	 * Backtrack to previous * on mismatch and retry starting one
	 * character later in the string.  Because * matches all characters
	 * (no exception for /), it can be easily proved that there's
	 * never a need to backtrack multiple levels.
	 */
	
	integer back_pat = NOWHERE;
	integer back_str;
	
	/*
	 * Loop over each token (character or class) in pat, matching
	 * it against the remaining unmatched tail of str.  Return false
	 * on mismatch, or true after matching the trailing nul bytes.
	 */
	
	integer str_i;
	integer pat_i;
		
	while (TRUE) {
		integer c = llOrd(str, str_i++);
		integer d = llOrd(pat, pat_i++);
		
		/* Wildcard: anything but nul */
		if(d == 0x3f) { // '?'
			if (c == 0x00) // EOL
				// return false;
				return FALSE;
			//break;
			jump skip_default;
		/* Any-length wildcard */
		} else if(d == 0x2a) { // '*'
			/* Optimize trailing * case */
			if(llOrd(pat, pat_i) == 0x00)
				return TRUE;
			back_pat = pat_i;
			/* Allow zero-length match */
			back_str = --str_i;
			jump skip_default;
		/* Character class */
		} else if(d == 0x5b) { // '['
			integer match = FALSE;
			integer inverted = (llOrd(pat, pat_i) == 0x21); // '!'
			integer class_i = pat_i + inverted;
			integer a = llOrd(pat, class_i++);

			/*
			 * Iterate over each span in the character class.
			 * A span is either a single character a, or a
			 * range a-b.  The first span may begin with ']'.
			 */
			do {
				integer b = a;

				/* Malformed */
				if(a == 0x00)
					// goto literal;
					jump literal; // FIXME

				if(llOrd(pat, class_i + 0) == 0x2d && llOrd(pat, class_i + 1) != 0x5d) {
					b = llOrd(pat, class_i + 1);

					if(b == 0x00)
						jump literal;

					class_i += 2;
					/* Any special action if a > b? */
				}
				match = match | (a <= c && c <= b);
			} while((a = llOrd(pat, class_i++)) != 0x5d);

			if (match == inverted)
				jump backtrack;
			pat_i = class_i;
			jump skip_default;
		} else if(d == 0x5c) { // '\\'
			// d = *pat++;
			d = llOrd(pat, pat_i++);
			// fallthrough;
		}
		//default:	/* Literal character */
		@literal;
			if (c == d) {
				//if (d == '\0')
				if(d == 0x00)
					return TRUE;
				jump skip_default;
			}
		@backtrack;
			if(c == 0x00 || !~back_pat)
				/* No point continuing */
				return FALSE;
			/* Try again from last *, one character later in str. */
			pat_i = back_pat;
			str_i = ++back_str;
			jump skip_default;
		@skip_default;
		#ifdef DEBUG_GLOB
		last_d = d;
		last_c = c;
		last_str_i = str_i;
		last_pat_i = pat_i;
		#endif
	}
	return FALSE; // unreachable
}

Returns an integer TRUE or FALSE.
• string pat The string to use as a template. * matches zero or more characters, ? matches exactly one character, [abc] matches a, b, or c, and [!abc] matches any character but a, b, or c. Other characters are interpreted as literals.
• string str The query string to evaluate.

Caveats

  • An unmatched [ in the pattern string will be interpreted as a literal left square bracket.
  • This is not a full regular expression implementation. Hyphens in character classes (square brackets) are not supported.

Examples

Deep Notes

Signature

function integer glob( string pat, string str );