Difference between revisions of "User:Rhet0rica Resident/glob match"

From Second Life Wiki
Jump to navigation Jump to search
(Created page with "{{LSLC|User-Defined Functions}} {{LSL_Function |func=glob |func_desc=Ported by rhet0rica from [https://github.com/torvalds/linux/blob/master/lib/gl...")
 
Line 1: Line 1:
{{LSLC|User-Defined Functions}}
{{LSLC|User-Defined Functions}}
{{LSL_Function
{{LSL_Function
|func=glob
|func=glob_match
|func_desc=Ported by [[User:Rhet0rica Resident|rhet0rica]] from [https://github.com/torvalds/linux/blob/master/lib/glob.c the Linux kernel], this function takes a pattern string and a query string as input and returns [[TRUE]] or [[FALSE]] depending on whether the query satisfies the pattern. This is the same method used by POSIX systems to match wildcards in filenames. License: Dual MIT/GPL. <syntaxhighlight lang="lsl">integer glob_match(string pat, string str)
|func_desc=Ported by [[User:Rhet0rica Resident|rhet0rica]] from [https://github.com/torvalds/linux/blob/master/lib/glob.c the Linux kernel], this function takes a pattern string and a query string as input and returns [[TRUE]] or [[FALSE]] depending on whether the query satisfies the pattern. This is the same method used by POSIX systems to match wildcards in filenames. License: Dual MIT/GPL. <syntaxhighlight lang="lsl">
 
/**
* glob_match - Shell-style pattern matching, like !fnmatch(pat, str, 0)
* @pat: Shell-style pattern to match, e.g. "*.[ch]".
* @str: String to match.  The pattern must match the entire string.
*
* Perform shell-style glob matching, returning true (1) if the match
* succeeds, or false (0) if it fails.  Equivalent to !fnmatch(@pat, @str, 0).
*
* Pattern metacharacters are ?, *, [ and \.
* (And, inside character classes, !, - and ].)
*
* This is small and simple implementation intended for device blacklists
* where a string is matched against a number of patterns.  Thus, it
* does not preprocess the patterns.  It is non-recursive, and run-time
* is at most quadratic: strlen(@str)*strlen(@pat).
*
* An example of the worst case is glob_match("*aaaaa", "aaaaaaaaaa");
* it takes 6 passes over the pattern before matching the string.
*
* Like !fnmatch(@pat, @str, 0) and unlike the shell, this does NOT
* treat / or leading . specially; it isn't actually used for pathnames.
*
* Note that according to glob(7) (and unlike bash), character classes
* are complemented by a leading !; this does not support the regex-style
* [^a-z] syntax.
*
* An opening bracket without a matching close is matched literally.
*/
 
integer glob_match(string pat, string str)
{
{
/*
/*
Line 30: Line 61:
if(d == 0x3f) { // '?'
if(d == 0x3f) { // '?'
if (c == 0x00) // EOL
if (c == 0x00) // EOL
// return false;
return FALSE;
return FALSE;
//break;
jump skip_default;
jump skip_default;
/* Any-length wildcard */
/* Any-length wildcard */
Line 60: Line 89:
/* Malformed */
/* Malformed */
if(a == 0x00)
if(a == 0x00)
// goto literal;
jump literal; // FIXME
jump literal; // FIXME


Line 80: Line 108:
jump skip_default;
jump skip_default;
} else if(d == 0x5c) { // '\\'
} else if(d == 0x5c) { // '\\'
// d = *pat++;
d = llOrd(pat, pat_i++);
d = llOrd(pat, pat_i++);
// fallthrough;
}
}
//default: /* Literal character */
//default: /* Literal character */
@literal;
@literal;
if (c == d) {
if (c == d) {
//if (d == '\0')
if(d == 0x00)
if(d == 0x00)
return TRUE;
return TRUE;
Line 125: Line 150:
|caveats=* An unmatched <samp>[</samp> in the pattern string will be interpreted as a literal left square bracket.
|caveats=* An unmatched <samp>[</samp> in the pattern string will be interpreted as a literal left square bracket.
* This is not a full regular expression implementation. Hyphens in character classes (square brackets) are not supported.
* This is not a full regular expression implementation. Hyphens in character classes (square brackets) are not supported.
|examples
|examples=<syntaxhighlight lang="lsl">glob_match("a*c", "ac"); // TRUE
glob_match("a*c*", "alpha centauri"); // TRUE
glob_match("a?c", "ac"); // FALSE
glob_match("*a*c?", "space"); // TRUE
glob_match("gr[ae]y", "grey"); // TRUE
glob_match("gr[ae]y", "gray"); // TRUE
glob_match("gr[!ae]y", "grXy"); // TRUE
glob_match("gr[!ae]y", "gry"); // FALSE</syntaxhighlight>
|helpers
|helpers
|also_header
|also_header
Line 133: Line 165:
|also_articles
|also_articles
|also_footer
|also_footer
|notes
|notes=* For case-insensitive match, use [[llToLower]] first.
|mode
|mode
|deprecated
|deprecated

Revision as of 16:14, 3 July 2024

Summary

Function: integer glob_match( string pat, string str );

Ported by rhet0rica from the Linux kernel, this function takes a pattern string and a query string as input and returns TRUE or FALSE depending on whether the query satisfies the pattern. This is the same method used by POSIX systems to match wildcards in filenames. License: Dual MIT/GPL.
/**
 * glob_match - Shell-style pattern matching, like !fnmatch(pat, str, 0)
 * @pat: Shell-style pattern to match, e.g. "*.[ch]".
 * @str: String to match.  The pattern must match the entire string.
 *
 * Perform shell-style glob matching, returning true (1) if the match
 * succeeds, or false (0) if it fails.  Equivalent to !fnmatch(@pat, @str, 0).
 *
 * Pattern metacharacters are ?, *, [ and \.
 * (And, inside character classes, !, - and ].)
 *
 * This is small and simple implementation intended for device blacklists
 * where a string is matched against a number of patterns.  Thus, it
 * does not preprocess the patterns.  It is non-recursive, and run-time
 * is at most quadratic: strlen(@str)*strlen(@pat).
 *
 * An example of the worst case is glob_match("*aaaaa", "aaaaaaaaaa");
 * it takes 6 passes over the pattern before matching the string.
 *
 * Like !fnmatch(@pat, @str, 0) and unlike the shell, this does NOT
 * treat / or leading . specially; it isn't actually used for pathnames.
 *
 * Note that according to glob(7) (and unlike bash), character classes
 * are complemented by a leading !; this does not support the regex-style
 * [^a-z] syntax.
 *
 * An opening bracket without a matching close is matched literally.
 */

integer glob_match(string pat, string str)
{
	/*
	 * Backtrack to previous * on mismatch and retry starting one
	 * character later in the string.  Because * matches all characters
	 * (no exception for /), it can be easily proved that there's
	 * never a need to backtrack multiple levels.
	 */
	
	integer back_pat = NOWHERE;
	integer back_str;
	
	/*
	 * Loop over each token (character or class) in pat, matching
	 * it against the remaining unmatched tail of str.  Return false
	 * on mismatch, or true after matching the trailing nul bytes.
	 */
	
	integer str_i;
	integer pat_i;
		
	while (TRUE) {
		integer c = llOrd(str, str_i++);
		integer d = llOrd(pat, pat_i++);
		
		/* Wildcard: anything but nul */
		if(d == 0x3f) { // '?'
			if (c == 0x00) // EOL
				return FALSE;
			jump skip_default;
		/* Any-length wildcard */
		} else if(d == 0x2a) { // '*'
			/* Optimize trailing * case */
			if(llOrd(pat, pat_i) == 0x00)
				return TRUE;
			back_pat = pat_i;
			/* Allow zero-length match */
			back_str = --str_i;
			jump skip_default;
		/* Character class */
		} else if(d == 0x5b) { // '['
			integer match = FALSE;
			integer inverted = (llOrd(pat, pat_i) == 0x21); // '!'
			integer class_i = pat_i + inverted;
			integer a = llOrd(pat, class_i++);

			/*
			 * Iterate over each span in the character class.
			 * A span is either a single character a, or a
			 * range a-b.  The first span may begin with ']'.
			 */
			do {
				integer b = a;

				/* Malformed */
				if(a == 0x00)
					jump literal; // FIXME

				if(llOrd(pat, class_i + 0) == 0x2d && llOrd(pat, class_i + 1) != 0x5d) {
					b = llOrd(pat, class_i + 1);

					if(b == 0x00)
						jump literal;

					class_i += 2;
					/* Any special action if a > b? */
				}
				match = match | (a <= c && c <= b);
			} while((a = llOrd(pat, class_i++)) != 0x5d);

			if (match == inverted)
				jump backtrack;
			pat_i = class_i;
			jump skip_default;
		} else if(d == 0x5c) { // '\\'
			d = llOrd(pat, pat_i++);
		}
		//default:	/* Literal character */
		@literal;
			if (c == d) {
				if(d == 0x00)
					return TRUE;
				jump skip_default;
			}
		@backtrack;
			if(c == 0x00 || !~back_pat)
				/* No point continuing */
				return FALSE;
			/* Try again from last *, one character later in str. */
			pat_i = back_pat;
			str_i = ++back_str;
			jump skip_default;
		@skip_default;
		#ifdef DEBUG_GLOB
		last_d = d;
		last_c = c;
		last_str_i = str_i;
		last_pat_i = pat_i;
		#endif
	}
	return FALSE; // unreachable
}

Returns an integer TRUE or FALSE.
• string pat The string to use as a template. * matches zero or more characters, ? matches exactly one character, [abc] matches a, b, or c, and [!abc] matches any character but a, b, or c. Other characters are interpreted as literals.
• string str The query string to evaluate.

Caveats

  • An unmatched [ in the pattern string will be interpreted as a literal left square bracket.
  • This is not a full regular expression implementation. Hyphens in character classes (square brackets) are not supported.

Examples

glob_match("a*c", "ac"); // TRUE
glob_match("a*c*", "alpha centauri"); // TRUE
glob_match("a?c", "ac"); // FALSE
glob_match("*a*c?", "space"); // TRUE
glob_match("gr[ae]y", "grey"); // TRUE
glob_match("gr[ae]y", "gray"); // TRUE
glob_match("gr[!ae]y", "grXy"); // TRUE
glob_match("gr[!ae]y", "gry"); // FALSE

Notes

  • For case-insensitive match, use llToLower first.

Deep Notes

Signature

function integer glob_match( string pat, string str );