Package: Util.Strings

This piece of software is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. This software is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License with this distribution, see file "GPL.txt". If not, write to the Free Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

As a special exception from the GPL, if other files instantiate generics from this unit, or you link this unit with other files to produce an executable, this unit does not by itself cause the resulting executable to be covered by the GPL. This exception does not however invalidate any other reasons why the executable file might be covered by the GPL.

Version: 1.1

Author: Thomas Wolf (TW) <twolf AT acm DOT org>

Purpose: Various string utilities not provided in the standard library. Some standard-library operations are also repeated here, so that one can get everything one needs with a single "with".

Tasking Semantics: Neither task- nor abortion-safe.

Storage Semantics: No dynamic storage allocation.

Illegal_Pattern: Raised by Wildcard_Match if a pattern is malformed.

Blanks: Anything that can be considered white space: not just a blank, but also tabs, non-breaking spaces, carriage returns, and so on.

7-bit ASCII letters, i.e. A-Z and a-z.

No_Escape: This constant is used to indicate to the string parsing operations Get_String and In_String that string delimiters cannot be escaped.

Quotes typically recognized by command shells: double, single, and back quote.

Typical string quotes: double and single quotes.

Maps every character immediately following an underscore ('_'), a period ('.'), or white space (as defined by Blanks) to upper case, and all other characters to lower case.
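
The mapping rule can be modelled in a few lines of Ada. This is only a sketch, not the package's implementation: Mixed_Case is a hypothetical name, white space is approximated by blank and horizontal tab instead of the full Blanks set, and the description is followed literally, so the very first character is lower-cased because nothing precedes it.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Characters.Latin_1;
with Ada.Text_IO;

procedure Mixed_Case_Demo is

   --  Hypothetical helper; not the routine exported by Util.Strings.
   function Mixed_Case (S : String) return String is
      Result     : String (S'Range);
      Upper_Next : Boolean := False;
   begin
      for I in S'Range loop
         if Upper_Next then
            Result (I) := To_Upper (S (I));
         else
            Result (I) := To_Lower (S (I));
         end if;
         --  Assumption: only blank and tab count as white space here.
         Upper_Next := S (I) = '_'
           or else S (I) = '.'
           or else S (I) = ' '
           or else S (I) = Ada.Characters.Latin_1.HT;
      end loop;
      return Result;
   end Mixed_Case;

begin
   --  Prints: util.Strings_Demo X
   Ada.Text_IO.Put_Line (Mixed_Case ("uTIL.strings_demo x"));
end Mixed_Case_Demo;
```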

Returns the number of occurrences of Ch in the string Src.

As Ada.Strings.Fixed.Count, but without mapping and therefore way faster.

Returns the index of the first (or last, if Dir is Backward) occurrence of Ch in the string Src, or zero if no occurrence of this character can be found.
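
As an illustration of the character-based counting and searching described above, the following self-contained sketch shows one plausible shape for such routines. Char_Count and Char_Index are hypothetical names, the parameter names Src, Ch and Dir are taken from the descriptions, and the default direction is an assumption.

```ada
with Ada.Strings; use Ada.Strings;
with Ada.Text_IO;

procedure Char_Index_Demo is

   --  Hypothetical stand-in: counts occurrences of Ch without any mapping.
   function Char_Count (Src : String; Ch : Character) return Natural is
      N : Natural := 0;
   begin
      for I in Src'Range loop
         if Src (I) = Ch then
            N := N + 1;
         end if;
      end loop;
      return N;
   end Char_Count;

   --  Hypothetical stand-in: first or last occurrence of Ch, or zero.
   function Char_Index
     (Src : String;
      Ch  : Character;
      Dir : Direction := Forward) return Natural is
   begin
      if Dir = Forward then
         for I in Src'Range loop
            if Src (I) = Ch then
               return I;
            end if;
         end loop;
      else
         for I in reverse Src'Range loop
            if Src (I) = Ch then
               return I;
            end if;
         end loop;
      end if;
      return 0;
   end Char_Index;

   Path : constant String := "a/b/c";

begin
   Ada.Text_IO.Put_Line (Natural'Image (Char_Count (Path, '/')));            --  2
   Ada.Text_IO.Put_Line (Natural'Image (Char_Index (Path, '/')));            --  2
   Ada.Text_IO.Put_Line (Natural'Image (Char_Index (Path, '/', Backward)));  --  4
   Ada.Text_IO.Put_Line (Natural'Image (Char_Index (Path, 'x')));            --  0
end Char_Index_Demo;
```

Returning zero for "not found" works because the index subtype of String is Positive, so zero can never be a valid position.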

As Index, but hard-wired to searching forward.

As Index, but hard-wired to searching backward.

As Index, but hard-wired to searching forward. Way faster than Ada.Strings.Fixed.Index, also because no mapping is applied.

As Index, but hard-wired to searching backward. Way faster than Ada.Strings.Fixed.Index, also because no mapping is applied.

As Ada.Strings.Fixed.Index, but hard-wired to not using a mapping.

Returns True if Source starts with Prefix, False otherwise.

Returns True if Source ends with Suffix, False otherwise.
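
A minimal sketch of the prefix and suffix tests, using slice comparison. Starts_With and Ends_With are hypothetical names, and treating an empty Prefix or Suffix as always matching is an assumption that the descriptions above do not settle.

```ada
with Ada.Text_IO;

procedure Prefix_Suffix_Demo is

   --  Hypothetical stand-in for the prefix test.
   function Starts_With (Source, Prefix : String) return Boolean is
   begin
      return Prefix'Length <= Source'Length
        and then Source (Source'First .. Source'First + Prefix'Length - 1)
                   = Prefix;
   end Starts_With;

   --  Hypothetical stand-in for the suffix test.
   function Ends_With (Source, Suffix : String) return Boolean is
   begin
      return Suffix'Length <= Source'Length
        and then Source (Source'Last - Suffix'Length + 1 .. Source'Last)
                   = Suffix;
   end Ends_With;

   Name : constant String := "util-strings.ads";

begin
   Ada.Text_IO.Put_Line (Boolean'Image (Starts_With (Name, "util-")));  --  TRUE
   Ada.Text_IO.Put_Line (Boolean'Image (Ends_With (Name, ".adb")));     --  FALSE
   Ada.Text_IO.Put_Line (Boolean'Image (Ends_With (Name, ".ads")));     --  TRUE
end Prefix_Suffix_Demo;
```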

Returns Ada.Strings.Maps.Is_In (Ch, Blanks).

Returns Ada.Strings.Maps.Is_In (Ch, Set). Provided mainly because I very often mix up the order of the arguments.

Removes all characters in Blanks declared above from the specified string end.

Removes the characters in the specified character sets from the ends of the string. The point of this renaming is the default parameter.

Returns in From and To the indices of the beginning and end of the next string in S.

A string is defined as a sequence of characters enclosed by Delim; any occurrences of Delim after the first Delim that are immediately preceded by Escape do not yet terminate the string but are part of the string's content.

The value of Escape determines how delimiters inside a string are written:

If Escape /= Delim, delimiters that are part of the string must follow an Escape immediately. Two Escapes in a row are considered one literal Escape. For instance, with Delim = '"' and Escape = '\', the operation recognizes C strings.
If Escape = Delim, delimiters that are part of the string must be doubled, as in Ada strings.
If Escape = No_Escape, strings cannot contain instances of the delimiter. The second occurrence of a delimiter in S is the string end.

If no string is found, both From and To are zero.

If an unterminated string is found, From is the index of the opening occurrence of Delim, and To is zero.

Otherwise, a string was found, and From and To are the indices of the opening and closing occurrences of Delim, respectively.
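
The three outcomes for From and To, and the three Escape conventions, can be traced with a small stand-alone model. This is a sketch under stated assumptions, not the package's code: Find_String is a hypothetical name, the value chosen for No_Escape (NUL) is invented for illustration, and an Escape is assumed to protect whatever single character follows it.

```ada
with Ada.Text_IO;

procedure Get_String_Demo is

   --  Assumption: the actual value of No_Escape is not given in the text.
   No_Escape : constant Character := Character'Val (0);

   --  Hypothetical stand-in modelling the described From/To results.
   procedure Find_String
     (S        : in  String;
      From, To : out Natural;
      Delim    : in  Character := '"';
      Escape   : in  Character := No_Escape)
   is
      I : Integer;
   begin
      From := 0;
      To   := 0;
      for J in S'Range loop            --  locate the opening delimiter
         if S (J) = Delim then
            From := J;
            exit;
         end if;
      end loop;
      if From = 0 then
         return;                       --  no string at all
      end if;
      I := From + 1;
      while I <= S'Last loop           --  look for the closing delimiter
         if Escape /= No_Escape and then Escape /= Delim
           and then S (I) = Escape
         then
            I := I + 2;                --  skip the escaped character
         elsif S (I) = Delim then
            if Escape = Delim and then I < S'Last
              and then S (I + 1) = Delim
            then
               I := I + 2;             --  doubled delimiter is content
            else
               To := I;
               return;
            end if;
         else
            I := I + 1;
         end if;
      end loop;                        --  unterminated: To stays zero
   end Find_String;

   procedure Show (S : String; Escape : Character) is
      From, To : Natural;
   begin
      Find_String (S, From, To, '"', Escape);
      Ada.Text_IO.Put_Line (Natural'Image (From) & Natural'Image (To));
   end Show;

begin
   Show ("say ""hi"" now", No_Escape);  --  prints " 5 8"
   Show ("x ""a\""b"" y", '\');         --  C style:      " 3 8"
   Show ("""a""""b""", '"');            --  Ada style:    " 1 6"
   Show ("x ""abc", No_Escape);         --  unterminated: " 3 0"
   Show ("no string", No_Escape);       --  none:         " 0 0"
end Get_String_Demo;
```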

Returns True if S ends within an unterminated string (as described above), and False otherwise.

Returns the index of the closing occurrence of Delim of the string in S. S (S'First) should be the opening occurrence of Delim. The semantics of Delim and Escape are as for Get_String.

Returns zero if no closing occurrence of Delim can be found in S.

Quotes a string. S is supposed to contain the string's contents (without the delimiters). Any embedded delimiter is quoted as follows:

If Escape = No_Escape, S is returned.
If Escape = Delim, all occurrences of Delim in S are replaced by two Delims.
Otherwise, an Escape is inserted before any occurrence of Delim or Escape in S.

Unquotes embedded delimiters in a string. S is supposed to contain the string's contents without the bounding delimiters.

If Escape = No_Escape, S is returned.
If Escape = Delim, all non-overlapping occurrences of two consecutive Delims in S are replaced by a single Delim.
Otherwise, any non-overlapping occurrence of two Escapes in S is replaced by a single Escape, and any occurrence of an Escape immediately followed by a Delim is replaced by a single Delim.

In all cases, the following is true:

    Unquote (Quote (S, Delim, Escape), Delim, Escape) = S
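
The round-trip property can be checked against a simple model of the two operations. The bodies below are a sketch written from the rules listed above, not the package's source; the value of No_Escape (NUL) is an assumption made for illustration. Fixed-size buffers are used so that the sketch, like the package, needs no dynamic storage allocation.

```ada
with Ada.Text_IO;

procedure Quote_Demo is

   --  Assumption: the actual value of No_Escape is not given in the text.
   No_Escape : constant Character := Character'Val (0);

   function Quote (S : String; Delim, Escape : Character) return String is
      Result : String (1 .. 2 * S'Length);  --  worst case: all escaped
      Last   : Natural := 0;
   begin
      if Escape = No_Escape then
         return S;
      end if;
      for I in S'Range loop
         --  With Escape = Delim this doubles every delimiter.
         if S (I) = Delim or else S (I) = Escape then
            Last := Last + 1;
            Result (Last) := Escape;
         end if;
         Last := Last + 1;
         Result (Last) := S (I);
      end loop;
      return Result (1 .. Last);
   end Quote;

   function Unquote (S : String; Delim, Escape : Character) return String is
      Result : String (1 .. S'Length);
      Last   : Natural := 0;
      I      : Integer := S'First;
   begin
      if Escape = No_Escape then
         return S;
      end if;
      while I <= S'Last loop
         if S (I) = Escape and then I < S'Last
           and then (S (I + 1) = Delim or else S (I + 1) = Escape)
         then
            I := I + 1;   --  drop the escape, keep the character after it
         end if;
         Last := Last + 1;
         Result (Last) := S (I);
         I := I + 1;
      end loop;
      return Result (1 .. Last);
   end Unquote;

   Text : constant String := "a""b\c";   --  the five characters a"b\c

begin
   Ada.Text_IO.Put_Line (Quote (Text, '"', '\'));   --  a\"b\\c
   Ada.Text_IO.Put_Line (Quote (Text, '"', '"'));   --  a""b\c
   Ada.Text_IO.Put_Line
     (Boolean'Image (Unquote (Quote (Text, '"', '\'), '"', '\') = Text));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Unquote (Quote (Text, '"', '"'), '"', '"') = Text));
end Quote_Demo;
```

Both comparisons print TRUE for this input.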

Unquotes all non-overlapping occurrences of strings within S delimited by any character in Quotes. If Escape = No_Escape, the Ada convention (embedded delimiters must be doubled) is assumed, otherwise, embedded delimiters must be escaped by Escape.

If S starts with an identifier, returns the index of the identifier's last character. Otherwise, returns zero. For the purpose of this function, an identifier has the following syntax:

     Identifier = Letter {Letter | Digit | '_'}.
     Letter     = 'A' .. 'Z' | 'a' .. 'z'.
     Digit      = '0' .. '9'.

Note that this is the Ada 95 syntax, except that multiple underscores in a row are allowed.
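
The grammar translates directly into a scanning loop. The following sketch uses the hypothetical name Identifier_End and tests for 7-bit ASCII letters explicitly, since Ada.Characters.Handling.Is_Letter would also accept Latin-1 letters.

```ada
with Ada.Text_IO;

procedure Identifier_Demo is

   function Is_Ascii_Letter (C : Character) return Boolean is
   begin
      return C in 'A' .. 'Z' or else C in 'a' .. 'z';
   end Is_Ascii_Letter;

   --  Hypothetical stand-in: index of the last character of a leading
   --  identifier in S, or zero if S does not start with one.
   function Identifier_End (S : String) return Natural is
      Last : Integer;
   begin
      if S'Length = 0 or else not Is_Ascii_Letter (S (S'First)) then
         return 0;
      end if;
      Last := S'First;
      while Last < S'Last
        and then (Is_Ascii_Letter (S (Last + 1))
                  or else S (Last + 1) in '0' .. '9'
                  or else S (Last + 1) = '_')
      loop
         Last := Last + 1;
      end loop;
      return Last;
   end Identifier_End;

begin
   --  Consecutive underscores are accepted, as noted above.
   Ada.Text_IO.Put_Line
     (Natural'Image (Identifier_End ("Foo__Bar1 := 2")));             --  9
   Ada.Text_IO.Put_Line (Natural'Image (Identifier_End ("x")));      --  1
   Ada.Text_IO.Put_Line (Natural'Image (Identifier_End ("9lives"))); --  0
end Identifier_Demo;
```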

Returns the index of the first character in S such that Is_Blank (S (I)) = False, or zero if no such character exists in S.

Returns the index of the first character in S for which Is_Blank (S (I)) = True, or zero if there is no such character in S.
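
Both scans can be expressed with the standard Ada.Strings.Fixed.Index overload that takes a character set and a membership test. The sketch below is illustrative only: First_Non_Blank and First_Blank are hypothetical names, and the contents of Blanks are an approximation chosen for the example.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;  use Ada.Strings.Maps;
with Ada.Text_IO;

procedure Blank_Scan_Demo is

   package L1 renames Ada.Characters.Latin_1;

   --  Assumption: an approximation of the package's Blanks set.
   Blanks : constant Character_Set :=
     To_Set (' ' & L1.HT & L1.LF & L1.CR & L1.No_Break_Space);

   --  Hypothetical stand-in: first character that is not a blank.
   function First_Non_Blank (S : String) return Natural is
   begin
      return Ada.Strings.Fixed.Index
               (Source => S, Set => Blanks, Test => Outside);
   end First_Non_Blank;

   --  Hypothetical stand-in: first character that is a blank.
   function First_Blank (S : String) return Natural is
   begin
      return Ada.Strings.Fixed.Index
               (Source => S, Set => Blanks, Test => Inside);
   end First_Blank;

begin
   Ada.Text_IO.Put_Line
     (Natural'Image (First_Non_Blank ("  hello world")));                  --  3
   Ada.Text_IO.Put_Line (Natural'Image (First_Blank ("hello world")));    --  6
   Ada.Text_IO.Put_Line (Natural'Image (First_Non_Blank ("   ")));        --  0
   Ada.Text_IO.Put_Line (Natural'Image (First_Blank ("hello")));          --  0
end Blank_Scan_Demo;
```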

Replaces all non-overlapping occurrences of What in Source by By. Occurrences of What in By are not replaced recursively, as this would lead to an infinite recursion anyway.
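
A short model makes the "non-overlapping" and "not recursive" rules concrete. Replace_All is a hypothetical name, returning Source unchanged for an empty What is an assumption, and Unbounded_String is used only for brevity; it allocates dynamically, which the package itself states it does not do.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Replace_Demo is

   --  Hypothetical stand-in for the replacement operation.
   function Replace_All (Source, What, By : String) return String is
      Result : Unbounded_String;
      Start  : Integer := Source'First;
      Pos    : Natural;
   begin
      if What'Length = 0 then
         return Source;   --  assumption: nothing to search for
      end if;
      loop
         Pos := Ada.Strings.Fixed.Index (Source (Start .. Source'Last), What);
         exit when Pos = 0;
         Append (Result, Source (Start .. Pos - 1));
         Append (Result, By);
         --  Resume after the match, so inserted text is never rescanned.
         Start := Pos + What'Length;
      end loop;
      Append (Result, Source (Start .. Source'Last));
      return To_String (Result);
   end Replace_All;

begin
   Ada.Text_IO.Put_Line (Replace_All ("aaa", "aa", "b"));     --  ba
   Ada.Text_IO.Put_Line (Replace_All ("a-b-c", "-", "--"));   --  a--b--c
end Replace_Demo;
```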

Returns True if the wildcard string Pattern matches the text Text, and False otherwise. Raises Illegal_Pattern if the pattern is malformed.

Wildcard patterns are a simple form of regular expressions. Their syntax is as follows: (This description assumes the default values for all generic parameters.)

? Matches any one character.

* Matches any sequence of characters (zero or more).

[...] The characters between the square brackets define a character set. Matches any one character of the given set.

[!...] Defines an inverted set. Matches any one character not listed.

Character sets are given either by specifying a range ("a-z"), single characters ("xyz") or any combination of the two ("a-zA-Z0123"). If the first character in the set is '!', the set is inverted, i.e. it contains all characters not listed.

Any character that is not one of the meta characters '?', '*', '[', ']', and '\' matches literally. To do a literal match against any meta character, escape it with a backslash, or use a one-character character set.

\? or [?] matches a ?
\* or [*] matches a *
\[ or [[] matches a [
\] or []] matches a ]
\\ or [\] matches a \

In a character set, characters must not and need not be escaped. To include the character '!' in a character set, make sure it is not the character immediately following the '['. To include ']' in a character set, make sure it follows the opening '[' (or the opening "[!" in the case of an inverted set) immediately. To include '-' in a character set, make it either the first or last character of the set, or the lower or upper bound of a range, e.g. "[-a-z]", or "[abc-]", or "[ab --9]", or "[!-./]".

(Note that in "[ab --9]", the set is 'a' or 'b' or (' ' to '-') or '9', not 'a' or 'b' or ' ' or ('-' to '9'), i.e. the earliest interpretation of a range is taken. Also note that the set "[abc--9]" is illegal because in the range "c--", 'c' > '-'. Specify this set as "[--9abc]" instead.)
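
The "earliest interpretation" rule for ranges can be made concrete with a left-to-right scan over the set's contents. The sketch below is a model written from the rules above, not the package's parser: In_Set is a hypothetical name, Set_Body is assumed to be the text between the brackets with any leading '!' already removed, and the exception is a local stand-in for Illegal_Pattern.

```ada
with Ada.Text_IO;

procedure Char_Set_Demo is

   Illegal_Pattern : exception;   --  local stand-in for the real exception

   --  Hypothetical helper: does Ch belong to the set described by Set_Body?
   function In_Set (Ch : Character; Set_Body : String) return Boolean is
      I     : Integer := Set_Body'First;
      Found : Boolean := False;
   begin
      while I <= Set_Body'Last loop
         if I + 2 <= Set_Body'Last and then Set_Body (I + 1) = '-' then
            --  X-Y is read as a range at the earliest possible position.
            if Set_Body (I) > Set_Body (I + 2) then
               raise Illegal_Pattern;
            end if;
            if Ch in Set_Body (I) .. Set_Body (I + 2) then
               Found := True;
            end if;
            I := I + 3;
         else
            if Ch = Set_Body (I) then
               Found := True;
            end if;
            I := I + 1;
         end if;
      end loop;
      return Found;   --  the whole set is scanned so bad ranges are caught
   end In_Set;

begin
   Ada.Text_IO.Put_Line (Boolean'Image (In_Set ('-', "ab --9")));  --  TRUE
   Ada.Text_IO.Put_Line (Boolean'Image (In_Set ('5', "ab --9")));  --  FALSE
   Ada.Text_IO.Put_Line (Boolean'Image (In_Set ('5', "--9abc")));  --  TRUE
   Ada.Text_IO.Put_Line (Boolean'Image (In_Set ('-', "abc-")));    --  TRUE
   begin
      Ada.Text_IO.Put_Line (Boolean'Image (In_Set ('x', "abc--9")));
   exception
      when Illegal_Pattern =>
         Ada.Text_IO.Put_Line ("Illegal_Pattern");   --  'c' > '-'
   end;
end Char_Set_Demo;
```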

The '!' used for set inversion matches literally when used outside a character set. It is a meta character only when immediately following the opening '[' of a character set.

Note that by default '?' matches any one character, not zero or one!

Matches are always case sensitive. To do a case-insensitive match, map upper-case letters to lower-case letters in both the text and the pattern before calling this routine.

Note: if character sets are not allowed, they match literally. E.g. the pattern "[abc]" would then match the text "[abc]", but not "a".
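
To show how '?', '*' and backslash escapes interact, here is a deliberately small recursive matcher. It is a sketch for illustration, not the package's algorithm: Match is a hypothetical name, character sets are left out entirely, the exception is a local stand-in, and raising it for a trailing backslash is an assumption. The naive backtracking on '*' can be slow for patterns with many stars.

```ada
with Ada.Text_IO;

procedure Wildcard_Demo is

   Illegal_Pattern : exception;   --  local stand-in for the real exception

   --  Hypothetical matcher supporting only '?', '*' and '\'.
   function Match (Pattern, Text : String) return Boolean is
   begin
      if Pattern'Length = 0 then
         return Text'Length = 0;
      end if;
      case Pattern (Pattern'First) is
         when '*' =>
            --  Try every possible length for the star, including zero.
            for I in Text'First .. Text'Last + 1 loop
               if Match (Pattern (Pattern'First + 1 .. Pattern'Last),
                         Text (I .. Text'Last))
               then
                  return True;
               end if;
            end loop;
            return False;
         when '?' =>
            --  Exactly one character, never zero.
            return Text'Length > 0
              and then Match (Pattern (Pattern'First + 1 .. Pattern'Last),
                              Text (Text'First + 1 .. Text'Last));
         when '\' =>
            if Pattern'Length < 2 then
               raise Illegal_Pattern;   --  assumption: dangling escape
            end if;
            return Text'Length > 0
              and then Text (Text'First) = Pattern (Pattern'First + 1)
              and then Match (Pattern (Pattern'First + 2 .. Pattern'Last),
                              Text (Text'First + 1 .. Text'Last));
         when others =>
            return Text'Length > 0
              and then Text (Text'First) = Pattern (Pattern'First)
              and then Match (Pattern (Pattern'First + 1 .. Pattern'Last),
                              Text (Text'First + 1 .. Text'Last));
      end case;
   end Match;

begin
   Ada.Text_IO.Put_Line
     (Boolean'Image (Match ("*.ad?", "util-strings.ads")));       --  TRUE
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("a?c", "ac")));   --  FALSE
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("\*", "*")));     --  TRUE
   Ada.Text_IO.Put_Line (Boolean'Image (Match ("\*", "x")));     --  FALSE
end Wildcard_Demo;
```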

Generic Parameters:
Any_One: The character used to match any one arbitrary text character. If Zero_Or_One (see below) is True, this character matches zero or one arbitrary characters.
Zero_Or_More: The character used to match zero or more arbitrary characters.
Set_Inverter: The character used for inverting a character set. If it is No_Set_Inverter, but Has_Char_Set (see below) is True, character sets cannot be inverted. If Has_Char_Set is False, Set_Inverter is ignored.
Has_Char_Set: If True, character sets are supported, otherwise, they're not allowed and the set meta characters '[' and ']' always match literally. (Note that the set inverter (by default '!') always matches literally if used outside a character set.)
Has_Escape: If True, backslash-escaping of meta characters is supported. If False, it is not, and one-character character sets must be used for literal matches against meta characters.
Zero_Or_One: If True, the Any_One character matches zero or one text characters. If False, Any_One must match a text character.

The three characters used for Any_One, Zero_Or_More and Set_Inverter should of course be distinct, and not coincide with any of the other meta characters either!

Note that character sets always must match a character; a null match is never allowed. (If null matches were allowed, a pattern like "[!a]*" would also match texts starting with "a"!)

A default instantiation of the above Wildcard_Match.
