Wildcards vs Regex
Wildcards and regular expressions are two widely used pattern-matching syntaxes, found in operating systems and in many software packages. Both work on the same principle: you define a pattern, and the tool finds every piece of text that matches it.
This article presents the differences between the two syntaxes and their application in REQCHECKER™, which uses one or the other to extract requirements in Syntax mode.
Wildcards
Wildcards date back to the 1970s. They come from the command interpreters of the Unix operating system, and later from those of the MS-DOS and Windows operating systems.
Wildcard patterns mainly use the following two special metacharacters:
- ? represents a single (exactly one) character, including space.
- * represents multiple (zero or more) characters, including space.
This simple and intuitive syntax is generally well known to users who can use it to search for files.
For example, the pattern *day matches any text followed by day, like Monday and Tuesday.
Other examples:
- P?rti?l matches Partial, Partiel but not Prtial
- *tial matches tial, Partial, Martial
- *.docx matches my document.docx
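As a rough illustration, the examples above can be checked with Python's standard fnmatch module, which implements shell-style wildcards. This is only a sketch: fnmatch is not the engine used by REQCHECKER™, and the helper name wildcard_match is hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the wildcard pattern.

    fnmatchcase is case-sensitive; ? matches exactly one character and
    * matches zero or more characters, spaces included.
    """
    return fnmatchcase(text, pattern)


examples = [
    ("*day", "Monday"),
    ("P?rti?l", "Partial"),
    ("P?rti?l", "Prtial"),
    ("*tial", "tial"),
    ("*.docx", "my document.docx"),
]
for pattern, text in examples:
    print(f"{pattern!r} vs {text!r}: {wildcard_match(pattern, text)}")
```

Note that the pattern must cover the whole text: *day matches Monday, but day alone does not.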
Other metacharacters are available, but support for them varies from one software package to another:
- [ ] Matches characters within the brackets.
- # Matches any single digit (0 to 9).
- [! ] Excludes characters inside the brackets.
- [a-z] Matches a range of characters in ascending order.
- { } Pattern list separated by a comma (,).
For example:
- Page 7[1-2] matches Page 71 and Page 72 but not Page 73
- Page {71,72} matches Page 71 and Page 72
- Page [!1][0-9] does not match Page 10, matches Page 20, but also matches Page z2
- *Page {7[8-9],8[0-5]}* matches any text containing a reference to a page from 78 to 85, e.g. See Page 78/300 or See Page 82/300 but not See Page 86/300
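Python's fnmatch module handles [ ] and [! ] but not # or { }, so the sketch below translates an extended wildcard pattern into a regular expression before matching. It is an illustration under simplifying assumptions (no escaping of metacharacters, brackets taken as-is), not the algorithm used by REQCHECKER™ or any other product; the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate ?, *, #, [..], [!..] and {a,b} into regex syntax."""
    out = []
    depth = 0  # current nesting level of { } lists
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "?":
            out.append(".")
        elif c == "*":
            out.append(".*")
        elif c == "#":
            out.append("[0-9]")
        elif c == "[" and pattern.find("]", i + 1) != -1:
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]  # wildcard [!..] becomes regex [^..]
            out.append("[" + body + "]")
            i = end  # skip to the closing bracket
        elif c == "{":
            depth += 1
            out.append("(?:")
        elif c == "}" and depth > 0:
            depth -= 1
            out.append(")")
        elif c == "," and depth > 0:
            out.append("|")  # comma separates alternatives inside { }
        else:
            out.append(re.escape(c))
        i += 1
    out.append(")" * depth)  # close any unbalanced { so the regex stays valid
    return "".join(out)


def extended_wildcard_match(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the extended wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), text, flags=re.DOTALL) is not None


print(extended_wildcard_match("Page [!1][0-9]", "Page z2"))
print(extended_wildcard_match("*Page {7[8-9],8[0-5]}*", "See Page 86/300"))
```

The translation also makes the pitfall of Page [!1][0-9] visible: [!1] only excludes the character 1, so any other character, including a letter, is accepted in that position.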
Regular expressions (regexes)
Regular expressions were invented in 1951 by the mathematician Stephen Cole Kleene. Their use developed from the end of the 1960s onwards, in particular through pattern matching in a text editor and through lexical analysis used by computer language compilers.
Regular expressions use many more metacharacters. Here are some of the most commonly used ones:
- a matches the a character
- (space) matches the space character
- \u00A0 matches the non-breaking space
- * quantifier, occurs zero or more times.
- + quantifier, occurs one or more times.
- ? quantifier, occurs zero or one time.
- {n,m} quantifier, matches at least n and at most m occurrences of the preceding expression.
- . wildcard, matches any single character except newline.
- \w expression, matches word characters.
- \d expression, matches digits, equivalent to [0-9].
- \\ matches the \ character itself
- [...] set definition, matches any single character in brackets.
- [^...] set, matches any single character not in brackets.
- XZ matches X directly followed by Z.
- X|Z or, matches X or Z.
- () grouping, parentheses are used to define the scope and precedence of the operators
- and more...
For example, the word Monday matches the following regex patterns:
[A-Za-z]+
M[a-x]+y
M.*
(Mon|Tues)day
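The four patterns above can be tried with Python's re module, used here as a stand-in for whichever regex engine your tool provides (these basic metacharacters behave the same way in Java). The helper name matching_patterns is hypothetical.

```python
import re

PATTERNS = [r"[A-Za-z]+", r"M[a-x]+y", r"M.*", r"(Mon|Tues)day"]


def matching_patterns(word: str) -> list[str]:
    """Return the patterns that match the whole word."""
    return [p for p in PATTERNS if re.fullmatch(p, word)]


print(matching_patterns("Monday"))
print(matching_patterns("Tuesday"))
```

Monday is matched by all four patterns, whereas Tuesday is matched only by [A-Za-z]+ and (Mon|Tues)day, since the other two require an initial M.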
Tips
- With regular expressions, the meaning of ? and * is different from that of wildcards. The regex equivalent of the wildcard ? is . (a dot), and the regex equivalent of the wildcard * is .* (a dot followed by a star).
- Space and non-breaking space are different characters. Be careful to use [ \u00A0] when you need to catch both of them.
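Both tips can be verified with a few lines of Python; the re module accepts the \u00A0 escape inside a character class, as Java does. The sample strings and constant names below are made up for the illustration.

```python
import re

# In a regex, ? and * are quantifiers applied to the preceding element.
print(bool(re.fullmatch(r"Mondays?", "Monday")))   # s is optional
print(bool(re.fullmatch(r"Mondays?", "MondayX")))  # ? is not "any character"

# The wildcard P?rti?l corresponds to the regex P.rti.l
print(bool(re.fullmatch(r"P.rti.l", "Partiel")))

# A plain space in the pattern does not match a non-breaking space.
PLAIN_SPACE = re.compile(r"Page [0-9]+")
ANY_SPACE = re.compile(r"Page[ \u00A0][0-9]+")

with_nbsp = "Page\u00A072"
print(bool(PLAIN_SPACE.fullmatch(with_nbsp)))
print(bool(ANY_SPACE.fullmatch(with_nbsp)))
```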
Some links for learning advanced regular expressions:
- RegexOne is a free online training course.
- regular expressions 101 lets you learn how regexes work by identifying the role of each metacharacter through colors and an on-the-fly analysis of several texts.
Comparison
Main characteristics & features | Wildcards (globbing patterns) | Regular expression (regex) |
---|---|---|
Main purpose | Basic search in file | Advanced search in text |
Example | Linux globbing pathnames, MS Windows File Search, MS Excel Search, MS Access Queries | Java (industrial software development language) Regexp |
Strengths | Very simple and intuitive | Powerful language used by IT professionals |
Single (exactly one) character, including space | ? | . |
Multiple (zero or more) characters, including space | * | .* |
Character escape, e.g. the character * itself | \* (or ~* for MS Excel) | \* |
Digit | # (for MS Access) | [0-9] |
Character range, e.g. any letter from a to z | [a-z] | [a-z] |
Character exclusion, e.g. not a digit | [!0-9] | [^0-9] |
Logical operator 'or', e.g. A or B | {A,B} (or other syntax depending on software) | (A|B) |
Boundary matchers | not available | ^ $ etc. |
Greedy / Reluctant / Possessive quantifiers | not available | X+? etc. |
Positive / Negative Lookahead / Lookbehind | not available | (?<=X) etc. |
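The last three rows of the table have no wildcard counterpart. The short sketch below shows what they look like in practice, using Python's re module as a stand-in (boundary matchers, reluctant quantifiers and lookbehind use the same notation in Java). The sample text is invented for the illustration.

```python
import re

text = "<REQ_0010> and <REQ_0020>"

# Greedy quantifier: .+ grabs as much as possible, up to the last >.
greedy = re.findall(r"<.+>", text)

# Reluctant quantifier: .+? stops at the first > it meets.
reluctant = re.findall(r"<.+?>", text)

# Positive lookbehind: match digits only when preceded by REQ_,
# without including REQ_ in the result.
numbers = re.findall(r"(?<=REQ_)[0-9]+", text)

# Boundary matchers: ^ and $ anchor the pattern to the start and end.
anchored = re.search(r"^<REQ_[0-9]+>$", text) is not None

print(greedy)
print(reluctant)
print(numbers)
print(anchored)
```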
Application for REQCHECKER™
When you use the Syntax mode (read more), REQCHECKER™ is a robot that extracts text from different sources (MS Word documents, PDF or other data sources) based on patterns.
Patterns are expressions that allow REQCHECKER™ to recognise, in the text, the identifier of an item to be retrieved (requirement, test case, user story, etc.) together with its attributes.
Two different syntaxes exist to achieve this. They use the same general approach based on metacharacters:
- Wildcards on the one hand are simple and intuitive.
- Regular expressions on the other hand are more complex but also much more powerful.
The option Advanced Reg. Exp. (see Options Menu) switches from wildcards to regular expressions (see Tag summary).
The regular expression editor GUI helps you to test your expressions.
Statement extraction with wildcards
Blue cells accept wildcards.
Example 1
<REQ_0010> About box
The software has an about box.
#EndText
#Version 1
<REQ_0020> About Cancel
The about box has a Cancel button.
#Deleted
The syntax is:
- Advanced Reg. Exp. option: disabled
- BEGIN STAT = <
- REQUIREMENT ID = REQ_*
- END STAT = >
- END TEXT = #EndText
- DELETED = #Deleted
- VERSION = #Version
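To make the role of the three identifier tags concrete, here is a minimal Python sketch that finds the identifiers of Example 1 from BEGIN STAT, REQUIREMENT ID and END STAT. It is an assumption-laden illustration, not REQCHECKER™'s implementation: the function names are hypothetical, only ? and * are translated, and the END TEXT, DELETED and VERSION tags are ignored.

```python
import re

SAMPLE = """<REQ_0010> About box
The software has an about box.
#EndText
#Version 1
<REQ_0020> About Cancel
The about box has a Cancel button.
#Deleted
"""


def wildcard_id_to_regex(id_pattern: str) -> str:
    """Translate the ? and * wildcards of an identifier pattern to regex."""
    parts = []
    for ch in id_pattern:
        if ch == "*":
            parts.append(".*?")  # reluctant, so matching stops at END STAT
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def extract_ids(text: str, begin_stat: str, requirement_id: str, end_stat: str) -> list[str]:
    """Return every identifier found between BEGIN STAT and END STAT."""
    regex = (
        re.escape(begin_stat)
        + "(" + wildcard_id_to_regex(requirement_id) + ")"
        + re.escape(end_stat)
    )
    return re.findall(regex, text)


print(extract_ids(SAMPLE, "<", "REQ_*", ">"))
```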
Example 2
N£ REQ-BASIC-0010 £N
T£ MAX OPERATING CONSUMPTION
Maximum operating consumption is 30W. £T
Note
Spaces are not allowed between BEGIN STAT and REQUIREMENT ID, nor between REQUIREMENT ID and END STAT. T£ is not useful and is not managed; it is included in the text of the statement.
The syntax is:
- Regular expression option: disabled
- BEGIN STAT =
- REQUIREMENT ID = REQ-*
- END STAT = £N
- END TEXT = £T
- DELETED = #Deleted
- VERSION = #Version
Example 3
[CAT-TRA-REQ-192]
Operating System
Upper req: EXB-TRA-REQ-124
Detail
The software is installable on Windows platform.
End of requirement
The syntax is:
- Regular expression option: disabled
- BEGIN STAT = [
- REQUIREMENT ID = CAT-TRA-REQ-*
- END STAT = ]
- END TEXT =
- DELETED = #Deleted
- VERSION = #Version
- CUSTOM_TAG 'Upper req: ' = EXB-TRA-REQ-124
- CUSTOM_TAG 'Detail' until 'End of requirement' = The software is installable on Windows platform.
Statement extraction with regex
Orange cells accept advanced regular expressions (regex).
Example
#R_MYSOFT_REP_L1_003_V00 General - Report - Design - File
Every output report must be available with [Preview] button, as illustrated in the computation function.
#R_MYSOFT_REP_L1_004_V00 General - Report - Design - Page
Each report page must display:
* "Material" [Name]
* "Type" [Type]
The END STAT tag is not used here, so the pattern itself must restrict the variable part of the identifier in order to exclude the text that follows it.
The syntax is:
- Advanced Reg. Exp. option: enabled
- BEGIN STAT = #
- REQUIREMENT ID = R_MYSOFT_REP_[LV_0-9]+
- END STAT =
- END TEXT =
- DELETED = #Deleted
- VERSION = #Version
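The effect of restricting the variable part can be reproduced with Python's re module. The sketch below applies BEGIN STAT followed by the REQUIREMENT ID pattern to the example text, then shows what happens with an unrestricted .* instead. It only illustrates the regex behaviour and makes no claim about how REQCHECKER™ processes the document internally.

```python
import re

SAMPLE = """#R_MYSOFT_REP_L1_003_V00 General - Report - Design - File
Every output report must be available with [Preview] button, as illustrated in the computation function.
#R_MYSOFT_REP_L1_004_V00 General - Report - Design - Page
Each report page must display:
* "Material" [Name]
* "Type" [Type]
"""

# BEGIN STAT (#) followed by the REQUIREMENT ID pattern from the example.
restricted = re.findall(r"#(R_MYSOFT_REP_[LV_0-9]+)", SAMPLE)

# With .* the identifier runs to the end of the line and swallows the title.
unrestricted = re.findall(r"#(R_MYSOFT_REP_.*)", SAMPLE)

print(restricted)
print(unrestricted[0])
```

The character class [LV_0-9] stops at the first space, which is what keeps the title General - Report - Design - File out of the identifier.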
Coverage extraction with wildcards
Blue cells accept wildcards.
Example
Open the About menu, a dialog box opens with the file name, the copyright and version. <<REQ_0123>> (partial) (v1.4)
The tags are:
- BEGIN COVER is <<
- REQUIREMENT ID is REQ_0123
- END COVER is >>
- PARTIAL is (partial)
- BEGIN VERSION is (v
- END VERSION is )
Coverage extraction with regex
Orange cells accept advanced regular expressions (regex).
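As a hedged illustration of what a regex-based coverage pattern could look like, the Python sketch below parses the coverage line from the wildcard example. The identifier pattern REQ_[0-9]+, the order of the tags and the fact that PARTIAL and the version are optional are all assumptions made for this example; the function name parse_coverage is hypothetical.

```python
import re

COVER_RE = re.compile(
    r"<<(?P<id>REQ_[0-9]+)>>"             # BEGIN COVER, REQUIREMENT ID, END COVER
    r"(?:\s*(?P<partial>\(partial\)))?"   # optional PARTIAL tag
    r"(?:\s*\(v(?P<version>[^)]+)\))?"    # optional BEGIN VERSION ... END VERSION
)


def parse_coverage(text: str) -> list[dict]:
    """Return one record per coverage reference found in the text."""
    return [
        {
            "id": m.group("id"),
            "partial": m.group("partial") is not None,
            "version": m.group("version"),
        }
        for m in COVER_RE.finditer(text)
    ]


line = (
    "Open the About menu, a dialog box opens with the file name, "
    "the copyright and version. <<REQ_0123>> (partial) (v1.4)"
)
print(parse_coverage(line))
```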