Wildcards vs Regex

Wildcards and regular expressions are two widely used pattern-matching syntaxes, found throughout computing and in many commercial software packages. Both rest on the same principle: define a pattern, then find every piece of text that matches it.

This article presents the differences between the two syntaxes and their application in REQCHECKER™, which uses one or the other to extract requirements in Syntax mode.

Wildcards

Wildcards date back to the 1970s. They originated in the command interpreters of the Unix operating system and were later adopted by those of MS-DOS and Windows.

Wildcard patterns mainly use the following two special metacharacters:

  • ? represents a single (exactly one) character, including space.
  • * represents multiple (zero or more) characters, including space.

This simple and intuitive syntax is generally well known to users who can use it to search for files.

For example, the pattern *day matches any text followed by day, such as Monday and Tuesday:

graph LR subgraph sg1["*"] Mon Tues end subgraph sg2["day"] Mon --> day Tues --> day end

Other examples:

  • P?rti?l matches Partial, Partiel but not Prtial
graph LR subgraph sg1["P"] P end subgraph sg2["?"] P --> a1[a] P --> r end subgraph sg3["rti"] a1 --> rti r --> tia end subgraph sg4["?"] rti --> a2[a] rti --> e end subgraph sg5["l"] a2 --> l e --> l end classDef style_red fill:#F00; tia:::style_red
  • *tial matches tial, Partial, Martial
  • *.docx matches my document.docx
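
The examples above can be checked with a short script. This is a minimal sketch that uses Python's standard fnmatch module as a stand-in wildcard engine; it is not the matcher used by any particular product, and the helper name wildcard_matches and the extra test words (daylight, notes.txt) are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match the whole wildcard pattern (case-sensitive)."""
    return [w for w in words if fnmatchcase(w, pattern)]


# '*' stands for zero or more characters, '?' for exactly one character.
day_hits = wildcard_matches("*day", ["Monday", "Tuesday", "daylight"])
partial_hits = wildcard_matches("P?rti?l", ["Partial", "Partiel", "Prtial"])
tial_hits = wildcard_matches("*tial", ["tial", "Partial", "Martial"])
docx_hits = wildcard_matches("*.docx", ["my document.docx", "notes.txt"])

print(day_hits)
print(partial_hits)
print(tial_hits)
print(docx_hits)
```

Prtial is rejected because each ? must consume exactly one character, while tial is accepted by *tial because * may match nothing at all.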

Other metacharacters are available but more or less supported depending on the software:

  • [ ] Matches characters within the brackets.
  • # Matches any single digit (0-9).
  • [! ] Excludes characters inside the brackets.
  • [a-z] Matches a range of characters in ascending order.
  • { } Matches any one of the patterns in a comma-separated list.

For example:

  • Page 7[1-2] matches Page 71 and Page 72 but not Page 73
graph LR subgraph sg1["Page 7"] e1[Page 7] end subgraph sg2["[1-2]"] e1 --> 1 e1 --> 2 e1 --> 3 end classDef style_red fill:#F00; 3:::style_red
  • Page {71,72} matches Page 71 and Page 72
  • Page [!1][0-9] does not match Page 10, matches Page 20, but also matches Page z2
  • *Page {7[8-9],8[0-5]}* matches any text containing a reference to pages 78 through 85, e.g. See Page 78/300 or See Page 82/300 but not See Page 86/300
graph LR subgraph sg1["*"] e1["See "] end subgraph sg2["Page "] e1 --> e2["Page "] end subgraph or["{7[8-9],8[0-5]}"] subgraph padding[ ] e2 --> cola["{"] cola --> 7 subgraph sg3["7"] 7 end subgraph sg4["[8-9]"] 7 --> 8b[8] end 8b --> colb cola --> 8 subgraph sg5["8"] 8 end subgraph sg6["[0-5]"] 8 --> 2 8 --> 6 end 2 --> colb end end colb["}"] --> /300 subgraph sg7["*"] /300 end classDef style_red fill:#F00; classDef style_white fill:#FFF; classDef style_grey fill:#EEE, color:#666; classDef style_padding fill:none,stroke:none; padding:::style_padding or:::style_grey cola:::style_white colb:::style_white 6:::style_red
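
As noted above, support for the extended metacharacters varies between tools. The sketch below illustrates this with Python's fnmatch module, which understands [ ], [! ] and ranges but not { } or #. The brace lists are therefore expanded by a small helper, expand_braces, which is a hypothetical function written for this example and handles only non-nested lists.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand each {a,b} list into separate patterns (hypothetical helper, no nesting)."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    expanded = []
    for alternative in m.group(1).split(","):
        candidate = pattern[:m.start()] + alternative + pattern[m.end():]
        expanded.extend(expand_braces(candidate))
    return expanded


def matches(pattern, text):
    """True if the whole text matches any brace-expanded form of the pattern."""
    return any(fnmatchcase(text, p) for p in expand_braces(pattern))


print(expand_braces("*Page {7[8-9],8[0-5]}*"))
for text in ["See Page 78/300", "See Page 82/300", "See Page 86/300"]:
    print(text, matches("*Page {7[8-9],8[0-5]}*", text))
for text in ["Page 10", "Page 20", "Page z2"]:
    print(text, matches("Page [!1][0-9]", text))
```

The Page z2 case shows why [!1] is a blunt tool: it excludes only the character 1, so any other character, including a letter, is accepted.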

Regular expressions (regexes)

Regular expressions were invented in 1951 by the mathematician Stephen Cole Kleene. Their use developed from the end of the 1960s onwards, in particular through pattern matching in a text editor and through lexical analysis used by computer language compilers.

Regular expressions use many more metacharacters. Here are some of the most commonly used:

  • a matches the a character
  • (a space) matches the space character
  • \u00A0 matches the non-breaking space
  • * quantifier, occurs zero or more times.
  • + quantifier, occurs one or more times.
  • ? quantifier, occurs zero or one time.
  • {n,m} quantifier, matches at least n and at most m occurrences of the preceding expression.
  • . wildcard matches any single character except newline.
  • \w expression matches word characters.
  • \d expression matches digits, equivalent to [0-9].
  • \\ matches the \ character itself
  • [...] set definition, matches any single character in the brackets.
  • [^...] set, matches any single character not in brackets.
  • XZ matches X directly followed by Z.
  • X|Z or, matches X or Z.
  • () grouping, parentheses are used to define the scope and precedence of the operators
  • and more...

For example the word Monday matches the following regex patterns:

  • [A-Za-z]+
graph LR subgraph plus["[A-Za-z]+"] subgraph padding[ ] subgraph sg1["[A-Za-z]"] M end M --> o subgraph sg2["[A-Za-z]"] o end o --> n subgraph sg3["[A-Za-z]"] n end n --> d subgraph sg4["[A-Za-z]"] d end d --> a subgraph sg5["[A-Za-z]"] a end a --> y subgraph sg6["[A-Za-z]"] y end end end classDef style_grey fill:#EEE, color:#666; plus:::style_grey classDef style_padding fill:none,stroke:none; padding:::style_padding
  • M[a-x]+y
graph LR subgraph sg1["M"] M end subgraph plus["[a-x]+"] subgraph padding[ ] M --> o subgraph sg2["[a-x]"] o end o --> n subgraph sg3["[a-x]"] n end n --> d subgraph sg4["[a-x]"] d end d --> a subgraph sg5["[a-x]"] a end end end a --> y subgraph sg6["y"] y end classDef style_grey fill:#EEE, color:#666; plus:::style_grey classDef style_padding fill:none,stroke:none; padding:::style_padding
  • M.*
graph LR subgraph sg1["M"] M end subgraph plus[".*"] subgraph padding [ ] M --> o subgraph sg2["."] o end o --> n subgraph sg3["."] n end n --> d subgraph sg4["."] d end d --> a subgraph sg5["."] a end a --> y subgraph sg6["."] y end end end classDef style_grey fill:#EEE, color:#666; plus:::style_grey classDef style_padding fill:none,stroke:none; padding:::style_padding
  • (Mon|Tues)day
graph LR subgraph or["(Mon|Tues) "] subgraph padding[ ] subgraph sg1["Mon"] Mon end subgraph sg2["Tues"] Tues[ ] end end end Mon --> day Tues --> day subgraph sg3["day"] day end classDef style_grey fill:#EEE,color:#666,stroke:none; or:::style_grey classDef style_padding fill:none,stroke:none; padding:::style_padding
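
The four patterns above can be tried against whole words with a few lines of code. This is a minimal sketch using Python's re module; the helper name full_matches and the second test word Tuesday are assumptions added for contrast.

```python
import re

PATTERNS = ["[A-Za-z]+", "M[a-x]+y", "M.*", "(Mon|Tues)day"]


def full_matches(word):
    """Return the patterns that match the entire word."""
    return [p for p in PATTERNS if re.fullmatch(p, word)]


monday = full_matches("Monday")
tuesday = full_matches("Tuesday")

print(monday)
print(tuesday)
```

Monday satisfies all four patterns, whereas Tuesday satisfies only the letter-only pattern and the alternation, since the other two require a leading M.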

Tips

With regular expressions the meaning of ? and * is different from that of wildcards. The equivalent of wildcard ? is the regex . and the equivalent of wildcard * is the regex .*.

Space and non-breaking space are different characters. Be careful to use [ \u00A0] when required to catch both of them.
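
Both tips can be illustrated with a short sketch in Python's re module. The sample strings are assumptions chosen for the demonstration.

```python
import re

# Read as a regex, REQ_* means "REQ" followed by zero or more underscores,
# so it does not behave like the wildcard REQ_*.
wildcard_habit = re.fullmatch(r"REQ_*", "REQ_0010")
regex_equivalent = re.fullmatch(r"REQ_.*", "REQ_0010")

# Read as a regex, P?rtial means "an optional P followed by rtial".
optional_p = re.fullmatch(r"P?rtial", "rtial")
any_single_char = re.fullmatch(r"P.rtial", "Partial")

# A plain space in the pattern does not match a non-breaking space.
text_with_nbsp = "Page\u00A07"
plain_space_only = re.search(r"Page 7", text_with_nbsp)
both_spaces = re.search(r"Page[ \u00A0]7", text_with_nbsp)

print(wildcard_habit is None, regex_equivalent is not None)
print(optional_p is not None, any_single_char is not None)
print(plain_space_only is None, both_spaces is not None)
```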

Some links to learn advanced regular expressions:

  • RegexOne is a free online training.
  • regular expressions 101 allows you to learn how regexes work by identifying the role of each metacharacter through colors and an on-the-fly analysis of several texts.

Comparison

| Main characteristics & features | Wildcards (globbing patterns) | Regular expression (regex) |
| Main purpose | Basic search in file | Advanced search in text |
| Example | Linux globbing pathnames, MS Windows File Search, MS Excel Search, MS Access Queries | Java (industrial software development language) Regexp |
| Strengths | Very simple and intuitive | Powerful language used by IT professionals |
| Single (exactly one) character, including space | ? | . |
| Multiple (zero or more) characters, including space | * | .* |
| Character escape, e.g. the character * itself | \* (or ~* for MS Excel) | \* |
| Digit | # (for MS Access) | [0-9] |
| Character range, e.g. any letter from a to z | [a-z] | [a-z] |
| Character exclusion, e.g. not a digit | [!0-9] | [^0-9] |
| Logical operator 'or', e.g. A or B | {A,B} (or other syntax depending on software) | (A|B) |
| Boundary matchers | not available | ^ $ etc. |
| Greedy / Reluctant / Possessive quantifiers | not available | X+? etc. |
| Positive / Negative Lookahead / Lookbehind | not available | (?<=X) etc. |
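
The last three rows describe features that have no wildcard counterpart. The sketch below shows one example of each using Python's re module. The comparison above refers to Java, but the constructs used here (^, $, +? and (?<=X)) are written the same way in both; the sample strings are assumptions.

```python
import re

text = "<REQ_0010> and <REQ_0020>"

# Greedy: .+ runs as far as possible, up to the last '>'.
greedy = re.findall(r"<.+>", text)
# Reluctant: .+? stops at the first '>' it can.
reluctant = re.findall(r"<.+?>", text)

# Lookbehind: keep only the digits that directly follow "Page ".
page_numbers = re.findall(r"(?<=Page )\d+", "See Page 78/300 and Page 82/300")

# Boundary matchers: ^ and $ force the pattern to cover the whole string.
candidates = ["REQ_0010", "See REQ_0010"]
anchored = [s for s in candidates if re.search(r"^REQ_\d+$", s)]

print(greedy)
print(reluctant)
print(page_numbers)
print(anchored)
```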

Application for REQCHECKER™

When you use the Syntax mode (read more), REQCHECKER™ is a robot that extracts text from different sources (MS Word documents, PDF or other data sources) based on patterns.

Patterns are expressions that allow REQCHECKER™ to recognise, in the text, the identifier of an item to be retrieved (requirement, test case, user story, etc.) and its attributes.

Two different syntaxes exist to achieve this. They use the same general approach based on metacharacters:

  • Wildcards on the one hand are simple and intuitive.
  • Regular expressions on the other hand are more complex but also much more powerful.

The option Advanced Reg. Exp. (see Options Menu) switches from wildcards to regular expressions (see Tag summary).

The regular expression editor GUI helps you to test your expressions.

Statement extraction with wildcards

Blue cells accept wildcards.

Example 1

<REQ_0010> About box
The software has an about box.
#EndText
#Version 1

<REQ_0020> About Cancel
The about box has a Cancel button.
#Deleted

The syntax is:

  • Advanced Reg. Exp. option: disabled
  • BEGIN STAT = <
  • REQUIREMENT ID = REQ_*
  • END STAT = >
  • END TEXT = #EndText
  • DELETED = #Deleted
  • VERSION = #Version
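
To make the role of each tag concrete, here is a minimal sketch that mimics this extraction on the Example 1 text. It is not REQCHECKER™'s implementation: the functions wildcard_to_regex and extract_statements are hypothetical, only the ? and * wildcards are translated, and the END TEXT tag is ignored for brevity.

```python
import re

SAMPLE = """<REQ_0010> About box
The software has an about box.
#EndText
#Version 1

<REQ_0020> About Cancel
The about box has a Cancel button.
#Deleted
"""


def wildcard_to_regex(pattern):
    """Translate '?' and '*' to regex; every other character is taken literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*?")  # reluctant, so it stops at the END STAT tag
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def extract_statements(text, begin, req_id, end, deleted, version):
    """Hypothetical extractor: find each identifier, then scan up to the next one."""
    header = re.compile(
        re.escape(begin) + "(" + wildcard_to_regex(req_id) + ")" + re.escape(end)
    )
    found = list(header.finditer(text))
    statements = []
    for i, m in enumerate(found):
        stop = found[i + 1].start() if i + 1 < len(found) else len(text)
        body = text[m.end():stop]
        ver = re.search(re.escape(version) + r"\s*(\S+)", body)
        statements.append({
            "id": m.group(1),
            "deleted": deleted in body,
            "version": ver.group(1) if ver else None,
        })
    return statements


statements = extract_statements(SAMPLE, "<", "REQ_*", ">", "#Deleted", "#Version")
for s in statements:
    print(s)
```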

Example 2

N£  REQ-BASIC-0010  £N
T£ MAX OPERATING CONSUMPTION
Maximum operating consumption is 30W.  £T

Note

Spaces are not allowed between BEGIN STAT and REQUIREMENT ID, nor between REQUIREMENT ID and END STAT. is not useful and is not managed; it is included in the text of the statement.

The syntax is:

  • Advanced Reg. Exp. option: disabled
  • BEGIN STAT =
  • REQUIREMENT ID = REQ-*
  • END STAT = £N
  • END TEXT = £T
  • DELETED = #Deleted
  • VERSION = #Version

Example 3

[CAT-TRA-REQ-192]
Operating System
Upper req:  EXB-TRA-REQ-124
Detail
The software is installable on Windows platform.
End of requirement

The syntax is:

  • Advanced Reg. Exp. option: disabled
  • BEGIN STAT = [
  • REQUIREMENT ID = CAT-TRA-REQ-*
  • END STAT = ]
  • END TEXT =
  • DELETED = #Deleted
  • VERSION = #Version
  • CUSTOM_TAG 'Upper req: ' = EXB-TRA-REQ-124
  • CUSTOM_TAG 'Detail' until 'End of requirement' = The software is installable on Windows platform.
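
The two custom tags behave differently: the first captures the value that follows a label, the second captures everything between a start label and an end label. The sketch below reproduces that behaviour on the Example 3 text with ordinary regular expressions. It is only an illustration under that reading of the tags, not the tool's own logic, and the variable names are assumptions.

```python
import re

SAMPLE = """[CAT-TRA-REQ-192]
Operating System
Upper req:  EXB-TRA-REQ-124
Detail
The software is installable on Windows platform.
End of requirement
"""

# BEGIN STAT '[' + REQUIREMENT ID 'CAT-TRA-REQ-*' + END STAT ']'
req_id = re.search(r"\[(CAT-TRA-REQ-.*?)\]", SAMPLE).group(1)

# Label followed by a value on the same line.
upper_req = re.search(r"Upper req:\s*(\S+)", SAMPLE).group(1)

# Text between a start label and an end label, possibly spanning several lines.
detail = re.search(r"Detail\n(.*?)\nEnd of requirement", SAMPLE, re.DOTALL).group(1)

print(req_id)
print(upper_req)
print(detail)
```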

Statement extraction with regex

Orange cells accept advanced regular expressions (regex).

Example

#R_MYSOFT_REP_L1_003_V00 General - Report - Design - File
Every output report must be available with [Preview] button, as illustrated in the computation function.

#R_MYSOFT_REP_L1_004_V00 General - Report - Design - Page
Each report page must display:
* "Material" [Name]
* "Type" [Type]

The END STAT tag is not used here, so the pattern itself must restrict the variable part of the identifier in order to exclude the text that follows it.

The syntax is:

  • Advanced Reg. Exp. option: enabled
  • BEGIN STAT = #
  • REQUIREMENT ID = R_MYSOFT_REP_[LV_0-9]+
  • END STAT =
  • END TEXT =
  • DELETED = #Deleted
  • VERSION = #Version
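
The effect of restricting the variable part can be seen by comparing the character set [LV_0-9]+ with an unrestricted .* on the example text. This is a minimal sketch with Python's re module, where the # prefix plays the role of BEGIN STAT; the variable names are assumptions.

```python
import re

SAMPLE = """#R_MYSOFT_REP_L1_003_V00 General - Report - Design - File
Every output report must be available with [Preview] button, as illustrated in the computation function.

#R_MYSOFT_REP_L1_004_V00 General - Report - Design - Page
Each report page must display:
* "Material" [Name]
* "Type" [Type]
"""

# The set [LV_0-9] stops at the first space, so only the identifier is captured.
bounded = re.findall(r"#(R_MYSOFT_REP_[LV_0-9]+)", SAMPLE)

# Without an END STAT, .* runs to the end of the line and swallows the title.
unbounded = re.findall(r"#(R_MYSOFT_REP_.*)", SAMPLE)

print(bounded)
print(unbounded)
```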

Coverage extraction with wildcards

Blue cells accept wildcards.

Example

Open the About menu, a dialog box opens with the file name, the copyright and version. <<REQ_0123>> (partial) (v1.4)

The tags are:

  • BEGIN COVER is <<
  • REQUIREMENT ID is REQ_0123
  • END COVER is >>
  • PARTIAL is (partial)
  • BEGIN VERSION is (v
  • END VERSION is )
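
As a rough illustration of how these tags divide up the example line, the sketch below locates each part in turn. It is a hypothetical reconstruction written for this article, not the tool's algorithm; it assumes a wildcard identifier of REQ_* and uses Python's re module.

```python
import re

LINE = ("Open the About menu, a dialog box opens with the file name, "
        "the copyright and version. <<REQ_0123>> (partial) (v1.4)")

# BEGIN COVER + identifier + END COVER
cover = re.search(re.escape("<<") + r"(REQ_.*?)" + re.escape(">>"), LINE)
covered_id = cover.group(1)

# The PARTIAL and VERSION tags are looked up after the coverage reference.
rest = LINE[cover.end():]
is_partial = "(partial)" in rest
version_match = re.search(re.escape("(v") + r"(.*?)" + re.escape(")"), rest)
version = version_match.group(1) if version_match else None

print(covered_id, is_partial, version)
```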

Coverage extraction with regex

Orange cells accept advanced regular expressions (regex).