{preg_match:pattern:subject:matched}

Description

Extracts text from a string using a PCRE regular expression. It runs PHP preg_match on the subject and returns a single match, the first one found. By default it returns the whole matched text (match 0); pass a number as the third parameter to return that parenthesised capture group instead (1 for the first group, 2 for the second, and so on). If the pattern does not match, or the requested group does not exist, the result is an empty string; no error is raised. The subject is usually an item field getter such as the headline field. One thing to watch: the colon is the AA parameter separator, so neither the pattern nor the subject may contain a bare colon. Write the pattern so that it never has to match a colon. This expression is never cached, and its parameters are not trimmed, so any spaces next to the separating colons become part of the parameter values.
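The behaviour described above can be approximated in a few lines of Python. This is a minimal sketch, not AA's implementation. It uses Python's re module as a stand-in for PCRE, which is adequate for simple patterns like those on this page. It also assumes the pattern starts and ends with the same delimiter character, and it ignores trailing modifiers. The function name aa_preg_match is hypothetical.

```python
import re


def aa_preg_match(pattern: str, subject: str, matched: int = 0) -> str:
    """Hypothetical emulation of {preg_match:pattern:subject:matched}.

    Assumes `pattern` uses the same delimiter at both ends (e.g. "/[0-9]{4}/")
    and carries no modifiers; Python's re stands in for PCRE.
    """
    delimiter = pattern[0]
    body = pattern[1:pattern.rindex(delimiter)]  # text between the delimiters
    m = re.search(body, subject)  # first match only, like preg_match
    if m is None:
        return ""  # no match: empty string, not an error
    if matched < 0 or matched > m.re.groups:
        return ""  # requested group does not exist
    return m.group(matched) or ""  # a group that took no part in the match gives None


print(aa_preg_match("/[0-9]{4}/", "Annual Report 2024"))                          # 2024
print(aa_preg_match(r"/(\d{4})-(\d{2})-(\d{2})/", "Date 2024-03-15 today", 2))  # 03
print(repr(aa_preg_match("/[0-9]{4}/", "No digits at all")))                      # ''
```

Every failure path returns an empty string instead of raising, in line with the behaviour documented above.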

Parameters

pattern (required; default: empty)

The PCRE regular expression, including its delimiters (for example /[0-9]{4}/). Standard PHP preg_match syntax. Caution: a colon is the AA parameter separator, so the pattern must not contain a bare colon - match around it or pick a colon-free pattern.
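PHP splits the pattern string into an opening delimiter, the regex body, a closing delimiter and optional trailing modifiers before handing the body to PCRE. The Python sketch below mimics that split so the delimited patterns on this page can be tried outside AA. It is an approximation: Python's re module is not PCRE, and only the i, m, s and x modifiers are mapped. The helper name compile_php_pattern is hypothetical.

```python
import re

# Bracket-style delimiters that PHP pairs with a different closing character.
BRACKET_PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

# Modifiers with a direct Python equivalent; others are rejected by this sketch.
MODIFIER_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def compile_php_pattern(pattern: str) -> re.Pattern:
    """Split a delimited pattern such as '/[0-9]{4}/i' and compile its body."""
    if not pattern:
        raise ValueError("empty pattern")
    opener = pattern[0]
    # PHP rejects alphanumeric, backslash and whitespace delimiters.
    if opener.isalnum() or opener == "\\" or opener.isspace():
        raise ValueError(f"invalid delimiter {opener!r}")
    closer = BRACKET_PAIRS.get(opener, opener)
    end = pattern.rfind(closer)
    if end <= 0:
        raise ValueError(f"no ending delimiter {closer!r}")
    body, modifiers = pattern[1:end], pattern[end + 1:]
    flags = 0
    for mod in modifiers:
        if mod not in MODIFIER_FLAGS:
            raise ValueError(f"modifier {mod!r} not handled in this sketch")
        flags |= MODIFIER_FLAGS[mod]
    return re.compile(body, flags)


print(compile_php_pattern("/[0-9]{4}/").search("Annual Report 2024").group())   # 2024
print(compile_php_pattern("/[a-z]+/i").search("Filed under APC news").group())  # Filed
```

Rejecting unknown modifiers keeps the sketch from silently matching differently than PCRE would.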

subject (required; default: empty)

The text to search. Usually an item field getter such as {headline........}. Like the pattern, it must not contain a bare colon, which AA would read as the next parameter.
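Why a bare colon breaks the expression can be seen with a deliberately naive model of AA's parameter splitting. The Python helper split_aa_params below is hypothetical and assumes AA simply splits the text inside the outer braces at every colon. The real parser may treat escaping or nested expressions differently. The second input is an illustrative pattern meant to match a time such as 10:30.

```python
from typing import List


def split_aa_params(expression: str) -> List[str]:
    """Naive, hypothetical model of AA parameter splitting.

    Drops the outer braces and splits on every colon; AA's real parser
    is not reproduced here.
    """
    inner = expression
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    return inner.split(":")


ok = split_aa_params("{preg_match:/[0-9]{4}/:Annual Report 2024}")
print(ok)      # ['preg_match', '/[0-9]{4}/', 'Annual Report 2024']

broken = split_aa_params("{preg_match:/[0-9]+:[0-9]+/:Meeting at 10:30}")
print(broken)  # ['preg_match', '/[0-9]+', '[0-9]+/', 'Meeting at 10', '30']
```

In the second call, the pattern slot receives /[0-9]+ with no closing delimiter. The subject slot receives [0-9]+/. As a result, the expression no longer gets the pattern and subject that were intended.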

matched (optional; default: 0, the whole match)

Which match to return: 0 (the default) is the whole matched text, 1 is the first parenthesised capture group, 2 the second, and so on. A non-numeric value is treated as 0 and returns the whole match. An index higher than the number of groups in the pattern returns an empty string.
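The rules for matched can be expressed as a small Python sketch: a non-numeric value means 0, and a missing group means an empty string. The sketch uses Python's re module in place of PCRE. The helper names resolve_index and pick_group are hypothetical. Treating everything other than plain ASCII digits as non-numeric is an assumption, not documented AA behaviour.

```python
import re
from typing import Optional


def resolve_index(matched: str) -> int:
    """Turn the raw 'matched' parameter into a group index.

    Empty or non-numeric text falls back to 0 (the whole match), as documented;
    accepting only plain ASCII digits as numeric is an assumption of this sketch.
    """
    return int(matched) if re.fullmatch(r"[0-9]+", matched) else 0


def pick_group(m: Optional[re.Match], matched: str) -> str:
    """Return the requested capture, or "" when there is no match or no such group."""
    if m is None:
        return ""  # pattern did not match
    index = resolve_index(matched)
    if index > m.re.groups:
        return ""  # out-of-range index: empty string, not an error
    return m.group(index) or ""  # a group that took no part in the match gives None


m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Date 2024-03-15 today")
print(pick_group(m, "2"))        # 03
print(pick_group(m, "abc"))      # 2024-03-15
print(repr(pick_group(m, "5")))  # ''
```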

Examples

test: {preg_match:/[0-9]{4}/:Annual Report 2024}
Expected: 2024
Actual: 2024
Returns the whole match (no group index), here the first run of four digits. With matched omitted, preg_match returns match 0, the entire matched text.
test: {preg_match:/[0-9]+/:Order 12345 shipped}
Expected: 12345
Actual: 12345
One or more digits anywhere in the subject. Returns only the first match, not all of them.
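The difference between taking the first match and collecting every match can be sketched with Python's re module. In this analogy, re.search plays the role of preg_match, and re.findall plays the role of PHP's separate preg_match_all function. This is an analogy, not AA code. The subject is the one from the example above, extended with a second number for illustration.

```python
import re

# Subject from the example above, extended with a second number for illustration.
subject = "Order 12345 shipped, 678 pending"

first_match = re.search(r"[0-9]+", subject).group()  # first match only, as preg_match does
all_matches = re.findall(r"[0-9]+", subject)         # every match, closer to preg_match_all

print(first_match)  # 12345
print(all_matches)  # ['12345', '678']
```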
test: {preg_match:/[A-Z]{2,}/:Filed under APC news}
Expected: APC
Actual: APC
Two or more consecutive uppercase letters. Demonstrates a non-numeric pattern on plain text.
test: {preg_match:/(\d{4})-(\d{2})-(\d{2})/:Date 2024-03-15 today:1}
Expected: 2024
Actual: 2024
matched=1 returns the first parenthesised capture group. The pattern has three groups (year, month, day); group 1 is the year.
test: {preg_match:/(\d{4})-(\d{2})-(\d{2})/:Date 2024-03-15 today:2}
Expected: 03
Actual: 03
Same pattern; matched=2 returns the second group (the month). In general, 0 is the whole match, and 1 and up are the capture groups, counted from left to right.
test: {preg_match:/\b[a-z]+\.org\b/:Email to info@apc.org today}
Expected: apc.org
Actual: apc.org
Matches a lowercase word followed by .org, here the domain of an e-mail address. A colon (the AA parameter separator) must NOT appear in the pattern or the subject, or AA splits the value at it. A URL such as http://apc.org therefore cannot be passed as it stands. Give a colon-free subject like this one, and write the pattern to match only the part you need.
test: {preg_match:/[0-9]{4}/:No digits at all}
When the pattern does not match the subject, preg_match returns an empty string (not an error). This makes the expression useful with ifset/ifempty to branch on whether the pattern was found.
test: {preg_match:/(\d+)/:abc 99 def:5}
matched=5 asks for capture group 5, but the pattern has only one group. A matched index that does not exist returns an empty string rather than an error.
virtual: {preg_match:/[0-9]{4}/:{headline........}}
Expected: the first four-digit number in the item headline (for example 2024 for a headline like Annual Report 2024), or empty if there is none
The real-world pattern: feed an item field as the subject. The result depends on the rendered item, so this is illustrative, not an asserted test.