PCRE regular expressions (.pcre)

An API for PCRE in q.

This library provides access to PCRE (http://pcre.org/). It has been deprecated in favour of .pcre2, which provides access to PCRE2.

PCRE options

Multiple options can be used when compiling a pattern either via the defined bitmaps or by passing in the appropriate symbol(s):

.pcre.compile2["pattern"; .pcre.PCRE_CASELESS | .pcre.PCRE_MULTILINE]
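The | in this call assumes the option constants are boolean bitmaps, as the Boolean[] parameter type suggests. On boolean vectors, q's | is an element-wise OR. On longs it returns the maximum. The minimal pure-q sketch below uses two made-up 4-bit bitmaps rather than the real .pcre constants, because their length and bit order are not documented here:

```q
/ optA and optB are hypothetical stand-ins for two .pcre option bitmaps
optA:0001b;
optB:0010b;
combined:optA|optB;   / element-wise OR on booleans: 0011b
2 sv combined         / the same bits read as a number, most significant first: 3
1|2                   / on longs, | is max rather than bitwise OR: 2
```

The last line shows why plain long values cannot be combined with |. When or-ing options, use the predefined bitmaps.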

Below is a list of available options. For more information about these options, please visit the PCRE website or read the PCRE manual.

  • .pcre.PCRE_CASELESS
  • .pcre.PCRE_MULTILINE
  • .pcre.PCRE_DOTALL
  • .pcre.PCRE_EXTENDED
  • .pcre.PCRE_ANCHORED
  • .pcre.PCRE_DOLLAR_ENDONLY
  • .pcre.PCRE_EXTRA
  • .pcre.PCRE_NOTBOL
  • .pcre.PCRE_NOTEOL
  • .pcre.PCRE_UNGREEDY
  • .pcre.PCRE_NOTEMPTY
  • .pcre.PCRE_UTF8
  • .pcre.PCRE_NO_AUTO_CAPTURE
  • .pcre.PCRE_NO_UTF8_CHECK
  • .pcre.PCRE_AUTO_CALLOUT
  • .pcre.PCRE_PARTIAL_SOFT
  • .pcre.PCRE_PARTIAL
  • .pcre.PCRE_DFA_SHORTEST
  • .pcre.PCRE_DFA_RESTART
  • .pcre.PCRE_FIRSTLINE
  • .pcre.PCRE_DUPNAMES
  • .pcre.PCRE_NEWLINE_CR
  • .pcre.PCRE_NEWLINE_LF
  • .pcre.PCRE_NEWLINE_CRLF
  • .pcre.PCRE_NEWLINE_ANY
  • .pcre.PCRE_NEWLINE_ANYCRLF
  • .pcre.PCRE_BSR_ANYCRLF
  • .pcre.PCRE_BSR_UNICODE
  • .pcre.PCRE_JAVASCRIPT_COMPAT
  • .pcre.PCRE_NO_START_OPTIMIZE
  • .pcre.PCRE_NO_START_OPTIMISE
  • .pcre.PCRE_PARTIAL_HARD
  • .pcre.PCRE_NOTEMPTY_ATSTART
  • .pcre.PCRE_UCP

PCRE examples

Example 1: simple input strings

Here is an example of creating a pattern, using the pattern to check for a match, and freeing the pattern.

index: 0;
options: 0;

pattern: .pcre.compile2["a*b"; 0];
result: .pcre.execute[pattern; 0Ni; "abc"; index; options];
.pcre.free[pattern];
show result;

/=> 1i
/=> 0 2i

To use .pcre.dfa_exec instead, pass two additional parameters, oVector and workspace, each a list of 0Ni values. If either is too small, .pcre.dfa_exec returns an error. In most cases, 1000 elements is sufficient.

index: 0;
options: 0;
oVector: 1000#0Ni;
workspace: 1000#0Ni;

pattern: .pcre.compile2["a*b"; 0];
result: .pcre.dfa_exec[pattern; 0Ni; "abc"; index; options; oVector; workspace];
.pcre.free[pattern];
show result;

/=> 1i
/=> 0 2i
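The pairs in the result are offsets into the input. In Example 1, 0 2i means the match starts at index 0 and ends just before index 2, which is "ab". The minimal sketch below turns a single-string result into substrings. pairsToStrings is a hypothetical helper, not part of .pcre. It treats a count of 0 or less (no match, an error, or an output vector that was too small) as no match:

```q
/ pairsToStrings is a hypothetical helper, not part of .pcre
/ res: a single-string result (count; start/end pairs); str: the string that was searched
pairsToStrings:{[res;str]
  n:"j"$res 0;                          / number of start/end pairs
  if[n<1; :()];                         / 0 or negative: nothing to extract
  p:2 cut (2*n)#"j"$res 1;              / flat offsets -> list of (start;end) pairs
  / -1 -1 marks a group that did not match; return "" for it
  {[s;se] $[-1=se 0; ""; (se[1]-se[0])#se[0] _ s]}[str] each p}
```

For example, pairsToStrings[result; "abc"] applied to the execute result above gives a one-element list holding "ab".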

Example 2: vector input strings

Here is an example of creating a pattern, using it to check each string in a list for a match, and freeing the pattern.

index: 0;
options: 0;

pattern: .pcre.compile2["a*c"; 0];
result: .pcre.execute[pattern; 0Ni; ("abc"; "def"; "ghi"); index; options];
.pcre.free[pattern];
show result;

/=> 1    -1      -1
/=> 2 3i `int$() `int$()

As in Example 1, .pcre.dfa_exec takes two additional parameters, oVector and workspace, each a list of 0Ni values. In most cases, 1000 elements is sufficient.

index: 0;
options: 0;
oVector: 1000#0Ni;
workspace: 1000#0Ni;

pattern: .pcre.compile2["a*c"; 0];
result: .pcre.dfa_exec[pattern; 0Ni; ("abc"; "def"; "ghi"); index; options; oVector; workspace];
.pcre.free[pattern];
show result;

/=> 1    -1      -1
/=> 2 3i `int$() `int$()
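With a list of inputs, the first element of the result holds one count per string and the second holds the matching offsets. The minimal sketch below picks out the inputs that matched. To run without the .pcre library, it re-creates the result displayed above as a literal:

```q
/ res re-creates the result displayed above, so this sketch runs without .pcre
inputs:("abc"; "def"; "ghi");
res:(1 -1 -1i; (2 3i; `int$(); `int$()));
hit:where 0<res 0;       / indices of inputs with a match: ,0
inputs hit               / the matching inputs: ,"abc"
(res 1) hit              / their offsets: ,2 3i
```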

Example 3: helper function example

The .pcre.regex helper function combines the compile, execute, and free steps into a single call. It recompiles the pattern on every call, so for repeated searches it can be much slower than compiling once with .pcre.compile2 and reusing the compiled pattern.

show .pcre.regex["a*b"; 0; "aab"];

/=> "aab"
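The sketch below shows the pre-compiled alternative. It assumes the .pcre library is loaded, and executeAll is a hypothetical name. It compiles once and checks the error code returned by .pcre.compile2. It then runs .pcre.execute over a whole list of strings in one call, as in Example 2, and frees the pattern even if execution signals an error:

```q
/ executeAll is a hypothetical name; assumes the .pcre library is loaded
executeAll:{[pat;strs]
  compiled:.pcre.compile2[pat;0];
  if[0<>compiled 1; '"compile failed: ",compiled 2];          / 2nd element is the error code
  res:@[.pcre.execute[compiled;0Ni;;0;0];strs;{(`pcreFailed;x)}];  / trap errors so we can still free
  .pcre.free[compiled];                                       / always release the compiled pattern
  if[`pcreFailed~first res; 'last res];                       / re-signal any trapped error
  res}
```

For example, executeAll["a*c"; ("abc"; "def"; "ghi")] should return the same result as the .pcre.execute call in Example 2.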

Example 4: using options

Options can be passed to the appropriate PCRE function. If multiple options are required, or them together with | and pass them in as a single value. In the example below, the same options are passed to both .pcre.compile2 and .pcre.execute:

index: 0;

options: .pcre.PCRE_CASELESS | .pcre.PCRE_NEWLINE_ANY;
pattern: .pcre.compile2["a*c"; options];
result: .pcre.execute[pattern; 0Ni; "abc"; index; options];
.pcre.free[pattern];
show result;

/=> -3i
/=> `int$()

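The -3i result is PCRE's PCRE_ERROR_BADOPTION. PCRE_CASELESS is a compile-time option, and .pcre.execute does not accept it. Instead, pass the compile-time options only to .pcre.compile2, and pass 0 as the execute options: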

index: 0;

pattern: .pcre.compile2["a*c"; .pcre.PCRE_CASELESS | .pcre.PCRE_NEWLINE_ANY];
result: .pcre.execute[pattern; 0Ni; "abc"; index; 0];

.pcre.free[pattern];

show result;

/=> 1i
/=> 2 3i

Example 5: using the JIT

Support is provided for JIT optimizations. To use the JIT, compile the pattern with .pcre.compile2, then pass the result to .pcre.study to generate an optimized structure. Pass this structure as the extra parameter to either .pcre.execute or .pcre.dfa_exec, and free it with .pcre.free_study when it is no longer needed.

compiled: .pcre.compile2["s*g"; 0];
extra: .pcre.study[compiled; .pcre.PCRE_STUDY_JIT_COMPILE];
result: .pcre.execute[compiled; extra; "some string to search"; 0; 0];
.pcre.free_study[extra];
.pcre.free[compiled];
show result;

/=> 1i
/=> 10 11i

Example 6: compiling with an error

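Compiling a pattern with an unmatched parenthesis returns four values: a null pointer, the error code 14, the error message, and the position in the pattern where the error was detected: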

show .pcre.compile2["(a*b"; 0];

/=> 0x0000000000000000
/=> 14i
/=> "missing )"
/=> 4i
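The four elements of a failed compilation can be turned into a readable message. The minimal pure-q sketch below does this. compileError is a hypothetical helper. To run without the .pcre library, the sketch re-creates the failing result shown above as a literal:

```q
/ compileError is a hypothetical helper: "" on success, otherwise a readable message
/ c: a .pcre.compile2 result (pointer; error code; error message; error position)
compileError:{[c] $[0=c 1; ""; "error ",string[c 1]," at offset ",string[c 3],": ",c 2]}
failed:(0x0000000000000000; 14i; "missing )"; 4i);
compileError failed      / "error 14 at offset 4: missing )"
```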

For more information on PCRE, see the PCRE website (http://pcre.org/) or the PCRE manual.

.pcre.ERRORS

Error codes returned by various PCRE calls.
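The exact shape of .pcre.ERRORS is not documented here. The minimal sketch below therefore hard-codes a few of PCRE's standard match-time error codes from pcre.h. pcreErr and pcreStatus are hypothetical names. The sketch interprets the count returned as the first element of a .pcre.execute result, such as the -3i in Example 4:

```q
/ pcreErr and pcreStatus are hypothetical; codes are PCRE's standard match-time errors (pcre.h)
pcreErr:-1 -2 -3 -4 -5 -6 -7 -8i!`NOMATCH`NULL`BADOPTION`BADMAGIC`UNKNOWN_OPCODE`NOMEMORY`NOSUBSTRING`MATCHLIMIT;
/ x: the int count returned by .pcre.execute; unknown negative codes map to a null symbol
pcreStatus:{$[x>0; `OK; x=0; `OVECTOR_TOO_SMALL; pcreErr x]}
pcreStatus[-3i]          / `BADOPTION
```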

.pcre.compile

Deprecated: .pcre is deprecated, use .pcre2 instead

Pre-compiles a pattern so that it can be used efficiently multiple times

Parameters:

Name Type Description
pattern String A regex pattern
options Boolean[] | Long A bitmask of options, or 0 for no options

Returns:

Type Description
String | (byte[]; int; string; int) The error message as a string if compilation fails. Otherwise, the pointer to the compiled pattern, the error code (0 if there is no error), the error message ("" if there is no error), and the error position (0 if there is no error)

.pcre.compile2

Deprecated: .pcre is deprecated, use .pcre2 instead

Pre-compiles a pattern so that it can be used efficiently multiple times

Parameters:

Name Type Description
pattern String A regex pattern
options Boolean[] | Long A bitmask of options, or 0 for no options

Returns:

Type Description
(byte[]; int; string; int) A pointer to the compiled pattern, the error code (0 if there is no error), the error message ("" if there is no error), the error position (0 if there is no error)

.pcre.constants

Constants used by PCRE

.pcre.dfa_exec

Deprecated: .pcre is deprecated, use .pcre2 instead

Runs a compiled pattern against a string, or list of strings

Parameters:

Name Type Description
compiled *[] The result of running .pcre.compile2
extra Boolean | Int The result of running .pcre.study, or 0Ni otherwise
input String | String[] The string(s) to search for the pattern
index Long The offset to start searching from
options Boolean[] | Int A bitmask of options, or 0 for no options
outputVector Int[] A list where each element is 0Ni. In most cases, 1000 elements is sufficient
workspace Int[] A list where each element is 0Ni. In most cases, 1000 elements is sufficient

Returns:

Type Description
(int; int[]) | (int[]; int[][]) When the input is a string, the first value is the number of pairs in the second element. The second element is a vector of start/end pairs, one for each match found at the same starting position, longest first; the DFA algorithm does not report capture groups. When the input is a list of strings, each element becomes a vector holding the results for each string in the input

.pcre.dfa_regex

Deprecated: .pcre is deprecated, use .pcre2 instead

Find the first substring that matches the regex

Parameters:

Name Type Description
pattern String A regex pattern
compileOptions Boolean[] | Long A bitmask of options, or 0 for no options
input String | String[] The string(s) to search for the pattern

Returns:

Type Description
String The first substring that matches the regex

.pcre.dfa_regex_g

Deprecated: .pcre is deprecated, use .pcre2 instead

Return the first substring that matches the regex

Parameters:

Name Type Description
pattern String A regex pattern
compileOptions Boolean[] | Long A bitmask of options, or 0 for no options
input String | String[] The string(s) to search for the pattern

Returns:

Type Description
String[] | String[][] For each input, an enlisted string where the first is the substring matching the regex. Inputs with no matches return ().

.pcre.execute

Deprecated: .pcre is deprecated, use .pcre2 instead

Runs a compiled pattern against a string, or list of strings

Parameters:

Name Type Description
compiled *[] The result of running .pcre.compile2
extra Byte[] The result of running .pcre.study, or 0Ni otherwise
input String | String[] The string(s) to search for the pattern
index Long The offset to start searching from
options Boolean[] | Long A bitmask of options, or 0 for no options

Returns:

Type Description
(int; int[]) | (int[]; int[][]) When the input is a string, the first value is the number of pairs in the second element, and the second element is a vector of start/end pairs for the whole regex, followed by the capture groups. -1 -1 indicates a capture group that did not match. When the input is a list of strings, each element becomes a vector holding the results for each string in the input

.pcre.free

Deprecated: .pcre is deprecated, use .pcre2 instead

Frees the memory used by compiling a pattern

Parameter:

Name Type Description
compiled *[] The result of .pcre.compile2

Returns:

Type Description
Null

.pcre.freeMatches

Deprecated: .pcre is deprecated, use .pcre2 instead

Pulls the compiled regex out of the "matches" projection and frees it

Parameter:

Name Type Description
projection fn A projection returned by the matches function

Returns:

Type Description
Null

.pcre.free_study

Deprecated: .pcre is deprecated, use .pcre2 instead

Frees the memory used by studying a pattern

Parameter:

Name Type Description
extra *[] The result of .pcre.study

Returns:

Type Description
Null

.pcre.fullinfo

Deprecated: .pcre is deprecated, use .pcre2 instead

Get info on a compiled pattern

Parameters:

Name Type Description
compiled *[] The result of running .pcre.compile2
info int The property to query (see .pcre.types)

Returns:

Type Description
(int; *) The error code (0 for success), and the value of the property

.pcre.jitOptions

JIT-specific options

.pcre.match

Deprecated: .pcre is deprecated, use .pcre2 instead

Find substrings matching the regex

Parameters:

Name Type Description
compiled *[] A compiled regex
input char | string The text to search
index int The offset in input at which to start searching
options Boolean[] | () A value defined in matchOptions constants, or () for the defaults

Returns:

Type Description
(int; int[]) | (int[]; int[][]) When the input is a string, the first value is the number of pairs in the second element, and the second element is a vector of start/end pairs for the whole regex, followed by the capture groups. -1 -1 indicates a capture group that did not match. When the input is a list of strings, each element becomes a vector holding the results for each string in the input

.pcre.matches

Deprecated: .pcre is deprecated, use .pcre2 instead

Find all matching substrings for a given pattern

Parameters:

Name Type Description
pattern String A regex pattern
compileOptions Boolean[] | Long A bitmask of options, or 0 for no options
runOptions Boolean[] | () A bitmask of options, or the empty list () for no options

Returns:

Type Description
fn A function that takes a string, and returns a list of (string; long; long) triples holding the matching substring, start position, and end position
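The usage sketch below assumes the .pcre library is loaded, and allMatches is a hypothetical name. The function returned by .pcre.matches holds the compiled pattern, so release it with .pcre.freeMatches once it is no longer needed:

```q
/ allMatches is a hypothetical name; assumes the .pcre library is loaded
allMatches:{[pat;str]
  m:.pcre.matches[pat;0;()];   / compile once; m is a function holding the compiled pattern
  r:m str;                     / list of (substring; start; end) triples
  .pcre.freeMatches[m];        / free the compiled pattern held by m
  r}
```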

.pcre.onLoad

Deprecated: .pcre is deprecated, use .pcre2 instead

Bindings to the PCRE C functions

Returns:

Type Description
Null

.pcre.regex

Deprecated: .pcre is deprecated, use .pcre2 instead

Find the first substring in each input matching the pattern

Parameters:

Name Type Description
pattern String A regex pattern
compileOptions Boolean[] | Long A bitmask of options, or 0 for no options
input String | String[] The string(s) to search for the pattern

Returns:

Type Description
String | String[] | () The matching substring, () if there is no match, or a vector of these if the input is a list of strings

.pcre.regex_g

Deprecated: .pcre is deprecated, use .pcre2 instead

Return the first substring that matches the regex, and the matches for the capture groups

Parameters:

Name Type Description
pattern String A regex pattern
compileOptions Boolean[] | Long A bitmask of options, or 0 for no options
input String | String[] The string(s) to search for the pattern

Returns:

Type Description
String[] | String[][] For each input, a list of strings where the first is the substring matching the regex, and the following strings match the capture groups. Inputs with no matches return ()

.pcre.study

Deprecated: .pcre is deprecated, use .pcre2 instead

Performs a JIT study of the pattern and returns a structure to use as the “extra” parameter to .pcre.execute or .pcre.dfa_exec.

Parameters:

Name Type Description
compiled *[] The result of compiling a pattern with .pcre.compile2
options Boolean[] | Long A bitmask of options, or 0 for no options

Returns:

Type Description
Byte[] A pointer to the JIT-compiled pattern

.pcre.types

Requested types for .pcre.fullinfo

.pcre.version

Deprecated: .pcre is deprecated, use .pcre2 instead

Returns the current PCRE version

Returns:

Type Description
String The current PCRE version