| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="generator" content="rustdoc"> |
| <meta name="description" content="API documentation for the Rust `regex` crate."> |
| <meta name="keywords" content="rust, rustlang, rust-lang, regex"> |
| |
| <title>regex - Rust</title> |
| |
| <link rel="stylesheet" type="text/css" href="../normalize.css"> |
| <link rel="stylesheet" type="text/css" href="../rustdoc.css"> |
| <link rel="stylesheet" type="text/css" href="../main.css"> |
| |
| |
| <link rel="shortcut icon" href="https://www.rust-lang.org/favicon.ico"> |
| |
| </head> |
| <body class="rustdoc mod"> |
| <!--[if lte IE 8]> |
| <div class="warning"> |
| This old browser is unsupported and will most likely display funky |
| things. |
| </div> |
| <![endif]--> |
| |
| |
| |
| <nav class="sidebar"> |
| <a href='../regex/index.html'><img src='https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png' alt='logo' width='100'></a> |
| <p class='location'>Crate regex</p><div class="block items"><ul><li><a href="#modules">Modules</a></li><li><a href="#structs">Structs</a></li><li><a href="#enums">Enums</a></li><li><a href="#traits">Traits</a></li><li><a href="#functions">Functions</a></li></ul></div><p class='location'></p><script>window.sidebarCurrent = {name: 'regex', ty: 'mod', relpath: '../'};</script> |
| </nav> |
| |
| <nav class="sub"> |
| <form class="search-form js-only"> |
| <div class="search-container"> |
| <input class="search-input" name="search" |
| autocomplete="off" |
| placeholder="Click or press ‘S’ to search, ‘?’ for more options…" |
| type="search"> |
| </div> |
| </form> |
| </nav> |
| |
| <section id='main' class="content"> |
| <h1 class='fqn'><span class='in-band'>Crate <a class="mod" href=''>regex</a></span><span class='out-of-band'><span id='render-detail'> |
| <a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs"> |
| [<span class='inner'>−</span>] |
| </a> |
| </span><a class='srclink' href='../src/regex/lib.rs.html#11-606' title='goto source code'>[src]</a></span></h1> |
| <div class='docblock'><p>This crate provides a native implementation of regular expressions that is |
| heavily based on RE2 both in syntax and in implementation. Notably, |
| backreferences and arbitrary lookahead/lookbehind assertions are not |
| provided. In return, regular expression searching provided by this package |
| has excellent worst-case performance. The specific syntax supported is |
| documented further down.</p> |
| |
| <p>This crate's documentation provides some simple examples, describes Unicode |
| support and exhaustively lists the supported syntax. For more specific |
| details on the API, please see the documentation for the |
| <a href="struct.Regex.html"><code>Regex</code></a> type.</p> |
| |
| <h1 id='usage' class='section-header'><a href='#usage'>Usage</a></h1> |
| <p>This crate is <a href="https://crates.io/crates/regex">on crates.io</a> and can be |
| used by adding <code>regex</code> to your dependencies in your project's <code>Cargo.toml</code>.</p> |
| |
| <pre><code class="language-toml">[dependencies] |
| regex = "0.1" |
| </code></pre> |
| |
| <p>and this to your crate root:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">regex</span>;</pre> |
| |
| <h1 id='example-find-a-date' class='section-header'><a href='#example-find-a-date'>Example: find a date</a></h1> |
| <p>General use of regular expressions in this package involves compiling an |
| expression and then using it to search, split or replace text. For example, |
| to confirm that some text resembles a date:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">use</span> <span class="ident">regex</span>::<span class="ident">Regex</span>; |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"^\d{4}-\d{2}-\d{2}$"</span>).<span class="ident">unwrap</span>(); |
| <span class="macro">assert</span><span class="macro">!</span>(<span class="ident">re</span>.<span class="ident">is_match</span>(<span class="string">"2014-01-01"</span>));</pre> |
| |
| <p>Notice the use of the <code>^</code> and <code>$</code> anchors. In this crate, every expression |
| is executed with an implicit <code>.*?</code> at the beginning and end, which allows |
| it to match anywhere in the text. Anchors can be used to ensure that the |
| full text matches an expression.</p> |
| |
| <p>This example also demonstrates the utility of |
| <a href="https://doc.rust-lang.org/stable/reference.html#raw-string-literals">raw strings</a> |
| in Rust, which |
| are just like regular strings except they are prefixed with an <code>r</code> and do |
| not process any escape sequences. For example, <code>"\\d"</code> is the same |
| expression as <code>r"\d"</code>.</p> |
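<p>As a quick check of that equivalence, the sketch below compares the two
spellings of the date pattern using only the standard library. The helper names
<code>escaped</code> and <code>raw</code> are illustrative and are not part of this crate.</p>

```rust
// The date pattern written as an ordinary string literal: every backslash
// has to be doubled so that Rust does not treat it as an escape sequence.
fn escaped() -> &'static str {
    "^\\d{4}-\\d{2}-\\d{2}$"
}

// The same pattern as a raw string literal: backslashes are kept verbatim.
fn raw() -> &'static str {
    r"^\d{4}-\d{2}-\d{2}$"
}

fn main() {
    // Both literals produce identical text.
    assert_eq!(escaped(), raw());
    println!("{}", raw());
}
```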
| |
| <h1 id='example-avoid-compiling-the-same-regex-in-a-loop' class='section-header'><a href='#example-avoid-compiling-the-same-regex-in-a-loop'>Example: avoid compiling the same regex in a loop</a></h1> |
| <p>It is an anti-pattern to compile the same regular expression in a loop |
| since compilation is typically expensive. (It takes anywhere from a few |
| microseconds to a few <strong>milliseconds</strong> depending on the size of the |
| regex.) Not only is compilation itself expensive, but this also prevents |
| optimizations that reuse allocations internally to the matching engines.</p> |
| |
| <p>In Rust, it can sometimes be a pain to pass regular expressions around if |
| they're used from inside a helper function. Instead, we recommend using the |
| <a href="https://crates.io/crates/lazy_static"><code>lazy_static</code></a> crate to ensure that |
| regular expressions are compiled exactly once.</p> |
| |
| <p>For example:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="attribute">#[<span class="ident">macro_use</span>]</span> <span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">lazy_static</span>; |
| <span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">regex</span>; |
| |
| <span class="kw">use</span> <span class="ident">regex</span>::<span class="ident">Regex</span>; |
| |
| <span class="kw">fn</span> <span class="ident">some_helper_function</span>(<span class="ident">text</span>: <span class="kw-2">&</span><span class="ident">str</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="macro">lazy_static</span><span class="macro">!</span> { |
| <span class="kw">static</span> <span class="kw-2">ref</span> <span class="ident">RE</span>: <span class="ident">Regex</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">"..."</span>).<span class="ident">unwrap</span>(); |
| } |
| <span class="ident">RE</span>.<span class="ident">is_match</span>(<span class="ident">text</span>) |
| } |
| |
| <span class="kw">fn</span> <span class="ident">main</span>() {}</pre> |
| |
| <p>Specifically, in this example, the regex will be compiled when it is used for |
| the first time. On subsequent uses, it will reuse the previous compilation.</p> |
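<p>The compile-once behaviour can also be sketched with the standard library
alone. The example below swaps <code>lazy_static</code> for <code>std::sync::OnceLock</code>, which
assumes a toolchain that ships it (Rust 1.70 or later). The <code>compile</code> function
is a hypothetical stand-in for an expensive <code>Regex::new</code> call and returns a
<code>String</code> rather than a real <code>Regex</code>; the counter exists only to show that
the initializer runs once.</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (hypothetical) expensive compilation ran.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive `Regex::new` call.
fn compile(pattern: &str) -> String {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    pattern.to_owned()
}

// Returns the shared "compiled" value, building it on first use only.
fn date_pattern() -> &'static String {
    static RE: OnceLock<String> = OnceLock::new();
    RE.get_or_init(|| compile(r"^\d{4}-\d{2}-\d{2}$"))
}

// Reports how many compilations have happened so far.
fn compilations() -> usize {
    COMPILATIONS.load(Ordering::SeqCst)
}

fn main() {
    // Calling the helper many times still compiles only once.
    for _ in 0..1000 {
        let _ = date_pattern();
    }
    assert_eq!(compilations(), 1);
}
```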
| |
| <h1 id='example-iterating-over-capture-groups' class='section-header'><a href='#example-iterating-over-capture-groups'>Example: iterating over capture groups</a></h1> |
| <p>This crate provides convenient iterators for matching an expression |
| repeatedly against a search string to find successive non-overlapping |
| matches. For example, to find all dates in a string and be able to access |
| them by their component pieces:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(\d{4})-(\d{2})-(\d{2})"</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">"2012-03-14, 2013-01-01 and 2014-07-05"</span>; |
| <span class="kw">for</span> <span class="ident">cap</span> <span class="kw">in</span> <span class="ident">re</span>.<span class="ident">captures_iter</span>(<span class="ident">text</span>) { |
| <span class="macro">println</span><span class="macro">!</span>(<span class="string">"Month: {} Day: {} Year: {}"</span>, |
| <span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">2</span>).<span class="ident">unwrap_or</span>(<span class="string">""</span>), <span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">3</span>).<span class="ident">unwrap_or</span>(<span class="string">""</span>), |
| <span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">1</span>).<span class="ident">unwrap_or</span>(<span class="string">""</span>)); |
| } |
| <span class="comment">// Output:</span> |
| <span class="comment">// Month: 03 Day: 14 Year: 2012</span> |
| <span class="comment">// Month: 01 Day: 01 Year: 2013</span> |
| <span class="comment">// Month: 07 Day: 05 Year: 2014</span></pre> |
| |
| <p>Notice that the year is in the capture group indexed at <code>1</code>. This is |
| because the <em>entire match</em> is stored in the capture group at index <code>0</code>.</p> |
| |
| <h1 id='example-replacement-with-named-capture-groups' class='section-header'><a href='#example-replacement-with-named-capture-groups'>Example: replacement with named capture groups</a></h1> |
| <p>Building on the previous example, perhaps we'd like to rearrange the date |
| formats. This can be done with text replacement. But to make the code |
| clearer, we can <em>name</em> our capture groups and use those names as variables |
| in our replacement text:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?P&lt;y&gt;\d{4})-(?P&lt;m&gt;\d{2})-(?P&lt;d&gt;\d{2})"</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">before</span> <span class="op">=</span> <span class="string">"2012-03-14, 2013-01-01 and 2014-07-05"</span>; |
| <span class="kw">let</span> <span class="ident">after</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">replace_all</span>(<span class="ident">before</span>, <span class="string">"$m/$d/$y"</span>); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">after</span>, <span class="string">"03/14/2012, 01/01/2013 and 07/05/2014"</span>);</pre> |
| |
| <p>The <code>replace</code> methods are actually polymorphic in the replacement, which |
| provides more flexibility than is seen here. (See the documentation for |
| <code>Regex::replace</code> for more details.)</p> |
| |
| <p>Note that if your regex gets complicated, you can use the <code>x</code> flag to |
| enable insignificant whitespace mode, which also lets you write comments:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?x) |
| (?P&lt;y&gt;\d{4}) # the year |
| - |
| (?P&lt;m&gt;\d{2}) # the month |
| - |
| (?P&lt;d&gt;\d{2}) # the day |
| "</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">before</span> <span class="op">=</span> <span class="string">"2012-03-14, 2013-01-01 and 2014-07-05"</span>; |
| <span class="kw">let</span> <span class="ident">after</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">replace_all</span>(<span class="ident">before</span>, <span class="string">"$m/$d/$y"</span>); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">after</span>, <span class="string">"03/14/2012, 01/01/2013 and 07/05/2014"</span>);</pre> |
| |
| <h1 id='example-match-multiple-regular-expressions-simultaneously' class='section-header'><a href='#example-match-multiple-regular-expressions-simultaneously'>Example: match multiple regular expressions simultaneously</a></h1> |
| <p>This demonstrates how to use a <code>RegexSet</code> to match multiple (possibly |
| overlapping) regular expressions in a single scan of the search text:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">use</span> <span class="ident">regex</span>::<span class="ident">RegexSet</span>; |
| |
| <span class="kw">let</span> <span class="ident">set</span> <span class="op">=</span> <span class="ident">RegexSet</span>::<span class="ident">new</span>(<span class="kw-2">&</span>[ |
| <span class="string">r"\w+"</span>, |
| <span class="string">r"\d+"</span>, |
| <span class="string">r"\pL+"</span>, |
| <span class="string">r"foo"</span>, |
| <span class="string">r"bar"</span>, |
| <span class="string">r"barfoo"</span>, |
| <span class="string">r"foobar"</span>, |
| ]).<span class="ident">unwrap</span>(); |
| |
| <span class="comment">// Iterate over and collect all of the matches.</span> |
| <span class="kw">let</span> <span class="ident">matches</span>: <span class="ident">Vec</span><span class="op">&lt;</span>_<span class="op">&gt;</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foobar"</span>).<span class="ident">into_iter</span>().<span class="ident">collect</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">matches</span>, <span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">6</span>]); |
| |
| <span class="comment">// You can also test whether a particular regex matched:</span> |
| <span class="kw">let</span> <span class="ident">matches</span> <span class="op">=</span> <span class="ident">set</span>.<span class="ident">matches</span>(<span class="string">"foobar"</span>); |
| <span class="macro">assert</span><span class="macro">!</span>(<span class="op">!</span><span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">5</span>)); |
| <span class="macro">assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">matched</span>(<span class="number">6</span>));</pre> |
| |
| <h1 id='pay-for-what-you-use' class='section-header'><a href='#pay-for-what-you-use'>Pay for what you use</a></h1> |
| <p>With respect to searching text with a regular expression, there are three |
| questions that can be asked:</p> |
| |
| <ol> |
| <li>Does the text match this expression?</li> |
| <li>If so, where does it match?</li> |
| <li>Where are the submatches?</li> |
| </ol> |
| |
| <p>Generally speaking, this crate could provide a function to answer only #3, |
| which would subsume #1 and #2 automatically. However, it can be |
| significantly more expensive to compute the location of submatches, so it's |
| best not to do it if you don't need to.</p> |
| |
| <p>Therefore, only use what you need. For example, don't use <code>find</code> if you |
| only need to test if an expression matches a string. (Use <code>is_match</code> |
| instead.)</p> |
| |
| <h1 id='unicode' class='section-header'><a href='#unicode'>Unicode</a></h1> |
| <p>This implementation executes regular expressions <strong>only</strong> on valid UTF-8 |
| while exposing match locations as byte indices into the search string.</p> |
| |
| <p>Only simple case folding is supported. Namely, when matching |
| case-insensitively, the characters are first mapped using the <a href="ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt">simple case |
| folding</a> mapping |
| before matching.</p> |
| |
| <p>Regular expressions themselves are <strong>only</strong> interpreted as a sequence of |
| Unicode scalar values. This means you can use Unicode characters directly |
| in your expression:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?i)Δ+"</span>).<span class="ident">unwrap</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">re</span>.<span class="ident">find</span>(<span class="string">"ΔδΔ"</span>), <span class="prelude-val">Some</span>((<span class="number">0</span>, <span class="number">6</span>)));</pre> |
| |
| <p>Finally, Unicode general categories and scripts are available as character |
| classes. For example, you can match a sequence of numerals, Greek or |
| Cherokee letters:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"[\pN\p{Greek}\p{Cherokee}]+"</span>).<span class="ident">unwrap</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">re</span>.<span class="ident">find</span>(<span class="string">"abcΔᎠβⅠᏴγδⅡxyz"</span>), <span class="prelude-val">Some</span>((<span class="number">3</span>, <span class="number">23</span>)));</pre> |
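<p>The byte offsets in the two examples above can be checked with the standard
library alone. This is a minimal sketch: <code>slice_bytes</code> is an illustrative
helper, not part of this crate, and the offsets are copied from the assertions
above rather than computed by a regex.</p>

```rust
// Slice `text` with a (start, end) pair of byte offsets, the same shape of
// value that the `find` calls above return inside `Some(..)`.
fn slice_bytes(text: &str, span: (usize, usize)) -> &str {
    &text[span.0..span.1]
}

fn main() {
    // Each Greek letter here takes two bytes in UTF-8, so three characters
    // occupy six bytes.
    let greek = "ΔδΔ";
    assert_eq!(greek.chars().count(), 3);
    assert_eq!(greek.len(), 6);
    assert_eq!(slice_bytes(greek, (0, 6)), "ΔδΔ");

    // The span (3, 23) starts after the three ASCII bytes of "abc" and
    // stops right before "xyz".
    let mixed = "abcΔᎠβⅠᏴγδⅡxyz";
    assert_eq!(slice_bytes(mixed, (3, 23)), "ΔᎠβⅠᏴγδⅡ");
}
```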
| |
| <h1 id='opt-out-of-unicode-support' class='section-header'><a href='#opt-out-of-unicode-support'>Opt out of Unicode support</a></h1> |
| <p>The <code>bytes</code> sub-module provides a <code>Regex</code> type that can be used to match |
| on <code>&[u8]</code>. By default, text is interpreted as ASCII compatible text with |
| all Unicode support disabled (e.g., <code>.</code> matches any byte instead of any |
| Unicode codepoint). Unicode support can be selectively enabled with the |
| <code>u</code> flag. See the <code>bytes</code> module documentation for more details.</p> |
| |
| <p>Unicode support can also be selectively <em>disabled</em> with the main <code>Regex</code> |
| type that matches on <code>&str</code>. For example, <code>(?-u:\b)</code> will match an ASCII |
| word boundary. Note though that invalid UTF-8 is not allowed to be matched |
| even when the <code>u</code> flag is disabled. For example, <code>(?-u:.)</code> will return an |
| error, since <code>.</code> matches <em>any byte</em> when Unicode support is disabled.</p> |
| |
| <h1 id='syntax' class='section-header'><a href='#syntax'>Syntax</a></h1> |
| <p>The syntax supported in this crate corresponds almost exactly |
| with the syntax supported by RE2. It is documented below.</p> |
| |
| <p>Note that the regular expression parser and abstract syntax are exposed in |
| a separate crate, <a href="../regex_syntax/index.html"><code>regex-syntax</code></a>.</p> |
| |
| <h2 id='matching-one-character' class='section-header'><a href='#matching-one-character'>Matching one character</a></h2> |
| <pre class="rust"> |
| . any character except new line (includes new line with s flag) |
| [xyz] A character class matching either x, y or z. |
| [^xyz] A character class matching any character except x, y and z. |
| [a-z] A character class matching any character in range a-z. |
| \d digit (\p{Nd}) |
| \D not digit |
| [:alpha:] ASCII character class ([A-Za-z]) |
| [:^alpha:] Negated ASCII character class ([^A-Za-z]) |
| \pN One-letter name Unicode character class |
| \p{Greek} Unicode character class (general category or script) |
| \PN Negated one-letter name Unicode character class |
| \P{Greek} negated Unicode character class (general category or script) |
| </pre> |
| |
| <p>Any named character class may appear inside a bracketed <code>[...]</code> character |
| class. For example, <code>[\p{Greek}\pN]</code> matches any Greek or numeral |
| character.</p> |
| |
| <h2 id='composites' class='section-header'><a href='#composites'>Composites</a></h2> |
| <pre class="rust"> |
| xy concatenation (x followed by y) |
| x|y alternation (x or y, prefer x) |
| </pre> |
| |
| <h2 id='repetitions' class='section-header'><a href='#repetitions'>Repetitions</a></h2> |
| <pre class="rust"> |
| x* zero or more of x (greedy) |
| x+ one or more of x (greedy) |
| x? zero or one of x (greedy) |
| x*? zero or more of x (ungreedy/lazy) |
| x+? one or more of x (ungreedy/lazy) |
| x?? zero or one of x (ungreedy/lazy) |
| x{n,m} at least n x and at most m x (greedy) |
| x{n,} at least n x (greedy) |
| x{n} exactly n x |
| x{n,m}? at least n x and at most m x (ungreedy/lazy) |
| x{n,}? at least n x (ungreedy/lazy) |
| x{n}? exactly n x |
| </pre> |
| |
| <h2 id='empty-matches' class='section-header'><a href='#empty-matches'>Empty matches</a></h2> |
| <pre class="rust"> |
| ^ the beginning of text (or start-of-line with multi-line mode) |
| $ the end of text (or end-of-line with multi-line mode) |
| \A only the beginning of text (even with multi-line mode enabled) |
| \z only the end of text (even with multi-line mode enabled) |
| \b a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
| \B not a Unicode word boundary |
| </pre> |
| |
| <h2 id='grouping-and-flags' class='section-header'><a href='#grouping-and-flags'>Grouping and flags</a></h2> |
| <pre class="rust"> |
| (exp) numbered capture group (indexed by opening parenthesis) |
| (?P&lt;name&gt;exp) named (also numbered) capture group (allowed chars: [_0-9a-zA-Z]) |
| (?:exp) non-capturing group |
| (?flags) set flags within current group |
| (?flags:exp) set flags for exp (non-capturing) |
| </pre> |
| |
| <p>Flags are each a single character. For example, <code>(?x)</code> sets the flag <code>x</code> |
| and <code>(?-x)</code> clears the flag <code>x</code>. Multiple flags can be set or cleared at |
| the same time: <code>(?xy)</code> sets both the <code>x</code> and <code>y</code> flags and <code>(?x-y)</code> sets |
| the <code>x</code> flag and clears the <code>y</code> flag.</p> |
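<p>To make the set/clear rule concrete, here is a small sketch that splits the
inside of a flag group into the flags it sets and the flags it clears. The
function <code>split_flags</code> is hypothetical and only illustrates the notation; it
is not how this crate parses flags and it does no validation.</p>

```rust
// Split the inside of a flag group such as "x-y" into (set, cleared) flags.
// Flags before the `-` are set; flags after it are cleared.
fn split_flags(spec: &str) -> (Vec<char>, Vec<char>) {
    let mut set = Vec::new();
    let mut cleared = Vec::new();
    let mut clearing = false;
    for c in spec.chars() {
        if c == '-' {
            clearing = true;
        } else if clearing {
            cleared.push(c);
        } else {
            set.push(c);
        }
    }
    (set, cleared)
}

fn main() {
    // (?x-y) sets `x` and clears `y`.
    assert_eq!(split_flags("x-y"), (vec!['x'], vec!['y']));
    // (?xy) sets both flags and clears none.
    assert_eq!(split_flags("xy"), (vec!['x', 'y'], Vec::<char>::new()));
}
```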
| |
| <p>All flags are by default disabled unless stated otherwise. They are:</p> |
| |
| <pre class="rust"> |
| i case-insensitive |
| m multi-line mode: ^ and $ match begin/end of line |
| s allow . to match \n |
| U swap the meaning of x* and x*? |
| u Unicode support (enabled by default) |
| x ignore whitespace and allow line comments (starting with `#`) |
| </pre> |
| |
| <p>Here's an example that matches case-insensitively for only part of the |
| expression:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?i)a+(?-i)b+"</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">cap</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="string">"AaAaAbbBBBb"</span>).<span class="ident">unwrap</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">0</span>), <span class="prelude-val">Some</span>(<span class="string">"AaAaAbb"</span>));</pre> |
| |
| <p>Notice that the <code>a+</code> matches either <code>a</code> or <code>A</code>, but the <code>b+</code> only matches |
| <code>b</code>.</p> |
| |
| <p>Here is an example that uses an ASCII word boundary instead of a Unicode |
| word boundary:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?-u:\b).+(?-u:\b)"</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">cap</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="string">"$$abc$$"</span>).<span class="ident">unwrap</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">cap</span>.<span class="ident">at</span>(<span class="number">0</span>), <span class="prelude-val">Some</span>(<span class="string">"abc"</span>));</pre> |
| |
| <h2 id='escape-sequences' class='section-header'><a href='#escape-sequences'>Escape sequences</a></h2> |
| <pre class="rust"> |
| \* literal *, works for any punctuation character: \.+*?()|[]{}^$ |
| \a bell (\x07) |
| \f form feed (\x0C) |
| \t horizontal tab |
| \n new line |
| \r carriage return |
| \v vertical tab (\x0B) |
| \123 octal character code (up to three digits) |
| \x7F hex character code (exactly two digits) |
| \x{10FFFF} any hex character code corresponding to a Unicode code point |
| </pre> |
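<p>The numeric escapes above name a character by its code. As a rough
illustration using only the standard library, the hypothetical helpers
<code>from_octal</code> and <code>from_hex</code> below decode the digits of such an escape; they
are not part of this crate and do not enforce the digit-count rules listed
above.</p>

```rust
// Decode the digits of an octal escape such as \123 into a character.
fn from_octal(digits: &str) -> Option<char> {
    u32::from_str_radix(digits, 8).ok().and_then(char::from_u32)
}

// Decode the digits of a hex escape such as \x7F or \x{10FFFF}.
// Values that are not Unicode scalar values yield `None`.
fn from_hex(digits: &str) -> Option<char> {
    u32::from_str_radix(digits, 16).ok().and_then(char::from_u32)
}

fn main() {
    // Octal 123 is decimal 83, the letter 'S'.
    assert_eq!(from_octal("123"), Some('S'));
    assert_eq!(from_hex("7F"), Some('\u{7F}'));
    assert_eq!(from_hex("10FFFF"), Some('\u{10FFFF}'));
    // One past the last code point is rejected.
    assert_eq!(from_hex("110000"), None);
}
```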
| |
| <h2 id='perl-character-classes-unicode-friendly' class='section-header'><a href='#perl-character-classes-unicode-friendly'>Perl character classes (Unicode friendly)</a></h2> |
| <p>These classes are based on the definitions provided in |
| <a href="http://www.unicode.org/reports/tr18/#Compatibility_Properties">UTS#18</a>:</p> |
| |
| <pre class="rust"> |
| \d digit (\p{Nd}) |
| \D not digit |
| \s whitespace (\p{White_Space}) |
| \S not whitespace |
| \w word character (\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}) |
| \W not word character |
| </pre> |
| |
| <h2 id='ascii-character-classes' class='section-header'><a href='#ascii-character-classes'>ASCII character classes</a></h2> |
| <pre class="rust"> |
| [:alnum:] alphanumeric ([0-9A-Za-z]) |
| [:alpha:] alphabetic ([A-Za-z]) |
| [:ascii:] ASCII ([\x00-\x7F]) |
| [:blank:] blank ([\t ]) |
| [:cntrl:] control ([\x00-\x1F\x7F]) |
| [:digit:] digits ([0-9]) |
| [:graph:] graphical ([!-~]) |
| [:lower:] lower case ([a-z]) |
| [:print:] printable ([ -~]) |
| [:punct:] punctuation ([!-/:-@[-`{-~]) |
| [:space:] whitespace ([\t\n\v\f\r ]) |
| [:upper:] upper case ([A-Z]) |
| [:word:] word characters ([0-9A-Za-z_]) |
| [:xdigit:] hex digit ([0-9A-Fa-f]) |
| </pre> |
| |
| <h1 id='untrusted-input' class='section-header'><a href='#untrusted-input'>Untrusted input</a></h1> |
| <p>This crate can handle both untrusted regular expressions and untrusted |
| search text.</p> |
| |
| <p>Untrusted regular expressions are handled by capping the size of a compiled |
| regular expression. (See <code>Regex::with_size_limit</code>.) Without this, it would |
| be trivial for an attacker to exhaust your system's memory with expressions |
| like <code>a{100}{100}{100}</code>.</p> |
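<p>To see why such an expression is a problem, consider how many copies of
<code>a</code> a naive expansion of the nested repetitions would produce. The sketch
below is back-of-the-envelope arithmetic only; <code>expanded_copies</code> is a
hypothetical helper and does not reflect how this crate actually measures the
size of a compiled expression.</p>

```rust
// Number of copies of the inner expression after fully expanding nested
// counted repetitions such as a{100}{100}{100}.
fn expanded_copies(counts: &[u64]) -> u64 {
    counts.iter().product()
}

fn main() {
    // Three nested {100} repetitions multiply out to one million copies.
    assert_eq!(expanded_copies(&[100, 100, 100]), 1_000_000);
    println!("{}", expanded_copies(&[100, 100, 100]));
}
```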
| |
| <p>Untrusted search text is allowed because the matching engine(s) in this |
| crate have time complexity <code>O(mn)</code> (with <code>m ~ regex</code> and <code>n ~ search text</code>), which means there's no way to cause exponential blow-up like with |
| some other regular expression engines. (We pay for this by disallowing |
| features like arbitrary look-ahead and backreferences.)</p> |
| |
| <p>When a DFA is used, pathological cases with exponential state blow-up are |
| avoided by constructing the DFA lazily or in an "online" manner. Therefore, |
| at most one new state can be created for each byte of input. This satisfies |
| our time complexity guarantees, but can lead to unbounded memory growth |
| proportional to the size of the input. As a stopgap, the DFA is only |
| allowed to store a fixed number of states. (When the limit is reached, its |
| states are wiped and it continues on, possibly duplicating previous work. If |
| the limit is reached too frequently, it gives up and hands control off to |
| another matching engine with fixed memory requirements.)</p> |
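<p>The wipe-when-full idea can be sketched as a small bounded cache. This is an
illustration only and not the crate's DFA: <code>BoundedCache</code>, its fields and
<code>get_or_build</code> are hypothetical names, and plain <code>u32</code> values stand in for
real DFA states.</p>

```rust
use std::collections::HashMap;

// Illustrative cache with a fixed capacity. `u32` keys and values stand in
// for real DFA states.
struct BoundedCache {
    limit: usize,
    states: HashMap<u32, u32>,
    wipes: usize,
}

impl BoundedCache {
    fn new(limit: usize) -> BoundedCache {
        BoundedCache { limit, states: HashMap::new(), wipes: 0 }
    }

    // Return the cached value for `key`, building and storing it on a miss.
    // When the cache is full, all entries are dropped before the new one is
    // stored, so previously built states may have to be rebuilt later.
    fn get_or_build<F: FnOnce(u32) -> u32>(&mut self, key: u32, build: F) -> u32 {
        if let Some(&state) = self.states.get(&key) {
            return state;
        }
        if self.states.len() >= self.limit {
            self.states.clear();
            self.wipes += 1;
        }
        let state = build(key);
        self.states.insert(key, state);
        state
    }
}

fn main() {
    let mut cache = BoundedCache::new(2);
    for &key in &[1u32, 2, 3, 1] {
        cache.get_or_build(key, |k| k * 10);
    }
    // Key 3 arrived while the cache was full, which forced one wipe; key 1
    // then had to be rebuilt even though it had been seen before.
    assert_eq!(cache.wipes, 1);
    assert_eq!(cache.states.len(), 2);
}
```

<p>A real engine would also track how often wipes happen so that it can fall
back to another strategy, as described above; that bookkeeping is left out here.</p>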
| </div><h2 id='modules' class='section-header'><a href="#modules">Modules</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="mod" href="bytes/index.html" |
| title='mod regex::bytes'>bytes</a></td> |
| <td class='docblock-short'> |
| <p>Match regular expressions on arbitrary bytes.</p> |
| </td> |
| </tr></table><h2 id='structs' class='section-header'><a href="#structs">Structs</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.CaptureNames.html" |
| title='struct regex::CaptureNames'>CaptureNames</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over the names of all possible captures.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.Captures.html" |
| title='struct regex::Captures'>Captures</a></td> |
| <td class='docblock-short'> |
| <p>Captures represents a group of captured strings for a single match.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.FindCaptures.html" |
| title='struct regex::FindCaptures'>FindCaptures</a></td> |
| <td class='docblock-short'> |
| <p>An iterator that yields all non-overlapping capture groups matching a |
| particular regular expression.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.FindMatches.html" |
| title='struct regex::FindMatches'>FindMatches</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over all non-overlapping matches for a particular string.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.NoExpand.html" |
| title='struct regex::NoExpand'>NoExpand</a></td> |
| <td class='docblock-short'> |
| <p>NoExpand indicates literal string replacement.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.Regex.html" |
| title='struct regex::Regex'>Regex</a></td> |
| <td class='docblock-short'> |
| <p>A compiled regular expression for matching Unicode strings.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.RegexBuilder.html" |
| title='struct regex::RegexBuilder'>RegexBuilder</a></td> |
| <td class='docblock-short'> |
| <p>A configurable builder for a regular expression.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.RegexSet.html" |
| title='struct regex::RegexSet'>RegexSet</a></td> |
| <td class='docblock-short'> |
| <p>Match multiple (possibly overlapping) regular expressions in a single scan.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.RegexSplits.html" |
| title='struct regex::RegexSplits'>RegexSplits</a></td> |
| <td class='docblock-short'> |
| <p>Yields all substrings delimited by a regular expression match.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.RegexSplitsN.html" |
| title='struct regex::RegexSplitsN'>RegexSplitsN</a></td> |
| <td class='docblock-short'> |
| <p>Yields at most <code>N</code> substrings delimited by a regular expression match.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SetMatches.html" |
| title='struct regex::SetMatches'>SetMatches</a></td> |
| <td class='docblock-short'> |
| <p>A set of matches returned by a regex set.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SetMatchesIntoIter.html" |
| title='struct regex::SetMatchesIntoIter'>SetMatchesIntoIter</a></td> |
| <td class='docblock-short'> |
| <p>An owned iterator over the set of matches from a regex set.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SetMatchesIter.html" |
| title='struct regex::SetMatchesIter'>SetMatchesIter</a></td> |
| <td class='docblock-short'> |
| <p>A borrowed iterator over the set of matches from a regex set.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SubCaptures.html" |
| title='struct regex::SubCaptures'>SubCaptures</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over capture groups for a particular match of a regular |
| expression.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SubCapturesNamed.html" |
| title='struct regex::SubCapturesNamed'>SubCapturesNamed</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over named capture groups as a tuple with the group |
| name and the value.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SubCapturesPos.html" |
| title='struct regex::SubCapturesPos'>SubCapturesPos</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over capture group positions for a particular match of a |
| regular expression.</p> |
| </td> |
| </tr></table><h2 id='enums' class='section-header'><a href="#enums">Enums</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="enum" href="enum.Error.html" |
| title='enum regex::Error'>Error</a></td> |
| <td class='docblock-short'> |
| <p>An error that occurred during parsing or compiling a regular expression.</p> |
| </td> |
| </tr></table><h2 id='traits' class='section-header'><a href="#traits">Traits</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="trait" href="trait.Replacer.html" |
| title='trait regex::Replacer'>Replacer</a></td> |
| <td class='docblock-short'> |
| <p>Replacer describes types that can be used to replace matches in a string.</p> |
| </td> |
| </tr></table><h2 id='functions' class='section-header'><a href="#functions">Functions</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="fn" href="fn.is_match.html" |
| title='fn regex::is_match'>is_match</a></td> |
| <td class='docblock-short'> |
| <p>Tests if the given regular expression matches somewhere in the text given.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="fn" href="fn.quote.html" |
| title='fn regex::quote'>quote</a></td> |
| <td class='docblock-short'> |
| <p>Escapes all regular expression meta characters in <code>text</code>.</p> |
| </td> |
| </tr></table></section> |
| <section id='search' class="content hidden"></section> |
| |
| <section class="footer"></section> |
| |
| <aside id="help" class="hidden"> |
| <div> |
| <h1 class="hidden">Help</h1> |
| |
| <div class="shortcuts"> |
| <h2>Keyboard Shortcuts</h2> |
| |
| <dl> |
| <dt>?</dt> |
| <dd>Show this help dialog</dd> |
| <dt>S</dt> |
| <dd>Focus the search field</dd> |
| <dt>⇤</dt> |
| <dd>Move up in search results</dd> |
| <dt>⇥</dt> |
| <dd>Move down in search results</dd> |
| <dt>⏎</dt> |
| <dd>Go to active search result</dd> |
| <dt>+</dt> |
| <dd>Collapse/expand all sections</dd> |
| </dl> |
| </div> |
| |
| <div class="infos"> |
| <h2>Search Tricks</h2> |
| |
| <p> |
| Prefix searches with a type followed by a colon (e.g. |
| <code>fn:</code>) to restrict the search to a given type. |
| </p> |
| |
| <p> |
| Accepted types are: <code>fn</code>, <code>mod</code>, |
| <code>struct</code>, <code>enum</code>, |
| <code>trait</code>, <code>type</code>, <code>macro</code>, |
| and <code>const</code>. |
| </p> |
| |
| <p> |
| Search functions by type signature (e.g. |
| <code>vec -> usize</code> or <code>* -> vec</code>) |
| </p> |
| </div> |
| </div> |
| </aside> |
| |
| |
| |
| <script> |
| window.rootPath = "../"; |
| window.currentCrate = "regex"; |
| </script> |
| <script src="../main.js"></script> |
| <script defer src="../search-index.js"></script> |
| </body> |
| </html> |