| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="generator" content="rustdoc"> |
| <meta name="description" content="API documentation for the Rust `bytes` mod in crate `regex`."> |
| <meta name="keywords" content="rust, rustlang, rust-lang, bytes"> |
| |
| <title>regex::bytes - Rust</title> |
| |
| <link rel="stylesheet" type="text/css" href="../../normalize.css"> |
| <link rel="stylesheet" type="text/css" href="../../rustdoc.css"> |
| <link rel="stylesheet" type="text/css" href="../../main.css"> |
| |
| |
| <link rel="shortcut icon" href="https://www.rust-lang.org/favicon.ico"> |
| |
| </head> |
| <body class="rustdoc mod"> |
| <!--[if lte IE 8]> |
| <div class="warning"> |
| This old browser is unsupported and will most likely display funky |
| things. |
| </div> |
| <![endif]--> |
| |
| |
| |
| <nav class="sidebar"> |
| <a href='../../regex/index.html'><img src='https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png' alt='logo' width='100'></a> |
| <p class='location'>Module bytes</p><div class="block items"><ul><li><a href="#structs">Structs</a></li><li><a href="#traits">Traits</a></li></ul></div><p class='location'><a href='../index.html'>regex</a></p><script>window.sidebarCurrent = {name: 'bytes', ty: 'mod', relpath: '../'};</script><script defer src="../sidebar-items.js"></script> |
| </nav> |
| |
| <nav class="sub"> |
| <form class="search-form js-only"> |
| <div class="search-container"> |
| <input class="search-input" name="search" |
| autocomplete="off" |
| placeholder="Click or press ‘S’ to search, ‘?’ for more options…" |
| type="search"> |
| </div> |
| </form> |
| </nav> |
| |
| <section id='main' class="content"> |
| <h1 class='fqn'><span class='in-band'>Module <a href='../index.html'>regex</a>::<wbr><a class="mod" href=''>bytes</a></span><span class='out-of-band'><span id='render-detail'> |
| <a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs"> |
| [<span class='inner'>−</span>] |
| </a> |
| </span><a class='srclink' href='../../src/regex/lib.rs.html#561-565' title='goto source code'>[src]</a></span></h1> |
| <div class='docblock'><p>Match regular expressions on arbitrary bytes.</p> |
| |
| <p>This module provides a nearly identical API to the one found in the |
| top-level of this crate. There are two important differences:</p> |
| |
| <ol> |
| <li>Matching is done on <code>&amp;[u8]</code> instead of <code>&amp;str</code>. Additionally, <code>Vec&lt;u8&gt;</code> |
| is used where <code>String</code> would have been used.</li> |
| <li>Regular expressions are compiled with Unicode support <em>disabled</em> by |
| default. This means that while Unicode regular expressions can only match valid |
| UTF-8, regular expressions in this module can match arbitrary bytes. Unicode |
| support can be selectively enabled via the <code>u</code> flag in regular expressions |
| provided by this sub-module.</li> |
| </ol> |
| |
| <h1 id='example-match-null-terminated-string' class='section-header'><a href='#example-match-null-terminated-string'>Example: match null-terminated strings</a></h1> |
| <p>This shows how to find all null-terminated strings in a slice of bytes:</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"(?P&lt;cstr&gt;[^\x00]+)\x00"</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">b"foo\x00bar\x00baz\x00"</span>; |
| |
| <span class="comment">// Extract all of the strings without the null terminator from each match.</span> |
| <span class="comment">// The unwrap is OK here since a match requires the `cstr` capture to match.</span> |
| <span class="kw">let</span> <span class="ident">cstrs</span>: <span class="ident">Vec</span><span class="op">&lt;</span><span class="kw-2">&amp;</span>[<span class="ident">u8</span>]<span class="op">&gt;</span> <span class="op">=</span> |
| <span class="ident">re</span>.<span class="ident">captures_iter</span>(<span class="ident">text</span>) |
| .<span class="ident">map</span>(<span class="op">|</span><span class="ident">c</span><span class="op">|</span> <span class="ident">c</span>.<span class="ident">name</span>(<span class="string">"cstr"</span>).<span class="ident">unwrap</span>()) |
| .<span class="ident">collect</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="macro">vec</span><span class="macro">!</span>[<span class="kw-2">&</span><span class="string">b"foo"</span>[..], <span class="kw-2">&</span><span class="string">b"bar"</span>[..], <span class="kw-2">&</span><span class="string">b"baz"</span>[..]], <span class="ident">cstrs</span>);</pre> |
| |
| <h1 id='example-selectively-enable-unicode-support' class='section-header'><a href='#example-selectively-enable-unicode-support'>Example: selectively enable Unicode support</a></h1> |
| <p>This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded |
| string (e.g., to extract a title from a Matroska file):</p> |
| |
| <pre class="rust rust-example-rendered"> |
| <span class="kw">let</span> <span class="ident">re</span> <span class="op">=</span> <span class="ident">Regex</span>::<span class="ident">new</span>(<span class="string">r"\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))"</span>).<span class="ident">unwrap</span>(); |
| <span class="kw">let</span> <span class="ident">text</span> <span class="op">=</span> <span class="string">b"\x12\xd0\x3b\x5f\x7b\xa9\x85\xe2\x98\x83\x80\x98\x54\x76\x68\x65"</span>; |
| <span class="kw">let</span> <span class="ident">caps</span> <span class="op">=</span> <span class="ident">re</span>.<span class="ident">captures</span>(<span class="ident">text</span>).<span class="ident">unwrap</span>(); |
| |
| <span class="comment">// Notice that despite the `.*` at the end, it will only match valid UTF-8</span> |
| <span class="comment">// because Unicode mode was enabled with the `u` flag. Without the `u` flag,</span> |
| <span class="comment">// the `.*` would match the rest of the bytes.</span> |
| <span class="macro">assert_eq</span><span class="macro">!</span>((<span class="number">7</span>, <span class="number">10</span>), <span class="ident">caps</span>.<span class="ident">pos</span>(<span class="number">1</span>).<span class="ident">unwrap</span>()); |
| |
| <span class="comment">// If there was a match, Unicode mode guarantees that `title` is valid UTF-8.</span> |
| <span class="kw">let</span> <span class="ident">title</span> <span class="op">=</span> <span class="ident">str</span>::<span class="ident">from_utf8</span>(<span class="ident">caps</span>.<span class="ident">at</span>(<span class="number">1</span>).<span class="ident">unwrap</span>()).<span class="ident">unwrap</span>(); |
| <span class="macro">assert_eq</span><span class="macro">!</span>(<span class="string">"☃"</span>, <span class="ident">title</span>);</pre> |
| |
| <p>In general, if the Unicode flag is enabled in a capture group and that capture |
| is part of the overall match, then the capture is <em>guaranteed</em> to be valid |
| UTF-8.</p> |
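<p>The guarantee can be checked by hand with the standard library alone. The
following is a minimal sketch, not part of this crate: <code>capture_as_str</code> is a
hypothetical helper, and the offsets <code>(7, 10)</code> are the capture positions
asserted in the example above.</p>

```rust
use std::str;

/// Hypothetical helper (not part of the `regex` crate): returns the bytes in
/// `start..end` as text if the range is in bounds and holds valid UTF-8.
fn capture_as_str(haystack: &[u8], start: usize, end: usize) -> Option<&str> {
    str::from_utf8(haystack.get(start..end)?).ok()
}

fn main() {
    // Same haystack as the Matroska example above.
    let text = b"\x12\xd0\x3b\x5f\x7b\xa9\x85\xe2\x98\x83\x80\x98\x54\x76\x68\x65";

    // The haystack as a whole is not valid UTF-8, so it could not be a `&str`.
    assert!(str::from_utf8(text).is_err());

    // The capture at (7, 10) is exactly the UTF-8 encoding of U+2603.
    assert_eq!(capture_as_str(text, 7, 10), Some("☃"));

    // One byte further includes a stray continuation byte (0x80), which is
    // why a Unicode-mode `.*` has to stop at offset 10.
    assert_eq!(capture_as_str(text, 7, 11), None);
}
```

<p>Returning an <code>Option</code> keeps the sketch free of panics when a range is out of
bounds or is not valid UTF-8.</p>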
| |
| <h1 id='syntax' class='section-header'><a href='#syntax'>Syntax</a></h1> |
| <p>The supported syntax is nearly the same as the syntax for Unicode |
| regular expressions, with a few changes that make sense for matching arbitrary |
| bytes:</p> |
| |
| <ol> |
| <li>The <code>u</code> flag is <em>disabled</em> by default, but can be selectively enabled. (The |
| opposite is true for the main <code>Regex</code> type.) Disabling the <code>u</code> flag is said to |
| invoke "ASCII compatible" mode.</li> |
| <li>In ASCII compatible mode, neither Unicode codepoints nor Unicode character |
| classes are allowed.</li> |
| <li>In ASCII compatible mode, Perl character classes (<code>\w</code>, <code>\d</code> and <code>\s</code>) |
| revert to their typical ASCII definition. <code>\w</code> maps to <code>[[:word:]]</code>, <code>\d</code> maps |
| to <code>[[:digit:]]</code> and <code>\s</code> maps to <code>[[:space:]]</code>.</li> |
| <li>In ASCII compatible mode, word boundaries use the ASCII compatible <code>\w</code> to |
| determine whether a byte is a word byte or not.</li> |
| <li>Hexadecimal notation can be used to specify arbitrary bytes instead of |
| Unicode codepoints. For example, in ASCII compatible mode, <code>\xFF</code> matches the |
| literal byte <code>\xFF</code>, while in Unicode mode, <code>\xFF</code> is a Unicode codepoint that |
| matches its UTF-8 encoding, <code>\xC3\xBF</code>. The same applies to octal notation.</li> |
| <li><code>.</code> matches any <em>byte</em> except for <code>\n</code> instead of any codepoint. When the |
| <code>s</code> flag is enabled, <code>.</code> matches any byte.</li> |
| </ol> |
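<p>Two of the rules above can be illustrated without the regex engine. The
sketch below uses only the standard library; <code>is_ascii_word_byte</code> and
<code>utf8_bytes</code> are hypothetical helpers written for this illustration, and the
word-byte test assumes the usual definition of <code>[[:word:]]</code> as
<code>[0-9A-Za-z_]</code>.</p>

```rust
/// Hypothetical helper: the ASCII definition of `\w`, i.e. `[0-9A-Za-z_]`.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical helper: the UTF-8 encoding of a single codepoint.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // In Unicode mode, `\xFF` names the codepoint U+00FF, which is encoded
    // as two bytes. In ASCII compatible mode it is the single byte 0xFF.
    assert_eq!(utf8_bytes('\u{FF}'), vec![0xC3, 0xBF]);

    // ASCII letters, digits and `_` are word bytes.
    assert!(is_ascii_word_byte(b'a'));
    assert!(is_ascii_word_byte(b'7'));
    assert!(is_ascii_word_byte(b'_'));

    // Bytes outside the ASCII range are never word bytes under this
    // definition, even when they are part of an encoded letter such as U+00FF.
    assert!(!is_ascii_word_byte(0xC3));
    assert!(!is_ascii_word_byte(0xFF));
}
```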
| |
| <h1 id='performance' class='section-header'><a href='#performance'>Performance</a></h1> |
| <p>In general, one should expect performance on <code>&amp;[u8]</code> to be roughly similar to |
| performance on <code>&amp;str</code>.</p> |
| </div><h2 id='structs' class='section-header'><a href="#structs">Structs</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.CaptureNames.html" |
| title='struct regex::bytes::CaptureNames'>CaptureNames</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over the names of all possible captures.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.Captures.html" |
| title='struct regex::bytes::Captures'>Captures</a></td> |
| <td class='docblock-short'> |
| <p>Captures represents a group of captured byte strings for a single match.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.FindCaptures.html" |
| title='struct regex::bytes::FindCaptures'>FindCaptures</a></td> |
| <td class='docblock-short'> |
| <p>An iterator that yields all non-overlapping capture groups matching a |
| particular regular expression.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.FindMatches.html" |
| title='struct regex::bytes::FindMatches'>FindMatches</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over all non-overlapping matches for a particular string.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.NoExpand.html" |
| title='struct regex::bytes::NoExpand'>NoExpand</a></td> |
| <td class='docblock-short'> |
| <p>NoExpand indicates literal byte string replacement.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.Regex.html" |
| title='struct regex::bytes::Regex'>Regex</a></td> |
| <td class='docblock-short'> |
| <p>A compiled regular expression for matching arbitrary bytes.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.RegexBuilder.html" |
| title='struct regex::bytes::RegexBuilder'>RegexBuilder</a></td> |
| <td class='docblock-short'> |
| <p>A configurable builder for a regular expression.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.RegexSet.html" |
| title='struct regex::bytes::RegexSet'>RegexSet</a></td> |
| <td class='docblock-short'> |
| <p>Match multiple (possibly overlapping) regular expressions in a single scan.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SetMatches.html" |
| title='struct regex::bytes::SetMatches'>SetMatches</a></td> |
| <td class='docblock-short'> |
| <p>A set of matches returned by a regex set.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SetMatchesIntoIter.html" |
| title='struct regex::bytes::SetMatchesIntoIter'>SetMatchesIntoIter</a></td> |
| <td class='docblock-short'> |
| <p>An owned iterator over the set of matches from a regex set.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SetMatchesIter.html" |
| title='struct regex::bytes::SetMatchesIter'>SetMatchesIter</a></td> |
| <td class='docblock-short'> |
| <p>A borrowed iterator over the set of matches from a regex set.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.Splits.html" |
| title='struct regex::bytes::Splits'>Splits</a></td> |
| <td class='docblock-short'> |
| <p>Yields all substrings delimited by a regular expression match.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SplitsN.html" |
| title='struct regex::bytes::SplitsN'>SplitsN</a></td> |
| <td class='docblock-short'> |
| <p>Yields at most <code>N</code> substrings delimited by a regular expression match.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SubCaptures.html" |
| title='struct regex::bytes::SubCaptures'>SubCaptures</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over capture groups for a particular match of a regular |
| expression.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SubCapturesNamed.html" |
| title='struct regex::bytes::SubCapturesNamed'>SubCapturesNamed</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over named capture groups as a tuple with the group name and |
| the value.</p> |
| </td> |
| </tr> |
| <tr class=' module-item'> |
| <td><a class="struct" href="struct.SubCapturesPos.html" |
| title='struct regex::bytes::SubCapturesPos'>SubCapturesPos</a></td> |
| <td class='docblock-short'> |
| <p>An iterator over capture group positions for a particular match of a |
| regular expression.</p> |
| </td> |
| </tr></table><h2 id='traits' class='section-header'><a href="#traits">Traits</a></h2> |
| <table> |
| <tr class=' module-item'> |
| <td><a class="trait" href="trait.Replacer.html" |
| title='trait regex::bytes::Replacer'>Replacer</a></td> |
| <td class='docblock-short'> |
| <p>Replacer describes types that can be used to replace matches in a byte |
| string.</p> |
| </td> |
| </tr></table></section> |
| <section id='search' class="content hidden"></section> |
| |
| <section class="footer"></section> |
| |
| <aside id="help" class="hidden"> |
| <div> |
| <h1 class="hidden">Help</h1> |
| |
| <div class="shortcuts"> |
| <h2>Keyboard Shortcuts</h2> |
| |
| <dl> |
| <dt>?</dt> |
| <dd>Show this help dialog</dd> |
| <dt>S</dt> |
| <dd>Focus the search field</dd> |
| <dt>⇤</dt> |
| <dd>Move up in search results</dd> |
| <dt>⇥</dt> |
| <dd>Move down in search results</dd> |
| <dt>⏎</dt> |
| <dd>Go to active search result</dd> |
| <dt>+</dt> |
| <dd>Collapse/expand all sections</dd> |
| </dl> |
| </div> |
| |
| <div class="infos"> |
| <h2>Search Tricks</h2> |
| |
| <p> |
| Prefix searches with a type followed by a colon (e.g. |
| <code>fn:</code>) to restrict the search to a given type. |
| </p> |
| |
| <p> |
| Accepted types are: <code>fn</code>, <code>mod</code>, |
| <code>struct</code>, <code>enum</code>, |
| <code>trait</code>, <code>type</code>, <code>macro</code>, |
| and <code>const</code>. |
| </p> |
| |
| <p> |
| Search functions by type signature (e.g. |
| <code>vec -> usize</code> or <code>* -> vec</code>) |
| </p> |
| </div> |
| </div> |
| </aside> |
| |
| |
| |
| <script> |
| window.rootPath = "../../"; |
| window.currentCrate = "regex"; |
| </script> |
| <script src="../../main.js"></script> |
| <script defer src="../../search-index.js"></script> |
| </body> |
| </html> |