| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="generator" content="rustdoc"> |
| <meta name="description" content="Source to the Rust file `/home/fmp/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-0.1.80/src/dfa.rs`."> |
| <meta name="keywords" content="rust, rustlang, rust-lang"> |
| |
| <title>dfa.rs.html -- source</title> |
| |
| <link rel="stylesheet" type="text/css" href="../../normalize.css"> |
| <link rel="stylesheet" type="text/css" href="../../rustdoc.css"> |
| <link rel="stylesheet" type="text/css" href="../../main.css"> |
| |
| |
| <link rel="shortcut icon" href="https://www.rust-lang.org/favicon.ico"> |
| |
| </head> |
| <body class="rustdoc source"> |
| <!--[if lte IE 8]> |
| <div class="warning"> |
| This old browser is unsupported and will most likely display funky |
| things. |
| </div> |
| <![endif]--> |
| |
| |
| |
| <nav class="sidebar"> |
| <a href='../../regex/index.html'><img src='https://www.rust-lang.org/logos/rust-logo-128x128-blk-v2.png' alt='logo' width='100'></a> |
| |
| </nav> |
| |
| <nav class="sub"> |
| <form class="search-form js-only"> |
| <div class="search-container"> |
| <input class="search-input" name="search" |
| autocomplete="off" |
| placeholder="Click or press ‘S’ to search, ‘?’ for more options…" |
| type="search"> |
| </div> |
| </form> |
| </nav> |
| |
| <section id='main' class="content"><pre class="line-numbers"><span id="1"> 1</span> |
| <span id="2"> 2</span> |
| <span id="3"> 3</span> |
| <span id="4"> 4</span> |
| <span id="5"> 5</span> |
| <span id="6"> 6</span> |
| <span id="7"> 7</span> |
| <span id="8"> 8</span> |
| <span id="9"> 9</span> |
| <span id="10"> 10</span> |
| <span id="11"> 11</span> |
| <span id="12"> 12</span> |
| <span id="13"> 13</span> |
| <span id="14"> 14</span> |
| <span id="15"> 15</span> |
| <span id="16"> 16</span> |
| <span id="17"> 17</span> |
| <span id="18"> 18</span> |
| <span id="19"> 19</span> |
| <span id="20"> 20</span> |
| <span id="21"> 21</span> |
| <span id="22"> 22</span> |
| <span id="23"> 23</span> |
| <span id="24"> 24</span> |
| <span id="25"> 25</span> |
| <span id="26"> 26</span> |
| <span id="27"> 27</span> |
| <span id="28"> 28</span> |
| <span id="29"> 29</span> |
| <span id="30"> 30</span> |
| <span id="31"> 31</span> |
| <span id="32"> 32</span> |
| <span id="33"> 33</span> |
| <span id="34"> 34</span> |
| <span id="35"> 35</span> |
| <span id="36"> 36</span> |
| <span id="37"> 37</span> |
| <span id="38"> 38</span> |
| <span id="39"> 39</span> |
| <span id="40"> 40</span> |
| <span id="41"> 41</span> |
| <span id="42"> 42</span> |
| <span id="43"> 43</span> |
| <span id="44"> 44</span> |
| <span id="45"> 45</span> |
| <span id="46"> 46</span> |
| <span id="47"> 47</span> |
| <span id="48"> 48</span> |
| <span id="49"> 49</span> |
| <span id="50"> 50</span> |
| <span id="51"> 51</span> |
| <span id="52"> 52</span> |
| <span id="53"> 53</span> |
| <span id="54"> 54</span> |
| <span id="55"> 55</span> |
| <span id="56"> 56</span> |
| <span id="57"> 57</span> |
| <span id="58"> 58</span> |
| <span id="59"> 59</span> |
| <span id="60"> 60</span> |
| <span id="61"> 61</span> |
| <span id="62"> 62</span> |
| <span id="63"> 63</span> |
| <span id="64"> 64</span> |
| <span id="65"> 65</span> |
| <span id="66"> 66</span> |
| <span id="67"> 67</span> |
| <span id="68"> 68</span> |
| <span id="69"> 69</span> |
| <span id="70"> 70</span> |
| <span id="71"> 71</span> |
| <span id="72"> 72</span> |
| <span id="73"> 73</span> |
| <span id="74"> 74</span> |
| <span id="75"> 75</span> |
| <span id="76"> 76</span> |
| <span id="77"> 77</span> |
| <span id="78"> 78</span> |
| <span id="79"> 79</span> |
| <span id="80"> 80</span> |
| <span id="81"> 81</span> |
| <span id="82"> 82</span> |
| <span id="83"> 83</span> |
| <span id="84"> 84</span> |
| <span id="85"> 85</span> |
| <span id="86"> 86</span> |
| <span id="87"> 87</span> |
| <span id="88"> 88</span> |
| <span id="89"> 89</span> |
| <span id="90"> 90</span> |
| <span id="91"> 91</span> |
| <span id="92"> 92</span> |
| <span id="93"> 93</span> |
| <span id="94"> 94</span> |
| <span id="95"> 95</span> |
| <span id="96"> 96</span> |
| <span id="97"> 97</span> |
| <span id="98"> 98</span> |
| <span id="99"> 99</span> |
| <span id="100"> 100</span> |
| <span id="101"> 101</span> |
| <span id="102"> 102</span> |
| <span id="103"> 103</span> |
| <span id="104"> 104</span> |
| <span id="105"> 105</span> |
| <span id="106"> 106</span> |
| <span id="107"> 107</span> |
| <span id="108"> 108</span> |
| <span id="109"> 109</span> |
| <span id="110"> 110</span> |
| <span id="111"> 111</span> |
| <span id="112"> 112</span> |
| <span id="113"> 113</span> |
| <span id="114"> 114</span> |
| <span id="115"> 115</span> |
| <span id="116"> 116</span> |
| <span id="117"> 117</span> |
| <span id="118"> 118</span> |
| <span id="119"> 119</span> |
| <span id="120"> 120</span> |
| <span id="121"> 121</span> |
| <span id="122"> 122</span> |
| <span id="123"> 123</span> |
| <span id="124"> 124</span> |
| <span id="125"> 125</span> |
| <span id="126"> 126</span> |
| <span id="127"> 127</span> |
| <span id="128"> 128</span> |
| <span id="129"> 129</span> |
| <span id="130"> 130</span> |
| <span id="131"> 131</span> |
| <span id="132"> 132</span> |
| <span id="133"> 133</span> |
| <span id="134"> 134</span> |
| <span id="135"> 135</span> |
| <span id="136"> 136</span> |
| <span id="137"> 137</span> |
| <span id="138"> 138</span> |
| <span id="139"> 139</span> |
| <span id="140"> 140</span> |
| <span id="141"> 141</span> |
| <span id="142"> 142</span> |
| <span id="143"> 143</span> |
| <span id="144"> 144</span> |
| <span id="145"> 145</span> |
| <span id="146"> 146</span> |
| <span id="147"> 147</span> |
| <span id="148"> 148</span> |
| <span id="149"> 149</span> |
| <span id="150"> 150</span> |
| <span id="151"> 151</span> |
| <span id="152"> 152</span> |
| <span id="153"> 153</span> |
| <span id="154"> 154</span> |
| <span id="155"> 155</span> |
| <span id="156"> 156</span> |
| <span id="157"> 157</span> |
| <span id="158"> 158</span> |
| <span id="159"> 159</span> |
| <span id="160"> 160</span> |
| <span id="161"> 161</span> |
| <span id="162"> 162</span> |
| <span id="163"> 163</span> |
| <span id="164"> 164</span> |
| <span id="165"> 165</span> |
| <span id="166"> 166</span> |
| <span id="167"> 167</span> |
| <span id="168"> 168</span> |
| <span id="169"> 169</span> |
| <span id="170"> 170</span> |
| <span id="171"> 171</span> |
| <span id="172"> 172</span> |
| <span id="173"> 173</span> |
| <span id="174"> 174</span> |
| <span id="175"> 175</span> |
| <span id="176"> 176</span> |
| <span id="177"> 177</span> |
| <span id="178"> 178</span> |
| <span id="179"> 179</span> |
| <span id="180"> 180</span> |
| <span id="181"> 181</span> |
| <span id="182"> 182</span> |
| <span id="183"> 183</span> |
| <span id="184"> 184</span> |
| <span id="185"> 185</span> |
| <span id="186"> 186</span> |
| <span id="187"> 187</span> |
| <span id="188"> 188</span> |
| <span id="189"> 189</span> |
| <span id="190"> 190</span> |
| <span id="191"> 191</span> |
| <span id="192"> 192</span> |
| <span id="193"> 193</span> |
| <span id="194"> 194</span> |
| <span id="195"> 195</span> |
| <span id="196"> 196</span> |
| <span id="197"> 197</span> |
| <span id="198"> 198</span> |
| <span id="199"> 199</span> |
| <span id="200"> 200</span> |
| <span id="201"> 201</span> |
| <span id="202"> 202</span> |
| <span id="203"> 203</span> |
| <span id="204"> 204</span> |
| <span id="205"> 205</span> |
| <span id="206"> 206</span> |
| <span id="207"> 207</span> |
| <span id="208"> 208</span> |
| <span id="209"> 209</span> |
| <span id="210"> 210</span> |
| <span id="211"> 211</span> |
| <span id="212"> 212</span> |
| <span id="213"> 213</span> |
| <span id="214"> 214</span> |
| <span id="215"> 215</span> |
| <span id="216"> 216</span> |
| <span id="217"> 217</span> |
| <span id="218"> 218</span> |
| <span id="219"> 219</span> |
| <span id="220"> 220</span> |
| <span id="221"> 221</span> |
| <span id="222"> 222</span> |
| <span id="223"> 223</span> |
| <span id="224"> 224</span> |
| <span id="225"> 225</span> |
| <span id="226"> 226</span> |
| <span id="227"> 227</span> |
| <span id="228"> 228</span> |
| <span id="229"> 229</span> |
| <span id="230"> 230</span> |
| <span id="231"> 231</span> |
| <span id="232"> 232</span> |
| <span id="233"> 233</span> |
| <span id="234"> 234</span> |
| <span id="235"> 235</span> |
| <span id="236"> 236</span> |
| <span id="237"> 237</span> |
| <span id="238"> 238</span> |
| <span id="239"> 239</span> |
| <span id="240"> 240</span> |
| <span id="241"> 241</span> |
| <span id="242"> 242</span> |
| <span id="243"> 243</span> |
| <span id="244"> 244</span> |
| <span id="245"> 245</span> |
| <span id="246"> 246</span> |
| <span id="247"> 247</span> |
| <span id="248"> 248</span> |
| <span id="249"> 249</span> |
| <span id="250"> 250</span> |
| <span id="251"> 251</span> |
| <span id="252"> 252</span> |
| <span id="253"> 253</span> |
| <span id="254"> 254</span> |
| <span id="255"> 255</span> |
| <span id="256"> 256</span> |
| <span id="257"> 257</span> |
| <span id="258"> 258</span> |
| <span id="259"> 259</span> |
| <span id="260"> 260</span> |
| <span id="261"> 261</span> |
| <span id="262"> 262</span> |
| <span id="263"> 263</span> |
| <span id="264"> 264</span> |
| <span id="265"> 265</span> |
| <span id="266"> 266</span> |
| <span id="267"> 267</span> |
| <span id="268"> 268</span> |
| <span id="269"> 269</span> |
| <span id="270"> 270</span> |
| <span id="271"> 271</span> |
| <span id="272"> 272</span> |
| <span id="273"> 273</span> |
| <span id="274"> 274</span> |
| <span id="275"> 275</span> |
| <span id="276"> 276</span> |
| <span id="277"> 277</span> |
| <span id="278"> 278</span> |
| <span id="279"> 279</span> |
| <span id="280"> 280</span> |
| <span id="281"> 281</span> |
| <span id="282"> 282</span> |
| <span id="283"> 283</span> |
| <span id="284"> 284</span> |
| <span id="285"> 285</span> |
| <span id="286"> 286</span> |
| <span id="287"> 287</span> |
| <span id="288"> 288</span> |
| <span id="289"> 289</span> |
| <span id="290"> 290</span> |
| <span id="291"> 291</span> |
| <span id="292"> 292</span> |
| <span id="293"> 293</span> |
| <span id="294"> 294</span> |
| <span id="295"> 295</span> |
| <span id="296"> 296</span> |
| <span id="297"> 297</span> |
| <span id="298"> 298</span> |
| <span id="299"> 299</span> |
| <span id="300"> 300</span> |
| <span id="301"> 301</span> |
| <span id="302"> 302</span> |
| <span id="303"> 303</span> |
| <span id="304"> 304</span> |
| <span id="305"> 305</span> |
| <span id="306"> 306</span> |
| <span id="307"> 307</span> |
| <span id="308"> 308</span> |
| <span id="309"> 309</span> |
| <span id="310"> 310</span> |
| <span id="311"> 311</span> |
| <span id="312"> 312</span> |
| <span id="313"> 313</span> |
| <span id="314"> 314</span> |
| <span id="315"> 315</span> |
| <span id="316"> 316</span> |
| <span id="317"> 317</span> |
| <span id="318"> 318</span> |
| <span id="319"> 319</span> |
| <span id="320"> 320</span> |
| <span id="321"> 321</span> |
| <span id="322"> 322</span> |
| <span id="323"> 323</span> |
| <span id="324"> 324</span> |
| <span id="325"> 325</span> |
| <span id="326"> 326</span> |
| <span id="327"> 327</span> |
| <span id="328"> 328</span> |
| <span id="329"> 329</span> |
| <span id="330"> 330</span> |
| <span id="331"> 331</span> |
| <span id="332"> 332</span> |
| <span id="333"> 333</span> |
| <span id="334"> 334</span> |
| <span id="335"> 335</span> |
| <span id="336"> 336</span> |
| <span id="337"> 337</span> |
| <span id="338"> 338</span> |
| <span id="339"> 339</span> |
| <span id="340"> 340</span> |
| <span id="341"> 341</span> |
| <span id="342"> 342</span> |
| <span id="343"> 343</span> |
| <span id="344"> 344</span> |
| <span id="345"> 345</span> |
| <span id="346"> 346</span> |
| <span id="347"> 347</span> |
| <span id="348"> 348</span> |
| <span id="349"> 349</span> |
| <span id="350"> 350</span> |
| <span id="351"> 351</span> |
| <span id="352"> 352</span> |
| <span id="353"> 353</span> |
| <span id="354"> 354</span> |
| <span id="355"> 355</span> |
| <span id="356"> 356</span> |
| <span id="357"> 357</span> |
| <span id="358"> 358</span> |
| <span id="359"> 359</span> |
| <span id="360"> 360</span> |
| <span id="361"> 361</span> |
| <span id="362"> 362</span> |
| <span id="363"> 363</span> |
| <span id="364"> 364</span> |
| <span id="365"> 365</span> |
| <span id="366"> 366</span> |
| <span id="367"> 367</span> |
| <span id="368"> 368</span> |
| <span id="369"> 369</span> |
| <span id="370"> 370</span> |
| <span id="371"> 371</span> |
| <span id="372"> 372</span> |
| <span id="373"> 373</span> |
| <span id="374"> 374</span> |
| <span id="375"> 375</span> |
| <span id="376"> 376</span> |
| <span id="377"> 377</span> |
| <span id="378"> 378</span> |
| <span id="379"> 379</span> |
| <span id="380"> 380</span> |
| <span id="381"> 381</span> |
| <span id="382"> 382</span> |
| <span id="383"> 383</span> |
| <span id="384"> 384</span> |
| <span id="385"> 385</span> |
| <span id="386"> 386</span> |
| <span id="387"> 387</span> |
| <span id="388"> 388</span> |
| <span id="389"> 389</span> |
| <span id="390"> 390</span> |
| <span id="391"> 391</span> |
| <span id="392"> 392</span> |
| <span id="393"> 393</span> |
| <span id="394"> 394</span> |
| <span id="395"> 395</span> |
| <span id="396"> 396</span> |
| <span id="397"> 397</span> |
| <span id="398"> 398</span> |
| <span id="399"> 399</span> |
| <span id="400"> 400</span> |
| <span id="401"> 401</span> |
| <span id="402"> 402</span> |
| <span id="403"> 403</span> |
| <span id="404"> 404</span> |
| <span id="405"> 405</span> |
| <span id="406"> 406</span> |
| <span id="407"> 407</span> |
| <span id="408"> 408</span> |
| <span id="409"> 409</span> |
| <span id="410"> 410</span> |
| <span id="411"> 411</span> |
| <span id="412"> 412</span> |
| <span id="413"> 413</span> |
| <span id="414"> 414</span> |
| <span id="415"> 415</span> |
| <span id="416"> 416</span> |
| <span id="417"> 417</span> |
| <span id="418"> 418</span> |
| <span id="419"> 419</span> |
| <span id="420"> 420</span> |
| <span id="421"> 421</span> |
| <span id="422"> 422</span> |
| <span id="423"> 423</span> |
| <span id="424"> 424</span> |
| <span id="425"> 425</span> |
| <span id="426"> 426</span> |
| <span id="427"> 427</span> |
| <span id="428"> 428</span> |
| <span id="429"> 429</span> |
| <span id="430"> 430</span> |
| <span id="431"> 431</span> |
| <span id="432"> 432</span> |
| <span id="433"> 433</span> |
| <span id="434"> 434</span> |
| <span id="435"> 435</span> |
| <span id="436"> 436</span> |
| <span id="437"> 437</span> |
| <span id="438"> 438</span> |
| <span id="439"> 439</span> |
| <span id="440"> 440</span> |
| <span id="441"> 441</span> |
| <span id="442"> 442</span> |
| <span id="443"> 443</span> |
| <span id="444"> 444</span> |
| <span id="445"> 445</span> |
| <span id="446"> 446</span> |
| <span id="447"> 447</span> |
| <span id="448"> 448</span> |
| <span id="449"> 449</span> |
| <span id="450"> 450</span> |
| <span id="451"> 451</span> |
| <span id="452"> 452</span> |
| <span id="453"> 453</span> |
| <span id="454"> 454</span> |
| <span id="455"> 455</span> |
| <span id="456"> 456</span> |
| <span id="457"> 457</span> |
| <span id="458"> 458</span> |
| <span id="459"> 459</span> |
| <span id="460"> 460</span> |
| <span id="461"> 461</span> |
| <span id="462"> 462</span> |
| <span id="463"> 463</span> |
| <span id="464"> 464</span> |
| <span id="465"> 465</span> |
| <span id="466"> 466</span> |
| <span id="467"> 467</span> |
| <span id="468"> 468</span> |
| <span id="469"> 469</span> |
| <span id="470"> 470</span> |
| <span id="471"> 471</span> |
| <span id="472"> 472</span> |
| <span id="473"> 473</span> |
| <span id="474"> 474</span> |
| <span id="475"> 475</span> |
| <span id="476"> 476</span> |
| <span id="477"> 477</span> |
| <span id="478"> 478</span> |
| <span id="479"> 479</span> |
| <span id="480"> 480</span> |
| <span id="481"> 481</span> |
| <span id="482"> 482</span> |
| <span id="483"> 483</span> |
| <span id="484"> 484</span> |
| <span id="485"> 485</span> |
| <span id="486"> 486</span> |
| <span id="487"> 487</span> |
| <span id="488"> 488</span> |
| <span id="489"> 489</span> |
| <span id="490"> 490</span> |
| <span id="491"> 491</span> |
| <span id="492"> 492</span> |
| <span id="493"> 493</span> |
| <span id="494"> 494</span> |
| <span id="495"> 495</span> |
| <span id="496"> 496</span> |
| <span id="497"> 497</span> |
| <span id="498"> 498</span> |
| <span id="499"> 499</span> |
| <span id="500"> 500</span> |
| <span id="501"> 501</span> |
| <span id="502"> 502</span> |
| <span id="503"> 503</span> |
| <span id="504"> 504</span> |
| <span id="505"> 505</span> |
| <span id="506"> 506</span> |
| <span id="507"> 507</span> |
| <span id="508"> 508</span> |
| <span id="509"> 509</span> |
| <span id="510"> 510</span> |
| <span id="511"> 511</span> |
| <span id="512"> 512</span> |
| <span id="513"> 513</span> |
| <span id="514"> 514</span> |
| <span id="515"> 515</span> |
| <span id="516"> 516</span> |
| <span id="517"> 517</span> |
| <span id="518"> 518</span> |
| <span id="519"> 519</span> |
| <span id="520"> 520</span> |
| <span id="521"> 521</span> |
| <span id="522"> 522</span> |
| <span id="523"> 523</span> |
| <span id="524"> 524</span> |
| <span id="525"> 525</span> |
| <span id="526"> 526</span> |
| <span id="527"> 527</span> |
| <span id="528"> 528</span> |
| <span id="529"> 529</span> |
| <span id="530"> 530</span> |
| <span id="531"> 531</span> |
| <span id="532"> 532</span> |
| <span id="533"> 533</span> |
| <span id="534"> 534</span> |
| <span id="535"> 535</span> |
| <span id="536"> 536</span> |
| <span id="537"> 537</span> |
| <span id="538"> 538</span> |
| <span id="539"> 539</span> |
| <span id="540"> 540</span> |
| <span id="541"> 541</span> |
| <span id="542"> 542</span> |
| <span id="543"> 543</span> |
| <span id="544"> 544</span> |
| <span id="545"> 545</span> |
| <span id="546"> 546</span> |
| <span id="547"> 547</span> |
| <span id="548"> 548</span> |
| <span id="549"> 549</span> |
| <span id="550"> 550</span> |
| <span id="551"> 551</span> |
| <span id="552"> 552</span> |
| <span id="553"> 553</span> |
| <span id="554"> 554</span> |
| <span id="555"> 555</span> |
| <span id="556"> 556</span> |
| <span id="557"> 557</span> |
| <span id="558"> 558</span> |
| <span id="559"> 559</span> |
| <span id="560"> 560</span> |
| <span id="561"> 561</span> |
| <span id="562"> 562</span> |
| <span id="563"> 563</span> |
| <span id="564"> 564</span> |
| <span id="565"> 565</span> |
| <span id="566"> 566</span> |
| <span id="567"> 567</span> |
| <span id="568"> 568</span> |
| <span id="569"> 569</span> |
| <span id="570"> 570</span> |
| <span id="571"> 571</span> |
| <span id="572"> 572</span> |
| <span id="573"> 573</span> |
| <span id="574"> 574</span> |
| <span id="575"> 575</span> |
| <span id="576"> 576</span> |
| <span id="577"> 577</span> |
| <span id="578"> 578</span> |
| <span id="579"> 579</span> |
| <span id="580"> 580</span> |
| <span id="581"> 581</span> |
| <span id="582"> 582</span> |
| <span id="583"> 583</span> |
| <span id="584"> 584</span> |
| <span id="585"> 585</span> |
| <span id="586"> 586</span> |
| <span id="587"> 587</span> |
| <span id="588"> 588</span> |
| <span id="589"> 589</span> |
| <span id="590"> 590</span> |
| <span id="591"> 591</span> |
| <span id="592"> 592</span> |
| <span id="593"> 593</span> |
| <span id="594"> 594</span> |
| <span id="595"> 595</span> |
| <span id="596"> 596</span> |
| <span id="597"> 597</span> |
| <span id="598"> 598</span> |
| <span id="599"> 599</span> |
| <span id="600"> 600</span> |
| <span id="601"> 601</span> |
| <span id="602"> 602</span> |
| <span id="603"> 603</span> |
| <span id="604"> 604</span> |
| <span id="605"> 605</span> |
| <span id="606"> 606</span> |
| <span id="607"> 607</span> |
| <span id="608"> 608</span> |
| <span id="609"> 609</span> |
| <span id="610"> 610</span> |
| <span id="611"> 611</span> |
| <span id="612"> 612</span> |
| <span id="613"> 613</span> |
| <span id="614"> 614</span> |
| <span id="615"> 615</span> |
| <span id="616"> 616</span> |
| <span id="617"> 617</span> |
| <span id="618"> 618</span> |
| <span id="619"> 619</span> |
| <span id="620"> 620</span> |
| <span id="621"> 621</span> |
| <span id="622"> 622</span> |
| <span id="623"> 623</span> |
| <span id="624"> 624</span> |
| <span id="625"> 625</span> |
| <span id="626"> 626</span> |
| <span id="627"> 627</span> |
| <span id="628"> 628</span> |
| <span id="629"> 629</span> |
| <span id="630"> 630</span> |
| <span id="631"> 631</span> |
| <span id="632"> 632</span> |
| <span id="633"> 633</span> |
| <span id="634"> 634</span> |
| <span id="635"> 635</span> |
| <span id="636"> 636</span> |
| <span id="637"> 637</span> |
| <span id="638"> 638</span> |
| <span id="639"> 639</span> |
| <span id="640"> 640</span> |
| <span id="641"> 641</span> |
| <span id="642"> 642</span> |
| <span id="643"> 643</span> |
| <span id="644"> 644</span> |
| <span id="645"> 645</span> |
| <span id="646"> 646</span> |
| <span id="647"> 647</span> |
| <span id="648"> 648</span> |
| <span id="649"> 649</span> |
| <span id="650"> 650</span> |
| <span id="651"> 651</span> |
| <span id="652"> 652</span> |
| <span id="653"> 653</span> |
| <span id="654"> 654</span> |
| <span id="655"> 655</span> |
| <span id="656"> 656</span> |
| <span id="657"> 657</span> |
| <span id="658"> 658</span> |
| <span id="659"> 659</span> |
| <span id="660"> 660</span> |
| <span id="661"> 661</span> |
| <span id="662"> 662</span> |
| <span id="663"> 663</span> |
| <span id="664"> 664</span> |
| <span id="665"> 665</span> |
| <span id="666"> 666</span> |
| <span id="667"> 667</span> |
| <span id="668"> 668</span> |
| <span id="669"> 669</span> |
| <span id="670"> 670</span> |
| <span id="671"> 671</span> |
| <span id="672"> 672</span> |
| <span id="673"> 673</span> |
| <span id="674"> 674</span> |
| <span id="675"> 675</span> |
| <span id="676"> 676</span> |
| <span id="677"> 677</span> |
| <span id="678"> 678</span> |
| <span id="679"> 679</span> |
| <span id="680"> 680</span> |
| <span id="681"> 681</span> |
| <span id="682"> 682</span> |
| <span id="683"> 683</span> |
| <span id="684"> 684</span> |
| <span id="685"> 685</span> |
| <span id="686"> 686</span> |
| <span id="687"> 687</span> |
| <span id="688"> 688</span> |
| <span id="689"> 689</span> |
| <span id="690"> 690</span> |
| <span id="691"> 691</span> |
| <span id="692"> 692</span> |
| <span id="693"> 693</span> |
| <span id="694"> 694</span> |
| <span id="695"> 695</span> |
| <span id="696"> 696</span> |
| <span id="697"> 697</span> |
| <span id="698"> 698</span> |
| <span id="699"> 699</span> |
| <span id="700"> 700</span> |
| <span id="701"> 701</span> |
| <span id="702"> 702</span> |
| <span id="703"> 703</span> |
| <span id="704"> 704</span> |
| <span id="705"> 705</span> |
| <span id="706"> 706</span> |
| <span id="707"> 707</span> |
| <span id="708"> 708</span> |
| <span id="709"> 709</span> |
| <span id="710"> 710</span> |
| <span id="711"> 711</span> |
| <span id="712"> 712</span> |
| <span id="713"> 713</span> |
| <span id="714"> 714</span> |
| <span id="715"> 715</span> |
| <span id="716"> 716</span> |
| <span id="717"> 717</span> |
| <span id="718"> 718</span> |
| <span id="719"> 719</span> |
| <span id="720"> 720</span> |
| <span id="721"> 721</span> |
| <span id="722"> 722</span> |
| <span id="723"> 723</span> |
| <span id="724"> 724</span> |
| <span id="725"> 725</span> |
| <span id="726"> 726</span> |
| <span id="727"> 727</span> |
| <span id="728"> 728</span> |
| <span id="729"> 729</span> |
| <span id="730"> 730</span> |
| <span id="731"> 731</span> |
| <span id="732"> 732</span> |
| <span id="733"> 733</span> |
| <span id="734"> 734</span> |
| <span id="735"> 735</span> |
| <span id="736"> 736</span> |
| <span id="737"> 737</span> |
| <span id="738"> 738</span> |
| <span id="739"> 739</span> |
| <span id="740"> 740</span> |
| <span id="741"> 741</span> |
| <span id="742"> 742</span> |
| <span id="743"> 743</span> |
| <span id="744"> 744</span> |
| <span id="745"> 745</span> |
| <span id="746"> 746</span> |
| <span id="747"> 747</span> |
| <span id="748"> 748</span> |
| <span id="749"> 749</span> |
| <span id="750"> 750</span> |
| <span id="751"> 751</span> |
| <span id="752"> 752</span> |
| <span id="753"> 753</span> |
| <span id="754"> 754</span> |
| <span id="755"> 755</span> |
| <span id="756"> 756</span> |
| <span id="757"> 757</span> |
| <span id="758"> 758</span> |
| <span id="759"> 759</span> |
| <span id="760"> 760</span> |
| <span id="761"> 761</span> |
| <span id="762"> 762</span> |
| <span id="763"> 763</span> |
| <span id="764"> 764</span> |
| <span id="765"> 765</span> |
| <span id="766"> 766</span> |
| <span id="767"> 767</span> |
| <span id="768"> 768</span> |
| <span id="769"> 769</span> |
| <span id="770"> 770</span> |
| <span id="771"> 771</span> |
| <span id="772"> 772</span> |
| <span id="773"> 773</span> |
| <span id="774"> 774</span> |
| <span id="775"> 775</span> |
| <span id="776"> 776</span> |
| <span id="777"> 777</span> |
| <span id="778"> 778</span> |
| <span id="779"> 779</span> |
| <span id="780"> 780</span> |
| <span id="781"> 781</span> |
| <span id="782"> 782</span> |
| <span id="783"> 783</span> |
| <span id="784"> 784</span> |
| <span id="785"> 785</span> |
| <span id="786"> 786</span> |
| <span id="787"> 787</span> |
| <span id="788"> 788</span> |
| <span id="789"> 789</span> |
| <span id="790"> 790</span> |
| <span id="791"> 791</span> |
| <span id="792"> 792</span> |
| <span id="793"> 793</span> |
| <span id="794"> 794</span> |
| <span id="795"> 795</span> |
| <span id="796"> 796</span> |
| <span id="797"> 797</span> |
| <span id="798"> 798</span> |
| <span id="799"> 799</span> |
| <span id="800"> 800</span> |
| <span id="801"> 801</span> |
| <span id="802"> 802</span> |
| <span id="803"> 803</span> |
| <span id="804"> 804</span> |
| <span id="805"> 805</span> |
| <span id="806"> 806</span> |
| <span id="807"> 807</span> |
| <span id="808"> 808</span> |
| <span id="809"> 809</span> |
| <span id="810"> 810</span> |
| <span id="811"> 811</span> |
| <span id="812"> 812</span> |
| <span id="813"> 813</span> |
| <span id="814"> 814</span> |
| <span id="815"> 815</span> |
| <span id="816"> 816</span> |
| <span id="817"> 817</span> |
| <span id="818"> 818</span> |
| <span id="819"> 819</span> |
| <span id="820"> 820</span> |
| <span id="821"> 821</span> |
| <span id="822"> 822</span> |
| <span id="823"> 823</span> |
| <span id="824"> 824</span> |
| <span id="825"> 825</span> |
| <span id="826"> 826</span> |
| <span id="827"> 827</span> |
| <span id="828"> 828</span> |
| <span id="829"> 829</span> |
| <span id="830"> 830</span> |
| <span id="831"> 831</span> |
| <span id="832"> 832</span> |
| <span id="833"> 833</span> |
| <span id="834"> 834</span> |
| <span id="835"> 835</span> |
| <span id="836"> 836</span> |
| <span id="837"> 837</span> |
| <span id="838"> 838</span> |
| <span id="839"> 839</span> |
| <span id="840"> 840</span> |
| <span id="841"> 841</span> |
| <span id="842"> 842</span> |
| <span id="843"> 843</span> |
| <span id="844"> 844</span> |
| <span id="845"> 845</span> |
| <span id="846"> 846</span> |
| <span id="847"> 847</span> |
| <span id="848"> 848</span> |
| <span id="849"> 849</span> |
| <span id="850"> 850</span> |
| <span id="851"> 851</span> |
| <span id="852"> 852</span> |
| <span id="853"> 853</span> |
| <span id="854"> 854</span> |
| <span id="855"> 855</span> |
| <span id="856"> 856</span> |
| <span id="857"> 857</span> |
| <span id="858"> 858</span> |
| <span id="859"> 859</span> |
| <span id="860"> 860</span> |
| <span id="861"> 861</span> |
| <span id="862"> 862</span> |
| <span id="863"> 863</span> |
| <span id="864"> 864</span> |
| <span id="865"> 865</span> |
| <span id="866"> 866</span> |
| <span id="867"> 867</span> |
| <span id="868"> 868</span> |
| <span id="869"> 869</span> |
| <span id="870"> 870</span> |
| <span id="871"> 871</span> |
| <span id="872"> 872</span> |
| <span id="873"> 873</span> |
| <span id="874"> 874</span> |
| <span id="875"> 875</span> |
| <span id="876"> 876</span> |
| <span id="877"> 877</span> |
| <span id="878"> 878</span> |
| <span id="879"> 879</span> |
| <span id="880"> 880</span> |
| <span id="881"> 881</span> |
| <span id="882"> 882</span> |
| <span id="883"> 883</span> |
| <span id="884"> 884</span> |
| <span id="885"> 885</span> |
| <span id="886"> 886</span> |
| <span id="887"> 887</span> |
| <span id="888"> 888</span> |
| <span id="889"> 889</span> |
| <span id="890"> 890</span> |
| <span id="891"> 891</span> |
| <span id="892"> 892</span> |
| <span id="893"> 893</span> |
| <span id="894"> 894</span> |
| <span id="895"> 895</span> |
| <span id="896"> 896</span> |
| <span id="897"> 897</span> |
| <span id="898"> 898</span> |
| <span id="899"> 899</span> |
| <span id="900"> 900</span> |
| <span id="901"> 901</span> |
| <span id="902"> 902</span> |
| <span id="903"> 903</span> |
| <span id="904"> 904</span> |
| <span id="905"> 905</span> |
| <span id="906"> 906</span> |
| <span id="907"> 907</span> |
| <span id="908"> 908</span> |
| <span id="909"> 909</span> |
| <span id="910"> 910</span> |
| <span id="911"> 911</span> |
| <span id="912"> 912</span> |
| <span id="913"> 913</span> |
| <span id="914"> 914</span> |
| <span id="915"> 915</span> |
| <span id="916"> 916</span> |
| <span id="917"> 917</span> |
| <span id="918"> 918</span> |
| <span id="919"> 919</span> |
| <span id="920"> 920</span> |
| <span id="921"> 921</span> |
| <span id="922"> 922</span> |
| <span id="923"> 923</span> |
| <span id="924"> 924</span> |
| <span id="925"> 925</span> |
| <span id="926"> 926</span> |
| <span id="927"> 927</span> |
| <span id="928"> 928</span> |
| <span id="929"> 929</span> |
| <span id="930"> 930</span> |
| <span id="931"> 931</span> |
| <span id="932"> 932</span> |
| <span id="933"> 933</span> |
| <span id="934"> 934</span> |
| <span id="935"> 935</span> |
| <span id="936"> 936</span> |
| <span id="937"> 937</span> |
| <span id="938"> 938</span> |
| <span id="939"> 939</span> |
| <span id="940"> 940</span> |
| <span id="941"> 941</span> |
| <span id="942"> 942</span> |
| <span id="943"> 943</span> |
| <span id="944"> 944</span> |
| <span id="945"> 945</span> |
| <span id="946"> 946</span> |
| <span id="947"> 947</span> |
| <span id="948"> 948</span> |
| <span id="949"> 949</span> |
| <span id="950"> 950</span> |
| <span id="951"> 951</span> |
| <span id="952"> 952</span> |
| <span id="953"> 953</span> |
| <span id="954"> 954</span> |
| <span id="955"> 955</span> |
| <span id="956"> 956</span> |
| <span id="957"> 957</span> |
| <span id="958"> 958</span> |
| <span id="959"> 959</span> |
| <span id="960"> 960</span> |
| <span id="961"> 961</span> |
| <span id="962"> 962</span> |
| <span id="963"> 963</span> |
| <span id="964"> 964</span> |
| <span id="965"> 965</span> |
| <span id="966"> 966</span> |
| <span id="967"> 967</span> |
| <span id="968"> 968</span> |
| <span id="969"> 969</span> |
| <span id="970"> 970</span> |
| <span id="971"> 971</span> |
| <span id="972"> 972</span> |
| <span id="973"> 973</span> |
| <span id="974"> 974</span> |
| <span id="975"> 975</span> |
| <span id="976"> 976</span> |
| <span id="977"> 977</span> |
| <span id="978"> 978</span> |
| <span id="979"> 979</span> |
| <span id="980"> 980</span> |
| <span id="981"> 981</span> |
| <span id="982"> 982</span> |
| <span id="983"> 983</span> |
| <span id="984"> 984</span> |
| <span id="985"> 985</span> |
| <span id="986"> 986</span> |
| <span id="987"> 987</span> |
| <span id="988"> 988</span> |
| <span id="989"> 989</span> |
| <span id="990"> 990</span> |
| <span id="991"> 991</span> |
| <span id="992"> 992</span> |
| <span id="993"> 993</span> |
| <span id="994"> 994</span> |
| <span id="995"> 995</span> |
| <span id="996"> 996</span> |
| <span id="997"> 997</span> |
| <span id="998"> 998</span> |
| <span id="999"> 999</span> |
| <span id="1000">1000</span> |
| <span id="1001">1001</span> |
| <span id="1002">1002</span> |
| <span id="1003">1003</span> |
| <span id="1004">1004</span> |
| <span id="1005">1005</span> |
| <span id="1006">1006</span> |
| <span id="1007">1007</span> |
| <span id="1008">1008</span> |
| <span id="1009">1009</span> |
| <span id="1010">1010</span> |
| <span id="1011">1011</span> |
| <span id="1012">1012</span> |
| <span id="1013">1013</span> |
| <span id="1014">1014</span> |
| <span id="1015">1015</span> |
| <span id="1016">1016</span> |
| <span id="1017">1017</span> |
| <span id="1018">1018</span> |
| <span id="1019">1019</span> |
| <span id="1020">1020</span> |
| <span id="1021">1021</span> |
| <span id="1022">1022</span> |
| <span id="1023">1023</span> |
| <span id="1024">1024</span> |
| <span id="1025">1025</span> |
| <span id="1026">1026</span> |
| <span id="1027">1027</span> |
| <span id="1028">1028</span> |
| <span id="1029">1029</span> |
| <span id="1030">1030</span> |
| <span id="1031">1031</span> |
| <span id="1032">1032</span> |
| <span id="1033">1033</span> |
| <span id="1034">1034</span> |
| <span id="1035">1035</span> |
| <span id="1036">1036</span> |
| <span id="1037">1037</span> |
| <span id="1038">1038</span> |
| <span id="1039">1039</span> |
| <span id="1040">1040</span> |
| <span id="1041">1041</span> |
| <span id="1042">1042</span> |
| <span id="1043">1043</span> |
| <span id="1044">1044</span> |
| <span id="1045">1045</span> |
| <span id="1046">1046</span> |
| <span id="1047">1047</span> |
| <span id="1048">1048</span> |
| <span id="1049">1049</span> |
| <span id="1050">1050</span> |
| <span id="1051">1051</span> |
| <span id="1052">1052</span> |
| <span id="1053">1053</span> |
| <span id="1054">1054</span> |
| <span id="1055">1055</span> |
| <span id="1056">1056</span> |
| <span id="1057">1057</span> |
| <span id="1058">1058</span> |
| <span id="1059">1059</span> |
| <span id="1060">1060</span> |
| <span id="1061">1061</span> |
| <span id="1062">1062</span> |
| <span id="1063">1063</span> |
| <span id="1064">1064</span> |
| <span id="1065">1065</span> |
| <span id="1066">1066</span> |
| <span id="1067">1067</span> |
| <span id="1068">1068</span> |
| <span id="1069">1069</span> |
| <span id="1070">1070</span> |
| <span id="1071">1071</span> |
| <span id="1072">1072</span> |
| <span id="1073">1073</span> |
| <span id="1074">1074</span> |
| <span id="1075">1075</span> |
| <span id="1076">1076</span> |
| <span id="1077">1077</span> |
| <span id="1078">1078</span> |
| <span id="1079">1079</span> |
| <span id="1080">1080</span> |
| <span id="1081">1081</span> |
| <span id="1082">1082</span> |
| <span id="1083">1083</span> |
| <span id="1084">1084</span> |
| <span id="1085">1085</span> |
| <span id="1086">1086</span> |
| <span id="1087">1087</span> |
| <span id="1088">1088</span> |
| <span id="1089">1089</span> |
| <span id="1090">1090</span> |
| <span id="1091">1091</span> |
| <span id="1092">1092</span> |
| <span id="1093">1093</span> |
| <span id="1094">1094</span> |
| <span id="1095">1095</span> |
| <span id="1096">1096</span> |
| <span id="1097">1097</span> |
| <span id="1098">1098</span> |
| <span id="1099">1099</span> |
| <span id="1100">1100</span> |
| <span id="1101">1101</span> |
| <span id="1102">1102</span> |
| <span id="1103">1103</span> |
| <span id="1104">1104</span> |
| <span id="1105">1105</span> |
| <span id="1106">1106</span> |
| <span id="1107">1107</span> |
| <span id="1108">1108</span> |
| <span id="1109">1109</span> |
| <span id="1110">1110</span> |
| <span id="1111">1111</span> |
| <span id="1112">1112</span> |
| <span id="1113">1113</span> |
| <span id="1114">1114</span> |
| <span id="1115">1115</span> |
| <span id="1116">1116</span> |
| <span id="1117">1117</span> |
| <span id="1118">1118</span> |
| <span id="1119">1119</span> |
| <span id="1120">1120</span> |
| <span id="1121">1121</span> |
| <span id="1122">1122</span> |
| <span id="1123">1123</span> |
| <span id="1124">1124</span> |
| <span id="1125">1125</span> |
| <span id="1126">1126</span> |
| <span id="1127">1127</span> |
| <span id="1128">1128</span> |
| <span id="1129">1129</span> |
| <span id="1130">1130</span> |
| <span id="1131">1131</span> |
| <span id="1132">1132</span> |
| <span id="1133">1133</span> |
| <span id="1134">1134</span> |
| <span id="1135">1135</span> |
| <span id="1136">1136</span> |
| <span id="1137">1137</span> |
| <span id="1138">1138</span> |
| <span id="1139">1139</span> |
| <span id="1140">1140</span> |
| <span id="1141">1141</span> |
| <span id="1142">1142</span> |
| <span id="1143">1143</span> |
| <span id="1144">1144</span> |
| <span id="1145">1145</span> |
| <span id="1146">1146</span> |
| <span id="1147">1147</span> |
| <span id="1148">1148</span> |
| <span id="1149">1149</span> |
| <span id="1150">1150</span> |
| <span id="1151">1151</span> |
| <span id="1152">1152</span> |
| <span id="1153">1153</span> |
| <span id="1154">1154</span> |
| <span id="1155">1155</span> |
| <span id="1156">1156</span> |
| <span id="1157">1157</span> |
| <span id="1158">1158</span> |
| <span id="1159">1159</span> |
| <span id="1160">1160</span> |
| <span id="1161">1161</span> |
| <span id="1162">1162</span> |
| <span id="1163">1163</span> |
| <span id="1164">1164</span> |
| <span id="1165">1165</span> |
| <span id="1166">1166</span> |
| <span id="1167">1167</span> |
| <span id="1168">1168</span> |
| <span id="1169">1169</span> |
| <span id="1170">1170</span> |
| <span id="1171">1171</span> |
| <span id="1172">1172</span> |
| <span id="1173">1173</span> |
| <span id="1174">1174</span> |
| <span id="1175">1175</span> |
| <span id="1176">1176</span> |
| <span id="1177">1177</span> |
| <span id="1178">1178</span> |
| <span id="1179">1179</span> |
| <span id="1180">1180</span> |
| <span id="1181">1181</span> |
| <span id="1182">1182</span> |
| <span id="1183">1183</span> |
| <span id="1184">1184</span> |
| <span id="1185">1185</span> |
| <span id="1186">1186</span> |
| <span id="1187">1187</span> |
| <span id="1188">1188</span> |
| <span id="1189">1189</span> |
| <span id="1190">1190</span> |
| <span id="1191">1191</span> |
| <span id="1192">1192</span> |
| <span id="1193">1193</span> |
| <span id="1194">1194</span> |
| <span id="1195">1195</span> |
| <span id="1196">1196</span> |
| <span id="1197">1197</span> |
| <span id="1198">1198</span> |
| <span id="1199">1199</span> |
| <span id="1200">1200</span> |
| <span id="1201">1201</span> |
| <span id="1202">1202</span> |
| <span id="1203">1203</span> |
| <span id="1204">1204</span> |
| <span id="1205">1205</span> |
| <span id="1206">1206</span> |
| <span id="1207">1207</span> |
| <span id="1208">1208</span> |
| <span id="1209">1209</span> |
| <span id="1210">1210</span> |
| <span id="1211">1211</span> |
| <span id="1212">1212</span> |
| <span id="1213">1213</span> |
| <span id="1214">1214</span> |
| <span id="1215">1215</span> |
| <span id="1216">1216</span> |
| <span id="1217">1217</span> |
| <span id="1218">1218</span> |
| <span id="1219">1219</span> |
| <span id="1220">1220</span> |
| <span id="1221">1221</span> |
| <span id="1222">1222</span> |
| <span id="1223">1223</span> |
| <span id="1224">1224</span> |
| <span id="1225">1225</span> |
| <span id="1226">1226</span> |
| <span id="1227">1227</span> |
| <span id="1228">1228</span> |
| <span id="1229">1229</span> |
| <span id="1230">1230</span> |
| <span id="1231">1231</span> |
| <span id="1232">1232</span> |
| <span id="1233">1233</span> |
| <span id="1234">1234</span> |
| <span id="1235">1235</span> |
| <span id="1236">1236</span> |
| <span id="1237">1237</span> |
| <span id="1238">1238</span> |
| <span id="1239">1239</span> |
| <span id="1240">1240</span> |
| <span id="1241">1241</span> |
| <span id="1242">1242</span> |
| <span id="1243">1243</span> |
| <span id="1244">1244</span> |
| <span id="1245">1245</span> |
| <span id="1246">1246</span> |
| <span id="1247">1247</span> |
| <span id="1248">1248</span> |
| <span id="1249">1249</span> |
| <span id="1250">1250</span> |
| <span id="1251">1251</span> |
| <span id="1252">1252</span> |
| <span id="1253">1253</span> |
| <span id="1254">1254</span> |
| <span id="1255">1255</span> |
| <span id="1256">1256</span> |
| <span id="1257">1257</span> |
| <span id="1258">1258</span> |
| <span id="1259">1259</span> |
| <span id="1260">1260</span> |
| <span id="1261">1261</span> |
| <span id="1262">1262</span> |
| <span id="1263">1263</span> |
| <span id="1264">1264</span> |
| <span id="1265">1265</span> |
| <span id="1266">1266</span> |
| <span id="1267">1267</span> |
| <span id="1268">1268</span> |
| <span id="1269">1269</span> |
| <span id="1270">1270</span> |
| <span id="1271">1271</span> |
| <span id="1272">1272</span> |
| <span id="1273">1273</span> |
| <span id="1274">1274</span> |
| <span id="1275">1275</span> |
| <span id="1276">1276</span> |
| <span id="1277">1277</span> |
| <span id="1278">1278</span> |
| <span id="1279">1279</span> |
| <span id="1280">1280</span> |
| <span id="1281">1281</span> |
| <span id="1282">1282</span> |
| <span id="1283">1283</span> |
| <span id="1284">1284</span> |
| <span id="1285">1285</span> |
| <span id="1286">1286</span> |
| <span id="1287">1287</span> |
| <span id="1288">1288</span> |
| <span id="1289">1289</span> |
| <span id="1290">1290</span> |
| <span id="1291">1291</span> |
| <span id="1292">1292</span> |
| <span id="1293">1293</span> |
| <span id="1294">1294</span> |
| <span id="1295">1295</span> |
| <span id="1296">1296</span> |
| <span id="1297">1297</span> |
| <span id="1298">1298</span> |
| <span id="1299">1299</span> |
| <span id="1300">1300</span> |
| <span id="1301">1301</span> |
| <span id="1302">1302</span> |
| <span id="1303">1303</span> |
| <span id="1304">1304</span> |
| <span id="1305">1305</span> |
| <span id="1306">1306</span> |
| <span id="1307">1307</span> |
| <span id="1308">1308</span> |
| <span id="1309">1309</span> |
| <span id="1310">1310</span> |
| <span id="1311">1311</span> |
| <span id="1312">1312</span> |
| <span id="1313">1313</span> |
| <span id="1314">1314</span> |
| <span id="1315">1315</span> |
| <span id="1316">1316</span> |
| <span id="1317">1317</span> |
| <span id="1318">1318</span> |
| <span id="1319">1319</span> |
| <span id="1320">1320</span> |
| <span id="1321">1321</span> |
| <span id="1322">1322</span> |
| <span id="1323">1323</span> |
| <span id="1324">1324</span> |
| <span id="1325">1325</span> |
| <span id="1326">1326</span> |
| <span id="1327">1327</span> |
| <span id="1328">1328</span> |
| <span id="1329">1329</span> |
| <span id="1330">1330</span> |
| <span id="1331">1331</span> |
| <span id="1332">1332</span> |
| <span id="1333">1333</span> |
| <span id="1334">1334</span> |
| <span id="1335">1335</span> |
| <span id="1336">1336</span> |
| <span id="1337">1337</span> |
| <span id="1338">1338</span> |
| <span id="1339">1339</span> |
| <span id="1340">1340</span> |
| <span id="1341">1341</span> |
| <span id="1342">1342</span> |
| <span id="1343">1343</span> |
| <span id="1344">1344</span> |
| <span id="1345">1345</span> |
| <span id="1346">1346</span> |
| <span id="1347">1347</span> |
| <span id="1348">1348</span> |
| <span id="1349">1349</span> |
| <span id="1350">1350</span> |
| <span id="1351">1351</span> |
| <span id="1352">1352</span> |
| <span id="1353">1353</span> |
| <span id="1354">1354</span> |
| <span id="1355">1355</span> |
| <span id="1356">1356</span> |
| <span id="1357">1357</span> |
| <span id="1358">1358</span> |
| <span id="1359">1359</span> |
| <span id="1360">1360</span> |
| <span id="1361">1361</span> |
| <span id="1362">1362</span> |
| <span id="1363">1363</span> |
| <span id="1364">1364</span> |
| <span id="1365">1365</span> |
| <span id="1366">1366</span> |
| <span id="1367">1367</span> |
| <span id="1368">1368</span> |
| <span id="1369">1369</span> |
| <span id="1370">1370</span> |
| <span id="1371">1371</span> |
| <span id="1372">1372</span> |
| <span id="1373">1373</span> |
| <span id="1374">1374</span> |
| <span id="1375">1375</span> |
| <span id="1376">1376</span> |
| <span id="1377">1377</span> |
| <span id="1378">1378</span> |
| <span id="1379">1379</span> |
| <span id="1380">1380</span> |
| <span id="1381">1381</span> |
| <span id="1382">1382</span> |
| <span id="1383">1383</span> |
| <span id="1384">1384</span> |
| <span id="1385">1385</span> |
| <span id="1386">1386</span> |
| <span id="1387">1387</span> |
| <span id="1388">1388</span> |
| <span id="1389">1389</span> |
| <span id="1390">1390</span> |
| <span id="1391">1391</span> |
| <span id="1392">1392</span> |
| <span id="1393">1393</span> |
| <span id="1394">1394</span> |
| <span id="1395">1395</span> |
| <span id="1396">1396</span> |
| <span id="1397">1397</span> |
| <span id="1398">1398</span> |
| <span id="1399">1399</span> |
| <span id="1400">1400</span> |
| <span id="1401">1401</span> |
| <span id="1402">1402</span> |
| <span id="1403">1403</span> |
| <span id="1404">1404</span> |
| <span id="1405">1405</span> |
| <span id="1406">1406</span> |
| <span id="1407">1407</span> |
| <span id="1408">1408</span> |
| <span id="1409">1409</span> |
| <span id="1410">1410</span> |
| <span id="1411">1411</span> |
| <span id="1412">1412</span> |
| <span id="1413">1413</span> |
| <span id="1414">1414</span> |
| <span id="1415">1415</span> |
| <span id="1416">1416</span> |
| <span id="1417">1417</span> |
| <span id="1418">1418</span> |
| <span id="1419">1419</span> |
| <span id="1420">1420</span> |
| <span id="1421">1421</span> |
| <span id="1422">1422</span> |
| <span id="1423">1423</span> |
| <span id="1424">1424</span> |
| <span id="1425">1425</span> |
| <span id="1426">1426</span> |
| <span id="1427">1427</span> |
| <span id="1428">1428</span> |
| <span id="1429">1429</span> |
| <span id="1430">1430</span> |
| <span id="1431">1431</span> |
| <span id="1432">1432</span> |
| <span id="1433">1433</span> |
| <span id="1434">1434</span> |
| <span id="1435">1435</span> |
| <span id="1436">1436</span> |
| <span id="1437">1437</span> |
| <span id="1438">1438</span> |
| <span id="1439">1439</span> |
| <span id="1440">1440</span> |
| <span id="1441">1441</span> |
| <span id="1442">1442</span> |
| <span id="1443">1443</span> |
| <span id="1444">1444</span> |
| <span id="1445">1445</span> |
| <span id="1446">1446</span> |
| <span id="1447">1447</span> |
| <span id="1448">1448</span> |
| <span id="1449">1449</span> |
| <span id="1450">1450</span> |
| <span id="1451">1451</span> |
| <span id="1452">1452</span> |
| <span id="1453">1453</span> |
| <span id="1454">1454</span> |
| <span id="1455">1455</span> |
| <span id="1456">1456</span> |
| <span id="1457">1457</span> |
| <span id="1458">1458</span> |
| <span id="1459">1459</span> |
| <span id="1460">1460</span> |
| <span id="1461">1461</span> |
| <span id="1462">1462</span> |
| <span id="1463">1463</span> |
| <span id="1464">1464</span> |
| <span id="1465">1465</span> |
| <span id="1466">1466</span> |
| <span id="1467">1467</span> |
| <span id="1468">1468</span> |
| <span id="1469">1469</span> |
| <span id="1470">1470</span> |
| <span id="1471">1471</span> |
| <span id="1472">1472</span> |
| <span id="1473">1473</span> |
| <span id="1474">1474</span> |
| <span id="1475">1475</span> |
| <span id="1476">1476</span> |
| <span id="1477">1477</span> |
| <span id="1478">1478</span> |
| <span id="1479">1479</span> |
| <span id="1480">1480</span> |
| <span id="1481">1481</span> |
| <span id="1482">1482</span> |
| <span id="1483">1483</span> |
| <span id="1484">1484</span> |
| <span id="1485">1485</span> |
| <span id="1486">1486</span> |
| <span id="1487">1487</span> |
| <span id="1488">1488</span> |
| <span id="1489">1489</span> |
| <span id="1490">1490</span> |
| <span id="1491">1491</span> |
| <span id="1492">1492</span> |
| <span id="1493">1493</span> |
| <span id="1494">1494</span> |
| <span id="1495">1495</span> |
| <span id="1496">1496</span> |
| <span id="1497">1497</span> |
| <span id="1498">1498</span> |
| <span id="1499">1499</span> |
| <span id="1500">1500</span> |
| <span id="1501">1501</span> |
| <span id="1502">1502</span> |
| <span id="1503">1503</span> |
| <span id="1504">1504</span> |
| <span id="1505">1505</span> |
| <span id="1506">1506</span> |
| <span id="1507">1507</span> |
| <span id="1508">1508</span> |
| <span id="1509">1509</span> |
| <span id="1510">1510</span> |
| <span id="1511">1511</span> |
| <span id="1512">1512</span> |
| <span id="1513">1513</span> |
| <span id="1514">1514</span> |
| <span id="1515">1515</span> |
| <span id="1516">1516</span> |
| <span id="1517">1517</span> |
| <span id="1518">1518</span> |
| <span id="1519">1519</span> |
| <span id="1520">1520</span> |
| <span id="1521">1521</span> |
| <span id="1522">1522</span> |
| <span id="1523">1523</span> |
| <span id="1524">1524</span> |
| <span id="1525">1525</span> |
| <span id="1526">1526</span> |
| <span id="1527">1527</span> |
| <span id="1528">1528</span> |
| <span id="1529">1529</span> |
| <span id="1530">1530</span> |
| <span id="1531">1531</span> |
| <span id="1532">1532</span> |
| <span id="1533">1533</span> |
| <span id="1534">1534</span> |
| <span id="1535">1535</span> |
| <span id="1536">1536</span> |
| <span id="1537">1537</span> |
| <span id="1538">1538</span> |
| <span id="1539">1539</span> |
| <span id="1540">1540</span> |
| <span id="1541">1541</span> |
| <span id="1542">1542</span> |
| <span id="1543">1543</span> |
| <span id="1544">1544</span> |
| <span id="1545">1545</span> |
| <span id="1546">1546</span> |
| <span id="1547">1547</span> |
| <span id="1548">1548</span> |
| <span id="1549">1549</span> |
| <span id="1550">1550</span> |
| <span id="1551">1551</span> |
| <span id="1552">1552</span> |
| <span id="1553">1553</span> |
| <span id="1554">1554</span> |
| <span id="1555">1555</span> |
| <span id="1556">1556</span> |
| <span id="1557">1557</span> |
| <span id="1558">1558</span> |
| <span id="1559">1559</span> |
| <span id="1560">1560</span> |
| <span id="1561">1561</span> |
| <span id="1562">1562</span> |
| <span id="1563">1563</span> |
| <span id="1564">1564</span> |
| <span id="1565">1565</span> |
| <span id="1566">1566</span> |
| <span id="1567">1567</span> |
| <span id="1568">1568</span> |
| <span id="1569">1569</span> |
| <span id="1570">1570</span> |
| <span id="1571">1571</span> |
| <span id="1572">1572</span> |
| <span id="1573">1573</span> |
| <span id="1574">1574</span> |
| <span id="1575">1575</span> |
| <span id="1576">1576</span> |
| <span id="1577">1577</span> |
| <span id="1578">1578</span> |
| <span id="1579">1579</span> |
| <span id="1580">1580</span> |
| <span id="1581">1581</span> |
| <span id="1582">1582</span> |
| <span id="1583">1583</span> |
| <span id="1584">1584</span> |
| <span id="1585">1585</span> |
| <span id="1586">1586</span> |
| <span id="1587">1587</span> |
| <span id="1588">1588</span> |
| <span id="1589">1589</span> |
| <span id="1590">1590</span> |
| <span id="1591">1591</span> |
| <span id="1592">1592</span> |
| <span id="1593">1593</span> |
| <span id="1594">1594</span> |
| <span id="1595">1595</span> |
| <span id="1596">1596</span> |
| <span id="1597">1597</span> |
| <span id="1598">1598</span> |
| <span id="1599">1599</span> |
| <span id="1600">1600</span> |
| <span id="1601">1601</span> |
| <span id="1602">1602</span> |
| <span id="1603">1603</span> |
| <span id="1604">1604</span> |
| <span id="1605">1605</span> |
| <span id="1606">1606</span> |
| <span id="1607">1607</span> |
| <span id="1608">1608</span> |
| <span id="1609">1609</span> |
| <span id="1610">1610</span> |
| <span id="1611">1611</span> |
| <span id="1612">1612</span> |
| <span id="1613">1613</span> |
| <span id="1614">1614</span> |
| <span id="1615">1615</span> |
| <span id="1616">1616</span> |
| <span id="1617">1617</span> |
| <span id="1618">1618</span> |
| <span id="1619">1619</span> |
| <span id="1620">1620</span> |
| <span id="1621">1621</span> |
| <span id="1622">1622</span> |
| <span id="1623">1623</span> |
| <span id="1624">1624</span> |
| <span id="1625">1625</span> |
| <span id="1626">1626</span> |
| <span id="1627">1627</span> |
| <span id="1628">1628</span> |
| <span id="1629">1629</span> |
| <span id="1630">1630</span> |
| <span id="1631">1631</span> |
| <span id="1632">1632</span> |
| <span id="1633">1633</span> |
| <span id="1634">1634</span> |
| <span id="1635">1635</span> |
| <span id="1636">1636</span> |
| <span id="1637">1637</span> |
| <span id="1638">1638</span> |
| <span id="1639">1639</span> |
| <span id="1640">1640</span> |
| <span id="1641">1641</span> |
| <span id="1642">1642</span> |
| <span id="1643">1643</span> |
| <span id="1644">1644</span> |
| <span id="1645">1645</span> |
| <span id="1646">1646</span> |
| <span id="1647">1647</span> |
| <span id="1648">1648</span> |
| <span id="1649">1649</span> |
| <span id="1650">1650</span> |
| <span id="1651">1651</span> |
| <span id="1652">1652</span> |
| <span id="1653">1653</span> |
| <span id="1654">1654</span> |
| <span id="1655">1655</span> |
| <span id="1656">1656</span> |
| <span id="1657">1657</span> |
| <span id="1658">1658</span> |
| <span id="1659">1659</span> |
| <span id="1660">1660</span> |
| <span id="1661">1661</span> |
| <span id="1662">1662</span> |
| <span id="1663">1663</span> |
| <span id="1664">1664</span> |
| <span id="1665">1665</span> |
| <span id="1666">1666</span> |
| <span id="1667">1667</span> |
| <span id="1668">1668</span> |
| <span id="1669">1669</span> |
| <span id="1670">1670</span> |
| <span id="1671">1671</span> |
| <span id="1672">1672</span> |
| <span id="1673">1673</span> |
| <span id="1674">1674</span> |
| <span id="1675">1675</span> |
| <span id="1676">1676</span> |
| <span id="1677">1677</span> |
| <span id="1678">1678</span> |
| <span id="1679">1679</span> |
| <span id="1680">1680</span> |
| <span id="1681">1681</span> |
| <span id="1682">1682</span> |
| <span id="1683">1683</span> |
| <span id="1684">1684</span> |
| <span id="1685">1685</span> |
| <span id="1686">1686</span> |
| <span id="1687">1687</span> |
| <span id="1688">1688</span> |
| <span id="1689">1689</span> |
| <span id="1690">1690</span> |
| <span id="1691">1691</span> |
| <span id="1692">1692</span> |
| <span id="1693">1693</span> |
| <span id="1694">1694</span> |
| <span id="1695">1695</span> |
| <span id="1696">1696</span> |
| <span id="1697">1697</span> |
| <span id="1698">1698</span> |
| <span id="1699">1699</span> |
| <span id="1700">1700</span> |
| <span id="1701">1701</span> |
| <span id="1702">1702</span> |
| <span id="1703">1703</span> |
| <span id="1704">1704</span> |
| <span id="1705">1705</span> |
| <span id="1706">1706</span> |
| <span id="1707">1707</span> |
| <span id="1708">1708</span> |
| <span id="1709">1709</span> |
| <span id="1710">1710</span> |
| <span id="1711">1711</span> |
| <span id="1712">1712</span> |
| <span id="1713">1713</span> |
| <span id="1714">1714</span> |
| <span id="1715">1715</span> |
| <span id="1716">1716</span> |
| <span id="1717">1717</span> |
| <span id="1718">1718</span> |
| <span id="1719">1719</span> |
| <span id="1720">1720</span> |
| <span id="1721">1721</span> |
| <span id="1722">1722</span> |
| <span id="1723">1723</span> |
| <span id="1724">1724</span> |
| <span id="1725">1725</span> |
| <span id="1726">1726</span> |
| <span id="1727">1727</span> |
| <span id="1728">1728</span> |
| <span id="1729">1729</span> |
| <span id="1730">1730</span> |
| <span id="1731">1731</span> |
| <span id="1732">1732</span> |
| <span id="1733">1733</span> |
| <span id="1734">1734</span> |
| <span id="1735">1735</span> |
| <span id="1736">1736</span> |
| <span id="1737">1737</span> |
| <span id="1738">1738</span> |
| <span id="1739">1739</span> |
| <span id="1740">1740</span> |
| <span id="1741">1741</span> |
| <span id="1742">1742</span> |
| <span id="1743">1743</span> |
| <span id="1744">1744</span> |
| <span id="1745">1745</span> |
| <span id="1746">1746</span> |
| <span id="1747">1747</span> |
| <span id="1748">1748</span> |
| <span id="1749">1749</span> |
| <span id="1750">1750</span> |
| <span id="1751">1751</span> |
| <span id="1752">1752</span> |
| <span id="1753">1753</span> |
| <span id="1754">1754</span> |
| <span id="1755">1755</span> |
| <span id="1756">1756</span> |
| <span id="1757">1757</span> |
| <span id="1758">1758</span> |
| <span id="1759">1759</span> |
| <span id="1760">1760</span> |
| <span id="1761">1761</span> |
| <span id="1762">1762</span> |
| <span id="1763">1763</span> |
| <span id="1764">1764</span> |
| <span id="1765">1765</span> |
| <span id="1766">1766</span> |
| <span id="1767">1767</span> |
| <span id="1768">1768</span> |
| <span id="1769">1769</span> |
| <span id="1770">1770</span> |
| <span id="1771">1771</span> |
| <span id="1772">1772</span> |
| <span id="1773">1773</span> |
| <span id="1774">1774</span> |
| <span id="1775">1775</span> |
| <span id="1776">1776</span> |
| <span id="1777">1777</span> |
| <span id="1778">1778</span> |
| <span id="1779">1779</span> |
| <span id="1780">1780</span> |
| <span id="1781">1781</span> |
| <span id="1782">1782</span> |
| <span id="1783">1783</span> |
| <span id="1784">1784</span> |
| <span id="1785">1785</span> |
| <span id="1786">1786</span> |
| <span id="1787">1787</span> |
| <span id="1788">1788</span> |
| <span id="1789">1789</span> |
| <span id="1790">1790</span> |
| <span id="1791">1791</span> |
| <span id="1792">1792</span> |
| <span id="1793">1793</span> |
| <span id="1794">1794</span> |
| <span id="1795">1795</span> |
| <span id="1796">1796</span> |
| <span id="1797">1797</span> |
| <span id="1798">1798</span> |
| <span id="1799">1799</span> |
| <span id="1800">1800</span> |
| <span id="1801">1801</span> |
| <span id="1802">1802</span> |
| <span id="1803">1803</span> |
| <span id="1804">1804</span> |
| <span id="1805">1805</span> |
| <span id="1806">1806</span> |
| <span id="1807">1807</span> |
| <span id="1808">1808</span> |
| <span id="1809">1809</span> |
| <span id="1810">1810</span> |
| <span id="1811">1811</span> |
| <span id="1812">1812</span> |
| <span id="1813">1813</span> |
| <span id="1814">1814</span> |
| <span id="1815">1815</span> |
| <span id="1816">1816</span> |
| <span id="1817">1817</span> |
| <span id="1818">1818</span> |
| <span id="1819">1819</span> |
| <span id="1820">1820</span> |
| <span id="1821">1821</span> |
| <span id="1822">1822</span> |
| <span id="1823">1823</span> |
| <span id="1824">1824</span> |
| <span id="1825">1825</span> |
| <span id="1826">1826</span> |
| <span id="1827">1827</span> |
| <span id="1828">1828</span> |
| <span id="1829">1829</span> |
| <span id="1830">1830</span> |
| <span id="1831">1831</span> |
| <span id="1832">1832</span> |
| <span id="1833">1833</span> |
| <span id="1834">1834</span> |
| <span id="1835">1835</span> |
| <span id="1836">1836</span> |
| <span id="1837">1837</span> |
| <span id="1838">1838</span> |
| <span id="1839">1839</span> |
| <span id="1840">1840</span> |
| <span id="1841">1841</span> |
| <span id="1842">1842</span> |
| <span id="1843">1843</span> |
| <span id="1844">1844</span> |
| <span id="1845">1845</span> |
| <span id="1846">1846</span> |
| <span id="1847">1847</span> |
| <span id="1848">1848</span> |
| <span id="1849">1849</span> |
| <span id="1850">1850</span> |
| <span id="1851">1851</span> |
| <span id="1852">1852</span> |
| <span id="1853">1853</span> |
| <span id="1854">1854</span> |
| <span id="1855">1855</span> |
| <span id="1856">1856</span> |
| <span id="1857">1857</span> |
| <span id="1858">1858</span> |
| <span id="1859">1859</span> |
| <span id="1860">1860</span> |
| </pre><pre class="rust "> |
| <span class="comment">// Copyright 2014-2016 The Rust Project Developers. See the COPYRIGHT</span> |
| <span class="comment">// file at the top-level directory of this distribution and at</span> |
| <span class="comment">// http://rust-lang.org/COPYRIGHT.</span> |
| <span class="comment">//</span> |
| <span class="comment">// Licensed under the Apache License, Version 2.0 &lt;LICENSE-APACHE or</span> |
| <span class="comment">// http://www.apache.org/licenses/LICENSE-2.0&gt; or the MIT license</span> |
| <span class="comment">// &lt;LICENSE-MIT or http://opensource.org/licenses/MIT&gt;, at your</span> |
| <span class="comment">// option. This file may not be copied, modified, or distributed</span> |
| <span class="comment">// except according to those terms.</span> |
| |
| <span class="doccomment">/*! |
| The DFA matching engine. |
| |
| A DFA provides faster matching because the engine is in exactly one state at |
| any point in time. In the NFA, there may be multiple active states, and |
| considerable CPU cycles are spent shuffling them around. In finite automata |
| speak, the DFA follows epsilon transitions in the regex far less than the NFA. |
| |
| A DFA is a classic trade-off between time and space. The NFA is slower, but |
| its memory requirements are typically small and predictable. The DFA is faster, |
| but given the right regex and the right input, the number of states in the |
| DFA can grow exponentially. To mitigate this space problem, we do two things: |
| |
| 1. We implement an *online* DFA. That is, the DFA is constructed from the NFA |
| during a search. When a new state is computed, it is stored in a cache so |
| that it may be reused. An important consequence of this implementation |
| is that states that are never reached for a particular input are never |
| computed. (This is impossible in an "offline" DFA which needs to compute |
| all possible states up front.) |
| 2. If the cache gets too big, we wipe it and continue matching. |
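
As a rough illustration of the two points above (a toy sketch, not the code in
this module): sets of NFA states are packed into a `u64` bitmask, and each
transition is computed only the first time it is needed. The names `step`,
`search` and `MAX_ENTRIES` are hypothetical, and the hard-coded NFA for
`(a|b)*a(a|b){2}` is an assumption made to keep the sketch short.

```rust
use std::collections::HashMap;

// Hypothetical cap on cached transitions (an assumption of this sketch).
const MAX_ENTRIES: usize = 4;

// Toy NFA for `(a|b)*a(a|b){2}`: bit 0 is the start state (it loops on
// every byte), bits 1..=3 form a chain, and bit 3 is the match state.
fn step(set: u64, byte: u8) -> u64 {
    let mut next = 1; // the start state is always active
    if byte == b'a' {
        next |= 1 << 1; // on 'a' the start state also enters the chain
    }
    next |= (set & 0b0110) << 1; // advance the chain: 1 -> 2, 2 -> 3
    next
}

// Returns (whether the whole input matches, number of cache wipes).
fn search(input: &[u8]) -> (bool, usize) {
    // Maps (current state set, input byte) to the next state set.
    let mut cache: HashMap<(u64, u8), u64> = HashMap::new();
    let mut cur = 1u64;
    let mut wipes = 0;
    for &byte in input {
        if !cache.contains_key(&(cur, byte)) {
            // Point 2: when the cache is full, wipe it and keep matching.
            if cache.len() >= MAX_ENTRIES {
                cache.clear();
                wipes += 1;
            }
            // Point 1: a transition is computed only when first reached.
            cache.insert((cur, byte), step(cur, byte));
        }
        cur = cache[&(cur, byte)];
    }
    ((cur & 0b1000) != 0, wipes)
}
```

With these definitions, `search(b"abb")` returns `(true, 0)`, and transitions
for bytes that never occur in the input are never computed.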
| |
| In pathological cases, a new state can be created for every byte of input. |
| (For example, the regex `(a|b)*a(a|b){20}` on a long sequence of a's and b's.) |
| In this case, performance regresses to slightly slower than the full NFA |
| simulation, in large part because the cache becomes useless. If the cache |
| is wiped too frequently, the DFA quits and control falls back to one of the |
| NFA simulations. |
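
To get a feel for the blow-up, the sketch below counts the DFA states reachable
for `(a|b)*a(a|b){n}` by exploring sets of NFA states. It is an illustration
under assumptions, not part of this module: `count_dfa_states` is a
hypothetical helper, and the NFA (a looping start state followed by a chain of
`n + 1` states) is hard-coded as a bitmask. For this toy NFA the count is
`2^(n + 1)`, so `n = 20` yields roughly two million states.

```rust
use std::collections::HashSet;

// Counts the DFA states reachable for `(a|b)*a(a|b){n}`. Each DFA state is
// a set of NFA states packed into a `u64`, so this requires n + 2 <= 64.
fn count_dfa_states(n: u32) -> usize {
    // Bits 1..=n: the chain states that still have an outgoing transition.
    let movable = ((1u64 << (n + 1)) - 1) & !1;
    let mut seen = HashSet::new();
    let mut todo = vec![1u64]; // initially only the NFA start state is active
    while let Some(set) = todo.pop() {
        if !seen.insert(set) {
            continue; // this DFA state has already been explored
        }
        for &byte in b"ab" {
            // The start state (bit 0) stays active; the chain shifts by one.
            let mut next = 1 | ((set & movable) << 1);
            if byte == b'a' {
                next |= 1 << 1; // 'a' also enters the chain from the start
            }
            todo.push(next);
        }
    }
    seen.len()
}
```

For instance, `count_dfa_states(2)` returns 8 and `count_dfa_states(5)`
returns 64.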
| |
| Because of the "lazy" nature of this DFA, the inner matching loop is |
| considerably more complex than one might expect out of a DFA. A number of |
| tricks are employed to make it fast. Tread carefully. |
| |
| N.B. While this implementation is heavily commented, Russ Cox's series of |
| articles on regexes is strongly recommended: https://swtch.com/~rsc/regexp/ |
| (As is the DFA implementation in RE2, which heavily influenced this |
| implementation.) |
| */</span> |
| |
| <span class="kw">use</span> <span class="ident">std</span>::<span class="ident">collections</span>::<span class="ident">HashMap</span>; |
| <span class="kw">use</span> <span class="ident">std</span>::<span class="ident">fmt</span>; |
| <span class="kw">use</span> <span class="ident">std</span>::<span class="ident">iter</span>::<span class="ident">repeat</span>; |
| <span class="kw">use</span> <span class="ident">std</span>::<span class="ident">mem</span>; |
| |
| <span class="kw">use</span> <span class="ident">exec</span>::<span class="ident">ProgramCache</span>; |
| <span class="kw">use</span> <span class="ident">prog</span>::{<span class="ident">Inst</span>, <span class="ident">Program</span>}; |
| <span class="kw">use</span> <span class="ident">sparse</span>::<span class="ident">SparseSet</span>; |
| |
| <span class="doccomment">/// Return true if and only if the given program can be executed by a DFA.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// A DFA is almost always possible. One pathological exception is a program</span> |
| <span class="doccomment">/// with more than i32::MAX instructions, in which case this function will</span> |
| <span class="doccomment">/// return false.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This function will also return false if the given program has any Unicode</span> |
| <span class="doccomment">/// instructions (Char or Ranges) since the DFA operates on bytes only.</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">can_exec</span>(<span class="ident">insts</span>: <span class="kw-2">&</span><span class="ident">Program</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">use</span> <span class="ident">prog</span>::<span class="ident">Inst</span>::<span class="kw-2">*</span>; |
| <span class="comment">// If for some reason we manage to allocate a regex program with more</span> |
| <span class="comment">// than i32::MAX instructions, then we can't execute the DFA because we</span> |
| <span class="comment">// use 32 bit instruction pointer deltas for memory savings.</span> |
| <span class="comment">// If i32::MAX is the largest positive delta,</span> |
| <span class="comment">// then -i32::MAX == i32::MIN + 1 is the largest negative delta,</span> |
| <span class="comment">// and we are OK to use 32 bits.</span> |
| <span class="kw">if</span> <span class="ident">insts</span>.<span class="ident">len</span>() <span class="op">></span> ::<span class="ident">std</span>::<span class="ident">i32</span>::<span class="ident">MAX</span> <span class="kw">as</span> <span class="ident">usize</span> { |
| <span class="kw">return</span> <span class="bool-val">false</span>; |
| } |
| <span class="kw">for</span> <span class="ident">inst</span> <span class="kw">in</span> <span class="ident">insts</span> { |
| <span class="kw">match</span> <span class="kw-2">*</span><span class="ident">inst</span> { |
| <span class="ident">Char</span>(_) <span class="op">|</span> <span class="ident">Ranges</span>(_) <span class="op">=></span> <span class="kw">return</span> <span class="bool-val">false</span>, |
| <span class="ident">EmptyLook</span>(_) <span class="op">|</span> <span class="ident">Match</span>(_) <span class="op">|</span> <span class="ident">Save</span>(_) <span class="op">|</span> <span class="ident">Split</span>(_) <span class="op">|</span> <span class="ident">Bytes</span>(_) <span class="op">=></span> {} |
| } |
| } |
| <span class="bool-val">true</span> |
| } |
| |
| <span class="doccomment">/// A reusable cache of DFA states.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This cache is reused between multiple invocations of the same regex</span> |
| <span class="doccomment">/// program. (It is not shared simultaneously between threads. If there is</span> |
| <span class="doccomment">/// contention, then new caches are created.)</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>, <span class="ident">Debug</span>)]</span> |
| <span class="kw">pub</span> <span class="kw">struct</span> <span class="ident">Cache</span> { |
| <span class="doccomment">/// Group persistent DFA related cache state together. The sparse sets</span> |
| <span class="doccomment">/// listed below are used as scratch space while computing uncached states.</span> |
| <span class="ident">inner</span>: <span class="ident">CacheInner</span>, |
| <span class="doccomment">/// qcur and qnext are ordered sets with constant time</span> |
| <span class="doccomment">/// addition/membership/clearing-whole-set and linear time iteration. They</span> |
| <span class="doccomment">/// are used to manage the sets of NFA states in DFA states when computing</span> |
| <span class="doccomment">/// cached DFA states. In particular, the order of the NFA states matters</span> |
| <span class="doccomment">/// for leftmost-first style matching. Namely, when computing a cached</span> |
| <span class="doccomment">/// state, the set of NFA states stops growing as soon as the first Match</span> |
| <span class="doccomment">/// instruction is observed.</span> |
| <span class="ident">qcur</span>: <span class="ident">SparseSet</span>, |
| <span class="ident">qnext</span>: <span class="ident">SparseSet</span>, |
| } |
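// Illustrative sketch only, not part of the regex crate. The comment on
// `qcur`/`qnext` describes ordered sets with constant time insertion,
// membership and clearing, plus linear time iteration in insertion order.
// The crate's own `sparse::SparseSet` is not shown in this listing, so the
// type below (`OrderedSparseSet`, a hypothetical name) is an assumption
// about how such a set can be built with the classic dense/sparse technique.

```rust
/// A set of integers in `0..capacity` that remembers insertion order.
struct OrderedSparseSet {
    /// Members, in the order they were inserted.
    dense: Vec<usize>,
    /// `sparse[v]` is the index of `v` in `dense`, if `v` is a member.
    sparse: Vec<usize>,
}

impl OrderedSparseSet {
    fn new(capacity: usize) -> OrderedSparseSet {
        OrderedSparseSet {
            dense: Vec::with_capacity(capacity),
            sparse: vec![0; capacity],
        }
    }

    /// Constant time membership test. `v` must be below the capacity.
    fn contains(&self, v: usize) -> bool {
        let i = self.sparse[v];
        i < self.dense.len() && self.dense[i] == v
    }

    /// Constant time insertion; duplicates are ignored.
    fn insert(&mut self, v: usize) {
        if !self.contains(v) {
            self.sparse[v] = self.dense.len();
            self.dense.push(v);
        }
    }

    /// Clears the whole set without touching `sparse`.
    fn clear(&mut self) {
        self.dense.clear();
    }

    /// Iterates over members in insertion order.
    fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        self.dense.iter().copied()
    }
}
```

// Stale entries left in `sparse` after `clear` are harmless because
// `contains` cross-checks them against `dense`. Insertion order is kept,
// which is the property leftmost-first matching relies on.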
| |
| <span class="doccomment">/// CacheInner is logically just a part of Cache, but groups together fields</span> |
| <span class="doccomment">/// that aren't passed as function parameters throughout search. (This split</span> |
| <span class="doccomment">/// is mostly an artifact of the borrow checker. It is happily paid.)</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>, <span class="ident">Debug</span>)]</span> |
| <span class="kw">struct</span> <span class="ident">CacheInner</span> { |
| <span class="doccomment">/// A cache of pre-compiled DFA states, keyed by the set of NFA states</span> |
| <span class="doccomment">/// and the set of empty-width flags set at the byte in the input when the</span> |
| <span class="doccomment">/// state was observed.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// A StatePtr is effectively a `*State`, but to avoid various inconvenient</span> |
| <span class="doccomment">/// things, we just pass indexes around manually. The performance impact of</span> |
| <span class="doccomment">/// this is probably an instruction or two in the inner loop. However, on</span> |
| <span class="doccomment">/// 64 bit, each StatePtr is half the size of a *State.</span> |
| <span class="ident">compiled</span>: <span class="ident">HashMap</span><span class="op"><</span><span class="ident">State</span>, <span class="ident">StatePtr</span><span class="op">></span>, |
| <span class="doccomment">/// The transition table.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The transition table is laid out in row-major order, where states are</span> |
| <span class="doccomment">/// rows and the transitions for each state are columns. At a high level,</span> |
| <span class="doccomment">/// given state `s` and byte `b`, the next state can be found at index</span> |
| <span class="doccomment">/// `s * 256 + b`.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This is, of course, a lie. A StatePtr is actually a pointer to the</span> |
| <span class="doccomment">/// *start* of a row in this table. When indexing in the DFA's inner loop,</span> |
| <span class="doccomment">/// this removes the need to multiply the StatePtr by the stride. Yes, it</span> |
| <span class="doccomment">/// matters. This reduces the number of states we can store, but: the</span> |
| <span class="doccomment">/// stride is rarely 256 since we define transitions in terms of</span> |
| <span class="doccomment">/// *equivalence classes* of bytes. Each class corresponds to a set of</span> |
| <span class="doccomment">/// bytes that never discriminate a distinct path through the DFA from each</span> |
| <span class="doccomment">/// other.</span> |
| <span class="ident">trans</span>: <span class="ident">Transitions</span>, |
| <span class="doccomment">/// Our set of states. Note that `StatePtr / num_byte_classes` indexes</span> |
| <span class="doccomment">/// this Vec rather than just a `StatePtr`.</span> |
| <span class="ident">states</span>: <span class="ident">Vec</span><span class="op"><</span><span class="ident">State</span><span class="op">></span>, |
| <span class="doccomment">/// A set of cached start states, which are limited to the number of</span> |
| <span class="doccomment">/// permutations of flags set just before the initial byte of input. (The</span> |
| <span class="doccomment">/// index into this vec is an `EmptyFlags`.)</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// N.B. A start state can be "dead" (i.e., no possible match), so we</span> |
| <span class="doccomment">/// represent it with a StatePtr.</span> |
| <span class="ident">start_states</span>: <span class="ident">Vec</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span>, |
| <span class="doccomment">/// Stack scratch space used to follow epsilon transitions in the NFA.</span> |
| <span class="doccomment">/// (This permits us to avoid recursion.)</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The maximum stack size is the number of NFA states.</span> |
| <span class="ident">stack</span>: <span class="ident">Vec</span><span class="op"><</span><span class="ident">InstPtr</span><span class="op">></span>, |
| <span class="doccomment">/// The total number of times this cache has been flushed by the DFA</span> |
| <span class="doccomment">/// because of space constraints.</span> |
| <span class="ident">flush_count</span>: <span class="ident">u64</span>, |
| <span class="doccomment">/// The total heap size of the DFA's cache. We use this to determine when</span> |
| <span class="doccomment">/// we should flush the cache.</span> |
| <span class="ident">size</span>: <span class="ident">usize</span>, |
| } |
| |
| <span class="doccomment">/// The transition table.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// It is laid out in row-major order, with states as rows and byte class</span> |
| <span class="doccomment">/// transitions as columns.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The transition table is responsible for producing valid StatePtrs. A</span> |
| <span class="doccomment">/// StatePtr points to the start of a particular row in this table. When</span> |
| <span class="doccomment">/// indexing to find the next state this allows us to avoid a multiplication</span> |
| <span class="doccomment">/// when computing an index into the table.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>)]</span> |
| <span class="kw">struct</span> <span class="ident">Transitions</span> { |
| <span class="doccomment">/// The table.</span> |
| <span class="ident">table</span>: <span class="ident">Vec</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span>, |
| <span class="doccomment">/// The stride.</span> |
| <span class="ident">num_byte_classes</span>: <span class="ident">usize</span>, |
| } |
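// Illustrative sketch only, not the crate's implementation. It shows the
// layout idea described above: a row-major table whose state pointers are
// already multiplied by the stride, so a lookup is a single addition.
// `Table`, `add_state`, `set_next`, `next` and `row` are hypothetical names,
// and `UNKNOWN` stands in for the "not computed yet" sentinel.

```rust
type StatePtr = u32;

/// Placeholder sentinel for transitions that have not been computed.
const UNKNOWN: StatePtr = 1 << 31;

struct Table {
    /// Row-major transitions: one row per state, one column per byte class.
    table: Vec<StatePtr>,
    /// The stride, i.e. the number of columns in each row.
    num_byte_classes: usize,
}

impl Table {
    fn new(num_byte_classes: usize) -> Table {
        Table { table: Vec::new(), num_byte_classes }
    }

    /// Appends a row of unknown transitions and returns a pointer to the
    /// start of that row (not the row number).
    fn add_state(&mut self) -> StatePtr {
        let si = self.table.len() as StatePtr;
        self.table
            .extend(std::iter::repeat(UNKNOWN).take(self.num_byte_classes));
        si
    }

    fn set_next(&mut self, si: StatePtr, class: usize, to: StatePtr) {
        self.table[si as usize + class] = to;
    }

    /// No multiplication needed: `si` already points at the row start.
    fn next(&self, si: StatePtr, class: usize) -> StatePtr {
        self.table[si as usize + class]
    }

    /// Recovers the row number, e.g. to index a parallel Vec of states.
    fn row(&self, si: StatePtr) -> usize {
        si as usize / self.num_byte_classes
    }
}
```

// With a stride of 3, the second state added has pointer 3 rather than 1.
// The division in `row` is only paid off the hot path.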
| |
| <span class="doccomment">/// Fsm encapsulates the actual execution of the DFA.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Debug</span>)]</span> |
| <span class="kw">pub</span> <span class="kw">struct</span> <span class="ident">Fsm</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> { |
| <span class="doccomment">/// prog contains the NFA instruction opcodes. DFA execution uses either</span> |
| <span class="doccomment">/// the `dfa` instructions or the `dfa_reverse` instructions from</span> |
| <span class="doccomment">/// `exec::ExecReadOnly`. (It never uses `ExecReadOnly.nfa`, which may have</span> |
| <span class="doccomment">/// Unicode opcodes that cannot be executed by the DFA.)</span> |
| <span class="ident">prog</span>: <span class="kw-2">&</span><span class="lifetime">'a</span> <span class="ident">Program</span>, |
| <span class="doccomment">/// The start state. We record it here because the pointer may change</span> |
| <span class="doccomment">/// when the cache is wiped.</span> |
| <span class="ident">start</span>: <span class="ident">StatePtr</span>, |
| <span class="doccomment">/// The current position in the input.</span> |
| <span class="ident">at</span>: <span class="ident">usize</span>, |
| <span class="doccomment">/// Should we quit after seeing the first match? e.g., When the caller</span> |
| <span class="doccomment">/// uses `is_match` or `shortest_match`.</span> |
| <span class="ident">quit_after_match</span>: <span class="ident">bool</span>, |
| <span class="doccomment">/// The last state that matched.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// When no match has occurred, this is set to STATE_UNKNOWN.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This is only useful when matching regex sets. The last match state</span> |
| <span class="doccomment">/// is useful because it contains all of the match instructions seen,</span> |
| <span class="doccomment">/// thereby allowing us to enumerate which regexes in the set matched.</span> |
| <span class="ident">last_match_si</span>: <span class="ident">StatePtr</span>, |
| <span class="doccomment">/// The input position of the last cache flush. We use this to determine</span> |
| <span class="doccomment">/// if we're thrashing in the cache too often. If so, the DFA quits so</span> |
| <span class="doccomment">/// that we can fall back to the NFA algorithm.</span> |
| <span class="ident">last_cache_flush</span>: <span class="ident">usize</span>, |
| <span class="doccomment">/// All cached DFA information that is persisted between searches.</span> |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="lifetime">'a</span> <span class="kw-2">mut</span> <span class="ident">CacheInner</span>, |
| } |
| |
| <span class="doccomment">/// The result of running the DFA.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Generally, the result is either a match or not a match, but sometimes the</span> |
| <span class="doccomment">/// DFA runs too slowly because the cache size is too small. In that case, it</span> |
| <span class="doccomment">/// gives up with the intent of falling back to the NFA algorithm.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The DFA can also give up if it runs out of room to create new states, or if</span> |
| <span class="doccomment">/// it sees non-ASCII bytes in the presence of a Unicode word boundary.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>, <span class="ident">Debug</span>)]</span> |
| <span class="kw">pub</span> <span class="kw">enum</span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">T</span><span class="op">></span> { |
| <span class="ident">Match</span>(<span class="ident">T</span>), |
| <span class="ident">NoMatch</span>(<span class="ident">usize</span>), |
| <span class="ident">Quit</span>, |
| } |
| |
| <span class="kw">impl</span><span class="op"><</span><span class="ident">T</span><span class="op">></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">T</span><span class="op">></span> { |
| <span class="doccomment">/// Returns true if this result corresponds to a match.</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">is_match</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">match</span> <span class="kw-2">*</span><span class="self">self</span> { |
| <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(_) <span class="op">=></span> <span class="bool-val">true</span>, |
| <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(_) <span class="op">|</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span> <span class="op">=></span> <span class="bool-val">false</span>, |
| } |
| } |
| |
| <span class="doccomment">/// Maps the given function onto T and returns the result.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If this isn't a match, then this is a no-op.</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">map</span><span class="op"><</span><span class="ident">U</span>, <span class="ident">F</span>: <span class="ident">FnMut</span>(<span class="ident">T</span>) <span class="op">-></span> <span class="ident">U</span><span class="op">></span>(<span class="self">self</span>, <span class="kw-2">mut</span> <span class="ident">f</span>: <span class="ident">F</span>) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">U</span><span class="op">></span> { |
| <span class="kw">match</span> <span class="self">self</span> { |
| <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">t</span>) <span class="op">=></span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">f</span>(<span class="ident">t</span>)), |
| <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">x</span>) <span class="op">=></span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">x</span>), |
| <span class="prelude-ty">Result</span>::<span class="ident">Quit</span> <span class="op">=></span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| } |
| } |
| |
| <span class="doccomment">/// Sets the non-match position.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If this isn't a non-match, then this is a no-op.</span> |
| <span class="kw">fn</span> <span class="ident">set_non_match</span>(<span class="self">self</span>, <span class="ident">at</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">T</span><span class="op">></span> { |
| <span class="kw">match</span> <span class="self">self</span> { |
| <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(_) <span class="op">=></span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">at</span>), |
| <span class="ident">r</span> <span class="op">=></span> <span class="ident">r</span>, |
| } |
| } |
| } |
| |
| <span class="doccomment">/// State is a DFA state. It contains an ordered set of NFA states (not</span> |
| <span class="doccomment">/// necessarily complete) and a smattering of flags.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The flags are packed into the first byte of data.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// States don't carry their transitions. Instead, transitions are stored in</span> |
| <span class="doccomment">/// a single row-major table.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Delta encoding is used to store the instruction pointers.</span> |
| <span class="doccomment">/// The first instruction pointer is stored directly starting</span> |
| <span class="doccomment">/// at data[1], and each following pointer is stored as an offset</span> |
| <span class="doccomment">/// to the previous one. If a delta is in the range -127..127,</span> |
| <span class="doccomment">/// it is packed into a single byte; otherwise the byte 128 (-128 as an i8)</span> |
| <span class="doccomment">/// is coded as a flag, followed by 4 bytes encoding the delta.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>, <span class="ident">Eq</span>, <span class="ident">Hash</span>, <span class="ident">PartialEq</span>)]</span> |
| <span class="kw">struct</span> <span class="ident">State</span> { |
| <span class="ident">data</span>: <span class="ident">Box</span><span class="op"><</span>[<span class="ident">u8</span>]<span class="op">></span>, |
| } |
| |
| <span class="doccomment">/// InstPtr is a 32 bit pointer into a sequence of opcodes (i.e., it indexes</span> |
| <span class="doccomment">/// an NFA state).</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Throughout this library, this is usually set to `usize`, but we force a</span> |
| <span class="doccomment">/// `u32` here for the DFA to save on space.</span> |
| <span class="kw">type</span> <span class="ident">InstPtr</span> <span class="op">=</span> <span class="ident">u32</span>; |
| |
| <span class="doccomment">/// Adds ip to data using delta encoding with respect to prev.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// After completion, `data` will contain `ip` and `prev` will be set to `ip`.</span> |
| <span class="kw">fn</span> <span class="ident">push_inst_ptr</span>(<span class="ident">data</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">Vec</span><span class="op"><</span><span class="ident">u8</span><span class="op">></span>, <span class="ident">prev</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">InstPtr</span>, <span class="ident">ip</span>: <span class="ident">InstPtr</span>) { |
| <span class="kw">let</span> <span class="ident">delta</span> <span class="op">=</span> (<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">i32</span>) <span class="op">-</span> (<span class="kw-2">*</span><span class="ident">prev</span> <span class="kw">as</span> <span class="ident">i32</span>); |
| <span class="ident">write_vari32</span>(<span class="ident">data</span>, <span class="ident">delta</span>); |
| <span class="kw-2">*</span><span class="ident">prev</span> <span class="op">=</span> <span class="ident">ip</span>; |
| } |
| |
| <span class="kw">struct</span> <span class="ident">InstPtrs</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> { |
| <span class="ident">base</span>: <span class="ident">usize</span>, |
| <span class="ident">data</span>: <span class="kw-2">&</span><span class="lifetime">'a</span> [<span class="ident">u8</span>], |
| } |
| |
| <span class="kw">impl</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> <span class="ident">Iterator</span> <span class="kw">for</span> <span class="ident">InstPtrs</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> { |
| <span class="kw">type</span> <span class="ident">Item</span> <span class="op">=</span> <span class="ident">usize</span>; |
| |
| <span class="kw">fn</span> <span class="ident">next</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">data</span>.<span class="ident">is_empty</span>() { |
| <span class="kw">return</span> <span class="prelude-val">None</span>; |
| } |
| <span class="kw">let</span> (<span class="ident">delta</span>, <span class="ident">nread</span>) <span class="op">=</span> <span class="ident">read_vari32</span>(<span class="self">self</span>.<span class="ident">data</span>); |
| <span class="kw">let</span> <span class="ident">base</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">base</span> <span class="kw">as</span> <span class="ident">i32</span> <span class="op">+</span> <span class="ident">delta</span>; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">base</span> <span class="op">>=</span> <span class="number">0</span>); |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">nread</span> <span class="op">></span> <span class="number">0</span>); |
| <span class="self">self</span>.<span class="ident">data</span> <span class="op">=</span> <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">data</span>[<span class="ident">nread</span>..]; |
| <span class="self">self</span>.<span class="ident">base</span> <span class="op">=</span> <span class="ident">base</span> <span class="kw">as</span> <span class="ident">usize</span>; |
| <span class="prelude-val">Some</span>(<span class="self">self</span>.<span class="ident">base</span>) |
| } |
| } |
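// Illustrative sketch only. It follows the delta encoding as worded in the
// doc comment on `State`: a delta in -127..=127 takes one byte, anything
// else is written as the flag byte 128 followed by 4 bytes. The crate's own
// `write_vari32`/`read_vari32` are not shown in this listing and may differ.
// The names `push_delta`, `decode_all` and `roundtrip` are hypothetical, the
// little-endian byte order is an assumption, and the leading flags byte of
// `State::data` is left out.

```rust
/// Appends `ip` to `data` as a delta from `*prev`, then sets `*prev = ip`.
fn push_delta(data: &mut Vec<u8>, prev: &mut u32, ip: u32) {
    let delta = (ip as i32).wrapping_sub(*prev as i32);
    if (-127..=127).contains(&delta) {
        // Small delta: a single signed byte.
        data.push(delta as i8 as u8);
    } else {
        // Large delta: flag byte (-128 as an i8), then the full i32.
        data.push(0x80);
        data.extend_from_slice(&delta.to_le_bytes());
    }
    *prev = ip;
}

/// Decodes every pointer in `data`, starting from a base of 0.
/// Assumes `data` was produced by `push_delta` (no truncation checks).
fn decode_all(mut data: &[u8]) -> Vec<u32> {
    let mut base: i32 = 0;
    let mut out = Vec::new();
    while !data.is_empty() {
        let (delta, nread) = if data[0] == 0x80 {
            let mut buf = [0u8; 4];
            buf.copy_from_slice(&data[1..5]);
            (i32::from_le_bytes(buf), 5)
        } else {
            (data[0] as i8 as i32, 1)
        };
        base += delta;
        out.push(base as u32);
        data = &data[nread..];
    }
    out
}

/// Encodes `ips`, decodes them again, and reports the encoded length.
fn roundtrip(ips: &[u32]) -> (Vec<u32>, usize) {
    let mut data = Vec::new();
    let mut prev = 0;
    for &ip in ips {
        push_delta(&mut data, &mut prev, ip);
    }
    let len = data.len();
    (decode_all(&data), len)
}
```

// Neighbouring instruction pointers tend to be close together, so most
// deltas fit in one byte, which is where the space saving comes from.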
| |
| <span class="kw">impl</span> <span class="ident">State</span> { |
| <span class="kw">fn</span> <span class="ident">flags</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">StateFlags</span> { |
| <span class="ident">StateFlags</span>(<span class="self">self</span>.<span class="ident">data</span>[<span class="number">0</span>]) |
| } |
| |
| <span class="kw">fn</span> <span class="ident">inst_ptrs</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">InstPtrs</span> { |
| <span class="ident">InstPtrs</span> { |
| <span class="ident">base</span>: <span class="number">0</span>, |
| <span class="ident">data</span>: <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">data</span>[<span class="number">1</span>..], |
| } |
| } |
| } |
| |
| <span class="doccomment">/// StatePtr is a 32 bit pointer to the start of a row in the transition table.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// It has many special values. There are two types of special values:</span> |
| <span class="doccomment">/// sentinels and flags.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Sentinels correspond to special states that carry some kind of</span> |
| <span class="doccomment">/// significance. There are three such states: unknown, dead and quit states.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Unknown states are states that haven't been computed yet. They indicate</span> |
| <span class="doccomment">/// that a transition should be filled in that points to either an existing</span> |
| <span class="doccomment">/// cached state or a new state altogether. In general, an unknown state means</span> |
| <span class="doccomment">/// "follow the NFA's epsilon transitions."</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Dead states are states that can never lead to a match, no matter what</span> |
| <span class="doccomment">/// subsequent input is observed. This means that the DFA should quit</span> |
| <span class="doccomment">/// immediately and return the longest match it has found thus far.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Quit states are states that imply the DFA is not capable of matching the</span> |
| <span class="doccomment">/// regex correctly. Currently, this is only used when a Unicode word boundary</span> |
| <span class="doccomment">/// exists in the regex *and* a non-ASCII byte is observed.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The other type of state pointer is a state pointer with special flag bits.</span> |
| <span class="doccomment">/// There are two flags: a start flag and a match flag. The lower bits of both</span> |
| <span class="doccomment">/// kinds always contain a "valid" StatePtr (indicated by the STATE_MAX mask).</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The start flag means that the state is a start state, and therefore may be</span> |
| <span class="doccomment">/// subject to special prefix scanning optimizations.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The match flag means that the state is a match state, and therefore the</span> |
| <span class="doccomment">/// current position in the input (while searching) should be recorded.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The above exists mostly in the service of making the inner loop fast.</span> |
| <span class="doccomment">/// In particular, the inner *inner* loop looks something like this:</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// ```ignore</span> |
| <span class="doccomment">/// while state <= STATE_MAX and i < len(text):</span> |
| <span class="doccomment">///     state = state.next[text[i]]</span> |
| <span class="doccomment">///     i += 1</span> |
| <span class="doccomment">/// ```</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This is nice because it lets us execute a lazy DFA as if it were an</span> |
| <span class="doccomment">/// entirely offline DFA (i.e., with very few instructions). The loop will</span> |
| <span class="doccomment">/// quit only when we need to examine a case that needs special attention.</span> |
| <span class="kw">type</span> <span class="ident">StatePtr</span> <span class="op">=</span> <span class="ident">u32</span>; |
| |
| <span class="doccomment">/// An unknown state means that the state has not been computed yet, and that</span> |
| <span class="doccomment">/// the only way to progress is to compute it.</span> |
| <span class="kw">const</span> <span class="ident">STATE_UNKNOWN</span>: <span class="ident">StatePtr</span> <span class="op">=</span> <span class="number">1</span><span class="op"><<</span><span class="number">31</span>; |
| |
| <span class="doccomment">/// A dead state means that the state has been computed and it is known that</span> |
| <span class="doccomment">/// once it is entered, no future match can ever occur.</span> |
| <span class="kw">const</span> <span class="ident">STATE_DEAD</span>: <span class="ident">StatePtr</span> <span class="op">=</span> <span class="ident">STATE_UNKNOWN</span> <span class="op">+</span> <span class="number">1</span>; |
| |
| <span class="doccomment">/// A quit state means that the DFA came across some input that it doesn't</span> |
| <span class="doccomment">/// know how to process correctly. The DFA should quit and another matching</span> |
| <span class="doccomment">/// engine should be run in its place.</span> |
| <span class="kw">const</span> <span class="ident">STATE_QUIT</span>: <span class="ident">StatePtr</span> <span class="op">=</span> <span class="ident">STATE_DEAD</span> <span class="op">+</span> <span class="number">1</span>; |
| |
| <span class="doccomment">/// A start state is a state that the DFA can start in.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Note that start states have their lower bits set to a state pointer.</span> |
| <span class="kw">const</span> <span class="ident">STATE_START</span>: <span class="ident">StatePtr</span> <span class="op">=</span> <span class="number">1</span><span class="op"><<</span><span class="number">30</span>; |
| |
| <span class="doccomment">/// A match state means that the regex has successfully matched.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Note that match states have their lower bits set to a state pointer.</span> |
| <span class="kw">const</span> <span class="ident">STATE_MATCH</span>: <span class="ident">StatePtr</span> <span class="op">=</span> <span class="number">1</span><span class="op"><<</span><span class="number">29</span>; |
| |
| <span class="doccomment">/// The maximum state pointer. This is useful to mask out the "valid" state</span> |
| <span class="doccomment">/// pointer from a state with the "start" or "match" bits set.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// It doesn't make sense to use this with unknown, dead or quit state</span> |
| <span class="doccomment">/// pointers, since those pointers are sentinels and never have their lower</span> |
| <span class="doccomment">/// bits set to anything meaningful.</span> |
| <span class="kw">const</span> <span class="ident">STATE_MAX</span>: <span class="ident">StatePtr</span> <span class="op">=</span> <span class="ident">STATE_MATCH</span> <span class="op">-</span> <span class="number">1</span>; |
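// Illustrative sketch only. It reuses a subset of the constants above to
// show why one comparison against STATE_MAX is enough to leave the tight
// loop: sentinels and flagged pointers all compare greater than STATE_MAX.
// The two-class alphabet (`class`), the hand-built table (`demo_table`) and
// `fast_loop` are hypothetical and far simpler than the crate's real loop.

```rust
type StatePtr = u32;

const STATE_UNKNOWN: StatePtr = 1 << 31;
const STATE_DEAD: StatePtr = STATE_UNKNOWN + 1;
const STATE_MATCH: StatePtr = 1 << 29;
const STATE_MAX: StatePtr = STATE_MATCH - 1;

/// Hypothetical byte classes: class 0 is `a`, class 1 is everything else.
fn class(b: u8) -> usize {
    if b == b'a' { 0 } else { 1 }
}

/// Three rows of stride 2. Row pointers are 0, 2 and 4.
/// Row 0: `a` goes to row 2, anything else is dead.
/// Row 2: `a` stays, anything else goes to row 4 with the match flag set.
/// Row 4: everything is dead.
fn demo_table() -> Vec<StatePtr> {
    vec![2, STATE_DEAD, 2, STATE_MATCH | 4, STATE_DEAD, STATE_DEAD]
}

/// Follows transitions until input runs out or a special pointer appears.
/// Returns the last pointer seen and the number of bytes consumed.
fn fast_loop(
    table: &[StatePtr],
    mut si: StatePtr,
    text: &[u8],
) -> (StatePtr, usize) {
    let mut i = 0;
    while si <= STATE_MAX && i < text.len() {
        si = table[si as usize + class(text[i])];
        i += 1;
    }
    (si, i)
}
```

// Once the loop exits, the caller can test `si & STATE_MATCH` and recover
// the row pointer with `si & STATE_MAX`. As the comment on STATE_MAX says,
// that mask is only meaningful for flagged pointers, not for sentinels.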
| |
| <span class="doccomment">/// Byte is a u8 in spirit, but a u16 in practice so that we can represent the</span> |
| <span class="doccomment">/// special EOF sentinel value.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Copy</span>, <span class="ident">Clone</span>, <span class="ident">Debug</span>)]</span> |
| <span class="kw">struct</span> <span class="ident">Byte</span>(<span class="ident">u16</span>); |
| |
| <span class="doccomment">/// A set of flags for zero-width assertions.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>, <span class="ident">Copy</span>, <span class="ident">Eq</span>, <span class="ident">Debug</span>, <span class="ident">Default</span>, <span class="ident">Hash</span>, <span class="ident">PartialEq</span>)]</span> |
| <span class="kw">struct</span> <span class="ident">EmptyFlags</span> { |
| <span class="ident">start</span>: <span class="ident">bool</span>, |
| <span class="ident">end</span>: <span class="ident">bool</span>, |
| <span class="ident">start_line</span>: <span class="ident">bool</span>, |
| <span class="ident">end_line</span>: <span class="ident">bool</span>, |
| <span class="ident">word_boundary</span>: <span class="ident">bool</span>, |
| <span class="ident">not_word_boundary</span>: <span class="ident">bool</span>, |
| } |
| |
| <span class="doccomment">/// A set of flags describing various configurations of a DFA state. This is</span> |
| <span class="doccomment">/// represented by a `u8` so that it is compact.</span> |
| <span class="attribute">#[<span class="ident">derive</span>(<span class="ident">Clone</span>, <span class="ident">Copy</span>, <span class="ident">Eq</span>, <span class="ident">Default</span>, <span class="ident">Hash</span>, <span class="ident">PartialEq</span>)]</span> |
| <span class="kw">struct</span> <span class="ident">StateFlags</span>(<span class="ident">u8</span>); |
| |
| <span class="kw">impl</span> <span class="ident">Cache</span> { |
| <span class="doccomment">/// Create a new empty cache for the DFA engine.</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">new</span>(<span class="ident">prog</span>: <span class="kw-2">&</span><span class="ident">Program</span>) <span class="op">-></span> <span class="self">Self</span> { |
| <span class="comment">// We add 1 to account for the special EOF byte.</span> |
| <span class="kw">let</span> <span class="ident">num_byte_classes</span> <span class="op">=</span> (<span class="ident">prog</span>.<span class="ident">byte_classes</span>[<span class="number">255</span>] <span class="kw">as</span> <span class="ident">usize</span> <span class="op">+</span> <span class="number">1</span>) <span class="op">+</span> <span class="number">1</span>; |
| <span class="kw">let</span> <span class="ident">starts</span> <span class="op">=</span> <span class="macro">vec</span><span class="macro">!</span>[<span class="ident">STATE_UNKNOWN</span>; <span class="number">256</span>]; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="ident">Cache</span> { |
| <span class="ident">inner</span>: <span class="ident">CacheInner</span> { |
| <span class="ident">compiled</span>: <span class="ident">HashMap</span>::<span class="ident">new</span>(), |
| <span class="ident">trans</span>: <span class="ident">Transitions</span>::<span class="ident">new</span>(<span class="ident">num_byte_classes</span>), |
| <span class="ident">states</span>: <span class="macro">vec</span><span class="macro">!</span>[], |
| <span class="ident">start_states</span>: <span class="ident">starts</span>, |
| <span class="ident">stack</span>: <span class="macro">vec</span><span class="macro">!</span>[], |
| <span class="ident">flush_count</span>: <span class="number">0</span>, |
| <span class="ident">size</span>: <span class="number">0</span>, |
| }, |
| <span class="ident">qcur</span>: <span class="ident">SparseSet</span>::<span class="ident">new</span>(<span class="ident">prog</span>.<span class="ident">insts</span>.<span class="ident">len</span>()), |
| <span class="ident">qnext</span>: <span class="ident">SparseSet</span>::<span class="ident">new</span>(<span class="ident">prog</span>.<span class="ident">insts</span>.<span class="ident">len</span>()), |
| }; |
| <span class="ident">cache</span>.<span class="ident">inner</span>.<span class="ident">reset_size</span>(); |
| <span class="ident">cache</span> |
| } |
| } |
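
// Illustrative sketch (not part of the regex crate): why `Cache::new` above
// computes the alphabet size as "largest byte class + 1, plus 1 for EOF".
// It assumes class numbers never decrease as the byte value grows, so the
// entry for byte 255 holds the largest class. The flat row-major table
// layout in `transition_offset` is also an assumption made for illustration.
// All names in this module are hypothetical.
#[allow(dead_code)]
mod byte_class_sketch {
    /// Builds a toy table with three classes: bytes below b'a' are class 0,
    /// b'a' through b'z' are class 1, and every later byte is class 2.
    pub fn toy_classes() -> Vec<u8> {
        (0u8..=255)
            .map(|b| {
                if b < b'a' {
                    0
                } else if b <= b'z' {
                    1
                } else {
                    2
                }
            })
            .collect()
    }

    /// Number of distinct classes, plus one extra slot for the EOF sentinel.
    pub fn num_byte_classes(byte_classes: &[u8]) -> usize {
        (byte_classes[255] as usize + 1) + 1
    }

    /// Offset of a transition in a flat table with one row per state and
    /// one column per byte class (including the EOF column).
    pub fn transition_offset(state: usize, stride: usize, class: usize) -> usize {
        state * stride + class
    }
}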
| |
| <span class="kw">impl</span> <span class="ident">CacheInner</span> { |
| <span class="doccomment">/// Resets the cache size to account for fixed costs, such as the program</span> |
| <span class="doccomment">/// and stack sizes.</span> |
| <span class="kw">fn</span> <span class="ident">reset_size</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) { |
| <span class="self">self</span>.<span class="ident">size</span> <span class="op">=</span> |
| (<span class="self">self</span>.<span class="ident">start_states</span>.<span class="ident">len</span>() <span class="op">*</span> <span class="ident">mem</span>::<span class="ident">size_of</span>::<span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span>()) |
| <span class="op">+</span> (<span class="self">self</span>.<span class="ident">stack</span>.<span class="ident">len</span>() <span class="op">*</span> <span class="ident">mem</span>::<span class="ident">size_of</span>::<span class="op"><</span><span class="ident">InstPtr</span><span class="op">></span>()); |
| } |
| } |
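
// Illustrative sketch (not part of the regex crate): the fixed-cost
// arithmetic that `reset_size` above performs. It assumes that both
// `StatePtr` and `InstPtr` are 32-bit integers. The module and function
// names are hypothetical.
#[allow(dead_code)]
mod cache_size_sketch {
    use std::mem;

    /// Bytes used by the start-state table and the scratch stack.
    pub fn fixed_cost(start_states_len: usize, stack_len: usize) -> usize {
        (start_states_len * mem::size_of::<u32>())
            + (stack_len * mem::size_of::<u32>())
    }
}
// Under those assumptions, a fresh cache with 256 start-state slots and an
// empty stack has a fixed cost of 256 * 4 bytes.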
| |
| <span class="kw">impl</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> <span class="ident">Fsm</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> { |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> <span class="comment">// reduces constant overhead</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">forward</span>( |
| <span class="ident">prog</span>: <span class="kw-2">&</span><span class="lifetime">'a</span> <span class="ident">Program</span>, |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="ident">ProgramCache</span>, |
| <span class="ident">quit_after_match</span>: <span class="ident">bool</span>, |
| <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], |
| <span class="ident">at</span>: <span class="ident">usize</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="ident">cache</span>.<span class="ident">borrow_mut</span>(); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">dfa</span>; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">dfa</span> <span class="op">=</span> <span class="ident">Fsm</span> { |
| <span class="ident">prog</span>: <span class="ident">prog</span>, |
| <span class="ident">start</span>: <span class="number">0</span>, <span class="comment">// filled in below</span> |
| <span class="ident">at</span>: <span class="ident">at</span>, |
| <span class="ident">quit_after_match</span>: <span class="ident">quit_after_match</span>, |
| <span class="ident">last_match_si</span>: <span class="ident">STATE_UNKNOWN</span>, |
| <span class="ident">last_cache_flush</span>: <span class="ident">at</span>, |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">inner</span>, |
| }; |
| <span class="kw">let</span> (<span class="ident">empty_flags</span>, <span class="ident">state_flags</span>) <span class="op">=</span> <span class="ident">dfa</span>.<span class="ident">start_flags</span>(<span class="ident">text</span>, <span class="ident">at</span>); |
| <span class="ident">dfa</span>.<span class="ident">start</span> <span class="op">=</span> <span class="kw">match</span> <span class="ident">dfa</span>.<span class="ident">start_state</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qcur</span>, |
| <span class="ident">empty_flags</span>, |
| <span class="ident">state_flags</span>, |
| ) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">at</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">dfa</span>.<span class="ident">start</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="ident">dfa</span>.<span class="ident">exec_at</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qcur</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qnext</span>, <span class="ident">text</span>) |
| } |
| |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> <span class="comment">// reduces constant overhead</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">reverse</span>( |
| <span class="ident">prog</span>: <span class="kw-2">&</span><span class="lifetime">'a</span> <span class="ident">Program</span>, |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="ident">ProgramCache</span>, |
| <span class="ident">quit_after_match</span>: <span class="ident">bool</span>, |
| <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], |
| <span class="ident">at</span>: <span class="ident">usize</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="ident">cache</span>.<span class="ident">borrow_mut</span>(); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">dfa_reverse</span>; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">dfa</span> <span class="op">=</span> <span class="ident">Fsm</span> { |
| <span class="ident">prog</span>: <span class="ident">prog</span>, |
| <span class="ident">start</span>: <span class="number">0</span>, <span class="comment">// filled in below</span> |
| <span class="ident">at</span>: <span class="ident">at</span>, |
| <span class="ident">quit_after_match</span>: <span class="ident">quit_after_match</span>, |
| <span class="ident">last_match_si</span>: <span class="ident">STATE_UNKNOWN</span>, |
| <span class="ident">last_cache_flush</span>: <span class="ident">at</span>, |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">inner</span>, |
| }; |
| <span class="kw">let</span> (<span class="ident">empty_flags</span>, <span class="ident">state_flags</span>) <span class="op">=</span> <span class="ident">dfa</span>.<span class="ident">start_flags_reverse</span>(<span class="ident">text</span>, <span class="ident">at</span>); |
| <span class="ident">dfa</span>.<span class="ident">start</span> <span class="op">=</span> <span class="kw">match</span> <span class="ident">dfa</span>.<span class="ident">start_state</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qcur</span>, |
| <span class="ident">empty_flags</span>, |
| <span class="ident">state_flags</span>, |
| ) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">at</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">dfa</span>.<span class="ident">start</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="ident">dfa</span>.<span class="ident">exec_at_reverse</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qcur</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qnext</span>, <span class="ident">text</span>) |
| } |
| |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> <span class="comment">// reduces constant overhead</span> |
| <span class="kw">pub</span> <span class="kw">fn</span> <span class="ident">forward_many</span>( |
| <span class="ident">prog</span>: <span class="kw-2">&</span><span class="lifetime">'a</span> <span class="ident">Program</span>, |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="ident">ProgramCache</span>, |
| <span class="ident">matches</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> [<span class="ident">bool</span>], |
| <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], |
| <span class="ident">at</span>: <span class="ident">usize</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">matches</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="ident">prog</span>.<span class="ident">matches</span>.<span class="ident">len</span>()); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="ident">cache</span>.<span class="ident">borrow_mut</span>(); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">dfa</span>; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">dfa</span> <span class="op">=</span> <span class="ident">Fsm</span> { |
| <span class="ident">prog</span>: <span class="ident">prog</span>, |
| <span class="ident">start</span>: <span class="number">0</span>, <span class="comment">// filled in below</span> |
| <span class="ident">at</span>: <span class="ident">at</span>, |
| <span class="ident">quit_after_match</span>: <span class="bool-val">false</span>, |
| <span class="ident">last_match_si</span>: <span class="ident">STATE_UNKNOWN</span>, |
| <span class="ident">last_cache_flush</span>: <span class="ident">at</span>, |
| <span class="ident">cache</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">inner</span>, |
| }; |
| <span class="kw">let</span> (<span class="ident">empty_flags</span>, <span class="ident">state_flags</span>) <span class="op">=</span> <span class="ident">dfa</span>.<span class="ident">start_flags</span>(<span class="ident">text</span>, <span class="ident">at</span>); |
| <span class="ident">dfa</span>.<span class="ident">start</span> <span class="op">=</span> <span class="kw">match</span> <span class="ident">dfa</span>.<span class="ident">start_state</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qcur</span>, |
| <span class="ident">empty_flags</span>, |
| <span class="ident">state_flags</span>, |
| ) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">at</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">dfa</span>.<span class="ident">start</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="kw">let</span> <span class="ident">result</span> <span class="op">=</span> <span class="ident">dfa</span>.<span class="ident">exec_at</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qcur</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">cache</span>.<span class="ident">qnext</span>, <span class="ident">text</span>); |
| <span class="kw">if</span> <span class="ident">result</span>.<span class="ident">is_match</span>() { |
| <span class="kw">if</span> <span class="ident">matches</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="number">1</span> { |
| <span class="ident">matches</span>[<span class="number">0</span>] <span class="op">=</span> <span class="bool-val">true</span>; |
| } <span class="kw">else</span> { |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">dfa</span>.<span class="ident">last_match_si</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">dfa</span>.<span class="ident">last_match_si</span> <span class="op">!=</span> <span class="ident">STATE_DEAD</span>); |
| <span class="kw">for</span> <span class="ident">ip</span> <span class="kw">in</span> <span class="ident">dfa</span>.<span class="ident">state</span>(<span class="ident">dfa</span>.<span class="ident">last_match_si</span>).<span class="ident">inst_ptrs</span>() { |
| <span class="kw">if</span> <span class="kw">let</span> <span class="ident">Inst</span>::<span class="ident">Match</span>(<span class="ident">slot</span>) <span class="op">=</span> <span class="ident">dfa</span>.<span class="ident">prog</span>[<span class="ident">ip</span>] { |
| <span class="ident">matches</span>[<span class="ident">slot</span>] <span class="op">=</span> <span class="bool-val">true</span>; |
| } |
| } |
| } |
| } |
| <span class="ident">result</span> |
| } |
| |
| <span class="doccomment">/// Executes the DFA on a forward NFA.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// {qcur,qnext} are scratch ordered sets which may be non-empty.</span> |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> <span class="comment">// reduces constant overhead</span> |
| <span class="kw">fn</span> <span class="ident">exec_at</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">qcur</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">qnext</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], |
| ) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="comment">// For the most part, the DFA is basically:</span> |
| <span class="comment">//</span> |
| <span class="comment">// last_match = null</span> |
| <span class="comment">// while current_byte != EOF:</span> |
| <span class="comment">// si = current_state.next[current_byte]</span> |
| <span class="comment">// if si is match</span> |
| <span class="comment">// last_match = si</span> |
| <span class="comment">// return last_match</span> |
| <span class="comment">//</span> |
| <span class="comment">// However, we need to deal with a few things:</span> |
| <span class="comment">//</span> |
| <span class="comment">// 1. This is an *online* DFA, so the current state's next list</span> |
| <span class="comment">// may not point anywhere yet, so we must go out and compute the</span> |
| <span class="comment">// missing transitions. (They are then cached in the current</span> |
| <span class="comment">// state's next list to avoid re-computation.)</span> |
| <span class="comment">// 2. If we come across a state that is known to be dead (i.e., never</span> |
| <span class="comment">// leads to a match), then we can quit early.</span> |
| <span class="comment">// 3. If the caller just wants to know if a match occurs, then we</span> |
| <span class="comment">// can quit as soon as we know we have a match. (Full</span> |
| <span class="comment">// leftmost-first semantics require continuing on.)</span> |
| <span class="comment">// 4. If we're in the start state, then we can use a pre-computed set</span> |
| <span class="comment">// of prefix literals to skip quickly along the input.</span> |
| <span class="comment">// 5. After the input is exhausted, we run the DFA on one symbol</span> |
| <span class="comment">// that stands for EOF. This is useful for handling zero-width</span> |
| <span class="comment">// assertions.</span> |
| <span class="comment">// 6. We can't actually do state.next[byte]. Instead, we have to do</span> |
| <span class="comment">// state.next[byte_classes[byte]], which permits us to keep the</span> |
| <span class="comment">// 'next' list very small.</span> |
| <span class="comment">//</span> |
| <span class="comment">// Since there's a bunch of extra stuff we need to consider, we do some</span> |
| <span class="comment">// pretty hairy tricks to get the inner loop to run as fast as</span> |
| <span class="comment">// possible.</span> |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="op">!</span><span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">is_reverse</span>); |
| |
| <span class="comment">// The last match is the currently known ending match position. It is</span> |
| <span class="comment">// reported as an index to the most recent byte that resulted in a</span> |
| <span class="comment">// transition to a match state and is always stored in capture slot `1`</span> |
| <span class="comment">// when searching forwards. Its maximum value is `text.len()`.</span> |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="self">self</span>.<span class="ident">at</span>); |
| <span class="kw">let</span> (<span class="kw-2">mut</span> <span class="ident">prev_si</span>, <span class="kw-2">mut</span> <span class="ident">next_si</span>) <span class="op">=</span> (<span class="self">self</span>.<span class="ident">start</span>, <span class="self">self</span>.<span class="ident">start</span>); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">at</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">at</span>; |
| <span class="kw">while</span> <span class="ident">at</span> <span class="op"><</span> <span class="ident">text</span>.<span class="ident">len</span>() { |
| <span class="comment">// This is the real inner loop. We take advantage of special bits</span> |
| <span class="comment">// set in the state pointer to determine whether a state is in the</span> |
| <span class="comment">// "common" case or not. Specifically, the common case is a</span> |
| <span class="comment">// non-match non-start non-dead state that has already been</span> |
| <span class="comment">// computed. So long as we remain in the common case, this inner</span> |
| <span class="comment">// loop will chew through the input.</span> |
| <span class="comment">//</span> |
| <span class="comment">// We also unroll the loop 4 times to amortize the cost of checking</span> |
| <span class="comment">// whether we've consumed the entire input. We are also careful</span> |
| <span class="comment">// to make sure that `prev_si` always represents the previous state</span> |
| <span class="comment">// and `next_si` always represents the next state after the loop</span> |
| <span class="comment">// exits, even if it isn't always true inside the loop.</span> |
| <span class="kw">while</span> <span class="ident">next_si</span> <span class="op"><=</span> <span class="ident">STATE_MAX</span> <span class="op">&&</span> <span class="ident">at</span> <span class="op"><</span> <span class="ident">text</span>.<span class="ident">len</span>() { |
| <span class="comment">// Argument for safety is in the definition of next_si.</span> |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">next_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="ident">at</span> <span class="op">+=</span> <span class="number">1</span>; |
| <span class="kw">if</span> <span class="ident">prev_si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> <span class="op">||</span> <span class="ident">at</span> <span class="op">+</span> <span class="number">2</span> <span class="op">>=</span> <span class="ident">text</span>.<span class="ident">len</span>() { |
| <span class="ident">mem</span>::<span class="ident">swap</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev_si</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">next_si</span>); |
| <span class="kw">break</span>; |
| } |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">prev_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="ident">at</span> <span class="op">+=</span> <span class="number">1</span>; |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> { |
| <span class="kw">break</span>; |
| } |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">next_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="ident">at</span> <span class="op">+=</span> <span class="number">1</span>; |
| <span class="kw">if</span> <span class="ident">prev_si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> { |
| <span class="ident">mem</span>::<span class="ident">swap</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev_si</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">next_si</span>); |
| <span class="kw">break</span>; |
| } |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">prev_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="ident">at</span> <span class="op">+=</span> <span class="number">1</span>; |
| } |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="comment">// A match state is outside of the common case because it needs</span> |
| <span class="comment">// special case analysis. In particular, we need to record the</span> |
| <span class="comment">// last position as having matched and possibly quit the DFA if</span> |
| <span class="comment">// we don't need to keep matching.</span> |
| <span class="ident">next_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>; |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">at</span> <span class="op">-</span> <span class="number">1</span>); |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">quit_after_match</span> { |
| <span class="kw">return</span> <span class="ident">result</span>; |
| } |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| |
| <span class="comment">// This permits short-circuiting when matching a regex set.</span> |
| <span class="comment">// In particular, if this DFA state contains only match states,</span> |
| <span class="comment">// then it's impossible to extend the set of matches since</span> |
| <span class="comment">// match states are final. Therefore, we can quit.</span> |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">matches</span>.<span class="ident">len</span>() <span class="op">></span> <span class="number">1</span> { |
| <span class="kw">let</span> <span class="ident">state</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="ident">next_si</span>); |
| <span class="kw">let</span> <span class="ident">just_matches</span> <span class="op">=</span> <span class="ident">state</span>.<span class="ident">inst_ptrs</span>() |
| .<span class="ident">all</span>(<span class="op">|</span><span class="ident">ip</span><span class="op">|</span> <span class="self">self</span>.<span class="ident">prog</span>[<span class="ident">ip</span>].<span class="ident">is_match</span>()); |
| <span class="kw">if</span> <span class="ident">just_matches</span> { |
| <span class="kw">return</span> <span class="ident">result</span>; |
| } |
| } |
| |
| <span class="comment">// Another inner loop! If the DFA stays in this particular</span> |
| <span class="comment">// match state, then we can rip through all of the input</span> |
| <span class="comment">// very quickly, and only record the match location once</span> |
| <span class="comment">// we've left this particular state.</span> |
| <span class="kw">let</span> <span class="ident">cur</span> <span class="op">=</span> <span class="ident">at</span>; |
| <span class="kw">while</span> (<span class="ident">next_si</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>) <span class="op">==</span> <span class="ident">prev_si</span> |
| <span class="op">&&</span> <span class="ident">at</span> <span class="op">+</span> <span class="number">2</span> <span class="op"><</span> <span class="ident">text</span>.<span class="ident">len</span>() { |
| <span class="comment">// Argument for safety is in the definition of next_si.</span> |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { |
| <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">next_si</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>, <span class="ident">text</span>, <span class="ident">at</span>) |
| }; |
| <span class="ident">at</span> <span class="op">+=</span> <span class="number">1</span>; |
| } |
| <span class="kw">if</span> <span class="ident">at</span> <span class="op">></span> <span class="ident">cur</span> { |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">at</span> <span class="op">-</span> <span class="number">2</span>); |
| } |
| } <span class="kw">else</span> <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">&</span> <span class="ident">STATE_START</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="comment">// A start state isn't in the common case because we may</span> |
| <span class="comment">// want to do quick prefix scanning. If the program doesn't</span> |
| <span class="comment">// have a detected prefix, then start states are actually</span> |
| <span class="comment">// considered common and this case is never reached.</span> |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="self">self</span>.<span class="ident">has_prefix</span>()); |
| <span class="ident">next_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_START</span>; |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| <span class="ident">at</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">prefix_at</span>(<span class="ident">text</span>, <span class="ident">at</span>) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="ident">text</span>.<span class="ident">len</span>()), |
| <span class="prelude-val">Some</span>(<span class="ident">i</span>) <span class="op">=></span> <span class="ident">i</span>, |
| }; |
| } <span class="kw">else</span> <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">>=</span> <span class="ident">STATE_UNKNOWN</span> { |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">==</span> <span class="ident">STATE_QUIT</span> { |
| <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>; |
| } |
| <span class="comment">// Finally, this corresponds to the case where the transition</span> |
| <span class="comment">// entered a state that can never lead to a match or a state</span> |
| <span class="comment">// that hasn't been computed yet. The latter is the "slow"</span> |
| <span class="comment">// path.</span> |
| <span class="kw">let</span> <span class="ident">byte</span> <span class="op">=</span> <span class="ident">Byte</span>::<span class="ident">byte</span>(<span class="ident">text</span>[<span class="ident">at</span> <span class="op">-</span> <span class="number">1</span>]); |
| <span class="comment">// We no longer care about the special bits in the state</span> |
| <span class="comment">// pointer.</span> |
| <span class="ident">prev_si</span> <span class="op">&=</span> <span class="ident">STATE_MAX</span>; |
| <span class="comment">// Record where we are. This is used to track progress for</span> |
| <span class="comment">// determining whether we should quit if we've flushed the</span> |
| <span class="comment">// cache too much.</span> |
| <span class="self">self</span>.<span class="ident">at</span> <span class="op">=</span> <span class="ident">at</span>; |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">next_state</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>, <span class="ident">prev_si</span>, <span class="ident">byte</span>) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="ident">result</span>.<span class="ident">set_non_match</span>(<span class="ident">at</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">next_si</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">next_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>; |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">at</span> <span class="op">-</span> <span class="number">1</span>); |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">quit_after_match</span> { |
| <span class="kw">return</span> <span class="ident">result</span>; |
| } |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| } |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| } <span class="kw">else</span> { |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| } |
| } |
| |
| <span class="comment">// Run the DFA once more on the special EOF sentinel value.</span> |
| <span class="comment">// We don't care about the special bits in the state pointer any more,</span> |
| <span class="comment">// so get rid of them.</span> |
| <span class="ident">prev_si</span> <span class="op">&=</span> <span class="ident">STATE_MAX</span>; |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">next_state</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>, <span class="ident">prev_si</span>, <span class="ident">Byte</span>::<span class="ident">eof</span>()) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="ident">result</span>.<span class="ident">set_non_match</span>(<span class="ident">text</span>.<span class="ident">len</span>()), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_START</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">prev_si</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="kw">if</span> <span class="ident">prev_si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">prev_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>; |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="ident">prev_si</span>; |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">text</span>.<span class="ident">len</span>()); |
| } |
| <span class="ident">result</span> |
| } |
| |
| <span class="doccomment">/// Executes the DFA on a reverse NFA.</span> |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> <span class="comment">// reduces constant overhead</span> |
| <span class="kw">fn</span> <span class="ident">exec_at_reverse</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">qcur</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">qnext</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], |
| ) <span class="op">-></span> <span class="prelude-ty">Result</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="comment">// The comments in `exec_at` above mostly apply here too. The main</span> |
| <span class="comment">// difference is that we move backwards over the input and we look for</span> |
| <span class="comment">// the longest possible match instead of the leftmost-first match.</span> |
| <span class="comment">//</span> |
| <span class="comment">// N.B. The code duplication here is regrettable. Efforts to improve</span> |
| <span class="comment">// it without sacrificing performance are welcome. ---AG</span> |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">is_reverse</span>); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">NoMatch</span>(<span class="self">self</span>.<span class="ident">at</span>); |
| <span class="kw">let</span> (<span class="kw-2">mut</span> <span class="ident">prev_si</span>, <span class="kw-2">mut</span> <span class="ident">next_si</span>) <span class="op">=</span> (<span class="self">self</span>.<span class="ident">start</span>, <span class="self">self</span>.<span class="ident">start</span>); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">at</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">at</span>; |
| <span class="kw">while</span> <span class="ident">at</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="kw">while</span> <span class="ident">next_si</span> <span class="op"><=</span> <span class="ident">STATE_MAX</span> <span class="op">&&</span> <span class="ident">at</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="comment">// Argument for safety is in the definition of next_si.</span> |
| <span class="ident">at</span> <span class="op">-=</span> <span class="number">1</span>; |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">next_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="kw">if</span> <span class="ident">prev_si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> <span class="op">||</span> <span class="ident">at</span> <span class="op"><=</span> <span class="number">4</span> { |
| <span class="ident">mem</span>::<span class="ident">swap</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev_si</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">next_si</span>); |
| <span class="kw">break</span>; |
| } |
| <span class="ident">at</span> <span class="op">-=</span> <span class="number">1</span>; |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">prev_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> { |
| <span class="kw">break</span>; |
| } |
| <span class="ident">at</span> <span class="op">-=</span> <span class="number">1</span>; |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">next_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| <span class="kw">if</span> <span class="ident">prev_si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> { |
| <span class="ident">mem</span>::<span class="ident">swap</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev_si</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">next_si</span>); |
| <span class="kw">break</span>; |
| } |
| <span class="ident">at</span> <span class="op">-=</span> <span class="number">1</span>; |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">prev_si</span>, <span class="ident">text</span>, <span class="ident">at</span>) }; |
| } |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">next_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>; |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">at</span> <span class="op">+</span> <span class="number">1</span>); |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">quit_after_match</span> { |
| <span class="kw">return</span> <span class="ident">result</span>; |
| } |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| <span class="kw">let</span> <span class="ident">cur</span> <span class="op">=</span> <span class="ident">at</span>; |
| <span class="kw">while</span> (<span class="ident">next_si</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>) <span class="op">==</span> <span class="ident">prev_si</span> <span class="op">&&</span> <span class="ident">at</span> <span class="op">>=</span> <span class="number">2</span> { |
| <span class="comment">// Argument for safety is in the definition of next_si.</span> |
| <span class="ident">at</span> <span class="op">-=</span> <span class="number">1</span>; |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">unsafe</span> { |
| <span class="self">self</span>.<span class="ident">next_si</span>(<span class="ident">next_si</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>, <span class="ident">text</span>, <span class="ident">at</span>) |
| }; |
| } |
| <span class="kw">if</span> <span class="ident">at</span> <span class="op"><</span> <span class="ident">cur</span> { |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">at</span> <span class="op">+</span> <span class="number">2</span>); |
| } |
| } <span class="kw">else</span> <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">>=</span> <span class="ident">STATE_UNKNOWN</span> { |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">==</span> <span class="ident">STATE_QUIT</span> { |
| <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>; |
| } |
| <span class="kw">let</span> <span class="ident">byte</span> <span class="op">=</span> <span class="ident">Byte</span>::<span class="ident">byte</span>(<span class="ident">text</span>[<span class="ident">at</span>]); |
| <span class="ident">prev_si</span> <span class="op">&=</span> <span class="ident">STATE_MAX</span>; |
| <span class="self">self</span>.<span class="ident">at</span> <span class="op">=</span> <span class="ident">at</span>; |
| <span class="ident">next_si</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">next_state</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>, <span class="ident">prev_si</span>, <span class="ident">byte</span>) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="ident">result</span>.<span class="ident">set_non_match</span>(<span class="ident">at</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">next_si</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="kw">if</span> <span class="ident">next_si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">next_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>; |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="ident">at</span> <span class="op">+</span> <span class="number">1</span>); |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">quit_after_match</span> { |
| <span class="kw">return</span> <span class="ident">result</span>; |
| } |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| } |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| } <span class="kw">else</span> { |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="ident">next_si</span>; |
| } |
| } |
| |
| <span class="comment">// Run the DFA once more on the special EOF sentinel value.</span> |
| <span class="ident">prev_si</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">next_state</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>, <span class="ident">prev_si</span>, <span class="ident">Byte</span>::<span class="ident">eof</span>()) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-ty">Result</span>::<span class="ident">Quit</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>) <span class="op">=></span> <span class="kw">return</span> <span class="ident">result</span>.<span class="ident">set_non_match</span>(<span class="number">0</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">prev_si</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="kw">if</span> <span class="ident">prev_si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">prev_si</span> <span class="op">&=</span> <span class="op">!</span><span class="ident">STATE_MATCH</span>; |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="ident">prev_si</span>; |
| <span class="ident">result</span> <span class="op">=</span> <span class="prelude-ty">Result</span>::<span class="ident">Match</span>(<span class="number">0</span>); |
| } |
| <span class="ident">result</span> |
| } |
| |
| <span class="doccomment">/// next_si transitions to the next state, where the transition input</span> |
| <span class="doccomment">/// corresponds to text[i].</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This elides bounds checks, and is therefore unsafe.</span> |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> |
| <span class="kw">unsafe</span> <span class="kw">fn</span> <span class="ident">next_si</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">si</span>: <span class="ident">StatePtr</span>, <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], <span class="ident">i</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="ident">StatePtr</span> { |
| <span class="comment">// What is the argument for safety here?</span> |
| <span class="comment">// We have three unchecked accesses that could possibly violate safety:</span> |
| <span class="comment">//</span> |
| <span class="comment">// 1. The given byte of input (`text[i]`).</span> |
| <span class="comment">// 2. The class of the byte of input (`classes[text[i]]`).</span> |
| <span class="comment">// 3. The transition for the class (`trans[si + cls]`).</span> |
| <span class="comment">//</span> |
| <span class="comment">// (1) is only safe when calling next_si is guarded by</span> |
| <span class="comment">// `i < text.len()`.</span> |
| <span class="comment">//</span> |
| <span class="comment">// (2) is the easiest case to guarantee since `text[i]` is always a</span> |
| <span class="comment">// `u8` and `self.prog.byte_classes` always has length 256, i.e.</span> |
| <span class="comment">// `u8::MAX + 1`. (See `ByteClassSet.byte_classes` in `compile.rs`.)</span> |
| <span class="comment">//</span> |
| <span class="comment">// (3) is only safe if (1)+(2) are safe. Namely, the transitions</span> |
| <span class="comment">// of every state are defined to have length equal to the number of</span> |
| <span class="comment">// byte classes in the program. Therefore, a valid class leads to a</span> |
| <span class="comment">// valid transition. (All possible transitions are valid lookups, even</span> |
| <span class="comment">// if they point to a state that hasn't been computed yet.) (3) also</span> |
| <span class="comment">// relies on `si` being correct, but StatePtrs should only ever be</span> |
| <span class="comment">// retrieved from the transition table, which ensures they are correct.</span> |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">i</span> <span class="op"><</span> <span class="ident">text</span>.<span class="ident">len</span>()); |
| <span class="kw">let</span> <span class="ident">b</span> <span class="op">=</span> <span class="kw-2">*</span><span class="ident">text</span>.<span class="ident">get_unchecked</span>(<span class="ident">i</span>); |
| <span class="macro">debug_assert</span><span class="macro">!</span>((<span class="ident">b</span> <span class="kw">as</span> <span class="ident">usize</span>) <span class="op"><</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">byte_classes</span>.<span class="ident">len</span>()); |
| <span class="kw">let</span> <span class="ident">cls</span> <span class="op">=</span> <span class="kw-2">*</span><span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">byte_classes</span>.<span class="ident">get_unchecked</span>(<span class="ident">b</span> <span class="kw">as</span> <span class="ident">usize</span>); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">next_unchecked</span>(<span class="ident">si</span>, <span class="ident">cls</span> <span class="kw">as</span> <span class="ident">usize</span>) |
| } |
| |
| <span class="doccomment">/// Computes the next state given the current state and the current input</span> |
| <span class="doccomment">/// byte (which may be EOF).</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If STATE_DEAD is returned, then there is no valid state transition.</span> |
| <span class="doccomment">/// This implies that no permutation of future input can lead to a match</span> |
| <span class="doccomment">/// state.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// STATE_UNKNOWN can never be returned.</span> |
| <span class="kw">fn</span> <span class="ident">exec_byte</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">qcur</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">qnext</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="kw-2">mut</span> <span class="ident">si</span>: <span class="ident">StatePtr</span>, |
| <span class="ident">b</span>: <span class="ident">Byte</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="kw">use</span> <span class="ident">prog</span>::<span class="ident">Inst</span>::<span class="kw-2">*</span>; |
| |
| <span class="comment">// Initialize a queue with the current DFA state's NFA states.</span> |
| <span class="ident">qcur</span>.<span class="ident">clear</span>(); |
| <span class="kw">for</span> <span class="ident">ip</span> <span class="kw">in</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="ident">si</span>).<span class="ident">inst_ptrs</span>() { |
| <span class="ident">qcur</span>.<span class="ident">insert</span>(<span class="ident">ip</span>); |
| } |
| |
| <span class="comment">// Before inspecting the current byte, we may need to also inspect</span> |
| <span class="comment">// whether the position immediately preceding the current byte</span> |
| <span class="comment">// satisfies the empty assertions found in the current state.</span> |
| <span class="comment">//</span> |
| <span class="comment">// We only need to do this step if there are any empty assertions in</span> |
| <span class="comment">// the current state.</span> |
| <span class="kw">let</span> <span class="ident">is_word_last</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="ident">si</span>).<span class="ident">flags</span>().<span class="ident">is_word</span>(); |
| <span class="kw">let</span> <span class="ident">is_word</span> <span class="op">=</span> <span class="ident">b</span>.<span class="ident">is_ascii_word</span>(); |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="ident">si</span>).<span class="ident">flags</span>().<span class="ident">has_empty</span>() { |
| <span class="comment">// Compute the flags immediately preceding the current byte.</span> |
| <span class="comment">// This means we only care about the "end" or "end line" flags.</span> |
| <span class="comment">// (The "start" flags are computed immediately following the</span> |
| <span class="comment">// current byte and are handled below.)</span> |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">flags</span> <span class="op">=</span> <span class="ident">EmptyFlags</span>::<span class="ident">default</span>(); |
| <span class="kw">if</span> <span class="ident">b</span>.<span class="ident">is_eof</span>() { |
| <span class="ident">flags</span>.<span class="ident">end</span> <span class="op">=</span> <span class="bool-val">true</span>; |
| <span class="ident">flags</span>.<span class="ident">end_line</span> <span class="op">=</span> <span class="bool-val">true</span>; |
| } <span class="kw">else</span> <span class="kw">if</span> <span class="ident">b</span>.<span class="ident">as_byte</span>().<span class="ident">map_or</span>(<span class="bool-val">false</span>, <span class="op">|</span><span class="ident">b</span><span class="op">|</span> <span class="ident">b</span> <span class="op">==</span> <span class="string">b'\n'</span>) { |
| <span class="ident">flags</span>.<span class="ident">end_line</span> <span class="op">=</span> <span class="bool-val">true</span>; |
| } |
| <span class="kw">if</span> <span class="ident">is_word_last</span> <span class="op">==</span> <span class="ident">is_word</span> { |
| <span class="ident">flags</span>.<span class="ident">not_word_boundary</span> <span class="op">=</span> <span class="bool-val">true</span>; |
| } <span class="kw">else</span> { |
| <span class="ident">flags</span>.<span class="ident">word_boundary</span> <span class="op">=</span> <span class="bool-val">true</span>; |
| } |
| <span class="comment">// Now follow epsilon transitions from every NFA state, but make</span> |
| <span class="comment">// sure we only follow transitions that satisfy our flags.</span> |
| <span class="ident">qnext</span>.<span class="ident">clear</span>(); |
| <span class="kw">for</span> <span class="kw-2">&</span><span class="ident">ip</span> <span class="kw">in</span> <span class="kw-2">&</span><span class="kw-2">*</span><span class="ident">qcur</span> { |
| <span class="self">self</span>.<span class="ident">follow_epsilons</span>(<span class="ident">usize_to_u32</span>(<span class="ident">ip</span>), <span class="ident">qnext</span>, <span class="ident">flags</span>); |
| } |
| <span class="ident">mem</span>::<span class="ident">swap</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>); |
| } |
| |
| <span class="comment">// Now we set flags for immediately after the current byte. Since start</span> |
| <span class="comment">// states are processed separately, and are the only states that can</span> |
| <span class="comment">// have the StartText flag set, we therefore only need to worry about</span> |
| <span class="comment">// the StartLine flag here.</span> |
| <span class="comment">//</span> |
| <span class="comment">// We do also keep track of whether this DFA state contains an NFA state</span> |
| <span class="comment">// that is a matching state. This is precisely how we delay the DFA</span> |
| <span class="comment">// matching by one byte in order to process the special EOF sentinel</span> |
| <span class="comment">// byte. Namely, if this DFA state contains a matching NFA state,</span> |
| <span class="comment">// then it is the *next* DFA state that is marked as a match.</span> |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">empty_flags</span> <span class="op">=</span> <span class="ident">EmptyFlags</span>::<span class="ident">default</span>(); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">state_flags</span> <span class="op">=</span> <span class="ident">StateFlags</span>::<span class="ident">default</span>(); |
| <span class="ident">empty_flags</span>.<span class="ident">start_line</span> <span class="op">=</span> <span class="ident">b</span>.<span class="ident">as_byte</span>().<span class="ident">map_or</span>(<span class="bool-val">false</span>, <span class="op">|</span><span class="ident">b</span><span class="op">|</span> <span class="ident">b</span> <span class="op">==</span> <span class="string">b'\n'</span>); |
| <span class="kw">if</span> <span class="ident">b</span>.<span class="ident">is_ascii_word</span>() { |
| <span class="ident">state_flags</span>.<span class="ident">set_word</span>(); |
| } |
| <span class="comment">// Now follow all epsilon transitions again, but only after consuming</span> |
| <span class="comment">// the current byte.</span> |
| <span class="ident">qnext</span>.<span class="ident">clear</span>(); |
| <span class="kw">for</span> <span class="kw-2">&</span><span class="ident">ip</span> <span class="kw">in</span> <span class="kw-2">&</span><span class="kw-2">*</span><span class="ident">qcur</span> { |
| <span class="kw">match</span> <span class="self">self</span>.<span class="ident">prog</span>[<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>] { |
| <span class="comment">// These states never happen in a byte-based program.</span> |
| <span class="ident">Char</span>(_) <span class="op">|</span> <span class="ident">Ranges</span>(_) <span class="op">=></span> <span class="macro">unreachable</span><span class="macro">!</span>(), |
| <span class="comment">// These states are handled when following epsilon transitions.</span> |
| <span class="ident">Save</span>(_) <span class="op">|</span> <span class="ident">Split</span>(_) <span class="op">|</span> <span class="ident">EmptyLook</span>(_) <span class="op">=></span> {} |
| <span class="ident">Match</span>(_) <span class="op">=></span> { |
| <span class="ident">state_flags</span>.<span class="ident">set_match</span>(); |
| <span class="kw">if</span> <span class="op">!</span><span class="self">self</span>.<span class="ident">continue_past_first_match</span>() { |
| <span class="kw">break</span>; |
| } <span class="kw">else</span> <span class="kw">if</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">matches</span>.<span class="ident">len</span>() <span class="op">></span> <span class="number">1</span> |
| <span class="op">&&</span> <span class="op">!</span><span class="ident">qnext</span>.<span class="ident">contains</span>(<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>) { |
| <span class="comment">// If we are continuing on to find other matches,</span> |
| <span class="comment">// then keep a record of the match states we've seen.</span> |
| <span class="ident">qnext</span>.<span class="ident">insert</span>(<span class="ident">ip</span>); |
| } |
| } |
| <span class="ident">Bytes</span>(<span class="kw-2">ref</span> <span class="ident">inst</span>) <span class="op">=></span> { |
| <span class="kw">if</span> <span class="ident">b</span>.<span class="ident">as_byte</span>().<span class="ident">map_or</span>(<span class="bool-val">false</span>, <span class="op">|</span><span class="ident">b</span><span class="op">|</span> <span class="ident">inst</span>.<span class="ident">matches</span>(<span class="ident">b</span>)) { |
| <span class="self">self</span>.<span class="ident">follow_epsilons</span>( |
| <span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>, <span class="ident">qnext</span>, <span class="ident">empty_flags</span>); |
| } |
| } |
| } |
| } |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">cache</span> <span class="op">=</span> <span class="bool-val">true</span>; |
| <span class="kw">if</span> <span class="ident">b</span>.<span class="ident">is_eof</span>() <span class="op">&&</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">matches</span>.<span class="ident">len</span>() <span class="op">></span> <span class="number">1</span> { |
| <span class="comment">// If we're processing the last byte of the input and we're</span> |
| <span class="comment">// matching a regex set, then make the next state contain the</span> |
| <span class="comment">// previous state's transitions. We do this so that the main</span> |
| <span class="comment">// matching loop can extract all of the match instructions.</span> |
| <span class="ident">mem</span>::<span class="ident">swap</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>); |
| <span class="comment">// And don't cache this state because it's totally bunk.</span> |
| <span class="ident">cache</span> <span class="op">=</span> <span class="bool-val">false</span>; |
| } |
| <span class="comment">// We've now built up the set of NFA states that ought to comprise the</span> |
| <span class="comment">// next DFA state, so try to find it in the cache, and if it doesn't</span> |
| <span class="comment">// exist, cache it.</span> |
| <span class="comment">//</span> |
| <span class="comment">// N.B. We pass `&mut si` here because the cache may clear itself if</span> |
| <span class="comment">// it has gotten too full. When that happens, the location of the</span> |
| <span class="comment">// current state may change.</span> |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">next</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">cached_state</span>( |
| <span class="ident">qnext</span>, |
| <span class="ident">state_flags</span>, |
| <span class="prelude-val">Some</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">si</span>), |
| ) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-val">None</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">next</span>) <span class="op">=></span> <span class="ident">next</span>, |
| }; |
| <span class="kw">if</span> (<span class="self">self</span>.<span class="ident">start</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_START</span>) <span class="op">==</span> <span class="ident">next</span> { |
| <span class="comment">// Start states can never be match states since all matches are</span> |
| <span class="comment">// delayed by one byte.</span> |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="op">!</span><span class="self">self</span>.<span class="ident">state</span>(<span class="ident">next</span>).<span class="ident">flags</span>().<span class="ident">is_match</span>()); |
| <span class="ident">next</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">start_ptr</span>(<span class="ident">next</span>); |
| } |
| <span class="kw">if</span> <span class="ident">next</span> <span class="op"><=</span> <span class="ident">STATE_MAX</span> <span class="op">&&</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="ident">next</span>).<span class="ident">flags</span>().<span class="ident">is_match</span>() { |
| <span class="ident">next</span> <span class="op">=</span> <span class="ident">STATE_MATCH</span> <span class="op">|</span> <span class="ident">next</span>; |
| } |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">next</span> <span class="op">!=</span> <span class="ident">STATE_UNKNOWN</span>); |
| <span class="comment">// And now store our state in the current state's next list.</span> |
| <span class="kw">if</span> <span class="ident">cache</span> { |
| <span class="kw">let</span> <span class="ident">cls</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">byte_class</span>(<span class="ident">b</span>); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">set_next</span>(<span class="ident">si</span>, <span class="ident">cls</span>, <span class="ident">next</span>); |
| } |
| <span class="prelude-val">Some</span>(<span class="ident">next</span>) |
| } |
| |
| <span class="doccomment">/// Follows the epsilon transitions starting at (and including) `ip`. The</span> |
| <span class="doccomment">/// resulting states are inserted into the ordered set `q`.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Conditional epsilon transitions (i.e., empty width assertions) are only</span> |
| <span class="doccomment">/// followed if they are satisfied by the given flags, which should</span> |
| <span class="doccomment">/// represent the flags set at the current location in the input.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If the current location corresponds to the empty string, then only the</span> |
| <span class="doccomment">/// end line and/or end text flags may be set. If the current location</span> |
| <span class="doccomment">/// corresponds to a real byte in the input, then only the start line</span> |
| <span class="doccomment">/// and/or start text flags may be set.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// As an exception to the above, when finding the initial state, any of</span> |
| <span class="doccomment">/// the above flags may be set:</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If matching starts at the beginning of the input, then start text and</span> |
| <span class="doccomment">/// start line should be set. If the input is empty, then end text and end</span> |
| <span class="doccomment">/// line should also be set.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If matching starts after the beginning of the input, then only start</span> |
| <span class="doccomment">/// line should be set if the preceding byte is `\n`. End line should never</span> |
| <span class="doccomment">/// be set in this case. (Even if the following byte is a `\n`, it will</span> |
| <span class="doccomment">/// be handled in a subsequent DFA state.)</span> |
| <span class="kw">fn</span> <span class="ident">follow_epsilons</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">ip</span>: <span class="ident">InstPtr</span>, |
| <span class="ident">q</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">flags</span>: <span class="ident">EmptyFlags</span>, |
| ) { |
| <span class="kw">use</span> <span class="ident">prog</span>::<span class="ident">Inst</span>::<span class="kw-2">*</span>; |
| <span class="kw">use</span> <span class="ident">prog</span>::<span class="ident">EmptyLook</span>::<span class="kw-2">*</span>; |
| |
| <span class="comment">// We need to traverse the NFA to follow epsilon transitions, so avoid</span> |
| <span class="comment">// recursion with an explicit stack.</span> |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">ip</span>); |
| <span class="kw">while</span> <span class="kw">let</span> <span class="prelude-val">Some</span>(<span class="ident">ip</span>) <span class="op">=</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">pop</span>() { |
| <span class="comment">// Don't visit states we've already added.</span> |
| <span class="kw">if</span> <span class="ident">q</span>.<span class="ident">contains</span>(<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>) { |
| <span class="kw">continue</span>; |
| } |
| <span class="ident">q</span>.<span class="ident">insert</span>(<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>); |
| <span class="kw">match</span> <span class="self">self</span>.<span class="ident">prog</span>[<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>] { |
| <span class="ident">Char</span>(_) <span class="op">|</span> <span class="ident">Ranges</span>(_) <span class="op">=></span> <span class="macro">unreachable</span><span class="macro">!</span>(), |
| <span class="ident">Match</span>(_) <span class="op">|</span> <span class="ident">Bytes</span>(_) <span class="op">=></span> {} |
| <span class="ident">EmptyLook</span>(<span class="kw-2">ref</span> <span class="ident">inst</span>) <span class="op">=></span> { |
| <span class="comment">// Only follow empty assertion states if our flags satisfy</span> |
| <span class="comment">// the assertion.</span> |
| <span class="kw">match</span> <span class="ident">inst</span>.<span class="ident">look</span> { |
| <span class="ident">StartLine</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">start_line</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">EndLine</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">end_line</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">StartText</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">start</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">EndText</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">end</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">WordBoundaryAscii</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">word_boundary</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">NotWordBoundaryAscii</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">not_word_boundary</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">WordBoundary</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">word_boundary</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">NotWordBoundary</span> <span class="kw">if</span> <span class="ident">flags</span>.<span class="ident">not_word_boundary</span> <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| <span class="ident">StartLine</span> <span class="op">|</span> <span class="ident">EndLine</span> <span class="op">|</span> <span class="ident">StartText</span> <span class="op">|</span> <span class="ident">EndText</span> <span class="op">=></span> {} |
| <span class="ident">WordBoundaryAscii</span> <span class="op">|</span> <span class="ident">NotWordBoundaryAscii</span> <span class="op">=></span> {} |
| <span class="ident">WordBoundary</span> <span class="op">|</span> <span class="ident">NotWordBoundary</span> <span class="op">=></span> {} |
| } |
| } |
| <span class="ident">Save</span>(<span class="kw-2">ref</span> <span class="ident">inst</span>) <span class="op">=></span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto</span> <span class="kw">as</span> <span class="ident">InstPtr</span>), |
| <span class="ident">Split</span>(<span class="kw-2">ref</span> <span class="ident">inst</span>) <span class="op">=></span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto2</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">stack</span>.<span class="ident">push</span>(<span class="ident">inst</span>.<span class="ident">goto1</span> <span class="kw">as</span> <span class="ident">InstPtr</span>); |
| } |
| } |
| } |
| } |
| |
| <span class="doccomment">/// Find a previously computed state matching the given set of instructions</span> |
| <span class="doccomment">/// and is_match bool.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The given set of instructions should represent a single state in the</span> |
| <span class="doccomment">/// NFA along with all states reachable without consuming any input.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The is_match bool should be true if and only if the preceding DFA state</span> |
| <span class="doccomment">/// contains an NFA matching state. The cached state produced here will</span> |
| <span class="doccomment">/// then signify a match. (This enables us to delay a match by one byte,</span> |
| <span class="doccomment">/// in order to account for the EOF sentinel byte.)</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If the cache is full, then it is wiped before caching a new state.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The current state should be specified if it exists, since it will need</span> |
| <span class="doccomment">/// to be preserved if the cache clears itself. (Start states are</span> |
| <span class="doccomment">/// always saved, so they should not be passed here.) It takes a mutable</span> |
| <span class="doccomment">/// pointer to the index because if the cache is cleared, the state's</span> |
| <span class="doccomment">/// location may change.</span> |
| <span class="kw">fn</span> <span class="ident">cached_state</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">q</span>: <span class="kw-2">&</span><span class="ident">SparseSet</span>, |
| <span class="kw-2">mut</span> <span class="ident">state_flags</span>: <span class="ident">StateFlags</span>, |
| <span class="ident">current_state</span>: <span class="prelude-ty">Option</span><span class="op"><</span><span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">StatePtr</span><span class="op">></span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="comment">// If we couldn't come up with a non-empty key to represent this state,</span> |
| <span class="comment">// then it is dead and can never lead to a match.</span> |
| <span class="comment">//</span> |
| <span class="comment">// Note that inst_flags represent the set of empty width assertions</span> |
| <span class="comment">// in q. We use this as an optimization in exec_byte to determine when</span> |
| <span class="comment">// we should follow epsilon transitions at the empty string preceding</span> |
| <span class="comment">// the current byte.</span> |
| <span class="kw">let</span> <span class="ident">key</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">cached_state_key</span>(<span class="ident">q</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">state_flags</span>) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>), |
| <span class="prelude-val">Some</span>(<span class="ident">v</span>) <span class="op">=></span> <span class="ident">v</span>, |
| }; |
| <span class="comment">// In the cache? Cool. Done.</span> |
| <span class="kw">if</span> <span class="kw">let</span> <span class="prelude-val">Some</span>(<span class="kw-2">&</span><span class="ident">si</span>) <span class="op">=</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">compiled</span>.<span class="ident">get</span>(<span class="kw-2">&</span><span class="ident">key</span>) { |
| <span class="kw">return</span> <span class="prelude-val">Some</span>(<span class="ident">si</span>); |
| } |
| <span class="comment">// If the cache has gotten too big, wipe it.</span> |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">approximate_size</span>() <span class="op">></span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">dfa_size_limit</span> { |
| <span class="kw">if</span> <span class="op">!</span><span class="self">self</span>.<span class="ident">clear_cache_and_save</span>(<span class="ident">current_state</span>) { |
| <span class="comment">// Ooops. DFA is giving up.</span> |
| <span class="kw">return</span> <span class="prelude-val">None</span>; |
| } |
| } |
| <span class="comment">// Allocate room for our state and add it.</span> |
| <span class="self">self</span>.<span class="ident">add_state</span>(<span class="ident">key</span>) |
| } |
| |
| <span class="doccomment">/// Produces a key suitable for describing a state in the DFA cache.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The key invariant here is that equivalent keys are produced for any two</span> |
| <span class="doccomment">/// sets of ordered NFA states (and toggling of whether the previous NFA</span> |
| <span class="doccomment">/// states contain a match state) that do not discriminate a match for any</span> |
| <span class="doccomment">/// input.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Specifically, q should be an ordered set of NFA states and is_match</span> |
| <span class="doccomment">/// should be true if and only if the previous NFA states contained a match</span> |
| <span class="doccomment">/// state.</span> |
| <span class="kw">fn</span> <span class="ident">cached_state_key</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">q</span>: <span class="kw-2">&</span><span class="ident">SparseSet</span>, |
| <span class="ident">state_flags</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">StateFlags</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">State</span><span class="op">></span> { |
| <span class="kw">use</span> <span class="ident">prog</span>::<span class="ident">Inst</span>::<span class="kw-2">*</span>; |
| |
| <span class="comment">// We need to build up enough information to recognize pre-built states</span> |
| <span class="comment">// in the DFA. Generally speaking, this includes every instruction</span> |
| <span class="comment">// except for those which are purely epsilon transitions, e.g., the</span> |
| <span class="comment">// Save and Split instructions.</span> |
| <span class="comment">//</span> |
| <span class="comment">// Empty width assertions are also epsilon transitions, but since they</span> |
| <span class="comment">// are conditional, we need to make them part of a state's key in the</span> |
| <span class="comment">// cache.</span> |
| |
| <span class="comment">// Reserve 1 byte for flags.</span> |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">insts</span> <span class="op">=</span> <span class="macro">vec</span><span class="macro">!</span>[<span class="number">0</span>]; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">prev</span> <span class="op">=</span> <span class="number">0</span>; |
| <span class="kw">for</span> <span class="kw-2">&</span><span class="ident">ip</span> <span class="kw">in</span> <span class="ident">q</span> { |
| <span class="kw">let</span> <span class="ident">ip</span> <span class="op">=</span> <span class="ident">usize_to_u32</span>(<span class="ident">ip</span>); |
| <span class="kw">match</span> <span class="self">self</span>.<span class="ident">prog</span>[<span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>] { |
| <span class="ident">Char</span>(_) <span class="op">|</span> <span class="ident">Ranges</span>(_) <span class="op">=></span> <span class="macro">unreachable</span><span class="macro">!</span>(), |
| <span class="ident">Save</span>(_) <span class="op">=></span> {} |
| <span class="ident">Split</span>(_) <span class="op">=></span> {} |
| <span class="ident">Bytes</span>(_) <span class="op">=></span> <span class="ident">push_inst_ptr</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">insts</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev</span>, <span class="ident">ip</span>), |
| <span class="ident">EmptyLook</span>(_) <span class="op">=></span> { |
| <span class="ident">state_flags</span>.<span class="ident">set_empty</span>(); |
| <span class="ident">push_inst_ptr</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">insts</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev</span>, <span class="ident">ip</span>) |
| } |
| <span class="ident">Match</span>(_) <span class="op">=></span> { |
| <span class="ident">push_inst_ptr</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">insts</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev</span>, <span class="ident">ip</span>); |
| <span class="kw">if</span> <span class="op">!</span><span class="self">self</span>.<span class="ident">continue_past_first_match</span>() { |
| <span class="kw">break</span>; |
| } |
| } |
| } |
| } |
| <span class="comment">// If we couldn't transition to any other instructions and we didn't</span> |
| <span class="comment">// see a match when expanding NFA states previously, then this is a</span> |
| <span class="comment">// dead state and no amount of additional input can transition out</span> |
| <span class="comment">// of this state.</span> |
| <span class="kw">if</span> <span class="ident">insts</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="number">1</span> <span class="op">&&</span> <span class="op">!</span><span class="ident">state_flags</span>.<span class="ident">is_match</span>() { |
| <span class="prelude-val">None</span> |
| } <span class="kw">else</span> { |
| <span class="kw">let</span> <span class="ident">StateFlags</span>(<span class="ident">f</span>) <span class="op">=</span> <span class="kw-2">*</span><span class="ident">state_flags</span>; |
| <span class="ident">insts</span>[<span class="number">0</span>] <span class="op">=</span> <span class="ident">f</span>; |
| <span class="prelude-val">Some</span>(<span class="ident">State</span> { <span class="ident">data</span>: <span class="ident">insts</span>.<span class="ident">into_boxed_slice</span>() }) |
| } |
| } |
| |
| <span class="doccomment">/// Clears the cache, but saves and restores current_state if it is not</span> |
| <span class="doccomment">/// none.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The current state must be provided here in case its location in the</span> |
| <span class="doccomment">/// cache changes.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This returns false if the cache is not cleared and the DFA should</span> |
| <span class="doccomment">/// give up.</span> |
| <span class="kw">fn</span> <span class="ident">clear_cache_and_save</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">current_state</span>: <span class="prelude-ty">Option</span><span class="op"><</span><span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">StatePtr</span><span class="op">></span>, |
| ) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>.<span class="ident">is_empty</span>() { |
| <span class="comment">// Nothing to clear...</span> |
| <span class="kw">return</span> <span class="bool-val">true</span>; |
| } |
| <span class="kw">match</span> <span class="ident">current_state</span> { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="self">self</span>.<span class="ident">clear_cache</span>(), |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> { |
| <span class="kw">let</span> <span class="ident">cur</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="kw-2">*</span><span class="ident">si</span>).<span class="ident">clone</span>(); |
| <span class="kw">if</span> <span class="op">!</span><span class="self">self</span>.<span class="ident">clear_cache</span>() { |
| <span class="kw">return</span> <span class="bool-val">false</span>; |
| } |
| <span class="comment">// The unwrap is OK because we just cleared the cache and</span> |
| <span class="comment">// therefore know that the next state pointer won't exceed</span> |
| <span class="comment">// STATE_MAX.</span> |
| <span class="kw-2">*</span><span class="ident">si</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">restore_state</span>(<span class="ident">cur</span>).<span class="ident">unwrap</span>(); |
| <span class="bool-val">true</span> |
| } |
| } |
| } |
| |
| <span class="doccomment">/// Wipes the state cache, but saves and restores the current start state</span> |
| <span class="doccomment">/// (and the last match state, if there is one).</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This returns false if the cache is not cleared and the DFA should</span> |
| <span class="doccomment">/// give up.</span> |
| <span class="kw">fn</span> <span class="ident">clear_cache</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="comment">// Bail out of the DFA if we're moving too "slowly."</span> |
| <span class="comment">// A heuristic from RE2: assume the DFA is too slow if it is processing</span> |
| <span class="comment">// 10 or fewer bytes per state.</span> |
| <span class="comment">// Additionally, we permit the cache to be flushed a few times before</span> |
| <span class="comment">// calling it quits.</span> |
| <span class="kw">let</span> <span class="ident">nstates</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>.<span class="ident">len</span>(); |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">flush_count</span> <span class="op">>=</span> <span class="number">3</span> |
| <span class="op">&&</span> <span class="self">self</span>.<span class="ident">at</span> <span class="op">>=</span> <span class="self">self</span>.<span class="ident">last_cache_flush</span> |
| <span class="op">&&</span> (<span class="self">self</span>.<span class="ident">at</span> <span class="op">-</span> <span class="self">self</span>.<span class="ident">last_cache_flush</span>) <span class="op"><=</span> <span class="number">10</span> <span class="op">*</span> <span class="ident">nstates</span> { |
| <span class="kw">return</span> <span class="bool-val">false</span>; |
| } |
| <span class="comment">// Update statistics tracking cache flushes.</span> |
| <span class="self">self</span>.<span class="ident">last_cache_flush</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">at</span>; |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">flush_count</span> <span class="op">+=</span> <span class="number">1</span>; |
| |
| <span class="comment">// OK, actually flush the cache.</span> |
| <span class="kw">let</span> <span class="ident">start</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">state</span>(<span class="self">self</span>.<span class="ident">start</span> <span class="op">&</span> <span class="op">!</span><span class="ident">STATE_START</span>).<span class="ident">clone</span>(); |
| <span class="kw">let</span> <span class="ident">last_match</span> <span class="op">=</span> <span class="kw">if</span> <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op"><=</span> <span class="ident">STATE_MAX</span> { |
| <span class="prelude-val">Some</span>(<span class="self">self</span>.<span class="ident">state</span>(<span class="self">self</span>.<span class="ident">last_match_si</span>).<span class="ident">clone</span>()) |
| } <span class="kw">else</span> { |
| <span class="prelude-val">None</span> |
| }; |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">reset_size</span>(); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">clear</span>(); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>.<span class="ident">clear</span>(); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">compiled</span>.<span class="ident">clear</span>(); |
| <span class="kw">for</span> <span class="ident">s</span> <span class="kw">in</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">start_states</span>.<span class="ident">iter_mut</span>() { |
| <span class="kw-2">*</span><span class="ident">s</span> <span class="op">=</span> <span class="ident">STATE_UNKNOWN</span>; |
| } |
| <span class="comment">// The unwraps are OK because we just cleared the cache and therefore</span> |
| <span class="comment">// know that the next state pointer won't exceed STATE_MAX.</span> |
| <span class="kw">let</span> <span class="ident">start_ptr</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">restore_state</span>(<span class="ident">start</span>).<span class="ident">unwrap</span>(); |
| <span class="self">self</span>.<span class="ident">start</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">start_ptr</span>(<span class="ident">start_ptr</span>); |
| <span class="kw">if</span> <span class="kw">let</span> <span class="prelude-val">Some</span>(<span class="ident">last_match</span>) <span class="op">=</span> <span class="ident">last_match</span> { |
| <span class="self">self</span>.<span class="ident">last_match_si</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">restore_state</span>(<span class="ident">last_match</span>).<span class="ident">unwrap</span>(); |
| } |
| <span class="bool-val">true</span> |
| } |
| |
| <span class="doccomment">/// Restores the given state back into the cache, and returns a pointer</span> |
| <span class="doccomment">/// to it.</span> |
| <span class="kw">fn</span> <span class="ident">restore_state</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, <span class="ident">state</span>: <span class="ident">State</span>) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="comment">// If we've already stored this state, just return a pointer to it.</span> |
| <span class="comment">// None will be the wiser.</span> |
| <span class="kw">if</span> <span class="kw">let</span> <span class="prelude-val">Some</span>(<span class="kw-2">&</span><span class="ident">si</span>) <span class="op">=</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">compiled</span>.<span class="ident">get</span>(<span class="kw-2">&</span><span class="ident">state</span>) { |
| <span class="kw">return</span> <span class="prelude-val">Some</span>(<span class="ident">si</span>); |
| } |
| <span class="self">self</span>.<span class="ident">add_state</span>(<span class="ident">state</span>) |
| } |
| |
| <span class="doccomment">/// Returns the next state given the current state si and current byte</span> |
| <span class="doccomment">/// b. {qcur,qnext} are used as scratch space for storing ordered NFA</span> |
| <span class="doccomment">/// states.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This tries to fetch the next state from the cache, but if that fails,</span> |
| <span class="doccomment">/// it computes the next state, caches it and returns a pointer to it.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The pointer can be to a real state, or it can be STATE_DEAD.</span> |
| <span class="doccomment">/// STATE_UNKNOWN cannot be returned.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// None is returned if a new state could not be allocated (i.e., the DFA</span> |
| <span class="doccomment">/// ran out of space and thinks it's running too slowly).</span> |
| <span class="kw">fn</span> <span class="ident">next_state</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">qcur</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">qnext</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">si</span>: <span class="ident">StatePtr</span>, |
| <span class="ident">b</span>: <span class="ident">Byte</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">==</span> <span class="ident">STATE_DEAD</span> { |
| <span class="kw">return</span> <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>); |
| } |
| <span class="kw">match</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">next</span>(<span class="ident">si</span>, <span class="self">self</span>.<span class="ident">byte_class</span>(<span class="ident">b</span>)) { |
| <span class="ident">STATE_UNKNOWN</span> <span class="op">=></span> <span class="self">self</span>.<span class="ident">exec_byte</span>(<span class="ident">qcur</span>, <span class="ident">qnext</span>, <span class="ident">si</span>, <span class="ident">b</span>), |
| <span class="ident">STATE_QUIT</span> <span class="op">=></span> <span class="prelude-val">None</span>, |
| <span class="ident">STATE_DEAD</span> <span class="op">=></span> <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>), |
| <span class="ident">nsi</span> <span class="op">=></span> <span class="prelude-val">Some</span>(<span class="ident">nsi</span>), |
| } |
| } |
| |
| <span class="doccomment">/// Computes and returns the start state, where searching begins at</span> |
| <span class="doccomment">/// position `at` in `text`. If the state has already been computed,</span> |
| <span class="doccomment">/// then it is pulled from the cache. If the state hasn't been cached,</span> |
| <span class="doccomment">/// then it is computed, cached and a pointer to it is returned.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This may return STATE_DEAD but never STATE_UNKNOWN.</span> |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> <span class="comment">// reduces constant overhead</span> |
| <span class="kw">fn</span> <span class="ident">start_state</span>( |
| <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, |
| <span class="ident">q</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">SparseSet</span>, |
| <span class="ident">empty_flags</span>: <span class="ident">EmptyFlags</span>, |
| <span class="ident">state_flags</span>: <span class="ident">StateFlags</span>, |
| ) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="comment">// Compute an index into our cache of start states based on the set</span> |
| <span class="comment">// of empty/state flags set at the current position in the input. We</span> |
| <span class="comment">// don't use every flag since not all flags matter. For example, since</span> |
| <span class="comment">// matches are delayed by one byte, start states can never be match</span> |
| <span class="comment">// states.</span> |
| <span class="kw">let</span> <span class="ident">flagi</span> <span class="op">=</span> { |
| (((<span class="ident">empty_flags</span>.<span class="ident">start</span> <span class="kw">as</span> <span class="ident">u8</span>) <span class="op"><<</span> <span class="number">0</span>) <span class="op">|</span> |
| ((<span class="ident">empty_flags</span>.<span class="ident">end</span> <span class="kw">as</span> <span class="ident">u8</span>) <span class="op"><<</span> <span class="number">1</span>) <span class="op">|</span> |
| ((<span class="ident">empty_flags</span>.<span class="ident">start_line</span> <span class="kw">as</span> <span class="ident">u8</span>) <span class="op"><<</span> <span class="number">2</span>) <span class="op">|</span> |
| ((<span class="ident">empty_flags</span>.<span class="ident">end_line</span> <span class="kw">as</span> <span class="ident">u8</span>) <span class="op"><<</span> <span class="number">3</span>) <span class="op">|</span> |
| ((<span class="ident">state_flags</span>.<span class="ident">is_word</span>() <span class="kw">as</span> <span class="ident">u8</span>) <span class="op"><<</span> <span class="number">4</span>)) |
| <span class="kw">as</span> <span class="ident">usize</span> |
| }; |
| <span class="kw">match</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">start_states</span>[<span class="ident">flagi</span>] { |
| <span class="ident">STATE_UNKNOWN</span> <span class="op">=></span> {} |
| <span class="ident">STATE_DEAD</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-val">Some</span>(<span class="ident">STATE_DEAD</span>), |
| <span class="ident">si</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-val">Some</span>(<span class="ident">si</span>), |
| } |
| <span class="ident">q</span>.<span class="ident">clear</span>(); |
| <span class="kw">let</span> <span class="ident">start</span> <span class="op">=</span> <span class="ident">usize_to_u32</span>(<span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">start</span>); |
| <span class="self">self</span>.<span class="ident">follow_epsilons</span>(<span class="ident">start</span>, <span class="ident">q</span>, <span class="ident">empty_flags</span>); |
| <span class="comment">// Start states can never be match states because we delay every match</span> |
| <span class="comment">// by one byte. Given an empty string and an empty match, the match</span> |
| <span class="comment">// won't actually occur until the DFA processes the special EOF</span> |
| <span class="comment">// sentinel byte.</span> |
| <span class="kw">let</span> <span class="ident">sp</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">cached_state</span>(<span class="ident">q</span>, <span class="ident">state_flags</span>, <span class="prelude-val">None</span>) { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-val">None</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">sp</span>) <span class="op">=></span> <span class="self">self</span>.<span class="ident">start_ptr</span>(<span class="ident">sp</span>), |
| }; |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">start_states</span>[<span class="ident">flagi</span>] <span class="op">=</span> <span class="ident">sp</span>; |
| <span class="prelude-val">Some</span>(<span class="ident">sp</span>) |
| } |
| |
| <span class="doccomment">/// Computes the set of starting flags for the given position in text.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This should only be used when executing the DFA forwards over the</span> |
| <span class="doccomment">/// input.</span> |
| <span class="kw">fn</span> <span class="ident">start_flags</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], <span class="ident">at</span>: <span class="ident">usize</span>) <span class="op">-></span> (<span class="ident">EmptyFlags</span>, <span class="ident">StateFlags</span>) { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">empty_flags</span> <span class="op">=</span> <span class="ident">EmptyFlags</span>::<span class="ident">default</span>(); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">state_flags</span> <span class="op">=</span> <span class="ident">StateFlags</span>::<span class="ident">default</span>(); |
| <span class="ident">empty_flags</span>.<span class="ident">start</span> <span class="op">=</span> <span class="ident">at</span> <span class="op">==</span> <span class="number">0</span>; |
| <span class="ident">empty_flags</span>.<span class="ident">end</span> <span class="op">=</span> <span class="ident">text</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="number">0</span>; |
| <span class="ident">empty_flags</span>.<span class="ident">start_line</span> <span class="op">=</span> <span class="ident">at</span> <span class="op">==</span> <span class="number">0</span> <span class="op">||</span> <span class="ident">text</span>[<span class="ident">at</span> <span class="op">-</span> <span class="number">1</span>] <span class="op">==</span> <span class="string">b'\n'</span>; |
| <span class="ident">empty_flags</span>.<span class="ident">end_line</span> <span class="op">=</span> <span class="ident">text</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="number">0</span>; |
| <span class="kw">if</span> <span class="ident">at</span> <span class="op">></span> <span class="number">0</span> <span class="op">&&</span> <span class="ident">Byte</span>::<span class="ident">byte</span>(<span class="ident">text</span>[<span class="ident">at</span> <span class="op">-</span> <span class="number">1</span>]).<span class="ident">is_ascii_word</span>() { |
| <span class="ident">state_flags</span>.<span class="ident">set_word</span>(); |
| } |
| (<span class="ident">empty_flags</span>, <span class="ident">state_flags</span>) |
| } |
| |
| <span class="doccomment">/// Computes the set of starting flags for the given position in text.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This should only be used when executing the DFA in reverse over the</span> |
| <span class="doccomment">/// input.</span> |
| <span class="kw">fn</span> <span class="ident">start_flags_reverse</span>( |
| <span class="kw-2">&</span><span class="self">self</span>, |
| <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], |
| <span class="ident">at</span>: <span class="ident">usize</span>, |
| ) <span class="op">-></span> (<span class="ident">EmptyFlags</span>, <span class="ident">StateFlags</span>) { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">empty_flags</span> <span class="op">=</span> <span class="ident">EmptyFlags</span>::<span class="ident">default</span>(); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">state_flags</span> <span class="op">=</span> <span class="ident">StateFlags</span>::<span class="ident">default</span>(); |
| <span class="ident">empty_flags</span>.<span class="ident">start</span> <span class="op">=</span> <span class="ident">at</span> <span class="op">==</span> <span class="ident">text</span>.<span class="ident">len</span>(); |
| <span class="ident">empty_flags</span>.<span class="ident">end</span> <span class="op">=</span> <span class="ident">text</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="number">0</span>; |
| <span class="ident">empty_flags</span>.<span class="ident">start_line</span> <span class="op">=</span> <span class="ident">at</span> <span class="op">==</span> <span class="ident">text</span>.<span class="ident">len</span>() <span class="op">||</span> <span class="ident">text</span>[<span class="ident">at</span>] <span class="op">==</span> <span class="string">b'\n'</span>; |
| <span class="ident">empty_flags</span>.<span class="ident">end_line</span> <span class="op">=</span> <span class="ident">text</span>.<span class="ident">len</span>() <span class="op">==</span> <span class="number">0</span>; |
| <span class="kw">if</span> <span class="ident">at</span> <span class="op"><</span> <span class="ident">text</span>.<span class="ident">len</span>() <span class="op">&&</span> <span class="ident">Byte</span>::<span class="ident">byte</span>(<span class="ident">text</span>[<span class="ident">at</span>]).<span class="ident">is_ascii_word</span>() { |
| <span class="ident">state_flags</span>.<span class="ident">set_word</span>(); |
| } |
| (<span class="ident">empty_flags</span>, <span class="ident">state_flags</span>) |
| } |
| |
| <span class="doccomment">/// Returns a reference to a State given a pointer to it.</span> |
| <span class="kw">fn</span> <span class="ident">state</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">si</span>: <span class="ident">StatePtr</span>) <span class="op">-></span> <span class="kw-2">&</span><span class="ident">State</span> { |
| <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>[<span class="ident">si</span> <span class="kw">as</span> <span class="ident">usize</span> <span class="op">/</span> <span class="self">self</span>.<span class="ident">num_byte_classes</span>()] |
| } |
| |
| <span class="doccomment">/// Adds the given state to the DFA.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This allocates room for transitions out of this state in</span> |
| <span class="doccomment">/// self.cache.trans. The transitions can be set with the returned</span> |
| <span class="doccomment">/// StatePtr.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If None is returned, then the state limit was reached and the DFA</span> |
| <span class="doccomment">/// should quit.</span> |
| <span class="kw">fn</span> <span class="ident">add_state</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, <span class="ident">state</span>: <span class="ident">State</span>) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="comment">// This will fail if the next state pointer exceeds STATE_MAX. In</span> |
| <span class="comment">// practice, the cache limit will prevent us from ever getting here,</span> |
| <span class="comment">// but maybe callers will set the cache size to something ridiculous...</span> |
| <span class="kw">let</span> <span class="ident">si</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">add</span>() { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="prelude-val">None</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) <span class="op">=></span> <span class="ident">si</span>, |
| }; |
| <span class="comment">// If the program has a Unicode word boundary, then set any transitions</span> |
| <span class="comment">// for non-ASCII bytes to STATE_QUIT. If the DFA stumbles over such a</span> |
| <span class="comment">// transition, then it will quit and an alternative matching engine</span> |
| <span class="comment">// will take over.</span> |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">has_unicode_word_boundary</span> { |
| <span class="kw">for</span> <span class="ident">b</span> <span class="kw">in</span> <span class="number">128</span>..<span class="number">256</span> { |
| <span class="kw">let</span> <span class="ident">cls</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">byte_class</span>(<span class="ident">Byte</span>::<span class="ident">byte</span>(<span class="ident">b</span> <span class="kw">as</span> <span class="ident">u8</span>)); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">set_next</span>(<span class="ident">si</span>, <span class="ident">cls</span>, <span class="ident">STATE_QUIT</span>); |
| } |
| } |
| <span class="comment">// Finally, put our actual state on to our heap of states and index it</span> |
| <span class="comment">// so we can find it later.</span> |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">size</span> <span class="op">+=</span> |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">state_heap_size</span>() |
| <span class="op">+</span> (<span class="number">2</span> <span class="op">*</span> <span class="ident">state</span>.<span class="ident">data</span>.<span class="ident">len</span>()) |
| <span class="op">+</span> (<span class="number">2</span> <span class="op">*</span> <span class="ident">mem</span>::<span class="ident">size_of</span>::<span class="op"><</span><span class="ident">State</span><span class="op">></span>()) |
| <span class="op">+</span> <span class="ident">mem</span>::<span class="ident">size_of</span>::<span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span>(); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>.<span class="ident">push</span>(<span class="ident">state</span>.<span class="ident">clone</span>()); |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">compiled</span>.<span class="ident">insert</span>(<span class="ident">state</span>, <span class="ident">si</span>); |
| <span class="comment">// Transition table and set of states and map should all be in sync.</span> |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>.<span class="ident">len</span>() |
| <span class="op">==</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">trans</span>.<span class="ident">num_states</span>()); |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">states</span>.<span class="ident">len</span>() |
| <span class="op">==</span> <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">compiled</span>.<span class="ident">len</span>()); |
| <span class="prelude-val">Some</span>(<span class="ident">si</span>) |
| } |
| |
| <span class="doccomment">/// Quickly finds the next occurrence of any literal prefixes in the regex.</span> |
| <span class="doccomment">/// If there are no literal prefixes, then the current position is</span> |
| <span class="doccomment">/// returned. If there are literal prefixes and one could not be found,</span> |
| <span class="doccomment">/// then None is returned.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// This should only be called when the DFA is in a start state.</span> |
| <span class="kw">fn</span> <span class="ident">prefix_at</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">text</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>], <span class="ident">at</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> { |
| <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">prefixes</span>.<span class="ident">find</span>(<span class="kw-2">&</span><span class="ident">text</span>[<span class="ident">at</span>..]).<span class="ident">map</span>(<span class="op">|</span>(<span class="ident">s</span>, _)<span class="op">|</span> <span class="ident">at</span> <span class="op">+</span> <span class="ident">s</span>) |
| } |
| |
| <span class="doccomment">/// Returns the number of byte classes required to discriminate transitions</span> |
| <span class="doccomment">/// in each state.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// invariant: num_byte_classes() == len(State.next)</span> |
| <span class="kw">fn</span> <span class="ident">num_byte_classes</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">usize</span> { |
| <span class="comment">// We add 1 to account for the special EOF byte.</span> |
| (<span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">byte_classes</span>[<span class="number">255</span>] <span class="kw">as</span> <span class="ident">usize</span> <span class="op">+</span> <span class="number">1</span>) <span class="op">+</span> <span class="number">1</span> |
| } |
| |
| <span class="doccomment">/// Given an input byte or the special EOF sentinel, return its</span> |
| <span class="doccomment">/// corresponding byte class.</span> |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> |
| <span class="kw">fn</span> <span class="ident">byte_class</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">b</span>: <span class="ident">Byte</span>) <span class="op">-></span> <span class="ident">usize</span> { |
| <span class="kw">match</span> <span class="ident">b</span>.<span class="ident">as_byte</span>() { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="self">self</span>.<span class="ident">num_byte_classes</span>() <span class="op">-</span> <span class="number">1</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">b</span>) <span class="op">=></span> <span class="self">self</span>.<span class="ident">u8_class</span>(<span class="ident">b</span>), |
| } |
| } |
| |
| <span class="doccomment">/// Like byte_class, but explicitly for u8s.</span> |
| <span class="attribute">#[<span class="ident">inline</span>(<span class="ident">always</span>)]</span> |
| <span class="kw">fn</span> <span class="ident">u8_class</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">b</span>: <span class="ident">u8</span>) <span class="op">-></span> <span class="ident">usize</span> { |
| <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">byte_classes</span>[<span class="ident">b</span> <span class="kw">as</span> <span class="ident">usize</span>] <span class="kw">as</span> <span class="ident">usize</span> |
| } |
| |
| <span class="doccomment">/// Returns true if the DFA should continue searching past the first match.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// Leftmost first semantics in the DFA are preserved by not following NFA</span> |
| <span class="doccomment">/// transitions after the first match is seen.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// On occasion, we want to avoid leftmost first semantics to find either</span> |
| <span class="doccomment">/// the longest match (for reverse search) or all possible matches (for</span> |
| <span class="doccomment">/// regex sets).</span> |
| <span class="kw">fn</span> <span class="ident">continue_past_first_match</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">is_reverse</span> <span class="op">||</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">matches</span>.<span class="ident">len</span>() <span class="op">></span> <span class="number">1</span> |
| } |
| |
| <span class="doccomment">/// Returns true if there is a prefix we can quickly search for.</span> |
| <span class="kw">fn</span> <span class="ident">has_prefix</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="op">!</span><span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">is_reverse</span> |
| <span class="op">&&</span> <span class="op">!</span><span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">prefixes</span>.<span class="ident">is_empty</span>() |
| <span class="op">&&</span> <span class="op">!</span><span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">is_anchored_start</span> |
| } |
| |
| <span class="doccomment">/// Sets the STATE_START bit in the given state pointer if and only if</span> |
| <span class="doccomment">/// we have a prefix to scan for.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If there's no prefix, then it's a waste to treat the start state</span> |
| <span class="doccomment">/// specially.</span> |
| <span class="kw">fn</span> <span class="ident">start_ptr</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">si</span>: <span class="ident">StatePtr</span>) <span class="op">-></span> <span class="ident">StatePtr</span> { |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">has_prefix</span>() { |
| <span class="ident">si</span> <span class="op">|</span> <span class="ident">STATE_START</span> |
| } <span class="kw">else</span> { |
| <span class="ident">si</span> |
| } |
| } |
| |
| <span class="doccomment">/// Returns the approximate heap space currently used by the DFA. It is</span> |
| <span class="doccomment">/// used to determine whether the DFA's state cache needs to be wiped.</span> |
| <span class="doccomment">/// Namely, for certain regexes on certain inputs, a new state could be</span> |
| <span class="doccomment">/// created for every byte of input. (This is bad for memory use, so we</span> |
| <span class="doccomment">/// bound it with a cache.)</span> |
| <span class="kw">fn</span> <span class="ident">approximate_size</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">usize</span> { |
| <span class="self">self</span>.<span class="ident">cache</span>.<span class="ident">size</span> <span class="op">+</span> <span class="self">self</span>.<span class="ident">prog</span>.<span class="ident">approximate_size</span>() |
| } |
| } |
| |
| <span class="kw">impl</span> <span class="ident">Transitions</span> { |
| <span class="doccomment">/// Create a new transition table.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The number of byte classes corresponds to the stride. Every state will</span> |
| <span class="doccomment">/// have `num_byte_classes` slots for transitions.</span> |
| <span class="kw">fn</span> <span class="ident">new</span>(<span class="ident">num_byte_classes</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="ident">Transitions</span> { |
| <span class="ident">Transitions</span> { |
| <span class="ident">table</span>: <span class="macro">vec</span><span class="macro">!</span>[], |
| <span class="ident">num_byte_classes</span>: <span class="ident">num_byte_classes</span>, |
| } |
| } |
| |
| <span class="doccomment">/// Returns the total number of states currently in this table.</span> |
| <span class="kw">fn</span> <span class="ident">num_states</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">usize</span> { |
| <span class="self">self</span>.<span class="ident">table</span>.<span class="ident">len</span>() <span class="op">/</span> <span class="self">self</span>.<span class="ident">num_byte_classes</span> |
| } |
| |
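| <span class="doccomment">/// Allocates room for one additional state and returns a pointer to it.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// The pointer is the offset of the state's first transition in the table,</span> |
| <span class="doccomment">/// so `next` and `set_next` index the table directly with `si + cls`.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// If that offset is greater than `STATE_MAX`, None is returned.</span> |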
| <span class="kw">fn</span> <span class="ident">add</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span> { |
| <span class="kw">let</span> <span class="ident">si</span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">table</span>.<span class="ident">len</span>(); |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">></span> <span class="ident">STATE_MAX</span> <span class="kw">as</span> <span class="ident">usize</span> { |
| <span class="kw">return</span> <span class="prelude-val">None</span>; |
| } |
| <span class="self">self</span>.<span class="ident">table</span>.<span class="ident">extend</span>(<span class="ident">repeat</span>(<span class="ident">STATE_UNKNOWN</span>).<span class="ident">take</span>(<span class="self">self</span>.<span class="ident">num_byte_classes</span>)); |
| <span class="prelude-val">Some</span>(<span class="ident">usize_to_u32</span>(<span class="ident">si</span>)) |
| } |
| |
| <span class="doccomment">/// Clears the table of all states.</span> |
| <span class="kw">fn</span> <span class="ident">clear</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) { |
| <span class="self">self</span>.<span class="ident">table</span>.<span class="ident">clear</span>(); |
| } |
| |
| <span class="doccomment">/// Sets the transition from (si, cls) to next.</span> |
| <span class="kw">fn</span> <span class="ident">set_next</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>, <span class="ident">si</span>: <span class="ident">StatePtr</span>, <span class="ident">cls</span>: <span class="ident">usize</span>, <span class="ident">next</span>: <span class="ident">StatePtr</span>) { |
| <span class="self">self</span>.<span class="ident">table</span>[<span class="ident">si</span> <span class="kw">as</span> <span class="ident">usize</span> <span class="op">+</span> <span class="ident">cls</span>] <span class="op">=</span> <span class="ident">next</span>; |
| } |
| |
| <span class="doccomment">/// Returns the transition corresponding to (si, cls).</span> |
| <span class="kw">fn</span> <span class="ident">next</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">si</span>: <span class="ident">StatePtr</span>, <span class="ident">cls</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="ident">StatePtr</span> { |
| <span class="self">self</span>.<span class="ident">table</span>[<span class="ident">si</span> <span class="kw">as</span> <span class="ident">usize</span> <span class="op">+</span> <span class="ident">cls</span>] |
| } |
| |
| <span class="doccomment">/// The heap size, in bytes, of a single state in the transition table.</span> |
| <span class="kw">fn</span> <span class="ident">state_heap_size</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">usize</span> { |
| <span class="self">self</span>.<span class="ident">num_byte_classes</span> <span class="op">*</span> <span class="ident">mem</span>::<span class="ident">size_of</span>::<span class="op"><</span><span class="ident">StatePtr</span><span class="op">></span>() |
| } |
| |
| <span class="doccomment">/// Like `next`, but uses unchecked access and is therefore unsafe.</span> |
| <span class="kw">unsafe</span> <span class="kw">fn</span> <span class="ident">next_unchecked</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">si</span>: <span class="ident">StatePtr</span>, <span class="ident">cls</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="ident">StatePtr</span> { |
| <span class="macro">debug_assert</span><span class="macro">!</span>((<span class="ident">si</span> <span class="kw">as</span> <span class="ident">usize</span>) <span class="op"><</span> <span class="self">self</span>.<span class="ident">table</span>.<span class="ident">len</span>()); |
| <span class="macro">debug_assert</span><span class="macro">!</span>(<span class="ident">cls</span> <span class="op"><</span> <span class="self">self</span>.<span class="ident">num_byte_classes</span>); |
| <span class="kw-2">*</span><span class="self">self</span>.<span class="ident">table</span>.<span class="ident">get_unchecked</span>(<span class="ident">si</span> <span class="kw">as</span> <span class="ident">usize</span> <span class="op">+</span> <span class="ident">cls</span>) |
| } |
| } |
| |
| <span class="kw">impl</span> <span class="ident">StateFlags</span> { |
| <span class="comment">// Bit layout of the flags byte: bit 0 is the match flag, bit 1 is the word</span> |
| <span class="comment">// flag and bit 2 is the empty flag. These accessors touch no other bits.</span> |
| <span class="kw">fn</span> <span class="ident">is_match</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="self">self</span>.<span class="number">0</span> <span class="op">&</span> <span class="number">0b0000000_1</span> <span class="op">></span> <span class="number">0</span> |
| } |
| |
| <span class="kw">fn</span> <span class="ident">set_match</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) { |
| <span class="self">self</span>.<span class="number">0</span> <span class="op">|=</span> <span class="number">0b0000000_1</span>; |
| } |
| |
| <span class="kw">fn</span> <span class="ident">is_word</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="self">self</span>.<span class="number">0</span> <span class="op">&</span> <span class="number">0b000000_1_0</span> <span class="op">></span> <span class="number">0</span> |
| } |
| |
| <span class="kw">fn</span> <span class="ident">set_word</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) { |
| <span class="self">self</span>.<span class="number">0</span> <span class="op">|=</span> <span class="number">0b000000_1_0</span>; |
| } |
| |
| <span class="kw">fn</span> <span class="ident">has_empty</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="self">self</span>.<span class="number">0</span> <span class="op">&</span> <span class="number">0b00000_1_00</span> <span class="op">></span> <span class="number">0</span> |
| } |
| |
| <span class="kw">fn</span> <span class="ident">set_empty</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="self">self</span>) { |
| <span class="self">self</span>.<span class="number">0</span> <span class="op">|=</span> <span class="number">0b00000_1_00</span>; |
| } |
| } |
| |
| <span class="kw">impl</span> <span class="ident">Byte</span> { |
| <span class="comment">// A `Byte` holds either a regular byte (0 through 255) or the EOF sentinel</span> |
| <span class="comment">// (256), which is why it wraps a u16 rather than a u8.</span> |
| <span class="kw">fn</span> <span class="ident">byte</span>(<span class="ident">b</span>: <span class="ident">u8</span>) <span class="op">-></span> <span class="self">Self</span> { <span class="ident">Byte</span>(<span class="ident">b</span> <span class="kw">as</span> <span class="ident">u16</span>) } |
| <span class="kw">fn</span> <span class="ident">eof</span>() <span class="op">-></span> <span class="self">Self</span> { <span class="ident">Byte</span>(<span class="number">256</span>) } |
| <span class="kw">fn</span> <span class="ident">is_eof</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { <span class="self">self</span>.<span class="number">0</span> <span class="op">==</span> <span class="number">256</span> } |
| |
| <span class="kw">fn</span> <span class="ident">is_ascii_word</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">let</span> <span class="ident">b</span> <span class="op">=</span> <span class="kw">match</span> <span class="self">self</span>.<span class="ident">as_byte</span>() { |
| <span class="prelude-val">None</span> <span class="op">=></span> <span class="kw">return</span> <span class="bool-val">false</span>, |
| <span class="prelude-val">Some</span>(<span class="ident">b</span>) <span class="op">=></span> <span class="ident">b</span>, |
| }; |
| <span class="kw">match</span> <span class="ident">b</span> { |
| <span class="string">b'A'</span>...<span class="string">b'Z'</span> <span class="op">|</span> <span class="string">b'a'</span>...<span class="string">b'z'</span> <span class="op">|</span> <span class="string">b'0'</span>...<span class="string">b'9'</span> <span class="op">|</span> <span class="string">b'_'</span> <span class="op">=></span> <span class="bool-val">true</span>, |
| _ <span class="op">=></span> <span class="bool-val">false</span>, |
| } |
| } |
| |
| <span class="kw">fn</span> <span class="ident">as_byte</span>(<span class="kw-2">&</span><span class="self">self</span>) <span class="op">-></span> <span class="prelude-ty">Option</span><span class="op"><</span><span class="ident">u8</span><span class="op">></span> { |
| <span class="kw">if</span> <span class="self">self</span>.<span class="ident">is_eof</span>() { |
| <span class="prelude-val">None</span> |
| } <span class="kw">else</span> { |
| <span class="prelude-val">Some</span>(<span class="self">self</span>.<span class="number">0</span> <span class="kw">as</span> <span class="ident">u8</span>) |
| } |
| } |
| } |
| |
| <span class="kw">impl</span> <span class="ident">fmt</span>::<span class="ident">Debug</span> <span class="kw">for</span> <span class="ident">State</span> { |
| <span class="kw">fn</span> <span class="ident">fmt</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">f</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">fmt</span>::<span class="ident">Formatter</span>) <span class="op">-></span> <span class="ident">fmt</span>::<span class="prelude-ty">Result</span> { |
| <span class="kw">let</span> <span class="ident">ips</span>: <span class="ident">Vec</span><span class="op"><</span><span class="ident">usize</span><span class="op">></span> <span class="op">=</span> <span class="self">self</span>.<span class="ident">inst_ptrs</span>().<span class="ident">collect</span>(); |
| <span class="ident">f</span>.<span class="ident">debug_struct</span>(<span class="string">"State"</span>) |
| .<span class="ident">field</span>(<span class="string">"flags"</span>, <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">flags</span>()) |
| .<span class="ident">field</span>(<span class="string">"insts"</span>, <span class="kw-2">&</span><span class="ident">ips</span>) |
| .<span class="ident">finish</span>() |
| } |
| } |
| |
| <span class="kw">impl</span> <span class="ident">fmt</span>::<span class="ident">Debug</span> <span class="kw">for</span> <span class="ident">Transitions</span> { |
| <span class="kw">fn</span> <span class="ident">fmt</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">f</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">fmt</span>::<span class="ident">Formatter</span>) <span class="op">-></span> <span class="ident">fmt</span>::<span class="prelude-ty">Result</span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">fmtd</span> <span class="op">=</span> <span class="ident">f</span>.<span class="ident">debug_map</span>(); |
| <span class="kw">for</span> <span class="ident">si</span> <span class="kw">in</span> <span class="number">0</span>..<span class="self">self</span>.<span class="ident">num_states</span>() { |
| <span class="kw">let</span> <span class="ident">s</span> <span class="op">=</span> <span class="ident">si</span> <span class="op">*</span> <span class="self">self</span>.<span class="ident">num_byte_classes</span>; |
| <span class="kw">let</span> <span class="ident">e</span> <span class="op">=</span> <span class="ident">s</span> <span class="op">+</span> <span class="self">self</span>.<span class="ident">num_byte_classes</span>; |
| <span class="ident">fmtd</span>.<span class="ident">entry</span>(<span class="kw-2">&</span><span class="ident">si</span>.<span class="ident">to_string</span>(), <span class="kw-2">&</span><span class="ident">TransitionsRow</span>(<span class="kw-2">&</span><span class="self">self</span>.<span class="ident">table</span>[<span class="ident">s</span>..<span class="ident">e</span>])); |
| } |
| <span class="ident">fmtd</span>.<span class="ident">finish</span>() |
| } |
| } |
| |
| <span class="kw">struct</span> <span class="ident">TransitionsRow</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span>(<span class="kw-2">&</span><span class="lifetime">'a</span> [<span class="ident">StatePtr</span>]); |
| |
| <span class="kw">impl</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> <span class="ident">fmt</span>::<span class="ident">Debug</span> <span class="kw">for</span> <span class="ident">TransitionsRow</span><span class="op"><</span><span class="lifetime">'a</span><span class="op">></span> { |
| <span class="kw">fn</span> <span class="ident">fmt</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">f</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">fmt</span>::<span class="ident">Formatter</span>) <span class="op">-></span> <span class="ident">fmt</span>::<span class="prelude-ty">Result</span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">fmtd</span> <span class="op">=</span> <span class="ident">f</span>.<span class="ident">debug_map</span>(); |
| <span class="kw">for</span> (<span class="ident">b</span>, <span class="ident">si</span>) <span class="kw">in</span> <span class="self">self</span>.<span class="number">0</span>.<span class="ident">iter</span>().<span class="ident">enumerate</span>() { |
| <span class="kw">match</span> <span class="kw-2">*</span><span class="ident">si</span> { |
| <span class="ident">STATE_UNKNOWN</span> <span class="op">=></span> {} |
| <span class="ident">STATE_DEAD</span> <span class="op">=></span> { |
| <span class="ident">fmtd</span>.<span class="ident">entry</span>(<span class="kw-2">&</span><span class="ident">vb</span>(<span class="ident">b</span> <span class="kw">as</span> <span class="ident">usize</span>), <span class="kw-2">&</span><span class="string">"DEAD"</span>); |
| } |
| <span class="ident">si</span> <span class="op">=></span> { |
| <span class="ident">fmtd</span>.<span class="ident">entry</span>(<span class="kw-2">&</span><span class="ident">vb</span>(<span class="ident">b</span> <span class="kw">as</span> <span class="ident">usize</span>), <span class="kw-2">&</span><span class="ident">si</span>.<span class="ident">to_string</span>()); |
| } |
| } |
| } |
| <span class="ident">fmtd</span>.<span class="ident">finish</span>() |
| } |
| } |
| |
| <span class="kw">impl</span> <span class="ident">fmt</span>::<span class="ident">Debug</span> <span class="kw">for</span> <span class="ident">StateFlags</span> { |
| <span class="kw">fn</span> <span class="ident">fmt</span>(<span class="kw-2">&</span><span class="self">self</span>, <span class="ident">f</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">fmt</span>::<span class="ident">Formatter</span>) <span class="op">-></span> <span class="ident">fmt</span>::<span class="prelude-ty">Result</span> { |
| <span class="ident">f</span>.<span class="ident">debug_struct</span>(<span class="string">"StateFlags"</span>) |
| .<span class="ident">field</span>(<span class="string">"is_match"</span>, <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">is_match</span>()) |
| .<span class="ident">field</span>(<span class="string">"is_word"</span>, <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">is_word</span>()) |
| .<span class="ident">field</span>(<span class="string">"has_empty"</span>, <span class="kw-2">&</span><span class="self">self</span>.<span class="ident">has_empty</span>()) |
| .<span class="ident">finish</span>() |
| } |
| } |
| |
| <span class="doccomment">/// Helper function for formatting a byte as a nice-to-read escaped string.</span> |
| <span class="kw">fn</span> <span class="ident">vb</span>(<span class="ident">b</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="ident">String</span> { |
| <span class="kw">use</span> <span class="ident">std</span>::<span class="ident">ascii</span>::<span class="ident">escape_default</span>; |
| |
| <span class="kw">if</span> <span class="ident">b</span> <span class="op">></span> ::<span class="ident">std</span>::<span class="ident">u8</span>::<span class="ident">MAX</span> <span class="kw">as</span> <span class="ident">usize</span> { |
| <span class="string">"EOF"</span>.<span class="ident">to_owned</span>() |
| } <span class="kw">else</span> { |
| <span class="kw">let</span> <span class="ident">escaped</span> <span class="op">=</span> <span class="ident">escape_default</span>(<span class="ident">b</span> <span class="kw">as</span> <span class="ident">u8</span>).<span class="ident">collect</span>::<span class="op"><</span><span class="ident">Vec</span><span class="op"><</span><span class="ident">u8</span><span class="op">>></span>(); |
| <span class="ident">String</span>::<span class="ident">from_utf8_lossy</span>(<span class="kw-2">&</span><span class="ident">escaped</span>).<span class="ident">into_owned</span>() |
| } |
| } |
| |
| <span class="doccomment">/// Converts `n` to a u32, panicking if it does not fit.</span> |
| <span class="kw">fn</span> <span class="ident">usize_to_u32</span>(<span class="ident">n</span>: <span class="ident">usize</span>) <span class="op">-></span> <span class="ident">u32</span> { |
| <span class="kw">if</span> (<span class="ident">n</span> <span class="kw">as</span> <span class="ident">u64</span>) <span class="op">></span> (::<span class="ident">std</span>::<span class="ident">u32</span>::<span class="ident">MAX</span> <span class="kw">as</span> <span class="ident">u64</span>) { |
| <span class="macro">panic</span><span class="macro">!</span>(<span class="string">"BUG: {} is too big to fit into u32"</span>, <span class="ident">n</span>) |
| } |
| <span class="ident">n</span> <span class="kw">as</span> <span class="ident">u32</span> |
| } |
| |
| <span class="attribute">#[<span class="ident">allow</span>(<span class="ident">dead_code</span>)]</span> <span class="comment">// useful for debugging</span> |
| <span class="kw">fn</span> <span class="ident">show_state_ptr</span>(<span class="ident">si</span>: <span class="ident">StatePtr</span>) <span class="op">-></span> <span class="ident">String</span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">s</span> <span class="op">=</span> <span class="macro">format</span><span class="macro">!</span>(<span class="string">"{:?}"</span>, <span class="ident">si</span> <span class="op">&</span> <span class="ident">STATE_MAX</span>); |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">==</span> <span class="ident">STATE_UNKNOWN</span> { |
| <span class="ident">s</span> <span class="op">=</span> <span class="macro">format</span><span class="macro">!</span>(<span class="string">"{} (unknown)"</span>, <span class="ident">s</span>); |
| } |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">==</span> <span class="ident">STATE_DEAD</span> { |
| <span class="ident">s</span> <span class="op">=</span> <span class="macro">format</span><span class="macro">!</span>(<span class="string">"{} (dead)"</span>, <span class="ident">s</span>); |
| } |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">==</span> <span class="ident">STATE_QUIT</span> { |
| <span class="ident">s</span> <span class="op">=</span> <span class="macro">format</span><span class="macro">!</span>(<span class="string">"{} (quit)"</span>, <span class="ident">s</span>); |
| } |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">&</span> <span class="ident">STATE_START</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">s</span> <span class="op">=</span> <span class="macro">format</span><span class="macro">!</span>(<span class="string">"{} (start)"</span>, <span class="ident">s</span>); |
| } |
| <span class="kw">if</span> <span class="ident">si</span> <span class="op">&</span> <span class="ident">STATE_MATCH</span> <span class="op">></span> <span class="number">0</span> { |
| <span class="ident">s</span> <span class="op">=</span> <span class="macro">format</span><span class="macro">!</span>(<span class="string">"{} (match)"</span>, <span class="ident">s</span>); |
| } |
| <span class="ident">s</span> |
| } |
| |
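| <span class="doccomment">/// Appends `n` to `data` as a zigzag-encoded varint, so that values of</span> |
| <span class="doccomment">/// small magnitude, negative or not, take few bytes.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// https://developers.google.com/protocol-buffers/docs/encoding#varints</span> |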
| <span class="kw">fn</span> <span class="ident">write_vari32</span>(<span class="ident">data</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">Vec</span><span class="op"><</span><span class="ident">u8</span><span class="op">></span>, <span class="ident">n</span>: <span class="ident">i32</span>) { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">un</span> <span class="op">=</span> (<span class="ident">n</span> <span class="kw">as</span> <span class="ident">u32</span>) <span class="op"><<</span> <span class="number">1</span>; |
| <span class="kw">if</span> <span class="ident">n</span> <span class="op"><</span> <span class="number">0</span> { |
| <span class="ident">un</span> <span class="op">=</span> <span class="op">!</span><span class="ident">un</span>; |
| } |
| <span class="ident">write_varu32</span>(<span class="ident">data</span>, <span class="ident">un</span>) |
| } |
| |
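| <span class="doccomment">/// Reads a zigzag-encoded varint from the start of `data`. Returns the</span> |
| <span class="doccomment">/// decoded value and the number of bytes read.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// https://developers.google.com/protocol-buffers/docs/encoding#varints</span> |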
| <span class="kw">fn</span> <span class="ident">read_vari32</span>(<span class="ident">data</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>]) <span class="op">-></span> (<span class="ident">i32</span>, <span class="ident">usize</span>) { |
| <span class="kw">let</span> (<span class="ident">un</span>, <span class="ident">i</span>) <span class="op">=</span> <span class="ident">read_varu32</span>(<span class="ident">data</span>); |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">n</span> <span class="op">=</span> (<span class="ident">un</span> <span class="op">>></span> <span class="number">1</span>) <span class="kw">as</span> <span class="ident">i32</span>; |
| <span class="kw">if</span> <span class="ident">un</span> <span class="op">&</span> <span class="number">1</span> <span class="op">!=</span> <span class="number">0</span> { |
| <span class="ident">n</span> <span class="op">=</span> <span class="op">!</span><span class="ident">n</span>; |
| } |
| (<span class="ident">n</span>, <span class="ident">i</span>) |
| } |
| |
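| <span class="doccomment">/// Appends `n` to `data` as a varint: seven bits per byte, least</span> |
| <span class="doccomment">/// significant group first, with the high bit set on every byte except the</span> |
| <span class="doccomment">/// last.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// https://developers.google.com/protocol-buffers/docs/encoding#varints</span> |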
| <span class="kw">fn</span> <span class="ident">write_varu32</span>(<span class="ident">data</span>: <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">Vec</span><span class="op"><</span><span class="ident">u8</span><span class="op">></span>, <span class="kw-2">mut</span> <span class="ident">n</span>: <span class="ident">u32</span>) { |
| <span class="kw">while</span> <span class="ident">n</span> <span class="op">>=</span> <span class="number">0b1000_0000</span> { |
| <span class="ident">data</span>.<span class="ident">push</span>((<span class="ident">n</span> <span class="kw">as</span> <span class="ident">u8</span>) <span class="op">|</span> <span class="number">0b1000_0000</span>); |
| <span class="ident">n</span> <span class="op">>>=</span> <span class="number">7</span>; |
| } |
| <span class="ident">data</span>.<span class="ident">push</span>(<span class="ident">n</span> <span class="kw">as</span> <span class="ident">u8</span>); |
| } |
| |
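| <span class="doccomment">/// Reads a varint from the start of `data`. Returns the decoded value and</span> |
| <span class="doccomment">/// the number of bytes read, or `(0, 0)` if `data` ends before a byte with</span> |
| <span class="doccomment">/// the high bit clear is found.</span> |
| <span class="doccomment">///</span> |
| <span class="doccomment">/// https://developers.google.com/protocol-buffers/docs/encoding#varints</span> |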
| <span class="kw">fn</span> <span class="ident">read_varu32</span>(<span class="ident">data</span>: <span class="kw-2">&</span>[<span class="ident">u8</span>]) <span class="op">-></span> (<span class="ident">u32</span>, <span class="ident">usize</span>) { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">n</span>: <span class="ident">u32</span> <span class="op">=</span> <span class="number">0</span>; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">shift</span>: <span class="ident">u32</span> <span class="op">=</span> <span class="number">0</span>; |
| <span class="kw">for</span> (<span class="ident">i</span>, <span class="kw-2">&</span><span class="ident">b</span>) <span class="kw">in</span> <span class="ident">data</span>.<span class="ident">iter</span>().<span class="ident">enumerate</span>() { |
| <span class="kw">if</span> <span class="ident">b</span> <span class="op"><</span> <span class="number">0b1000_0000</span> { |
| <span class="kw">return</span> (<span class="ident">n</span> <span class="op">|</span> ((<span class="ident">b</span> <span class="kw">as</span> <span class="ident">u32</span>) <span class="op"><<</span> <span class="ident">shift</span>), <span class="ident">i</span> <span class="op">+</span> <span class="number">1</span>); |
| } |
| <span class="ident">n</span> <span class="op">|=</span> ((<span class="ident">b</span> <span class="kw">as</span> <span class="ident">u32</span>) <span class="op">&</span> <span class="number">0b0111_1111</span>) <span class="op"><<</span> <span class="ident">shift</span>; |
| <span class="ident">shift</span> <span class="op">+=</span> <span class="number">7</span>; |
| } |
| (<span class="number">0</span>, <span class="number">0</span>) |
| } |
| |
| <span class="attribute">#[<span class="ident">cfg</span>(<span class="ident">test</span>)]</span> |
| <span class="kw">mod</span> <span class="ident">tests</span> { |
| <span class="kw">extern</span> <span class="kw">crate</span> <span class="ident">rand</span>; |
| |
| <span class="kw">use</span> <span class="ident">quickcheck</span>::{<span class="ident">QuickCheck</span>, <span class="ident">StdGen</span>, <span class="ident">quickcheck</span>}; |
| <span class="kw">use</span> <span class="kw">super</span>::{ |
| <span class="ident">StateFlags</span>, <span class="ident">State</span>, <span class="ident">push_inst_ptr</span>, |
| <span class="ident">write_varu32</span>, <span class="ident">read_varu32</span>, <span class="ident">write_vari32</span>, <span class="ident">read_vari32</span>, |
| }; |
| |
| <span class="attribute">#[<span class="ident">test</span>]</span> |
| <span class="kw">fn</span> <span class="ident">prop_state_encode_decode</span>() { |
| <span class="kw">fn</span> <span class="ident">p</span>(<span class="ident">ips</span>: <span class="ident">Vec</span><span class="op"><</span><span class="ident">u32</span><span class="op">></span>, <span class="ident">flags</span>: <span class="ident">u8</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">data</span> <span class="op">=</span> <span class="macro">vec</span><span class="macro">!</span>[<span class="ident">flags</span>]; |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">prev</span> <span class="op">=</span> <span class="number">0</span>; |
| <span class="kw">for</span> <span class="kw-2">&</span><span class="ident">ip</span> <span class="kw">in</span> <span class="ident">ips</span>.<span class="ident">iter</span>() { |
| <span class="ident">push_inst_ptr</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">data</span>, <span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">prev</span>, <span class="ident">ip</span>); |
| } |
| <span class="kw">let</span> <span class="ident">state</span> <span class="op">=</span> <span class="ident">State</span> { <span class="ident">data</span>: <span class="ident">data</span>.<span class="ident">into_boxed_slice</span>() }; |
| |
| <span class="kw">let</span> <span class="ident">expected</span>: <span class="ident">Vec</span><span class="op">&lt;</span><span class="ident">usize</span><span class="op">&gt;</span> <span class="op">=</span> |
| <span class="ident">ips</span>.<span class="ident">into_iter</span>().<span class="ident">map</span>(<span class="op">|</span><span class="ident">ip</span><span class="op">|</span> <span class="ident">ip</span> <span class="kw">as</span> <span class="ident">usize</span>).<span class="ident">collect</span>(); |
| <span class="kw">let</span> <span class="ident">got</span>: <span class="ident">Vec</span><span class="op">&lt;</span><span class="ident">usize</span><span class="op">&gt;</span> <span class="op">=</span> <span class="ident">state</span>.<span class="ident">inst_ptrs</span>().<span class="ident">collect</span>(); |
| <span class="ident">expected</span> <span class="op">==</span> <span class="ident">got</span> <span class="op">&&</span> <span class="ident">state</span>.<span class="ident">flags</span>() <span class="op">==</span> <span class="ident">StateFlags</span>(<span class="ident">flags</span>) |
| } |
| <span class="ident">QuickCheck</span>::<span class="ident">new</span>() |
| .<span class="ident">gen</span>(<span class="ident">StdGen</span>::<span class="ident">new</span>(<span class="self">self</span>::<span class="ident">rand</span>::<span class="ident">thread_rng</span>(), <span class="number">10_000</span>)) |
| .<span class="ident">quickcheck</span>(<span class="ident">p</span> <span class="kw">as</span> <span class="kw">fn</span>(<span class="ident">Vec</span><span class="op">&lt;</span><span class="ident">u32</span><span class="op">&gt;</span>, <span class="ident">u8</span>) <span class="op">-&gt;</span> <span class="ident">bool</span>); |
| } |
| |
| <span class="attribute">#[<span class="ident">test</span>]</span> |
| <span class="kw">fn</span> <span class="ident">prop_read_write_u32</span>() { |
| <span class="kw">fn</span> <span class="ident">p</span>(<span class="ident">n</span>: <span class="ident">u32</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">buf</span> <span class="op">=</span> <span class="macro">vec</span><span class="macro">!</span>[]; |
| <span class="ident">write_varu32</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">buf</span>, <span class="ident">n</span>); |
| <span class="kw">let</span> (<span class="ident">got</span>, <span class="ident">nread</span>) <span class="op">=</span> <span class="ident">read_varu32</span>(<span class="kw-2">&</span><span class="ident">buf</span>); |
| <span class="ident">nread</span> <span class="op">==</span> <span class="ident">buf</span>.<span class="ident">len</span>() <span class="op">&&</span> <span class="ident">got</span> <span class="op">==</span> <span class="ident">n</span> |
| } |
| <span class="ident">quickcheck</span>(<span class="ident">p</span> <span class="kw">as</span> <span class="kw">fn</span>(<span class="ident">u32</span>) <span class="op">-></span> <span class="ident">bool</span>); |
| } |
| |
| <span class="attribute">#[<span class="ident">test</span>]</span> |
| <span class="kw">fn</span> <span class="ident">prop_read_write_i32</span>() { |
| <span class="kw">fn</span> <span class="ident">p</span>(<span class="ident">n</span>: <span class="ident">i32</span>) <span class="op">-></span> <span class="ident">bool</span> { |
| <span class="kw">let</span> <span class="kw-2">mut</span> <span class="ident">buf</span> <span class="op">=</span> <span class="macro">vec</span><span class="macro">!</span>[]; |
| <span class="ident">write_vari32</span>(<span class="kw-2">&</span><span class="kw-2">mut</span> <span class="ident">buf</span>, <span class="ident">n</span>); |
| <span class="kw">let</span> (<span class="ident">got</span>, <span class="ident">nread</span>) <span class="op">=</span> <span class="ident">read_vari32</span>(<span class="kw-2">&</span><span class="ident">buf</span>); |
| <span class="ident">nread</span> <span class="op">==</span> <span class="ident">buf</span>.<span class="ident">len</span>() <span class="op">&&</span> <span class="ident">got</span> <span class="op">==</span> <span class="ident">n</span> |
| } |
| <span class="ident">quickcheck</span>(<span class="ident">p</span> <span class="kw">as</span> <span class="kw">fn</span>(<span class="ident">i32</span>) <span class="op">-></span> <span class="ident">bool</span>); |
| } |
| } |
| </pre> |
| </section> |
| <section id='search' class="content hidden"></section> |
| |
| <section class="footer"></section> |
| |
| <aside id="help" class="hidden"> |
| <div> |
| <h1 class="hidden">Help</h1> |
| |
| <div class="shortcuts"> |
| <h2>Keyboard Shortcuts</h2> |
| |
| <dl> |
| <dt>?</dt> |
| <dd>Show this help dialog</dd> |
| <dt>S</dt> |
| <dd>Focus the search field</dd> |
| <dt>⇤</dt> |
| <dd>Move up in search results</dd> |
| <dt>⇥</dt> |
| <dd>Move down in search results</dd> |
| <dt>⏎</dt> |
| <dd>Go to active search result</dd> |
| <dt>+</dt> |
| <dd>Collapse/expand all sections</dd> |
| </dl> |
| </div> |
| |
| <div class="infos"> |
| <h2>Search Tricks</h2> |
| |
| <p> |
| Prefix searches with a type followed by a colon (e.g. |
| <code>fn:</code>) to restrict the search to a given type. |
| </p> |
| |
| <p> |
| Accepted types are: <code>fn</code>, <code>mod</code>, |
| <code>struct</code>, <code>enum</code>, |
| <code>trait</code>, <code>type</code>, <code>macro</code>, |
| and <code>const</code>. |
| </p> |
| |
| <p> |
| Search functions by type signature (e.g. |
| <code>vec -> usize</code> or <code>* -> vec</code>). |
| </p> |
| </div> |
| </div> |
| </aside> |
| |
| |
| |
| <script> |
| window.rootPath = "../../"; |
| window.currentCrate = "regex"; |
| </script> |
| <script src="../../main.js"></script> |
| <script defer src="../../search-index.js"></script> |
| </body> |
| </html> |