| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> |
| <meta http-equiv="Content-Style-Type" content="text/css"/> |
| <link rel="stylesheet" href="../book.css" charset="ISO-8859-1" type="text/css"/> |
| <title>R Help - Search</title> |
| <meta name="copyright" content="Copyright (c) 2009, 2021 The Apache Software Foundation, Stephan Wahlbrink and others. SPDX-License-Identifier: Apache-2.0"/> |
| </head> |
| <body> |
| |
| <h1>R Help - Search</h1> |
| |
| <p>The R help integration in StatET includes an advanced, fast search engine for the help pages of |
| the R packages installed in an R environment.</p> |
| <p>The handling of the R help search is similar to the workflow of most other searches in Eclipse. |
| A search is entered in the 'R Help' tab in the Search dialog, which is accessible for example via |
| the button in the R help view or, in most perspectives, the 'Search' dropdown button in the |
| toolbar. The search results are shown structured and sortable in a new page of the |
| <a href="PLUGINS_ROOT/org.eclipse.platform.doc.user/concepts/csearchview.htm">Search view</a>. |
| Then one can open selected pages in the R Help view.</p> |
| |
| <p>To enable the search functionality, the R environment must be |
| <a href="r_env-index.xhtml">indexed</a>.</p> |
| |
| |
| <h2>Standard Search</h2> |
| |
| <img src="../images/screenshot_r_help_search_input.png" title="Search dialog, R help" |
| alt="Screenshot: Search dialog, R help"/> |
| |
| <p>The R help search page provides three standard modes to search for help pages:</p> |
| <ul> |
| <li><b>Topics/Alias (strict)</b>: The text must match exactly a topic of the page. |
| This is comparable to a simple call of <code>help("searchtopic")</code> in R.</li> |
| <li><b>Selected Fields</b>: The text is searched with tolerance in the selected page |
| field(s).</li> |
| <li><b>Complete Document</b>: This is a classic full text search, the text is searched with |
| tolerance in the complete text of the help page.</li> |
| </ul> |
| <p>The matching with tolerance is comparable with the known internet search engines. That means |
| most flections of words and punctuation mark are ignored.</p> |
| |
| <p>Additionally, the search can be filtered by selected <b>packages</b> or an R help |
| <b>keyword</b>. |
| If the search text is empty, this options allow to show all help pages of the selected packages or |
| tagged with the specified keyword. |
| </p> |
| |
| |
| <h2 id="query">Advanced Query Syntax</h2> |
| |
| <p>The R help search is powered by the Apache Lucene library, which also allows several options to |
| perform advanced customized searches. This section describes the syntax for queries typed in the |
| search text field. |
| </p> |
| |
| <h3>Terms</h3> |
| <p>A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases.</p> |
| <p>A Single Term is a single word such as "test" or "hello". |
| </p> |
| <p>A Phrase is a group of words surrounded by double quotes such as "hello dolly". |
| </p> |
| <p>Multiple terms can be combined together with <a href="#querysyntax-boolean_operators">Boolean |
| operators</a> to form a more complex query. The AND operator is the default conjunction operator. |
| This means that if there is no Boolean operator between two terms, a help document have to match |
| both terms. |
| </p> |
| |
| <h3>Fields</h3> |
| <p>Lucene supports fielded data. When performing a search you can either specify a field, or use the default field(s). |
| For field names and the default field(s) for R help search see <a href="#fields">table below</a>.</p> |
| <p>You can search any field by typing the field name followed by a colon ":" and then the term you are looking for.</p> |
| <p>As an example, let's assume a Lucene index contains two fields, title and text and text is the default field. |
| If you want to find the document entitled "The Right Way" which contains the text "don't go this way", you can enter:</p> |
| <pre class="code">title:"The Right Way" AND text:go</pre> |
| <p>or</p> |
| <pre class="code">title:"The Right Way" AND go</pre> |
| <p>Since text is the default field, the field indicator is not required.</p> |
| |
| <h4 id="fields">Fields for R Help Pages</h4> |
| <table class="style1"> |
| <tr><td><code>pkg</code></td> |
| <td>the exact name of the package</td></tr> |
| <tr><td><code>page</code></td> |
| <td>the exact name of the page</td></tr> |
| <tr><td><code>alias</code></td> |
| <td>an exact alias/topic of the page (1)</td></tr> |
| |
| <tr><td><code>alias.txt</code></td> |
| <td>an alias/topic of the page (2)</td></tr> |
| <tr><td><code>title.txt</code></td> |
| <td>the title of the page (2, 3)</td></tr> |
| <tr><td><code>concept.txt</code></td> |
| <td>a concept of the page (2)</td></tr> |
| <tr><td><code>keyword</code></td> |
| <td>a keyword of the page</td></tr> |
| <tr><td><code>descr.txt</code></td> |
| <td>the description section of the page (3)</td></tr> |
| <tr><td><code>doc.txt</code></td> |
| <td>the main content of the page except the title, description and examples section (3)</td></tr> |
| <tr><td><code>examples.txt</code></td> |
| <td>the examples section of the page (3)</td></tr> |
| </table> |
| <p>Default field(s) depends on the selected R help search mode: |
| (1) = "Topics/Alias (strict)", |
| (2) = "Selected Field" (only the selected fields) and |
| (3) = "Complete Document". |
| </p> |
| |
| <h3>Term Modifiers</h3> |
| <p>Lucene supports modifying query terms to provide a wide range of searching options.</p> |
| |
| <h4 id="querysyntax-wildcard_searches">Wildcard Searches</h4> |
| <p>Lucene supports single and multiple character wildcard searches within single terms |
| (not within phrase queries).</p> |
| <p>To perform a single character wildcard search use the "?" symbol.</p> |
| <p>To perform a multiple character wildcard search use the "*" symbol.</p> |
| <p>The single character wildcard search looks for terms that match that with the single character replaced. For example, to search for "text" or "test" you can use the search:</p> |
| <pre class="code">te?t</pre> |
| <p>Multiple character wildcard searches looks for 0 or more characters. For example, to search for test, tests or tester, you can use the search:</p> |
| <pre class="code">test*</pre> |
| <p>You can also use the wildcard searches in the middle of a term.</p> |
| <pre class="code">te*t</pre> |
| <p>Note: You cannot use a * or ? symbol as the first character of a search.</p> |
| |
| <h4 id="querysyntax-regexp_searches">Regular Expression Searches</h4> |
| <p>Lucene supports regular expression searches matching a pattern between forward slashes "/". |
| For example to find documents containing "moat" or "boat":</p> |
| <pre class="code">/[mb]oat/</pre> |
| |
| <h4 id="querysyntax-fuzzy_searches">Fuzzy Searches</h4> |
| <p>Lucene supports fuzzy searches based on Damerau-Levenshtein Distance. To do a fuzzy search use the tilde, "~", symbol at the end of a Single word Term. For example to search for a term similar in spelling to "roam" use the fuzzy search:</p> |
| <pre class="code">roam~</pre> |
| <p>This search will find terms like foam and roams.</p> |
| <p>An additional (optional) parameter can specify the maximum number of edits allowed. The value is between 0 and 2, For example:</p> |
| <pre class="code">roam~1</pre> |
| <p>The default that is used if the parameter is not given is 2 edit distances.</p> |
| |
| <h4 id="querysyntax-proximity_searches">Proximity Searches</h4> |
| <p>Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for a "apache" and "jakarta" within 10 words of each other in a document use the search:</p> |
| <pre class="code">"jakarta apache"~10</pre> |
| |
| <h4 id="querysyntax-range_searches">Range Searches</h4> |
| <p>Range Queries allow one to match documents whose field(s) values |
| are between the lower and upper bound specified by the Range Query. |
| Range Queries can be inclusive or exclusive of the upper and lower bounds. |
| Sorting is done lexicographically.</p> |
| <pre class="code">mod_date:[20020101 TO 20030101]</pre> |
| <p>This will find documents whose mod_date fields have values between 20020101 and 20030101, inclusive. |
| Note that Range Queries are not reserved for date fields. You could also use range queries with non-date fields:</p> |
| <pre class="code">title:{Aida TO Carmen}</pre> |
| <p>This will find all documents whose titles are between Aida and Carmen, but not including Aida and Carmen.</p> |
| <p>Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by |
| curly brackets.</p> |
| |
| <h4 id="querysyntax-boosting">Boosting a Term</h4> |
| <p>Lucene provides the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.</p> |
| <p>Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for</p> |
| <pre class="code">jakarta apache</pre> |
| <p>and you want the term "jakarta" to be more relevant boost it using the ^ symbol along with the boost factor next to the term. |
| You would type:</p> |
| <pre class="code">jakarta^4 apache</pre> |
| <p>This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:</p> |
| <pre class="code">"jakarta apache"^4 "Apache Lucene"</pre> |
| <p>By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).</p> |
| |
| <h3 id="querysyntax-boolean_operators">Boolean Operators</h3> |
| <p>Boolean operators allow terms to be combined through logic operators. |
| Lucene supports AND, "+", OR, NOT and "-" as Boolean operators (Note: Boolean operators must be ALL CAPS).</p> |
| |
| <h4>OR</h4> |
| <p>The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets. |
| The symbol || can be used in place of the word OR.</p> |
| <p>To search for documents that contain either "jakarta apache" or just "jakarta" use the query:</p> |
| <pre class="code">"jakarta apache" jakarta</pre> |
| <p>or</p> |
| <pre class="code">"jakarta apache" OR jakarta</pre> |
| |
| <h4>AND</h4> |
| <p>The AND operator matches documents where both terms exist anywhere in the text of a single document. |
| This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.</p> |
| <p>To search for documents that contain "jakarta apache" and "Apache Lucene" use the query:</p> |
| <pre class="code">"jakarta apache" AND "Apache Lucene"</pre> |
| |
| <h4>+</h4> |
| <p>The "+" or required operator requires that the term after the "+" symbol exist somewhere in a the field of a single document.</p> |
| <p>To search for documents that must contain "jakarta" and may contain "lucene" use the query:</p> |
| <pre class="code">+jakarta lucene</pre> |
| |
| <h4>NOT</h4> |
| <p>The NOT operator excludes documents that contain the term after NOT. |
| This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.</p> |
| <p>To search for documents that contain "jakarta apache" but not "Apache Lucene" use the query:</p> |
| <pre class="code">"jakarta apache" NOT "Apache Lucene"</pre> |
| <p>Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:</p> |
| <pre class="code">NOT "jakarta apache"</pre> |
| |
| <h4>-</h4> |
| <p>The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.</p> |
| <p>To search for documents that contain "jakarta apache" but not "Apache Lucene" use the query:</p> |
| <pre class="code">"jakarta apache" -"Apache Lucene"</pre> |
| |
| <h3>Grouping</h3> |
| <p>Lucene supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.</p> |
| <p>To search for either "jakarta" or "apache" and "website" use the query:</p> |
| <pre class="code">(jakarta OR apache) AND website</pre> |
| <p>This eliminates any confusion and makes sure you that website must exist and either term jakarta or apache may exist.</p> |
| |
| <h3>Field Grouping</h3> |
| <p>Lucene supports using parentheses to group multiple clauses to a single field.</p> |
| <p>To search for a title that contains both the word "return" and the phrase "pink panther" use the query:</p> |
| <pre class="code">title:(+return +"pink panther")</pre> |
| |
| <h3>Escaping Special Characters</h3> |
| <p>Lucene supports escaping special characters that are part of the query syntax. The current list special characters are</p> |
| <p>+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /</p> |
| <p>To escape these character use the \ before the character. For example to search for (1+1):2 use the query:</p> |
| <pre class="code">\(1\+1\)\:2</pre> |
| |
| </body> |
| </html> |