<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.79 [en] (Windows NT 5.0; U) [Netscape]">
<title>Help System extension points: Lucene Analyzer</title>
</head>
<body link="#0000FF" vlink="#800080">
<center>
<h1>
Lucene Analyzer</h1></center>
<b><i>Identifier: </i></b>org.eclipse.help.luceneAnalyzer
<p><b><i>Description: </i></b>This extension point is used to register
text analyzers for use by the help system when indexing and searching documentation.
<p>Help exploits the capabilities of the Lucene search engine, which
indexes token streams (streams of words).&nbsp; Analyzers create tokens
from the character stream: they examine text content and provide
tokens for use with the index.&nbsp; The text stream can be tokenized in
many ways.&nbsp; A trivial analyzer can tokenize streams at white
space; a different one can also filter tokens, based on the application's
needs.&nbsp; Since the documentation is mostly human-readable text,
analyzers used by the help system should perform language-aware and
grammar-aware tokenization and normalization of indexed text.&nbsp; For
some languages, the quality of search increases significantly if stop word
removal and stemming are performed on the indexed text.&nbsp; This extension
point allows configuring analyzers for languages for which the default help
system does not provide language-aware analyzers.
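<p>The kind of processing described above can be illustrated with a minimal
sketch.&nbsp; It deliberately does not use the Lucene API: the class name
<tt>AnalysisSketch</tt>, its methods, and the short stop word list are all
hypothetical, chosen only to show white space tokenization followed by
lowercase and stop word filtering.&nbsp; Stemming is not shown.
<pre>
```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; it does not use the Lucene API.
final class AnalysisSketch {
    // Tiny illustrative stop word list (an assumption, not the help system's list).
    static final String[] STOP_WORDS = {"a", "an", "the", "of", "is"};

    static boolean isStopWord(String token) {
        for (String stop : STOP_WORDS) {
            if (stop.equals(token)) {
                return true;
            }
        }
        return false;
    }

    // Splits at white space, lowercases each token, and drops stop words.
    static String[] analyze(String text) {
        String[] raw = text.trim().split("\\s+");
        String[] kept = new String[raw.length];
        int count = 0;
        for (String word : raw) {
            String token = word.toLowerCase(Locale.ENGLISH);
            if (token.isEmpty() || isStopWord(token)) {
                continue;
            }
            kept[count++] = token;
        }
        return Arrays.copyOf(kept, count);
    }

    public static void main(String[] args) {
        // Prints [index, help, system]
        System.out.println(Arrays.toString(analyze("The Index of the Help System")));
    }
}
```
</pre>
<p>A real analyzer contributed to this extension point would express the
same steps as a subclass of <tt>org.apache.lucene.analysis.Analyzer</tt>
that produces a Lucene token stream.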
<p><b><i>Configuration Markup:</i></b>
<p><tt>&nbsp;&nbsp; &lt;!ELEMENT analyzer EMPTY></tt>
<br><tt>&nbsp;&nbsp; &lt;!ATTLIST analyzer</tt>
<br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; locale&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
CDATA #REQUIRED</tt>
<br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; class&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
CDATA #REQUIRED</tt>
<br><tt>&nbsp;&nbsp; ></tt>
<ul>
<li>
<b>locale</b> - a string identifying the locale for which the supplied analyzer
is to be used; if only a two-letter language is provided, the analyzer will be
available to all locales of that language</li>
<li>
<b>class</b> - the fully qualified name of a Java class that extends <tt>org.apache.lucene.analysis.Analyzer</tt></li>
</ul>
<b><i>Examples:</i></b>
<p>The following is an example of a Lucene Analyzer configuration:
<p><tt>&nbsp;&nbsp;&nbsp; &lt;extension id="com.xyz.XYZ" point="org.eclipse.help.luceneAnalyzer"></tt>
<br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;analyzer locale="ll_CC"
class="com.xyz.ll_CCAnalyzer" /></tt>
<br><tt>&nbsp;&nbsp;&nbsp; &lt;/extension></tt>
<p><b><i>API Information:</i></b>
<p>The value of the <tt>locale</tt> attribute must be either a five-character
locale string (language and country, as in <tt>ll_CC</tt>) or a two-character
language string.&nbsp; If an analyzer is configured for a language
by specifying a two-letter language designation, it is
used for all locales of that language.&nbsp; If an analyzer is also configured
for a five-character locale, it is used instead for that locale.
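<p>The lookup rule described above can be sketched as follows.&nbsp; This
is an illustration under stated assumptions, not the help system's actual
code: the class <tt>AnalyzerLookupSketch</tt>, the <tt>resolve</tt> method,
the registered class names, and the use of a plain string for the default
analyzer are all hypothetical.
<pre>
```java
// Hypothetical illustration of the locale matching rule; not the help system's code.
final class AnalyzerLookupSketch {
    // Each row of 'registered' holds { locale attribute, class attribute }.
    // An exact locale match wins; otherwise the two-letter language entry is
    // used; otherwise the caller-supplied fallback is returned.
    static String resolve(String[][] registered, String locale, String fallback) {
        String byLanguage = null;
        for (String[] entry : registered) {
            if (entry[0].equals(locale)) {
                return entry[1];
            }
            if (locale.length() > 2) {
                if (entry[0].equals(locale.substring(0, 2))) {
                    byLanguage = entry[1];
                }
            }
        }
        return byLanguage != null ? byLanguage : fallback;
    }

    public static void main(String[] args) {
        // Hypothetical registrations: one per language, one per specific locale.
        String[][] registered = {
            {"pt", "com.xyz.PtAnalyzer"},
            {"pt_BR", "com.xyz.PtBrAnalyzer"}
        };
        System.out.println(resolve(registered, "pt_BR", "default")); // com.xyz.PtBrAnalyzer
        System.out.println(resolve(registered, "pt_PT", "default")); // com.xyz.PtAnalyzer
        System.out.println(resolve(registered, "fr_FR", "default")); // default
    }
}
```
</pre>
<p>In this sketch the fallback value stands in for whatever analyzer help
uses when no analyzer is configured for the language.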
<p>The value of the <tt>class</tt> attribute must represent a class that
extends <tt>org.apache.lucene.analysis.Analyzer</tt>.&nbsp; It is recommended
that this analyzer perform lowercase filtering for languages where making
the search case-insensitive can increase the number of search hits.
<p><b><i>Supplied Implementation: </i></b>The help system comes with English
and German analyzers, which are configured for the <tt>en</tt> and <tt>de</tt> locales,
respectively.&nbsp; These analyzers perform stop word filtering, lowercase
filtering, and stemming.&nbsp; For languages for which no analyzer is
configured, help uses a simple analyzer that performs lowercase filtering
and English stop word filtering.
<p><a href="hglegal.htm"><img SRC="ngibmcpy.gif" ALT="Copyright IBM Corp. 2000, 2001. All Rights Reserved." BORDER=0 height=12 width=195></a>
</body>
</html>