<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.79 [en] (Windows NT 5.0; U) [Netscape]">
<title>Help System extension points: Lucene Analyzer</title>
</head>
<body link="#0000FF" vlink="#800080">
<center>
<h1>
Lucene Analyzer</h1></center>
<b><i>Identifier: </i></b>org.eclipse.help.luceneAnalyzer
<p><b><i>Description: </i></b>This extension point is used to register
text analyzers for use by help when indexing and searching documentation.
<p>Help exploits the capabilities of the Lucene search engine, which supports
indexing of token streams (streams of words). Analyzers create tokens
from the character stream: they examine the text content and provide
tokens for use with the index. A text stream can be tokenized in
many different ways. A trivial analyzer can split the stream at white
space, while another can also filter tokens, depending on the needs of
the application. Since documentation is mostly human-readable text, it
is desirable that the analyzers used by the help system perform
language- and grammar-aware tokenization and normalization of the indexed text.
For some languages, search quality increases significantly if stop word
removal and stemming are performed on the indexed text. This extension
point allows analyzers to be configured for languages for which the help
system does not provide a language-aware analyzer by default.
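<p>The tokenize-and-filter idea described above can be sketched in plain Java. This is a hypothetical, minimal illustration only: the class <tt>SketchAnalyzer</tt>, its methods, and its tiny stop word list are assumptions made for this example, and the code does not use the Lucene <tt>Analyzer</tt> API. It splits the text at non-letter characters, lowercases each token, and drops stop words; stemming is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an analyzer chain: tokenize, lowercase, drop stop words.
// Plain Java for illustration; this is NOT the org.apache.lucene.analysis.Analyzer API.
final class SketchAnalyzer {
    // A tiny, illustrative English stop word list (real lists are much longer).
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "is");

    // Tokenizer: split the character stream at any run of non-letter characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (!raw.isEmpty()) { // a leading separator yields an empty first element
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Full chain: tokenizer, then lowercase filter, then stop word filter.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }
}
```

<p>For instance, <tt>SketchAnalyzer.analyze("The Workbench is a window.")</tt> yields <tt>[workbench, window]</tt>: the stop words are removed and the remaining tokens are normalized to lowercase before they would reach the index.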
<p><b><i>Configuration Markup:</i></b>
<p><tt>&lt;!ELEMENT analyzer EMPTY&gt;</tt>
<br><tt>&lt;!ATTLIST analyzer</tt>
<br><tt>&nbsp;&nbsp;&nbsp;locale&nbsp;CDATA #REQUIRED</tt>
<br><tt>&nbsp;&nbsp;&nbsp;class&nbsp;&nbsp;CDATA #REQUIRED</tt>
<br><tt>&gt;</tt>
<ul>
<li>
<b>locale</b> - a string identifying the locale for which the supplied analyzer
is to be used. If a two-letter language code is provided, the analyzer will be
used for all locales of that language.</li>
<li>
<b>class</b> - the fully qualified name of a Java class extending <tt>org.apache.lucene.analysis.Analyzer</tt>.</li>
</ul>
<b><i>Examples:</i></b>
<p>The following is an example of a Lucene Analyzer configuration:
<p><tt>&lt;extension id="com.xyz.XYZ" point="org.eclipse.help.luceneAnalyzer"&gt;</tt>
<br><tt>&nbsp;&nbsp;&nbsp;&lt;analyzer locale="ll_CC"</tt>
<br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;class="com.xyz.ll_CCAnalyzer" /&gt;</tt>
<br><tt>&lt;/extension&gt;</tt>
<p><b><i>API Information: </i></b>
<p>The value of the <tt>locale</tt> attribute must be either a five-character
locale string (for example, <tt>de_CH</tt>) or a two-character language code
(for example, <tt>de</tt>). If an analyzer is configured for a language using
a two-letter language code, it is used for all locales of that language. If
another analyzer is configured for an exact five-character locale, that
analyzer is used for that locale instead.
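<p>The lookup rule above can be expressed as a short sketch. The <tt>AnalyzerRegistry</tt> class and its <tt>register</tt> and <tt>resolve</tt> methods are hypothetical and are not part of the Eclipse help system; analyzers are represented by class-name strings to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry illustrating the lookup rule: an exact five-character
// locale (e.g. "de_CH") wins over a two-letter language entry (e.g. "de"),
// which wins over the default. Names are illustrative, not Eclipse internals.
final class AnalyzerRegistry {
    private final Map<String, String> byLocale = new HashMap<>();   // "ll_CC" -> analyzer class name
    private final Map<String, String> byLanguage = new HashMap<>(); // "ll"    -> analyzer class name
    private final String defaultAnalyzer;

    AnalyzerRegistry(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Mirrors the locale attribute: two characters = language, five = full locale.
    void register(String locale, String analyzerClass) {
        if (locale.length() == 2) {
            byLanguage.put(locale, analyzerClass);
        } else if (locale.length() == 5) {
            byLocale.put(locale, analyzerClass);
        } else {
            throw new IllegalArgumentException("locale must have 2 or 5 characters: " + locale);
        }
    }

    // Resolve the analyzer for a locale such as "de_CH" (expects at least two characters).
    String resolve(String locale) {
        String exact = byLocale.get(locale);
        if (exact != null) {
            return exact; // exact locale match takes precedence
        }
        String language = byLanguage.get(locale.substring(0, 2));
        return language != null ? language : defaultAnalyzer;
    }
}
```

<p>With both <tt>de</tt> and <tt>de_CH</tt> registered, <tt>de_CH</tt> resolves to its own analyzer, <tt>de_AT</tt> falls back to the <tt>de</tt> analyzer, and <tt>fr_FR</tt> falls back to the default.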
<p>The value of the <tt>class</tt> attribute must name a class that
extends <tt>org.apache.lucene.analysis.Analyzer</tt>. It is recommended
that this analyzer perform lowercase filtering for languages in which
making search case-insensitive increases the number of search hits.
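<p>A small illustration of why lowercase filtering can increase hits, assuming the same normalization is applied at indexing time and at query time. <tt>CaseDemo.hits</tt> is a hypothetical helper, not a help system or Lucene API; it counts documents containing the query term, with terms split at white space.

```java
import java.util.Locale;

// Hypothetical illustration: lowercasing both indexed terms and the query
// term makes matching case-insensitive, so more documents can match.
final class CaseDemo {
    // Count documents in which some whitespace-separated term equals the query.
    static int hits(String[] docs, String query, boolean lowercase) {
        // Locale.ROOT avoids locale-sensitive surprises such as the Turkish dotless i.
        String q = lowercase ? query.toLowerCase(Locale.ROOT) : query;
        int count = 0;
        for (String doc : docs) {
            for (String term : doc.split("\\s+")) {
                String t = lowercase ? term.toLowerCase(Locale.ROOT) : term;
                if (t.equals(q)) {
                    count++;
                    break; // count each document at most once
                }
            }
        }
        return count;
    }
}
```

<p>For the documents "Workbench basics", "using the workbench", and "WORKBENCH layout", the query "workbench" matches one document without lowercasing and all three with it.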
<p><b><i>Supplied Implementation: </i></b>The help system comes with English
and German analyzers, configured for the <tt>en</tt> and <tt>de</tt> locales
respectively. These analyzers perform stop word filtering, lowercase
filtering, and stemming. For languages for which no analyzer is
configured, help uses a simple analyzer that performs lowercase filtering
and English stop word filtering.
<p><a href="hglegal.htm"><img SRC="ngibmcpy.gif" ALT="Copyright IBM Corp. 2000, 2001. All Rights Reserved." BORDER=0 height=12 width=195></a>
</body>
</html>