 <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
 <html>
 <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <meta name="GENERATOR" content="Mozilla/4.79 [en] (Windows NT 5.0; U) [Netscape]">
    <title>Help System extension points: Lucene Analyzer</title>
 </head>
 <body link="#0000FF" vlink="#800080">

 <center>
 <h1>
 Lucene Analyzer</h1></center>
 <b><i>Identifier: </i></b>org.eclipse.help.luceneAnalyzer
 <p><b><i>Description: </i></b>This extension point is used to register
 text analyzers for use by the help system when indexing and searching documentation.
 <p>Help uses the Lucene search engine, which indexes token streams
 (streams of words).&nbsp; Analyzers create tokens from the character
 stream: they examine the text content and provide the tokens used in
 the index.&nbsp; A text stream can be tokenized in many ways.&nbsp; A trivial
 analyzer might split the stream at white space; another might also filter
 tokens, depending on the needs of the application.&nbsp; Because the
 documentation is mostly human-readable text, analyzers used by the help
 system should perform language- and grammar-aware tokenization and
 normalization of the indexed text.&nbsp; For some languages, search quality
 improves significantly if stop word removal and stemming are performed
 on the indexed text.&nbsp; This extension point allows analyzers to be
 configured for languages for which the default help system does not
 provide a language-aware analyzer.
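 <p>The following is a minimal, hypothetical sketch in plain Java of the kind
 of analysis chain described above: it splits text at white space, lowercases
 each token, and drops a few English stop words.&nbsp; It does not use the Lucene
 API; the class name <tt>ToyAnalyzer</tt> and its tiny stop word list are
 assumptions made for illustration, and stemming is left out.
<pre>
```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical toy analysis chain (not Lucene): whitespace tokenizer, lowercase filter, stop word filter. */
final class ToyAnalyzer {
    // Hypothetical, deliberately tiny English stop word list.
    static final String[] STOP_WORDS = {"a", "an", "and", "from", "in", "of", "the", "to"};

    static String[] analyze(String text) {
        return Arrays.stream(text.trim().split("\\s+"))                   // tokenize at white space
                .filter(token -> !token.isEmpty())                          // blank input yields one empty token
                .map(token -> token.toLowerCase(Locale.ROOT))               // lowercase filter
                .filter(token -> !Arrays.asList(STOP_WORDS).contains(token)) // stop word filter
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] tokens = analyze("The Analyzer creates tokens from the character stream");
        System.out.println(String.join(" ", tokens)); // analyzer creates tokens character stream
    }
}
```
</pre>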
 <p><b><i>Configuration Markup:</i></b>
 <p><tt>&nbsp;&nbsp; &lt;!ELEMENT analyzer EMPTY></tt>
 <br><tt>&nbsp;&nbsp; &lt;!ATTLIST analyzer</tt>
 <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; locale&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 CDATA #REQUIRED</tt>
 <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; class&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
 CDATA #REQUIRED</tt>
 <br><tt>&nbsp;&nbsp; ></tt>
 <ul>
 <li>
 <b>locale</b> - a string identifying the locale for which the supplied analyzer
 is to be used; if a two-letter language code is provided, the analyzer will be
 available to all locales of that language</li>

 <li>
 <b>class</b> - the fully qualified name of a Java class that extends <tt>org.apache.lucene.analysis.Analyzer</tt></li>
 </ul>
 <b><i>Examples:</i></b>
 <p>The following is an example of a Lucene analyzer configuration:
 <p><tt>&nbsp;&nbsp;&nbsp; &lt;extension id="com.xyz.XYZ" point="org.eclipse.help.luceneAnalyzer"></tt>
 <br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;analyzer locale="ll_CC"
 class="com.xyz.ll_CCAnalyzer" /></tt>
 <br><tt>&nbsp;&nbsp;&nbsp; &lt;/extension></tt>
 <p><b><i>API Information</i>:</b>
 <p>The value of the <tt>locale</tt> attribute must be either a two-character
 language code (for example, <tt>de</tt>) or a five-character locale string
 (for example, <tt>de_CH</tt>).&nbsp; If an analyzer is configured for a language
 by specifying a two-letter language code, that analyzer is used for all
 locales of the language.&nbsp; If another analyzer is configured for the exact
 five-character locale, that analyzer is used instead for that locale.
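 <p>The lookup rule above can be sketched as follows.&nbsp; This is a hypothetical
 illustration, not the help system's actual code: the class <tt>AnalyzerLookup</tt>,
 the use of <tt>java.util.Properties</tt> as a registry, and the analyzer names
 (including <tt>SwissGermanAnalyzer</tt> for <tt>de_CH</tt>) are assumptions.&nbsp;
 An exact five-character match is tried first, then the two-letter language,
 and finally a default standing in for the simple analyzer described under
 Supplied Implementation.
<pre>
```java
import java.util.Properties;

/** Hypothetical sketch of locale-to-analyzer resolution (not the Eclipse implementation). */
final class AnalyzerLookup {
    // Hypothetical name standing in for the simple fallback analyzer.
    static final String DEFAULT_ANALYZER = "SimpleAnalyzer";

    static String resolve(Properties registry, String locale) {
        // 1. An analyzer registered for the exact five-character locale wins.
        String exact = registry.getProperty(locale);
        if (exact != null) {
            return exact;
        }
        // 2. Otherwise fall back to the two-letter language (de_CH -> de).
        if (locale.length() > 2) {
            String language = registry.getProperty(locale.substring(0, 2));
            if (language != null) {
                return language;
            }
        }
        // 3. Nothing is configured for this language: use the default analyzer.
        return DEFAULT_ANALYZER;
    }

    public static void main(String[] args) {
        Properties registry = new Properties();
        registry.setProperty("en", "EnglishAnalyzer");
        registry.setProperty("de", "GermanAnalyzer");
        registry.setProperty("de_CH", "SwissGermanAnalyzer"); // hypothetical locale-specific analyzer
        System.out.println(resolve(registry, "de_CH")); // SwissGermanAnalyzer
        System.out.println(resolve(registry, "de_AT")); // GermanAnalyzer
        System.out.println(resolve(registry, "en_US")); // EnglishAnalyzer
        System.out.println(resolve(registry, "fr_FR")); // SimpleAnalyzer
    }
}
```
</pre>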
 <p>The value of the <tt>class</tt> attribute must name a class that
 extends <tt>org.apache.lucene.analysis.Analyzer</tt>.&nbsp; It is recommended
 that the analyzer perform lowercase filtering for languages in which making
 the search case-insensitive increases the number of search hits.
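 <p>Why lowercasing helps can be seen in a small sketch, assuming (hypothetically)
 that a query matches a document when every query token appears among the
 document's tokens and that the same analysis is applied to both.&nbsp; The class
 <tt>CaseInsensitiveMatch</tt> is illustrative only; Lucene's real query
 matching is more involved.
<pre>
```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch: lowercasing at index and query time makes matching case-insensitive. */
final class CaseInsensitiveMatch {
    // Split at white space and optionally apply a lowercase filter.
    static String[] tokens(String text, boolean lowercase) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .map(t -> lowercase ? t.toLowerCase(Locale.ROOT) : t)
                .toArray(String[]::new);
    }

    // Analyze document and query the same way; every query token must occur in the document.
    static boolean matches(String document, String query, boolean lowercase) {
        return Arrays.asList(tokens(document, lowercase))
                .containsAll(Arrays.asList(tokens(query, lowercase)));
    }

    public static void main(String[] args) {
        String doc = "Installing Eclipse plug-ins";
        System.out.println(matches(doc, "eclipse", false)); // false
        System.out.println(matches(doc, "eclipse", true));  // true
    }
}
```
</pre>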
 <p><b><i>Supplied Implementation: </i></b>The help system comes with English
 and German analyzers, configured for the <tt>en</tt> and <tt>de</tt> locales
 respectively.&nbsp; These analyzers perform stop word filtering, lowercase
 filtering, and stemming.&nbsp; For languages with no configured analyzer,
 help uses a simple analyzer that performs lowercase filtering
 and English stop word filtering.
 <p><a href="hglegal.htm"><img SRC="ngibmcpy.gif" ALT="Copyright IBM Corp. 2000, 2001.  All Rights Reserved." BORDER=0 height=12 width=195></a>
 </body>
 </html>