diff --git a/doc/src/sgml/unaccent.sgml b/doc/src/sgml/unaccent.sgml
index ff6a2989dd4a71e9d3e3d0591c085c476fb1e62c..6c73c3f298664d5f3a14d28ee80f0abd03189cd7 100644
--- a/doc/src/sgml/unaccent.sgml
+++ b/doc/src/sgml/unaccent.sgml
@@ -1,3 +1,5 @@
+<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ -->
+
 <sect1 id="unaccent">
  <title>unaccent</title>
 
@@ -6,24 +8,24 @@
  </indexterm>
 
  <para>
-  <filename>unaccent</> removes accents (diacritic signs) from a lexeme.
-  It's a filtering dictionary, that means its output is
-  always passed to the next dictionary (if any), contrary to the standard
-  behavior. Currently, it supports most important accents from European
-  languages.
+  <filename>unaccent</> is a text search dictionary that removes accents
+  (diacritic signs) from lexemes.
+  It's a filtering dictionary, which means its output is
+  always passed to the next dictionary (if any), unlike the normal
+  behavior of dictionaries. This allows accent-insensitive processing
+  for full text search.
  </para>
 
  <para>
-  Limitation: Current implementation of <filename>unaccent</>
-  dictionary cannot be used as a normalizing dictionary for
-  <filename>thesaurus</filename> dictionary.
+  The current implementation of <filename>unaccent</> cannot be used as a
+  normalizing dictionary for the <filename>thesaurus</filename> dictionary.
  </para>
- 
+
  <sect2>
   <title>Configuration</title>
 
   <para>
-   A <literal>unaccent</> dictionary accepts the following options:
+   An <literal>unaccent</> dictionary accepts the following options:
   </para>
   <itemizedlist>
    <listitem>
@@ -43,23 +45,27 @@
   <itemizedlist>
    <listitem>
     <para>
-     Each line represents pair: character_with_accent character_without_accent
+     Each line represents a pair, consisting of a character with accent
+     followed by a character without accent. The first is translated into
+     the second. For example,
 <programlisting>
 À        A
 Á        A
-Â        A
+Â        A
 Ã        A
-Ä        A
-Å        A
-Æ        A
+Ä        A
+Å        A
+Æ        A
 </programlisting>
     </para>
    </listitem>
   </itemizedlist>
 
   <para>
-   Look at <filename>unaccent.rules</>, which is installed in
-   <filename>$SHAREDIR/tsearch_data/</>, for an example.
+   A more complete example, which is directly useful for most European
+   languages, can be found in <filename>unaccent.rules</>, which is installed
+   in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</>
+   module is installed.
   </para>
  </sect2>
 
@@ -67,66 +73,66 @@
   <title>Usage</title>
 
   <para>
-   Running the installation script creates a text search template
-   <literal>unaccent</> and a dictionary <literal>unaccent</>
+   Running the installation script <filename>unaccent.sql</> creates a text
+   search template <literal>unaccent</> and a dictionary <literal>unaccent</>
    based on it, with default parameters. You can alter the
    parameters, for example
 
 <programlisting>
-=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
+mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
 </programlisting>
 
    or create new dictionaries based on the template.
   </para>
 
   <para>
-   To test the dictionary, you can try
-
+   To test the dictionary, you can try:
 <programlisting>
-=# select ts_lexize('unaccent','Hôtel');
- ts_lexize
+mydb=# select ts_lexize('unaccent','Hôtel');
+ ts_lexize
 -----------
  {Hotel}
 (1 row)
 </programlisting>
   </para>
- 
+
   <para>
-   Filtering dictionary are useful for correct work of
-   <function>ts_headline</function> function.
+   Here is an example showing how to insert the
+   <filename>unaccent</> dictionary into a text search configuration:
 <programlisting>
-=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
-=# ALTER TEXT SEARCH CONFIGURATION fr
+mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
+mydb=# ALTER TEXT SEARCH CONFIGURATION fr
         ALTER MAPPING FOR hword, hword_part, word
         WITH unaccent, french_stem;
-=# select to_tsvector('fr','Hôtels de la Mer');
-    to_tsvector
+mydb=# select to_tsvector('fr','Hôtels de la Mer');
+    to_tsvector
 -------------------
  'hotel':1 'mer':4
 (1 row)
 
-=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
- ?column?
+mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
+ ?column?
 ----------
  t
 (1 row)
-=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
-      ts_headline
+
+mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
+      ts_headline
 ------------------------
-  <b>Hôtel</b>de la Mer
+ <b>Hôtel</b> de la Mer
 (1 row)
- 
 </programlisting>
   </para>
  </sect2>
 
  <sect2>
-  <title>Function</title>
+  <title>Functions</title>
 
   <para>
-   <function>unaccent</> function removes accents (diacritic signs) from
-   argument string. Basically, it's a wrapper around
-   <filename>unaccent</> dictionary.
+   The <function>unaccent()</> function removes accents (diacritic signs) from
+   a given string. Basically, it's a wrapper around the
+   <filename>unaccent</> dictionary, but it can be used outside normal
+   text search contexts.
   </para>
 
   <indexterm>
@@ -134,14 +140,14 @@
   </indexterm>
 
 <synopsis>
-unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>)
-returns <type>text</type>
+unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
 </synopsis>
 
   <para>
+   For example:
 <programlisting>
-SELECT unaccent('unaccent', 'Hôtel');
-SELECT unaccent('Hôtel');
+SELECT unaccent('unaccent', 'Hôtel');
+SELECT unaccent('Hôtel');
 </programlisting>
   </para>
  </sect2>
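Reviewer note: the rules-file format documented in this patch (one pair per line, a character with accent followed by its replacement) can be illustrated with a small Python sketch. This is only a rough model of the documented format, not the module's actual C implementation. The helper names `parse_rules` and `strip_accents` are hypothetical, and the `ô o` rule is an assumption added so that the `Hôtel` example from the patch can be reproduced; it does not appear in the sample listing above.

```python
def parse_rules(text):
    """Build a translation table from rules text.

    Assumes each useful line holds exactly two whitespace-separated
    fields: the accented character and its replacement.
    """
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # this sketch simply skips blank or malformed lines
        accented, plain = parts
        table[accented] = plain
    return table


def strip_accents(s, table):
    """Replace every character that has a rule; leave the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in s)


# Sample rules: the first three mirror the listing in the patch;
# the last one (o with circumflex) is an assumed extra rule.
RULES = "\u00c0 A\n\u00c1 A\n\u00c4 A\n\u00f4 o\n"

table = parse_rules(RULES)
print(strip_accents("H\u00f4tel", table))
```

Note that a table like this only matches precomposed characters; input using combining marks would need to be normalized first, which this sketch does not attempt.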