<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ -->

<sect1 id="unaccent">
 <title>unaccent</title>

 <indexterm zone="unaccent">
  <primary>unaccent</primary>
 </indexterm>

 <para>
  <filename>unaccent</> is a text search dictionary that removes accents
  (diacritic signs) from lexemes.
  It's a filtering dictionary, which means its output is
  always passed to the next dictionary (if any), unlike the normal
  behavior of dictionaries.  This allows accent-insensitive processing
  for full text search.
 </para>

 <para>
  The current implementation of <filename>unaccent</> cannot be used as a
  normalizing dictionary for the <filename>thesaurus</filename> dictionary.
 </para>

 <sect2>
  <title>Configuration</title>

  <para>
   An <literal>unaccent</> dictionary accepts the following options:
  </para>
  <itemizedlist>
   <listitem>
    <para>
     <literal>RULES</> is the base name of the file containing the list of
     translation rules.  This file must be stored in
     <filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means
     the <productname>PostgreSQL</> installation's shared-data directory).
     Its name must end in <literal>.rules</> (which is not to be included in
     the <literal>RULES</> parameter).
    </para>
   </listitem>
  </itemizedlist>
  <para>
   The rules file has the following format:
  </para>
  <itemizedlist>
   <listitem>
    <para>
     Each line represents a pair, consisting of a character with accent
     followed by a character without accent.  The first is translated into
     the second.  For example,
<programlisting>
&Agrave;        A
&Aacute;        A
&Acirc;        A
&Atilde;        A
&Auml;        A
&Aring;        A
&AElig;        A
</programlisting>
    </para>
   </listitem>
  </itemizedlist>

  <para>
   A more complete example, which is directly useful for most European
   languages, can be found in <filename>unaccent.rules</>, which is installed
   in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</>
   module is installed.
  </para>
 </sect2>

 <sect2>
  <title>Usage</title>

  <para>
   Running the installation script <filename>unaccent.sql</> creates a text
   search template <literal>unaccent</> and a dictionary <literal>unaccent</>
   based on it, with default parameters.  You can alter the
   parameters, for example

<programlisting>
mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
</programlisting>

   or create new dictionaries based on the template.
  </para>

  <para>
   To test the dictionary, you can try:
<programlisting>
mydb=# select ts_lexize('unaccent','H&ocirc;tel');
 ts_lexize
-----------
 {Hotel}
(1 row)
</programlisting>
  </para>

  <para>
   Here is an example showing how to insert the
   <filename>unaccent</> dictionary into a text search configuration:
<programlisting>
mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
mydb=# ALTER TEXT SEARCH CONFIGURATION fr
        ALTER MAPPING FOR hword, hword_part, word
        WITH unaccent, french_stem;
mydb=# select to_tsvector('fr','H&ocirc;tels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

mydb=# select to_tsvector('fr','H&ocirc;tel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

mydb=# select ts_headline('fr','H&ocirc;tel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline
------------------------
 &lt;b&gt;H&ocirc;tel&lt;/b&gt; de la Mer
(1 row)
</programlisting>
  </para>
 </sect2>

 <sect2>
 <title>Functions</title>

 <para>
  The <function>unaccent()</> function removes accents (diacritic signs) from
  a given string.  Basically, it's a wrapper around the
  <filename>unaccent</> dictionary, but it can be used outside normal
  text search contexts.
 </para>

 <indexterm>
  <primary>unaccent</primary>
 </indexterm>

<synopsis>
unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
</synopsis>

 <para>
  For example:
<programlisting>
SELECT unaccent('unaccent', 'H&ocirc;tel');
SELECT unaccent('H&ocirc;tel');
</programlisting>
 </para>
 </sect2>

</sect1>