<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ --> <sect1 id="unaccent"> <title>unaccent</title> <indexterm zone="unaccent"> <primary>unaccent</primary> </indexterm> <para> <filename>unaccent</> is a text search dictionary that removes accents (diacritic signs) from lexemes. It's a filtering dictionary, which means its output is always passed to the next dictionary (if any), unlike the normal behavior of dictionaries. This allows accent-insensitive processing for full text search. </para> <para> The current implementation of <filename>unaccent</> cannot be used as a normalizing dictionary for the <filename>thesaurus</filename> dictionary. </para> <sect2> <title>Configuration</title> <para> An <literal>unaccent</> dictionary accepts the following options: </para> <itemizedlist> <listitem> <para> <literal>RULES</> is the base name of the file containing the list of translation rules. This file must be stored in <filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means the <productname>PostgreSQL</> installation's shared-data directory). Its name must end in <literal>.rules</> (which is not to be included in the <literal>RULES</> parameter). </para> </listitem> </itemizedlist> <para> The rules file has the following format: </para> <itemizedlist> <listitem> <para> Each line represents a pair, consisting of a character with accent followed by a character without accent. The first is translated into the second. For example, <programlisting> À A Á A Â A Ã A Ä A Å A Æ A </programlisting> </para> </listitem> </itemizedlist> <para> A more complete example, which is directly useful for most European languages, can be found in <filename>unaccent.rules</>, which is installed in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</> module is installed. </para> </sect2> <sect2> <title>Usage</title> <para> Running the installation script <filename>unaccent.sql</> creates a text search template <literal>unaccent</> and a dictionary <literal>unaccent</> based on it, with default parameters. You can alter the parameters, for example <programlisting> mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules'); </programlisting> or create new dictionaries based on the template. </para> <para> To test the dictionary, you can try: <programlisting> mydb=# select ts_lexize('unaccent','Hôtel'); ts_lexize ----------- {Hotel} (1 row) </programlisting> </para> <para> Here is an example showing how to insert the <filename>unaccent</> dictionary into a text search configuration: <programlisting> mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french ); mydb=# ALTER TEXT SEARCH CONFIGURATION fr ALTER MAPPING FOR hword, hword_part, word WITH unaccent, french_stem; mydb=# select to_tsvector('fr','Hôtels de la Mer'); to_tsvector ------------------- 'hotel':1 'mer':4 (1 row) mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels'); ?column? ---------- t (1 row) mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels')); ts_headline ------------------------ <b>Hôtel</b> de la Mer (1 row) </programlisting> </para> </sect2> <sect2> <title>Functions</title> <para> The <function>unaccent()</> function removes accents (diacritic signs) from a given string. Basically, it's a wrapper around the <filename>unaccent</> dictionary, but it can be used outside normal text search contexts. </para> <indexterm> <primary>unaccent</primary> </indexterm> <synopsis> unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type> </synopsis> <para> For example: <programlisting> SELECT unaccent('unaccent', 'Hôtel'); SELECT unaccent('Hôtel'); </programlisting> </para> </sect2> </sect1>