diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 3c9c3ce5270c7a1639a0109af26d926c2a333b59..016993c0eb6f79bef69ba562d923212fa1976813 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.71 2001/11/19 09:05:00 tgl Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.72 2001/11/20 15:42:44 momjian Exp $ --> <chapter id="datatype"> @@ -965,23 +965,296 @@ SELECT b, char_length(b) FROM test2; </sect1> - <sect1 id="datatype-binary"> - <title>Binary Data</title> + <sect1 id="datatype-binary"> + <title>Binary Strings</title> + <para> + The <type>bytea</type> data type allows storage of binary strings. + </para> + + <table tocentry="1"> + <title>Binary String Types</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Type Name</entry> + <entry>Storage</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry>bytea</entry> + <entry>4 bytes plus the actual string</entry> + <entry>Variable (not specifically limited) + length binary string</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + A binary string is a sequence of octets that does not have either a + character set or collation associated with it. Bytea specifically + allows storage of NULLs and other 'non-printable' <acronym>ASCII + </acronym> characters. + </para> <para> - The <type>bytea</type> data type allows storage of binary data, - specifically allowing storage of NULLs which are entered as - <literal>'\\000'</>. The first backslash is interpreted by the - single quotes, and the second is recognized by <type>bytea</> and - precedes a three digit octal value. For a similar reason, a - backslash must be entered into a field as <literal>'\\\\'</> or - <literal>'\\134'</>. You may also have to escape line feeds and - carriage return if your interface automatically translates these. It - can store values of any length. <type>Bytea</> is a non-standard - data type. + Certain <acronym>ASCII</acronym> characters MUST be escaped (but all + characters MAY be escaped) when used as part of a string literal in an + <acronym>SQL</acronym> statement. In general, to escape a character, it + is converted into the three digit octal number equal to the decimal + <acronym>ASCII</acronym> value, and preceeded by two backslashes. The + single quote (') and backslash (\) characters have special alternate + escape sequences. Details are in + <xref linkend="datatype-binary-sqlesc">. </para> + + <table id="datatype-binary-sqlesc"> + <title><acronym>SQL</acronym> Literal Escaped <acronym>ASCII</acronym> + Characters</title> + <tgroup cols="5"> + <thead> + <row> + <entry>Decimal <acronym>ASCII</acronym> Value</entry> + <entry>Description</entry> + <entry>Input Escaped Representation</entry> + <entry>Example</entry> + <entry>Printed Result</entry> + </row> + </thead> + + <tbody> + <row> + <entry> <literal> 0 </literal> </entry> + <entry> null byte </entry> + <entry> <literal> '\\000' </literal> </entry> + <entry> <literal> select '\\000'::bytea; </literal> </entry> + <entry> <literal> \000 </literal></entry> + </row> + + <row> + <entry> <literal> 39 </literal> </entry> + <entry> single quote </entry> + <entry> <literal> '\\'' or '\\047' </literal> </entry> + <entry> <literal> select '\''::bytea; </literal></entry> + <entry> <literal> ' </literal></entry> + </row> + + <row> + <entry> <literal>92</literal> </entry> + <entry> backslash </entry> + <entry> <literal> '\\\\' or '\\134' </literal> </entry> + <entry> <literal> select '\\\\'::bytea; </literal></entry> + <entry> <literal> \\ </literal></entry> + </row> + + </tbody> + </tgroup> + </table> + + <para> + Note that the result in each of the examples above was exactly one + byte in length, even though the output representation of the null byte + and backslash are more than one character. Bytea output characters + are also escaped. In general, each "non-printable" character is + converted into the three digit octal number equal to its decimal + <acronym>ASCII</acronym> value, and preceeded by one backslash. Most + "printable" characters are represented by their standard + <acronym>ASCII</acronym> representation. The backslash (\) character + has a special alternate output representation. Details are in + <xref linkend="datatype-binary-resesc">. + </para> + + <table id="datatype-binary-resesc"> + <title><acronym>SQL</acronym> Output Escaped <acronym>ASCII</acronym> + Characters</title> + <tgroup cols="5"> + <thead> + <row> + <entry>Decimal <acronym>ASCII</acronym> Value</entry> + <entry>Description</entry> + <entry>Output Escaped Representation</entry> + <entry>Example</entry> + <entry>Printed Result</entry> + </row> + </thead> + + <tbody> + + <row> + <entry> <literal> 39 </literal> </entry> + <entry> single quote </entry> + <entry> <literal> ' </literal> </entry> + <entry> <literal> select '\\047'::bytea; </literal></entry> + <entry> <literal> ' </literal></entry> + </row> + + <row> + <entry> <literal> 92 </literal> </entry> + <entry> backslash </entry> + <entry> <literal> \\ </literal> </entry> + <entry> <literal> select '\\134'::bytea; </literal></entry> + <entry> <literal> \\ </literal></entry> + </row> + + <row> + <entry> <literal> 0 to 31 and 127 to 255 </literal> </entry> + <entry> non-printable characters </entry> + <entry> <literal> \### (octal value) </literal> </entry> + <entry> <literal> select '\\001'::bytea; </literal> </entry> + <entry> <literal> \001 </literal></entry> + </row> + + <row> + <entry> <literal> 32 to 126 </literal> </entry> + <entry> printable characters </entry> + <entry> ASCII representation </entry> + <entry> <literal> select '\\176'::bytea; </literal> </entry> + <entry> <literal> ~ </literal></entry> + </row> + + </tbody> + </tgroup> + </table> + + <para> + <acronym>SQL</acronym> string literals (input strings) must be + preceeded with two backslashes due to the fact that they must pass + through two parsers in the PostgreSQL backend. The first backslash + is interpreted as an escape character by the string literal parser, + and therefore is consumed, leaving the characters that follow it. + The second backslash is recognized by <type>bytea</> input function + as the prefix of a three digit octal value. For example, a string + literal passed to the backend as <literal>'\\001'</literal> becomes + <literal>'\001'</literal> after passing through the string literal + parser. The <literal>'\001'</literal> is then sent to the bytea + input function, where it is converted to a single byte with a decimal + <acronym>ASCII</acronym> value of 1. + </para> + + <para> + For a similar reason, a backslash must be input as + <literal>'\\\\'</literal> (or <literal>'\\134'</literal>). The first + and third backslashes are interpreted as escape characters by the + string literal parser, and therefore are consumed, leaving the + second and forth backslashes untouched. The second and forth + backslashes are recognized by <type>bytea</> input function as a single + backslash. For example, a string literal passed to the backend as + <literal>'\\\\'</literal> becomes <literal>'\\'</literal> after passing + through the string literal parser. The <literal>'\\'</literal> is then + sent to the bytea input function, where it is converted to a single + byte with a decimal <acronym>ASCII</acronym> value of 92. + </para> + + <para> + A single quote is a bit different in that it must be input as + <literal>'\''</literal> (or <literal>'\\134'</literal>), NOT as + <literal>'\\''</literal>. This is because, while the literal parser + interprets the single quote as a special character, and will consume + the single backslash, the bytea input function does NOT recognize + a single quote as a special character. Therefore a string + literal passed to the backend as <literal>'\''</literal> becomes + <literal>'''</literal> after passing through the string literal + parser. The <literal>'''</literal> is then sent to the bytea + input function, where it is retains its single byte decimal + <acronym>ASCII</acronym> value of 39. + </para> + + <para> + Depending on the front end to PostgreSQL you use, you may have + additional work to do in terms of escaping and unescaping bytea + strings. For example, you may also have to escape line feeds and + carriage return if your interface automatically translates these. + Or you may have to double up on backslashes if the parser for your + language or choice also treats them as an escape character. + </para> + + <sect2 id="datatype-binary-compat"> + <title>Compatibility</title> + <para> + Bytea provides most of the functionality of the SQL99 binary string + type per SQL99 section 4.3. A comparison of PostgreSQL bytea and SQL99 + Binary Strings is presented in + <xref linkend="datatype-binary-compat-comp">. + </para> + + <table id="datatype-binary-compat-comp"> + <title>Comparison of SQL99 Binary String and BYTEA types</title> + <tgroup cols="2"> + <thead> + <row> + <entry>SQL99</entry> + <entry>BYTEA</entry> + </row> + </thead> + + <tbody> + <row> + <entry> Name of data type BINARY LARGE OBJECT or BLOB </entry> + <entry> Name of data type BYTEA </entry> + </row> + + <row> + <entry> Sequence of octets that does not have either a character set + or collation associated with it. </entry> + <entry> same </entry> + </row> + + <row> + <entry> Described by a binary data type descriptor containing the + name of the data type and the maximum length + in octets</entry> + <entry> Described by a binary data type descriptor containing the + name of the data type with no specific maximum length + </entry> + </row> + + <row> + <entry> All binary strings are mutually comparable in accordance + with the rules of comparison predicates.</entry> + <entry> same</entry> + </row> + + <row> + <entry> Binary string values can only be compared for equality. + </entry> + <entry> Binary string values can be compared for equality, greater + than, greater than or equal, less than, less than or equal + </entry> + </row> + + <row> + <entry> Operators operating on and returning binary strings + include concatenation, substring, overlay, and trim</entry> + <entry> Operators operating on and returning binary strings + include concatenation, substring, and trim. The <literal> + 'leading'</literal> and <literal>'trailing'</literal> + arguments for trim are not yet implemented. + </entry> + </row> + + <row> + <entry> Other operators involving binary strings + include length, position, and the like predicate</entry> + <entry> same</entry> + </row> + + <row> + <entry> A binary string literal is comprised of an even number of + hexidecimal digits, in single quotes, preceeded by "X", + e.g. X'1a43fe'</entry> + <entry> A binary string literal is comprised of ASCII characters + escaped according to the rules shown in + <xref linkend="datatype-binary-sqlesc"> </entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> </sect1> + <sect1 id="datatype-datetime"> <title>Date/Time Types</title> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e9d9b47aeb8f41a26c0a951c0dce1c24e610435b..8af9fd06796bedf19e9d033d3a7e9c9ab19a3622 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/func.sgml,v 1.82 2001/11/19 09:05:01 tgl Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/func.sgml,v 1.83 2001/11/20 15:42:44 momjian Exp $ Postgres documentation --> @@ -1133,7 +1133,7 @@ Postgres documentation <entry><type>text</type></entry> <entry> Encodes binary data to <acronym>ASCII</acronym>-only representation. Supported - types are: 'base64', 'hex'. + types are: 'base64', 'hex', 'escape'. </entry> <entry><literal>encode('123\\000\\001', 'base64')</literal></entry> <entry><literal>MTIzAAE=</literal></entry> @@ -1164,6 +1164,186 @@ Postgres documentation </sect1> + <sect1 id="functions-binarystring"> + <title>Binary String Functions and Operators</title> + + <para> + This section describes functions and operators for examining and + manipulating binary string values. Strings in this context include + values of the type <type>BYTEA</type>. + </para> + + <para> + <acronym>SQL</acronym> defines some string functions with a special syntax where + certain keywords rather than commas are used to separate the + arguments. Details are in <xref linkend="functions-binarystring-sql">. + Some functions are also implemented using the regular syntax for + function invocation. (See <xref linkend="functions-binarystring-other">.) + </para> + + <table id="functions-binarystring-sql"> + <title><acronym>SQL</acronym> Binary String Functions and Operators</title> + <tgroup cols="5"> + <thead> + <row> + <entry>Function</entry> + <entry>Return Type</entry> + <entry>Description</entry> + <entry>Example</entry> + <entry>Result</entry> + </row> + </thead> + + <tbody> + <row> + <entry> <parameter>string</parameter> <literal>||</literal> <parameter>string</parameter> </entry> + <entry> <type>bytea</type> </entry> + <entry> + string concatenation + <indexterm> + <primary>binary strings</primary> + <secondary>concatenation</secondary> + </indexterm> + </entry> + <entry><literal>'\\\\Postgre'::bytea || '\\047SQL\\000'::bytea</></entry> + <entry><literal>\\Postgre'SQL\000</></entry> + </row> + + <row> + <entry><function>octet_length</function>(<parameter>string</parameter>)</entry> + <entry><type>integer</type></entry> + <entry>number of bytes in binary string</entry> + <entry><literal>octet_length('jo\\000se'::bytea)</literal></entry> + <entry><literal>5</literal></entry> + </row> + + <row> + <entry><function>position</function>(<parameter>substring</parameter> in <parameter>string</parameter>)</entry> + <entry><type>integer</type></entry> + <entry>location of specified substring</entry> + <entry><literal>position('\\000om'::bytea in 'Th\\000omas'::bytea)</literal></entry> + <entry><literal>3</literal></entry> + </row> + + <row> + <entry><function>substring</function>(<parameter>string</parameter> <optional>from <type>integer</type></optional> <optional>for <type>integer</type></optional>)</entry> + <entry><type>bytea</type></entry> + <entry> + extract substring + <indexterm> + <primary>substring</primary> + </indexterm> + </entry> + <entry><literal>substring('Th\\000omas'::bytea from 2 for 3)</literal></entry> + <entry><literal>h\000o</literal></entry> + </row> + + <row> + <entry> + <function>trim</function>(<optional>both</optional> + <parameter>characters</parameter> from + <parameter>string</parameter>) + </entry> + <entry><type>bytea</type></entry> + <entry> + Removes the longest string containing only the + <parameter>characters</parameter> from the + beginning/end/both ends of the <parameter>string</parameter>. + </entry> + <entry><literal>trim('\\000'::bytea from '\\000Tom\\000'::bytea)</literal></entry> + <entry><literal>Tom</literal></entry> + </row> + + </tbody> + </tgroup> + </table> + + <para> + Additional binary string manipulation functions are available and are + listed below. Some of them are used internally to implement the + <acronym>SQL</acronym>-standard string functions listed above. + </para> + + <table id="functions-binarystring-other"> + <title>Other Binary String Functions</title> + <tgroup cols="5"> + <thead> + <row> + <entry>Function</entry> + <entry>Return Type</entry> + <entry>Description</entry> + <entry>Example</entry> + <entry>Result</entry> + </row> + </thead> + + <tbody> + <row> + <entry><function>btrim</function>(<parameter>string</parameter> <type>bytea</type>, <parameter>trim</parameter> <type>bytea</type>)</entry> + <entry><type>bytea</type></entry> + <entry> + Remove (trim) the longest string consisting only of characters + in <parameter>trim</parameter> from the start and end of + <parameter>string</parameter>. + </entry> + <entry><literal>btrim('\\000trim\\000'::bytea,'\\000'::bytea)</literal></entry> + <entry><literal>trim</literal></entry> + </row> + + <row> + <entry><function>length</function>(<parameter>string</parameter>)</entry> + <entry><type>integer</type></entry> + <entry> + length of binary string + <indexterm> + <primary>binary strings</primary> + <secondary>length</secondary> + </indexterm> + <indexterm> + <primary>length</primary> + <secondary>binary strings</secondary> + <see>binary strings, length</see> + </indexterm> + </entry> + <entry><literal>length('jo\\000se'::bytea)</></entry> + <entry><literal>5</></entry> + </row> + + <row> + <entry> + <function>encode</function>(<parameter>string</parameter> <type>bytea</type>, + <parameter>type</parameter> <type>text</type>) + </entry> + <entry><type>text</type></entry> + <entry> + Encodes binary string to <acronym>ASCII</acronym>-only representation. Supported + types are: 'base64', 'hex', 'escape'. + </entry> + <entry><literal>encode('123\\000456'::bytea, 'escape')</literal></entry> + <entry><literal>123\000456</literal></entry> + </row> + + <row> + <entry> + <function>decode</function>(<parameter>string</parameter> <type>text</type>, + <parameter>type</parameter> <type>text</type>) + </entry> + <entry><type>bytea</type></entry> + <entry> + Decodes binary string from <parameter>string</parameter> previously + encoded with encode(). Parameter type is same as in encode(). + </entry> + <entry><literal>decode('123\\000456', 'escape')</literal></entry> + <entry><literal>123\000456</literal></entry> + </row> + + </tbody> + </tgroup> + </table> + + </sect1> + + <sect1 id="functions-matching"> <title>Pattern Matching</title>