Skip to content
Snippets Groups Projects
Select Git revision
  • benchmark-tools
  • postgres-lambda
  • master default
  • REL9_4_25
  • REL9_5_20
  • REL9_6_16
  • REL_10_11
  • REL_11_6
  • REL_12_1
  • REL_12_0
  • REL_12_RC1
  • REL_12_BETA4
  • REL9_4_24
  • REL9_5_19
  • REL9_6_15
  • REL_10_10
  • REL_11_5
  • REL_12_BETA3
  • REL9_4_23
  • REL9_5_18
  • REL9_6_14
  • REL_10_9
  • REL_11_4
23 results

query.sgml

Blame
  • query.sgml 29.34 KiB
    <!--
    $PostgreSQL: pgsql/doc/src/sgml/query.sgml,v 1.45 2005/06/13 02:40:06 neilc Exp $
    -->
    
     <chapter id="tutorial-sql">
      <title>The <acronym>SQL</acronym> Language</title>
    
      <sect1 id="tutorial-sql-intro">
       <title>Introduction</title>
    
       <para>
        This chapter provides an overview of how to use
        <acronym>SQL</acronym> to perform simple operations.  This
        tutorial is only intended to give you an introduction and is in no
        way a complete tutorial on <acronym>SQL</acronym>.  Numerous books
        have been written on <acronym>SQL</acronym>, including <xref
        linkend="MELT93"> and <xref linkend="DATE97">.
        You should be aware that some <productname>PostgreSQL</productname>
        language features are extensions to the standard.
       </para>
    
       <para>
        In the examples that follow, we assume that you have created a
        database named <literal>mydb</literal>, as described in the previous
        chapter, and have started <application>psql</application>.
       </para>
    
       <para>
        Examples in this manual can also be found in the
        <productname>PostgreSQL</productname> source distribution
        in the directory <filename>src/tutorial/</filename>.  To use those
        files, first change to that directory and run <application>make</>:
    
    <screen>
    <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
    <prompt>$</prompt> <userinput>make</userinput>
    </screen>
    
        This creates the scripts and compiles the C files containing user-defined
        functions and types.  (You must use GNU make for this &mdash; it may be named
        something different on your system, often <application>gmake</>.)
        Then, to start the tutorial, do the following:
    
    <screen>
    <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
    <prompt>$</prompt> <userinput>psql -s mydb</userinput>
    <computeroutput>
    ...
    </computeroutput>
    
    <prompt>mydb=&gt;</prompt> <userinput>\i basics.sql</userinput>
    </screen>
    
        The <literal>\i</literal> command reads in commands from the
        specified file. The <literal>-s</literal> option puts you in
        single step mode which pauses before sending each statement to the
        server.  The commands used in this section are in the file
        <filename>basics.sql</filename>.
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-concepts">
       <title>Concepts</title>
    
       <para>
        <indexterm><primary>relational database</primary></indexterm>
        <indexterm><primary>hierarchical database</primary></indexterm>
        <indexterm><primary>object-oriented database</primary></indexterm>
        <indexterm><primary>relation</primary></indexterm>
        <indexterm><primary>table</primary></indexterm>
    
        <productname>PostgreSQL</productname> is a <firstterm>relational
        database management system</firstterm> (<acronym>RDBMS</acronym>).
        That means it is a system for managing data stored in
        <firstterm>relations</firstterm>.  Relation is essentially a
        mathematical term for <firstterm>table</firstterm>.  The notion of
        storing data in tables is so commonplace today that it might
        seem inherently obvious, but there are a number of other ways of
        organizing databases.  Files and directories on Unix-like
        operating systems form an example of a hierarchical database.  A
        more modern development is the object-oriented database.
       </para>
    
       <para>
        <indexterm><primary>row</primary></indexterm>
        <indexterm><primary>column</primary></indexterm>
    
        Each table is a named collection of <firstterm>rows</firstterm>.
        Each row of a given table has the same set of named
        <firstterm>columns</firstterm>,
        and each column is of a specific data type.  Whereas columns have
        a fixed order in each row, it is important to remember that SQL
        does not guarantee the order of the rows within the table in any
        way (although they can be explicitly sorted for display).
       </para>
    
       <para>
        <indexterm><primary>database cluster</primary></indexterm>
        <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
    
        Tables are grouped into databases, and a collection of databases
        managed by a single <productname>PostgreSQL</productname> server
        instance constitutes a database <firstterm>cluster</firstterm>.
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-table">
       <title>Creating a New Table</title>
    
       <indexterm zone="tutorial-table">
        <primary>CREATE TABLE</primary>
       </indexterm>
    
       <para>
        You  can  create  a  new  table by specifying the table
        name, along with all column names and their types:
    
    <programlisting>
    CREATE TABLE weather (
        city            varchar(80),
        temp_lo         int,           -- low temperature
        temp_hi         int,           -- high temperature
        prcp            real,          -- precipitation
        date            date
    );
    </programlisting>
    
        You can enter this into <command>psql</command> with the line
        breaks.  <command>psql</command> will recognize that the command
        is not terminated until the semicolon.
       </para>
    
       <para>
        White space (i.e., spaces, tabs, and newlines) may be used freely
        in SQL commands.  That means you can type the command aligned
        differently than above, or even all on one line.  Two dashes
        (<quote><literal>--</literal></quote>) introduce comments.
        Whatever follows them is ignored up to the end of the line.  SQL
        is case insensitive about key words and identifiers, except
        when identifiers are double-quoted to preserve the case (not done
        above).
       </para>
    
       <para>
        <type>varchar(80)</type> specifies a data type that can store
        arbitrary character strings up to 80 characters in length.
        <type>int</type> is the normal integer type.  <type>real</type> is
        a type for storing single precision floating-point numbers.
        <type>date</type> should be self-explanatory.  (Yes, the column of
        type <type>date</type> is also named <literal>date</literal>.
        This may be convenient or confusing &mdash; you choose.)
       </para>
    
       <para>
        <productname>PostgreSQL</productname> supports the standard
        <acronym>SQL</acronym> types <type>int</type>,
        <type>smallint</type>, <type>real</type>, <type>double
        precision</type>, <type>char(<replaceable>N</>)</type>,
        <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
        <type>time</type>, <type>timestamp</type>, and
        <type>interval</type>, as well as other types of general utility
        and a rich set of geometric types.
        <productname>PostgreSQL</productname> can be customized with an
        arbitrary number of user-defined data types.  Consequently, type
        names are not syntactical key words, except where required to
        support special cases in the <acronym>SQL</acronym> standard.
       </para>
    
       <para>
        The second example will store cities and their associated
        geographical location:
    <programlisting>
    CREATE TABLE cities (
        name            varchar(80),
        location        point
    );
    </programlisting>
        The <type>point</type> type is an example of a
        <productname>PostgreSQL</productname>-specific data type.
       </para>
    
       <para>
        <indexterm>
         <primary>DROP TABLE</primary>
        </indexterm>
    
        Finally, it should be mentioned that if you don't need a table any
        longer or want to recreate it differently you can remove it using
        the following command:
    <synopsis>
    DROP TABLE <replaceable>tablename</replaceable>;
    </synopsis>
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-populate">
       <title>Populating a Table With Rows</title>
    
       <indexterm zone="tutorial-populate">
        <primary>INSERT</primary>
       </indexterm>
    
       <para>
        The <command>INSERT</command> statement is used to populate a table  with
        rows:
    
    <programlisting>
    INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
    </programlisting>
    
        Note that all data types use rather obvious input formats.
        Constants that are not simple numeric values usually must be
        surrounded by single quotes (<literal>'</>), as in the example.
        The
        <type>date</type> type is actually quite flexible in what it
        accepts, but for this tutorial we will stick to the unambiguous
        format shown here.
       </para>
    
       <para>
        The <type>point</type> type requires a coordinate pair as input,
        as shown here:
    <programlisting>
    INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
    </programlisting>
       </para>
    
       <para>
        The syntax used so far requires you to remember the order of the
        columns.  An alternative syntax allows you to list the columns
        explicitly:
    <programlisting>
    INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
        VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
    </programlisting>
        You can list the columns in a different order if you wish or
        even omit some columns, e.g., if the precipitation is unknown:
    <programlisting>
    INSERT INTO weather (date, city, temp_hi, temp_lo)
        VALUES ('1994-11-29', 'Hayward', 54, 37);
    </programlisting>
        Many developers consider explicitly listing the columns better
        style than relying on the order implicitly.
       </para>
    
       <para>
        Please enter all the commands shown above so you have some data to
        work with in the following sections.
       </para>
    
       <para>
        <indexterm>
         <primary>COPY</primary>
        </indexterm>
    
        You could also have used <command>COPY</command> to load large
        amounts of data from flat-text files.  This is usually faster
        because the <command>COPY</command> command is optimized for this
        application while allowing less flexibility than
        <command>INSERT</command>.  An example would be:
    
    <programlisting>
    COPY weather FROM '/home/user/weather.txt';
    </programlisting>
    
        where the file name for the source file must be available to the
        backend server machine, not the client, since the backend server
        reads the file directly.  You can read more about the
        <command>COPY</command> command in <xref linkend="sql-copy"
        endterm="sql-copy-title">.
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-select">
       <title>Querying a Table</title>
    
       <para>
        <indexterm><primary>query</primary></indexterm>
        <indexterm><primary>SELECT</primary></indexterm>
    
        To retrieve data from a table, the table is
        <firstterm>queried</firstterm>.  An <acronym>SQL</acronym>
        <command>SELECT</command> statement is used to do this.  The
        statement is divided into a select list (the part that lists the
        columns to be returned), a table list (the part that lists the
        tables from which to retrieve the data), and an optional
        qualification (the part that specifies any restrictions).  For
        example, to retrieve all the rows of table
        <classname>weather</classname>, type:
    <programlisting>
    SELECT * FROM weather;
    </programlisting>
        Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
         <footnote>
          <para>
           While <literal>SELECT *</literal> is useful for off-the-cuff
           queries, it is widely considered bad style in production code,
           since adding a column to the table would change the results.
          </para>
         </footnote>
        So the same result would be had with:
    <programlisting>
    SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
    </programlisting>
    
        The output should be:
    
    <screen>
         city      | temp_lo | temp_hi | prcp |    date
    ---------------+---------+---------+------+------------
     San Francisco |      46 |      50 | 0.25 | 1994-11-27
     San Francisco |      43 |      57 |    0 | 1994-11-29
     Hayward       |      37 |      54 |      | 1994-11-29
    (3 rows)
    </screen>
       </para>
    
       <para>
        You can write expressions, not just simple column references, in the
        select list.  For example, you can do:
    <programlisting>
    SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
    </programlisting>
        This should give:
    <screen>
         city      | temp_avg |    date
    ---------------+----------+------------
     San Francisco |       48 | 1994-11-27
     San Francisco |       50 | 1994-11-29
     Hayward       |       45 | 1994-11-29
    (3 rows)
    </screen>
        Notice how the <literal>AS</literal> clause is used to relabel the
        output column.  (The <literal>AS</literal> clause is optional.)
       </para>
    
       <para>
        A query can be <quote>qualified</> by adding a <literal>WHERE</>
        clause that specifies which rows are wanted.  The <literal>WHERE</>
        clause contains a Boolean (truth value) expression, and only rows for
        which the Boolean expression is true are returned.  The usual
        Boolean operators (<literal>AND</literal>,
        <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
        the qualification.  For example, the following
        retrieves the weather of San Francisco on rainy days:
    
    <programlisting>
    SELECT * FROM weather
        WHERE city = 'San Francisco' AND prcp &gt; 0.0;
    </programlisting>
        Result:
    <screen>
         city      | temp_lo | temp_hi | prcp |    date
    ---------------+---------+---------+------+------------
     San Francisco |      46 |      50 | 0.25 | 1994-11-27
    (1 row)
    </screen>
       </para>
    
       <para>
        <indexterm><primary>ORDER BY</primary></indexterm>
    
        You can request that the results of a query
        be returned in sorted order:
    
    <programlisting>
    SELECT * FROM weather
        ORDER BY city;
    </programlisting>
    
    <screen>
         city      | temp_lo | temp_hi | prcp |    date
    ---------------+---------+---------+------+------------
     Hayward       |      37 |      54 |      | 1994-11-29
     San Francisco |      43 |      57 |    0 | 1994-11-29
     San Francisco |      46 |      50 | 0.25 | 1994-11-27
    </screen>
    
        In this example, the sort order isn't fully specified, and so you
        might get the San Francisco rows in either order.  But you'd always
        get the results shown above if you do
    
    <programlisting>
    SELECT * FROM weather
        ORDER BY city, temp_lo;
    </programlisting>
       </para>
    
       <para>
        <indexterm><primary>DISTINCT</primary></indexterm>
        <indexterm><primary>duplicate</primary></indexterm>
    
        You can request that duplicate rows be removed from the result of
        a query:
    
    <programlisting>
    SELECT DISTINCT city
        FROM weather;
    </programlisting>
    
    <screen>
         city
    ---------------
     Hayward
     San Francisco
    (2 rows)
    </screen>
    
        Here again, the result row ordering might vary.
        You can ensure consistent results by using <literal>DISTINCT</literal> and
        <literal>ORDER BY</literal> together:
         <footnote>
          <para>
           In some database systems, including older versions of
           <productname>PostgreSQL</productname>, the implementation of
           <literal>DISTINCT</literal> automatically orders the rows and
           so <literal>ORDER BY</literal> is redundant.  But this is not
           required by the SQL standard, and current
           <productname>PostgreSQL</productname> doesn't guarantee that
           <literal>DISTINCT</literal> causes the rows to be ordered.
          </para>
         </footnote>
    
    <programlisting>
    SELECT DISTINCT city
        FROM weather
        ORDER BY city;
    </programlisting>
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-join">
       <title>Joins Between Tables</title>
    
       <indexterm zone="tutorial-join">
        <primary>join</primary>
       </indexterm>
    
       <para>
        Thus far, our queries have only accessed one table at a time.
        Queries can access multiple tables at once, or access the same
        table in such a way that multiple rows of the table are being
        processed at the same time.  A query that accesses multiple rows
        of the same or different tables at one time is called a
        <firstterm>join</firstterm> query.  As an example, say you wish to
        list all the weather records together with the location of the
        associated city.  To do that, we need to compare the city column of
        each row of the weather table with the name column of all rows in
        the cities table, and select the pairs of rows where these values match.
        <note>
         <para>
          This  is only a conceptual model.  The join is usually performed
          in a more efficient manner than actually comparing each possible
          pair of rows, but this is invisible to the user.
         </para>
        </note>
        This would be accomplished by the following query:
    
    <programlisting>
    SELECT *
        FROM weather, cities
        WHERE city = name;
    </programlisting>
    
    <screen>
         city      | temp_lo | temp_hi | prcp |    date    |     name      | location
    ---------------+---------+---------+------+------------+---------------+-----------
     San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
     San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
    (2 rows)
    </screen>
    
       </para>
    
       <para>
        Observe two things about the result set:
        <itemizedlist>
         <listitem>
          <para>
           There is no result row for the city of Hayward.  This is
           because there is no matching entry in the
           <classname>cities</classname> table for Hayward, so the join
           ignores the unmatched rows in the weather table.  We will see
           shortly how this can be fixed.
          </para>
         </listitem>
    
         <listitem>
          <para>
           There are two columns containing the city name.  This is
           correct because the lists of columns of the
           <classname>weather</classname> and the
           <classname>cities</classname> table are concatenated.  In
           practice this is undesirable, though, so you will probably want
           to list the output columns explicitly rather than using
           <literal>*</literal>:
    <programlisting>
    SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;
    </programlisting>
          </para>
         </listitem>
        </itemizedlist>
       </para>
    
       <formalpara>
        <title>Exercise:</title>
    
        <para>
         Attempt to find out the semantics of this query when the
         <literal>WHERE</literal> clause is omitted.
        </para>
       </formalpara>
    
       <para>
        Since the columns all had different names, the parser
        automatically found out which table they belong to, but it is good
        style to fully qualify column names in join queries:
    
    <programlisting>
    SELECT weather.city, weather.temp_lo, weather.temp_hi,
           weather.prcp, weather.date, cities.location
        FROM weather, cities
        WHERE cities.name = weather.city;
    </programlisting>
       </para>
    
       <para>
        Join queries of the kind seen thus far can also be written in this
        alternative form:
    
    <programlisting>
    SELECT *
        FROM weather INNER JOIN cities ON (weather.city = cities.name);
    </programlisting>
    
        This syntax is not as commonly used as the one above, but we show
        it here to help you understand the following topics.
       </para>
    
       <para>
        <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
    
        Now we will figure out how we can get the Hayward records back in.
        What we want the query to do is to scan the
        <classname>weather</classname> table and for each row to find the
        matching <classname>cities</classname> row.  If no matching row is
        found we want some <quote>empty values</quote> to be substituted
        for the <classname>cities</classname> table's columns.  This kind
        of query is called an <firstterm>outer join</firstterm>.  (The
        joins we have seen so far are inner joins.)  The command looks
        like this:
    
    <programlisting>
    SELECT *
        FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
    
         city      | temp_lo | temp_hi | prcp |    date    |     name      | location
    ---------------+---------+---------+------+------------+---------------+-----------
     Hayward       |      37 |      54 |      | 1994-11-29 |               |
     San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
     San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
    (3 rows)
    </programlisting>
    
        This query is called a <firstterm>left outer
        join</firstterm> because the table mentioned on the left of the
        join operator will have each of its rows in the output at least
        once, whereas the table on the right will only have those rows
        output that match some row of the left table.  When outputting a
        left-table row for which there is no right-table match, empty (null)
        values are substituted for the right-table columns.
       </para>
    
       <formalpara>
        <title>Exercise:</title>
    
        <para>
         There are also right outer joins and full outer joins.  Try to
         find out what those do.
        </para>
       </formalpara>
    
       <para>
        <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
        <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
    
        We can also join a table against itself.  This is called a
        <firstterm>self join</firstterm>.  As an example, suppose we wish
        to find all the weather records that are in the temperature range
        of other weather records.  So we need to compare the
        <structfield>temp_lo</> and <structfield>temp_hi</> columns of
        each <classname>weather</classname> row to the
        <structfield>temp_lo</structfield> and
        <structfield>temp_hi</structfield> columns of all other
        <classname>weather</classname> rows.  We can do this with the
        following query:
    
    <programlisting>
    SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
        W2.city, W2.temp_lo AS low, W2.temp_hi AS high
        FROM weather W1, weather W2
        WHERE W1.temp_lo &lt; W2.temp_lo
        AND W1.temp_hi &gt; W2.temp_hi;
    
         city      | low | high |     city      | low | high
    ---------------+-----+------+---------------+-----+------
     San Francisco |  43 |   57 | San Francisco |  46 |   50
     Hayward       |  37 |   54 | San Francisco |  46 |   50
    (2 rows)
    </programlisting>     
    
        Here we have relabeled the weather table as <literal>W1</> and
        <literal>W2</> to be able to distinguish the left and right side
        of the join.  You can also use these kinds of aliases in other
        queries to save some typing, e.g.:
    <programlisting>
    SELECT *
        FROM weather w, cities c
        WHERE w.city = c.name;
    </programlisting>
        You will encounter this style of abbreviating quite frequently.
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-agg">
       <title>Aggregate Functions</title>
    
       <indexterm zone="tutorial-agg">
        <primary>aggregate function</primary>
       </indexterm>
    
       <para>
        <indexterm><primary>average</primary></indexterm>
        <indexterm><primary>count</primary></indexterm>
        <indexterm><primary>max</primary></indexterm>
        <indexterm><primary>min</primary></indexterm>
        <indexterm><primary>sum</primary></indexterm>
    
        Like  most  other relational database products, 
        <productname>PostgreSQL</productname> supports
        aggregate functions.
        An aggregate function computes a single result from multiple input rows.
        For example, there are aggregates to compute the
        <function>count</function>, <function>sum</function>,
        <function>avg</function> (average), <function>max</function> (maximum) and
        <function>min</function> (minimum) over a set of rows.
       </para>
    
       <para>
        As an example, we can find the highest low-temperature reading anywhere
        with
    
    <programlisting>
    SELECT max(temp_lo) FROM weather;
    </programlisting>
    
    <screen>
     max
    -----
      46
    (1 row)
    </screen>
       </para>
    
       <para>
        <indexterm><primary>subquery</primary></indexterm>
    
        If we wanted to know what city (or cities) that reading occurred in,
        we might try
    
    <programlisting>
    SELECT city FROM weather WHERE temp_lo = max(temp_lo);     <lineannotation>WRONG</lineannotation>
    </programlisting>
    
        but this will not work since the aggregate
        <function>max</function> cannot be used in the
        <literal>WHERE</literal> clause.  (This restriction exists because
        the <literal>WHERE</literal> clause determines the rows that will
        go into the aggregation stage; so it has to be evaluated before
        aggregate functions are computed.)
        However, as is often the case
        the query can be restated to accomplish the intended result, here
        by using a <firstterm>subquery</firstterm>:
    
    <programlisting>
    SELECT city FROM weather
        WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
    </programlisting>
    
    <screen>
         city
    ---------------
     San Francisco
    (1 row)
    </screen>
    
        This is OK because the subquery is an independent computation
        that computes its own aggregate separately from what is happening
        in the outer query.
       </para>
    
       <para>
        <indexterm><primary>GROUP BY</primary></indexterm>
        <indexterm><primary>HAVING</primary></indexterm>
    
        Aggregates are also very useful in combination with <literal>GROUP
        BY</literal> clauses.  For example, we can get the maximum low
        temperature observed in each city with
    
    <programlisting>
    SELECT city, max(temp_lo)
        FROM weather
        GROUP BY city;
    </programlisting>
    
    <screen>
         city      | max
    ---------------+-----
     Hayward       |  37
     San Francisco |  46
    (2 rows)
    </screen>
    
        which gives us one output row per city.  Each aggregate result is
        computed over the table rows matching that city.
        We can filter these grouped
        rows using <literal>HAVING</literal>:
    
    <programlisting>
    SELECT city, max(temp_lo)
        FROM weather
        GROUP BY city
        HAVING max(temp_lo) &lt; 40;
    </programlisting>
    
    <screen>
      city   | max
    ---------+-----
     Hayward |  37
    (1 row)
    </screen>
    
        which gives us the same results for only the cities that have all
        <literal>temp_lo</> values below 40.  Finally, if we only care about
        cities whose
        names begin with <quote><literal>S</literal></quote>, we might do
    
    <programlisting>
    SELECT city, max(temp_lo)
        FROM weather
        WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
        GROUP BY city
        HAVING max(temp_lo) &lt; 40;
    </programlisting>
       <calloutlist>
        <callout arearefs="co.tutorial-agg-like">
         <para>
          The <literal>LIKE</literal> operator does pattern matching and
          is explained in <xref linkend="functions-matching">.
         </para>
        </callout>
       </calloutlist>
       </para>
    
       <para>
        It is important to understand the interaction between aggregates and
        <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
        The fundamental difference between <literal>WHERE</literal> and
        <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
        input rows before groups and aggregates are computed (thus, it controls
        which rows go into the aggregate computation), whereas
        <literal>HAVING</literal> selects group rows after groups and
        aggregates are computed.  Thus, the
        <literal>WHERE</literal> clause must not contain aggregate functions;
        it makes no sense to try to use an aggregate to determine which rows
        will be inputs to the aggregates.  On the other hand, the
        <literal>HAVING</literal> clause always contains aggregate functions.
        (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
        clause that doesn't use aggregates, but it's seldom useful. The same
        condition could be used more efficiently at the <literal>WHERE</literal>
        stage.)
       </para>
    
       <para>
        In the previous example, we can apply the city name restriction in
        <literal>WHERE</literal>, since it needs no aggregate.  This is
        more efficient than adding the restriction to <literal>HAVING</literal>,
        because we avoid doing the grouping and aggregate calculations
        for all rows that fail the <literal>WHERE</literal> check.
       </para>
      </sect1>
    
    
      <sect1 id="tutorial-update">
       <title>Updates</title>
    
       <indexterm zone="tutorial-update">
        <primary>UPDATE</primary>
       </indexterm>
    
       <para>
        You can update existing rows using the
        <command>UPDATE</command> command. 
        Suppose you discover the temperature readings are
        all  off  by 2 degrees as of November 28.  You may update the
        data as follows:
    
    <programlisting>
    UPDATE weather
        SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
        WHERE date &gt; '1994-11-28';
    </programlisting>
       </para>
    
       <para>
        Look at the new state of the data:
    <programlisting>
    SELECT * FROM weather;
    
         city      | temp_lo | temp_hi | prcp |    date
    ---------------+---------+---------+------+------------
     San Francisco |      46 |      50 | 0.25 | 1994-11-27
     San Francisco |      41 |      55 |    0 | 1994-11-29
     Hayward       |      35 |      52 |      | 1994-11-29
    (3 rows)
    </programlisting>
       </para>
      </sect1>
    
      <sect1 id="tutorial-delete">
       <title>Deletions</title>
    
       <indexterm zone="tutorial-delete">
        <primary>DELETE</primary>
       </indexterm>
    
       <para>
        Rows can be removed from a table using the <command>DELETE</command>
        command.
        Suppose you are no longer interested in the weather of Hayward.
        Then you can do the following to delete those rows from the table:
    <programlisting>
    DELETE FROM weather WHERE city = 'Hayward';
    </programlisting>
    
        All weather records belonging to Hayward are removed.
    
    <programlisting>
    SELECT * FROM weather;
    </programlisting>
    
    <screen>
         city      | temp_lo | temp_hi | prcp |    date
    ---------------+---------+---------+------+------------
     San Francisco |      46 |      50 | 0.25 | 1994-11-27
     San Francisco |      41 |      55 |    0 | 1994-11-29
    (2 rows)
    </screen>
       </para>
    
       <para>
        One should be wary of statements of the form
    <synopsis>
    DELETE FROM <replaceable>tablename</replaceable>;
    </synopsis>
    
        Without a qualification, <command>DELETE</command> will
        remove  <emphasis>all</>  rows from the given table, leaving it
        empty.  The system will not request confirmation before
        doing this!
       </para>
      </sect1>
    
     </chapter>
    
    <!-- Keep this comment at the end of the file
    Local variables:
    mode:sgml
    sgml-omittag:nil
    sgml-shorttag:t
    sgml-minimize-attributes:nil
    sgml-always-quote-attributes:t
    sgml-indent-step:1
    sgml-indent-data:t
    sgml-parent-document:nil
    sgml-default-dtd-file:"./reference.ced"
    sgml-exposed-tags:nil
    sgml-local-catalogs:("/usr/lib/sgml/catalog")
    sgml-local-ecat-files:nil
    End:
    -->