From ff6fe1502d2d8553fa3a305483344cb90a6411d4 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Mon, 13 Dec 1999 17:39:38 +0000 Subject: [PATCH] Update documentation to reflect availability of aggregate(DISTINCT). Try to provide a more lucid discussion in 'Using Aggregate Functions' tutorial section. --- doc/src/sgml/query.sgml | 73 ++++++++++++++++++++++++++++++------ doc/src/sgml/ref/select.sgml | 21 ++++++----- doc/src/sgml/syntax.sgml | 53 ++++++++++++++++++-------- 3 files changed, 111 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml index a1830eb0297..961bc0d9dc7 100644 --- a/doc/src/sgml/query.sgml +++ b/doc/src/sgml/query.sgml @@ -361,39 +361,90 @@ DELETE FROM classname; Like most other query languages, <ProductName>PostgreSQL</ProductName> supports aggregate functions. - The current implementation of - <ProductName>Postgres</ProductName> aggregate functions have some limitations. - Specifically, while there are aggregates to compute - such functions as the <Function>count</Function>, <Function>sum</Function>, + An aggregate function computes a single result from multiple input rows. + For example, there are aggregates to compute the + <Function>count</Function>, <Function>sum</Function>, <Function>avg</Function> (average), <Function>max</Function> (maximum) and - <Function>min</Function> (minimum) over a set of instances, aggregates can only - appear in the target list of a query and not directly in the - qualification (the where clause). As an example, + <Function>min</Function> (minimum) over a set of instances. + </para> + + <Para> + It is important to understand the interaction between aggregates and + SQL's <Command>where</Command> and <Command>having</Command> clauses. + The fundamental difference between <Command>where</Command> and + <Command>having</Command> is this: <Command>where</Command> selects + input rows before groups and aggregates are computed (thus, it controls + which rows go into the aggregate computation), whereas + <Command>having</Command> selects group rows after groups and + aggregates are computed. Thus, the + <Command>where</Command> clause may not contain aggregate functions; + it makes no sense to try to use an aggregate to determine which rows + will be inputs to the aggregates. On the other hand, + <Command>having</Command> clauses always contain aggregate functions. + (Strictly speaking, you are allowed to write a <Command>having</Command> + clause that doesn't use aggregates, but it's wasteful; the same condition + could be used more efficiently at the <Command>where</Command> stage.) + </para> + + <Para> + As an example, we can find the highest low-temperature reading anywhere + with <ProgramListing> SELECT max(temp_lo) FROM weather; </ProgramListing> - is allowed, while + If we want to know which city (or cities) that reading occurred in, + we might try <ProgramListing> SELECT city FROM weather WHERE temp_lo = max(temp_lo); </ProgramListing> - is not. However, as is often the case the query can be restated to accomplish - the intended result; here by using a <FirstTerm>subselect</FirstTerm>: + but this will not work since the aggregate max() can't be used in + <Command>where</Command>. However, as is often the case the query can be + restated to accomplish the intended result; here by using a + <FirstTerm>subselect</FirstTerm>: <ProgramListing> SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather); </ProgramListing> + This is OK because the sub-select is an independent computation that + computes its own aggregate separately from what's happening in the outer + select. </Para> <Para> - Aggregates may also have <FirstTerm>group by</FirstTerm> clauses: + Aggregates are also very useful in combination with + <FirstTerm>group by</FirstTerm> clauses. For example, we can get the + maximum low temperature observed in each city with <ProgramListing> SELECT city, max(temp_lo) FROM weather GROUP BY city; </ProgramListing> + which gives us one output row per city. We can filter these grouped + rows using <Command>having</Command>: + <ProgramListing> +SELECT city, max(temp_lo) + FROM weather + GROUP BY city + HAVING min(temp_lo) < 0; + </ProgramListing> + which gives us the same results for only the cities that have some + below-zero readings. Finally, if we only care about cities whose + names begin with 'P', we might do + <ProgramListing> +SELECT city, max(temp_lo) + FROM weather + WHERE city like 'P%' + GROUP BY city + HAVING min(temp_lo) < 0; + </ProgramListing> + Note that we can apply the city-name restriction in + <Command>where</Command>, since it needs no aggregate. This is + more efficient than adding the restriction to <Command>having</Command>, + because we avoid doing the grouping and aggregate calculations + for all rows that fail the <Command>where</Command> check. </Para> </sect1> </Chapter> diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index f2afa38b7c9..070f8b43d0f 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.22 1999/08/06 13:50:31 thomas Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.23 1999/12/13 17:39:38 tgl Exp $ Postgres documentation --> @@ -202,10 +202,10 @@ SELECT [ ALL | DISTINCT [ ON <replaceable class="PARAMETER">column</replaceable> <para> <command>DISTINCT</command> will eliminate all duplicate rows from the - selection. + result. <command>DISTINCT ON <replaceable class="PARAMETER">column</replaceable></command> will eliminate all duplicates in the specified column; this is - equivalent to using + similar to using <command>GROUP BY <replaceable class="PARAMETER">column</replaceable></command>. <command>ALL</command> will return all candidate rows, including duplicates. @@ -320,11 +320,13 @@ GROUP BY <replaceable class="PARAMETER">column</replaceable> [, ...] <para> GROUP BY will condense into a single row all rows that share the - same values for the - grouped columns; aggregates return values derived from all rows - that make up the group. The value returned for an ungrouped - and unaggregated column is dependent on the order in which rows - happen to be read from the database. + same values for the grouped columns. Aggregate functions, if any, + are computed across all rows making up each group, producing a + separate value for each group (whereas without GROUP BY, an + aggregate produces a single value computed across all the selected + rows). When GROUP BY is present, it is not valid to refer to + ungrouped columns except within aggregate functions, since there + would be more than one possible value to return for an ungrouped column. </para> </refsect2> @@ -354,7 +356,8 @@ HAVING <replaceable class="PARAMETER">cond_expr</replaceable> <para> Each column referenced in <replaceable class="PARAMETER">cond_expr</replaceable> shall unambiguously - reference a grouping column. + reference a grouping column, unless the reference appears within an + aggregate function. </para> </refsect2> diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index a74bd08be11..beacf8fbdf0 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -642,15 +642,16 @@ CAST '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> <member><replaceable>a_expr</replaceable> <replaceable>right_unary_operator</replaceable></member> <member><replaceable>left_unary_operator</replaceable> <replaceable>a_expr</replaceable></member> <member>parameter</member> - <member>functional expressions</member> - <member>aggregate expressions</member> + <member>functional expression</member> + <member>aggregate expression</member> </simplelist> </para> <para> - We have already discussed constants and attributes. The two kinds of - operator expressions indicate respectively binary and left_unary - expressions. The following sections discuss the remaining options. + We have already discussed constants and attributes. The three kinds of + operator expressions indicate respectively binary (infix), right-unary + (suffix) and left-unary (prefix) operators. The following sections + discuss the remaining options. </para> <sect2> @@ -690,7 +691,7 @@ CREATE FUNCTION dept (name) enclosed in parentheses: <synopsis> -<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> ) +<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> ... ] ) </synopsis> </para> @@ -705,20 +706,40 @@ sqrt(emp.salary) </sect2> <sect2> - <title>Aggregate Expression</title> + <title>Aggregate Expressions</title> <para> - An <firstterm>aggregate expression</firstterm> - represents a simple aggregate (i.e., one that computes a single value) - or an aggregate function (i.e., one that computes a set of values). - The syntax is the following: + An <firstterm>aggregate expression</firstterm> represents the application + of an aggregate function across the rows selected by a query. + An aggregate function reduces multiple inputs to a single output value, + such as the sum or average of the inputs. + The syntax of an aggregate expression is one of the following: - <synopsis> -<replaceable>aggregate_name</replaceable> (<replaceable>attribute</replaceable>) - </synopsis> + <simplelist> + <member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member> + <member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member> + <member><replaceable>aggregate_name</replaceable> ( * )</member> + </simplelist> + + where <replaceable>aggregate_name</replaceable> is a previously defined + aggregate, and <replaceable>expression</replaceable> is any expression + that doesn't itself contain an aggregate expression. + </para> - where <replaceable>aggregate_name</replaceable> - must be a previously defined aggregate. + <para> + The first form of aggregate expression invokes the aggregate across all + input rows for which the given expression yields a non-null value. + The second form invokes the aggregate for all distinct non-null values + of the expression found in the input rows. The last form invokes the + aggregate once for each input row regardless of null or non-null values; + since no particular input value is specified, it is generally only useful + for the count() aggregate. + </para> + + <para> + For example, count(*) yields the total number of input rows; + count(f1) yields the number of input rows in which f1 is non-null; + count(distinct f1) yields the number of distinct non-null values of f1. </para> </sect2> -- GitLab