Jakob Huber / postgres-lambda-diff / Commits / 88ba4018

Commit 88ba4018, authored Jun 11, 2001 by Tom Lane

    Update EXPLAIN discussion and examples to match current sources.

Parent: 01a819ab
Changes: 1 changed file

doc/src/sgml/perform.sgml  +40 −41 (40 additions, 41 deletions)
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.5 2001/05/17 21:50:16 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.6 2001/06/11 00:52:09 tgl Exp $
 -->
 <chapter id="performance-tips">
@@ -15,26 +15,19 @@
 <sect1 id="using-explain">
  <title>Using <command>EXPLAIN</command></title>
-  <note>
-   <title>Author</title>
-   <para>
-    Written by Tom Lane, from e-mail dated 2000-03-27.
-   </para>
-  </note>
 <para>
  <productname>Postgres</productname> devises a <firstterm>query
  plan</firstterm> for each query it is given.  Choosing the right
  plan to match the query structure and the properties of the data
  is absolutely critical for good performance.  You can use the
  <command>EXPLAIN</command> command to see what query plan the system
- creates for any query.  Unfortunately, plan-reading is an art that
- deserves a tutorial, and I haven't had time to write one.  Here is
- some quick & dirty explanation.
+ creates for any query.  Plan-reading is an art that deserves an
+ extensive tutorial, which this is not; but here is some basic
+ information.
 </para>
 <para>
- The numbers that are currently quoted by EXPLAIN are:
+ The numbers that are currently quoted by <command>EXPLAIN</command> are:
 <itemizedlist>
  <listitem>
@@ -94,12 +87,12 @@
 estimated selectivity of any WHERE-clause constraints that are being
 applied at this node.  Ideally the top-level rows estimate will
 approximate the number of rows actually returned, updated, or deleted
-by the query (again, without considering the effects of LIMIT).
+by the query.
 </para>
 <para>
 Here are some examples (using the regress test database after a
-vacuum analyze, and almost-7.0 sources):
+vacuum analyze, and 7.2 development sources):
 <programlisting>
 regression=# explain select * from tenk1;
@@ -129,45 +122,51 @@
 regression=# explain select * from tenk1 where unique1 < 1000;
 NOTICE:  QUERY PLAN:
-Seq Scan on tenk1  (cost=0.00..358.00 rows=1000 width=148)
+Seq Scan on tenk1  (cost=0.00..358.00 rows=1003 width=148)
 </programlisting>
 The estimate of output rows has gone down because of the WHERE clause.
-(This estimate is uncannily accurate because tenk1 is a particularly
-simple case --- the unique1 column has 10000 distinct values ranging
-from 0 to 9999, so the estimator's linear interpolation between min and
-max column values is dead-on.)  However, the scan will still have to
-visit all 10000 rows, so the cost hasn't decreased; in fact it has gone
-up a bit to reflect the extra CPU time spent checking the WHERE
-condition.
+However, the scan will still have to visit all 10000 rows, so the cost
+hasn't decreased; in fact it has gone up a bit to reflect the extra CPU
+time spent checking the WHERE condition.
 </para>
+<para>
+The actual number of rows this query would select is 1000, but the
+estimate is only approximate.  If you try to duplicate this experiment,
+you will probably get a slightly different estimate; moreover, it will
+change after each <command>ANALYZE</command> command, because the
+statistics produced by <command>ANALYZE</command> are taken from a
+randomized sample of the table.
+</para>
 <para>
 Modify the query to restrict the qualification even more:
 <programlisting>
-regression=# explain select * from tenk1 where unique1 < 100;
+regression=# explain select * from tenk1 where unique1 < 50;
 NOTICE:  QUERY PLAN:
-Index Scan using tenk1_unique1 on tenk1  (cost=0.00..89.35 rows=100 width=148)
+Index Scan using tenk1_unique1 on tenk1  (cost=0.00..173.32 rows=47 width=148)
 </programlisting>
 and you will see that if we make the WHERE condition selective
 enough, the planner will
 eventually decide that an indexscan is cheaper than a sequential scan.
-This plan will only have to visit 100 tuples because of the index,
-so it wins despite the fact that each individual fetch is expensive.
+This plan will only have to visit 50 tuples because of the index,
+so it wins despite the fact that each individual fetch is more expensive
+than reading a whole disk page sequentially.
 </para>
 <para>
 Add another condition to the qualification:
 <programlisting>
-regression=# explain select * from tenk1 where unique1 < 100 and
+regression=# explain select * from tenk1 where unique1 < 50 and
 regression-# stringu1 = 'xxx';
 NOTICE:  QUERY PLAN:
-Index Scan using tenk1_unique1 on tenk1  (cost=0.00..89.60 rows=1 width=148)
+Index Scan using tenk1_unique1 on tenk1  (cost=0.00..173.44 rows=1 width=148)
 </programlisting>
 The added clause "stringu1 = 'xxx'" reduces the output-rows estimate,
@@ -178,22 +177,22 @@
 Let's try joining two tables, using the fields we have been discussing:
 <programlisting>
-regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
+regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 50
 regression-# and t1.unique2 = t2.unique2;
 NOTICE:  QUERY PLAN:
-Nested Loop  (cost=0.00..144.07 rows=100 width=296)
+Nested Loop  (cost=0.00..269.11 rows=47 width=296)
   ->  Index Scan using tenk1_unique1 on tenk1 t1
-        (cost=0.00..89.35 rows=100 width=148)
+        (cost=0.00..173.32 rows=47 width=148)
   ->  Index Scan using tenk2_unique2 on tenk2 t2
-        (cost=0.00..0.53 rows=1 width=148)
+        (cost=0.00..2.01 rows=1 width=148)
 </programlisting>
 </para>
 <para>
 In this nested-loop join, the outer scan is the same indexscan we had
 in the example before last, and so its cost and row count are the same
-because we are applying the "unique1 < 100" WHERE clause at that node.
+because we are applying the "unique1 < 50" WHERE clause at that node.
 The "t1.unique2 = t2.unique2" clause isn't relevant yet, so it doesn't
 affect the outer scan's row count.  For the inner scan, the current
@@ -203,7 +202,7 @@
 same inner-scan plan and costs that we'd get from, say, "explain select
 * from tenk2 where unique2 = 42".  The loop node's costs are then set
 on the basis of the outer scan's cost, plus one repetition of the
-inner scan for each outer tuple (100 * 0.53, here), plus a little CPU
+inner scan for each outer tuple (47 * 2.01, here), plus a little CPU
 time for join processing.
 </para>
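The nested-loop numbers in the new plan above can be checked by hand. A quick sketch of the arithmetic, using only the constants quoted by EXPLAIN in this hunk (attributing the small remainder to the planner's per-tuple CPU charge for join processing is our inference, not something the output states):

```python
# Nested-loop costing: outer scan cost, plus one inner-scan repetition
# per outer tuple, plus a small CPU charge for join processing.
outer_total = 173.32   # index scan on tenk1 (unique1 < 50), total cost
outer_rows = 47        # estimated outer tuples
inner_total = 2.01     # one index probe into tenk2 per outer tuple
plan_total = 269.11    # total cost EXPLAIN quotes for the Nested Loop

base = outer_total + outer_rows * inner_total
print(round(base, 2))               # 267.79
print(round(plan_total - base, 2))  # remainder (join-processing CPU): 1.32
```

The base figure lands within about half a percent of the quoted total, which is consistent with the text's description of the loop node's cost formula.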
@@ -226,27 +225,27 @@
 <programlisting>
 regression=# set enable_nestloop = off;
 SET VARIABLE
-regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100
+regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 50
 regression-# and t1.unique2 = t2.unique2;
 NOTICE:  QUERY PLAN:
-Hash Join  (cost=89.60..574.10 rows=100 width=296)
+Hash Join  (cost=173.44..557.03 rows=47 width=296)
   ->  Seq Scan on tenk2 t2
         (cost=0.00..333.00 rows=10000 width=148)
-  ->  Hash  (cost=89.35..89.35 rows=100 width=148)
+  ->  Hash  (cost=173.32..173.32 rows=47 width=148)
         ->  Index Scan using tenk1_unique1 on tenk1 t1
-              (cost=0.00..89.35 rows=100 width=148)
+              (cost=0.00..173.32 rows=47 width=148)
 </programlisting>
-This plan proposes to extract the 100 interesting rows of tenk1
+This plan proposes to extract the 50 interesting rows of tenk1
 using ye same olde indexscan, stash them into an in-memory hash table,
 and then do a sequential scan of tenk2, probing into the hash table
 for possible matches of "t1.unique2 = t2.unique2" at each tenk2 tuple.
 The cost to read tenk1 and set up the hash table is entirely start-up
 cost for the hash join, since we won't get any tuples out until we can
 start reading tenk2.  The total time estimate for the join also
-includes a pretty hefty charge for CPU time to probe the hash table
-10000 times.  Note, however, that we are NOT charging 10000 times 89.35;
+includes a hefty charge for CPU time to probe the hash table
+10000 times.  Note, however, that we are NOT charging 10000 times 173.32;
 the hash table setup is only done once in this plan type.
 </para>
 </sect1>
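The hash-join costs in this last hunk can be sketched the same way. The numbers come from the EXPLAIN output above; splitting the run cost into a seq-scan component and an aggregate per-probe CPU component is our reading of the text, not a breakdown EXPLAIN itself prints:

```python
# Hash-join costing: the hash build is (almost all of the) start-up
# cost; the run phase is the tenk2 seq scan plus one hash-table probe
# per tenk2 tuple.
hash_node_total = 173.32           # build hash table from 47 tenk1 rows
join_startup, join_total = 173.44, 557.03
seq_scan_total = 333.00            # Seq Scan on tenk2

# The one-time hash build dominates start-up; as the text notes, we are
# NOT charged 10000 * 173.32, because the table is built only once.
assert join_startup >= hash_node_total

run_cost = join_total - join_startup
print(round(run_cost, 2))                   # 383.59
print(round(run_cost - seq_scan_total, 2))  # aggregate probe CPU: 50.59
```

The run-phase residue of roughly 50 cost units over 10000 probes matches the "hefty charge for CPU time to probe the hash table 10000 times" described in the prose.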
...