diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index cd66abc8ba6bb6d3bef2084736fa39931b2ffade..a848a7edd11f6961fa13e55b635849d62903a565 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2027,6 +2027,11 @@ include_dir 'conf.d' as much CPU time, memory, I/O bandwidth, and so forth as a query which uses no workers at all. </para> + + <para> + For more information on parallel query, see + <xref linkend="parallel-query">. + </para> </listitem> </varlistentry> diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 43837114baee804cf39c9e7c2024bb1b884f9f4b..69649a7da4bbcf87aff2cfc4de44394dbf68460d 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -24,6 +24,7 @@ <!ENTITY indices SYSTEM "indices.sgml"> <!ENTITY json SYSTEM "json.sgml"> <!ENTITY mvcc SYSTEM "mvcc.sgml"> +<!ENTITY parallel SYSTEM "parallel.sgml"> <!ENTITY perform SYSTEM "perform.sgml"> <!ENTITY queries SYSTEM "queries.sgml"> <!ENTITY rangetypes SYSTEM "rangetypes.sgml"> diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml new file mode 100644 index 0000000000000000000000000000000000000000..c80d42dbef18f45e310c7ba4025e008034e746a0 --- /dev/null +++ b/doc/src/sgml/parallel.sgml @@ -0,0 +1,472 @@ +<!-- doc/src/sgml/parallel.sgml --> + + <chapter id="parallel-query"> + <title>Parallel Query</title> + + <indexterm zone="parallel-query"> + <primary>parallel query</primary> + </indexterm> + + <para> + <productname>PostgreSQL</> can devise query plans which can leverage + multiple CPUs in order to answer queries faster. This feature is known + as parallel query. Many queries cannot benefit from parallel query, either + due to limitations of the current implementation or because there is no + imaginable query plan which is any faster than the serial query plan. + However, for queries that can benefit, the speedup from parallel query + is often very significant. 
Many queries can run more than twice as fast + when using parallel query, and some queries can run four times faster or + even more. Queries that touch a large amount of data but return only a + few rows to the user will typically benefit most. This chapter explains + some details of how parallel query works and in which situations it can be + used so that users who wish to make use of it can understand what to expect. + </para> + + <sect1 id="how-parallel-query-works"> + <title>How Parallel Query Works</title> + + <para> + When the optimizer determines that parallel query is the fastest execution + strategy for a particular query, it will create a query plan which includes + a <firstterm>Gather node</firstterm>. Here is a simple example: + +<screen> +EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%'; + QUERY PLAN +------------------------------------------------------------------------------------- + Gather (cost=1000.00..217018.43 rows=1 width=97) + Workers Planned: 2 + -> Parallel Seq Scan on pgbench_accounts (cost=0.00..216018.33 rows=1 width=97) + Filter: (filler ~~ '%x%'::text) +(4 rows) +</screen> + </para> + + <para> + In all cases, the <literal>Gather</literal> node will have exactly one + child plan, which is the portion of the plan that will be executed in + parallel. If the <literal>Gather</> node is at the very top of the plan + tree, then the entire query will execute in parallel. If it is somewhere + else in the plan tree, then only that portion of the query will run in + parallel. In the example above, the query accesses only one table, so + there is only one plan node other than the <literal>Gather</> node itself; + since that plan node is a child of the <literal>Gather</> node, it will + run in parallel. + </para> + + <para> + <link linkend="using-explain">Using EXPLAIN</>, you can see the number of + workers chosen by the planner. 
+    When the <literal>Gather</> node is reached
+    during query execution, the process which is implementing the user's
+    session will request a number of <link linkend="bgworker">background
+    worker processes</link> equal to the number
+    of workers chosen by the planner.  The total number of background
+    workers that can exist at any one time is limited by
+    <xref linkend="guc-max-worker-processes">, so it is possible for a
+    parallel query to run with fewer workers than planned, or even with
+    no workers at all.  The optimal plan may depend on the number of workers
+    that are available, so this can result in poor query performance.  If this
+    occurrence is frequent, consider increasing
+    <varname>max_worker_processes</> so that more workers can be run
+    simultaneously, or reducing
+    <xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
+    requests fewer workers.
+   </para>
+
+   <para>
+    Every background worker process which is successfully started for a given
+    parallel query will execute the portion of the plan which is a descendant
+    of the <literal>Gather</> node.  The leader will also execute that portion
+    of the plan, but it has an additional responsibility: it must also read
+    all of the tuples generated by the workers.  When the parallel portion of
+    the plan generates only a small number of tuples, the leader will often
+    behave very much like an additional worker, speeding up query execution.
+    Conversely, when the parallel portion of the plan generates a large number
+    of tuples, the leader may be almost entirely occupied with reading the
+    tuples generated by the workers and performing any further processing
+    steps which are required by plan nodes above the level of the
+    <literal>Gather</literal> node.  In such cases, the leader will do very
+    little of the work of executing the parallel portion of the plan.
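+    For illustration only (timings elided, and actual costs and row counts
+    will vary), <literal>EXPLAIN (ANALYZE)</> reports the number of workers
+    actually launched alongside the number planned, which makes it easy to
+    see when a query ran with fewer workers than planned:
+<screen>
+EXPLAIN (ANALYZE) SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
+                                    QUERY PLAN
+-------------------------------------------------------------------------------------
+ Gather  (cost=1000.00..217018.43 rows=1 width=97) (actual time=... rows=0 loops=1)
+   Workers Planned: 2
+   Workers Launched: 2
+   ->  Parallel Seq Scan on pgbench_accounts
+         (cost=0.00..216018.33 rows=1 width=97) (actual time=... rows=0 loops=3)
+         Filter: (filler ~~ '%x%'::text)
+</screen>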
+   </para>
+  </sect1>
+
+  <sect1 id="when-can-parallel-query-be-used">
+   <title>When Can Parallel Query Be Used?</title>
+
+   <para>
+    There are several settings which can cause the query planner not to
+    generate a parallel query plan under any circumstances.  In order for
+    any parallel query plans whatsoever to be generated, the following
+    settings must be configured as indicated.
+   </para>
+
+   <itemizedlist>
+    <listitem>
+      <para>
+        <xref linkend="guc-max-parallel-workers-per-gather"> must be set to a
+        value which is greater than zero.  This is a special case of the more
+        general principle that no more workers should be used than the number
+        configured via <varname>max_parallel_workers_per_gather</varname>.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        <xref linkend="guc-dynamic-shared-memory-type"> must be set to a
+        value other than <literal>none</>.  Parallel query requires dynamic
+        shared memory in order to pass data between cooperating processes.
+      </para>
+    </listitem>
+   </itemizedlist>
+
+   <para>
+    In addition, the system must not be running in single-user mode.  Since
+    the entire database system is running as a single process in this
+    situation, no background workers will be available.
+   </para>
+
+   <para>
+    Even when it is in general possible for parallel query plans to be
+    generated, the planner will not generate them for a given query
+    if any of the following are true:
+   </para>
+
+   <itemizedlist>
+    <listitem>
+      <para>
+        The query writes any data or locks any database rows.  If a query
+        contains a data-modifying operation either at the top level or within
+        a CTE, no parallel plans for that query will be generated.  This is a
+        limitation of the current implementation which could be lifted in a
+        future release.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        The query might be suspended during execution.  In any situation in
+        which the system thinks that partial or incremental execution might
+        occur, no parallel plan is generated.
+        For example, a cursor created
+        using <link linkend="sql-declare">DECLARE CURSOR</link> will never use
+        a parallel plan.  Similarly, a PL/pgSQL loop of the form
+        <literal>FOR x IN query LOOP .. END LOOP</literal> will never use a
+        parallel plan, because the parallel query system is unable to verify
+        that the code in the loop is safe to execute while parallel query is
+        active.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        The query uses any function marked <literal>PARALLEL UNSAFE</literal>.
+        Most system-defined functions are <literal>PARALLEL SAFE</literal>,
+        but user-defined functions are marked <literal>PARALLEL
+        UNSAFE</literal> by default.  See the discussion of
+        <xref linkend="parallel-safety">.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        The query is running inside of another query that is already parallel.
+        For example, if a function called by a parallel query issues an SQL
+        query itself, that query will never use a parallel plan.  This is a
+        limitation of the current implementation, but it may not be desirable
+        to remove this limitation, since it could result in a single query
+        using a very large number of processes.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        The transaction isolation level is serializable.  This is
+        a limitation of the current implementation.
+      </para>
+    </listitem>
+   </itemizedlist>
+
+   <para>
+    Even when a parallel query plan is generated for a particular query, there
+    are several circumstances under which it will be impossible to execute
+    that plan in parallel at execution time.  If this occurs, the leader
+    will execute the portion of the plan below the <literal>Gather</>
+    node entirely by itself, almost as if the <literal>Gather</> node were
+    not present.
+    This will happen if any of the following conditions are met:
+   </para>
+
+   <itemizedlist>
+    <listitem>
+      <para>
+        No background workers can be obtained because of the limitation that
+        the total number of background workers cannot exceed
+        <xref linkend="guc-max-worker-processes">.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        The client sends an Execute message with a non-zero fetch count.
+        See the discussion of the
+        <link linkend="protocol-flow-ext-query">extended query protocol</link>.
+        Since <link linkend="libpq">libpq</link> currently provides no way to
+        send such a message, this can only occur when using a client that
+        does not rely on libpq.  If this is a frequent
+        occurrence, it may be a good idea to set
+        <xref linkend="guc-max-parallel-workers-per-gather"> to zero in
+        sessions where it is likely, so as to avoid generating query plans
+        that may be suboptimal when run serially.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        The transaction isolation level is serializable.  This situation
+        does not normally arise, because parallel query plans are not
+        generated when the transaction isolation level is serializable.
+        However, it can happen if the transaction isolation level is changed to
+        serializable after the plan is generated and before it is executed.
+      </para>
+    </listitem>
+   </itemizedlist>
+  </sect1>
+
+  <sect1 id="parallel-plans">
+   <title>Parallel Plans</title>
+
+   <para>
+    Because each worker executes the parallel portion of the plan to
+    completion, it is not possible to simply take an ordinary query plan
+    and run it using multiple workers.  Each worker would produce a full
+    copy of the output result set, so the query would not run any faster
+    than normal but would produce incorrect results.
+    Instead, the parallel
+    portion of the plan must be what is known internally to the query
+    optimizer as a <firstterm>partial plan</>; that is, it must be constructed
+    so that each process which executes the plan will generate only a
+    subset of the output rows in such a way that each required output row
+    is guaranteed to be generated by exactly one of the cooperating processes.
+   </para>
+
+   <sect2 id="parallel-scans">
+    <title>Parallel Scans</title>
+
+    <para>
+      Currently, the only type of scan which has been modified to work with
+      parallel query is a sequential scan.  Therefore, the driving table in
+      a parallel plan will always be scanned using a
+      <literal>Parallel Seq Scan</>.  The relation's blocks will be divided
+      among the cooperating processes.  Blocks are handed out one at a
+      time, so that access to the relation remains sequential.  Each process
+      will visit every tuple on the page assigned to it before requesting a new
+      page.
+    </para>
+   </sect2>
+
+   <sect2 id="parallel-joins">
+    <title>Parallel Joins</title>
+
+    <para>
+      The driving table may be joined to one or more other tables using nested
+      loops or hash joins.  The inner side of the join may be any kind of
+      non-parallel plan that is otherwise supported by the planner provided that
+      it is safe to run within a parallel worker.  For example, it may be an
+      index scan which looks up a value based on a column taken from the outer
+      table.  Each worker will execute the inner side of the join in full, which
+      is why merge joins are not supported here.  The inner side of a merge join
+      will often involve sorting the entire inner table; even if it involves an
+      index, it is unlikely to be productive to have multiple processes each
+      conduct a full index scan of the inner table.
+    </para>
+   </sect2>
+
+   <sect2 id="parallel-aggregation">
+    <title>Parallel Aggregation</title>
+    <para>
+      It is not possible to perform the aggregation portion of a query entirely
+      in parallel.
+      For example, if a query involves selecting
+      <literal>COUNT(*)</>, each worker could compute a total, but those totals
+      would need to be combined in order to produce a final answer.  If the
+      query involved a <literal>GROUP BY</> clause, a separate total would need
+      to be computed for each group.  Even though aggregation can't be done
+      entirely in parallel, queries involving aggregation are often excellent
+      candidates for parallel query, because they typically read many rows but
+      return only a few rows to the client.  Queries that return many rows to
+      the client are often limited by the speed at which the client can read
+      the data, in which case parallel query cannot help very much.
+    </para>
+
+    <para>
+      <productname>PostgreSQL</> supports parallel aggregation by aggregating
+      twice.  First, each process participating in the parallel portion of the
+      query performs an aggregation step, producing a partial result for each
+      group of which that process is aware.  This is reflected in the plan as
+      a <literal>PartialAggregate</> node.  Second, the partial results are
+      transferred to the leader via the <literal>Gather</> node.  Finally, the
+      leader re-aggregates the results across all workers in order to produce
+      the final result.  This is reflected in the plan as a
+      <literal>FinalizeAggregate</> node.
+    </para>
+
+    <para>
+      Parallel aggregation is not supported in all situations.  Each aggregate
+      must be <link linkend="parallel-safety">safe</> for parallelism and must
+      have a combine function.  If the aggregate has a transition state of type
+      <literal>internal</>, it must have serialization and deserialization
+      functions.  See <xref linkend="sql-createaggregate"> for more details.
+      Parallel aggregation is not supported for ordered set aggregates or when
+      the query involves <literal>GROUPING SETS</>.  It can only be used when
+      all joins involved in the query are also part of the parallel portion
+      of the plan.
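+      When parallel aggregation is used, both aggregation steps are visible
+      in the query plan; the following is an illustrative sketch of the plan
+      shape (using <literal>COSTS OFF</> to suppress cost estimates; actual
+      plans will vary):
+<screen>
+EXPLAIN (COSTS OFF) SELECT count(*) FROM pgbench_accounts;
+                      QUERY PLAN
+------------------------------------------------------
+ Finalize Aggregate
+   ->  Gather
+         Workers Planned: 2
+         ->  Partial Aggregate
+               ->  Parallel Seq Scan on pgbench_accounts
+</screen>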
+    </para>
+
+   </sect2>
+
+   <sect2 id="parallel-plan-tips">
+    <title>Parallel Plan Tips</title>
+
+    <para>
+      If a query that is expected to do so does not produce a parallel plan,
+      you can try reducing <xref linkend="guc-parallel-setup-cost"> or
+      <xref linkend="guc-parallel-tuple-cost">.  Of course, this plan may turn
+      out to be slower than the serial plan which the planner preferred, but
+      this will not always be the case.  If you don't get a parallel
+      plan even with very small values of these settings (e.g. after setting
+      them both to zero), there may be some reason why the query planner is
+      unable to generate a parallel plan for your query.  See
+      <xref linkend="when-can-parallel-query-be-used"> and
+      <xref linkend="parallel-safety"> for information on why this may be
+      the case.
+    </para>
+
+    <para>
+      When executing a parallel plan, you can use <literal>EXPLAIN (ANALYZE,
+      VERBOSE)</literal> to display per-worker statistics for each plan node.
+      This may be useful in determining whether the work is being evenly
+      distributed between all plan nodes and more generally in understanding the
+      performance characteristics of the plan.
+    </para>
+
+   </sect2>
+  </sect1>
+
+  <sect1 id="parallel-safety">
+   <title>Parallel Safety</title>
+
+   <para>
+    The planner classifies operations involved in a query as either
+    <firstterm>parallel safe</>, <firstterm>parallel restricted</>,
+    or <firstterm>parallel unsafe</>.  A parallel safe operation is one which
+    does not conflict with the use of parallel query.  A parallel restricted
+    operation is one which cannot be performed in a parallel worker, but which
+    can be performed in the leader while parallel query is in use.  Therefore,
+    parallel restricted operations can never occur below a <literal>Gather</>
+    node, but can occur elsewhere in a plan which contains a
+    <literal>Gather</> node.  A parallel unsafe operation is one which cannot
+    be performed while parallel query is in use, not even in the leader.
+    When a query contains anything which is parallel unsafe, parallel query
+    is completely disabled for that query.
+   </para>
+
+   <para>
+    The following operations are always parallel restricted.
+   </para>
+
+   <itemizedlist>
+    <listitem>
+      <para>
+        Scans of common table expressions (CTEs).
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Scans of temporary tables.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Scans of foreign tables, unless the foreign data wrapper has
+        an <literal>IsForeignScanParallelSafe</> API which indicates otherwise.
+      </para>
+    </listitem>
+
+    <listitem>
+      <para>
+        Access to an <literal>InitPlan</> or <literal>SubPlan</>.
+      </para>
+    </listitem>
+   </itemizedlist>
+
+   <sect2 id="parallel-labeling">
+    <title>Parallel Labeling for Functions and Aggregates</title>
+
+    <para>
+      The planner cannot automatically determine whether a user-defined
+      function or aggregate is parallel safe, parallel restricted, or parallel
+      unsafe, because this would require predicting every operation which the
+      function could possibly perform.  In general, this is equivalent to the
+      Halting Problem and therefore impossible.  Even for simple functions
+      where it could conceivably be done, we do not try, since this would be
+      expensive and error-prone.  Instead, all user-defined functions are
+      assumed to be parallel unsafe unless otherwise marked.  When using
+      <xref linkend="sql-createfunction"> or
+      <xref linkend="sql-alterfunction">, markings can be set by specifying
+      <literal>PARALLEL SAFE</>, <literal>PARALLEL RESTRICTED</>, or
+      <literal>PARALLEL UNSAFE</> as appropriate.  When using
+      <xref linkend="sql-createaggregate">, the
+      <literal>PARALLEL</> option can be specified with <literal>SAFE</>,
+      <literal>RESTRICTED</>, or <literal>UNSAFE</> as the corresponding value.
+    </para>
+
+    <para>
+      Functions and aggregates must be marked <literal>PARALLEL UNSAFE</> if
+      they write to the database, access sequences, change the transaction state
+      even temporarily (e.g.
+      a PL/pgSQL function which establishes an
+      <literal>EXCEPTION</> block to catch errors), or make persistent changes to
+      settings.  Similarly, functions must be marked <literal>PARALLEL
+      RESTRICTED</> if they access temporary tables, client connection state,
+      cursors, prepared statements, or miscellaneous backend-local state which
+      the system cannot synchronize across workers.  For example,
+      <literal>setseed</> and <literal>random</> are parallel restricted for
+      this last reason.
+    </para>
+
+    <para>
+      In general, if a function is labeled as being safe when it is restricted or
+      unsafe, or if it is labeled as being restricted when it is in fact unsafe,
+      it may throw errors or produce wrong answers when used in a parallel query.
+      C-language functions could in theory exhibit totally undefined behavior if
+      mislabeled, since there is no way for the system to protect itself against
+      arbitrary C code, but in most likely cases the result will be no worse than
+      for any other function.  If in doubt, it is probably best to label functions
+      as <literal>UNSAFE</>.
+    </para>
+
+    <para>
+      If a function executed within a parallel worker acquires locks which are
+      not held by the leader, for example by querying a table not referenced in
+      the query, those locks will be released at worker exit, not at end of
+      transaction.  If you write a function which does this, and this behavior
+      difference is important to you, mark such functions as
+      <literal>PARALLEL RESTRICTED</literal>
+      to ensure that they execute only in the leader.
+    </para>
+
+    <para>
+      Note that the query planner does not consider deferring the evaluation of
+      parallel-restricted functions or aggregates involved in the query in
+      order to obtain a superior plan.  So, for example, if a <literal>WHERE</>
+      clause applied to a particular table is parallel restricted, the query
+      planner will not consider placing the scan of that table below a
+      <literal>Gather</> node.
+      In some cases, it would be
+      possible (and perhaps even efficient) to include the scan of that table in
+      the parallel portion of the query and defer the evaluation of the
+      <literal>WHERE</> clause so that it happens above the <literal>Gather</>
+      node.  However, the planner does not do this.
+    </para>
+
+   </sect2>
+
+  </sect1>
+
+ </chapter>
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 0346d367e5d761fe900ff5cd58ff62b5f18b5ff4..9143917c490b753e832bb301923b8e2f691c5ea7 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -106,6 +106,7 @@
  &textsearch;
  &mvcc;
  &perform;
+ &parallel;

 </part>