Commits · 9e8da0f75731aaa7605cf4656c21ea09e84d2eb1 · Jakob Huber / postgres-lambda-diff

Oct 16, 2011

Teach btree to handle ScalarArrayOpExpr quals natively. · 9e8da0f7

Tom Lane authored 13 years ago

This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain
indexscans, and particularly in index-only scans.

9e8da0f7

Oct 14, 2011

Measure the number of all-visible pages for use in index-only scan costing. · e6858e66

Tom Lane authored 13 years ago

Add a column pg_class.relallvisible to remember the number of pages that
were all-visible according to the visibility map as of the last VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to estimate
how many heap page fetches can be avoided during an index-only scan.

This is pretty primitive and will no doubt see refinements once we've
acquired more field experience with the index-only scan mechanism, but
it's way better than using a constant.

Note: I had to adjust an underspecified query in the window.sql regression
test, because it was changing answers when the plan changed to use an
index-only scan.  Some of the adjacent tests perhaps should be adjusted
as well, but I didn't do that here.

e6858e66

Oct 11, 2011

Rearrange the implementation of index-only scans. · a0185461

Tom Lane authored 13 years ago

This commit changes index-only scans so that data is read directly from the
index tuple without first generating a faux heap tuple. The only immediate
benefit is that indexes on system columns (such as OID) can be used in
index-only scans, but this is necessary infrastructure if we are ever to
support index-only scans on expression indexes. The executor is now ready
for that, though the planner still needs substantial work to recognize
the possibility.

To do this, Vars in index-only plan nodes have to refer to index columns
not heap columns. I introduced a new special varno, INDEX_VAR, to mark
such Vars to avoid confusion. (In passing, this commit renames the two
existing special varnos to OUTER_VAR and INNER_VAR.) This allows
ruleutils.c to handle them with logic similar to what we use for subplan
reference Vars.

Since index-only scans are now fundamentally different from regular
indexscans so far as their expression subtrees are concerned, I also chose
to change them to have their own plan node type (and hence, their own
executor source file).

a0185461

Oct 08, 2011

Support index-only scans using the visibility map to avoid heap fetches. · a2822fb9

Tom Lane authored 13 years ago

When a btree index contains all columns required by the query, and the
visibility map shows that all tuples on a target heap page are
visible-to-all, we don't need to fetch that heap page. This patch depends
on the previous patches that made the visibility map reliable.

There's a fair amount left to do here, notably trying to figure out a less
chintzy way of estimating the cost of an index-only scan, but the core
functionality seems ready to commit.

Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.

a2822fb9

Sep 03, 2011

Rearrange planner to save the whole PlannerInfo (subroot) for a subquery. · b3aaf908

Tom Lane authored 13 years ago

Formerly, set_subquery_pathlist and other creators of plans for subqueries
saved only the rangetable and rowMarks lists from the lower-level
PlannerInfo. But there's no reason not to remember the whole PlannerInfo,
and indeed this turns out to simplify matters in a number of places.

The immediate reason for doing this was so that the subroot will still be
accessible when we're trying to extract column statistics out of an
already-planned subquery. But now that I've done it, it seems like a good
code-beautification effort in its own right.

I also chose to get rid of the transient subrtable and subrowmark fields in
SubqueryScan nodes, in favor of having setrefs.c look up the subquery's
RelOptInfo. That required changing all the APIs in setrefs.c to pass
PlannerInfo not PlannerGlobal, which was a large but quite mechanical
transformation.

One side-effect not foreseen at the beginning is that this finally broke
inheritance_planner's assumption that replanning the same subquery RTE N
times would necessarily give interchangeable results each time. That
assumption was always pretty risky, but now we really have to make a
separate RTE for each instance so that there's a place to carry the
separate subroots.

b3aaf908

Sep 01, 2011
- Remove unnecessary #include references, per pgrminclude script. · 6416a82a
  Bruce Momjian authored 13 years ago
  
  6416a82a
Apr 24, 2011

Improve cost estimation for aggregates and window functions. · e6a30a8c

Tom Lane authored 13 years ago

The previous coding failed to account properly for the costs of evaluating
the input expressions of aggregates and window functions, as seen in a
recent gripe from Claudio Freire. (I said at the time that it wasn't
counting these costs at all; but on closer inspection, it was effectively
charging these costs once per output tuple. That is completely wrong for
aggregates, and not exactly right for window functions either.)

There was also a hard-wired assumption that aggregates and window functions
had procost 1.0, which is now fixed to respect the actual cataloged costs.

The costing of WindowAgg is still pretty bogus, since it doesn't try to
estimate the effects of spilling data to disk, but that seems like a
separate issue.

e6a30a8c

Apr 10, 2011
- pgindent run before PG 9.1 beta 1. · bf50caf1
  Bruce Momjian authored 13 years ago
  
  bf50caf1
Mar 26, 2011
- More collations cleanup, from trawling for missed collation assignments. · 0c9d9e8d
  Tom Lane authored 14 years ago
  
  Mostly cosmetic, though I did find that generateClonedIndexStmt failed to clone the index's collations.
  0c9d9e8d
Mar 22, 2011

Reimplement planner's handling of MIN/MAX aggregate optimization (again). · 8df08c84

Tom Lane authored 14 years ago

Instead of playing cute games with pathkeys, just build a direct
representation of the intended sub-select, and feed it through
query_planner to get a Path for the index access. This is a bit slower
than 9.1's previous method, since we'll duplicate most of the overhead of
query_planner; but since the whole optimization only applies to rather
simple single-table queries, that probably won't be much of a problem in
practice. The advantage is that we get to do the right thing when there's
a partial index that needs the implicit IS NOT NULL clause to be usable.
Also, although this makes planagg.c be a bit more closely tied to the
ordering of operations in grouping_planner, we can get rid of some coupling
to lower-level parts of the planner. Per complaint from Marti Raudsepp.

8df08c84

Mar 20, 2011

Revise collation derivation method and expression-tree representation. · b310b6e3

Tom Lane authored 14 years ago

All expression nodes now have an explicit output-collation field, unless
they are known to only return a noncollatable data type (such as boolean
or record). Also, nodes that can invoke collation-aware functions store
a separate field that is the collation value to pass to the function.
This avoids confusion that arises when a function has collatable inputs
and noncollatable output type, or vice versa.

Also, replace the parser's on-the-fly collation assignment method with
a post-pass over the completed expression tree. This allows us to use
a more complex (and hopefully more nearly spec-compliant) assignment
rule without paying for it in extra storage in every expression node.

Fix assorted bugs in the planner's handling of collations by making
collation one of the defining properties of an EquivalenceClass and
by converting CollateExprs into discardable RelabelType nodes during
expression preprocessing.

b310b6e3

Feb 26, 2011

Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH. · 389af951

Tom Lane authored 14 years ago

This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order.  Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need to document all that.

This commit just fixes the code; documentation updates are waiting on
author.

Marko Tiikkaja and Hitoshi Harada

389af951

Feb 20, 2011

Implement an API to let foreign-data wrappers actually be functional. · bb742407

Tom Lane authored 14 years ago

This commit provides the core code and documentation needed.  A contrib
module test case will follow shortly.

Shigeru Hanada, Jan Urbanski, Heikki Linnakangas

bb742407

Feb 17, 2011

Fix bogus test for hypothetical indexes in get_actual_variable_range(). · a2095f7f

Tom Lane authored 14 years ago

That function was supposing that indexoid == 0 for a hypothetical index,
but that is not likely to be true in any non-toy implementation of an index
adviser, since assigning a fake OID is the only way to know at EXPLAIN time
which hypothetical index got selected.  Fix by adding a flag to
IndexOptInfo to mark hypothetical indexes.  Back-patch to 9.0 where
get_actual_variable_range() was added.

Gurjeet Singh

a2095f7f

Feb 10, 2011

Fix improper matching of resjunk column names for FOR UPDATE in subselect. · e617f0d7

Tom Lane authored 14 years ago

Flattening of subquery range tables during setrefs.c could lead to the
rangetable indexes in PlanRowMark nodes not matching up with the column
names previously assigned to the corresponding resjunk ctid (resp. tableoid
or wholerow) columns. Typical symptom would be either a "cannot extract
system attribute from virtual tuple" error or an Assert failure. This
wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
top query level, and so the final flattening could never renumber an RTE
that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique
number for each PlanRowMark to label the associated resjunk columns, so
that the number need not change during flattening.

Per report from David Johnston (though I'm darned if I can see how this got
past initial testing of the relevant code). Back-patch to 9.0.

e617f0d7

Feb 08, 2011

Per-column collation support · 414c5a2e

Peter Eisentraut authored 14 years ago

This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.

Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch

414c5a2e

Jan 01, 2011
- Stamp copyrights for year 2011. · 5d950e3b
  Bruce Momjian authored 14 years ago
  
  5d950e3b
Dec 03, 2010

Create core infrastructure for KNNGIST. · d583f10b

Tom Lane authored 14 years ago

This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane

d583f10b

Nov 29, 2010

Simplify and speed up mapping of index opfamilies to pathkeys. · c0b5fac7

Tom Lane authored 14 years ago

Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order. This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly. That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.

I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.

There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit. I'll look at the remainder of that
work after the current commitfest.

c0b5fac7

Nov 24, 2010

Create the system catalog infrastructure needed for KNNGIST. · 725d52d0

Tom Lane authored 14 years ago

This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
column amcanorderbyop to pg_am.  For the moment all the entries in
amcanorderbyop are "false", since the underlying support isn't there yet.

Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
[ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
columns of pg_amop to be populated, and create pg_dump support for dumping
that information.

I also added some documentation, although it's perhaps a bit premature
given that the feature doesn't do anything useful yet.

Teodor Sigaev, Robert Haas, Tom Lane

725d52d0

Nov 20, 2010

Further cleanup of indxpath logic related to IndexOptInfo.opfamily array. · 89a36841

Tom Lane authored 14 years ago

We no longer need the terminating zero entry in opfamily[], so get rid of
it. Also replace assorted ad-hoc looping logic with simple for and foreach
constructs. This code is now noticeably more readable than it was an hour
ago; credit to Robert for seeing that it could be simplified.

89a36841

Nov 18, 2010

Further fallout from the MergeAppend patch. · 6fbc323c

Tom Lane authored 14 years ago

Fix things so that top-N sorting can be used in child Sort nodes of a
MergeAppend node, when there is a LIMIT and no intervening joins or
grouping. Actually doing this on the executor side isn't too bad,
but it's a bit messier to get the planner to cost it properly.
Per gripe from Robert Haas.

In passing, fix an oversight in the original top-N-sorting patch:
query_planner should not assume that a LIMIT can be used to make an
explicit sort cheaper when there will be grouping or aggregation in
between. Possibly this should be back-patched, but I'm not sure the
mistake is serious enough to be a real problem in practice.

6fbc323c

Nov 04, 2010

Reimplement planner's handling of MIN/MAX aggregate optimization. · 034967bd

Tom Lane authored 14 years ago

Per my recent proposal, get rid of all the direct inspection of indexes
and manual generation of paths in planagg.c. Instead, set up
EquivalenceClasses for the aggregate argument expressions, and let the
regular path generation logic deal with creating paths that can satisfy
those sort orders. This makes planagg.c a bit more visible to the rest
of the planner than it was originally, but the approach is basically a lot
cleaner than before. A major advantage of doing it this way is that we get
MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans)
practically for free, whereas in the old way we'd have had to add a whole
lot more duplicative logic.

One small disadvantage of this approach is that MIN/MAX aggregates can no
longer exploit partial indexes having an "x IS NOT NULL" predicate, unless
that restriction or something that implies it is specified in the query.
The previous implementation was able to use the added "x IS NOT NULL"
condition as an extra predicate proof condition, but in this version we
rely entirely on indexes that are considered usable by the main planning
process. That seems a fair tradeoff for the simplicity and functionality
gained.

034967bd

Oct 14, 2010

Support MergeAppend plans, to allow sorted output from append relations. · 11cad29c

Tom Lane authored 14 years ago

This patch eliminates the former need to sort the output of an Append scan
when an ordered scan of an inheritance tree is wanted. This should be
particularly useful for fast-start cases such as queries with LIMIT.

Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
Robert Haas, and Tom Lane.

11cad29c

Sep 28, 2010

Fix PlaceHolderVar mechanism's interaction with outer joins. · eb229505

Tom Lane authored 14 years ago

The point of a PlaceHolderVar is to allow a non-strict expression to be
evaluated below an outer join, after which its value bubbles up like a Var
and can be forced to NULL when the outer join's semantics require that.
However, there was a serious design oversight in that, namely that we
didn't ensure that there was actually a correct place in the plan tree
to evaluate the placeholder :-(. It may be necessary to delay evaluation
of an outer join to ensure that a placeholder that should be evaluated
below the join can be evaluated there. Per recent bug report from Kirill
Simonov.

Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.

eb229505

Sep 20, 2010
- Remove cvs keywords from all files. · 9f2e2113
  Magnus Hagander authored 14 years ago
  
  9f2e2113
Jul 12, 2010

Make NestLoop plan nodes pass outer-relation variables into their inner · 53e75768

Tom Lane authored 14 years ago

relation using the general PARAM_EXEC executor parameter mechanism, rather
than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
The previous method was hard to understand and could never be extended to
handle parameters coming from multiple join levels. This patch doesn't
change the set of possible plans nor have any significant performance effect,
but it's necessary infrastructure for future generalization of the concept
of an inner indexscan plan.

ExecReScan's second parameter is now unused, so it's removed.

53e75768

Jul 06, 2010
- pgindent run for 9.0, second run · 239d769e
  Bruce Momjian authored 14 years ago
  
  239d769e
Mar 30, 2010

Fix "constraint_exclusion = partition" logic so that it will also attempt · d174a4ad

Tom Lane authored 15 years ago

constraint exclusion on an inheritance set that is the target of an UPDATE
or DELETE query.  Per gripe from Marc Cousin.  Back-patch to 8.4 where
the feature was introduced.

d174a4ad

Mar 29, 2010

Rework join-removal logic as per recent discussion. In particular this · b78f6264

Tom Lane authored 15 years ago

fixes things so that it works for cases where nested removals are possible.
The overhead of the optimization should be significantly less, as well.

b78f6264

Feb 26, 2010
- pgindent run for 9.0 · 65e806cb
  Bruce Momjian authored 15 years ago
  
  65e806cb
Jan 05, 2010

Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). · d86d51a9

Robert Haas authored 15 years ago

This patch only supports seq_page_cost and random_page_cost as parameters,
but it provides the infrastructure to scalably support many more.
In particular, we may want to add support for effective_io_concurrency,
but I'm leaving that as future work for now.

Thanks to Tom Lane for design help and Alvaro Herrera for the review.

d86d51a9

Jan 02, 2010
- Update copyright for the year 2010. · 02398008
  Bruce Momjian authored 15 years ago
  
  02398008
Jan 01, 2010

Support "x IS NOT NULL" clauses as indexscan conditions. This turns out · 29c4ad98

Tom Lane authored 15 years ago

to be just a minor extension of the previous patch that made "x IS NULL"
indexable, because we can treat the IS NOT NULL condition as if it were
"x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
just like IS NULL is treated like "x = NULL". Aside from any possible
usefulness in its own right, this is an important improvement for
index-optimized MAX/MIN aggregates: it is now reliably possible to get
a column's min or max value cheaply, even when there are a lot of nulls
cluttering the interesting end of the index.

29c4ad98

Nov 28, 2009

Eliminate a lot of list-management overhead within join_search_one_level · 1a95f127

Tom Lane authored 15 years ago

by adding a requirement that build_join_rel add new join RelOptInfos to the
appropriate list immediately at creation. Per report from Robert Haas,
the list_concat_unique_ptr() calls that this change eliminates were taking
the lion's share of the runtime in larger join problems. This doesn't do
anything to fix the fundamental combinatorial explosion in large join
problems, but it should push out the threshold of pain a bit further.

Note: because this changes the order in which joinrel lists are built,
it might result in changes in selected plans in cases where different
alternatives have exactly the same costs. There is one example in the
regression tests.

1a95f127

Nov 15, 2009

Improve planning of Materialize nodes inserted atop the inner input of a · caf9c830

Tom Lane authored 15 years ago

mergejoin to shield it from doing mark/restore and refetches. Put an explicit
flag in MergePath so we can centralize the logic that knows about this,
and add costing logic that considers using Materialize even when it's not
forced by the previously-existing considerations. This is in response to
a discussion back in August that suggested that materializing an inner
indexscan can be helpful when the refetch percentage is high enough.

caf9c830

Oct 26, 2009

Re-implement EvalPlanQual processing to improve its performance and eliminate · 9f2ee8f2

Tom Lane authored 15 years ago

a lot of strange behaviors that occurred in join cases. We now identify the
"current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the
appropriate row into each scan node in the rechecking plan, forcing it to emit
only that one row. The former behavior could rescan the whole of each joined
relation for each recheck, which was terrible for performance, and what's much
worse could result in duplicated output tuples.

Also, the original implementation of EvalPlanQual could not re-use the recheck
execution tree --- it had to go through a full executor init and shutdown for
every row to be tested. To avoid this overhead, I've associated a special
runtime Param with each LockRows or ModifyTable plan node, and arranged to
make every scan node below such a node depend on that Param. Thus, by
signaling a change in that Param, the EPQ machinery can just rescan the
already-built test plan.

This patch also adds a prohibition on set-returning functions in the
targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the
duplicate-output-tuple problem. It seems fairly reasonable since the
other restrictions on SELECT FOR UPDATE are meant to ensure that there
is a unique correspondence between source tuples and result tuples,
which an output SRF destroys as much as anything else does.

9f2ee8f2

Oct 12, 2009

Move the handling of SELECT FOR UPDATE locking and rechecking out of · 0adaf4cb

Tom Lane authored 15 years ago

execMain.c and into a new plan node type LockRows. Like the recent change
to put table updating into a ModifyTable plan node, this increases planning
flexibility by allowing the operations to occur below the top level of the
plan tree. It's necessary in any case to restore the previous behavior of
having FOR UPDATE locking occur before ModifyTable does.

This partially refactors EvalPlanQual to allow multiple rows-under-test
to be inserted into the EPQ machinery before starting an EPQ test query.
That isn't sufficient to fix EPQ's general bogosity in the face of plans
that return multiple rows per test row, though. Since this patch is
mostly about getting some plan node infrastructure in place and not about
fixing ten-year-old bugs, I will leave EPQ improvements for another day.

Another behavioral change that we could now think about is doing FOR UPDATE
before LIMIT, but that too seems like it should be treated as a followon
patch.

0adaf4cb

Oct 10, 2009

Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c. · 8a5849b7

Tom Lane authored 15 years ago

They are now handled by a new plan node type called ModifyTable, which is
placed at the top of the plan tree.  In itself this change doesn't do much,
except perhaps make the handling of RETURNING lists and inherited UPDATEs a
tad less klugy.  But it is necessary preparation for the intended extension of
allowing RETURNING queries inside WITH.

Marko Tiikkaja

8a5849b7

Sep 17, 2009

Implement "join removal" for cases where the inner side of a left join · 488d70ab

Tom Lane authored 15 years ago

is unique and is not referenced above the join. In this case the inner
side doesn't affect the query result and can be thrown away entirely.
Although perhaps nobody would ever write such a thing by hand, it's
a reasonably common case in machine-generated SQL.

The current implementation only recognizes the case where the inner side
is a simple relation with a unique index matching the query conditions.
This is enough for the use-cases that have been shown so far, but we
might want to try to handle other cases later.

Robert Haas, somewhat rewritten by Tom

488d70ab