Skip to content
Snippets Groups Projects
  1. Oct 16, 2011
  2. Oct 14, 2011
    • Tom Lane's avatar
      Measure the number of all-visible pages for use in index-only scan costing. · e6858e66
      Tom Lane authored
      Add a column pg_class.relallvisible to remember the number of pages that
      were all-visible according to the visibility map as of the last VACUUM
      (or ANALYZE, or some other operations that update pg_class.relpages).
      Use relallvisible/relpages, instead of an arbitrary constant, to estimate
      how many heap page fetches can be avoided during an index-only scan.
      
      This is pretty primitive and will no doubt see refinements once we've
      acquired more field experience with the index-only scan mechanism, but
      it's way better than using a constant.
      
      Note: I had to adjust an underspecified query in the window.sql regression
      test, because it was changing answers when the plan changed to use an
      index-only scan.  Some of the adjacent tests perhaps should be adjusted
      as well, but I didn't do that here.
      e6858e66
  3. Oct 11, 2011
    • Tom Lane's avatar
      Rearrange the implementation of index-only scans. · a0185461
      Tom Lane authored
      This commit changes index-only scans so that data is read directly from the
      index tuple without first generating a faux heap tuple.  The only immediate
      benefit is that indexes on system columns (such as OID) can be used in
      index-only scans, but this is necessary infrastructure if we are ever to
      support index-only scans on expression indexes.  The executor is now ready
      for that, though the planner still needs substantial work to recognize
      the possibility.
      
      To do this, Vars in index-only plan nodes have to refer to index columns
      not heap columns.  I introduced a new special varno, INDEX_VAR, to mark
      such Vars to avoid confusion.  (In passing, this commit renames the two
      existing special varnos to OUTER_VAR and INNER_VAR.)  This allows
      ruleutils.c to handle them with logic similar to what we use for subplan
      reference Vars.
      
      Since index-only scans are now fundamentally different from regular
      indexscans so far as their expression subtrees are concerned, I also chose
      to change them to have their own plan node type (and hence, their own
      executor source file).
      a0185461
  4. Oct 08, 2011
    • Tom Lane's avatar
      Support index-only scans using the visibility map to avoid heap fetches. · a2822fb9
      Tom Lane authored
      When a btree index contains all columns required by the query, and the
      visibility map shows that all tuples on a target heap page are
      visible-to-all, we don't need to fetch that heap page.  This patch depends
      on the previous patches that made the visibility map reliable.
      
      There's a fair amount left to do here, notably trying to figure out a less
      chintzy way of estimating the cost of an index-only scan, but the core
      functionality seems ready to commit.
      
      Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
      a2822fb9
  5. Sep 03, 2011
    • Tom Lane's avatar
      Rearrange planner to save the whole PlannerInfo (subroot) for a subquery. · b3aaf908
      Tom Lane authored
      Formerly, set_subquery_pathlist and other creators of plans for subqueries
      saved only the rangetable and rowMarks lists from the lower-level
      PlannerInfo.  But there's no reason not to remember the whole PlannerInfo,
      and indeed this turns out to simplify matters in a number of places.
      
      The immediate reason for doing this was so that the subroot will still be
      accessible when we're trying to extract column statistics out of an
      already-planned subquery.  But now that I've done it, it seems like a good
      code-beautification effort in its own right.
      
      I also chose to get rid of the transient subrtable and subrowmark fields in
      SubqueryScan nodes, in favor of having setrefs.c look up the subquery's
      RelOptInfo.  That required changing all the APIs in setrefs.c to pass
      PlannerInfo not PlannerGlobal, which was a large but quite mechanical
      transformation.
      
      One side-effect not foreseen at the beginning is that this finally broke
      inheritance_planner's assumption that replanning the same subquery RTE N
      times would necessarily give interchangeable results each time.  That
      assumption was always pretty risky, but now we really have to make a
      separate RTE for each instance so that there's a place to carry the
      separate subroots.
      b3aaf908
  6. Sep 01, 2011
  7. Apr 24, 2011
    • Tom Lane's avatar
      Improve cost estimation for aggregates and window functions. · e6a30a8c
      Tom Lane authored
      The previous coding failed to account properly for the costs of evaluating
      the input expressions of aggregates and window functions, as seen in a
      recent gripe from Claudio Freire.  (I said at the time that it wasn't
      counting these costs at all; but on closer inspection, it was effectively
      charging these costs once per output tuple.  That is completely wrong for
      aggregates, and not exactly right for window functions either.)
      
      There was also a hard-wired assumption that aggregates and window functions
      had procost 1.0, which is now fixed to respect the actual cataloged costs.
      
      The costing of WindowAgg is still pretty bogus, since it doesn't try to
      estimate the effects of spilling data to disk, but that seems like a
      separate issue.
      e6a30a8c
  8. Apr 10, 2011
  9. Mar 26, 2011
  10. Mar 22, 2011
    • Tom Lane's avatar
      Reimplement planner's handling of MIN/MAX aggregate optimization (again). · 8df08c84
      Tom Lane authored
      Instead of playing cute games with pathkeys, just build a direct
      representation of the intended sub-select, and feed it through
      query_planner to get a Path for the index access.  This is a bit slower
      than 9.1's previous method, since we'll duplicate most of the overhead of
      query_planner; but since the whole optimization only applies to rather
      simple single-table queries, that probably won't be much of a problem in
      practice.  The advantage is that we get to do the right thing when there's
      a partial index that needs the implicit IS NOT NULL clause to be usable.
      Also, although this makes planagg.c be a bit more closely tied to the
      ordering of operations in grouping_planner, we can get rid of some coupling
      to lower-level parts of the planner.  Per complaint from Marti Raudsepp.
      8df08c84
  11. Mar 20, 2011
    • Tom Lane's avatar
      Revise collation derivation method and expression-tree representation. · b310b6e3
      Tom Lane authored
      All expression nodes now have an explicit output-collation field, unless
      they are known to only return a noncollatable data type (such as boolean
      or record).  Also, nodes that can invoke collation-aware functions store
      a separate field that is the collation value to pass to the function.
      This avoids confusion that arises when a function has collatable inputs
      and noncollatable output type, or vice versa.
      
      Also, replace the parser's on-the-fly collation assignment method with
      a post-pass over the completed expression tree.  This allows us to use
      a more complex (and hopefully more nearly spec-compliant) assignment
      rule without paying for it in extra storage in every expression node.
      
      Fix assorted bugs in the planner's handling of collations by making
      collation one of the defining properties of an EquivalenceClass and
      by converting CollateExprs into discardable RelabelType nodes during
      expression preprocessing.
      b310b6e3
  12. Feb 26, 2011
    • Tom Lane's avatar
      Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH. · 389af951
      Tom Lane authored
      This patch implements data-modifying WITH queries according to the
      semantics that the updates all happen with the same command counter value,
      and in an unspecified order.  Therefore one WITH clause can't see the
      effects of another, nor can the outer query see the effects other than
      through the RETURNING values.  And attempts to do conflicting updates will
      have unpredictable results.  We'll need to document all that.
      
      This commit just fixes the code; documentation updates are waiting on
      author.
      
      Marko Tiikkaja and Hitoshi Harada
      389af951
  13. Feb 20, 2011
  14. Feb 17, 2011
    • Tom Lane's avatar
      Fix bogus test for hypothetical indexes in get_actual_variable_range(). · a2095f7f
      Tom Lane authored
      That function was supposing that indexoid == 0 for a hypothetical index,
      but that is not likely to be true in any non-toy implementation of an index
      adviser, since assigning a fake OID is the only way to know at EXPLAIN time
      which hypothetical index got selected.  Fix by adding a flag to
      IndexOptInfo to mark hypothetical indexes.  Back-patch to 9.0 where
      get_actual_variable_range() was added.
      
      Gurjeet Singh
      a2095f7f
  15. Feb 10, 2011
    • Tom Lane's avatar
      Fix improper matching of resjunk column names for FOR UPDATE in subselect. · e617f0d7
      Tom Lane authored
      Flattening of subquery range tables during setrefs.c could lead to the
      rangetable indexes in PlanRowMark nodes not matching up with the column
      names previously assigned to the corresponding resjunk ctid (resp. tableoid
      or wholerow) columns.  Typical symptom would be either a "cannot extract
      system attribute from virtual tuple" error or an Assert failure.  This
      wasn't a problem before 9.0 because we didn't support FOR UPDATE below the
      top query level, and so the final flattening could never renumber an RTE
      that was relevant to FOR UPDATE.  Fix by using a plan-tree-wide unique
      number for each PlanRowMark to label the associated resjunk columns, so
      that the number need not change during flattening.
      
      Per report from David Johnston (though I'm darned if I can see how this got
      past initial testing of the relevant code).  Back-patch to 9.0.
      e617f0d7
  16. Feb 08, 2011
    • Peter Eisentraut's avatar
      Per-column collation support · 414c5a2e
      Peter Eisentraut authored
      This adds collation support for columns and domains, a COLLATE clause
      to override it per expression, and B-tree index support.
      
      Peter Eisentraut
      reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
      414c5a2e
  17. Jan 01, 2011
  18. Dec 03, 2010
    • Tom Lane's avatar
      Create core infrastructure for KNNGIST. · d583f10b
      Tom Lane authored
      This is a heavily revised version of builtin_knngist_core-0.9.  The
      ordering operators are no longer mixed in with actual quals, which would
      have confused not only humans but significant parts of the planner.
      Instead, ordering operators are carried separately throughout planning and
      execution.
      
      Since the API for ambeginscan and amrescan functions had to be changed
      anyway, this commit takes the opportunity to rationalize that a bit.
      RelationGetIndexScan no longer forces a premature index_rescan call;
      instead, callers of index_beginscan must call index_rescan too.  Aside from
      making the AM-side initialization logic a bit less peculiar, this has the
      advantage that we do not make a useless extra am_rescan call when there are
      runtime key values.  AMs formerly could not assume that the key values
      passed to amrescan were actually valid; now they can.
      
      Teodor Sigaev and Tom Lane
      d583f10b
  19. Nov 29, 2010
    • Tom Lane's avatar
      Simplify and speed up mapping of index opfamilies to pathkeys. · c0b5fac7
      Tom Lane authored
      Formerly we looked up the operators associated with each index (caching
      them in relcache) and then the planner looked up the btree opfamily
      containing such operators in order to build the btree-centric pathkey
      representation that describes the index's sort order.  This is quite
      pointless for btree indexes: we might as well just use the index's opfamily
      information directly.  That saves syscache lookup cycles during planning,
      and furthermore allows us to eliminate the relcache's caching of operators
      altogether, which may help in reducing backend startup time.
      
      I added code to plancat.c to perform the same type of double lookup
      on-the-fly if it's ever faced with a non-btree amcanorder index AM.
      If such a thing actually becomes interesting for production, we should
      replace that logic with some more-direct method for identifying the
      corresponding btree opfamily; but it's not worth spending effort on now.
      
      There is considerably more to do pursuant to my recent proposal to get rid
      of sort-operator-based representations of sort orderings, but this patch
      grabs some of the low-hanging fruit.  I'll look at the remainder of that
      work after the current commitfest.
      c0b5fac7
  20. Nov 24, 2010
    • Tom Lane's avatar
      Create the system catalog infrastructure needed for KNNGIST. · 725d52d0
      Tom Lane authored
      This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
      column amcanorderbyop to pg_am.  For the moment all the entries in
      amcanorderbyop are "false", since the underlying support isn't there yet.
      
      Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
      [ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
      columns of pg_amop to be populated, and create pg_dump support for dumping
      that information.
      
      I also added some documentation, although it's perhaps a bit premature
      given that the feature doesn't do anything useful yet.
      
      Teodor Sigaev, Robert Haas, Tom Lane
      725d52d0
  21. Nov 20, 2010
    • Tom Lane's avatar
      Further cleanup of indxpath logic related to IndexOptInfo.opfamily array. · 89a36841
      Tom Lane authored
      We no longer need the terminating zero entry in opfamily[], so get rid of
      it.  Also replace assorted ad-hoc looping logic with simple for and foreach
      constructs.  This code is now noticeably more readable than it was an hour
      ago; credit to Robert for seeing that it could be simplified.
      89a36841
  22. Nov 18, 2010
    • Tom Lane's avatar
      Further fallout from the MergeAppend patch. · 6fbc323c
      Tom Lane authored
      Fix things so that top-N sorting can be used in child Sort nodes of a
      MergeAppend node, when there is a LIMIT and no intervening joins or
      grouping.  Actually doing this on the executor side isn't too bad,
      but it's a bit messier to get the planner to cost it properly.
      Per gripe from Robert Haas.
      
      In passing, fix an oversight in the original top-N-sorting patch:
      query_planner should not assume that a LIMIT can be used to make an
      explicit sort cheaper when there will be grouping or aggregation in
      between.  Possibly this should be back-patched, but I'm not sure the
      mistake is serious enough to be a real problem in practice.
      6fbc323c
  23. Nov 04, 2010
    • Tom Lane's avatar
      Reimplement planner's handling of MIN/MAX aggregate optimization. · 034967bd
      Tom Lane authored
      Per my recent proposal, get rid of all the direct inspection of indexes
      and manual generation of paths in planagg.c.  Instead, set up
      EquivalenceClasses for the aggregate argument expressions, and let the
      regular path generation logic deal with creating paths that can satisfy
      those sort orders.  This makes planagg.c a bit more visible to the rest
      of the planner than it was originally, but the approach is basically a lot
      cleaner than before.  A major advantage of doing it this way is that we get
      MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans)
      practically for free, whereas in the old way we'd have had to add a whole
      lot more duplicative logic.
      
      One small disadvantage of this approach is that MIN/MAX aggregates can no
      longer exploit partial indexes having an "x IS NOT NULL" predicate, unless
      that restriction or something that implies it is specified in the query.
      The previous implementation was able to use the added "x IS NOT NULL"
      condition as an extra predicate proof condition, but in this version we
      rely entirely on indexes that are considered usable by the main planning
      process.  That seems a fair tradeoff for the simplicity and functionality
      gained.
      034967bd
  24. Oct 14, 2010
    • Tom Lane's avatar
      Support MergeAppend plans, to allow sorted output from append relations. · 11cad29c
      Tom Lane authored
      This patch eliminates the former need to sort the output of an Append scan
      when an ordered scan of an inheritance tree is wanted.  This should be
      particularly useful for fast-start cases such as queries with LIMIT.
      
      Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig,
      Robert Haas, and Tom Lane.
      11cad29c
  25. Sep 28, 2010
    • Tom Lane's avatar
      Fix PlaceHolderVar mechanism's interaction with outer joins. · eb229505
      Tom Lane authored
      The point of a PlaceHolderVar is to allow a non-strict expression to be
      evaluated below an outer join, after which its value bubbles up like a Var
      and can be forced to NULL when the outer join's semantics require that.
      However, there was a serious design oversight in that, namely that we
      didn't ensure that there was actually a correct place in the plan tree
      to evaluate the placeholder :-(.  It may be necessary to delay evaluation
      of an outer join to ensure that a placeholder that should be evaluated
      below the join can be evaluated there.  Per recent bug report from Kirill
      Simonov.
      
      Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
      eb229505
  26. Sep 20, 2010
  27. Jul 12, 2010
    • Tom Lane's avatar
      Make NestLoop plan nodes pass outer-relation variables into their inner · 53e75768
      Tom Lane authored
      relation using the general PARAM_EXEC executor parameter mechanism, rather
      than the ad-hoc kluge of passing the outer tuple down through ExecReScan.
      The previous method was hard to understand and could never be extended to
      handle parameters coming from multiple join levels.  This patch doesn't
      change the set of possible plans nor have any significant performance effect,
      but it's necessary infrastructure for future generalization of the concept
      of an inner indexscan plan.
      
      ExecReScan's second parameter is now unused, so it's removed.
      53e75768
  28. Jul 06, 2010
  29. Mar 30, 2010
  30. Mar 29, 2010
  31. Feb 26, 2010
  32. Jan 05, 2010
    • Robert Haas's avatar
      Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). · d86d51a9
      Robert Haas authored
      This patch only supports seq_page_cost and random_page_cost as parameters,
      but it provides the infrastructure to scalably support many more.
      In particular, we may want to add support for effective_io_concurrency,
      but I'm leaving that as future work for now.
      
      Thanks to Tom Lane for design help and Alvaro Herrera for the review.
      d86d51a9
  33. Jan 02, 2010
  34. Jan 01, 2010
    • Tom Lane's avatar
      Support "x IS NOT NULL" clauses as indexscan conditions. This turns out · 29c4ad98
      Tom Lane authored
      to be just a minor extension of the previous patch that made "x IS NULL"
      indexable, because we can treat the IS NOT NULL condition as if it were
      "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option),
      just like IS NULL is treated like "x = NULL".  Aside from any possible
      usefulness in its own right, this is an important improvement for
      index-optimized MAX/MIN aggregates: it is now reliably possible to get
      a column's min or max value cheaply, even when there are a lot of nulls
      cluttering the interesting end of the index.
      29c4ad98
  35. Nov 28, 2009
    • Tom Lane's avatar
      Eliminate a lot of list-management overhead within join_search_one_level · 1a95f127
      Tom Lane authored
      by adding a requirement that build_join_rel add new join RelOptInfos to the
      appropriate list immediately at creation.  Per report from Robert Haas,
      the list_concat_unique_ptr() calls that this change eliminates were taking
      the lion's share of the runtime in larger join problems.  This doesn't do
      anything to fix the fundamental combinatorial explosion in large join
      problems, but it should push out the threshold of pain a bit further.
      
      Note: because this changes the order in which joinrel lists are built,
      it might result in changes in selected plans in cases where different
      alternatives have exactly the same costs.  There is one example in the
      regression tests.
      1a95f127
  36. Nov 15, 2009
    • Tom Lane's avatar
      Improve planning of Materialize nodes inserted atop the inner input of a · caf9c830
      Tom Lane authored
      mergejoin to shield it from doing mark/restore and refetches.  Put an explicit
      flag in MergePath so we can centralize the logic that knows about this,
      and add costing logic that considers using Materialize even when it's not
      forced by the previously-existing considerations.  This is in response to
      a discussion back in August that suggested that materializing an inner
      indexscan can be helpful when the refetch percentage is high enough.
      caf9c830
  37. Oct 26, 2009
    • Tom Lane's avatar
      Re-implement EvalPlanQual processing to improve its performance and eliminate · 9f2ee8f2
      Tom Lane authored
      a lot of strange behaviors that occurred in join cases.  We now identify the
      "current" row for every joined relation in UPDATE, DELETE, and SELECT FOR
      UPDATE/SHARE queries.  If an EvalPlanQual recheck is necessary, we jam the
      appropriate row into each scan node in the rechecking plan, forcing it to emit
      only that one row.  The former behavior could rescan the whole of each joined
      relation for each recheck, which was terrible for performance, and what's much
      worse could result in duplicated output tuples.
      
      Also, the original implementation of EvalPlanQual could not re-use the recheck
      execution tree --- it had to go through a full executor init and shutdown for
      every row to be tested.  To avoid this overhead, I've associated a special
      runtime Param with each LockRows or ModifyTable plan node, and arranged to
      make every scan node below such a node depend on that Param.  Thus, by
      signaling a change in that Param, the EPQ machinery can just rescan the
      already-built test plan.
      
      This patch also adds a prohibition on set-returning functions in the
      targetlist of SELECT FOR UPDATE/SHARE.  This is needed to avoid the
      duplicate-output-tuple problem.  It seems fairly reasonable since the
      other restrictions on SELECT FOR UPDATE are meant to ensure that there
      is a unique correspondence between source tuples and result tuples,
      which an output SRF destroys as much as anything else does.
      9f2ee8f2
  38. Oct 12, 2009
    • Tom Lane's avatar
      Move the handling of SELECT FOR UPDATE locking and rechecking out of · 0adaf4cb
      Tom Lane authored
      execMain.c and into a new plan node type LockRows.  Like the recent change
      to put table updating into a ModifyTable plan node, this increases planning
      flexibility by allowing the operations to occur below the top level of the
      plan tree.  It's necessary in any case to restore the previous behavior of
      having FOR UPDATE locking occur before ModifyTable does.
      
      This partially refactors EvalPlanQual to allow multiple rows-under-test
      to be inserted into the EPQ machinery before starting an EPQ test query.
      That isn't sufficient to fix EPQ's general bogosity in the face of plans
      that return multiple rows per test row, though.  Since this patch is
      mostly about getting some plan node infrastructure in place and not about
      fixing ten-year-old bugs, I will leave EPQ improvements for another day.
      
      Another behavioral change that we could now think about is doing FOR UPDATE
      before LIMIT, but that too seems like it should be treated as a followon
      patch.
      0adaf4cb
  39. Oct 10, 2009
    • Tom Lane's avatar
      Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c. · 8a5849b7
      Tom Lane authored
      They are now handled by a new plan node type called ModifyTable, which is
      placed at the top of the plan tree.  In itself this change doesn't do much,
      except perhaps make the handling of RETURNING lists and inherited UPDATEs a
      tad less klugy.  But it is necessary preparation for the intended extension of
      allowing RETURNING queries inside WITH.
      
      Marko Tiikkaja
      8a5849b7
  40. Sep 17, 2009
    • Tom Lane's avatar
      Implement "join removal" for cases where the inner side of a left join · 488d70ab
      Tom Lane authored
      is unique and is not referenced above the join.  In this case the inner
      side doesn't affect the query result and can be thrown away entirely.
      Although perhaps nobody would ever write such a thing by hand, it's
      a reasonably common case in machine-generated SQL.
      
      The current implementation only recognizes the case where the inner side
      is a simple relation with a unique index matching the query conditions.
      This is enough for the use-cases that have been shown so far, but we
      might want to try to handle other cases later.
      
      Robert Haas, somewhat rewritten by Tom
      488d70ab
Loading