Skip to content
Snippets Groups Projects
  1. May 24, 2016
    • Tom Lane's avatar
      Fetch XIDs atomically during vac_truncate_clog(). · 4cf0978e
      Tom Lane authored
      Because vac_update_datfrozenxid() updates datfrozenxid and datminmxid
      in-place, it's unsafe to assume that successive reads of those values will
      give consistent results.  Fetch each one just once to ensure sane behavior
      in the minimum calculation.  Noted while reviewing Alexander Korotkov's
      patch in the same area.
      
      Discussion: <8564.1464116473@sss.pgh.pa.us>
      4cf0978e
    • Tom Lane's avatar
      Avoid consuming an XID during vac_truncate_clog(). · 2e7f0c34
      Tom Lane authored
      vac_truncate_clog() uses its own transaction ID as the comparison point in
      a sanity check that no database's datfrozenxid has already wrapped around
      "into the future".  That was probably fine when written, but in a lazy
      vacuum we won't have assigned an XID, so calling GetCurrentTransactionId()
      causes an XID to be assigned when otherwise one would not be.  Most of the
      time that's not a big problem ... but if we are hard up against the
      wraparound limit, consuming XIDs during antiwraparound vacuums is a very
      bad thing.
      
      Instead, use ReadNewTransactionId(), which not only avoids this problem
      but is in itself a better comparison point to test whether wraparound
      has already occurred.
      
      Report and patch by Alexander Korotkov.  Back-patch to all versions.
      
      Report: <CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com>
      2e7f0c34
  2. Jan 08, 2015
  3. Oct 30, 2014
    • Tom Lane's avatar
      Test IsInTransactionChain, not IsTransactionBlock, in vac_update_relstats. · 38cb8687
      Tom Lane authored
      As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover
      all cases where ANALYZE might be invoked in an unsafe context.  We need to
      test the result of IsInTransactionChain not IsTransactionBlock; which is
      notationally a pain because IsInTransactionChain requires an isTopLevel
      flag, which would have to be passed down through several levels of callers.
      I chose to pass in_outer_xact (ie, the result of IsInTransactionChain)
      rather than isTopLevel per se, as that seemed marginally more apropos
      for the intermediate functions to know about.
      38cb8687
  4. Oct 29, 2014
    • Tom Lane's avatar
      Avoid corrupting tables when ANALYZE inside a transaction is rolled back. · 40058fbc
      Tom Lane authored
      VACUUM and ANALYZE update the target table's pg_class row in-place, that is
      nontransactionally.  This is OK, more or less, for the statistical columns,
      which are mostly nontransactional anyhow.  It's not so OK for the DDL hint
      flags (relhasindex etc), which might get changed in response to
      transactional changes that could still be rolled back.  This isn't a
      problem for VACUUM, since it can't be run inside a transaction block nor
      in parallel with DDL on the table.  However, we allow ANALYZE inside a
      transaction block, so if the transaction had earlier removed the last
      index, rule, or trigger from the table, and then we roll back the
      transaction after ANALYZE, the table would be left in a corrupted state
      with the hint flags not set though they should be.
      
      To fix, suppress the hint-flag updates if we are InTransactionBlock().
      This is safe enough because it's always OK to postpone hint maintenance
      some more; the worst-case consequence is a few extra searches of pg_index
      et al.  There was discussion of instead using a transactional update,
      but that would change the behavior in ways that are not all desirable:
      in most scenarios we're better off keeping ANALYZE's statistical values
      even if the ANALYZE itself rolls back.  In any case we probably don't want
      to change this behavior in back branches.
      
      Per bug #11638 from Casey Shobe.  This has been broken for a good long
      time, so back-patch to all supported branches.
      
      Tom Lane and Michael Paquier, initial diagnosis by Andres Freund
      40058fbc
  5. May 06, 2014
    • Bruce Momjian's avatar
      Remove tabs after spaces in C comments · 0b44914c
      Bruce Momjian authored
      This was not changed in HEAD, but will be done later as part of a
      pgindent run.  Future pgindent runs will also do this.
      
      Report by Tom Lane
      
      Backpatch through all supported branches, but not HEAD
      0b44914c
  6. Feb 01, 2013
    • Alvaro Herrera's avatar
      Fix typo in freeze_table_age implementation · 231dbb3c
      Alvaro Herrera authored
      The original code used freeze_min_age instead of freeze_table_age.  The
      main consequence of this mistake is that lowering freeze_min_age would
      cause full-table scans to occur much more frequently, which causes
      serious issues because the number of writes required is much larger.
      That feature (freeze_min_age) is supposed to affect only how soon tuples
      are frozen; some pages should still be skipped due to the visibility
      map.
      
      Backpatch to 8.4, where the freeze_table_age feature was introduced.
      
      Report and patch from Andres Freund
      231dbb3c
  7. Nov 29, 2012
    • Tom Lane's avatar
      Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. · 94c014b5
      Tom Lane authored
      Commit 8cb53654, which introduced DROP
      INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
      choice of catalog state representation.  The pg_index state for an index
      that's reached the final pre-drop stage was the same as the state for an
      index just created by CREATE INDEX CONCURRENTLY.  This meant that the
      (necessary) change to make RelationGetIndexList ignore about-to-die indexes
      also made it ignore freshly-created indexes; which is catastrophic because
      the latter do need to be considered in HOT-safety decisions.  Failure to
      do so leads to incorrect index entries and subsequently wrong results from
      queries depending on the concurrently-created index.
      
      To fix, make the final state be indisvalid = true and indisready = false,
      which is otherwise nonsensical.  This is pretty ugly but we can't add
      another column without forcing initdb, and it's too late for that in 9.2.
      (There's a cleaner fix in HEAD.)
      
      In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
      flag changes they make without exclusive lock on the index are made via
      heap_inplace_update() rather than a normal transactional update.  The
      latter is not very safe because moving the pg_index tuple could result in
      concurrent SnapshotNow scans finding it twice or not at all, thus possibly
      resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
      CONCURRENTLY, which was copied into the DROP code.
      
      In addition, fix various places in the code that ought to check to make
      sure that the indexes they are manipulating are valid and/or ready as
      appropriate.  These represent bugs that have existed since 8.2, since
      a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
      index behind, and we ought not try to do anything that might fail with
      such an index.
      
      Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
      columns that are allowed to change after initial creation.  Previously we
      could have been left with stale values of some fields in an index relcache
      entry.  It's not clear whether this actually had any user-visible
      consequences, but it's at least a bug waiting to happen.
      
      In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
      some cosmetic code cleanup but mostly addition and revision of comments.
      
      Portions of this need to be back-patched even further, but I'll work
      on that separately.
      
      Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
      fix by Tom Lane and Andres Freund.
      94c014b5
  8. Jun 10, 2012
  9. May 14, 2012
    • Heikki Linnakangas's avatar
      Update comments that became out-of-date with the PGXACT struct. · 9e4637bf
      Heikki Linnakangas authored
      When the "hot" members of PGPROC were split off to separate PGXACT structs,
      many PGPROC fields referred to in comments were moved to PGXACT, but the
      comments were neglected in the commit. Mostly this is just a search/replace
      of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
      prepared transactions changed more, making some of the comments totally
      bogus.
      
      Noah Misch
      9e4637bf
  10. Jan 02, 2012
  11. Nov 30, 2011
    • Robert Haas's avatar
      Improve table locking behavior in the face of current DDL. · 2ad36c4e
      Robert Haas authored
      In the previous coding, callers were faced with an awkward choice:
      look up the name, do permissions checks, and then lock the table; or
      look up the name, lock the table, and then do permissions checks.
      The first choice was wrong because the results of the name lookup
      and permissions checks might be out-of-date by the time the table
      lock was acquired, while the second allowed a user with no privileges
      to interfere with access to a table by users who do have privileges
      (e.g. if a malicious backend queues up for an AccessExclusiveLock on
      a table on which AccessShareLock is already held, further attempts
      to access the table will be blocked until the AccessExclusiveLock
      is obtained and the malicious backend's transaction rolls back).
      
      To fix, allow callers of RangeVarGetRelid() to pass a callback which
      gets executed after performing the name lookup but before acquiring
      the relation lock.  If the name lookup is retried (because
      invalidation messages are received), the callback will be re-executed
      as well, so we get the best of both worlds.  RangeVarGetRelid() is
      renamed to RangeVarGetRelidExtended(); callers not wishing to supply
      a callback can continue to invoke it as RangeVarGetRelid(), which is
      now a macro.  Since the only one caller that uses nowait = true now
      passes a callback anyway, the RangeVarGetRelid() macro defaults nowait
      as well.  The callback can also be used for supplemental locking - for
      example, REINDEX INDEX needs to acquire the table lock before the index
      lock to reduce deadlock possibilities.
      
      There's a lot more work to be done here to fix all the cases where this
      can be a problem, but this commit provides the general infrastructure
      and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE,
      LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE.
      
      Per discussion with Noah Misch and Alvaro Herrera.
      2ad36c4e
  12. Nov 25, 2011
    • Alvaro Herrera's avatar
      Improve logging of autovacuum I/O activity · 9d3b5024
      Alvaro Herrera authored
      This adds some I/O stats to the logging of autovacuum (when the
      operation takes long enough that log_autovacuum_min_duration causes it
      to be logged), so that it is easier to tune.  Notably, it adds buffer
      I/O counts (hits, misses, dirtied) and read and write rate.
      
      Authors: Greg Smith and Noah Misch
      9d3b5024
    • Robert Haas's avatar
      Move "hot" members of PGPROC into a separate PGXACT array. · ed0b409d
      Robert Haas authored
      This speeds up snapshot-taking and reduces ProcArrayLock contention.
      Also, the PGPROC (and PGXACT) structures used by two-phase commit are
      now allocated as part of the main array, rather than in a separate
      array, and we keep ProcArray sorted in pointer order.  These changes
      are intended to minimize the number of cache lines that must be pulled
      in to take a snapshot, and testing shows a substantial increase in
      performance on both read and write workloads at high concurrencies.
      
      Pavan Deolasee, Heikki Linnakangas, Robert Haas
      ed0b409d
  13. Oct 14, 2011
    • Tom Lane's avatar
      Measure the number of all-visible pages for use in index-only scan costing. · e6858e66
      Tom Lane authored
      Add a column pg_class.relallvisible to remember the number of pages that
      were all-visible according to the visibility map as of the last VACUUM
      (or ANALYZE, or some other operations that update pg_class.relpages).
      Use relallvisible/relpages, instead of an arbitrary constant, to estimate
      how many heap page fetches can be avoided during an index-only scan.
      
      This is pretty primitive and will no doubt see refinements once we've
      acquired more field experience with the index-only scan mechanism, but
      it's way better than using a constant.
      
      Note: I had to adjust an underspecified query in the window.sql regression
      test, because it was changing answers when the plan changed to use an
      index-only scan.  Some of the adjacent tests perhaps should be adjusted
      as well, but I didn't do that here.
      e6858e66
  14. Sep 04, 2011
    • Tom Lane's avatar
      Clean up the #include mess a little. · 1609797c
      Tom Lane authored
      walsender.h should depend on xlog.h, not vice versa.  (Actually, the
      inclusion was circular until a couple hours ago, which was even sillier;
      but Bruce broke it in the expedient rather than logically correct
      direction.)  Because of that poor decision, plus blind application of
      pgrminclude, we had a situation where half the system was depending on
      xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
      the header inclusion, and manually revert a lot of what pgrminclude had
      done so things build again.
      
      This episode reinforces my feeling that pgrminclude should not be run
      without adult supervision.  Inclusion changes in header files in particular
      need to be reviewed with great care.  More generally, it'd be good if we
      had a clearer notion of module layering to dictate which headers can sanely
      include which others ... but that's a big task for another day.
      1609797c
  15. Sep 01, 2011
  16. Aug 30, 2011
    • Tom Lane's avatar
      Fix a missed case in code for "moving average" estimate of reltuples. · 5bba65de
      Tom Lane authored
      It is possible for VACUUM to scan no pages at all, if the visibility map
      shows that all pages are all-visible.  In this situation VACUUM has no new
      information to report about the relation's tuple density, so it wasn't
      changing pg_class.reltuples ... but it updated pg_class.relpages anyway.
      That's wrong in general, since there is no evidence to justify changing the
      density ratio reltuples/relpages, but it's particularly bad if the previous
      state was relpages=reltuples=0, which means "unknown tuple density".
      We just replaced "unknown" with "zero".  ANALYZE would eventually recover
      from this, but it could take a lot of repetitions of ANALYZE to do so if
      the relation size is much larger than the maximum number of pages ANALYZE
      will scan, because of the moving-average behavior introduced by commit
      b4b6923e.
      
      The only known situation where we could have relpages=reltuples=0 and yet
      the visibility map asserts everything's visible is immediately following
      a pg_upgrade.  It might be advisable for pg_upgrade to try to preserve the
      relpages/reltuples statistics; but in any case this code is wrong on its
      own terms, so fix it.  Per report from Sergey Koposov.
      
      Back-patch to 8.4, where the visibility map was introduced, same as the
      previous change.
      5bba65de
  17. Jul 09, 2011
    • Robert Haas's avatar
      Try to acquire relation locks in RangeVarGetRelid. · 4240e429
      Robert Haas authored
      In the previous coding, we would look up a relation in RangeVarGetRelid,
      lock the resulting OID, and then AcceptInvalidationMessages().  While
      this was sufficient to ensure that we noticed any changes to the
      relation definition before building the relcache entry, it didn't
      handle the possibility that the name we looked up no longer referenced
      the same OID.  This was particularly problematic in the case where a
      table had been dropped and recreated: we'd latch on to the entry for
      the old relation and fail later on.  Now, we acquire the relation lock
      inside RangeVarGetRelid, and retry the name lookup if we notice that
      invalidation messages have been processed meanwhile.  Many operations
      that would previously have failed with an error in the presence of
      concurrent DDL will now succeed.
      
      There is a good deal of work remaining to be done here: many callers
      of RangeVarGetRelid still pass NoLock for one reason or another.  In
      addition, nothing in this patch guards against the possibility that
      the meaning of an unqualified name might change due to the creation
      of a relation in a schema earlier in the user's search path than the
      one where it was previously found.  Furthermore, there's nothing at
      all here to guard against similar race conditions for non-relations.
      For all that, it's a start.
      
      Noah Misch and Robert Haas
      4240e429
  18. Jun 29, 2011
  19. Jun 09, 2011
  20. May 30, 2011
    • Tom Lane's avatar
      Fix VACUUM so that it always updates pg_class.reltuples/relpages. · b4b6923e
      Tom Lane authored
      When we added the ability for vacuum to skip heap pages by consulting the
      visibility map, we made it just not update the reltuples/relpages
      statistics if it skipped any pages.  But this could leave us with extremely
      out-of-date stats for a table that contains any unchanging areas,
      especially for TOAST tables which never get processed by ANALYZE.  In
      particular this could result in autovacuum making poor decisions about when
      to process the table, as in recent report from Florian Helmberger.  And in
      general it's a bad idea to not update the stats at all.  Instead, use the
      previous values of reltuples/relpages as an estimate of the tuple density
      in unvisited pages.  This approach results in a "moving average" estimate
      of reltuples, which should converge to the correct value over multiple
      VACUUM and ANALYZE cycles even when individual measurements aren't very
      good.
      
      This new method for updating reltuples is used by both VACUUM and ANALYZE,
      with the result that we no longer need the grotty interconnections that
      caused ANALYZE to not update the stats depending on what had happened
      in the parent VACUUM command.
      
      Also, fix the logic for skipping all-visible pages during VACUUM so that it
      looks ahead rather than behind to decide what to do, as per a suggestion
      from Greg Stark.  This eliminates useless scanning of all-visible pages at
      the start of the relation or just after a not-all-visible page.  In
      particular, the first few pages of the relation will not be invariably
      included in the scanned pages, which seems to help in not overweighting
      them in the reltuples estimate.
      
      Back-patch to 8.4, where the visibility map was introduced.
      b4b6923e
  21. Apr 11, 2011
  22. Apr 10, 2011
  23. Feb 08, 2011
    • Robert Haas's avatar
      Avoid having autovacuum workers wait for relation locks. · 32896c40
      Robert Haas authored
      Waiting for relation locks can lead to starvation - it pins down an
      autovacuum worker for as long as the lock is held.  But if we're doing
      an anti-wraparound vacuum, then we still wait; maintenance can no longer
      be put off.
      
      To assist with troubleshooting, if log_autovacuum_min_duration >= 0,
      we log whenever an autovacuum or autoanalyze is skipped for this reason.
      
      Per a gripe by Josh Berkus, and ensuing discussion.
      32896c40
  24. Jan 25, 2011
    • Tom Lane's avatar
      Replace pg_class.relhasexclusion with pg_index.indisexclusion. · bd1ad1b0
      Tom Lane authored
      There isn't any need to track this state on a table-wide basis, and trying
      to do so introduces undesirable semantic fuzziness.  Move the flag to
      pg_index, where it clearly describes just a single index and can be
      immutable after index creation.
      bd1ad1b0
  25. Jan 02, 2011
    • Robert Haas's avatar
      Fix typo. · e657b55e
      Robert Haas authored
      Noted by Magnus Hagander.
      e657b55e
    • Robert Haas's avatar
      Basic foreign table support. · 0d692a0d
      Robert Haas authored
      Foreign tables are a core component of SQL/MED.  This commit does
      not provide a working SQL/MED infrastructure, because foreign tables
      cannot yet be queried.  Support for foreign table scans will need to
      be added in a future patch.  However, this patch creates the necessary
      system catalog structure, syntax support, and support for ancillary
      operations such as COMMENT and SECURITY LABEL.
      
      Shigeru Hanada, heavily revised by Robert Haas
      0d692a0d
  26. Jan 01, 2011
  27. Nov 18, 2010
  28. Sep 20, 2010
  29. Feb 26, 2010
  30. Feb 15, 2010
  31. Feb 14, 2010
    • Robert Haas's avatar
      Wrap calls to SearchSysCache and related functions using macros. · e26c539e
      Robert Haas authored
      The purpose of this change is to eliminate the need for every caller
      of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists,
      GetSysCacheOid, and SearchSysCacheList to know the maximum number
      of allowable keys for a syscache entry (currently 4).  This will
      make it far easier to increase the maximum number of keys in a
      future release should we choose to do so, and it makes the code
      shorter, too.
      
      Design and review by Tom Lane.
      e26c539e
  32. Feb 09, 2010
    • Tom Lane's avatar
      Fix up rickety handling of relation-truncation interlocks. · cbe9d6be
      Tom Lane authored
      Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
      relation entries, so that they will get reset to InvalidBlockNumber whenever
      an smgr-level flush happens.  Because we now send smgr invalidation messages
      immediately (not at end of transaction) when a relation truncation occurs,
      this ensures that other backends will reset their values before they next
      access the relation.  We no longer need the unreliable assumption that a
      VACUUM that's doing a truncation will hold its AccessExclusive lock until
      commit --- in fact, we can intentionally release that lock as soon as we've
      completed the truncation.  This patch therefore reverts (most of) Alvaro's
      patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
      also get rid of assorted no-longer-needed relcache flushes, which are far more
      expensive than an smgr flush because they kill a lot more state.
      
      In passing this patch fixes smgr_redo's failure to perform visibility-map
      truncation, and cleans up some rather dubious assumptions in freespace.c and
      visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
      date.
      cbe9d6be
  33. Feb 08, 2010
    • Tom Lane's avatar
      Fix serious performance bug in new implementation of VACUUM FULL: · 9184cc7d
      Tom Lane authored
      cluster_rel necessarily builds an all-new toast table, so it's useless to
      then go and VACUUM FULL the toast table.
      9184cc7d
    • Tom Lane's avatar
      Remove old-style VACUUM FULL (which was known for a little while as · 0a469c87
      Tom Lane authored
      VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
      Per discussion, the use case for this method of vacuuming is no longer large
      enough to justify maintaining it; not to mention that we don't wish to invest
      the work that would be needed to make it play nicely with Hot Standby.
      
      Aside from the code directly related to old-style VACUUM FULL, this commit
      removes support for certain WAL record types that could only be generated
      within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
      nontransactional generation of cache invalidation sinval messages (the last
      being the sticking point for Hot Standby).
      
      We still have to retain all code that copes with finding HEAP_MOVED_OFF and
      HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
      as we want to support in-place update from pre-9.0 databases.
      0a469c87
  34. Feb 07, 2010
    • Tom Lane's avatar
      Create a "relation mapping" infrastructure to support changing the relfilenodes · b9b8831a
      Tom Lane authored
      of shared or nailed system catalogs.  This has two key benefits:
      
      * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.
      
      * We no longer have to use an unsafe reindex-in-place approach for reindexing
        shared catalogs.
      
      CLUSTER on nailed catalogs now works too, although I left it disabled on
      shared catalogs because the resulting pg_index.indisclustered update would
      only be visible in one database.
      
      Since reindexing shared system catalogs is now fully transactional and
      crash-safe, the former special cases in REINDEX behavior have been removed;
      shared catalogs are treated the same as non-shared.
      
      This commit does not do anything about the recently-discussed problem of
      deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
      concurrent queries; will address that in a separate patch.  As a stopgap,
      parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
      such failures during the regression tests.
      b9b8831a
  35. Jan 06, 2010
    • Itagaki Takahiro's avatar
      Support rewritten-based full vacuum as VACUUM FULL. Traditional · 946cf229
      Itagaki Takahiro authored
      VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new
      option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming.
      
      Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot
      use it for system tables. VACUUM FULL for system tables always
      fall back into VACUUM FULL INPLACE silently.
      
      Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.
      946cf229
  36. Jan 02, 2010
Loading