  1. Dec 16, 2016
    • Robert Haas's avatar
      Fix more hash index bugs around marking buffers dirty. · 6a4fe112
      Robert Haas authored
      In _hash_freeovflpage(), if we're freeing the overflow page that
      immediately follows the page to which tuples are being moved (the
      confusingly-named "write buffer"), don't forget to mark that
      page dirty after updating its hasho_nextblkno.
      
      In _hash_squeezebucket(), it's not necessary to mark the primary
      bucket page dirty if there are no overflow pages, because there's
      nothing to squeeze in that case.
      
      Amit Kapila, with help from Kuntal Ghosh and Dilip Kumar, after
      an initial trouble report by Jeff Janes.
      6a4fe112
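
      A minimal sketch of the bufmgr convention the fix above restores (the
      helper name and arguments are illustrative, not code from the patch):
      the site that modifies a page must itself call MarkBufferDirty().

          #include "postgres.h"
          #include "access/hash.h"
          #include "storage/bufmgr.h"

          /* Unlink a freed overflow page from the page that precedes it. */
          static void
          unlink_overflow_page(Buffer prevbuf, BlockNumber nextblkno)
          {
              Page            prevpage = BufferGetPage(prevbuf);
              HashPageOpaque  prevopaque = (HashPageOpaque) PageGetSpecialPointer(prevpage);

              /* caller holds an exclusive content lock on prevbuf */
              prevopaque->hasho_nextblkno = nextblkno;

              /* the step the bug omitted: record that the page was changed */
              MarkBufferDirty(prevbuf);
          }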
    • Robert Haas's avatar
      Remove _hash_wrtbuf() in favor of calling MarkBufferDirty(). · 25216c98
      Robert Haas authored
      The whole concept of _hash_wrtbuf() is that we need to know at the
      time we're releasing the buffer lock (and pin) whether we dirtied the
      buffer, but this is easy to get wrong.  This patch actually fixes one
      non-obvious bug of that form: hashbucketcleanup forgot to signal
      _hash_squeezebucket, which gets the primary bucket page already
      locked, as to whether it had already dirtied the page.  Calling
      MarkBufferDirty() at the places where we dirty the buffer is more
      intuitive and lets us simplify the code in various places as well.
      
      On top of all that, the ultimate goal here is to make hash indexes
      WAL-logged, and as the comments to _hash_wrtbuf() note, it should
      go away when that happens.  Making it go away a little earlier than
      that seems like a good preparatory step.
      
      Report by Jeff Janes.  Diagnosis by Amit Kapila, Kuntal Ghosh,
      and Dilip Kumar.  Patch by me, after studying an alternative patch
      submitted by Amit Kapila.
      
      Discussion: http://postgr.es/m/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA@mail.gmail.com
      25216c98
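
      A hedged sketch of the convention this commit adopts (the function and
      the particular page change are illustrative): dirty the buffer at the
      exact place the page is modified, instead of threading a "did we dirty
      it?" flag down to a later _hash_wrtbuf()-style write call.

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include "storage/bufpage.h"

          static void
          modify_page_example(Buffer buf)
          {
              Page    page = BufferGetPage(buf);

              LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

              /* ... change something on the page ... */
              ((PageHeader) page)->pd_flags |= PD_PAGE_FULL;      /* illustrative */

              MarkBufferDirty(buf);               /* record the fact immediately */
              LockBuffer(buf, BUFFER_LOCK_UNLOCK);
          }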
  2. Nov 30, 2016
    • Robert Haas's avatar
      Improve hash index bucket split behavior. · 6d46f478
      Robert Haas authored
      Previously, the right to split a bucket was represented by a
      heavyweight lock on the page number of the primary bucket page.
      Unfortunately, this meant that every scan needed to take a heavyweight
      lock on that bucket also, which was bad for concurrency.  Instead, use
      a cleanup lock on the primary bucket page to indicate the right to
      begin a split, so that scans only need to retain a pin on that page,
      which they would have to acquire anyway, and which is also much
      cheaper.
      
      In addition to reducing the locking cost, this also avoids locking out
      scans and inserts for the entire lifetime of the split: while the new
      bucket is being populated with copies of the appropriate tuples from
      the old bucket, scans and inserts can happen in parallel.  There are
      minor concurrency improvements for vacuum operations as well, though
      the situation there is still far from ideal.
      
      This patch also removes the unworldly assumption that a split will
      never be interrupted.  With the new code, a split is done in a series
      of small steps and the system can pick up where it left off if it is
      interrupted prior to completion.  While this patch does not itself add
      write-ahead logging for hash indexes, it is clearly a necessary first
      step, since one of the things that could interrupt a split is the
      removal of electrical power from the machine performing it.
      
      Amit Kapila.  I wrote the original design on which this patch is
      based, and did a good bit of work on the comments and README through
      multiple rounds of review, but all of the code is Amit's.  Also
      reviewed by Jesper Pedersen, Jeff Janes, and others.
      
      Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com
      6d46f478
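
      A hedged sketch of the locking idea (the function is illustrative; the
      bufmgr calls are real): the splitter needs a cleanup lock, i.e. an
      exclusive lock on a buffer it alone has pinned, while scans merely keep
      their pin on the primary bucket page.

          #include "postgres.h"
          #include "storage/bufmgr.h"

          /*
           * Try to claim the right to split a bucket.  A cleanup lock is only
           * granted when no other backend holds a pin on the page, so a scan
           * that retains its pin is enough to hold off a split.
           */
          static bool
          try_begin_split(Buffer bucket_buf)
          {
              if (!ConditionalLockBufferForCleanup(bucket_buf))
                  return false;           /* someone is scanning; retry later */

              /* ... perform (or resume) the split in small steps ... */

              LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
              return true;
          }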
  3. Nov 08, 2016
    • Robert Haas's avatar
      Improve handling of dead tuples in hash indexes. · f0e72a25
      Robert Haas authored
      When squeezing a bucket during vacuum, it's not necessary to retain
      any tuples already marked as dead, so ignore them when deciding which
      tuples must be moved in order to empty a bucket page.  Similarly, when
      splitting a bucket, relocating dead tuples to the new bucket is a
      waste of effort; instead, just ignore them.
      
      Amit Kapila, reviewed by me.  Testing help provided by Ashutosh
      Sharma.
      f0e72a25
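
      A minimal sketch of the "ignore dead tuples" idea (the loop and helper
      are illustrative): line pointers already flagged LP_DEAD are simply not
      collected when deciding what needs to be moved.

          #include "postgres.h"
          #include "access/itup.h"
          #include "storage/bufpage.h"
          #include "storage/itemid.h"

          /* Collect the tuples on 'page' that actually need to be relocated. */
          static int
          collect_live_tuples(Page page, IndexTuple *live, int maxitems)
          {
              OffsetNumber    maxoff = PageGetMaxOffsetNumber(page);
              OffsetNumber    off;
              int             n = 0;

              for (off = FirstOffsetNumber; off <= maxoff && n < maxitems; off++)
              {
                  ItemId  itemid = PageGetItemId(page, off);

                  if (ItemIdIsDead(itemid))
                      continue;       /* no point in moving an already-dead tuple */

                  live[n++] = (IndexTuple) PageGetItem(page, itemid);
              }
              return n;
          }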
  4. Apr 20, 2016
    • Kevin Grittner's avatar
      Revert no-op changes to BufferGetPage() · a343e223
      Kevin Grittner authored
      The reverted changes were intended to force a choice of whether any
      newly-added BufferGetPage() calls needed to be accompanied by a
      test of the snapshot age, to support the "snapshot too old"
      feature.  Such an accompanying test is needed in about 7% of the
      cases, where the page is being used as part of a scan rather than
      positioning for other purposes (such as DML or vacuuming).  The
      additional effort required for back-patching, and the doubt whether
      the intended benefit would really be there, have indicated it is
      best just to rely on developers to do the right thing based on
      comments and existing usage, as we do with many other conventions.
      
      This change should have little or no effect on generated executable
      code.
      
      Motivated by the back-patching pain of Tom Lane and Robert Haas
      a343e223
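
      With the revert, the resulting convention looks like the sketch below
      (an assumption based on the two commits' descriptions: BufferGetPage()
      keeps its single-argument form, and releases that carry the "snapshot
      too old" feature provide TestForOldSnapshot() for the roughly 7% of
      call sites that feed a scan):

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include "utils/rel.h"
          #include "utils/snapshot.h"

          /* Scan-side reads add an explicit age check; utility reads do not. */
          static Page
          get_page_for_scan(Buffer buf, Relation rel, Snapshot snapshot)
          {
              Page    page = BufferGetPage(buf);

              /* only where page contents feed a user-visible scan */
              TestForOldSnapshot(snapshot, rel, page);
              return page;
          }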
  5. Apr 08, 2016
    • Kevin Grittner's avatar
      Modify BufferGetPage() to prepare for "snapshot too old" feature · 8b65cf4c
      Kevin Grittner authored
      This patch is a no-op patch which is intended to reduce the chances
      of failures of omission once the functional part of the "snapshot
      too old" patch goes in.  It adds parameters for snapshot, relation,
      and an enum to specify whether the snapshot age check needs to be
      done for the page at this point.  This initial patch passes NULL
      for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
      third.  The follow-on patch will change the places where the test
      needs to be made.
      8b65cf4c
  6. Jan 02, 2016
  7. Jan 06, 2015
  8. May 06, 2014
    • Bruce Momjian's avatar
      pgindent run for 9.4 · 0a783200
      Bruce Momjian authored
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not affect backpatching.
      0a783200
  9. Jan 07, 2014
  10. Jan 01, 2013
  11. Jun 10, 2012
  12. Mar 21, 2012
  13. Jan 02, 2012
  14. Sep 01, 2011
  15. Jan 01, 2011
  16. Dec 29, 2010
    • Robert Haas's avatar
      Support unlogged tables. · 53dbc27c
      Robert Haas authored
      The contents of an unlogged table are not WAL-logged; thus, they are not
      available on standby servers and are truncated whenever the database
      system enters recovery.  Indexes on unlogged tables are also unlogged.
      Unlogged GiST indexes are not currently supported.
      53dbc27c
  17. Sep 20, 2010
  18. Feb 26, 2010
  19. Jan 02, 2010
  20. Nov 01, 2009
    • Tom Lane's avatar
      Fix two serious bugs introduced into hash indexes by the 8.4 patch that made · c4afdca4
      Tom Lane authored
      hash indexes keep entries sorted by hash value.  First, the original plans for
      concurrency assumed that insertions would happen only at the end of a page,
      which is no longer true; this could cause scans to transiently fail to find
      index entries in the presence of concurrent insertions.  We can compensate
      by teaching scans to re-find their position after re-acquiring read locks.
      Second, neither the bucket split nor the bucket compaction logic had been
      fixed to preserve hashvalue ordering, so application of either of those
      processes could lead to permanent corruption of an index, in the sense
      that searches might fail to find entries that are present.
      
      This patch fixes the split and compaction logic to preserve hashvalue
      ordering, but it cannot do anything about pre-existing corruption.  We will
      need to recommend reindexing all hash indexes in the 8.4.2 release notes.
      
      To buy back the performance loss hereby induced in split and compaction,
      fix them to use PageIndexMultiDelete instead of retail PageIndexDelete
      operations.  We might later want to do something with qsort'ing the
      page contents rather than doing a binary search for each insertion,
      but that seemed more invasive than I cared to risk in a back-patch.
      
      Per bug #5157 from Jeff Janes and subsequent investigation.
      c4afdca4
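
      The batching change amounts to the pattern sketched here (the predicate
      callback is illustrative, not the committed code): gather the offsets
      to remove and compact the page once with PageIndexMultiDelete() instead
      of once per deleted tuple.

          #include "postgres.h"
          #include "storage/bufpage.h"

          /* Remove all tuples on 'page' that 'doomed' says to drop. */
          static void
          delete_matching_tuples(Page page, bool (*doomed) (Page, OffsetNumber))
          {
              OffsetNumber    deletable[MaxOffsetNumber];
              int             ndeletable = 0;
              OffsetNumber    maxoff = PageGetMaxOffsetNumber(page);
              OffsetNumber    off;

              for (off = FirstOffsetNumber; off <= maxoff; off++)
              {
                  if (doomed(page, off))
                      deletable[ndeletable++] = off;
              }

              if (ndeletable > 0)
                  PageIndexMultiDelete(page, deletable, ndeletable);
          }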
  21. Jan 01, 2009
  22. Sep 15, 2008
    • Tom Lane's avatar
      Change hash indexes to store only the hash code rather than the whole indexed · 4adc2f72
      Tom Lane authored
      value.  This means that hash index lookups are always lossy and have to be
      rechecked when the heap is visited; however, the gain in index compactness
      outweighs this when the indexed values are wide.  Also, we only need to
      perform datatype comparisons when the hash codes match exactly, rather than
      for every entry in the hash bucket; so it could also win for datatypes that
      have expensive comparison functions.  A small additional win is gained by
      keeping hash index pages sorted by hash code and using binary search to reduce
      the number of index tuples we have to look at.
      
      Xiao Meng
      
      This commit also incorporates Zdenek Kotala's patch to isolate hash metapages
      and hash bitmaps a bit better from the page header datastructures.
      4adc2f72
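
      A hedged sketch of the page-level binary search this makes possible
      (the search function is illustrative; _hash_get_indextuple_hashkey() is
      the accessor the hash AM uses for the stored hash code): find the first
      offset whose hash code is >= the probe value.

          #include "postgres.h"
          #include "access/hash.h"
          #include "access/itup.h"
          #include "storage/bufpage.h"

          static OffsetNumber
          find_first_ge(Page page, uint32 hashkey)
          {
              OffsetNumber    lo = FirstOffsetNumber;
              OffsetNumber    hi = PageGetMaxOffsetNumber(page) + 1;

              while (lo < hi)
              {
                  OffsetNumber    mid = lo + (hi - lo) / 2;
                  IndexTuple      itup;

                  itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, mid));
                  if (_hash_get_indextuple_hashkey(itup) < hashkey)
                      lo = mid + 1;
                  else
                      hi = mid;
              }
              return lo;          /* one past maxoff if every stored code is smaller */
          }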
  23. Jun 19, 2008
  24. May 12, 2008
    • Alvaro Herrera's avatar
      Restructure some header files a bit, in particular heapam.h, by removing some · f8c4d7db
      Alvaro Herrera authored
      unnecessary #include lines in it.  Also, move some tuple routine prototypes and
      macros to htup.h, which allows removal of heapam.h inclusion from some .c
      files.
      
      For this to work, a new header file access/sysattr.h needed to be created,
      initially containing attribute numbers of system columns, for pg_dump usage.
      
      While at it, make contrib ltree, intarray and hstore header files more
      consistent with our header style.
      f8c4d7db
  25. Jan 01, 2008
  26. Nov 15, 2007
  27. Sep 20, 2007
    • Tom Lane's avatar
      HOT updates. When we update a tuple without changing any of its indexed · 282d2a03
      Tom Lane authored
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
      282d2a03
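
      A hedged sketch of what "follow the HOT-chain links" means at the page
      level (visibility checking is elided; the walking function is
      illustrative): each HOT-updated version's t_ctid points at the next
      version on the same heap page.

          #include "postgres.h"
          #include "access/htup_details.h"
          #include "storage/bufpage.h"

          /* Walk a HOT chain within one heap page, starting at 'rootoff'. */
          static void
          walk_hot_chain(Page page, OffsetNumber rootoff)
          {
              OffsetNumber    off = rootoff;

              for (;;)
              {
                  ItemId          lp = PageGetItemId(page, off);
                  HeapTupleHeader htup;

                  if (ItemIdIsRedirected(lp))
                  {
                      off = ItemIdGetRedirect(lp);    /* pruned root; jump ahead */
                      continue;
                  }
                  if (!ItemIdIsNormal(lp))
                      break;                          /* dead or unused: chain ends */

                  htup = (HeapTupleHeader) PageGetItem(page, lp);

                  /* ... visibility test for this version would go here ... */

                  if (!HeapTupleHeaderIsHotUpdated(htup))
                      break;                          /* last version in the chain */

                  off = ItemPointerGetOffsetNumber(&htup->t_ctid);
              }
          }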
  28. Sep 13, 2007
    • Tom Lane's avatar
      Redefine the lp_flags field of item pointers as having four states, rather · 68893035
      Tom Lane authored
      than two independent bits (one of which was never used in heap pages anyway,
      or at least hadn't been in a very long time).  This gives us flexibility to
      add the HOT notions of redirected and dead item pointers without requiring
      anything so klugy as magic values of lp_off and lp_len.  The state values
      are chosen so that for the states currently in use (pre-HOT) there is no
      change in the physical representation.
      68893035
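
      For reference, the four states end up as the lp_flags values and macros
      sketched below (the classifying function itself is illustrative):

          #include "postgres.h"
          #include "storage/itemid.h"

          /* Classify a line pointer by its lp_flags state. */
          static const char *
          itemid_state(ItemId lp)
          {
              if (!ItemIdIsUsed(lp))
                  return "LP_UNUSED";     /* free for reuse */
              if (ItemIdIsRedirected(lp))
                  return "LP_REDIRECT";   /* points at another offset, no storage */
              if (ItemIdIsDead(lp))
                  return "LP_DEAD";       /* dead; may or may not retain storage */
              return "LP_NORMAL";         /* ordinary item with storage */
          }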
  29. May 30, 2007
    • Tom Lane's avatar
      Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f
      Tom Lane authored
      buffers, rather than blowing out the whole shared-buffer arena.  Aside from
      avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
      to cause a WAL flush for every page it modified, because we had it hacked to
      use only a single buffer.  Those flushes will now occur only once per
      ring-ful.  The exact ring size, and the threshold for seqscans to switch into
      the ring usage pattern, remain under debate; but the infrastructure seems
      done.  The key bit of infrastructure is a new optional BufferAccessStrategy
      object that can be passed to ReadBuffer operations; this replaces the former
      StrategyHintVacuum API.
      
      This patch also changes the buffer usage-count methodology a bit: we now
      advance usage_count when first pinning a buffer, rather than when last
      unpinning it.  To preserve the behavior that a buffer's lifetime starts to
      decrease when it's released, the clock sweep code is modified to not decrement
      usage_count of pinned buffers.
      
      Work not done in this commit: teach GiST and GIN indexes to use the vacuum
      BufferAccessStrategy for vacuum-driven fetches.
      
      Original patch by Simon, reworked by Heikki and again by Tom.
      d526575f
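
      A hedged sketch of a bulk-read caller using the new strategy object,
      written against the current ReadBufferExtended() entry point rather
      than the exact call spelling of this era (the scan loop is
      illustrative):

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include "utils/rel.h"

          /* Read every block of 'rel' through a small ring of buffers. */
          static void
          bulk_read_relation(Relation rel)
          {
              BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
              BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
              BlockNumber blkno;

              for (blkno = 0; blkno < nblocks; blkno++)
              {
                  Buffer  buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
                                                   RBM_NORMAL, strategy);

                  /* ... examine the page ... */
                  ReleaseBuffer(buf);
              }

              FreeAccessStrategy(strategy);
          }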
  30. May 03, 2007
  31. Apr 19, 2007
    • Tom Lane's avatar
      Repair PANIC condition in hash indexes when a previous index extension attempt · 9d37c038
      Tom Lane authored
      failed (due to lock conflicts or out-of-space).  We might have already
      extended the index's filesystem EOF before failing, causing the EOF to be
      beyond what the metapage says is the last used page.  Hence the invariant
      maintained by the code needs to be "EOF is at or beyond last used page",
      not "EOF is exactly the last used page".  Problem was created by my patch
      of 2006-11-19 that attempted to repair bug #2737.  Since that was
      back-patched to 7.4, this needs to be as well.  Per report and test case
      from Vlastimil Krejcir.
      9d37c038
  32. Apr 10, 2007
    • Tom Lane's avatar
      Minor tweaking of index special-space definitions so that the various · 56218fbc
      Tom Lane authored
      index types can be reliably distinguished by examining the special space
      on an index page.  Per my earlier proposal, plus the realization that
      there's no need for btree's vacuum cycle ID to cycle through every possible
      16-bit value.  Restricting its range a little costs nearly nothing and
      eliminates the possibility of collisions.
      Memo to self: remember to make bitmap indexes play along with this scheme,
      assuming that patch ever gets accepted.
      56218fbc
  33. Jan 05, 2007
  34. Nov 19, 2006
    • Tom Lane's avatar
      Repair problems with hash indexes that span multiple segments: the hash code's · d68efb3f
      Tom Lane authored
      preference for filling pages out-of-order tends to confuse the sanity checks
      in md.c, as per report from Balazs Nagy in bug #2737.  The fix is to ensure
      that the smgr-level code always has the same idea of the logical EOF as the
      hash index code does, by using ReadBuffer(P_NEW) where we are adding a single
      page to the end of the index, and using smgrextend() to reserve a large batch
      of pages when creating a new splitpoint.  The patch is a bit ugly because it
      avoids making any changes in md.c, which seems the most prudent approach for a
      backpatchable beta-period fix.  After 8.3 development opens, I'll take a look
      at a cleaner but more invasive patch, in particular getting rid of the now
      unnecessary hack to allow reading beyond EOF in mdread().
      
      Backpatch as far as 7.4.  The bug likely exists in 7.3 as well, but because
      of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch
      doesn't even begin to apply.  Given the other known bugs in the 7.3-era hash
      code, it does not seem worth trying to develop a separate patch for 7.3.
      d68efb3f
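
      A hedged sketch of the single-page half of that rule (the batch
      reservation via smgrextend() is omitted; the function is illustrative):
      adding exactly one page goes through the buffer manager with P_NEW, so
      smgr's notion of EOF advances in step with the index's.

          #include "postgres.h"
          #include "access/hash.h"
          #include "storage/bufmgr.h"
          #include "utils/rel.h"

          /* Add one new page at the current end of the index. */
          static Buffer
          add_one_page(Relation rel)
          {
              /* P_NEW asks the buffer manager to extend the relation by one block */
              Buffer  buf = ReadBuffer(rel, P_NEW);

              LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
              PageInit(BufferGetPage(buf), BufferGetPageSize(buf),
                       sizeof(HashPageOpaqueData));
              MarkBufferDirty(buf);
              return buf;                 /* caller fills in, unlocks and releases */
          }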
  35. Apr 01, 2006
    • Tom Lane's avatar
      Clean up WAL/buffer interactions as per my recent proposal. Get rid of the · a8b8f4db
      Tom Lane authored
      misleadingly-named WriteBuffer routine, and instead require routines that
      change buffer pages to call MarkBufferDirty (which does exactly what it says).
      We also require that they do so before calling XLogInsert; this takes care of
      the synchronization requirement documented in SyncOneBuffer.  Note that
      because bufmgr takes the buffer content lock (in shared mode) while writing
      out any buffer, it doesn't matter whether MarkBufferDirty is executed before
      the buffer content change is complete, so long as the content change is
      completed before releasing exclusive lock on the buffer.  So it's OK to set
      the dirtybit before we fill in the LSN.
      This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
      Aside from making the code more transparent, we can also add some new
      debugging assertions, in particular that the caller of MarkBufferDirty must
      hold the buffer content lock, not merely a pin.
      a8b8f4db
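
      A hedged sketch of the ordering rule, written against the current WAL
      API rather than the 2006 one (registration details are illustrative):
      change the page and mark the buffer dirty while holding the exclusive
      content lock, then emit the WAL record and set the page LSN, all inside
      a critical section.

          #include "postgres.h"
          #include "access/xloginsert.h"
          #include "miscadmin.h"
          #include "storage/bufmgr.h"

          /* Modify a page and WAL-log it, respecting the MarkBufferDirty rule. */
          static void
          logged_page_change(Buffer buf, RmgrId rmid, uint8 info)
          {
              Page        page = BufferGetPage(buf);
              XLogRecPtr  recptr;

              /* caller already holds BUFFER_LOCK_EXCLUSIVE on buf */
              START_CRIT_SECTION();

              /* ... apply the change to the page ... */
              MarkBufferDirty(buf);       /* before XLogInsert, per this commit */

              XLogBeginInsert();
              XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
              recptr = XLogInsert(rmid, info);
              PageSetLSN(page, recptr);

              END_CRIT_SECTION();
          }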
  36. Mar 05, 2006
  37. Jan 26, 2006
  38. Nov 22, 2005
  39. Nov 06, 2005
    • Tom Lane's avatar
      Add defenses to btree and hash index AMs to do simple sanity checks · 766dc45d
      Tom Lane authored
      on every index page they read; in particular to catch the case of an
      all-zero page, which PageHeaderIsValid allows to pass.  It turns out
      hash already had this idea, but it was just Assert()ing things rather
      than doing a straight error check, and the Asserts were partially
      redundant with PageHeaderIsValid anyway.  Per recent failure example
      from Jim Nasby.  (gist still needs the same treatment.)
      766dc45d
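
      A minimal sketch of the kind of check described (the message text is
      illustrative): an all-zeroes page satisfies PageHeaderIsValid, so
      reject it explicitly with an error rather than an Assert.

          #include "postgres.h"
          #include "storage/bufpage.h"
          #include "utils/rel.h"

          static void
          check_index_page(Relation rel, Page page, BlockNumber blkno)
          {
              if (PageIsNew(page))
                  ereport(ERROR,
                          (errcode(ERRCODE_INDEX_CORRUPTED),
                           errmsg("index \"%s\" contains unexpected zero page at block %u",
                                  RelationGetRelationName(rel), blkno)));
          }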