Skip to content
Snippets Groups Projects
  1. Jan 01, 2013
  2. Dec 28, 2012
    • Alvaro Herrera's avatar
      Remove obsolete XLogRecPtr macros · 5ab3af46
      Alvaro Herrera authored
      This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance.
      These were useful for brevity when XLogRecPtrs were split in
      xlogid/xrecoff; but now that they are simple uint64's, they are just
      clutter.  The only downside to making this change would be ease of
      backporting patches, but that has been negated by other substantive
      changes to the involved code anyway.  The clarity of simpler expressions
      makes the change worthwhile.
      
      Most of the changes are mechanical, but in a couple of places, the patch
      author chose to invert the operator sense, making the code flow more
      logical (and more in line with preceding comments).
      
      Author: Andres Freund
      Eyeballed by Dimitri Fontaine and Alvaro Herrera
      5ab3af46
  3. Nov 28, 2012
  4. Nov 13, 2012
    • Tom Lane's avatar
      Fix multiple problems in WAL replay. · 3bbf668d
      Tom Lane authored
      Most of the replay functions for WAL record types that modify more than
      one page failed to ensure that those pages were locked correctly to ensure
      that concurrent queries could not see inconsistent page states.  This is
      a hangover from coding decisions made long before Hot Standby was added,
      when it was hardly necessary to acquire buffer locks during WAL replay
      at all, let alone hold them for carefully-chosen periods.
      
      The key problem was that RestoreBkpBlocks was written to hold lock on each
      page restored from a full-page image for only as long as it took to update
      that page.  This was guaranteed to break any WAL replay function in which
      there was any update-ordering constraint between pages, because even if the
      nominal order of the pages is the right one, any mixture of full-page and
      non-full-page updates in the same record would result in out-of-order
      updates.  Moreover, it wouldn't work for situations where there's a
      requirement to maintain lock on one page while updating another.  Failure
      to honor an update ordering constraint in this way is thought to be the
      cause of bug #7648 from Daniel Farina: what seems to have happened there
      is that a btree page being split was rewritten from a full-page image
      before the new right sibling page was written, and because lock on the
      original page was not maintained it was possible for hot standby queries to
      try to traverse the page's right-link to the not-yet-existing sibling page.
      
      To fix, get rid of RestoreBkpBlocks as such, and instead create a new
      function RestoreBackupBlock that restores just one full-page image at a
      time.  This function can be invoked by WAL replay functions at the points
      where they would otherwise perform non-full-page updates; in this way, the
      physical order of page updates remains the same no matter which pages are
      replaced by full-page images.  We can then further adjust the logic in
      individual replay functions if it is necessary to hold buffer locks
      for overlapping periods.  A side benefit is that we can simplify the
      handling of concurrency conflict resolution by moving that code into the
      record-type-specfic functions; there's no more need to contort the code
      layout to keep conflict resolution in front of the RestoreBkpBlocks call.
      
      In connection with that, standardize on zero-based numbering rather than
      one-based numbering for referencing the full-page images.  In HEAD, I
      removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
      still there in the header files in previous branches, but are no longer
      used by the code.
      
      In addition, fix some other bugs identified in the course of making these
      changes:
      
      spgRedoAddNode could fail to update the parent downlink at all, if the
      parent tuple is in the same page as either the old or new split tuple and
      we're not doing a full-page image: it would get fooled by the LSN having
      been advanced already.  This would result in permanent index corruption,
      not just transient failure of concurrent queries.
      
      Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
      tail page as a candidate for a full-page image; in the worst case this
      could result in torn-page corruption.
      
      heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
      exclusive lock: it did the former in the normal path but the latter for a
      full-page image.  A plain exclusive lock seems sufficient, so change to
      that.
      
      Also, remove gistRedoPageDeleteRecord(), which has been dead code since
      VACUUM FULL was rewritten.
      
      Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
      had a significantly different WAL-logging scheme for GIST index updates,
      and it doesn't appear possible to make that scheme safe for concurrent hot
      standby queries, because it can leave inconsistent states in the index even
      between WAL records.  Given the lack of complaints from the field, we won't
      work too hard on fixing that branch.
      3bbf668d
  5. Aug 31, 2012
    • Tom Lane's avatar
      Improve coding of gistchoose and gistRelocateBuildBuffersOnSplit. · e5db11c5
      Tom Lane authored
      This is mostly cosmetic, but it does eliminate a speculative portability
      issue.  The previous coding ignored the fact that sum_grow could easily
      overflow (in fact, it could be summing multiple IEEE float infinities).
      On a platform where that didn't guarantee to produce a positive result,
      the code would misbehave.  In any case, it was less than readable.
      e5db11c5
  6. Aug 30, 2012
    • Robert Haas's avatar
      Fix logic bug in gistchoose and gistRelocateBuildBuffersOnSplit. · c8ba697a
      Robert Haas authored
      Every time the best-tuple-found-so-far changes, we need to reset all
      the penalty values in which_grow[] to the penalties for the new best
      tuple.  The old code failed to do this, resulting in inferior index
      quality.
      
      The original patch from Alexander Korotkov was just two lines; I took
      the liberty of fleshing that out by adding a bunch of comments that I
      hope will make this logic easier for others to understand than it was
      for me.
      c8ba697a
  7. Aug 16, 2012
    • Heikki Linnakangas's avatar
      Fix GiST buffering build bug, which caused "failed to re-find parent" errors. · 89911b3a
      Heikki Linnakangas authored
      We use a hash table to track the parents of inner pages, but when inserting
      to a leaf page, the caller of gistbufferinginserttuples() must pass a
      correct block number of the leaf's parent page. Before gistProcessItup()
      descends to a child page, it checks if the downlink needs to be adjusted to
      accommodate the new tuple, and updates the downlink if necessary. However,
      updating the downlink might require splitting the page, which might move the
      downlink to a page to the right. gistProcessItup() doesn't realize that, so
      when it descends to the leaf page, it might pass an out-of-date parent block
      number as a result. Fix that by returning the block a tuple was inserted to
      from gistbufferinginserttuples().
      
      This fixes the bug reported by Zdeněk Jílovec.
      89911b3a
  8. Jul 16, 2012
    • Peter Eisentraut's avatar
      Remove unreachable code · dd16f948
      Peter Eisentraut authored
      The Solaris Studio compiler warns about these instances, unlike more
      mainstream compilers such as gcc.  But manual inspection showed that
      the code is clearly not reachable, and we hope no worthy compiler will
      complain about removing this code.
      dd16f948
  9. Jun 24, 2012
    • Heikki Linnakangas's avatar
      Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4
      Heikki Linnakangas authored
      This simplifies code that needs to do arithmetic on XLogRecPtrs.
      
      To avoid changing on-disk format of data pages, the LSN on data pages is
      still stored in the old format. That should keep pg_upgrade happy. However,
      we have XLogRecPtrs embedded in the control file, and in the structs that
      are sent over the replication protocol, so this changes breaks compatibility
      of pg_basebackup and server. I didn't do anything about this in this patch,
      per discussion on -hackers, the right thing to do would to be to change the
      replication protocol to be architecture-independent, so that you could use
      a newer version of pg_receivexlog, for example, against an older server
      version.
      0ab9d1c4
  10. Jun 10, 2012
  11. May 30, 2012
    • Heikki Linnakangas's avatar
      Change the way parent pages are tracked during buffered GiST build. · d1996ed5
      Heikki Linnakangas authored
      We used to mimic the way a stack is constructed when descending the tree
      during normal GiST inserts, but that was quite complicated during a buffered
      build. It was also wrong: in GiST, the left-to-right relationships on
      different levels might not match each other, so that when you know the
      parent of a child page, you won't necessarily find the parent of the page to
      the right of the child page by following the rightlinks at the parent level.
      This sometimes led to "could not re-find parent" errors while building a
      GiST index.
      
      We now use a simple hash table to track the parent of every internal page.
      Whenever a page is split, and downlinks are moved from one page to another,
      we update the hash table accordingly. This is also better for performance
      than the old method, as we never need to move right to re-find the parent
      page, which could take a significant amount of time for buffers that were
      created much earlier in the index build.
      d1996ed5
    • Heikki Linnakangas's avatar
      Delete the temporary file used in buffered GiST build, after the build. · be02b168
      Heikki Linnakangas authored
      There were two bugs here: We forgot to call gistFreeBuildBuffers() function
      at the end of build, and we passed interXact == true to BufFileCreateTemp,
      so the file wasn't automatically cleaned up at end-of-transaction either.
      be02b168
  12. May 29, 2012
    • Heikki Linnakangas's avatar
      Fix integer overflow bug in GiST buffering build calculations. · 4bc6fb57
      Heikki Linnakangas authored
      The result of (maintenance_work_mem * 1024) / BLCKSZ doesn't fit in a signed
      32-bit integer, if maintenance_work_mem >= 2GB. Use double instead. And
      while we're at it, write the calculations in an easier to understand form,
      with the intermediary steps written out and commented.
      4bc6fb57
  13. May 18, 2012
    • Heikki Linnakangas's avatar
      Fix bug in gistRelocateBuildBuffersOnSplit(). · 1d27dcf5
      Heikki Linnakangas authored
      When we create a temporary copy of the old node buffer, in stack, we mustn't
      leak that into any of the long-lived data structures. Before this patch,
      when we called gistPopItupFromNodeBuffer(), it got added to the array of
      "loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the
      pointer added to the loaded buffers array points to garbage. Often that goes
      unnotied, because when we go through the array of loaded buffers to unload
      them, buffers with a NULL pageBuffer are ignored, which can often happen by
      accident even if the pointer points to garbage.
      
      This patch fixes that by marking the temporary copy in stack explicitly as
      temporary, and refrain from adding buffers marked as temporary to the array
      of loaded buffers.
      
      While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber
      and improve comments a bit. This isn't strictly necessary, but makes
      debugging easier.
      1d27dcf5
  14. May 12, 2012
  15. May 11, 2012
    • Heikki Linnakangas's avatar
      On GiST page split, release the locks on child pages before recursing up. · 3652d72d
      Heikki Linnakangas authored
      When inserting the downlinks for a split gist page, we used hold the locks
      on the child pages until the insertion into the parent - and recursively its
      parent if it had to be split too - were all completed. Change that so that
      the locks on child pages are released after the insertion in the immediate
      parent is done, before recursing further up the tree.
      
      This reduces the number of lwlocks that are held simultaneously. Holding
      many locks is bad for concurrency, and in extreme cases you can even hit
      the limit of 100 simultaneously held lwlocks in a backend. If you're really
      unlucky, you can hit the limit while in a critical section, which brings
      down the whole system.
      
      This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
      splitting code was rewritten in 9.1, and the old code did not have this
      problem.
      3652d72d
  16. May 02, 2012
  17. Mar 02, 2012
    • Heikki Linnakangas's avatar
      When a GiST page is split during index build, it might not have a buffer. · 2502f459
      Heikki Linnakangas authored
      Previously it was thought that it's impossible as the code stands, because
      insertions create buffers as tuples are cascaded downwards, and index
      split also creaters buffers eagerly for all halves. But the example from
      Jay Levitt demonstrates that it can happen, when the root page is split.
      It's in fact OK if the buffer doesn't exist, so we just need to remove the
      sanity check. In fact, we've been discussing the possibility of destroying
      empty buffers to conserve memory, which would render the sanity check
      completely useless anyway.
      
      Fix by Alexander Korotkov
      2502f459
  18. Feb 28, 2012
  19. Feb 24, 2012
  20. Feb 08, 2012
  21. Jan 30, 2012
  22. Jan 02, 2012
  23. Nov 27, 2011
  24. Oct 09, 2011
    • Heikki Linnakangas's avatar
      Clean up a couple of box gist helper functions. · d50e1251
      Heikki Linnakangas authored
      The original idea of this patch was to make box picksplit run faster, by
      eliminating unnecessary palloc() overhead, but that was obsoleted by the new
      double-sorting split algorithm that doesn't call these functions so heavily
      anymore. Nevertheless, the code looks better this way.
      
      Original patch by me, reviewed and tidied up after the double-sorting patch
      by Kevin Grittner.
      d50e1251
  25. Oct 06, 2011
  26. Oct 01, 2011
    • Tom Lane's avatar
      Support GiST index support functions that want to cache data across calls. · d22a09dc
      Tom Lane authored
      pg_trgm was already doing this unofficially, but the implementation hadn't
      been thought through very well and leaked memory.  Restructure the core
      GiST code so that it actually works, and document it.  Ordinarily this
      would have required an extra memory context creation/destruction for each
      GiST index search, but I was able to avoid that in the normal case of a
      non-rescanned search by finessing the handling of the RBTree.  It used to
      have its own context always, but now shares a context with the
      scan-lifespan data structures, unless there is more than one rescan call.
      This should make the added overhead unnoticeable in typical cases.
      d22a09dc
  27. Sep 16, 2011
    • Tom Lane's avatar
      gistendscan() forgot to free so->giststate. · 0a6cc285
      Tom Lane authored
      This oversight led to a massive memory leak --- upwards of 10KB per tuple
      --- during creation-time verification of an exclusion constraint based on a
      GIST index.  In most other scenarios it'd just be a leak of 10KB that would
      be recovered at end of query, so not too significant; though perhaps the
      leak would be noticeable in a situation where a GIST index was being used
      in a nestloop inner indexscan.  In any case, it's a real leak of long
      standing, so patch all supported branches.  Per report from Harald Fuchs.
      0a6cc285
  28. Sep 12, 2011
    • Heikki Linnakangas's avatar
      In the final emptying phase of the new GiST buffering build, set the · 8caf6132
      Heikki Linnakangas authored
      queuedForEmptying flag correctly on buffer when adding it to the queue.
      Also, don't add buffer to the queue if it's there already. These were
      harmless oversights; failing to set the flag just means that a buffer might
      get added to the queue twice if more tuples are added to it (although that
      can't actually happen at this point because all the upper buffers have
      already been emptied), and having the same buffer twice in the emptying
      queue is harmless. But better be tidy.
      8caf6132
  29. Sep 08, 2011
  30. Sep 01, 2011
  31. Jul 15, 2011
    • Heikki Linnakangas's avatar
      Change the way the offset of downlink is stored in GISTInsertStack. · 8d260911
      Heikki Linnakangas authored
      GISTInsertStack.childoffnum used to mean "offset of the downlink in this
      node, pointing to the child node in the stack". It's now replaced with
      downlinkoffnum, which means "offset of the downlink in the parent of this
      node". gistFindPath() already used childoffnum with this new meaning, and
      had an extra step at the end to pull all the childoffnum values down one
      node in the stack, to adjust the stack for the meaning that childoffnum had
      elsewhere. That's no longer required.
      
      The reason to do this now is this new representation is more convenient for
      the GiST fast build patch that Alexander Korotkov is working on.
      
      While we're at it, replace the linked list used in gistFindPath with a
      standard List, and make gistFindPath() static.
      
      Alexander Korotkov, with some changes by me.
      8d260911
    • Heikki Linnakangas's avatar
      Fix two ancient bugs in GiST code to re-find a parent after page split: · bc175eb8
      Heikki Linnakangas authored
      First, when following a right-link, we incorrectly marked the current page
      as the parent of the right sibling. In reality, the parent of the right page
      is the same as the parent of the current page (or some page to the right of
      it, gistFindCorrectParent() will sort that out).
      
      Secondly, when we follow a right-link, we must prepend, not append, the right
      page to our list of pages to visit. That's because we assume that once we
      hit a leaf page in the list, all the rest are leaf pages too, and give up.
      
      To hit these bugs, you need concurrent actions and several unlucky accidents.
      Another backend must split the root page, while you're in process of
      splitting a lower-level page. Furthermore, while you scan the internal nodes
      to re-find the parent, another backend needs to again split some more internal
      pages. Even then, the bugs don't necessarily manifest as user-visible errors
      or index corruption.
      
      While we're at it, make the error reporting a bit better if gistFindPath()
      fails to re-find the parent. It used to be an assertion, but an elog() seems
      more appropriate.
      
      Backpatch to all supported branches.
      bc175eb8
  32. Jul 04, 2011
  33. Jun 21, 2011
  34. Jun 09, 2011
  35. May 31, 2011
    • Tom Lane's avatar
      Protect GIST logic that assumes penalty values can't be negative. · 6923d699
      Tom Lane authored
      Apparently sane-looking penalty code might return small negative values,
      for example because of roundoff error.  This will confuse places like
      gistchoose().  Prevent problems by clamping negative penalty values to
      zero.  (Just to be really sure, I also made it force NaNs to zero.)
      Back-patch to all supported branches.
      
      Alexander Korotkov
      6923d699
  36. May 19, 2011
  37. Apr 23, 2011
Loading