Skip to content
Snippets Groups Projects
  1. Apr 03, 2014
    • Heikki Linnakangas's avatar
      Avoid palloc in critical section in GiST WAL-logging. · 04e298b8
      Heikki Linnakangas authored
      Memory allocation can fail if you run out of memory, and inside a critical
      section that will lead to a PANIC. Use conservatively-sized arrays in stack
      instead.
      
      There was previously no explicit limit on the number of pages a GiST split
      can produce, it was only limited by the number of LWLocks that can be held
      simultaneously (100 at the moment). This patch adds an explicit limit of 75
      pages. That should be plenty, a typical split shouldn't produce more than
      2-3 page halves.
      
      The bug has been there forever, but only backpatch down to 9.1. The code
      was changed significantly in 9.1, and it doesn't seem worth the risk or
      trouble to adapt this for 9.0 and 8.4.
      04e298b8
  2. Jan 07, 2014
  3. Dec 03, 2013
  4. Mar 18, 2013
    • Simon Riggs's avatar
      Remove PageSetTLI and rename pd_tli to pd_checksum · bb7cc262
      Simon Riggs authored
      Remove use of PageSetTLI() from all page manipulation functions
      and adjust README to indicate change in the way we make changes
      to pages. Repurpose those bytes into the pd_checksum field and
      explain how that works in comments about page header.
      
      Refactoring ahead of actual feature patch which would make use
      of the checksum field, arriving later.
      
      Jeff Davis, with comments and doc changes by Simon Riggs
      Direction suggested by Robert Haas; many others providing
      review comments.
      bb7cc262
  5. Feb 11, 2013
    • Heikki Linnakangas's avatar
      Support unlogged GiST index. · 62401db4
      Heikki Linnakangas authored
      The reason this wasn't supported before was that GiST indexes need an
      increasing sequence to detect concurrent page-splits. In a regular WAL-
      logged GiST index, the LSN of the page-split record is used for that
      purpose, and in a temporary index, we can get away with a backend-local
      counter. Neither of those methods works for an unlogged relation.
      
      To provide such an increasing sequence of numbers, create a "fake LSN"
      counter that is saved and restored across shutdowns. On recovery, unlogged
      relations are blown away, so the counter doesn't need to survive that
      either.
      
      Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
      62401db4
  6. Feb 10, 2013
    • Tom Lane's avatar
      Document and clean up gistsplit.c. · 0fd0f368
      Tom Lane authored
      Improve comments, rename some variables and functions, slightly simplify
      a couple of APIs, in an attempt to make this code readable by people other
      than its original author.
      
      Even though this is essentially just cosmetic, back-patch to all active
      branches, because otherwise it's going to make back-patching future fixes
      in this file very painful.
      0fd0f368
  7. Jan 17, 2013
    • Heikki Linnakangas's avatar
      Make GiST indexes on-disk compatible with 9.2 again. · 9ee4d06f
      Heikki Linnakangas authored
      The patch that turned XLogRecPtr into a uint64 inadvertently changed the
      on-disk format of GiST indexes, because the NSN field in the GiST page
      opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that
      field back to the two-field struct that XLogRecPtr was before. This is the
      same we did to LSNs in the page header to avoid changing on-disk format.
      
      Bump catversion, as this invalidates any existing GiST indexes built on
      9.3devel.
      9ee4d06f
  8. Jan 01, 2013
  9. Dec 28, 2012
    • Alvaro Herrera's avatar
      Remove obsolete XLogRecPtr macros · 5ab3af46
      Alvaro Herrera authored
      This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance.
      These were useful for brevity when XLogRecPtrs were split in
      xlogid/xrecoff; but now that they are simple uint64's, they are just
      clutter.  The only downside to making this change would be ease of
      backporting patches, but that has been negated by other substantive
      changes to the involved code anyway.  The clarity of simpler expressions
      makes the change worthwhile.
      
      Most of the changes are mechanical, but in a couple of places, the patch
      author chose to invert the operator sense, making the code flow more
      logical (and more in line with preceding comments).
      
      Author: Andres Freund
      Eyeballed by Dimitri Fontaine and Alvaro Herrera
      5ab3af46
  10. Aug 16, 2012
    • Heikki Linnakangas's avatar
      Fix GiST buffering build bug, which caused "failed to re-find parent" errors. · 89911b3a
      Heikki Linnakangas authored
      We use a hash table to track the parents of inner pages, but when inserting
      to a leaf page, the caller of gistbufferinginserttuples() must pass a
      correct block number of the leaf's parent page. Before gistProcessItup()
      descends to a child page, it checks if the downlink needs to be adjusted to
      accommodate the new tuple, and updates the downlink if necessary. However,
      updating the downlink might require splitting the page, which might move the
      downlink to a page to the right. gistProcessItup() doesn't realize that, so
      when it descends to the leaf page, it might pass an out-of-date parent block
      number as a result. Fix that by returning the block a tuple was inserted to
      from gistbufferinginserttuples().
      
      This fixes the bug reported by Zdeněk Jílovec.
      89911b3a
  11. Jun 24, 2012
    • Heikki Linnakangas's avatar
      Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4
      Heikki Linnakangas authored
      This simplifies code that needs to do arithmetic on XLogRecPtrs.
      
      To avoid changing on-disk format of data pages, the LSN on data pages is
      still stored in the old format. That should keep pg_upgrade happy. However,
      we have XLogRecPtrs embedded in the control file, and in the structs that
      are sent over the replication protocol, so this changes breaks compatibility
      of pg_basebackup and server. I didn't do anything about this in this patch,
      per discussion on -hackers, the right thing to do would to be to change the
      replication protocol to be architecture-independent, so that you could use
      a newer version of pg_receivexlog, for example, against an older server
      version.
      0ab9d1c4
  12. Jun 10, 2012
  13. May 11, 2012
    • Heikki Linnakangas's avatar
      On GiST page split, release the locks on child pages before recursing up. · 3652d72d
      Heikki Linnakangas authored
      When inserting the downlinks for a split gist page, we used hold the locks
      on the child pages until the insertion into the parent - and recursively its
      parent if it had to be split too - were all completed. Change that so that
      the locks on child pages are released after the insertion in the immediate
      parent is done, before recursing further up the tree.
      
      This reduces the number of lwlocks that are held simultaneously. Holding
      many locks is bad for concurrency, and in extreme cases you can even hit
      the limit of 100 simultaneously held lwlocks in a backend. If you're really
      unlucky, you can hit the limit while in a critical section, which brings
      down the whole system.
      
      This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page
      splitting code was rewritten in 9.1, and the old code did not have this
      problem.
      3652d72d
  14. Jan 30, 2012
  15. Jan 02, 2012
  16. Oct 01, 2011
    • Tom Lane's avatar
      Support GiST index support functions that want to cache data across calls. · d22a09dc
      Tom Lane authored
      pg_trgm was already doing this unofficially, but the implementation hadn't
      been thought through very well and leaked memory.  Restructure the core
      GiST code so that it actually works, and document it.  Ordinarily this
      would have required an extra memory context creation/destruction for each
      GiST index search, but I was able to avoid that in the normal case of a
      non-rescanned search by finessing the handling of the RBTree.  It used to
      have its own context always, but now shares a context with the
      scan-lifespan data structures, unless there is more than one rescan call.
      This should make the added overhead unnoticeable in typical cases.
      d22a09dc
  17. Sep 08, 2011
    • Heikki Linnakangas's avatar
      Buffering GiST index build algorithm. · 5edb24a8
      Heikki Linnakangas authored
      When building a GiST index that doesn't fit in cache, buffers are attached
      to some internal nodes in the index. This speeds up the build by avoiding
      random I/O that would otherwise be needed to traverse all the way down the
      tree to the find right leaf page for tuple.
      
      Alexander Korotkov
      5edb24a8
  18. Jul 15, 2011
    • Heikki Linnakangas's avatar
      Change the way the offset of downlink is stored in GISTInsertStack. · 8d260911
      Heikki Linnakangas authored
      GISTInsertStack.childoffnum used to mean "offset of the downlink in this
      node, pointing to the child node in the stack". It's now replaced with
      downlinkoffnum, which means "offset of the downlink in the parent of this
      node". gistFindPath() already used childoffnum with this new meaning, and
      had an extra step at the end to pull all the childoffnum values down one
      node in the stack, to adjust the stack for the meaning that childoffnum had
      elsewhere. That's no longer required.
      
      The reason to do this now is this new representation is more convenient for
      the GiST fast build patch that Alexander Korotkov is working on.
      
      While we're at it, replace the linked list used in gistFindPath with a
      standard List, and make gistFindPath() static.
      
      Alexander Korotkov, with some changes by me.
      8d260911
    • Heikki Linnakangas's avatar
      Fix two ancient bugs in GiST code to re-find a parent after page split: · bc175eb8
      Heikki Linnakangas authored
      First, when following a right-link, we incorrectly marked the current page
      as the parent of the right sibling. In reality, the parent of the right page
      is the same as the parent of the current page (or some page to the right of
      it, gistFindCorrectParent() will sort that out).
      
      Secondly, when we follow a right-link, we must prepend, not append, the right
      page to our list of pages to visit. That's because we assume that once we
      hit a leaf page in the list, all the rest are leaf pages too, and give up.
      
      To hit these bugs, you need concurrent actions and several unlucky accidents.
      Another backend must split the root page, while you're in process of
      splitting a lower-level page. Furthermore, while you scan the internal nodes
      to re-find the parent, another backend needs to again split some more internal
      pages. Even then, the bugs don't necessarily manifest as user-visible errors
      or index corruption.
      
      While we're at it, make the error reporting a bit better if gistFindPath()
      fails to re-find the parent. It used to be an assertion, but an elog() seems
      more appropriate.
      
      Backpatch to all supported branches.
      bc175eb8
  19. Jul 04, 2011
  20. Jun 21, 2011
  21. Jun 09, 2011
  22. May 19, 2011
  23. Apr 23, 2011
  24. Apr 10, 2011
  25. Jan 09, 2011
  26. Jan 01, 2011
  27. Dec 29, 2010
    • Robert Haas's avatar
      Support unlogged tables. · 53dbc27c
      Robert Haas authored
      The contents of an unlogged table are WAL-logged; thus, they are not
      available on standby servers and are truncated whenever the database
      system enters recovery.  Indexes on unlogged tables are also unlogged.
      Unlogged GiST indexes are not currently supported.
      53dbc27c
  28. Dec 23, 2010
    • Heikki Linnakangas's avatar
      Rewrite the GiST insertion logic so that we don't need the post-recovery · 9de3aa65
      Heikki Linnakangas authored
      cleanup stage to finish incomplete inserts or splits anymore. There was two
      reasons for the cleanup step:
      
      1. When a new tuple was inserted to a leaf page, the downlink in the parent
      needed to be updated to contain (ie. to be consistent with) the new key.
      Updating the parent in turn might require recursively updating the parent of
      the parent. We now handle that by updating the parent while traversing down
      the tree, so that when we insert the leaf tuple, all the parents are already
      consistent with the new key, and the tree is consistent at every step.
      
      2. When a page is split, we need to insert the downlink for the new right
      page(s), and update the downlink for the original page to not include keys
      that moved to the right page(s). We now handle that by setting a new flag,
      F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
      set, scans always follow the rightlink, regardless of the NSN mechanism used
      to detect concurrent page splits. That way the tree is consistent right after
      split, even though the downlink is still missing. This is very similar to the
      way B-tree splits are handled. When the downlink is inserted in the parent,
      the flag is cleared. To keep the insertion algorithm simple, when an
      insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
      finishes the split before doing anything else.
      
      These changes allow removing the whole "invalid tuple" mechanism, but I
      retained the scan code to still follow invalid tuples correctly. While we
      don't create any such tuples anymore, we want to handle them gracefully in
      case you pg_upgrade a GiST index that has them. If we encounter any on an
      insert, though, we just throw an error saying that you need to REINDEX.
      
      The issue that got me into doing this is that if you did a checkpoint while
      an insert or split was in progress, and the checkpoint finishes quickly so
      that there is no WAL record related to the insert between RedoRecPtr and the
      checkpoint record, recovery from that checkpoint would not know to finish
      the incomplete insert. IOW, we have the same issue we solved with the
      rm_safe_restartpoint mechanism during normal operation too. It's highly
      unlikely to happen in practice, and this fix is far too large to backpatch,
      so we're just going to live with in previous versions, but this refactoring
      fixes it going forward.
      
      With this patch, you don't get the annoying
      'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
      anymore if you crash at an unfortunate moment.
      9de3aa65
  29. Dec 13, 2010
    • Robert Haas's avatar
      Generalize concept of temporary relations to "relation persistence". · 5f7b58fa
      Robert Haas authored
      This commit replaces pg_class.relistemp with pg_class.relpersistence;
      and also modifies the RangeVar node type to carry relpersistence rather
      than istemp.  It also removes removes rd_istemp from RelationData and
      instead performs the correct computation based on relpersistence.
      
      For clarity, we add three new macros: RelationNeedsWAL(),
      RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we
      can clarify the purpose of each check that previous depended on
      rd_istemp.
      
      This is intended as infrastructure for the upcoming unlogged tables
      patch, as well as for future possible work on global temporary tables.
      5f7b58fa
  30. Dec 04, 2010
    • Tom Lane's avatar
      KNNGIST, otherwise known as order-by-operator support for GIST. · 55450687
      Tom Lane authored
      This commit represents a rather heavily editorialized version of
      Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
      patches.  I redid the opclass API to add a separate Distance method
      instead of turning the Consistent method into an illogical mess,
      fixed some bit-rot in the rbtree interfaces, and generally worked over
      the code style and comments.
      
      There's still no non-code documentation to speak of, but I'll work on
      that separately.  Some contrib-module changes are also yet to come
      (right now, point <-> point is the only KNN-ified operator).
      
      Teodor Sigaev and Tom Lane
      55450687
  31. Nov 16, 2010
    • Heikki Linnakangas's avatar
      The GiST scan algorithm uses LSNs to detect concurrent pages splits, but · 2edc5cd4
      Heikki Linnakangas authored
      temporary indexes are not WAL-logged. We used a constant LSN for temporary
      indexes, on the assumption that we don't need to worry about concurrent page
      splits in temporary indexes because they're only visible to the current
      session. But that assumption is wrong, it's possible to insert rows and
      split pages in the same session, while a scan is in progress. For example,
      by opening a cursor and fetching some rows, and INSERTing new rows before
      fetching some more.
      
      Fix by generating fake increasing LSNs, used in place of real LSNs in
      temporary GiST indexes.
      2edc5cd4
  32. Sep 20, 2010
  33. Jan 02, 2010
  34. Jul 29, 2009
    • Tom Lane's avatar
      Support deferrable uniqueness constraints. · 25d9bf2e
      Tom Lane authored
      The current implementation fires an AFTER ROW trigger for each tuple that
      looks like it might be non-unique according to the index contents at the
      time of insertion.  This works well as long as there aren't many conflicts,
      but won't scale to massive unique-key reassignments.  Improving that case
      is a TODO item.
      
      Dean Rasheed
      25d9bf2e
  35. Jan 01, 2009
  36. Nov 19, 2008
    • Heikki Linnakangas's avatar
      Rethink the way FSM truncation works. Instead of WAL-logging FSM · 33960006
      Heikki Linnakangas authored
      truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
      make that cleaner from modularity point of view, move the WAL-logging one
      level up to RelationTruncate, and move RelationTruncate and all the
      related WAL-logging to new src/backend/catalog/storage.c file. Introduce
      new RelationCreateStorage and RelationDropStorage functions that are used
      instead of calling smgrcreate/smgrscheduleunlink directly. Move the
      pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
      functions. This leaves smgr.c as a thin wrapper around md.c; all the
      transactional stuff is now in storage.c.
      
      This will make it easier to add new forks with similar truncation logic,
      like the visibility map.
      33960006
  37. Nov 13, 2008
    • Tom Lane's avatar
      Prevent synchronous scan during GIN index build, because GIN is optimized · 10e3acb8
      Tom Lane authored
      for inserting tuples in increasing TID order.  It's not clear whether this
      fully explains Ivan Sergio Borgonovo's complaint, but simple testing
      confirms that a scan that doesn't start at block 0 can slow GIN build by
      a factor of three or four.
      
      Backpatch to 8.3.  Sync scan didn't exist before that.
      10e3acb8
  38. Nov 03, 2008
  39. Sep 30, 2008
    • Heikki Linnakangas's avatar
      Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the · 15c121b3
      Heikki Linnakangas authored
      free space information is stored in a dedicated FSM relation fork, with each
      relation (except for hash indexes; they don't use FSM).
      
      This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
      trace of them from the backend, initdb, and documentation.
      
      Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
      introduce a new variant of the get_raw_page(regclass, int4, int4) function in
      contrib/pageinspect that let's you to return pages from any relation fork, and
      a new fsm_page_contents() function to inspect the new FSM pages.
      15c121b3
Loading