Skip to content
Snippets Groups Projects
  1. Mar 26, 2015
    • Heikki Linnakangas's avatar
      Fix GiST index-only scans for opclasses with different storage type. · 55b59eda
      Heikki Linnakangas authored
      We cannot use the index's tuple descriptor directly to describe the index
      tuples returned in an index-only scan. That's because the index might use
      a different datatype for the values stored on disk than the type originally
      indexed. As long as they were both pass-by-ref, it worked, but will not work
      for pass-by-value types of different sizes. I noticed this as a crash when I
      started hacking a patch to add fetch methods to btree_gist.
      55b59eda
    • Heikki Linnakangas's avatar
      Add support for index-only scans in GiST. · d04c8ed9
      Heikki Linnakangas authored
      This adds a new GiST opclass method, 'fetch', which is used to reconstruct
      the original Datum from the value stored in the index. Also, the 'canreturn'
      index AM interface function gains a new 'attno' argument. That makes it
      possible to use index-only scans on a multi-column index where some of the
      opclasses support index-only scans but some do not.
      
      This patch adds support in the box and point opclasses. Other opclasses
      can added later as follow-on patches (btree_gist would be particularly
      interesting).
      
      Anastasia Lubennikova, with additional fixes and modifications by me.
      d04c8ed9
    • Heikki Linnakangas's avatar
      Minor cleanup of GiST code, for readability. · 8fa393a6
      Heikki Linnakangas authored
      Remove the gistcentryinit function, inlining the relevant part of it into
      the only caller.
      8fa393a6
  2. Jan 06, 2015
  3. May 06, 2014
    • Bruce Momjian's avatar
      pgindent run for 9.4 · 0a783200
      Bruce Momjian authored
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not effect backpatching.
      0a783200
  4. Jan 07, 2014
  5. May 29, 2013
  6. Feb 11, 2013
    • Heikki Linnakangas's avatar
      Support unlogged GiST index. · 62401db4
      Heikki Linnakangas authored
      The reason this wasn't supported before was that GiST indexes need an
      increasing sequence to detect concurrent page-splits. In a regular WAL-
      logged GiST index, the LSN of the page-split record is used for that
      purpose, and in a temporary index, we can get away with a backend-local
      counter. Neither of those methods works for an unlogged relation.
      
      To provide such an increasing sequence of numbers, create a "fake LSN"
      counter that is saved and restored across shutdowns. On recovery, unlogged
      relations are blown away, so the counter doesn't need to survive that
      either.
      
      Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
      62401db4
  7. Feb 07, 2013
    • Tom Lane's avatar
      Repair bugs in GiST page splitting code for multi-column indexes. · 166d534f
      Tom Lane authored
      When considering a non-last column in a multi-column GiST index,
      gistsplit.c tries to improve on the split chosen by the opclass-specific
      pickSplit function by considering penalties for the next column.  However,
      there were two bugs in this code: it failed to recompute the union keys for
      the leftmost index columns, even though these might well change after
      reassigning tuples; and it included the old union keys in the recomputation
      for the columns it did recompute, so that those keys couldn't get smaller
      even if they should.  The first problem could result in an invalid index
      in which searches wouldn't find index entries that are in fact present;
      the second would make the index less efficient to search.
      
      Both of these errors were caused by misuse of gistMakeUnionItVec, whose
      API was designed in a way that just begged such errors to be made.  There
      is no situation in which it's safe or useful to compute the union keys for
      a subset of the index columns, and there is no caller that wants any
      previous union keys to be included in the computation; so the undocumented
      choice to treat the union keys as in/out rather than pure output parameters
      is a waste of code as well as being dangerous.
      
      Hence, rather than just making a minimal patch, I've changed the API of
      gistMakeUnionItVec to remove the "startkey" parameter (it now always
      processes all index columns) and treat the attr/isnull arrays as purely
      output parameters.
      
      In passing, also get rid of a couple of unnecessary and dangerous uses
      of static variables in gistutil.c.  It's remarkable that the one in
      gistMakeUnionKey hasn't given us portability troubles before now, because
      in addition to posing a re-entrancy hazard, it was unsafely assuming that
      a static char[] array would have at least Datum alignment.
      
      Per investigation of a trouble report from Tomas Vondra.  (There are also
      some bugs in contrib/btree_gist to be fixed, but that seems like material
      for a separate patch.)  Back-patch to all supported branches.
      166d534f
  8. Jan 25, 2013
    • Heikki Linnakangas's avatar
      Add some randomness to the choice of which GiST page to insert to. · ba1cc650
      Heikki Linnakangas authored
      When descending the tree for an insert, and there are multiple equally good
      pages we could insert to, make the choice in random. Previously, we would
      always choose the tuple with lowest offset number. That meant that when two
      non-leaf pages overlap - in the extreme case they might have exactly the same
      key - all but the first such page went unused. That wasn't optimal for space
      usage; if you deleted some tuples from the non-first pages, the space would
      never be reused.
      
      With this patch, the other pages are sometimes chosen too, although there's
      still a heavy bias towards low-offset tuples, so that we don't lose cache
      locality when doing a lot of inserts with similar keys.
      
      Original idea by Alexander Korotkov, although this patch version was written
      by me and copy-edited by Tom Lane.
      ba1cc650
  9. Jan 01, 2013
  10. Aug 31, 2012
    • Tom Lane's avatar
      Improve coding of gistchoose and gistRelocateBuildBuffersOnSplit. · e5db11c5
      Tom Lane authored
      This is mostly cosmetic, but it does eliminate a speculative portability
      issue.  The previous coding ignored the fact that sum_grow could easily
      overflow (in fact, it could be summing multiple IEEE float infinities).
      On a platform where that didn't guarantee to produce a positive result,
      the code would misbehave.  In any case, it was less than readable.
      e5db11c5
  11. Aug 30, 2012
    • Robert Haas's avatar
      Fix logic bug in gistchoose and gistRelocateBuildBuffersOnSplit. · c8ba697a
      Robert Haas authored
      Every time the best-tuple-found-so-far changes, we need to reset all
      the penalty values in which_grow[] to the penalties for the new best
      tuple.  The old code failed to do this, resulting in inferior index
      quality.
      
      The original patch from Alexander Korotkov was just two lines; I took
      the liberty of fleshing that out by adding a bunch of comments that I
      hope will make this logic easier for others to understand than it was
      for me.
      c8ba697a
  12. Jun 24, 2012
    • Heikki Linnakangas's avatar
      Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4
      Heikki Linnakangas authored
      This simplifies code that needs to do arithmetic on XLogRecPtrs.
      
      To avoid changing on-disk format of data pages, the LSN on data pages is
      still stored in the old format. That should keep pg_upgrade happy. However,
      we have XLogRecPtrs embedded in the control file, and in the structs that
      are sent over the replication protocol, so this changes breaks compatibility
      of pg_basebackup and server. I didn't do anything about this in this patch,
      per discussion on -hackers, the right thing to do would to be to change the
      replication protocol to be architecture-independent, so that you could use
      a newer version of pg_receivexlog, for example, against an older server
      version.
      0ab9d1c4
  13. Jan 30, 2012
  14. Jan 02, 2012
  15. Nov 27, 2011
  16. Sep 08, 2011
    • Heikki Linnakangas's avatar
      Buffering GiST index build algorithm. · 5edb24a8
      Heikki Linnakangas authored
      When building a GiST index that doesn't fit in cache, buffers are attached
      to some internal nodes in the index. This speeds up the build by avoiding
      random I/O that would otherwise be needed to traverse all the way down the
      tree to the find right leaf page for tuple.
      
      Alexander Korotkov
      5edb24a8
  17. Sep 01, 2011
  18. Jun 09, 2011
  19. May 31, 2011
    • Tom Lane's avatar
      Protect GIST logic that assumes penalty values can't be negative. · 6923d699
      Tom Lane authored
      Apparently sane-looking penalty code might return small negative values,
      for example because of roundoff error.  This will confuse places like
      gistchoose().  Prevent problems by clamping negative penalty values to
      zero.  (Just to be really sure, I also made it force NaNs to zero.)
      Back-patch to all supported branches.
      
      Alexander Korotkov
      6923d699
  20. Apr 23, 2011
  21. Apr 10, 2011
  22. Jan 01, 2011
  23. Dec 23, 2010
    • Heikki Linnakangas's avatar
      Rewrite the GiST insertion logic so that we don't need the post-recovery · 9de3aa65
      Heikki Linnakangas authored
      cleanup stage to finish incomplete inserts or splits anymore. There was two
      reasons for the cleanup step:
      
      1. When a new tuple was inserted to a leaf page, the downlink in the parent
      needed to be updated to contain (ie. to be consistent with) the new key.
      Updating the parent in turn might require recursively updating the parent of
      the parent. We now handle that by updating the parent while traversing down
      the tree, so that when we insert the leaf tuple, all the parents are already
      consistent with the new key, and the tree is consistent at every step.
      
      2. When a page is split, we need to insert the downlink for the new right
      page(s), and update the downlink for the original page to not include keys
      that moved to the right page(s). We now handle that by setting a new flag,
      F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
      set, scans always follow the rightlink, regardless of the NSN mechanism used
      to detect concurrent page splits. That way the tree is consistent right after
      split, even though the downlink is still missing. This is very similar to the
      way B-tree splits are handled. When the downlink is inserted in the parent,
      the flag is cleared. To keep the insertion algorithm simple, when an
      insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
      finishes the split before doing anything else.
      
      These changes allow removing the whole "invalid tuple" mechanism, but I
      retained the scan code to still follow invalid tuples correctly. While we
      don't create any such tuples anymore, we want to handle them gracefully in
      case you pg_upgrade a GiST index that has them. If we encounter any on an
      insert, though, we just throw an error saying that you need to REINDEX.
      
      The issue that got me into doing this is that if you did a checkpoint while
      an insert or split was in progress, and the checkpoint finishes quickly so
      that there is no WAL record related to the insert between RedoRecPtr and the
      checkpoint record, recovery from that checkpoint would not know to finish
      the incomplete insert. IOW, we have the same issue we solved with the
      rm_safe_restartpoint mechanism during normal operation too. It's highly
      unlikely to happen in practice, and this fix is far too large to backpatch,
      so we're just going to live with in previous versions, but this refactoring
      fixes it going forward.
      
      With this patch, you don't get the annoying
      'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
      anymore if you crash at an unfortunate moment.
      9de3aa65
  24. Nov 16, 2010
    • Heikki Linnakangas's avatar
      The GiST scan algorithm uses LSNs to detect concurrent pages splits, but · 2edc5cd4
      Heikki Linnakangas authored
      temporary indexes are not WAL-logged. We used a constant LSN for temporary
      indexes, on the assumption that we don't need to worry about concurrent page
      splits in temporary indexes because they're only visible to the current
      session. But that assumption is wrong, it's possible to insert rows and
      split pages in the same session, while a scan is in progress. For example,
      by opening a cursor and fetching some rows, and INSERTing new rows before
      fetching some more.
      
      Fix by generating fake increasing LSNs, used in place of real LSNs in
      temporary GiST indexes.
      2edc5cd4
  25. Sep 20, 2010
  26. Jan 02, 2010
  27. Jun 11, 2009
  28. Jan 05, 2009
  29. Jan 01, 2009
  30. Sep 30, 2008
    • Heikki Linnakangas's avatar
      Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the · 15c121b3
      Heikki Linnakangas authored
      free space information is stored in a dedicated FSM relation fork, with each
      relation (except for hash indexes; they don't use FSM).
      
      This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
      trace of them from the backend, initdb, and documentation.
      
      Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
      introduce a new variant of the get_raw_page(regclass, int4, int4) function in
      contrib/pageinspect that let's you to return pages from any relation fork, and
      a new fsm_page_contents() function to inspect the new FSM pages.
      15c121b3
  31. Jul 13, 2008
    • Tom Lane's avatar
      Clean up the use of some page-header-access macros: principally, use · 9d035f42
      Tom Lane authored
      SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
      makes the code clearer, and avoid casting between Page and PageHeader where
      possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.
      
      I did not apply the parts of the proposed patch that would have resulted in
      slightly changing the on-disk format of hash indexes; it seems to me that's
      not a win as long as there's any chance of having in-place upgrade for 8.4.
      9d035f42
  32. Jun 19, 2008
  33. Jun 15, 2008
  34. Jun 12, 2008
    • Heikki Linnakangas's avatar
      Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation · a213f1ee
      Heikki Linnakangas authored
      forks. XLogOpenRelation() and the associated light-weight relation cache in
      xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
      instead of Relation.
      
      For functions that still need a Relation struct during WAL replay, there's a
      new function called CreateFakeRelcacheEntry() that returns a fake entry like
      XLogOpenRelation() used to.
      a213f1ee
  35. May 12, 2008
    • Alvaro Herrera's avatar
      Restructure some header files a bit, in particular heapam.h, by removing some · f8c4d7db
      Alvaro Herrera authored
      unnecessary #include lines in it.  Also, move some tuple routine prototypes and
      macros to htup.h, which allows removal of heapam.h inclusion from some .c
      files.
      
      For this to work, a new header file access/sysattr.h needed to be created,
      initially containing attribute numbers of system columns, for pg_dump usage.
      
      While at it, make contrib ltree, intarray and hstore header files more
      consistent with our header style.
      f8c4d7db
  36. Jan 01, 2008
  37. Sep 20, 2007
    • Tom Lane's avatar
      HOT updates. When we update a tuple without changing any of its indexed · 282d2a03
      Tom Lane authored
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
      282d2a03
  38. Sep 13, 2007
    • Tom Lane's avatar
      Redefine the lp_flags field of item pointers as having four states, rather · 68893035
      Tom Lane authored
      than two independent bits (one of which was never used in heap pages anyway,
      or at least hadn't been in a very long time).  This gives us flexibility to
      add the HOT notions of redirected and dead item pointers without requiring
      anything so klugy as magic values of lp_off and lp_len.  The state values
      are chosen so that for the states currently in use (pre-HOT) there is no
      change in the physical representation.
      68893035
Loading