  1. Feb 09, 2017
    • pageinspect: Fix hash_bitmap_info not to read the underlying page. · fc8219dc
      Robert Haas authored
      It did that to verify that the page was an overflow page rather than
      anything else, but that means that checking the status of all the
      overflow bits requires reading the entire index.  So don't do that.
      The new code validates that the page is not a primary bucket page
      or bitmap page by looking at the metapage, so that using this on
      large numbers of pages can be reasonably efficient.
      
      Ashutosh Sharma, per a complaint from me, and with further
      modifications by me.
      fc8219dc
  2. Feb 07, 2017
    • Cache hash index's metapage in rel->rd_amcache. · 293e24e5
      Robert Haas authored
      This avoids a very significant amount of buffer manager traffic and
      contention when scanning hash indexes, because it's no longer
      necessary to lock and pin the metapage for every scan.  We do need
      some way of figuring out when the cache is too stale to use any more,
      so that when we lock the primary bucket page to which the cached
      metapage points us, we can tell whether a split has occurred since we
      cached the metapage data.  To do that, we use the hash_prevblkno field
      in the primary bucket page, which would otherwise always be set to
      InvalidBuffer.
      
      This patch contains code so that it will continue working (although
      less efficiently) with hash indexes built before this change, but
      perhaps we should consider bumping the hash version and ripping out
      the compatibility code.  That decision can be made later, though.
      
      Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me.
      Before committing, I made a number of cosmetic changes to the last
      posted version of the patch, adjusted _hash_getcachedmetap to be more
      careful about order of operation, and made some necessary updates to
      the pageinspect documentation and regression tests.
      293e24e5
  3. Feb 03, 2017
    • pageinspect: More type-sanity surgery on the new hash index code. · 871ec0e3
      Robert Haas authored
      Uniformly expose unsigned quantities using the next-wider signed
      integer type (since we have no unsigned types at the SQL level).
      At the SQL level, this results in a change to report itemoffset as
      int4 rather than int2.  Also at the SQL level, report one value
      that is an OID as type oid.  Under the hood, uniformly use macros
      that match the SQL output type as to both width and signedness.
      871ec0e3
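The widening rule in the commit above can be illustrated in plain C. This is a minimal sketch, not the pageinspect code: the function names `widen_uint16` and `widen_uint32` are hypothetical, and the mapping to SQL `int4` and `int8` is an assumption drawn from the "next-wider signed integer type" wording.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 16-bit unsigned quantity to a signed 32-bit value
 * (the width of SQL int4).  Every uint16_t value fits, so the
 * result is never negative.
 */
static int32_t
widen_uint16(uint16_t value)
{
    int32_t result = (int32_t) value;

    assert(result >= 0);
    return result;
}

/*
 * Widen a 32-bit unsigned quantity to a signed 64-bit value
 * (the width of SQL int8).
 */
static int64_t
widen_uint32(uint32_t value)
{
    int64_t result = (int64_t) value;

    assert(result >= 0);
    return result;
}
```

Reporting the value in a signed type of the same width would instead show large unsigned values as negative numbers, which is why the wider type is the safer choice here.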
    • In pageinspect/hashfuncs.c, avoid crashes on alignment-picky machines. · 14e9b18f
      Tom Lane authored
      On machines with MAXALIGN = 8, the payload of a bytea is not maxaligned,
      since it will start 4 bytes into a palloc'd value.  On alignment-picky
      hardware, this will cause failures in accesses to 8-byte-wide values
      within the page.  We already encountered this problem when we introduced
      GIN index inspection functions, and fixed it in commit 84ad68d6.  Make
      use of the same function for hash indexes.
      
      A small difficulty is that up to now contrib/pageinspect has not shared
      any functions at all across files.  To support that, introduce a common
      header file "pageinspect.h" for the module.
      
      Also, move get_page_from_raw() out of ginfuncs.c, where it didn't
      especially belong, and put it in rawpage.c which seems a more natural home.
      
      Per buildfarm.
      
      Discussion: https://postgr.es/m/17311.1486134714@sss.pgh.pa.us
      14e9b18f
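The alignment fix described above amounts to copying the page image into freshly allocated memory before reading it through a struct pointer. The following is a minimal sketch in standalone C, not the actual `get_page_from_raw()` implementation: `SketchPageHeader`, `copy_to_aligned_page`, `read_lsn`, and the 8192-byte page size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192   /* assumed page size for this sketch */

/* Hypothetical page header containing an 8-byte-wide field. */
typedef struct SketchPageHeader
{
    uint64_t    lsn;            /* needs 8-byte alignment on picky hardware */
    uint32_t    flags;
} SketchPageHeader;

/*
 * Copy a page image that may start at any byte offset into newly
 * allocated memory.  malloc() returns storage suitably aligned for any
 * object type, so the copy can be read through a struct pointer.
 * The caller must free() the result.  Returns NULL if allocation fails.
 */
static void *
copy_to_aligned_page(const unsigned char *raw, size_t len)
{
    void       *page;

    assert(raw != NULL);
    assert(len == SKETCH_PAGE_SIZE);

    page = malloc(len);
    if (page != NULL)
        memcpy(page, raw, len);
    return page;
}

/* Read the lsn field from a possibly misaligned page image. */
static uint64_t
read_lsn(const unsigned char *raw, size_t len)
{
    SketchPageHeader *hdr = copy_to_aligned_page(raw, len);
    uint64_t    lsn;

    assert(hdr != NULL);
    lsn = hdr->lsn;
    free(hdr);
    return lsn;
}
```

A caller holding a buffer whose payload starts 4 bytes in would pass `buf + 4` to `read_lsn` rather than casting `buf + 4` to a struct pointer directly.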
    • pageinspect: Remove platform-dependent values from hash tests. · 29e312bc
      Robert Haas authored
      Per a report from Tom Lane, the ffactor reported by hash_metapage_info
      and the free_size reported by hash_page_stats vary by platform.
      
      Ashutosh Sharma and Robert Haas
      29e312bc
    • Fix a bunch more portability bugs in commit 08bf6e52. · c6eeb67d
      Tom Lane authored
      It seems like somebody used a dartboard while choosing integer widths
      for the various values taken and returned by these functions ... and
      then threw a fresh set of darts while writing the SQL declarations.
      
      This patch brings the C code into line with what the SQL declarations
      say, which is enough to make it not dump core on the particular 32-bit
      machine I'm testing on.  But I think we could do with another round
      of looking at what the datum widths *should* be.  For instance, it's
      not all that sensible that hash_bitmap_info decided to use int64 to
      represent a BlockNumber input when get_raw_page doesn't do it that way.
      
      There's also a remaining problem that the expected outputs from the
      test script are platform-dependent, but I'll leave that issue for
      somebody else.
      
      Per buildfarm.
      c6eeb67d
    • pageinspect: Try to fix some bugs in previous commit. · ed807fda
      Robert Haas authored
      Commit 08bf6e52 seems not to have
      used the correct *GetDatum and PG_GETARG_* macros for the SQL types
      in some cases, and some of the SQL types seem to have been poorly
      chosen, too.  Try to fix it.  I'm not sure if this is the reason
      why the buildfarm is currently unhappy with this code, but it
      seems like a good place to start.
      
      Buildfarm unhappiness reported by Tom Lane.
      ed807fda
  4. Feb 02, 2017
  5. Jan 21, 2017
  6. Jan 03, 2017
  7. Nov 04, 2016
    • Fix gin_leafpage_items(). · 367b99bb
      Tom Lane authored
      On closer inspection, commit 84ad68d6 broke gin_leafpage_items(),
      because the aligned copy of the page got palloc'd in a short-lived
      context whereas it needs to be in the SRF's multi_call_memory_ctx.
      This was not exposed by the regression test, because the regression
      test doesn't actually exercise the function in a meaningful way.
      Fix the code bug, and extend the test in what I hope is a portable
      fashion.
      367b99bb
    • pageinspect: Fix unaligned struct access in GIN functions · 84ad68d6
      Peter Eisentraut authored
      The raw page data that is passed into the functions will not be aligned
      at 8-byte boundaries.  Casting that to a struct and accessing int64
      fields will result in unaligned access.  On most platforms, you get away
      with it, but it will result in a crash on pickier platforms such as ia64
      and sparc64.
      84ad68d6
  8. Nov 02, 2016
  9. Nov 01, 2016
  10. Oct 26, 2016
  11. Jun 10, 2016
  12. Jun 09, 2016
  13. May 03, 2016
    • Tweak a few more things in preparation for upcoming pgindent run. · 8826d850
      Robert Haas authored
      These changes adjust code and comments in minor ways to prevent
      pgindent from mangling them.  Among other things, I tried to avoid
      situations where pgindent would emit "a +b" instead of "a + b", and I
      tried to avoid having it break up inline comments across multiple
      lines.
      8826d850
  14. May 02, 2016
    • Remove unused macros. · d22b85fb
      Heikki Linnakangas authored
      CHECK_PAGE_OFFSET_RANGE() has been unused forever.
      CHECK_RELATION_BLOCK_RANGE() has been unused in pgstatindex.c ever since
      bt_page_stats() and bt_page_items() functions were moved from pgstattuple
      to pageinspect module. It still exists in pageinspect/btreefuncs.c.
      
      Daniel Gustafsson
      d22b85fb
  15. Apr 20, 2016
    • Revert no-op changes to BufferGetPage() · a343e223
      Kevin Grittner authored
      The reverted changes were intended to force a choice of whether any
      newly-added BufferGetPage() calls needed to be accompanied by a
      test of the snapshot age, to support the "snapshot too old"
      feature.  Such an accompanying test is needed in about 7% of the
      cases, where the page is being used as part of a scan rather than
      positioning for other purposes (such as DML or vacuuming).  The
      additional effort required for back-patching, and the doubt whether
      the intended benefit would really be there, have indicated it is
      best just to rely on developers to do the right thing based on
      comments and existing usage, as we do with many other conventions.
      
      This change should have little or no effect on generated executable
      code.
      
      Motivated by the back-patching pain of Tom Lane and Robert Haas
      a343e223
  16. Apr 08, 2016
  17. Mar 28, 2016
    • Add missing checks to some of pageinspect's BRIN functions · 3e133847
      Alvaro Herrera authored
      brin_page_type() and brin_metapage_info() did not enforce being called
      by superuser, like other pageinspect functions that take bytea do.
      Since they don't verify the passed page thoroughly, it is possible to
      use them to read the server memory with a carefully crafted bytea value,
      up to a few kilobytes from where the input bytea is located.
      
      Have them throw errors if called by a non-superuser.
      
      Report and initial patch: Andreas Seltenreich
      
      Security: CVE-2016-3065
      3e133847
  18. Jan 18, 2016
    • Restructure index access method API to hide most of it at the C level. · 65c5fcd3
      Tom Lane authored
      This patch reduces pg_am to just two columns, a name and a handler
      function.  All the data formerly obtained from pg_am is now provided
      in a C struct returned by the handler function.  This is similar to
      the designs we've adopted for FDWs and tablesample methods.  There
      are multiple advantages.  For one, the index AM's support functions
      are now simple C functions, making them faster to call and much less
      error-prone, since the C compiler can now check function signatures.
      For another, this will make it far more practical to define index access
      methods in installable extensions.
      
      A disadvantage is that SQL-level code can no longer see attributes
      of index AMs; in particular, some of the crosschecks in the opr_sanity
      regression test are no longer possible from SQL.  We've addressed that
      by adding a facility for the index AM to perform such checks instead.
      (Much more could be done in that line, but for now we're content if the
      amvalidate functions more or less replace what opr_sanity used to do.)
      We might also want to expose some sort of reporting functionality, but
      this patch doesn't do that.
      
      Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
      editorialized on by me.
      65c5fcd3
  19. Jan 02, 2016
  20. Nov 25, 2015
  21. Aug 13, 2015
    • Use materialize SRF mode in brin_page_items · 94d626ff
      Alvaro Herrera authored
      This function was using the single-value-per-call mechanism, but the
      code relied on a relcache entry that wasn't kept open across calls.
      This manifested as weird errors in buildfarm during the short time that
      the "brin-1" isolation test lived.
      
      Backpatch to 9.5, where it was introduced.
      94d626ff
  22. May 24, 2015
  23. May 07, 2015
    • Improve BRIN infra, minmax opclass and regression test · db5f98ab
      Alvaro Herrera authored
      The minmax opclass was using the wrong support functions when
      cross-datatypes queries were run.  Instead of trying to fix the
      pg_amproc definitions (which apparently is not possible), use the
      already correct pg_amop entries instead.  This requires jumping through
      more hoops (read: extra syscache lookups) to obtain the underlying
      functions to execute, but it is necessary for correctness.
      
      Author: Emre Hasegeli, tweaked by Álvaro
      Review: Andreas Karlsson
      
      Also change BrinOpcInfo to record each stored type's typecache entry
      instead of just the OID.  Turns out that the full type cache is
      necessary in brin_deform_tuple: the original code used the indexed
      type's byval and typlen properties to extract the stored tuple, which is
      correct in Minmax; but in other implementations that want to store
      something different, that's wrong.  The realization that this is a bug
      comes from Emre also, but I did not use his patch.
      
      I also adopted Emre's regression test code (with smallish changes),
      which is more complete.
      db5f98ab
  24. Mar 10, 2015
  25. Feb 21, 2015
  26. Feb 20, 2015
    • Use FLEXIBLE_ARRAY_MEMBER in a bunch more places. · 09d8d110
      Tom Lane authored
      Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
      Aside from being more self-documenting, this should help prevent bogus
      warnings from static code analyzers and perhaps compiler misoptimizations.
      
      This patch is just a down payment on eliminating the whole problem, but
      it gets rid of a lot of easy-to-fix cases.
      
      Note that the main problem with doing this is that one must no longer rely
      on computing sizeof(the containing struct), since the result would be
      compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
      also warns against spelling that offsetof(struct, lastfield[0]).
      
      Michael Paquier, review and additional fixes by me.
      09d8d110
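The sizing advice in the commit above (use `offsetof(struct, lastfield)` rather than `sizeof` of the containing struct) can be sketched as follows. This example uses the C99 `[]` syntax directly instead of the `FLEXIBLE_ARRAY_MEMBER` macro, and `SketchVarlena` and `make_sketch_varlena` are hypothetical names, not PostgreSQL types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length struct with a C99 flexible array member. */
typedef struct SketchVarlena
{
    int         len;            /* number of payload bytes in data[] */
    char        data[];         /* payload follows the fixed header */
} SketchVarlena;

/*
 * Allocate a SketchVarlena holding a copy of len bytes from src.
 * The allocation size is computed from offsetof(), not sizeof(),
 * so it does not depend on any trailing padding the compiler adds.
 * Returns NULL if allocation fails; the caller must free() the result.
 */
static SketchVarlena *
make_sketch_varlena(const char *src, int len)
{
    SketchVarlena *v;

    assert(src != NULL && len >= 0);

    v = malloc(offsetof(SketchVarlena, data) + (size_t) len);
    if (v == NULL)
        return NULL;
    v->len = len;
    memcpy(v->data, src, (size_t) len);
    return v;
}
```

With the old `x[1]` declaration, `sizeof` included one array element plus padding, so code that subtracted or added element counts by hand was easy to get wrong; the `offsetof` form states the header size explicitly.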
  27. Jan 06, 2015
  28. Dec 02, 2014
  29. Nov 21, 2014
  30. Nov 07, 2014
    • BRIN: Block Range Indexes · 7516f525
      Alvaro Herrera authored
      BRIN is a new index access method intended to accelerate scans of very
      large tables, without the maintenance overhead of btrees or other
      traditional indexes.  They work by maintaining "summary" data about
      block ranges.  Bitmap index scans work by reading each summary tuple and
      comparing them with the query quals; all pages in the range are returned
      in a lossy TID bitmap if the quals are consistent with the values in the
      summary tuple, otherwise not.  Normal index scans are not supported
      because these indexes do not store TIDs.
      
      As new tuples are added into the index, the summary information is
      updated (if the block range in which the tuple is added is already
      summarized) or not; in the latter case, a subsequent pass of VACUUM or
      the brin_summarize_new_values() function will create the summary
      information.
      
      For data types with natural 1-D sort orders, the summary info consists
      of the maximum and the minimum values of each indexed column within each
      page range.  This type of operator class we call "Minmax", and we
      supply a bunch of them for most data types with B-tree opclasses.
      Since the BRIN code is generalized, other approaches are possible for
      things such as arrays, geometric types, ranges, etc; even for things
      such as enum types we could do something different than minmax with
      better results.  In this commit I only include minmax.
      
      Catalog version bumped due to new builtin catalog entries.
      
      There's more that could be done here, but this is a good step forwards.
      
      Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
      with contribution by Heikki Linnakangas.
      
      Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
      Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
      
      PS:
        The research leading to these results has received funding from the
        European Union's Seventh Framework Programme (FP7/2007-2013) under
        grant agreement n° 318633.
      7516f525
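The "Minmax" idea described in the BRIN commit above can be sketched in a few lines of C. This is an illustrative toy over integer arrays, not the BRIN implementation: `MinMaxSummary`, `summarize_range`, and `range_may_contain` are hypothetical names, and real BRIN summaries are stored as index tuples per block range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Summary of one block range: the smallest and largest value seen. */
typedef struct MinMaxSummary
{
    int         min;
    int         max;
} MinMaxSummary;

/* Build the summary for the n values stored in one block range. */
static MinMaxSummary
summarize_range(const int *values, size_t n)
{
    MinMaxSummary s;
    size_t      i;

    assert(values != NULL && n > 0);

    s.min = s.max = values[0];
    for (i = 1; i < n; i++)
    {
        if (values[i] < s.min)
            s.min = values[i];
        if (values[i] > s.max)
            s.max = values[i];
    }
    return s;
}

/*
 * Return true if an equality search for key must visit this range.
 * The answer is lossy: true means "possibly present", so the rows in
 * the range still have to be rechecked; false means "definitely absent".
 */
static bool
range_may_contain(const MinMaxSummary *s, int key)
{
    return key >= s->min && key <= s->max;
}
```

For a range holding 5, 9, and 7, a search for 6 still reports a possible match even though 6 is absent, which mirrors the lossy bitmap behavior the commit message describes; a search for 10 can skip the range entirely.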