Commits · 04aad401867ad3e1519615d8486e32b50dbcb5f5 · Jakob Huber / postgres-lambda-diff

Feb 09, 2017

pageinspect: Fix hash_bitmap_info not to read the underlying page. · fc8219dc

Robert Haas authored 8 years ago

It did that to verify that the page was an overflow page rather than
anything else, but that means that checking the status of all the
overflow bits requires reading the entire index.  So don't do that.
The new code validates that the page is not a primary bucket page
or bitmap page by looking at the metapage, so that using this on
large numbers of pages can be reasonably efficient.

Ashutosh Sharma, per a complaint from me, and with further
modifications by me.

fc8219dc

Feb 07, 2017

Cache hash index's metapage in rel->rd_amcache. · 293e24e5

Robert Haas authored 8 years ago

This avoids a very significant amount of buffer manager traffic and
contention when scanning hash indexes, because it's no longer
necessary to lock and pin the metapage for every scan. We do need
some way of figuring out when the cache is too stale to use any more,
so that when we lock the primary bucket page to which the cached
metapage points us, we can tell whether a split has occurred since we
cached the metapage data. To do that, we use the hash_prevblkno field
in the primary bucket page, which would otherwise always be set to
InvalidBuffer.

This patch contains code so that it will continue working (although
less efficiently) with hash indexes built before this change, but
perhaps we should consider bumping the hash version and ripping out
the compatibility code. That decision can be made later, though.

Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me.
Before committing, I made a number of cosmetic changes to the last
posted version of the patch, adjusted _hash_getcachedmetap to be more
careful about order of operation, and made some necessary updates to
the pageinspect documentation and regression tests.

293e24e5

Feb 03, 2017

pageinspect: More type-sanity surgery on the new hash index code. · 871ec0e3

Robert Haas authored 8 years ago

Uniformly expose unsigned quantities using the next-wider signed
integer type (since we have no unsigned types at the SQL level).
At the SQL level, this results in a change to report itemoffset as
int4 rather than int2. Also at the SQL level, report one value
that is an OID as type oid. Under the hood, uniformly use macros
that match the SQL output type as to both width and signedness.
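
The widening rule described above can be illustrated in plain C, outside PostgreSQL. This is only a sketch: the helper names are hypothetical and are not the macros pageinspect uses. It shows why the next-wider signed type is chosen: every value of the unsigned type stays representable and non-negative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of pageinspect): widen an unsigned value
 * to the next-wider signed type, so that no input can wrap to a negative
 * number.
 */
int32_t
widen_u16(uint16_t v)
{
    return (int32_t) v;         /* 0..65535 always fits in int32 */
}

int64_t
widen_u32(uint32_t v)
{
    return (int64_t) v;         /* 0..4294967295 always fits in int64 */
}

/* Inverse direction: narrowing needs an explicit range check. */
uint16_t
narrow_to_u16(int32_t v)
{
    assert(v >= 0 && v <= UINT16_MAX);
    return (uint16_t) v;
}
```

Under this scheme a 16-bit offset such as 65535 is reported as the 32-bit value 65535, which matches the int2-to-int4 change mentioned for itemoffset.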

871ec0e3

In pageinspect/hashfuncs.c, avoid crashes on alignment-picky machines. · 14e9b18f

Tom Lane authored 8 years ago

On machines with MAXALIGN = 8, the payload of a bytea is not maxaligned,
since it will start 4 bytes into a palloc'd value. On alignment-picky
hardware, this will cause failures in accesses to 8-byte-wide values
within the page. We already encountered this problem when we introduced
GIN index inspection functions, and fixed it in commit 84ad68d6. Make
use of the same function for hash indexes.

A small difficulty is that up to now contrib/pageinspect has not shared
any functions at all across files. To support that, introduce a common
header file "pageinspect.h" for the module.

Also, move get_page_from_raw() out of ginfuncs.c, where it didn't
especially belong, and put it in rawpage.c which seems a more natural home.

Per buildfarm.

Discussion: https://postgr.es/m/17311.1486134714@sss.pgh.pa.us

14e9b18f

pageinspect: Remove platform-dependent values from hash tests. · 29e312bc

Robert Haas authored 8 years ago

Per a report from Tom Lane, the ffactor reported by hash_metapage_info
and the free_size reported by hash_page_stats vary by platform.

Ashutosh Sharma and Robert Haas

29e312bc

Fix a bunch more portability bugs in commit . · c6eeb67d

Tom Lane authored 8 years ago

It seems like somebody used a dartboard while choosing integer widths
for the various values taken and returned by these functions ... and
then threw a fresh set of darts while writing the SQL declarations.

This patch brings the C code into line with what the SQL declarations
say, which is enough to make it not dump core on the particular 32-bit
machine I'm testing on.  But I think we could do with another round
of looking at what the datum widths *should* be.  For instance, it's
not all that sensible that hash_bitmap_info decided to use int64 to
represent a BlockNumber input when get_raw_page doesn't do it that way.

There's also a remaining problem that the expected outputs from the
test script are platform-dependent, but I'll leave that issue for
somebody else.

Per buildfarm.

c6eeb67d

pageinspect: Try to fix some bugs in previous commit. · ed807fda

Robert Haas authored 8 years ago

Commit 08bf6e52 seems not to have
used the correct *GetDatum and PG_GETARG_* macros for the SQL types
in some cases, and some of the SQL types seem to have been poorly
chosen, too.  Try to fix it.  I'm not sure if this is the reason
why the buildfarm is currently unhappy with this code, but it
seems like a good place to start.

Buildfarm unhappiness reported by Tom Lane.

ed807fda

Feb 02, 2017

pageinspect: Support hash indexes. · 08bf6e52

Robert Haas authored 8 years ago

Patch by Jesper Pedersen and Ashutosh Sharma, with some error handling
improvements by me. Tests from Peter Eisentraut. Reviewed by Álvaro
Herrera, Michael Paquier, Jesper Pedersen, Jeff Janes, Peter
Eisentraut, Amit Kapila, Mithun Cy, and me.

Discussion: http://postgr.es/m/e2ac6c58-b93f-9dd9-f4e6-d6d30add7fdf@redhat.com

08bf6e52

Jan 21, 2017
- Move some things from builtins.h to new header files · f21a563d
  Peter Eisentraut authored 8 years ago
  
  This avoids builtins.h having to include additional header files.
  f21a563d
Jan 03, 2017
- Update copyright via script for 2017 · 1d257792
  Bruce Momjian authored 8 years ago
  
  1d257792
Nov 04, 2016

Fix gin_leafpage_items(). · 367b99bb

Tom Lane authored 8 years ago

On closer inspection, commit 84ad68d6 broke gin_leafpage_items(),
because the aligned copy of the page got palloc'd in a short-lived
context whereas it needs to be in the SRF's multi_call_memory_ctx.
This was not exposed by the regression test, because the regression
test doesn't actually exercise the function in a meaningful way.
Fix the code bug, and extend the test in what I hope is a portable
fashion.

367b99bb

pageinspect: Fix unaligned struct access in GIN functions · 84ad68d6

Peter Eisentraut authored 8 years ago

The raw page data that is passed into the functions will not be aligned
at 8-byte boundaries.  Casting that to a struct and accessing int64
fields will result in unaligned access.  On most platforms, you get away
with it, but it will result in a crash on pickier platforms such as ia64
and sparc64.
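
The general remedy for this class of bug can be sketched in standalone C: copy the raw bytes into freshly allocated, suitably aligned memory before treating them as a struct. The struct and function names below are hypothetical stand-ins, and malloc is used in place of PostgreSQL's allocator; this is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a page header with an 8-byte-wide field. */
typedef struct DemoPageHeader
{
    uint64_t    lsn;
    uint16_t    flags;
} DemoPageHeader;

/*
 * Copy 'len' bytes from a possibly misaligned source into malloc'd memory.
 * malloc returns storage aligned for any object type, so the result can be
 * accessed through a struct pointer.  The caller must free the result.
 */
void *
copy_to_aligned(const void *src, size_t len)
{
    void       *dst = malloc(len);

    assert(dst != NULL);
    memcpy(dst, src, len);
    return dst;
}

/* Read the 8-byte field without ever dereferencing the raw pointer. */
uint64_t
read_lsn_from_raw(const unsigned char *raw)
{
    DemoPageHeader *hdr = copy_to_aligned(raw, sizeof(DemoPageHeader));
    uint64_t    lsn = hdr->lsn;

    free(hdr);
    return lsn;
}
```

Casting `raw` directly to `DemoPageHeader *` would be the unaligned access described above whenever `raw` starts 4 bytes into an 8-byte-aligned allocation.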

84ad68d6

Nov 02, 2016

pageinspect: Make page test more portable · 00a86856

Peter Eisentraut authored 8 years ago

Choose test data that makes the output independent of endianness.

00a86856

Fix portability bug in gin_page_opaque_info(). · 14ee3579

Tom Lane authored 8 years ago

Somebody apparently thought that "if Int32GetDatum is good,
Int64GetDatum must be better".  Per buildfarm failures now
that Peter has added some regression tests here.

14ee3579

pageinspect: Make btree test more portable · f7c9a6e0

Peter Eisentraut authored 8 years ago

Choose test data that makes the output independent of endianness and
alignment.

f7c9a6e0

Nov 01, 2016
- pageinspect: Add tests · adfb81d9
  Peter Eisentraut authored 8 years ago
  
  adfb81d9
Oct 26, 2016
- Fix typo in comment. · 8a2f08fb
  Heikki Linnakangas authored 8 years ago
  
  Daniel Gustafsson
  8a2f08fb
Jun 10, 2016
- pgindent run for 9.6 · 4bc424b9
  Robert Haas authored 8 years ago
  
  4bc424b9
Jun 09, 2016
- Update pageinspect extension for parallel query. · e3b607cd
  Robert Haas authored 8 years ago
  
  All functions provided by this extension are PARALLEL SAFE.
  
  Andreas Karlsson
  e3b607cd
May 03, 2016

Tweak a few more things in preparation for upcoming pgindent run. · 8826d850

Robert Haas authored 8 years ago

These adjustments adjust code and comments in minor ways to prevent
pgindent from mangling them.  Among other things, I tried to avoid
situations where pgindent would emit "a +b" instead of "a + b", and I
tried to avoid having it break up inline comments across multiple
lines.

8826d850

May 02, 2016

Remove unused macros. · d22b85fb

Heikki Linnakangas authored 8 years ago

CHECK_PAGE_OFFSET_RANGE() has been unused forever.
CHECK_RELATION_BLOCK_RANGE() has been unused in pgstatindex.c ever since
bt_page_stats() and bt_page_items() functions were moved from pgstattuple
to pageinspect module. It still exists in pageinspect/btreefuncs.c.

Daniel Gustafsson

d22b85fb

Apr 20, 2016

Revert no-op changes to BufferGetPage() · a343e223

Kevin Grittner authored 8 years ago

The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature.  Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming).  The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.

This change should have little or no effect on generated executable
code.

Motivated by the back-patching pain of Tom Lane and Robert Haas

a343e223

Apr 08, 2016

Modify BufferGetPage() to prepare for "snapshot too old" feature · 8b65cf4c

Kevin Grittner authored 8 years ago

This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in.  It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point.  This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third.  The follow-on patch will change the places where the test
needs to be made.

8b65cf4c

Replace printf format %i by %d · 339025c6

Peter Eisentraut authored 8 years ago

See also ce8d7bb6.

339025c6

Fix printf format · 8b737f90

Peter Eisentraut authored 8 years ago

8b737f90

Mar 28, 2016

Add missing checks to some of pageinspect's BRIN functions · 3e133847

Alvaro Herrera authored 9 years ago

brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a few kilobytes from where the input bytea is located.

Have them throw errors if called by a non-superuser.

Report and initial patch: Andreas Seltenreich

Security: CVE-2016-3065

3e133847

Jan 18, 2016

Restructure index access method API to hide most of it at the C level. · 65c5fcd3

Tom Lane authored 9 years ago

This patch reduces pg_am to just two columns, a name and a handler
function.  All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function.  This is similar to
the designs we've adopted for FDWs and tablesample methods.  There
are multiple advantages.  For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.

A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL.  We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.

Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.

65c5fcd3

Jan 02, 2016
- Update copyright for 2016 · ee943004
  Bruce Momjian authored 9 years ago
  
  Backpatch certain files through 9.1
  ee943004
Nov 25, 2015
- Add forgotten file in commit d6061f83 · 0271e27c
  Teodor Sigaev authored 9 years ago
  
  0271e27c
- Improve pageinspect module · d6061f83
  Teodor Sigaev authored 9 years ago
  
  Now pageinspect can show data stored in the heap tuple.
  
  Nikolay Shaplov
  d6061f83
Aug 13, 2015

Use materialize SRF mode in brin_page_items · 94d626ff

Alvaro Herrera authored 9 years ago

This function was using the single-value-per-call mechanism, but the
code relied on a relcache entry that wasn't kept open across calls.
This manifested as weird errors in buildfarm during the short time that
the "brin-1" isolation test lived.

Backpatch to 9.5, where it was introduced.

94d626ff

May 24, 2015
- pgindent run for 9.5 · 807b9e0d
  Bruce Momjian authored 9 years ago
  
  807b9e0d
May 07, 2015

Improve BRIN infra, minmax opclass and regression test · db5f98ab

Alvaro Herrera authored 9 years ago

The minmax opclass was using the wrong support functions when
cross-datatypes queries were run.  Instead of trying to fix the
pg_amproc definitions (which apparently is not possible), use the
already correct pg_amop entries instead.  This requires jumping through
more hoops (read: extra syscache lookups) to obtain the underlying
functions to execute, but it is necessary for correctness.

Author: Emre Hasegeli, tweaked by Álvaro
Review: Andreas Karlsson

Also change BrinOpcInfo to record each stored type's typecache entry
instead of just the OID.  Turns out that the full type cache is
necessary in brin_deform_tuple: the original code used the indexed
type's byval and typlen properties to extract the stored tuple, which is
correct in Minmax; but in other implementations that want to store
something different, that's wrong.  The realization that this is a bug
comes from Emre also, but I did not use his patch.

I also adopted Emre's regression test code (with smallish changes),
which is more complete.

db5f98ab

Mar 10, 2015

Move BRIN page type to page's last two bytes · e491bd2e

Alvaro Herrera authored 10 years ago

... which is the usual convention among AMs, so that pg_filedump and
similar utilities can tell apart pages of different AMs.  It was also
the intent of the original code, but I failed to realize that alignment
considerations would move the whole thing to the previous-to-last word
in the page.
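
The alignment effect described above can be reproduced with simple offset arithmetic. The sketch below assumes an 8192-byte page and 8-byte maximum alignment, and uses a hypothetical two-field struct; none of these names come from the BRIN code, and the real layout may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 8192     /* assumed page size */
#define DEMO_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)

/* Hypothetical special-space struct: only 4 bytes of real content. */
typedef struct DemoSpecial
{
    uint16_t    flags;
    uint16_t    type;
} DemoSpecial;

/*
 * Offset of 'type' when the struct is placed at the start of a special
 * area whose size has been rounded up to the maximum alignment.
 */
size_t
naive_type_offset(void)
{
    size_t      special_start;

    assert(sizeof(DemoSpecial) == 4);   /* sketch assumes no padding */
    special_start = DEMO_PAGE_SIZE - DEMO_MAXALIGN(sizeof(DemoSpecial));
    return special_start + offsetof(DemoSpecial, type);
}

/* Offset at which external tools expect the page type: the last two bytes. */
size_t
last_two_bytes_offset(void)
{
    return DEMO_PAGE_SIZE - sizeof(uint16_t);
}
```

With these assumptions the naive layout puts the type at byte 8186, four bytes short of the conventional position at byte 8190, because the 4-byte struct is padded out to an 8-byte special area.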

The new definition of the associated macro makes surrounding code a bit
leaner, too.

Per note from Heikki at
http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com

e491bd2e

Feb 21, 2015

Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[]. · e1a11d93

Tom Lane authored 10 years ago

This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me

e1a11d93

Feb 20, 2015

Use FLEXIBLE_ARRAY_MEMBER in a bunch more places. · 09d8d110

Tom Lane authored 10 years ago

Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).
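
The sizing rule in the note above can be shown with a small standalone example. It is a sketch with hypothetical names: it writes the C99 flexible array member as `data[]` directly rather than through the FLEXIBLE_ARRAY_MEMBER macro, and it uses malloc rather than PostgreSQL's allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length struct, for illustration only. */
typedef struct DemoVarlen
{
    int         nbytes;
    char        data[];         /* C99 flexible array member */
} DemoVarlen;

/*
 * Allocate a DemoVarlen holding 'n' payload bytes.  The header size comes
 * from offsetof(), not sizeof(), because sizeof(DemoVarlen) may include
 * compiler-dependent trailing padding.
 */
DemoVarlen *
make_varlen(const char *payload, int n)
{
    DemoVarlen *v;

    assert(n >= 0);
    v = malloc(offsetof(DemoVarlen, data) + (size_t) n);
    assert(v != NULL);
    v->nbytes = n;
    memcpy(v->data, payload, (size_t) n);
    return v;
}
```

Note that the size expression names the field itself, `offsetof(DemoVarlen, data)`, not `data[0]`, in line with the Autoconf warning mentioned above.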

Michael Paquier, review and additional fixes by me.

09d8d110

Jan 06, 2015
- Update copyright for 2015 · 4baaf863
  Bruce Momjian authored 10 years ago
  
  Backpatch certain files through 9.0
  4baaf863
Dec 02, 2014
- pageinspect/BRIN: minor tweaks · b52cb469
  Alvaro Herrera authored 10 years ago
  
  Michael Paquier
  
  Double-dash additions suggested by Peter Geoghegan
  b52cb469
Nov 21, 2014
- Add pageinspect functions for inspecting GIN indexes. · 3a82bc6f
  Heikki Linnakangas authored 10 years ago
  
  Patch by me, Peter Geoghegan and Michael Paquier, reviewed by Amit Kapila.
  3a82bc6f
Nov 07, 2014

BRIN: Block Range Indexes · 7516f525

Alvaro Herrera authored 10 years ago

BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  BRIN indexes work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing it with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.
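
The minmax idea can be sketched in a few lines of standalone C. Everything here is hypothetical and greatly simplified (a single int column, an inclusive range qual, no on-disk format); it only illustrates how a per-range minimum and maximum allow a scan to skip ranges and why a match is lossy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one block range: min and max of an int column. */
typedef struct DemoRangeSummary
{
    int         min;
    int         max;
} DemoRangeSummary;

/* Summarize the 'n' values (n > 0) stored in one block range. */
DemoRangeSummary
summarize_range(const int *values, size_t n)
{
    DemoRangeSummary s;
    size_t      i;

    assert(n > 0);
    s.min = s.max = values[0];
    for (i = 1; i < n; i++)
    {
        if (values[i] < s.min)
            s.min = values[i];
        if (values[i] > s.max)
            s.max = values[i];
    }
    return s;
}

/*
 * Could the range contain a value v with lo <= v <= hi?  "false" lets the
 * scan skip the whole range.  "true" is lossy: the range's pages must still
 * be visited and each row rechecked against the qual.
 */
bool
range_may_match(DemoRangeSummary s, int lo, int hi)
{
    return s.min <= hi && s.max >= lo;
}
```

For a range holding 10, 42 and 17, a qual of 43..100 is ruled out, while 20..30 is reported as a possible match even though no stored value falls in that interval, which is why the resulting bitmap is lossy.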

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.

7516f525