  1. Dec 16, 2016
    • Robert Haas's avatar
      Fix more hash index bugs around marking buffers dirty. · 6a4fe112
      Robert Haas authored
      In _hash_freeovflpage(), if we're freeing the overflow page that
      immediately follows the page to which tuples are being moved (the
      confusingly-named "write buffer"), don't forget to mark that
      page dirty after updating its hasho_nextblkno.
      
      In _hash_squeezebucket(), it's not necessary to mark the primary
      bucket page dirty if there are no overflow pages, because there's
      nothing to squeeze in that case.
      
      Amit Kapila, with help from Kuntal Ghosh and Dilip Kumar, after
      an initial trouble report by Jeff Janes.
      6a4fe112
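
      A minimal sketch of the bufmgr convention the fix above restores (the
      helper name and arguments are illustrative, not code from the patch):
      the site that modifies a page must itself call MarkBufferDirty().

          #include "postgres.h"
          #include "access/hash.h"
          #include "storage/bufmgr.h"

          /* Unlink a freed overflow page from the page that precedes it. */
          static void
          unlink_overflow_page(Buffer prevbuf, BlockNumber nextblkno)
          {
              Page            prevpage = BufferGetPage(prevbuf);
              HashPageOpaque  prevopaque = (HashPageOpaque) PageGetSpecialPointer(prevpage);

              /* caller holds an exclusive content lock on prevbuf */
              prevopaque->hasho_nextblkno = nextblkno;

              /* the step the bug omitted: record that the page was changed */
              MarkBufferDirty(prevbuf);
          }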
    • Robert Haas's avatar
      Remove _hash_wrtbuf() in favor of calling MarkBufferDirty(). · 25216c98
      Robert Haas authored
      The whole concept of _hash_wrtbuf() is that we need to know at the
      time we're releasing the buffer lock (and pin) whether we dirtied the
      buffer, but this is easy to get wrong.  This patch actually fixes one
      non-obvious bug of that form: hashbucketcleanup forgot to signal
      _hash_squeezebucket, which gets the primary bucket page already
      locked, as to whether it had already dirtied the page.  Calling
      MarkBufferDirty() at the places where we dirty the buffer is more
      intuitive and lets us simplify the code in various places as well.
      
      On top of all that, the ultimate goal here is to make hash indexes
      WAL-logged, and as the comments to _hash_wrtbuf() note, it should
      go away when that happens.  Making it go away a little earlier than
      that seems like a good preparatory step.
      
      Report by Jeff Janes.  Diagnosis by Amit Kapila, Kuntal Ghosh,
      and Dilip Kumar.  Patch by me, after studying an alternative patch
      submitted by Amit Kapila.
      
      Discussion: http://postgr.es/m/CAA4eK1Kf6tOY0oVz_SEdngiNFkeXrA3xUSDPPORQvsWVPdKqnA@mail.gmail.com
      25216c98
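
      A hedged sketch of the convention this commit adopts (the function and
      the particular page change are illustrative): dirty the buffer at the
      exact place the page is modified, instead of threading a "did we dirty
      it?" flag down to a later _hash_wrtbuf()-style write call.

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include "storage/bufpage.h"

          static void
          modify_page_example(Buffer buf)
          {
              Page    page = BufferGetPage(buf);

              LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

              /* ... change something on the page ... */
              ((PageHeader) page)->pd_flags |= PD_PAGE_FULL;      /* illustrative */

              MarkBufferDirty(buf);               /* record the fact immediately */
              LockBuffer(buf, BUFFER_LOCK_UNLOCK);
          }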
  2. Nov 30, 2016
    • Robert Haas's avatar
      Improve hash index bucket split behavior. · 6d46f478
      Robert Haas authored
      Previously, the right to split a bucket was represented by a
      heavyweight lock on the page number of the primary bucket page.
      Unfortunately, this meant that every scan needed to take a heavyweight
      lock on that bucket also, which was bad for concurrency.  Instead, use
      a cleanup lock on the primary bucket page to indicate the right to
      begin a split, so that scans only need to retain a pin on that page,
      which they would have to acquire anyway, and which is also much
      cheaper.
      
      In addition to reducing the locking cost, this also avoids locking out
      scans and inserts for the entire lifetime of the split: while the new
      bucket is being populated with copies of the appropriate tuples from
      the old bucket, scans and inserts can happen in parallel.  There are
      minor concurrency improvements for vacuum operations as well, though
      the situation there is still far from ideal.
      
      This patch also removes the unworldly assumption that a split will
      never be interrupted.  With the new code, a split is done in a series
      of small steps and the system can pick up where it left off if it is
      interrupted prior to completion.  While this patch does not itself add
      write-ahead logging for hash indexes, it is clearly a necessary first
      step, since one of the things that could interrupt a split is the
      removal of electrical power from the machine performing it.
      
      Amit Kapila.  I wrote the original design on which this patch is
      based, and did a good bit of work on the comments and README through
      multiple rounds of review, but all of the code is Amit's.  Also
      reviewed by Jesper Pedersen, Jeff Janes, and others.
      
      Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com
      6d46f478
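
      A hedged sketch of the locking idea (the function is illustrative; the
      bufmgr calls are real): the splitter needs a cleanup lock, i.e. an
      exclusive lock on a buffer it alone has pinned, while scans merely keep
      their pin on the primary bucket page.

          #include "postgres.h"
          #include "storage/bufmgr.h"

          /*
           * Try to claim the right to split a bucket.  A cleanup lock is only
           * granted when no other backend holds a pin on the page, so a scan
           * that retains its pin is enough to hold off a split.
           */
          static bool
          try_begin_split(Buffer bucket_buf)
          {
              if (!ConditionalLockBufferForCleanup(bucket_buf))
                  return false;           /* someone is scanning; retry later */

              /* ... perform (or resume) the split in small steps ... */

              LockBuffer(bucket_buf, BUFFER_LOCK_UNLOCK);
              return true;
          }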
  3. Nov 08, 2016
    • Robert Haas's avatar
      Improve handling of dead tuples in hash indexes. · f0e72a25
      Robert Haas authored
      When squeezing a bucket during vacuum, it's not necessary to retain
      any tuples already marked as dead, so ignore them when deciding which
      tuples must be moved in order to empty a bucket page.  Similarly, when
      splitting a bucket, relocating dead tuples to the new bucket is a
      waste of effort; instead, just ignore them.
      
      Amit Kapila, reviewed by me.  Testing help provided by Ashutosh
      Sharma.
      f0e72a25
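
      A minimal sketch of the "ignore dead tuples" idea (the loop and helper
      are illustrative): line pointers already flagged LP_DEAD are simply not
      collected when deciding what needs to be moved.

          #include "postgres.h"
          #include "access/itup.h"
          #include "storage/bufpage.h"
          #include "storage/itemid.h"

          /* Collect the tuples on 'page' that actually need to be relocated. */
          static int
          collect_live_tuples(Page page, IndexTuple *live, int maxitems)
          {
              OffsetNumber    maxoff = PageGetMaxOffsetNumber(page);
              OffsetNumber    off;
              int             n = 0;

              for (off = FirstOffsetNumber; off <= maxoff && n < maxitems; off++)
              {
                  ItemId  itemid = PageGetItemId(page, off);

                  if (ItemIdIsDead(itemid))
                      continue;       /* no point in moving an already-dead tuple */

                  live[n++] = (IndexTuple) PageGetItem(page, itemid);
              }
              return n;
          }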
  4. Apr 20, 2016
    • Kevin Grittner's avatar
      Revert no-op changes to BufferGetPage() · a343e223
      Kevin Grittner authored
      The reverted changes were intended to force a choice of whether any
      newly-added BufferGetPage() calls needed to be accompanied by a
      test of the snapshot age, to support the "snapshot too old"
      feature.  Such an accompanying test is needed in about 7% of the
      cases, where the page is being used as part of a scan rather than
      positioning for other purposes (such as DML or vacuuming).  The
      additional effort required for back-patching, and the doubt whether
      the intended benefit would really be there, have indicated it is
      best just to rely on developers to do the right thing based on
      comments and existing usage, as we do with many other conventions.
      
      This change should have little or no effect on generated executable
      code.
      
      Motivated by the back-patching pain of Tom Lane and Robert Haas
      a343e223
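
      With the revert, the resulting convention looks like the sketch below
      (an assumption based on the two commits' descriptions: BufferGetPage()
      keeps its single-argument form, and releases that carry the "snapshot
      too old" feature provide TestForOldSnapshot() for the roughly 7% of
      call sites that feed a scan):

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include "utils/rel.h"
          #include "utils/snapshot.h"

          /* Scan-side reads add an explicit age check; utility reads do not. */
          static Page
          get_page_for_scan(Buffer buf, Relation rel, Snapshot snapshot)
          {
              Page    page = BufferGetPage(buf);

              /* only where page contents feed a user-visible scan */
              TestForOldSnapshot(snapshot, rel, page);
              return page;
          }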
  5. Apr 08, 2016
    • Kevin Grittner's avatar
      Modify BufferGetPage() to prepare for "snapshot too old" feature · 8b65cf4c
      Kevin Grittner authored
      This patch is a no-op patch which is intended to reduce the chances
      of failures of omission once the functional part of the "snapshot
      too old" patch goes in.  It adds parameters for snapshot, relation,
      and an enum to specify whether the snapshot age check needs to be
      done for the page at this point.  This initial patch passes NULL
      for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
      third.  The follow-on patch will change the places where the test
      needs to be made.
      8b65cf4c
  6. Jan 02, 2016
  7. Jan 06, 2015
  8. May 06, 2014
    • Bruce Momjian's avatar
      pgindent run for 9.4 · 0a783200
      Bruce Momjian authored
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not affect backpatching.
      0a783200
  9. Jan 07, 2014
  10. Jan 01, 2013
  11. Jun 10, 2012
  12. Mar 21, 2012
  13. Jan 02, 2012
  14. Sep 01, 2011
  15. Jan 01, 2011
  16. Dec 29, 2010
    • Robert Haas's avatar
      Support unlogged tables. · 53dbc27c
      Robert Haas authored
      The contents of an unlogged table are not WAL-logged; thus, they are not
      available on standby servers and are truncated whenever the database
      system enters recovery.  Indexes on unlogged tables are also unlogged.
      Unlogged GiST indexes are not currently supported.
      53dbc27c
  17. Sep 20, 2010
  18. Feb 26, 2010
  19. Jan 02, 2010
  20. Nov 01, 2009
    • Tom Lane's avatar
      Fix two serious bugs introduced into hash indexes by the 8.4 patch that made · c4afdca4
      Tom Lane authored
      hash indexes keep entries sorted by hash value.  First, the original plans for
      concurrency assumed that insertions would happen only at the end of a page,
      which is no longer true; this could cause scans to transiently fail to find
      index entries in the presence of concurrent insertions.  We can compensate
      by teaching scans to re-find their position after re-acquiring read locks.
      Second, neither the bucket split nor the bucket compaction logic had been
      fixed to preserve hashvalue ordering, so application of either of those
      processes could lead to permanent corruption of an index, in the sense
      that searches might fail to find entries that are present.
      
      This patch fixes the split and compaction logic to preserve hashvalue
      ordering, but it cannot do anything about pre-existing corruption.  We will
      need to recommend reindexing all hash indexes in the 8.4.2 release notes.
      
      To buy back the performance loss hereby induced in split and compaction,
      fix them to use PageIndexMultiDelete instead of retail PageIndexDelete
      operations.  We might later want to do something with qsort'ing the
      page contents rather than doing a binary search for each insertion,
      but that seemed more invasive than I cared to risk in a back-patch.
      
      Per bug #5157 from Jeff Janes and subsequent investigation.
      c4afdca4
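
      The batching change amounts to the pattern sketched here (the predicate
      callback is illustrative, not the committed code): gather the offsets
      to remove and compact the page once with PageIndexMultiDelete() instead
      of once per deleted tuple.

          #include "postgres.h"
          #include "storage/bufpage.h"

          /* Remove all tuples on 'page' that 'doomed' says to drop. */
          static void
          delete_matching_tuples(Page page, bool (*doomed) (Page, OffsetNumber))
          {
              OffsetNumber    deletable[MaxOffsetNumber];
              int             ndeletable = 0;
              OffsetNumber    maxoff = PageGetMaxOffsetNumber(page);
              OffsetNumber    off;

              for (off = FirstOffsetNumber; off <= maxoff; off++)
              {
                  if (doomed(page, off))
                      deletable[ndeletable++] = off;
              }

              if (ndeletable > 0)
                  PageIndexMultiDelete(page, deletable, ndeletable);
          }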
  21. Jan 01, 2009
  22. Sep 15, 2008
    • Tom Lane's avatar
      Change hash indexes to store only the hash code rather than the whole indexed · 4adc2f72
      Tom Lane authored
      value.  This means that hash index lookups are always lossy and have to be
      rechecked when the heap is visited; however, the gain in index compactness
      outweighs this when the indexed values are wide.  Also, we only need to
      perform datatype comparisons when the hash codes match exactly, rather than
      for every entry in the hash bucket; so it could also win for datatypes that
      have expensive comparison functions.  A small additional win is gained by
      keeping hash index pages sorted by hash code and using binary search to reduce
      the number of index tuples we have to look at.
      
      Xiao Meng
      
      This commit also incorporates Zdenek Kotala's patch to isolate hash metapages
      and hash bitmaps a bit better from the page header datastructures.
      4adc2f72
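
      A hedged sketch of the page-level binary search this makes possible
      (the search function is illustrative; _hash_get_indextuple_hashkey() is
      the accessor the hash AM uses for the stored hash code): find the first
      offset whose hash code is >= the probe value.

          #include "postgres.h"
          #include "access/hash.h"
          #include "access/itup.h"
          #include "storage/bufpage.h"

          static OffsetNumber
          find_first_ge(Page page, uint32 hashkey)
          {
              OffsetNumber    lo = FirstOffsetNumber;
              OffsetNumber    hi = PageGetMaxOffsetNumber(page) + 1;

              while (lo < hi)
              {
                  OffsetNumber    mid = lo + (hi - lo) / 2;
                  IndexTuple      itup;

                  itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, mid));
                  if (_hash_get_indextuple_hashkey(itup) < hashkey)
                      lo = mid + 1;
                  else
                      hi = mid;
              }
              return lo;          /* one past maxoff if every stored code is smaller */
          }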
  23. Jun 19, 2008
  24. May 12, 2008
    • Alvaro Herrera's avatar
      Restructure some header files a bit, in particular heapam.h, by removing some · f8c4d7db
      Alvaro Herrera authored
      unnecessary #include lines in it.  Also, move some tuple routine prototypes and
      macros to htup.h, which allows removal of heapam.h inclusion from some .c
      files.
      
      For this to work, a new header file access/sysattr.h needed to be created,
      initially containing attribute numbers of system columns, for pg_dump usage.
      
      While at it, make contrib ltree, intarray and hstore header files more
      consistent with our header style.
      f8c4d7db
  25. Jan 01, 2008
  26. Nov 15, 2007
  27. Sep 20, 2007
    • Tom Lane's avatar
      HOT updates. When we update a tuple without changing any of its indexed · 282d2a03
      Tom Lane authored
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
      282d2a03
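
      A hedged sketch of what "follow the HOT-chain links" means at the page
      level (visibility checking is elided; the walking function is
      illustrative): each HOT-updated version's t_ctid points at the next
      version on the same heap page.

          #include "postgres.h"
          #include "access/htup_details.h"
          #include "storage/bufpage.h"

          /* Walk a HOT chain within one heap page, starting at 'rootoff'. */
          static void
          walk_hot_chain(Page page, OffsetNumber rootoff)
          {
              OffsetNumber    off = rootoff;

              for (;;)
              {
                  ItemId          lp = PageGetItemId(page, off);
                  HeapTupleHeader htup;

                  if (ItemIdIsRedirected(lp))
                  {
                      off = ItemIdGetRedirect(lp);    /* pruned root; jump ahead */
                      continue;
                  }
                  if (!ItemIdIsNormal(lp))
                      break;                          /* dead or unused: chain ends */

                  htup = (HeapTupleHeader) PageGetItem(page, lp);

                  /* ... visibility test for this version would go here ... */

                  if (!HeapTupleHeaderIsHotUpdated(htup))
                      break;                          /* last version in the chain */

                  off = ItemPointerGetOffsetNumber(&htup->t_ctid);
              }
          }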
  28. Sep 13, 2007
    • Tom Lane's avatar
      Redefine the lp_flags field of item pointers as having four states, rather · 68893035
      Tom Lane authored
      than two independent bits (one of which was never used in heap pages anyway,
      or at least hadn't been in a very long time).  This gives us flexibility to
      add the HOT notions of redirected and dead item pointers without requiring
      anything so klugy as magic values of lp_off and lp_len.  The state values
      are chosen so that for the states currently in use (pre-HOT) there is no
      change in the physical representation.
      68893035
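
      For reference, the four states end up as the lp_flags values and macros
      sketched below (the classifying function itself is illustrative):

          #include "postgres.h"
          #include "storage/itemid.h"

          /* Classify a line pointer by its lp_flags state. */
          static const char *
          itemid_state(ItemId lp)
          {
              if (!ItemIdIsUsed(lp))
                  return "LP_UNUSED";     /* free for reuse */
              if (ItemIdIsRedirected(lp))
                  return "LP_REDIRECT";   /* points at another offset, no storage */
              if (ItemIdIsDead(lp))
                  return "LP_DEAD";       /* dead; may or may not retain storage */
              return "LP_NORMAL";         /* ordinary item with storage */
          }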
  29. May 30, 2007
    • Tom Lane's avatar
      Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f
      Tom Lane authored
      buffers, rather than blowing out the whole shared-buffer arena.  Aside from
      avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
      to cause a WAL flush for every page it modified, because we had it hacked to
      use only a single buffer.  Those flushes will now occur only once per
      ring-ful.  The exact ring size, and the threshold for seqscans to switch into
      the ring usage pattern, remain under debate; but the infrastructure seems
      done.  The key bit of infrastructure is a new optional BufferAccessStrategy
      object that can be passed to ReadBuffer operations; this replaces the former
      StrategyHintVacuum API.
      
      This patch also changes the buffer usage-count methodology a bit: we now
      advance usage_count when first pinning a buffer, rather than when last
      unpinning it.  To preserve the behavior that a buffer's lifetime starts to
      decrease when it's released, the clock sweep code is modified to not decrement
      usage_count of pinned buffers.
      
      Work not done in this commit: teach GiST and GIN indexes to use the vacuum
      BufferAccessStrategy for vacuum-driven fetches.
      
      Original patch by Simon, reworked by Heikki and again by Tom.
      d526575f
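
      A hedged sketch of a bulk-read caller using the new strategy object,
      written against the current ReadBufferExtended() entry point rather
      than the exact call spelling of this era (the scan loop is
      illustrative):

          #include "postgres.h"
          #include "storage/bufmgr.h"
          #include "utils/rel.h"

          /* Read every block of 'rel' through a small ring of buffers. */
          static void
          bulk_read_relation(Relation rel)
          {
              BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
              BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
              BlockNumber blkno;

              for (blkno = 0; blkno < nblocks; blkno++)
              {
                  Buffer  buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
                                                   RBM_NORMAL, strategy);

                  /* ... examine the page ... */
                  ReleaseBuffer(buf);
              }

              FreeAccessStrategy(strategy);
          }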
  30. May 03, 2007
  31. Apr 19, 2007
    • Tom Lane's avatar
      Repair PANIC condition in hash indexes when a previous index extension attempt · 9d37c038
      Tom Lane authored
      failed (due to lock conflicts or out-of-space).  We might have already
      extended the index's filesystem EOF before failing, causing the EOF to be
      beyond what the metapage says is the last used page.  Hence the invariant
      maintained by the code needs to be "EOF is at or beyond last used page",
      not "EOF is exactly the last used page".  Problem was created by my patch
      of 2006-11-19 that attempted to repair bug #2737.  Since that was
      back-patched to 7.4, this needs to be as well.  Per report and test case
      from Vlastimil Krejcir.
      9d37c038
  32. Apr 10, 2007
    • Tom Lane's avatar
      Minor tweaking of index special-space definitions so that the various · 56218fbc
      Tom Lane authored
      index types can be reliably distinguished by examining the special space
      on an index page.  Per my earlier proposal, plus the realization that
      there's no need for btree's vacuum cycle ID to cycle through every possible
      16-bit value.  Restricting its range a little costs nearly nothing and
      eliminates the possibility of collisions.
      Memo to self: remember to make bitmap indexes play along with this scheme,
      assuming that patch ever gets accepted.
      56218fbc
  33. Jan 05, 2007
  34. Nov 19, 2006
    • Tom Lane's avatar
      Repair problems with hash indexes that span multiple segments: the hash code's · d68efb3f
      Tom Lane authored
      preference for filling pages out-of-order tends to confuse the sanity checks
      in md.c, as per report from Balazs Nagy in bug #2737.  The fix is to ensure
      that the smgr-level code always has the same idea of the logical EOF as the
      hash index code does, by using ReadBuffer(P_NEW) where we are adding a single
      page to the end of the index, and using smgrextend() to reserve a large batch
      of pages when creating a new splitpoint.  The patch is a bit ugly because it
      avoids making any changes in md.c, which seems the most prudent approach for a
      backpatchable beta-period fix.  After 8.3 development opens, I'll take a look
      at a cleaner but more invasive patch, in particular getting rid of the now
      unnecessary hack to allow reading beyond EOF in mdread().
      
      Backpatch as far as 7.4.  The bug likely exists in 7.3 as well, but because
      of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch
      doesn't even begin to apply.  Given the other known bugs in the 7.3-era hash
      code, it does not seem worth trying to develop a separate patch for 7.3.
      d68efb3f
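
      A hedged sketch of the single-page half of that rule (the batch
      reservation via smgrextend() is omitted; the function is illustrative):
      adding exactly one page goes through the buffer manager with P_NEW, so
      smgr's notion of EOF advances in step with the index's.

          #include "postgres.h"
          #include "access/hash.h"
          #include "storage/bufmgr.h"
          #include "utils/rel.h"

          /* Add one new page at the current end of the index. */
          static Buffer
          add_one_page(Relation rel)
          {
              /* P_NEW asks the buffer manager to extend the relation by one block */
              Buffer  buf = ReadBuffer(rel, P_NEW);

              LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
              PageInit(BufferGetPage(buf), BufferGetPageSize(buf),
                       sizeof(HashPageOpaqueData));
              MarkBufferDirty(buf);
              return buf;                 /* caller fills in, unlocks and releases */
          }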
  35. Apr 01, 2006
    • Tom Lane's avatar
      Clean up WAL/buffer interactions as per my recent proposal. Get rid of the · a8b8f4db
      Tom Lane authored
      misleadingly-named WriteBuffer routine, and instead require routines that
      change buffer pages to call MarkBufferDirty (which does exactly what it says).
      We also require that they do so before calling XLogInsert; this takes care of
      the synchronization requirement documented in SyncOneBuffer.  Note that
      because bufmgr takes the buffer content lock (in shared mode) while writing
      out any buffer, it doesn't matter whether MarkBufferDirty is executed before
      the buffer content change is complete, so long as the content change is
      completed before releasing exclusive lock on the buffer.  So it's OK to set
      the dirtybit before we fill in the LSN.
      This eliminates the former kluge of needing to set the dirtybit in LockBuffer.
      Aside from making the code more transparent, we can also add some new
      debugging assertions, in particular that the caller of MarkBufferDirty must
      hold the buffer content lock, not merely a pin.
      a8b8f4db
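
      A hedged sketch of the ordering rule, written against the current WAL
      API rather than the 2006 one (registration details are illustrative):
      change the page and mark the buffer dirty while holding the exclusive
      content lock, then emit the WAL record and set the page LSN, all inside
      a critical section.

          #include "postgres.h"
          #include "access/xloginsert.h"
          #include "miscadmin.h"
          #include "storage/bufmgr.h"

          /* Modify a page and WAL-log it, respecting the MarkBufferDirty rule. */
          static void
          logged_page_change(Buffer buf, RmgrId rmid, uint8 info)
          {
              Page        page = BufferGetPage(buf);
              XLogRecPtr  recptr;

              /* caller already holds BUFFER_LOCK_EXCLUSIVE on buf */
              START_CRIT_SECTION();

              /* ... apply the change to the page ... */
              MarkBufferDirty(buf);       /* before XLogInsert, per this commit */

              XLogBeginInsert();
              XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
              recptr = XLogInsert(rmid, info);
              PageSetLSN(page, recptr);

              END_CRIT_SECTION();
          }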
  36. Mar 05, 2006
  37. Jan 26, 2006
  38. Nov 22, 2005
  39. Nov 06, 2005
    • Tom Lane's avatar
      Add defenses to btree and hash index AMs to do simple sanity checks · 766dc45d
      Tom Lane authored
      on every index page they read; in particular to catch the case of an
      all-zero page, which PageHeaderIsValid allows to pass.  It turns out
      hash already had this idea, but it was just Assert()ing things rather
      than doing a straight error check, and the Asserts were partially
      redundant with PageHeaderIsValid anyway.  Per recent failure example
      from Jim Nasby.  (gist still needs the same treatment.)
      766dc45d
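
      A minimal sketch of the kind of check described (the message text is
      illustrative): an all-zeroes page satisfies PageHeaderIsValid, so
      reject it explicitly with an error rather than an Assert.

          #include "postgres.h"
          #include "storage/bufpage.h"
          #include "utils/rel.h"

          static void
          check_index_page(Relation rel, Page page, BlockNumber blkno)
          {
              if (PageIsNew(page))
                  ereport(ERROR,
                          (errcode(ERRCODE_INDEX_CORRUPTED),
                           errmsg("index \"%s\" contains unexpected zero page at block %u",
                                  RelationGetRelationName(rel), blkno)));
          }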