Commits · 55b59eda13a742f8af913734e22ecc8a21754414 · Jakob Huber / postgres-lambda-diff

Mar 26, 2015

Fix GiST index-only scans for opclasses with different storage type. · 55b59eda

Heikki Linnakangas authored 10 years ago

We cannot use the index's tuple descriptor directly to describe the index
tuples returned in an index-only scan. That's because the index might use
a different datatype for the values stored on disk than the type originally
indexed. As long as they were both pass-by-ref, it worked, but will not work
for pass-by-value types of different sizes. I noticed this as a crash when I
started hacking a patch to add fetch methods to btree_gist.

55b59eda

Add support for index-only scans in GiST. · d04c8ed9

Heikki Linnakangas authored 10 years ago

This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.

This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).

Anastasia Lubennikova, with additional fixes and modifications by me.

d04c8ed9

Minor cleanup of GiST code, for readability. · 8fa393a6
Heikki Linnakangas authored 10 years ago
```
Remove the gistcentryinit function, inlining the relevant part of it into
the only caller.
```
8fa393a6

Jan 06, 2015
- Update copyright for 2015 · 4baaf863
  Bruce Momjian authored 10 years ago
  
  Backpatch certain files through 9.0
  4baaf863
May 06, 2014

pgindent run for 9.4 · 0a783200

Bruce Momjian authored 10 years ago

This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.

0a783200

Jan 07, 2014

Update copyright for 2014 · 7e04792a

Bruce Momjian authored 11 years ago

Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.

7e04792a

May 29, 2013

pgindent run for release 9.3 · 9af4159f

Bruce Momjian authored 11 years ago

This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.

9af4159f

Feb 11, 2013

Support unlogged GiST index. · 62401db4

Heikki Linnakangas authored 12 years ago

The reason this wasn't supported before was that GiST indexes need an
increasing sequence to detect concurrent page-splits. In a regular WAL-
logged GiST index, the LSN of the page-split record is used for that
purpose, and in a temporary index, we can get away with a backend-local
counter. Neither of those methods works for an unlogged relation.

To provide such an increasing sequence of numbers, create a "fake LSN"
counter that is saved and restored across shutdowns. On recovery, unlogged
relations are blown away, so the counter doesn't need to survive that
either.

Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.

62401db4

Feb 07, 2013

Repair bugs in GiST page splitting code for multi-column indexes. · 166d534f

Tom Lane authored 12 years ago

When considering a non-last column in a multi-column GiST index,
gistsplit.c tries to improve on the split chosen by the opclass-specific
pickSplit function by considering penalties for the next column. However,
there were two bugs in this code: it failed to recompute the union keys for
the leftmost index columns, even though these might well change after
reassigning tuples; and it included the old union keys in the recomputation
for the columns it did recompute, so that those keys couldn't get smaller
even if they should. The first problem could result in an invalid index
in which searches wouldn't find index entries that are in fact present;
the second would make the index less efficient to search.

Both of these errors were caused by misuse of gistMakeUnionItVec, whose
API was designed in a way that just begged such errors to be made. There
is no situation in which it's safe or useful to compute the union keys for
a subset of the index columns, and there is no caller that wants any
previous union keys to be included in the computation; so the undocumented
choice to treat the union keys as in/out rather than pure output parameters
is a waste of code as well as being dangerous.

Hence, rather than just making a minimal patch, I've changed the API of
gistMakeUnionItVec to remove the "startkey" parameter (it now always
processes all index columns) and treat the attr/isnull arrays as purely
output parameters.

In passing, also get rid of a couple of unnecessary and dangerous uses
of static variables in gistutil.c. It's remarkable that the one in
gistMakeUnionKey hasn't given us portability troubles before now, because
in addition to posing a re-entrancy hazard, it was unsafely assuming that
a static char[] array would have at least Datum alignment.

Per investigation of a trouble report from Tomas Vondra. (There are also
some bugs in contrib/btree_gist to be fixed, but that seems like material
for a separate patch.) Back-patch to all supported branches.

166d534f

Jan 25, 2013

Add some randomness to the choice of which GiST page to insert to. · ba1cc650

Heikki Linnakangas authored 12 years ago

When descending the tree for an insert, and there are multiple equally good
pages we could insert to, make the choice in random. Previously, we would
always choose the tuple with lowest offset number. That meant that when two
non-leaf pages overlap - in the extreme case they might have exactly the same
key - all but the first such page went unused. That wasn't optimal for space
usage; if you deleted some tuples from the non-first pages, the space would
never be reused.

With this patch, the other pages are sometimes chosen too, although there's
still a heavy bias towards low-offset tuples, so that we don't lose cache
locality when doing a lot of inserts with similar keys.

Original idea by Alexander Korotkov, although this patch version was written
by me and copy-edited by Tom Lane.

ba1cc650

Jan 01, 2013

Update copyrights for 2013 · bd61a623

Bruce Momjian authored 12 years ago

Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.

bd61a623

Aug 31, 2012

Improve coding of gistchoose and gistRelocateBuildBuffersOnSplit. · e5db11c5

Tom Lane authored 12 years ago

This is mostly cosmetic, but it does eliminate a speculative portability
issue. The previous coding ignored the fact that sum_grow could easily
overflow (in fact, it could be summing multiple IEEE float infinities).
On a platform where that didn't guarantee to produce a positive result,
the code would misbehave. In any case, it was less than readable.

e5db11c5

Aug 30, 2012

Fix logic bug in gistchoose and gistRelocateBuildBuffersOnSplit. · c8ba697a

Robert Haas authored 12 years ago

Every time the best-tuple-found-so-far changes, we need to reset all
the penalty values in which_grow[] to the penalties for the new best
tuple.  The old code failed to do this, resulting in inferior index
quality.

The original patch from Alexander Korotkov was just two lines; I took
the liberty of fleshing that out by adding a bunch of comments that I
hope will make this logic easier for others to understand than it was
for me.

c8ba697a

Jun 24, 2012

Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4

Heikki Linnakangas authored 12 years ago

This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this changes breaks compatibility
of pg_basebackup and server. I didn't do anything about this in this patch,
per discussion on -hackers, the right thing to do would to be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.

0ab9d1c4

Jan 30, 2012
- Assorted comment fixes, mostly just typos, but some obsolete statements. · ad10853b
  Tom Lane authored 13 years ago
  
  YAMAMOTO Takashi
  ad10853b
Jan 02, 2012
- Update copyright notices for year 2012. · e126958c
  Bruce Momjian authored 13 years ago
  
  e126958c
Nov 27, 2011

Use IEEE infinity, not 1e10, for null-and-not-null case in gistpenalty(). · 9f4563f7

Tom Lane authored 13 years ago

Use of a randomly chosen large value was never exactly graceful, and
now that there are penalty functions that are intentionally using infinity,
it doesn't seem like a good idea for null-vs-not-null to be using something
less.

9f4563f7

Sep 08, 2011

Buffering GiST index build algorithm. · 5edb24a8

Heikki Linnakangas authored 13 years ago

When building a GiST index that doesn't fit in cache, buffers are attached
to some internal nodes in the index. This speeds up the build by avoiding
random I/O that would otherwise be needed to traverse all the way down the
tree to the find right leaf page for tuple.

Alexander Korotkov

5edb24a8

Sep 01, 2011
- Remove unnecessary #include references, per pgrminclude script. · 6416a82a
  Bruce Momjian authored 13 years ago
  
  6416a82a
Jun 09, 2011
- Pgindent run before 9.1 beta2. · 6560407c
  Bruce Momjian authored 13 years ago
  
  6560407c
May 31, 2011

Protect GIST logic that assumes penalty values can't be negative. · 6923d699

Tom Lane authored 13 years ago

Apparently sane-looking penalty code might return small negative values,
for example because of roundoff error.  This will confuse places like
gistchoose().  Prevent problems by clamping negative penalty values to
zero.  (Just to be really sure, I also made it force NaNs to zero.)
Back-patch to all supported branches.

Alexander Korotkov

6923d699

Apr 23, 2011

Make GIN and GIST pass the index collation to all their support functions. · ae20bf17

Tom Lane authored 13 years ago

Experimentation with contrib/btree_gist shows that the majority of the GIST
support functions potentially need collation information. Safest policy
seems to be to pass it to all of them, instead of making assumptions about
which ones could possibly need it.

ae20bf17

Apr 10, 2011
- pgindent run before PG 9.1 beta 1. · bf50caf1
  Bruce Momjian authored 14 years ago
  
  bf50caf1
Jan 01, 2011
- Stamp copyrights for year 2011. · 5d950e3b
  Bruce Momjian authored 14 years ago
  
  5d950e3b
Dec 23, 2010

Rewrite the GiST insertion logic so that we don't need the post-recovery · 9de3aa65

Heikki Linnakangas authored 14 years ago

cleanup stage to finish incomplete inserts or splits anymore. There was two
reasons for the cleanup step:

1. When a new tuple was inserted to a leaf page, the downlink in the parent
needed to be updated to contain (ie. to be consistent with) the new key.
Updating the parent in turn might require recursively updating the parent of
the parent. We now handle that by updating the parent while traversing down
the tree, so that when we insert the leaf tuple, all the parents are already
consistent with the new key, and the tree is consistent at every step.

2. When a page is split, we need to insert the downlink for the new right
page(s), and update the downlink for the original page to not include keys
that moved to the right page(s). We now handle that by setting a new flag,
F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is
set, scans always follow the rightlink, regardless of the NSN mechanism used
to detect concurrent page splits. That way the tree is consistent right after
split, even though the downlink is still missing. This is very similar to the
way B-tree splits are handled. When the downlink is inserted in the parent,
the flag is cleared. To keep the insertion algorithm simple, when an
insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it
finishes the split before doing anything else.

These changes allow removing the whole "invalid tuple" mechanism, but I
retained the scan code to still follow invalid tuples correctly. While we
don't create any such tuples anymore, we want to handle them gracefully in
case you pg_upgrade a GiST index that has them. If we encounter any on an
insert, though, we just throw an error saying that you need to REINDEX.

The issue that got me into doing this is that if you did a checkpoint while
an insert or split was in progress, and the checkpoint finishes quickly so
that there is no WAL record related to the insert between RedoRecPtr and the
checkpoint record, recovery from that checkpoint would not know to finish
the incomplete insert. IOW, we have the same issue we solved with the
rm_safe_restartpoint mechanism during normal operation too. It's highly
unlikely to happen in practice, and this fix is far too large to backpatch,
so we're just going to live with in previous versions, but this refactoring
fixes it going forward.

With this patch, you don't get the annoying
'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices
anymore if you crash at an unfortunate moment.

9de3aa65

Nov 16, 2010

The GiST scan algorithm uses LSNs to detect concurrent pages splits, but · 2edc5cd4

Heikki Linnakangas authored 14 years ago

temporary indexes are not WAL-logged. We used a constant LSN for temporary
indexes, on the assumption that we don't need to worry about concurrent page
splits in temporary indexes because they're only visible to the current
session. But that assumption is wrong, it's possible to insert rows and
split pages in the same session, while a scan is in progress. For example,
by opening a cursor and fetching some rows, and INSERTing new rows before
fetching some more.

Fix by generating fake increasing LSNs, used in place of real LSNs in
temporary GiST indexes.

2edc5cd4

Sep 20, 2010
- Remove cvs keywords from all files. · 9f2e2113
  Magnus Hagander authored 14 years ago
  
  9f2e2113
Jan 02, 2010
- Update copyright for the year 2010. · 02398008
  Bruce Momjian authored 15 years ago
  
  02398008
Jun 11, 2009
- 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list · d7471402
  Bruce Momjian authored 15 years ago
  
  provided by Andrew.
  d7471402
Jan 05, 2009

Change the reloptions machinery to use a table-based parser, and provide · ba748f7a

Alvaro Herrera authored 16 years ago

a more complete framework for writing custom option processing routines
by user-defined access methods.

Catalog version bumped due to the general API changes, which are going to
affect user-defined "amoptions" routines.

ba748f7a

Jan 01, 2009
- Update copyright for 2009. · 511db38a
  Bruce Momjian authored 16 years ago
  
  511db38a
Sep 30, 2008

Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the · 15c121b3

Heikki Linnakangas authored 16 years ago

free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).

This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.

Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.

15c121b3

Jul 13, 2008

Clean up the use of some page-header-access macros: principally, use · 9d035f42

Tom Lane authored 16 years ago

SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.

9d035f42

Jun 19, 2008

Improve our #include situation by moving pointer types away from the · a3540b0f

Alvaro Herrera authored 16 years ago

corresponding struct definitions. This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.

a3540b0f

Jun 15, 2008
- Fix 64-bit problem in recent patch. · 71ff461a
  Tom Lane authored 16 years ago
  
  71ff461a
Jun 12, 2008

Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation · a213f1ee

Heikki Linnakangas authored 16 years ago

forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.

For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.

a213f1ee

May 12, 2008

Restructure some header files a bit, in particular heapam.h, by removing some · f8c4d7db

Alvaro Herrera authored 16 years ago

unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.

f8c4d7db

Jan 01, 2008
- Update copyrights in source tree to 2008. · 9098ab9e
  Bruce Momjian authored 17 years ago
  
  9098ab9e
Sep 20, 2007

HOT updates. When we update a tuple without changing any of its indexed · 282d2a03

Tom Lane authored 17 years ago

columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version. Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.

282d2a03

Sep 13, 2007

Redefine the lp_flags field of item pointers as having four states, rather · 68893035

Tom Lane authored 17 years ago

than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time). This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len. The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.

68893035