Commits · decb3bc07e27cf476299842db0ce1c33cfb42f53 · Jakob Huber / postgres-lambda-diff

Dec 03, 2012
- Update release notes for 9.2.2, 9.1.7, 9.0.11, 8.4.15, 8.3.22. · decb3bc0
  Tom Lane authored 12 years ago
  
  decb3bc0
- Revert "Add mode where contrib installcheck runs each module in a separately named database." · 7f7b49c4
  Andrew Dunstan authored 12 years ago
  
  This reverts commit 30248be9.
  7f7b49c4
- Avoid holding vmbuffer pin after VACUUM. · 17ee02ea
  Simon Riggs authored 12 years ago
  
  During VACUUM if we pause to perform a cycle of index cleanup we drop the vmbuffer pin, so we should do the same thing when heap scan completes. This avoids holding vmbuffer pin across the main index cleanup in VACUUM, which could be minutes or hours longer than necessary for correctness. Bug report and suggested fix from Pavan Deolasee
  17ee02ea
- Fix documentation of path(polygon) function. · 6a8b9987
  Tom Lane authored 12 years ago
  
  Obviously, this returns type "path", but somebody made a copy-and-pasteo long ago. Dagfinn Ilmari Mannsåker
  6a8b9987
- Translation updates · 562fca74
  Peter Eisentraut authored 12 years ago
  
  562fca74
Dec 02, 2012

Add mode where contrib installcheck runs each module in a separately named database. · 30248be9

Andrew Dunstan authored 12 years ago

Normally each module is tested in aq database named contrib_regression,
which is dropped and recreated at the beginhning of each pg_regress run.
This mode, enabled by adding USE_MODULE_DB=1 to the make command line,
runs most modules in a database with the module name embedded in it.

This will make testing pg_upgrade on clusters with the contrib modules
a lot easier.

Still to be done: adapt to the MSVC build system.

Backpatch to 9.0, which is the earliest version it is reasonably
possible to test upgrading from.

30248be9

Update time zone data files to tzdata release 2012j. · 842cb63f
Tom Lane authored 12 years ago
```
DST law changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa,
and portions of Brazil.
```
842cb63f

Recommend triggers, not rules, in the CREATE VIEW reference page. · f7968554

Tom Lane authored 12 years ago

We've generally recommended use of INSTEAD triggers over rules since that
feature was added; but this old text in the CREATE VIEW reference page
didn't get the memo. Noted by Thomas Kellerer.

f7968554

Don't advance checkPoint.nextXid near the end of a checkpoint sequence. · aaceb0d6

Tom Lane authored 12 years ago

This reverts commit c1113069 in favor of
actually fixing the problem: namely, that we should never have been
modifying the checkpoint record's nextXid at this point to begin with.
The nextXid should match the state as of the checkpoint's logical WAL
position (ie the redo point), not the state as of its physical position.
It's especially bogus to advance it in some wal_levels and not others.
In any case there is no need for the checkpoint record to carry the
same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by
LogStandbySnapshot, as any replay operation will already have adopted
that value as current.

This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug
#6291 from Daniel Farina, in that if a checkpoint were in progress at the
instant of XID wraparound, the epoch bump would be lost as reported.
(And, of course, these days there's at least a 50-50 chance of a checkpoint
being in progress at any given instant.)

Diagnosed by me and independently by Andres Freund.  Back-patch to all
branches supporting hot standby.

aaceb0d6

XidEpoch++ if wraparound during checkpoint. · c3f64ade

Simon Riggs authored 12 years ago

If wal_level = hot_standby we update the checkpoint nextxid,
though in the case where a wraparound occurred half-way through
a checkpoint we would neglect updating the epoch also. Updating
the nextxid is arguably the wrong thing to do, but changing that
may introduce subtle bugs into hot standby startup, while updating
the value doesn't cause any known bugs yet. Minimal fix now to
HEAD and backbranches, wider fix later in HEAD.

Bug reported in #6291 by Daniel Farina and slightly differently in

Cause analysis and recommended fixes from Tom Lane and Andres Freund.

Applied patch is minimal version of Andres Freund's work.

c3f64ade

Fix psql crash while parsing SQL file whose encoding is different from · 530cbf6b

Tatsuo Ishii authored 12 years ago

client encoding and the client encoding is not *safe* one. Such an
example is, file encoding is UTF-8 and client encoding SJIS. Patch
contributed by Jiang Guiqing.

530cbf6b

Dec 01, 2012

Prevent passing gmake's environment variables down through pg_regress. · 19a08ef7

Tom Lane authored 12 years ago

When we do "make install" to create a temp installation, we don't want
that instance of make to try to communicate with any instance of make
that might be calling us.  This is known to cause problems if the
upper make has a -jN flag, and in principle could cause problems even
without that.  Unset the relevant environment variables to prevent such
issues.

Andres Freund

19a08ef7

Make sure sharedir/extension/ directory is created when needed. · 53a1c312

Tom Lane authored 12 years ago

The previous coding worked as long as MODULEDIR wasn't set explicitly,
because we create sharedir/$(datamoduledir) and the default value of
that is "extension".  But if some other value is specified for MODULEDIR
then the installation directory needed for the control file wasn't made.

Cédric Villemain

53a1c312

doc: Fix broken links to DocBook wiki · af09efda
Peter Eisentraut authored 12 years ago

af09efda

Nov 30, 2012

Change test ExceptionalCondition to return void · f18c9510
Alvaro Herrera authored 12 years ago
```
Commit 81107282 changed it in assert.c, but overlooked this other file.
```
f18c9510
Take buffer lock while inspecting btree index pages in contrib/pageinspect. · 0adcaeef
Tom Lane authored 12 years ago
```
It's not safe to examine a shared buffer without any lock.
```
0adcaeef

Add missing buffer lock acquisition in GetTupleForTrigger(). · 1c316e3a

Tom Lane authored 12 years ago

If we had not been holding buffer pin continuously since the tuple was
initially fetched by the UPDATE or DELETE query, it would be possible for
VACUUM or a page-prune operation to move the tuple while we're trying to
copy it. This would result in a garbage "old" tuple value being passed to
an AFTER ROW UPDATE or AFTER ROW DELETE trigger. The preconditions for
this are somewhat improbable, and the timing constraints are very tight;
so it's not so surprising that this hasn't been reported from the field,
even though the bug has been there a long time.

Problem found by Andres Freund. Back-patch to all active branches.

1c316e3a

Clean environment for pg_upgrade test. · 2c55189b

Andrew Dunstan authored 12 years ago

This removes exisiting PG settings from the environment for
pg_upgrade tests, just like pg_regress does.

2c55189b

Produce a more useful error message for over-length Unix socket paths. · a8e60f0b

Tom Lane authored 12 years ago

The length of a socket path name is constrained by the size of struct
sockaddr_un, and there's not a lot we can do about it since that is a
kernel API. However, it would be a good thing if we produced an
intelligible error message when the user specifies a socket path that's too
long --- and getaddrinfo's standard API is too impoverished to do this in
the natural way. So insert explicit tests at the places where we construct
a socket path name. Now you'll get an error that makes sense and even
tells you what the limit is, rather than something generic like
"Non-recoverable failure in name resolution".

Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.

a8e60f0b

Nov 29, 2012

Cleanup VirtualXact at end of Hot Standby · edfc84b8
Simon Riggs authored 12 years ago
```
Resolves bug 7572 reported by Daniele Varrazzo
```
edfc84b8
Correctly init fast path fields on PGPROC · fdac4e2b
Simon Riggs authored 12 years ago

fdac4e2b
When processing nested structure pointer variables ecpg always expected an · 44fe8ae9
Michael Meskes authored 12 years ago
```
array datatype which of course is wrong.

Applied patch by Muhammad Usama <m.usama@gmail.com> to fix this.
```
44fe8ae9

Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. · 94c014b5

Tom Lane authored 12 years ago

Commit 8cb53654, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
choice of catalog state representation.  The pg_index state for an index
that's reached the final pre-drop stage was the same as the state for an
index just created by CREATE INDEX CONCURRENTLY.  This meant that the
(necessary) change to make RelationGetIndexList ignore about-to-die indexes
also made it ignore freshly-created indexes; which is catastrophic because
the latter do need to be considered in HOT-safety decisions.  Failure to
do so leads to incorrect index entries and subsequently wrong results from
queries depending on the concurrently-created index.

To fix, make the final state be indisvalid = true and indisready = false,
which is otherwise nonsensical.  This is pretty ugly but we can't add
another column without forcing initdb, and it's too late for that in 9.2.
(There's a cleaner fix in HEAD.)

In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
flag changes they make without exclusive lock on the index are made via
heap_inplace_update() rather than a normal transactional update.  The
latter is not very safe because moving the pg_index tuple could result in
concurrent SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
CONCURRENTLY, which was copied into the DROP code.

In addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready as
appropriate.  These represent bugs that have existed since 8.2, since
a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
index behind, and we ought not try to do anything that might fail with
such an index.

Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
columns that are allowed to change after initial creation.  Previously we
could have been left with stale values of some fields in an index relcache
entry.  It's not clear whether this actually had any user-visible
consequences, but it's at least a bug waiting to happen.

In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
some cosmetic code cleanup but mostly addition and revision of comments.

Portions of this need to be back-patched even further, but I'll work
on that separately.

Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
fix by Tom Lane and Andres Freund.

94c014b5

Nov 28, 2012

If we don't have a backup-end-location, don't claim we've reached it. · ffc3172e

Heikki Linnakangas authored 12 years ago

This was apparently a typo, which caused recovery to think that it
immediately reached the end of backup, and allowed the database to start
up too early.

Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.

ffc3172e

Nov 26, 2012

Revert patch for taking fewer snapshots. · 786afc1c

Tom Lane authored 12 years ago

This reverts commit d573e239, "Take fewer
snapshots". While that seemed like a good idea at the time, it caused
execution to use a snapshot that had been acquired before locking any of
the tables mentioned in the query. This created user-visible anomalies
that were not present in any prior release of Postgres, as reported by
Tomas Vondra. While this whole area could do with a redesign (since there
are related cases that have anomalies anyway), it doesn't seem likely that
any future patch would be reasonably back-patchable; and we don't want 9.2
to exhibit a behavior that's subtly unlike either past or future releases.
Hence, revert to prior code while we rethink the problem.

786afc1c

Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. · eea6ada9

Tom Lane authored 12 years ago

In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is
pretty useless (there being only one output row), but nonetheless it
shouldn't fail. But it could fail if "tab" is an inheritance parent,
because planagg.c's code for fixing up equivalence classes after making the
index-optimized MIN/MAX transformation wasn't prepared to find child-table
versions of the aggregate expression. The least ugly fix seems to be
to add an option to mutate_eclass_expressions() to skip child-table
equivalence class members, which aren't used anymore at this stage of
planning so it's not really necessary to fix them. Since child members
are ignored in many cases already, it seems plausible for
mutate_eclass_expressions() to have an option to ignore them too.

Per bug #7703 from Maxim Boguk.

Back-patch to 9.1. Although the same code exists before that, it cannot
encounter child-table aggregates AFAICS, because the index optimization
transformation cannot succeed on inheritance trees before 9.1 (for lack
of MergeAppend).

eea6ada9

Nov 23, 2012

pg_stat_replication.sync_state was displayed incorrectly at page boundary. · 3f7b04d6

Heikki Linnakangas authored 12 years ago

XLogRecPtrIsInvalid() only checks the xrecoff field, which is correct when
checking if a WAL record could legally begin at the given position, but WAL
sending can legally be paused at a page boundary, in which case xrecoff is
0. Use XLByteEQ(..., InvalidXLogRecPtr) instead, which checks that both
xlogid and xrecoff are 0.

9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit
integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and
9.1 where pg_stat_replication view was introduced.

Kyotaro HORIGUCHI, reviewed by Fujii Masao.

3f7b04d6

Nov 22, 2012

Fix pg_resetxlog to use correct path to postmaster.pid. · 430b47f3

Tom Lane authored 12 years ago

Since we've already chdir'd into the data directory, the file should
be referenced as just "postmaster.pid", without prefixing the directory
path. This is harmless in the normal case where an absolute PGDATA path
is used, but quite dangerous if a relative path is specified, since the
program might then fail to notice an active postmaster.

Reported by Hari Babu. This got broken in my commit
eb5949d1, so patch all active versions.

430b47f3

Avoid bogus "out-of-sequence timeline ID" errors in standby-mode. · dda8b87b

Heikki Linnakangas authored 12 years ago

When startup process opens a WAL segment after replaying part of it, it
validates the first page on the WAL segment, even though the page it's
really interested in later in the file. As part of the validation, it checks
that the TLI on the page header is >= the TLI it saw on the last page it
read. If the segment contains a timeline switch, and we have already
replayed it, and then re-open the WAL segment (because of streaming
replication got disconnected and reconnected, for example), the TLI check
will fail when the first page is validated. Fix that by relaxing the TLI
check when re-opening a WAL segment.

Backpatch to 9.0. Earlier versions had the same code, but before standby
mode was introduced in 9.0, recovery never tried to re-read a segment after
partially replaying it.

Reported by Amit Kapila, while testing a new feature.

dda8b87b

Nov 21, 2012

Don't launch new child processes after we've been told to shut down. · 8af60da9

Tom Lane authored 12 years ago

Once we've received a shutdown signal (SIGINT or SIGTERM), we should not
launch any more child processes, even if we get signals requesting such.
The normal code path for spawning backends has always understood that,
but the postmaster's infrastructure for hot standby and autovacuum didn't
get the memo. As reported by Hari Babu in bug #7643, this could lead to
failure to shut down at all in some cases, such as when SIGINT is received
just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd
launch a bgwriter and checkpointer, and then those processes would have no
idea that they ought to quit. Similarly, launching a new autovacuum worker
would result in waiting till it finished before shutting down.

Also, switch the order of the code blocks in reaper() that detect startup
process crash versus shutdown termination. Once we've sent it a signal,
we should not consider that exit(1) is surprising. This is just a cosmetic
fix since shutdown occurs correctly anyway, but better not to log a phony
complaint about startup process crash.

Back-patch to 9.0. Some parts of this might be applicable before that,
but given the lack of prior complaints I'm not going to worry too much
about older branches.

8af60da9

Nov 20, 2012

Improve handling of INT_MIN / -1 and related cases. · 278b6059

Tom Lane authored 12 years ago

Some platforms throw an exception for this division, rather than returning
a necessarily-overflowed result.  Since we were testing for overflow after
the fact, an exception isn't nice.  We can avoid the problem by treating
division by -1 as negation.

Add some regression tests so that we'll find out if any compilers try to
optimize away the overflow check conditions.

Back-patch of commit 1f7cb5c3.

Per discussion with Xi Wang, though this is different from the patch he
submitted.

278b6059

Nov 18, 2012

Limit values of archive_timeout, post_auth_delay, auth_delay.milliseconds. · 83d48a81

Tom Lane authored 12 years ago

The previous definitions of these GUC variables allowed them to range
up to INT_MAX, but in point of fact the underlying code would suffer
overflows or other errors with large values. Reduce the maximum values
to something that won't misbehave. There's no apparent value in working
harder than this, since very large delays aren't sensible for any of
these. (Note: the risk with archive_timeout is that if we're late
checking the state, the timestamp difference it's being compared to
might overflow. So we need some amount of slop; the choice of INT_MAX/2
is arbitrary.)

Per followup investigation of bug #7670. Although this isn't a very
significant fix, might as well back-patch.

83d48a81

Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds. · 89067bc1

Tom Lane authored 12 years ago

We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
Fortunately a simple clamp will do the trick, since no harm is done if
the wait times out before it's really time to rotate the log file.
Per bug #7670 (probably bug #7545 is the same thing, too).

In passing, fix bogus definition of log_rotation_age's maximum value in
guc.c --- it was numerically right, but only because MINS_PER_HOUR and
SECS_PER_MINUTE have the same value.

Back-patch to 9.2. Before that, syslogger wasn't using WaitLatch.

89067bc1

Nov 16, 2012

Improve check_partial_indexes() to consider join clauses in proof attempts. · f0461cd8

Tom Lane authored 12 years ago

Traditionally check_partial_indexes() has only looked at restriction
clauses while trying to prove partial indexes usable in queries. However,
join clauses can also be used in some cases; mainly, that a strict operator
on "x" proves an "x IS NOT NULL" index predicate, even if the operator is
in a join clause rather than a restriction clause. Adding this code fixes
a regression in 9.2, because previously we would take join clauses into
account when considering whether a partial index could be used in a
nestloop inner indexscan path. 9.2 doesn't handle nestloop inner
indexscans in the same way, and this consideration was overlooked in the
rewrite. Moving the work to check_partial_indexes() is a better solution
anyway, since the proof applies whether or not we actually use the index
in that particular way, and we don't have to do it over again for each
possible outer relation. Per report from Dave Cramer.

f0461cd8

Nov 14, 2012

Fix the int8 and int2 cases of (minimum possible integer) % (-1). · 3b4db79e

Tom Lane authored 12 years ago

The correct answer for this (or any other case with arg2 = -1) is zero,
but some machines throw a floating-point exception instead of behaving
sanely.  Commit f9ac414c dealt with this
in int4mod, but overlooked the fact that it also happens in int8mod
(at least on my Linux x86_64 machine).  Protect int2mod as well; it's
not clear whether any machines fail there (mine does not) but since the
test is so cheap it seems better safe than sorry.  While at it, simplify
the original guard in int4mod: we need only check for arg2 == -1, we
don't need to check arg1 explicitly.

Xi Wang, with some editing by me.

3b4db79e

Nov 13, 2012

Fix memory leaks in record_out() and record_send(). · 5355e39c

Tom Lane authored 12 years ago

record_out() leaks memory: it fails to free the strings returned by the
per-column output functions, and also is careless about detoasted values.
This results in a query-lifespan memory leakage when returning composite
values to the client, because printtup() runs the output functions in the
query-lifespan memory context.  Fix it to handle these issues the same way
printtup() does.  Also fix a similar leakage in record_send().

(At some point we might want to try to run output functions in
shorter-lived memory contexts, so that we don't need a zero-leakage policy
for them.  But that would be a significantly more invasive patch, which
doesn't seem like material for back-patching.)

In passing, use appendStringInfoCharMacro instead of appendStringInfoChar
in the innermost data-copying loop of record_out, to try to shave a few
cycles from this function's runtime.

Per trouble report from Carlos Henrique Reimer.  Back-patch to all
supported versions.

5355e39c

Skip searching for subxact locks at commit. · 31541778

Simon Riggs authored 12 years ago

At commit all standby locks are released
for the top-level transaction, so searching
for locks for each subtransaction is both
pointless and costly (N^2) in the presence
of many AccessExclusiveLocks.

31541778

Clarify docs on hot standby lock release · f66e7ab6
Simon Riggs authored 12 years ago
```
Andres Freund and Simon Riggs
```
f66e7ab6

Fix multiple problems in WAL replay. · 8805ff65

Tom Lane authored 12 years ago

Most of the replay functions for WAL record types that modify more than
one page failed to ensure that those pages were locked correctly to ensure
that concurrent queries could not see inconsistent page states. This is
a hangover from coding decisions made long before Hot Standby was added,
when it was hardly necessary to acquire buffer locks during WAL replay
at all, let alone hold them for carefully-chosen periods.

The key problem was that RestoreBkpBlocks was written to hold lock on each
page restored from a full-page image for only as long as it took to update
that page. This was guaranteed to break any WAL replay function in which
there was any update-ordering constraint between pages, because even if the
nominal order of the pages is the right one, any mixture of full-page and
non-full-page updates in the same record would result in out-of-order
updates. Moreover, it wouldn't work for situations where there's a
requirement to maintain lock on one page while updating another. Failure
to honor an update ordering constraint in this way is thought to be the
cause of bug #7648 from Daniel Farina: what seems to have happened there
is that a btree page being split was rewritten from a full-page image
before the new right sibling page was written, and because lock on the
original page was not maintained it was possible for hot standby queries to
try to traverse the page's right-link to the not-yet-existing sibling page.

To fix, get rid of RestoreBkpBlocks as such, and instead create a new
function RestoreBackupBlock that restores just one full-page image at a
time. This function can be invoked by WAL replay functions at the points
where they would otherwise perform non-full-page updates; in this way, the
physical order of page updates remains the same no matter which pages are
replaced by full-page images. We can then further adjust the logic in
individual replay functions if it is necessary to hold buffer locks
for overlapping periods. A side benefit is that we can simplify the
handling of concurrency conflict resolution by moving that code into the
record-type-specfic functions; there's no more need to contort the code
layout to keep conflict resolution in front of the RestoreBkpBlocks call.

In connection with that, standardize on zero-based numbering rather than
one-based numbering for referencing the full-page images. In HEAD, I
removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are
still there in the header files in previous branches, but are no longer
used by the code.

In addition, fix some other bugs identified in the course of making these
changes:

spgRedoAddNode could fail to update the parent downlink at all, if the
parent tuple is in the same page as either the old or new split tuple and
we're not doing a full-page image: it would get fooled by the LSN having
been advanced already. This would result in permanent index corruption,
not just transient failure of concurrent queries.

Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
tail page as a candidate for a full-page image; in the worst case this
could result in torn-page corruption.

heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
exclusive lock: it did the former in the normal path but the latter for a
full-page image. A plain exclusive lock seems sufficient, so change to
that.

Also, remove gistRedoPageDeleteRecord(), which has been dead code since
VACUUM FULL was rewritten.

Back-patch to 9.0, where hot standby was introduced. Note however that 9.0
had a significantly different WAL-logging scheme for GIST index updates,
and it doesn't appear possible to make that scheme safe for concurrent hot
standby queries, because it can leave inconsistent states in the index even
between WAL records. Given the lack of complaints from the field, we won't
work too hard on fixing that branch.

8805ff65

Nov 12, 2012

Check for stack overflow in transformSetOperationTree(). · 454edf1d

Tom Lane authored 12 years ago

Since transformSetOperationTree() recurses, it can be driven to stack
overflow with enough UNION/INTERSECT/EXCEPT clauses in a query. Add a
check to ensure it fails cleanly instead of crashing. Per report from
Matthew Gerber (though it's not clear whether this is the only thing
going wrong for him).

Historical note: I think the reasoning behind not putting a check here in
the beginning was that the check in transformExpr() ought to be sufficient
to guard the whole parser. However, because transformSetOperationTree()
recurses all the way to the bottom of the set-operation tree before doing
any analysis of the statement's expressions, that check doesn't save it.

454edf1d