- Dec 03, 2012
-
-
Tom Lane authored
-
Andrew Dunstan authored
This reverts commit 30248be9.
-
Simon Riggs authored
During VACUUM if we pause to perform a cycle of index cleanup we drop the vmbuffer pin, so we should do the same thing when heap scan completes. This avoids holding vmbuffer pin across the main index cleanup in VACUUM, which could be minutes or hours longer than necessary for correctness. Bug report and suggested fix from Pavan Deolasee
-
Tom Lane authored
Obviously, this returns type "path", but somebody made a copy-and-pasteo long ago. Dagfinn Ilmari Mannsåker
-
Peter Eisentraut authored
-
- Dec 02, 2012
-
-
Andrew Dunstan authored
Normally each module is tested in aq database named contrib_regression, which is dropped and recreated at the beginhning of each pg_regress run. This mode, enabled by adding USE_MODULE_DB=1 to the make command line, runs most modules in a database with the module name embedded in it. This will make testing pg_upgrade on clusters with the contrib modules a lot easier. Still to be done: adapt to the MSVC build system. Backpatch to 9.0, which is the earliest version it is reasonably possible to test upgrading from.
-
Tom Lane authored
DST law changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa, and portions of Brazil.
-
Tom Lane authored
We've generally recommended use of INSTEAD triggers over rules since that feature was added; but this old text in the CREATE VIEW reference page didn't get the memo. Noted by Thomas Kellerer.
-
Tom Lane authored
This reverts commit c1113069 in favor of actually fixing the problem: namely, that we should never have been modifying the checkpoint record's nextXid at this point to begin with. The nextXid should match the state as of the checkpoint's logical WAL position (ie the redo point), not the state as of its physical position. It's especially bogus to advance it in some wal_levels and not others. In any case there is no need for the checkpoint record to carry the same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by LogStandbySnapshot, as any replay operation will already have adopted that value as current. This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug #6291 from Daniel Farina, in that if a checkpoint were in progress at the instant of XID wraparound, the epoch bump would be lost as reported. (And, of course, these days there's at least a 50-50 chance of a checkpoint being in progress at any given instant.) Diagnosed by me and independently by Andres Freund. Back-patch to all branches supporting hot standby.
-
Simon Riggs authored
If wal_level = hot_standby we update the checkpoint nextxid, though in the case where a wraparound occurred half-way through a checkpoint we would neglect updating the epoch also. Updating the nextxid is arguably the wrong thing to do, but changing that may introduce subtle bugs into hot standby startup, while updating the value doesn't cause any known bugs yet. Minimal fix now to HEAD and backbranches, wider fix later in HEAD. Bug reported in #6291 by Daniel Farina and slightly differently in Cause analysis and recommended fixes from Tom Lane and Andres Freund. Applied patch is minimal version of Andres Freund's work.
-
Tatsuo Ishii authored
client encoding and the client encoding is not *safe* one. Such an example is, file encoding is UTF-8 and client encoding SJIS. Patch contributed by Jiang Guiqing.
-
- Dec 01, 2012
-
-
Tom Lane authored
When we do "make install" to create a temp installation, we don't want that instance of make to try to communicate with any instance of make that might be calling us. This is known to cause problems if the upper make has a -jN flag, and in principle could cause problems even without that. Unset the relevant environment variables to prevent such issues. Andres Freund
-
Tom Lane authored
The previous coding worked as long as MODULEDIR wasn't set explicitly, because we create sharedir/$(datamoduledir) and the default value of that is "extension". But if some other value is specified for MODULEDIR then the installation directory needed for the control file wasn't made. Cédric Villemain
-
Peter Eisentraut authored
-
- Nov 30, 2012
-
-
Alvaro Herrera authored
Commit 81107282 changed it in assert.c, but overlooked this other file.
-
Tom Lane authored
It's not safe to examine a shared buffer without any lock.
-
Tom Lane authored
If we had not been holding buffer pin continuously since the tuple was initially fetched by the UPDATE or DELETE query, it would be possible for VACUUM or a page-prune operation to move the tuple while we're trying to copy it. This would result in a garbage "old" tuple value being passed to an AFTER ROW UPDATE or AFTER ROW DELETE trigger. The preconditions for this are somewhat improbable, and the timing constraints are very tight; so it's not so surprising that this hasn't been reported from the field, even though the bug has been there a long time. Problem found by Andres Freund. Back-patch to all active branches.
-
Andrew Dunstan authored
This removes exisiting PG settings from the environment for pg_upgrade tests, just like pg_regress does.
-
Tom Lane authored
The length of a socket path name is constrained by the size of struct sockaddr_un, and there's not a lot we can do about it since that is a kernel API. However, it would be a good thing if we produced an intelligible error message when the user specifies a socket path that's too long --- and getaddrinfo's standard API is too impoverished to do this in the natural way. So insert explicit tests at the places where we construct a socket path name. Now you'll get an error that makes sense and even tells you what the limit is, rather than something generic like "Non-recoverable failure in name resolution". Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.
-
- Nov 29, 2012
-
-
Simon Riggs authored
Resolves bug 7572 reported by Daniele Varrazzo
-
Simon Riggs authored
-
Michael Meskes authored
array datatype which of course is wrong. Applied patch by Muhammad Usama <m.usama@gmail.com> to fix this.
-
Tom Lane authored
Commit 8cb53654, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, make the final state be indisvalid = true and indisready = false, which is otherwise nonsensical. This is pretty ugly but we can't add another column without forcing initdb, and it's too late for that in 9.2. (There's a cleaner fix in HEAD.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. Portions of this need to be back-patched even further, but I'll work on that separately. Problem reported by Amit Kapila, diagnosis by Pavan Deolasee, fix by Tom Lane and Andres Freund.
-
- Nov 28, 2012
-
-
Heikki Linnakangas authored
This was apparently a typo, which caused recovery to think that it immediately reached the end of backup, and allowed the database to start up too early. Reported by Jeff Janes. Backpatch to 9.2, where this code was introduced.
-
- Nov 26, 2012
-
-
Tom Lane authored
This reverts commit d573e239, "Take fewer snapshots". While that seemed like a good idea at the time, it caused execution to use a snapshot that had been acquired before locking any of the tables mentioned in the query. This created user-visible anomalies that were not present in any prior release of Postgres, as reported by Tomas Vondra. While this whole area could do with a redesign (since there are related cases that have anomalies anyway), it doesn't seem likely that any future patch would be reasonably back-patchable; and we don't want 9.2 to exhibit a behavior that's subtly unlike either past or future releases. Hence, revert to prior code while we rethink the problem.
-
Tom Lane authored
In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is pretty useless (there being only one output row), but nonetheless it shouldn't fail. But it could fail if "tab" is an inheritance parent, because planagg.c's code for fixing up equivalence classes after making the index-optimized MIN/MAX transformation wasn't prepared to find child-table versions of the aggregate expression. The least ugly fix seems to be to add an option to mutate_eclass_expressions() to skip child-table equivalence class members, which aren't used anymore at this stage of planning so it's not really necessary to fix them. Since child members are ignored in many cases already, it seems plausible for mutate_eclass_expressions() to have an option to ignore them too. Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the same code exists before that, it cannot encounter child-table aggregates AFAICS, because the index optimization transformation cannot succeed on inheritance trees before 9.1 (for lack of MergeAppend).
-
- Nov 23, 2012
-
-
Heikki Linnakangas authored
XLogRecPtrIsInvalid() only checks the xrecoff field, which is correct when checking if a WAL record could legally begin at the given position, but WAL sending can legally be paused at a page boundary, in which case xrecoff is 0. Use XLByteEQ(..., InvalidXLogRecPtr) instead, which checks that both xlogid and xrecoff are 0. 9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and 9.1 where pg_stat_replication view was introduced. Kyotaro HORIGUCHI, reviewed by Fujii Masao.
-
- Nov 22, 2012
-
-
Tom Lane authored
Since we've already chdir'd into the data directory, the file should be referenced as just "postmaster.pid", without prefixing the directory path. This is harmless in the normal case where an absolute PGDATA path is used, but quite dangerous if a relative path is specified, since the program might then fail to notice an active postmaster. Reported by Hari Babu. This got broken in my commit eb5949d1, so patch all active versions.
-
Heikki Linnakangas authored
When startup process opens a WAL segment after replaying part of it, it validates the first page on the WAL segment, even though the page it's really interested in later in the file. As part of the validation, it checks that the TLI on the page header is >= the TLI it saw on the last page it read. If the segment contains a timeline switch, and we have already replayed it, and then re-open the WAL segment (because of streaming replication got disconnected and reconnected, for example), the TLI check will fail when the first page is validated. Fix that by relaxing the TLI check when re-opening a WAL segment. Backpatch to 9.0. Earlier versions had the same code, but before standby mode was introduced in 9.0, recovery never tried to re-read a segment after partially replaying it. Reported by Amit Kapila, while testing a new feature.
-
- Nov 21, 2012
-
-
Tom Lane authored
Once we've received a shutdown signal (SIGINT or SIGTERM), we should not launch any more child processes, even if we get signals requesting such. The normal code path for spawning backends has always understood that, but the postmaster's infrastructure for hot standby and autovacuum didn't get the memo. As reported by Hari Babu in bug #7643, this could lead to failure to shut down at all in some cases, such as when SIGINT is received just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd launch a bgwriter and checkpointer, and then those processes would have no idea that they ought to quit. Similarly, launching a new autovacuum worker would result in waiting till it finished before shutting down. Also, switch the order of the code blocks in reaper() that detect startup process crash versus shutdown termination. Once we've sent it a signal, we should not consider that exit(1) is surprising. This is just a cosmetic fix since shutdown occurs correctly anyway, but better not to log a phony complaint about startup process crash. Back-patch to 9.0. Some parts of this might be applicable before that, but given the lack of prior complaints I'm not going to worry too much about older branches.
-
- Nov 20, 2012
-
-
Tom Lane authored
Some platforms throw an exception for this division, rather than returning a necessarily-overflowed result. Since we were testing for overflow after the fact, an exception isn't nice. We can avoid the problem by treating division by -1 as negation. Add some regression tests so that we'll find out if any compilers try to optimize away the overflow check conditions. Back-patch of commit 1f7cb5c3. Per discussion with Xi Wang, though this is different from the patch he submitted.
-
- Nov 18, 2012
-
-
Tom Lane authored
The previous definitions of these GUC variables allowed them to range up to INT_MAX, but in point of fact the underlying code would suffer overflows or other errors with large values. Reduce the maximum values to something that won't misbehave. There's no apparent value in working harder than this, since very large delays aren't sensible for any of these. (Note: the risk with archive_timeout is that if we're late checking the state, the timestamp difference it's being compared to might overflow. So we need some amount of slop; the choice of INT_MAX/2 is arbitrary.) Per followup investigation of bug #7670. Although this isn't a very significant fix, might as well back-patch.
-
Tom Lane authored
We need to avoid calling WaitLatch with timeouts exceeding INT_MAX. Fortunately a simple clamp will do the trick, since no harm is done if the wait times out before it's really time to rotate the log file. Per bug #7670 (probably bug #7545 is the same thing, too). In passing, fix bogus definition of log_rotation_age's maximum value in guc.c --- it was numerically right, but only because MINS_PER_HOUR and SECS_PER_MINUTE have the same value. Back-patch to 9.2. Before that, syslogger wasn't using WaitLatch.
-
- Nov 16, 2012
-
-
Tom Lane authored
Traditionally check_partial_indexes() has only looked at restriction clauses while trying to prove partial indexes usable in queries. However, join clauses can also be used in some cases; mainly, that a strict operator on "x" proves an "x IS NOT NULL" index predicate, even if the operator is in a join clause rather than a restriction clause. Adding this code fixes a regression in 9.2, because previously we would take join clauses into account when considering whether a partial index could be used in a nestloop inner indexscan path. 9.2 doesn't handle nestloop inner indexscans in the same way, and this consideration was overlooked in the rewrite. Moving the work to check_partial_indexes() is a better solution anyway, since the proof applies whether or not we actually use the index in that particular way, and we don't have to do it over again for each possible outer relation. Per report from Dave Cramer.
-
- Nov 14, 2012
-
-
Tom Lane authored
The correct answer for this (or any other case with arg2 = -1) is zero, but some machines throw a floating-point exception instead of behaving sanely. Commit f9ac414c dealt with this in int4mod, but overlooked the fact that it also happens in int8mod (at least on my Linux x86_64 machine). Protect int2mod as well; it's not clear whether any machines fail there (mine does not) but since the test is so cheap it seems better safe than sorry. While at it, simplify the original guard in int4mod: we need only check for arg2 == -1, we don't need to check arg1 explicitly. Xi Wang, with some editing by me.
-
- Nov 13, 2012
-
-
Tom Lane authored
record_out() leaks memory: it fails to free the strings returned by the per-column output functions, and also is careless about detoasted values. This results in a query-lifespan memory leakage when returning composite values to the client, because printtup() runs the output functions in the query-lifespan memory context. Fix it to handle these issues the same way printtup() does. Also fix a similar leakage in record_send(). (At some point we might want to try to run output functions in shorter-lived memory contexts, so that we don't need a zero-leakage policy for them. But that would be a significantly more invasive patch, which doesn't seem like material for back-patching.) In passing, use appendStringInfoCharMacro instead of appendStringInfoChar in the innermost data-copying loop of record_out, to try to shave a few cycles from this function's runtime. Per trouble report from Carlos Henrique Reimer. Back-patch to all supported versions.
-
Simon Riggs authored
At commit all standby locks are released for the top-level transaction, so searching for locks for each subtransaction is both pointless and costly (N^2) in the presence of many AccessExclusiveLocks.
-
Simon Riggs authored
Andres Freund and Simon Riggs
-
Tom Lane authored
Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
-
- Nov 12, 2012
-
-
Tom Lane authored
Since transformSetOperationTree() recurses, it can be driven to stack overflow with enough UNION/INTERSECT/EXCEPT clauses in a query. Add a check to ensure it fails cleanly instead of crashing. Per report from Matthew Gerber (though it's not clear whether this is the only thing going wrong for him). Historical note: I think the reasoning behind not putting a check here in the beginning was that the check in transformExpr() ought to be sufficient to guard the whole parser. However, because transformSetOperationTree() recurses all the way to the bottom of the set-operation tree before doing any analysis of the statement's expressions, that check doesn't save it.
-