- Nov 20, 2012
-
-
Tom Lane authored
Some platforms throw an exception for this division, rather than returning a necessarily-overflowed result. Since we were testing for overflow after the fact, an exception isn't nice. We can avoid the problem by treating division by -1 as negation. Add some regression tests so that we'll find out if any compilers try to optimize away the overflow check conditions. Back-patch of commit 1f7cb5c3. Per discussion with Xi Wang, though this is different from the patch he submitted.
-
- Nov 18, 2012
-
-
Tom Lane authored
The previous definitions of these GUC variables allowed them to range up to INT_MAX, but in point of fact the underlying code would suffer overflows or other errors with large values. Reduce the maximum values to something that won't misbehave. There's no apparent value in working harder than this, since very large delays aren't sensible for any of these. (Note: the risk with archive_timeout is that if we're late checking the state, the timestamp difference it's being compared to might overflow. So we need some amount of slop; the choice of INT_MAX/2 is arbitrary.) Per followup investigation of bug #7670. Although this isn't a very significant fix, might as well back-patch.
-
Tom Lane authored
We need to avoid calling WaitLatch with timeouts exceeding INT_MAX. Fortunately a simple clamp will do the trick, since no harm is done if the wait times out before it's really time to rotate the log file. Per bug #7670 (probably bug #7545 is the same thing, too). In passing, fix bogus definition of log_rotation_age's maximum value in guc.c --- it was numerically right, but only because MINS_PER_HOUR and SECS_PER_MINUTE have the same value. Back-patch to 9.2. Before that, syslogger wasn't using WaitLatch.
-
- Nov 16, 2012
-
-
Tom Lane authored
Traditionally check_partial_indexes() has only looked at restriction clauses while trying to prove partial indexes usable in queries. However, join clauses can also be used in some cases; mainly, that a strict operator on "x" proves an "x IS NOT NULL" index predicate, even if the operator is in a join clause rather than a restriction clause. Adding this code fixes a regression in 9.2, because previously we would take join clauses into account when considering whether a partial index could be used in a nestloop inner indexscan path. 9.2 doesn't handle nestloop inner indexscans in the same way, and this consideration was overlooked in the rewrite. Moving the work to check_partial_indexes() is a better solution anyway, since the proof applies whether or not we actually use the index in that particular way, and we don't have to do it over again for each possible outer relation. Per report from Dave Cramer.
-
- Nov 14, 2012
-
-
Tom Lane authored
The correct answer for this (or any other case with arg2 = -1) is zero, but some machines throw a floating-point exception instead of behaving sanely. Commit f9ac414c dealt with this in int4mod, but overlooked the fact that it also happens in int8mod (at least on my Linux x86_64 machine). Protect int2mod as well; it's not clear whether any machines fail there (mine does not) but since the test is so cheap it seems better safe than sorry. While at it, simplify the original guard in int4mod: we need only check for arg2 == -1, we don't need to check arg1 explicitly. Xi Wang, with some editing by me.
-
- Nov 13, 2012
-
-
Tom Lane authored
record_out() leaks memory: it fails to free the strings returned by the per-column output functions, and also is careless about detoasted values. This results in a query-lifespan memory leakage when returning composite values to the client, because printtup() runs the output functions in the query-lifespan memory context. Fix it to handle these issues the same way printtup() does. Also fix a similar leakage in record_send(). (At some point we might want to try to run output functions in shorter-lived memory contexts, so that we don't need a zero-leakage policy for them. But that would be a significantly more invasive patch, which doesn't seem like material for back-patching.) In passing, use appendStringInfoCharMacro instead of appendStringInfoChar in the innermost data-copying loop of record_out, to try to shave a few cycles from this function's runtime. Per trouble report from Carlos Henrique Reimer. Back-patch to all supported versions.
-
Simon Riggs authored
At commit all standby locks are released for the top-level transaction, so searching for locks for each subtransaction is both pointless and costly (N^2) in the presence of many AccessExclusiveLocks.
-
Simon Riggs authored
Andres Freund and Simon Riggs
-
Tom Lane authored
Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
-
- Nov 12, 2012
-
-
Tom Lane authored
Since transformSetOperationTree() recurses, it can be driven to stack overflow with enough UNION/INTERSECT/EXCEPT clauses in a query. Add a check to ensure it fails cleanly instead of crashing. Per report from Matthew Gerber (though it's not clear whether this is the only thing going wrong for him). Historical note: I think the reasoning behind not putting a check here in the beginning was that the check in transformExpr() ought to be sufficient to guard the whole parser. However, because transformSetOperationTree() recurses all the way to the bottom of the set-operation tree before doing any analysis of the statement's expressions, that check doesn't save it.
-
- Nov 09, 2012
-
-
Tom Lane authored
If the sleep is interrupted by a signal, we must recompute the remaining time to wait; otherwise, a steady stream of non-wait-terminating interrupts could delay return from WaitLatch indefinitely. This has been shown to be a problem for the autovacuum launcher, and there may well be other places now or in the future with similar issues. So we'd better make the function robust, even though this'll add at least one gettimeofday call per wait. Back-patch to 9.2. We might eventually need to fix 9.1 as well, but the code is quite different there, and the usage of WaitLatch in 9.1 is so limited that it's not clearly important to do so. Reported and diagnosed by Jeff Janes, though I rewrote his patch rather heavily.
-
- Nov 08, 2012
-
-
Tom Lane authored
The trigger and rule cases need to split up the input name list, but they mustn't corrupt the passed-in data structure, since it could be part of a cached utility-statement parsetree. Per bug #7641.
-
- Nov 07, 2012
-
-
Alvaro Herrera authored
Commit 4c9d0901 mistakenly introduced a call to TransferPredicateLocksToHeapRelation() on an index relation that had been closed a few lines above. Moving up an index_open() call that's below is enough to fix the problem. Discovered by me while testing an unrelated patch.
-
- Nov 05, 2012
-
-
Tom Lane authored
This case got broken in 8.4 by the addition of an error check that complains if ALTER TABLE ONLY is used on a table that has children. We do use ONLY for this situation, but it's okay because the necessary recursion occurs at a higher level. So we need to have a separate flag to suppress recursion without making the error check. Reported and patched by Pavan Deolasee, with some editorial adjustments by me. Back-patch to 8.4, since this is a regression of functionality that worked in earlier branches.
-
- Nov 02, 2012
-
- Nov 01, 2012
-
-
Tom Lane authored
In bug #7626, Brian Dunavant exposes a performance problem created by commit 3b8968f2: that commit attempted to consider *all* possible combinations of indexable join clauses, but if said clauses join to enough different relations, there's an exponential increase in the number of outer-relation sets considered. In Brian's example, all the clauses come from the same equivalence class, which means it's redundant to use more than one of them in an indexscan anyway. So we can prevent the problem in this class of cases (which is probably the majority of real examples) by rejecting combinations that would only serve to add a known-redundant clause. But that still leaves us exposed to exponential growth of planning time when the query has a lot of non-equivalence join clauses that are usable with the same index. I chose to prevent such cases by setting an upper limit on the number of relation sets considered, equal to ten times the number of index clauses considered so far. (This sliding limit still allows new relsets to be added on as we move to additional index columns, which is probably more important than considering even more combinations of clauses for the previous column.) This should keep the amount of work done roughly linear rather than exponential in the apparent query complexity. This part of the fix is pretty ad-hoc; but without a clearer idea of real-world cases for which this would result in markedly inferior plans, it's hard to see how to do better.
-
- Oct 31, 2012
-
-
Alvaro Herrera authored
In its original conception, it was leaving some objects into the old schema, but without their proper pg_depend entries; this meant that the old schema could be dropped, causing future pg_dump calls to fail on the affected database. This was originally reported by Jeff Frost as #6704; there have been other complaints elsewhere that can probably be traced to this bug. To fix, be more consistent about altering a table's subsidiary objects along the table itself; this requires some restructuring in how tables are relocated when altering an extension -- hence the new AlterTableNamespaceInternal routine which encapsulates it for both the ALTER TABLE and the ALTER EXTENSION cases. There was another bug lurking here, which was unmasked after fixing the previous one: certain objects would be reached twice via the dependency graph, and the second attempt to move them would cause the entire operation to fail. Per discussion, it seems the best fix for this is to do more careful tracking of objects already moved: we now maintain a list of moved objects, to avoid attempting to do it twice for the same object. Authors: Alvaro Herrera, Dimitri Fontaine Reviewed by Tom Lane
-
- Oct 26, 2012
-
-
Tom Lane authored
generate_base_implied_equalities_const() should prefer plain Consts over other em_is_const eclass members when choosing the "pivot" value that all the other members will be equated to. This makes it more likely that the generated equalities will be useful in constraint-exclusion proofs. Per report from Rushabh Lathia.
-
Tom Lane authored
Represent a sequence's current value as a separate TableDataInfo dumpable object, so that it can be dumped within the data section of the archive rather than in pre-data. This fixes an undesirable inconsistency between the meanings of "--data-only" and "--section=data", and also fixes dumping of sequences that are marked as extension configuration tables, as per a report from Marko Kreen back in July. The main cost is that we do one more SQL query per sequence, but that's probably not very meaningful in most databases. Back-patch to 9.1, since it has the extension configuration issue even though not the --section switch.
-
- Oct 24, 2012
-
-
Tom Lane authored
Views should not have any pg_attribute entries for system columns. However, we forgot to remove such entries when converting a table to a view. This could lead to crashes later on, if someone attempted to reference such a column, as reported by Kohei KaiGai. This problem is corrected properly in HEAD (by removing the pg_attribute entries during conversion), but in the back branches we need to defend against existing mis-converted views. This fix costs us an extra syscache lookup per system column reference, which is annoying but probably not really measurable in the big scheme of things.
-
- Oct 22, 2012
-
-
Kevin Grittner authored
For the non-concurrent case there is an AccessExclusiveLock lock on both the index and the heap at a time during which no other process is using either, before which the index is maintained and used for scans, and after which the index is no longer used or maintained. Predicate locks can safely be moved from the index to the related heap relation under the protection of these locks. This was done prior to the introductin of DROP INDEX CONCURRENTLY and continues to be done for non-concurrent index drops. For concurrent index drops, the predicate locks must be moved when there are no index scans in progress on that index and no more can subsequently start, and before heap inserts stop maintaining the index. As long as these conditions are guaranteed when the TransferPredicateLocksToHeapRelation() function is called, stronger locks are not needed for correctness. Kevin Grittner based on questions by Tom Lane in reviewing the DROP INDEX CONCURRENTLY patch and in cooperation with Andres Freund and Simon Riggs. Back-patch of commit 4c9d0901
-
- Oct 20, 2012
-
-
Tom Lane authored
In commit 4317e024, I accidentally broke this behavior while rearranging code to ensure that --create wouldn't affect whether a DATABASE entry gets put into archive-format output. Thus, 9.2 would issue a DROP DATABASE command in --clean mode, which is either useless or dangerous depending on the usage scenario. It should not do that, and no longer does. A bright spot is that this refactoring makes it easy to allow the combination of --clean and --create to work sensibly, ie, emit DROP DATABASE then CREATE DATABASE before reconnecting. Ordinarily we'd consider that a feature addition and not back-patch it, but it seems silly to not include the extra couple of lines required in the 9.2 version of the code. Per report from Guillaume Lelarge, though this is slightly more extensive than his proposed patch.
-
Tom Lane authored
The code seems to have been written to handle the pre-parse-analysis representation, where an ExecuteStmt would appear directly under CreateTableAsStmt. But in reality the function is only run on already-parse-analyzed statements, so there will be a Query node in between. We'd not noticed the bug because the function is generally not used at all except in extended query protocol. Per report from Robert Haas and Rushabh Lathia.
-
- Oct 19, 2012
-
-
Tom Lane authored
An out-of-memory error during expand_table() on a palloc-based hash table would leave a partially-initialized entry in the table. This would not be harmful for transient hash tables, since they'd get thrown away anyway at transaction abort. But for long-lived hash tables, such as the relcache hash, this would effectively corrupt the table, leading to crash or other misbehavior later. To fix, rearrange the order of operations so that table enlargement is attempted before we insert a new entry, rather than after adding it to the hash table. Problem discovered by Hitoshi Harada, though this is a bit different from his proposed patch.
-
Tom Lane authored
Per bug #7615 from Marko Tiikkaja. Apparently nobody ever tried this case before ...
-
Simon Riggs authored
Canceling DROP INDEX CONCURRENTLY during wait could allow an orphaned index to be left behind which could not be dropped. Backpatch to 9.2 Andres Freund, tested by Abhijit Menon-Sen
-
- Oct 18, 2012
-
-
Heikki Linnakangas authored
Don't leak a file descriptor if the file is empty or we can't read its size. Expect there to be a newline at the end of the last line, too. If there isn't, ignore anything after the last newline. This makes it a tiny bit more robust in case the file is appended to concurrently, so that we don't return the last line if it hasn't been fully written yet. And this makes the code a bit less obscure, anyway. Per Tom Lane's suggestion. Backpatch to all supported branches.
-
Simon Riggs authored
for recent concurrent changes. Abhijit Menon-Sen
-
Simon Riggs authored
Concurrent behaviour was flawed when using a two-step process, so add an additional phase of processing to ensure concurrency for both SELECTs and INSERT/UPDATE/DELETEs. Backpatch to 9.2 Andres Freund, tweaked by me
-
Tom Lane authored
If a potential equivalence clause references a variable from the nullable side of an outer join, the planner needs to take care that derived clauses are not pushed to below the outer join; else they may use the wrong value for the variable. (The problem arises only with non-strict clauses, since if an upper clause can be proven strict then the outer join will get simplified to a plain join.) The planner attempted to prevent this type of error by checking that potential equivalence clauses aren't outerjoin-delayed as a whole, but actually we have to check each side separately, since the two sides of the clause will get moved around separately if it's treated as an equivalence. Bugs of this type can be demonstrated as far back as 7.4, even though releases before 8.3 had only a very ad-hoc notion of equivalence clauses. In addition, we neglected to account for the possibility that such clauses might have nonempty nullable_relids even when not outerjoin-delayed; so the equivalence-class machinery lacked logic to compute correct nullable_relids values for clauses it constructs. This oversight was harmless before 9.2 because we were only using RestrictInfo.nullable_relids for OR clauses; but as of 9.2 it could result in pushing constructed equivalence clauses to incorrect places. (This accounts for bug #7604 from Bill MacArthur.) Fix the first problem by adding a new test check_equivalence_delay() in distribute_qual_to_rels, and fix the second one by adding code in equivclass.c and called functions to set correct nullable_relids for generated clauses. Although I believe the second part of this is not currently necessary before 9.2, I chose to back-patch it anyway, partly to keep the logic similar across branches and partly because it seems possible we might find other reasons why we need valid values of nullable_relids in the older branches. Add regression tests illustrating these problems. In 9.0 and up, also add test cases checking that we can push constants through outer joins, since we've broken that optimization before and I nearly broke it again with an overly simplistic patch for this problem.
-
Simon Riggs authored
-
Simon Riggs authored
Backpatch to 9.2 to ensure bugs are fixed. Abhijit Menon-Sen
-
- Oct 17, 2012
-
-
Tom Lane authored
If an SMgrRelation is not "owned" by a relcache entry, don't allow it to live past transaction end. This design allows the same SMgrRelation to be used for blind writes of multiple blocks during a transaction, but ensures that we don't hold onto such an SMgrRelation indefinitely. Because an SMgrRelation typically corresponds to open file descriptors at the fd.c level, leaving it open when there's no corresponding relcache entry can mean that we prevent the kernel from reclaiming deleted disk space. (While CacheInvalidateSmgr messages usually fix that, there are cases where they're not issued, such as DROP DATABASE. We might want to add some more sinval messaging for that, but I'd be inclined to keep this type of logic anyway, since allowing VFDs to accumulate indefinitely for blind-written relations doesn't seem like a good idea.) This code replaces a previous attempt towards the same goal that proved to be unreliable. Back-patch to 9.1 where the previous patch was added.
-
Tom Lane authored
This reverts commit fba105b1. That approach had problems with the smgr-level state not tracking what we really want to happen, and with the VFD-level state not tracking the smgr-level state very well either. In consequence, it was still possible to hold kernel file descriptors open for long-gone tables (as in recent report from Tore Halset), and yet there were also cases of FDs being closed undesirably soon. A replacement implementation will follow.
-
- Oct 15, 2012
-
-
Heikki Linnakangas authored
If postmaster changed postmaster.pid while pg_ctl was reading it, pg_ctl could overrun the buffer it allocated for the file. Fix by reading the whole file to memory with one read() call. initdb contains an identical copy of the readfile() function, but the files that initdb reads are static, not modified concurrently. Nevertheless, add a simple bounds-check there, if only to silence static analysis tools. Per report from Dave Vitek. Backpatch to all supported branches.
-
Tom Lane authored
In the previous coding, new backend processes would attempt to create their self-pipe during the OwnLatch call in InitProcess. However, pipe creation could fail if the kernel is short of resources; and the system does not recover gracefully from a FATAL error right there, since we have armed the dead-man switch for this process and not yet set up the on_shmem_exit callback that would disarm it. The postmaster then forces an unnecessary database-wide crash and restart, as reported by Sean Chittenden. There are various ways we could rearrange the code to fix this, but the simplest and sanest seems to be to split out creation of the self-pipe into a new function InitializeLatchSupport, which must be called from a place where failure is allowed. For most processes that gets called in InitProcess or InitAuxiliaryProcess, but processes that don't call either but still use latches need their own calls. Back-patch to 9.1, which has only a part of the latch logic that 9.2 and HEAD have, but nonetheless includes this bug.
-
- Oct 12, 2012
-
-
Tom Lane authored
This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.
-
- Oct 11, 2012
-
-
Tom Lane authored
When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)", findPartialMatch() attempted to match rows using the hashtable's internal equality operators, which of course are for x and y's datatypes. What we need to use are the potentially cross-type operators for a=x, b=y, etc. Failure to do that leads to wrong answers or even crashes. The scope for problems is limited to cases where we have different types with compatible hash functions (else we'd not be using a hashed subplan), but for example int4 vs int8 can cause the problem. Per bug #7597 from Bo Jensen. This has been wrong since the hashed-subplan code was written, so patch all the way back.
-
- Oct 10, 2012
-
-
Tom Lane authored
Building a shlib on AIX requires use of the mkldexport.sh script, but we failed to install that, preventing its use from non-source-tree contexts. Also, Makefile.aix had the wrong idea about where to find the installed copy of the postgres.imp symbol file used by AIX. Per report from John Pierce. Patch all the way back, since this has been broken since the beginning of PGXS.
-
- Oct 09, 2012
-
-
Tom Lane authored
I found that these functions tend to return -1 while leaving an empty error message string in the PGconn, if they suffer some kind of I/O error on the file. The reason is that lo_close, which thinks it's executed a perfectly fine SQL command, clears the errorMessage. The minimum-change workaround is to reorder operations here so that we don't fill the errorMessage until after lo_close.
-