Skip to content
Snippets Groups Projects
  1. Dec 03, 2012
  2. Dec 02, 2012
    • Andrew Dunstan's avatar
      Add mode where contrib installcheck runs each module in a separately named database. · 30248be9
      Andrew Dunstan authored
      Normally each module is tested in aq database named contrib_regression,
      which is dropped and recreated at the beginhning of each pg_regress run.
      This mode, enabled by adding USE_MODULE_DB=1 to the make command line,
      runs most modules in a database with the module name embedded in it.
      
      This will make testing pg_upgrade on clusters with the contrib modules
      a lot easier.
      
      Still to be done: adapt to the MSVC build system.
      
      Backpatch to 9.0, which is the earliest version it is reasonably
      possible to test upgrading from.
      30248be9
    • Tom Lane's avatar
      Update time zone data files to tzdata release 2012j. · 842cb63f
      Tom Lane authored
      DST law changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa,
      and portions of Brazil.
      842cb63f
    • Tom Lane's avatar
      Recommend triggers, not rules, in the CREATE VIEW reference page. · f7968554
      Tom Lane authored
      We've generally recommended use of INSTEAD triggers over rules since that
      feature was added; but this old text in the CREATE VIEW reference page
      didn't get the memo.  Noted by Thomas Kellerer.
      f7968554
    • Tom Lane's avatar
      Don't advance checkPoint.nextXid near the end of a checkpoint sequence. · aaceb0d6
      Tom Lane authored
      This reverts commit c1113069 in favor of
      actually fixing the problem: namely, that we should never have been
      modifying the checkpoint record's nextXid at this point to begin with.
      The nextXid should match the state as of the checkpoint's logical WAL
      position (ie the redo point), not the state as of its physical position.
      It's especially bogus to advance it in some wal_levels and not others.
      In any case there is no need for the checkpoint record to carry the
      same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by
      LogStandbySnapshot, as any replay operation will already have adopted
      that value as current.
      
      This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug
      #6291 from Daniel Farina, in that if a checkpoint were in progress at the
      instant of XID wraparound, the epoch bump would be lost as reported.
      (And, of course, these days there's at least a 50-50 chance of a checkpoint
      being in progress at any given instant.)
      
      Diagnosed by me and independently by Andres Freund.  Back-patch to all
      branches supporting hot standby.
      aaceb0d6
    • Simon Riggs's avatar
      XidEpoch++ if wraparound during checkpoint. · c3f64ade
      Simon Riggs authored
      If wal_level = hot_standby we update the checkpoint nextxid,
      though in the case where a wraparound occurred half-way through
      a checkpoint we would neglect updating the epoch also. Updating
      the nextxid is arguably the wrong thing to do, but changing that
      may introduce subtle bugs into hot standby startup, while updating
      the value doesn't cause any known bugs yet. Minimal fix now to
      HEAD and backbranches, wider fix later in HEAD.
      
      Bug reported in #6291 by Daniel Farina and slightly differently in
      
      Cause analysis and recommended fixes from Tom Lane and Andres Freund.
      
      Applied patch is minimal version of Andres Freund's work.
      c3f64ade
    • Tatsuo Ishii's avatar
      Fix psql crash while parsing SQL file whose encoding is different from · 530cbf6b
      Tatsuo Ishii authored
      client encoding and the client encoding is not *safe* one. Such an
      example is, file encoding is UTF-8 and client encoding SJIS. Patch
      contributed by Jiang Guiqing.
      530cbf6b
  3. Dec 01, 2012
  4. Nov 30, 2012
    • Alvaro Herrera's avatar
      Change test ExceptionalCondition to return void · f18c9510
      Alvaro Herrera authored
      Commit 81107282 changed it in assert.c, but overlooked this other file.
      f18c9510
    • Tom Lane's avatar
      Take buffer lock while inspecting btree index pages in contrib/pageinspect. · 0adcaeef
      Tom Lane authored
      It's not safe to examine a shared buffer without any lock.
      0adcaeef
    • Tom Lane's avatar
      Add missing buffer lock acquisition in GetTupleForTrigger(). · 1c316e3a
      Tom Lane authored
      If we had not been holding buffer pin continuously since the tuple was
      initially fetched by the UPDATE or DELETE query, it would be possible for
      VACUUM or a page-prune operation to move the tuple while we're trying to
      copy it.  This would result in a garbage "old" tuple value being passed to
      an AFTER ROW UPDATE or AFTER ROW DELETE trigger.  The preconditions for
      this are somewhat improbable, and the timing constraints are very tight;
      so it's not so surprising that this hasn't been reported from the field,
      even though the bug has been there a long time.
      
      Problem found by Andres Freund.  Back-patch to all active branches.
      1c316e3a
    • Andrew Dunstan's avatar
      Clean environment for pg_upgrade test. · 2c55189b
      Andrew Dunstan authored
      This removes exisiting PG settings from the environment for
      pg_upgrade tests, just like pg_regress does.
      2c55189b
    • Tom Lane's avatar
      Produce a more useful error message for over-length Unix socket paths. · a8e60f0b
      Tom Lane authored
      The length of a socket path name is constrained by the size of struct
      sockaddr_un, and there's not a lot we can do about it since that is a
      kernel API.  However, it would be a good thing if we produced an
      intelligible error message when the user specifies a socket path that's too
      long --- and getaddrinfo's standard API is too impoverished to do this in
      the natural way.  So insert explicit tests at the places where we construct
      a socket path name.  Now you'll get an error that makes sense and even
      tells you what the limit is, rather than something generic like
      "Non-recoverable failure in name resolution".
      
      Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.
      a8e60f0b
  5. Nov 29, 2012
    • Simon Riggs's avatar
      Cleanup VirtualXact at end of Hot Standby · edfc84b8
      Simon Riggs authored
      Resolves bug 7572 reported by Daniele Varrazzo
      edfc84b8
    • Simon Riggs's avatar
      Correctly init fast path fields on PGPROC · fdac4e2b
      Simon Riggs authored
      fdac4e2b
    • Michael Meskes's avatar
      When processing nested structure pointer variables ecpg always expected an · 44fe8ae9
      Michael Meskes authored
      array datatype which of course is wrong.
      
      Applied patch by Muhammad Usama <m.usama@gmail.com> to fix this.
      44fe8ae9
    • Tom Lane's avatar
      Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. · 94c014b5
      Tom Lane authored
      Commit 8cb53654, which introduced DROP
      INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor
      choice of catalog state representation.  The pg_index state for an index
      that's reached the final pre-drop stage was the same as the state for an
      index just created by CREATE INDEX CONCURRENTLY.  This meant that the
      (necessary) change to make RelationGetIndexList ignore about-to-die indexes
      also made it ignore freshly-created indexes; which is catastrophic because
      the latter do need to be considered in HOT-safety decisions.  Failure to
      do so leads to incorrect index entries and subsequently wrong results from
      queries depending on the concurrently-created index.
      
      To fix, make the final state be indisvalid = true and indisready = false,
      which is otherwise nonsensical.  This is pretty ugly but we can't add
      another column without forcing initdb, and it's too late for that in 9.2.
      (There's a cleaner fix in HEAD.)
      
      In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index
      flag changes they make without exclusive lock on the index are made via
      heap_inplace_update() rather than a normal transactional update.  The
      latter is not very safe because moving the pg_index tuple could result in
      concurrent SnapshotNow scans finding it twice or not at all, thus possibly
      resulting in index corruption.  This is a pre-existing bug in CREATE INDEX
      CONCURRENTLY, which was copied into the DROP code.
      
      In addition, fix various places in the code that ought to check to make
      sure that the indexes they are manipulating are valid and/or ready as
      appropriate.  These represent bugs that have existed since 8.2, since
      a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid
      index behind, and we ought not try to do anything that might fail with
      such an index.
      
      Also fix RelationReloadIndexInfo to ensure it copies all the pg_index
      columns that are allowed to change after initial creation.  Previously we
      could have been left with stale values of some fields in an index relcache
      entry.  It's not clear whether this actually had any user-visible
      consequences, but it's at least a bug waiting to happen.
      
      In addition, do some code and docs review for DROP INDEX CONCURRENTLY;
      some cosmetic code cleanup but mostly addition and revision of comments.
      
      Portions of this need to be back-patched even further, but I'll work
      on that separately.
      
      Problem reported by Amit Kapila, diagnosis by Pavan Deolasee,
      fix by Tom Lane and Andres Freund.
      94c014b5
  6. Nov 28, 2012
  7. Nov 26, 2012
    • Tom Lane's avatar
      Revert patch for taking fewer snapshots. · 786afc1c
      Tom Lane authored
      This reverts commit d573e239, "Take fewer
      snapshots".  While that seemed like a good idea at the time, it caused
      execution to use a snapshot that had been acquired before locking any of
      the tables mentioned in the query.  This created user-visible anomalies
      that were not present in any prior release of Postgres, as reported by
      Tomas Vondra.  While this whole area could do with a redesign (since there
      are related cases that have anomalies anyway), it doesn't seem likely that
      any future patch would be reasonably back-patchable; and we don't want 9.2
      to exhibit a behavior that's subtly unlike either past or future releases.
      Hence, revert to prior code while we rethink the problem.
      786afc1c
    • Tom Lane's avatar
      Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. · eea6ada9
      Tom Lane authored
      In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is
      pretty useless (there being only one output row), but nonetheless it
      shouldn't fail.  But it could fail if "tab" is an inheritance parent,
      because planagg.c's code for fixing up equivalence classes after making the
      index-optimized MIN/MAX transformation wasn't prepared to find child-table
      versions of the aggregate expression.  The least ugly fix seems to be
      to add an option to mutate_eclass_expressions() to skip child-table
      equivalence class members, which aren't used anymore at this stage of
      planning so it's not really necessary to fix them.  Since child members
      are ignored in many cases already, it seems plausible for
      mutate_eclass_expressions() to have an option to ignore them too.
      
      Per bug #7703 from Maxim Boguk.
      
      Back-patch to 9.1.  Although the same code exists before that, it cannot
      encounter child-table aggregates AFAICS, because the index optimization
      transformation cannot succeed on inheritance trees before 9.1 (for lack
      of MergeAppend).
      eea6ada9
  8. Nov 23, 2012
    • Heikki Linnakangas's avatar
      pg_stat_replication.sync_state was displayed incorrectly at page boundary. · 3f7b04d6
      Heikki Linnakangas authored
      XLogRecPtrIsInvalid() only checks the xrecoff field, which is correct when
      checking if a WAL record could legally begin at the given position, but WAL
      sending can legally be paused at a page boundary, in which case xrecoff is
      0. Use XLByteEQ(..., InvalidXLogRecPtr) instead, which checks that both
      xlogid and xrecoff are 0.
      
      9.3 doesn't have this problem because XLogRecPtr is now a single 64-bit
      integer, so XLogRecPtrIsInvalid() does the right thing. Apply to 9.2, and
      9.1 where pg_stat_replication view was introduced.
      
      Kyotaro HORIGUCHI, reviewed by Fujii Masao.
      3f7b04d6
  9. Nov 22, 2012
    • Tom Lane's avatar
      Fix pg_resetxlog to use correct path to postmaster.pid. · 430b47f3
      Tom Lane authored
      Since we've already chdir'd into the data directory, the file should
      be referenced as just "postmaster.pid", without prefixing the directory
      path.  This is harmless in the normal case where an absolute PGDATA path
      is used, but quite dangerous if a relative path is specified, since the
      program might then fail to notice an active postmaster.
      
      Reported by Hari Babu.  This got broken in my commit
      eb5949d1, so patch all active versions.
      430b47f3
    • Heikki Linnakangas's avatar
      Avoid bogus "out-of-sequence timeline ID" errors in standby-mode. · dda8b87b
      Heikki Linnakangas authored
      When startup process opens a WAL segment after replaying part of it, it
      validates the first page on the WAL segment, even though the page it's
      really interested in later in the file. As part of the validation, it checks
      that the TLI on the page header is >= the TLI it saw on the last page it
      read. If the segment contains a timeline switch, and we have already
      replayed it, and then re-open the WAL segment (because of streaming
      replication got disconnected and reconnected, for example), the TLI check
      will fail when the first page is validated. Fix that by relaxing the TLI
      check when re-opening a WAL segment.
      
      Backpatch to 9.0. Earlier versions had the same code, but before standby
      mode was introduced in 9.0, recovery never tried to re-read a segment after
      partially replaying it.
      
      Reported by Amit Kapila, while testing a new feature.
      dda8b87b
  10. Nov 21, 2012
    • Tom Lane's avatar
      Don't launch new child processes after we've been told to shut down. · 8af60da9
      Tom Lane authored
      Once we've received a shutdown signal (SIGINT or SIGTERM), we should not
      launch any more child processes, even if we get signals requesting such.
      The normal code path for spawning backends has always understood that,
      but the postmaster's infrastructure for hot standby and autovacuum didn't
      get the memo.  As reported by Hari Babu in bug #7643, this could lead to
      failure to shut down at all in some cases, such as when SIGINT is received
      just before the startup process sends PMSIGNAL_RECOVERY_STARTED: we'd
      launch a bgwriter and checkpointer, and then those processes would have no
      idea that they ought to quit.  Similarly, launching a new autovacuum worker
      would result in waiting till it finished before shutting down.
      
      Also, switch the order of the code blocks in reaper() that detect startup
      process crash versus shutdown termination.  Once we've sent it a signal,
      we should not consider that exit(1) is surprising.  This is just a cosmetic
      fix since shutdown occurs correctly anyway, but better not to log a phony
      complaint about startup process crash.
      
      Back-patch to 9.0.  Some parts of this might be applicable before that,
      but given the lack of prior complaints I'm not going to worry too much
      about older branches.
      8af60da9
  11. Nov 20, 2012
    • Tom Lane's avatar
      Improve handling of INT_MIN / -1 and related cases. · 278b6059
      Tom Lane authored
      Some platforms throw an exception for this division, rather than returning
      a necessarily-overflowed result.  Since we were testing for overflow after
      the fact, an exception isn't nice.  We can avoid the problem by treating
      division by -1 as negation.
      
      Add some regression tests so that we'll find out if any compilers try to
      optimize away the overflow check conditions.
      
      Back-patch of commit 1f7cb5c3.
      
      Per discussion with Xi Wang, though this is different from the patch he
      submitted.
      278b6059
  12. Nov 18, 2012
    • Tom Lane's avatar
      Limit values of archive_timeout, post_auth_delay, auth_delay.milliseconds. · 83d48a81
      Tom Lane authored
      The previous definitions of these GUC variables allowed them to range
      up to INT_MAX, but in point of fact the underlying code would suffer
      overflows or other errors with large values.  Reduce the maximum values
      to something that won't misbehave.  There's no apparent value in working
      harder than this, since very large delays aren't sensible for any of
      these.  (Note: the risk with archive_timeout is that if we're late
      checking the state, the timestamp difference it's being compared to
      might overflow.  So we need some amount of slop; the choice of INT_MAX/2
      is arbitrary.)
      
      Per followup investigation of bug #7670.  Although this isn't a very
      significant fix, might as well back-patch.
      83d48a81
    • Tom Lane's avatar
      Fix syslogger to not fail when log_rotation_age exceeds 2^31 milliseconds. · 89067bc1
      Tom Lane authored
      We need to avoid calling WaitLatch with timeouts exceeding INT_MAX.
      Fortunately a simple clamp will do the trick, since no harm is done if
      the wait times out before it's really time to rotate the log file.
      Per bug #7670 (probably bug #7545 is the same thing, too).
      
      In passing, fix bogus definition of log_rotation_age's maximum value in
      guc.c --- it was numerically right, but only because MINS_PER_HOUR and
      SECS_PER_MINUTE have the same value.
      
      Back-patch to 9.2.  Before that, syslogger wasn't using WaitLatch.
      89067bc1
  13. Nov 16, 2012
    • Tom Lane's avatar
      Improve check_partial_indexes() to consider join clauses in proof attempts. · f0461cd8
      Tom Lane authored
      Traditionally check_partial_indexes() has only looked at restriction
      clauses while trying to prove partial indexes usable in queries.  However,
      join clauses can also be used in some cases; mainly, that a strict operator
      on "x" proves an "x IS NOT NULL" index predicate, even if the operator is
      in a join clause rather than a restriction clause.  Adding this code fixes
      a regression in 9.2, because previously we would take join clauses into
      account when considering whether a partial index could be used in a
      nestloop inner indexscan path.  9.2 doesn't handle nestloop inner
      indexscans in the same way, and this consideration was overlooked in the
      rewrite.  Moving the work to check_partial_indexes() is a better solution
      anyway, since the proof applies whether or not we actually use the index
      in that particular way, and we don't have to do it over again for each
      possible outer relation.  Per report from Dave Cramer.
      f0461cd8
  14. Nov 14, 2012
    • Tom Lane's avatar
      Fix the int8 and int2 cases of (minimum possible integer) % (-1). · 3b4db79e
      Tom Lane authored
      The correct answer for this (or any other case with arg2 = -1) is zero,
      but some machines throw a floating-point exception instead of behaving
      sanely.  Commit f9ac414c dealt with this
      in int4mod, but overlooked the fact that it also happens in int8mod
      (at least on my Linux x86_64 machine).  Protect int2mod as well; it's
      not clear whether any machines fail there (mine does not) but since the
      test is so cheap it seems better safe than sorry.  While at it, simplify
      the original guard in int4mod: we need only check for arg2 == -1, we
      don't need to check arg1 explicitly.
      
      Xi Wang, with some editing by me.
      3b4db79e
  15. Nov 13, 2012
    • Tom Lane's avatar
      Fix memory leaks in record_out() and record_send(). · 5355e39c
      Tom Lane authored
      record_out() leaks memory: it fails to free the strings returned by the
      per-column output functions, and also is careless about detoasted values.
      This results in a query-lifespan memory leakage when returning composite
      values to the client, because printtup() runs the output functions in the
      query-lifespan memory context.  Fix it to handle these issues the same way
      printtup() does.  Also fix a similar leakage in record_send().
      
      (At some point we might want to try to run output functions in
      shorter-lived memory contexts, so that we don't need a zero-leakage policy
      for them.  But that would be a significantly more invasive patch, which
      doesn't seem like material for back-patching.)
      
      In passing, use appendStringInfoCharMacro instead of appendStringInfoChar
      in the innermost data-copying loop of record_out, to try to shave a few
      cycles from this function's runtime.
      
      Per trouble report from Carlos Henrique Reimer.  Back-patch to all
      supported versions.
      5355e39c
    • Simon Riggs's avatar
      Skip searching for subxact locks at commit. · 31541778
      Simon Riggs authored
      At commit all standby locks are released
      for the top-level transaction, so searching
      for locks for each subtransaction is both
      pointless and costly (N^2) in the presence
      of many AccessExclusiveLocks.
      31541778
    • Simon Riggs's avatar
      Clarify docs on hot standby lock release · f66e7ab6
      Simon Riggs authored
      Andres Freund and Simon Riggs
      f66e7ab6
    • Tom Lane's avatar
      Fix multiple problems in WAL replay. · 8805ff65
      Tom Lane authored
      Most of the replay functions for WAL record types that modify more than
      one page failed to ensure that those pages were locked correctly to ensure
      that concurrent queries could not see inconsistent page states.  This is
      a hangover from coding decisions made long before Hot Standby was added,
      when it was hardly necessary to acquire buffer locks during WAL replay
      at all, let alone hold them for carefully-chosen periods.
      
      The key problem was that RestoreBkpBlocks was written to hold lock on each
      page restored from a full-page image for only as long as it took to update
      that page.  This was guaranteed to break any WAL replay function in which
      there was any update-ordering constraint between pages, because even if the
      nominal order of the pages is the right one, any mixture of full-page and
      non-full-page updates in the same record would result in out-of-order
      updates.  Moreover, it wouldn't work for situations where there's a
      requirement to maintain lock on one page while updating another.  Failure
      to honor an update ordering constraint in this way is thought to be the
      cause of bug #7648 from Daniel Farina: what seems to have happened there
      is that a btree page being split was rewritten from a full-page image
      before the new right sibling page was written, and because lock on the
      original page was not maintained it was possible for hot standby queries to
      try to traverse the page's right-link to the not-yet-existing sibling page.
      
      To fix, get rid of RestoreBkpBlocks as such, and instead create a new
      function RestoreBackupBlock that restores just one full-page image at a
      time.  This function can be invoked by WAL replay functions at the points
      where they would otherwise perform non-full-page updates; in this way, the
      physical order of page updates remains the same no matter which pages are
      replaced by full-page images.  We can then further adjust the logic in
      individual replay functions if it is necessary to hold buffer locks
      for overlapping periods.  A side benefit is that we can simplify the
      handling of concurrency conflict resolution by moving that code into the
      record-type-specfic functions; there's no more need to contort the code
      layout to keep conflict resolution in front of the RestoreBkpBlocks call.
      
      In connection with that, standardize on zero-based numbering rather than
      one-based numbering for referencing the full-page images.  In HEAD, I
      removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4.  They are
      still there in the header files in previous branches, but are no longer
      used by the code.
      
      In addition, fix some other bugs identified in the course of making these
      changes:
      
      spgRedoAddNode could fail to update the parent downlink at all, if the
      parent tuple is in the same page as either the old or new split tuple and
      we're not doing a full-page image: it would get fooled by the LSN having
      been advanced already.  This would result in permanent index corruption,
      not just transient failure of concurrent queries.
      
      Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old
      tail page as a candidate for a full-page image; in the worst case this
      could result in torn-page corruption.
      
      heap_xlog_freeze() was inconsistent about using a cleanup lock or plain
      exclusive lock: it did the former in the normal path but the latter for a
      full-page image.  A plain exclusive lock seems sufficient, so change to
      that.
      
      Also, remove gistRedoPageDeleteRecord(), which has been dead code since
      VACUUM FULL was rewritten.
      
      Back-patch to 9.0, where hot standby was introduced.  Note however that 9.0
      had a significantly different WAL-logging scheme for GIST index updates,
      and it doesn't appear possible to make that scheme safe for concurrent hot
      standby queries, because it can leave inconsistent states in the index even
      between WAL records.  Given the lack of complaints from the field, we won't
      work too hard on fixing that branch.
      8805ff65
  16. Nov 12, 2012
    • Tom Lane's avatar
      Check for stack overflow in transformSetOperationTree(). · 454edf1d
      Tom Lane authored
      Since transformSetOperationTree() recurses, it can be driven to stack
      overflow with enough UNION/INTERSECT/EXCEPT clauses in a query.  Add a
      check to ensure it fails cleanly instead of crashing.  Per report from
      Matthew Gerber (though it's not clear whether this is the only thing
      going wrong for him).
      
      Historical note: I think the reasoning behind not putting a check here in
      the beginning was that the check in transformExpr() ought to be sufficient
      to guard the whole parser.  However, because transformSetOperationTree()
      recurses all the way to the bottom of the set-operation tree before doing
      any analysis of the statement's expressions, that check doesn't save it.
      454edf1d
Loading