  1. Jan 10, 2018
    • Fix sample INSTR() functions in the plpgsql documentation. · 08adf688
      Tom Lane authored
      These functions are stated to be Oracle-compatible, but they weren't.
      Yugo Nagata noticed that while our code returns zero for a zero or
      negative fourth parameter (occur_index), Oracle throws an error.
      Further testing by me showed that there was also a discrepancy in the
      interpretation of a negative third parameter (beg_index): Oracle thinks
      that a negative beg_index indicates the last place where the target
      substring can *begin*, whereas our code thinks it is the last place
      where the target can *end*.
      
      Adjust the sample code to behave like Oracle in both these respects.
      Also change it to be a CDATA[] section, simplifying copying-and-pasting
      out of the documentation source file.  And fix minor problems in the
      introductory comment, which wasn't very complete or accurate.
      
      Back-patch to all supported branches.  Although this patch only touches
      documentation, we should probably call it out as a bug fix in the next
      minor release notes, since users who have adopted the functions will
      likely want to update their versions.
      
      Yugo Nagata and Tom Lane
      
      Discussion: https://postgr.es/m/20171229191705.c0b43a8c.nagata@sraoss.co.jp
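The Oracle behaviors described above can be sketched in Python (this is an illustrative stand-in, not the plpgsql sample from the docs; the function name, signature, and edge-case handling are assumptions):

```python
def instr(string, sub, beg_index=1, occur_index=1):
    """Sketch of Oracle-style INSTR semantics (1-based positions).

    Returns 0 when the occurrence is not found; raises an error for a
    non-positive occur_index, as Oracle does.
    """
    if occur_index <= 0:
        raise ValueError("occur_index must be positive")  # Oracle errors here
    if beg_index == 0:
        return 0
    if beg_index > 0:
        idx = beg_index - 1                 # 0-based position to search from
        for _ in range(occur_index):
            idx = string.find(sub, idx)
            if idx < 0:
                return 0
            idx += 1                        # 1-based result; also next start
        return idx
    # Negative beg_index: per Oracle, this caps where the match may *begin*,
    # counting backward from the end of the string.
    cap = len(string) + beg_index           # highest allowed 0-based start
    for _ in range(occur_index):
        if cap < 0:
            return 0
        idx = string.rfind(sub, 0, cap + len(sub))
        if idx < 0:
            return 0
        cap = idx - 1                       # next occurrence must begin earlier
    return idx + 1
```

For example, `instr('abcabcabc', 'bc', -3, 1)` returns 2 under the "last place the match can *begin*" rule, whereas a "last place it can *end*" reading would have returned 5.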
    • Remove dubious micro-optimization in ckpt_buforder_comparator(). · 7eb0187a
      Tom Lane authored
      It seems incorrect to assume that the list of CkptSortItems can never
      contain duplicate page numbers: concurrent activity could result in some
      page getting dropped from a low-numbered buffer and later loaded into a
      high-numbered buffer while BufferSync is scanning the buffer pool.
      If that happened, the comparator would give self-inconsistent results,
      potentially confusing qsort().  Saving one comparison step is not worth
      possibly getting the sort wrong.
      
      So far as I can tell, nothing would actually go wrong given our current
      implementation of qsort().  It might get a bit slower than expected
      if there were a large number of duplicates of one value, but that's
      surely a probability-epsilon case.  Still, the comment is wrong,
      and if we ever switched to another sort implementation it might be
      less forgiving.
      
      In passing, avoid casting away const-ness of the argument pointers;
      I've not seen any compiler complaints from that, but it seems likely
      that some compilers would not like it.
      
      Back-patch to 9.6 where this code came in, just in case I've underestimated
      the possible consequences.
      
      Discussion: https://postgr.es/m/18437.1515607610@sss.pgh.pa.us
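The fix's core idea can be sketched as a self-consistent comparator (Python stand-in; the tuple fields here are assumed to mirror CkptSortItem, not taken from the actual struct):

```python
from functools import cmp_to_key

def ckpt_buforder_comparator(a, b):
    """Compare (tablespace, relation, fork, block) tuples field by field.
    Crucially, equal items compare as 0 rather than being assumed impossible,
    so the sort stays consistent even when duplicates occur."""
    for x, y in zip(a, b):
        if x < y:
            return -1
        if x > y:
            return 1
    return 0

items = [(1, 10, 0, 5), (1, 10, 0, 5), (1, 9, 0, 7), (0, 2, 0, 1)]
ordered = sorted(items, key=cmp_to_key(ckpt_buforder_comparator))
```

A comparator that never returns 0 for equal inputs gives self-inconsistent answers, which some sort implementations handle worse than others.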
  2. Jan 09, 2018
    • Change some bogus PageGetLSN calls to BufferGetLSNAtomic · 37dd1128
      Alvaro Herrera authored
      As src/backend/access/transam/README says, PageGetLSN may only be called
      by processes holding either exclusive lock on buffer, or a shared lock
      on buffer plus buffer header lock.  Therefore any place that only holds
      a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN,
      which internally obtains buffer header lock prior to reading the LSN.
      
      A few callsites failed to comply with this rule.  This was detected by
      running all tests under a new (not committed) assertion that verifies
      PageGetLSN locking contract.  All but one of the callsites that failed
      the assertion are fixed by this patch.  Remaining callsites were
      inspected manually and determined not to need any change.
      
      The exception (unfixed callsite) is in TestForOldSnapshot, which only
      has a Page argument, making it impossible to access the corresponding
      Buffer from it.  Fixing that seems a much larger patch that will have to
      be done separately; and that's just as well, since it was only
      introduced in 9.6 and other bugs are much older.
      
      Some of these bugs are ancient; backpatch all the way back to 9.3.
      
      Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal
      Reviewed-by: Michaël Paquier
      Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com
    • While waiting for a condition variable, detect postmaster death. · d56a5f99
      Tom Lane authored
      The general assumption for postmaster child processes is that they
      should just exit(1), reasonably promptly, if the postmaster disappears.
      condition_variable.c neglected this consideration and could be left
      waiting forever, if the counterpart process it is waiting for has
      done the right thing and exited.
      
      We had some discussion of adjusting the WaitEventSet API to make it
      harder to make this type of mistake in future; but for the moment,
      and for v10, let's make this narrow fix.
      
      Discussion: https://postgr.es/m/20412.1515456143@sss.pgh.pa.us
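The shape of the fix can be sketched as a wait loop that never sleeps without also checking for parent death (Python sketch under assumed names; in a real postmaster child the liveness check would be something like WL_POSTMASTER_DEATH or an os.getppid() comparison):

```python
import time

def sleep_until(predicate, postmaster_alive, interval=0.01, max_waits=1000):
    """Wait for `predicate` to become true, but exit(1) promptly if the
    postmaster-liveness callable reports death instead of waiting forever."""
    for _ in range(max_waits):
        if predicate():
            return True
        if not postmaster_alive():
            raise SystemExit(1)    # postmaster died: exit(1), reasonably promptly
        time.sleep(interval)
    return False
```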
    • Fix race condition during replication origin drop. · 1f5adbd7
      Tom Lane authored
      replorigin_drop() misunderstood the API for condition variables: it
      had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
      inside its test-and-sleep loop, rather than outside the loop as
      intended.  The net effect is a narrow race-condition window wherein,
      if the process using a replication slot releases it immediately after
      replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
      would get into the condition variable's wait list too late and then
      wait indefinitely for a signal that won't come.
      
      Because there's a different CV for each replication slot, we can't
      just move the ConditionVariablePrepareToSleep call to above the
      test-and-sleep loop.  What we can do, in the wake of commit 13db3b93,
      is drop the ConditionVariablePrepareToSleep call entirely.  This fix
      depends on that commit because (at least in principle) the slot matching
      the target replication origin might move around, so that once in a blue
      moon successive loop iterations might involve different CVs.  We can now
      cope with such a scenario, at the cost of an extra trip through the
      retry loop.
      
      (There are ways we could fix this bug without depending on that commit,
      but they're all a lot more complicated than this way.)
      
      While at it, upgrade the rather skimpy comments in this function.
      
      Back-patch to v10 where this code came in.
      
      Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us
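The intended test-and-sleep shape — prepare once outside the loop, re-test after every wakeup — can be sketched with Python's threading.Condition (an analogy, not PostgreSQL's CV API; the flag and helper names are invented):

```python
import threading

def wait_for(cond, predicate):
    """Canonical pattern: acquire ('prepare') once, outside the retry loop;
    sleep only inside the loop; re-test the predicate after each wakeup.
    Doing the prepare/cancel steps inside the loop is the bug described."""
    with cond:
        while not predicate():
            cond.wait()

state = {"dropped": False}
cv = threading.Condition()

def dropper():
    with cv:
        state["dropped"] = True
        cv.notify_all()

waiter = threading.Thread(target=wait_for, args=(cv, lambda: state["dropped"]))
waiter.start()
t = threading.Thread(target=dropper)
t.start()
waiter.join(5)
t.join(5)
```

Because the predicate is tested under the lock before the first wait, a signal that arrives "early" cannot be lost — the race window described above comes from joining the wait list too late.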
    • Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs. · 4af2190e
      Tom Lane authored
      The original coding here insisted that callers manually cancel any prepared
      sleep for one condition variable before starting a sleep on another one.
      While that's not a huge burden today, it seems like a gotcha that will bite
      us in future if the use of condition variables increases; anything we can
      do to make the use of this API simpler and more robust is attractive.
      Hence, allow these functions to automatically switch their attention to
      a different CV when required.  This is safe for the same reason it was OK
      for commit aced5a92 to let a broadcast operation cancel any prepared CV
      sleep: whenever we return to the other test-and-sleep loop, we will
      automatically re-prepare that CV, paying at most an extra test of that
      loop's exit condition.
      
      Back-patch to v10 where condition variables were introduced.  Ordinarily
      we would probably not back-patch a change like this, but since it does not
      invalidate any coding pattern that was legal before, it seems safe enough.
      Furthermore, there's an open bug in replorigin_drop() for which the
      simplest fix requires this.  Even if we chose to fix that in some more
      complicated way, the hazard would remain that we might back-patch some
      other bug fix that requires this behavior.
      
      Patch by me, reviewed by Thomas Munro.
      
      Discussion: https://postgr.es/m/2437.1515368316@sss.pgh.pa.us
    • pg_upgrade: prevent check on live cluster from generating error · 1776c817
      Bruce Momjian authored
      Previously an inaccurate but harmless error was generated when running
      --check on a live server before reporting the servers as compatible.
      The fix is to split error reporting and exit control in the exec_prog()
      API.
      
      Reported-by: Daniel Westermann
      
      Backpatch-through: 10
  3. Jan 06, 2018
    • Reorder steps in ConditionVariablePrepareToSleep for more safety. · 83fe2708
      Tom Lane authored
      In the admittedly-very-unlikely case that AddWaitEventToSet fails,
      ConditionVariablePrepareToSleep would error out after already having
      set cv_sleep_target, which is probably bad, and after having already
      set cv_wait_event_set, which is very bad.  Transaction abort might or
      might not clean up cv_sleep_target properly; but there is nothing
      that would be aware that the WaitEventSet wasn't fully constructed,
      so that all future condition variable sleeps would be broken.
      We can easily guard against these hazards with slight restructuring.
      
      Back-patch to v10 where condition_variable.c was introduced.
      
      Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
    • Rewrite ConditionVariableBroadcast() to avoid live-lock. · 1c77e990
      Tom Lane authored
      The original implementation of ConditionVariableBroadcast was, per its
      self-description, "the dumbest way possible".  Thomas Munro found out
      it was a bit too dumb.  An awakened process may immediately re-queue
      itself, if the specific condition it's waiting for is not yet satisfied.
      If this happens before ConditionVariableBroadcast is able to see the wait
      queue as empty, then ConditionVariableBroadcast will re-awaken the same
      process, repeating the cycle.  Given unlucky timing this back-and-forth
      can repeat indefinitely; loops lasting thousands of seconds have been
      seen in testing.
      
      To fix, add our own process to the end of the wait queue to serve as a
      sentinel, and exit the broadcast loop once our process is not there
      anymore.  There are various special considerations described in the
      comments, the principal disadvantage being that wakers can no longer
      be sure whether they awakened a real waiter or just a sentinel.  But in
      practice nobody pays attention to the result of ConditionVariableSignal
      or ConditionVariableBroadcast anyway, so that problem seems hypothetical.
      
      Back-patch to v10 where condition_variable.c was introduced.
      
      Tom Lane and Thomas Munro
      
      Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com
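The sentinel technique can be sketched over a plain queue (Python sketch; the queue entries and wake callback are illustrative stand-ins for the shared-memory wait list):

```python
from collections import deque

def broadcast(wait_queue, wake):
    """Append a sentinel to the wait queue, then wake entries until the
    sentinel comes off.  A waiter that instantly re-queues itself lands
    *behind* the sentinel, so the loop always terminates."""
    sentinel = object()
    wait_queue.append(sentinel)
    while True:
        entry = wait_queue.popleft()
        if entry is sentinel:
            return
        wake(entry)

# Simulate waiters that re-queue themselves the moment they are woken --
# the exact behavior that live-locked the original "dumbest way possible".
queue = deque(["w1", "w2"])
woken = []
def wake(w):
    woken.append(w)
    queue.append(w)

broadcast(queue, wake)
```

Each original waiter is woken exactly once, and the re-queued entries are simply left for the next signal or broadcast.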
  4. Jan 05, 2018
  5. Jan 04, 2018
  6. Jan 03, 2018
  7. Jan 02, 2018
    • Fix deadlock hazard in CREATE INDEX CONCURRENTLY · 6d2a9ae0
      Alvaro Herrera authored
      Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
      supposed to be able to work in parallel, as evidenced by fixes in commit
      c3d09b3b specifically to support this case.  In reality, one of the
      sessions would be aborted by a mysterious "deadlock detected" error.
      
      Jeff Janes diagnosed that this is because of leftover snapshots used for
      system catalog scans -- this was broken by 8aa3e475 keeping track of
      (registering) the catalog snapshot.  To fix the deadlocks, it's enough
      to de-register that snapshot prior to waiting.
      
      Backpatch to 9.4, which introduced MVCC catalog scans.
      
      Include an isolationtester spec that 8 out of 10 times reproduces the
      deadlock with the unpatched code for me (Álvaro).
      
      Author: Jeff Janes
      Diagnosed-by: Jeff Janes
      Reported-by: Jeremy Finzel
      Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
  8. Jan 01, 2018
  9. Dec 29, 2017
    • Properly set base backup backends to active in pg_stat_activity · b38c3d58
      Magnus Hagander authored
      When walsenders were included in pg_stat_activity, only the ones
      actually streaming WAL were listed as active when they were active. In
      particular, the connections sending base backups were listed as being
      idle, which meant that a regular pg_basebackup would show up with one
      active and one idle connection, even though both were active.
      
      This patch updates the code to set all walsenders to active while they
      are working (including those doing very fast things like
      IDENTIFY_SYSTEM), and then back to idle afterwards. Details about
      exactly what they are doing are available in
      pg_stat_replication.
      
      Patch by me, review by Michael Paquier and David Steele.
  10. Dec 27, 2017
  11. Dec 22, 2017
    • Fix UNION/INTERSECT/EXCEPT over no columns. · c252ccda
      Tom Lane authored
      Since 9.4, we've allowed the syntax "select union select" and variants
      of that.  However, the planner wasn't expecting a no-column set operation
      and ended up treating the set operation as if it were UNION ALL.
      
      Turns out it's trivial to fix in v10 and later; we just need to be careful
      about not generating a Sort node with no sort keys.  However, since a weird
      corner case like this is never going to be exercised by developers, we'd
      better have thorough regression tests if we want to consider it supported.
      
      Per report from Victor Yegorov.
      
      Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
  12. Dec 21, 2017
  13. Dec 20, 2017
  14. Dec 19, 2017
    • Try again to fix accumulation of parallel worker instrumentation. · 72567f61
      Robert Haas authored
      When a Gather or Gather Merge node is started and stopped multiple
      times, accumulate instrumentation data only once, at the end, instead
      of after each execution, to avoid recording inflated totals.
      
      Commit 778e78ae, the previous attempt
      at a fix, instead reset the state after every execution, which worked
      for the general instrumentation data but had problems for the additional
      instrumentation specific to Sort and Hash nodes.
      
      Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
      following a design proposal from Thomas Munro, with a comment tweak
      by me.
      
      Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
  15. Dec 18, 2017
    • doc: Fix figures in example description · db2ee079
      Peter Eisentraut authored
      
      oversight in 244c8b46
      
      Reported-by: Blaz Merela <blaz@merela.org>
    • Fix bug in cancellation of non-exclusive backup to avoid assertion failure. · 133d2fab
      Fujii Masao authored
      Previously an assertion failure occurred when pg_stop_backup() for
      non-exclusive backup was aborted while it's waiting for WAL files to
      be archived. This assertion failure happened in do_pg_abort_backup()
      which was called when a non-exclusive backup was canceled.
      do_pg_abort_backup() assumes that there is at least one non-exclusive
      backup running when it's called. But pg_stop_backup() can be canceled
      even after it marks the end of non-exclusive backup (e.g.,
      during waiting for WAL archiving). This broke the assumption that
      do_pg_abort_backup() relies on, and which caused an assertion failure.
      
      This commit changes do_pg_abort_backup() so that it does nothing
      when the non-exclusive backup has already been marked as completed.
      That is, the assumption is also changed, and do_pg_abort_backup()
      now can handle even the case where it's called when there is
      no running backup.
      
      Backpatch to 9.6 where SQL-callable non-exclusive backup was added.
      
      Author: Masahiko Sawada and Michael Paquier
      Reviewed-By: Robert Haas and Fujii Masao
      Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
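The changed assumption can be sketched as a guard at the top of the abort callback (Python sketch; the state fields are invented stand-ins, not PostgreSQL's actual session-state variables):

```python
def do_abort_backup(state):
    """Tolerate being called when no non-exclusive backup is running:
    if the backup was already marked complete, do nothing."""
    if state["nonexclusive_backups"] == 0:
        return False               # nothing to abort; previously an assertion fired here
    state["nonexclusive_backups"] -= 1
    if state["nonexclusive_backups"] == 0:
        state["backup_in_progress"] = False
    return True
```

Calling it a second time is now a harmless no-op instead of tripping an assertion.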
    • Fix crashes on plans with multiple Gather (Merge) nodes. · b70ea4c7
      Robert Haas authored
      es_query_dsa turns out to be broken by design, because it supposes
      that there is only one DSA for the whole query, whereas there is
      actually one per Gather (Merge) node.  For now, work around that
      problem by setting and clearing the pointer around the sections of
      code that might need it.  It's probably a better idea to get rid of
      es_query_dsa altogether in favor of having each node keep track
      individually of which DSA is relevant, but that seems like more than
      we would want to back-patch.
      
      Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit
      Kapila, and by me.
      
      Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
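The workaround — set the shared pointer only around the code that needs this node's DSA, then restore it — can be sketched as a scope guard (Python sketch; the dict fields are illustrative stand-ins for EState, not the real struct):

```python
from contextlib import contextmanager

@contextmanager
def query_dsa(estate, dsa):
    """Point es_query_dsa at this node's DSA for the duration of the block,
    restoring the previous value afterwards, even on error."""
    saved = estate.get("es_query_dsa")
    estate["es_query_dsa"] = dsa
    try:
        yield
    finally:
        estate["es_query_dsa"] = saved

estate = {"es_query_dsa": None}
with query_dsa(estate, "dsa_for_gather_1"):
    inside = estate["es_query_dsa"]   # code here sees this Gather's DSA
```

This keeps each Gather (Merge) node's DSA visible only where it is valid, rather than pretending one DSA serves the whole query.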
  16. Dec 16, 2017
  17. Dec 15, 2017
    • Perform a lot more sanity checks when freezing tuples. · d3044f8b
      Andres Freund authored
      The previous commit has shown that the sanity checks around freezing
      aren't strong enough. Strengthening them seems especially important
      because the existence of the bug has caused corruption that we don't
      want to make even worse during future vacuum cycles.
      
      The errors are emitted with ereport rather than elog, despite being
      "should never happen" messages, so a proper error code is emitted. To
      avoid superfluous translations, mark messages as internal.
      
      Author: Andres Freund and Alvaro Herrera
      Reviewed-By: Alvaro Herrera, Michael Paquier
      Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
      Backpatch: 9.3-
    • Fix pruning of locked and updated tuples. · 1224383e
      Andres Freund authored
      Previously it was possible that a tuple was not pruned during vacuum,
      even though its update xmax (i.e. the updating xid in a multixact with
      both key share lockers and an updater) was below the cutoff horizon.
      
      As the freezing code assumed, rightly so, that that's not supposed to
      happen, xmax would be preserved (as a member of a new multixact or
      xmax directly). That causes two problems: For one the tuple is below
      the xmin horizon, which can cause problems if the clog is truncated or
      once there's an xid wraparound. The bigger problem is that that will
      break HOT chains, which in turn can lead to two breakages: First,
      failing index lookups, which in turn can e.g. lead to constraints being
      violated. Second, future hot prunes / vacuums can end up making
      invisible tuples visible again. There are other harmful scenarios.
      
      Fix the problem by recognizing that tuples can be DEAD instead of
      RECENTLY_DEAD, even if the multixactid has alive members, if the
      update_xid is below the xmin horizon. That's safe because newer
      versions of the tuple will contain the locking xids.
      
      A followup commit will harden the code somewhat against future similar
      bugs and already corrupted data.
      
      Author: Andres Freund, with changes by Alvaro Herrera
      Reported-By: Daniel Wood
      Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter
         Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier
      Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier
      Discussion:
          https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
          https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
      Backpatch: 9.3-
  18. Dec 14, 2017
    • Fix walsender timeouts when decoding a large transaction · 14c15b1f
      Andrew Dunstan authored
      The logical slots have a fast code path for sending data so as not to
      impose too high a per message overhead. The fast path skips checks for
      interrupts and timeouts. However, the existing coding failed to consider
      the fact that a transaction with a large number of changes may take a
      very long time to be processed and sent to the client. This causes the
      walsender to ignore interrupts for potentially a long time and more
      importantly it will result in the walsender being killed due to
      timeout at the end of such a transaction.
      
      This commit changes the fast path to also check for interrupts and only
      allows calling the fast path when the last keepalive check happened less
      than half the walsender timeout ago. Otherwise the slower code path will
      be taken.
      
      Backpatched to 9.4
      
      Petr Jelinek, reviewed by Kyotaro HORIGUCHI, Yura Sokolov, Craig
      Ringer and Robert Haas.
      
      Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
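The new gating condition can be sketched as a pure predicate (Python sketch; the parameter names are illustrative, and time values are assumed to be in seconds):

```python
def can_use_fast_path(now, last_keepalive_check, wal_sender_timeout):
    """Allow the fast send path only when the last keepalive check happened
    less than half the walsender timeout ago; otherwise the caller should
    take the slower path, which checks interrupts and timeouts."""
    return (now - last_keepalive_check) < wal_sender_timeout / 2
```

With a 60s timeout, a keepalive check 10s ago still permits the fast path, but one 40s ago forces the slow path, so a long transaction can no longer starve the timeout handling.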
  19. Dec 13, 2017
  20. Dec 11, 2017
    • Fix comment · c55253b7
      Peter Eisentraut authored
      
      Reported-by: Noah Misch <noah@leadboat.com>
    • Fix corner-case coredump in _SPI_error_callback(). · e3d194f7
      Tom Lane authored
      I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL,
      and only fills it in some time later.  If an error were to happen in
      between, _SPI_error_callback would try to dereference the null pointer.
      This is unlikely --- there's not much between those points except
      push-snapshot calls --- but it's clearly not impossible.  Tweak the
      callback to do nothing if the pointer isn't set yet.
      
      It's been like this for awhile, so back-patch to all supported branches.
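The shape of the tweak — a null guard at the top of the error callback — can be sketched as follows (Python sketch; the function name mirrors the commit but the body is an invented stand-in for the C callback):

```python
def spi_error_callback(query):
    """Error-context callback: if the argument hasn't been filled in yet,
    do nothing instead of dereferencing it."""
    if query is None:
        return None                        # context not set yet: stay silent
    return 'SQL statement "%s"' % query
```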