  1. Jan 03, 2018
  2. Jan 02, 2018
    • Fix deadlock hazard in CREATE INDEX CONCURRENTLY · 6d2a9ae0
      Alvaro Herrera authored
      Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
      supposed to be able to work in parallel, as evidenced by fixes in commit
      c3d09b3b specifically to support this case.  In reality, one of the
      sessions would be aborted by a mysterious "deadlock detected" error.
      
      Jeff Janes diagnosed that this is because of leftover snapshots used for
      system catalog scans -- this was broken by 8aa3e475 keeping track of
      (registering) the catalog snapshot.  To fix the deadlocks, it's enough
      to de-register that snapshot prior to waiting.
      
      Backpatch to 9.4, which introduced MVCC catalog scans.
      
      Include an isolationtester spec that 8 out of 10 times reproduces the
      deadlock with the unpatched code for me (Álvaro).
      
      Author: Jeff Janes
      Diagnosed-by: Jeff Janes
      Reported-by: Jeremy Finzel
      Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com
      6d2a9ae0
  3. Jan 01, 2018
  4. Dec 29, 2017
    • Properly set base backup backends to active in pg_stat_activity · b38c3d58
      Magnus Hagander authored
      When walsenders were included in pg_stat_activity, only the ones
      actually streaming WAL were listed as active when they were active. In
      particular, the connections sending base backups were listed as being
      idle, which means that a regular pg_basebackup would show up with one
      active and one idle connection when both were active.
      
      This patch sets all walsenders to active while they are working
      (including those doing very fast things like IDENTIFY_SYSTEM), and then
      back to idle. Details about exactly what they are doing are available in
      pg_stat_replication.
      
      Patch by me, review by Michael Paquier and David Steele.
      b38c3d58
  5. Dec 27, 2017
  6. Dec 22, 2017
    • Fix UNION/INTERSECT/EXCEPT over no columns. · c252ccda
      Tom Lane authored
      Since 9.4, we've allowed the syntax "select union select" and variants
      of that.  However, the planner wasn't expecting a no-column set operation
      and ended up treating the set operation as if it were UNION ALL.
      
      Turns out it's trivial to fix in v10 and later; we just need to be careful
      about not generating a Sort node with no sort keys.  However, since a weird
      corner case like this is never going to be exercised by developers, we'd
      better have thorough regression tests if we want to consider it supported.
      
      Per report from Victor Yegorov.
      
      Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com
      c252ccda
  7. Dec 21, 2017
  8. Dec 20, 2017
  9. Dec 19, 2017
    • Try again to fix accumulation of parallel worker instrumentation. · 72567f61
      Robert Haas authored
      When a Gather or Gather Merge node is started and stopped multiple
      times, accumulate instrumentation data only once, at the end, instead
      of after each execution, to avoid recording inflated totals.
      
      Commit 778e78ae, the previous attempt at a fix, instead reset the state
      after every execution, which worked for the general instrumentation data
      but had problems for the additional instrumentation specific to Sort and
      Hash nodes.
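
      The inflation this commit avoids can be illustrated with a minimal C
      sketch.  It assumes, purely for illustration, that a worker's counter is
      a running total that keeps growing across executions; the function names
      are hypothetical and this is not PostgreSQL's executor code.

```c
#include <assert.h>

/*
 * Inflating variant: after every execution the leader adds the worker's
 * running total, so tuples from earlier executions are counted again.
 */
long
total_added_after_each_execution(int executions, long tuples_per_execution)
{
    long worker_total = 0;
    long leader_total = 0;

    assert(executions >= 0);
    for (int i = 0; i < executions; i++)
    {
        worker_total += tuples_per_execution;   /* worker keeps counting */
        leader_total += worker_total;           /* earlier runs re-added */
    }
    return leader_total;
}

/*
 * Accumulate-once variant: the leader reads the worker's running total a
 * single time, after the last execution has finished.
 */
long
total_added_once_at_end(int executions, long tuples_per_execution)
{
    long worker_total = 0;

    assert(executions >= 0);
    for (int i = 0; i < executions; i++)
        worker_total += tuples_per_execution;
    return worker_total;
}
```

      With three executions of ten tuples each, the first variant reports 60
      while the second reports the true total of 30.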
      
      Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
      following a design proposal from Thomas Munro, with a comment tweak
      by me.
      
      Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
      72567f61
  10. Dec 18, 2017
    • doc: Fix figures in example description · db2ee079
      Peter Eisentraut authored
      
      oversight in 244c8b46
      
      Reported-by: Blaz Merela <blaz@merela.org>
      db2ee079
    • Fix bug in cancellation of non-exclusive backup to avoid assertion failure. · 133d2fab
      Fujii Masao authored
      Previously an assertion failure occurred when pg_stop_backup() for a
      non-exclusive backup was canceled while it was waiting for WAL files to
      be archived. This assertion failure happened in do_pg_abort_backup(),
      which is called when a non-exclusive backup is canceled.
      do_pg_abort_backup() assumed that there is at least one non-exclusive
      backup running when it's called. But pg_stop_backup() can be canceled
      even after it marks the end of the non-exclusive backup (e.g.,
      while waiting for WAL archiving). This broke the assumption that
      do_pg_abort_backup() relied on, which caused the assertion failure.
      
      This commit changes do_pg_abort_backup() so that it does nothing
      when the non-exclusive backup has already been marked as completed.
      That is, the assumption is also changed, and do_pg_abort_backup()
      can now handle even the case where it's called when there is
      no running backup.
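
      A rough C sketch of the "do nothing if already completed" guard follows.
      The BackupTracker struct and the backup_* functions are hypothetical
      stand-ins invented for this illustration; the real bookkeeping in
      PostgreSQL is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping, not PostgreSQL's actual shared state. */
typedef struct BackupTracker
{
    int  running_backups;        /* non-exclusive backups in progress */
    bool session_backup_active;  /* does this session own one of them? */
} BackupTracker;

void
backup_start(BackupTracker *t)
{
    t->running_backups++;
    t->session_backup_active = true;
}

/* Marks the end of the backup; waiting for archiving would happen later. */
void
backup_mark_end(BackupTracker *t)
{
    assert(t->running_backups > 0);
    t->running_backups--;
    t->session_backup_active = false;
}

/* Abort handler: a no-op when the backup was already marked as completed. */
void
backup_abort(BackupTracker *t)
{
    if (!t->session_backup_active)
        return;                  /* nothing left to undo */
    assert(t->running_backups > 0);
    t->running_backups--;
    t->session_backup_active = false;
}
```

      Calling backup_abort() after backup_mark_end() leaves the counter at
      zero instead of tripping the assertion.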
      
      Backpatch to 9.6 where SQL-callable non-exclusive backup was added.
      
      Author: Masahiko Sawada and Michael Paquier
      Reviewed-By: Robert Haas and Fujii Masao
      Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
      133d2fab
    • Fix crashes on plans with multiple Gather (Merge) nodes. · b70ea4c7
      Robert Haas authored
      es_query_dsa turns out to be broken by design, because it supposes
      that there is only one DSA for the whole query, whereas there is
      actually one per Gather (Merge) node.  For now, work around that
      problem by setting and clearing the pointer around the sections of
      code that might need it.  It's probably a better idea to get rid of
      es_query_dsa altogether in favor of having each node keep track
      individually of which DSA is relevant, but that seems like more than
      we would want to back-patch.
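
      The "set and clear the pointer around the code that needs it"
      workaround might look roughly like the C sketch below.  Area, ExecState,
      GatherNode and run_with_node_area() are simplified, hypothetical
      stand-ins, not the executor's real types or functions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Area { int id; } Area;                    /* stand-in for a DSA */
typedef struct ExecState { Area *query_area; } ExecState; /* one shared pointer */
typedef struct GatherNode { Area *area; } GatherNode;     /* one area per node */

typedef int (*SectionFn) (ExecState *estate);

/*
 * Point the shared pointer at this node's area only while the section
 * runs, then clear it so no other node can pick up a stale area.
 */
int
run_with_node_area(ExecState *estate, GatherNode *node, SectionFn section)
{
    int result;

    assert(estate != NULL && node != NULL && section != NULL);
    estate->query_area = node->area;
    result = section(estate);
    estate->query_area = NULL;
    return result;
}

/* Example section: reports which area it sees, or -1 if none is set. */
int
current_area_id(ExecState *estate)
{
    return estate->query_area != NULL ? estate->query_area->id : -1;
}
```

      Each node sees its own area during its section, and the pointer is
      empty again afterwards.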
      
      Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit
      Kapila, and by me.
      
      Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
      b70ea4c7
  11. Dec 16, 2017
  12. Dec 15, 2017
    • Perform a lot more sanity checks when freezing tuples. · d3044f8b
      Andres Freund authored
      The previous commit has shown that the sanity checks around freezing
      aren't strong enough. Strengthening them seems especially important
      because the existence of the bug has caused corruption that we don't
      want to make even worse during future vacuum cycles.
      
      The errors are emitted with ereport rather than elog, despite being
      "should never happen" messages, so a proper error code is emitted. To
      avoid superfluous translations, mark messages as internal.
      
      Author: Andres Freund and Alvaro Herrera
      Reviewed-By: Alvaro Herrera, Michael Paquier
      Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
      Backpatch: 9.3-
      d3044f8b
    • Fix pruning of locked and updated tuples. · 1224383e
      Andres Freund authored
      Previously it was possible that a tuple was not pruned during vacuum,
      even though its update xmax (i.e. the updating xid in a multixact with
      both key share lockers and an updater) was below the cutoff horizon.
      
      As the freezing code assumed, rightly so, that that's not supposed to
      happen, xmax would be preserved (as a member of a new multixact or
      xmax directly). That causes two problems: For one the tuple is below
      the xmin horizon, which can cause problems if the clog is truncated or
      once there's an xid wraparound. The bigger problem is that that will
      break HOT chains, which in turn can lead to two breakages: First,
      failing index lookups, which in turn can e.g. lead to constraints being
      violated. Second, future hot prunes / vacuums can end up making
      invisible tuples visible again. There are other harmful scenarios.
      
      Fix the problem by recognizing that tuples can be DEAD instead of
      RECENTLY_DEAD, even if the multixactid has alive members, if the
      update_xid is below the xmin horizon. That's safe because newer
      versions of the tuple will contain the locking xids.
      
      A followup commit will harden the code somewhat against future similar
      bugs and already corrupted data.
      
      Author: Andres Freund, with changes by Alvaro Herrera
      Reported-By: Daniel Wood
      Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter
         Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier
      Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier
      Discussion:
          https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
          https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
      Backpatch: 9.3-
      1224383e
  13. Dec 14, 2017
    • Fix walsender timeouts when decoding a large transaction · 14c15b1f
      Andrew Dunstan authored
      The logical slots have a fast code path for sending data so as not to
      impose too high a per message overhead. The fast path skips checks for
      interrupts and timeouts. However, the existing coding failed to consider
      the fact that a transaction with a large number of changes may take a
      very long time to be processed and sent to the client. This causes the
      walsender to ignore interrupts for potentially a long time and more
      importantly it will result in the walsender being killed due to
      timeout at the end of such a transaction.
      
      This commit changes the fast path to also check for interrupts and only
      allows calling the fast path when the last keepalive check happened less
      than half the walsender timeout ago. Otherwise the slower code path will
      be taken.
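
      The fast-path condition can be sketched in C as below.  The function
      name and the millisecond parameters are assumptions made for this
      example, and the sketch only covers a positive timeout; it is not the
      walsender's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Allow the fast path only if the last keepalive check happened less
 * than half the timeout ago; otherwise the caller should take the slower
 * path that handles interrupts, keepalives and timeouts.
 */
bool
can_use_fast_path(int64_t now_ms, int64_t last_check_ms, int64_t timeout_ms)
{
    assert(timeout_ms > 0);     /* this sketch assumes an enabled timeout */
    return (now_ms - last_check_ms) < timeout_ms / 2;
}
```

      With a 60 second timeout, a check made 100 ms ago still permits the
      fast path, while one made 30 seconds or more ago does not.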
      
      Backpatched to 9.4
      
      Petr Jelinek, reviewed by Kyotaro HORIGUCHI, Yura Sokolov, Craig
      Ringer and Robert Haas.
      
      Discussion: https://postgr.es/m/e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
      14c15b1f
  14. Dec 13, 2017
  15. Dec 11, 2017
    • Fix comment · c55253b7
      Peter Eisentraut authored
      
      Reported-by: Noah Misch <noah@leadboat.com>
      c55253b7
    • Fix corner-case coredump in _SPI_error_callback(). · e3d194f7
      Tom Lane authored
      I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL,
      and only fills it in some time later.  If an error were to happen in
      between, _SPI_error_callback would try to dereference the null pointer.
      This is unlikely --- there's not much between those points except
      push-snapshot calls --- but it's clearly not impossible.  Tweak the
      callback to do nothing if the pointer isn't set yet.
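
      The shape of such a guard, in a self-contained C sketch: the
      describe_error_context() function and its buffer interface are
      hypothetical and only stand in for an error-context callback whose
      argument may not have been filled in yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes a context line for the failing query into buf and returns the
 * length snprintf reports.  If the argument has not been set yet, it does
 * nothing and returns 0 instead of dereferencing a null pointer.
 */
int
describe_error_context(const void *arg, char *buf, size_t buflen)
{
    const char *query = (const char *) arg;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (query == NULL)
        return 0;               /* argument not filled in yet */
    return snprintf(buf, buflen, "SQL statement \"%s\"", query);
}
```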
      
      It's been like this for a while, so back-patch to all supported branches.
      e3d194f7
  16. Dec 09, 2017
    • Fix typo · 22e71b3a
      Magnus Hagander authored
      Reported by Robins Tharakan
      22e71b3a
    • MSVC 2012+: Permit linking to 32-bit, MinGW-built libraries. · e2cc6505
      Noah Misch authored
      Notably, this permits linking to the 32-bit Perl binaries advertised on
      perl.org, namely Strawberry Perl and ActivePerl.  This has a side effect
      of permitting linking to binaries built with obsolete MSVC versions.
      
      By default, MSVC 2012 and later require a "safe exception handler table"
      in each binary.  MinGW-built, 32-bit DLLs lack the relevant exception
      handler metadata, so linking to them failed with error LNK2026.  Restore
      the semantics of MSVC 2010, which omits the table from a given binary if
      some linker input lacks metadata.  This has no effect on 64-bit builds
      or on MSVC 2010 and earlier.  Back-patch to 9.3 (all supported
      versions).
      
      Reported by Victor Wagner.
      
      Discussion: https://postgr.es/m/20160326154321.7754ab8f@wagner.wagner.home
      e2cc6505
    • MSVC: Test whether 32-bit Perl needs -D_USE_32BIT_TIME_T. · 9b5c9979
      Noah Misch authored
      Commits 5a5c2fec and b5178c5d introduced support for modern MSVC-built,
      32-bit Perl, but they broke use of MinGW-built, 32-bit Perl distributions
      like Strawberry Perl and modern ActivePerl.  Perl has no robust means to
      report whether it expects a -D_USE_32BIT_TIME_T ABI, so test this.
      Back-patch to 9.3 (all supported versions).
      
      The chief alternative was a heuristic of adding -D_USE_32BIT_TIME_T when
      $Config{gccversion} is nonempty.  That banks on every gcc-built Perl
      using the same ABI.  gcc could change its default ABI the way MSVC once
      did, and one could build Perl with gcc and the non-default ABI.
      
      The GNU make build system could benefit from a similar test, without
      which it does not support MSVC-built Perl.  For now, just add a comment.
      Most users taking the special step of building Perl with MSVC probably
      build PostgreSQL with MSVC.
      
      Discussion: https://postgr.es/m/20171130041441.GA3161526@rfd.leadboat.com
      9b5c9979
  17. Dec 08, 2017
  18. Dec 06, 2017
    • Report failure to start a background worker. · a8ef4e81
      Robert Haas authored
      When a worker is flagged as BGW_NEVER_RESTART and we fail to start it,
      or if it is not marked BGW_NEVER_RESTART but is terminated before
      startup succeeds, what BgwHandleStatus should be reported?  The
      previous code really hadn't considered this possibility (as indicated
      by the comments which ignore it completely) and would typically return
      BGWH_NOT_YET_STARTED, but that's not a good answer, because then
      there's no way for code using GetBackgroundWorkerPid() to tell the
      difference between a worker that has not started but will start
      later and a worker that has not started and will never be started.
      So, when this case happens, return BGWH_STOPPED instead.  Update the
      comments to reflect this.
      
      The preceding fix by itself is insufficient to fix the problem,
      because the old code also didn't send a notification to the process
      identified in bgw_notify_pid when startup failed.  That might've
      been technically correct under the theory that the status of the
      worker was BGWH_NOT_YET_STARTED, because the status would indeed not
      change when the worker failed to start, but now that we're more
      usefully reporting BGWH_STOPPED, a notification is needed.
      
      Without these fixes, code which starts background workers and then
      uses the recommended APIs to wait for those background workers to
      start would hang indefinitely if the postmaster failed to fork a
      worker.
      
      Amit Kapila and Robert Haas
      
      Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
      a8ef4e81
  19. Dec 05, 2017
  20. Dec 04, 2017
    • Clean up assorted messiness around AllocateDir() usage. · 2a11b188
      Tom Lane authored
      This patch fixes a couple of low-probability bugs that could lead to
      reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE)
      concerning directory-open or file-open failures.  It also fixes places
      where we took shortcuts in reporting such errors, either by using elog
      instead of ereport or by using ereport but forgetting to specify an
      errcode.  And it eliminates a lot of just plain redundant error-handling
      code.
      
      In service of all this, export fd.c's formerly-static function
      ReadDirExtended, so that external callers can make use of the coding
      pattern
      
      	dir = AllocateDir(path);
      	while ((de = ReadDirExtended(dir, path, LOG)) != NULL)
      
      if they'd like to treat directory-open failures as mere LOG conditions
      rather than errors.  Also fix FreeDir to be a no-op if we reach it
      with dir == NULL, as such a coding pattern would cause.
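
      For readers outside the PostgreSQL tree, the same pattern can be
      approximated with plain POSIX calls.  In the sketch below,
      read_dir_logged(), free_dir() and count_dir_entries() are hypothetical
      analogues written for this illustration; they are not fd.c's
      implementation.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Read step that also reports a failed open: if dir is NULL, log the
 * errno left behind by opendir() and return NULL so the loop ends.
 */
struct dirent *
read_dir_logged(DIR *dir, const char *path)
{
    if (dir == NULL)
    {
        fprintf(stderr, "LOG: could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return NULL;
    }
    return readdir(dir);
}

/* Close step that tolerates a directory that never opened. */
void
free_dir(DIR *dir)
{
    if (dir != NULL)
        closedir(dir);
}

/* The caller needs no separate error check after opendir(). */
int
count_dir_entries(const char *path)
{
    DIR           *dir;
    struct dirent *de;
    int            n = 0;

    assert(path != NULL);
    dir = opendir(path);        /* may be NULL; reported by the read step */
    while ((de = read_dir_logged(dir, path)) != NULL)
        n++;
    free_dir(dir);
    return n;
}
```

      Nothing runs between opendir() and the first read, so errno still
      describes the open failure when it is logged.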
      
      Then, remove code at many call sites that was throwing an error or log
      message for AllocateDir failure, as ReadDir or ReadDirExtended can handle
      that job just fine.  Aside from being a net code savings, this gets rid of
      a lot of not-quite-up-to-snuff reports, as mentioned above.  (In some
      places these changes result in replacing a custom error message such as
      "could not open tablespace directory" with more generic wording "could not
      open directory", but it was agreed that the custom wording buys little as
      long as we report the directory name.)  In some other call sites where we
      can't just remove code, change the error reports to be fully
      project-style-compliant.
      
      Also reorder code in restoreTwoPhaseData that was acquiring a lock
      between AllocateDir and ReadDir; in the unlikely but surely not
      impossible case that LWLockAcquire changes errno, AllocateDir failures
      would be misreported.  There is no great value in opening the directory
      before acquiring TwoPhaseStateLock, so just do it in the other order.
      
      Also fix CheckXLogRemoved to guarantee that it preserves errno,
      as quite a number of call sites are implicitly assuming.  (Again,
      it's unlikely but I think not impossible that errno could change
      during a SpinLockAcquire.  If so, this function was broken for its
      own purposes as well as breaking callers.)
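
      Preserving errno across a call that might change it is a
      save-and-restore idiom.  A minimal C sketch follows, in which
      noisy_step() merely simulates a helper that clobbers errno; both
      function names are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simulated helper that overwrites errno as a side effect. */
void
noisy_step(void)
{
    errno = EINTR;
}

/*
 * Run a step that may clobber errno, but hand the caller back the value
 * errno had on entry.
 */
void
run_preserving_errno(void (*step) (void))
{
    int save_errno = errno;

    assert(step != NULL);
    step();
    errno = save_errno;
}
```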
      
      And change a few places that were using not-per-project-style messages,
      such as "could not read directory" when "could not open directory" is
      more correct.
      
      Back-patch the exporting of ReadDirExtended, in case we have occasion
      to back-patch some fix that makes use of it; it's not needed right now
      but surely making it global is pretty harmless.  Also back-patch the
      restoreTwoPhaseData and CheckXLogRemoved fixes.  The rest of this is
      essentially cosmetic and need not get back-patched.
      
      Michael Paquier, with a bit of additional work by me
      
      Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
      2a11b188
    • Support boolean columns in functional-dependency statistics. · bf2b317f
      Tom Lane authored
      There's no good reason that the multicolumn stats stuff shouldn't work on
      booleans.  But it looked only for "Var = pseudoconstant" clauses, and it
      will seldom find those for boolean Vars, since earlier phases of planning
      will fold "boolvar = true" or "boolvar = false" to just "boolvar" or
      "NOT boolvar" respectively.  Improve dependencies_clauselist_selectivity()
      to recognize such clauses as equivalent to equality restrictions.
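
      The idea of treating a bare boolean column as an equality restriction
      can be shown with a toy clause representation in C.  ClauseKind, Clause
      and clause_is_equality_restriction() are hypothetical and far simpler
      than the planner's expression trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ClauseKind
{
    CLAUSE_VAR_EQ_CONST,   /* col = constant */
    CLAUSE_BOOL_VAR,       /* boolcol, i.e. a folded "boolcol = true" */
    CLAUSE_NOT_BOOL_VAR,   /* NOT boolcol, i.e. a folded "boolcol = false" */
    CLAUSE_OTHER
} ClauseKind;

typedef struct Clause
{
    ClauseKind kind;
    int        attnum;     /* column the clause refers to */
} Clause;

/*
 * Returns true and sets *attnum if the clause restricts a single column
 * to a single value, including the folded boolean forms.
 */
bool
clause_is_equality_restriction(const Clause *clause, int *attnum)
{
    assert(clause != NULL && attnum != NULL);
    switch (clause->kind)
    {
        case CLAUSE_VAR_EQ_CONST:
        case CLAUSE_BOOL_VAR:
        case CLAUSE_NOT_BOOL_VAR:
            *attnum = clause->attnum;
            return true;
        default:
            return false;
    }
}
```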
      
      This fixes a failure of the extended stats mechanism to apply in a case
      reported by Vitaliy Garnashevich.  It's not a complete solution to his
      problem because the bitmap-scan costing code isn't consulting extended
      stats where it should, but that's surely an independent issue.
      
      In passing, improve some comments, get rid of a NumRelids() test that's
      redundant with the preceding bms_membership() test, and fix
      dependencies_clauselist_selectivity() so that estimatedclauses actually
      is a pure output argument as stated by its API contract.
      
      Back-patch to v10 where this code was introduced.
      
      Discussion: https://postgr.es/m/73a4936d-2814-dc08-ed0c-978f76f435b0@gmail.com
      bf2b317f
  21. Nov 30, 2017
    • Fix non-GNU makefiles for AIX make. · f8252b64
      Noah Misch authored
      Invoking the Makefile without an explicit target was building every
      possible target instead of just the "all" target.  Back-patch to 9.3
      (all supported versions).
      f8252b64
  22. Nov 29, 2017
  23. Nov 28, 2017