Commits · e477d148e70302f1f93a42f0ebf0e5770a0fd64f · Jakob Huber / postgres-lambda-diff

Jan 21, 2018

doc: update intermediate certificate instructions · e477d148

Bruce Momjian authored 7 years ago

Document how to properly create root and intermediate certificates using
v3_ca extensions and where to place intermediate certificates so they
are properly transferred to the remote side with the leaf certificate to
link to the remote root certificate.  This corrects docs that used to
say that intermediate certificates must be stored with the root
certificate.

Also add instructions on how to create root, intermediate, and leaf
certificates.

Discussion: https://postgr.es/m/20180116002238.GC12724@momjian.us

Reviewed-by: Michael Paquier

Backpatch-through: 9.3

e477d148

Jan 19, 2018

Fix StoreCatalogInheritance1 to use 32bit inhseqno · 61f08c01

Alvaro Herrera authored 7 years ago

For no apparent reason, this function was using a 16bit-wide inhseqno
value, rather than the correct 32 bit width which is what is stored in
the pg_inherits catalog. This becomes evident if you try to create a
table with more than 65535 parents, because this error appears:

ERROR: duplicate key value violates unique constraint «pg_inherits_relid_seqno_index»
DETAIL: Key (inhrelid, inhseqno)=(329371, 0) already exists.

Needless to say, having so many parents is an uncommon situations, which
explains why this error has never been reported despite being having
been introduced with the Postgres95 1.01 sources in commit d31084e9:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/creatinh.c;hb=d31084e9d111#l349

Backpatch all the way back.

David Rowley noticed this while reviewing a patch of mine.
Discussion: https://postgr.es/m/CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com

61f08c01

Jan 18, 2018

Extend configure's __int128 test to check for a known gcc bug. · 5dcbdcbd

Tom Lane authored 7 years ago

On Sparc64, use of __attribute__(aligned(8)) with __int128 causes faulty
code generation in gcc versions at least through 5.5.0. We can work around
that by disabling use of __int128, so teach configure to test for the bug.

This solution doesn't fix things for the case of cross-compiling with a
buggy compiler; to support that nicely, we'd need to add a manual disable
switch. Unless more such cases turn up, it doesn't seem worth the work.
Affected users could always edit pg_config.h manually.

In passing, fix some typos in the existing configure test for __int128.
They're harmless because we only compile that code not run it, but
they're still confusing for anyone looking at it closely.

This is needed in support of commit 75180499, so back-patch to 9.5
as that was.

Marina Polyakova, Victor Wagner, Tom Lane

Discussion: https://postgr.es/m/0d3a9fa264cebe1cb9966f37b7c06e86@postgrespro.ru

5dcbdcbd

Jan 17, 2018

postgres_fdw: Avoid 'outer pathkeys do not match mergeclauses' error. · 3f05a30b

Robert Haas authored 7 years ago

When pushing down a join to a foreign server, postgres_fdw constructs
an alternative plan to be used for any EvalPlanQual rechecks that
prove to be necessary. This plan is stored as the outer subplan of
the Foreign Scan implementing the pushed-down join. Previously, this
alternative plan could have a different nominal sort ordering than its
parent, which seemed OK since there will only be one tuple per base
table anyway in the case of an EvalPlanQual recheck. Actually,
though, it caused a problem if that path was used as a building block
for the EvalPlanQual recheck plan of a higher-level foreign join,
because we could end up with a merge join one of whose inputs was not
labelled with the correct sort order. Repair by injecting an extra
Sort node into the EvalPlanQual recheck plan whenever it would
otherwise fail to be sorted at least as well as its parent Foreign
Scan.

Report by Jeff Janes. Patch by me, reviewed by Tom Lane, who also
provided the test case and comment text.

Discussion: http://postgr.es/m/CAMkU=1y2G8VOVBHv3iXU2TMAj7-RyBFFW1uhkr5sm9LQ2=X35g@mail.gmail.com

3f05a30b

Jan 15, 2018
- Cope with indicator arrays that do not have the correct length. · 8b89b7aa
  Michael Meskes authored 7 years ago
  
  Patch by: "Rader, David" <davidr@openscg.com>
  8b89b7aa
Jan 12, 2018

Fix postgres_fdw to cope with duplicate GROUP BY entries. · 67854bc5

Tom Lane authored 7 years ago

Commit 7012b132, which added the ability to push down aggregates and
grouping to the remote server, wasn't careful to ensure that the remote
server would have the same idea we do about which columns are the grouping
columns, in cases where there are textually identical GROUP BY expressions.
Such cases typically led to "targetlist item has multiple sortgroupref
labels" errors.

To fix this reliably, switch over to using "GROUP BY column-number" syntax
rather than "GROUP BY expression" in transmitted queries, and adjust
foreign_grouping_ok() to be more careful about duplicating the sortgroupref
labeling of the local pathtarget.

Per bug #14890 from Sean Johnston.  Back-patch to v10 where the buggy code
was introduced.

Jeevan Chalke, reviewed by Ashutosh Bapat

Discussion: https://postgr.es/m/20171107134948.1508.94783@wrigleys.postgresql.org

67854bc5

Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT. · 55e5eb4d

Tom Lane authored 7 years ago

If a query against an inheritance tree runs concurrently with an ALTER
TABLE that's disinheriting one of the tree members, it's possible to get
a "could not find inherited attribute" error because after obtaining lock
on the removed member, make_inh_translation_list sees that its columns
have attinhcount=0 and decides they aren't the columns it's looking for.

An ideal fix, perhaps, would avoid including such a just-removed member
table in the query at all; but there seems no way to accomplish that
without adding expensive catalog rechecks or creating a likelihood of
deadlocks. Instead, let's just drop the check on attinhcount. In this
way, a query that's included a just-disinherited child will still
succeed, which is not a completely unreasonable behavior.

This problem has existed for a long time, so back-patch to all supported
branches. Also add an isolation test verifying related behaviors.

Patch by me; the new isolation test is based on Kyotaro Horiguchi's work.

Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp

55e5eb4d

Fix incorrect handling of subquery pullup in the presence of grouping sets. · d3ca1a6c

Tom Lane authored 7 years ago

If we flatten a subquery whose target list contains constants or
expressions, when those output columns are used in GROUPING SET columns,
the planner was capable of doing the wrong thing by merging a pulled-up
expression into the surrounding expression during const-simplification.
Then the late processing that attempts to match subexpressions to grouping
sets would fail to match those subexpressions to grouping sets, with the
effect that they'd not go to null when expected.

To fix, wrap such subquery outputs in PlaceHolderVars, ensuring that
they preserve their separate identity throughout the planner's expression
processing. This is a bit of a band-aid, because the wrapper defeats
const-simplification even in places where it would be safe to allow.
But a nicer fix would likely be too invasive to back-patch, and the
consequences of the missed optimizations probably aren't large in most
cases.

Back-patch to 9.5 where grouping sets were introduced.

Heikki Linnakangas, with small mods and better test cases by me;
additional review by Andrew Gierth

Discussion: https://postgr.es/m/7dbdcf5c-b5a6-ef89-4958-da212fe10176@iki.fi

d3ca1a6c

Jan 11, 2018

Fix behavior of ~> (cube, int) operator · b8279a78

Teodor Sigaev authored 7 years ago

~> (cube, int) operator was especially designed for knn-gist search.
However, it appears that knn-gist search can't work correctly with current
behavior of this operator when dataset contains cubes of variable
dimensionality. In this case, the same value of second operator argument
can point to different dimension depending on dimensionality of particular cube.
Such behavior is incompatible with gist indexing of cubes, and knn-gist doesn't
work correctly for it.

This patch changes behavior of ~> (cube, int) operator by introducing dimension
numbering where value of second argument unambiguously identifies number of
dimension. With new behavior, this operator can be correctly supported by
knn-gist. Relevant changes to cube operator class are also included.

Backpatch to v9.6 where operator was introduced.

Since behavior of ~> (cube, int) operator is changed, depending entities
must be refreshed after upgrade. Such as, expression indexes using this
operator must be reindexed, materialized views must be rebuilt, stored
procedures and client code must be revised to correctly use new behavior.
That should be mentioned in release notes.

Noticed by: Tomas Vondra
Author: Alexander Korotkov
Reviewed by: Tomas Vondra, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/flat/a9657f6a-b497-36ff-e56-482a2c7e3292@2ndquadrant.com

b8279a78

Jan 10, 2018

Fix sample INSTR() functions in the plpgsql documentation. · 08adf688

Tom Lane authored 7 years ago

These functions are stated to be Oracle-compatible, but they weren't.
Yugo Nagata noticed that while our code returns zero for a zero or
negative fourth parameter (occur_index), Oracle throws an error.
Further testing by me showed that there was also a discrepancy in the
interpretation of a negative third parameter (beg_index): Oracle thinks
that a negative beg_index indicates the last place where the target
substring can *begin*, whereas our code thinks it is the last place
where the target can *end*.

Adjust the sample code to behave like Oracle in both these respects.
Also change it to be a CDATA[] section, simplifying copying-and-pasting
out of the documentation source file. And fix minor problems in the
introductory comment, which wasn't very complete or accurate.

Back-patch to all supported branches. Although this patch only touches
documentation, we should probably call it out as a bug fix in the next
minor release notes, since users who have adopted the functions will
likely want to update their versions.

Yugo Nagata and Tom Lane

Discussion: https://postgr.es/m/20171229191705.c0b43a8c.nagata@sraoss.co.jp

08adf688

Remove dubious micro-optimization in ckpt_buforder_comparator(). · 7eb0187a

Tom Lane authored 7 years ago

It seems incorrect to assume that the list of CkptSortItems can never
contain duplicate page numbers: concurrent activity could result in some
page getting dropped from a low-numbered buffer and later loaded into a
high-numbered buffer while BufferSync is scanning the buffer pool.
If that happened, the comparator would give self-inconsistent results,
potentially confusing qsort().  Saving one comparison step is not worth
possibly getting the sort wrong.

So far as I can tell, nothing would actually go wrong given our current
implementation of qsort().  It might get a bit slower than expected
if there were a large number of duplicates of one value, but that's
surely a probability-epsilon case.  Still, the comment is wrong,
and if we ever switched to another sort implementation it might be
less forgiving.

In passing, avoid casting away const-ness of the argument pointers;
I've not seen any compiler complaints from that, but it seems likely
that some compilers would not like it.

Back-patch to 9.6 where this code came in, just in case I've underestimated
the possible consequences.

Discussion: https://postgr.es/m/18437.1515607610@sss.pgh.pa.us

7eb0187a

Jan 09, 2018

Change some bogus PageGetLSN calls to BufferGetLSNAtomic · 37dd1128

Alvaro Herrera authored 7 years ago

As src/backend/access/transam/README says, PageGetLSN may only be called
by processes holding either exclusive lock on buffer, or a shared lock
on buffer plus buffer header lock. Therefore any place that only holds
a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN,
which internally obtains buffer header lock prior to reading the LSN.

A few callsites failed to comply with this rule. This was detected by
running all tests under a new (not committed) assertion that verifies
PageGetLSN locking contract. All but one of the callsites that failed
the assertion are fixed by this patch. Remaining callsites were
inspected manually and determined not to need any change.

The exception (unfixed callsite) is in TestForOldSnapshot, which only
has a Page argument, making it impossible to access the corresponding
Buffer from it. Fixing that seems a much larger patch that will have to
be done separately; and that's just as well, since it was only
introduced in 9.6 and other bugs are much older.

Some of these bugs are ancient; backpatch all the way back to 9.3.

Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal
Reviewed-by: Michaël Paquier
Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com

37dd1128

While waiting for a condition variable, detect postmaster death. · d56a5f99

Tom Lane authored 7 years ago

The general assumption for postmaster child processes is that they
should just exit(1), reasonably promptly, if the postmaster disappears.
condition_variable.c neglected this consideration and could be left
waiting forever, if the counterpart process it is waiting for has
done the right thing and exited.

We had some discussion of adjusting the WaitEventSet API to make it
harder to make this type of mistake in future; but for the moment,
and for v10, let's make this narrow fix.

Discussion: https://postgr.es/m/20412.1515456143@sss.pgh.pa.us

d56a5f99

Fix race condition during replication origin drop. · 1f5adbd7

Tom Lane authored 7 years ago

replorigin_drop() misunderstood the API for condition variables: it
had ConditionVariablePrepareToSleep and ConditionVariableCancelSleep
inside its test-and-sleep loop, rather than outside the loop as
intended. The net effect is a narrow race-condition window wherein,
if the process using a replication slot releases it immediately after
replorigin_drop() releases the ReplicationOriginLock, replorigin_drop()
would get into the condition variable's wait list too late and then
wait indefinitely for a signal that won't come.

Because there's a different CV for each replication slot, we can't
just move the ConditionVariablePrepareToSleep call to above the
test-and-sleep loop. What we can do, in the wake of commit 13db3b93,
is drop the ConditionVariablePrepareToSleep call entirely. This fix
depends on that commit because (at least in principle) the slot matching
the target replication origin might move around, so that once in a blue
moon successive loop iterations might involve different CVs. We can now
cope with such a scenario, at the cost of an extra trip through the
retry loop.

(There are ways we could fix this bug without depending on that commit,
but they're all a lot more complicated than this way.)

While at it, upgrade the rather skimpy comments in this function.

Back-patch to v10 where this code came in.

Discussion: https://postgr.es/m/19947.1515455433@sss.pgh.pa.us

1f5adbd7

Allow ConditionVariable[PrepareTo]Sleep to auto-switch between CVs. · 4af2190e

Tom Lane authored 7 years ago

The original coding here insisted that callers manually cancel any prepared
sleep for one condition variable before starting a sleep on another one.
While that's not a huge burden today, it seems like a gotcha that will bite
us in future if the use of condition variables increases; anything we can
do to make the use of this API simpler and more robust is attractive.
Hence, allow these functions to automatically switch their attention to
a different CV when required. This is safe for the same reason it was OK
for commit aced5a92 to let a broadcast operation cancel any prepared CV
sleep: whenever we return to the other test-and-sleep loop, we will
automatically re-prepare that CV, paying at most an extra test of that
loop's exit condition.

Back-patch to v10 where condition variables were introduced. Ordinarily
we would probably not back-patch a change like this, but since it does not
invalidate any coding pattern that was legal before, it seems safe enough.
Furthermore, there's an open bug in replorigin_drop() for which the
simplest fix requires this. Even if we chose to fix that in some more
complicated way, the hazard would remain that we might back-patch some
other bug fix that requires this behavior.

Patch by me, reviewed by Thomas Munro.

Discussion: https://postgr.es/m/2437.1515368316@sss.pgh.pa.us

4af2190e

pg_upgrade: prevent check on live cluster from generating error · 1776c817

Bruce Momjian authored 7 years ago

Previously an inaccurate but harmless error was generated when running
--check on a live server before reporting the servers as compatible.
The fix is to split error reporting and exit control in the exec_prog()
API.

Reported-by: Daniel Westermann

Backpatch-through: 10

1776c817

Jan 06, 2018

Reorder steps in ConditionVariablePrepareToSleep for more safety. · 83fe2708

Tom Lane authored 7 years ago

In the admittedly-very-unlikely case that AddWaitEventToSet fails,
ConditionVariablePrepareToSleep would error out after already having
set cv_sleep_target, which is probably bad, and after having already
set cv_wait_event_set, which is very bad. Transaction abort might or
might not clean up cv_sleep_target properly; but there is nothing
that would be aware that the WaitEventSet wasn't fully constructed,
so that all future condition variable sleeps would be broken.
We can easily guard against these hazards with slight restructuring.

Back-patch to v10 where condition_variable.c was introduced.

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com

83fe2708

Rewrite ConditionVariableBroadcast() to avoid live-lock. · 1c77e990

Tom Lane authored 7 years ago

The original implementation of ConditionVariableBroadcast was, per its
self-description, "the dumbest way possible". Thomas Munro found out
it was a bit too dumb. An awakened process may immediately re-queue
itself, if the specific condition it's waiting for is not yet satisfied.
If this happens before ConditionVariableBroadcast is able to see the wait
queue as empty, then ConditionVariableBroadcast will re-awaken the same
process, repeating the cycle. Given unlucky timing this back-and-forth
can repeat indefinitely; loops lasting thousands of seconds have been
seen in testing.

To fix, add our own process to the end of the wait queue to serve as a
sentinel, and exit the broadcast loop once our process is not there
anymore. There are various special considerations described in the
comments, the principal disadvantage being that wakers can no longer
be sure whether they awakened a real waiter or just a sentinel. But in
practice nobody pays attention to the result of ConditionVariableSignal
or ConditionVariableBroadcast anyway, so that problem seems hypothetical.

Back-patch to v10 where condition_variable.c was introduced.

Tom Lane and Thomas Munro

Discussion: https://postgr.es/m/CAEepm=0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w@mail.gmail.com

1c77e990

Jan 05, 2018

pg_upgrade: remove C comment · 502b4118
Bruce Momjian authored 7 years ago
```
Revert another part of 959ee6d2 .

Backpatch-through: 10
```
502b4118
pg_upgrade: revert part of patch for ease of translation · 5f45c72e
Bruce Momjian authored 7 years ago
```
Revert part of 959ee6d2 .

Backpatch-through: 10
```
5f45c72e
pg_upgrade: simplify code layout in a few places · b3b05853
Bruce Momjian authored 7 years ago
```
Backpatch-through: 9.4 (9.3 didn't need improving)
```
b3b05853

Fix failure to delete spill files of aborted transactions · a19c262f

Alvaro Herrera authored 7 years ago

Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large. These are supposed to be removed when they
become "too old" by xid; but file removal requires the boundary LSNs of
the transaction to be known. The final_lsn is only set when we see the
commit or abort record for the transaction, but nothing sets the value
for transactions that crash, so the removal code misbehaves -- in
assertion-enabled builds, it crashes by a failed assertion.

To fix, modify the final_lsn of transactions that don't have a value
set, to the LSN of the very latest change in the transaction. This
causes the spilled files to be removed appropriately.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada
Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp

a19c262f

Jan 04, 2018

Fix new test case to not be endian-dependent. · 0dc5dfcd

Tom Lane authored 7 years ago

Per buildfarm.

Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com

0dc5dfcd

Fix incorrect computations of length of null bitmap in pageinspect. · 5ad1b172

Tom Lane authored 7 years ago

Instead of using our standard macro for this calculation, this code
did it itself ... and got it wrong, leading to incorrect display of
the null bitmap in some cases.  Noted and fixed by Maksim Milyutin.

In passing, remove a uselessly duplicative error check.

Errors were introduced in commit d6061f83; back-patch to 9.6
where that came in.

Maksim Milyutin, reviewed by Andrey Borodin

Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com

5ad1b172

Jan 03, 2018

Revert "Fix isolation test to be less timing-dependent" · a6565da8

Alvaro Herrera authored 7 years ago

This reverts commit 2268e6af. It turned out that inconsistency in
the report is still possible, so go back to the simpler formulation of
the test and instead add an alternate expected output.

Discussion: https://postgr.es/m/20180103193728.ysqpcp2xjnqpiep7@alvherre.pgsql

a6565da8

Rename pg_rewind's copy_file_range() to avoid conflict with new linux syscall. · e3fdb7c0

Andres Freund authored 7 years ago

Upcoming versions of glibc will contain copy_file_range(2), a wrapper
around a new linux syscall for in-kernel copying of data ranges. This
conflicts with pg_rewinds function of the same name.

Therefore rename pg_rewinds version. As our version isn't a generic
copying facility we decided to choose a rewind specific function name.

Per buildfarm animal caiman and subsequent discussion with Tom Lane.

Author: Andres Freund
Discussion:
    https://postgr.es/m/20180103033425.w7jkljth3e26sduc@alap3.anarazel.de
    https://postgr.es/m/31122.1514951044@sss.pgh.pa.us
Backpatch: 9.5-, where pg_rewind was introduced

e3fdb7c0

Fix use of config-specific libraries for Windows OpenSSL · 0fb69340

Andrew Dunstan authored 7 years ago

Commit 614350a3 allowed for an different builds of OpenSSL libraries on
Windows, but ignored the fact that the alternative builds don't have
config-specific libraries. This patch fixes the Solution file to ask for
the correct libraries.

per offline discussions with Leonardo Cecchi and Marco Nenciarini,

Backpatch to all live branches.

0fb69340

Make XactLockTableWait work for transactions that are not yet self-locked · 43fb4b42

Alvaro Herrera authored 7 years ago

XactLockTableWait assumed that its xid argument has already added itself
to the lock table.  That assumption led to another assumption that if
locking the xid has succeeded but the xid is reported as still in
progress, then the input xid must have been a subtransaction.

These assumptions hold true for the original uses of this code in
locking related to on-disk tuples, but they break down in logical
replication slot snapshot building -- in particular, when a standby
snapshot logged contains an xid that's already in ProcArray but not yet
in the lock table.  This leads to assertion failures that can be
reproduced all the way back to 9.4, when logical decoding was
introduced.

To fix, change SubTransGetParent to SubTransGetTopmostTransaction which
has a slightly different API: it returns the argument Xid if there is no
parent, and it goes all the way to the top instead of moving up the
levels one by one.  Also, to avoid busy-waiting, add a 1ms sleep to give
the other process time to register itself in the lock table.

For consistency, change ConditionalXactLockTableWait the same way.

Author: Petr Jelínek
Discussion: https://postgr.es/m/1B3E32D8-FCF4-40B4-AEF9-5C0E3AC57969@postgrespro.ru
Reported-by: Konstantin Knizhnik
Diagnosed-by: Stas Kelvich, Petr Jelínek
Reviewed-by: Andres Freund, Robert Haas

43fb4b42

Fix isolation test to be less timing-dependent · 87c44b1a

Alvaro Herrera authored 7 years ago

I did this by adding another locking process, which makes the other two
wait.  This way the output should be stable enough.

Per buildfarm and Andres Freund
Discussion: https://postgr.es/m/20180103034445.t3utrtrnrevfsghm@alap3.anarazel.de

87c44b1a

Update copyright for 2018 · 724bceae
Bruce Momjian authored 7 years ago
```
Backpatch-through: certain files through 9.3
```
724bceae

Jan 02, 2018

Fix deadlock hazard in CREATE INDEX CONCURRENTLY · 6d2a9ae0

Alvaro Herrera authored 7 years ago

Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are
supposed to be able to work in parallel, as evidenced by fixes in commit
c3d09b3b specifically to support this case.  In reality, one of the
sessions would be aborted by a misterious "deadlock detected" error.

Jeff Janes diagnosed that this is because of leftover snapshots used for
system catalog scans -- this was broken by 8aa3e475 keeping track of
(registering) the catalog snapshot.  To fix the deadlocks, it's enough
to de-register that snapshot prior to waiting.

Backpatch to 9.4, which introduced MVCC catalog scans.

Include an isolationtester spec that 8 out of 10 times reproduces the
deadlock with the unpatched code for me (Álvaro).

Author: Jeff Janes
Diagnosed-by: Jeff Janes
Reported-by: Jeremy Finzel
Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com

6d2a9ae0

Jan 01, 2018

In tests, await an LSN no later than the recovery target. · 5f5d73dd

Noah Misch authored 7 years ago

Otherwise, the test fails with "Timed out while waiting for standby to
catch up". This happened rarely, perhaps only when autovacuum wrote WAL
between our choosing the recovery target and choosing the LSN to await.
Commit b26f7fa6 fixed one case of this.
Fix two more. Back-patch to 9.6, which introduced the affected test.

Discussion: https://postgr.es/m/20180101055227.GA2952815@rfd.leadboat.com

5f5d73dd

Dec 29, 2017

Properly set base backup backends to active in pg_stat_activity · b38c3d58

Magnus Hagander authored 7 years ago

When walsenders were included in pg_stat_activity, only the ones
actually streaming WAL were listed as active when they were active. In
particular, the connections sending base backups were listed as being
idle. Which means that a regular pg_basebackup would show up with one
active and one idle connection, when both were active.

This patch updates to set all walsenders to active when they are
(including those doing very fast things like IDENTIFY_SYSTEM), and then
back to idle. Details about exactly what they are doing is available in
pg_stat_replication.

Patch by me, review by Michael Paquier and David Steele.

b38c3d58

Dec 27, 2017

Update relation's stats in pg_class during vacuum full. · bdbf29aa

Teodor Sigaev authored 7 years ago

Hash index depends on estimation of numbers of tuples and pages of relations,
incorrect value could be a reason of significantly growing of index. Vacuum
full recreates heap and reindex all indexes before renewal stats. The patch
fixes that, so indexes will see correct values.

Backpatch to v10 only because earlier versions haven't usable hash index and
growing of hash index is a single user-visible symptom.

Author: Amit Kapila
Reviewed-by: Ashutosh Sharma, me
Discussion: https://www.postgresql.org/message-id/flat/20171115232922.5tomkxnw3iq6jsg7@inml.weebeastie.net

bdbf29aa

Dec 22, 2017

Fix UNION/INTERSECT/EXCEPT over no columns. · c252ccda

Tom Lane authored 7 years ago

Since 9.4, we've allowed the syntax "select union select" and variants
of that. However, the planner wasn't expecting a no-column set operation
and ended up treating the set operation as if it were UNION ALL.

Turns out it's trivial to fix in v10 and later; we just need to be careful
about not generating a Sort node with no sort keys. However, since a weird
corner case like this is never going to be exercised by developers, we'd
better have thorough regression tests if we want to consider it supported.

Per report from Victor Yegorov.

Discussion: https://postgr.es/m/CAGnEbojGJrRSOgJwNGM7JSJZpVAf8xXcVPbVrGdhbVEHZ-BUMw@mail.gmail.com

c252ccda

Dec 21, 2017

Cancel CV sleep during subtransaction abort. · f3decdc9

Robert Haas authored 7 years ago

Generally, error recovery paths that need to do things like
LWLockReleaseAll and pgstat_report_wait_end also need to call
ConditionVariableCancelSleep, but AbortSubTransaction was missed.

Since subtransaction abort might destroy up the DSM segment that
contains the ConditionVariable stored in cv_sleep_target, this
can result in a crash for anything using condition variables.

Reported and diagnosed by Andres Freund.

Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de

f3decdc9

Dec 20, 2017

When passing query strings to workers, pass the terminating \0. · 7be0d775

Robert Haas authored 7 years ago

Otherwise, when the query string is read, we might trailing garbage
beyond the end, unless there happens to be a \0 there by good luck.

Report and patch by Thomas Munro. Reviewed by Rafia Sabih.

Discussion: http://postgr.es/m/CAEepm=2SJs7X+_vx8QoDu8d1SMEOxtLhxxLNzZun_BvNkuNhrw@mail.gmail.com

7be0d775

Dec 19, 2017

Try again to fix accumulation of parallel worker instrumentation. · 72567f61

Robert Haas authored 7 years ago

When a Gather or Gather Merge node is started and stopped multiple
times, accumulate instrumentation data only once, at the end, instead
of after each execution, to avoid recording inflated totals.

Commit 778e78ae, the previous attempt
at a fix, instead reset the state after every execution, which worked
for the general instrumentation data but had problems for the additional
instrumentation specific to Sort and Hash nodes.

Report by hubert depesz lubaczewski.  Analysis and fix by Amit Kapila,
following a design proposal from Thomas Munro, with a comment tweak
by me.

Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com

72567f61

Dec 18, 2017

doc: Fix figures in example description · db2ee079
Peter Eisentraut authored 7 years ago
```
oversight in 244c8b46

Reported-by: Blaz Merela <blaz@merela.org>
```
db2ee079

Fix bug in cancellation of non-exclusive backup to avoid assertion failure. · 133d2fab

Fujii Masao authored 7 years ago

Previously an assertion failure occurred when pg_stop_backup() for
non-exclusive backup was aborted while it's waiting for WAL files to
be archived. This assertion failure happened in do_pg_abort_backup()
which was called when a non-exclusive backup was canceled.
do_pg_abort_backup() assumes that there is at least one non-exclusive
backup running when it's called. But pg_stop_backup() can be canceled
even after it marks the end of non-exclusive backup (e.g.,
during waiting for WAL archiving). This broke the assumption that
do_pg_abort_backup() relies on, and which caused an assertion failure.

This commit changes do_pg_abort_backup() so that it does nothing
when non-exclusive backup has been already marked as completed.
That is, the asssumption is also changed, and do_pg_abort_backup()
now can handle even the case where it's called when there is
no running backup.

Backpatch to 9.6 where SQL-callable non-exclusive backup was added.

Author: Masahiko Sawada and Michael Paquier
Reviewed-By: Robert Haas and Fujii Masao
Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com

133d2fab