  1. Jul 05, 2017
  2. Jul 03, 2017
    • Fix race condition in recovery/t/009_twophase.pl test. · 64767522
      Tom Lane authored
      Since reducing pg_ctl's reaction time in commit c61559ec, some
      slower buildfarm members have shown erratic failures in this test.
      The reason turns out to be that the test assumes synchronous
      replication (because it does not provide any lag time for a commit
      to replicate before shutting down the servers), but it had only
      enabled sync rep in one direction.  The observed symptoms correspond
      to failure to replicate the last committed transaction in the other
      direction, which can be expected to happen if the shutdown command
      is issued soon enough and we are providing no synchronous-commit
      guarantees.
      
      Fix that, and add a bit more paranoid state checking at the bottom
      of the script.
      
      Michael Paquier and myself
      
      Discussion: https://postgr.es/m/908.1498965681@sss.pgh.pa.us
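      The fix described here amounts to providing synchronous-commit guarantees in
      both directions. As a rough sketch (node names illustrative, not taken from
      the test itself), each server names the other as its synchronous standby:

      ```
      # postgresql.conf on node A, naming node B as its synchronous standby
      # (and symmetrically on node B, naming node A):
      synchronous_standby_names = 'node_b'
      synchronous_commit = on   # the default; commits wait for the standby's reply
      ```

      With this in place a commit cannot be acknowledged before it has replicated,
      so an immediately following shutdown cannot lose it.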
  3. Jul 02, 2017
    • Try to improve readability of recovery/t/009_twophase.pl test. · 4e15387d
      Tom Lane authored
      The original coding here was very confusing, because it named the
      two servers it set up "master" and "slave" even though it swapped
      their replication roles multiple times.  At any given point in the
      script it was very unobvious whether "$node_master" actually referred
      to the server named "master" or the other one.  Instead, pick arbitrary
      names for the two servers --- I used "london" and "paris" --- and
      distinguish those permanent names from the nonce references $cur_master
      and $cur_slave.  Add logging to help distinguish which is which at
      any given point.  Also, use distinct data and transaction names to
      make all the prepared transactions easily distinguishable in the
      postmaster logs.  (There was one place where we intentionally tested
      that the server could cope with re-use of a transaction name, but
      it seems like one place is sufficient for that purpose.)
      
      Also, add checks at the end to make sure that all the transactions
      that were supposed to be committed did survive.
      
      Discussion: https://postgr.es/m/28238.1499010855@sss.pgh.pa.us
    • Improve TAP test function PostgresNode::poll_query_until(). · de3de0af
      Tom Lane authored
      Add an optional "expected" argument to override the default assumption
      that we're waiting for the query to return "t".  This allows replacing
      a handwritten polling loop in recovery/t/007_sync_rep.pl with use of
      poll_query_until(); AFAICS that's the only remaining ad-hoc polling
      loop in our TAP tests.
      
      Change poll_query_until() to probe ten times per second not once per
      second.  Like some similar changes I've been making recently, the
      one-second interval seems to be rooted in ancient traditions rather
      than the actual likely wait duration on modern machines.  I'd consider
      reducing it further if there were a convenient way to spawn just one
      psql for the whole loop rather than one per probe attempt.
      
      Discussion: https://postgr.es/m/12486.1498938782@sss.pgh.pa.us
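      The kind of hand-written polling loop this helper replaces can be sketched in
      shell; the probe command, interval, and timeout below are illustrative
      assumptions, not the actual Perl implementation:

      ```shell
      #!/bin/sh
      # Run a probe command repeatedly until its output matches an expected
      # string, probing ten times per second, with an overall probe limit.
      # Returns 0 on success, 1 on timeout.
      poll_until() {
          cmd=$1
          expected=$2
          max_probes=${3:-1800}    # ~180 seconds at 0.1 s per probe
          i=0
          while [ "$i" -lt "$max_probes" ]; do
              out=$(eval "$cmd")
              if [ "$out" = "$expected" ]; then
                  return 0
              fi
              sleep 0.1
              i=$((i + 1))
          done
          return 1                 # timed out without seeing the expected value
      }

      # In the real tests the probe would be a psql query, e.g.:
      #   psql -XAt -c "SELECT pg_is_in_recovery()"
      poll_until "echo t" "t" && echo "condition met"
      ```
      
      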
  4. Jul 01, 2017
    • Clean up misuse and nonuse of poll_query_until(). · b0f069d9
      Tom Lane authored
      Several callers of PostgresNode::poll_query_until() neglected to check
      for failure; I do not think that's optional.  Also, rewrite one place
      that had reinvented poll_query_until() for no very good reason.
  5. Jun 29, 2017
    • Eat XIDs more efficiently in recovery TAP test. · 08aed660
      Tom Lane authored
      The point of this loop is to insert 1000 rows into the test table
      and consume 1000 XIDs.  I can't see any good reason why it's useful
      to launch 1000 psqls and 1000 backend processes to accomplish that.
      Pushing the looping into a plpgsql DO block shaves about 10 seconds
      off the runtime of the src/test/recovery TAP tests on my machine;
      that's over 10% of the runtime of that test suite.
      
      It is, in fact, sufficiently more efficient that we now demonstrably
      need wait_slot_xmins() afterwards, or the slaves' xmins may not have
      moved yet.
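      A sketch of the DO-block technique described above (table name illustrative):
      each exception block opens a subtransaction, and a subtransaction that writes
      a row is assigned its own XID, so a single backend still consumes roughly a
      thousand XIDs without launching a thousand psql processes:

      ```sql
      DO $$
      BEGIN
          FOR i IN 1..1000 LOOP
              BEGIN
                  INSERT INTO test_tab VALUES (i);
              EXCEPTION
                  WHEN division_by_zero THEN NULL;  -- never raised; the block
                                                    -- only forces a subtransaction
              END;
          END LOOP;
      END
      $$;
      ```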
  6. Jun 26, 2017
    • Improve wait logic in TAP tests for streaming replication. · 5c77690f
      Tom Lane authored
      Remove hard-wired sleep(2) delays in 001_stream_rep.pl in favor of using
      poll_query_until to check for the desired state to appear.  In addition,
      add such a wait before the last test in the script, as it's possible
      to demonstrate failures there after upcoming improvements in pg_ctl.
      
      (We might end up adding polling before each of the get_slot_xmins calls in
      this script, but I feel no great need to do that until shown necessary.)
      
      In passing, clarify the description strings for some of the test cases.
      
      Michael Paquier and Craig Ringer, pursuant to a complaint from me
      
      Discussion: https://postgr.es/m/8962.1498425057@sss.pgh.pa.us
  7. May 18, 2017
  8. May 12, 2017
    • Avoid tests which crash the calling process on Windows · 734cb4c2
      Andrew Dunstan authored
      Certain recovery tests use the Perl IPC::Run module's start/kill_kill
      style of process control. On at least some versions of Perl this causes the
      whole process and its caller to crash. If we ever find a better way of
      doing these tests they can be re-enabled on this platform. This does not
      affect Mingw or Cygwin builds, which use a different perl and a
      different shell and so are not affected.
  9. May 11, 2017
  10. Apr 27, 2017
  11. Apr 25, 2017
    • Set the priorities of all quorum synchronous standbys to 1. · 346199dc
      Fujii Masao authored
      In quorum-based synchronous replication, all the standbys listed in
      synchronous_standby_names have an equal chance of being chosen
      as synchronous standbys, so they should have the same priority.
      However, previously, quorum standbys whose names appeared earlier
      in the list were given higher priority values, even though the difference in
      those priority values didn't affect the selection of synchronous standbys.
      Users could see those "meaningless" priority values in pg_stat_replication
      and this was confusing.
      
      This commit gives all the quorum synchronous standbys the same
      highest priority, i.e., 1, in order to remove such confusion.
      
      Author: Fujii Masao
      Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi
      Discussion: http://postgr.es/m/CAHGQGwEKOw=SmPLxJzkBsH6wwDBgOnVz46QjHbtsiZ-d-2RGUg@mail.gmail.com
  12. Apr 22, 2017
    • Make PostgresNode::append_conf append a newline automatically. · 8a19c1a3
      Tom Lane authored
      Although the documentation for append_conf said clearly that it didn't
      add a newline, many test authors seem to have forgotten that ... or maybe
      they just consulted the example at the top of the POD documentation,
      which clearly shows adding a config entry without bothering to add a
      trailing newline.  The worst part of that is that it works, as long as
      you don't do it more than once, since the backend isn't picky about
      whether config files end with newlines.  So there's not a strong forcing
      function reminding test authors not to do it like that.  Upshot is that
      this is a terribly fragile way to go about things, and there's at least
      one existing test case that is demonstrably broken and not testing what
      it thinks it is.
      
      Let's just make append_conf append a newline, instead; that is clearly
      way safer than the old definition.
      
      I also cleaned up a few call sites that were unnecessarily ugly.
      (I left things alone in places where it's plausible that additional
      config lines would need to be added someday.)
      
      Back-patch the change in append_conf itself to 9.6 where it was added,
      as having a definitional inconsistency between branches would obviously
      be pretty hazardous for back-patching TAP tests.  The other changes are
      just cosmetic and don't need to be back-patched.
      
      Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us
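      The hazard described above is easy to reproduce outside Perl; a minimal shell
      sketch (setting names illustrative) of appending two entries without trailing
      newlines, versus the fixed behaviour:

      ```shell
      #!/bin/sh
      # Simulate the old append_conf: append without a newline.  Two successive
      # appends fuse into ONE garbled config line.
      conf=$(mktemp)
      printf '%s' 'wal_level = replica'   >> "$conf"
      printf '%s' 'max_wal_senders = 5'   >> "$conf"
      cat "$conf"; echo    # wal_level = replicamax_wal_senders = 5

      # The fixed definition appends the newline for the caller, so each
      # setting lands on its own line.
      conf2=$(mktemp)
      printf '%s\n' 'wal_level = replica' >> "$conf2"
      printf '%s\n' 'max_wal_senders = 5' >> "$conf2"
      wc -l < "$conf2"     # 2 lines
      rm -f "$conf" "$conf2"
      ```
      
      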
  13. Mar 29, 2017
    • Change 'diag' to 'note' in TAP tests · 2e74e636
      Peter Eisentraut authored
      Reduce noise from TAP tests by changing 'diag' to 'note', so output only
      goes to the test's log file not stdout, unless in verbose mode.  This
      also removes the junk on screen when running the TAP tests in parallel.
      
      Author: Craig Ringer <craig@2ndquadrant.com>
  14. Mar 28, 2017
    • Cleanup slots during drop database · ff539da3
      Simon Riggs authored
      Automatically drop all logical replication slots associated with a
      database when the database is dropped. Previously we threw an ERROR
      if a slot existed. Now we throw ERROR only if a slot is active in
      the database being dropped.
      
      Craig Ringer
  15. Mar 25, 2017
    • Report catalog_xmin separately in hot_standby_feedback · 5737c12d
      Simon Riggs authored
      If the upstream walsender is using a physical replication slot, store the
      catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
      slot and has only a PGPROC entry, behaviour doesn't change, as we store the
      combined xmin and catalog_xmin in the PGPROC entry.
      
      Author: Craig Ringer
    • Fix recovery test hang · cd07f73d
      Peter Eisentraut authored
      The test would hang if a sufficient ~/.psqlrc was present.  Fix by using
      psql -X.
  16. Mar 24, 2017
  17. Mar 22, 2017
    • Teach xlogreader to follow timeline switches · 1148e22a
      Simon Riggs authored
      Uses a page-based mechanism to ensure we're using the correct timeline.
      
      Tests are included to exercise the functionality using a cold disk-level copy
      of the master that's started up as a replica with slots intact, but the
      intended use of the functionality is with later features.
      
      Craig Ringer, reviewed by Simon Riggs and Andres Freund
    • Avoid Perl warning · 9ca2dd57
      Peter Eisentraut authored
      Perl versions before 5.12 would warn "Use of implicit split to @_ is
      deprecated".
      
      Author: Jeff Janes <jeff.janes@gmail.com>
  18. Mar 21, 2017
    • Add a pg_recvlogical wrapper to PostgresNode · eb2a6131
      Simon Riggs authored
      Allows testing of logical decoding using the SQL interface and/or pg_recvlogical.
      Most logical decoding tests are in contrib/test_decoding. This module
      is for work that doesn't fit well there, like where server restarts
      are required.
      
      Craig Ringer
  19. Feb 26, 2017
  20. Feb 21, 2017
  21. Feb 09, 2017
  22. Jan 26, 2017
    • Reset hot standby xmin on master after restart · ec4b9750
      Simon Riggs authored
      hot_standby_feedback could be reset by a reload and worked correctly, but if
      the server was restarted rather than reloaded, the xmin was not reset.
      Always force a reset if hot_standby_feedback is enabled at startup.
      
      Ants Aasma, Craig Ringer
      
      Reported-by: Ants Aasma
  23. Jan 14, 2017
  24. Jan 04, 2017
    • Add 18 new recovery TAP tests · 0813216c
      Simon Riggs authored
      Add new tests for physical repl slots and hot standby feedback.
      
      Craig Ringer, reviewed by Aleksander Alekseev and Simon Riggs
    • Allow PostgresNode.pm tests to wait for catchup · fb093e4c
      Simon Riggs authored
      Add methods to the core test framework PostgresNode.pm to allow us to
      test that standby nodes have caught up with the master, as well as
      basic LSN handling.  Used in tests recovery/t/001_stream_rep.pl and
      recovery/t/004_timeline_switch.pl
      
      Craig Ringer, reviewed by Aleksander Alekseev and Simon Riggs
  25. Jan 03, 2017
  26. Dec 19, 2016
    • Support quorum-based synchronous replication. · 3901fd70
      Fujii Masao authored
      This feature is also known as "quorum commit" especially in discussion
      on pgsql-hackers.
      
      This commit adds the following new syntaxes to the synchronous_standby_names
      GUC. By using the FIRST and ANY keywords, users can specify the method to
      choose synchronous standbys from the listed servers.
      
        FIRST num_sync (standby_name [, ...])
        ANY num_sync (standby_name [, ...])
      
      The keyword FIRST specifies a priority-based synchronous replication
      which was available also in 9.6 or before. This method makes transaction
      commits wait until their WAL records are replicated to num_sync
      synchronous standbys chosen based on their priorities.
      
      The keyword ANY specifies a quorum-based synchronous replication
      and makes transaction commits wait until their WAL records are
      replicated to *at least* num_sync listed standbys. In this method,
      the sync_state values shown in pg_stat_replication for the listed standbys
      are reported as "quorum". The priority is still assigned to each standby,
      but not used in this method.
      
      The existing syntaxes having neither the FIRST nor the ANY keyword are still
      supported. They are the same as the new syntax with the FIRST keyword, i.e.,
      priority-based synchronous replication.
      
      Author: Masahiko Sawada
      Reviewed-By: Michael Paquier, Amit Kapila and me
      Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>
      
      Many thanks to the various individuals who were involved in
      discussing and developing this feature.
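      The two syntaxes described above look like this in postgresql.conf (standby
      names illustrative):

      ```
      # Priority-based: wait for the 2 highest-priority connected standbys
      # among s1, s2, s3 (equivalent to the older syntax without a keyword).
      synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'

      # Quorum-based: wait until ANY 2 of the listed standbys have replied;
      # their sync_state in pg_stat_replication is reported as "quorum".
      synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
      ```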
  27. Oct 27, 2016
    • Fix possible pg_basebackup failure on standby with "include WAL". · f267c1c2
      Robert Haas authored
      If a restartpoint flushed no dirty buffers, it could fail to update
      the minimum recovery point, leading to a minimum recovery point prior
      to the starting REDO location.  perform_base_backup() would interpret
      that as meaning that no WAL files at all needed to be included in the
      backup, failing an internal sanity check.  To fix, have restartpoints
      always update the minimum recovery point to just after the checkpoint
      record itself, so that the file (or files) containing the checkpoint
      record will always be included in the backup.
      
      Code by Amit Kapila, per a design suggestion by me, with some
      additional work on the code comment by me.  Test case by Michael
      Paquier.  Report by Kyotaro Horiguchi.
  28. Oct 19, 2016
    • Use pg_ctl promote -w in TAP tests · e5a9bcb5
      Peter Eisentraut authored
      Switch TAP tests to use the new wait mode of pg_ctl promote.  This
      allows avoiding extra logic with poll_query_until() to be sure that a
      promoted standby is ready for read-write queries.
      
      From: Michael Paquier <michael.paquier@gmail.com>
    • Fix WAL-logging of FSM and VM truncation. · 917dc7d2
      Heikki Linnakangas authored
      When a relation is truncated, it is important that the FSM is truncated as
      well. Otherwise, after recovery, the FSM can return a page that has been
      truncated away, leading to errors like:
      
      ERROR:  could not read block 28991 in file "base/16390/572026": read only 0
      of 8192 bytes
      
      We were using MarkBufferDirtyHint() to dirty the buffer holding the last
      remaining page of the FSM, but during recovery, that might in fact not
      dirty the page, and the FSM update might be lost.
      
      To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty()
      requires us to do WAL-logging ourselves, to protect from a torn page, if
      checksumming is enabled.
      
      Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log
      when checksumming is enabled.
      
      Analysis by Pavan Deolasee.
      
      Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>
  29. Sep 05, 2016
    • Dirty replication slots when using sql interface · d851bef2
      Simon Riggs authored
      When pg_logical_slot_get_changes(...) sets confirmed_flush_lsn to the point at
      which replay stopped, it doesn't dirty the replication slot.  So if the replay
      didn't cause restart_lsn or catalog_xmin to change as well, this change will
      not get written out to disk. Even on a clean shutdown.
      
      If Pg crashes or restarts, a subsequent pg_logical_slot_get_changes(...) call
      will see the same changes already replayed since it uses the slot's
      confirmed_flush_lsn as the start point for fetching changes. The caller can't
      specify a start LSN when using the SQL interface.
      
      Mark the slot as dirty after reading changes using the SQL interface so that
      users won't see repeated changes after a clean shutdown. Repeated changes still
      occur when using the walsender interface or after an unclean shutdown.
      
      Craig Ringer
  30. Sep 03, 2016
  31. Aug 15, 2016
  32. Aug 03, 2016
    • Fix assorted problems in recovery tests · b26f7fa6
      Alvaro Herrera authored
      In test 001_stream_rep we're using pg_stat_replication.write_location to
      determine catch-up status, but we care about xlog having been applied,
      not just received, so change that to apply_location.
      
      In test 003_recovery_targets, we query the database for a recovery
      target specification and later for the xlog position supposedly
      corresponding to that recovery specification.  If for whatever reason
      more WAL is written between the two queries, the recovery specification
      is earlier than the xlog position used by the query in the test harness,
      so we wait forever, leading to test failures.  Deal with this by using a
      single query to extract both items.  In 2a0f89cd we tried to deal
      with it by giving them more tests to run, but in hindsight that was
      obviously doomed to failure (no revert of that, though).
      
      Per hamster buildfarm failures.
      
      Author: Michaël Paquier
  33. Aug 02, 2016
  34. Jun 12, 2016
  35. May 04, 2016