- Jul 05, 2017
-
-
Peter Eisentraut authored
Since pg_ctl promote already waits for recovery to end, these calls are obsolete.
-
- Jul 03, 2017
-
-
Tom Lane authored
Since reducing pg_ctl's reaction time in commit c61559ec, some slower buildfarm members have shown erratic failures in this test. The reason turns out to be that the test assumes synchronous replication (because it does not provide any lag time for a commit to replicate before shutting down the servers), but it had only enabled sync rep in one direction. The observed symptoms correspond to failure to replicate the last committed transaction in the other direction, which can be expected to happen if the shutdown command is issued soon enough and we are providing no synchronous-commit guarantees. Fix that, and add a bit more paranoid state checking at the bottom of the script. Michael Paquier and myself Discussion: https://postgr.es/m/908.1498965681@sss.pgh.pa.us
-
- Jul 02, 2017
-
-
Tom Lane authored
The original coding here was very confusing, because it named the two servers it set up "master" and "slave" even though it swapped their replication roles multiple times. At any given point in the script it was very unobvious whether "$node_master" actually referred to the server named "master" or the other one. Instead, pick arbitrary names for the two servers --- I used "london" and "paris" --- and distinguish those permanent names from the nonce references $cur_master and $cur_slave. Add logging to help distinguish which is which at any given point. Also, use distinct data and transaction names to make all the prepared transactions easily distinguishable in the postmaster logs. (There was one place where we intentionally tested that the server could cope with re-use of a transaction name, but it seems like one place is sufficient for that purpose.) Also, add checks at the end to make sure that all the transactions that were supposed to be committed did survive. Discussion: https://postgr.es/m/28238.1499010855@sss.pgh.pa.us
-
Tom Lane authored
Add an optional "expected" argument to override the default assumption that we're waiting for the query to return "t". This allows replacing a handwritten polling loop in recovery/t/007_sync_rep.pl with use of poll_query_until(); AFAICS that's the only remaining ad-hoc polling loop in our TAP tests. Change poll_query_until() to probe ten times per second not once per second. Like some similar changes I've been making recently, the one-second interval seems to be rooted in ancient traditions rather than the actual likely wait duration on modern machines. I'd consider reducing it further if there were a convenient way to spawn just one psql for the whole loop rather than one per probe attempt. Discussion: https://postgr.es/m/12486.1498938782@sss.pgh.pa.us
-
- Jul 01, 2017
-
-
Tom Lane authored
Several callers of PostgresNode::poll_query_until() neglected to check for failure; I do not think that's optional. Also, rewrite one place that had reinvented poll_query_until() for no very good reason.
-
- Jun 29, 2017
-
-
Tom Lane authored
The point of this loop is to insert 1000 rows into the test table and consume 1000 XIDs. I can't see any good reason why it's useful to launch 1000 psqls and 1000 backend processes to accomplish that. Pushing the looping into a plpgsql DO block shaves about 10 seconds off the runtime of the src/test/recovery TAP tests on my machine; that's over 10% of the runtime of that test suite. It is, in fact, sufficiently more efficient that we now demonstrably need wait_slot_xmins() afterwards, or the slaves' xmins may not have moved yet.
-
- Jun 26, 2017
-
-
Tom Lane authored
Remove hard-wired sleep(2) delays in 001_stream_rep.pl in favor of using poll_query_until to check for the desired state to appear. In addition, add such a wait before the last test in the script, as it's possible to demonstrate failures there after upcoming improvements in pg_ctl. (We might end up adding polling before each of the get_slot_xmins calls in this script, but I feel no great need to do that until shown necessary.) In passing, clarify the description strings for some of the test cases. Michael Paquier and Craig Ringer, pursuant to a complaint from me Discussion: https://postgr.es/m/8962.1498425057@sss.pgh.pa.us
-
- May 18, 2017
-
-
Bruce Momjian authored
-
- May 12, 2017
-
-
Andrew Dunstan authored
Certain recovery tests use the Perl IPC::Run module's start/kill_kill method of processing. On at least some versions of perl this causes the whole process and its caller to crash. If we ever find a better way of doing these tests they can be re-enabled on this platform. This does not affect Mingw or Cygwin builds, which use a different perl and a different shell and so are not affected.
-
- May 11, 2017
-
-
Tom Lane authored
Per discussion, "location" is a rather vague term that could refer to multiple concepts. "LSN" is an unambiguous term for WAL locations and should be preferred. Some function names, view column names, and function output argument names used "lsn" already, but others used "location", as well as yet other terms such as "wal_position". Since we've already renamed a lot of things in this area from "xlog" to "wal" for v10, we may as well incur a bit more compatibility pain and make these names all consistent. David Rowley, minor additional docs hacking by me Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
-
- Apr 27, 2017
-
-
Simon Riggs authored
Tests for normal and prepared transactions Author: Nikhil Sontakke, placed in new test file by me
-
- Apr 25, 2017
-
-
Fujii Masao authored
In quorum-based synchronous replication, all the standbys listed in synchronous_standby_names equally have chances to be chosen as synchronous standbys. So they should have the same priority. However, previously, quorum standbys whose names appear earlier in the list were given higher priority values though the difference of those priority values didn't affect the selection of synchronous standbys. Users could see those "meaningless" priority values in pg_stat_replication and this was confusing. This commit gives all the quorum synchronous standbys the same highest priority, i.e., 1, in order to remove such confusion. Author: Fujii Masao Reviewed-by: Masahiko Sawada, Kyotaro Horiguchi Discussion: http://postgr.es/m/CAHGQGwEKOw=SmPLxJzkBsH6wwDBgOnVz46QjHbtsiZ-d-2RGUg@mail.gmail.com
-
- Apr 22, 2017
-
-
Tom Lane authored
Although the documentation for append_conf said clearly that it didn't add a newline, many test authors seem to have forgotten that ... or maybe they just consulted the example at the top of the POD documentation, which clearly shows adding a config entry without bothering to add a trailing newline. The worst part of that is that it works, as long as you don't do it more than once, since the backend isn't picky about whether config files end with newlines. So there's not a strong forcing function reminding test authors not to do it like that. Upshot is that this is a terribly fragile way to go about things, and there's at least one existing test case that is demonstrably broken and not testing what it thinks it is. Let's just make append_conf append a newline, instead; that is clearly way safer than the old definition. I also cleaned up a few call sites that were unnecessarily ugly. (I left things alone in places where it's plausible that additional config lines would need to be added someday.) Back-patch the change in append_conf itself to 9.6 where it was added, as having a definitional inconsistency between branches would obviously be pretty hazardous for back-patching TAP tests. The other changes are just cosmetic and don't need to be back-patched. Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us
-
- Mar 29, 2017
-
-
Peter Eisentraut authored
Reduce noise from TAP tests by changing 'diag' to 'note', so output only goes to the test's log file not stdout, unless in verbose mode. This also removes the junk on screen when running the TAP tests in parallel. Author: Craig Ringer <craig@2ndquadrant.com>
-
- Mar 28, 2017
-
-
Simon Riggs authored
Automatically drop all logical replication slots associated with a database when the database is dropped. Previously we threw an ERROR if a slot existed. Now we throw ERROR only if a slot is active in the database being dropped. Craig Ringer
-
- Mar 25, 2017
-
-
Simon Riggs authored
If the upstream walsender is using a physical replication slot, store the catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a slot and has only a PGPROC entry behaviour doesn't change, as we store the combined xmin and catalog_xmin in the PGPROC entry. Author: Craig Ringer
-
Peter Eisentraut authored
The test would hang if a sufficient ~/.psqlrc was present. Fix by using psql -X.
-
- Mar 24, 2017
-
-
Robert Haas authored
If your connection to the database server is lost while a COMMIT is in progress, it may be difficult to figure out whether the COMMIT was successful or not. This function will tell you, provided that you don't wait too long to ask. It may be useful in other situations, too. Craig Ringer, reviewed by Simon Riggs and by me Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com
-
- Mar 22, 2017
-
-
Simon Riggs authored
Uses page-based mechanism to ensure we’re using the correct timeline. Tests are included to exercise the functionality using a cold disk-level copy of the master that's started up as a replica with slots intact, but the intended use of the functionality is with later features. Craig Ringer, reviewed by Simon Riggs and Andres Freund
-
Peter Eisentraut authored
Perl versions before 5.12 would warn "Use of implicit split to @_ is deprecated". Author: Jeff Janes <jeff.janes@gmail.com>
-
- Mar 21, 2017
-
-
Simon Riggs authored
Allows testing of logical decoding using SQL interface and/or pg_recvlogical Most logical decoding tests are in contrib/test_decoding. This module is for work that doesn't fit well there, like where server restarts are required. Craig Ringer
-
- Feb 26, 2017
-
-
Robert Haas authored
Michael Paquier
-
- Feb 21, 2017
-
-
Alvaro Herrera authored
There's some ongoing performance work on this area, so let's make sure we don't break things. Extracted from a larger patch originally by Stas Kelvich. Authors: Stas Kelvich, Nikhil Sontakke, Michael Paquier Discussion: https://postgr.es/m/CAMGcDxfsuLLOg=h5cTg3g77Jjk-UGnt=RW7zK57zBSoFsapiWA@mail.gmail.com
-
- Feb 09, 2017
-
-
Robert Haas authored
Commit f82ec32a renamed the pg_xlog directory to pg_wal. To make things consistent, and because "xlog" is terrible terminology for either "transaction log" or "write-ahead log" rename all SQL-callable functions that contain "xlog" in the name to instead contain "wal". (Note that this may pose an upgrade hazard for some users.) Similarly, rename the xlog_position argument of the functions that create slots to be called wal_position. Discussion: https://www.postgresql.org/message-id/CA+Tgmob=YmA=H3DbW1YuOXnFVgBheRmyDkWcD9M8f=5bGWYEoQ@mail.gmail.com
-
- Jan 26, 2017
-
-
Simon Riggs authored
Hot_standby_feedback could be reset by reload and worked correctly, but if the server was restarted rather than reloaded the xmin was not reset. Force reset always if hot_standby_feedback is enabled at startup. Ants Aasma, Craig Ringer Reported-by: Ants Aasma
-
- Jan 14, 2017
-
-
Magnus Hagander authored
This changes the default values of the following parameters: wal_level = replica max_wal_senders = 10 max_replication_slots = 10 in order to make it possible to make a backup and set up simple replication on the default settings, without requiring a system restart. Discussion: https://postgr.es/m/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com Reviewed by Peter Eisentraut. Benchmark help from Tomas Vondra.
-
- Jan 04, 2017
-
-
Simon Riggs authored
Add new tests for physical repl slots and hot standby feedback. Craig Ringer, reviewed by Aleksander Alekseev and Simon Riggs
-
Simon Riggs authored
Add methods to the core test framework PostgresNode.pm to allow us to test that standby nodes have caught up with the master, as well as basic LSN handling. Used in tests recovery/t/001_stream_rep.pl and recovery/t/004_timeline_switch.pl Craig Ringer, reviewed by Aleksander Alekseev and Simon Riggs
-
- Jan 03, 2017
-
-
Bruce Momjian authored
-
- Dec 19, 2016
-
-
Fujii Masao authored
This feature is also known as "quorum commit" especially in discussion on pgsql-hackers. This commit adds the following new syntaxes into synchronous_standby_names GUC. By using FIRST and ANY keywords, users can specify the method to choose synchronous standbys from the listed servers. FIRST num_sync (standby_name [, ...]) ANY num_sync (standby_name [, ...]) The keyword FIRST specifies a priority-based synchronous replication which was available also in 9.6 or before. This method makes transaction commits wait until their WAL records are replicated to num_sync synchronous standbys chosen based on their priorities. The keyword ANY specifies a quorum-based synchronous replication and makes transaction commits wait until their WAL records are replicated to *at least* num_sync listed standbys. In this method, the values of sync_state.pg_stat_replication for the listed standbys are reported as "quorum". The priority is still assigned to each standby, but not used in this method. The existing syntaxes having neither FIRST nor ANY keyword are still supported. They are the same as new syntax with FIRST keyword, i.e., a priorirty-based synchronous replication. Author: Masahiko Sawada Reviewed-By: Michael Paquier, Amit Kapila and me Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com> Many thanks to the various individuals who were involved in discussing and developing this feature.
-
- Oct 27, 2016
-
-
Robert Haas authored
If a restartpoint flushed no dirty buffers, it could fail to update the minimum recovery point, leading to a minimum recovery point prior to the starting REDO location. perform_base_backup() would interpret that as meaning that no WAL files at all needed to be included in the backup, failing an internal sanity check. To fix, have restartpoints always update the minimum recovery point to just after the checkpoint record itself, so that the file (or files) containing the checkpoint record will always be included in the backup. Code by Amit Kapila, per a design suggestion by me, with some additional work on the code comment by me. Test case by Michael Paquier. Report by Kyotaro Horiguchi.
-
- Oct 19, 2016
-
-
Peter Eisentraut authored
Switch TAP tests to use the new wait mode of pg_ctl promote. This allows avoiding extra logic with poll_query_until() to be sure that a promoted standby is ready for read-write queries. From: Michael Paquier <michael.paquier@gmail.com>
-
Heikki Linnakangas authored
When a relation is truncated, it is important that the FSM is truncated as well. Otherwise, after recovery, the FSM can return a page that has been truncated away, leading to errors like: ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes We were using MarkBufferDirtyHint() to dirty the buffer holding the last remaining page of the FSM, but during recovery, that might in fact not dirty the page, and the FSM update might be lost. To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty() requires us to do WAL-logging ourselves, to protect from a torn page, if checksumming is enabled. Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log when checksumming is enabled. Analysis by Pavan Deolasee. Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>
-
- Sep 05, 2016
-
-
Simon Riggs authored
When pg_logical_slot_get_changes(...) sets confirmed_flush_lsn to the point at which replay stopped, it doesn't dirty the replication slot. So if the replay didn't cause restart_lsn or catalog_xmin to change as well, this change will not get written out to disk. Even on a clean shutdown. If Pg crashes or restarts, a subsequent pg_logical_slot_get_changes(...) call will see the same changes already replayed since it uses the slot's confirmed_flush_lsn as the start point for fetching changes. The caller can't specify a start LSN when using the SQL interface. Mark the slot as dirty after reading changes using the SQL interface so that users won't see repeated changes after a clean shutdown. Repeated changes still occur when using the walsender interface or after an unclean shutdown. Craig Ringer
-
- Sep 03, 2016
-
-
Simon Riggs authored
Michael Paquier
-
- Aug 15, 2016
-
-
Tom Lane authored
-
- Aug 03, 2016
-
-
Alvaro Herrera authored
In test 001_stream_rep we're using pg_stat_replication.write_location to determine catch-up status, but we care about xlog having been applied not just received, so change that to apply_location. In test 003_recovery_targets, we query the database for a recovery target specification and later for the xlog position supposedly corresponding to that recovery specification. If for whatever reason more WAL is written between the two queries, the recovery specification is earlier than the xlog position used by the query in the test harness, so we wait forever, leading to test failures. Deal with this by using a single query to extract both items. In 2a0f89cd we tried to deal with it by giving them more tests to run, but in hindsight that was obviously doomed to failure (no revert of that, though). Per hamster buildfarm failures. Author: Michaël Paquier
-
- Aug 02, 2016
-
-
Peter Eisentraut authored
-
- Jun 12, 2016
-
-
Noah Misch authored
-
- May 04, 2016
-
-
Alvaro Herrera authored
This reverts commits f07d18b6, 82c83b33, 3a3b3090, and 24c5f1a1. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.
-