diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 01c8ed7d6e61d2f7798695192b5183892709e762..e5d7597dc3d1cc0db1dea6d1345df962e945faba 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.84 2006/09/15 21:55:07 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.85 2006/09/15 22:02:21 momjian Exp $ -->
 <chapter id="backup">
 <title>Backup and Restore</title>
@@ -1203,6 +1203,312 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
 </sect2>
 </sect1>
+
+ <sect1 id="warm-standby">
+  <title>Warm Standby Servers for High Availability</title>
+
+  <indexterm zone="backup">
+   <primary>Warm Standby</primary>
+  </indexterm>
+
+  <indexterm zone="backup">
+   <primary>PITR Standby</primary>
+  </indexterm>
+
+  <indexterm zone="backup">
+   <primary>Standby Server</primary>
+  </indexterm>
+
+  <indexterm zone="backup">
+   <primary>Log Shipping</primary>
+  </indexterm>
+
+  <indexterm zone="backup">
+   <primary>Witness Server</primary>
+  </indexterm>
+
+  <indexterm zone="backup">
+   <primary>STONITH</primary>
+  </indexterm>
+
+  <indexterm zone="backup">
+   <primary>High Availability</primary>
+  </indexterm>
+
+  <para>
+   Continuous Archiving can be used to create a High Availability (HA)
+   cluster configuration with one or more Standby Servers ready to take
+   over operations in case the Primary Server fails. This capability is
+   more widely known as Warm Standby Log Shipping.
+  </para>
+
+  <para>
+   The Primary and Standby Servers work together to provide this
+   capability, though the servers are only loosely coupled. The Primary
+   Server operates in Continuous Archiving mode, while the Standby
+   Server operates in a continuous Recovery mode, reading the WAL files
+   from the Primary. No changes to the database tables are required to
+   enable this capability, so it offers low administration overhead in
+   comparison with other replication approaches. This configuration also
+   has a very low performance impact on the Primary Server.
+  </para>
+
+  <para>
+   Directly moving WAL or "log" records from one database server to
+   another is typically described as Log Shipping. PostgreSQL implements
+   file-based Log Shipping, meaning that WAL records are batched one
+   file at a time. WAL files can be shipped easily and cheaply over any
+   distance, whether it be to an adjacent system, another system on the
+   same site, or another system on the far side of the globe. The
+   bandwidth required for this technique varies according to the
+   transaction rate of the Primary Server. Record-based Log Shipping is
+   also possible with custom-developed procedures, as discussed in a
+   later section. Future developments are likely to include options for
+   synchronous and/or integrated record-based log shipping.
+  </para>
+
+  <para>
+   It should be noted that the log shipping is asynchronous, i.e. the
+   WAL records are shipped after transaction commit. As a result there
+   can be a small window of data loss, should the Primary Server suffer
+   a catastrophic failure. The window of data loss is minimized by the
+   use of the archive_timeout parameter, which can be set as low as a
+   few seconds if required, though a very low setting will increase the
+   bandwidth required for file shipping.
+  </para>
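+
+  <para>
+   As an illustration only, the relevant postgresql.conf entries on the
+   Primary might look like the following sketch. The archive directory
+   shown is an arbitrary example; it would normally be a location the
+   Standby Server can read, such as a directory shared over NFS or one
+   kept up to date with scp:
+<programlisting>
+archive_command = 'cp %p /mnt/standby/archivedir/%f'  # example path only
+archive_timeout = 60          # force a WAL file switch at least once a minute
+</programlisting>
+  </para>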
+
+  <para>
+   The Standby Server is not available for access, since it is
+   continually performing recovery processing. Recovery performance is
+   sufficiently good that the Standby will typically be only minutes
+   away from full availability once it has been activated. As a result,
+   we refer to this capability as a Warm Standby configuration that
+   offers High Availability. Restoring a server from an archived base
+   backup and rollforward can take considerably longer, so that
+   technique only really offers a solution for Disaster Recovery, not
+   HA.
+  </para>
+
+  <para>
+   Other mechanisms for High Availability replication are available,
+   both commercially and as open-source software.
+  </para>
+
+  <para>
+   In general, log shipping between servers running different release
+   levels will not be possible. It is the policy of the PostgreSQL
+   Global Development Group not to make changes to disk formats during
+   minor release upgrades, so it is likely that running different minor
+   release levels on Primary and Standby servers will work successfully.
+   However, no formal support for that is offered and you are advised
+   not to allow this to occur over long periods.
+  </para>
+
+  <sect2 id="warm-standby-planning">
+   <title>Planning</title>
+
+   <para>
+    On the Standby Server all tablespaces and paths will refer to
+    similarly named mount points, so it is important to create the
+    Primary and Standby Servers so that they are as similar as possible,
+    at least from the perspective of the database server. Furthermore,
+    any CREATE TABLESPACE commands will be passed across as-is, so any
+    new mount points must be created on both servers before they are
+    used on the Primary. Hardware need not be the same, but experience
+    shows that maintaining two identical systems is easier than
+    maintaining two dissimilar ones over the whole lifetime of the
+    application and system.
+   </para>
+
+   <para>
+    There is no special mode required to enable a Standby Server. The
+    operations that occur on both Primary and Standby Servers are
+    entirely normal continuous archiving and recovery tasks. The primary
+    point of contact between the two database servers is the archive of
+    WAL files that both share: the Primary writing to the archive, the
+    Standby reading from the archive. Care must be taken to ensure that
+    WAL archives for separate servers do not become mixed together or
+    confused.
+   </para>
+
+   <para>
+    The magic that makes the two loosely coupled servers work together
+    is simply a restore_command that waits for the next WAL file to be
+    archived from the Primary. The restore_command is specified in the
+    recovery.conf file on the Standby Server. Normal recovery processing
+    would request a file from the WAL archive, reporting an error if the
+    file was unavailable. For Standby processing it is normal for the
+    next file to be unavailable, so we must be patient and wait for it
+    to appear. A waiting restore_command can be written as a custom
+    script that loops, polling for the existence of the next WAL file.
+    There must also be some way to trigger failover, which should
+    interrupt the restore_command, break the loop and return a
+    file-not-found error to the Standby Server. This ends recovery and
+    the Standby will then come up as a normal server.
+   </para>
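+
+   <para>
+    As an illustration only, such a waiting restore_command might be
+    implemented as a shell script along the following lines. The script
+    name, archive path and trigger-file path are arbitrary examples, and
+    a production script would need more careful error handling:
+<programlisting>
+#!/bin/sh
+# Usage in recovery.conf:
+#   restore_command = '/usr/local/bin/wait_for_wal.sh %f %p'
+ARCHIVE=/mnt/standby/archivedir   # where the Primary archives WAL files
+TRIGGER=/tmp/pgsql.trigger        # created externally to force failover
+
+while true
+do
+    if test -f "$TRIGGER"
+    then
+        exit 1                    # "file not found": recovery ends, Standby starts up
+    fi
+    if test -f "$ARCHIVE/$1"
+    then
+        cp "$ARCHIVE/$1" "$2"     # hand the next WAL file to the Standby
+        exit $?
+    fi
+    sleep 1                       # poll until the next file appears
+done
+</programlisting>
+   </para>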
+
+   <para>
+    Sample pseudocode for a C version of the restore_command would be:
+<programlisting>
+triggered = false;
+while (!NextWALFileReady() && !triggered)
+{
+    usleep(100000L);    /* wait for ~0.1 sec */
+    if (CheckForExternalTrigger())
+        triggered = true;
+}
+if (!triggered)
+    CopyWALFileForRecovery();
+</programlisting>
+   </para>
+
+   <para>
+    PostgreSQL does not provide the system software required to identify
+    a failure on the Primary and notify the Standby system and then the
+    Standby database server. Many such tools exist and are well
+    integrated with other aspects of a system failover, such as IP
+    address migration.
+   </para>
+
+   <para>
+    Triggering failover is an important part of planning and design. The
+    restore_command is executed in full once for each WAL file. The
+    process running the restore_command is therefore created anew for
+    each file and terminates once that file has been handled, so there
+    is no daemon or server process, and we cannot use signals and a
+    signal handler. A more permanent notification is required to trigger
+    the failover. It is possible to use a simple timeout facility,
+    especially if used in conjunction with a known archive_timeout
+    setting on the Primary. However, this is somewhat error prone, since
+    a network problem or a busy Primary Server might be sufficient to
+    initiate failover. A notification mechanism such as the explicit
+    creation of a trigger file is less error prone, if this can be
+    arranged.
+   </para>
+  </sect2>
+
+  <sect2 id="warm-standby-config">
+   <title>Implementation</title>
+
+   <para>
+    The short procedure for configuring a Standby Server is as follows.
+    For full details of each step, refer to previous sections as noted.
+    <orderedlist>
+     <listitem>
+      <para>
+       Set up Primary and Standby systems as nearly identically as
+       possible, including two identical copies of PostgreSQL at the
+       same release level.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Set up Continuous Archiving from the Primary to a WAL archive
+       located in a directory on the Standby Server. Ensure that both
+       <xref linkend="guc-archive-command"> and
+       <xref linkend="guc-archive-timeout"> are set.
+       (See <xref linkend="backup-archiving-wal">)
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Make a Base Backup of the Primary Server.
+       (See <xref linkend="backup-base-backup">)
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Begin recovery on the Standby Server from the local WAL archive,
+       using a recovery.conf that specifies a restore_command that waits
+       as described previously.
+       (See <xref linkend="backup-pitr-recovery">)
+      </para>
+     </listitem>
+    </orderedlist>
+   </para>
+
+   <para>
+    Recovery treats the WAL archive as read-only, so once a WAL file has
+    been copied to the Standby system it can be copied to tape at the
+    same time as it is being used by the Standby database server to
+    recover. Thus, running a Standby Server for High Availability can be
+    performed at the same time as files are stored for longer-term
+    Disaster Recovery purposes.
+   </para>
+
+   <para>
+    For testing purposes, it is possible to run both Primary and Standby
+    servers on the same system. This does not provide any worthwhile
+    improvement in server robustness, nor would it be described as HA.
+   </para>
+  </sect2>
+
+  <sect2 id="warm-standby-failover">
+   <title>Failover</title>
+
+   <para>
+    If the Primary Server fails then the Standby Server should begin
+    failover procedures.
+   </para>
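+
+   <para>
+    With a trigger-file convention such as the one used in the sample
+    script above, failover might be initiated and verified along the
+    following lines. The trigger-file path and host name are arbitrary
+    examples:
+<programlisting>
+# On the Standby, once the Primary is known to be down:
+touch /tmp/pgsql.trigger     # makes the waiting restore_command return failure
+
+# Recovery then ends and the Standby starts accepting connections.
+# Verify this before redirecting applications to the new Primary:
+psql -h standby.example.com -c 'SELECT now()'
+</programlisting>
+   </para>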
+
+   <para>
+    If the Standby Server fails then no failover need take place. If the
+    Standby Server can be restarted, then the recovery process can also
+    be restarted immediately, taking advantage of Restartable Recovery.
+   </para>
+
+   <para>
+    If the Primary Server fails and then immediately restarts, you must
+    have a mechanism for informing it that it is no longer the Primary.
+    This is sometimes known as STONITH (Shoot The Other Node In The
+    Head), which is necessary to avoid situations where both systems
+    think they are the Primary, which can lead to confusion and
+    ultimately data loss.
+   </para>
+
+   <para>
+    Many failover systems use just two systems, the Primary and the
+    Standby, connected by some kind of heartbeat mechanism to
+    continually verify the connectivity between the two and the
+    viability of the Primary. It is also possible to use a third system,
+    known as a Witness Server, to avoid some cases of inappropriate
+    failover, but the additional complexity may not be worthwhile unless
+    it is set up with sufficient care and rigorous testing.
+   </para>
+
+   <para>
+    At the instant that failover takes place to the Standby, we have
+    only a single server in operation. This is known as a degenerate
+    state. The former Standby is now the Primary, but the former Primary
+    is down and may stay down. We must now fully re-create a Standby
+    server, either on the former Primary system when it comes up, or on
+    a third, possibly new, system. Once complete, the Primary and
+    Standby can be considered to have switched roles. Some people choose
+    to use a third server to provide additional protection across the
+    failover interval, though clearly this complicates the system
+    configuration and operational processes (such a server can also act
+    as a Witness Server).
+   </para>
+
+   <para>
+    So, switching from Primary to Standby Server can be fast, but
+    requires some time to re-prepare the failover cluster. Regular
+    switching from Primary to Standby is encouraged, since it allows
+    regular downtime on each system for maintenance. This also acts as a
+    test of the failover mechanism, ensuring that it will really work
+    when you need it. Written administration procedures are advised.
+   </para>
+  </sect2>
+
+  <sect2 id="warm-standby-record">
+   <title>Implementing Record-based Log Shipping</title>
+
+   <para>
+    The main features for Log Shipping in this release are based around
+    the file-based Log Shipping described above. It is also possible to
+    implement record-based Log Shipping using the
+    pg_xlogfile_name_offset() function, though this requires custom
+    development.
+   </para>
+
+   <para>
+    An external program can call pg_xlogfile_name_offset() to find out
+    the file name and the exact byte offset within it of the latest WAL
+    pointer. If the external program regularly polls the server it can
+    find out how far forward the pointer has moved. It can then access
+    the WAL file directly and copy the new bytes across to a less
+    up-to-date copy on a Standby Server.
+   </para>
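+
+   <para>
+    As a rough sketch only, such a program might poll the Primary with
+    psql, assuming pg_current_xlog_location() as the source of the
+    current WAL pointer, and ship the current WAL file prefix with
+    standard tools. The paths, host name and polling interval below are
+    arbitrary examples; a real implementation would copy only the newly
+    written bytes, handle switches to a new WAL file, and keep the
+    partial copies on the Standby consistent:
+<programlisting>
+#!/bin/sh
+# Run on the Primary; paths and host name are examples only.
+PGXLOG=/usr/local/pgsql/data/pg_xlog   # Primary's WAL directory
+
+while true
+do
+    POS=`psql -Atc "SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location())"`
+    FILE=`echo "$POS" | cut -d'|' -f1`
+    OFFSET=`echo "$POS" | cut -d'|' -f2`
+
+    # ship the first OFFSET bytes of the current WAL file to the Standby
+    dd if="$PGXLOG/$FILE" bs=1 count="$OFFSET" 2>/dev/null |
+        ssh standby.example.com "cat > /var/lib/pgsql/walcopy/$FILE.partial"
+
+    sleep 5                            # polling interval
+done
+</programlisting>
+   </para>
+  </sect2>
+ </sect1>
+
 <sect1 id="migration">
 <title>Migration Between Releases</title>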