Commits · 2d53003432f8560b9c3adf569118747c8ac8447d · Jakob Huber / postgres-lambda-diff

Oct 24, 2014
- Complain if too many options are passed to pg_controldata or pg_resetxlog. · 2d530034
  Heikki Linnakangas authored 10 years ago
  
  2d530034
- Oops, the commit to accept "pg_controldata -D datadir" missed the code changes. · 22b743b2
  Heikki Linnakangas authored 10 years ago
  
  I updated the docs and usage blurb, but forgot to commit the code changes required. Spotted by Michael Paquier.
  22b743b2
Sep 25, 2014

Add -D option to specify data directory to pg_controldata and pg_resetxlog. · b0d81ade

Heikki Linnakangas authored 10 years ago

It was confusing that to other commands, like initdb and postgres, you would
pass the data directory with "-D datadir", but pg_controldata and
pg_resetxlog would take just a plain path, without the "-D". With this patch,
pg_controldata and pg_resetxlog also accept "-D datadir".

Abhijit Menon-Sen, with minor kibitzing by me

b0d81ade
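
A minimal sketch of the accepted command-line forms, including the later "too many options" complaint. The helper name pick_datadir is hypothetical, and the sketch hand-parses argv for brevity; it is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the actual pg_controldata code: pick the data
 * directory from argv, accepting either "-D datadir" or a bare path.
 * Returns NULL if no directory was given or if extra arguments remain.
 */
static const char *
pick_datadir(int argc, char **argv)
{
    const char *datadir = NULL;
    int         i = 1;

    assert(argv != NULL);

    /* Form 1: "-D datadir" */
    if (i + 1 < argc && strcmp(argv[i], "-D") == 0)
    {
        datadir = argv[i + 1];
        i += 2;
    }
    /* Form 2: a plain path as the first argument */
    else if (i < argc && argv[i][0] != '-')
    {
        datadir = argv[i];
        i += 1;
    }

    /* Anything left over is one argument too many */
    if (i < argc)
        return NULL;

    return datadir;
}
```

A caller would print the usage hint and exit with an error whenever the helper returns NULL.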

Jun 05, 2014

Add defenses against running with a wrong selection of LOBLKSIZE. · 5f93c378

Tom Lane authored 10 years ago

It's critical that the backend's idea of LOBLKSIZE match the way data has
actually been divided up in pg_largeobject. While we don't provide any
direct way to adjust that value, doing so is a one-line source code change
and various people have expressed interest recently in changing it. So,
just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in
pg_control and cross-check that the backend's compiled-in setting matches
the on-disk data.

Also tweak the code in inv_api.c so that fetches from pg_largeobject
explicitly verify that the length of the data field is not more than
LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection
at all in production builds. In some of the call sites an overlength data
value would translate directly to a security-relevant stack clobber, so it
seems worth one extra runtime comparison to be sure.

In the back branches, we can't change the contents of pg_control; but we
can still make the extra checks in inv_api.c, which will offer some amount
of protection against running with the wrong value of LOBLKSIZE.

5f93c378
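
The two defenses described above can be sketched as follows. The struct and function names are hypothetical stand-ins, and the value 2048 assumes the default of a quarter of an 8 kB block; the real checks live in the backend and in inv_api.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOBLKSIZE 2048          /* assumed compiled-in value */

/* Hypothetical stand-in for the relevant pg_control field. */
typedef struct
{
    unsigned int loblksize;     /* value recorded when the cluster was created */
} ControlSketch;

/* Startup cross-check: refuse to run if the on-disk value differs. */
static bool
loblksize_matches(const ControlSketch *ctl)
{
    assert(ctl != NULL);
    return ctl->loblksize == LOBLKSIZE;
}

/*
 * Copy one large-object chunk into a fixed-size buffer.  The length is
 * verified at run time, so an overlength chunk is rejected even in builds
 * where assertions are compiled out.
 */
static bool
copy_chunk(char dst[LOBLKSIZE], const char *src, size_t len)
{
    if (len > LOBLKSIZE)
        return false;           /* caller should report corrupted data */
    memcpy(dst, src, len);
    return true;
}
```

The point of the second function is that the comparison is an ordinary if, not an Assert, so it costs one branch per fetch but still protects production builds.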

May 06, 2014

pgindent run for 9.4 · 0a783200

Bruce Momjian authored 10 years ago

This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not affect backpatching.

0a783200

Dec 20, 2013
- Rename wal_log_hintbits to wal_log_hints, per discussion on pgsql-hackers. · 961bf59f
  Fujii Masao authored 11 years ago
  
  Sawada Masahiko
  961bf59f
Dec 13, 2013

Add GUC to enable WAL-logging of hint bits, even with checksums disabled. · 50e54709

Heikki Linnakangas authored 11 years ago

WAL records of hint bit updates are useful to tools that want to examine
which pages have been modified. In particular, this is required to make
the pg_rewind tool safe (without checksums).

This can also be used to test how much extra WAL-logging would occur if
you enabled checksums, without actually enabling them (which you can't
currently do without re-initdb'ing).

Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
further changes by me.

50e54709

Dec 11, 2013

Add new wal_level, logical, sufficient for logical decoding. · e55704d8

Robert Haas authored 11 years ago

When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
07cacba9.  This makes it possible for
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.

Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.

Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.

e55704d8

Jul 04, 2013

Add new GUC, max_worker_processes, limiting number of bgworkers. · 6bc8ef0b

Robert Haas authored 11 years ago

In 9.3, there's no particular limit on the number of bgworkers;
instead, we just count up the number that are actually registered,
and use that to set MaxBackends.  However, that approach causes
problems for Hot Standby, which needs both MaxBackends and the
size of the lock table to be the same on the standby as on the
master, yet it may not be desirable to run the same bgworkers in
both places.  9.3 handles that by failing to notice the problem,
which will probably work fine in nearly all cases anyway, but is
not theoretically sound.

A further problem with simply counting the number of registered
workers is that new workers can't be registered without a
postmaster restart.  This is inconvenient for administrators,
since bouncing the postmaster causes an interruption of service.
Moreover, there are a number of applications for background
processes where, by necessity, the background process must be
started on the fly (e.g. parallel query).  While this patch
doesn't actually make it possible to register new background
workers after startup time, it's a necessary prerequisite.

Patch by me.  Review by Michael Paquier.

6bc8ef0b

Apr 30, 2013

Record data_checksum_version in control file. · 44395174

Simon Riggs authored 11 years ago

The value is not used anywhere in code, but will
allow changes to the checksum version
should that become necessary in the future.

44395174

Mar 22, 2013

Allow I/O reliability checks using 16-bit checksums · 96ef3b8f

Simon Riggs authored 12 years ago

Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again. Hint bit setting will
require full page write when block is dirtied, which causes various
infrastructure changes. Extensive comments, docs and README.

WARNING message thrown if checksum fails on non-all zeroes page;
ERROR thrown but can be disabled with ignore_checksum_failure = on.

Feature enabled by an initdb option, since transition from option off
to option on is long and complex and has not yet been implemented.
Default is not to use checksums.

Checksum used is WAL CRC-32 truncated to 16 bits.

Simon Riggs, Jeff Davis, Greg Smith
Wide input and assistance from many community members. Thank you.

96ef3b8f
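
A rough sketch of "CRC-32 truncated to 16 bits". The commit message does not say which polynomial or which 16 bits are kept, so the reflected polynomial 0xEDB88320 and the low half used here are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but compact. */
static uint32_t
crc32_sketch(const unsigned char *buf, size_t len)
{
    uint32_t    crc = 0xFFFFFFFFu;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Keep only the low 16 bits (assumed choice) as the page checksum. */
static uint16_t
page_checksum_sketch(const unsigned char *page, size_t len)
{
    return (uint16_t) (crc32_sketch(page, len) & 0xFFFFu);
}
```

Under these assumptions the checksum would be computed just before a page is flushed and recomputed and compared when the page is read back in.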

Mar 17, 2013
- pg_controldata: Undo message spelling change · ea1aee88
  Peter Eisentraut authored 12 years ago
  
  ea1aee88
Feb 11, 2013

Support unlogged GiST index. · 62401db4

Heikki Linnakangas authored 12 years ago

The reason this wasn't supported before was that GiST indexes need an
increasing sequence to detect concurrent page-splits. In a regular WAL-
logged GiST index, the LSN of the page-split record is used for that
purpose, and in a temporary index, we can get away with a backend-local
counter. Neither of those methods works for an unlogged relation.

To provide such an increasing sequence of numbers, create a "fake LSN"
counter that is saved and restored across shutdowns. On recovery, unlogged
relations are blown away, so the counter doesn't need to survive that
either.

Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.

62401db4

Include previous TLI in end-of-recovery and shutdown checkpoint records. · 7803e932

Heikki Linnakangas authored 12 years ago

This isn't used for anything but a sanity check at the moment, but it could
be highly valuable for debugging purposes. It could also be used to recreate
timeline history by traversing WAL, which seems useful.

7803e932

Jan 24, 2013
- Make output identical to pg_resetxlog's · 6772c1e5
  Alvaro Herrera authored 12 years ago
  
  6772c1e5
Jan 23, 2013

Improve concurrency of foreign key locking · 0ac5ad51

Alvaro Herrera authored 12 years ago

This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE".  UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.

Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.

The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid.  Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates.  This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed.  pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.

Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header.  This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.

Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, too (there was previously a single
WAL record for a locked tuple; now there are as many as there are updated
copies of the tuple).

With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.

As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.

Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane.  There's probably room for several more tests.

There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time on it.  Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.

This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
	1290721684-sup-3951@alvh.no-ip.org
	1294953201-sup-2099@alvh.no-ip.org
	1320343602-sup-2290@alvh.no-ip.org
	1339690386-sup-8927@alvh.no-ip.org
	4FE5FF020200002500048A3D@gw.wicourts.gov
	4FEAB90A0200002500048B7D@gw.wicourts.gov

0ac5ad51
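
The interaction between the four tuple lock modes can be sketched as a conflict matrix. The enum and function names are hypothetical; the table entries follow the row-level lock conflict table in the PostgreSQL documentation rather than anything spelled out in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Tuple lock modes, weakest to strongest. */
typedef enum
{
    LOCK_KEY_SHARE,             /* SELECT FOR KEY SHARE */
    LOCK_SHARE,                 /* SELECT FOR SHARE */
    LOCK_NO_KEY_UPDATE,         /* SELECT FOR NO KEY UPDATE */
    LOCK_UPDATE                 /* SELECT FOR UPDATE */
} TupleLockSketch;

/* conflict_table[held][requested] is true when the request must wait. */
static const bool conflict_table[4][4] = {
    /* held KEY SHARE     */ {false, false, false, true},
    /* held SHARE         */ {false, false, true, true},
    /* held NO KEY UPDATE */ {false, true, true, true},
    /* held UPDATE        */ {true, true, true, true},
};

static bool
locks_conflict(TupleLockSketch held, TupleLockSketch requested)
{
    assert((unsigned int) held < 4 && (unsigned int) requested < 4);
    return conflict_table[held][requested];
}
```

The entry that matters for foreign keys is the pair KEY SHARE and NO KEY UPDATE: a foreign key check on a row and an UPDATE of that row's non-key columns no longer wait for each other.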

Dec 04, 2012

Track the timeline associated with minRecoveryPoint, for more sanity checks. · 5ce108bf

Heikki Linnakangas authored 12 years ago

This allows recovery to notice certain incorrect recovery scenarios.
If a server has recovered to point X on timeline 5, and you restart
recovery, it better be on timeline 5 when it reaches point X again, not on
some timeline with a higher ID. This can happen e.g. if a standby server
is shut down, a new timeline appears in the WAL archive, and the standby
server is restarted. It will try to follow the new timeline, which is wrong
because some WAL on the old timeline was already replayed before shutdown.

Requires an initdb (or at least pg_resetxlog), because this adds a field to
the control file.

5ce108bf

Jul 14, 2012
- Print the name of the WAL file containing latest REDO ptr in pg_controldata. · 6c349a56
  Heikki Linnakangas authored 12 years ago
  
  This makes it easier to determine how far back you need to keep archived WAL files, to restore from a backup. Fujii Masao
  6c349a56
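
A sketch of how a WAL file name can be derived from a REDO position, assuming the default 16 MB segment size and a 64-bit position. The function and macro names are hypothetical; only the three-field, 8-hex-digit name layout is taken from how WAL segment files are named.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define WAL_SEG_SIZE_SKETCH     (16u * 1024u * 1024u)   /* assumed default: 16 MB */
#define SEGS_PER_XLOGID_SKETCH  (UINT64_C(0x100000000) / WAL_SEG_SIZE_SKETCH)

/*
 * Format the name of the WAL segment containing 'lsn' on timeline 'tli':
 * three 8-digit hex fields (timeline, high part and low part of the
 * segment number).  'buf' must have room for at least 25 bytes.
 */
static void
wal_file_name_sketch(char *buf, size_t buflen, uint32_t tli, uint64_t lsn)
{
    uint64_t    segno = lsn / WAL_SEG_SIZE_SKETCH;

    assert(buf != NULL && buflen >= 25);

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGS_PER_XLOGID_SKETCH),
             (unsigned int) (segno % SEGS_PER_XLOGID_SKETCH));
}
```

For timeline 1 and REDO position 1/02000028 this yields 000000010000000100000002, so archived segments older than that file are not needed to restore from the backup.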
Jun 24, 2012

Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4

Heikki Linnakangas authored 12 years ago

This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this change breaks compatibility
of pg_basebackup and server. I didn't do anything about this in this patch;
per discussion on -hackers, the right thing to do would be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.

0ab9d1c4
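
A sketch of the old and new representations. The field names xlogid and xrecoff are those of the old struct; the type and function names are hypothetical, and the straight high-half/low-half conversion is an assumption about how the two formats map onto each other.

```c
#include <assert.h>
#include <stdint.h>

/* Old representation: two 32-bit halves. */
typedef struct
{
    uint32_t    xlogid;         /* high 32 bits */
    uint32_t    xrecoff;        /* low 32 bits */
} OldRecPtrSketch;

/* New representation: one 64-bit integer. */
typedef uint64_t RecPtrSketch;

static RecPtrSketch
recptr_from_old(OldRecPtrSketch old)
{
    return ((uint64_t) old.xlogid << 32) | old.xrecoff;
}

static OldRecPtrSketch
recptr_to_old(RecPtrSketch ptr)
{
    OldRecPtrSketch old = {(uint32_t) (ptr >> 32), (uint32_t) ptr};

    return old;
}

/* Distance in bytes between two positions: plain subtraction now suffices. */
static uint64_t
recptr_distance(RecPtrSketch from, RecPtrSketch to)
{
    assert(to >= from);
    return to - from;
}
```

With the struct form, the same distance calculation needs explicit carry handling across the two halves, which is the kind of arithmetic the change simplifies. The conversion helpers correspond to keeping the old layout on data pages while using the integer everywhere else.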

Jun 18, 2012

Make documentation of --help and --version options more consistent · bb7520cc

Peter Eisentraut authored 12 years ago

Before, some places didn't document the short options (-? and -V),
some documented both, some documented nothing, and they were listed in
various orders. Now this is hopefully more consistent and complete.

bb7520cc

Jan 25, 2012

Allow pg_basebackup from standby node with safety checking. · 8366c780

Simon Riggs authored 13 years ago

Base backup follows recommended procedure, plus goes to great
lengths to ensure that partial page writes are avoided.

Jun Ishizuka and Fujii Masao, with minor modifications

8366c780

Aug 17, 2011
- Teach pg_controldata and pg_resetxlog about the new backupEndRequired field · a1a847d3
  Heikki Linnakangas authored 13 years ago
  
  in control file.
  a1a847d3
Sep 20, 2010
- Remove cvs keywords from all files. · 9f2e2113
  Magnus Hagander authored 14 years ago
  
  9f2e2113
Jun 03, 2010
- On clean shutdown during recovery, don't warn about possible corruption. · d561430b
  Robert Haas authored 14 years ago
  
  Fujii Masao. Review by Heikki Linnakangas and myself.
  d561430b
Apr 28, 2010

- Minor editorializing on pg_controldata and pg_resetxlog: adjust some message · c80a85e3
  Tom Lane authored 14 years ago
  
  wording, deal explicitly with some fields that were being silently left zero.
  c80a85e3
- pg_controldata needs #define FRONTEND, same as pg_resetxlog. · 82e38aba
  Tom Lane authored 14 years ago
  
  Per buildfarm results from dawn_bat.
  82e38aba

Introduce wal_level GUC to explicitly control if information needed for · 9b8a7332

Heikki Linnakangas authored 14 years ago

archival or hot standby should be WAL-logged, instead of deducing that from
other options like archive_mode. This replaces recovery_connections GUC in
the primary, where it now has no effect, but it's still used in the standby
to enable/disable hot standby.

Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_level setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment; they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.

Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.

9b8a7332

Jan 04, 2010

Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at · 06f82b29

Heikki Linnakangas authored 15 years ago

recovery instead of reading the backup history file. This is more robust,
as it stops you from prematurely starting up an inconsistent cluster if the
backup history file is lost for some reason, or if the base backup was
never finished with pg_stop_backup().

This also paves the way for a simpler streaming replication patch, which
doesn't need to care about backup history files anymore.

The backup history file is still created and archived as before, but it's
not used by the system anymore. It's just for informational purposes now.

Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
written to a new field in pg_control, and catversion because initdb is
required.

Original patch by Fujii Masao per Simon's idea, with further fixes by me.

06f82b29

Dec 19, 2009

Allow read only connections during recovery, known as Hot Standby. · efc16ea5

Simon Riggs authored 15 years ago

Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.

efc16ea5

Aug 31, 2009

Track the current XID wrap limit (or more accurately, the oldest unfrozen · 25ec228e

Tom Lane authored 15 years ago

XID) in checkpoint records. This eliminates the need to recompute the value
from scratch during database startup, which is one of the two remaining
reasons for the flatfile code to exist. It should also simplify life for
hot-standby operation.

To avoid bloating the checkpoint records unreasonably, I switched from
tracking the oldest database by name to tracking it by OID. This turns
out to save cycles in general (everywhere but the warning-generating
paths, which we hardly care about) and also helps us deal with the case
that the oldest database got dropped instead of being vacuumed. The prior
coding might go for a long time without updating the wrap limit in that case,
which is bad because it might result in a lot of useless autovacuum activity.

25ec228e

Jun 11, 2009
- 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list · d7471402
  Bruce Momjian authored 15 years ago
  
  provided by Andrew.
  d7471402
Dec 11, 2008

Append major version number and for libraries soname major version number · 218b4e8d

Peter Eisentraut authored 16 years ago

to the gettext domain name, to simplify parallel installations.

Also, rename set_text_domain() to pg_bindtextdomain(), because that is what
it does.

218b4e8d

Sep 24, 2008
- Make sure pg_control is opened in binary mode, to deal · 607b7166
  Magnus Hagander authored 16 years ago
  
  with situations when the file contains an EOF marker (0x1A) on Windows. ITAGAKI Takahiro
  607b7166
Sep 23, 2008

Make LC_COLLATE and LC_CTYPE database-level settings. Collation and · 61d96749

Heikki Linnakangas authored 16 years ago

ctype are now more like encoding, stored in new datcollate and datctype
columns in pg_database.

This is a stripped-down version of Radek Strnad's patch, with further
changes by me.

61d96749

Apr 21, 2008

Allow float8, int8, and related datatypes to be passed by value on machines · 8472bf7a

Tom Lane authored 16 years ago

where Datum is 8 bytes wide.  Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior.  Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.

Zoltan Boszormenyi, plus configurability stuff by me.

8472bf7a

Mar 27, 2008

Reduce the need for frontend programs to include "postgres.h" by refactoring · 039dfbfd

Tom Lane authored 17 years ago

inclusions in src/include/catalog/*.h files.  The main idea here is to push
function declarations for src/backend/catalog/*.c files into separate headers,
rather than sticking them into the corresponding catalog definition file as
has been done in the past.  This commit only carries out that idea fully for
pg_proc, pg_type and pg_conversion, but that's enough for the moment ---
if pg_list.h ever becomes unsafe for frontend code to include, we'll need
to work a bit more.

Zdenek Kotala

039dfbfd

Feb 17, 2008

Replace time_t with pg_time_t (same values, but always int64) in on-disk · cd004067

Tom Lane authored 17 years ago

data structures and backend internal APIs. This solves problems we've seen
recently with inconsistent layout of pg_control between machines that have
32-bit time_t and those that have already migrated to 64-bit time_t. Also,
we can get out from under the problem that Windows' Unix-API emulation is not
consistent about the width of time_t.

There are a few remaining places where local time_t variables are used to hold
the current or recent result of time(NULL). I didn't bother changing these
since they do not affect any cross-module APIs and surely all platforms will
have 64-bit time_t before overflow becomes an actual risk. time_t should
be avoided for anything visible to extension modules, however.

cd004067

Jan 21, 2008
- Provide a clearer error message if the pg_control version number looks · 6f8f8d2d
  Peter Eisentraut authored 17 years ago
  
  wrong because of mismatched byte ordering.
  6f8f8d2d
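
A sketch of how a byte-order mismatch can be told apart from an ordinary version mismatch. The names are hypothetical, and the heuristic assumes that real version numbers are small enough to fit in 16 bits, so a byte-swapped one has zero low bits and nonzero high bits.

```c
#include <assert.h>
#include <stdint.h>

typedef enum
{
    VERSION_OK,
    VERSION_BYTE_ORDER_MISMATCH,
    VERSION_MISMATCH
} VersionCheckSketch;

/* Reverse the byte order of a 32-bit value (used here only to build examples). */
static uint32_t
bswap32_sketch(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
        ((v << 8) & 0x00FF0000u) | (v << 24);
}

static VersionCheckSketch
check_control_version(uint32_t stored, uint32_t expected)
{
    /* The heuristic below only works for versions that fit in 16 bits. */
    assert(expected != 0 && expected < 65536);

    if (stored == expected)
        return VERSION_OK;
    if (stored % 65536 == 0 && stored / 65536 != 0)
        return VERSION_BYTE_ORDER_MISMATCH;
    return VERSION_MISMATCH;
}
```

A caller would print the clearer "possible byte ordering mismatch" message for the middle case and the generic version error otherwise.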
Apr 03, 2007

Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. · b3005276

Tom Lane authored 17 years ago

Add the latter to the values checked in pg_control, since it can't be changed
without invalidating toast table content. This commit in itself shouldn't
change any behavior, but it lays some necessary groundwork for experimentation
with these toast-control numbers.

Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
thought still needs to be given to needs_toast_table() in toasting.c before
unleashing random changes.

b3005276

Mar 18, 2007
- Code cleanup: mark some variables with the "const" modifier, when they · 7221b4fa
  Neil Conway authored 18 years ago
  
  are initialized with a string literal. Patch from Stefan Huehner.
  7221b4fa