  1. Feb 12, 2013
    • Alvaro Herrera's avatar
      Create libpgcommon, and move pg_malloc et al to it · 8396447c
      Alvaro Herrera authored
      libpgcommon is a new static library to allow sharing code among the
      various frontend programs and backend; this lets us eliminate duplicate
      implementations of common routines.  We avoid libpgport, because that's
      intended as a place for porting issues; per discussion, it seems better
      to keep them separate.
      
      The first use case, and the only one implemented by this patch, is
      pg_malloc and friends, which many frontend programs were already using.
      
      At the same time, we can use this to provide palloc emulation functions
      for the frontend; this way, some palloc-using files in the backend can
      also be used by the frontend cleanly.  To do this, we change palloc() in
      the backend to be a function instead of a macro on top of
      MemoryContextAlloc().  This was previously believed to cause loss of
      performance, but this implementation has been tweaked by Tom and Andres
      so that on modern compilers it provides a slight improvement over the
      previous one.
      
      This lets us clean up some places that were already doing this with
      localized hacks.
      
      Most of the pg_malloc/palloc changes in this patch were authored by
      Andres Freund.  Zoltán Böszörményi also independently provided a similar
      patch.  The libpgcommon infrastructure was authored by Álvaro.
      8396447c
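
      A minimal sketch of the idea (not the committed code): an exit-on-failure
      pg_malloc for frontend use, with palloc emulated on top of it so that
      palloc-using backend files can also be compiled into frontend programs.

          /* Illustrative sketch only; the names match the commit's
           * description, but the bodies are simplified. */
          #include <stdio.h>
          #include <stdlib.h>

          void *
          pg_malloc(size_t size)
          {
              void       *ptr = malloc(size);

              if (ptr == NULL)
              {
                  fprintf(stderr, "out of memory\n");
                  exit(EXIT_FAILURE);
              }
              return ptr;
          }

          /* Frontend emulation: palloc simply forwards to pg_malloc. */
          void *
          palloc(size_t size)
          {
              return pg_malloc(size);
          }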
  2. Feb 08, 2013
    • Tom Lane's avatar
      Make contrib/btree_gist's GiST penalty function a bit saner. · 9221f9d4
      Tom Lane authored
      The previous coding supposed that the first differing bytes in two varlena
      datums must have the same sign difference as their overall comparison
      result.  This is obviously bogus for text strings in non-C locales, and
      probably wrong for numeric, and even for bytea I think it was wrong on
      machines where char is signed.  When the assumption failed, the function
      could deliver a zero or negative penalty in situations where such a result
      is quite ridiculous, leading the core GiST code to make very bad page-split
      decisions.
      
      To fix, take the absolute values of the byte-level differences.  Also,
      switch the code to using unsigned char not just char, so that the behavior
      will be consistent whether char is signed or not.
      
      Per investigation of a trouble report from Tomas Vondra.  Back-patch to all
      supported branches.
      9221f9d4
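
      The fix can be illustrated with a small sketch (not the actual
      btree_gist code): compare bytes as unsigned char and take the absolute
      value of the first difference, so the penalty can never come out
      negative regardless of whether char is signed.

          #include <stdlib.h>

          /* Non-negative "distance" based on the first differing byte. */
          static int
          first_byte_penalty(const unsigned char *a, const unsigned char *b,
                             size_t len)
          {
              for (size_t i = 0; i < len; i++)
              {
                  if (a[i] != b[i])
                      return abs((int) a[i] - (int) b[i]);    /* always >= 0 */
              }
              return 0;               /* equal prefixes give zero penalty */
          }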
    • Tom Lane's avatar
      Fix erroneous range-union logic for varlena types in contrib/btree_gist. · 94f565dc
      Tom Lane authored
      gbt_var_bin_union() failed to do the right thing when the existing range
      needed to be widened at both ends rather than just one end.  This could
      result in an invalid index in which keys that are present would not be
      found by searches, because the searches would not think they need to
      descend to the relevant leaf pages.  This error affected all the varlena
      datatypes supported by btree_gist (text, bytea, bit, numeric).
      
      Per investigation of a trouble report from Tomas Vondra.  (There is also
      an issue in gbt_var_penalty(), but that should only result in inefficiency
      not wrong answers.  I'm committing this separately so that we have a git
      state in which it can be tested that bad penalty results don't produce
      invalid indexes.)  Back-patch to all supported branches.
      94f565dc
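
      For illustration only (not the contrib code), a correct union must widen
      the lower and upper bounds independently, since a new key can fall
      outside the existing range at both ends:

          typedef struct
          {
              int         lower;
              int         upper;
          } Range;

          static void
          range_union(Range *r, int key_lower, int key_upper)
          {
              if (key_lower < r->lower)
                  r->lower = key_lower;   /* widen at the low end */
              if (key_upper > r->upper)
                  r->upper = key_upper;   /* ...and, independently, at the high end */
          }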
  3. Jan 31, 2013
    • Alvaro Herrera's avatar
      pgrowlocks: fix bogus lock strength output · 77a3082f
      Alvaro Herrera authored
      Per report from digoal@126.com
      77a3082f
    • Tatsuo Ishii's avatar
      Add --aggregate-interval option. · 6a651d85
      Tatsuo Ishii authored
      The new option specifies the length of the aggregation interval (in
      seconds) and may be used only together with -l.  With this option, the
      log contains a per-interval summary (number of transactions, min/max
      latency, and two additional fields useful for variance estimation).
      
      Patch contributed by Tomas Vondra, reviewed by Pavel Stehule.  Slight
      change by Tatsuo Ishii, as suggested by Robert Haas, to emit an error
      message indicating that the option is not currently supported on
      Windows.
      6a651d85
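
      A rough sketch (not pgbench's implementation) of the per-interval
      aggregation described above; the struct and function names are
      hypothetical, and the "two additional fields" are taken here to be the
      sum and sum of squares of latencies, which suffice to estimate variance.

          typedef struct
          {
              long        cnt;        /* number of transactions in the interval */
              double      min_lat;
              double      max_lat;
              double      sum_lat;    /* for the mean */
              double      sum_lat2;   /* sum of squares, for the variance */
          } AggInterval;

          static void
          agg_add(AggInterval *agg, double latency)
          {
              if (agg->cnt == 0 || latency < agg->min_lat)
                  agg->min_lat = latency;
              if (agg->cnt == 0 || latency > agg->max_lat)
                  agg->max_lat = latency;
              agg->sum_lat += latency;
              agg->sum_lat2 += latency * latency;
              agg->cnt++;
          }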
  4. Jan 29, 2013
    • Heikki Linnakangas's avatar
      Allow pgbench to use a scale larger than 21474. · 89d00cbe
      Heikki Linnakangas authored
      Beyond 21474, the number of accounts exceeds the range of int4. Change the
      initialization code to use bigint for account id columns when scale is large
      enough, and switch to using int64s for the variables in pgbench code. The
      threshold where we switch to bigints is set at 20000, because that's easier
      to remember and document than 21474, and ensures that there is some headroom
      when int4s are used.
      
      Greg Smith, with various changes by Euler Taveira de Oliveira, Gurjeet
      Singh and Satoshi Nagayasu.
      89d00cbe
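
      The threshold is easy to verify with a few lines of C (illustrative
      only; the scale value is hypothetical): pgbench creates scale * 100000
      accounts, and INT_MAX / 100000 is 21474.

          #include <limits.h>
          #include <stdint.h>
          #include <stdio.h>

          int
          main(void)
          {
              int         scale = 25000;      /* hypothetical large scale */
              int64_t     naccounts = (int64_t) scale * 100000;

              printf("int4 overflows above a scale of %d\n", INT_MAX / 100000);
              printf("accounts at scale %d: %lld\n", scale, (long long) naccounts);
              printf("use bigint columns: %s\n", scale >= 20000 ? "yes" : "no");
              return 0;
          }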
  5. Jan 23, 2013
    • Alvaro Herrera's avatar
      Improve concurrency of foreign key locking · 0ac5ad51
      Alvaro Herrera authored
      This patch introduces two additional lock modes for tuples: "SELECT FOR
      KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
      other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
      FOR UPDATE".  UPDATE commands that do not modify the values stored in
      the columns that are part of the key of the tuple now grab a SELECT FOR
      NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
      with tuple locks of the FOR KEY SHARE variety.
      
      Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
      means the concurrency improvement applies to them, which is the whole
      point of this patch.
      
      The added tuple lock semantics require some rejiggering of the multixact
      module, so that the locking level that each transaction is holding can
      be stored alongside its Xid.  Also, multixacts now need to persist
      across server restarts and crashes, because they can now represent not
      only tuple locks, but also tuple updates.  This means we need more
      careful tracking of lifetime of pg_multixact SLRU files; since they now
      persist longer, we require more infrastructure to figure out when they
      can be removed.  pg_upgrade also needs to be careful to copy
      pg_multixact files over from the old server to the new, or at least part
      of multixact.c state, depending on the versions of the old and new
      servers.
      
      Tuple time qualification rules (HeapTupleSatisfies routines) need to be
      careful not to consider tuples with the "is multi" infomask bit set as
      being only locked; they might need to look up MultiXact values (i.e.
      possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
      whereas they previously were assured to only use information readily
      available from the tuple header.  This is considered acceptable, because
      the extra I/O would involve cases that would previously cause some
      commands to block waiting for concurrent transactions to finish.
      
      Another important change is the fact that locking tuples that have
      previously been updated causes the future versions to be marked as
      locked, too; this is essential for correctness of foreign key checks.
      This also causes additional WAL-logging (there was previously a single
      WAL record for a locked tuple; now there are as many as there are
      updated copies of the tuple).
      
      With all this in place, contention related to tuples being checked by
      foreign key rules should be much reduced.
      
      As a bonus, an old misbehavior has been fixed: if a subtransaction
      grabbed a stronger tuple lock than the parent (sub)transaction held on a
      given tuple and later aborted, the weaker lock was lost.
      
      Many new spec files were added for the isolation tester framework, to
      ensure overall behavior is sane.  There's probably room for several more
      tests.
      
      There were several reviewers of this patch; in particular, Noah Misch
      and Andres Freund spent considerable time on it.  The original idea for the
      patch came from Simon Riggs, after a problem report by Joel Jacobson.
      Most code is from me, with contributions from Marti Raudsepp, Alexander
      Shulgin, Noah Misch and Andres Freund.
      
      This patch was discussed in several pgsql-hackers threads; the most
      important start at the following message-ids:
      	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
      	1290721684-sup-3951@alvh.no-ip.org
      	1294953201-sup-2099@alvh.no-ip.org
      	1320343602-sup-2290@alvh.no-ip.org
      	1339690386-sup-8927@alvh.no-ip.org
      	4FE5FF020200002500048A3D@gw.wicourts.gov
      	4FEAB90A0200002500048B7D@gw.wicourts.gov
      0ac5ad51
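
      A small libpq sketch of the new lock modes in action (illustrative only;
      the connection string and the "parent" table are hypothetical).  The FOR
      KEY SHARE lock taken here, as foreign key triggers now do, no longer
      conflicts with a concurrent UPDATE that leaves the key columns alone,
      because such an UPDATE now takes only FOR NO KEY UPDATE.

          #include <stdio.h>
          #include <libpq-fe.h>

          int
          main(void)
          {
              PGconn     *conn = PQconnectdb("dbname=postgres");
              PGresult   *res;

              if (PQstatus(conn) != CONNECTION_OK)
              {
                  fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
                  PQfinish(conn);
                  return 1;
              }

              PQclear(PQexec(conn, "BEGIN"));

              /* Lock the referenced row the way FK triggers now do. */
              res = PQexec(conn, "SELECT id FROM parent WHERE id = 1 FOR KEY SHARE");
              if (PQresultStatus(res) != PGRES_TUPLES_OK)
                  fprintf(stderr, "lock failed: %s", PQerrorMessage(conn));
              PQclear(res);

              /*
               * A concurrent session running
               *     UPDATE parent SET note = 'x' WHERE id = 1;
               * is not blocked, since it does not change the key column.
               */
              PQclear(PQexec(conn, "COMMIT"));
              PQfinish(conn);
              return 0;
          }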
    • Bruce Momjian's avatar
      pg_upgrade: remove --single-transaction usage · 861ad67b
      Bruce Momjian authored
      With AtEOXact applied, --single-transaction makes pg_restore slower, and
      has the potential to require lock table configuration, so remove the
      argument.
      
      Per suggestion from Tom.
      861ad67b
  6. Jan 14, 2013
    • Tom Lane's avatar
      Improve handling of ereport(ERROR) and elog(ERROR). · b853eb97
      Tom Lane authored
      In commit 71450d7f, we added code to inform
      suitably-intelligent compilers that ereport() doesn't return if the elevel
      is ERROR or higher.  This patch extends that to elog(), and also fixes a
      double-evaluation hazard that the previous commit created in ereport(),
      as well as reducing the emitted code size.
      
      The elog() improvement requires the compiler to support __VA_ARGS__, which
      should be available in just about anything nowadays since it's required by
      C99.  But our minimum language baseline is still C89, so add a configure
      test for that.
      
      The previous commit assumed that ereport's elevel could be evaluated twice,
      which isn't terribly safe --- there are already counterexamples in xlog.c.
      On compilers that have __builtin_constant_p, we can use that to protect the
      second test, since there's no possible optimization gain if the compiler
      doesn't know the value of elevel.  Otherwise, use a local variable inside
      the macros to prevent double evaluation.  The local-variable solution is
      inferior because (a) it leads to useless code being emitted when elevel
      isn't constant, and (b) it increases the optimization level needed for the
      compiler to recognize that subsequent code is unreachable.  But it seems
      better than not teaching non-gcc compilers about unreachability at all.
      
      Lastly, if the compiler has __builtin_unreachable(), we can use that
      instead of abort(), resulting in a noticeable code savings since no
      function call is actually emitted.  However, it seems wise to do this only
      in non-assert builds.  In an assert build, continue to use abort(), so that
      the behavior will be predictable and debuggable if the "impossible"
      happens.
      
      These changes involve making the ereport and elog macros emit do-while
      statement blocks not just expressions, which forces small changes in
      a few call sites.
      
      Andres Freund, Tom Lane, Heikki Linnakangas
      b853eb97
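
      A simplified sketch of the technique (not the real ereport/elog macros),
      assuming a GCC-like compiler that provides __builtin_constant_p and
      __builtin_unreachable; the helper function and level values are
      hypothetical.

          #include <stdarg.h>
          #include <stdio.h>
          #include <stdlib.h>

          #define WARNING     19
          #define ERROR       20

          static void
          my_elog(int level, const char *fmt,...)
          {
              va_list     args;

              va_start(args, fmt);
              vfprintf(stderr, fmt, args);
              va_end(args);
              fputc('\n', stderr);
              if (level >= ERROR)
                  exit(1);            /* does not return for ERROR or higher */
          }

          /*
           * A do-while block, not an expression.  __builtin_constant_p guards
           * the second look at elevel, so elevel is never evaluated twice when
           * it isn't a compile-time constant; __builtin_unreachable() marks the
           * following code as dead without emitting any function call.
           */
          #define elog(elevel, ...) \
              do { \
                  my_elog(elevel, __VA_ARGS__); \
                  if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
                      __builtin_unreachable(); \
              } while (0)

          int
          main(void)
          {
              elog(WARNING, "value is %d", 42);
              return 0;
          }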
  7. Jan 12, 2013
    • Andrew Dunstan's avatar
      Extend and improve use of EXTRA_REGRESS_OPTS. · 4ae5ee6c
      Andrew Dunstan authored
      This is now used by ecpg tests, and not clobbered by pg_upgrade
      tests. This change won't affect anything that doesn't set this
      environment variable, but will enable the buildfarm to control
      exactly what port regression test installs will be running on,
      and thus to detect possible rogue postmasters more easily.
      
      Backpatch to release 9.2 where EXTRA_REGRESS_OPTS was first used.
      4ae5ee6c
  8. Jan 07, 2013
    • Tatsuo Ishii's avatar
      Add new "-q" logging option (quiet mode) while in initialize mode · cf03ff6c
      Tatsuo Ishii authored
      (-i), producing only one progress message every 5 seconds along with
      elapsed time and estimated remaining time.  Also add elapsed time and
      estimated remaining time to the default logging (which prints one
      message every 100000 rows).
      Patch contributed by Tomas Vondra, reviewed by Jeevan Chalke and
      Tatsuo Ishii.
      cf03ff6c
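
      An illustrative sketch (not pgbench's code) of the kind of progress
      message described: at most one report every 5 seconds, with elapsed time
      and a remaining-time estimate extrapolated from the current rate.

          #include <stdio.h>
          #include <time.h>

          static void
          maybe_report(long done, long total, time_t start, time_t *last_report)
          {
              time_t      now = time(NULL);

              if (done > 0 && now - *last_report >= 5)
              {
                  double      elapsed = difftime(now, start);
                  double      remaining = elapsed * (total - done) / done;

                  printf("%ld of %ld rows done (elapsed %.1f s, remaining %.1f s)\n",
                         done, total, elapsed, remaining);
                  *last_report = now;
              }
          }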
  9. Jan 04, 2013
    • Tom Lane's avatar
      Prevent creation of postmaster's TCP socket during pg_upgrade testing. · 78a5e738
      Tom Lane authored
      On non-Windows machines, we use the Unix socket for connections to test
      postmasters, so there is no need to create a TCP socket.  Furthermore,
      doing so causes failures due to port conflicts if two builds are carried
      out concurrently on one machine.  (If the builds are done in different
      chroots, which is standard practice at least in Red Hat distros, there
      is no risk of conflict on the Unix socket.)  Suppressing the TCP socket
      by setting listen_addresses to empty has long been standard practice
      for pg_regress, and pg_upgrade knows about this too ... but pg_upgrade's
      test.sh didn't get the memo.
      
      Back-patch to 9.2, and also sync the 9.2 version of the script with HEAD
      as much as practical.
      78a5e738
  10. Dec 11, 2012
    • Bruce Momjian's avatar
      Fix pg_upgrade for invalid indexes · e95c4bd1
      Bruce Momjian authored
      All versions of pg_upgrade upgraded invalid indexes caused by CREATE
      INDEX CONCURRENTLY failures and marked them as valid.  The patch adds a
      check to all pg_upgrade versions and throws an error during upgrade or
      --check.
      
      Backpatch to 9.2, 9.1, 9.0.  Patch slightly adjusted.
      e95c4bd1
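
      The check can be approximated with a catalog query; here is a hedged
      libpq sketch (not pg_upgrade's code, and the connection string is
      hypothetical) that lists indexes left invalid by failed CREATE INDEX
      CONCURRENTLY runs.

          #include <stdio.h>
          #include <libpq-fe.h>

          int
          main(void)
          {
              PGconn     *conn = PQconnectdb("dbname=postgres");
              PGresult   *res;

              if (PQstatus(conn) != CONNECTION_OK)
              {
                  fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
                  PQfinish(conn);
                  return 1;
              }

              res = PQexec(conn,
                           "SELECT n.nspname, c.relname "
                           "FROM pg_index i "
                           "JOIN pg_class c ON c.oid = i.indexrelid "
                           "JOIN pg_namespace n ON n.oid = c.relnamespace "
                           "WHERE NOT i.indisvalid");

              if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
              {
                  fprintf(stderr, "invalid indexes found; fix them before upgrading:\n");
                  for (int i = 0; i < PQntuples(res); i++)
                      fprintf(stderr, "  %s.%s\n",
                              PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
              }
              PQclear(res);
              PQfinish(conn);
              return 0;
          }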
    • Andrew Dunstan's avatar
      Add mode where contrib installcheck runs each module in a separately named database. · ad69bd05
      Andrew Dunstan authored
      Normally each module is tested in a database named contrib_regression,
      which is dropped and recreated at the beginning of each pg_regress run.
      This new mode, enabled by adding USE_MODULE_DB=1 to the make command
      line, runs most modules in a database with the module name embedded in
      it.
      
      This will make testing pg_upgrade on clusters with the contrib modules
      a lot easier.
      
      Second attempt at this, this time accommodating make versions older
      than 3.82.
      
      Still to be done: adapt to the MSVC build system.
      
      Backpatch to 9.0, which is the earliest version it is reasonably
      possible to test upgrading from.
      ad69bd05
    • Bruce Momjian's avatar
      Fix pg_upgrade -O/-o options · acdb8c22
      Bruce Momjian authored
      Fix the previous commit, which added synchronous_commit=off but broke
      -O/-o due to a missing space in argument passing.
      
      Backpatch to 9.2.
      acdb8c22
  11. Dec 07, 2012
    • Bruce Momjian's avatar
      Improve pg_upgrade's status display · 6dd95845
      Bruce Momjian authored
      Pg_upgrade displays file names during copy and database names during
      dump/restore.  Andrew Dunstan identified three bugs:
      
      *  long file names were being truncated to 60 _leading_ characters, which
         often do not change for long file names
      
      *  file names were truncated to 60 characters in log files
      
      *  carriage returns were being output to log files
      
      This commit fixes these --- it prints 60 _trailing_ characters to the
      status display, and full path names without carriage returns to log
      files.  It also suppresses status output to the log file unless verbose
      mode is used.
      6dd95845
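
      A small sketch of the "trailing" truncation (not the committed code; the
      width constant and function name are hypothetical): print the last 60
      characters of a long path, since that is the part that actually varies.

          #include <stdio.h>
          #include <string.h>

          #define MESSAGE_WIDTH 60

          static void
          print_status(const char *path)
          {
              size_t      len = strlen(path);

              if (len > MESSAGE_WIDTH)
                  printf("...%s\r", path + len - MESSAGE_WIDTH);  /* trailing part */
              else
                  printf("%-*s\r", MESSAGE_WIDTH, path);
              fflush(stdout);
          }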
  12. Dec 06, 2012
    • Alvaro Herrera's avatar
      Background worker processes · da07a1e8
      Alvaro Herrera authored
      Background workers are postmaster subprocesses that run arbitrary
      user-specified code.  They can request shared memory access as well as
      backend database connections; or they can just use plain libpq frontend
      database connections.
      
      Modules listed in shared_preload_libraries can register background
      workers in their _PG_init() function; this is early enough that it's not
      necessary to provide an extra GUC option, because the necessary extra
      resources can be allocated early on.  Modules can install more than one
      bgworker, if necessary.
      
      Care is taken that these extra processes do not interfere with other
      postmaster tasks: only one such process is started on each ServerLoop
      iteration.  This means that even if a large number of them are waiting
      to be started, the postmaster is still able to quickly service external
      connection requests.  Also, the shutdown sequence should not be impacted
      by a worker process that is reasonably well behaved (i.e. promptly
      responds to termination signals).
      
      The current implementation lets worker processes specify their start
      time, i.e. at what point in the server startup process they are to be
      started: right after postmaster start (in which case they mustn't ask
      for shared memory access), when consistent state has been reached
      (useful during recovery in a HOT standby server), or when recovery has
      terminated (i.e. when normal backends are allowed).
      
      In case of a bgworker crash, actions to take depend on registration
      data: if shared memory was requested, then all other connections are
      taken down (as well as other bgworkers), just as if it were a regular
      backend crashing.  The bgworker itself is restarted, too, within a
      configurable timeframe (which can be configured to be never).
      
      More features to add to this framework can be imagined without much
      effort, and have been discussed, but this seems good enough as a useful
      unit already.
      
      An elementary sample module is supplied.
      
      Author: Álvaro Herrera
      
      This patch is loosely based on prior patches submitted by KaiGai Kohei,
      and unsubmitted code by Simon Riggs.
      
      Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
      Heikki Linnakangas, Simon Riggs, Amit Kapila
      da07a1e8
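
      A hedged sketch of registering a worker from _PG_init(), in the spirit
      of the sample module mentioned above.  Field and constant names follow
      the 9.3/9.4-era API from memory and changed in later releases
      (PostgreSQL 10 replaced bgw_main with bgw_library_name and
      bgw_function_name), so check bgworker.h for the server version in use.

          #include "postgres.h"
          #include "fmgr.h"
          #include "postmaster/bgworker.h"
          #include "storage/ipc.h"

          PG_MODULE_MAGIC;

          void        _PG_init(void);

          static void
          my_worker_main(Datum main_arg)
          {
              /* arbitrary user-specified code runs here */
              proc_exit(0);
          }

          void
          _PG_init(void)
          {
              BackgroundWorker worker;

              MemSet(&worker, 0, sizeof(worker));
              worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                  BGWORKER_BACKEND_DATABASE_CONNECTION;
              worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
              worker.bgw_restart_time = 60;   /* seconds; BGW_NEVER_RESTART to disable */
              worker.bgw_main = my_worker_main;
              snprintf(worker.bgw_name, BGW_MAXLEN, "my sample worker");

              RegisterBackgroundWorker(&worker);
          }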
  13. Dec 02, 2012
    • Andrew Dunstan's avatar
      Add mode where contrib installcheck runs each module in a separately named database. · e2b3c21b
      Andrew Dunstan authored
      Normally each module is tested in a database named contrib_regression,
      which is dropped and recreated at the beginning of each pg_regress run.
      This mode, enabled by adding USE_MODULE_DB=1 to the make command line,
      runs most modules in a database with the module name embedded in it.
      
      This will make testing pg_upgrade on clusters with the contrib modules
      a lot easier.
      
      Still to be done: adapt to the MSVC build system.
      
      Backpatch to 9.0, which is the earliest version it is reasonably possible
      to test upgrading from.
      e2b3c21b