  1. Oct 24, 2014
  2. Sep 25, 2014
  3. Jun 05, 2014
    • Add defenses against running with a wrong selection of LOBLKSIZE. · 5f93c378
      Tom Lane authored
      It's critical that the backend's idea of LOBLKSIZE match the way data has
      actually been divided up in pg_largeobject.  While we don't provide any
      direct way to adjust that value, doing so is a one-line source code change
      and various people have expressed interest recently in changing it.  So,
      just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in
      pg_control and cross-check that the backend's compiled-in setting matches
      the on-disk data.
      
      Also tweak the code in inv_api.c so that fetches from pg_largeobject
      explicitly verify that the length of the data field is not more than
      LOBLKSIZE.  Formerly we just had Asserts() for that, which is no protection
      at all in production builds.  In some of the call sites an overlength data
      value would translate directly to a security-relevant stack clobber, so it
      seems worth one extra runtime comparison to be sure.
      
      In the back branches, we can't change the contents of pg_control; but we
      can still make the extra checks in inv_api.c, which will offer some amount
      of protection against running with the wrong value of LOBLKSIZE.
      5f93c378
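The two defenses described in this commit can be sketched roughly as follows. This is an illustration only, not the code from inv_api.c or xlog.c: `ControlFileSketch`, `loblksize_matches` and `copy_lo_chunk` are hypothetical names, and the sketch assumes a stock build where BLCKSZ is 8192 and LOBLKSIZE is BLCKSZ / 4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed stock values: BLCKSZ 8192, LOBLKSIZE = BLCKSZ / 4. */
#define BLCKSZ 8192
#define LOBLKSIZE (BLCKSZ / 4)

/* Hypothetical stand-in for the value recorded in pg_control. */
typedef struct ControlFileSketch
{
    unsigned int loblksize;
} ControlFileSketch;

/* Startup cross-check: the compiled-in value must match the on-disk one. */
static bool
loblksize_matches(const ControlFileSketch *control)
{
    if (control->loblksize != LOBLKSIZE)
    {
        fprintf(stderr,
                "cluster uses LOBLKSIZE %u, but server was compiled with %d\n",
                control->loblksize, LOBLKSIZE);
        return false;
    }
    return true;
}

/*
 * Runtime check that stays active in production builds, unlike an Assert:
 * refuse to copy a chunk that would overrun a LOBLKSIZE-sized buffer.
 */
static bool
copy_lo_chunk(char *dst, const char *src, size_t len)
{
    if (len > LOBLKSIZE)
        return false;           /* overlength data field: report, don't copy */
    memcpy(dst, src, len);
    return true;
}
```

A caller would treat a `false` result from either function as a hard error rather than continuing with mismatched or overlength data.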
  4. May 06, 2014
    • pgindent run for 9.4 · 0a783200
      Bruce Momjian authored
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not affect backpatching.
      0a783200
  5. Dec 20, 2013
  6. Dec 13, 2013
    • Add GUC to enable WAL-logging of hint bits, even with checksums disabled. · 50e54709
      Heikki Linnakangas authored
      WAL-logging of hint bit updates is useful to tools that want to examine
      which pages have been modified. In particular, this is required to make
      the pg_rewind tool safe (without checksums).
      
      This can also be used to test how much extra WAL-logging would occur if
      you enabled checksums, without actually enabling them (which you can't
      currently do without re-initdb'ing).
      
      Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with
      further changes by me.
      50e54709
  7. Dec 11, 2013
    • Add new wal_level, logical, sufficient for logical decoding. · e55704d8
      Robert Haas authored
      When wal_level=logical, we'll log columns from the old tuple as
      configured by the REPLICA IDENTITY facility added in commit
      07cacba9.  This makes it possible for
      a properly-configured logical replication solution to correctly
      follow table updates even if they change the chosen key columns,
      or, with REPLICA IDENTITY FULL, even if the table has no key at
      all.  Note that updates which do not modify the replica identity
      column won't log anything extra, making the choice of a good key
      (i.e. one that will rarely be changed) important to performance
      when wal_level=logical is configured.
      
      Each insert, update, or delete to a catalog table will also log
      the CMIN and/or CMAX values stamped by the current transaction.
      This is necessary because logical decoding will require access to
      historical snapshots of the catalog in order to decode some data
      types, and the CMIN/CMAX values that we may need in order to judge
      row visibility may have been overwritten by the time we need them.
      
      Andres Freund, reviewed in various versions by myself, Heikki
      Linnakangas, KONDO Mitsumasa, and many others.
      e55704d8
  8. Jul 04, 2013
    • Add new GUC, max_worker_processes, limiting number of bgworkers. · 6bc8ef0b
      Robert Haas authored
      In 9.3, there's no particular limit on the number of bgworkers;
      instead, we just count up the number that are actually registered,
      and use that to set MaxBackends.  However, that approach causes
      problems for Hot Standby, which needs both MaxBackends and the
      size of the lock table to be the same on the standby as on the
      master, yet it may not be desirable to run the same bgworkers in
      both places.  9.3 handles that by failing to notice the problem,
      which will probably work fine in nearly all cases anyway, but is
      not theoretically sound.
      
      A further problem with simply counting the number of registered
      workers is that new workers can't be registered without a
      postmaster restart.  This is inconvenient for administrators,
      since bouncing the postmaster causes an interruption of service.
      Moreover, there are a number of applications for background
      processes where, by necessity, the background process must be
      started on the fly (e.g. parallel query).  While this patch
      doesn't actually make it possible to register new background
      workers after startup time, it's a necessary prerequisite.
      
      Patch by me.  Review by Michael Paquier.
      6bc8ef0b
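To illustrate the idea of deriving MaxBackends from configured limits rather than from the number of workers that happen to be registered, here is a minimal sketch. The function name is hypothetical, and the exact formula (including the extra slot for the autovacuum launcher) is an assumption about how the server combines these settings, not something stated in the commit message.

```c
/*
 * Hypothetical sketch: MaxBackends computed purely from configuration.
 * Because the registered-worker count is not an input, a master and a
 * standby with the same settings get the same result even if they run
 * different background workers.
 */
static int
compute_max_backends(int max_connections,
                     int autovacuum_max_workers,
                     int max_worker_processes)
{
    /* The "+ 1" is assumed to account for the autovacuum launcher. */
    return max_connections + autovacuum_max_workers + 1 + max_worker_processes;
}
```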
  9. Apr 30, 2013
  10. Mar 22, 2013
    • Allow I/O reliability checks using 16-bit checksums · 96ef3b8f
      Simon Riggs authored
      Checksums are set immediately prior to flush out of shared buffers
      and checked when pages are read in again. Hint bit setting will
      require full page write when block is dirtied, which causes various
      infrastructure changes. Extensive comments, docs and README.
      
      WARNING message thrown if checksum fails on non-all zeroes page;
      ERROR thrown but can be disabled with ignore_checksum_failure = on.
      
      Feature enabled by an initdb option, since transition from option off
      to option on is long and complex and has not yet been implemented.
      Default is not to use checksums.
      
      Checksum used is WAL CRC-32 truncated to 16 bits.
      
      Simon Riggs, Jeff Davis, Greg Smith
      Wide input and assistance from many community members. Thank you.
      96ef3b8f
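As a rough illustration of "CRC-32 truncated to 16 bits", the sketch below computes a standard reflected CRC-32 (polynomial 0xEDB88320) bit by bit and keeps the low 16 bits. Both the choice of CRC variant and the choice of which 16 bits to keep are assumptions made for this example; the commit message does not specify them, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise, table-free CRC-32 (reflected form, polynomial 0xEDB88320). */
static uint32_t
crc32_sketch(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Truncate to 16 bits by keeping the low half (an assumption). */
static uint16_t
page_checksum16_sketch(const unsigned char *page, size_t len)
{
    return (uint16_t) (crc32_sketch(page, len) & 0xFFFFu);
}
```

A 16-bit value detects most random corruption but collides for roughly one in 65536 damaged pages, which is the trade-off of fitting the checksum into a small page-header field.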
  11. Mar 17, 2013
  12. Feb 11, 2013
    • Support unlogged GiST index. · 62401db4
      Heikki Linnakangas authored
      The reason this wasn't supported before was that GiST indexes need an
      increasing sequence to detect concurrent page-splits. In a regular WAL-
      logged GiST index, the LSN of the page-split record is used for that
      purpose, and in a temporary index, we can get away with a backend-local
      counter. Neither of those methods works for an unlogged relation.
      
      To provide such an increasing sequence of numbers, create a "fake LSN"
      counter that is saved and restored across shutdowns. On recovery, unlogged
      relations are blown away, so the counter doesn't need to survive that
      either.
      
      Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
      62401db4
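The "fake LSN" counter described above can be pictured with the following sketch. All names here are hypothetical; the only properties taken from the commit message are that the counter increases monotonically and that its value is saved at shutdown and restored at startup.

```c
#include <stdint.h>

typedef uint64_t FakeLSN;

/* Hypothetical in-memory state holding the next value to hand out. */
typedef struct FakeLsnState
{
    FakeLSN next;
} FakeLsnState;

/* Hand out a strictly increasing value, e.g. to stamp a page split. */
static FakeLSN
get_fake_lsn(FakeLsnState *state)
{
    return state->next++;
}

/* At clean shutdown, the current value is written somewhere durable. */
static FakeLSN
save_fake_lsn(const FakeLsnState *state)
{
    return state->next;
}

/* At startup, resume from the saved value so numbers never repeat. */
static void
restore_fake_lsn(FakeLsnState *state, FakeLSN saved)
{
    state->next = saved;
}
```

After a crash the saved value may be stale, but that is harmless under the commit's reasoning because unlogged relations are reset during recovery anyway.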
    • Include previous TLI in end-of-recovery and shutdown checkpoint records. · 7803e932
      Heikki Linnakangas authored
      This isn't used for anything but a sanity check at the moment, but it could
      be highly valuable for debugging purposes. It could also be used to recreate
      timeline history by traversing WAL, which seems useful.
      7803e932
  13. Jan 24, 2013
  14. Jan 23, 2013
    • Improve concurrency of foreign key locking · 0ac5ad51
      Alvaro Herrera authored
      This patch introduces two additional lock modes for tuples: "SELECT FOR
      KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
      other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
      FOR UPDATE".  UPDATE commands that do not modify the values stored in
      the columns that are part of the key of the tuple now grab a SELECT FOR
      NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
      with tuple locks of the FOR KEY SHARE variety.
      
      Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
      means the concurrency improvement applies to them, which is the whole
      point of this patch.
      
      The added tuple lock semantics require some rejiggering of the multixact
      module, so that the locking level that each transaction is holding can
      be stored alongside its Xid.  Also, multixacts now need to persist
      across server restarts and crashes, because they can now represent not
      only tuple locks, but also tuple updates.  This means we need more
      careful tracking of lifetime of pg_multixact SLRU files; since they now
      persist longer, we require more infrastructure to figure out when they
      can be removed.  pg_upgrade also needs to be careful to copy
      pg_multixact files over from the old server to the new, or at least part
      of multixact.c state, depending on the versions of the old and new
      servers.
      
      Tuple time qualification rules (HeapTupleSatisfies routines) need to be
      careful not to consider tuples with the "is multi" infomask bit set as
      being only locked; they might need to look up MultiXact values (i.e.
      possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
      whereas they previously were assured to only use information readily
      available from the tuple header.  This is considered acceptable, because
      the extra I/O would involve cases that would previously cause some
      commands to block waiting for concurrent transactions to finish.
      
      Another important change is the fact that locking tuples that have
      previously been updated causes the future versions to be marked as
      locked, too; this is essential for correctness of foreign key checks.
      This causes additional WAL-logging, also (there was previously a single
      WAL record for a locked tuple; now there is one for each updated copy
      of the tuple that exists).
      
      With all this in place, contention related to tuples being checked by
      foreign key rules should be much reduced.
      
      As a bonus, the old behavior that a subtransaction grabbing a stronger
      tuple lock than the parent (sub)transaction held on a given tuple and
      later aborting caused the weaker lock to be lost, has been fixed.
      
      Many new spec files were added for isolation tester framework, to ensure
      overall behavior is sane.  There's probably room for several more tests.
      
      There were several reviewers of this patch; in particular, Noah Misch
      and Andres Freund spent considerable time in it.  Original idea for the
      patch came from Simon Riggs, after a problem report by Joel Jacobson.
      Most code is from me, with contributions from Marti Raudsepp, Alexander
      Shulgin, Noah Misch and Andres Freund.
      
      This patch was discussed in several pgsql-hackers threads; the most
      important start at the following message-ids:
      	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
      	1290721684-sup-3951@alvh.no-ip.org
      	1294953201-sup-2099@alvh.no-ip.org
      	1320343602-sup-2290@alvh.no-ip.org
      	1339690386-sup-8927@alvh.no-ip.org
      	4FE5FF020200002500048A3D@gw.wicourts.gov
      	4FEAB90A0200002500048B7D@gw.wicourts.gov
      0ac5ad51
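The relationship between the four tuple lock modes can be summarized as a conflict matrix. The sketch below encodes the matrix as documented for row-level locks in PostgreSQL; the enum and function names are hypothetical and are not the identifiers used in the patch.

```c
#include <stdbool.h>

/* Tuple lock modes, ordered from weakest to strongest. */
typedef enum TupleLockSketch
{
    LOCK_FOR_KEY_SHARE = 0,
    LOCK_FOR_SHARE,
    LOCK_FOR_NO_KEY_UPDATE,
    LOCK_FOR_UPDATE
} TupleLockSketch;

/* conflict_table[held][requested] is true when the request must wait. */
static const bool conflict_table[4][4] = {
    /* held KEY SHARE     */ {false, false, false, true},
    /* held SHARE         */ {false, false, true,  true},
    /* held NO KEY UPDATE */ {false, true,  true,  true},
    /* held UPDATE        */ {true,  true,  true,  true},
};

static bool
tuple_locks_conflict(TupleLockSketch held, TupleLockSketch requested)
{
    return conflict_table[held][requested];
}
```

The key entry is that FOR KEY SHARE and FOR NO KEY UPDATE do not conflict, which is what lets a foreign key check proceed alongside an UPDATE that leaves the key columns alone.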
  15. Dec 04, 2012
    • Track the timeline associated with minRecoveryPoint, for more sanity checks. · 5ce108bf
      Heikki Linnakangas authored
      This allows recovery to notice certain incorrect recovery scenarios.
      If a server has recovered to point X on timeline 5, and you restart
      recovery, it better be on timeline 5 when it reaches point X again, not on
      some timeline with a higher ID. This can happen e.g. if a standby server
      is shut down, a new timeline appears in the WAL archive, and the standby
      server is restarted. It will try to follow the new timeline, which is wrong
      because some WAL on the old timeline was already replayed before shutdown.
      
      Requires an initdb (or at least pg_resetxlog), because this adds a field to
      the control file.
      5ce108bf
  16. Jul 14, 2012
  17. Jun 24, 2012
    • Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4
      Heikki Linnakangas authored
      This simplifies code that needs to do arithmetic on XLogRecPtrs.
      
      To avoid changing on-disk format of data pages, the LSN on data pages is
      still stored in the old format. That should keep pg_upgrade happy. However,
      we have XLogRecPtrs embedded in the control file, and in the structs that
      are sent over the replication protocol, so this change breaks compatibility
      of pg_basebackup and server. I didn't do anything about this in this patch;
      per discussion on -hackers, the right thing to do would be to change the
      replication protocol to be architecture-independent, so that you could use
      a newer version of pg_receivexlog, for example, against an older server
      version.
      0ab9d1c4
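The simplification can be seen by comparing the two representations side by side. In this sketch, `OldXLogRecPtr` mirrors the former two-field struct (`xlogid`, `xrecoff`), while the helper function names are hypothetical.

```c
#include <stdint.h>

/* Former representation: two 32-bit halves. */
typedef struct OldXLogRecPtr
{
    uint32_t xlogid;    /* high 32 bits */
    uint32_t xrecoff;   /* low 32 bits */
} OldXLogRecPtr;

/* New representation: a single 64-bit integer. */
typedef uint64_t XLogRecPtr;

static XLogRecPtr
xlogrecptr_from_old(OldXLogRecPtr old)
{
    return ((uint64_t) old.xlogid << 32) | old.xrecoff;
}

static OldXLogRecPtr
xlogrecptr_to_old(XLogRecPtr ptr)
{
    OldXLogRecPtr old;

    old.xlogid = (uint32_t) (ptr >> 32);
    old.xrecoff = (uint32_t) ptr;
    return old;
}

/* With a plain integer, distance is a subtraction; no carry handling. */
static uint64_t
xlog_distance(XLogRecPtr start, XLogRecPtr end)
{
    return end - start;
}
```

Comparisons follow the same pattern: `a < b` on the 64-bit value replaces a two-step comparison of `xlogid` and then `xrecoff`.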
  18. Jun 18, 2012
  19. Jan 25, 2012
  20. Aug 17, 2011
  21. Sep 20, 2010
  22. Jun 03, 2010
  23. Apr 28, 2010
    • Minor editorializing on pg_controldata and pg_resetxlog: adjust some message · c80a85e3
      Tom Lane authored
      wording, deal explicitly with some fields that were being silently left zero.
      c80a85e3
    • pg_controldata needs #define FRONTEND, same as pg_resetxlog. · 82e38aba
      Tom Lane authored
      Per buildfarm results from dawn_bat.
      82e38aba
    • Introduce wal_level GUC to explicitly control if information needed for · 9b8a7332
      Heikki Linnakangas authored
      archival or hot standby should be WAL-logged, instead of deducing that from
      other options like archive_mode. This replaces recovery_connections GUC in
      the primary, where it now has no effect, but it's still used in the standby
      to enable/disable hot standby.
      
      Remove the WAL-logging of "unlogged operations", like creating an index
      without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
      the wal_level setting and the settings that affect how much shared memory a
      hot standby server needs to track master transactions (max_connections,
      max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
      change, at server restart, write a WAL record noting the new settings and
      update pg_control. This allows us to notice the change in those settings in
      the standby at the right moment; they used to be included in checkpoint
      records, but that meant that a changed value was not reflected in the
      standby until the first checkpoint after the change.
      
      Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
      the sequence it used to follow, before hot standby and subsequent patches
      changed it to 0x9003.
      9b8a7332
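The "compare at restart, log a record, update pg_control" sequence could look roughly like the sketch below. The struct and function names are hypothetical, and the actual WAL write is reduced to a comment; only the set of tracked settings comes from the commit message.

```c
#include <stdbool.h>

typedef enum WalLevelSketch
{
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_ARCHIVE,
    WAL_LEVEL_HOT_STANDBY
} WalLevelSketch;

/* Hypothetical copy of the settings kept in pg_control. */
typedef struct StandbySettingsSketch
{
    WalLevelSketch wal_level;
    int max_connections;
    int max_prepared_xacts;
    int max_locks_per_xact;
} StandbySettingsSketch;

static bool
settings_differ(const StandbySettingsSketch *a, const StandbySettingsSketch *b)
{
    return a->wal_level != b->wal_level ||
        a->max_connections != b->max_connections ||
        a->max_prepared_xacts != b->max_prepared_xacts ||
        a->max_locks_per_xact != b->max_locks_per_xact;
}

/*
 * Called at server restart.  Returns true when the stored copy was stale,
 * which is the point where a real server would emit a WAL record.
 */
static bool
record_settings_if_changed(StandbySettingsSketch *stored,
                           const StandbySettingsSketch *current)
{
    if (!settings_differ(stored, current))
        return false;
    /* A real server would write the parameter-change WAL record here. */
    *stored = *current;
    return true;
}
```

Because the record is written as soon as the change takes effect, a standby replaying WAL sees the new values immediately instead of waiting for the next checkpoint record.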
  24. Jan 04, 2010
    • Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at · 06f82b29
      Heikki Linnakangas authored
      recovery instead of reading the backup history file. This is more robust,
      as it stops you from prematurely starting up an inconsistent cluster if the
      backup history file is lost for some reason, or if the base backup was
      never finished with pg_stop_backup().
      
      This also paves the way for a simpler streaming replication patch, which
      doesn't need to care about backup history files anymore.
      
      The backup history file is still created and archived as before, but it's
      not used by the system anymore. It's just for informational purposes now.
      
      Bump PG_CONTROL_VERSION as the location of the backup startpoint is now
      written to a new field in pg_control, and catversion because initdb is
      required.
      
      Original patch by Fujii Masao per Simon's idea, with further fixes by me.
      06f82b29
  25. Dec 19, 2009
    • Allow read only connections during recovery, known as Hot Standby. · efc16ea5
      Simon Riggs authored
      Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
      
      New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
      
      This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
      
      Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
      
      Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
      efc16ea5
  26. Aug 31, 2009
    • Track the current XID wrap limit (or more accurately, the oldest unfrozen · 25ec228e
      Tom Lane authored
      XID) in checkpoint records.  This eliminates the need to recompute the value
      from scratch during database startup, which is one of the two remaining
      reasons for the flatfile code to exist.  It should also simplify life for
      hot-standby operation.
      
      To avoid bloating the checkpoint records unreasonably, I switched from
      tracking the oldest database by name to tracking it by OID.  This turns
      out to save cycles in general (everywhere but the warning-generating
      paths, which we hardly care about) and also helps us deal with the case
      that the oldest database got dropped instead of being vacuumed.  The prior
      coding might go for a long time without updating the wrap limit in that case,
      which is bad because it might result in a lot of useless autovacuum activity.
      25ec228e
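How a wrap limit follows from the oldest unfrozen XID can be sketched as below. This is an approximation written for illustration: the function name is hypothetical, and the rule of placing the limit half the XID space ahead and skipping the reserved low XIDs is an assumption about the server's behavior rather than something spelled out in the commit message.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are assumed to be reserved for special purposes. */
#define FirstNormalTransactionId ((TransactionId) 3)
#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)

/*
 * The limit sits half the 32-bit XID space past the oldest unfrozen XID.
 * Arithmetic wraps modulo 2^32, so a result that lands on a reserved XID
 * is pushed forward to the first normal one.
 */
static TransactionId
xid_wrap_limit_sketch(TransactionId oldest_unfrozen_xid)
{
    TransactionId limit = oldest_unfrozen_xid + (MaxTransactionId >> 1);

    if (limit < FirstNormalTransactionId)
        limit += FirstNormalTransactionId;
    return limit;
}
```

Storing the oldest unfrozen XID in the checkpoint record means this value can be derived directly at startup instead of scanning every database.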
  27. Jun 11, 2009
  28. Dec 11, 2008
  29. Sep 24, 2008
  30. Sep 23, 2008
  31. Apr 21, 2008
    • Allow float8, int8, and related datatypes to be passed by value on machines · 8472bf7a
      Tom Lane authored
      where Datum is 8 bytes wide.  Since this will break old-style C functions
      (those still using version 0 calling convention) that have arguments or
      results of these types, provide a configure option to disable it and retain
      the old pass-by-reference behavior.  Likewise, provide a configure option
      to disable the recently-committed float4 pass-by-value change.
      
      Zoltan Boszormenyi, plus configurability stuff by me.
      8472bf7a
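Passing an 8-byte value "by value" means storing its bits directly in the Datum instead of storing a pointer to separately allocated memory. The following sketch assumes a platform where Datum is 8 bytes wide; the helper names are hypothetical and are not the server's own conversion macros.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: an 8-byte Datum, as on typical 64-bit platforms. */
typedef uint64_t Datum;

_Static_assert(sizeof(double) == sizeof(Datum),
               "this sketch requires an 8-byte double and Datum");

/* Store the bits of a float8 directly in the Datum. */
static Datum
float8_to_datum_sketch(double value)
{
    Datum d;

    memcpy(&d, &value, sizeof(d));
    return d;
}

/* Recover the float8 from the Datum without dereferencing a pointer. */
static double
datum_to_float8_sketch(Datum d)
{
    double value;

    memcpy(&value, &d, sizeof(value));
    return value;
}

/* int8 fits the same way; a cast is enough. */
static Datum
int64_to_datum_sketch(int64_t value)
{
    return (Datum) value;
}

static int64_t
datum_to_int64_sketch(Datum d)
{
    return (int64_t) d;
}
```

Old-style functions that treat such a Datum as a pointer would dereference the raw bits, which is why the commit provides a configure option to keep the pass-by-reference behavior.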
  32. Mar 27, 2008
    • Reduce the need for frontend programs to include "postgres.h" by refactoring · 039dfbfd
      Tom Lane authored
      inclusions in src/include/catalog/*.h files.  The main idea here is to push
      function declarations for src/backend/catalog/*.c files into separate headers,
      rather than sticking them into the corresponding catalog definition file as
      has been done in the past.  This commit only carries out that idea fully for
      pg_proc, pg_type and pg_conversion, but that's enough for the moment ---
      if pg_list.h ever becomes unsafe for frontend code to include, we'll need
      to work a bit more.
      
      Zdenek Kotala
      039dfbfd
  33. Feb 17, 2008
    • Replace time_t with pg_time_t (same values, but always int64) in on-disk · cd004067
      Tom Lane authored
      data structures and backend internal APIs.  This solves problems we've seen
      recently with inconsistent layout of pg_control between machines that have
      32-bit time_t and those that have already migrated to 64-bit time_t.  Also,
      we can get out from under the problem that Windows' Unix-API emulation is not
      consistent about the width of time_t.
      
      There are a few remaining places where local time_t variables are used to hold
      the current or recent result of time(NULL).  I didn't bother changing these
      since they do not affect any cross-module APIs and surely all platforms will
      have 64-bit time_t before overflow becomes an actual risk.  time_t should
      be avoided for anything visible to extension modules, however.
      cd004067
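The layout problem and its fix can be illustrated with a fixed-width typedef. In this sketch `pg_time_t` is defined as the commit describes (always a 64-bit integer); the struct and the helper function are hypothetical and only loosely resemble a control-file record.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Always 64 bits, regardless of how wide the platform's time_t is. */
typedef int64_t pg_time_t;

/* Hypothetical on-disk record with a timestamp after two 32-bit fields. */
typedef struct ControlTimesSketch
{
    uint32_t version;
    uint32_t state;
    pg_time_t time;     /* same offset and width on every platform */
} ControlTimesSketch;

_Static_assert(sizeof(pg_time_t) == 8, "pg_time_t must be 8 bytes");

/* Convert at the boundary: time_t stays local, pg_time_t goes on disk. */
static pg_time_t
pg_time_now_sketch(void)
{
    return (pg_time_t) time(NULL);
}
```

With a bare `time_t` field, the struct would be 12 bytes on a platform with 32-bit `time_t` and 16 bytes on one with 64-bit `time_t`, so files written by one build could not be read by the other.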
  34. Jan 21, 2008
  35. Apr 03, 2007
    • Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. · b3005276
      Tom Lane authored
      Add the latter to the values checked in pg_control, since it can't be changed
      without invalidating toast table content.  This commit in itself shouldn't
      change any behavior, but it lays some necessary groundwork for experimentation
      with these toast-control numbers.
      
      Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
      thought still needs to be given to needs_toast_table() in toasting.c before
      unleashing random changes.
      b3005276
  36. Mar 18, 2007