checkpointer.c

      Make checkpoint requests more robust. · a1695035
      Tom Lane authored
      Commit 6f6a6d8b introduced a delay of up to 2 seconds if we're trying
      to request a checkpoint but the checkpointer hasn't started yet (or,
      much less likely, our kill() call fails).  However, buildfarm experience
      shows that this is not quite enough for slow or heavily loaded machines.
      There's no good reason to assume that the checkpointer won't start
      eventually, so we may as well make the timeout much longer, say 60 sec.
      
      However, if the caller didn't say CHECKPOINT_WAIT, it seems like a bad
      idea to be waiting at all, much less for as long as 60 sec.  We can
      remove the need for that, and make this whole thing more robust, by
      adjusting the code so that the existence of a pending checkpoint
      request is clear from the contents of shared memory, and making sure
      that the checkpointer process will notice it at startup even if it did
      not get a signal.  In this way there's no need for a non-CHECKPOINT_WAIT
      call to wait at all; if it can't send the signal, it can nonetheless
      assume that the checkpointer will eventually service the request.
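
      The request flow described above can be sketched roughly as follows.
      This is a simplified, single-process illustration and not the code in
      checkpointer.c: the SharedState struct, send_signal(),
      request_checkpoint(), checkpointer_take_request(), the flag values and
      the max_retries parameter are all hypothetical names chosen for the
      example, and the locking and sleeping that real code needs are omitted.

```c
#include <stdbool.h>

/* Hypothetical flag bits, loosely modeled on the names in the commit message. */
#define CHECKPOINT_WAIT      0x01  /* caller wants to block until serviced */
#define CHECKPOINT_REQUESTED 0x02  /* assumed marker: "a request is pending" */

/* Stand-in for the shared-memory area; real code would guard it with a lock. */
typedef struct {
    int  ckpt_flags;          /* pending request flags, 0 when idle */
    bool checkpointer_alive;  /* simulates whether a signal can be delivered */
} SharedState;

/* Simulated kill(): succeeds only once the checkpointer has started. */
static bool send_signal(const SharedState *shm)
{
    return shm->checkpointer_alive;
}

/*
 * Record the request in shared memory first, then try to signal.
 * Returns the number of signal attempts made.
 */
static int request_checkpoint(SharedState *shm, int flags, int max_retries)
{
    int attempts = 0;

    /* The pending request is now visible even if no signal gets through. */
    shm->ckpt_flags |= (flags | CHECKPOINT_REQUESTED);

    for (;;) {
        attempts++;
        if (send_signal(shm))
            break;              /* checkpointer was notified */
        if (!(flags & CHECKPOINT_WAIT))
            break;              /* non-waiting caller: do not retry */
        if (attempts >= max_retries)
            break;              /* waiting caller gives up eventually */
        /* real code would sleep briefly here before retrying */
    }
    return attempts;
}

/* At startup (or on wakeup) the checkpointer consumes any pending flags. */
static int checkpointer_take_request(SharedState *shm)
{
    int flags = shm->ckpt_flags;

    shm->ckpt_flags = 0;
    return flags;
}
```

      In this sketch a caller that does not pass CHECKPOINT_WAIT makes a
      single signal attempt and returns, relying on the flag left in
      SharedState to be picked up by checkpointer_take_request() once the
      checkpointer is running.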
      
      A potential downside of this change is that "kill -INT" on the checkpointer
      process is no longer enough to trigger a checkpoint, should anyone be
      relying on something so hacky.  But there's no obvious reason to do it
      like that rather than issuing a plain old CHECKPOINT command, so we'll
      assume that nobody is.  There doesn't seem to be a way to preserve this
      undocumented quasi-feature without introducing race conditions.
      
      Since a principal reason for messing with this is to prevent intermittent
      buildfarm failures, back-patch to all supported branches.
      
      Discussion: https://postgr.es/m/27830.1552752475@sss.pgh.pa.us