Skip to content
Snippets Groups Projects
  • Tom Lane's avatar
    9b2eacba
    Fix treatment of *lpNumberOfBytesRecvd == 0: that's a completion condition. · 9b2eacba
    Tom Lane authored
    pgwin32_recv() has treated a non-error return of zero bytes from WSARecv()
    as being a reason to block ever since the current implementation was
    introduced in commit a4c40f14.  However, so far as one can tell
    from Microsoft's documentation, that is just wrong: what it means is
    graceful connection closure (in stream protocols) or receipt of a
    zero-length message (in message protocols), and neither case should result
    in blocking here.  The only reason the code worked at all was that control
    then fell into the retry loop, which did *not* treat zero bytes specially,
    so we'd get out after only wasting some cycles.  But as of 9.5 we do not
    normally reach the retry loop and so the bug is exposed, as reported by
    Shay Rojansky and diagnosed by Andres Freund.
    
    Remove the unnecessary test on the byte count, and rearrange the code
    in the retry loop so that it looks identical to the initial sequence.
    
    Back-patch of commit 90e61df8.  The
    original plan was to apply this only to 9.5 and up, but after discussion
    and buildfarm testing, it seems better to back-patch.  The noblock code
    path has been at risk of this problem since it was introduced (in 9.0);
    if it did happen in pre-9.5 branches, the symptom would be that a walsender
    would wait indefinitely rather than noticing a loss of connection.  While
    we lack proof that the case has been seen in the field, it seems possible
    that it's happened without being reported.
    9b2eacba
    History
    Fix treatment of *lpNumberOfBytesRecvd == 0: that's a completion condition.
    Tom Lane authored
    pgwin32_recv() has treated a non-error return of zero bytes from WSARecv()
    as being a reason to block ever since the current implementation was
    introduced in commit a4c40f14.  However, so far as one can tell
    from Microsoft's documentation, that is just wrong: what it means is
    graceful connection closure (in stream protocols) or receipt of a
    zero-length message (in message protocols), and neither case should result
    in blocking here.  The only reason the code worked at all was that control
    then fell into the retry loop, which did *not* treat zero bytes specially,
    so we'd get out after only wasting some cycles.  But as of 9.5 we do not
    normally reach the retry loop and so the bug is exposed, as reported by
    Shay Rojansky and diagnosed by Andres Freund.
    
    Remove the unnecessary test on the byte count, and rearrange the code
    in the retry loop so that it looks identical to the initial sequence.
    
    Back-patch of commit 90e61df8.  The
    original plan was to apply this only to 9.5 and up, but after discussion
    and buildfarm testing, it seems better to back-patch.  The noblock code
    path has been at risk of this problem since it was introduced (in 9.0);
    if it did happen in pre-9.5 branches, the symptom would be that a walsender
    would wait indefinitely rather than noticing a loss of connection.  While
    we lack proof that the case has been seen in the field, it seems possible
    that it's happened without being reported.