diff --git a/doc/TODO b/doc/TODO index 601e1bdf2a8b84378eefaba65601e0c273b82cb6..0396f4bd2297a149c1eb44de33482711e16fbc0e 100644 --- a/doc/TODO +++ b/doc/TODO @@ -1,6 +1,6 @@ TODO list for PostgreSQL ======================== -Last updated: Wed Jan 24 08:38:34 EST 2001 +Last updated: Wed Jan 24 09:24:35 EST 2001 Current maintainer: Bruce Momjian (pgman@candle.pha.pa.us) @@ -303,7 +303,7 @@ MISC connection pooling * Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and CLUSTER -* Delay fsync() when other backends are about to commit too +* Delay fsync() when other backends are about to commit too [fsync] SOURCE CODE ----------- diff --git a/doc/TODO.detail/fsync b/doc/TODO.detail/fsync new file mode 100644 index 0000000000000000000000000000000000000000..6163dc431906aa31731addd3c69241e17cbc845f --- /dev/null +++ b/doc/TODO.detail/fsync @@ -0,0 +1,129 @@ +From pgsql-hackers-owner+M908@postgresql.org Sun Nov 19 14:27:43 2000 +Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) + by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id OAA10885 + for <pgman@candle.pha.pa.us>; Sun, 19 Nov 2000 14:27:42 -0500 (EST) +Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28]) + by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAJJSMs83653; + Sun, 19 Nov 2000 14:28:22 -0500 (EST) + (envelope-from pgsql-hackers-owner+M908@postgresql.org) +Received: from candle.pha.pa.us (candle.navpoint.com [162.33.245.46] (may be forged)) + by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eAJJQns83565 + for <pgsql-hackers@postgreSQL.org>; Sun, 19 Nov 2000 14:26:49 -0500 (EST) + (envelope-from pgman@candle.pha.pa.us) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.9.0/8.9.0) id OAA06790; + Sun, 19 Nov 2000 14:23:06 -0500 (EST) +From: Bruce Momjian <pgman@candle.pha.pa.us> +Message-Id: <200011191923.OAA06790@candle.pha.pa.us> +Subject: Re: [HACKERS] WAL fsync scheduling +In-Reply-To: 
<002101c0525e$2d964480$b97a30d0@sectorbase.com> "from Vadim Mikheev + at Nov 19, 2000 11:23:19 am" +To: Vadim Mikheev <vmikheev@sectorbase.com> +Date: Sun, 19 Nov 2000 14:23:06 -0500 (EST) +CC: Tom Samplonius <tom@sdf.com>, Alfred@candle.pha.pa.us, + Perlstein <bright@wintelcom.net>, Larry@candle.pha.pa.us, + Rosenman <ler@lerctr.org>, + PostgreSQL-development <pgsql-hackers@postgresql.org> +X-Mailer: ELM [version 2.4ME+ PL77 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +Status: OR + +[ Charset ISO-8859-1 unsupported, converting... ] +> > There are two parts to transaction commit. The first is writing all +> > dirty buffers or log changes to the kernel, and second is fsync of the +> ^^^^^^^^^^^^ +> Backend doesn't write any dirty buffer to the kernel at commit time. + +Yes, I suspected that. + +> +> > log file. +> +> The first part is writing commit record into WAL buffers in shmem. +> This is what XLogInsert does. After that XLogFlush is called to ensure +> that entire commit record is on disk. XLogFlush does *both* write() and +> fsync() (single slock is used for both writing and fsyncing) if it needs to +> do it at all. + +Yes, I realize there are new steps in WAL. + +> +> > I suggest having a per-backend shared memory byte that has the following +> > values: +> > +> > START_LOG_WRITE +> > WAIT_ON_FSYNC +> > NOT_IN_COMMIT +> > backend_number_doing_fsync +> > +> > I suggest that when each backend starts a commit, it sets its byte to +> > START_LOG_WRITE. +> ^^^^^^^^^^^^^^^^^^^^^^^ +> Isn't START_COMMIT more meaningful? + +Yes. + +> +> > When it gets ready to fsync, it checks all backends. +> ^^^^^^^^^^^^^^^^^^^^^^^^^^ +> What do you mean by this? The moment just after XLogInsert? + +Just before it calls fsync(). + +> +> > If all are NOT_IN_COMMIT, it does fsync and continues. 
+> +> 1st edition: +> > If one or more are in START_LOG_WRITE, it waits until no one is in +> > START_LOG_WRITE. It then checks all WAIT_ON_FSYNC, and if it is the +> > lowest backend in WAIT_ON_FSYNC, marks all others with its backend +> > number, and does fsync. It then clears all backends with its number to +> > NOT_IN_COMMIT. Other backends will see they are not the lowest +> > WAIT_ON_FSYNC and will wait for their byte to be set to NOT_IN_COMMIT +> > so they can then continue, knowing their data was synced. +> +> 2nd edition: +> > I have another idea. If a backend gets to the point that it needs +> > fsync, and there is another backend in START_LOG_WRITE, it can go to an +> > interruptible sleep, knowing another backend will perform the fsync and +> > wake it up. Therefore, there is no busy-wait or timed sleep. +> > +> > Of course, a backend must set its status to WAIT_ON_FSYNC to avoid a +> > race condition. +> +> The 2nd edition is much better. But I'm not sure we really need +> these per-backend bytes in shmem. Why not just have some counters? +> We can use a semaphore to wake up all waiters at once. + +Yes, that is much better and clearer. My idea was just to say, "if no +one is entering commit phase, do the commit. If someone else is coming, +sleep and wait for them to do the fsync and wake me up with a signal." + +> +> > This allows a single backend not to sleep, and allows multiple backends +> > to bunch up only when they are all about to commit. +> > +> > The reason backend numbers are written is so other backends entering the +> > commit code will not interfere with the backends performing fsync. +> +> A woken-up backend can check what's written/fsynced by calling XLogFlush. + +Seems that may not be needed anymore with a counter. The only issue is +that other backends may enter commit while fsync() is happening. 
The +process that did the fsync must be sure to wake up only the backends +that were waiting for it, and not other backends that may also be +doing fsync as a group while the first fsync was happening. I leave +those details to people more experienced. :-) + +I am just glad people liked my idea. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 +