<!--
$Header: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v 2.24 2002/03/27 02:36:51 petere Exp $
-->

 <chapter id="mvcc">
  <title>Multiversion Concurrency Control</title>

  <indexterm>
   <primary>concurrency</primary>
  </indexterm>

  <abstract>
   <para>
    Multiversion Concurrency Control
    (MVCC)
    is an advanced technique for improving database performance in a
    multiuser environment. 
    Vadim Mikheev (<email>vadim@krs.ru</email>) provided
    the implementation for <productname>PostgreSQL</productname>.
   </para>
  </abstract>

  <sect1 id="mvcc-intro">
   <title>Introduction</title>

   <para>
    Unlike most other database systems which use locks for concurrency control,
    <productname>PostgreSQL</productname>
    maintains data consistency by using a multiversion model. 
    This means that while querying a database each transaction sees
    a snapshot of data (a <firstterm>database version</firstterm>)
    as it was some
    time ago, regardless of the current state of the underlying data.
    This protects the transaction from viewing inconsistent data that
    could be caused by (other) concurrent transaction updates on the same
    data rows, providing <firstterm>transaction isolation</firstterm>
    for each database session.
   </para>

   <para>
    The main difference between multiversion and lock models is that
    in MVCC locks acquired for querying (reading) data don't conflict
    with locks acquired for writing data, and so reading never blocks
    writing and writing never blocks reading.
   </para>
  </sect1>

  <sect1 id="transaction-iso">
   <title>Transaction Isolation</title>

   <para>
    The <acronym>ANSI</acronym>/<acronym>ISO</acronym> <acronym>SQL</acronym>
    standard defines four levels of transaction
    isolation in terms of three phenomena that must be prevented 
    between concurrent transactions.
    These undesirable phenomena are:

    <variablelist>
     <varlistentry>
      <term>
       dirty reads
       <indexterm><primary>dirty reads</primary></indexterm>
      </term>
     <listitem>
      <para>
	A transaction reads data written by concurrent uncommitted transaction.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       non-repeatable reads
       <indexterm><primary>non-repeatable reads</primary></indexterm>
      </term>
     <listitem>
      <para>
	A transaction re-reads data it has previously read and finds that data
	has been modified by another transaction (that committed since the
	initial read).
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       phantom read
       <indexterm><primary>phantom reads</primary></indexterm>
      </term>
     <listitem>
      <para>
	A transaction re-executes a query returning a set of rows that satisfy a
	search condition and finds that the set of rows satisfying the condition
	has changed due to another recently-committed transaction.
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
   </para>

   <para>
    <indexterm>
     <primary>isolation levels</primary>
    </indexterm>
    The four transaction isolation levels and the corresponding
    behaviors are described in <xref linkend="mvcc-isolevel-table">.

    <table tocentry="1" id="mvcc-isolevel-table">
     <title><acronym>SQL</acronym> Transaction Isolation Levels</title>
     <titleabbrev>Isolation Levels</titleabbrev>
     <tgroup cols="4">
      <thead>
       <row>
	<entry>
         Isolation Level
	</entry>
	<entry>
	 Dirty Read
	</entry>
	<entry>
	 Non-Repeatable Read
	</entry>
	<entry>
	 Phantom Read
	</entry>
       </row>
      </thead>
      <tbody>
       <row>
	<entry>
	 Read uncommitted
	</entry>
	<entry>
	 Possible
	</entry>
	<entry>
	 Possible
	</entry>
	<entry>
	 Possible
	</entry>
       </row>

       <row>
	<entry>
	 Read committed
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Possible
	</entry>
	<entry>
	 Possible
	</entry>
       </row>

       <row>
	<entry>
	 Repeatable read
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Possible
	</entry>
       </row>

       <row>
	<entry>
	 Serializable
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Not possible
	</entry>
	<entry>
	 Not possible
	</entry>
       </row>
      </tbody>
     </tgroup>
    </table>
   </para>

   <para>
    <productname>PostgreSQL</productname>
    offers the read committed and serializable isolation levels.
   </para>
  </sect1>

  <sect1 id="xact-read-committed">
   <title>Read Committed Isolation Level</title>

   <indexterm>
    <primary>isolation levels</primary>
    <secondary>read committed</secondary>
   </indexterm>

   <para>
    <firstterm>Read Committed</firstterm>
    is the default isolation level in <productname>PostgreSQL</productname>. 
    When a transaction runs on this isolation level,
    a <command>SELECT</command> query sees only data committed before the
    query began and never sees either uncommitted data or changes committed
    during query execution by concurrent transactions.  (However, the
    <command>SELECT</command> does see the effects of previous updates
    executed within this same transaction, even though they are not yet
    committed.)  Notice that two successive <command>SELECT</command>s can see different data,
    even though they are within a single transaction, when other transactions
    commit changes during execution of the first <command>SELECT</command>.
   </para>

   <para>
    If a target row found by a query while executing an
    <command>UPDATE</command> statement
    (or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
    has already been updated by a
    concurrent uncommitted transaction then the second transaction
    that tries to update this row will wait for the other transaction to
    commit or rollback. In the case of rollback, the waiting transaction
    can proceed to change the row. In the case of commit (and if the
    row still exists; i.e. was not deleted by the other transaction), the
    query will be re-executed for this row to check that the new row
    version still satisfies the query search condition. If the new row version
    satisfies the query search condition then the row will be
    updated (or deleted or marked for update).  Note that the starting point
    for the update will be the new row version; moreover, after the update
    the doubly-updated row is visible to subsequent <command>SELECT</command>s
    in the current transaction.  Thus, the current transaction is able to see
    the effects of the other transaction for this specific row.
   </para>

   <para>
    The partial transaction isolation provided by Read Committed level is
    adequate for many applications, and this level is fast and simple to use.
    However, for applications that do complex queries and updates, it may
    be necessary to guarantee a more rigorously consistent view of the
    database than the Read Committed level provides.
   </para>
  </sect1>

  <sect1 id="xact-serializable">
   <title>Serializable Isolation Level</title>

   <indexterm>
    <primary>isolation levels</primary>
    <secondary>read serializable</secondary>
   </indexterm>

   <para>
    <firstterm>Serializable</firstterm> provides the highest transaction
    isolation.  This level emulates serial transaction execution,
    as if transactions had been executed one after another, serially,
    rather than concurrently.  However, applications using this level must
    be prepared to retry transactions due to serialization failures.
   </para>

   <para>
    When a transaction is on the serializable level,
    a <command>SELECT</command> query sees only data committed before the
    transaction began and never sees either uncommitted data or changes
    committed
    during transaction execution by concurrent transactions.  (However, the
    <command>SELECT</command> does see the effects of previous updates
    executed within this same transaction, even though they are not yet
    committed.)  This is different from Read Committed in that the
    <command>SELECT</command>
    sees a snapshot as of the start of the transaction, not as of the start
    of the current query within the transaction.
   </para>

   <para>
    If a target row found by a query while executing an
    <command>UPDATE</command> statement
    (or <command>DELETE</command> or <command>SELECT FOR UPDATE</command>)
    has already been updated by a
    concurrent uncommitted transaction then the second transaction
    that tries to update this row will wait for the other transaction to
    commit or rollback. In the case of rollback, the waiting transaction
    can proceed to change the row. In the case of a concurrent
    transaction commit, a serializable transaction will be rolled back
    with the message

<screen>
ERROR:  Can't serialize access due to concurrent update
</screen>

    because a serializable transaction cannot modify rows changed by
    other transactions after the serializable transaction began.
   </para>

   <para>
    When the application receives this error message, it should abort
    the current transaction and then retry the whole transaction from
    the beginning.  The second time through, the transaction sees the
    previously-committed change as part of its initial view of the database,
    so there is no logical conflict in using the new version of the row
    as the starting point for the new transaction's update.
    Note that only updating transactions may need to be retried --- read-only
    transactions never have serialization conflicts.
   </para>

   <para>
    The Serializable transaction level provides a rigorous guarantee that each
    transaction sees a wholly consistent view of the database.  However,
    the application has to be prepared to retry transactions when concurrent
    updates make it impossible to sustain the illusion of serial execution,
    and the cost of redoing complex transactions may be significant.  So
    this level is recommended only when update queries contain logic
    sufficiently complex that they may give wrong answers in the Read Committed
    level.
   </para>
  </sect1>

  <sect1 id="applevel-consistency">
   <title>Data consistency checks at the application level</title>

   <para>
    Because readers in <productname>PostgreSQL</productname>
    don't lock data, regardless of
    transaction isolation level, data read by one transaction can be
    overwritten by another concurrent transaction. In other words,
    if a row is returned by <command>SELECT</command> it doesn't mean that
    the row still exists at the time it is returned (i.e., sometime after the
    current transaction began); the row might have been modified or deleted
    by an already-committed transaction that committed after this one started.
    Even if the row is still valid <quote>now</quote>, it could be changed or deleted
    before the current transaction does a commit or rollback.
   </para>

   <para>
    Another way to think about it is that each
    transaction sees a snapshot of the database contents, and concurrently
    executing transactions may very well see different snapshots.  So the
    whole concept of <quote>now</quote> is somewhat suspect anyway.  This is not normally
    a big problem if the client applications are isolated from each other,
    but if the clients can communicate via channels outside the database
    then serious confusion may ensue.
   </para>

   <para>
    To ensure the current existence of a row and protect it against
    concurrent updates one must use <command>SELECT FOR UPDATE</command> or
    an appropriate <command>LOCK TABLE</command> statement.
    (<command>SELECT FOR UPDATE</command> locks just the returned rows against
    concurrent updates, while <command>LOCK TABLE</command> protects the
    whole table.)
    This should be taken into account when porting applications to
    <productname>PostgreSQL</productname> from other environments.

    <note>
     <para>
      Before version 6.5 <productname>PostgreSQL</productname>
      used read-locks and so the
      above consideration is also the case
      when upgrading to 6.5 (or higher) from previous
      <productname>PostgreSQL</productname> versions.
     </para>
    </note>
   </para>
  </sect1>

  <sect1 id="locking-tables">
   <title>Locking and Tables</title>

   <indexterm>
    <primary>locking</primary>
   </indexterm>

   <para>
    <productname>PostgreSQL</productname> provides various lock modes
    to control concurrent access to data in tables.  Users normally
    need not be concerned about the different lock modes because
    <productname>PostgreSQL</productname> commands automatically
    acquire locks of appropriate modes to ensure data integrity while
    permitting an appropriate level of concurrent access.
    Nevertheless, a user can manually lock a table in any of the
    available modes using the <command>LOCK TABLE</command> command.
   </para>

   <para>
    The list below shows the available lock modes and the contexts in
    which they are used.  Remember that all of these lock modes are
    table-level locks, even if the name contains the word
    <quote>row</quote>.  The names of the lock modes are historical.
   </para>

     <variablelist>
      <title>Table-level lock modes</title>
      <varlistentry>
       <term>
	<literal>ACCESS SHARE</literal>
       </term>
       <listitem>
	<para>
	 A read-lock mode acquired automatically on tables
	 being queried.
	</para>

	<para>
	 Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock
	 mode only.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>ROW SHARE</literal>
       </term>
       <listitem>
	<para>
	 The <command>SELECT FOR UPDATE</command> command acquires a
	 lock of this mode.
	</para>

	<para>
	 Conflicts with the <literal>EXCLUSIVE</literal> and
	 <literal>ACCESS EXCLUSIVE</literal> lock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>ROW EXCLUSIVE</literal>
       </term>
       <listitem>
	<para>
	 The commands <command>UPDATE</command>,
	 <command>DELETE</command>, and <command>INSERT</command>
	 automatically acquire this lock mode.
	</para>

	<para>
	 Conflicts with the <literal>SHARE</literal>, <literal>SHARE ROW
	 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
	 <literal>ACCESS EXCLUSIVE</literal> lock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>SHARE UPDATE EXCLUSIVE</literal>
       </term>
       <listitem>
	<para>
	 Acquired by <command>VACUUM</command> (without <option>FULL</option>).
	</para>

	<para>
	 Conflicts with the <literal>SHARE UPDATE EXCLUSIVE</literal>,
	 <literal>SHARE</literal>, <literal>SHARE ROW
	 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
	 <literal>ACCESS EXCLUSIVE</literal> lock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>SHARE</literal>
       </term>
       <listitem>
	<para>
	 Acquired by <command>CREATE INDEX</command>.
	</para>

	<para>
	 Conflicts with the <literal>ROW EXCLUSIVE</literal>,
	 <literal>SHARE UPDATE EXCLUSIVE</literal>, <literal>SHARE ROW
	 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
	 <literal>ACCESS EXCLUSIVE</literal> lock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>SHARE ROW EXCLUSIVE</literal>
       </term>
       <listitem>
	<para>
         This lock mode is not automatically acquired by any command.
	</para>

	<para>
	 Conflicts with the <literal>ROW EXCLUSIVE</literal>,
	 <literal>SHARE UPDATE EXCLUSIVE</literal>,
	 <literal>SHARE</literal>, <literal>SHARE ROW
	 EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
	 <literal>ACCESS EXCLUSIVE</literal> lock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>EXCLUSIVE LOCK</literal>
       </term>
       <listitem>
	<para>
         This lock mode is not automatically acquired by any command.
	</para>

	<para>
	 Conflicts with the <literal>ROW SHARE</literal>, <literal>ROW
	 EXCLUSIVE</literal>, <literal>SHARE UPDATE
	 EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
	 ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
	 <literal>ACCESS EXCLUSIVE</literal> lock modes.
	</para>
       </listitem>
      </varlistentry>

      <varlistentry>
       <term>
	<literal>ACCESS EXCLUSIVE</literal>
       </term>
       <listitem>
	<para>
	 Acquired by the <command>ALTER TABLE</command>, <command>DROP
	 TABLE</command>, and <command>VACUUM FULL</command> commands.
	 This is also the default lock mode for <command>LOCK TABLE</command>
         statements that do not specify a mode explicitly.
	</para>

	<para>
	 Conflicts with locks of all modes ( <literal>ACCESS
	 SHARE</literal>, <literal>ROW SHARE</literal>, <literal>ROW
	 EXCLUSIVE</literal>, <literal>SHARE UPDATE
	 EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
	 ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
	 <literal>ACCESS EXCLUSIVE</literal>).
	</para>
       </listitem>
      </varlistentry>
     </variablelist>

     <note>
      <para>
       Only an <literal>ACCESS EXCLUSIVE</literal> lock blocks a
       <command>SELECT</command> (without <option>FOR UPDATE</option>)
       statement.
      </para>
     </note>

    <para>
     In addition to table-level locks, there are row-level locks.
     Row-level locks are acquired when rows are being updated (or
     deleted or marked for update).  Row-level locks don't affect data
     querying; they block writers to <emphasis>the same row</emphasis>
     only.  Row-level locks cannot be acquired explicitly by the user.
    </para>

    <para>
     <productname>PostgreSQL</productname> doesn't remember any
     information about modified rows in memory, so is has no limit to
     the number of rows locked at one time.  However, locking a row
     may cause a disk write; thus, for example, <command>SELECT FOR
     UPDATE</command> will modify selected rows to mark them and so
     will result in disk writes.
    </para>

    <para>
    In addition to table and row locks, short-term share/exclusive locks are
    used to control read/write access to table pages in the shared buffer
    pool.  These locks are released immediately after a tuple is fetched or
    updated.  Application writers normally need not be concerned with
    page-level locks, but we mention them for completeness.
   </para>
  </sect1>

  <sect1 id="locking-indexes">
   <title>Locking and Indexes</title>

   <para>
    Though <productname>PostgreSQL</productname>
    provides nonblocking read/write access to table
    data, nonblocking read/write access is not currently offered for every
    index access method implemented
    in <productname>PostgreSQL</productname>.
   </para>

   <para>
    The various index types are handled as follows:

    <variablelist>
     <varlistentry>
      <term>
       <acronym>GiST</acronym> and R-Tree indexes
      </term>
      <listitem>
       <para>
	Share/exclusive index-level locks are used for read/write access.
	Locks are released after statement is done.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       Hash indexes
      </term>
      <listitem>
       <para>
	Share/exclusive page-level locks are used for read/write access.
	Locks are released after page is processed.
       </para>

       <para>
	Page-level locks provide better concurrency than index-level ones
	but are subject to deadlocks.
       </para>
      </listitem>
     </varlistentry>

     <varlistentry>
      <term>
       B-tree indexes
      </term>
      <listitem>
       <para>
	Short-term share/exclusive page-level locks are used for
	read/write access. Locks are released immediately after each index
	tuple is fetched/inserted.
       </para>

       <para>
	B-tree indexes provide the highest concurrency without deadlock
	conditions.
       </para>
      </listitem>
     </varlistentry>
    </variablelist>
   </para>

   <para>
    In short, B-tree indexes are the recommended index type for concurrent
    applications.
   </para>
  </sect1>
 </chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->