From d1fcd337e0a44aa5981f87bd315da6fb8c948019 Mon Sep 17 00:00:00 2001 From: Bruce Momjian <bruce@momjian.us> Date: Thu, 20 Jun 2002 21:48:47 +0000 Subject: [PATCH] Add new documentation on page format. Martijn van Ooster --- doc/src/sgml/page.sgml | 322 ++++++++++++++++++++++++++++++----------- 1 file changed, 234 insertions(+), 88 deletions(-) diff --git a/doc/src/sgml/page.sgml b/doc/src/sgml/page.sgml index bb82142e611..7551085dc94 100644 --- a/doc/src/sgml/page.sgml +++ b/doc/src/sgml/page.sgml @@ -22,9 +22,13 @@ refers to data that is stored in <productname>PostgreSQL</productname> tables. </para> <para> -<xref linkend="page-table"> shows how pages in both normal <productname>PostgreSQL</productname> tables - and <productname>PostgreSQL</productname> indexes -(e.g., a B-tree index) are structured. + +<xref linkend="page-table"> shows how pages in both normal + <productname>PostgreSQL</productname> tables and + <productname>PostgreSQL</productname> indexes (e.g., a B-tree index) +are structured. This structure is also used for TOAST tables and sequences. +There are five parts to each page. + </para> <table tocentry="1" id="page-table"> @@ -43,113 +47,255 @@ Item <tbody> <row> -<entry>itemPointerData</entry> -</row> - -<row> -<entry>filler</entry> + <entry>PageHeaderData</entry> + <entry>20 bytes long. Contains general information about the page that is needed to access it.</entry> </row> <row> -<entry>itemData...</entry> +<entry>itemPointerData</entry> +<entry>List of (offset,length) pairs pointing to the actual items.</entry> </row> <row> -<entry>Unallocated Space</entry> +<entry>Free space</entry> +<entry>The unallocated space. All new tuples are allocated from here, generally from the end.</entry> </row> <row> -<entry>ItemContinuationData</entry> +<entry>items</entry> +<entry>The actual items themselves. Different access methods store different data here.</entry> </row> <row> <entry>Special Space</entry> +<entry>Access method specific data. 
Different methods store different data. Unused by normal tables.</entry> </row> -<row> -<entry><quote>ItemData 2</quote></entry> -</row> +</tbody> +</tgroup> +</table> -<row> -<entry><quote>ItemData 1</quote></entry> -</row> + <para> -<row> -<entry>ItemIdData</entry> -</row> + The first 20 bytes of each page consist of a page header + (PageHeaderData). Its format is detailed in <xref + linkend="pageheaderdata-table">. The first two fields hold WAL-related + information. They are followed by three 2-byte integer fields + (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and + <firstterm>special</firstterm>). These represent byte offsets to the start + of unallocated space, to the end of unallocated space, and to the start of + the special space. + + </para> + + <table tocentry="1" id="pageheaderdata-table"> + <title>PageHeaderData Layout</title> + <titleabbrev>PageHeaderData Layout</titleabbrev> + <tgroup cols="4"> + <thead> + <row> + <entry>Field</entry> + <entry>Type</entry> + <entry>Length</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry>pd_lsn</entry> + <entry>XLogRecPtr</entry> + <entry>8 bytes</entry> + <entry>LSN: next byte after last byte of xlog</entry> + </row> + <row> + <entry>pd_sui</entry> + <entry>StartUpID</entry> + <entry>4 bytes</entry> + <entry>SUI of last change (currently used by the heap AM only)</entry> + </row> + <row> + <entry>pd_lower</entry> + <entry>LocationIndex</entry> + <entry>2 bytes</entry> + <entry>Offset to start of free space.</entry> + </row> + <row> + <entry>pd_upper</entry> + <entry>LocationIndex</entry> + <entry>2 bytes</entry> + <entry>Offset to end of free space.</entry> + </row> + <row> + <entry>pd_special</entry> + <entry>LocationIndex</entry> + <entry>2 bytes</entry> + <entry>Offset to start of special space.</entry> + </row> + <row> + <entry>pd_opaque</entry> + <entry>OpaqueData</entry> + <entry>2 bytes</entry> + <entry>AM-generic information. 
Currently just stores the page size.</entry> + </row> + </tbody> + </tgroup> + </table> -<row> -<entry>PageHeaderData</entry> -</row> + <para> + Special space is a region at the end of the page that is allocated at page + initialization time and contains information specific to an access method. + The last 2 bytes of the page header, <firstterm>opaque</firstterm>, + currently store only the page size. Page size is stored in each page + because frames in the buffer pool may be subdivided into equal-sized pages + on a frame-by-frame basis within a table (is this true? - mvo). -</tbody> -</tgroup> -</table> + </para> -<!-- -.\" Running -.\" .q .../bin/dumpbpages -.\" or -.\" .q .../src/support/dumpbpages -.\" as the postgres superuser -.\" with the file paths associated with -.\" (heap or B-tree index) classes, -.\" .q .../data/base/<database-name>/<class-name>, -.\" will display the page structure used by the classes. -.\" Specifying the -.\" .q -r -.\" flag will cause the classes to be -.\" treated as heap classes and for more information to be displayed. ---> + <para> -<para> -The first 8 bytes of each page consists of a page header -(PageHeaderData). -Within the header, the first three 2-byte integer fields -(<firstterm>lower</firstterm>, -<firstterm>upper</firstterm>, -and -<firstterm>special</firstterm>) -represent byte offsets to the start of unallocated space, to the end -of unallocated space, and to the start of <firstterm>special space</firstterm>. -Special space is a region at the end of the page that is allocated at -page initialization time and contains information specific to an -access method. The last 2 bytes of the page header, -<firstterm>opaque</firstterm>, -encode the page size and information on the internal fragmentation of -the page. Page size is stored in each page because frames in the -buffer pool may be subdivided into equal sized pages on a frame by -frame basis within a table. 
The internal fragmentation information is -used to aid in determining when page reorganization should occur. -</para> + Following the page header are item identifiers + (<firstterm>ItemIdData</firstterm>). New item identifiers are allocated + from the first four bytes of unallocated space. Because an item + identifier is never moved until it is freed, its index may be used to + indicate the location of an item on a page. In fact, every pointer to an + item (<firstterm>ItemPointer</firstterm>, also known as + <firstterm>CTID</firstterm>) created by + <productname>PostgreSQL</productname> consists of a frame number and an + index of an item identifier. An item identifier contains a byte offset to + the start of an item, its length in bytes, and a set of attribute bits + that affect its interpretation. -<para> -Following the page header are item identifiers -(<firstterm>ItemIdData</firstterm>). -New item identifiers are allocated from the first four bytes of -unallocated space. Because an item identifier is never moved until it -is freed, its index may be used to indicate the location of an item on -a page. In fact, every pointer to an item -(<firstterm>ItemPointer</firstterm>) -created by <productname>PostgreSQL</productname> consists of a frame number and an index of an item -identifier. An item identifier contains a byte-offset to the start of -an item, its length in bytes, and a set of attribute bits which affect -its interpretation. -</para> + </para> -<para> -The items themselves are stored in space allocated backwards from -the end of unallocated space. Usually, the items are not interpreted. -However when the item is too long to be placed on a single page or -when fragmentation of the item is desired, the item is divided and -each piece is handled as distinct items in the following manner. The -first through the next to last piece are placed in an item -continuation structure -(<firstterm>ItemContinuationData</firstterm>). 
-This structure contains -itemPointerData -which points to the next piece and the piece itself. The last piece -is handled normally. -</para> + <para> + + The items themselves are stored in space allocated backwards from the end + of unallocated space. The exact structure varies depending on what the + table is to contain. Sequences and tables both use a structure named + <firstterm>HeapTupleHeaderData</firstterm>, described below. + + </para> + + <para> + + The final section is the "special section", which may contain anything the + access method wishes to store. Ordinary tables do not use this at all + (indicated by setting the offset to the page size). + + </para> + + <para> + + All tuples are structured the same way: a header of around 31 bytes + followed by an optional null bitmask and the data. The header is detailed + below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is + only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the + <firstterm>t_infomask</firstterm>. If it is present, it takes up the space + between the end of the header and the beginning of the data, as indicated + by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit + indicates not-null and a 0 bit indicates null. 
+ + </para> + + <table tocentry="1" id="heaptupleheaderdata-table"> + <title>HeapTupleHeaderData Layout</title> + <titleabbrev>HeapTupleHeaderData Layout</titleabbrev> + <tgroup cols="4"> + <thead> + <row> + <entry>Field</entry> + <entry>Type</entry> + <entry>Length</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry>t_oid</entry> + <entry>Oid</entry> + <entry>4 bytes</entry> + <entry>OID of this tuple</entry> + </row> + <row> + <entry>t_cmin</entry> + <entry>CommandId</entry> + <entry>4 bytes</entry> + <entry>insert CID stamp</entry> + </row> + <row> + <entry>t_cmax</entry> + <entry>CommandId</entry> + <entry>4 bytes</entry> + <entry>delete CID stamp</entry> + </row> + <row> + <entry>t_xmin</entry> + <entry>TransactionId</entry> + <entry>4 bytes</entry> + <entry>insert XID stamp</entry> + </row> + <row> + <entry>t_xmax</entry> + <entry>TransactionId</entry> + <entry>4 bytes</entry> + <entry>delete XID stamp</entry> + </row> + <row> + <entry>t_ctid</entry> + <entry>ItemPointerData</entry> + <entry>6 bytes</entry> + <entry>current TID of this or newer tuple</entry> + </row> + <row> + <entry>t_natts</entry> + <entry>int16</entry> + <entry>2 bytes</entry> + <entry>number of attributes</entry> + </row> + <row> + <entry>t_infomask</entry> + <entry>uint16</entry> + <entry>2 bytes</entry> + <entry>Various flags</entry> + </row> + <row> + <entry>t_hoff</entry> + <entry>uint8</entry> + <entry>1 byte</entry> + <entry>length of tuple header. Also offset of data.</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + + All the details may be found in src/include/storage/bufpage.h. + + </para> + + <para> + + Interpreting the actual data can only be done with information obtained + from other tables, mostly <firstterm>pg_attribute</firstterm>. The + particular fields are <firstterm>attlen</firstterm> and + <firstterm>attalign</firstterm>. 
There is no way to directly get a + particular attribute, except when there are only fixed width fields and no + NULLs. All this trickery is wrapped up in the functions + <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm> + and <firstterm>heap_getsysattr</firstterm>. + + </para> + <para> + To read the data you need to examine each attribute in turn. First check + whether the field is NULL according to the null bitmap. If it is, go to + the next. Then make sure you have the right alignment. If the field is a + fixed width field, then all the bytes are simply placed. If it's a + variable length field (attlen == -1) then it's a bit more complicated, + using the variable length structure <firstterm>varattrib</firstterm>. + Depending on the flags, the data may be either inline, compressed or in + another table (TOAST). + + </para> </chapter> -- GitLab
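The attribute walk described in the patch's final paragraph can be sketched in C. This is a minimal illustration, not the PostgreSQL implementation: `AttrDesc`, `attr_is_null`, `align_up`, and `attr_offset` are hypothetical names introduced here, the mapping of `attalign` codes to byte counts is an assumption, and variable-length fields (`attlen == -1`) are deliberately left out of scope. It only shows how the null bitmask (1 = not null, 0 = null) and per-attribute alignment combine to locate a fixed-width field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pg_attribute fields the text names. */
typedef struct {
    int  attlen;    /* fixed width in bytes, or -1 for variable length */
    char attalign;  /* assumed codes: 'c' = 1, 's' = 2, 'i' = 4, 'd' = 8 */
} AttrDesc;

/* In the null bitmask a 1 bit means "not null" and a 0 bit means "null". */
static int attr_is_null(const uint8_t *bits, int attno)
{
    return !(bits[attno >> 3] & (1 << (attno & 7)));
}

/* Round an offset up to the alignment implied by attalign. */
static size_t align_up(size_t off, char attalign)
{
    size_t a;
    switch (attalign) {
        case 's': a = 2; break;
        case 'i': a = 4; break;
        case 'd': a = 8; break;
        default:  a = 1; break;
    }
    return (off + a - 1) & ~(a - 1);
}

/*
 * Walk the attributes in order and return the offset of attribute
 * `target`, measured from the start of the data area (the position
 * that t_hoff points at).  Pass nullbits == NULL when the tuple has
 * no null bitmask.  Returns -1 if the target is NULL, and -2 if a
 * variable-length attribute is met at or before the target, since
 * this sketch does not decode those.
 */
static long attr_offset(const AttrDesc *atts, int natts,
                        const uint8_t *nullbits, int target)
{
    size_t off = 0;

    assert(target >= 0 && target < natts);
    for (int i = 0; i <= target; i++) {
        if (nullbits != NULL && attr_is_null(nullbits, i)) {
            if (i == target)
                return -1;
            continue;               /* a NULL attribute occupies no space */
        }
        if (atts[i].attlen < 0)
            return -2;              /* variable length: out of scope here */
        off = align_up(off, atts[i].attalign);
        if (i == target)
            return (long) off;
        off += (size_t) atts[i].attlen;
    }
    return -1;                      /* not reached */
}
```

A caller would add the returned offset to the address of the tuple plus `t_hoff` to reach the field's bytes. Because a NULL attribute takes no space, the offset of any later attribute depends on the bitmask, which is why the text says a particular attribute cannot be reached directly unless every field is fixed width and none is NULL.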