From c452ddcac67dc2eb52c2744cdc0ebe5530ab591f Mon Sep 17 00:00:00 2001 From: "Thomas G. Lockhart" <lockhart@fourpalms.org> Date: Sat, 4 Apr 1998 16:32:01 +0000 Subject: [PATCH] Convert body of chapter to SGML. Was embedded text from original doc. --- doc/src/sgml/geqo.sgml | 402 +++++++++++++++++++++++++++-------------- 1 file changed, 266 insertions(+), 136 deletions(-) diff --git a/doc/src/sgml/geqo.sgml b/doc/src/sgml/geqo.sgml index 725504c28fb..61abf13ca43 100644 --- a/doc/src/sgml/geqo.sgml +++ b/doc/src/sgml/geqo.sgml @@ -3,78 +3,103 @@ <Author> <FirstName>Martin</FirstName> <SurName>Utesch</SurName> +<Affiliation> +<Orgname> +University of Mining and Technology +</Orgname> +<Orgdiv> +Institute of Automatic Control +</Orgdiv> +<Address> +<City> +Freiberg +</City> +<Country> +Germany +</Country> +</Address> +</Affiliation> </Author> +<Date>1997-10-02</Date> </DocInfo> <Title>Genetic Query Optimization in Database Systems</Title> <Para> -<ProgramListing> -<ULink url="utesch@aut.tu-freiberg.de">Martin Utesch</ULink> - - Institute of Automatic Control - University of Mining and Technology - Freiberg, Germany - - 02/10/1997 - +<Note> +<Title>Author</Title> +<Para> +Written by <ULink url="utesch@aut.tu-freiberg.de">Martin Utesch</ULink> +for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany. +</Para> +</Note> -1.) Query Handling as a Complex Optimization Problem -==================================================== +<Sect1> +<Title>Query Handling as a Complex Optimization Problem</Title> +<Para> Among all relational operators the most difficult one to process and -optimize is the JOIN. The number of alternative plans to answer a query -grows exponentially with the number of JOINs included in it. 
Further -optimization effort is caused by the support of a variety of *JOIN -methods* (e.g., nested loop, index scan, merge join in Postgres) to -process individual JOINs and a diversity of *indices* (e.g., r-tree, -b-tree, hash in Postgres) as access paths for relations. - - The current Postgres optimizer implementation performs a *near- -exhaustive search* over the space of alternative strategies. This query +optimize is the <FirstTerm>join</FirstTerm>. The number of alternative plans to answer a query +grows exponentially with the number of <Command>join</Command>s included in it. Further +optimization effort is caused by the support of a variety of <FirstTerm>join methods</FirstTerm> + (e.g., nested loop, index scan, merge join in <ProductName>Postgres</ProductName>) to +process individual <Command>join</Command>s and a diversity of <FirstTerm>indices</FirstTerm> (e.g., r-tree, +b-tree, hash in <ProductName>Postgres</ProductName>) as access paths for relations. + +<Para> + The current <ProductName>Postgres</ProductName> optimizer implementation performs a <FirstTerm>near- +exhaustive search</FirstTerm> over the space of alternative strategies. This query optimization technique is inadequate to support database application domains that involve the need for extensive queries, such as artificial intelligence. +<Para> The Institute of Automatic Control at the University of Mining and Technology, in Freiberg, Germany, encountered the described problems as its -folks wanted to take the Postgres DBMS as the backend for a decision +folks wanted to take the <ProductName>Postgres</ProductName> DBMS as the backend for a decision support knowledge based system for the maintenance of an electrical -power grid. The DBMS needed to handle large JOIN queries for the +power grid. The DBMS needed to handle large <Command>join</Command> queries for the inference machine of the knowledge based system. 
+<Para> Performance difficulties within exploring the space of possible query plans arose the demand for a new optimization technique being developed. - In the following we propose the implementation of a *Genetic -Algorithm* as an option for the database query optimization problem. +<Para> + In the following we propose the implementation of a <FirstTerm>Genetic Algorithm</FirstTerm> + as an option for the database query optimization problem. -2.) Genetic Algorithms (GA) -=========================== +<Sect1> +<Title>Genetic Algorithms (<Acronym>GA</Acronym>)</Title> - The GA is a heuristic optimization method which operates through +<Para> + The <Acronym>GA</Acronym> is a heuristic optimization method which operates through determined, randomized search. The set of possible solutions for the -optimization problem is considered as a *population* of *individuals*. +optimization problem is considered as a <FirstTerm>population</FirstTerm> of <FirstTerm>individuals</FirstTerm>. The degree of adaption of an individual to its environment is specified -by its *fitness*. +by its <FirstTerm>fitness</FirstTerm>. +<Para> The coordinates of an individual in the search space are represented -by *chromosomes*, in essence a set of character strings. A *gene* is a +by <FirstTerm>chromosomes</FirstTerm>, in essence a set of character strings. A <FirstTerm>gene</FirstTerm> is a subsection of a chromosome which encodes the value of a single parameter -being optimized. Typical encodings for a gene could be *binary* or -*integer*. +being optimized. Typical encodings for a gene could be <FirstTerm>binary</FirstTerm> or +<FirstTerm>integer</FirstTerm>. 
- Through simulation of the evolutionary operations *recombination*, -*mutation*, and *selection* new generations of search points are found +<Para> + Through simulation of the evolutionary operations <FirstTerm>recombination</FirstTerm>, +<FirstTerm>mutation</FirstTerm>, and <FirstTerm>selection</FirstTerm> new generations of search points are found that show a higher average fitness than their ancestors. - According to the "comp.ai.genetic" FAQ it cannot be stressed too -strongly that a GA is not a pure random search for a solution to a -problem. A GA uses stochastic processes, but the result is distinctly +<Para> + According to the "comp.ai.genetic" <Acronym>FAQ</Acronym> it cannot be stressed too +strongly that a <Acronym>GA</Acronym> is not a pure random search for a solution to a +problem. A <Acronym>GA</Acronym> uses stochastic processes, but the result is distinctly non-random (better than random). -Structured Diagram of a GA: +<ProgramListing> +Structured Diagram of a <Acronym>GA</Acronym>: --------------------------- P(t) generation of ancestors at a time t @@ -101,128 +126,233 @@ P''(t) generation of descendants at a time t | +-------------------------------------+ | | t := t + 1 | +===+=====================================+ +</ProgramListing> +<Sect1> +<Title>Genetic Query Optimization (<Acronym>GEQO</Acronym>) in Postgres</Title> -3.) Genetic Query Optimization (GEQO) in PostgreSQL -=================================================== - - The GEQO module is intended for the solution of the query -optimization problem similar to a traveling salesman problem (TSP). +<Para> + The <Acronym>GEQO</Acronym> module is intended for the solution of the query +optimization problem similar to a traveling salesman problem (<Acronym>TSP</Acronym>). Possible query plans are encoded as integer strings. Each string -represents the JOIN order from one relation of the query to the next. -E. 
g., the query tree /\ - /\ 2 - /\ 3 - 4 1 is encoded by the integer string '4-1-3-2', +represents the <Command>join</Command> order from one relation of the query to the next. +E. g., the query tree +<ProgramListing> + /\ + /\ 2 + /\ 3 + 4 1 +</ProgramListing> +is encoded by the integer string '4-1-3-2', which means, first join relation '4' and '1', then '3', and -then '2', where 1, 2, 3, 4 are relids in PostgreSQL. +then '2', where 1, 2, 3, 4 are relids in <ProductName>Postgres</ProductName>. - Parts of the GEQO module are adapted from D. Whitley's Genitor +<Para> + Parts of the <Acronym>GEQO</Acronym> module are adapted from D. Whitley's Genitor algorithm. - Specific characteristics of the GEQO implementation in PostgreSQL +<Para> + Specific characteristics of the <Acronym>GEQO</Acronym> implementation in <ProductName>Postgres</ProductName> are: -o usage of a *steady state* GA (replacement of the least fit +<ItemizedList Mark="bullet" Spacing="compact"> +<ListItem> +<Para> +Usage of a <FirstTerm>steady state</FirstTerm> <Acronym>GA</Acronym> (replacement of the least fit individuals in a population, not whole-generational replacement) allows fast convergence towards improved query plans. This is essential for query handling with reasonable time; +</Para> +</ListItem> -o usage of *edge recombination crossover* which is especially suited - to keep edge losses low for the solution of the TSP by means of a GA; +<ListItem> +<Para> +Usage of <FirstTerm>edge recombination crossover</FirstTerm> which is especially suited + to keep edge losses low for the solution of the <Acronym>TSP</Acronym> by means of a <Acronym>GA</Acronym>; +</Para> +</ListItem> -o mutation as genetic operator is deprecated so that no repair - mechanisms are needed to generate legal TSP tours. +<ListItem> +<Para> +Mutation as genetic operator is deprecated so that no repair + mechanisms are needed to generate legal <Acronym>TSP</Acronym> tours. 
+</Para> +</ListItem> +</ItemizedList> - The GEQO module gives the following benefits to the PostgreSQL DBMS -compared to the Postgres query optimizer implementation: +<Para> + The <Acronym>GEQO</Acronym> module gives the following benefits to the <ProductName>Postgres</ProductName> DBMS +compared to the <ProductName>Postgres</ProductName> query optimizer implementation: -o handling of large JOIN queries through non-exhaustive search; +<ItemizedList Mark="bullet" Spacing="compact"> +<ListItem> +<Para> +Handling of large <Command>join</Command> queries through non-exhaustive search; +</Para> +</ListItem> -o improved cost size approximation of query plans since no longer - plan merging is needed (the GEQO module evaluates the cost for a +<ListItem> +<Para> +Improved cost size approximation of query plans since no longer + plan merging is needed (the <Acronym>GEQO</Acronym> module evaluates the cost for a query plan as an individual). +</Para> +</ListItem> +</ItemizedList> +</Sect1> -References -========== +<Sect1> +<Title>Future Implementation Tasks for <ProductName>Postgres</ProductName> <Acronym>GEQO</Acronym></Title> -J. Heitk"otter, D. Beasley: ---------------------------- - "The Hitch-Hicker's Guide to Evolutionary Computation", - FAQ in 'comp.ai.genetic', - 'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html' - -Z. Fong: --------- - "The Design and Implementation of the Postgres Query Optimizer", - file 'planner/Report.ps' in the 'postgres-papers' distribution - -R. Elmasri, S. Navathe: ------------------------ - "Fundamentals of Database Systems", - The Benjamin/Cummings Pub., Inc. 
- - -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= -* Things left to done for the PostgreSQL * -= Genetic Query Optimization (GEQO) = -* module implementation * -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= -* Martin Utesch * Institute of Automatic Control * -= = University of Mining and Technology = -* utesch@aut.tu-freiberg.de * Freiberg, Germany * -=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= - - -1.) Basic Improvements -=============================================================== - -a) improve freeing of memory when query is already processed: -------------------------------------------------------------- -with large JOIN queries the computing time spent for the genetic query -optimization seems to be a mere *fraction* of the time Postgres -needs for freeing memory via routine 'MemoryContextFree', -file 'backend/utils/mmgr/mcxt.c'; -debugging showed that it get stucked in a loop of routine -'OrderedElemPop', file 'backend/utils/mmgr/oset.c'; -the same problems arise with long queries when using the normal -Postgres query optimization algorithm; - -b) improve genetic algorithm parameter settings: ------------------------------------------------- -file 'backend/optimizer/geqo/geqo_params.c', routines -'gimme_pool_size' and 'gimme_number_generations'; +<Sect2> +<Title>Basic Improvements</Title> + +<Sect3> +<Title>Improve freeing of memory when query is already processed</Title> + +<Para> +With large <Command>join</Command> queries the computing time spent for the genetic query +optimization seems to be a mere <Emphasis>fraction</Emphasis> of the time + <ProductName>Postgres</ProductName> +needs for freeing memory via routine <Function>MemoryContextFree</Function>, +file <FileName>backend/utils/mmgr/mcxt.c</FileName>. +Debugging showed that it gets stuck in a loop of routine +<Function>OrderedElemPop</Function>, file <FileName>backend/utils/mmgr/oset.c</FileName>. 
+The same problems arise with long queries when using the normal +<ProductName>Postgres</ProductName> query optimization algorithm. + +<Sect3> +<Title>Improve genetic algorithm parameter settings</Title> + +<Para> +In file <FileName>backend/optimizer/geqo/geqo_params.c</FileName>, routines +<Function>gimme_pool_size</Function> and <Function>gimme_number_generations</Function>, we have to find a compromise for the parameter settings to satisfy two competing demands: -1. optimality of the query plan -2. computing time - -c) find better solution for integer overflow: ---------------------------------------------- -file 'backend/optimizer/geqo/geqo_eval.c', routine -'geqo_joinrel_size'; -the present hack for MAXINT overflow is to set the Postgres integer -value of 'rel->size' to its logarithm; -modifications of 'struct Rel' in 'backend/nodes/relation.h' will -surely have severe impacts on the whole PostgreSQL implementation. - -d) find solution for exhausted memory: --------------------------------------- -that may occur with more than 10 relations involved in a query, -file 'backend/optimizer/geqo/geqo_eval.c', routine -'gimme_tree' which is recursively called; -maybe I forgot something to be freed correctly, but I dunno what; -of course the 'rel' data structure of the JOIN keeps growing and -growing the more relations are packed into it; -suggestions are welcome :-( - - -2.) 
Further Improvements -=============================================================== -Enable bushy query tree processing within PostgreSQL; +<ItemizedList Spacing="compact"> +<ListItem> +<Para> +Optimality of the query plan +</Para> +</ListItem> +<ListItem> +<Para> +Computing time +</Para> +</ListItem> +</ItemizedList> + +<Sect3> +<Title>Find better solution for integer overflow</Title> + +<Para> +In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine +<Function>geqo_joinrel_size</Function>, +the present hack for MAXINT overflow is to set the <ProductName>Postgres</ProductName> integer +value of <StructField>rel->size</StructField> to its logarithm. +Modifications of <StructName>Rel</StructName> in <FileName>backend/nodes/relation.h</FileName> will +surely have severe impacts on the whole <ProductName>Postgres</ProductName> implementation. + +<Sect3> +<Title>Find solution for exhausted memory</Title> + +<Para> +Memory exhaustion may occur with more than 10 relations involved in a query. +In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine +<Function>gimme_tree</Function> is recursively called. +Maybe I forgot something to be freed correctly, but I dunno what. +Of course the <StructName>rel</StructName> data structure of the <Command>join</Command> keeps growing and +growing the more relations are packed into it. +Suggestions are welcome :-( + + +<Sect2> +<Title>Further Improvements</Title> + +<Para> +Enable bushy query tree processing within <ProductName>Postgres</ProductName>; that may improve the quality of query plans. -</ProgramListing> +<BIBLIOGRAPHY> +<TITLE> +References +</TITLE> +<PARA>Reference information for <Acronym>GEQO</Acronym> algorithms. 
+</PARA> +<BIBLIOENTRY> + +<BOOKBIBLIO> +<TITLE> +The Hitch-Hiker's Guide to Evolutionary Computation +</TITLE> +<AUTHORGROUP> +<AUTHOR> +<FIRSTNAME>Jörg</FIRSTNAME> +<SURNAME>Heitkötter</SURNAME> +</AUTHOR> +<AUTHOR> +<FIRSTNAME>David</FIRSTNAME> +<SURNAME>Beasley</SURNAME> +</AUTHOR> +</AUTHORGROUP> +<PUBLISHER> +<PUBLISHERNAME> +InterNet resource +</PUBLISHERNAME> +</PUBLISHER> +<ABSTRACT> +<Para> +FAQ in <ULink url="news://comp.ai.genetic">comp.ai.genetic</ULink> +is available at <ULink url="ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html">Encore</ULink>. </Para> +</ABSTRACT> +</BOOKBIBLIO> + +<BOOKBIBLIO> +<TITLE> +The Design and Implementation of the Postgres Query Optimizer +</TITLE> +<AUTHORGROUP> +<AUTHOR> +<FIRSTNAME>Z.</FIRSTNAME> +<SURNAME>Fong</SURNAME> +</AUTHOR> +</AUTHORGROUP> +<PUBLISHER> +<PUBLISHERNAME> +University of California, Berkeley Computer Science Department +</PUBLISHERNAME> +</PUBLISHER> +<ABSTRACT> +<Para> +File <FileName>planner/Report.ps</FileName> in the 'postgres-papers' distribution. +</Para> +</ABSTRACT> +</BOOKBIBLIO> + +<BOOKBIBLIO> +<TITLE> +Fundamentals of Database Systems +</TITLE> +<AUTHORGROUP> +<AUTHOR> +<FIRSTNAME>R.</FIRSTNAME> +<SURNAME>Elmasri</SURNAME> +</AUTHOR> +<AUTHOR> +<FIRSTNAME>S.</FIRSTNAME> +<SURNAME>Navathe</SURNAME> +</AUTHOR> +</AUTHORGROUP> +<PUBLISHER> +<PUBLISHERNAME> +The Benjamin/Cummings Pub., Inc. +</PUBLISHERNAME> +</PUBLISHER> +</BOOKBIBLIO> + +</BIBLIOENTRY> +</BIBLIOGRAPHY> + </Chapter> -- GitLab