Skip to content
Snippets Groups Projects
  • Tom Lane's avatar
    c352ea2d
    Further cleanup of gistsplit.c. · c352ea2d
    Tom Lane authored
    After further reflection I was unconvinced that the existing coding is
    guaranteed to return valid union datums in every code path for multi-column
    indexes.  Fix that by forcing a gistunionsubkey() call at the end of the
    recursion.  Having done that, we can remove some clearly-redundant calls
    elsewhere.  This should be a little faster for multi-column indexes (since
    the previous coding would uselessly do such a call for each column while
    unwinding the recursion), as well as much harder to break.
    
    Also, simplify the handling of cases where one side or the other of a
    primary split contains only don't-care tuples.  The previous coding used a
    very ugly hack in removeDontCares() that essentially forced one random
    tuple to be treated as non-don't-care, providing a random initial choice of
    seed datum for the secondary split.  It seems unlikely that that method
    will give better-than-random splits.  Instead, treat such a split as
    degenerate and just let the next column determine the split, the same way
    that we handle fully degenerate cases where the two sides produce identical
    union datums.
    c352ea2d
    History
    Further cleanup of gistsplit.c.
    Tom Lane authored
    After further reflection I was unconvinced that the existing coding is
    guaranteed to return valid union datums in every code path for multi-column
    indexes.  Fix that by forcing a gistunionsubkey() call at the end of the
    recursion.  Having done that, we can remove some clearly-redundant calls
    elsewhere.  This should be a little faster for multi-column indexes (since
    the previous coding would uselessly do such a call for each column while
    unwinding the recursion), as well as much harder to break.
    
    Also, simplify the handling of cases where one side or the other of a
    primary split contains only don't-care tuples.  The previous coding used a
    very ugly hack in removeDontCares() that essentially forced one random
    tuple to be treated as non-don't-care, providing a random initial choice of
    seed datum for the secondary split.  It seems unlikely that that method
    will give better-than-random splits.  Instead, treat such a split as
    degenerate and just let the next column determine the split, the same way
    that we handle fully degenerate cases where the two sides produce identical
    union datums.
gistsplit.c 23.75 KiB
/*-------------------------------------------------------------------------
 *
 * gistsplit.c
 *	  Multi-column page splitting algorithm
 *
 * This file is concerned with making good page-split decisions in multi-column
 * GiST indexes.  The opclass-specific picksplit functions can only be expected
 * to produce answers based on a single column.  We first run the picksplit
 * function for column 1; then, if there are more columns, we check if any of
 * the tuples are "don't cares" so far as the column 1 split is concerned
 * (that is, they could go to either side for no additional penalty).  If so,
 * we try to redistribute those tuples on the basis of the next column.
 * Repeat till we're out of columns.
 *
 * gistSplitByKey() is the entry point to this file.
 *
 *
 * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/access/gist/gistsplit.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/gist_private.h"
#include "utils/rel.h"

typedef struct
{
	OffsetNumber *entries;
	int			len;
	Datum	   *attr;
	bool	   *isnull;
	bool	   *dontcare;
} GistSplitUnion;


/*
 * Form unions of subkeys in itvec[] entries listed in gsvp->entries[],
 * ignoring any tuples that are marked in gsvp->dontcare[].  Subroutine for
 * gistunionsubkey.
 */
static void
gistunionsubkeyvec(GISTSTATE *giststate, IndexTuple *itvec,
				   GistSplitUnion *gsvp)
{
	IndexTuple *cleanedItVec;
	int			i,
				cleanedLen = 0;

	cleanedItVec = (IndexTuple *) palloc(sizeof(IndexTuple) * gsvp->len);

	for (i = 0; i < gsvp->len; i++)
	{
		if (gsvp->dontcare && gsvp->dontcare[gsvp->entries[i]])
			continue;

		cleanedItVec[cleanedLen++] = itvec[gsvp->entries[i] - 1];
	}

	gistMakeUnionItVec(giststate, cleanedItVec, cleanedLen,
					   gsvp->attr, gsvp->isnull);

	pfree(cleanedItVec);
}

/*