-
Tom Lane authored
After further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
Tom Lane authoredAfter further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
gistsplit.c 23.75 KiB
/*-------------------------------------------------------------------------
*
* gistsplit.c
* Multi-column page splitting algorithm
*
* This file is concerned with making good page-split decisions in multi-column
* GiST indexes. The opclass-specific picksplit functions can only be expected
* to produce answers based on a single column. We first run the picksplit
* function for column 1; then, if there are more columns, we check if any of
* the tuples are "don't cares" so far as the column 1 split is concerned
* (that is, they could go to either side for no additional penalty). If so,
* we try to redistribute those tuples on the basis of the next column.
* Repeat till we're out of columns.
*
* gistSplitByKey() is the entry point to this file.
*
*
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* src/backend/access/gist/gistsplit.c
*
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include "access/gist_private.h"
#include "utils/rel.h"
typedef struct
{
OffsetNumber *entries;
int len;
Datum *attr;
bool *isnull;
bool *dontcare;
} GistSplitUnion;
/*
* Form unions of subkeys in itvec[] entries listed in gsvp->entries[],
* ignoring any tuples that are marked in gsvp->dontcare[]. Subroutine for
* gistunionsubkey.
*/
static void
gistunionsubkeyvec(GISTSTATE *giststate, IndexTuple *itvec,
GistSplitUnion *gsvp)
{
IndexTuple *cleanedItVec;
int i,
cleanedLen = 0;
cleanedItVec = (IndexTuple *) palloc(sizeof(IndexTuple) * gsvp->len);
for (i = 0; i < gsvp->len; i++)
{
if (gsvp->dontcare && gsvp->dontcare[gsvp->entries[i]])
continue;
cleanedItVec[cleanedLen++] = itvec[gsvp->entries[i] - 1];
}
gistMakeUnionItVec(giststate, cleanedItVec, cleanedLen,
gsvp->attr, gsvp->isnull);
pfree(cleanedItVec);
}
/*