diff --git a/src/backend/regex/README b/src/backend/regex/README
index 6c9f48315e3a2be030288d98db53427b6016db78..f08aab69e376f6e7bc0df998d2fcd20fde4a45ef 100644
--- a/src/backend/regex/README
+++ b/src/backend/regex/README
@@ -27,13 +27,14 @@ and similarly additional source files rege_*.c that are #include'd in
 regexec.  This was done to avoid exposing internal symbols globally;
 all functions not meant to be part of the library API are static.
 
-(Actually the above is a lie in one respect: there is one more global
-symbol, pg_set_regex_collation in regcomp.  It is not meant to be part of
-the API, but it has to be global because both regcomp and regexec call it.
-It'd be better to get rid of that, as well as the static variables it
-sets, in favor of keeping the needed locale state in the regex structs.
-We have not done this yet for lack of a design for how to add
-application-specific state to the structs.)
+(Actually the above is a lie in one respect: there are two more global
+symbols, pg_set_regex_collation and pg_reg_getcolor in regcomp.  These are
+not meant to be part of the API, but they have to be global because both
+regcomp and regexec call them.  It'd be better to get rid of
+pg_set_regex_collation, as well as the static variables it sets, in favor of
+keeping the needed locale state in the regex structs.  We have not done this
+yet for lack of a design for how to add application-specific state to the
+structs.)
 
 What's where in src/backend/regex/:
 
@@ -274,28 +275,65 @@ colors:
   an existing color has to be subdivided.
 
 The last two of these are handled with the "struct colordesc" array and
-the "colorchain" links in NFA arc structs.  The color map proper (that
-is, the per-character lookup array) is handled as a multi-level tree,
-with each tree level indexed by one byte of a character's value.  The
-code arranges to not have more than one copy of bottom-level tree pages
-that are all-the-same-color.
-
-Unfortunately, this design does not seem terribly efficient for common
-cases such as a tree in which all Unicode letters are colored the same,
-because there aren't that many places where we get a whole page all the
-same color, except at the end of the map.  (It also strikes me that given
-PG's current restrictions on the range of Unicode values, we could use a
-3-level rather than 4-level tree; but there's not provision for that in
-regguts.h at the moment.)
-
-A bigger problem is that it just doesn't seem very reasonable to have to
-consider each Unicode letter separately at regex parse time for a regex
-such as "\w"; more than likely, a huge percentage of those codes will
-never be seen at runtime.  We need to fix things so that locale-based
-character classes are somehow processed "symbolically" without making a
-full expansion of their contents at parse time.  This would mean that we'd
-have to be ready to call iswalpha() at runtime, but if that only happens
-for high-code-value characters, it shouldn't be a big performance hit.
+the "colorchain" links in NFA arc structs.
+
+Ideally, we'd do the first two operations using a simple linear array
+storing the current color assignment for each character code.
+Unfortunately, that's not terribly workable for large charsets such as
+Unicode.  Our solution is to divide the color map into two parts.  A simple
+linear array is used for character codes up to MAX_SIMPLE_CHR, which can be
+chosen large enough to include all popular characters (so that the
+significantly-slower code paths about to be described are seldom invoked).
+Characters above that need be considered at compile time only if they
+appear explicitly in the regex pattern.  We store each such mentioned
+character or character range as an entry in the "colormaprange" array in
+the colormap.  (Overlapping ranges are split into unique subranges, so that
+each range in the finished list needs only a single color that describes
+all its characters.)  When mapping a character above MAX_SIMPLE_CHR to a
+color at runtime, we search this list of ranges explicitly.
+
+That's still not quite enough, though, because of locale-dependent
+character classes such as [[:alpha:]].  In Unicode locales these classes
+may have thousands of entries that are above MAX_SIMPLE_CHR, and we
+certainly don't want to be searching large colormaprange arrays at runtime.
+Nor do we even want to spend the time to initialize cvec structures that
+exhaustively describe all of those characters.  Our solution is to compute
+exact per-character colors at regex compile time only up to MAX_SIMPLE_CHR.
+For characters above that, we apply the <ctype.h> or <wctype.h> lookup
+functions at runtime for each locale-dependent character class used in the
+regex pattern, constructing a bitmap that describes which classes the
+runtime character belongs to.  The per-character-range data structure
+mentioned above actually holds, for each range, a separate color entry
+for each possible combination of character class properties.  That is,
+the color map for characters above MAX_SIMPLE_CHR is really a 2-D array,
+whose rows correspond to high characters or character ranges that are
+explicitly mentioned in the regex pattern, and whose columns correspond
+to sets of the locale-dependent character classes that are used in the
+regex.
+
+As an example, given the pattern '\w\u1234[\U0001D100-\U0001D1FF]'
+(and supposing that MAX_SIMPLE_CHR is less than 0x1234), we will need
+a high color map with three rows.  One row is for the single character
+U+1234 (represented as a single-element range), one is for the range
+U+1D100..U+1D1FF, and the other row represents all remaining high
+characters.  The color map has two columns, one for characters that
+satisfy iswalnum() and one for those that don't.
+
+We build this color map in parallel with scanning the regex.  Each time
+we detect a new explicit high character (or range) or a locale-dependent
+character class, we split existing entry(s) in the high color map so that
+characters we need to be able to distinguish will have distinct entries
+that can be given separate colors.  Often, though, single entries in the
+high color map will represent very large sets of characters.
+
+If there are both explicit high characters/ranges and locale-dependent
+character classes, we may have entries in the high color map array that
+have non-WHITE colors but don't actually represent any real characters.
+(For example, in a row representing a singleton range, only one of the
+columns could possibly be a live entry; it's the one matching the actual
+locale properties for that single character.)  We don't currently make
+any effort to reclaim such colors.  In principle it could be done, but
+it's not clear that it's worth the trouble.
 
 
 Detailed semantics of an NFA
diff --git a/src/backend/regex/regc_color.c b/src/backend/regex/regc_color.c
index 8ffc8fb797d0e5340e4c968be2eeabb9af66250b..4b72894ad3a60bc33c490f2fc78bccdda6369160 100644
--- a/src/backend/regex/regc_color.c
+++ b/src/backend/regex/regc_color.c
@@ -49,10 +49,6 @@ static void
 initcm(struct vars * v,
 	   struct colormap * cm)
 {
-	int			i;
-	int			j;
-	union tree *t;
-	union tree *nextt;
 	struct colordesc *cd;
 
 	cm->magic = CMMAGIC;
@@ -64,24 +60,40 @@ initcm(struct vars * v,
 	cm->free = 0;
 
 	cd = cm->cd;				/* cm->cd[WHITE] */
+	cd->nschrs = MAX_SIMPLE_CHR - CHR_MIN + 1;
+	cd->nuchrs = 1;
 	cd->sub = NOSUB;
 	cd->arcs = NULL;
 	cd->firstchr = CHR_MIN;
-	cd->nchrs = CHR_MAX - CHR_MIN + 1;
 	cd->flags = 0;
 
-	/* upper levels of tree */
-	for (t = &cm->tree[0], j = NBYTS - 1; j > 0; t = nextt, j--)
+	cm->locolormap = (color *)
+		MALLOC((MAX_SIMPLE_CHR - CHR_MIN + 1) * sizeof(color));
+	if (cm->locolormap == NULL)
 	{
-		nextt = t + 1;
-		for (i = BYTTAB - 1; i >= 0; i--)
-			t->tptr[i] = nextt;
+		CERR(REG_ESPACE);
+		cm->cmranges = NULL;	/* prevent failure during freecm */
+		cm->hicolormap = NULL;
+		return;
 	}
-	/* bottom level is solid white */
-	t = &cm->tree[NBYTS - 1];
-	for (i = BYTTAB - 1; i >= 0; i--)
-		t->tcolor[i] = WHITE;
-	cd->block = t;
+	/* this memset relies on WHITE being zero: */
+	memset(cm->locolormap, WHITE,
+		   (MAX_SIMPLE_CHR - CHR_MIN + 1) * sizeof(color));
+
+	memset(cm->classbits, 0, sizeof(cm->classbits));
+	cm->numcmranges = 0;
+	cm->cmranges = NULL;
+	cm->maxarrayrows = 4;		/* arbitrary initial allocation */
+	cm->hiarrayrows = 1;		/* but we have only one row/col initially */
+	cm->hiarraycols = 1;
+	cm->hicolormap = (color *) MALLOC(cm->maxarrayrows * sizeof(color));
+	if (cm->hicolormap == NULL)
+	{
+		CERR(REG_ESPACE);
+		return;
+	}
+	/* initialize the "all other characters" row to WHITE */
+	cm->hicolormap[0] = WHITE;
 }
 
 /*
@@ -90,117 +102,67 @@ initcm(struct vars * v,
 static void
 freecm(struct colormap * cm)
 {
-	size_t		i;
-	union tree *cb;
-
 	cm->magic = 0;
-	if (NBYTS > 1)
-		cmtreefree(cm, cm->tree, 0);
-	for (i = 1; i <= cm->max; i++)		/* skip WHITE */
-		if (!UNUSEDCOLOR(&cm->cd[i]))
-		{
-			cb = cm->cd[i].block;
-			if (cb != NULL)
-				FREE(cb);
-		}
 	if (cm->cd != cm->cdspace)
 		FREE(cm->cd);
+	if (cm->locolormap != NULL)
+		FREE(cm->locolormap);
+	if (cm->cmranges != NULL)
+		FREE(cm->cmranges);
+	if (cm->hicolormap != NULL)
+		FREE(cm->hicolormap);
 }
 
 /*
- * cmtreefree - free a non-terminal part of a colormap tree
+ * pg_reg_getcolor - slow case of GETCOLOR()
  */
-static void
-cmtreefree(struct colormap * cm,
-		   union tree * tree,
-		   int level)			/* level number (top == 0) of this block */
+color
+pg_reg_getcolor(struct colormap * cm, chr c)
 {
-	int			i;
-	union tree *t;
-	union tree *fillt = &cm->tree[level + 1];
-	union tree *cb;
-
-	assert(level < NBYTS - 1);	/* this level has pointers */
-	for (i = BYTTAB - 1; i >= 0; i--)
+	int			rownum,
+				colnum,
+				low,
+				high;
+
+	/* Should not be used for chrs in the locolormap */
+	assert(c > MAX_SIMPLE_CHR);
+
+	/*
+	 * Find which row it's in.  The colormapranges are in order, so we can use
+	 * binary search.
+	 */
+	rownum = 0;					/* if no match, use array row zero */
+	low = 0;
+	high = cm->numcmranges;
+	while (low < high)
 	{
-		t = tree->tptr[i];
-		assert(t != NULL);
-		if (t != fillt)
+		int			middle = low + (high - low) / 2;
+		const colormaprange *cmr = &cm->cmranges[middle];
+
+		if (c < cmr->cmin)
+			high = middle;
+		else if (c > cmr->cmax)
+			low = middle + 1;
+		else
 		{
-			if (level < NBYTS - 2)
-			{					/* more pointer blocks below */
-				cmtreefree(cm, t, level + 1);
-				FREE(t);
-			}
-			else
-			{					/* color block below */
-				cb = cm->cd[t->tcolor[0]].block;
-				if (t != cb)	/* not a solid block */
-					FREE(t);
-			}
+			rownum = cmr->rownum;		/* found a match */
+			break;
 		}
 	}
-}
 
-/*
- * setcolor - set the color of a character in a colormap
- */
-static color					/* previous color */
-setcolor(struct colormap * cm,
-		 chr c,
-		 color co)
-{
-	uchr		uc = c;
-	int			shift;
-	int			level;
-	int			b;
-	int			bottom;
-	union tree *t;
-	union tree *newt;
-	union tree *fillt;
-	union tree *lastt;
-	union tree *cb;
-	color		prev;
-
-	assert(cm->magic == CMMAGIC);
-	if (CISERR() || co == COLORLESS)
-		return COLORLESS;
-
-	t = cm->tree;
-	for (level = 0, shift = BYTBITS * (NBYTS - 1); shift > 0;
-		 level++, shift -= BYTBITS)
-	{
-		b = (uc >> shift) & BYTMASK;
-		lastt = t;
-		t = lastt->tptr[b];
-		assert(t != NULL);
-		fillt = &cm->tree[level + 1];
-		bottom = (shift <= BYTBITS) ? 1 : 0;
-		cb = (bottom) ? cm->cd[t->tcolor[0]].block : fillt;
-		if (t == fillt || t == cb)
-		{						/* must allocate a new block */
-			newt = (union tree *) MALLOC((bottom) ?
-								sizeof(struct colors) : sizeof(struct ptrs));
-			if (newt == NULL)
-			{
-				CERR(REG_ESPACE);
-				return COLORLESS;
-			}
-			if (bottom)
-				memcpy(VS(newt->tcolor), VS(t->tcolor),
-					   BYTTAB * sizeof(color));
-			else
-				memcpy(VS(newt->tptr), VS(t->tptr),
-					   BYTTAB * sizeof(union tree *));
-			t = newt;
-			lastt->tptr[b] = t;
-		}
+	/*
+	 * Find which column it's in --- this is all locale-dependent.
+	 */
+	if (cm->hiarraycols > 1)
+	{
+		colnum = cclass_column_index(cm, c);
+		return cm->hicolormap[rownum * cm->hiarraycols + colnum];
+	}
+	else
+	{
+		/* fast path if no relevant cclasses */
+		return cm->hicolormap[rownum];
 	}
-
-	b = uc & BYTMASK;
-	prev = t->tcolor[b];
-	t->tcolor[b] = co;
-	return prev;
 }
 
 /*
@@ -216,7 +178,7 @@ maxcolor(struct colormap * cm)
 }
 
 /*
- * newcolor - find a new color (must be subject of setcolor at once)
+ * newcolor - find a new color (must be assigned at once)
  * Beware:	may relocate the colordescs.
  */
 static color					/* COLORLESS for error */
@@ -278,12 +240,12 @@ newcolor(struct colormap * cm)
 		cd = &cm->cd[cm->max];
 	}
 
-	cd->nchrs = 0;
+	cd->nschrs = 0;
+	cd->nuchrs = 0;
 	cd->sub = NOSUB;
 	cd->arcs = NULL;
 	cd->firstchr = CHR_MIN;		/* in case never set otherwise */
 	cd->flags = 0;
-	cd->block = NULL;
 
 	return (color) (cd - cm->cd);
 }
@@ -305,13 +267,9 @@ freecolor(struct colormap * cm,
 
 	assert(cd->arcs == NULL);
 	assert(cd->sub == NOSUB);
-	assert(cd->nchrs == 0);
+	assert(cd->nschrs == 0);
+	assert(cd->nuchrs == 0);
 	cd->flags = FREECOL;
-	if (cd->block != NULL)
-	{
-		FREE(cd->block);
-		cd->block = NULL;		/* just paranoia */
-	}
 
 	if ((size_t) co == cm->max)
 	{
@@ -354,17 +312,25 @@ static color
 pseudocolor(struct colormap * cm)
 {
 	color		co;
+	struct colordesc *cd;
 
 	co = newcolor(cm);
 	if (CISERR())
 		return COLORLESS;
-	cm->cd[co].nchrs = 1;
-	cm->cd[co].flags = PSEUDO;
+	cd = &cm->cd[co];
+	cd->nschrs = 0;
+	cd->nuchrs = 1;				/* pretend it is in the upper map */
+	cd->sub = NOSUB;
+	cd->arcs = NULL;
+	cd->firstchr = CHR_MIN;
+	cd->flags = PSEUDO;
 	return co;
 }
 
 /*
  * subcolor - allocate a new subcolor (if necessary) to this chr
+ *
+ * This works only for chrs that map into the low color map.
  */
 static color
 subcolor(struct colormap * cm, chr c)
@@ -372,7 +338,9 @@ subcolor(struct colormap * cm, chr c)
 	color		co;				/* current color of c */
 	color		sco;			/* new subcolor */
 
-	co = GETCOLOR(cm, c);
+	assert(c <= MAX_SIMPLE_CHR);
+
+	co = cm->locolormap[c - CHR_MIN];
 	sco = newsub(cm, co);
 	if (CISERR())
 		return COLORLESS;
@@ -380,11 +348,37 @@ subcolor(struct colormap * cm, chr c)
 
 	if (co == sco)				/* already in an open subcolor */
 		return co;				/* rest is redundant */
-	cm->cd[co].nchrs--;
-	if (cm->cd[sco].nchrs == 0)
+	cm->cd[co].nschrs--;
+	if (cm->cd[sco].nschrs == 0)
 		cm->cd[sco].firstchr = c;
-	cm->cd[sco].nchrs++;
-	setcolor(cm, c, sco);
+	cm->cd[sco].nschrs++;
+	cm->locolormap[c - CHR_MIN] = sco;
+	return sco;
+}
+
+/*
+ * subcolorhi - allocate a new subcolor (if necessary) to this colormap entry
+ *
+ * This is the same processing as subcolor(), but for entries in the high
+ * colormap, which do not necessarily correspond to exactly one chr code.
+ */
+static color
+subcolorhi(struct colormap * cm, color *pco)
+{
+	color		co;				/* current color of entry */
+	color		sco;			/* new subcolor */
+
+	co = *pco;
+	sco = newsub(cm, co);
+	if (CISERR())
+		return COLORLESS;
+	assert(sco != COLORLESS);
+
+	if (co == sco)				/* already in an open subcolor */
+		return co;				/* rest is redundant */
+	cm->cd[co].nuchrs--;
+	cm->cd[sco].nuchrs++;
+	*pco = sco;
 	return sco;
 }
 
@@ -400,7 +394,8 @@ newsub(struct colormap * cm,
 	sco = cm->cd[co].sub;
 	if (sco == NOSUB)
 	{							/* color has no open subcolor */
-		if (cm->cd[co].nchrs == 1)		/* optimization */
+		/* optimization: singly-referenced color need not be subcolored */
+		if ((cm->cd[co].nschrs + cm->cd[co].nuchrs) == 1)
 			return co;
 		sco = newcolor(cm);		/* must create subcolor */
 		if (sco == COLORLESS)
@@ -417,136 +412,500 @@ newsub(struct colormap * cm,
 }
 
 /*
- * subrange - allocate new subcolors to this range of chrs, fill in arcs
+ * newhicolorrow - get a new row in the hicolormap, cloning it from oldrow
+ *
+ * Returns array index of new row.  Note the array might move.
  */
-static void
-subrange(struct vars * v,
-		 chr from,
-		 chr to,
-		 struct state * lp,
-		 struct state * rp)
+static int
+newhicolorrow(struct colormap * cm,
+			  int oldrow)
 {
-	uchr		uf;
+	int			newrow = cm->hiarrayrows;
+	color	   *newrowptr;
 	int			i;
 
-	assert(from <= to);
+	/* Assign a fresh array row index, enlarging storage if needed */
+	if (newrow >= cm->maxarrayrows)
+	{
+		color	   *newarray;
+
+		if (cm->maxarrayrows >= INT_MAX / (cm->hiarraycols * 2))
+		{
+			CERR(REG_ESPACE);
+			return 0;
+		}
+		newarray = (color *) REALLOC(cm->hicolormap,
+									 cm->maxarrayrows * 2 *
+									 cm->hiarraycols * sizeof(color));
+		if (newarray == NULL)
+		{
+			CERR(REG_ESPACE);
+			return 0;
+		}
+		cm->hicolormap = newarray;
+		cm->maxarrayrows *= 2;
+	}
+	cm->hiarrayrows++;
+
+	/* Copy old row data */
+	newrowptr = &cm->hicolormap[newrow * cm->hiarraycols];
+	memcpy(newrowptr,
+		   &cm->hicolormap[oldrow * cm->hiarraycols],
+		   cm->hiarraycols * sizeof(color));
+
+	/* Increase color reference counts to reflect new colormap entries */
+	for (i = 0; i < cm->hiarraycols; i++)
+		cm->cd[newrowptr[i]].nuchrs++;
 
-	/* first, align "from" on a tree-block boundary */
-	uf = (uchr) from;
-	i = (int) (((uf + BYTTAB - 1) & (uchr) ~BYTMASK) - uf);
-	for (; from <= to && i > 0; i--, from++)
-		newarc(v->nfa, PLAIN, subcolor(v->cm, from), lp, rp);
-	if (from > to)				/* didn't reach a boundary */
+	return newrow;
+}
+
+/*
+ * newhicolorcols - create a new set of columns in the high colormap
+ *
+ * Essentially, extends the 2-D array to the right with a copy of itself.
+ */
+static void
+newhicolorcols(struct colormap * cm)
+{
+	color	   *newarray;
+	int			r,
+				c;
+
+	if (cm->hiarraycols >= INT_MAX / (cm->maxarrayrows * 2))
+	{
+		CERR(REG_ESPACE);
 		return;
+	}
+	newarray = (color *) REALLOC(cm->hicolormap,
+								 cm->maxarrayrows *
+								 cm->hiarraycols * 2 * sizeof(color));
+	if (newarray == NULL)
+	{
+		CERR(REG_ESPACE);
+		return;
+	}
+	cm->hicolormap = newarray;
 
-	/* deal with whole blocks */
-	for (; to - from >= BYTTAB; from += BYTTAB)
-		subblock(v, from, lp, rp);
+	/* Duplicate existing columns to the right, and increase ref counts */
+	/* Must work backwards in the array because we realloc'd in place */
+	for (r = cm->hiarrayrows - 1; r >= 0; r--)
+	{
+		color	   *oldrowptr = &newarray[r * cm->hiarraycols];
+		color	   *newrowptr = &newarray[r * cm->hiarraycols * 2];
+		color	   *newrowptr2 = newrowptr + cm->hiarraycols;
+
+		for (c = 0; c < cm->hiarraycols; c++)
+		{
+			color		co = oldrowptr[c];
 
-	/* clean up any remaining partial table */
-	for (; from <= to; from++)
-		newarc(v->nfa, PLAIN, subcolor(v->cm, from), lp, rp);
+			newrowptr[c] = newrowptr2[c] = co;
+			cm->cd[co].nuchrs++;
+		}
+	}
+
+	cm->hiarraycols *= 2;
 }
 
 /*
- * subblock - allocate new subcolors for one tree block of chrs, fill in arcs
+ * subcolorcvec - allocate new subcolors to cvec members, fill in arcs
  *
- * Note: subcolors that are created during execution of this function
- * will not be given a useful value of firstchr; it'll be left as CHR_MIN.
- * For the current usage of firstchr in pg_regprefix, this does not matter
- * because such subcolors won't occur in the common prefix of a regex.
+ * For each chr "c" represented by the cvec, do the equivalent of
+ * newarc(v->nfa, PLAIN, subcolor(v->cm, c), lp, rp);
+ *
+ * Note that in typical cases, many of the subcolors are the same.
+ * While newarc() would discard duplicate arc requests, we can save
+ * some cycles by not calling it repetitively to begin with.  This is
+ * mechanized with the "lastsubcolor" state variable.
  */
 static void
-subblock(struct vars * v,
-		 chr start,				/* first of BYTTAB chrs */
-		 struct state * lp,
-		 struct state * rp)
+subcolorcvec(struct vars * v,
+			 struct cvec * cv,
+			 struct state * lp,
+			 struct state * rp)
 {
-	uchr		uc = start;
 	struct colormap *cm = v->cm;
-	int			shift;
-	int			level;
+	color		lastsubcolor = COLORLESS;
+	chr			ch,
+				from,
+				to;
+	const chr  *p;
 	int			i;
-	int			b;
-	union tree *t;
-	union tree *cb;
-	union tree *fillt;
-	union tree *lastt;
-	int			previ;
-	int			ndone;
-	color		co;
-	color		sco;
 
-	assert((uc % BYTTAB) == 0);
-
-	/* find its color block, making new pointer blocks as needed */
-	t = cm->tree;
-	fillt = NULL;
-	for (level = 0, shift = BYTBITS * (NBYTS - 1); shift > 0;
-		 level++, shift -= BYTBITS)
-	{
-		b = (uc >> shift) & BYTMASK;
-		lastt = t;
-		t = lastt->tptr[b];
-		assert(t != NULL);
-		fillt = &cm->tree[level + 1];
-		if (t == fillt && shift > BYTBITS)
-		{						/* need new ptr block */
-			t = (union tree *) MALLOC(sizeof(struct ptrs));
-			if (t == NULL)
+	/* ordinary characters */
+	for (p = cv->chrs, i = cv->nchrs; i > 0; p++, i--)
+	{
+		ch = *p;
+		subcoloronechr(v, ch, lp, rp, &lastsubcolor);
+		NOERR();
+	}
+
+	/* and the ranges */
+	for (p = cv->ranges, i = cv->nranges; i > 0; p += 2, i--)
+	{
+		from = *p;
+		to = *(p + 1);
+		if (from <= MAX_SIMPLE_CHR)
+		{
+			/* deal with simple chars one at a time */
+			chr			lim = (to <= MAX_SIMPLE_CHR) ? to : MAX_SIMPLE_CHR;
+
+			while (from <= lim)
 			{
-				CERR(REG_ESPACE);
-				return;
+				color		sco = subcolor(cm, from);
+
+				NOERR();
+				if (sco != lastsubcolor)
+				{
+					newarc(v->nfa, PLAIN, sco, lp, rp);
+					NOERR();
+					lastsubcolor = sco;
+				}
+				from++;
 			}
-			memcpy(VS(t->tptr), VS(fillt->tptr),
-				   BYTTAB * sizeof(union tree *));
-			lastt->tptr[b] = t;
 		}
+		/* deal with any part of the range that's above MAX_SIMPLE_CHR */
+		if (from < to)
+			subcoloronerange(v, from, to, lp, rp, &lastsubcolor);
+		else if (from == to)
+			subcoloronechr(v, from, lp, rp, &lastsubcolor);
+		NOERR();
 	}
 
-	/* special cases:  fill block or solid block */
-	co = t->tcolor[0];
-	cb = cm->cd[co].block;
-	if (t == fillt || t == cb)
+	/* and deal with cclass if any */
+	if (cv->cclasscode >= 0)
 	{
-		/* either way, we want a subcolor solid block */
-		sco = newsub(cm, co);
-		t = cm->cd[sco].block;
-		if (t == NULL)
-		{						/* must set it up */
-			t = (union tree *) MALLOC(sizeof(struct colors));
-			if (t == NULL)
+		int			classbit;
+		color	   *pco;
+		int			r,
+					c;
+
+		/* Enlarge array if we don't have a column bit assignment for cclass */
+		if (cm->classbits[cv->cclasscode] == 0)
+		{
+			cm->classbits[cv->cclasscode] = cm->hiarraycols;
+			newhicolorcols(cm);
+			NOERR();
+		}
+		/* Apply subcolorhi() and make arc for each entry in relevant cols */
+		classbit = cm->classbits[cv->cclasscode];
+		pco = cm->hicolormap;
+		for (r = 0; r < cm->hiarrayrows; r++)
+		{
+			for (c = 0; c < cm->hiarraycols; c++)
 			{
-				CERR(REG_ESPACE);
-				return;
+				if (c & classbit)
+				{
+					color		sco = subcolorhi(cm, pco);
+
+					NOERR();
+					/* add the arc if needed */
+					if (sco != lastsubcolor)
+					{
+						newarc(v->nfa, PLAIN, sco, lp, rp);
+						NOERR();
+						lastsubcolor = sco;
+					}
+				}
+				pco++;
 			}
-			for (i = 0; i < BYTTAB; i++)
-				t->tcolor[i] = sco;
-			cm->cd[sco].block = t;
 		}
-		/* find loop must have run at least once */
-		lastt->tptr[b] = t;
-		newarc(v->nfa, PLAIN, sco, lp, rp);
-		cm->cd[co].nchrs -= BYTTAB;
-		cm->cd[sco].nchrs += BYTTAB;
+	}
+}
+
+/*
+ * subcoloronechr - do subcolorcvec's work for a singleton chr
+ *
+ * We could just let subcoloronerange do this, but it's a bit more efficient
+ * if we exploit the single-chr case.  Also, callers find it useful for this
+ * to be able to handle both low and high chr codes.
+ */
+static void
+subcoloronechr(struct vars * v,
+			   chr ch,
+			   struct state * lp,
+			   struct state * rp,
+			   color *lastsubcolor)
+{
+	struct colormap *cm = v->cm;
+	colormaprange *newranges;
+	int			numnewranges;
+	colormaprange *oldrange;
+	int			oldrangen;
+	int			newrow;
+
+	/* Easy case for low chr codes */
+	if (ch <= MAX_SIMPLE_CHR)
+	{
+		color		sco = subcolor(cm, ch);
+
+		NOERR();
+		if (sco != *lastsubcolor)
+		{
+			newarc(v->nfa, PLAIN, sco, lp, rp);
+			*lastsubcolor = sco;
+		}
+		return;
+	}
+
+	/*
+	 * Potentially, we could need two more colormapranges than we have now, if
+	 * the given chr is in the middle of some existing range.
+	 */
+	newranges = (colormaprange *)
+		MALLOC((cm->numcmranges + 2) * sizeof(colormaprange));
+	if (newranges == NULL)
+	{
+		CERR(REG_ESPACE);
 		return;
 	}
+	numnewranges = 0;
 
-	/* general case, a mixed block to be altered */
-	i = 0;
-	while (i < BYTTAB)
+	/* Ranges before target are unchanged */
+	for (oldrange = cm->cmranges, oldrangen = 0;
+		 oldrangen < cm->numcmranges;
+		 oldrange++, oldrangen++)
+	{
+		if (oldrange->cmax >= ch)
+			break;
+		newranges[numnewranges++] = *oldrange;
+	}
+
+	/* Match target chr against current range */
+	if (oldrangen >= cm->numcmranges || oldrange->cmin > ch)
+	{
+		/* chr does not belong to any existing range, make a new one */
+		newranges[numnewranges].cmin = ch;
+		newranges[numnewranges].cmax = ch;
+		/* row state should be cloned from the "all others" row */
+		newranges[numnewranges].rownum = newrow = newhicolorrow(cm, 0);
+		numnewranges++;
+	}
+	else if (oldrange->cmin == oldrange->cmax)
+	{
+		/* we have an existing singleton range matching the chr */
+		newranges[numnewranges++] = *oldrange;
+		newrow = oldrange->rownum;
+		/* we've now fully processed this old range */
+		oldrange++, oldrangen++;
+	}
+	else
 	{
-		co = t->tcolor[i];
-		sco = newsub(cm, co);
-		newarc(v->nfa, PLAIN, sco, lp, rp);
-		previ = i;
-		do
+		/* chr is a subset of this existing range, must split it */
+		if (ch > oldrange->cmin)
+		{
+			/* emit portion of old range before chr */
+			newranges[numnewranges].cmin = oldrange->cmin;
+			newranges[numnewranges].cmax = ch - 1;
+			newranges[numnewranges].rownum = oldrange->rownum;
+			numnewranges++;
+		}
+		/* emit chr as singleton range, initially cloning from range */
+		newranges[numnewranges].cmin = ch;
+		newranges[numnewranges].cmax = ch;
+		newranges[numnewranges].rownum = newrow =
+			newhicolorrow(cm, oldrange->rownum);
+		numnewranges++;
+		if (ch < oldrange->cmax)
 		{
-			t->tcolor[i++] = sco;
-		} while (i < BYTTAB && t->tcolor[i] == co);
-		ndone = i - previ;
-		cm->cd[co].nchrs -= ndone;
-		cm->cd[sco].nchrs += ndone;
+			/* emit portion of old range after chr */
+			newranges[numnewranges].cmin = ch + 1;
+			newranges[numnewranges].cmax = oldrange->cmax;
+			/* must clone the row if we are making two new ranges from old */
+			newranges[numnewranges].rownum =
+				(ch > oldrange->cmin) ? newhicolorrow(cm, oldrange->rownum) :
+				oldrange->rownum;
+			numnewranges++;
+		}
+		/* we've now fully processed this old range */
+		oldrange++, oldrangen++;
+	}
+
+	/* Update colors in newrow and create arcs as needed */
+	subcoloronerow(v, newrow, lp, rp, lastsubcolor);
+
+	/* Ranges after target are unchanged */
+	for (; oldrangen < cm->numcmranges; oldrange++, oldrangen++)
+	{
+		newranges[numnewranges++] = *oldrange;
+	}
+
+	/* Assert our original space estimate was adequate */
+	assert(numnewranges <= (cm->numcmranges + 2));
+
+	/* And finally, store back the updated list of ranges */
+	if (cm->cmranges != NULL)
+		FREE(cm->cmranges);
+	cm->cmranges = newranges;
+	cm->numcmranges = numnewranges;
+}
+
+/*
+ * subcoloronerange - do subcolorcvec's work for a high range
+ */
+static void
+subcoloronerange(struct vars * v,
+				 chr from,
+				 chr to,
+				 struct state * lp,
+				 struct state * rp,
+				 color *lastsubcolor)
+{
+	struct colormap *cm = v->cm;
+	colormaprange *newranges;
+	int			numnewranges;
+	colormaprange *oldrange;
+	int			oldrangen;
+	int			newrow;
+
+	/* Caller should take care of non-high-range cases */
+	assert(from > MAX_SIMPLE_CHR);
+	assert(from < to);
+
+	/*
+	 * Potentially, if we have N non-adjacent ranges, we could need as many as
+	 * 2N+1 result ranges (consider case where new range spans 'em all).
+	 */
+	newranges = (colormaprange *)
+		MALLOC((cm->numcmranges * 2 + 1) * sizeof(colormaprange));
+	if (newranges == NULL)
+	{
+		CERR(REG_ESPACE);
+		return;
+	}
+	numnewranges = 0;
+
+	/* Ranges before target are unchanged */
+	for (oldrange = cm->cmranges, oldrangen = 0;
+		 oldrangen < cm->numcmranges;
+		 oldrange++, oldrangen++)
+	{
+		if (oldrange->cmax >= from)
+			break;
+		newranges[numnewranges++] = *oldrange;
+	}
+
+	/*
+	 * Deal with ranges that (partially) overlap the target.  As we process
+	 * each such range, increase "from" to remove the dealt-with characters
+	 * from the target range.
+	 */
+	while (oldrangen < cm->numcmranges && oldrange->cmin <= to)
+	{
+		if (from < oldrange->cmin)
+		{
+			/* Handle portion of new range that corresponds to no old range */
+			newranges[numnewranges].cmin = from;
+			newranges[numnewranges].cmax = oldrange->cmin - 1;
+			/* row state should be cloned from the "all others" row */
+			newranges[numnewranges].rownum = newrow = newhicolorrow(cm, 0);
+			numnewranges++;
+			/* Update colors in newrow and create arcs as needed */
+			subcoloronerow(v, newrow, lp, rp, lastsubcolor);
+			/* We've now fully processed the part of new range before old */
+			from = oldrange->cmin;
+		}
+
+		if (from <= oldrange->cmin && to >= oldrange->cmax)
+		{
+			/* old range is fully contained in new, process it in-place */
+			newranges[numnewranges++] = *oldrange;
+			newrow = oldrange->rownum;
+			from = oldrange->cmax + 1;
+		}
+		else
+		{
+			/* some part of old range does not overlap new range */
+			if (from > oldrange->cmin)
+			{
+				/* emit portion of old range before new range */
+				newranges[numnewranges].cmin = oldrange->cmin;
+				newranges[numnewranges].cmax = from - 1;
+				newranges[numnewranges].rownum = oldrange->rownum;
+				numnewranges++;
+			}
+			/* emit common subrange, initially cloning from old range */
+			newranges[numnewranges].cmin = from;
+			newranges[numnewranges].cmax =
+				(to < oldrange->cmax) ? to : oldrange->cmax;
+			newranges[numnewranges].rownum = newrow =
+				newhicolorrow(cm, oldrange->rownum);
+			numnewranges++;
+			if (to < oldrange->cmax)
+			{
+				/* emit portion of old range after new range */
+				newranges[numnewranges].cmin = to + 1;
+				newranges[numnewranges].cmax = oldrange->cmax;
+				/* must clone the row if we are making two new ranges from old */
+				newranges[numnewranges].rownum =
+					(from > oldrange->cmin) ? newhicolorrow(cm, oldrange->rownum) :
+					oldrange->rownum;
+				numnewranges++;
+			}
+			from = oldrange->cmax + 1;
+		}
+		/* Update colors in newrow and create arcs as needed */
+		subcoloronerow(v, newrow, lp, rp, lastsubcolor);
+		/* we've now fully processed this old range */
+		oldrange++, oldrangen++;
+	}
+
+	if (from <= to)
+	{
+		/* Handle portion of new range that corresponds to no old range */
+		newranges[numnewranges].cmin = from;
+		newranges[numnewranges].cmax = to;
+		/* row state should be cloned from the "all others" row */
+		newranges[numnewranges].rownum = newrow = newhicolorrow(cm, 0);
+		numnewranges++;
+		/* Update colors in newrow and create arcs as needed */
+		subcoloronerow(v, newrow, lp, rp, lastsubcolor);
+	}
+
+	/* Ranges after target are unchanged */
+	for (; oldrangen < cm->numcmranges; oldrange++, oldrangen++)
+	{
+		newranges[numnewranges++] = *oldrange;
+	}
+
+	/* Assert our original space estimate was adequate */
+	assert(numnewranges <= (cm->numcmranges * 2 + 1));
+
+	/* And finally, store back the updated list of ranges */
+	if (cm->cmranges != NULL)
+		FREE(cm->cmranges);
+	cm->cmranges = newranges;
+	cm->numcmranges = numnewranges;
+}
+
+/*
+ * subcoloronerow - do subcolorcvec's work for one new row in the high colormap
+ */
+static void
+subcoloronerow(struct vars * v,
+			   int rownum,
+			   struct state * lp,
+			   struct state * rp,
+			   color *lastsubcolor)
+{
+	struct colormap *cm = v->cm;
+	color	   *pco;
+	int			i;
+
+	/* Apply subcolorhi() and make arc for each entry in row */
+	pco = &cm->hicolormap[rownum * cm->hiarraycols];
+	for (i = 0; i < cm->hiarraycols; pco++, i++)
+	{
+		color		sco = subcolorhi(cm, pco);
+
+		NOERR();
+		/* make the arc if needed */
+		if (sco != *lastsubcolor)
+		{
+			newarc(v->nfa, PLAIN, sco, lp, rp);
+			NOERR();
+			*lastsubcolor = sco;
+		}
 	}
 }
 
@@ -575,12 +934,12 @@ okcolors(struct nfa * nfa,
 		{
 			/* is subcolor, let parent deal with it */
 		}
-		else if (cd->nchrs == 0)
+		else if (cd->nschrs == 0 && cd->nuchrs == 0)
 		{
 			/* parent empty, its arcs change color to subcolor */
 			cd->sub = NOSUB;
 			scd = &cm->cd[sco];
-			assert(scd->nchrs > 0);
+			assert(scd->nschrs > 0 || scd->nuchrs > 0);
 			assert(scd->sub == sco);
 			scd->sub = NOSUB;
 			while ((a = cd->arcs) != NULL)
@@ -597,7 +956,7 @@ okcolors(struct nfa * nfa,
 			/* parent's arcs must gain parallel subcolor arcs */
 			cd->sub = NOSUB;
 			scd = &cm->cd[sco];
-			assert(scd->nchrs > 0);
+			assert(scd->nschrs > 0 || scd->nuchrs > 0);
 			assert(scd->sub == sco);
 			scd->sub = NOSUB;
 			for (a = cd->arcs; a != NULL; a = a->colorchain)
@@ -711,62 +1070,54 @@ dumpcolors(struct colormap * cm,
 	struct colordesc *end;
 	color		co;
 	chr			c;
-	char	   *has;
 
 	fprintf(f, "max %ld\n", (long) cm->max);
-	if (NBYTS > 1)
-		fillcheck(cm, cm->tree, 0, f);
 	end = CDEND(cm);
 	for (cd = cm->cd + 1, co = 1; cd < end; cd++, co++) /* skip 0 */
+	{
 		if (!UNUSEDCOLOR(cd))
 		{
-			assert(cd->nchrs > 0);
-			has = (cd->block != NULL) ? "#" : "";
+			assert(cd->nschrs > 0 || cd->nuchrs > 0);
 			if (cd->flags & PSEUDO)
-				fprintf(f, "#%2ld%s(ps): ", (long) co, has);
+				fprintf(f, "#%2ld(ps): ", (long) co);
 			else
-				fprintf(f, "#%2ld%s(%2d): ", (long) co,
-						has, cd->nchrs);
+				fprintf(f, "#%2ld(%2d): ", (long) co, cd->nschrs + cd->nuchrs);
 
 			/*
 			 * Unfortunately, it's hard to do this next bit more efficiently.
-			 *
-			 * Spencer's original coding has the loop iterating from CHR_MIN
-			 * to CHR_MAX, but that's utterly unusable for 32-bit chr. For
-			 * debugging purposes it seems fine to print only chr codes up to
-			 * 1000 or so.
 			 */
-			for (c = CHR_MIN; c < 1000; c++)
+			for (c = CHR_MIN; c <= MAX_SIMPLE_CHR; c++)
 				if (GETCOLOR(cm, c) == co)
 					dumpchr(c, f);
 			fprintf(f, "\n");
 		}
-}
-
-/*
- * fillcheck - check proper filling of a tree
- */
-static void
-fillcheck(struct colormap * cm,
-		  union tree * tree,
-		  int level,			/* level number (top == 0) of this block */
-		  FILE *f)
-{
-	int			i;
-	union tree *t;
-	union tree *fillt = &cm->tree[level + 1];
-
-	assert(level < NBYTS - 1);	/* this level has pointers */
-	for (i = BYTTAB - 1; i >= 0; i--)
+	}
+	/* dump the high colormap if it contains anything interesting */
+	if (cm->hiarrayrows > 1 || cm->hiarraycols > 1)
 	{
-		t = tree->tptr[i];
-		if (t == NULL)
-			fprintf(f, "NULL found in filled tree!\n");
-		else if (t == fillt)
+		int			r,
+					c;
+		const color *rowptr;
+
+		fprintf(f, "other:\t");
+		for (c = 0; c < cm->hiarraycols; c++)
+		{
+			fprintf(f, "\t%ld", (long) cm->hicolormap[c]);
+		}
+		fprintf(f, "\n");
+		for (r = 0; r < cm->numcmranges; r++)
 		{
+			dumpchr(cm->cmranges[r].cmin, f);
+			fprintf(f, "..");
+			dumpchr(cm->cmranges[r].cmax, f);
+			fprintf(f, ":");
+			rowptr = &cm->hicolormap[cm->cmranges[r].rownum * cm->hiarraycols];
+			for (c = 0; c < cm->hiarraycols; c++)
+			{
+				fprintf(f, "\t%ld", (long) rowptr[c]);
+			}
+			fprintf(f, "\n");
 		}
-		else if (level < NBYTS - 2)		/* more pointer blocks below */
-			fillcheck(cm, t, level + 1, f);
 	}
 }
 
diff --git a/src/backend/regex/regc_cvec.c b/src/backend/regex/regc_cvec.c
index 3a9e8cfbbd7390b362cd3480dc1a6782da7a6a95..50b7a4574b4049ddfd2497d16c2bbf99ce0f1c64 100644
--- a/src/backend/regex/regc_cvec.c
+++ b/src/backend/regex/regc_cvec.c
@@ -34,7 +34,8 @@
 
 /*
  * Notes:
- * Only (selected) functions in _this_ file should treat chr* as non-constant.
+ * Only (selected) functions in _this_ file should treat the chr arrays
+ * of a cvec as non-constant.
  */
 
 /*
@@ -67,6 +68,7 @@ clearcvec(struct cvec * cv)
 	assert(cv != NULL);
 	cv->nchrs = 0;
 	cv->nranges = 0;
+	cv->cclasscode = -1;
 	return cv;
 }
 
diff --git a/src/backend/regex/regc_locale.c b/src/backend/regex/regc_locale.c
index 399de027cdd655ec3ae0d5ebd28450421ad6b9aa..7cb3a40a0c88b678577ccb20b847641233037226 100644
--- a/src/backend/regex/regc_locale.c
+++ b/src/backend/regex/regc_locale.c
@@ -349,6 +349,19 @@ static const struct cname
 	}
 };
 
+/*
+ * The following arrays define the valid character class names.
+ */
+static const char *const classNames[NUM_CCLASSES + 1] = {
+	"alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
+	"lower", "print", "punct", "space", "upper", "xdigit", NULL
+};
+
+enum classes
+{
+	CC_ALNUM, CC_ALPHA, CC_ASCII, CC_BLANK, CC_CNTRL, CC_DIGIT, CC_GRAPH,
+	CC_LOWER, CC_PRINT, CC_PUNCT, CC_SPACE, CC_UPPER, CC_XDIGIT
+};
 
 /*
  * We do not use the hard-wired Unicode classification tables that Tcl does.
@@ -543,21 +556,6 @@ cclass(struct vars * v,			/* context */
 	int			i,
 				index;
 
-	/*
-	 * The following arrays define the valid character class names.
-	 */
-
-	static const char *const classNames[] = {
-		"alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
-		"lower", "print", "punct", "space", "upper", "xdigit", NULL
-	};
-
-	enum classes
-	{
-		CC_ALNUM, CC_ALPHA, CC_ASCII, CC_BLANK, CC_CNTRL, CC_DIGIT, CC_GRAPH,
-		CC_LOWER, CC_PRINT, CC_PUNCT, CC_SPACE, CC_UPPER, CC_XDIGIT
-	};
-
 	/*
 	 * Map the name to the corresponding enumerated value.
 	 */
@@ -593,18 +591,20 @@ cclass(struct vars * v,			/* context */
 	 * pg_ctype_get_cache so that we can cache the results.  Other classes
 	 * have definitions that are hard-wired here, and for those we just
 	 * construct a transient cvec on the fly.
+	 *
+	 * NB: keep this code in sync with cclass_column_index(), below.
 	 */
 
 	switch ((enum classes) index)
 	{
 		case CC_PRINT:
-			cv = pg_ctype_get_cache(pg_wc_isprint);
+			cv = pg_ctype_get_cache(pg_wc_isprint, index);
 			break;
 		case CC_ALNUM:
-			cv = pg_ctype_get_cache(pg_wc_isalnum);
+			cv = pg_ctype_get_cache(pg_wc_isalnum, index);
 			break;
 		case CC_ALPHA:
-			cv = pg_ctype_get_cache(pg_wc_isalpha);
+			cv = pg_ctype_get_cache(pg_wc_isalpha, index);
 			break;
 		case CC_ASCII:
 			/* hard-wired meaning */
@@ -625,10 +625,10 @@ cclass(struct vars * v,			/* context */
 			addrange(cv, 0x7f, 0x9f);
 			break;
 		case CC_DIGIT:
-			cv = pg_ctype_get_cache(pg_wc_isdigit);
+			cv = pg_ctype_get_cache(pg_wc_isdigit, index);
 			break;
 		case CC_PUNCT:
-			cv = pg_ctype_get_cache(pg_wc_ispunct);
+			cv = pg_ctype_get_cache(pg_wc_ispunct, index);
 			break;
 		case CC_XDIGIT:
 
@@ -646,16 +646,16 @@ cclass(struct vars * v,			/* context */
 			}
 			break;
 		case CC_SPACE:
-			cv = pg_ctype_get_cache(pg_wc_isspace);
+			cv = pg_ctype_get_cache(pg_wc_isspace, index);
 			break;
 		case CC_LOWER:
-			cv = pg_ctype_get_cache(pg_wc_islower);
+			cv = pg_ctype_get_cache(pg_wc_islower, index);
 			break;
 		case CC_UPPER:
-			cv = pg_ctype_get_cache(pg_wc_isupper);
+			cv = pg_ctype_get_cache(pg_wc_isupper, index);
 			break;
 		case CC_GRAPH:
-			cv = pg_ctype_get_cache(pg_wc_isgraph);
+			cv = pg_ctype_get_cache(pg_wc_isgraph, index);
 			break;
 	}
 
@@ -665,6 +665,47 @@ cclass(struct vars * v,			/* context */
 	return cv;
 }
 
+/*
+ * cclass_column_index - get appropriate high colormap column index for chr
+ */
+static int
+cclass_column_index(struct colormap * cm, chr c)
+{
+	int			colnum = 0;
+
+	/* Shouldn't go through all these pushups for simple chrs */
+	assert(c > MAX_SIMPLE_CHR);
+
+	/*
+	 * Note: we should not see requests to consider cclasses that are not
+	 * treated as locale-specific by cclass(), above.
+	 */
+	if (cm->classbits[CC_PRINT] && pg_wc_isprint(c))
+		colnum |= cm->classbits[CC_PRINT];
+	if (cm->classbits[CC_ALNUM] && pg_wc_isalnum(c))
+		colnum |= cm->classbits[CC_ALNUM];
+	if (cm->classbits[CC_ALPHA] && pg_wc_isalpha(c))
+		colnum |= cm->classbits[CC_ALPHA];
+	assert(cm->classbits[CC_ASCII] == 0);
+	assert(cm->classbits[CC_BLANK] == 0);
+	assert(cm->classbits[CC_CNTRL] == 0);
+	if (cm->classbits[CC_DIGIT] && pg_wc_isdigit(c))
+		colnum |= cm->classbits[CC_DIGIT];
+	if (cm->classbits[CC_PUNCT] && pg_wc_ispunct(c))
+		colnum |= cm->classbits[CC_PUNCT];
+	assert(cm->classbits[CC_XDIGIT] == 0);
+	if (cm->classbits[CC_SPACE] && pg_wc_isspace(c))
+		colnum |= cm->classbits[CC_SPACE];
+	if (cm->classbits[CC_LOWER] && pg_wc_islower(c))
+		colnum |= cm->classbits[CC_LOWER];
+	if (cm->classbits[CC_UPPER] && pg_wc_isupper(c))
+		colnum |= cm->classbits[CC_UPPER];
+	if (cm->classbits[CC_GRAPH] && pg_wc_isgraph(c))
+		colnum |= cm->classbits[CC_GRAPH];
+
+	return colnum;
+}
+
 /*
  * allcases - supply cvec for all case counterparts of a chr (including itself)
  *
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 551ae7dc08eed55e841a4a89ca60d2d01a34374d..ad9d4b1961d4398f135e11a777160d7754fbb59e 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -736,7 +736,7 @@ store_match(pg_ctype_cache *pcc, pg_wchar chr1, int nchrs)
  * Note that the result must not be freed or modified by caller.
  */
 static struct cvec *
-pg_ctype_get_cache(pg_wc_probefunc probefunc)
+pg_ctype_get_cache(pg_wc_probefunc probefunc, int cclasscode)
 {
 	pg_ctype_cache *pcc;
 	pg_wchar	max_chr;
@@ -770,31 +770,43 @@ pg_ctype_get_cache(pg_wc_probefunc probefunc)
 	pcc->cv.ranges = (chr *) malloc(pcc->cv.rangespace * sizeof(chr) * 2);
 	if (pcc->cv.chrs == NULL || pcc->cv.ranges == NULL)
 		goto out_of_memory;
+	pcc->cv.cclasscode = cclasscode;
 
 	/*
-	 * Decide how many character codes we ought to look through.  For C locale
-	 * there's no need to go further than 127.  Otherwise, if the encoding is
-	 * UTF8 go up to 0x7FF, which is a pretty arbitrary cutoff but we cannot
-	 * extend it as far as we'd like (say, 0xFFFF, the end of the Basic
-	 * Multilingual Plane) without creating significant performance issues due
-	 * to too many characters being fed through the colormap code.  This will
-	 * need redesign to fix reasonably, but at least for the moment we have
-	 * all common European languages covered.  Otherwise (not C, not UTF8) go
-	 * up to 255.  These limits are interrelated with restrictions discussed
-	 * at the head of this file.
+	 * Decide how many character codes we ought to look through.  In general
+	 * we don't go past MAX_SIMPLE_CHR; chr codes above that are handled at
+	 * runtime using the "high colormap" mechanism.  However, in C locale
+	 * there's no need to go further than 127, and if we only have a 1-byte
+	 * <ctype.h> API there's no need to go further than that can handle.
+	 *
+	 * If it's not MAX_SIMPLE_CHR that's constraining the search, mark the
+	 * output cvec as not having any locale-dependent behavior, since there
+	 * will be no need to do any run-time locale checks.  (The #if's here
+	 * would always be true for production values of MAX_SIMPLE_CHR, but it's
+	 * useful to allow it to be small for testing purposes.)
 	 */
 	switch (pg_regex_strategy)
 	{
 		case PG_REGEX_LOCALE_C:
+#if MAX_SIMPLE_CHR >= 127
 			max_chr = (pg_wchar) 127;
+			pcc->cv.cclasscode = -1;
+#else
+			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#endif
 			break;
 		case PG_REGEX_LOCALE_WIDE:
 		case PG_REGEX_LOCALE_WIDE_L:
-			max_chr = (pg_wchar) 0x7FF;
+			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 			break;
 		case PG_REGEX_LOCALE_1BYTE:
 		case PG_REGEX_LOCALE_1BYTE_L:
+#if MAX_SIMPLE_CHR >= UCHAR_MAX
 			max_chr = (pg_wchar) UCHAR_MAX;
+			pcc->cv.cclasscode = -1;
+#else
+			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#endif
 			break;
 		default:
 			max_chr = 0;		/* can't get here, but keep compiler quiet */
diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index b211cc0a189f7d6d06f5986ba85899219d0a01d0..ed95474b3f073fc10670bbc1646697714f91cfb8 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -55,7 +55,6 @@ static void cbracket(struct vars *, struct state *, struct state *);
 static void brackpart(struct vars *, struct state *, struct state *);
 static const chr *scanplain(struct vars *);
 static void onechr(struct vars *, chr, struct state *, struct state *);
-static void dovec(struct vars *, struct cvec *, struct state *, struct state *);
 static void wordchrs(struct vars *);
 static void processlacon(struct vars *, struct state *, struct state *, int,
 			 struct state *, struct state *);
@@ -96,16 +95,19 @@ static chr	chrnamed(struct vars *, const chr *, const chr *, chr);
 /* === regc_color.c === */
 static void initcm(struct vars *, struct colormap *);
 static void freecm(struct colormap *);
-static void cmtreefree(struct colormap *, union tree *, int);
-static color setcolor(struct colormap *, chr, color);
 static color maxcolor(struct colormap *);
 static color newcolor(struct colormap *);
 static void freecolor(struct colormap *, color);
 static color pseudocolor(struct colormap *);
-static color subcolor(struct colormap *, chr c);
+static color subcolor(struct colormap *, chr);
+static color subcolorhi(struct colormap *, color *);
 static color newsub(struct colormap *, color);
-static void subrange(struct vars *, chr, chr, struct state *, struct state *);
-static void subblock(struct vars *, chr, struct state *, struct state *);
+static int	newhicolorrow(struct colormap *, int);
+static void newhicolorcols(struct colormap *);
+static void subcolorcvec(struct vars *, struct cvec *, struct state *, struct state *);
+static void subcoloronechr(struct vars *, chr, struct state *, struct state *, color *);
+static void subcoloronerange(struct vars *, chr, chr, struct state *, struct state *, color *);
+static void subcoloronerow(struct vars *, int, struct state *, struct state *, color *);
 static void okcolors(struct nfa *, struct colormap *);
 static void colorchain(struct colormap *, struct arc *);
 static void uncolorchain(struct colormap *, struct arc *);
@@ -114,7 +116,6 @@ static void colorcomplement(struct nfa *, struct colormap *, int, struct state *
 
 #ifdef REG_DEBUG
 static void dumpcolors(struct colormap *, FILE *);
-static void fillcheck(struct colormap *, union tree *, int, FILE *);
 static void dumpchr(chr, FILE *);
 #endif
 /* === regc_nfa.c === */
@@ -215,6 +216,7 @@ static struct cvec *range(struct vars *, chr, chr, int);
 static int	before(chr, chr);
 static struct cvec *eclass(struct vars *, chr, int);
 static struct cvec *cclass(struct vars *, const chr *, const chr *, int);
+static int	cclass_column_index(struct colormap *, chr);
 static struct cvec *allcases(struct vars *, chr);
 static int	cmp(const chr *, const chr *, size_t);
 static int	casecmp(const chr *, const chr *, size_t);
@@ -1467,7 +1469,7 @@ brackpart(struct vars * v,
 			NOERR();
 			cv = eclass(v, startc, (v->cflags & REG_ICASE));
 			NOERR();
-			dovec(v, cv, lp, rp);
+			subcolorcvec(v, cv, lp, rp);
 			return;
 			break;
 		case CCLASS:
@@ -1477,7 +1479,7 @@ brackpart(struct vars * v,
 			NOERR();
 			cv = cclass(v, startp, endp, (v->cflags & REG_ICASE));
 			NOERR();
-			dovec(v, cv, lp, rp);
+			subcolorcvec(v, cv, lp, rp);
 			return;
 			break;
 		default:
@@ -1523,7 +1525,7 @@ brackpart(struct vars * v,
 		NOTE(REG_UUNPORT);
 	cv = range(v, startc, endc, (v->cflags & REG_ICASE));
 	NOERR();
-	dovec(v, cv, lp, rp);
+	subcolorcvec(v, cv, lp, rp);
 }
 
 /*
@@ -1565,46 +1567,14 @@ onechr(struct vars * v,
 {
 	if (!(v->cflags & REG_ICASE))
 	{
-		newarc(v->nfa, PLAIN, subcolor(v->cm, c), lp, rp);
+		color		lastsubcolor = COLORLESS;
+
+		subcoloronechr(v, c, lp, rp, &lastsubcolor);
 		return;
 	}
 
 	/* rats, need general case anyway... */
-	dovec(v, allcases(v, c), lp, rp);
-}
-
-/*
- * dovec - fill in arcs for each element of a cvec
- */
-static void
-dovec(struct vars * v,
-	  struct cvec * cv,
-	  struct state * lp,
-	  struct state * rp)
-{
-	chr			ch,
-				from,
-				to;
-	const chr  *p;
-	int			i;
-
-	/* ordinary characters */
-	for (p = cv->chrs, i = cv->nchrs; i > 0; p++, i--)
-	{
-		ch = *p;
-		newarc(v->nfa, PLAIN, subcolor(v->cm, ch), lp, rp);
-		NOERR();
-	}
-
-	/* and the ranges */
-	for (p = cv->ranges, i = cv->nranges; i > 0; p += 2, i--)
-	{
-		from = *p;
-		to = *(p + 1);
-		if (from <= to)
-			subrange(v, from, to, lp, rp);
-		NOERR();
-	}
+	subcolorcvec(v, allcases(v, c), lp, rp);
 }
 
 /*
diff --git a/src/backend/regex/regexport.c b/src/backend/regex/regexport.c
index 93da82286f3d556f02e29031a2942b8211ac2531..a515949bb4d738c5aa6598ce443ab4bee7365d51 100644
--- a/src/backend/regex/regexport.c
+++ b/src/backend/regex/regexport.c
@@ -28,10 +28,6 @@
 
 #include "regex/regexport.h"
 
-static void scancolormap(struct colormap * cm, int co,
-			 union tree * t, int level, chr partial,
-			 pg_wchar **chars, int *chars_len);
-
 
 /*
  * Get total number of NFA states.
@@ -187,10 +183,7 @@ pg_reg_colorisend(const regex_t *regex, int co)
  *
  * Note: we return -1 if the color number is invalid, or if it is a special
  * color (WHITE or a pseudocolor), or if the number of members is uncertain.
- * The latter case cannot arise right now but is specified to allow for future
- * improvements (see musings about run-time handling of higher character codes
- * in regex/README).  Callers should not try to extract the members if -1 is
- * returned.
+ * Callers should not try to extract the members if -1 is returned.
  */
 int
 pg_reg_getnumcharacters(const regex_t *regex, int co)
@@ -205,7 +198,18 @@ pg_reg_getnumcharacters(const regex_t *regex, int co)
 	if (cm->cd[co].flags & PSEUDO)		/* also pseudocolors (BOS etc) */
 		return -1;
 
-	return cm->cd[co].nchrs;
+	/*
+	 * If the color appears anywhere in the high colormap, treat its number of
+	 * members as uncertain.  In principle we could determine all the specific
+	 * chrs corresponding to each such entry, but it would be expensive
+	 * (particularly if character class tests are required) and it doesn't
+	 * seem worth it.
+	 */
+	if (cm->cd[co].nuchrs != 0)
+		return -1;
+
+	/* OK, return the known number of member chrs */
+	return cm->cd[co].nschrs;
 }
 
 /*
@@ -222,6 +226,7 @@ pg_reg_getcharacters(const regex_t *regex, int co,
 					 pg_wchar *chars, int chars_len)
 {
 	struct colormap *cm;
+	chr			c;
 
 	assert(regex != NULL && regex->re_magic == REMAGIC);
 	cm = &((struct guts *) regex->re_guts)->cmap;
@@ -231,62 +236,17 @@ pg_reg_getcharacters(const regex_t *regex, int co,
 	if (cm->cd[co].flags & PSEUDO)
 		return;
 
-	/* Recursively search the colormap tree */
-	scancolormap(cm, co, cm->tree, 0, 0, &chars, &chars_len);
-}
-
-/*
- * Recursively scan the colormap tree to find chrs belonging to color "co".
- * See regex/README for info about the tree structure.
- *
- * t: tree block to scan
- * level: level (from 0) of t
- * partial: partial chr code for chrs within t
- * chars, chars_len: output area
- */
-static void
-scancolormap(struct colormap * cm, int co,
-			 union tree * t, int level, chr partial,
-			 pg_wchar **chars, int *chars_len)
-{
-	int			i;
-
-	if (level < NBYTS - 1)
-	{
-		/* non-leaf node */
-		for (i = 0; i < BYTTAB; i++)
-		{
-			/*
-			 * We do not support search for chrs of color 0 (WHITE), so
-			 * all-white subtrees need not be searched.  These can be
-			 * recognized because they are represented by the fill blocks in
-			 * the colormap struct.  This typically allows us to avoid
-			 * scanning large regions of higher-numbered chrs.
-			 */
-			if (t->tptr[i] == &cm->tree[level + 1])
-				continue;
-
-			/* Recursively scan next level down */
-			scancolormap(cm, co,
-						 t->tptr[i], level + 1,
-						 (partial | (chr) i) << BYTBITS,
-						 chars, chars_len);
-		}
-	}
-	else
+	/*
+	 * We need only examine the low character map; there should not be any
+	 * matching entries in the high map.
+	 */
+	for (c = CHR_MIN; c <= MAX_SIMPLE_CHR; c++)
 	{
-		/* leaf node */
-		for (i = 0; i < BYTTAB; i++)
+		if (cm->locolormap[c - CHR_MIN] == co)
 		{
-			if (t->tcolor[i] == co)
-			{
-				if (*chars_len > 0)
-				{
-					**chars = partial | (chr) i;
-					(*chars)++;
-					(*chars_len)--;
-				}
-			}
+			*chars++ = c;
+			if (--chars_len == 0)
+				break;
 		}
 	}
 }
diff --git a/src/backend/regex/regprefix.c b/src/backend/regex/regprefix.c
index 04b6449a20cfb12079660bbe1782a76454bab0c5..f338715225070c970536668aebd7d92b14122afc 100644
--- a/src/backend/regex/regprefix.c
+++ b/src/backend/regex/regprefix.c
@@ -194,7 +194,10 @@ findprefix(struct cnfa * cnfa,
 		if (thiscolor == COLORLESS)
 			break;
 		/* The color must be a singleton */
-		if (cm->cd[thiscolor].nchrs != 1)
+		if (cm->cd[thiscolor].nschrs != 1)
+			break;
+		/* Must not have any high-color-map entries */
+		if (cm->cd[thiscolor].nuchrs != 0)
 			break;
 
 		/*
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 459851a7f6bdff8a04b0d3136fc61e4a64c21dc5..04a1893c806ff38556a6f1e490936128ec355fed 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -77,6 +77,16 @@ typedef unsigned uchr;			/* unsigned type that will hold a chr */
  */
 #define CHR_IS_IN_RANGE(c)	((c) <= CHR_MAX)
 
+/*
+ * MAX_SIMPLE_CHR is the cutoff between "simple" and "complicated" processing
+ * in the color map logic.  It should usually be chosen high enough to ensure
+ * that all common characters are <= MAX_SIMPLE_CHR.  However, very large
+ * values will be counterproductive since they cause more regex setup time.
+ * Also, small values can be helpful for testing the high-color-map logic
+ * with plain old ASCII input.
+ */
+#define MAX_SIMPLE_CHR	0x7FF	/* suitable value for Unicode */
+
 /* functions operating on chr */
 #define iscalnum(x) pg_wc_isalnum(x)
 #define iscalpha(x) pg_wc_isalpha(x)
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 2f89dc9326bfab16a10bd6ba070a6b1230c3ff0c..cc73db25473c06e058555021e08cd5d1e2c75417 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -172,6 +172,5 @@ extern int	pg_regexec(regex_t *, const pg_wchar *, size_t, size_t, rm_detail_t *
 extern int	pg_regprefix(regex_t *, pg_wchar **, size_t *);
 extern void pg_regfree(regex_t *);
 extern size_t pg_regerror(int, const regex_t *, char *, size_t);
-extern void pg_set_regex_collation(Oid collation);
 
 #endif   /* _REGEX_H_ */
diff --git a/src/include/regex/regguts.h b/src/include/regex/regguts.h
index b0aa641cc4f6d851e19b3635edb9a8d3ddfbeaeb..69816f1449a6cbf038134f9bce94243be5660701 100644
--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -127,23 +127,6 @@
 #define ISBSET(uv, sn)	((uv)[(sn)/UBITS] & ((unsigned)1 << ((sn)%UBITS)))
 
 
-
-/*
- * We dissect a chr into byts for colormap table indexing.  Here we define
- * a byt, which will be the same as a byte on most machines...  The exact
- * size of a byt is not critical, but about 8 bits is good, and extraction
- * of 8-bit chunks is sometimes especially fast.
- */
-#ifndef BYTBITS
-#define BYTBITS 8				/* bits in a byt */
-#endif
-#define BYTTAB	(1<<BYTBITS)	/* size of table with one entry per byt value */
-#define BYTMASK (BYTTAB-1)		/* bit mask for byt */
-#define NBYTS	((CHRBITS+BYTBITS-1)/BYTBITS)
-/* the definition of GETCOLOR(), below, assumes NBYTS <= 4 */
-
-
-
 /*
  * As soon as possible, we map chrs into equivalence classes -- "colors" --
  * which are of much more manageable number.
@@ -153,42 +136,9 @@ typedef short color;			/* colors of characters */
 #define MAX_COLOR	32767		/* max color (must fit in 'color' datatype) */
 #define COLORLESS	(-1)		/* impossible color */
 #define WHITE		0			/* default color, parent of all others */
-
+/* Note: various places in the code know that WHITE is zero */
 
 
-/*
- * A colormap is a tree -- more precisely, a DAG -- indexed at each level
- * by a byt of the chr, to map the chr to a color efficiently.  Because
- * lower sections of the tree can be shared, it can exploit the usual
- * sparseness of such a mapping table.  The tree is always NBYTS levels
- * deep (in the past it was shallower during construction but was "filled"
- * to full depth at the end of that); areas that are unaltered as yet point
- * to "fill blocks" which are entirely WHITE in color.
- *
- * Leaf-level tree blocks are of type "struct colors", while upper-level
- * blocks are of type "struct ptrs".  Pointers into the tree are generally
- * declared as "union tree *" to be agnostic about what level they point to.
- */
-
-/* the tree itself */
-struct colors
-{
-	color		ccolor[BYTTAB];
-};
-struct ptrs
-{
-	union tree *pptr[BYTTAB];
-};
-union tree
-{
-	struct colors colors;
-	struct ptrs ptrs;
-};
-
-/* use these pseudo-field names when dereferencing a "union tree" pointer */
-#define tcolor	colors.ccolor
-#define tptr	ptrs.pptr
-
 /*
  * Per-color data structure for the compile-time color machinery
  *
@@ -203,26 +153,56 @@ union tree
  */
 struct colordesc
 {
-	uchr		nchrs;			/* number of chars of this color */
+	int			nschrs;			/* number of simple chars of this color */
+	int			nuchrs;			/* number of upper map entries of this color */
 	color		sub;			/* open subcolor, if any; or free-chain ptr */
 #define  NOSUB	 COLORLESS		/* value of "sub" when no open subcolor */
 	struct arc *arcs;			/* chain of all arcs of this color */
-	chr			firstchr;		/* char first assigned to this color */
+	chr			firstchr;		/* simple char first assigned to this color */
 	int			flags;			/* bit values defined next */
 #define  FREECOL 01				/* currently free */
 #define  PSEUDO  02				/* pseudocolor, no real chars */
 #define  UNUSEDCOLOR(cd) ((cd)->flags & FREECOL)
-	union tree *block;			/* block of solid color, if any */
 };
 
 /*
  * The color map itself
  *
- * Much of the data in the colormap struct is only used at compile time.
- * However, the bulk of the space usage is in the "tree" structure, so it's
- * not clear that there's much point in converting the rest to a more compact
- * form when compilation is finished.
+ * This struct holds both data used only at compile time, and the chr to
+ * color mapping information, used at both compile and run time.  The latter
+ * is the bulk of the space, so it's not really worth separating out the
+ * compile-only portion.
+ *
+ * Ideally, the mapping data would just be an array of colors indexed by
+ * chr codes; but for large character sets that's impractical.  Fortunately,
+ * common characters have smaller codes, so we can use a simple array for chr
+ * codes up to MAX_SIMPLE_CHR, and do something more complex for codes above
+ * that, without much loss of performance.  The "something more complex" is a
+ * 2-D array of color entries, where row indexes correspond to individual chrs
+ * or chr ranges that have been mentioned in the regex (with row zero
+ * representing all other chrs), and column indexes correspond to different
+ * sets of locale-dependent character classes such as "isalpha".  The
+ * classbits[k] entry is zero if we do not care about the k'th character class
+ * in this regex, and otherwise it is the bit to be OR'd into the column index
+ * if the character in question is a member of that class.  We find the color
+ * of a high-valued chr by identifying which colormaprange it is in to get
+ * the row index (use row zero if it's in none of them), identifying which of
+ * the interesting cclasses it's in to get the column index, and then indexing
+ * into the 2-D hicolormap array.
+ *
+ * The colormapranges are required to be nonempty, nonoverlapping, and to
+ * appear in increasing chr-value order.
  */
+
+#define NUM_CCLASSES 13			/* must match data in regc_locale.c */
+
+typedef struct colormaprange
+{
+	chr			cmin;			/* range represents cmin..cmax inclusive */
+	chr			cmax;
+	int			rownum;			/* row index in hicolormap array (>= 1) */
+} colormaprange;
+
 struct colormap
 {
 	int			magic;
@@ -233,27 +213,27 @@ struct colormap
 	color		free;			/* beginning of free chain (if non-0) */
 	struct colordesc *cd;		/* pointer to array of colordescs */
 #define  CDEND(cm)	 (&(cm)->cd[(cm)->max + 1])
+
+	/* mapping data for chrs <= MAX_SIMPLE_CHR: */
+	color	   *locolormap;		/* simple array indexed by chr code */
+
+	/* mapping data for chrs > MAX_SIMPLE_CHR: */
+	int			classbits[NUM_CCLASSES];		/* see comment above */
+	int			numcmranges;	/* number of colormapranges */
+	colormaprange *cmranges;	/* ranges of high chrs */
+	color	   *hicolormap;		/* 2-D array of color entries */
+	int			maxarrayrows;	/* number of array rows allocated */
+	int			hiarrayrows;	/* number of array rows in use */
+	int			hiarraycols;	/* number of array columns (2^N) */
+
 	/* If we need up to NINLINECDS, we store them here to save a malloc */
-#define  NINLINECDS  ((size_t)10)
+#define  NINLINECDS  ((size_t) 10)
 	struct colordesc cdspace[NINLINECDS];
-	union tree	tree[NBYTS];	/* tree top, plus lower-level fill blocks */
 };
 
-/* optimization magic to do fast chr->color mapping */
-#define B0(c)	((c) & BYTMASK)
-#define B1(c)	(((c)>>BYTBITS) & BYTMASK)
-#define B2(c)	(((c)>>(2*BYTBITS)) & BYTMASK)
-#define B3(c)	(((c)>>(3*BYTBITS)) & BYTMASK)
-#if NBYTS == 1
-#define GETCOLOR(cm, c) ((cm)->tree->tcolor[B0(c)])
-#endif
-/* beware, for NBYTS>1, GETCOLOR() is unsafe -- 2nd arg used repeatedly */
-#if NBYTS == 2
-#define GETCOLOR(cm, c) ((cm)->tree->tptr[B1(c)]->tcolor[B0(c)])
-#endif
-#if NBYTS == 4
-#define GETCOLOR(cm, c) ((cm)->tree->tptr[B3(c)]->tptr[B2(c)]->tptr[B1(c)]->tcolor[B0(c)])
-#endif
+/* fetch color for chr; beware of multiple evaluation of c argument */
+#define GETCOLOR(cm, c) \
+	((c) <= MAX_SIMPLE_CHR ? (cm)->locolormap[(c) - CHR_MIN] : pg_reg_getcolor(cm, c))
 
 
 /*
@@ -264,6 +244,11 @@ struct colormap
  * Representation of a set of characters.  chrs[] represents individual
  * code points, ranges[] represents ranges in the form min..max inclusive.
  *
+ * If the cvec represents a locale-specific character class, eg [[:alpha:]],
+ * then the chrs[] and ranges[] arrays contain only members of that class
+ * up to MAX_SIMPLE_CHR (inclusive).  cclasscode is set to regc_locale.c's
+ * code for the class, rather than being -1 as it is in an ordinary cvec.
+ *
  * Note that in cvecs gotten from newcvec() and intended to be freed by
  * freecvec(), both arrays of chrs are after the end of the struct, not
  * separately malloc'd; so chrspace and rangespace are effectively immutable.
@@ -276,6 +261,7 @@ struct cvec
 	int			nranges;		/* number of ranges (chr pairs) */
 	int			rangespace;		/* number of ranges allocated in ranges[] */
 	chr		   *ranges;			/* pointer to vector of chr pairs */
+	int			cclasscode;		/* value of "enum classes", or -1 */
 };
 
 
@@ -489,3 +475,8 @@ struct guts
 	int			nlacons;		/* size of lacons[]; note that only slots
 								 * numbered 1 .. nlacons-1 are used */
 };
+
+
+/* prototypes for functions that are exported from regcomp.c to regexec.c */
+extern void pg_set_regex_collation(Oid collation);
+extern color pg_reg_getcolor(struct colormap * cm, chr c);
diff --git a/src/test/regress/expected/regex.linux.utf8.out b/src/test/regress/expected/regex.linux.utf8.out
new file mode 100644
index 0000000000000000000000000000000000000000..7c170a99f3d19534ae24dd25788c8b50784c2426
--- /dev/null
+++ b/src/test/regress/expected/regex.linux.utf8.out
@@ -0,0 +1,164 @@
+/*
+ * This test is for Linux/glibc systems and others that implement proper
+ * locale classification of Unicode characters with high code values.
+ * It must be run in a database with UTF8 encoding and a Unicode-aware locale.
+ */
+SET client_encoding TO UTF8;
+--
+-- Test the "high colormap" logic with single characters and ranges that
+-- exceed the MAX_SIMPLE_CHR cutoff, here assumed to be less than U+2000.
+--
+-- trivial cases:
+SELECT 'aⓐ' ~ U&'a\24D0' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓐ' ~ U&'a\24D1' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'aⓕ' ~ 'a[ⓐ-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⒻ' ~ 'a[ⓐ-ⓩ]' AS f;
+ f 
+---
+ f
+(1 row)
+
+-- cases requiring splitting of ranges:
+SELECT 'aⓕⓕ' ~ 'aⓕ[ⓐ-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓕⓐ' ~ 'aⓕ[ⓐ-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓐⓕ' ~ 'aⓕ[ⓐ-ⓩ]' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'aⓕⓕ' ~ 'a[ⓐ-ⓩ]ⓕ' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓕⓐ' ~ 'a[ⓐ-ⓩ]ⓕ' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'aⓐⓕ' ~ 'a[ⓐ-ⓩ]ⓕ' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⒶⓜ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓜⓜ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓜⓩ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓩⓩ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'aⓜ⓪' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'a0' ~ 'a[a-ⓩ]' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'aq' ~ 'a[a-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓜ' ~ 'a[a-ⓩ]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'a⓪' ~ 'a[a-ⓩ]' AS f;
+ f 
+---
+ f
+(1 row)
+
+-- Locale-dependent character classes
+SELECT 'aⒶⓜ⓪' ~ '[[:alpha:]][[:alpha:]][[:alpha:]][[:graph:]]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⒶⓜ⓪' ~ '[[:alpha:]][[:alpha:]][[:alpha:]][[:alpha:]]' AS f;
+ f 
+---
+ f
+(1 row)
+
+-- Locale-dependent character classes with high ranges
+SELECT 'aⒶⓜ⓪' ~ '[a-z][[:alpha:]][ⓐ-ⓩ][[:graph:]]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⓜⒶ⓪' ~ '[a-z][[:alpha:]][ⓐ-ⓩ][[:graph:]]' AS f;
+ f 
+---
+ f
+(1 row)
+
+SELECT 'aⓜⒶ⓪' ~ '[a-z][ⓐ-ⓩ][[:alpha:]][[:graph:]]' AS t;
+ t 
+---
+ t
+(1 row)
+
+SELECT 'aⒶⓜ⓪' ~ '[a-z][ⓐ-ⓩ][[:alpha:]][[:graph:]]' AS f;
+ f 
+---
+ f
+(1 row)
+
diff --git a/src/test/regress/sql/regex.linux.utf8.sql b/src/test/regress/sql/regex.linux.utf8.sql
new file mode 100644
index 0000000000000000000000000000000000000000..457781164525ea6f90fcb0c130c4bf55c7e51d2a
--- /dev/null
+++ b/src/test/regress/sql/regex.linux.utf8.sql
@@ -0,0 +1,46 @@
+/*
+ * This test is for Linux/glibc systems and others that implement proper
+ * locale classification of Unicode characters with high code values.
+ * It must be run in a database with UTF8 encoding and a Unicode-aware locale.
+ */
+
+SET client_encoding TO UTF8;
+
+--
+-- Test the "high colormap" logic with single characters and ranges that
+-- exceed the MAX_SIMPLE_CHR cutoff, here assumed to be less than U+2000.
+--
+
+-- trivial cases:
+SELECT 'aⓐ' ~ U&'a\24D0' AS t;
+SELECT 'aⓐ' ~ U&'a\24D1' AS f;
+SELECT 'aⓕ' ~ 'a[ⓐ-ⓩ]' AS t;
+SELECT 'aⒻ' ~ 'a[ⓐ-ⓩ]' AS f;
+-- cases requiring splitting of ranges:
+SELECT 'aⓕⓕ' ~ 'aⓕ[ⓐ-ⓩ]' AS t;
+SELECT 'aⓕⓐ' ~ 'aⓕ[ⓐ-ⓩ]' AS t;
+SELECT 'aⓐⓕ' ~ 'aⓕ[ⓐ-ⓩ]' AS f;
+SELECT 'aⓕⓕ' ~ 'a[ⓐ-ⓩ]ⓕ' AS t;
+SELECT 'aⓕⓐ' ~ 'a[ⓐ-ⓩ]ⓕ' AS f;
+SELECT 'aⓐⓕ' ~ 'a[ⓐ-ⓩ]ⓕ' AS t;
+SELECT 'aⒶⓜ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS t;
+SELECT 'aⓜⓜ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS t;
+SELECT 'aⓜⓩ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS t;
+SELECT 'aⓩⓩ' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS f;
+SELECT 'aⓜ⓪' ~ 'a[Ⓐ-ⓜ][ⓜ-ⓩ]' AS f;
+SELECT 'a0' ~ 'a[a-ⓩ]' AS f;
+SELECT 'aq' ~ 'a[a-ⓩ]' AS t;
+SELECT 'aⓜ' ~ 'a[a-ⓩ]' AS t;
+SELECT 'a⓪' ~ 'a[a-ⓩ]' AS f;
+
+-- Locale-dependent character classes
+
+SELECT 'aⒶⓜ⓪' ~ '[[:alpha:]][[:alpha:]][[:alpha:]][[:graph:]]' AS t;
+SELECT 'aⒶⓜ⓪' ~ '[[:alpha:]][[:alpha:]][[:alpha:]][[:alpha:]]' AS f;
+
+-- Locale-dependent character classes with high ranges
+
+SELECT 'aⒶⓜ⓪' ~ '[a-z][[:alpha:]][ⓐ-ⓩ][[:graph:]]' AS t;
+SELECT 'aⓜⒶ⓪' ~ '[a-z][[:alpha:]][ⓐ-ⓩ][[:graph:]]' AS f;
+SELECT 'aⓜⒶ⓪' ~ '[a-z][ⓐ-ⓩ][[:alpha:]][[:graph:]]' AS t;
+SELECT 'aⒶⓜ⓪' ~ '[a-z][ⓐ-ⓩ][[:alpha:]][[:graph:]]' AS f;