Commits · 4509b4eb188beeea5c74a52f238127d323093113 · Jakob Huber / postgres-lambda-diff

May 08, 2017

Further patch rangetypes_selfuncs.c's statistics slot management. · 4509b4eb

Tom Lane authored 7 years ago

Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8,
not of the type of the column the statistics are for.

This bug is at least partly the fault of sloppy specification comments
for get_attstatsslot()/free_attstatsslot(): the type OID they want is that
of the stavalues entries, not of the underlying column.  (I double-checked
other callers and they seem to get this right.)  Adjust the comments to be
more correct.

Per buildfarm.

Security: CVE-2017-7484

Aug 08, 2016

Fix misestimation of n_distinct for a nearly-unique column with many nulls. · cb5c1498

Tom Lane authored 8 years ago

If ANALYZE found no repeated non-null entries in its sample, it set the
column's stadistinct value to -1.0, intending to indicate that the entries
are all distinct. But what this value actually means is that the number
of distinct values is 100% of the table's rowcount, and thus it was
overestimating the number of distinct values by however many nulls there
are. This could lead to very poor selectivity estimates, as for example
in a recent report from Andreas Joseph Krogh. We should discount the
stadistinct value by whatever we've estimated the nulls fraction to be.
(That is what will happen if we choose to use a negative stadistinct for
a column that does have repeated entries, so this code path was just
inconsistent.)
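
The arithmetic behind this fix can be sketched as follows. This is a minimal illustration in Python, not the ANALYZE code itself: the function names and the sample figures are hypothetical. It relies only on the convention described above, that a negative stadistinct is a multiplier on the table's row count.

```python
def distinct_from_stadistinct(stadistinct: float, row_count: float) -> float:
    """Turn a stored stadistinct value into an absolute distinct-value estimate.

    Positive values are absolute counts; negative values are a (negated)
    fraction of the table's row count, as the commit message describes.
    """
    if stadistinct < 0:
        return -stadistinct * row_count
    return stadistinct


def all_distinct_stadistinct(null_frac: float) -> float:
    """stadistinct for a column whose sampled non-null values never repeat.

    Before the fix this was a flat -1.0; the fix discounts it by the
    estimated fraction of nulls.
    """
    return -1.0 * (1.0 - null_frac)


# Hypothetical table: 1,000,000 rows, 90% of them null in this column.
rows, null_frac = 1_000_000, 0.9
old_estimate = distinct_from_stadistinct(-1.0, rows)
new_estimate = distinct_from_stadistinct(all_distinct_stadistinct(null_frac), rows)
print(old_estimate, new_estimate)
```

With these made-up figures, the undiscounted value claims ten times as many distinct values as the column has non-null rows.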

In addition to fixing the stadistinct entries stored by several different
ANALYZE code paths, adjust the logic where get_variable_numdistinct()
forces an "all distinct" estimate on the basis of finding a relevant unique
index. Unique indexes don't reject nulls, so there's no reason to assume
that the null fraction doesn't apply.

Back-patch to all supported branches. Back-patching is a bit of a judgment
call, but this problem seems to affect only a few users (else we'd have
identified it long ago), and it's bad enough when it does happen that
destabilizing plan choices in a worse direction seems unlikely.

Patch by me, with documentation wording suggested by Dean Rasheed

Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena>
Discussion: <16143.1470350371@sss.pgh.pa.us>

Jan 06, 2015
- Update copyright for 2015 · 4baaf863
  Bruce Momjian authored 10 years ago
  
  Backpatch certain files through 9.0
May 06, 2014

pgindent run for 9.4 · 0a783200

Bruce Momjian authored 10 years ago

This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not affect backpatching.

Jan 07, 2014

Update copyright for 2014 · 7e04792a

Bruce Momjian authored 11 years ago

Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.

May 29, 2013

pgindent run for release 9.3 · 9af4159f

Bruce Momjian authored 11 years ago

This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.

Mar 14, 2013

Add cost estimation of range @> and <@ operators. · 59d0bf9d

Heikki Linnakangas authored 12 years ago

The estimates are based on the existing lower bound histogram, and a new
histogram of range lengths.

Bump catversion, because the range length histogram now needs to be present
in statistic slot kind 6, or you get an error on @> and <@ queries. (A
re-ANALYZE would be enough to fix that, though.)

Alexander Korotkov, with some refactoring by me.

Jan 01, 2013

Update copyrights for 2013 · bd61a623

Bruce Momjian authored 12 years ago

Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.

Aug 27, 2012

Collect and use histograms of lower and upper bounds for range types. · 918eee0c

Heikki Linnakangas authored 12 years ago

This enables selectivity estimation of the <<, >>, &<, &> and && operators,
as well as the normal inequality operators: <, <=, >=, >. "range @> element"
is also supported, but the range-variant @> and <@ operators are not,
because they cannot be sensibly estimated with lower and upper bound
histograms alone. We would need to make some assumption about the lengths of
the ranges for that. Alexander's patch included a separate histogram of
lengths for that, but I left that out of the patch for simplicity. Hopefully
that will be added as a followup patch.

The fraction of empty ranges is also calculated and used in estimation.
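
A rough sketch of how a bound histogram and the empty-range fraction might combine into a selectivity estimate for "column << constant" is shown below. It is illustrative Python, not the logic in rangetypes_selfuncs.c. The function names are hypothetical, the histogram is assumed to be equi-depth, boundary inclusivity is ignored, empty ranges are assumed never to satisfy the operator, and empty_frac is assumed to be a fraction of the non-null rows.

```python
from bisect import bisect_left


def fraction_below(hist_bounds: list[float], value: float) -> float:
    """Estimate the fraction of histogram entries that are < value.

    hist_bounds are the sorted boundaries of an equi-depth histogram, so
    each of the len(hist_bounds) - 1 bins holds the same share of entries.
    """
    if value <= hist_bounds[0]:
        return 0.0
    if value > hist_bounds[-1]:
        return 1.0
    bins = len(hist_bounds) - 1
    # First index whose boundary is >= value; value lies in bin i - 1,
    # and hist_bounds[i - 1] < value <= hist_bounds[i] holds here.
    i = bisect_left(hist_bounds, value)
    lo, hi = hist_bounds[i - 1], hist_bounds[i]
    within = (value - lo) / (hi - lo)  # linear interpolation inside the bin
    return (i - 1 + within) / bins


def strictly_left_selectivity(
    upper_hist: list[float], const_lower: float, empty_frac: float, null_frac: float
) -> float:
    """Selectivity of "column << constant" over all rows of the table.

    A non-empty range is strictly left of the constant when its upper
    bound falls below the constant's lower bound.
    """
    hist_sel = fraction_below(upper_hist, const_lower)
    return hist_sel * (1.0 - empty_frac) * (1.0 - null_frac)


# Hypothetical statistics for a range column.
bounds = [0.0, 10.0, 20.0, 30.0, 40.0]
print(strictly_left_selectivity(bounds, 25.0, empty_frac=0.2, null_frac=0.5))
```

The same interpolation does not carry over to the range-variant @> and <@ operators, which is the limitation the message describes: they depend on both bounds of each range at once.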

Alexander Korotkov, heavily modified by me.

Jun 25, 2012

Replace int2/int4 in C code with int16/int32 · b8b2e3b2

Peter Eisentraut authored 12 years ago

The latter was already the dominant use, and it's preferable because
in C the convention is that intXX means XX bits. Therefore, allowing
mixed use of int2, int4, int8, int16, int32 is obviously confusing.

Remove the typedefs for int2 and int4 for now. They don't seem to be
widely used outside of the PostgreSQL source tree, and the few uses
can probably be cleaned up by the time this ships.

Jun 10, 2012
- Run pgindent on 9.2 source tree in preparation for first 9.3 · 927d61ee
  Bruce Momjian authored 12 years ago
  
  commit-fest.
Mar 04, 2012

Collect and use element-frequency statistics for arrays. · 0e5e167a

Tom Lane authored 13 years ago

This patch improves selectivity estimation for the array <@, &&, and @>
(containment and overlaps) operators.  It enables collection of statistics
about individual array element values by ANALYZE, and introduces
operator-specific estimators that use these stats.  In addition,
ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"
and "const <> ANY/ALL (array_column)" are estimated by treating them as
variants of the containment operators.
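
To make the idea concrete, here is a deliberately simplified sketch of estimating containment and overlap from per-element frequencies. It is not the estimator this patch adds: it assumes elements occur independently of one another, ignores NULL elements, and uses hypothetical names and frequencies throughout.

```python
from math import prod


def contains_selectivity(elem_freq: dict, const_elems: list, default_freq: float = 0.0) -> float:
    """Estimate "array_column @> constant array" assuming independent elements.

    elem_freq maps an element to the fraction of rows whose array holds it;
    elements missing from the statistics fall back to default_freq.
    """
    return prod(elem_freq.get(e, default_freq) for e in set(const_elems))


def overlaps_selectivity(elem_freq: dict, const_elems: list, default_freq: float = 0.0) -> float:
    """Estimate "array_column && constant array": at least one element present."""
    return 1.0 - prod(1.0 - elem_freq.get(e, default_freq) for e in set(const_elems))


def eq_any_selectivity(elem_freq: dict, const) -> float:
    """Estimate "const = ANY (array_column)" as containment of a one-element array."""
    return contains_selectivity(elem_freq, [const])


# Hypothetical element frequencies gathered by ANALYZE.
freqs = {"a": 0.5, "b": 0.25}
print(contains_selectivity(freqs, ["a", "b"]))
print(overlaps_selectivity(freqs, ["a", "b"]))
print(eq_any_selectivity(freqs, "a"))
```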

Since we still collect scalar-style stats about the array values as a
whole, the pg_stats view is expanded to show both these stats and the
array-style stats in separate columns.  This creates an incompatible change
in how stats for tsvector columns are displayed in pg_stats: the stats
about lexemes are now displayed in the array-related columns instead of the
original scalar-related columns.

There are a few loose ends here, notably that it'd be nice to be able to
suppress either the scalar-style stats or the array-element stats for
columns for which they're not useful.  But the patch is in good enough
shape to commit for wider testing.

Alexander Korotkov, reviewed by Noah Misch and Nathan Boley

Jan 27, 2012

Hide most variable-length fields from Form_pg_* structs · 8137f2c3

Peter Eisentraut authored 13 years ago

Those fields only appear in the structs so that genbki.pl can create
the BKI bootstrap files for the catalogs.  But they are not actually
usable from C.  So hiding them can prevent coding mistakes, saves
stack space, and can help the compiler.

In certain catalogs, the first variable-length field has been kept
visible after manual inspection.  These exceptions are noted in C
comments.

reviewed by Tom Lane

Jan 02, 2012
- Update copyright notices for year 2012. · e126958c
  Bruce Momjian authored 13 years ago
  
Feb 18, 2011

Fix tsmatchsel() to account properly for null rows. · 52b60530

Tom Lane authored 14 years ago

ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea.  But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents.  Back-patch to 8.4 where this stuff was
introduced.
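
The correction amounts to one multiplication, sketched here in Python with hypothetical names and figures. It is an illustration of the reasoning above, not the code in ts_selfuncs.c.

```python
def match_selectivity(freq_among_non_null: float, null_frac: float) -> float:
    """Convert a frequency measured over non-null rows into one over all rows.

    A null document can never match, so only the non-null share of the
    table can contribute matching rows.
    """
    return freq_among_non_null * (1.0 - null_frac)


# Hypothetical column: 75% of documents are null, and the lexeme appears
# in 40% of the non-null ones.
uncorrected = 0.4
corrected = match_selectivity(0.4, null_frac=0.75)
print(uncorrected, corrected)
```

Using the non-null frequency directly would predict four times as many matching rows as the corrected figure in this example.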

Jesper Krogh

Jan 01, 2011
- Stamp copyrights for year 2011. · 5d950e3b
  Bruce Momjian authored 14 years ago
  
Sep 20, 2010
- Remove cvs keywords from all files. · 9f2e2113
  Magnus Hagander authored 14 years ago
  
Jan 05, 2010

Get rid of the need for manual maintenance of the initial contents of · 64737e93

Tom Lane authored 15 years ago

pg_attribute, by having genbki.pl derive the information from the various
catalog header files. This greatly simplifies modification of the
"bootstrapped" catalogs.

This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on
Perl scripts for those build steps. To avoid creating a Perl build dependency
where there was not one before, the output files generated by these scripts
are now treated as distprep targets, i.e., they will be built and shipped in
tarballs. But you will need a reasonably modern Perl (probably at least
5.6) if you want to build from a CVS pull.

The changes to the MSVC build process are untested, and may well break ---
we'll soon find out from the buildfarm.

John Naylor, based on ideas from Robert Haas and others

Jan 02, 2010
- Update copyright for the year 2010. · 02398008
  Bruce Momjian authored 15 years ago
  
Dec 29, 2009

Add the ability to store inheritance-tree statistics in pg_statistic, · 649b5ec7

Tom Lane authored 15 years ago

and teach ANALYZE to compute such stats for tables that have subclasses.
Per my proposal of yesterday.

autovacuum still needs to be taught about running ANALYZE on parent tables
when their subclasses change, but the feature is useful even without that.

Jun 11, 2009
- 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list · d7471402
  Bruce Momjian authored 15 years ago
  
  provided by Andrew.
Jan 01, 2009
- Update copyright for 2009. · 511db38a
  Bruce Momjian authored 16 years ago
  
Sep 19, 2008
- Create a selectivity estimation function for the text search @@ operator. · 4e57668d
  Tom Lane authored 16 years ago
  
  Jan Urbanski
Jul 14, 2008

Create a type-specific typanalyze routine for tsvector, which collects stats · 6f6d8632

Tom Lane authored 16 years ago

on the most common individual lexemes in place of the mostly-useless default
behavior of counting duplicate tsvectors.  Future work: create selectivity
estimation functions that actually do something with these stats.

(Some other things we ought to look at doing: using the Lossy Counting
algorithm in compute_minimal_stats, and using the element-counting idea for
stats on regular arrays.)
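
What "stats on the most common individual lexemes" could look like is sketched below with an exact count. This is an illustration only, with hypothetical names and data: it is not the algorithm in the tsvector typanalyze routine, and it is not Lossy Counting, which bounds memory by counting approximately.

```python
from collections import Counter
from typing import Optional


def most_common_lexemes(
    documents: list[Optional[list[str]]], k: int
) -> list[tuple[str, float]]:
    """Return the k most common lexemes with their per-document frequency.

    Each document is a list of lexemes, or None for a null row. Frequencies
    are fractions of the non-null documents containing the lexeme.
    """
    non_null = [doc for doc in documents if doc is not None]
    if not non_null:
        return []
    counts: Counter = Counter()
    for doc in non_null:
        counts.update(set(doc))  # count each lexeme once per document
    total = len(non_null)
    return [(lexeme, n / total) for lexeme, n in counts.most_common(k)]


# Hypothetical sample: three non-null documents and one null row.
sample = [["cat", "dog"], None, ["cat"], ["cat", "cat", "fish"]]
print(most_common_lexemes(sample, 1))
```

Counting whole tsvector values instead would treat all three non-null documents in this sample as distinct, which is why the default behavior is described as mostly useless.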

Jan Urbanski

Mar 27, 2008

Reduce the need for frontend programs to include "postgres.h" by refactoring · 039dfbfd

Tom Lane authored 16 years ago

inclusions in src/include/catalog/*.h files.  The main idea here is to push
function declarations for src/backend/catalog/*.c files into separate headers,
rather than sticking them into the corresponding catalog definition file as
has been done in the past.  This commit only carries out that idea fully for
pg_proc, pg_type and pg_conversion, but that's enough for the moment ---
if pg_list.h ever becomes unsafe for frontend code to include, we'll need
to work a bit more.

Zdenek Kotala

Jan 01, 2008
- Update copyrights in source tree to 2008. · 9098ab9e
  Bruce Momjian authored 17 years ago
  
May 08, 2007
- Reserve some pg_statistic "kind" codes for use by the ESRI ST_Geometry · 5b7cf08d
  Tom Lane authored 17 years ago
  
  datatype project. Per request from Ale Raza (araza at esri.com).
Jan 05, 2007
- Update CVS HEAD for 2007 copyright. Back branches are typically not · 29dccf5f
  Bruce Momjian authored 18 years ago
  
  back-stamped for this.
Mar 05, 2006
- Update copyright for 2006. Update scripts. · f2f5b056
  Bruce Momjian authored 19 years ago
  
Oct 15, 2005
- Standard pgindent run for 8.1. · 1dc34982
  Bruce Momjian authored 19 years ago
  
Apr 14, 2005

First phase of project to use fixed OIDs for all system catalogs and · 7c13781e

Tom Lane authored 19 years ago

indexes.  Extend the macros in include/catalog/*.h to carry the info
about hand-assigned OIDs, and adjust the genbki script and bootstrap
code to make the relations actually get those OIDs.  Remove the small
number of RelOid_pg_foo macros that we had in favor of a complete
set named like the catname.h and indexing.h macros.  Next phase will
get rid of internal use of names for looking up catalogs and indexes;
but this completes the changes forcing an initdb, so it looks like a
good place to commit.
Along the way, I made the shared relations (pg_database etc) not be
'bootstrap' relations any more, so as to reduce the number of hardwired
entries and simplify changing those relations in future.  I'm not
sure whether they ever really needed to be handled as bootstrap
relations, but it seems to work fine to not do so now.

Dec 31, 2004

· 2ff50159

PostgreSQL Daemon authored 20 years ago

Tag appropriate files for rc3

Also performed an initial run through of upgrading our Copyright date to
extend to 2005 ... first run here was very simple ... change everything
where: grep 1996-2004 && the word 'Copyright' ... scanned through the
generated list with 'less' first, and after, to make sure that I only
picked up the right entries ...

Aug 29, 2004
- Pgindent run for 8.0. · b6b71b85
  Bruce Momjian authored 20 years ago
  
- Update copyright to 2004. · da9a8649
  Bruce Momjian authored 20 years ago
  
Feb 24, 2004
- Fix obsolete comment. · 3f2cf812
  Tom Lane authored 21 years ago
  
Feb 13, 2004
- Add hooks for type-specific calculation of ANALYZE statistics. Idea and · 69946411
  Tom Lane authored 21 years ago
  
  coding by Mark Cave-Ayland, some kibitzing by Tom Lane. initdb forced due to new column in pg_type.
Nov 29, 2003

· 55b11325

PostgreSQL Daemon authored 21 years ago

make sure the $Id tags are converted to $PostgreSQL as well ...

Aug 04, 2003
- Update copyrights to 2003. · f3c3deb7
  Bruce Momjian authored 21 years ago
  
- pgindent run. · 089003fb
  Bruce Momjian authored 21 years ago
  
Mar 23, 2003

Instead of storing pg_statistic stavalues entries as text strings, store · 8d9e025e

Tom Lane authored 21 years ago

them as arrays of the internal datatype. This requires treating the
stavalues columns as 'anyarray' rather than 'text[]', which is not 100%
kosher but seems to work fine for the purposes we need for pg_statistic.
Perhaps in the future 'anyarray' will be allowed more generally.
