Revise generation of hashjoin paths: generate one path per hashjoinable clause, not one path for a randomly-chosen element of each set of clauses with the same join operator. That is, if you wrote SELECT ... WHERE t1.f1 = t2.f2 and t1.f3 = t2.f4, and both '=' ops were the same opcode (say, all four fields are int4), then the old code would consider hashing either on f1=f2 or on f3=f4, but it would *not* consider both possibilities. Boo hiss.

Also, revise estimation of hashjoin costs to include a penalty when the inner join var has a high disbursion, i.e., the most common value is pretty common. This tends to lead to badly skewed hash bucket occupancy and way more comparisons than you'd expect on average. I imagine that the cost calculation still needs tweaking, but at least it generates a more reasonable plan than before on George Young's example.
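To make the two changes concrete, here is a minimal sketch in C. It is not the code from this commit: the `HashClause` struct, both function names, and the penalty formula are hypothetical, and "disbursion" is assumed here to mean the fraction of inner rows holding the most common value. The sketch costs every hashjoinable clause separately and charges more comparisons when the inner var is skewed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one hashjoinable clause, e.g. t1.f1 = t2.f2. */
typedef struct HashClause
{
    const char *name;             /* label only, for illustration */
    double      inner_disbursion; /* assumed: fraction of inner rows
                                   * holding the most common value */
} HashClause;

/*
 * Assumed cost model: each outer row is compared against roughly
 * inner_rows * inner_disbursion tuples, with a floor of one comparison
 * per probe.  The real planner formula may differ.
 */
static double
est_hash_comparisons(double outer_rows, double inner_rows,
                     double inner_disbursion)
{
    double per_probe;

    assert(inner_disbursion >= 0.0 && inner_disbursion <= 1.0);

    per_probe = inner_rows * inner_disbursion;
    if (per_probe < 1.0)
        per_probe = 1.0;
    return outer_rows * per_probe;
}

/*
 * Consider every hashjoinable clause, not just one per join operator,
 * and return the one with the lowest estimated comparison count.
 * Returns NULL when there are no clauses.
 */
static const HashClause *
cheapest_hash_clause(const HashClause *clauses, size_t n,
                     double outer_rows, double inner_rows)
{
    const HashClause *best = NULL;
    double      best_cost = 0.0;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        double cost = est_hash_comparisons(outer_rows, inner_rows,
                                           clauses[i].inner_disbursion);

        if (best == NULL || cost < best_cost)
        {
            best = &clauses[i];
            best_cost = cost;
        }
    }
    return best;
}
```

With two clauses over 1000-row relations, one whose inner var has disbursion 0.5 and one with 0.001, the sketch prefers the second clause. Picking a clause at random, as the old behavior did, could just as easily have hashed on the skewed one.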
Showing 5 changed files:
- src/backend/optimizer/path/costsize.c (47 additions, 29 deletions)
- src/backend/optimizer/path/joinpath.c (134 additions, 70 deletions)
- src/backend/optimizer/util/pathnode.c (11 additions, 10 deletions)
- src/include/optimizer/cost.h (5 additions, 5 deletions)
- src/include/optimizer/pathnode.h (4 additions, 4 deletions)