Overview of the current state of the query optimizer in Limbo
Query optimization is obviously an important part of any SQL-based database engine. This document is an overview of what we currently do.
Structure of the optimizer directory
-
mod.rs- Provides the high-level optimization interface through
optimize_plan()
- Provides the high-level optimization interface through
-
access_method.rs- Determines what is the best index to use when joining a table to a set of preceding tables
-
constraints.rs- Manages query constraints:- Extracts constraints from the WHERE clause
- Determines which constraints are usable with indexes
-
cost.rs- Calculates the cost of doing a seek vs a scan, for example
-
join.rs- Implements the System R style dynamic programming join ordering algorithm
-
order.rs- Determines if sort operations can be eliminated based on the chosen access methods and join order
Join reordering and optimal index selection
The goals of query optimization are at least the following:
- Do as little page I/O as possible
- Do as little CPU work as possible
- Retain query correctness.
The most important ways to achieve no. 1 and no. 2 are:
- Choose the optimal access method for each table (e.g. an index or a rowid-based seek, or a full table scan if all else fails).
- Choose the best or near-best way to reorder the tables in the query so that those optimal access methods can be used.
- Also factor in whether the chosen join order and indexes allow removal of any sort operations that would otherwise be necessary for query correctness.
Limbo's optimizer
Limbo's optimizer is an implementation of an extremely traditional IBM System R style optimizer, i.e. straight from the 70s! The DP algorithm is explained below.
Current high-level flow of the optimizer
- SQL rewriting
  - Rewrite certain SQL expressions to another form (not a lot currently; e.g. rewrite BETWEEN as two comparisons)
  - Eliminate constant conditions: e.g. WHERE 1 is removed, WHERE 0 short-circuits the whole query because it is trivially false.
- Check whether there is an "interesting order" that we should consider when evaluating indexes and join orders
  - Is there a GROUP BY? An ORDER BY? Both?
- Convert WHERE clause conjuncts to Constraints
  - E.g. in WHERE t.x = 5, the expression 5 constrains table t to values of x that are exactly 5.
  - E.g. in WHERE t.x = u.x, the expression u.x constrains t, AND t.x constrains u.
  - Per table, each constraint has an estimated selectivity (how much it filters the result set); this affects join order calculations, see the paragraph on Estimation below.
  - Per table, constraints are also analyzed for whether one or multiple of them can be used as an index seek key to avoid a full scan.
- Compute the best join order using a dynamic programming algorithm (a rough sketch in code follows after this list):
  - n = number of tables considered
  - n=1: find the lowest cost way to access each single table, given the constraints of the query. Memoize the result.
  - n=2: for each table found in the n=1 step, find the best way to join that table with each other table. Memoize the result.
  - n=3: for each 2-table subset found, find the best way to join that result to each other table. Memoize the result.
  - n=m: for each m-1 table subset found, find the best way to join that result to the m'th table.
  - Use pruning to reduce the search space:
    - Compute the literal query order first, and store its cost as an upper threshold. In some cases it is not possible to compute this upper threshold from the literal order, for example when table-valued functions are involved and their arguments reference tables that appear to the right in the join order. In such situations the literal order cannot be executed directly, so no meaningful cost can be assigned; the threshold is then set to infinity, ensuring that valid plans are still considered.
    - If at any point a considered join order exceeds the upper threshold, discard that search path since it cannot be better than the current best.
      - For example, say we have SELECT * FROM a JOIN b JOIN c JOIN d. Compute JOIN(a,b,c,d) first. If JOIN(b,a) is already worse than JOIN(a,b,c,d), we don't even have to try JOIN(b,a,c).
  - Also keep track of the best plan per subset:
    - If we find that JOIN(b,a,c) is better than any other permutation of the same tables, e.g. JOIN(a,b,c), then we can discard ALL of the other permutations for that subset. For example, we don't need to consider JOIN(a,b,c,d) because we know it's worse than JOIN(b,a,c,d).
    - This is possible due to the associativity and commutativity of INNER JOINs.
  - Also keep track of the best ordered plan, i.e. one that provides the "interesting order" mentioned above.
  - At the end, apply a sorting cost penalty to the best overall plan:
    - If it is now worse than the best ordered plan, then choose the ordered plan as the best plan for the query.
      - This allows us to eliminate a sorting operation.
    - If the best overall plan is still best even with the sorting penalty, then keep it. A sorting operation is later applied to sort the rows according to the desired order.
- Mutate the plan's join_order and Operations to match the computed best plan.
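To make the dynamic programming step above concrete, here is a minimal, self-contained sketch of subset-based memoization with upper-bound pruning. It is not Limbo's actual implementation: the type names (TableEstimate, Plan), the JOIN_SELECTIVITY constant, and the numbers in main are invented for illustration only.

```rust
// Sketch of the subset DP described above: memoize the best plan per table
// subset, extend (m-1)-table subsets by one table, prune against an upper bound.
use std::collections::HashMap;

/// Cheapest single-table access method: its cost and estimated output rows.
#[derive(Clone, Copy)]
struct TableEstimate {
    access_cost: f64,
    output_rows: f64,
}

/// Best plan found so far for one subset of tables (subset encoded as a bitmask).
#[derive(Clone)]
struct Plan {
    join_order: Vec<usize>,
    cost: f64,
    output_rows: f64,
}

/// Placeholder for the "magic constant" selectivity of a join predicate.
const JOIN_SELECTIVITY: f64 = 0.001;

/// JOIN_COST = cost(lhs) + lhs.output_rows * access_cost(rhs)
fn join_cost(lhs: &Plan, rhs: TableEstimate) -> f64 {
    lhs.cost + lhs.output_rows * rhs.access_cost
}

fn best_join_order(tables: &[TableEstimate], upper_bound: f64) -> Option<Plan> {
    let n = tables.len();
    let mut best: HashMap<u32, Plan> = HashMap::new();

    // n = 1: lowest-cost way to access each single table. Memoize.
    for (i, t) in tables.iter().enumerate() {
        best.insert(
            1u32 << i,
            Plan { join_order: vec![i], cost: t.access_cost, output_rows: t.output_rows },
        );
    }

    // n = 2..=m: extend each memoized (m-1)-table plan with one more table.
    for size in 2..=n {
        let smaller: Vec<(u32, Plan)> = best
            .iter()
            .filter(|(mask, _)| mask.count_ones() as usize == size - 1)
            .map(|(mask, plan)| (*mask, plan.clone()))
            .collect();
        for (mask, lhs) in smaller {
            for (i, t) in tables.iter().enumerate() {
                if mask & (1u32 << i) != 0 {
                    continue; // table is already part of this subset
                }
                let cost = join_cost(&lhs, *t);
                if cost > upper_bound {
                    continue; // prune: cannot beat the current best plan
                }
                let subset = mask | (1u32 << i);
                // Keep only the best plan per subset (INNER JOIN is associative
                // and commutative, so the other permutations can be discarded).
                if best.get(&subset).map_or(true, |p| cost < p.cost) {
                    let mut join_order = lhs.join_order.clone();
                    join_order.push(i);
                    let output_rows = lhs.output_rows * t.output_rows * JOIN_SELECTIVITY;
                    best.insert(subset, Plan { join_order, cost, output_rows });
                }
            }
        }
    }
    best.remove(&((1u32 << n) - 1))
}

fn main() {
    // Three made-up tables; the literal query order would normally provide the
    // upper bound, here we pass infinity (no pruning) for brevity.
    let tables = [
        TableEstimate { access_cost: 128.0, output_rows: 6400.0 },
        TableEstimate { access_cost: 160.0, output_rows: 2000.0 },
        TableEstimate { access_cost: 2.0, output_rows: 100.0 },
    ];
    if let Some(plan) = best_join_order(&tables, f64::INFINITY) {
        println!("best join_order: {:?}, cost: {}", plan.join_order, plan.cost);
    }
}
```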
Estimation of cost and cardinalities + a note on table statistics
Currently, in the absence of ANALYZE, sqlite_stat1 etc. we assume the following:
- Each table has 1,000,000 rows.
- Each equality (=) filter will filter out some percentage of the result set.
- Each non-equality filter (e.g. >) will filter out some smaller percentage of the result set.
- Each 4096-byte database page holds 50 rows, i.e. roughly 80 bytes per row.
- Sort operations have some CPU cost dependent on the number of input rows to the sort operation.
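As a rough sketch only, the defaults above could be written as constants like the following. The names and the two selectivity values are hypothetical placeholders, not Limbo's actual constants; the row count and page geometry come straight from the list above.

```rust
// Illustrative defaults for the no-statistics case described above.
const DEFAULT_TABLE_ROWS: f64 = 1_000_000.0; // rows assumed per table
const PAGE_SIZE_BYTES: f64 = 4096.0;
const ROWS_PER_PAGE: f64 = 50.0; // i.e. roughly 80 bytes per row
const EQ_SELECTIVITY: f64 = 0.01; // placeholder: fraction kept by an '=' filter
const RANGE_SELECTIVITY: f64 = 0.25; // placeholder: fraction kept by '>' etc.

fn main() {
    println!("bytes per row: {}", PAGE_SIZE_BYTES / ROWS_PER_PAGE); // ~82
    println!("rows after '=': {}", DEFAULT_TABLE_ROWS * EQ_SELECTIVITY);
    println!("rows after '>': {}", DEFAULT_TABLE_ROWS * RANGE_SELECTIVITY);
}
```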
From the above, we derive the following formula for estimating the cost of joining t1 with t2:
JOIN_COST = PAGE_IO(t1.rows) + t1.rows * PAGE_IO(t2.rows)
For example, let's take the query SELECT * FROM t1 JOIN t2 USING(foo) WHERE t2.foo > 10. Let's assume the following:
- t1 has 6400 rows and t2 has 8000 rows
- there are no indexes at all
- let's ignore the CPU cost from the equation for simplicity.
The best access method for both is a full table scan. The output cardinality of t1 is the full table, because nothing is filtering it. Hence, the cost of t1 JOIN t2 becomes:
JOIN_COST = PAGE_IO(t1.input_rows) + t1.output_rows * PAGE_IO(t2.input_rows)
// plugging in the values:
JOIN_COST = PAGE_IO(6400) + 6400 * PAGE_IO(8000)
JOIN_COST = 128 + 6400 * 160 = 1024128
Now let's consider t2 JOIN t1. The best access method for both is still a full scan, but since we can filter on t2.foo > 10, its output cardinality decreases. Let's assume only 1/4 of the rows of t2 match the condition t2.foo > 10. Hence, the cost of t2 join t1 becomes:
JOIN_COST = PAGE_IO(t2.input_rows) + t2.output_rows * PAGE_IO(t1.input_rows)
// plugging in the values:
JOIN_COST = PAGE_IO(8000) + 1/4 * 8000 * PAGE_IO(6400)
JOIN_COST = 160 + 2000 * 128 = 256160
Even though t2 is the larger table, this join order is dramatically cheaper, because we were able to reduce the input set to the join operation.
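The same arithmetic as a small standalone snippet, assuming the 50-rows-per-page default above; the function names are made up for illustration and are not Limbo's API:

```rust
// Recomputes the two join orders from the example above.
const ROWS_PER_PAGE: f64 = 50.0;

/// Estimated number of pages read when scanning `rows` rows.
fn page_io(rows: f64) -> f64 {
    (rows / ROWS_PER_PAGE).ceil()
}

/// JOIN_COST = PAGE_IO(outer.input_rows) + outer.output_rows * PAGE_IO(inner.input_rows)
fn join_cost(outer_input: f64, outer_output: f64, inner_input: f64) -> f64 {
    page_io(outer_input) + outer_output * page_io(inner_input)
}

fn main() {
    let (t1_rows, t2_rows) = (6400.0, 8000.0);
    // t1 JOIN t2: nothing filters t1, so all 6400 rows probe t2.
    println!("t1 JOIN t2: {}", join_cost(t1_rows, t1_rows, t2_rows)); // 128 + 6400 * 160
    // t2 JOIN t1: t2.foo > 10 is assumed to keep 1/4 of t2's rows.
    println!("t2 JOIN t1: {}", join_cost(t2_rows, t2_rows / 4.0, t1_rows)); // 160 + 2000 * 128
}
```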
Statistics
Since we don't support ANALYZE, nor can we assume that users will call ANALYZE anyway, we use simple magic constants to estimate the selectivity of join predicates, row count of tables, and so on. When we have support for ANALYZE, we should plug the statistics from sqlite_stat1 and friends into the optimizer to make more informed decisions.
Estimating the output cardinality of a join
The output cardinality (output row count) of an operation is as follows:
OUTPUT_CARDINALITY_JOIN = INPUT_CARDINALITY_RHS * OUTPUT_CARDINALITY_RHS
where
INPUT_CARDINALITY_RHS = OUTPUT_CARDINALITY_LHS
and OUTPUT_CARDINALITY_RHS is the estimated number of matching right-hand side rows per input row.
Example:
SELECT * FROM products p JOIN order_lines o ON p.id = o.product_id
Assuming there are 100 products, i.e. just selecting all products would yield 100 rows:
OUTPUT_CARDINALITY_LHS = 100
INPUT_CARDINALITY_RHS = 100
Assuming p.id = o.product_id will return three order lines for each product:
OUTPUT_CARDINALITY_RHS = 3
OUTPUT_CARDINALITY_JOIN = 100 * 3 = 300
i.e. the join is estimated to return 300 rows, 3 for each product.
Again, in the absence of statistics, we use magic constants to estimate these cardinalities.
Estimating them is important because in multi-way joins the output cardinality of the previous join becomes the input cardinality of the next one.
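As a sketch of that chaining, the following snippet folds per-join estimates together. The shipments table and its per-row estimate are made up to extend the products/order_lines example; the code is illustrative only.

```rust
// Output cardinality of each join becomes the input cardinality of the next.
fn main() {
    // products (100 rows) -> order_lines (3 per product) -> shipments (2 per order line).
    let base_rows = 100.0_f64;
    let per_row_matches = [3.0, 2.0];

    let total = per_row_matches
        .iter()
        .fold(base_rows, |input_cardinality, matches_per_row| {
            // OUTPUT_CARDINALITY_JOIN = INPUT_CARDINALITY_RHS * OUTPUT_CARDINALITY_RHS
            input_cardinality * matches_per_row
        });

    println!("estimated rows: {}", total); // 100 * 3 * 2 = 600
}
```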