Closes#3470
## Background
In a query like `SELECT * FROM t LEFT JOIN s ON t.a=s.a WHERE s.a =
'foo'` we can remove the LEFT JOIN and replace it with an `INNER JOIN`
because NULL values will never be equal to 'foo'. Rewriting as `INNER
JOIN` allows the optimizer to also reorder the table join order to come
up with a more efficient query plan. In fact, we have this optimization
already.
## Problem
However, there is a dumb bug where `WhereTerm`s involving this join
still retain their `from_outer_join` state, resulting in forcing the
evaluation of those terms at the original join index, which results in
completely wrong bytecode if the join optimizer decides to reorder the
join as `s JOIN t` instead. Effectively it will evaluate `t.a=s.a` after
table `s` is open but table `t` is not open yet.
## Fix
This PR fixes that issue by clearing `from_outer_join` properly from the
relevant `WhereTerm`s.
Closes#3475
There were 2 problems:
1. The SELECT wasn't propagating which register it used for its results,
so sometimes the INSERT read bad data.
2. `TableReferences::contains_table` was only checking the top-level
tables, not the nested tables in FROM queries. This condition is used to
emit "template 4", the bytecode template for self-inserts.
Closes https://github.com/tursodatabase/turso/issues/3312
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3436
**Handle table ID / rootpages properly for both checkpointed and non-
checkpointed tables**
Table ID is an opaque identifier that is only meaningful to the MV
store.
Each checkpointed MVCC table corresponds to a single B-tree on the
pager,
which naturally has a root page.
**We cannot use root page as the MVCC table ID directly because:**
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
**Hence:**
- MVCC table ids are always negative
- sqlite_schema rows will have a negative rootpage column if the
table has not been checkpointed yet.
- on checkpoint when the table is allocated a real root page, we update
the row in sqlite_schema and in MV store's internal mapping
**On recovery:**
- All sqlite_schema tables are read directly from disk and assigned
`table_id = -1 * root_page` -- root_page on disk must be positive
- Logical log is deserialized and inserted into MV store
- Schema changes from logical_log are captured into the DB's global
schema
**Note about recovery:**
I changed MVCC recovery to happen on DB initialization which should
prevent any races, so no need for `recover_lock`, right @pereman2 ?
Closes#3419
Closes#2470
In a query like `SELECT * FROM t LEFT JOIN s ON t.a=s.a WHERE s.a = 'foo'` we can
remove the LEFT JOIN because NULL values will be equal to 'foo'. In fact, we have
this optimization already.
However, there was a dumb bug where `WhereTerm`s involving this join still retained
their `from_outer_join` state, resulting in forcing the evaluation of those terms
at the original join index, which results in completely wrong bytecode if the join
optimizer decides to reorder the join as `s JOIN t` instead. Effectively it will
evaluate `t.a=s.a` after table `s` is open but table `t` is not open yet.
This PR fixes that issue by clearing `from_outer_join` properly from the relevant
`WhereTerm`s.
This PR bundles 2 fixes:
1. Index search must skip NULL values
2. UPDATE must avoid using index which column is used in the SET clause
* This was an optimization to not do full scan in case of `UPDATE t
SET ... WHERE col = ?` but instead of doing this hacks we must properly
load updated row set to the ephemeral index and flush it after update
will be finished instead of modifying BTree inplace
* So, for now we completely remove this optimization and quitely
wait for proper optimization to land
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3459
Table ID is an opaque identifier that is only meaningful to the MV store.
Each checkpointed MVCC table corresponds to a single B-tree on the pager,
which naturally has a root page.
We cannot use root page as the MVCC table ID directly because:
- We assign table IDs during MVCC commit, but
- we commit pages to the pager only during checkpoint
which means the root page is not easily knowable ahead of time.
Hence, we:
- store the mapping between table id and btree rootpage
- sqlite_schema rows will have a negative rootpage column if the
table has not been checkpointed yet.
This PR auto-assign ids for anonymous variables straight into parser.
Otherwise - it's pretty easy to mess up with traversal order in the core
code and assign ids incorrectly.
For example, before the fix, following code worked incorrectly because
parameter values were assigned first to conflict clause instead of
values:
```rs
let mut stmt = conn.prepare("INSERT INTO test VALUES (?, ?), (?, ?) ON CONFLICT DO UPDATE SET v = ?")?;
stmt.bind_at(1.try_into()?, Value::Integer(1));
stmt.bind_at(2.try_into()?, Value::Integer(20));
stmt.bind_at(3.try_into()?, Value::Integer(3));
stmt.bind_at(4.try_into()?, Value::Integer(40));
stmt.bind_at(5.try_into()?, Value::Integer(66));
```
Closes#3455
- even if index search will return only 1 row - it will call next in the loop - and we incorrecty can process same row values multiple times
- the following query failed with this optimization:
turso> CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, k TEXT, c0 INT);
turso> CREATE UNIQUE INDEX idx_p1_0 ON t(c0);
turso> insert into t values (null, 'uu', -1);
turso> insert into t values (null, 'uu', -2);
turso> UPDATE t SET c0 = NULL WHERE c0 = -1;
turso> SELECT * FROM t
┌────┬────┬────┐
│ id │ k │ c0 │
├────┼────┼────┤
│ 1 │ uu │ │
├────┼────┼────┤
│ 2 │ uu │ │
└────┴────┴────┘
SQLite supports complex expressions in group by columns - because of
course it does...
So we need to make sure that a column is created for this expression if
it doesn't exist already, and compute it, the same way we compute pre-
projections in the filter operator.
Fixes#3363Fixes#3366Fixes#3365Closes#3429
MVCC does currently not support indexes. Therefore,
- Fail if a database with indexes is opened with MVCC
- Disallow `CREATE INDEX` when MVCC is enabled
Fixes: #3108Closes#3426
SQLite supports complex expressions in group by columns - because of
course it does...
So we need to make sure that a column is created for this expression if
it doesn't exist already, and compute it, the same way we compute
pre-projections in the filter operator.
Fixes#3363Fixes#3366Fixes#3365
MVCC is like the annoying younger cousin (I know because I was him) that
needs to be treated differently. MVCC requires us to use root_pages that
might not be allocated yet, and the plan is to use negative root_pages
for that case. Therefore, we need i64 in order to fit this change.
MVCC does currently not support indexes. Therefore,
- Fail if a database with indexes is opened with MVCC
- Disallow `CREATE INDEX` when MVCC is enabled
Fixes: #3108
This PR moves part of string normalization to the parser layer.
Now, we dequote and unescape values in the parser, but we still need to
lowercase them for proper ignore-case comparison logic in the planner.
The reason to not lowercase in the parser is following:
1. SQLite (and tursodb) have ident->string conversion rule and by
lowercasing value early we will loose original representation
2. Some things like column names are preserve the case right now and we
better to not change this behaviour.
Closes#3344
- before, we validated that condition during program emit - which works for fixed values of parameters but doesn't work with variables provided externally to the prepared statement
closes#3282
includes minor refactor, removing `column_is_rowid_alias`, which is only
checking the public field of the argument Column.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes#3385