This PR fixes an unsound usage of
`unsafe { str::from_utf8_unchecked(word) }` in the public function
`keyword_token` in mod.rs.
The function now uses `std::str::from_utf8(word).ok()?` to safely handle
invalid UTF-8, eliminating the unsoundness.
No logic or API changes.
Code compiles and tests pass (where possible).
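A minimal sketch of the safe pattern, for illustration only. The function
name `keyword_token_sketch`, its return type, and the two keywords are
hypothetical stand-ins; the real `keyword_token` signature and keyword table
are not shown in this description.

```rust
/// Hypothetical stand-in for a keyword lookup over raw bytes.
/// Returns `None` for unknown words and for input that is not valid UTF-8.
pub fn keyword_token_sketch(word: &[u8]) -> Option<&'static str> {
    // Checked conversion: invalid UTF-8 becomes `None` via `.ok()?`
    // instead of triggering undefined behavior as the unchecked call could.
    let s = std::str::from_utf8(word).ok()?;
    // Illustrative keyword table (assumption, not the real one).
    match s.to_ascii_uppercase().as_str() {
        "SELECT" => Some("TK_SELECT"),
        "FROM" => Some("TK_FROM"),
        _ => None,
    }
}
```

Since the function already returns an `Option`, callers see invalid bytes as
"not a keyword" rather than a new error case.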
Closes: https://github.com/tursodatabase/libsql/issues/1859
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #1677
This PR adds the beginnings of
[AUTOVACUUM](https://www.sqlite.org/lang_vacuum.html) to Limbo. It adds
a feature flag called `omit_autovacuum` which is analogous to
`SQLITE_OMIT_AUTOVACUUM`. It is off by default, same as SQLite.
It introduces the concept of
[pointer map pages](https://www.sqlite.org/fileformat.html#pointer_map_or_ptrmap_pages)
which are reverse index
pages used to map pages to their parents. This is used to swap pages
(when a table is deleted for instance) to keep root pages clustered at
the beginning of the file. It's also used while creating a table to
ensure that root pages are clustered at the beginning (although, this
isn't completely implemented yet).
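As a rough illustration of how a page is located in the pointer map, the
sketch below follows the layout described in the SQLite file format
documentation: 5-byte entries (one type byte plus a 4-byte parent page
number), with the first ptrmap page at page 2. The function names are
hypothetical, the pending-byte page adjustment is deliberately ignored, and
this is not necessarily how Limbo implements it.

```rust
/// Size of one ptrmap entry: 1 type byte + 4-byte big-endian parent page number.
const PTRMAP_ENTRY_SIZE: u32 = 5;

/// Returns the ptrmap page that covers `page_no`, or `None` for page 1,
/// which has no ptrmap entry. Assumption: pending-byte page is ignored.
pub fn ptrmap_page_for(page_no: u32, usable_size: u32) -> Option<u32> {
    if page_no < 2 {
        return None;
    }
    // Each group is one ptrmap page followed by the pages it describes.
    let pages_per_group = usable_size / PTRMAP_ENTRY_SIZE + 1;
    let group = (page_no - 2) / pages_per_group;
    Some(group * pages_per_group + 2)
}

/// Returns the byte offset of the entry for `page_no` inside its ptrmap page,
/// or `None` if `page_no` is itself a ptrmap page (or page 1).
pub fn ptrmap_entry_offset(page_no: u32, usable_size: u32) -> Option<u32> {
    let map_page = ptrmap_page_for(page_no, usable_size)?;
    if map_page == page_no {
        return None;
    }
    Some(PTRMAP_ENTRY_SIZE * (page_no - map_page - 1))
}
```

With a usable size of 4096 bytes, each ptrmap page holds 819 entries, so
page 2 covers pages 3 through 821 and the next ptrmap page is page 822.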
Finally, it also adds a couple of missing instructions like `Int64` that
are required for `PRAGMA` commands related to `auto_vacuum` settings.
<img width="1512" alt="Screenshot 2025-05-28 at 8 47 51 PM"
src="https://github.com/user-attachments/assets/d52eb74f-5b79-4d52-9401-1bdc2dcc304d" />
Closes #1600
This commit introduces AUTOVACUUM to Limbo. It introduces the concept of ptrmap pages and also adds some additional instructions that are required to make the AUTOVACUUM PRAGMA work.
Currently we have this:
`program.alloc_cursor_id(Option<String>, CursorType)`
where the String is the table's name or alias ('users' or 'u' in
the query).
This is problematic because this can happen:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.
Instead introduce `CursorKey`, which is a combination of:
1. `TableInternalId`, and
2. index name (`Option<String>`), in case of index cursors.
This should provide key uniqueness for cursors:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
Here the first `t` will have a different `TableInternalId` than the
second `t`, so there is no clash.
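A minimal sketch of the idea, assuming a map-backed allocator. Only the
names `CursorKey` and `TableInternalId` come from this description; the
field names, the `CursorAllocator` type, and the method signature are
hypothetical.

```rust
use std::collections::HashMap;

/// Stable identifier for one table reference in a query (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableInternalId(pub usize);

/// Cursor key: the table reference plus an optional index name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CursorKey {
    pub table_id: TableInternalId,
    pub index_name: Option<String>,
}

/// Hypothetical allocator that hands out one cursor id per distinct key.
#[derive(Debug, Default)]
pub struct CursorAllocator {
    cursors: HashMap<CursorKey, usize>,
}

impl CursorAllocator {
    /// Returns the existing cursor id for `key`, or allocates the next one.
    pub fn alloc_cursor_id(&mut self, key: CursorKey) -> usize {
        let next = self.cursors.len();
        *self.cursors.entry(key).or_insert(next)
    }
}
```

Because the outer and inner `t` carry different `TableInternalId` values,
they map to different cursor ids even though they share a table name.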
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:
```rust
Expr::Column { table: 0, column: 3, .. }
```
refers to the 0'th table in the `table_references` list.
This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.
If such a transformation is made, then potentially all of the `Expr::Column`
references to tables will become invalid. Consider this example:
```sql
-- Assume tables: users(id, age), orders(user_id, amount)
-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
(SELECT user_id, SUM(amount) as total
FROM orders o
WHERE o.amount > 100
GROUP BY o.user_id) sub
WHERE u.id = sub.user_id;
-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:
SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;
-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```
We could of course traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.
Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformation can happen with less pain.
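To illustrate why stable identifiers survive such a transformation, here is
a simplified sketch. The struct fields, the trimmed-down `Expr` enum, and
the `resolve` helper are hypothetical; the real types carry more data.

```rust
/// Stable identifier assigned once per table reference (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TableInternalId(pub usize);

/// Simplified table reference; the real struct has more fields.
#[derive(Debug, Clone)]
pub struct TableReference {
    pub internal_id: TableInternalId,
    pub name: String,
}

/// Simplified expression: a column refers to its table by id, not by position.
#[derive(Debug, Clone)]
pub enum Expr {
    Column { table: TableInternalId, column: usize },
}

/// Finds a table reference by its stable id, wherever it sits in the list.
pub fn resolve(tables: &[TableReference], id: TableInternalId) -> Option<&TableReference> {
    tables.iter().find(|t| t.internal_id == id)
}
```

When the subquery's `[orders]` list is appended to the parent's `[users]`
list, an `Expr::Column` that names the id of `orders` still resolves to
`orders`, so no expression rewriting is needed.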