This commit replaces the `Name(pub String)` struct with a `Name` enum that
explicitly models how the name appeared in the source: either as an
unquoted identifier (`Ident`) or as a quoted string (`Quoted`).
In the process, the separate `Id` wrapper type has been coalesced into the
`Name` enum, simplifying the AST and reducing duplication in identifier
handling logic.
While this increases the size of some AST nodes (notably `yyStackEntry`),
it improves correctness and makes source structure more explicit for
later phases.
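As a rough illustration of the shape described above, here is a minimal sketch of such an enum. The variant names `Ident` and `Quoted` come from this commit message; the `String` payloads, the choice to keep the quotes in the stored text, and the helpers `from_token` and `as_source` are assumptions made for the example, not the actual parser code.

```rust
/// How a name was written in the SQL source.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Name {
    /// Unquoted identifier, e.g. `users`.
    Ident(String),
    /// Quoted name, stored here together with its quotes (an assumption).
    Quoted(String),
}

impl Name {
    /// Hypothetical helper: classify raw token text by its first character.
    pub fn from_token(raw: &str) -> Name {
        match raw.chars().next() {
            Some('"') | Some('`') | Some('[') | Some('\'') => Name::Quoted(raw.to_string()),
            _ => Name::Ident(raw.to_string()),
        }
    }

    /// Hypothetical helper: the text exactly as it appeared in the source.
    pub fn as_source(&self) -> &str {
        match self {
            Name::Ident(s) | Name::Quoted(s) => s.as_str(),
        }
    }
}
```

Because the quoting style is part of the value, later phases can tell `users` apart from `"users"` without re-inspecting the original text.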
This PR adds support for the `IntegrityCk` instruction, which performs an
integrity check on the contents of a single table. In the next PR I will try
to implement the rest of the integrity check, where we would check that
indexes contain the correct amount of data, among other things.
<img width="1151" alt="image" src="https://github.com/user-attachments/assets/29d54148-55ba-480f-b972-e38587f0a483" />
Closes #1719
This PR is a drop-in replacement for the Predicate defined in the
Simulator. Predicate is basically the same as our ast::Expr, but it
supports only a small subset of the SQL expression syntax. By creating a
newtype that wraps ast::Expr we can tap into our already mostly
correctly defined parser structs. This change will enable us to easily
add generation for more types of SQL queries.
I also added an ArbitraryFrom impl for ast::Expr that can be used in a
freestyle way (for now) for differential testing.
This PR also aims to implement unary operator logic similar to the
binary operator logic we have for predicates. After this change we may
need to adjust the logic for how some assertions are triggered.
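The newtype idea can be sketched as follows. This is only an illustration: the `Expr` enum below is a tiny stand-in for the real `ast::Expr`, and the `not` helper and `Deref` impl are hypothetical, chosen to show how a wrapper can add simulator-specific helpers (such as applying a unary operator) while still exposing the wrapped expression.

```rust
use std::ops::Deref;

/// Stand-in for the parser's expression type (NOT the real `ast::Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(i64),
    Column(String),
    /// Unary operator.
    Not(Box<Expr>),
    /// Binary operator.
    Gt(Box<Expr>, Box<Expr>),
}

/// Newtype wrapper: a predicate is just an expression with extra helpers.
#[derive(Debug, Clone, PartialEq)]
pub struct Predicate(pub Expr);

impl Predicate {
    /// Hypothetical helper: wrap the predicate in a unary NOT.
    pub fn not(self) -> Predicate {
        Predicate(Expr::Not(Box::new(self.0)))
    }
}

impl Deref for Predicate {
    type Target = Expr;

    /// Lets code that expects `&Expr` read through the wrapper.
    fn deref(&self) -> &Expr {
        &self.0
    }
}
```

With this shape, generation code can build a `Predicate` and anything that already accepts an expression reference can consume it unchanged.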
<s>Sometimes the `Select-Select-Optimizer` property thinks that these
two queries should return the same thing:
```sql
SELECT (twinkling_winstanley.sensible_federations > x'66616e7461737469625e0f37879823db' AND twinkling_winstanley.sincere_niemeyer < -7428368947470022783) FROM twinkling_winstanley WHERE 1;
SELECT * FROM twinkling_winstanley WHERE twinkling_winstanley.sensible_federations > x'66616e7461737469625e0f37879823db' AND twinkling_winstanley.sincere_niemeyer < -7428368947470022783;
```
However, after running the shrunk plan manually, the simulator was
incorrect in asserting that. Maybe this is a bug in the generation of
such a query? Not sure yet. </s>
<b>EDIT: The simulator was correctly catching a bug and I thought I was
the problem. The bug was in `exec_if` and I fixed it in this PR.</b>
I still need to expand the unary operator generation to other types of
predicates. For now, I just implemented it for `SimplePredicate`, as I'm
trying to avoid bloating this PR even more.
<b>EDIT: I decided to just have one PR open for all the changes I'm
making to make my life a bit easier and to avoid merge conflicts with my
own branches that I keep spawning for new code.</b>
PS: This should only be considered for merging after
https://github.com/tursodatabase/limbo/pull/1619 is merged. Then, I will
remove the draft status from this PR.
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Closes #1674
This commit introduces AUTOVACUUM to Limbo. It introduces the concept of ptrmap pages and also adds some additional instructions that are required to make the AUTOVACUUM PRAGMA work.
Currently we have this:
`program.alloc_cursor_id(Option<String>, CursorType)`
where the String is the table's name or alias ('users' or 'u' in
the query).
This is problematic because this can happen:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.
Instead introduce `CursorKey`, which is a combination of:
1. `TableInternalId`, and
2. index name (`Option<String>`), in case of index cursors.
This should provide key uniqueness for cursors:
`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`
here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
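A minimal sketch of the keying scheme, assuming a simple map from key to cursor id. `CursorKey` and `TableInternalId` are named in this message; the field names, the `CursorAllocator` type, and the `resolve` method are hypothetical and only illustrate why the keys no longer collide.

```rust
use std::collections::HashMap;

/// Stable identifier for one table reference in a plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableInternalId(pub usize);

/// Key for a cursor: which table reference, and optionally which index.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CursorKey {
    pub table_reference_id: TableInternalId,
    pub index_name: Option<String>,
}

pub type CursorId = usize;

/// Hypothetical allocator standing in for the program builder.
#[derive(Default)]
pub struct CursorAllocator {
    next_id: CursorId,
    by_key: HashMap<CursorKey, CursorId>,
}

impl CursorAllocator {
    /// Allocate a fresh cursor id and remember it under `key`.
    pub fn alloc_cursor_id(&mut self, key: CursorKey) -> CursorId {
        let id = self.next_id;
        self.next_id += 1;
        self.by_key.insert(key, id);
        id
    }

    /// Look up the cursor previously allocated for `key`.
    pub fn resolve(&self, key: &CursorKey) -> Option<CursorId> {
        self.by_key.get(key).copied()
    }
}
```

In the `EXISTS` example, the outer and inner `t` would get different `TableInternalId` values, so each resolves to its own cursor even though both share the name `t`.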
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:
```rust
Expr::Column { table: 0, column: 3, .. }
```
refers to the 0'th table in the `table_references` list.
This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.
If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:
```sql
-- Assume tables: users(id, age), orders(user_id, amount)
-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
(SELECT user_id, SUM(amount) as total
FROM orders o
WHERE o.amount > 100
GROUP BY o.user_id) sub
WHERE u.id = sub.user_id
-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:
SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;
-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```
We could of course traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.
Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid`, so that these kinds of query
transformations can happen with less pain.
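To make the difference concrete, here is a small sketch of resolving a column reference by stable id rather than by list position. Only `TableInternalId` and `TableReference` are names from this message; the struct fields, `ColumnRef`, `resolve`, and `fold_subquery` are simplified, hypothetical stand-ins for the real plan types.

```rust
/// Stable identifier assigned once per table reference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TableInternalId(pub usize);

#[derive(Debug, Clone, PartialEq)]
pub struct TableReference {
    pub internal_id: TableInternalId,
    pub name: String,
}

/// Simplified stand-in for `Expr::Column`: refers to a table by id, not position.
#[derive(Debug, Clone, Copy)]
pub struct ColumnRef {
    pub table: TableInternalId,
    pub column: usize,
}

/// Find the table a column refers to, regardless of where it sits in the list.
pub fn resolve<'a>(refs: &'a [TableReference], col: &ColumnRef) -> Option<&'a TableReference> {
    refs.iter().find(|t| t.internal_id == col.table)
}

/// Hypothetical unnesting step: fold a subquery's tables into the parent list.
pub fn fold_subquery(parent: &mut Vec<TableReference>, subquery: Vec<TableReference>) {
    parent.extend(subquery);
}
```

After `fold_subquery`, `orders` sits at position 1 in the parent list, but a `ColumnRef` carrying its id still resolves to it with no expression rewriting.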
Easy implementation: SQLite documents this pragma as a no-op now:
"This pragma no longer functions. It has become a no-op. The capabilities
formerly provided by PRAGMA legacy_file_format are now available using
the SQLITE_DBCONFIG_LEGACY_FILE_FORMAT option to the sqlite3_db_config()
C-language interface."