Commit Graph

741 Commits

Author SHA1 Message Date
Zaid Humayun
e994adfb40 Persisting database header and pointer map page to cache
This commit ensures that the metadata in the database header and the pointer map pages allocated are correctly persisted to the page cache. This was not being done earlier.
2025-06-06 23:14:25 +05:30
Zaid Humayun
1f5025541c addresses comment https://github.com/tursodatabase/limbo/pull/1600#discussion_r2115796655 by @jussisaurio
this commit ensures that ptrmap operations return a CursorResult so operation can be suspended & later retried
2025-06-06 23:14:25 +05:30
Zaid Humayun
5827a33517 Beginnings of AUTOVACUUM
This commit introduces AUTOVACUUM to Limbo. It introduces the concept of ptrmap pages and also adds some additional instructions that are required to make AUTOVACUUM PRAGMA work
2025-06-06 23:14:22 +05:30
Jussi Saurio
2087393d22 Merge 'Write database header via normal pager route' from meteorgan
Closes: #1613

Closes #1634
2025-06-04 09:39:14 +03:00
Pekka Enberg
c6ef19396d Merge 'Add support for pragma table-valued functions' from Piotr Rżysko
This PR adds support for table-valued functions for PRAGMAs (see the
[PRAGMA functions section](https://www.sqlite.org/pragma.html)).
Additionally, it introduces built-in table-valued functions. I
considered using extensions for this, but there are several reasons in
favor of a dedicated mechanism:
* It simplifies the use of internal functions, structs, etc. For
example, when implementing `json_each` and `json_tree`, direct access to
internals was necessary:
https://github.com/tursodatabase/limbo/pull/1088
* It avoids FFI overhead. [Benchmarks](https://github.com/piotrrzysko/li
mbo/blob/pragma_vtabs_bench/core/benches/pragma_benchmarks.rs) on my
hardware show that `pragma_table_info()` implemented as an extension is
2.5× slower than the built-in version.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1642
2025-06-04 09:08:10 +03:00
meteorgan
f2bf6251cd write database header via normal pager route 2025-06-03 22:06:08 +08:00
Jussi Saurio
c488c32d43 Merge 'Make cursor seek reentrant' from Pedro Muniz
Closes #1628.  Every function that calls `process_overflow_read` needs
to be reentrant. I did not change it here, but it would include
`get_prev_record` and `get_next_record`. Maybe `tablebtree_move_to` did
not need to use the state machine, but I included it as a safeguard.
Edit: Closes #1625 . When I implemented `restore_context`, I forgot to
add a `return_if_io` after calling it in `next` 🤦‍♂️
Edit: Closes #1617 . Just tested it and it also solves this bug.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1636
2025-06-03 14:24:40 +03:00
Jussi Saurio
5f586b7b24 Merge 'Small tracing enhancement' from Pedro Muniz
Instrument trace_insn to debug print its the stack pc and instruction.
Also, disable rustyline logs for the CLI as it is too noisy to work
with.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1635
2025-06-02 17:41:52 +03:00
Piotr Rzysko
4d35e36b77 Introduce virtual table types 2025-06-01 07:45:57 +02:00
Piotr Rzysko
b291179554 Extract cursor logic from VirtualTable into VirtualTableCursor 2025-06-01 07:45:57 +02:00
Piotr Rzysko
6300deb77f Move VTabOpaqueCursor to vtab module 2025-06-01 07:45:57 +02:00
pedrocarlo
c9c73f2497 fix explain panicking on None CursorKey 2025-05-31 01:19:26 -03:00
pedrocarlo
bc563266b3 add instrumentation to more functions for debugging + adjust how cursors are opened 2025-05-30 20:35:50 -03:00
pedrocarlo
33480540f1 make cursor seek reentrant 2025-05-30 13:25:46 -03:00
pedrocarlo
0757109676 instrument trace_insn 2025-05-30 11:33:22 -03:00
Pere Diaz Bou
d4f1b8e068 update i64::MAX comment 2025-05-30 14:02:05 +02:00
Pere Diaz Bou
da4190a23e Convert u64 rowid to i64
Rowids can be negative, therefore let's swap to i64
2025-05-30 13:07:31 +02:00
Jussi Saurio
819a6138d0 Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio
Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1614
2025-05-30 09:39:37 +03:00
Jussi Saurio
5632a6046e Merge 'Fix: allow DeferredSeek on more than one cursor per program' from Jussi Saurio
Found while fuzzing nested subqueries. Since subqueries result in nested
plans, it quickly revealed that there can be multiple `DeferredSeek`
instructions issued for different cursors, but our `ProgramState` only
supported one at a time.

Closes #1610
2025-05-30 09:39:23 +03:00
Jussi Saurio
f8257df77b Fix: aggregate regs must be initialized as NULL at the start 2025-05-29 18:44:53 +03:00
Jussi Saurio
69133b3b2e Fix: allow DeferredSeek on more than one cursor per program 2025-05-29 16:05:47 +03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
592ba41137 Add assertion forbidding duplicate cursor keys 2025-05-29 01:04:45 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (Option<String> -- in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
Pere Diaz Bou
28bd24b7d4 clear page cache on transaction failure
This is the first step towards rollback, since we still don't spill
pages with WAL, we can simply invalidate page cache in case of failure.
2025-05-28 15:54:28 +02:00
Jussi Saurio
ad0f2bb399 Merge 'Small VDBE insn tweaks' from Jussi Saurio
1. allow calling op_null with Insn::BeginSubrtn
    - BeginSubrtn is identical to Null, but named differently so that
its use in context is clearer
2. Insn::Return: add possibility to fallthrough on non-integer values as
per sqlite spec

Closes #1588
2025-05-27 20:19:31 +03:00
meteorgan
d9d3a5ecbb Use the SetCookie opcode to implement user_version pragma 2025-05-28 00:31:11 +08:00
Jussi Saurio
6914d61180 allow calling op_null with Insn::BeginSubrtn 2025-05-27 19:09:15 +03:00
Jussi Saurio
70965f4b28 Insn::Return: add possibility to fallthrough on non-integer values as per sqlite spec 2025-05-27 19:09:10 +03:00
Jussi Saurio
a88e1c38f3 Merge 'Fix bug: op_vopen should replace cursor slot, not add new one' from Jussi Saurio
Found this when reviewing #1528 locally and this was crashing
```sql
INSERT INTO t SELECT * FROM generate_series(1,10,1);
```
Reason was that `op_vopen` was not replacing the already allocated
cursor slot, but using `.insert()`

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1583
2025-05-27 12:50:11 +03:00
Pere Diaz Bou
312bb5205a Merge 'Reset idx delete state after successful finish' from Pere Diaz Bou
If we don't reset the state of `IdxDelete`, next `IdxDelete` will start
in `Deleting` state which is completely wrong since it should seek from
the start.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1584
2025-05-27 11:31:25 +02:00
Pere Diaz Bou
a5a8a52a07 reset-idx-delete-state 2025-05-27 10:47:21 +02:00
Jussi Saurio
360b1fcdae Fix bug: op_vopen should replace cursor slot, not add new one 2025-05-27 10:52:36 +03:00
Jussi Saurio
b72b99c973 Merge 'feature: INSERT INTO <table> SELECT' from Pedro Muniz
Closes #1528 .
- Modified `translate_select` so that the caller can define if the
statement is top-level statement or a subquery.
- Refactored `translate_insert` to offload the translation of multi-row
VALUES and SELECT statements to `translate_select`
- I did not try to change much of `populate_column_registers` as I did
not want to break `translate_virtual_table_insert`. Ideally, I would
want to unite this remaining logic folding `populate_column_registers`
into `populate_columns_multiple_rows` and the
`translate_virtual_table_insert` into `translate_insert`. But, I think
this may be best suited for a separate PR.
## TODO
- ~Tests~ - *Done*
- ~Need to emit a temp table when we are selecting and inserting into
the Same Table -
https://github.com/sqlite/sqlite/blob/master/src/insert.c#L1369~ -
*Done*
- Optimization when table have the exact same schema - open an Issue
about it
- Virtual Tables do not benefit yet from this feature - open an Issue
about it

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1566
2025-05-27 10:50:26 +03:00
Jussi Saurio
3ba9f2ab97 Small cleanups to pager/wal/vdbe - mostly naming
- Instead of using a confusing CheckpointStatus for many different things,
  introduce the following statuses:
    * PagerCacheflushStatus - cacheflush can result in either:
      - the WAL being written to disk and fsynced
      - but also a checkpoint to the main BD file, and fsyncing the main DB file

      Reflect this in the type.
    * WalFsyncStatus - previously CheckpointStatus was also used for this, even
      though fsyncing the WAL doesn't checkpoint.
    * CheckpointStatus/CheckpointResult is now used only for actual checkpointing.

- Rename HaltState to CommitState (program.halt_state -> program.commit_state)
- Make WAL a non-optional property in Pager
  * This gets rid of a lot of if let Some(...) boilerplate
  * For ephemeral indexes, provide a DummyWAL implementation that does nothing.
- Rename program.halt() to program.commit_txn()
- Add some documentation comments to structs and functions
2025-05-26 10:37:34 +03:00
pedrocarlo
e3fd1e589e support using a INSERT SELECT that references the same table in both statements 2025-05-25 19:15:28 -03:00
Jussi Saurio
7c07c09300 Add stable internal_id property to TableReference
Currently our "table id"/"table no"/"table idx" references always
use the direct index of the `TableReference` in the plan, e.g. in
`SelectPlan::table_references`. For example:

```rust
Expr::Column { table: 0, column: 3, .. }
```

refers to the 0'th table in the `table_references` list.

This is a fragile approach because it assumes the table_references
list is stable for the lifetime of the query processing. This has so
far been the case, but there exist certain query transformations,
e.g. subquery unnesting, that may fold new table references from
a subquery (which has its own table ref list) into the table reference
list of the parent.

If such a transformation is made, then potentially all of the Expr::Column
references to tables will become invalid. Consider this example:

```sql
-- Assume tables: users(id, age), orders(user_id, amount)

-- Get total amount spent per user on orders over $100
SELECT u.id, sub.total
FROM users u JOIN
     (SELECT user_id, SUM(amount) as total
      FROM orders o
      WHERE o.amount > 100
      GROUP BY o.user_id) sub
WHERE u.id = sub.user_id

-- Before subquery unnesting:
-- Main query table_references: [users, sub]
-- u.id refers to table 0, column 0
-- sub.total refers to table 1, column 1
--
-- Subquery table_references: [orders]
-- o.user_id refers to table 0, column 0
-- o.amount refers to table 0, column 1
--
-- After unnesting and folding subquery tables into main query,
-- the query might look like this:

SELECT u.id, SUM(o.amount) as total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
GROUP BY u.id;

-- Main query table_references: [users, orders]
-- u.id refers to table index 0 (correct)
-- o.amount refers to table index 0 (incorrect, should be 1)
-- o.user_id refers to table index 0 (incorrect, should be 1)
```

We could ofc traverse every expression in the subquery and rewrite
the table indexes to be correct, but if we instead use stable identifiers
for each table reference, then all the column references will continue
to be correct.

Hence, this PR introduces a `TableInternalId` used in `TableReference`
as well as `Expr::Column` and `Expr::Rowid` so that this kind of query
transformations can happen with less pain.
2025-05-25 20:26:17 +03:00
PThorpe92
c2ec6caae1 Finish integrating xConnect into vtable open api 2025-05-24 14:49:58 -04:00
Jussi Saurio
597020bc0c Merge 'Support values statement and values in select' from meteorgan
Close: #866
**limbo output**:
```
limbo> explain values(1, 2);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     5     0                    0   Start at 5
1     Integer            1     1     0                    0   r[1]=1
2     Integer            2     2     0                    0   r[2]=2
3     ResultRow          1     2     0                    0   output=r[1..2]
4     Halt               0     0     0                    0
5     Goto               0     1     0                    0

limbo> explain values(1, 2), (3, 4);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     InitCoroutine      1     9     2                    0
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2
4     Yield              1     0     0                    0
5     Integer            3     2     0                    0   r[2]=3
6     Integer            4     3     0                    0   r[3]=4
7     Yield              1     0     0                    0
8     EndCoroutine       1     0     0                    0
9     InitCoroutine      1     0     2                    0
10    Yield              1     15    0                    0
11    Copy               2     4     0                    0   r[4]=r[2]
12    Copy               3     5     0                    0   r[5]=r[3]
13    ResultRow          4     2     0                    0   output=r[4..5]
14    Goto               0     10    0                    0
15    Halt               0     0     0                    0
16    Goto               0     1     0                    0

limbo> explain select * from (values(1, 2), (3, 4));
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     16    0                    0   Start at 16
1     InitCoroutine      1     9     2                    0
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2
4     Yield              1     0     0                    0
5     Integer            3     2     0                    0   r[2]=3
6     Integer            4     3     0                    0   r[3]=4
7     Yield              1     0     0                    0
8     EndCoroutine       1     0     0                    0
9     InitCoroutine      1     0     2                    0
10    Yield              1     15    0                    0
11    Copy               2     4     0                    0   r[4]=r[2]
12    Copy               3     5     0                    0   r[5]=r[3]
13    ResultRow          4     2     0                    0   output=r[4..5]
14    Goto               0     10    0                    0
15    Halt               0     0     0                    0
16    Transaction        0     0     0                    0   write=false
17    Goto               0     1     0                    0
```
**sqlite output**:
```
sqlite> explain values(1, 2);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     5     0                    0   Start at 5
1     Integer        1     1     0                    0   r[1]=1
2     Integer        2     2     0                    0   r[2]=2
3     ResultRow      1     2     0                    0   output=r[1..2]
4     Halt           0     0     0                    0
5     Goto           0     1     0                    0
sqlite> explain values(1, 2), (3, 4);
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     InitCoroutine  1     9     2                    0
2     Integer        1     4     0                    0   r[4]=1
3     Integer        2     5     0                    0   r[5]=2
4     Yield          1     0     0                    0
5     Integer        3     4     0                    0   r[4]=3
6     Integer        4     5     0                    0   r[5]=4
7     Yield          1     0     0                    0
8     EndCoroutine   1     0     0                    0
9     InitCoroutine  1     0     2                    0
10      Yield          1     15    0                    0   next row of 2-ROW VALUES CLAUSE
11      Copy           4     8     0                    2   r[8]=r[4]
12      Copy           5     9     0                    2   r[9]=r[5]
13      ResultRow      8     2     0                    0   output=r[8..9]
14    Goto           0     10    0                    0
15    Halt           0     0     0                    0
16    Goto           0     1     0                    0
sqlite>  explain select * from (values(1, 2), (3, 4));
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     16    0                    0   Start at 16
1     InitCoroutine  1     9     2                    0
2     Integer        1     4     0                    0   r[4]=1
3     Integer        2     5     0                    0   r[5]=2
4     Yield          1     0     0                    0
5     Integer        3     4     0                    0   r[4]=3
6     Integer        4     5     0                    0   r[5]=4
7     Yield          1     0     0                    0
8     EndCoroutine   1     0     0                    0
9     InitCoroutine  1     0     2                    0
10      Yield          1     15    0                    0   next row of 2-ROW VALUES CLAUSE
11      Copy           4     8     0                    2   r[8]=r[4]
12      Copy           5     9     0                    2   r[9]=r[5]
13      ResultRow      8     2     0                    0   output=r[8..9]
14    Goto           0     10    0                    0
15    Halt           0     0     0                    0
16    Goto           0     1     0                    0
```

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1549
2025-05-23 13:56:31 +03:00
Zaid Humayun
4312d371fb addresses comment https://github.com/tursodatabase/limbo/pull/1548#discussion_r2102606810 by @jussisaurio
this commit changes the btree_destroy() signature to return an Option<usize>. This more closely resembles Rust semantics instead of passing a pointer to a usize.
However, I'm unsure if I'm handling the cursor result correctly
2025-05-23 00:46:05 +05:30
meteorgan
34e05ef974 make values work in subquery 2025-05-23 00:30:04 +08:00
Zaid Humayun
4072a41c9c Drop Table now uses an ephemeral table as a scratch table
Now when dropping a table, an ephemeral table is created as a scratch table. If a root page of some other table is moved into the page occupied by the root page of the table being dropped, that row is first written into an ephemeral table. Then on a next pass, it is deleted from the schema table and then re-inserted with the new root page.

This happens during AUTOVACUUM when deleting a root page will force the last root page to move into the slot being vacated by the root page of the table being deleted
2025-05-22 19:39:46 +05:30
Jussi Saurio
533a00eae3 Fix bug in op_decr_jump_zero() 2025-05-22 11:40:49 +03:00
Jussi Saurio
8bec75d804 Merge 'Initial Support for Nested Translation' from Pedro Muniz
This PR introduces some modifications to the Program Builder to allow us
to use nested parsing. By focusing the emission of Init and the last
Goto (prologue and epilogue), inside the ProgramBuilder, we can just not
emit them if we are parsing/translating in a nested context. For this
PR, I only migrated insert to use these functions as I need them to
support Insert statements that use `SELECT FROM` syntax. Nested parsing
overall enables code reuse for us and arguably is one of the only ways
to parse deeply nested queries without a lot of code duplication.
#1528

Closes #1543
2025-05-22 10:52:00 +03:00
Jussi Saurio
c7f984c5c8 Merge 'Page cache fixes' from Pere Diaz Bou
This PR builds on top of
https://github.com/tursodatabase/limbo/pull/1368 and adds few things
like allowing inserting pages with the same page key, fix fuzz tests by
adding transactions and some minor improvements to cacheflush.

Closes #1523
2025-05-22 10:12:56 +03:00
Jussi Saurio
fc150b12c9 Merge 'CSV virtual table extension' from Piotr Rżysko
This PR adds a port of [SQLite's CSV virtual table
extension](https://www.sqlite.org/csv.html).
Planned follow-ups:
* Pass detailed error messages from `VTabModule::create`, not just
`ResultCode`s.
* Address the TODO in `VTabModuleImpl::create_schema`.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1544
2025-05-22 09:48:53 +03:00
pedrocarlo
53bf5d5ef5 adjust translate functions to take a program instead of Option<ProgramBuilder> + remove any Init emission in traslate functions + use epilogue in all places necessary 2025-05-21 16:41:10 -03:00
pedrocarlo
1c12535d9f push prologue to top-level translate function 2025-05-21 15:50:43 -03:00
pedrocarlo
3090dd91fa push translate_ctx creation outside of prologue 2025-05-21 13:06:25 -03:00
pedrocarlo
f5d6d11d16 extract prologue and epilogue to program builder 2025-05-21 12:47:51 -03:00