Commit Graph

2780 Commits

Author SHA1 Message Date
Jussi Saurio
2087393d22 Merge 'Write database header via normal pager route' from meteorgan
Closes: #1613

Closes #1634
2025-06-04 09:39:14 +03:00
Jussi Saurio
ad8c9a4c15 Merge 'Fix WAL frame checksum mismatch' from Diego Reis
Closes #1622
I ran an A/B test between SQLite and Limbo, and each can reopen a
database written by the other, which indicates that nothing is seriously
wrong with our file format. The problem turned out to be our reset
logic, which reset the WAL without truncating the file. I assumed it is
safe not to reset in `PASSIVE` mode, given the
[docs](https://www.sqlite.org/c3ref/wal_checkpoint_v2.html) and
[source code](https://github.com/sqlite/sqlite/blob/2bd9f69d40dd240c4122c6d02f1ff447e7b5c098/src/wal.c#L2193).
This PR also includes some small cleanups and fixes.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1647
2025-06-04 09:14:45 +03:00
Pekka Enberg
c6ef19396d Merge 'Add support for pragma table-valued functions' from Piotr Rżysko
This PR adds support for table-valued functions for PRAGMAs (see the
[PRAGMA functions section](https://www.sqlite.org/pragma.html)).
Additionally, it introduces built-in table-valued functions. I
considered using extensions for this, but there are several reasons in
favor of a dedicated mechanism:
* It simplifies the use of internal functions, structs, etc. For
example, when implementing `json_each` and `json_tree`, direct access to
internals was necessary:
https://github.com/tursodatabase/limbo/pull/1088
* It avoids FFI overhead.
[Benchmarks](https://github.com/piotrrzysko/limbo/blob/pragma_vtabs_bench/core/benches/pragma_benchmarks.rs)
on my hardware show that `pragma_table_info()` implemented as an
extension is 2.5× slower than the built-in version.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1642
2025-06-04 09:08:10 +03:00
Diego Reis
09f978b239 core: Tagging some comments as TODO 2025-06-03 15:09:16 -03:00
meteorgan
1554c54f2b restore comments 2025-06-03 22:06:08 +08:00
meteorgan
f2bf6251cd write database header via normal pager route 2025-06-03 22:06:08 +08:00
Jussi Saurio
31b37332d5 all index cursors must be opened when DELETE does an index seek too 2025-06-03 15:18:45 +03:00
Jussi Saurio
06626f72eb Fix cursors not being opened for indexes in DELETE 2025-06-03 14:45:01 +03:00
Jussi Saurio
c488c32d43 Merge 'Make cursor seek reentrant' from Pedro Muniz
Closes #1628. Every function that calls `process_overflow_read` needs
to be reentrant. I did not change them all here, but that would include
`get_prev_record` and `get_next_record`. Maybe `tablebtree_move_to` did
not need to use the state machine, but I included it as a safeguard.
Edit: Closes #1625. When I implemented `restore_context`, I forgot to
add a `return_if_io` after calling it in `next` 🤦‍♂️
Edit: Closes #1617. Just tested it and it also solves this bug.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1636
2025-06-03 14:24:40 +03:00
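The reentrancy requirement described in the merge above can be illustrated with a toy state machine. This is a minimal sketch, not Limbo's code: `ToyCursor`, `SeekState`, `CursorResult`, and `complete_io` are hypothetical names invented for illustration. The point is only that a function which may return early on pending I/O has to remember where it stopped, so that calling it again resumes instead of restarting.

```rust
/// Hypothetical result type: either a finished value or "I/O pending, call again".
#[derive(Debug, PartialEq)]
pub enum CursorResult<T> {
    Ok(T),
    IO,
}

/// Where the seek stopped the last time it was called (hypothetical).
#[derive(Debug, Clone, Copy)]
enum SeekState {
    Start,
    Descend { depth: usize },
}

/// Toy cursor that walks down one page per tree level.
pub struct ToyCursor {
    state: SeekState,
    /// loaded[d] is true once the page at depth d is in memory.
    loaded: Vec<bool>,
    /// Counts processed pages, to show that no page is processed twice.
    pub steps: usize,
}

impl ToyCursor {
    pub fn new(depth: usize) -> Self {
        Self { state: SeekState::Start, loaded: vec![false; depth], steps: 0 }
    }

    /// Simulates the I/O layer finishing a page read.
    pub fn complete_io(&mut self, depth: usize) {
        self.loaded[depth] = true;
    }

    /// Reentrant seek: on pending I/O it returns early and keeps its state,
    /// so the next call continues from the same depth.
    pub fn seek(&mut self) -> CursorResult<usize> {
        loop {
            match self.state {
                SeekState::Start => self.state = SeekState::Descend { depth: 0 },
                SeekState::Descend { depth } => {
                    if depth == self.loaded.len() {
                        // Reached the leaf level; reset for the next seek.
                        self.state = SeekState::Start;
                        return CursorResult::Ok(depth);
                    }
                    if !self.loaded[depth] {
                        // State is left untouched; the caller retries later.
                        return CursorResult::IO;
                    }
                    self.steps += 1;
                    self.state = SeekState::Descend { depth: depth + 1 };
                }
            }
        }
    }
}
```

A caller would loop on `seek()`, completing the pending read each time it sees `CursorResult::IO`.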
Diego Reis
cf038b045d core/wal: Only reset the WAL if the file is truncated 2025-06-02 23:16:30 -03:00
Diego Reis
16c81f471b core/ondisk: Stop reading the WAL file if a frame's salt values do not match the header
The salt values in the WAL header are (re)generated on every checkpoint (except in PASSIVE mode), so a frame with mismatching salts is a leftover from a previous checkpoint.
2025-06-02 23:14:51 -03:00
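The salt rule from the commit above can be sketched as follows. This is an illustration under assumptions, not the code from `core/ondisk`: `Salts`, `Frame`, and `valid_frame_count` are hypothetical names, and real WAL recovery also verifies frame checksums, which this sketch leaves out.

```rust
/// The two salt values stored in the WAL header and in every frame header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Salts(pub u32, pub u32);

/// Hypothetical, heavily simplified frame header.
pub struct Frame {
    pub salts: Salts,
    pub page_number: u32,
}

/// Counts the leading frames whose salts match the WAL header.
/// Reading stops at the first mismatch: that frame and everything after it
/// are treated as leftovers from before an earlier checkpoint.
pub fn valid_frame_count(header: Salts, frames: &[Frame]) -> usize {
    frames.iter().take_while(|frame| frame.salts == header).count()
}
```

Note that a frame with matching salts that appears after a mismatching one is still ignored, because the scan stops at the first mismatch.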
Diego Reis
ec4eb52734 core/wal: Refactor open_shared for readability 2025-06-02 19:21:13 -03:00
Jussi Saurio
ea301de726 Merge 'Pass input string to translate function' from Pedro Muniz
In preparation for `CREATE VIEW`, we need to have the original SQL query
that was used to create the view. I'm using the scanner's offset to
slice into the original input, trimming the newlines, and passing it to
the translate function.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1621
2025-06-02 17:43:11 +03:00
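The slicing step described in the merge above might look roughly like this. It is a sketch under assumptions: `statement_text` and its byte-offset parameters are hypothetical, and the actual scanner API in Limbo may expose offsets differently.

```rust
/// Returns the source text of one statement, given the byte offsets at which
/// the scanner started and finished it (hypothetical parameters).
/// Surrounding newlines are trimmed; `None` is returned if the offsets are
/// out of range, reversed, or not on character boundaries.
pub fn statement_text(input: &str, start: usize, end: usize) -> Option<&str> {
    input
        .get(start..end)
        .map(|text| text.trim_matches(|c: char| c == '\n' || c == '\r'))
}
```

Using `str::get` rather than direct indexing avoids a panic when an offset is invalid.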
Jussi Saurio
5f586b7b24 Merge 'Small tracing enhancement' from Pedro Muniz
Instrument `trace_insn` to debug-print the stack pc and instruction.
Also, disable rustyline logs for the CLI, as they are too noisy to work
with.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1635
2025-06-02 17:41:52 +03:00
pedrocarlo
9b5f5f6053 do not move_to if we are already inserting in correct place 2025-06-02 02:55:32 -03:00
pedrocarlo
9dc6638313 cleaner approach for opening indexes 2025-06-02 01:13:14 -03:00
pedrocarlo
c2942a5819 small fixes 2025-06-01 12:11:03 -03:00
Piotr Rzysko
d1d8ead475 Add support for pragma table-valued functions 2025-06-01 10:25:42 +02:00
pedrocarlo
39434fd20f return_io when restoring context 2025-06-01 03:07:16 -03:00
Piotr Rzysko
4d35e36b77 Introduce virtual table types 2025-06-01 07:45:57 +02:00
Piotr Rzysko
b291179554 Extract cursor logic from VirtualTable into VirtualTableCursor 2025-06-01 07:45:57 +02:00
Piotr Rzysko
6300deb77f Move VTabOpaqueCursor to vtab module 2025-06-01 07:45:57 +02:00
Piotr Rzysko
149375b2b4 Extract VirtualTable to a separate module 2025-06-01 07:45:57 +02:00
pedrocarlo
2ddbb7eeed adjust move_to and seek functions to make them truly reentrant + adding return_if_locked_maybe_load in some places so that we read loaded pages 2025-06-01 01:01:35 -03:00
pedrocarlo
d688cfd547 make find_cell and process_overflow_page reentrant 2025-05-31 23:31:59 -03:00
pedrocarlo
dae58be071 make move_to reentrant 2025-05-31 02:56:30 -03:00
pedrocarlo
c9c73f2497 fix explain panicking on None CursorKey 2025-05-31 01:19:26 -03:00
pedrocarlo
bc563266b3 add instrumentation to more functions for debugging + adjust how cursors are opened 2025-05-30 20:35:50 -03:00
pedrocarlo
33480540f1 make cursor seek reentrant 2025-05-30 13:25:46 -03:00
pedrocarlo
0757109676 instrument trace_insn 2025-05-30 11:33:22 -03:00
pedrocarlo
b73200de86 pass input string to translate function 2025-05-30 11:20:36 -03:00
Pere Diaz Bou
d4f1b8e068 update i64::MAX comment 2025-05-30 14:02:05 +02:00
Pere Diaz Bou
da4190a23e Convert u64 rowid to i64
Rowids can be negative, therefore let's swap to i64
2025-05-30 13:07:31 +02:00
Jussi Saurio
819a6138d0 Merge 'Fix: aggregate regs must be initialized as NULL at the start' from Jussi Saurio
Again found when fuzzing nested where clause subqueries:
Aggregate registers need to be NULLed at the start because the same
registers might be reused on another invocation of a subquery, and if
they are not NULLed, the 2nd invocation of the same subquery will have
values left over from the first invocation.

Reviewed-by: Preston Thorpe (@PThorpe92)

Closes #1614
2025-05-30 09:39:37 +03:00
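The stale-register problem described in the merge above can be reproduced in a toy model. This is only a sketch: `Value`, `sum_subquery`, and the `null_first` flag are hypothetical and exist purely to contrast the two behaviours; they are not Limbo's VDBE types.

```rust
/// Hypothetical register value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
}

/// Runs a toy `SUM` subquery over `rows`, accumulating into `registers[acc]`.
/// With `null_first == false` the accumulator keeps whatever an earlier
/// invocation left behind, which is the bug the fix addresses.
pub fn sum_subquery(registers: &mut [Value], acc: usize, rows: &[i64], null_first: bool) -> Value {
    if null_first {
        registers[acc] = Value::Null;
    }
    for &row in rows {
        registers[acc] = match registers[acc] {
            Value::Null => Value::Integer(row),
            Value::Integer(total) => Value::Integer(total + row),
        };
    }
    registers[acc]
}
```

Running the subquery over `[1, 2]` and then over an empty row set returns the stale `Integer(3)` on the second call unless the register is NULLed first, in which case the second call yields `Null`.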
Jussi Saurio
5632a6046e Merge 'Fix: allow DeferredSeek on more than one cursor per program' from Jussi Saurio
Found while fuzzing nested subqueries. Since subqueries result in nested
plans, it quickly revealed that there can be multiple `DeferredSeek`
instructions issued for different cursors, but our `ProgramState` only
supported one at a time.

Closes #1610
2025-05-30 09:39:23 +03:00
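One way to picture the change above is to key pending seeks by cursor instead of keeping a single slot. The sketch below is an assumption about the general shape, not the actual `ProgramState`: `ToyProgramState`, `defer_seek`, and `take_deferred_seek` are hypothetical names.

```rust
use std::collections::HashMap;

pub type CursorId = usize;

/// Hypothetical program state that tracks one pending deferred seek per
/// table cursor, rather than one for the whole program.
#[derive(Default)]
pub struct ToyProgramState {
    /// Maps a table cursor to the index cursor whose rowid it should seek to.
    deferred_seeks: HashMap<CursorId, CursorId>,
}

impl ToyProgramState {
    /// Records that `table_cursor` must seek to the row `index_cursor` points at.
    pub fn defer_seek(&mut self, table_cursor: CursorId, index_cursor: CursorId) {
        self.deferred_seeks.insert(table_cursor, index_cursor);
    }

    /// Takes the pending seek for `table_cursor`, if any, leaving others intact.
    pub fn take_deferred_seek(&mut self, table_cursor: CursorId) -> Option<CursorId> {
        self.deferred_seeks.remove(&table_cursor)
    }
}
```

With a single shared slot, the second `defer_seek` call from a nested plan would overwrite the first; with per-cursor storage both remain pending.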
Jussi Saurio
482eb4aa9a Merge 'Refactor: make clear distinction between 'joined tables' and 'tables referenced from outer query scopes'' from Jussi Saurio
**Beef:** we need to distinguish between references to tables in the
current query scope (CTEs, FROM clause) and references to tables from
outer query scopes (inside subqueries, or inside CTEs that refer to
previous CTEs). We don't want to consider 'tables from outside' in the
join order of a subquery, but we want the subquery to be able to
_reference_ those tables in e.g. its WHERE clause.
This PR -- or at least some sort of equivalent of it -- is a requirement
for #1595.
---
This PR replaces the `Vec<TableReference>` we use with new data
structures:
- TableReferences struct, which holds both:
     - joined_tables, and
     - outer_query_refs
- JoinedTable:
     - this is just a rename of the previous TableReference struct
- OuterQueryReference
     - this is to distinguish from JoinedTable those cases where
       e.g. a subquery refers to an outer query's table, or a CTE
       refers to a previous CTE.
Both JoinedTable and OuterQueryReference can be referred to by
expressions, but only JoinedTables are considered for join ordering
optimization and so forth.
These data structures are then used everywhere, which resulted in a lot
of changes.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1580
2025-05-29 20:45:33 +03:00
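The split described in the merge above can be sketched in a few lines. The struct names follow the commit message, but the fields and methods here (`identifier`, `joined_tables()`, `is_visible`) are simplified assumptions rather than the definitions in `plan.rs`.

```rust
/// A table in the current query scope (simplified, hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct JoinedTable {
    pub identifier: String,
}

/// A table that belongs to an outer query scope but may be referenced here.
#[derive(Debug, Clone, PartialEq)]
pub struct OuterQueryReference {
    pub identifier: String,
}

#[derive(Debug, Default)]
pub struct TableReferences {
    joined_tables: Vec<JoinedTable>,
    outer_query_refs: Vec<OuterQueryReference>,
}

impl TableReferences {
    pub fn new(joined_tables: Vec<JoinedTable>, outer_query_refs: Vec<OuterQueryReference>) -> Self {
        Self { joined_tables, outer_query_refs }
    }

    /// Only these tables are candidates for join ordering.
    pub fn joined_tables(&self) -> &[JoinedTable] {
        &self.joined_tables
    }

    /// Expressions may resolve a name against either list.
    pub fn is_visible(&self, identifier: &str) -> bool {
        self.joined_tables.iter().any(|t| t.identifier == identifier)
            || self.outer_query_refs.iter().any(|t| t.identifier == identifier)
    }
}
```

Keeping the two lists separate means a join optimizer can iterate over `joined_tables()` without ever seeing outer-scope tables, while name resolution still finds them.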
Jussi Saurio
f8257df77b Fix: aggregate regs must be initialized as NULL at the start 2025-05-29 18:44:53 +03:00
Jussi Saurio
69133b3b2e Fix: allow DeferredSeek on more than one cursor per program 2025-05-29 16:05:47 +03:00
Pere Diaz Bou
dd15b7df7f remove dumb comment from pagecachekey 2025-05-29 14:12:16 +02:00
Pere Diaz Bou
93161e9fce remove lru size > 0 check on page cache fuzz 2025-05-29 14:12:16 +02:00
Pere Diaz Bou
37e834b092 remove unnecessary test 2025-05-29 14:12:16 +02:00
Pere Diaz Bou
44007075d9 remove frame_id from PageCacheKey
After reading sqlite a bit, it isn't needed because we have RWlock for
each table in the database file.
2025-05-29 14:12:16 +02:00
Jussi Saurio
211b511189 Fix join optimizer tests 2025-05-29 11:44:56 +03:00
Jussi Saurio
cc405dea7e Use new TableReferences struct everywhere 2025-05-29 11:44:56 +03:00
Jussi Saurio
124b38a262 plan.rs: add new datastructures
- TableReferences struct, which holds both:
     - joined_tables, and
     - outer_query_refs

- JoinedTable:
     - this is just a rename of the previous TableReference struct

- OuterQueryReference
     - this is to distinguish from JoinedTable those cases where
       e.g. a subquery refers to an outer query's table, or a CTE
       refers to a previous CTE.

Both JoinedTable and OuterQueryReference can be referred to by expressions,
but only JoinedTables are considered for join ordering optimization and so
forth.

This commit does not compile.
2025-05-29 11:03:09 +03:00
Jussi Saurio
592ba41137 Add assertion forbidding duplicate cursor keys 2025-05-29 01:04:45 +03:00
Jussi Saurio
77ce4780d9 Fix ProgramBuilder::cursor_ref not having unique keys
Currently we have this:

`program.alloc_cursor_id(Option<String>, CursorType)`

where the String is the table's name or alias ('users' or 'u' in
the query).

This is problematic because this can happen:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

There are two cursors, both with identifier 't'. This causes a bug
where the program will use the same cursor for both the main query
and the subquery, since they are keyed by 't'.

Instead introduce `CursorKey`, which is a combination of:

1. `TableInternalId`, and
2. index name (`Option<String>`), in case of index cursors.

This should provide key uniqueness for cursors:

`SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`

here the first 't' will have a different `TableInternalId` than the
second `t`, so there is no clash.
2025-05-29 00:59:24 +03:00
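The keying scheme from the commit above can be sketched like this. `CursorKey` and `TableInternalId` are named in the commit message, but their fields, the `ToyCursorAllocator` type, and the `resolve` method are assumptions made for illustration, not the `ProgramBuilder` API.

```rust
use std::collections::HashMap;

/// Stand-in for the per-table-reference id; assumed here to wrap an integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TableInternalId(pub usize);

/// Cursor key: the table reference plus an optional index name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CursorKey {
    pub table_id: TableInternalId,
    pub index_name: Option<String>,
}

/// Hypothetical allocator that hands out cursor ids and remembers them by key.
#[derive(Default)]
pub struct ToyCursorAllocator {
    next_id: usize,
    by_key: HashMap<CursorKey, usize>,
}

impl ToyCursorAllocator {
    /// Allocates a cursor id for `key`; panics if the key is already in use.
    pub fn alloc_cursor_id(&mut self, key: CursorKey) -> usize {
        let id = self.next_id;
        let previous = self.by_key.insert(key, id);
        assert!(previous.is_none(), "duplicate cursor key");
        self.next_id += 1;
        id
    }

    /// Looks up the cursor id registered for `key`.
    pub fn resolve(&self, key: &CursorKey) -> Option<usize> {
        self.by_key.get(key).copied()
    }
}
```

For `SELECT * FROM t WHERE EXISTS (SELECT * FROM t)`, the outer and inner `t` would carry different `TableInternalId` values, so they resolve to different cursors even though both are named `t`.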
Jussi Saurio
85316d8419 Merge 'clear page cache on transaction failure' from Pere Diaz Bou
This is the first step towards rollback. Since we still don't spill
pages with WAL, we can simply invalidate the page cache on failure.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #1599
2025-05-28 23:14:44 +03:00
krishvishal
fb1d53b0ec Fix test. off by one. 2025-05-29 00:22:38 +05:30
krishvishal
5b57efd894 A couple more tests to test this case. 2025-05-29 00:10:59 +05:30