Commit Graph

2937 Commits

Author SHA1 Message Date
PThorpe92
588e43c5aa Minor improvements and cleanups in btree 2025-03-01 15:48:42 -05:00
PThorpe92
741c4e8037 Add json subtype for extension value text type 2025-03-01 14:27:33 -05:00
PThorpe92
5b8efd92a4 Update extension ownership cleanups for new vtab module 2025-03-01 14:27:33 -05:00
PThorpe92
e7713e87ec Prevent extensions from accidentally freeing value types, fix comments 2025-03-01 14:27:33 -05:00
l.gualtieri
6449c79e93 Escape character is ignored in LIKE function #1051 2025-03-01 18:32:09 +01:00
Zaid Humayun
2f14b5ddbc fix: addresses comment https://github.com/tursodatabase/limbo/pull/897#discussion_r1975723452
renames the variable r1 to null_reg since that's the purpose anyway
2025-03-01 01:26:00 +05:30
Zaid Humayun
23a904f38d Merge branch 'main' of https://github.com/tursodatabase/limbo 2025-03-01 01:18:45 +05:30
Pekka Enberg
1de73b389e Merge ' fix usable_space calculation and wrong old pages cell count usage ' from Pere Diaz Bou
Closes #1067
2025-02-28 19:45:07 +02:00
Pere Diaz Bou
e545cc7057 fix btree_insert_fuzz_ex implementation 2025-02-28 18:21:38 +01:00
Pere Diaz Bou
bbb3252ab6 fix usable_space calculation and wrong old pages cell count usage 2025-02-28 18:19:27 +01:00
Pekka Enberg
20d618f35c Disable some failing b-tree tests until we've fixed them 2025-02-28 19:17:29 +02:00
Pekka Enberg
13750e9255 Human Rust programmers exist to keep Clippy happy 2025-02-28 19:12:12 +02:00
Pekka Enberg
b4e8afa3c7 Merge 'Implement SQLite balancing algorithm' from Pere Diaz Bou
Beep boop.
What happened you ask? I removed the dumb balancing algorithm I
implemented in favor of SQLite's implementation based on B*Tree[1] where
a page is 2/3 full instead of 1/2. It also tries to balance a page by
taking a maximum 3 pages and distributing cells evenly between them.
I've made some changes that are somewhat related:
* Moved most operations on pages out of BTreeCursor because those
operations are based on a page, not on a cursor, and it makes it easier
to test.
* Fixed `write_u16` and `read_u16` cases that didn't need a implicit
offset calculation. Added: `write_u16_no_offset` and
`read_u16_no_offset` to counter this.
* Added some tests with fuzz testing too.
* Fixed some important actions like: `compute_free_space`,
`defragment_page` and `drop_cell`.
[1] https://dl.acm.org/doi/10.1145/356770.356776

Closes #968
2025-02-28 19:10:52 +02:00
l.gualtieri
cf407f639e fix #1064 2025-02-27 19:47:51 +01:00
Pekka Enberg
6d44ad22fd core: Optimize read_record() function
The SerialType::try_from() was pretty high up in CPU profiles so I asked
my dear friend Claude to optimize it away by using the serial type
integer value directly instead of constructing a fancy enumeration.
2025-02-26 13:31:38 +02:00
Pekka Enberg
936ae307b7 core: Kill value type
We currently have two value types, `Value` and `OwnedValue`. The
original thinking was that `Value` is external type and `OwnedValue` is
internal type. However, this just results in unnecessary transformation
between the types as data crosses the Limbo library boundary.

Let's just follow SQLite here and consolidate on a single value type
(where `sqlite3_value` is just an alias for the internal `Mem` type).
The way this will eventually work is that we can have bunch of
pre-allocated `OwnedValue` objects in `ProgramState` and basically
return a reference to them all the way to the application itself, which
extracts the actual value.
2025-02-26 10:57:45 +02:00
Pekka Enberg
fe440b7b34 Merge 'Fix casting text to integer to match SQLite' from Preston Thorpe
```console
thread 'fuzz::tests::logical_expression_fuzz_run' panicked at tests\integration\fuzz\mod.rs:818:13:
assertion `left == right` failed: query: SELECT  ( ( 3622873 || -8851250 ) * ( ( ( -124 ) + ( -5792536 ) ) ) ) = ( 179434259456392 < 65481085924370 ), limbo: [[Integer(1)]], sqlite: [[Integer(0)]]
  left: [[Integer(1)]]
 right: [[Integer(0)]]
```
This and a few other failing fuzzing tests were due to incorrectly
parsing numerics from strings. Some of our casting was done properly,
but it wasn't being applied to all cases where the behavior was needed.
It was also attempting to parse a string[0..N] N times until
`string[0..N].parse()` would no longer succeed. This searches for the
index of the first illegal character and parses the resulting slice
once.
Tests were added for some of the edgecases that were previously failing.
This PR also adds a macro in vdbe/insn.rs that allows for a bit of
cleanup and reduces some matching.

Closes #1053
2025-02-25 15:44:37 +02:00
Pekka Enberg
7f2525ac27 Merge 'Implement create virtual table using vtab modules, more work on virtual tables' from Preston Thorpe
This PR started out as one to improve the API of extensions but I ended
up building on top of this quite a bit and it just kept going. Sorry
this one is so large but there wasn't really a good stopping point, as
it kept leaving stuff in broken states.
**VCreate**: Support for `CREATE VIRTUAL TABLE t USING vtab_module`
**VUpdate**: Support for `INSERT` and `DELETE` methods on virtual
tables.
Sqlite uses `xUpdate` function with the `VUpdate` opcode to handle all
insert/update/delete functionality in virtual tables..
have to just document that:
```
if args[0] == NULL:  INSERT args[1] the values in args[2..]

if args[1] == NULL: DELETE args[0]

if args[0] != NULL && len(args) > 2: Update values=args[2..]  rowid=args[0]
```
I know I asked @jussisaurio on discord about this already, but it just
sucked so bad that I added some internal translation so we could expose
a [nice API](https://github.com/tursodatabase/limbo/pull/996/files#diff-
3e8f8a660b11786745b48b528222d11671e9f19fa00a032a4eefb5412e8200d1R54) and
handle the logic ourselves while keeping with sqlite's opcodes.
I'll change it back if I have to, I just thought it was genuinely awful
to have to rely on comments to explain all that to extension authors.
The included extension is not meant to be a legitimately useful one, it
is there for testing purposes. I did something similar in #960 using a
test extension, so I figure when they are both merged, I will go back
and combine them into one since you can do many kinds at once, and that
way it will reduce the amount of crates and therefore compile time.
1. Remaining opcodes.
2. `UPDATE` (when we support the syntax)
3. `xConnect` - expose API for a DB connection to a vtab so it can
perform arbitrary queries.

Closes #996
2025-02-25 15:31:12 +02:00
PThorpe92
b31363aecb More improvements/cleanups to vdbe around casting 2025-02-24 21:31:26 -05:00
PThorpe92
6d55cdba3b Remove allocations from numeric text casting, cleanups 2025-02-24 12:30:38 -05:00
PThorpe92
7e94a152a5 Consolidate code to parse numerics from text 2025-02-24 11:21:25 -05:00
PThorpe92
66f0835d51 Add tests for corrected behavior around casting 2025-02-24 11:21:25 -05:00
PThorpe92
8070e51e26 Fix vdbe casting and rounding issues 2025-02-24 11:21:22 -05:00
PThorpe92
8f27a5fc92 Fix (fuzzing tests) casting text to integer to match sqlite behavior 2025-02-24 11:13:25 -05:00
Pekka Enberg
eb6019b453 cargo fmt 2025-02-24 17:39:21 +02:00
Pekka Enberg
16306ee1f4 Merge 'Modify the LIKE function to work with all types' from Mohamed Hossam
This PR fixes
[#1040](https://github.com/tursodatabase/limbo/issues/1040) and modifies
the `LIKE` function in the VDBE to work on expressions of all types like
SQLite.
Looking at how SQLite handles this, it gets the text value of the
expression regardless of its affinity. I used `exec_cast(exp, "TEXT")`
to achieve the same effect. Since most `LIKE` queries will probably be
done on `TEXT` expressions, I avoid casting the expression if it's
already `TEXT`. If either of the expressions was `NULL`, SQLite returns
nothing i.e. `NULL`. I also changed the unreachable arm message from
`Like on non-text registers` to `Like failed`.
The following queries produced the same results in Limbo:
```
SQLite version 3.46.1 2024-08-13 09:16:08 (UTF-16 console I/O)
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE tbl (n NULL, i INTEGER, r REAL, t TEXT, b BLOB);
sqlite> INSERT INTO tbl VALUES(NULL,1,2.0,'a',X'0500');
sqlite> SELECT * FROM tbl;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE n LIKE NULL;
sqlite> SELECT * FROM tbl WHERE n LIKE 'NULL';
sqlite> SELECT * FROM tbl WHERE n LIKE 1;
sqlite> SELECT * FROM tbl WHERE n LIKE 2.0;
sqlite> SELECT * FROM tbl WHERE n LIKE x'0500';
sqlite>
sqlite> SELECT * FROM tbl WHERE i LIKE NULL;
sqlite> SELECT * FROM tbl WHERE i LIKE 1;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE i LIKE '1';
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE i LIKE 2.0;
sqlite> SELECT * FROM tbl WHERE i LIKE 1.0;
sqlite> SELECT * FROM tbl WHERE i LIKE x'0500';
sqlite>
sqlite> SELECT * FROM tbl WHERE r LIKE NULL;
sqlite> SELECT * FROM tbl WHERE r LIKE 2;
sqlite> SELECT * FROM tbl WHERE r LIKE 2.0;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE r LIKE '2.0';
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE r LIKE 'a';
sqlite> SELECT * FROM tbl WHERE r LIKE x'0500';
sqlite>
sqlite> SELECT * FROM tbl WHERE t LIKE NULL;
sqlite> SELECT * FROM tbl WHERE t LIKE 1;
sqlite> SELECT * FROM tbl WHERE t LIKE 2.0;
sqlite> SELECT * FROM tbl WHERE t LIKE 'a';
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE t LIKE x'0500';
sqlite>
sqlite> SELECT * FROM tbl WHERE b LIKE NULL;
sqlite> SELECT * FROM tbl WHERE b LIKE 1;
sqlite> SELECT * FROM tbl WHERE b LIKE 2.0;
sqlite> SELECT * FROM tbl WHERE b LIKE 'a';
sqlite> SELECT * FROM tbl WHERE b LIKE x'0500';
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE b LIKE 'x''0500''';
sqlite> SELECT * FROM tbl WHERE b LIKE '♣';
sqlite>
sqlite> SELECT * FROM tbl WHERE 1 LIKE 1;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE 2.0 LIKE 2.0;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE 2.0 LIKE '2.0';
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE '2.0' LIKE 2.0;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE '123.45' LIKE 123.45;
|1|2.0|a|♣
sqlite> SELECT * FROM tbl WHERE NULL LIKE NULL;
sqlite> SELECT * FROM tbl WHERE x'0500' LIKE x'0500';
|1|2.0|a|♣
sqlite> SELECT typeof(n), typeof(i), typeof(r), typeof(t), typeof(b) FROM tbl;
null|integer|real|text|blob
```
Though, these queries are very basic, and more testing could be done.

Closes #1044
2025-02-24 11:27:02 +02:00
Pekka Enberg
4cefb222db Merge 'Fix cast_text_to_number compatibility' from Pedro Muniz
Modified  `cast_text_to_number` to be more compatible with SQLite. When
I was running some fuzz tests, I would eventually get errors due to
incorrect casting of text to `INTEGER` or `REAL`. Previously in code
there were 2 implementations of `cast_text_to_number`: one in
`core/vdbe/insn.rs` and one in `core/vdbe/mod.rs`. I consolidated the
casting to only one function. Previously, the `mod.rs` function was just
calling `checked_cast_text_to_numeric`, which was used in `MustBeInt`
opcode.  Hopefully this fixes some of the CI testing issues we are
having. This was the query that prompted me to do this: `SELECT  ( ( (
878352367 ) <> ( 29 ) ) ) = ( ( ( -4309097 ) / ( -37 || -149680985265412
) ) - 755066415 );`

Closes #1038
2025-02-24 11:20:14 +02:00
Pekka Enberg
600ce590fb Merge 'Handle parsing URI according to SQLite specification' from Preston Thorpe
closes #977
In order to properly get #960 merged and keep some sort of the same API
we have now, we need to support URIs/query parameters for opening new
databases.
This PR doesn't attempt to implement anything useful with this, it only
handles parsing, but it will allow #960 to properly open a new file with
a specific VFS without having to entirely re-design the `open_file`
method/API. The existing option in that PR right now is less than ideal.
e.g. All of the existing methods already accept an `IO` impl
```rust
    pub fn open_file(io: Arc<dyn IO>, path: &str) -> Result<Arc<Database>> {
// or
      pub fn open(
        io: Arc<dyn IO>,
        page_io: Rc<dyn DatabaseStorage>,
        wal: Rc<RefCell<dyn Wal>>,
        shared_wal: Arc<RwLock<WalFileShared>>,
        buffer_pool: Rc<BufferPool>,
    ) -> Result<Arc<Database>> {

```
Right now, most of the parsed query parameters are not options we
support yet, but I figured it's better to handle parsing them now and
using them later on when we support them.
Also, if this looks way overly complicated for what it does... that's
because the cross platform edge-cases are a super pain in the ass.

Closes #1039
2025-02-24 11:17:39 +02:00
Zaid Humayun
662b34c8b6 fix: addresses comments https://github.com/tursodatabase/limbo/pull/897#discussion_r1962100891 and https://github.com/tursodatabase/limbo/pull/897#discussion_r1962100299 by @pereman2
this commit introduces the ability of the state machine traversal to handle index interior cell overflow pages. it also makes the state machine states more explicit with match arms.
2025-02-23 16:12:28 +05:30
Zaid Humayun
fdf4166a7c fix: addresses comment https://github.com/tursodatabase/limbo/pull/897#discussion_r1962102119 by @pereman2
this commit changes the contents of the state machine state for ClearOverflowPages to contain the BTreeCell instead of the cell index.
this avoids having to repeatedly do an operation to get the BTreeCell from the cell index each time the state is reached
2025-02-23 12:41:17 +05:30
m0hossam
2204d92a0b Modify LIKE to handle all affinities including Nulls 2025-02-22 04:43:43 +02:00
pedrocarlo
2e38aa1d6b remove dbg 2025-02-20 16:09:39 -03:00
pedrocarlo
13639899a5 more adjustments to parser to handle edge cases 2025-02-20 16:05:50 -03:00
pedrocarlo
033d0116d6 rewrote parsing from text to integer and real 2025-02-20 02:16:30 -03:00
m0hossam
2425b601f7 Cast the matching value into TEXT before matching 2025-02-20 04:57:01 +02:00
m0hossam
1935426509 Add support for Int columns in LIKE function 2025-02-20 00:42:41 +02:00
Zaid Humayun
77815b20d7 docs: added documentation for btree_destroy and clear_overflow_pages 2025-02-19 21:46:26 +05:30
Zaid Humayun
2ff27834ff fix: addresses comment https://github.com/tursodatabase/limbo/pull/897#discussion_r1948154297 by @pereman2
this commit adds the final touches to the state machine for btree_destroy and also introduces a state machine for clear_overflow_pages
this addresses the issue of IO interrupting an operation in progress and solves the following problems:
* performance issues with repeating an operation
* missing page issues with clear overflow pages
2025-02-19 21:46:26 +05:30
Zaid Humayun
8ce305ad44 fixed compiler errors after rebase 2025-02-19 21:46:26 +05:30
Zaid Humayun
f7dc0d42df btree_destroy: implemented functionality with state machine 2025-02-19 21:46:26 +05:30
Zaid Humayun
07efc9b902 DROP TABLE: index drop
added missing instructions to drop the indices for a table physically from btree pages in a loop
2025-02-19 21:46:26 +05:30
Zaid Humayun
34f390abc9 cleanup: removed commented code 2025-02-19 21:46:26 +05:30
Zaid Humayun
e0105398a6 btree: modified the btree_destroy subroutine
modified the btree_destroy subroutine to do an iterative DFS and use the stack cell counters to keep track of whether children have been removed. adds a unit test.
2025-02-19 21:46:26 +05:30
Zaid Humayun
dc2bb7cb9b DropTable: implementation complete
added helper methods to Schema to remove table and indices from in-memory structures
completed the implementation for DropTable using that
2025-02-19 21:46:26 +05:30
Zaid Humayun
40b08c559c vdbe: modified instruction DropTable
the instruction DropTable has been modified to correctly match SQLite semantics
2025-02-19 21:46:26 +05:30
Zaid Humayun
fbc8cd7e70 vdbe: modified the Null instruction
modified the Null instruction to more closely match SQLite semantics. Allows passing in a second register and all registers from r1..r2 area set to null
2025-02-19 21:46:26 +05:30
Zaid Humayun
97d87955cc DROP TABLE: renamed BTreeCusor::btree_drop to BTreeCursor::btree_destroy
this more closely matches semantics
2025-02-19 21:46:26 +05:30
Zaid Humayun
508dad77ab DROP TABLE: modified program instructions
correctly emitting constant instructions and also calling Destroy instead of DropTable instruction for removing btree pages
2025-02-19 21:46:25 +05:30
Zaid Humayun
713465c592 instruction: added destroy instruction
added required functionality for destroy. minor cleanup of print statements
2025-02-19 21:46:25 +05:30
Zaid Humayun
caab612901 resolved all conflicts after rebase from upstream/main 2025-02-19 21:46:25 +05:30