`i64::MAX`. We do this by making up to 100 attempts to generate a random
value smaller than `i64::MAX`, returning a `DatabaseFull` error if all
attempts fail.
- Introduced `DatabaseFull` error variant
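A minimal sketch of the retry loop described above. `new_rowid`, `XorShift`, `MAX_ATTEMPTS` and the `rowid_exists` callback are hypothetical names used only for illustration (the real code checks the B-tree and uses its own random source). Only the `DatabaseFull` variant comes from this change.

```rust
/// Illustrative error type; only the variant introduced by this change is modeled.
#[derive(Debug, PartialEq)]
enum LimboError {
    DatabaseFull,
}

/// Tiny xorshift64 PRNG, a hypothetical stand-in for a real RNG. The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

const MAX_ATTEMPTS: usize = 100;

/// Pick a rowid for a new row. `rowid_exists` is a hypothetical lookup into the table.
fn new_rowid(
    max_rowid: i64,
    rng: &mut XorShift,
    rowid_exists: impl Fn(i64) -> bool,
) -> Result<i64, LimboError> {
    // Normal case: one past the current maximum.
    if max_rowid < i64::MAX {
        return Ok(max_rowid + 1);
    }
    // The maximum is taken: probe random rowids in 1..i64::MAX (strictly below i64::MAX).
    for _ in 0..MAX_ATTEMPTS {
        let candidate = (rng.next_u64() % (i64::MAX as u64 - 1)) as i64 + 1;
        if !rowid_exists(candidate) {
            return Ok(candidate);
        }
    }
    Err(LimboError::DatabaseFull)
}
```

Capping the number of attempts bounds the cost of an insert when the key space is nearly exhausted, at the price of reporting `DatabaseFull` while some free rowids may still exist.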
Fixes: https://github.com/tursodatabase/turso/issues/1977
Currently we deserialize the entire record to compare records or to read a
single column. This PR introduces efficient record operations such as
incremental column deserialization and efficient record comparison.
### Incremental Column Deserialization
- Introduced `RecordCursor` to keep track of how much of the header and
the record we have already parsed. Each `BTreeCursor` has its own
`RecordCursor`, just as it has its own `ImmutableRecord`.
- The `RecordCursor` gets the number of columns from the schema when the
`BTreeCursor` is initialized in the VDBE. This cuts down on heap
allocations by reserving the correct amount of space for the underlying
`Vec`s.
- `ImmutableRecord` now carries only the serialized `payload`.
- We parse the header only up to the required serial type (given by the
column index), then calculate the offsets and deserialize only that
slice of the payload.
- Manually inlined most of the deserialization code into `op_column`,
because the compiler refuses to inline it even with an
`#[inline(always)]` hint, probably due to the complicated control
flow.
- Followed SQLite semantics: `Null` is returned when the requested
column is beyond the number of columns in the record, when the payload
is empty, and so on.
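The sketch below illustrates lazy header parsing under simplifying assumptions: the payload is well-formed and fits in one page, and `Value` is a simplified stand-in for Limbo's value types. The `RecordCursor` fields and the `get_value` method shown here are hypothetical, not the actual Limbo definitions; varints and serial types follow the SQLite record format.

```rust
/// Simplified column value borrowing from the payload (hypothetical).
#[derive(Debug, PartialEq)]
enum Value<'a> {
    Null,
    Integer(i64),
    Float(f64),
    Text(&'a [u8]),
    Blob(&'a [u8]),
}

/// Decode a SQLite varint starting at `buf[pos]`; returns (value, bytes consumed).
fn read_varint(buf: &[u8], pos: usize) -> (u64, usize) {
    let mut v: u64 = 0;
    for i in 0..8 {
        let b = buf[pos + i];
        v = (v << 7) | u64::from(b & 0x7f);
        if b & 0x80 == 0 {
            return (v, i + 1);
        }
    }
    // The 9th byte contributes all 8 bits.
    ((v << 8) | u64::from(buf[pos + 8]), 9)
}

/// Size in bytes of a column body for a given serial type.
fn serial_type_len(t: u64) -> usize {
    match t {
        1..=4 => t as usize,
        5 => 6,
        6 | 7 => 8,
        n if n >= 12 => ((n - 12) / 2) as usize, // blob: (n-12)/2, text: (n-13)/2
        _ => 0, // NULL, constants 0 and 1, reserved types
    }
}

/// Remembers how far the header has been parsed so later lookups resume there.
struct RecordCursor {
    header_size: usize,
    header_offset: usize, // next unparsed byte of the header
    serial_types: Vec<u64>,
    offsets: Vec<usize>, // offsets[i]..offsets[i + 1] is the body of column i
}

impl RecordCursor {
    /// `num_columns` would come from the schema, so the Vecs do not reallocate.
    fn with_capacity(num_columns: usize) -> Self {
        RecordCursor {
            header_size: 0,
            header_offset: 0,
            serial_types: Vec::with_capacity(num_columns),
            offsets: Vec::with_capacity(num_columns + 1),
        }
    }

    fn get_value<'a>(&mut self, payload: &'a [u8], idx: usize) -> Value<'a> {
        if payload.is_empty() {
            return Value::Null;
        }
        if self.offsets.is_empty() {
            let (size, n) = read_varint(payload, 0);
            self.header_size = size as usize;
            self.header_offset = n;
            self.offsets.push(self.header_size); // first body starts right after the header
        }
        // Parse serial types only until column `idx` is known.
        while self.serial_types.len() <= idx && self.header_offset < self.header_size {
            let (t, n) = read_varint(payload, self.header_offset);
            self.header_offset += n;
            let start = *self.offsets.last().unwrap();
            self.serial_types.push(t);
            self.offsets.push(start + serial_type_len(t));
        }
        // Column beyond the end of the record: return NULL, as SQLite does.
        if idx >= self.serial_types.len() {
            return Value::Null;
        }
        let body = &payload[self.offsets[idx]..self.offsets[idx + 1]];
        match self.serial_types[idx] {
            1..=6 => {
                // Big-endian two's complement, sign-extended from the first byte.
                let mut v: i64 = if body[0] & 0x80 != 0 { -1 } else { 0 };
                for &b in body {
                    v = (v << 8) | i64::from(b);
                }
                Value::Integer(v)
            }
            7 => Value::Float(f64::from_bits(body.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))),
            8 => Value::Integer(0),
            9 => Value::Integer(1),
            t if t >= 12 && t % 2 == 0 => Value::Blob(body),
            t if t >= 13 => Value::Text(body),
            _ => Value::Null,
        }
    }
}
```

Reading column 0 parses one serial type; a later read of column 1 resumes from `header_offset` instead of rescanning the header.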
### Efficient Record Comparison ops
- Three record comparison functions are introduced, for integer keys,
string keys and the general case, replacing `compare_immutable`. Each
compares a serialized record with a deserialized one.
- `compare_records_int`: used when the first field is an integer, the
header is ≤63 bytes and the record has ≤13 fields in total. No varint
parsing; the integer is extracted directly.
- `compare_records_string`: used when the first field is text with
binary collation and the header is ≤63 bytes.
- `compare_records_generic`: used in complex cases such as custom
collations or large headers. It parses the record incrementally, field
by field, comparing each field with the corresponding one from the
deserialized record, and exits early on the first mismatch to save the
cost of deserializing the remaining fields.
- `find_compare`: selects the best comparison strategy for the given
case and dispatches to the corresponding function.
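A hedged sketch of the dispatch and of the early-exit behaviour of the generic path. `CompareStrategy`, the simplified `Value` enum and the parameters of `find_compare` are assumptions for illustration; the thresholds are the ones listed above, and the serialized record is modeled as a lazy iterator rather than real payload bytes.

```rust
use std::cmp::Ordering;

/// Simplified deserialized value (hypothetical; REAL and BLOB omitted).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
}

#[derive(Debug, PartialEq)]
enum CompareStrategy {
    Int,
    String,
    Generic,
}

/// Choose a comparator using the thresholds described above.
fn find_compare(first_key: &Value, binary_collation: bool, header_size: usize, num_fields: usize) -> CompareStrategy {
    match first_key {
        Value::Integer(_) if header_size <= 63 && num_fields <= 13 => CompareStrategy::Int,
        Value::Text(_) if binary_collation && header_size <= 63 => CompareStrategy::String,
        _ => CompareStrategy::Generic,
    }
}

/// Storage-class order: NULL < INTEGER < TEXT.
fn class_rank(v: &Value) -> u8 {
    match v {
        Value::Null => 0,
        Value::Integer(_) => 1,
        Value::Text(_) => 2,
    }
}

fn compare_values(a: &Value, b: &Value) -> Ordering {
    match (a, b) {
        (Value::Integer(x), Value::Integer(y)) => x.cmp(y),
        (Value::Text(x), Value::Text(y)) => x.as_bytes().cmp(y.as_bytes()), // BINARY collation
        _ => class_rank(a).cmp(&class_rank(b)),
    }
}

/// Generic path: `serialized` yields fields lazily, so fields after the first mismatch are never decoded.
fn compare_records_generic(serialized: impl Iterator<Item = Value>, key: &[Value]) -> Ordering {
    // Pull from `key` first so no extra serialized field is decoded once the key is exhausted.
    for (k, field) in key.iter().zip(serialized) {
        let ord = compare_values(&field, k);
        if ord != Ordering::Equal {
            return ord; // early exit on the first mismatch
        }
    }
    Ordering::Equal // ties on the compared prefix are left to the caller
}
```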
### Benchmarks `main` vs `incremental_column`
I used `testing/testing.db` for this benchmark.
| Query | Main | Incremental | % Change (positive = faster) |
|-------|------|-------------|------------------------------|
| SELECT first_name FROM users | 1.3579ms | 1.1452ms | 15.66 |
| SELECT age FROM users | 912.33µs | 897.97µs | 1.57 |
| SELECT email FROM users | 1.3632ms | 1.215ms | 10.87 |
| SELECT id FROM users | 1.4985ms | 1.1762ms | 21.50 |
| SELECT first_name, last_name FROM users | 1.5736ms | 1.4616ms | 7.11 |
| SELECT first_name, last_name, email FROM users | 1.7965ms | 1.754ms | 2.36 |
| SELECT id, first_name, last_name, email, age FROM users | 2.3545ms | 2.4059ms | -2.18 |
| SELECT * FROM users | 3.5731ms | 3.7587ms | -5.19 |
| SELECT * FROM users WHERE age = 30 | 87.947µs | 85.545µs | 2.73 |
| SELECT id, first_name FROM users WHERE first_name LIKE 'John%' | 1.8594ms | 1.6781ms | 9.75 |
| SELECT age FROM users LIMIT 1000 | 100.27µs | 95.418µs | 4.83 |
| SELECT first_name, age, email FROM users LIMIT 1000 | 176.04µs | 167.56µs | 4.81 |
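The % Change column appears to be computed relative to `main`; the formula below is inferred from the table (it reproduces the listed rows) rather than stated in the original:

$$
\Delta\% = \frac{t_{\text{main}} - t_{\text{incremental}}}{t_{\text{main}}} \times 100
$$

For `SELECT first_name FROM users`: (1.3579 − 1.1452) / 1.3579 × 100 ≈ 15.66. For `SELECT * FROM users`: (3.5731 − 3.7587) / 3.5731 × 100 ≈ −5.19, i.e. the incremental path is slower there.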
Closes: https://github.com/tursodatabase/turso/issues/1703
Closes #1923
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.
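A minimal sketch of the argument-to-hidden-column matching, assuming the `generate_series` layout used by SQLite (a visible `value` column and hidden `start`, `stop` and `step` columns). `Column`, `EqConstraint` and `args_to_constraints` are hypothetical names, and arguments are simplified to integer literals instead of expressions.

```rust
/// Hypothetical column metadata for a table-valued function.
struct Column {
    name: &'static str,
    hidden: bool,
}

/// Hypothetical equality constraint produced from one argument.
#[derive(Debug, PartialEq)]
struct EqConstraint {
    column: &'static str,
    value: i64,
}

/// Match positional TVF arguments to hidden columns in declaration order,
/// turning `f(a, b)` into the equivalent of `WHERE hidden1 = a AND hidden2 = b`.
fn args_to_constraints(columns: &[Column], args: &[i64]) -> Result<Vec<EqConstraint>, String> {
    let hidden: Vec<&Column> = columns.iter().filter(|c| c.hidden).collect();
    if args.len() > hidden.len() {
        return Err(format!(
            "too many arguments: {} given, {} hidden columns",
            args.len(),
            hidden.len()
        ));
    }
    Ok(hidden
        .iter()
        .zip(args)
        .map(|(c, &value)| EqConstraint { column: c.name, value })
        .collect())
}
```

With this mapping, `generate_series(5, 50)` yields the constraints `start = 5` and `stop = 50`, the same ones the `WHERE` form produces.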
Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.
Adding support for column references is more nuanced for two main
reasons:
- We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the `one` table must be processed by the outer loop, with `series`
nested inside it.
- For outer joins involving TVFs, the arguments must be treated as ON
predicates, not WHERE predicates.
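One way to satisfy the first requirement is a stable topological ordering of the FROM-clause tables. This is a hypothetical sketch (`TableRef` and `loop_order` are not Limbo types): each table is placed after the tables its TVF arguments reference, and the written order is otherwise kept.

```rust
/// Hypothetical FROM-clause entry with the tables its TVF arguments reference.
struct TableRef {
    name: &'static str,
    depends_on: Vec<&'static str>,
}

/// Return a nesting order (outermost loop first) in which every table comes after
/// its dependencies, preserving FROM-clause order otherwise. Errors on cycles.
fn loop_order(tables: &[TableRef]) -> Result<Vec<&'static str>, String> {
    let mut placed: Vec<&'static str> = Vec::new();
    let mut remaining: Vec<&TableRef> = tables.iter().collect();
    while !remaining.is_empty() {
        // First remaining table whose dependencies are all already placed.
        let ready = remaining
            .iter()
            .position(|t| t.depends_on.iter().all(|d| placed.contains(d)));
        match ready {
            Some(i) => placed.push(remaining.remove(i).name),
            None => return Err("cyclic table-valued function dependencies".to_string()),
        }
    }
    Ok(placed)
}
```

For `FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one`, this puts `one` in the outer loop and `series` inside it.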
In SQLite, the field equivalent to `constraint_usage` (`aConstraintUsage`
from `sqlite3_index_info`) is used to request arguments that are later
passed to the `xFilter` method. In Limbo, this behavior applies to
virtual tables, but not to table-valued functions. Currently, TVFs have
dedicated handling that passes all function arguments to the filter
method and doesn't use information provided in the `constraint_usage`
field.
This commit is a step toward unifying the handling of virtual tables and
TVFs.
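In SQLite, `aConstraintUsage[i].argvIndex` is 1-based, and 0 means the constraint's value is not passed to `xFilter`. The sketch below models that with `Option<u32>`; `ConstraintUsage` and `filter_args` are illustrative names, not Limbo's definitions.

```rust
/// Hypothetical mirror of one `aConstraintUsage` entry: `argv_index` is the 1-based
/// position of this constraint's value among the filter arguments, or `None`
/// when the table does not request it.
#[derive(Clone, Copy)]
struct ConstraintUsage {
    argv_index: Option<u32>,
}

/// Build filter arguments from constraint values, ordered by `argv_index`.
fn filter_args(values: &[i64], usage: &[ConstraintUsage]) -> Vec<i64> {
    let mut slots: Vec<(u32, i64)> = usage
        .iter()
        .zip(values)
        .filter_map(|(u, &v)| u.argv_index.map(|i| (i, v)))
        .collect();
    slots.sort_by_key(|&(i, _)| i);
    slots.into_iter().map(|(_, v)| v).collect()
}
```

Routing TVF arguments through the same mechanism would let a TVF receive only the values it asks for, in the order it asks for them.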
We need to enumerate first and filter afterward, not the other way
around, because we later use the indexes produced by `enumerate` to
access the original `predicates` slice.
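A small illustration of why the order matters; `usable_predicate_indexes` and the string predicates are hypothetical stand-ins for the real planner types.

```rust
/// Return indexes into the original `predicates` slice for the usable ones.
/// `enumerate` runs first, so the indexes refer to `predicates`, not to a filtered copy.
fn usable_predicate_indexes(predicates: &[&str], is_usable: impl Fn(&str) -> bool) -> Vec<usize> {
    predicates
        .iter()
        .enumerate()
        .filter(|&(_, p)| is_usable(*p))
        .map(|(i, _)| i)
        .collect()
}
```

Swapping the two calls (`filter` then `enumerate`) would number only the surviving predicates, so an index such as `1` could point at an unrelated entry of the original slice.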
Previously, the test queries added in this commit would fail with:
thread 'main' panicked at core/schema.rs:129:34:
not implemented
stack backtrace:
0: rust_begin_unwind
at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5
1: core::panicking::panic_fmt
at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14
2: core::panicking::panic
at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:148:5
3: limbo_core::schema::Table::get_root_page
at ./core/schema.rs:129:34
4: limbo_core::translate::main_loop::init_loop
at ./core/translate/main_loop.rs:260:44
5: limbo_core::translate::emitter::emit_query
at ./core/translate/emitter.rs:568:5
6: limbo_core::translate::emitter::emit_program_for_select
at ./core/translate/emitter.rs:496:5
7: limbo_core::translate::emitter::emit_program
at ./core/translate/emitter.rs:187:31
8: limbo_core::translate::select::translate_select
at ./core/translate/select.rs:82:5
9: limbo_core::translate::translate_inner
at ./core/translate/mod.rs:241:13
10: limbo_core::translate::translate
at ./core/translate/mod.rs:95:17
11: limbo_core::Connection::run_cmd
at ./core/lib.rs:416:31
12: <limbo_core::QueryRunner as core::iter::traits::iterator::Iterator>::next
at ./core/lib.rs:916:22
13: limbo::app::Limbo::run_query
at ./cli/app.rs:442:27
14: limbo::app::Limbo::handle_input_line
at ./cli/app.rs:544:13
15: limbo::main
at ./cli/main.rs:51:31
16: core::ops::function::FnOnce::call_once
This is done because the compiler refuses to inline the code even with
an inline hint.
- Get refvalues directly from registers without using
`make_record`
compare_records_int
compare_records_string
compare_records_generic
`compare_records_generic` will still be more efficient than
`compare_immutable` because it deserializes the record column by column.