Commit Graph

3577 Commits

Author SHA1 Message Date
Pere Diaz Bou
e504262bd5 fix rebase 2025-03-29 22:04:08 +01:00
Pere Diaz Bou
105b421274 make read_record, read_varint and read_value faster
We make read_record faster by not allocating Vec if not needed. This is
why I introduced a simple `SmallVec<T>` that will have a stack allocated
list for the simplest workloads, and a heap allocated if we were to
require more stuff.

Both read_varint and read_value, at least in my mac m4, were not
inlined. Since these functions are called so many times it made sense to
inline them to avoid call overhead. With this I saw something like 20%
improvement over previous commit in my m4.
2025-03-29 22:04:08 +01:00
Pere Diaz Bou
3317195a53 Reusable ImmutableRecord -> allocation reduction
Improve allocation usage from ImmutableRecords by reusing them.
ImmutableRecord is basically a contigous piece of memory that holds the
current record. If we move to some other record we usually deallocate
the previous one and allocate a new one -- obviously this is wasteful.
With this commit we will reuse the ImmutableRecord to allow payload to
be extended if needed or reused if we can, making it faster to iterate
records basically.
2025-03-29 22:04:08 +01:00
Pere Diaz Bou
ee55116ca6 return row as reference to registers 2025-03-29 22:04:08 +01:00
Pere Diaz Bou
5b7fcd27bd make column reuse blob/text fields 2025-03-29 22:02:49 +01:00
Pere Diaz Bou
78e9f1c09a append writer 2025-03-29 22:02:49 +01:00
Pere Diaz Bou
bf37fd3314 wip 2025-03-29 22:02:49 +01:00
Pekka Enberg
4ee60348f2 Merge 'Fixes probably all floating point math issues and floating point display issues.' from Ihor Andrianov
Closes #1206
Closes #447
Closes #1117
Issues:
 1. Rust floating point math fucntions are non deterministic.
 2. SQLite have complex floating point display rules.
I changed rust functions to libm for math ops and implemented Display
trait for OwnedValue:Float. A lot of float formatting SQLite probably
inherits from C have to be handcrafted.

Closes #1208
2025-03-29 17:33:53 +02:00
Ihor Andrianov
09b55a9b41 fix nan to return null 2025-03-29 14:47:08 +02:00
Ihor Andrianov
8b9f34af71 fix tests and return nan as null 2025-03-29 14:46:11 +02:00
Ihor Andrianov
922945e819 all output should go through display 2025-03-29 12:46:06 +02:00
Ihor Andrianov
6e04709c4f add comprehensive float display trait 2025-03-29 12:44:34 +02:00
Ihor Andrianov
303f1b3749 replace std math functions with libm for compat 2025-03-29 12:16:14 +02:00
Pere Diaz Bou
28904e74ae Merge 'Make BTreeCell/read_payload not allocate any data + overflow fixes' from Pere Diaz Bou
This PR has two parts:
## 1. Make reading cells faster
My benchmark was simple, running `test_sequential_write` test and see
how long it takes. It is a nice benchmark because it will write a new
row and then `select * from t` for every write to check the contents.
## From:
```bash
cargo test test_sequential_write --release -- --nocapture --test-threads=1   7.21s user 0.34s system 88% cpu 8.527 total
```
### To:
```bash
cargo test test_sequential_write --release -- --nocapture --test-threads=1  3.14s user 0.31s system 82% cpu 4.161 total
```
## 2. Fix reading overflow pages.
The code to read overflow pages was wrong, `read_payload` would try to
read as many overflow pages as possible but if there weren't in cache
they would return `IO` and not read more pages after that. This PR makes
every read record to check if it needs to read from overflow pages and
if not, it will simply read from the current page.

Closes #1141
2025-03-28 12:25:34 +01:00
Pere Diaz Bou
cb85ba8e82 fix extensions.py test 2025-03-28 11:58:03 +01:00
Pekka Enberg
704b4d3baf Merge 'JavaScript binding improvements' from Pekka Enberg
Closes #1204
2025-03-28 12:39:43 +02:00
Pere Diaz Bou
c5b718ac32 fix review comments 2025-03-28 11:30:19 +01:00
Pere Diaz Bou
01cdcf719c remove ignored from sequential tests 2025-03-28 11:30:19 +01:00
Pere Diaz Bou
83d0c9a1b6 fix read overflow page procedure 2025-03-28 11:30:19 +01:00
Pere Diaz Bou
f51c20adf0 read overflow pages on demand 2025-03-28 11:12:43 +01:00
Pere Diaz Bou
dc8acf1a4a cell_get no allocations 2025-03-28 11:12:27 +01:00
Pekka Enberg
94262e4660 bindings/javascript: Fix Statement.get() implementation 2025-03-28 11:32:55 +02:00
Pekka Enberg
7348eb0aa1 bindings/javascript: Add better-sqlite3 tests 2025-03-28 11:32:55 +02:00
Pekka Enberg
11d1dcf31a bindings/javascript: Run tests in parallel 2025-03-28 10:38:40 +02:00
Pekka Enberg
387b68fc06 Merge 'Expose 'Explain' to prepared statement to allow for alternate Writer ' from Preston Thorpe
### The problem:
I often need to copy the output of an `Explain` statement to my
clipboard. Currently this is not possible because it currently will only
write to stdout.
All other limbo output, I am able to run `.output file` in the CLI, then
enter my query and in another tmux pane I simply `cat file | xclip -in
-selection clipboard`.
### The solution:
Expose a `statement.explain()` method that returns the query explanation
as a string. If the user uses something like `execute` instead of
prepare, it will default to `stdout` as expected, but this allows the
user to access the query plan on the prepared statement and do with it
what they please.

Closes #1166
2025-03-28 09:55:58 +02:00
Pekka Enberg
8caa234df3 simulator: Reduce info-level logging
Make the simulator less noisy for casual runs.
2025-03-28 08:10:57 +02:00
Pekka Enberg
9a646acead Merge 'Add BTree balancing after delete' from Krishna Vishal
This PR adds balancing after delete.
- [x] Remove linear search for cell in the page
- [x] Change the implementation to state machine approach
- [x] Handle cases when balancing is needed and not needed
- [x] Add unit test to verify that balancing after delete maintains
BTree integrity.
Fixes: https://github.com/tursodatabase/limbo/issues/1019
Closes: https://github.com/tursodatabase/limbo/issues/455

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1147
2025-03-28 07:32:15 +02:00
Pekka Enberg
8d682fe857 Merge 'Kill test environment' from Pekka Enberg
...not sure why we added it in the first place, but it spams PRs all the
time.

Closes #1196
2025-03-28 07:31:59 +02:00
Pekka Enberg
41bdf25433 Merge 'Remove public unlock method from SpinLock to prevent unsafe aliasing' from Krishna Vishal
Currently we can create two guards and unlock one of them and still have
two mutable references to the same data.
By being able to unlock only via guard we prevent unsafe scenarios.
Currently, the test below will pass and shows that we can hold two
mutable references to the same data. After this fix, this would result
in a deadlock (have to remove `lock.unlock()`)
```rust
#[test]
fn two_mutable_reference() {
    let lock = SpinLock::new(42);
    let guard = lock.lock();
    lock.unlock();
    let guard2 = lock.lock();

    // two mutable references to same data
    *guard = 10;
    *guard2 = 20;

    assert_eq!(*guard, 20);
    assert_eq!(*guard2, 20);
}
```
Note: The javascript action failure is unrelated to this PR.

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1194
2025-03-27 20:43:16 +02:00
Pekka Enberg
4ccfdb1639 Kill test environment
...not sure why we added it in the first place, but it spams PRs all the
time.
2025-03-27 19:46:30 +02:00
Pekka Enberg
c71dad7683 Merge 'Introduce Register struct ' from Pere Diaz Bou
OwnedValue has become a powerhouse of madness, mainly because I decided
to do it like that when I first introduced AggContext. I decided it was
enough and I introduced a `Register` struct that contains `OwnedValue`,
`Record` and `Aggregation`, this way we don't use `OwnedValue` for
everything make everyone's life harder.
This is the next step towards making ImmutableRecords the default
because I want to remove unnecessary allocations. Right now we clone
OwnedValues when we generate a record more than needed.

Closes #1188
2025-03-27 19:45:11 +02:00
Pere Diaz Bou
d01423df83 fix clippy 2025-03-27 17:54:32 +01:00
Pere Diaz Bou
9291f60722 Introduce Register struct
OwnedValue has become a powerhouse of madness, mainly because I decided
to do it like that when I first introduced AggContext. I decided it was
enough and I introduced a `Register` struct that contains `OwnedValue`,
`Record` and `Aggregation`, this way we don't use `OwnedValue` for
everything make everyone's life harder.

This is the next step towards making ImmutableRecords the default
because I want to remove unnecessary allocations. Right now we clone
OwnedValues when we generate a record more than needed.
2025-03-27 17:53:02 +01:00
krishvishal
dcd92954f4 Remove unlock method and move it to guard's drop.
This makes the interface safe and prevents unlocking while still holding the guard
2025-03-27 18:45:05 +05:30
Pekka Enberg
af6e9cd2c2 Merge 'Handle limit zero case in query plan emitter' from Preston Thorpe
closes #1190
Don't emit full query plan and open table if 0 `LIMIT`
```console
limbo> explain select * from t limit 0;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     2     0                    0   Start at 2
1     Halt               0     0     0                    0
2     Transaction        0     0     0                    0   write=false
3     Goto               0     1     0                    0
```
Surprisingly, sqlite will still emit the Open/Rewind/Column/etc, etc for
this case. Definitely feels super unnecessary.

Closes #1191
2025-03-27 08:59:54 +02:00
Pekka Enberg
ec742a8468 Merge 'Fix numeric conversion in SELECT -'e'' from Diego Reis
closes #1157

Closes #1167
2025-03-27 08:58:57 +02:00
Pekka Enberg
eb8866a106 Merge 'Reduce MVCC cursor memory consumption' from Ihor Andrianov
Fix memory hogging in MVCC scan cursor #1104
The current scan cursor loads all rowids at once, which blows up memory
on big tables.
Added BucketScanCursor that loads rowids in configurable batches -
better memory usage while keeping decent performance. It's a drop-in
replacement with the same interface.
Also included LazyScanCursor as an alternative that fetches one rowid at
a time, though it's less efficient due to log(n) skipmap lookups for
each row.
BucketScanCursor is the recommended approach for most use cases. WDYT?

Closes #1178
2025-03-27 08:56:35 +02:00
PThorpe92
a1ac0ca175 Handle limit zero case in query plan emitter 2025-03-26 15:33:05 -04:00
krishvishal
1b77b76d58 Add comments detailing Delete state machine flow and few delete steps. 2025-03-27 01:02:59 +05:30
krishvishal
fe7ff6f53d Add balancing states. Balancing test works. 2025-03-27 00:27:03 +05:30
Pekka Enberg
c1a0236dcc Merge 'Introduce immutable record' from Pere Diaz Bou
Currently we have a Record, which is a dumb vector of cloned values.
This is incredibly bad for performance as we do not want to clone
objects unless needed. Therefore, let's start by introducing this type
so that any record that has already been serialized will be returned
from btree in the format of a simple payload with reference to payload.

Closes #1176
2025-03-26 20:31:04 +02:00
krishvishal
3e3a0f56a1 Make delete re-entrant.
Setup DeleteState enum
2025-03-26 22:44:49 +05:30
Pere Diaz Bou
f07f10ac53 fix read empty blob/text 2025-03-26 18:11:44 +01:00
Pekka Enberg
e0c414f9dd Merge 'Introduce libFuzzer' from Levy A.
This PR introduces structured fuzzing with
[libFuzzer](https://llvm.org/docs/LibFuzzer.html). The expression target
implementation is not complete, but already found a compatibility issue.
More fuzzing targets should be moved from `tests/fuzz` to `fuzz` and
benefit from more advanced fuzzing techniques.
- [x] Add fuzzing guide to `README.md`
   - Install `cargo-fuzz`.
   - Use the nightly version of cargo with the `fuzz` dev shell or use
rustup to switch versions.
   - Run `cargo fuzz run ...`
- [x] Add all binary operations.
# 🐞 Bugs
Compatibility issue found when trying to `select ?` with a `NaN` value.
Sqlite returns `NULL`, while Limbo returns `NaN` (reasonable, but
incompatible).
```
thread '<unnamed>' panicked at fuzz_targets/expression.rs:130:5:
assertion `left == right` failed
  left: Null
 right: Float(NaN)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==59288== ERROR: libFuzzer: deadly signal
    #0 0x00010564c0f0 in __sanitizer_print_stack_trace+0x28 (librustc-nightly_rt.asan.dylib:arm64+0x5c0f0)
    #1 0x0001024e7b64 in fuzzer::PrintStackTrace()+0x30 (expression:arm64+0x101c53b64)
    #2 0x0001024da650 in fuzzer::Fuzzer::CrashCallback()+0x60 (expression:arm64+0x101c46650)
    #3 0x000195fa6de0 in _sigtramp+0x34 (libsystem_platform.dylib:arm64+0x3de0)
    #4 0x000195f6ff6c in pthread_kill+0x11c (libsystem_pthread.dylib:arm64+0x6f6c)
    #5 0x000195e7c904 in abort+0x7c (libsystem_c.dylib:arm64+0x79904)
    #6 0x000102580990 in std::sys::pal::unix::abort_internal::hd275d720c474f43c+0x8 (expression:arm64+0x101cec990)
    #7 0x000102621604 in std::process::abort::h62d9ecef2f17e944+0x8 (expression:arm64+0x101d8d604)
    #8 0x0001024d93bc in libfuzzer_sys::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h3b4b43a8f9432830+0xb8 (expression:arm64+0x101c453bc)
    #9 0x000102577de0 in std::panicking::rust_panic_with_hook::h19683f6fd94fb24c+0x2b8 (expression:arm64+0x101ce3de0)
    #10 0x000102577970 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h4e98e5e8777eac5e+0x8c (expression:arm64+0x101ce3970)
    #11 0x0001025754e0 in std::sys::backtrace::__rust_end_short_backtrace::h12a2d70ebc9128b2+0x8 (expression:arm64+0x101ce14e0)
    #12 0x000102577628 in rust_begin_unwind+0x1c (expression:arm64+0x101ce3628)
    #13 0x000102623340 in core::panicking::panic_fmt::h8c4d74b8e5179d60+0x1c (expression:arm64+0x101d8f340)
    #14 0x0001026236cc in core::panicking::assert_failed_inner::he8fd1f85d57f866a+0x104 (expression:arm64+0x101d8f6cc)
    #15 0x0001025c73dc in core::panicking::assert_failed::h3e7590b91d46bff9 panicking.rs:364
    #16 0x000100930910 in expression::do_fuzz::hfcf5c5e5fde1a31c expression.rs:130
    #17 0x0001009373fc in rust_fuzzer_test_input lib.rs:359
    #18 0x0001024d2f34 in std::panicking::try::do_call::hce6ebc856827ae8b+0xc4 (expression:arm64+0x101c3ef34)
    #19 0x0001024d9624 in __rust_try+0x18 (expression:arm64+0x101c45624)
    #20 0x0001024d896c in LLVMFuzzerTestOneInput+0x16c (expression:arm64+0x101c4496c)
    #21 0x0001024dc3cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long)+0x148 (expression:arm64+0x101c483cc)
    #22 0x0001024db8dc in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*)+0x58 (expression:arm64+0x101c478dc)
    #23 0x0001024dd920 in fuzzer::Fuzzer::MutateAndTestOne()+0x258 (expression:arm64+0x101c49920)
    #24 0x0001024de908 in fuzzer::Fuzzer::Loop(std::__1::vector<fuzzer::SizedFile, std::__1::allocator<fuzzer::SizedFile>>&)+0x38c (expression:arm64+0x101c4a908)
    #25 0x0001024fd120 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long))+0x1bac (expression:arm64+0x101c69120)
    #26 0x00010250d884 in main+0x34 (expression:arm64+0x101c79884)
    #27 0x000195bf0270  (<unknown module>)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 3 CopyPart-ShuffleBytes-CrossOver-; base unit: a36928cfe783d55be82d526168a2da57372fdfdc
0xff,0xfd,0xff,0x3f,0x87,0x0,0x6e,0x6f,0x77,0x48,0x48,0x48,0xff,0x48,0xff,0xff,0x5b,0xff,0x5b,
\377\375\377?\207\000nowHHH\377H\377\377[\377[
artifact_prefix='/Users/levy/Documents/limbo/fuzz/artifacts/expression/'; Test unit written to /Users/levy/Documents/limbo/fuzz/artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5
Base64: //3/P4cAbm93SEhI/0j//1v/Ww==

────────────────────────────────────────────────────────────────────────────────

Failing input:

        artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5

Output of `std::fmt::Debug`:

        Value(
            Real(
                NaN,
            ),
        )

Reproduce with:

        cargo fuzz run expression artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5

Minimize test case with:

        cargo fuzz tmin expression artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5

────────────────────────────────────────────────────────────────────────────────
```

Reviewed-by: Preston Thorpe (@PThorpe92)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1116
2025-03-26 18:36:39 +02:00
Pekka Enberg
114847415b Merge 'WAL frame checksum support' from Daniel Boll
closes #1151

Closes #1184
2025-03-26 17:48:52 +02:00
Daniel Boll
c2a2dfa67b Remove unused imports and handle WAL header read error
Refactor random number generation for WAL header salts
2025-03-26 11:31:29 -03:00
Pekka Enberg
fca643641c Merge 'Fix compute_shl negate with overflow' from Krishna Vishal
Fixes #1156
- Changed `compute_shl` implementation to handle negation with overflow.
- Added TCL tests.

Closes #1171
2025-03-26 16:08:59 +02:00
Pekka Enberg
3cb8cda746 Merge 'Unary + is a noop' from Levy A.
Fixes #1154.
Unary `+` is a noop in Sqlite: <https://sqlite.org/lang_expr.html#operat
ors_and_parse_affecting_attributes>.

Closes #1177
2025-03-26 15:59:45 +02:00
Pekka Enberg
96ab7938a8 github: Bump JavaScript workflow timeout to 20 minutes 2025-03-26 15:59:00 +02:00
Pekka Enberg
d6de99a7bb Merge 'Initial JavaScript bindings with napi-rs' from Pekka Enberg
Closes #1183
2025-03-26 13:36:19 +02:00