3550 Commits

Author SHA1 Message Date
Pekka Enberg
9a646acead Merge 'Add BTree balancing after delete' from Krishna Vishal
This PR adds balancing after delete.
- [x] Remove linear search for cell in the page
- [x] Change the implementation to state machine approach
- [x] Handle cases when balancing is needed and not needed
- [x] Add unit test to verify that balancing after delete maintains
BTree integrity.
Fixes: https://github.com/tursodatabase/limbo/issues/1019
Closes: https://github.com/tursodatabase/limbo/issues/455

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1147
2025-03-28 07:32:15 +02:00
Pekka Enberg
8d682fe857 Merge 'Kill test environment' from Pekka Enberg
...not sure why we added it in the first place, but it spams PRs all the
time.

Closes #1196
2025-03-28 07:31:59 +02:00
Pekka Enberg
41bdf25433 Merge 'Remove public unlock method from SpinLock to prevent unsafe aliasing' from Krishna Vishal
Currently we can create two guards and unlock one of them and still have
two mutable references to the same data.
By being able to unlock only via guard we prevent unsafe scenarios.
Currently, the test below will pass and shows that we can hold two
mutable references to the same data. After this fix, this would result
in a deadlock (have to remove `lock.unlock()`)
```rust
#[test]
fn two_mutable_reference() {
    let lock = SpinLock::new(42);
    let mut guard = lock.lock();
    lock.unlock();
    let mut guard2 = lock.lock();

    // two mutable references to same data
    *guard = 10;
    *guard2 = 20;

    assert_eq!(*guard, 20);
    assert_eq!(*guard2, 20);
}
```
Note: The javascript action failure is unrelated to this PR.
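
For illustration, the guard-only unlocking described above can be sketched as follows. This is a minimal stand-in, not Limbo's actual `SpinLock`: the field names, the `SpinLockGuard` type, and the `try_lock` helper are assumptions made here so the behavior can be checked without deadlocking.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical minimal spin lock (not Limbo's real type).
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialized through `locked`.
unsafe impl<T: Send> Sync for SpinLock<T> {}

/// The only way to release the lock is to drop this guard.
pub struct SpinLockGuard<'a, T> {
    lock: &'a SpinLock<T>,
}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Spins until the lock is acquired.
    pub fn lock(&self) -> SpinLockGuard<'_, T> {
        while self.locked.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
        SpinLockGuard { lock: self }
    }

    /// Non-blocking variant, added only for this sketch.
    pub fn try_lock(&self) -> Option<SpinLockGuard<'_, T>> {
        if self.locked.swap(true, Ordering::Acquire) {
            None
        } else {
            Some(SpinLockGuard { lock: self })
        }
    }
}

impl<T> Deref for SpinLockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Safety: holding the guard means we hold the lock.
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for SpinLockGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // Safety: holding the guard means we hold the lock exclusively.
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for SpinLockGuard<'_, T> {
    fn drop(&mut self) {
        // Unlocking happens here and nowhere else.
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

With no public `unlock`, a second guard cannot exist while the first is alive, so the aliasing shown in the test above is no longer expressible.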

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1194
2025-03-27 20:43:16 +02:00
Pekka Enberg
4ccfdb1639 Kill test environment
...not sure why we added it in the first place, but it spams PRs all the
time.
2025-03-27 19:46:30 +02:00
Pekka Enberg
c71dad7683 Merge 'Introduce Register struct ' from Pere Diaz Bou
OwnedValue has become a powerhouse of madness, mainly because I decided
to do it like that when I first introduced AggContext. I decided it was
enough and I introduced a `Register` struct that contains `OwnedValue`,
`Record` and `Aggregation`; this way we don't use `OwnedValue` for
everything and make everyone's life harder.
This is the next step towards making ImmutableRecords the default
because I want to remove unnecessary allocations. Right now we clone
OwnedValues when we generate a record more than needed.
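
A rough sketch of the shape this describes, assuming `Register` is modeled as an enum over the three kinds of content. The variant names, the simplified `OwnedValue`, `Record` and `AggContext` definitions, and the `as_value` helper are illustrative and may not match the real code.

```rust
/// Simplified stand-in for a plain SQL value.
#[derive(Debug, Clone, PartialEq)]
pub enum OwnedValue {
    Null,
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Simplified stand-in for a row of values.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub values: Vec<OwnedValue>,
}

/// Simplified stand-in for aggregation state.
#[derive(Debug, Clone, PartialEq)]
pub enum AggContext {
    Count(i64),
    Sum(f64),
}

/// A VM register holds exactly one of the three kinds of content,
/// so `OwnedValue` no longer has to carry records or aggregates.
#[derive(Debug, Clone, PartialEq)]
pub enum Register {
    OwnedValue(OwnedValue),
    Record(Record),
    Aggregate(AggContext),
}

impl Register {
    /// Returns the plain value, if this register holds one.
    pub fn as_value(&self) -> Option<&OwnedValue> {
        match self {
            Register::OwnedValue(v) => Some(v),
            _ => None,
        }
    }
}
```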

Closes #1188
2025-03-27 19:45:11 +02:00
Pere Diaz Bou
d01423df83 fix clippy 2025-03-27 17:54:32 +01:00
Pere Diaz Bou
9291f60722 Introduce Register struct
OwnedValue has become a powerhouse of madness, mainly because I decided
to do it like that when I first introduced AggContext. I decided it was
enough and I introduced a `Register` struct that contains `OwnedValue`,
`Record` and `Aggregation`; this way we don't use `OwnedValue` for
everything and make everyone's life harder.

This is the next step towards making ImmutableRecords the default
because I want to remove unnecessary allocations. Right now we clone
OwnedValues when we generate a record more than needed.
2025-03-27 17:53:02 +01:00
krishvishal
dcd92954f4 Remove unlock method and move it to guard's drop.
This makes the interface safe and prevents unlocking while still holding the guard
2025-03-27 18:45:05 +05:30
Pekka Enberg
af6e9cd2c2 Merge 'Handle limit zero case in query plan emitter' from Preston Thorpe
closes #1190
Don't emit full query plan and open table if 0 `LIMIT`
```console
limbo> explain select * from t limit 0;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     2     0                    0   Start at 2
1     Halt               0     0     0                    0
2     Transaction        0     0     0                    0   write=false
3     Goto               0     1     0                    0
```
Surprisingly, sqlite will still emit the Open/Rewind/Column/etc, etc for
this case. Definitely feels super unnecessary.
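
The short-circuit can be sketched roughly like this. The `Insn` enum and `emit_select` function are hypothetical and heavily simplified; the real emitter has a far richer instruction set and plan representation. Only the `LIMIT 0` branch mirrors the `explain` output above.

```rust
/// Hypothetical, heavily simplified instruction set.
#[derive(Debug, Clone, PartialEq)]
pub enum Insn {
    Init { target_pc: usize },
    Halt,
    Transaction { write: bool },
    Goto { target_pc: usize },
    OpenRead { table: String },
}

/// Emits a toy program for `SELECT * FROM <table> [LIMIT n]`.
pub fn emit_select(table: &str, limit: Option<u64>) -> Vec<Insn> {
    if limit == Some(0) {
        // No rows can be produced: skip opening the table entirely.
        return vec![
            Insn::Init { target_pc: 2 },
            Insn::Halt,
            Insn::Transaction { write: false },
            Insn::Goto { target_pc: 1 },
        ];
    }
    // Placeholder for the full plan (the scan loop is omitted here).
    vec![
        Insn::Init { target_pc: 3 },
        Insn::OpenRead { table: table.to_string() },
        Insn::Halt,
        Insn::Transaction { write: false },
        Insn::Goto { target_pc: 1 },
    ]
}
```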

Closes #1191
2025-03-27 08:59:54 +02:00
Pekka Enberg
ec742a8468 Merge 'Fix numeric conversion in SELECT -'e'' from Diego Reis
closes #1157

Closes #1167
2025-03-27 08:58:57 +02:00
Pekka Enberg
eb8866a106 Merge 'Reduce MVCC cursor memory consumption' from Ihor Andrianov
Fix memory hogging in MVCC scan cursor #1104
The current scan cursor loads all rowids at once, which blows up memory
on big tables.
Added BucketScanCursor that loads rowids in configurable batches -
better memory usage while keeping decent performance. It's a drop-in
replacement with the same interface.
Also included LazyScanCursor as an alternative that fetches one rowid at
a time, though it's less efficient due to log(n) skipmap lookups for
each row.
BucketScanCursor is the recommended approach for most use cases. WDYT?
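
To illustrate the batching idea, here is a small sketch that uses a `BTreeSet<u64>` as a stand-in for the skipmap of rowids. The struct layout, field names, and the `Iterator` interface are assumptions for the example, not the actual cursor API.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Hypothetical bucketed cursor: holds at most `bucket_size` rowids in memory.
pub struct BucketScanCursor<'a> {
    index: &'a BTreeSet<u64>,
    bucket: Vec<u64>,
    pos: usize,
    bucket_size: usize,
    last_seen: Option<u64>,
}

impl<'a> BucketScanCursor<'a> {
    pub fn new(index: &'a BTreeSet<u64>, bucket_size: usize) -> Self {
        assert!(bucket_size > 0, "bucket_size must be positive");
        Self {
            index,
            bucket: Vec::with_capacity(bucket_size),
            pos: 0,
            bucket_size,
            last_seen: None,
        }
    }

    /// Loads the next batch of rowids strictly after the last one returned.
    fn refill(&mut self) {
        let lower: Bound<u64> = match self.last_seen {
            Some(id) => Bound::Excluded(id),
            None => Bound::Unbounded,
        };
        self.bucket.clear();
        self.bucket.extend(
            self.index
                .range((lower, Bound::Unbounded))
                .take(self.bucket_size)
                .copied(),
        );
        self.pos = 0;
    }
}

impl Iterator for BucketScanCursor<'_> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.pos == self.bucket.len() {
            // One ordered lookup per bucket instead of one per row.
            self.refill();
        }
        let id = *self.bucket.get(self.pos)?;
        self.pos += 1;
        self.last_seen = Some(id);
        Some(id)
    }
}
```

With a bucket size of 1 this degenerates into the lazy variant (one lookup per row); larger buckets trade memory for fewer lookups.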

Closes #1178
2025-03-27 08:56:35 +02:00
PThorpe92
a1ac0ca175 Handle limit zero case in query plan emitter 2025-03-26 15:33:05 -04:00
krishvishal
1b77b76d58 Add comments detailing Delete state machine flow and few delete steps. 2025-03-27 01:02:59 +05:30
krishvishal
fe7ff6f53d Add balancing states. Balancing test works. 2025-03-27 00:27:03 +05:30
Pekka Enberg
c1a0236dcc Merge 'Introduce immutable record' from Pere Diaz Bou
Currently we have a Record, which is a dumb vector of cloned values.
This is incredibly bad for performance as we do not want to clone
objects unless needed. Therefore, let's start by introducing this type
so that any record that has already been serialized will be returned
from btree in the format of a simple payload with reference to payload.

Closes #1176
2025-03-26 20:31:04 +02:00
krishvishal
3e3a0f56a1 Make delete re-entrant.
Setup DeleteState enum
2025-03-26 22:44:49 +05:30
Pere Diaz Bou
f07f10ac53 fix read empty blob/text 2025-03-26 18:11:44 +01:00
Pekka Enberg
e0c414f9dd Merge 'Introduce libFuzzer' from Levy A.
This PR introduces structured fuzzing with
[libFuzzer](https://llvm.org/docs/LibFuzzer.html). The expression target
implementation is not complete, but already found a compatibility issue.
More fuzzing targets should be moved from `tests/fuzz` to `fuzz` and
benefit from more advanced fuzzing techniques.
- [x] Add fuzzing guide to `README.md`
   - Install `cargo-fuzz`.
   - Use the nightly version of cargo with the `fuzz` dev shell or use
rustup to switch versions.
   - Run `cargo fuzz run ...`
- [x] Add all binary operations.
# 🐞 Bugs
Compatibility issue found when trying to `select ?` with a `NaN` value.
Sqlite returns `NULL`, while Limbo returns `NaN` (reasonable, but
incompatible).
```
thread '<unnamed>' panicked at fuzz_targets/expression.rs:130:5:
assertion `left == right` failed
  left: Null
 right: Float(NaN)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
==59288== ERROR: libFuzzer: deadly signal
    #0 0x00010564c0f0 in __sanitizer_print_stack_trace+0x28 (librustc-nightly_rt.asan.dylib:arm64+0x5c0f0)
    #1 0x0001024e7b64 in fuzzer::PrintStackTrace()+0x30 (expression:arm64+0x101c53b64)
    #2 0x0001024da650 in fuzzer::Fuzzer::CrashCallback()+0x60 (expression:arm64+0x101c46650)
    #3 0x000195fa6de0 in _sigtramp+0x34 (libsystem_platform.dylib:arm64+0x3de0)
    #4 0x000195f6ff6c in pthread_kill+0x11c (libsystem_pthread.dylib:arm64+0x6f6c)
    #5 0x000195e7c904 in abort+0x7c (libsystem_c.dylib:arm64+0x79904)
    #6 0x000102580990 in std::sys::pal::unix::abort_internal::hd275d720c474f43c+0x8 (expression:arm64+0x101cec990)
    #7 0x000102621604 in std::process::abort::h62d9ecef2f17e944+0x8 (expression:arm64+0x101d8d604)
    #8 0x0001024d93bc in libfuzzer_sys::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h3b4b43a8f9432830+0xb8 (expression:arm64+0x101c453bc)
    #9 0x000102577de0 in std::panicking::rust_panic_with_hook::h19683f6fd94fb24c+0x2b8 (expression:arm64+0x101ce3de0)
    #10 0x000102577970 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h4e98e5e8777eac5e+0x8c (expression:arm64+0x101ce3970)
    #11 0x0001025754e0 in std::sys::backtrace::__rust_end_short_backtrace::h12a2d70ebc9128b2+0x8 (expression:arm64+0x101ce14e0)
    #12 0x000102577628 in rust_begin_unwind+0x1c (expression:arm64+0x101ce3628)
    #13 0x000102623340 in core::panicking::panic_fmt::h8c4d74b8e5179d60+0x1c (expression:arm64+0x101d8f340)
    #14 0x0001026236cc in core::panicking::assert_failed_inner::he8fd1f85d57f866a+0x104 (expression:arm64+0x101d8f6cc)
    #15 0x0001025c73dc in core::panicking::assert_failed::h3e7590b91d46bff9 panicking.rs:364
    #16 0x000100930910 in expression::do_fuzz::hfcf5c5e5fde1a31c expression.rs:130
    #17 0x0001009373fc in rust_fuzzer_test_input lib.rs:359
    #18 0x0001024d2f34 in std::panicking::try::do_call::hce6ebc856827ae8b+0xc4 (expression:arm64+0x101c3ef34)
    #19 0x0001024d9624 in __rust_try+0x18 (expression:arm64+0x101c45624)
    #20 0x0001024d896c in LLVMFuzzerTestOneInput+0x16c (expression:arm64+0x101c4496c)
    #21 0x0001024dc3cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long)+0x148 (expression:arm64+0x101c483cc)
    #22 0x0001024db8dc in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*)+0x58 (expression:arm64+0x101c478dc)
    #23 0x0001024dd920 in fuzzer::Fuzzer::MutateAndTestOne()+0x258 (expression:arm64+0x101c49920)
    #24 0x0001024de908 in fuzzer::Fuzzer::Loop(std::__1::vector<fuzzer::SizedFile, std::__1::allocator<fuzzer::SizedFile>>&)+0x38c (expression:arm64+0x101c4a908)
    #25 0x0001024fd120 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long))+0x1bac (expression:arm64+0x101c69120)
    #26 0x00010250d884 in main+0x34 (expression:arm64+0x101c79884)
    #27 0x000195bf0270  (<unknown module>)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 3 CopyPart-ShuffleBytes-CrossOver-; base unit: a36928cfe783d55be82d526168a2da57372fdfdc
0xff,0xfd,0xff,0x3f,0x87,0x0,0x6e,0x6f,0x77,0x48,0x48,0x48,0xff,0x48,0xff,0xff,0x5b,0xff,0x5b,
\377\375\377?\207\000nowHHH\377H\377\377[\377[
artifact_prefix='/Users/levy/Documents/limbo/fuzz/artifacts/expression/'; Test unit written to /Users/levy/Documents/limbo/fuzz/artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5
Base64: //3/P4cAbm93SEhI/0j//1v/Ww==

────────────────────────────────────────────────────────────────────────────────

Failing input:

        artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5

Output of `std::fmt::Debug`:

        Value(
            Real(
                NaN,
            ),
        )

Reproduce with:

        cargo fuzz run expression artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5

Minimize test case with:

        cargo fuzz tmin expression artifacts/expression/crash-63bfc8813b82bd8b97c557650289a6bc2c055ca5

────────────────────────────────────────────────────────────────────────────────
```
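
One possible way to match SQLite here, shown only as a sketch: map `NaN` to `NULL` when a float value is produced. The `Value` enum and `normalize_float` helper are hypothetical; this PR does not say whether or how the incompatibility gets fixed.

```rust
/// Simplified stand-in for a SQL value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
    Float(f64),
}

/// Converts a float into a SQL value, treating NaN as NULL.
pub fn normalize_float(f: f64) -> Value {
    if f.is_nan() {
        Value::Null
    } else {
        Value::Float(f)
    }
}
```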

Reviewed-by: Preston Thorpe (@PThorpe92)
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1116
2025-03-26 18:36:39 +02:00
Pekka Enberg
114847415b Merge 'WAL frame checksum support' from Daniel Boll
closes #1151

Closes #1184
2025-03-26 17:48:52 +02:00
Daniel Boll
c2a2dfa67b Remove unused imports and handle WAL header read error
Refactor random number generation for WAL header salts
2025-03-26 11:31:29 -03:00
Pekka Enberg
fca643641c Merge 'Fix compute_shl negate with overflow' from Krishna Vishal
Fixes #1156
- Changed `compute_shl` implementation to handle negation with overflow.
- Added TCL tests.
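
For context, the overflow comes from negating the shift amount: `-i64::MIN` does not fit in an `i64`. A sketch of one way to avoid it, assuming SQLite's rule that a negative left shift is an arithmetic right shift and that shifts of 64 or more saturate. This is not the actual patch, and the function body is written from scratch for illustration.

```rust
/// Sketch of `lhs << rhs` with SQLite-style semantics (assumed, see above).
pub fn compute_shl(lhs: i64, rhs: i64) -> i64 {
    if rhs >= 0 {
        // Shifting left by 64 or more clears every bit.
        if rhs >= 64 {
            0
        } else {
            lhs << rhs
        }
    } else {
        // `unsigned_abs` cannot overflow, unlike `-rhs` when rhs == i64::MIN.
        let amount = rhs.unsigned_abs();
        if amount >= 64 {
            // Arithmetic right shift saturates to the sign.
            if lhs >= 0 {
                0
            } else {
                -1
            }
        } else {
            lhs >> amount
        }
    }
}
```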

Closes #1171
2025-03-26 16:08:59 +02:00
Pekka Enberg
3cb8cda746 Merge 'Unary + is a noop' from Levy A.
Fixes #1154.
Unary `+` is a noop in Sqlite:
<https://sqlite.org/lang_expr.html#operators_and_parse_affecting_attributes>.

Closes #1177
2025-03-26 15:59:45 +02:00
Pekka Enberg
96ab7938a8 github: Bump JavaScript workflow timeout to 20 minutes 2025-03-26 15:59:00 +02:00
Pekka Enberg
d6de99a7bb Merge 'Initial JavaScript bindings with napi-rs' from Pekka Enberg
Closes #1183
2025-03-26 13:36:19 +02:00
Pekka Enberg
9ef729f81c Initial JavaScript bindings with napi-rs 2025-03-26 13:30:13 +02:00
Levy A.
5eae685fa8 add tests 2025-03-26 07:04:03 -03:00
Pere Diaz Bou
63cf86ba36 fix comparison of records 2025-03-26 10:10:19 +01:00
krishvishal
16d77acac6 Add comments describing the test_delete_balancing 2025-03-26 12:29:19 +05:30
Daniel Boll
4ea3faf0f0 Remove unnecessary TODO comment in wal.rs 2025-03-25 21:46:17 -03:00
Daniel Boll
6d42d6d485 Remove commented-out code and update min_frame assignment 2025-03-25 21:44:18 -03:00
Daniel Boll
5fc9ccdc8c Update checkpoint result initialization and WAL frame handling
- Use `CheckpointResult::default()` instead of `CheckpointResult::new()`
- Correct WAL frame header salt and checksum handling
- Ensure frame ID is 1-based and adjust frame offset calculation
- Add `Default` implementation for `CheckpointResult`
- Use random values for WAL header salts
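
As a sketch of the 1-based frame offset calculation, assuming the standard SQLite WAL layout (a 32-byte WAL header followed by frames, each with a 24-byte frame header plus one page). The constant and function names are made up for the example.

```rust
/// Size of the WAL file header in bytes (SQLite WAL format).
pub const WAL_HEADER_SIZE: u64 = 32;
/// Size of each frame header in bytes (SQLite WAL format).
pub const WAL_FRAME_HEADER_SIZE: u64 = 24;

/// Byte offset of the frame header for a 1-based `frame_id`.
pub fn frame_offset(frame_id: u64, page_size: u64) -> u64 {
    assert!(frame_id >= 1, "frame IDs are 1-based");
    WAL_HEADER_SIZE + (frame_id - 1) * (WAL_FRAME_HEADER_SIZE + page_size)
}
```

With 4096-byte pages, frame 1 starts right after the header at offset 32 and frame 2 at offset 4152.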
2025-03-25 21:38:12 -03:00
Pere Diaz Bou
8642d416c7 Introduce immutable record.
Currently we have a Record, which is a dumb vector of cloned values.
This is incredibly bad for performance as we do not want to clone
objects unless needed. Therefore, let's start by introducing this type
so that any record that has already been serialized will be returned
from btree in the format of a simple payload with reference to payload.
2025-03-25 17:35:41 +01:00
Pekka Enberg
79620946c1 Merge 'JSON cache' from Ihor Andrianov
SQLite uses a similar approach for operations where up to 4 JSON objects
are accessed multiple times in a single query.
`SELECT json_extract(a, 'some_path'), json_remove(a, 'some_path'),
json_set(a, 'some_path', 'some_value') from t;`
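
The idea can be sketched as a tiny fixed-capacity cache keyed by the JSON text. Everything here is illustrative: the `JsonCache` type, the FIFO eviction, and the generic parsed type `P` are assumptions, not the cache added in this PR.

```rust
/// Number of parsed documents kept, mirroring the "up to 4" above.
const CACHE_SIZE: usize = 4;

/// Hypothetical cache from JSON text to its parsed form `P`.
pub struct JsonCache<P> {
    entries: Vec<(String, P)>,
}

impl<P: Clone> JsonCache<P> {
    pub fn new() -> Self {
        Self {
            entries: Vec::with_capacity(CACHE_SIZE),
        }
    }

    /// Returns the cached parse for `text`, parsing and storing it on a miss.
    pub fn get_or_parse(&mut self, text: &str, parse: impl FnOnce(&str) -> P) -> P {
        if let Some((_, parsed)) = self.entries.iter().find(|(k, _)| k.as_str() == text) {
            return parsed.clone();
        }
        let parsed = parse(text);
        if self.entries.len() == CACHE_SIZE {
            // Evict the oldest entry (FIFO is an assumption of this sketch).
            self.entries.remove(0);
        }
        self.entries.push((text.to_string(), parsed.clone()));
        parsed
    }
}
```

In the query above, all three functions receive the same column `a`, so only the first call would pay for parsing.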

Closes #1163
2025-03-25 18:11:33 +02:00
Pekka Enberg
920b2efe31 Merge 'Bump rusqlite to 0.34' from Pere Diaz Bou
Closes #1175
2025-03-25 18:09:18 +02:00
Ihor Andrianov
7c1d827d33 clippy 2025-03-25 17:13:31 +02:00
Ihor Andrianov
8bfacf3955 add lazy and bucket cursor 2025-03-25 16:55:29 +02:00
Levy A.
dd10fb13a7 fix: unary + is a noop 2025-03-25 11:43:19 -03:00
Pere Diaz Bou
004dc374b2 bump rusqlite to 0.34 2025-03-25 14:17:31 +01:00
Pekka Enberg
28919038f1 Merge 'core: Rename FileStorage to DatabaseFile' from Pekka Enberg
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #1174
2025-03-25 11:35:00 +02:00
Pekka Enberg
df6af6ed79 core: Rename FileStorage to DatabaseFile 2025-03-25 11:15:16 +02:00
Pekka Enberg
1f29e1fe08 Add PyPI link to README 2025-03-25 09:49:44 +02:00
Pekka Enberg
b507ac401d Merge 'Fix a typo in README.md' from Tshepang Mbambo
Closes #1173
2025-03-25 09:31:05 +02:00
Pekka Enberg
93c0a29611 Merge 'Fix platform specific FFI C pointer type casts' from Preston Thorpe
Fixes #1159

Closes #1170
2025-03-25 09:10:16 +02:00
Pekka Enberg
731c3f037a Merge 'Improve Python bindings' from Diego Reis
Yet another PR to close #494.
While testing the code provided in the issue I noticed that it wasn't
closing the connection as it should, leading to lifetime issues like:
`Connection is unsendable, but is being dropped on another thread`. The
following code works fine:
```python
import limbo

def main():
    con = limbo.connect("test.db")
    cur = con.cursor()

    try:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS users (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                username TEXT NOT NULL,
                email TEXT NOT NULL,
                role TEXT NOT NULL,
                created_at DATETIME NOT NULL DEFAULT (datetime('now'))
            )
        """)

        # Insert some sample data
        sample_users = [
            ("alice", "alice@example.com", "admin"),
            ("bob", "bob@example.com", "user"),
            ("charlie", "charlie@example.com", "moderator"),
            ("diana", "diana@example.com", "user")
        ]

        for username, email, role in sample_users:
            cur.execute("""
                INSERT INTO users (username, email, role)
                VALUES (?, ?, ?)
            """, (username, email, role))

        con.commit()

        # Query the table
        res = cur.execute("SELECT * FROM users")
        record = res.fetchone()
        print(record)

    finally:
        # Ensure connection is closed on the same thread <----
        con.close()

main()
```
You can test it
[here](https://colab.research.google.com/drive/1NJau6Y9HTRJrnYK_xp2AzwP_qEH8VsQx?usp=sharing)
To address these issues, this PR:
- Adds support for the `with` statement, a common resource management
pattern in Python;
- Closes the connection if it is dropped

Closes #1164
2025-03-25 09:08:01 +02:00
Tshepang Mbambo
8b48c4b7b7 readme: typo 2025-03-25 09:06:40 +02:00
krishvishal
1660ae5542 missed adding _ and a space. 2025-03-25 12:04:48 +05:30
krishvishal
785be8479f Fix a fuzzer failure and add tcl test covering the failure 2025-03-25 11:43:51 +05:30
krishvishal
f12e3a6993 For a few TCL tests more. 2025-03-25 10:28:48 +05:30
krishvishal
a8129d5e58 Add TCL tests for compute_shl 2025-03-25 10:26:08 +05:30
krishvishal
b55dc586bd change compute_shl implementation to handle negation with overflow 2025-03-25 10:10:15 +05:30