Closes #2225.

## What

We currently do not use pages in the [freelist](https://www.sqlite.org/fileformat.html#the_freelist) at all when allocating new pages.

## Why is this bad

The effect of this is that 1. UPDATEs with overflow pages become really slow and 2. the database size grows really quickly. See #2225 for an extreme example comparison with SQLite.

## The fix

Whenever `allocate_page()` is called, we first check if we have pages in the freelist, and if we do, we recycle one of those pages instead of creating a new one. If there are no freelist pages, we allocate a new page as normal.

## Implementation notes

- `allocate_page()` now needs to return an `IOResult`, which means all of its callers also need to return an `IOResult`, necessitating quite a bit of new state machine logic to ensure re-entrancy.
- I left a few "synchronous IO hacks" in the `balance()` routine because the size of this PR would balloon even more than it already has if I were to fix those immediately in this PR.
- `fill_cell_payload()` uses some `unsafe` code to avoid lifetime issues, and adds an unfortunate double-indirection via `Arc<Mutex<Vec<T>>>` because the existing btree code constantly clones `WriteState`, and we must ensure the underlying buffers referenced by raw pointers in `fill_cell_payload` remain valid.

**Follow-up cleanups:**

1. remove synchronous IO hacks that would require even more state machines and are best left for another PR
2. 
remove `Clone` from `WriteState` and implement it better ## Perf comparison `main`: 33 seconds ``` jussi@Jussis-MacBook-Pro limbo % time target/release/tursodb --experimental-indexes apinatest_main.db <<'EOF' create table t(x, y, z unique); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE 
z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); EOF Turso v0.1.3-pre.3 Enter ".help" for usage hints. This software is ALPHA, only use for development, testing, and experimentation. target/release/tursodb --experimental-indexes apinatest_main.db <<<'' 6.81s user 21.18s system 83% cpu 33.643 total ``` PR: 13 seconds ``` jussi@Jussis-MacBook-Pro limbo % time target/release/tursodb --experimental-indexes apinatest_PR.db <<'EOF' create table t(x, y, z unique); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); insert into t select randomblob(1024*128),randomblob(1024*128),randomblob(1024*128) from generate_series(1, 100); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > 
randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); update t set x = x + 1 WHERE z > randomblob(1024*128); EOF Turso v0.1.3-pre.3 Enter ".help" for usage hints. This software is ALPHA, only use for development, testing, and experimentation. target/release/tursodb --experimental-indexes apinatest_PR.db <<<'' 3.89s user 7.83s system 89% cpu 13.162 total ``` (sqlite: 2 seconds 🤡 ) --- TODO: - [x] Fix whatever issue the simulator caught in CI (#2238 ) - [x] Post a performance comparison - [x] Fix autovacuum test failure - [x] Improve docs - [x] Fix `fill_cell_payload` re-entrancy issue when allocating overflow pages - [x] Add proper PR description Reviewed-by: Pere Diaz Bou <pere-altea@homail.com> Closes #2233
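For readers who want the idea from "The fix" in code form, here is a minimal sketch of freelist-first allocation. It is a toy model, not the pager in this PR: `ToyPager`, its fields, and `free_page` are hypothetical names, the freelist is a plain `Vec` rather than SQLite's trunk/leaf page structure, and `allocate_page` returns a page number directly instead of an `IOResult`.

```rust
// Toy model only: pages are plain numbers and nothing touches disk.
// `ToyPager` and `free_page` are hypothetical names used for illustration.
#[derive(Debug, Default)]
struct ToyPager {
    /// Number of pages in the "file"; page numbers are 1-based.
    page_count: u32,
    /// Pages released earlier and available for reuse.
    freelist: Vec<u32>,
}

impl ToyPager {
    /// Recycle a freelist page if one exists, otherwise grow the file by one page.
    fn allocate_page(&mut self) -> u32 {
        if let Some(page) = self.freelist.pop() {
            return page;
        }
        self.page_count += 1;
        self.page_count
    }

    /// Release a page so that a later allocation can reuse it.
    fn free_page(&mut self, page: u32) {
        self.freelist.push(page);
    }
}

fn main() {
    let mut pager = ToyPager::default();
    let a = pager.allocate_page(); // file grows to 1 page
    let b = pager.allocate_page(); // file grows to 2 pages
    pager.free_page(a);
    let c = pager.allocate_page(); // reuses page `a`; the file does not grow
    println!("a={a} b={b} c={c} page_count={}", pager.page_count);
}
```

In this model the file only grows when the freelist is empty, so a workload that keeps freeing and reallocating overflow pages, as the UPDATE benchmark above does, stops inflating `page_count`.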