6231 Commits

Author SHA1 Message Date
Glauber Costa
cbdd5c5fc7 improve handling of double quotes
I ended up hitting #1974 today and wanted to fix it. I worked with
Claude to generate a more comprehensive set of queries that could fail
aside from just the insert query described in the issue. He got most of
them right - lots of cases were indeed failing. The ones that were
gibberish, he told me I was absolutely right for pointing out they were
bad.

But alas. With the test cases generated, we can work on fixing it. At
the place where the assertion was hit, all we need to do is return true
(but we assert that this is indeed a string literal; it shouldn't be
anything else at this point).

There are then just a couple of places where we need to make sure we
handle double quotes correctly. We already tested for single quotes in a
couple of places, but never for double quotes.

There is one funny corner case where you can just select "col" from tbl,
and if there is no column "col" on the table, that is treated as a
string literal. We handle that too.
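The corner case above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: `Resolved` and `resolve_double_quoted` are hypothetical names, and escaped (doubled) quotes inside the token are not handled.

```rust
/// Outcome of resolving a double-quoted token (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Resolved {
    /// The token names an existing column; holds the column's position.
    Column(usize),
    /// No such column exists, so the token is treated as a string literal.
    StringLiteral(String),
}

/// Resolve a token such as `"col"`: prefer a column with that name and fall
/// back to a string literal when the table has no such column.
fn resolve_double_quoted(token: &str, table_columns: &[&str]) -> Resolved {
    // Strip the surrounding double quotes, if both are present.
    let name = token
        .strip_prefix('"')
        .and_then(|s| s.strip_suffix('"'))
        .unwrap_or(token);
    // Column names are compared case-insensitively (ASCII only in this sketch).
    match table_columns
        .iter()
        .position(|c| c.eq_ignore_ascii_case(name))
    {
        Some(idx) => Resolved::Column(idx),
        None => Resolved::StringLiteral(name.to_string()),
    }
}
```

With columns `["id", "col"]` the token `"col"` resolves to the column at position 1; with only `["id"]` it becomes the string literal `col`.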

Fixes #1974
2025-07-18 10:39:02 -05:00
Glauber Costa
523f8f9c67 add .dbconfig option
Currently ignored. The reason we are adding it is so that we have
an output that can fit in a single line. This is so we can use it in
tests, and have a predictable output pattern for both sqlite and turso.
2025-07-18 10:25:06 -05:00
Jussi Saurio
bac811caad Merge 'fix/btree: fix insert_into_cell() logic' from Jussi Saurio
## What was wrong
While running simulations for #1988, I ran into a post-balance
validation error where the correct divider cell could not be found from
the parent.
This was caused by divider cell insertion happening this way:
- First divider cell caused overflow
- Second technically had space to fit, so we didn't add it to overflow
cells
- During balance validation, we were not able to find the divider in the
expected slot.
## First fix attempt
I looked at SQLite source, and it seems SQLite always adds the cell to
overflow cells if there are existing overflow cells, and doesn't allow
normal insertion even if the cell payload would fit:
```c
if( pPage->nOverflow || sz+2>pPage->nFree ){
  ...add to overflow cells...
}
```
So, I changed our implementation to do the same, which fixed the balance
validation issue.
## The sequel
However, then I ran into another issue:
A cell inserted during balancing in the `edit_page()` stage was added to
overflow cells, which should not happen. The reason for this was the
changed logic in `insert_into_page()`, outlined above. Since the page
being balanced contained not-yet-cleared overflow cells, any insert to
it ended up being shoved into the overflow cells vector too.
It looks like - unlike us - SQLite doesn't use the equivalent of
`insert_into_cell()` in its implementation of `page_insert_array()`
which explains this.
## Second fix
For simplicity, I made a second version of `insert_into_cell()` called
`insert_into_cell_during_balance()` which allows regular cell insertion
despite existing overflow cells, since the existing overflow cells are
what caused the balance to happen in the first place and will be cleared
as soon as `edit_page()` is done.
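The difference between the two insertion paths can be sketched like this. It is a simplified model, not the real implementation: the `Page` struct, its fields, and `insert_regular` are hypothetical, and the assertion that the cell fits during balancing is an assumption of the sketch.

```rust
/// Simplified page model (hypothetical; the real page type is more involved).
struct Page {
    free_bytes: usize,
    cells: Vec<Vec<u8>>,
    overflow_cells: Vec<(usize, Vec<u8>)>,
}

/// Each cell also needs a 2-byte cell pointer, mirroring `sz+2>pPage->nFree`.
const CELL_POINTER_SIZE: usize = 2;

impl Page {
    /// Place the cell on the page itself and account for the space it uses.
    fn insert_regular(&mut self, idx: usize, payload: Vec<u8>) {
        self.free_bytes -= payload.len() + CELL_POINTER_SIZE;
        self.cells.insert(idx, payload);
    }

    /// Normal path: once any overflow cell exists, every further cell also
    /// goes to the overflow list, even if it would technically fit.
    fn insert_into_cell(&mut self, idx: usize, payload: Vec<u8>) {
        let needed = payload.len() + CELL_POINTER_SIZE;
        if !self.overflow_cells.is_empty() || needed > self.free_bytes {
            self.overflow_cells.push((idx, payload));
        } else {
            self.insert_regular(idx, payload);
        }
    }

    /// Balancing path: existing overflow cells are ignored because they are
    /// about to be cleared. Assumption of this sketch: the caller guarantees
    /// that the cell fits.
    fn insert_into_cell_during_balance(&mut self, idx: usize, payload: Vec<u8>) {
        let needed = payload.len() + CELL_POINTER_SIZE;
        assert!(needed <= self.free_bytes, "cell must fit during balancing");
        self.insert_regular(idx, payload);
    }
}
```

In this model, a small cell inserted after an oversized one lands in `overflow_cells` through `insert_into_cell()`, while the same cell inserted through `insert_into_cell_during_balance()` lands in `cells`.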

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2138
2025-07-17 19:06:40 +03:00
Jussi Saurio
49b9a69c40 fix/btree: fix insert_into_cell() logic
While running simulations for #1988, I ran into a post-balance validation
error where the correct divider cell could not be found from the parent.

This was caused by divider cell insertion happening this way:
- First divider cell caused overflow
- Second technically had space to fit, so we didn't add it to overflow cells

I looked at SQLite source, and it seems SQLite always adds the cell to overflow
cells if there are existing overflow cells:

```c
if( pPage->nOverflow || sz+2>pPage->nFree ){
  ...add to overflow cells...
}
```

So, I changed our implementation to do the same, which fixed the balance validation
issue.

However, then I ran into another issue:

A cell inserted during balancing in the `edit_page()` stage was added to overflow cells,
which should not happen. The reason for this was the changed logic in `insert_into_page()`,
outlined above.

It looks like SQLite doesn't use `insert_into_cell()` in its implementation of `page_insert_array()`
which explains this.

For simplicity, I made a second version of `insert_into_cell()` called `insert_into_cell_during_balance()`
which allows regular cell insertion despite existing overflow cells, since the existing overflow cells are
what caused the balance to happen in the first place and will be cleared as soon as `edit_page()` is done.
2025-07-17 18:26:14 +03:00
Jussi Saurio
cb163fc955 Merge 'cli: fix not being able to redirect traces to a file from inline query' from Jussi Saurio
doing e.g. `RUST_LOG=debug target/debug/tursodb foo.db 'SELECT * FROM
bar' &> output.txt` didn't generate traces because the tracer was
initialized after `app.first_run()`

Closes #2114
2025-07-17 13:56:49 +03:00
Jussi Saurio
f8b06d9862 Merge 'page cache: temporarily increase default size until WAL spill is implemented' from Jussi Saurio
This unblocks proper testing in simulator where esp. with indexes
enabled, by far the most common reason for sim failure is cache being
full.

Reviewed-by: Pekka Enberg <penberg@iki.fi>

Closes #2135
2025-07-17 13:11:00 +03:00
Jussi Saurio
01ad75ecd0 page cache: temporarily increase default size until WAL spill is implemented 2025-07-17 12:28:44 +03:00
Jussi Saurio
5a2efa3077 Merge 'refactor/btree&vdbe: fold index key info (sort order, collations) into a single struct' from Jussi Saurio
These are nearly always used together in some form, so it makes sense to
colocate them, and it also makes many code paths simpler, as we don't
separately pass `collations` and `key_sort_order` around.
As a side effect, as the bitfield-based `IndexKeySortOrder` is removed,
we now remove the arbitrary 64 column restriction for indexes, see e.g.
this sim failure which fails due to 64+ index columns (not sure why it
uses an index if they are disabled):
https://github.com/tursodatabase/turso/actions/runs/16339391964/job/46158045158
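A rough sketch of why a per-column struct lifts the limit. All names here (`SortOrder`, `Collation`, `KeyInfo`, `BitfieldSortOrder`, `key_info`) are hypothetical stand-ins and not the types from this PR. The `u64` bitfield is an assumption that matches the 64-column cap described above.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SortOrder {
    Asc,
    Desc,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Collation {
    Binary,
    NoCase,
    RTrim,
}

/// Sort order and collation of one index column, kept together.
#[derive(Clone, Copy, Debug, PartialEq)]
struct KeyInfo {
    sort_order: SortOrder,
    collation: Collation,
}

/// Bitfield style: bit `i` set means column `i` is descending.
/// A `u64` can only describe 64 columns.
struct BitfieldSortOrder(u64);

impl BitfieldSortOrder {
    fn from_orders(orders: &[SortOrder]) -> Option<Self> {
        if orders.len() > 64 {
            return None; // does not fit into the bitfield
        }
        let mut bits = 0u64;
        for (i, order) in orders.iter().enumerate() {
            if *order == SortOrder::Desc {
                bits |= 1u64 << i;
            }
        }
        Some(Self(bits))
    }

    fn is_desc(&self, column: usize) -> bool {
        column < 64 && (self.0 >> column) & 1 == 1
    }
}

/// Struct style: one entry per column, so there is no fixed upper bound.
fn key_info(orders: &[SortOrder], collations: &[Collation]) -> Vec<KeyInfo> {
    orders
        .iter()
        .zip(collations)
        .map(|(&sort_order, &collation)| KeyInfo { sort_order, collation })
        .collect()
}
```

For 65 columns, `BitfieldSortOrder::from_orders` returns `None` in this sketch, while `key_info` simply returns 65 entries.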

Closes #2131
2025-07-17 11:55:56 +03:00
Jussi Saurio
e8199cb26c btree/vdbe: fold index key info (sort order, collations) into a single struct
These are nearly always used together in some form, so it makes sense to colocate
them, and it also makes many code paths simpler.
2025-07-17 10:58:43 +03:00
Pekka Enberg
45c77f5e07 Merge 'bind/javascript: Fix presentation mode disabling logic' from Diego Reis
Presentation mode disabling logic was ~obviously~ wrong (my bad), it's
fixed now.

Closes #2125
2025-07-17 10:51:15 +03:00
Pekka Enberg
99cdcf5348 Merge 'core: Copy-on-write for in-memory schema' from Levy A.
<img height="400" alt="image" src="https://github.com/user-attachments/assets/bdd5c0a8-1bbb-4199-9026-57f0e5202d73" />
<img height="400" alt="image" src="https://github.com/user-attachments/assets/7ea63e58-2ab7-4132-b29e-b20597c7093f" />
We were copying the schema preemptively on each `Database::connect`. Now
the schema is shared through a single `Arc` until a change needs to be
made, at which point it is mutated via `Arc::make_mut`. This is faster
and reduces memory usage.
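A minimal sketch of the copy-on-write pattern with `Arc::make_mut`. The `Schema` and `Connection` types here are simplified stand-ins, not the real ones, and the locking added elsewhere in this series is left out.

```rust
use std::sync::Arc;

/// Stand-in for the in-memory schema (hypothetical, heavily simplified).
#[derive(Clone, Debug, Default)]
struct Schema {
    tables: Vec<String>,
}

/// Stand-in for a connection that shares the schema with other connections.
struct Connection {
    schema: Arc<Schema>,
}

impl Connection {
    /// Connecting only bumps the reference count; nothing is copied.
    fn connect(shared: &Arc<Schema>) -> Self {
        Self { schema: Arc::clone(shared) }
    }

    /// `Arc::make_mut` clones the schema only if another `Arc` still points
    /// at it; otherwise it mutates in place.
    fn add_table(&mut self, name: &str) {
        Arc::make_mut(&mut self.schema).tables.push(name.to_string());
    }
}
```

Two connections created from the same `Arc<Schema>` point at the same allocation until one of them calls `add_table`; only then does the writer get its own copy, and the other connection keeps seeing the old schema.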

Closes #2022
2025-07-17 10:46:46 +03:00
Pekka Enberg
2c23d8d9e3 Merge 'simulator: Disable INSERT INTO .. SELECT for being slow' from Pekka Enberg
Refs #2129

Closes #2130
2025-07-17 10:07:47 +03:00
Pekka Enberg
e8ac707190 simulator: Disable INSERT INTO .. SELECT for being slow
Refs #2129
2025-07-17 09:20:00 +03:00
Pekka Enberg
cae1c289b2 github: Reduce simulator iterations
...hopefully fixes simulator runs timing out problem.
2025-07-17 08:52:06 +03:00
Pekka Enberg
ae4dcbad0f Merge 'Async IO: registration of file descriptors' from Preston Thorpe
### Async IO performance, part 0
Relatively small and focused PR that mainly does two things. I will add
a .md document of the proposed/planned improvements to the io_uring
module to fully revamp our async IO.
1. **Registration of file descriptors.**
At startup, calling `io_uring_register_files_sparse` allocates an array
in shared kernel/user space and initializes each slot to `-1`. When we
open a file, we call `io_uring_register_files_update`, providing an
index into this array and the `fd`.
Then for the IO submission, we can reference the index into this array
instead of the fd, saving the kernel the work of looking up the fd in
the process file table, incrementing the reference count, doing the
operation, then finally decrementing the refcount. Instead the kernel
can just index into the array and do the operation.
This especially provides an improvement for cases like this, where files
are open for long periods of time and the kernel performs many
operations on them.
The eventual goal is to use Fixed read/write operations, where both the
file descriptor and the underlying buffer are registered with the
kernel. There is another branch continuing this work that introduces a
buffer pool, which memlocks one large 32MB mmap'd arena and tries to use
it wherever possible.
These Fixed operations are essentially the "holy grail" of io_uring
performance (for file operations).
2. **!Vectored IO**
This is kind of backwards, because the goal is to indeed implement
proper vectored IO and I'm removing some of the plumbing in this PR, but
currently we have been using `Writev`/`Readv`, while never submitting >
1 iovec at a time.
Writes to the WAL, especially, would benefit immensely from vectored IO,
as it is append-only and therefore all writes are contiguous. Regular
checkpointing/cache flushing to disk can also be adapted to aggregate
these writes and submit many in a single system call/opcode.
Until this is implemented, the bookkeeping and iovecs are unnecessary
noise/overhead, so let's temporarily remove them and revert to normal
`read`/`write` until they are needed and it can be designed from
scratch.
3. **Flags**
`setup_single_issuer` hints to the kernel that `io_uring_enter` calls
will all be made from a single thread, and `setup_coop_taskrun` removes
some unnecessary kernel interrupts for delivering CQEs, which most
single-threaded applications do not need. Both flags yield a modest
performance improvement.
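The slot bookkeeping behind registered file descriptors can be modelled in plain Rust. This is a toy model only and makes no io_uring calls. `RegisteredFiles`, `Target`, and `target_for` are hypothetical names, and taking the first free slot is an assumption of the sketch. The fallback to a plain fd mirrors the fallback commit in this series.

```rust
/// Toy model of a sparse registered-file table (no real io_uring calls).
struct RegisteredFiles {
    slots: Vec<i32>,
}

impl RegisteredFiles {
    /// Models the sparse registration step: every slot starts out as -1.
    fn new_sparse(n: usize) -> Self {
        Self { slots: vec![-1; n] }
    }

    /// Models the update step: store `fd` in the first empty slot and return
    /// its index, or `None` when the table is full.
    fn register(&mut self, fd: i32) -> Option<usize> {
        let idx = self.slots.iter().position(|&slot| slot == -1)?;
        self.slots[idx] = fd;
        Some(idx)
    }

    /// Free a slot again when the file is closed.
    fn unregister(&mut self, idx: usize) {
        if let Some(slot) = self.slots.get_mut(idx) {
            *slot = -1;
        }
    }
}

/// How a submission would name its target file.
#[derive(Debug, PartialEq)]
enum Target {
    /// Index into the registered table; no per-operation fd lookup needed.
    Fixed(usize),
    /// Plain file descriptor, used when registration is not possible.
    Fd(i32),
}

fn target_for(table: &mut RegisteredFiles, fd: i32) -> Target {
    match table.register(fd) {
        Some(idx) => Target::Fixed(idx),
        None => Target::Fd(fd),
    }
}
```

With a one-slot table, the first file gets `Target::Fixed(0)` and a second file falls back to `Target::Fd`. Once slot 0 is unregistered, it can be reused.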

Closes #2127
2025-07-17 08:47:44 +03:00
Pekka Enberg
d2158ff201 Merge 'Clean up AST unparsing, remove ToSqlString' from Levy A.
Enables formatting `Expr::Column` by adding the context to `ToTokens`
instead of creating a new unparsing implementation for each node.
`ToTokens` implemented for:
- [x] `UpdatePlan`
- [x] `Plan`
- [x] `JoinedTable`
- [x] `SelectPlan`
- [x] `DeletePlan`

Reviewed-by: Pedro Muniz (@pedrocarlo)

Closes #1949
2025-07-17 08:44:31 +03:00
PThorpe92
ad2ae3e22f Use fallback to regular fd if file registration is unavailable in io_uring 2025-07-16 23:08:46 -04:00
PThorpe92
fb78cdade0 Increase ring size from 128 -> 512 2025-07-16 22:44:20 -04:00
PThorpe92
4d09f1ab65 Enable coop_taskrun flag to disable excessive interrupts for completions 2025-07-16 22:43:44 -04:00
PThorpe92
95c343586c Enable single_issuer flag for io_uring to signal submissions from single thread 2025-07-16 22:42:40 -04:00
PThorpe92
9dfadf7872 Add registered file descriptors to io_uring IO module 2025-07-16 22:41:47 -04:00
Diego Reis
21882d1db3 bind/js: Fix presentation mode disabling logic 2025-07-16 15:07:12 -03:00
Pekka Enberg
b03b06107b Turso 0.1.3-pre.2 2025-07-16 20:08:46 +03:00
Pekka Enberg
c378f8a8bb Merge 'compat: add integrity_check' from Pere Diaz Bou
Reviewed-by: Avinash Sajjanshetty (@avinassh)
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2119
2025-07-16 20:08:32 +03:00
Pekka Enberg
e6c3a5a9b8 Merge 'rename operation_xxx to change_xxx to make naming more consistent' from Nikita Sivukhin
This PR renames CDC table column names to use "change"-centric
terminology and avoid using `operation_xxx` column names.
Just a small refactoring to bring more consistency, as `turso-db` refers
to the feature as capture data **changes** and the word "operation"
does not appear there.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2120
2025-07-16 20:08:23 +03:00
Pekka Enberg
af182d9895 Merge 'btree: fix post-balancing seek bug in delete path' from Jussi Saurio
Aftermath of seek-related refactor in #2065, which you can read for
background. The change in this PR is documented pretty well inline - if
we receive a `TryAdvance` seek result when seeking after balancing, we
need to - well - try to advance.
Closes #2116

Closes #2115
2025-07-16 20:08:15 +03:00
Levy A.
8e8f1682df add with_schema_mut
removes all repeated `Arc::make_mut`
2025-07-16 13:54:39 -03:00
Levy A.
d0e26db01a use lock for database schema 2025-07-16 13:54:39 -03:00
Levy A.
4c77d771ff only copy schema on writes 2025-07-16 13:54:36 -03:00
Jussi Saurio
bb0c017d9f Merge 'btree: fix trying to go upwards when we are already at the end of the entire btree' from Jussi Saurio
## What does this fix
This PR fixes an issue with BTree upwards traversal logic where we would
try to go up to a parent node in `next()` even though we are at the very
end of the btree. This behavior can leave the cursor incorrectly
positioned at an interior node when it should be at the right edge of
the rightmost leaf.
## Why doesn't it cause problems on main
This bug is masked on `main` by every table `insert()` (wastefully)
calling `find_cell()`:
- `op_new_rowid` called, let's say the current max rowid is `666`.
Cursor is left pointing at `666`.
- `insert()` is called with rowid `667`, cursor is currently pointing at
`666`, which is incorrect.
- `find_cell()` does a binary search every time, and hence somewhat
accidentally positions the cursor correctly _after_ `666` so that the
insert goes to the correct place
## Why was this issue found
In #1988, I am removing `find_cell()` entirely in favor of always
performing a seek to the correct location - and skipping `seek` when it
is not required, saving us from wasting a binary search on every insert
- but this change means that we need to call `next()` after
`op_new_rowid` to have the cursor positioned correctly at the new
insertion slot. Doing this surfaces this upwards traversal bug in that
PR branch.
## Details of solution
- Store `cell_count` together with `cell_idx` in pagestack, so that
children can know whether their parents have reached their end without
doing IO
- To make this foolproof, pin pages on `PageStack` so the page cache
cannot evict them during tree traversal
- `cell_indices` renamed to `node_states` since it now carries more
information (cell index AND count, instead of just index)
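The idea behind storing the count next to the index can be sketched as follows. This is not the actual `BTreeNodeState` from the PR: the `NodeState` struct and `at_end_of_tree` are hypothetical. The sketch assumes table-btree semantics, where an interior page with `cell_count` cells has `cell_count + 1` children and `cell_idx == cell_count` means the right-most child.

```rust
/// Per-level cursor state kept on the page stack (hypothetical layout).
#[derive(Clone, Copy, Debug)]
struct NodeState {
    /// Leaf: index of the current cell. Interior: index of the child we are in.
    cell_idx: usize,
    /// Number of cells on the page, cached so no page read is needed.
    cell_count: usize,
}

/// Returns true when the cursor sits on the last cell of the right-most leaf,
/// i.e. there is nothing left to visit and `next()` must not walk upwards.
/// `stack` is ordered root first, current leaf last.
fn at_end_of_tree(stack: &[NodeState]) -> bool {
    let Some((leaf, ancestors)) = stack.split_last() else {
        return true; // empty stack: nothing to traverse
    };
    // The leaf is exhausted when no cell follows the current one.
    let leaf_done = leaf.cell_idx + 1 >= leaf.cell_count;
    // Every ancestor must already be in its right-most child.
    leaf_done && ancestors.iter().all(|n| n.cell_idx >= n.cell_count)
}
```

Because every level carries its own `cell_count`, the check only inspects the stack and never has to load a parent page.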

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2005
2025-07-16 19:44:21 +03:00
Jussi Saurio
43f0ab39dc Merge 'Separate user-callable cacheflush from internal cacheflush logic' from Diego Reis
Cacheflush should only spill pages to WAL as non-commit frames, without
checkpointing nor syncing.
- [docs](https://sqlite.org/c3ref/db_cacheflush.html)
- [sqlite3PagerFlush](https://github.com/sqlite/sqlite/blob/625d0b70febecb0864a81b2a047a961a59e8c17e/src/pager.c#L4669)

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2044
2025-07-16 19:44:12 +03:00
Nikita Sivukhin
97b82fe6d8 rename operation_xxx to change_xxx to make naming more consistent 2025-07-16 20:16:24 +04:00
Levy A.
89911ee8d1 remove to_sql_string from simulator 2025-07-16 12:34:10 -03:00
Pere Diaz Bou
d559bf3d9f compat: add integrity_check 2025-07-16 17:19:51 +02:00
Levy A.
714225b9f0 remove ToSqlString trait 2025-07-16 12:16:34 -03:00
Levy A.
6fe2505425 add more ToTokens impls 2025-07-16 12:16:31 -03:00
Levy A.
e81c7b07fb modify tests for new formatter 2025-07-16 12:16:31 -03:00
Levy A.
373a4a26c4 fix: comma function 2025-07-16 12:16:28 -03:00
Levy A.
765b90aeb9 feat: implement ToTokens for UpdatePlan 2025-07-16 12:16:23 -03:00
Levy A.
9ff9c3fdc2 feat: add context to ToTokens 2025-07-16 12:12:15 -03:00
Diego Reis
b86674adbb Remove cache clearing in cacheflush 2025-07-16 11:11:52 -03:00
Jussi Saurio
8558675c4c page cache: pin pages on the stack 2025-07-16 17:09:05 +03:00
Diego Reis
5dd571483f Add cacheflush to Rust binding 2025-07-16 11:08:52 -03:00
Diego Reis
817ad8d50f Separate user-callable cacheflush from internal cacheflush logic
Cacheflush should only spill pages to WAL as non-commit frames, without checkpointing nor syncing. Check SQLite's sqlite3PagerFlush
2025-07-16 11:08:50 -03:00
Jussi Saurio
f7b9265c26 btree: fix trying to go upwards when at end of btree 2025-07-16 16:58:42 +03:00
Jussi Saurio
e0d797aac0 btree: use node_states instead of cell_indices (tracks cell count too) 2025-07-16 16:58:41 +03:00
Jussi Saurio
f0145fef5c btree: create BTreeNodeState struct for tracking cell idx and count 2025-07-16 16:58:11 +03:00
Jussi Saurio
ac065a79bb btree: fix post-balancing seek bug in delete path 2025-07-16 14:23:46 +03:00
Jussi Saurio
2fbb21fc17 cli: fix not being able to redirect traces to a file from inline query 2025-07-16 13:55:48 +03:00
Pekka Enberg
93634d56ba Turso 0.1.3-pre.1 2025-07-16 13:16:57 +03:00