Commit Graph

1190 Commits

Author SHA1 Message Date
TcMits
b0f4dd49d5 use match_ignore_ascii_case macro 2025-09-03 12:01:52 +07:00
Pekka Enberg
1de647758f Merge 'refactor parser fmt' from Lâm Hoàng Phúc
@penberg this PR tries to clean up the `fmt` code in `turso_parser`.
- `get_table_name` and `get_column_name` should return None when
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
    /// Given an id, get the table name
    /// The Option indicates whether the table exists
    ///
    /// Currently not considering aliases
    fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
        None
    }

    /// Given a table id and a column index, get the column name
    /// First Option indicates whether the column exists
    /// Second Option indicates whether the column has a name
    fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
        None
    }

    // Helper that falls back to generated names for missing table/column names
    fn get_table_and_column_names(
        &self,
        table_id: TableInternalId,
        col_idx: usize,
    ) -> (String, String) {
        let table_name = self
            .get_table_name(table_id)
            .map(|s| s.to_owned())
            .unwrap_or_else(|| format!("t{}", table_id.0));

        let column_name = self
            .get_column_name(table_id, col_idx)
            .map(|opt| {
                opt.map(|s| s.to_owned())
                    .unwrap_or_else(|| format!("c{col_idx}"))
            })
            .unwrap_or_else(|| format!("c{col_idx}"));

        (table_name, column_name)
    }
}
```
- remove `FmtTokenStream` because it is the same as `WriteTokenStream`
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
    /// Send token(s) to the specified stream with context
    fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
        &self,
        s: &mut S,
        context: &C,
    ) -> Result<(), S::Error>;

    // Return displayer representation with context
    fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
    where
        Self: Sized,
    {
        SqlDisplayer::new(ctx, self)
    }
}
```
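As a rough illustration of the fallback naming in `get_table_and_column_names`, here is a self-contained sketch. `TableInternalId` is reduced to a plain wrapper, `EmptyContext` and `OneTable` are hypothetical contexts invented for the example, and the nested `map`/`unwrap_or_else` is collapsed with `Option::flatten`, which is a simplification rather than the code in the PR.

```rust
/// Simplified stand-in for the real table id type (assumption).
#[derive(Clone, Copy)]
pub struct TableInternalId(pub usize);

pub trait ToSqlContext {
    fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
        None
    }

    fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
        None
    }

    /// Falls back to `t<id>` / `c<idx>` when a name is missing.
    fn get_table_and_column_names(
        &self,
        table_id: TableInternalId,
        col_idx: usize,
    ) -> (String, String) {
        let table_name = self
            .get_table_name(table_id)
            .map(|s| s.to_owned())
            .unwrap_or_else(|| format!("t{}", table_id.0));

        // A missing column and an unnamed column get the same fallback,
        // so the two Option layers can be flattened.
        let column_name = self
            .get_column_name(table_id, col_idx)
            .flatten()
            .map(|s| s.to_owned())
            .unwrap_or_else(|| format!("c{col_idx}"));

        (table_name, column_name)
    }
}

/// Hypothetical context that knows no tables at all.
pub struct EmptyContext;
impl ToSqlContext for EmptyContext {}

/// Hypothetical context with one table `users` (id 1):
/// column 0 is named `id`, column 1 exists but has no name.
pub struct OneTable;
impl ToSqlContext for OneTable {
    fn get_table_name(&self, id: TableInternalId) -> Option<&str> {
        (id.0 == 1).then_some("users")
    }

    fn get_column_name(&self, table_id: TableInternalId, col_idx: usize) -> Option<Option<&str>> {
        if table_id.0 != 1 {
            return None;
        }
        match col_idx {
            0 => Some(Some("id")),
            1 => Some(None),
            _ => None,
        }
    }
}
```

With these contexts, an unknown table and column resolve to generated names such as `t7` and `c2`, while a known but unnamed column keeps the real table name and only the column falls back.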

Closes #2748
2025-09-02 18:35:43 +03:00
TcMits
bfff05faba merge main 2025-09-02 18:25:20 +07:00
TcMits
33a04fbaf7 resolve conflict 2025-09-02 17:30:10 +07:00
Pekka Enberg
87d3f74e6e Merge 'Evict page from cache if page is unlocked and unloaded' from Pedro Muniz
Because we can abort a read_page completion, a page can be in
the cache but be unloaded and unlocked. If we do not evict that
page from the page cache, a later lookup returns the unloaded page,
which triggers assertions. This is worsened by the fact that the page
cache is not per `Statement`, so you can abort a completion in one
Statement, and trigger some error in the next one if we don't evict the
page in these circumstances.
Also, to propagate IO errors we need to return the Error from
IOCompletions on step.
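
The eviction rule described above can be sketched roughly as follows. `Page`, `PageCache`, and their fields are hypothetical simplifications, not the actual types; the only point is that a page that is neither loaded nor locked is treated as a miss and dropped from the cache.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified page; the real page type tracks more state.
#[derive(Debug, Clone, PartialEq)]
pub struct Page {
    pub id: u64,
    pub loaded: bool,
    pub locked: bool,
}

/// Hypothetical cache keyed by page id and shared across statements.
#[derive(Default)]
pub struct PageCache {
    pages: HashMap<u64, Page>,
}

impl PageCache {
    pub fn insert(&mut self, page: Page) {
        self.pages.insert(page.id, page);
    }

    /// Look up a page, evicting it if an aborted read left it
    /// unloaded and unlocked. A locked, unloaded page is kept because
    /// a read may still be in flight for it.
    pub fn get(&mut self, id: u64) -> Option<&Page> {
        let stale = matches!(self.pages.get(&id), Some(p) if !p.loaded && !p.locked);
        if stale {
            self.pages.remove(&id);
            return None;
        }
        self.pages.get(&id)
    }

    pub fn contains(&self, id: u64) -> bool {
        self.pages.contains_key(&id)
    }
}
```

Because the cache is not per `Statement`, doing the check at lookup time means the next statement sees a plain cache miss instead of a half-initialized page.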

Closes #2785
2025-09-02 09:08:12 +03:00
Pekka Enberg
d959319b42 Merge 'Use u64 for file offsets in I/O and calculate such offsets in u64' from Preston Thorpe
Using `usize` to compute file offsets caps us at ~4 GiB on 32-bit
systems. For example, with 4 KiB pages we can only address up to 1048576
pages; attempting the next page overflows a 32-bit usize and can wrap
the write offset, corrupting data. Switching our I/O APIs and offset
math to u64 avoids this overflow on 32-bit targets.
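
A small sketch of the arithmetic, using `u32` to stand in for `usize` on a 32-bit target. The function names are made up for illustration, and zero-based page indexes are an assumption made to keep the example short.

```rust
/// Byte offset of a page computed in 32 bits, as `usize` math would be on a
/// 32-bit target. Returns `None` when the multiplication overflows.
pub fn page_offset_u32(page_idx: u32, page_size: u32) -> Option<u32> {
    page_idx.checked_mul(page_size)
}

/// What unchecked 32-bit math would effectively produce: the offset wraps.
pub fn page_offset_u32_wrapping(page_idx: u32, page_size: u32) -> u32 {
    page_idx.wrapping_mul(page_size)
}

/// The same computation carried out in `u64`.
pub fn page_offset_u64(page_idx: u64, page_size: u64) -> u64 {
    page_idx * page_size
}
```

With 4096-byte pages, index 1048575 is the last one whose offset fits in 32 bits. Index 1048576 needs offset 2^32, which wraps to 0 in 32-bit math, so a write meant for that page would land on the first page of the file.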

Closes #2791
2025-09-02 09:06:49 +03:00
Pekka Enberg
cfaba4ab10 Merge 'Implement libSQL's ALTER COLUMN extension' from Levy A.
Implement `ALTER COLUMN` as described here:
https://github.com/tursodatabase/libsql/blob/main/libsql-sqlite3/doc/libsql_extensions.md#altering-columns
- [x] Add `ALTER COLUMN` to parser
- [x] Implement `Insn::AlterColumn`
- [x] Add tests

Closes #2814
2025-09-02 09:06:03 +03:00
PThorpe92
e9b50b63fb Return sqlite_version() without being initialized 2025-09-01 13:36:41 -04:00
pedrocarlo
53cfae1db4 return Error from step if IO failed 2025-09-01 11:10:39 -03:00
TcMits
37f33dc45f add eq/contains/starts_with/ends_with_ignore_ascii_case 2025-08-31 16:18:42 +07:00
Levy A.
293865c2d6 feat+fix: add tests and restrict altering some constraints 2025-08-30 03:43:31 -03:00
Levy A.
ad639b2b23 fix: reintroduce rename
we don't store the parsed column to replace just the name, this will be
refactored later with a more general approach
2025-08-30 03:10:39 -03:00
Levy A.
5b378e3730 feat: add AlterColumn instruction
also refactor `RenameColumn` to reuse the logic from `AlterColumn`
2025-08-30 03:10:39 -03:00
themixednuts
eb93e4edc9 remove to_upper_case in favor of eq_ignore_ascii_case 2025-08-29 20:24:43 -05:00
themixednuts
6ffbdb4908 fix: column case sensitivity on strict table 2025-08-29 20:24:43 -05:00
Pekka Enberg
9fc5947fa6 core/vdbe: Micro-optimize "zero_or_null" opcode
It's a hot instruction for TPC-H, for example, so worth optimizing.
Reduces op_zero_or_null() from 5.6% to 2.4% in CPU flamegraph for TPC-H
Q1.
2025-08-29 14:38:50 +03:00
PThorpe92
0a56d23402 Use u64 for file offsets in IO and calculate such offsets in u64 2025-08-28 09:44:00 -04:00
Pekka Enberg
2ea4354afe Merge 'Improve integrity check' from Nikita Sivukhin
- check free list trunk and pages
- use shared hash map to check for duplicate references for pages
- properly check overflow pages
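
One way to picture the shared map for duplicate page references is the sketch below. `PageUse`, `DuplicateRef`, and `SeenPages` are hypothetical names; the actual integrity check may record different information per page.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical label for where a page reference was found.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageUse {
    BTree,
    Overflow,
    FreelistTrunk,
    FreelistLeaf,
}

/// Hypothetical error describing a page referenced from two places.
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateRef {
    pub page: u64,
    pub first: PageUse,
    pub second: PageUse,
}

/// One map shared by the b-tree, overflow, and free list walks, so a page
/// claimed by any two of them is detected.
#[derive(Default)]
pub struct SeenPages {
    seen: HashMap<u64, PageUse>,
}

impl SeenPages {
    /// Record a reference to `page`; fail if it was already referenced.
    pub fn visit(&mut self, page: u64, usage: PageUse) -> Result<(), DuplicateRef> {
        match self.seen.entry(page) {
            Entry::Vacant(slot) => {
                slot.insert(usage);
                Ok(())
            }
            Entry::Occupied(slot) => Err(DuplicateRef {
                page,
                first: *slot.get(),
                second: usage,
            }),
        }
    }
}
```

Sharing a single map is what lets the check catch cross-structure corruption, for example an overflow page that also appears on the free list.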

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2816
2025-08-28 16:06:15 +03:00
Pere Diaz Bou
48e5ad7a55 core/schema: get_dependent_materialized_views_unnormalized
If we get a table name from an in-memory structure, it's safe to assume
it's already normalized.
2025-08-28 13:11:40 +02:00
Nikita Sivukhin
ae705445bf improve integrity check
- check free list trunk and pages
- use shared hash map to check for duplicate references for pages
- properly check overflow pages
2025-08-27 23:14:21 +04:00
TcMits
4ddfdb2a62 finish 2025-08-27 14:58:35 +07:00
Glauber Costa
097510216e implement the projector operator for DBSP
My goal with this patch is to be able to implement the ProjectOperator
for DBSP circuits using VDBE for expression evaluation.

*not* doing so is dangerous for the following reason: we will end up
with different, subtle, and incompatible behavior between SQLite
expressions if they are used in views versus outside of views.

In fact, even our prototype had them: our projection tests, which
used to pass, were actually wrong =) (sqlite would return something
different if those functions were executed outside the view context)

For optimization reasons, we single out trivial expressions: they don't
have to go through VDBE. Trivial expressions are expressions that only
involve Columns, Literals, and simple operators on elements of the same
type. Even type coercion takes this out of the realm of trivial.

Everything that is not trivial, is then translated with translate_expr -
in the same way SQLite will, and then compiled with VDBE.

We can, over time, make this process much better. There are essentially
infinite opportunities for optimization here. But for now, the main
warts are:
* VDBE execution needs a connection
* There is no good way in VDBE to pass parameters to a program.
* It is almost trivial to pollute the original connection. For example,
  we need to issue HALT for the program to stop, but seeing that halt
  will usually cause the program to try and halt the original program.

Subprograms, like the ones we use in triggers are a possible solution,
but they are much more expensive to execute, especially given that our
execution would essentially have to have a program with no other role
than to wrap the subprogram.

Therefore, what I am doing is:
* There is an in-memory database inside the projection operator (an
  obvious optimization is to share it with *all* projection operators).
* We obtain a connection to that database when the operator is created
* We use that connection to execute our VDBE, which offers a clean, safe
  and isolated way to execute the expression.
* We feed the values to the program manually by editing the registers
  directly.
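
The split between trivial and non-trivial expressions described in this message might look roughly like the sketch below. The `Ty` and `Expr` types are hypothetical and far smaller than a real SQL AST; the sketch only encodes the stated rule that columns, literals, and simple operators over operands of the same type are trivial, and that anything needing coercion is not.

```rust
/// Hypothetical value types, just enough to express "same type".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Integer,
    Real,
    Text,
}

/// Hypothetical, heavily reduced expression tree.
pub enum Expr {
    Column { idx: usize, ty: Ty },
    Literal(Ty),
    /// A simple binary operator such as `+`; the operator itself is omitted.
    Binary(Box<Expr>, Box<Expr>),
    /// Anything else, e.g. a function call, always goes through the VDBE path.
    FunctionCall(Vec<Expr>),
}

/// Returns `Some(ty)` when the expression is trivial and has type `ty`,
/// and `None` when it would have to be compiled and run on the VDBE.
pub fn trivial_type(e: &Expr) -> Option<Ty> {
    match e {
        Expr::Column { ty, .. } | Expr::Literal(ty) => Some(*ty),
        Expr::Binary(lhs, rhs) => {
            let (lt, rt) = (trivial_type(lhs)?, trivial_type(rhs)?);
            // Mixed operand types would require coercion, so not trivial.
            (lt == rt).then_some(lt)
        }
        Expr::FunctionCall(_) => None,
    }
}
```

Under this rule `a + 1` over an integer column is trivial, while `1 + 1.5` is not, because it would need an integer-to-real coercion.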
2025-08-25 17:48:17 +03:00
Nikita Sivukhin
f7ad55b680 remove unnecessary argument 2025-08-25 12:24:39 +04:00
Jussi Saurio
54ff656c9d Do not clear txn state inside nested statement
If a connection does e.g. CREATE TABLE, it will start a "child statement"
to reparse the schema. That statement does not start its own transaction,
and so should not try to end the existing one either.

We had a logic bug where these steps would happen:

- `CREATE TABLE` executed successfully
- pread fault happens inside `ParseSchema` child stmt
- `handle_program_error()` is called
- `pager.end_tx()` returns immediately because `is_nested_stmt` is true
  and we correctly no-op it.
- however, crucially: `handle_program_error()` then sets tx state to None
- parent statement now catches error from nested stmt and calls
  `handle_program_error()`, which calls `pager.end_tx()` again, and since
  txn state is None, when it calls `rollback()` we panic on the assertion
 `"dirty pages should be empty for read txn"`

Solution:

Do not do _any_ error processing in `handle_program_error()` inside a nested
stmt. This means that the parent write txn is still active when it processes
the error from the child and we avoid this panic.
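
A minimal sketch of that solution, with hypothetical `TxState` and `Conn` types standing in for the real connection and pager state. The `rollbacks` counter is only there to make the effect observable.

```rust
/// Hypothetical transaction state of a connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TxState {
    None,
    Read,
    Write,
}

/// Hypothetical connection; `rollbacks` counts how often we rolled back.
pub struct Conn {
    pub tx_state: TxState,
    pub rollbacks: u32,
}

/// Error handling as described above: a nested statement does nothing and
/// leaves all cleanup to its parent, which still sees the write transaction.
pub fn handle_program_error(conn: &mut Conn, is_nested_stmt: bool) {
    if is_nested_stmt {
        return;
    }
    if conn.tx_state == TxState::Write {
        // Stand-in for rolling back the dirty pages of the write txn.
        conn.rollbacks += 1;
    }
    conn.tx_state = TxState::None;
}
```

The child call leaves `tx_state` as `Write`, so when the parent handles the propagated error it rolls back as a write transaction rather than hitting the read-transaction assertion.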
2025-08-25 08:49:22 +03:00
Pekka Enberg
9d2f26bb04 sqlite3: Implement sqlite3_clear_bindings() 2025-08-24 19:33:18 +03:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
Levy A.
186e2f5d8e switch to new parser 2025-08-21 15:19:16 -03:00
Jussi Saurio
dd2e0ea596 Fix: always emit rowid when column is rowid alias
SQLite does not store the rowid alias column in the record at all
when it is a rowid alias, because the rowid is always stored anyway
in the record header.
2025-08-21 16:40:10 +03:00
Pekka Enberg
1dc6fb97c0 Merge 'core/mvcc: store txid in conn and reset transaction state on commit' from Pere Diaz Bou
We were storing `txid` in `ProgramState`, this meant it was impossible
to track interactive transactions. This was extracted to `Connection`
instead.
Moreover, transaction state for mvcc now is reset on commit.

Closes #2689
2025-08-20 16:51:41 +03:00
Pere Diaz Bou
9e3b7b0c98 core/mvcc: store txid in conn and reset transaction state on commit 2025-08-20 12:23:28 +02:00
Jussi Saurio
e5f04ae100 Merge 'refactor/vdbe: move insert-related seeking to VDBE from BTreeCursor' from Jussi Saurio
This gets rid of `InsertState` in `BTreeCursor` plus the `moved_before`
parameter to `BTreeCursor::insert` -- instead, seek logic is now in the
existing state machines for `op_insert` and `op_idx_insert`

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2639
2025-08-20 11:15:09 +03:00
pedrocarlo
7e98a464a7 check if completion finished instead of completed for step 2025-08-20 00:38:16 -03:00
Jussi Saurio
c2855cb0db refactor/idx_insert: move seeking to VDBE instead of BTreeCursor
Also removes `InsertState` and `moved_before` since neither is
needed anymore.
2025-08-19 19:04:42 +03:00
Jussi Saurio
d191c7d98b refactor/insert: move seeking to VDBE instead of BTreeCursor 2025-08-19 19:04:20 +03:00
pedrocarlo
de1811dea7 abort completions on error 2025-08-19 10:48:21 -03:00
pedrocarlo
ab3b68e360 change completion callbacks to take a Result param + create separate functions to declare a completion errored 2025-08-19 10:48:21 -03:00
pedrocarlo
d0c13f0104 remove IOError from Parser + store only ErrorKind in LimboError 2025-08-19 10:48:21 -03:00
Jussi Saurio
7f1eac9560 Do not start or end transaction in nested statement 2025-08-19 13:03:14 +03:00
Preston Thorpe
82fe508609 Merge 'add metrics and implement the .stats command' from Glauber Costa
This adds basic statement and connection metrics like SQLite (and
libSQL) have.
This is particularly useful to show that materialized views are working:
turso> create table t(a);
turso> insert into t(a) values (1) , (2), (3), (4), (5), (6), (7), (8), (9), (10);
turso> create materialized view v as select count(*) from t;
turso> .stats on
Stats display enabled.
turso> select count(*) from t;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘
Statement Metrics:
  Row Operations:
    Rows read:        10
    Rows written:     0
    [ ... other metrics ... ]
turso> select * from v;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘
Statement Metrics:
  Row Operations:
    Rows read:        1
    Rows written:     0
    [ ... other metrics ... ]

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2651
2025-08-18 20:26:48 -04:00
Glauber Costa
36fc8e8fdb add metrics and implement the .stats command
This adds basic statement and connection metrics like SQLite (and
libSQL) have.

This is particularly useful to show that materialized views are working:

turso> create table t(a);
turso> insert into t(a) values (1) , (2), (3), (4), (5), (6), (7), (8), (9), (10);
turso> create materialized view v as select count(*) from t;
turso> .stats on
Stats display enabled.
turso> select count(*) from t;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘

Statement Metrics:
  Row Operations:
    Rows read:        10
    Rows written:     0
    [ ... other metrics ... ]

turso> select * from v;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘

Statement Metrics:
  Row Operations:
    Rows read:        1
    Rows written:     0
    [ ... other metrics ... ]
2025-08-18 09:11:06 -05:00
Pere Diaz Bou
94cd504d4c core/mvcc: support schema_did_change on commit_txn
This not only changes schema_did_change on commit_txn for mvcc, but also
extracts the connection transaction state from non mvcc transactions to
mvcc too.
2025-08-18 15:52:10 +02:00
Jussi Saurio
c4f530d8f5 Merge 'unify halts' from Glauber Costa
We have halt and op_halt, doing essentially the same thing.
This PR unifies them. There is a minor difference between them now in
the way halt() handles auto-commit. My current understanding of the code
is that what we have in halt *is a bug*, which is already one bad
consequence of the duplication.

Closes #2631
2025-08-17 14:39:30 +03:00
Glauber Costa
270245b4d3 unify halts
We have halt and op_halt, doing essentially the same thing.

This PR unifies them. There is a minor difference between them now in
the way halt() handles auto-commit. My current understanding of the code
is that what we have in halt *is a bug*, which is already one bad
consequence of the duplication.
2025-08-16 16:52:53 -05:00
Preston Thorpe
7f8e181cda Merge 'Add documentation and rename functions' from Mikaël Francoeur
This PR adds documentation and renames some functions. Among other
things, I renamed everything that was still called `owned_value` to
either `db_value` or `value` (after
https://github.com/tursodatabase/turso/pull/1488).

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2619
2025-08-16 17:45:33 -04:00
PThorpe92
2c526c4c37 Add io_yield_x macros to reduce boilerplate 2025-08-16 16:14:00 -04:00
Mikaël Francoeur
2ee0132afe rename functions 2025-08-15 17:08:53 -04:00
Pekka Enberg
c1c2b45141 core/vdbe: Drop excessive logging 2025-08-15 14:56:06 +03:00
Jussi Saurio
c75e4c1092 Fix non-4096 page sizes by making WAL header lazy 2025-08-14 12:40:58 +03:00
Jussi Saurio
a2a6feb193 Merge 'Use BufferPool owned by Database instead of a static global' from Jussi Saurio
## Problem
There are several problems with our current statically allocated
`BufferPool`.
1. You cannot open two databases in the same process with different page
sizes, because the `BufferPool`'s `Arena`s will be locked forever into
the page size of the first database. This is the case regardless of
whether the two `Database`s are open at the same time, or if the first
is closed before the second is opened.
2. It is impossible to even write Rust tests for different page sizes
because of this, assuming the test uses a single process.
## Solution
Make `Database` own `BufferPool` instead of it being statically
allocated, so this problem goes away.
Note that I didn't touch the still statically-allocated
`TEMP_BUFFER_CACHE`, because it should continue to work regardless of
this change. It should only be a problem if the user has two or more
databases with different page sizes open simultaneously, because
`TEMP_BUFFER_CACHE` will only support one pool of a given page size at a
time, so the rest of the allocations will go through the global
allocator instead.
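
The ownership change can be pictured with the sketch below. `BufferPool` and `Database` here are hypothetical, stripped-down stand-ins (no arenas, no locking); the point is only that each database carries a pool sized for its own pages instead of sharing a process-wide static.

```rust
/// Hypothetical pool that hands out buffers of one fixed page size.
pub struct BufferPool {
    page_size: usize,
    free: Vec<Vec<u8>>,
}

impl BufferPool {
    pub fn new(page_size: usize) -> Self {
        Self { page_size, free: Vec::new() }
    }

    /// Reuse a returned buffer if one is available, else allocate.
    pub fn get(&mut self) -> Vec<u8> {
        self.free.pop().unwrap_or_else(|| vec![0; self.page_size])
    }

    /// Return a buffer to the pool; buffers of the wrong size are dropped.
    pub fn put(&mut self, buf: Vec<u8>) {
        if buf.len() == self.page_size {
            self.free.push(buf);
        }
    }
}

/// Hypothetical database handle that owns its pool, so the pool's page
/// size is fixed per database rather than per process.
pub struct Database {
    pub pool: BufferPool,
}

impl Database {
    pub fn open(page_size: usize) -> Self {
        Self { pool: BufferPool::new(page_size) }
    }
}
```

Two databases opened in the same process, one with 4096-byte pages and one with 8192-byte pages, each get buffers of their own size, which is also what makes single-process tests over several page sizes possible.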
## Notes
I extracted this change out from #2569, because I didn't want it to be
smuggled in without being reviewed as an individual piece.

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2596
2025-08-14 12:40:32 +03:00
Jussi Saurio
d7186c7d7b Merge 'Add support for unlikely(X)' from bit-aloo
Implements the unlikely(X) function. Removes runtime implementations of
likely(), unlikely() and likelihood(), replacing them with panics if
they reach the VDBE.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2559
2025-08-14 10:56:27 +03:00