Commit Graph

674 Commits

Author SHA1 Message Date
Pekka Enberg
5dcffadad6 core/vdbe: Remove empty loop
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-05 16:03:25 +03:00
Levy A.
a7b60e6b00 fix: return NULL for negative base or input on exec_math_log 2025-09-05 10:00:59 -03:00
Glauber Costa
08b2e685d5 Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized view.
It is hard to split because of all the interdependencies between components,
so it is a one big thing. This commit message will at least try to go into
details about the basic architecture.

Materialized Views as tables
============================

Materialized views are now a normal table - whereas before they were a virtual
table.  By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).

One of the advantages of doing this is that we can create indexes on view
columns.  Later, we should also be able to write those views to separate files
with ATTACH write.

Materialized Views as Zsets
===========================

The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet, is that obviously ours is backed by a BTree, not a Hash
(since SQLite tables are BTrees)

Aggregator State
================

In DBSP, the aggregator nodes also have state. To store that state, there is a
second table.  The table holds all aggregators in the view, and there is one
table per view. That is __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table. We can't rely
on a particular format for the values.

The Materialized View Cursor
============================

Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
2025-09-05 07:04:33 -05:00
Levy A.
73e901010c fix: float formating and float comparison 2025-09-05 02:35:03 -03:00
Pekka Enberg
ecbcd1ecd3 Merge ' core/mvcc: make commit_txn return on I/O ' from Pere Diaz Bou
`commit_txn` in MVCC was hacking its way through I/O until now. After
adding this and the test for concurrent writers we now see `busy` errors
returning as expected because there is no `commit` queueing happening
yet until next PR I open.

Closes #2895
2025-09-04 21:24:10 +03:00
Pekka Enberg
44357f93a2 Merge branch 'main' into 2025-08-21-make-limit-and-offset-expr 2025-09-04 09:54:45 +03:00
TcMits
b0f4dd49d5 use match_ignore_ascii_case macro 2025-09-03 12:01:52 +07:00
Pekka Enberg
1de647758f Merge 'refactor parser fmt' from Lâm Hoàng Phúc
@penberg this PR try to clean up `turso_parser`'s`fmt` code.
- `get_table_name` and `get_column_name` should return None when
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
    /// Given an id, get the table name
    /// First Option indicates whether the table exists
    ///
    /// Currently not considering aliases
    fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
        None
    }

    /// Given a table id and a column index, get the column name
    /// First Option indicates whether the column exists
    /// Second Option indicates whether the column has a name
    fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
        None
    }

    // help function to handle missing table/column names
    fn get_table_and_column_names(
        &self,
        table_id: TableInternalId,
        col_idx: usize,
    ) -> (String, String) {
        let table_name = self
            .get_table_name(table_id)
            .map(|s| s.to_owned())
            .unwrap_or_else(|| format!("t{}", table_id.0));

        let column_name = self
            .get_column_name(table_id, col_idx)
            .map(|opt| {
                opt.map(|s| s.to_owned())
                    .unwrap_or_else(|| format!("c{col_idx}"))
            })
            .unwrap_or_else(|| format!("c{col_idx}"));

        (table_name, column_name)
    }
}
```
- remove `FmtTokenStream` because it is same as `WriteTokenStream `
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
    /// Send token(s) to the specified stream with context
    fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
        &self,
        s: &mut S,
        context: &C,
    ) -> Result<(), S::Error>;

    // Return displayer representation with context
    fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
    where
        Self: Sized,
    {
        SqlDisplayer::new(ctx, self)
    }
}
```

Closes #2748
2025-09-02 18:35:43 +03:00
Pere Diaz Bou
13c505109a core/mvcc: make commit_txn return on I/O 2025-09-02 17:07:38 +02:00
TcMits
bfff05faba merge main 2025-09-02 18:25:20 +07:00
TcMits
33a04fbaf7 resolve conflict 2025-09-02 17:30:10 +07:00
Pekka Enberg
cfaba4ab10 Merge 'Implement libSQL's ALTER COLUMN extension' from Levy A.
Implement `ALTER COLUMN` as described here:
https://github.com/tursodatabase/libsql/blob/main/libsql-
sqlite3/doc/libsql_extensions.md#altering-columns
- [x] Add `ALTER COLUMN` to parser
- [x] Implement `Insn::AlterColumn`
- [x] Add tests

Closes #2814
2025-09-02 09:06:03 +03:00
PThorpe92
e9b50b63fb Return sqlite_version() without being initialized 2025-09-01 13:36:41 -04:00
TcMits
37f33dc45f add eq/contains/starts_with/ends_with_ignore_ascii_case 2025-08-31 16:18:42 +07:00
Levy A.
293865c2d6 feat+fix: add tests and restrict altering some constraints 2025-08-30 03:43:31 -03:00
Levy A.
ad639b2b23 fix: reintroduce rename
we don't store the parsed column to replace just the name, this will be
refactored later with a more general approach
2025-08-30 03:10:39 -03:00
Levy A.
5b378e3730 feat: add AlterColumn instruction
also refactor `RenameColumn` to reuse the logic from `AlterColumn`
2025-08-30 03:10:39 -03:00
themixednuts
eb93e4edc9 remove to_upper_case in favor of eq_ignore_ascii_case 2025-08-29 20:24:43 -05:00
themixednuts
6ffbdb4908 fix: column case sensitivity on strict table 2025-08-29 20:24:43 -05:00
Pekka Enberg
9fc5947fa6 core/vdbe: Micro-optimize "zero_or_null" opcode
It's a hot instruction for TPC-H, for example, so worth optimizing.
Reduces op_zero_or_null() from 5.6% to 2.4% in CPU flamegraph for TCP-H
Q1.
2025-08-29 14:38:50 +03:00
Pekka Enberg
2ea4354afe Merge 'Improve integrity check' from Nikita Sivukhin
- check free list trunk and pages
- use shared hash map to check for duplicate references for pages
- properly check overflow pages

Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>

Closes #2816
2025-08-28 16:06:15 +03:00
Pere Diaz Bou
48e5ad7a55 core/schema: get_dependent_materialized_views_unnormalized
If we get a table name for in memory structure, it's safe to assume it's
already normalized.
2025-08-28 13:11:40 +02:00
Nikita Sivukhin
ae705445bf improve integrity check
- check free list trunk and pages
- use shared hash map to check for duplicate references for pages
- properly check overflow pages
2025-08-27 23:14:21 +04:00
TcMits
4ddfdb2a62 finish 2025-08-27 14:58:35 +07:00
bit-aloo
ea3ab2a9c7 add ifNeg op_code 2025-08-26 19:55:42 +05:30
Nikita Sivukhin
f7ad55b680 remove unnecessary argument 2025-08-25 12:24:39 +04:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
Levy A.
186e2f5d8e switch to new parser 2025-08-21 15:19:16 -03:00
Pekka Enberg
1dc6fb97c0 Merge 'core/mvcc: store txid in conn and reset transaction state on commit ' from Pere Diaz Bou
We were storing `txid` in `ProgramState`, this meant it was impossible
to track interactive transactions. This was extracted to `Connection`
instead.
Moreover, transaction state for mvcc now is reset on commit.

Closes #2689
2025-08-20 16:51:41 +03:00
Pere Diaz Bou
9e3b7b0c98 core/mvcc: store txid in conn and reset transaction state on commit 2025-08-20 12:23:28 +02:00
Jussi Saurio
c2855cb0db refactor/idx_insert: move seeking to VDBE instead of BTreeCursor
Also removes `InsertState` and `moved_before` since neither are
needed anymore.
2025-08-19 19:04:42 +03:00
Jussi Saurio
d191c7d98b refactor/insert: move seeking to VDBE instead of BTreeCursor 2025-08-19 19:04:20 +03:00
Jussi Saurio
7f1eac9560 Do not start or end transaction in nested statement 2025-08-19 13:03:14 +03:00
Preston Thorpe
82fe508609 Merge 'add metrics and implement the .stats command' from Glauber Costa
This adds basic statement and connection metrics like SQLite (and
libSQL) have.
This is particularly useful to show that materialized views are working:
turso> create table t(a);
turso> insert into t(a) values (1) , (2), (3), (4), (5), (6), (7), (8),
(9), (10); turso> create materialized view v as select count(*) from t;
turso> .stats on
Stats display enabled.
turso> select count(*) from t;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘
Statement Metrics:
  Row Operations:
    Rows read:        10
    Rows written:     0
    [ ... other metrics ... ]
turso> select * from v;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘
Statement Metrics:
  Row Operations:
    Rows read:        1
    Rows written:     0
    [ ... other metrics ... ]

Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2651
2025-08-18 20:26:48 -04:00
Glauber Costa
36fc8e8fdb add metrics and implement the .stats command
This adds basic statement and connection metrics like SQLite (and
libSQL) have.

This is particularly useful to show that materialized views are working:

turso> create table t(a);
turso> insert into t(a) values (1) , (2), (3), (4), (5), (6), (7), (8), (9), (10);
turso> create materialized view v as select count(*) from t;
turso> .stats on
Stats display enabled.
turso> select count(*) from t;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘

Statement Metrics:
  Row Operations:
    Rows read:        10
    Rows written:     0
    [ ... other metrics ... ]

turso> select * from v;
┌───────────┐
│ count (*) │
├───────────┤
│        10 │
└───────────┘

Statement Metrics:
  Row Operations:
    Rows read:        1
    Rows written:     0
    [ ... other metrics ... ]
2025-08-18 09:11:06 -05:00
Pere Diaz Bou
94cd504d4c core/mvcc: support schema_did change on commit_txn
This not only changes schema_did_change on commit_txn for mvcc, but also
extracts the connection transaction state from non mvcc transactions to
mvcc too.
2025-08-18 15:52:10 +02:00
Jussi Saurio
c4f530d8f5 Merge 'unify halts' from Glauber Costa
We have halt and op_halt, doing essentially the same thing.
This PR unifies them. There is a minor difference between them now in
the way halt() handles auto-commit. My current understanding of the code
is that what we have in halt *is a bug*, which is already one bad
consequence of the duplication.

Closes #2631
2025-08-17 14:39:30 +03:00
Glauber Costa
270245b4d3 unify halts
We have halt and op_halt, doing essentially the same thing.

This PR unifies them. There is a minor difference between them now in
the way halt() handles auto-commit. My current understanding of the code
is that what we have in halt *is a bug*, which is already one bad
consequence of the duplication.
2025-08-16 16:52:53 -05:00
Mikaël Francoeur
2ee0132afe rename functions 2025-08-15 17:08:53 -04:00
Pekka Enberg
c1c2b45141 core/vdbe: Drop excessive logging 2025-08-15 14:56:06 +03:00
Jussi Saurio
c75e4c1092 Fix non-4096 page sizes by making WAL header lazy 2025-08-14 12:40:58 +03:00
Jussi Saurio
a2a6feb193 Merge 'Use BufferPool owned by Database instead of a static global' from Jussi Saurio
## Problem
There are several problems with our current statically allocated
`BufferPool`.
1. You cannot open two databases in the same process with different page
sizes, because the `BufferPool`'s `Arena`s will be locked forever into
the page size of the first database. This is the case regardless of
whether the two `Database`s are open at the same time, or if the first
is closed before the second is opened.
2. It is impossible to even write Rust tests for different page sizes
because of this, assuming the test uses a single process.
## Solution
Make `Database` own `BufferPool` instead of it being statically
allocated, so this problem goes away.
Note that I didn't touch the still statically-allocated
`TEMP_BUFFER_CACHE`, because it should continue to work regardless of
this change. It should only be a problem if the user has two or more
databases with different page sizes open simultaneously, because
`TEMP_BUFFER_CACHE` will only support one pool of a given page size at a
time, so the rest of the allocations will go through the global
allocator instead.
## Notes
I extracted this change out from #2569, because I didn't want it to be
smuggled in without being reviewed as an individual piece.

Reviewed-by: Avinash Sajjanshetty (@avinassh)

Closes #2596
2025-08-14 12:40:32 +03:00
Jussi Saurio
d7186c7d7b Merge 'Add support for unlikely(X)' from bit-aloo
Implements the unlikely(X) function. Removes runtime implementations of
likely(), unlikely() and likelihood(), replacing them with panics if
they reach the VDBE.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2559
2025-08-14 10:56:27 +03:00
Jussi Saurio
0b17957f4e Merge 'Implement normal views' from Glauber Costa
Now that we actually implemented the statement parsing around views,
implementing normal SQLite views is relatively trivial, as they are just
an alias to a query.
We'll implement them now to get them out of the way, and then I'll go
back to DBSP

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>

Closes #2591
2025-08-14 10:54:46 +03:00
Jussi Saurio
359cba0474 Use BufferPool owned by Database instead of a static global
Problem

There are several problems with our current statically allocated
`BufferPool`.

1. You cannot open two databases in the same process with different
page sizes, because the `BufferPool`'s `Arena`s will be locked forever
into the page size of the first database. This is the case regardless
of whether the two `Database`s are open at the same time, or if the first
is closed before the second is opened.

2. It is impossible to even write Rust tests for different page sizes because
of this, assuming the test uses a single process.

Solution

Make `Database` own `BufferPool` instead of it being statically allocated, so this
problem goes away.

Note that I didn't touch the still statically-allocated `TEMP_BUFFER_CACHE`, because
it should continue to work regardless of this change. It should only be a problem if
the user has two or more databases with different page sizes open simultaneously, because
`TEMP_BUFFER_CACHE` will only support one pool of a given page size at a time, so the rest
of the allocations will go through the global allocator instead.

Notes

I extracted this change out from #2569, because I didn't want it to be smuggled in without
being reviewed as an individual piece.
2025-08-14 10:29:52 +03:00
Preston Thorpe
78abe72762 Merge 'fix: Handle fresh INSERTs in materialized view incremental maintenance' from Glauber Costa
The op_insert function was incorrectly trying to capture an "old record"
for fresh INSERT operations when a table had dependent materialized
views. This caused a "Cannot delete: no current row" error because the
cursor wasn't positioned on any row for new inserts.
The issue was introduced in commit f38333b3 which refactored the state
machine for incremental view handling but didn't properly distinguish
between:
- Fresh INSERT operations (no old record exists)
- UPDATE operations without rowid change (old record should be captured)
- UPDATE operations with rowid change (already handled by DELETE)
This fix checks if cursor.rowid() returns a value before attempting to
capture the old record. If no row exists (fresh INSERT), we correctly
set old_record to None instead of erroring out.
I am also including tests to make sure this doesn't break. The reason I
didn't include tests earlier is that I didn't know it was possible to
run the tests under a flag. But in here, I am just adding the flag to
the execution script.

Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>

Closes #2579
2025-08-13 21:36:19 -04:00
Glauber Costa
5ab6f78f6b Implement views
Views (non materialized) are relatively simple, since they are just
query aliases.

We can expand them as if they were subqueries.
2025-08-13 14:14:03 -05:00
Glauber Costa
337f27a433 rename some structures to mention materialized views
A lot of the structures we have - like the ones under Schema, are
specific for materialized views. In preparation to adding normal views,
rename them, so things are less confusing.
2025-08-13 14:13:16 -05:00
bit-aloo
96be4eb40c remove the exec_* test 2025-08-13 22:51:36 +05:30
bit-aloo
8e6df064df remove likelihood, likely and unlikely exec methods, as we dont need them 2025-08-13 22:51:22 +05:30