Adds the AggValue instruction, which computes the current aggregate
result and writes it to a dedicated destination register.
Unlike AggFinal, it does not overwrite or clear the accumulator
register. This makes it possible to retrieve aggregate results multiple
times—needed when processing window functions—while preserving the
accumulator state.
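As a rough illustration of the difference, here is a minimal sketch of the two instructions in an interpreter loop; the variant shapes and the finalize() helper are hypothetical stand-ins, not the actual Turso implementation.

```rust
#[derive(Clone)]
enum Value {
    Null,
    Int(i64),
}

// Placeholder for "compute the current aggregate result" for some aggregate
// function; shown only so the sketch is self-contained.
fn finalize(acc: &Value) -> Value {
    acc.clone()
}

enum Insn {
    // Finalizes the aggregate in place: the accumulator register is
    // overwritten with the result, so its state is gone afterwards.
    AggFinal { acc_reg: usize },
    // Computes the same result but writes it to a separate destination
    // register, leaving the accumulator intact so it can be read again
    // as a window frame advances.
    AggValue { acc_reg: usize, dest_reg: usize },
}

fn step(insn: &Insn, registers: &mut [Value]) {
    match insn {
        Insn::AggFinal { acc_reg } => {
            registers[*acc_reg] = finalize(&registers[*acc_reg]);
        }
        Insn::AggValue { acc_reg, dest_reg } => {
            registers[*dest_reg] = finalize(&registers[*acc_reg]);
        }
    }
}
```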
Currently, when MVCC is enabled, every transaction mode supports
concurrent reads and writes, which makes it hard to adopt for existing
applications that use `BEGIN DEFERRED` or `BEGIN IMMEDIATE`.
Therefore, add support for `BEGIN CONCURRENT` transactions when MVCC is
enabled. This transaction mode allows multiple concurrent read/write
transactions that don't block each other, with conflicts resolved at
commit time. Furthermore, implement the correct semantics for `BEGIN
DEFERRED` and `BEGIN IMMEDIATE` by taking advantage of the pager-level
write lock when a transaction upgrades to a write transaction. This
means that concurrent MVCC transactions are now serialized against the
legacy ones when needed.
The implementation includes:
- Parser support for CONCURRENT keyword in BEGIN statements
- New Concurrent variant in TransactionMode to distinguish from regular
read/write transactions
- MVCC store tracking of exclusive transactions to support IMMEDIATE and
EXCLUSIVE modes alongside CONCURRENT
- Proper transaction state management for all transaction types in MVCC
This enables better concurrency for applications that can handle
optimistic concurrency control, while still supporting traditional
SQLite transaction semantics via IMMEDIATE and EXCLUSIVE modes.
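For illustration, here is a sketch of what that mode distinction can look like; only the Concurrent variant is taken from the list above, while the other variant names and the helper are assumptions.

```rust
// Sketch only; the real TransactionMode in the codebase may differ in shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransactionMode {
    Deferred,   // upgrades to a write transaction lazily
    Immediate,  // takes the pager-level write lock at BEGIN
    Exclusive,  // tracked by the MVCC store as an exclusive transaction
    Concurrent, // optimistic read/write, conflicts detected at commit time
}

// Only CONCURRENT transactions run without serializing on the pager-level
// write lock; the legacy modes keep their traditional blocking behavior.
fn serializes_writers(mode: TransactionMode) -> bool {
    !matches!(mode, TransactionMode::Concurrent)
}
```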
Reviewed-by: Pere Diaz Bou <pere-altea@homail.com>
Closes #3021
The MakeRecord instruction now accepts an optional affinity_str
parameter that applies column-specific type conversions before creating
records. When provided, the affinity string is applied
character-by-character to each register using the existing
apply_affinity_char() function, matching SQLite's behavior.
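A minimal sketch of that character-by-character application, assuming a register slice and an apply_affinity_char-style helper; the stub below only stands in for the existing function so the sketch is self-contained, and the exact signatures may differ.

```rust
enum Value {
    Int(i64),
    Text(String),
}

// Stand-in for the existing apply_affinity_char(); 'i' is an arbitrary
// placeholder code here, not the actual affinity encoding.
fn apply_affinity_char(value: &mut Value, affinity: char) {
    if affinity == 'i' {
        if let Value::Text(s) = value {
            if let Ok(n) = s.parse::<i64>() {
                *value = Value::Int(n);
            }
        }
    }
}

// Apply one affinity character per register before the record is built.
// When no affinity string is supplied, the registers pass through untouched.
fn apply_affinities(registers: &mut [Value], affinity_str: Option<&str>) {
    if let Some(affinities) = affinity_str {
        for (reg, affinity) in registers.iter_mut().zip(affinities.chars()) {
            apply_affinity_char(reg, affinity);
        }
    }
}
```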
Fixes #2040
Fixes #2041
This fairly long commit implements persistence for materialized views.
It is hard to split because of all the interdependencies between
components, so it lands as one big change. This commit message will at
least try to go into detail about the basic architecture.
Materialized Views as Tables
============================
Materialized views are now normal tables, whereas before they were virtual
tables. By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc.).
One of the advantages of doing this is that we can create indexes on view
columns. Later, we should also be able to write those views to separate files
with ATTACH write.
Materialized Views as Zsets
===========================
The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that, because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet
and the standard DBSP ZSet is that ours is, obviously, backed by a BTree
rather than a hash table (since SQLite tables are BTrees).
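Concretely, one row of the materialized ZSet can be pictured roughly like this (an illustrative shape, not the exact on-disk encoding):

```rust
// The rowid keys the backing BTree; the weight records how many times the
// row logically appears (negative weights cancel insertions, as in DBSP).
struct ZSetRow<V> {
    rowid: i64,
    values: Vec<V>,
    weight: i64,
}
```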
Aggregator State
================
In DBSP, the aggregator nodes also have state. To store that state, there is
a second table, one per view, which holds all of the view's aggregators. That
table is named __turso_internal_dbsp_state_{view_name}. Its format is similar
to a ZSet: rowid, serialized_values, weight. We serialize the values because
there will be many aggregators in the table and we can't rely on a particular
format for them.
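A small sketch of the naming and row shape described above; the serialization format is opaque at this level, hence the plain byte blob.

```rust
// One aggregator-state table per view, named after the view.
fn dbsp_state_table_name(view_name: &str) -> String {
    format!("__turso_internal_dbsp_state_{view_name}")
}

// Row shape, similar to a ZSet row but with the values serialized, since
// the table holds many aggregators and no single value layout can be assumed.
struct DbspStateRow {
    rowid: i64,
    serialized_values: Vec<u8>,
    weight: i64,
}
```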
The Materialized View Cursor
============================
Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
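A simplified sketch of the merge the cursor effectively performs, keyed by rowid with weights summed; the real seek implementation streams both sides instead of materializing maps, so this only illustrates the semantics.

```rust
use std::collections::BTreeMap;

// Combine the persisted ZSet weights with the transaction's ephemeral
// deltas. Rows whose combined weight is zero are invisible to the reader.
fn merge_visible(
    persisted: &BTreeMap<i64, i64>, // rowid -> weight from the BTree
    txn_delta: &BTreeMap<i64, i64>, // rowid -> weight from this transaction
) -> Vec<(i64, i64)> {
    let mut combined = persisted.clone();
    for (rowid, delta) in txn_delta {
        *combined.entry(*rowid).or_insert(0) += delta;
    }
    combined.into_iter().filter(|&(_, w)| w != 0).collect()
}
```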
A lot of the structures we have, like the ones under Schema, are
specific to materialized views. In preparation for adding normal views,
rename them so things are less confusing.
This is just the bare minimum that I needed to convince myself that this
approach will work. The only views that we support are slices of the
main table: no aggregations, no joins, no projections.
DROP VIEW is implemented.
View population is implemented.
Deletes, inserts, and updates are implemented.
Much like indexes before, a flag must be passed to enable views.
When building views (soon), it will be important to know which table
is being deleted. Getting it from the cursor id is very cumbersome.
What we are doing here is symmetrical to op_insert, and SQLite also
passes the table information in one of the operands (p4).
SQLite generates those in aggregations like min / max with collation
information either in the table definition or in the column expression.
We currently generate the wrong result here, and properly generating the
bytecode instruction fixes it.
Our compat matrix mentions a couple of opcodes: ToInt, ToBlob, etc.
Those opcodes do not exist.
Instead, there is a single Cast opcode that takes the affinity as a
parameter.
Currently we just call a function when we need to cast. This PR fixes
the compat file, implements the Cast opcode, and, in at least one
instance, when explicitly using the CAST keyword, emits that opcode
instead of a function call in the generated bytecode.
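For illustration, a sketch of what a single affinity-parameterized Cast opcode looks like in a dispatch loop; the names and the conversion stub are hypothetical, and 'i' is an arbitrary placeholder affinity code.

```rust
enum Value {
    Int(i64),
    Text(String),
}

enum Insn {
    // One opcode for all casts: the affinity character selects the target
    // type instead of a dedicated ToInt/ToBlob-style opcode per type.
    Cast { reg: usize, affinity: char },
}

// Placeholder conversion so the sketch is self-contained; the real cast
// rules follow SQLite's affinity semantics.
fn apply_cast(value: &mut Value, affinity: char) {
    if affinity == 'i' {
        if let Value::Text(s) = value {
            let n = s.parse::<i64>().unwrap_or(0);
            *value = Value::Int(n);
        }
    }
}

fn step(insn: &Insn, registers: &mut [Value]) {
    match insn {
        Insn::Cast { reg, affinity } => apply_cast(&mut registers[*reg], *affinity),
    }
}
```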
There's no such thing as a read-only connection.
In a normal connection, you can have many attached databases. Some
r/o, some r/w.
To properly fix that, we also need to fix the OpenWrite opcode. Right
now we are passing a name, which is the name of the table. That
parameter is not used anywhere, and it is also not what the SQLite
opcode specifies. As with OpenRead, the p3 operand should be the
database index.
With that change, we can, for now, pass index 0, which is all we
support anyway, and then use that to test whether we are r/o.
Two of the opcodes we implement (OpenRead and Transaction) should have
an operand specifying the database to use, but they don't.
Add it, and for now always use 0 (the main database).