Commit Graph

160 Commits

Author SHA1 Message Date
Jussi Saurio
f5ee4807da Properly differentiate between source and target in UPDATE
- Encode information about ephemeral source table in OperationMode::UPDATE
  if present
- Use OperationMode information to correctly resolve cursors in UPDATE
2025-10-14 14:17:28 +03:00
Diego Reis
da323fa0c4 Some clean ups and correctly working on WHERE clauses 2025-10-09 11:57:15 -03:00
Diego Reis
79958f468d Add jump_target_null to ConditionMetadata
It's kinda make sense, conditions can be evaluated into 3 values: false,
true and null. Now we handle that.
2025-10-09 11:56:14 -03:00
Nikita Sivukhin
4313f57ecb Optimize range scans 2025-10-09 11:47:41 +03:00
Jussi Saurio
5a5f49933d Collate: add proper collation info to DISTINCT indexes 2025-10-02 21:49:33 +03:00
Nikita Sivukhin
c84486c411 clippy logged in as jussi - so I need to fix more stuff 2025-09-30 18:45:17 +04:00
Nikita Sivukhin
bf5567de35 fix clippy
- the proper fix is to nuke it actually :)
2025-09-30 18:06:42 +04:00
Nikita Sivukhin
4a9309fe31 fix clippy 2025-09-30 17:58:12 +04:00
Nikita Sivukhin
f1597dea90 fix all combinations of iteration direction and index order to properly handle nulls 2025-09-30 17:57:03 +04:00
Nikita Sivukhin
c211fd1359 handle btree-table search properly
- btree-table doesn't have nulls in keys - so seek operation do some conversions and we shouldn't emit SeekGT { Null } in this case
2025-09-30 17:05:39 +04:00
Nikita Sivukhin
e9b8b0265d skip NULL in case of search over index 2025-09-30 16:16:04 +04:00
Jussi Saurio
d5de088abe Merge 'translate: implement Sequence opcode and fix sort order' from Preston Thorpe
This PR implements the `Sequence` and `SequenceTest` opcodes, although
does not yet add plumbing to emit the latter.
SQLite has two distinct mechanisms that determine the final row order
with aggregates:
Traversal order of GROUP BY, and ORDER BY tiebreaking. When ORDER BY
contains only aggregate expressions and/or constants, SQLite has no
extra tiebreak key, but when ORDER BY mixes aggregate and non-aggregate
terms, SQLite adds an implicit, stable row `sequence` so “ties” respect
the input order.
This PR also fixes an issue with a query like the following:
```sql
SELECT u.first_name, COUNT(*) AS c
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.first_name
ORDER BY c DESC;
```
Because ORDER BY has only an aggregate (COUNT(*) DESC) and no non-
aggregate terms, SQLite traverses the group key (u.first_name) in DESC
order in this case, so ties on c naturally appear with group keys in
descending order.
Previously tursodb would return the group key sorted in ASC order,
because it was used in all cases as the default

Closes #3287
2025-09-24 08:38:08 +03:00
PThorpe92
376d2bf7b1 Add plumbing to add sequence column to stabilize tiebreakers in order+group by 2025-09-23 22:35:59 -04:00
PThorpe92
51f970a263 Support partial indexes in INSERT/UPDATE/DELETE 2025-09-20 14:38:48 -04:00
Piotr Rzysko
f5efcbe745 Add support for window functions
Adds initial support for window functions. For now, only existing
aggregate functions can be used as window functions—no specialized
window-specific functions are supported yet.

Currently, only the default frame definition is implemented:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS.
2025-09-13 11:12:44 +02:00
Piotr Rzysko
1826023c32 Decouple AggArgumentSource::Expression from Aggregate
This allows it to be reused for window function processing without
relying on the Aggregate struct.
2025-09-13 10:49:14 +02:00
Pekka Enberg
f88f39082a core/vdbe: Fix MakeRecord affinity handling
The MakeRecord instruction now accepts an optional affinity_str
parameter that applies column-specific type conversions before creating
records. When provided, the affinity string is applied
character-by-character to each register using the existing
apply_affinity_char() function, matching SQLite's behavior.

Fixes #2040
Fixes #2041
2025-09-08 18:49:13 +03:00
Glauber Costa
08b2e685d5 Persistence for DBSP-based materialized views
This fairly long commit implements persistence for materialized view.
It is hard to split because of all the interdependencies between components,
so it is a one big thing. This commit message will at least try to go into
details about the basic architecture.

Materialized Views as tables
============================

Materialized views are now a normal table - whereas before they were a virtual
table.  By making a materialized view a table, we can reuse all the
infrastructure for dealing with tables (cursors, etc).

One of the advantages of doing this is that we can create indexes on view
columns.  Later, we should also be able to write those views to separate files
with ATTACH write.

Materialized Views as Zsets
===========================

The contents of the table are a ZSet: rowid, values, weight. Readers will
notice that because of this, the usage of the ZSet data structure dwindles
throughout the codebase. The main difference between our materialized ZSet and
the standard DBSP ZSet, is that obviously ours is backed by a BTree, not a Hash
(since SQLite tables are BTrees)

Aggregator State
================

In DBSP, the aggregator nodes also have state. To store that state, there is a
second table.  The table holds all aggregators in the view, and there is one
table per view. That is __turso_internal_dbsp_state_{view_name}. The format of
that table is similar to a ZSet: rowid, serialized_values, weight. We serialize
the values because there will be many aggregators in the table. We can't rely
on a particular format for the values.

The Materialized View Cursor
============================

Reading from a Materialized View essentially means reading from the persisted
ZSet, and enhancing that with data that exists within the transaction.
Transaction data is ephemeral, so we do not materialize this anywhere: we have
a carefully crafted implementation of seek that takes care of merging weights
and stitching the two sets together.
2025-09-05 07:04:33 -05:00
TcMits
33a04fbaf7 resolve conflict 2025-09-02 17:30:10 +07:00
Piotr Rzysko
6f1cd17fcf Consolidate methods emitting AggStep 2025-08-31 13:29:10 +02:00
TcMits
4ddfdb2a62 finish 2025-08-27 14:58:35 +07:00
TcMits
d24812373f missing context for to_string 2025-08-24 14:37:29 +07:00
TcMits
9e4f3b41ef correctly implement get_column_name 2025-08-24 14:07:46 +07:00
Levy A.
4ba1304fb9 complete parser integration 2025-08-21 15:23:59 -03:00
Levy A.
186e2f5d8e switch to new parser 2025-08-21 15:19:16 -03:00
Jussi Saurio
dd2e0ea596 Fix: always emit rowid when column is rowid alias
SQLite does not store the rowid alias column in the record at all
when it is a rowid alias, because the rowid is always stored anyway
in the record header.
2025-08-21 16:40:10 +03:00
Jussi Saurio
a99c8a8ca0 Simplify ORDER BY sorter column remapping
In case an ORDER BY column exactly matches a result column in the SELECT,
the insertion of the result column into the ORDER BY sorter can be skipped
because it's already necessarily inserted as a sorting column.

For this reason we have a mapping to know what index a given result column
has in the order by sorter.

This commit makes that mapping much simpler.
2025-08-15 15:48:41 +03:00
Piotr Rzysko
375b9047e2 Evaluate WHERE conditions after LEFT JOIN
Previously, the query from the added test would not filter out rows
where `products.price` was NULL.
2025-08-08 06:26:30 +02:00
Piotr Rzysko
92ba25e44d Extract loop emitting conditions into a method
No functional changes — this is just preparation for reusing this code
and avoiding polluting future commits with trivial refactoring.
2025-08-08 06:21:08 +02:00
Piotr Rzysko
8986266394 Emit conditions in open_loop in one place
The loop emitting conditions is independent of the operation type.
2025-08-07 19:26:32 +02:00
Nikita Sivukhin
c6a87d61c7 emit CDC entries if necessary for schema changes 2025-08-06 01:03:49 +04:00
Nikita Sivukhin
0b4c1ac802 refactor code a little bit 2025-08-06 01:03:48 +04:00
Piotr Rzysko
82491ceb6a Integrate virtual tables with optimizer
This change connects virtual tables with the query optimizer.
The optimizer now considers virtual tables during join order search
and invokes their best_index callbacks to determine feasible access
paths.

Currently, this is not a visible change, since none of the existing
extensions return information indicating that a plan is invalid.
2025-08-05 05:48:28 +02:00
Piotr Rzysko
718598eab8 Introduce scan type
Different scan parameters are required for different table types.
Currently, index and iteration direction are only used by B-tree tables,
while the remaining table types don’t require any parameters. Planning
access to virtual tables, however, will require passing additional
information from the planner, such as the virtual table index (distinct
from a B-tree index) and the constraints that must be forwarded to the
`filter` method.
2025-08-04 20:27:22 +02:00
Piotr Rzysko
61234eeb19 Add ResultCode to best_index result
The `best_index` implementation now returns a ResultCode along with the
IndexInfo. This allows it to signal specific outcomes, such as errors or
constraint violations. This change aligns better with SQLite’s xBestIndex
contract, where cases like missing constraints or invalid combinations of
constraints must not result in a valid plan.
2025-08-04 20:18:44 +02:00
Piotr Rzysko
c465ce6e7b Clarify semantics of argv_index
Extend the documentation of `argv_index` and add validations enforcing
the requirements it must meet.
2025-08-04 19:31:18 +02:00
Piotr Rzysko
b0460a589f Ensure argv_index is either None or >= 1
Previously, there were two ways to indicate that a constraint should not
be passed to the filter function: setting `argv_index` to `None` or to
a value less than 1. This was redundant, so now only `None` is used.
2025-08-04 19:27:53 +02:00
Piotr Rzysko
c6f398122d Add validation for constraint usage length returned by best_index
Additional changes:
- Update IndexInfo documentation to clarify that constraint_usages must
  have exact 1:1 correspondence with input ConstraintInfo array. The code
  translating constraints into VFilter arguments heavily relies on this.
- Fix best_index implementation in test extension to comply with new
  validation requirements by returning usage entry for each constraint
2025-08-04 19:25:10 +02:00
Glauber Costa
988b16f962 Support ATTACH (read only)
Support for attaching databases. The main difference from SQLite is that
we support an arbitrary number of attached databases, and we are not
bound to just 100ish.

We for now only support read-only databases. We open them as read-only,
but also, to keep things simple, we don't patch any of the insert
machinery to resolve foreign tables.  So if an insert is tried on an
attached database, it will just fail with a "no such table" error - this
is perfect for now.

The code in core/translate/attach.rs is written by Claude, who also
played a key part in the boilerplate for stuff like the .databases
command and extending the pragma database_list, and also aided me in
the test cases.
2025-07-24 19:19:48 -05:00
PThorpe92
0871a8c7f3 Bail early when we detect a readonly virtual table 2025-07-23 16:57:30 -04:00
Glauber Costa
57a1113460 make readonly a property of the database
There's no such thing as a read-only connection.
In a normal connection, you can have many attached databases. Some
r/o, some r/w.

To properly fix that, we also need to fix the OpenWrite opcode. Right
now we are passing a name, which is the name of the table. That
parameter is not used anywhere. That is also not what the SQLite opcode
specifies. Same as OpenRead, the p3 register should be the database
index.

With that change, we can - for now - pass the index 0, which is all
we support anyway, and then use that to test if we are r/o.
2025-07-22 09:41:32 -05:00
Glauber Costa
65312baee6 fix opcodes missing a database register
Two of the opcodes we implement (OpenRead and Transaction) should have
an opcode specifying the database to use, but they don't.

Add it, and for now always use 0 (the main database).
2025-07-20 12:27:26 -05:00
Piotr Rzysko
30ae6538ee Treat table-valued functions as tables
With this change, the following two queries are considered equivalent:
```sql
SELECT value FROM generate_series(5, 50);
SELECT value FROM generate_series WHERE start = 5 AND stop = 50;
```
Arguments passed in parentheses to the virtual table name are now
matched to hidden columns.

Column references are still not supported as table-valued function
arguments. The only difference is that previously, a query like:
```sql
SELECT one.value, series.value
FROM (SELECT 1 AS value) one, generate_series(one.value, 3) series;
```
would cause a panic. Now, it returns a proper error message instead.

Adding support for column references is more nuanced for two main
reasons:
- We need to ensure that in joins where a TVF depends on other tables,
those other tables are processed first. For example, in:
```sql
SELECT one.value, series.value
FROM generate_series(one.value, 3) series, (SELECT 1 AS value) one;
```
the one table must be processed by the top-level loop, and series must
be nested.
- For outer joins involving TVFs, the arguments must be treated as ON
predicates, not WHERE predicates.
2025-07-14 07:16:53 +02:00
Piotr Rzysko
44b1b1852a Fix referencing virtual table predicates
We need to enumerate first and filter afterward — not the other way
around — because we later use the indexes produced by `enumerate` to
access the original `predicates` slice.
2025-07-14 07:16:53 +02:00
Nikita Sivukhin
32fa2ac3ee avoid capturing changes in cdc table 2025-07-06 22:24:35 +04:00
Nikita Sivukhin
a988bbaffe allow to specify table in the capture_data_changes PRAGMA 2025-07-06 22:19:32 +04:00
Nikita Sivukhin
40769618c1 small refactoring 2025-07-06 21:16:58 +04:00
Nikita Sivukhin
04f2efeaa4 small renames 2025-07-06 21:16:57 +04:00
Nikita Sivukhin
a82529f55a emit cdc changes for UPDATE / DELETE statements 2025-07-06 21:16:25 +04:00
Levy A.
ffd6844b5b refactor: remove PseudoTable from Table
the only reason for `PseudoTable` to exist, is to provide column
information for `PseudoCursor` creation. this should not be part of the
schema.
2025-06-30 14:31:58 -03:00