This PR unifies the logic for resolving aggregate functions. Previously,
bare aggregates (e.g. `SELECT max(a) FROM t1`) and aggregates wrapped in
expressions (e.g. `SELECT max(a) + 1 FROM t1`) were handled differently,
which led to duplicated code. Now both cases are resolved consistently.
The added benchmark shows a small improvement:
```
Prepare `SELECT first_name, last_name, state, city, age + 10, LENGTH(email), UPPER(first_name), LOWE...
time: [59.791 µs 59.898 µs 60.006 µs]
change: [-7.7090% -7.2760% -6.8242%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe
```
For an existing benchmark, no change:
```
Prepare `SELECT first_name, count(1) FROM users GROUP BY first_name HAVING count(1) > 1 ORDER BY cou...
time: [11.895 µs 11.913 µs 11.931 µs]
change: [-0.2545% +0.2426% +0.6960%] (p = 0.34 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
2 (2.00%) high mild
5 (5.00%) high severe
```
Reviewed-by: Jussi Saurio <jussi.saurio@gmail.com>
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2884
@penberg this PR try to clean up `turso_parser`'s`fmt` code.
- `get_table_name` and `get_column_name` should return None when
table/column does not exist.
```rust
/// Context to be used in ToSqlString
pub trait ToSqlContext {
/// Given an id, get the table name
/// First Option indicates whether the table exists
///
/// Currently not considering aliases
fn get_table_name(&self, _id: TableInternalId) -> Option<&str> {
None
}
/// Given a table id and a column index, get the column name
/// First Option indicates whether the column exists
/// Second Option indicates whether the column has a name
fn get_column_name(&self, _table_id: TableInternalId, _col_idx: usize) -> Option<Option<&str>> {
None
}
// help function to handle missing table/column names
fn get_table_and_column_names(
&self,
table_id: TableInternalId,
col_idx: usize,
) -> (String, String) {
let table_name = self
.get_table_name(table_id)
.map(|s| s.to_owned())
.unwrap_or_else(|| format!("t{}", table_id.0));
let column_name = self
.get_column_name(table_id, col_idx)
.map(|opt| {
opt.map(|s| s.to_owned())
.unwrap_or_else(|| format!("c{col_idx}"))
})
.unwrap_or_else(|| format!("c{col_idx}"));
(table_name, column_name)
}
}
```
- remove `FmtTokenStream` because it is same as `WriteTokenStream `
- remove useless functions and simplify `ToTokens`
```rust
/// Generate token(s) from AST node
/// Also implements Display to make sure devs won't forget Display
pub trait ToTokens: Display {
/// Send token(s) to the specified stream with context
fn to_tokens<S: TokenStream + ?Sized, C: ToSqlContext>(
&self,
s: &mut S,
context: &C,
) -> Result<(), S::Error>;
// Return displayer representation with context
fn displayer<'a, 'b, C: ToSqlContext>(&'b self, ctx: &'a C) -> SqlDisplayer<'a, 'b, C, Self>
where
Self: Sized,
{
SqlDisplayer::new(ctx, self)
}
}
```
Closes#2748
Currently we have `Pager::update_dirty_loaded_page_in_cache` which does
exactly what you would expect, but `DumbLruPageCache::_insert` method
with `ignore_existing` set to true, totally ignores the previous entry
and leaks the memory.
I really want to get #2885 finished and through because of the perf, but
I ran into this when inspecting it for correctness changes
Closes#2892
Fix brekage from first merging commit d959319b ("Merge 'Use u64 for file
offsets in I/O and calculate such offsets in u64' from Preston Thorpe")
and then commit 6591b66c ("Merge 'Simulate I/O in memory' from Pedro
Muniz"), which was unaware of the changes.
Revives the `MemorySim` PR and fixes a page cache issue where we could
have a unlocked and unloaded page in the page cache after a FaultyQuery.
The page would continue in the cache and could affect other queries as
the `page_cache` is at the `Connection` level.
Depends on #2785Closes#2693
This PR updates the internal encryption framework to handle
authentication tags explicitly rather than relying on the underlying
cipher libraries to append/verify them automatically.
closes: #2850
Reviewed-by: Avinash Sajjanshetty (@avinassh)
Closes#2858
This also gave a small performance boost.
Local run results:
```
Prepare `SELECT first_name, last_name, state, city, age + 10, LENGTH(email), UPPER(first_name), LOWE...
time: [59.791 µs 59.898 µs 60.006 µs]
change: [-7.7090% -7.2760% -6.8242%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) high mild
2 (2.00%) high severe
```
Aggregate functions cannot be nested, and this is validated during the
translation of aggregate function arguments. Therefore, traversing their
child expressions is unnecessary.
Handled in the same way as in `prepare_one_select_plan` for bare
function calls. In `prepare_one_select_plan`, however, resolving
external scalar functions is performed unnecessarily twice.
The new query combines multiple aggregate functions, plain columns,
arithmetic expressions, and aggregates wrapped in additional expressions.
Local run results:
```
Prepare `SELECT first_name, last_name, state, city, age + 10, LENGTH(email), UPPER(first_name), LOWE...
time: [64.535 µs 64.623 µs 64.713 µs]
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
```
The initial commits fix issues and plug gaps between ungrouped and
grouped aggregations.
The final commit consolidates the code that emits `AggStep` to prevent
future disparities between the two.
Reviewed-by: Preston Thorpe <preston@turso.tech>
Closes#2867
Some progress working on `printf` support.
(relevant issue https://github.com/tursodatabase/turso/issues/885)
Implementation of the basic substitution types cited in the `TODO`
comment on the beginning of the file (%i, %x, %X, %o, %e, %E, %c). There
are some others in the sqlite spec which I will implement in a future
PR.
I tried to pay attention to the specific behaviors from sqlite as much
as possible while testing this, but if there's something I missed please
tell me.
Also, I see this code needs to be reorganized already, I'm still
thinking on the best approach to do that without affecting the
ergonomics of new implementations, I'm still learning Rust so this is
not obvious for me right now. I'm open to suggestions about it.
Closes#2868
Because we can abort a read_page completion, this means a page can be in
the cache but be unloaded and unlocked. However, if we do not evict that
page from the page cache, we will return an unloaded page later which
will trigger assertions later on. This is worsened by the fact that page
cache is not per `Statement`, so you can abort a completion in one
Statement, and trigger some error in the next one if we don't evict the
page in these circumstances.
Also, to propagate IO errors we need to return the Error from
IOCompletions on step.
Closes#2785