I noticed that the parse errors were a bit hard to read: only the nearest token and the line/col offsets were printed.

I made a first attempt at improving the errors using [miette](https://github.com/zkat/miette).

- Added derive for `miette::Diagnostic` to both the parser's error type and `LimboError`.
- Added the miette dependency to both sqlite3_parser and core. The `fancy` feature is only enabled for the CLI, so the overhead on the libraries (core, parser) should be minimal.

Some further improvements that can be made in the future:

- Add spans to AST nodes so that errors can better point to the correct token. See upstream issue: https://github.com/gwenn/lemon-rs/issues/33
- Construct more errors with offset information. I noticed that most parser errors are constructed with `None` as the offset.
- The messages are a bit redundant (example: "syntax error at (1, 6)"). This can be improved.

Comparisons.

Before:

```
❯ cargo run --package limbo --bin limbo database.db --output-mode pretty
...
limbo> selet * from a;
[2025-01-05T11:22:55Z ERROR sqlite3Parser] near "Token([115, 101, 108, 101, 116])": syntax error
Parse error: near "selet": syntax error at (1, 6)
```

<img width="969" alt="image" src="https://github.com/user-attachments/assets/82651a77-f5ac-4eee-b208-88c6ea7fc9b7" />

After:

```
❯ cargo run --package limbo --bin limbo database.db --output-mode pretty
...
limbo> selet * from a;
[2025-01-05T12:25:52Z ERROR sqlite3Parser] near "Token([115, 101, 108, 101, 116])": syntax error

  × near "selet": syntax error at (1, 6)
   ╭────
 1 │ selet * from a
   ·      ▲
   ·      ╰── syntax error
   ╰────
```

<img width="980" alt="image" src="https://github.com/user-attachments/assets/747a90e5-5085-41f9-b0fe-25864179ca35" />

Closes #618
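To make the offset discussion above concrete, here is a minimal standard-library sketch of the two steps a caret-style report needs: turning a `(line, column)` pair into a byte offset, and rendering the offending line with a marker under it. This is not miette's implementation and not code from this PR. The names `byte_offset` and `render_caret` are hypothetical, and columns are assumed to be 1-based and counted in bytes.

```rust
/// Hypothetical helper: convert a 1-based (line, column) pair into a byte
/// offset into `src`. Columns are counted in bytes, which is an assumption
/// of this sketch and only matches character columns for ASCII input.
fn byte_offset(src: &str, line: usize, column: usize) -> Option<usize> {
    if line == 0 || column == 0 {
        return None;
    }
    let mut start = 0; // byte offset of the first character of the current line
    for (idx, text) in src.split('\n').enumerate() {
        if idx + 1 == line {
            // Accept the column just past the end of the line, which is
            // where "unexpected end of input" style errors tend to point.
            return if column <= text.len() + 1 {
                Some(start + column - 1)
            } else {
                None
            };
        }
        start += text.len() + 1; // +1 for the '\n' separator
    }
    None
}

/// Hypothetical helper: render the line containing `offset` with a caret
/// under that position. Returns `None` if `offset` is out of range or not
/// on a character boundary.
fn render_caret(src: &str, offset: usize, label: &str) -> Option<String> {
    if !src.is_char_boundary(offset) {
        return None;
    }
    let line_start = src[..offset].rfind('\n').map_or(0, |i| i + 1);
    let line_end = src[offset..].find('\n').map_or(src.len(), |i| offset + i);
    let line_no = src[..offset].matches('\n').count() + 1;

    let gutter = line_no.to_string();
    let pad = " ".repeat(gutter.len());
    // Byte-based indentation: aligned for ASCII, approximate otherwise.
    let indent = " ".repeat(offset - line_start);
    Some(format!(
        "{gutter} | {}\n{pad} | {indent}^ {label}",
        &src[line_start..line_end]
    ))
}
```

With the input from the comparison above, `byte_offset("selet * from a;", 1, 6)` yields the offset just after `selet`, and passing that offset to `render_caret` places the caret under the sixth column. A parser error that carries `None` as its offset gives a renderer like this nothing to point at, which is why constructing more errors with offsets matters.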
LEMON parser generator modified to generate Rust code.
Lemon source and SQLite3 grammar were last synced as of July 2024.
Unsupported
Unsupported Grammar syntax
- `%token_destructor`: Code to execute to destroy token data
- `%default_destructor`: Code for the default non-terminal destructor
- `%destructor`: Code which executes whenever this symbol is popped from the stack during error processing
- https://www.codeproject.com/Articles/1056460/Generating-a-High-Speed-Parser-Part-Lemon
- https://www.sqlite.org/lemon.html
SQLite
SQLite lexer and SQLite parser have been ported from C to Rust. The parser generates an AST.
Lexer/Parser:
- Keep track of position (line, column).
- Streamable (stop at the end of statement).
- Resumable (restart after the end of statement).
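The three properties listed above can be illustrated with a toy scanner. This is only a sketch under stated assumptions, not the crate's lexer or its API: `StatementScanner` and `Statement` are hypothetical names, statements are split on `;` outside single-quoted strings, and comments and other quoting styles are ignored.

```rust
/// Hypothetical scanner: yields one statement at a time and remembers where
/// it stopped, so the caller can resume after each statement.
struct StatementScanner<'a> {
    input: &'a str,
    offset: usize, // byte offset of the next unread character
    line: usize,   // 1-based line of the next unread character
    column: usize, // 1-based column, counted in characters
}

/// A statement together with the position of its first character.
struct Statement<'a> {
    text: &'a str,
    line: usize,
    column: usize,
}

impl<'a> StatementScanner<'a> {
    fn new(input: &'a str) -> Self {
        StatementScanner { input, offset: 0, line: 1, column: 1 }
    }

    /// Consume one character and update the (line, column) position.
    fn advance(&mut self, ch: char) {
        self.offset += ch.len_utf8();
        if ch == '\n' {
            self.line += 1;
            self.column = 1;
        } else {
            self.column += 1;
        }
    }
}

impl<'a> Iterator for StatementScanner<'a> {
    type Item = Statement<'a>;

    fn next(&mut self) -> Option<Statement<'a>> {
        let input = self.input; // copy of the shared reference
        // Skip leading whitespace so the reported position is the first token.
        while let Some(ch) = input[self.offset..].chars().next() {
            if !ch.is_whitespace() {
                break;
            }
            self.advance(ch);
        }
        if self.offset == input.len() {
            return None;
        }
        let (start, line, column) = (self.offset, self.line, self.column);
        let mut in_string = false;
        // Streamable: stop right after the ';' that ends the statement.
        while let Some(ch) = input[self.offset..].chars().next() {
            self.advance(ch);
            match ch {
                // A doubled quote ('') toggles twice, so it stays inside the string.
                '\'' => in_string = !in_string,
                ';' if !in_string => break,
                _ => {}
            }
        }
        Some(Statement { text: &input[start..self.offset], line, column })
    }
}
```

Each call to `next` stops at the end of one statement and leaves `offset`, `line` and `column` pointing at the unread remainder, so the next call resumes from there without rescanning earlier input.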
Lexer and parser have been tested with the following scripts:
- https://github.com/bkiers/sqlite-parser/tree/master/src/test/resources
- https://github.com/codeschool/sqlite-parser/tree/master/test/sql/official-suite, which can be updated with the script in https://github.com/codeschool/sqlite-parser/tree/master/test/misc
TODO:
- Check generated AST (reparse/reinject)
- If a keyword in double quotes is used in a context where it cannot be resolved to an identifier but where a string literal is allowed, then the token is understood to be a string literal instead of an identifier.
- Tests
- Do not panic while parsing
- CREATE VIRTUAL TABLE args
- Zero copy (at least tokens)
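The double-quoted keyword item in the list above describes a fallback rule that can be sketched in a few lines. This is an illustration only, not how the parser handles it (the item is still a TODO). `Resolved` and `resolve_double_quoted` are hypothetical names, and the sketch assumes the caller already knows which identifiers are in scope and whether a string literal is allowed at this point in the grammar.

```rust
/// Hypothetical result of resolving a double-quoted token.
#[derive(Debug, PartialEq)]
enum Resolved<'a> {
    Identifier(&'a str),
    StringLiteral(&'a str),
}

/// Prefer an identifier; fall back to a string literal only when the name
/// does not resolve and the context allows a string literal.
fn resolve_double_quoted<'a>(
    name: &'a str,
    known_identifiers: &[&str],
    string_literal_allowed: bool,
) -> Option<Resolved<'a>> {
    if known_identifiers.iter().any(|id| id.eq_ignore_ascii_case(name)) {
        Some(Resolved::Identifier(name))
    } else if string_literal_allowed {
        Some(Resolved::StringLiteral(name))
    } else {
        None // neither interpretation applies; the caller reports an error
    }
}
```

The ordering encodes the rule: the string-literal reading is only a fallback, so a name that matches a known identifier is never reinterpreted as a string.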
Unsupported by Rust
- `#line` directive
API change
- No `ParseAlloc`/`ParseFree` anymore
Features not tested
- NDEBUG
- YYNOERRORRECOVERY
- YYERRORSYMBOL
To be fixed
- RHS are moved. Maybe it is not a problem if they are always used once. Just add a check in lemon...
- `%extra_argument` is not supported.
- Terminal symbols generated by lemon should be dumped in a specified file.
Raison d'être
- lemon_rust does the same thing but with an old version of `lemon`. It also does not seem possible to use `yystack` as a stack, because items may be accessed randomly and the `top+1` item can be used.
- lalrpop would be the perfect alternative, but it does not support fallback/streaming (see this issue) and compilation/generation is slow.
Minimum supported Rust version (MSRV)
Latest stable Rust version at the time of release. It might compile with older versions.