docs: add tokenator docs

Signed-off-by: William Casarin <jb55@jb55.com>

---

crates/tokenator/DEVELOPER.md (new file, 211 lines):
# Tokenator Developer Documentation

This document provides detailed information for developers who want to use the Tokenator library in their projects or contribute to its development.

## Core Concepts

Tokenator works with two primary concepts:

1. **Token Parsing**: Converting a sequence of string tokens into structured data
2. **Token Serialization**: Converting structured data into a sequence of string tokens

The library is designed to be simple, efficient, and flexible for working with delimited string formats.
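As a concrete illustration, with the default `:` delimiter the string `user:alice:30` corresponds to the token sequence `["user", "alice", "30"]`. The snippet below shows only that relationship, using the standard library rather than Tokenator itself:

```rust
// Illustrative only: a delimited string and its token sequence
// (plain std, not tokenator).
fn main() {
    let line = "user:alice:30";
    let tokens: Vec<&str> = line.split(':').collect();
    assert_eq!(tokens, ["user", "alice", "30"]);
}
```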
## API Reference

### TokenParser

`TokenParser` is responsible for parsing tokens from a slice of string references.

```rust
pub struct TokenParser<'a> {
    tokens: &'a [&'a str],
    index: usize,
}
```

Key methods:

- `new(tokens: &'a [&'a str]) -> Self`: Creates a new parser from a slice of string tokens
- `pull_token() -> Result<&'a str, ParseError<'a>>`: Gets the next token and advances the index
- `peek_token() -> Result<&'a str, ParseError<'a>>`: Looks at the next token without advancing the index
- `parse_token(expected: &'static str) -> Result<&'a str, ParseError<'a>>`: Checks that the next token matches the expected value
- `alt<R>(parser: &mut TokenParser<'a>, routes: &[fn(&mut TokenParser<'a>) -> Result<R, ParseError<'a>>]) -> Result<R, ParseError<'a>>`: Tries each parser in `routes` until one succeeds
- `parse_all<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Ensures all tokens are consumed after parsing
- `try_parse<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Attempts to parse and backtracks on failure
- `is_eof() -> bool`: Returns `true` once no tokens are left to parse
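A minimal usage sketch of these methods; the token slice and the `parse_user` helper are illustrative, not part of the library:

```rust
use tokenator::{ParseError, TokenParser};

// Parse the shape: "user" <name> <role>
fn parse_user<'a>(p: &mut TokenParser<'a>) -> Result<(&'a str, &'a str), ParseError<'a>> {
    p.parse_token("user")?;      // the next token must be exactly "user"
    let name = p.pull_token()?;  // consume the following two tokens
    let role = p.pull_token()?;
    Ok((name, role))
}

fn main() {
    let tokens = ["user", "alice", "admin"];
    let mut parser = TokenParser::new(&tokens);
    assert_eq!(parse_user(&mut parser).unwrap(), ("alice", "admin"));
    assert!(parser.is_eof()); // all tokens were consumed
}
```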
### TokenWriter

`TokenWriter` is responsible for serializing tokens into a string with the specified delimiter.

```rust
pub struct TokenWriter {
    delim: &'static str,
    tokens_written: usize,
    buf: Vec<u8>,
}
```

Key methods:

- `new(delim: &'static str) -> Self`: Creates a new writer with the specified delimiter
- `default() -> Self`: Creates a new writer with `":"` as the delimiter
- `write_token(token: &str)`: Appends a token to the buffer
- `str() -> &str`: Gets the current buffer as a string
- `buffer() -> &[u8]`: Gets the current buffer as a byte slice
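And the writing side, a short sketch with illustrative values:

```rust
use tokenator::TokenWriter;

fn main() {
    let mut writer = TokenWriter::default(); // ":" delimiter
    writer.write_token("user");
    writer.write_token("alice");
    writer.write_token("30");
    assert_eq!(writer.str(), "user:alice:30");
}
```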
### TokenSerializable

`TokenSerializable` is a trait that types can implement to be serialized to and parsed from tokens.

```rust
pub trait TokenSerializable: Sized {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>>;
    fn serialize_tokens(&self, writer: &mut TokenWriter);
}
```
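Because both directions live on one trait, generic helpers fall out naturally. The function below is a hypothetical convenience, not part of the library's API:

```rust
use tokenator::{TokenSerializable, TokenWriter};

// Hypothetical helper (not provided by tokenator): serialize any
// TokenSerializable value into a colon-delimited string.
fn to_token_string<T: TokenSerializable>(value: &T) -> String {
    let mut writer = TokenWriter::default();
    value.serialize_tokens(&mut writer);
    writer.str().to_string()
}
```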
### Error Handling

The library provides detailed error types:

- `ParseError<'a>`: Represents errors that can occur during parsing
  - `Incomplete`: Not done parsing yet
  - `AltAllFailed`: All parsing options failed
  - `DecodeFailed`: General decoding failure
  - `HexDecodeFailed`: Hex decoding failure
  - `UnexpectedToken`: Encountered an unexpected token
  - `EOF`: No more tokens
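A sketch of dispatching on these variants. It assumes `UnexpectedToken` carries the `expected`/`found` payload used in the integration example below, and that the other variants shown here are unit-like:

```rust
use tokenator::ParseError;

// Turn a parse error into a human-readable message. The payload of
// UnexpectedToken (expected/found) is assumed from the Custom
// Protocol Parser example later in this document.
fn describe(err: &ParseError<'_>) -> String {
    match err {
        ParseError::EOF => "ran out of tokens".to_string(),
        ParseError::DecodeFailed => "general decoding failure".to_string(),
        ParseError::UnexpectedToken(t) => {
            format!("expected {}, found {}", t.expected, t.found)
        }
        _ => "parse failed".to_string(),
    }
}
```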
## Advanced Usage

### Backtracking and Alternative Parsing

One of the most powerful features of Tokenator is its support for backtracking and alternative parsing paths:

```rust
// Try multiple parsing strategies until one succeeds
let result = TokenParser::alt(&mut parser, &[
    |p| parse_strategy_a(p),
    |p| parse_strategy_b(p),
    |p| parse_strategy_c(p),
]);

// Attempt to parse, backtracking to the starting position on failure
let result = parser.try_parse(|p| {
    let token = p.parse_token("specific_token")?;
    // More parsing...
    Ok(token)
});
```
### Parsing Hex Data

The library includes utilities for parsing hexadecimal data:

```rust
use tokenator::parse_hex_id;

// Parse a 32-byte hex string from the next token
let hash: [u8; 32] = parse_hex_id(&mut parser)?;
```
### Custom Delimiters

You can use custom delimiters when serializing tokens:

```rust
// Create a writer with a custom delimiter
let mut writer = TokenWriter::new("|");
writer.write_token("user");
writer.write_token("alice");
// Result: "user|alice"
```
## Best Practices

1. **Implement TokenSerializable for your types**: This ensures consistency between parsing and serialization logic.
2. **Use try_parse for speculative parsing**: When trying different parsing strategies, wrap them in `try_parse` to ensure proper backtracking.
3. **Handle all error cases**: The detailed error types provided by Tokenator help identify and handle specific parsing issues.
4. **Consider memory efficiency**: The parser works with string references to avoid unnecessary copying.
5. **Validate input**: Always validate input tokens before attempting to parse them into your data structures; `parse_all` (see the sketch after this list) is one way to reject trailing input.
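A minimal sketch of that last point, reusing the `User` type from the Quick Start example in the README: `parse_all` succeeds only when the closure consumes every token.

```rust
// Reject trailing garbage: parse_all fails unless the closure
// consumes the entire token slice.
let tokens = ["user", "alice", "30", "unexpected"];
let mut parser = TokenParser::new(&tokens);
let result = parser.parse_all(|p| User::parse_from_tokens(p));
assert!(result.is_err()); // "unexpected" was left over
```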
## Integration Examples

### Custom Protocol Parser

```rust
use tokenator::{TokenParser, TokenWriter, TokenSerializable, ParseError};

enum Command {
    Get { key: String },
    Set { key: String, value: String },
    Delete { key: String },
}

impl TokenSerializable for Command {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>> {
        let cmd = parser.pull_token()?;

        match cmd {
            "GET" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Get { key })
            }
            "SET" => {
                let key = parser.pull_token()?.to_string();
                let value = parser.pull_token()?.to_string();
                Ok(Command::Set { key, value })
            }
            "DEL" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Delete { key })
            }
            _ => Err(ParseError::UnexpectedToken(tokenator::UnexpectedToken {
                expected: "GET, SET, or DEL",
                found: cmd,
            })),
        }
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        match self {
            Command::Get { key } => {
                writer.write_token("GET");
                writer.write_token(key);
            }
            Command::Set { key, value } => {
                writer.write_token("SET");
                writer.write_token(key);
                writer.write_token(value);
            }
            Command::Delete { key } => {
                writer.write_token("DEL");
                writer.write_token(key);
            }
        }
    }
}
```
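A quick round trip of the `Command` type above (the token values are illustrative):

```rust
fn main() {
    // Parse a command from tokens...
    let tokens = ["SET", "color", "blue"];
    let mut parser = TokenParser::new(&tokens);
    let cmd = Command::parse_from_tokens(&mut parser).unwrap();

    // ...then serialize it back with the default ":" delimiter.
    let mut writer = TokenWriter::default();
    cmd.serialize_tokens(&mut writer);
    assert_eq!(writer.str(), "SET:color:blue");
}
```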
## Contributing

Contributions to Tokenator are welcome! Here are some areas that could be improved:

- Additional parsing utilities
- Performance optimizations
- More comprehensive test coverage
- Example implementations for common use cases
- Documentation improvements

When submitting a pull request, please ensure:

1. All tests pass
2. New functionality includes appropriate tests
3. Documentation is updated to reflect changes
4. Code follows the existing style conventions
---

@@ -1,5 +1,75 @@

# Tokenator

Tokenator is a simple, efficient library for parsing and serializing string tokens in Rust. It provides a lightweight solution for working with colon-delimited (or custom-delimited) string formats.
## Features

- Parse colon-delimited (or custom-delimited) string tokens
- Serialize data structures into token strings
- Robust error handling with descriptive error types
- Support for backtracking and alternative parsing routes
- Zero-copy parsing for improved performance
- Hex decoding utilities
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
tokenator = "0.1.0"
```
## Quick Start

```rust
use tokenator::{TokenParser, TokenWriter, TokenSerializable};

// Define a type that can be serialized to/from tokens
struct User {
    name: String,
    age: u32,
}

impl TokenSerializable for User {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, tokenator::ParseError<'a>> {
        // Expect the token "user" first
        parser.parse_token("user")?;

        // Parse name and age
        let name = parser.pull_token()?.to_string();
        let age_str = parser.pull_token()?;
        let age = age_str
            .parse::<u32>()
            .map_err(|_| tokenator::ParseError::DecodeFailed)?;

        Ok(Self { name, age })
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        writer.write_token("user");
        writer.write_token(&self.name);
        writer.write_token(&self.age.to_string());
    }
}

fn main() {
    // Parsing example
    let tokens = ["user", "alice", "30"];
    let mut parser = TokenParser::new(&tokens);
    let user = User::parse_from_tokens(&mut parser).unwrap();
    assert_eq!(user.name, "alice");
    assert_eq!(user.age, 30);

    // Serializing example
    let user = User {
        name: "bob".to_string(),
        age: 25,
    };
    let mut writer = TokenWriter::default();
    user.serialize_tokens(&mut writer);
    assert_eq!(writer.str(), "user:bob:25");
}
```
## License

MIT