docs: add tokenator docs
Signed-off-by: William Casarin <jb55@jb55.com>
211 crates/tokenator/DEVELOPER.md Normal file
@@ -0,0 +1,211 @@
# Tokenator Developer Documentation

This document provides detailed information for developers who want to use the Tokenator library in their projects or contribute to its development.

## Core Concepts

Tokenator works with two primary concepts:

1. **Token Parsing**: Converting a sequence of string tokens into structured data
2. **Token Serialization**: Converting structured data into a sequence of string tokens

The library is designed to be simple, efficient, and flexible for working with delimited string formats.
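
To make "delimited string formats" concrete, here is a minimal sketch (plain `str::split`, not a Tokenator API) of the relationship between a delimited string and the token slice a parser consumes:

```rust
// A colon-delimited record and the token slice it corresponds to.
let line = "user:alice:30";
let tokens: Vec<&str> = line.split(':').collect();
assert_eq!(tokens, ["user", "alice", "30"]);
```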

## API Reference

### TokenParser

`TokenParser` is responsible for parsing tokens from a slice of string references.

```rust
pub struct TokenParser<'a> {
    tokens: &'a [&'a str],
    index: usize,
}
```

Key methods:

- `new(tokens: &'a [&'a str]) -> Self`: Creates a new parser from a slice of string tokens
- `pull_token() -> Result<&'a str, ParseError<'a>>`: Gets the next token and advances the index
- `peek_token() -> Result<&'a str, ParseError<'a>>`: Looks at the next token without advancing the index
- `parse_token(expected: &'static str) -> Result<&'a str, ParseError<'a>>`: Checks that the next token matches the expected value
- `alt<R>(parser: &mut TokenParser<'a>, routes: &[fn(&mut TokenParser<'a>) -> Result<R, ParseError<'a>>]) -> Result<R, ParseError<'a>>`: Tries each parser in `routes` until one succeeds
- `parse_all<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Ensures all tokens are consumed after parsing
- `try_parse<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Attempts to parse and backtracks on failure
- `is_eof() -> bool`: Returns `true` once every token has been consumed
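
A short usage sketch built only from the signatures above (errors are deferred with `?`):

```rust
use tokenator::{ParseError, TokenParser};

fn parse_user<'a>(tokens: &'a [&'a str]) -> Result<&'a str, ParseError<'a>> {
    let mut parser = TokenParser::new(tokens);
    parser.parse_token("user")?;       // fails unless the next token is exactly "user"
    let peeked = parser.peek_token()?; // inspect without consuming
    let name = parser.pull_token()?;   // now consume it
    assert_eq!(peeked, name);
    assert!(parser.is_eof());          // everything has been read
    Ok(name)
}
```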

### TokenWriter

`TokenWriter` is responsible for serializing tokens into a string with the specified delimiter.

```rust
pub struct TokenWriter {
    delim: &'static str,
    tokens_written: usize,
    buf: Vec<u8>,
}
```

Key methods:

- `new(delim: &'static str) -> Self`: Creates a new writer with the specified delimiter
- `default() -> Self`: Creates a new writer with `":"` as the delimiter
- `write_token(token: &str)`: Appends a token to the buffer
- `str() -> &str`: Gets the current buffer as a string
- `buffer() -> &[u8]`: Gets the current buffer as a byte slice
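
For example, with the default delimiter (a sketch; the expected output follows the `user:bob:25` pattern shown in the Quick Start):

```rust
use tokenator::TokenWriter;

let mut writer = TokenWriter::default(); // ":" delimiter
writer.write_token("user");
writer.write_token("alice");
assert_eq!(writer.str(), "user:alice");
```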

### TokenSerializable

`TokenSerializable` is a trait that types can implement to be serialized to and parsed from tokens.

```rust
pub trait TokenSerializable: Sized {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>>;
    fn serialize_tokens(&self, writer: &mut TokenWriter);
}
```
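
The smallest possible implementation looks like this (a sketch; `Tag` is a hypothetical one-field type, and the leading `"tag"` marker is just a convention this example chooses):

```rust
use tokenator::{ParseError, TokenParser, TokenSerializable, TokenWriter};

struct Tag(String);

impl TokenSerializable for Tag {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>> {
        parser.parse_token("tag")?; // expect the literal marker first
        Ok(Tag(parser.pull_token()?.to_string()))
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        writer.write_token("tag");
        writer.write_token(&self.0);
    }
}
```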

### Error Handling

The library provides detailed error types:

- `ParseError<'a>`: Represents errors that can occur during parsing
  - `Incomplete`: Not done parsing yet
  - `AltAllFailed`: All parsing options failed
  - `DecodeFailed`: General decoding failure
  - `HexDecodeFailed`: Hex decoding failure
  - `UnexpectedToken`: Encountered an unexpected token
  - `EOF`: No more tokens
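
A sketch of matching on these variants (the `UnexpectedToken` payload with `expected`/`found` fields follows the protocol example later in this document):

```rust
use tokenator::{ParseError, TokenParser};

let tokens = ["oops"];
let mut parser = TokenParser::new(&tokens);
match parser.parse_token("expected") {
    Ok(_) => println!("matched"),
    Err(ParseError::UnexpectedToken(err)) => {
        println!("expected {}, found {}", err.expected, err.found)
    }
    Err(ParseError::EOF) => println!("ran out of tokens"),
    Err(_) => println!("other parse failure"),
}
```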

## Advanced Usage

### Backtracking and Alternative Parsing

One of the powerful features of Tokenator is its support for backtracking and alternative parsing paths:

```rust
// Try multiple parsing strategies until one succeeds
let result = TokenParser::alt(&mut parser, &[
    |p| parse_strategy_a(p),
    |p| parse_strategy_b(p),
    |p| parse_strategy_c(p),
]);

// Attempt to parse, backtracking to the starting position on failure
let result = parser.try_parse(|p| {
    let token = p.parse_token("specific_token")?;
    // More parsing...
    Ok(token)
});
```

### Parsing Hex Data

The library includes utilities for parsing hexadecimal data:

```rust
use tokenator::parse_hex_id;

// Parse a 32-byte hex string from the next token
let hash: [u8; 32] = parse_hex_id(&mut parser)?;
```

### Custom Delimiters

You can use custom delimiters when serializing tokens:

```rust
// Create a writer with a custom delimiter
let mut writer = TokenWriter::new("|");
writer.write_token("user");
writer.write_token("alice");
// Result: "user|alice"
```

## Best Practices

1. **Implement TokenSerializable for your types**: This ensures consistency between parsing and serialization logic.

2. **Use try_parse for speculative parsing**: When trying different parsing strategies, wrap them in `try_parse` to ensure proper backtracking.

3. **Handle all error cases**: The detailed error types provided by Tokenator help identify and handle specific parsing issues.

4. **Consider memory efficiency**: The parser works with string references to avoid unnecessary copying.

5. **Validate input**: Always validate input tokens before attempting to parse them into your data structures.

## Integration Examples

### Custom Protocol Parser

```rust
use tokenator::{TokenParser, TokenWriter, TokenSerializable, ParseError};

enum Command {
    Get { key: String },
    Set { key: String, value: String },
    Delete { key: String },
}

impl TokenSerializable for Command {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>> {
        let cmd = parser.pull_token()?;

        match cmd {
            "GET" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Get { key })
            }
            "SET" => {
                let key = parser.pull_token()?.to_string();
                let value = parser.pull_token()?.to_string();
                Ok(Command::Set { key, value })
            }
            "DEL" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Delete { key })
            }
            _ => Err(ParseError::UnexpectedToken(tokenator::UnexpectedToken {
                expected: "GET, SET, or DEL",
                found: cmd,
            })),
        }
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        match self {
            Command::Get { key } => {
                writer.write_token("GET");
                writer.write_token(key);
            }
            Command::Set { key, value } => {
                writer.write_token("SET");
                writer.write_token(key);
                writer.write_token(value);
            }
            Command::Delete { key } => {
                writer.write_token("DEL");
                writer.write_token(key);
            }
        }
    }
}
```
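
A round-trip usage sketch for the example above (continuing with the same imports; the output assumes the default `":"` delimiter):

```rust
let tokens = ["SET", "color", "blue"];
let mut parser = TokenParser::new(&tokens);
let cmd = Command::parse_from_tokens(&mut parser).unwrap();

let mut writer = TokenWriter::default();
cmd.serialize_tokens(&mut writer);
assert_eq!(writer.str(), "SET:color:blue");
```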

## Contributing

Contributions to Tokenator are welcome! Here are some areas that could be improved:

- Additional parsing utilities
- Performance optimizations
- More comprehensive test coverage
- Example implementations for common use cases
- Documentation improvements

When submitting a pull request, please ensure:

1. All tests pass
2. New functionality includes appropriate tests
3. Documentation is updated to reflect changes
4. Code follows the existing style conventions
@@ -1,5 +1,75 @@

# Tokenator

Tokenator is a simple, efficient library for parsing and serializing string tokens in Rust. It provides a lightweight solution for working with colon-delimited (or custom-delimited) string formats.

## Features

- Parse colon-delimited (or custom-delimited) string tokens
- Serialize data structures into token strings
- Robust error handling with descriptive error types
- Support for backtracking and alternative parsing routes
- Zero-copy parsing for improved performance
- Hex decoding utilities

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
tokenator = "0.1.0"
```

## Quick Start

```rust
use tokenator::{TokenParser, TokenWriter, TokenSerializable};

// Define a type that can be serialized to/from tokens
struct User {
    name: String,
    age: u32,
}

impl TokenSerializable for User {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, tokenator::ParseError<'a>> {
        // Expect the token "user" first
        parser.parse_token("user")?;

        // Parse name and age
        let name = parser.pull_token()?.to_string();
        let age_str = parser.pull_token()?;
        let age = age_str
            .parse::<u32>()
            .map_err(|_| tokenator::ParseError::DecodeFailed)?;

        Ok(Self { name, age })
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        writer.write_token("user");
        writer.write_token(&self.name);
        writer.write_token(&self.age.to_string());
    }
}

fn main() {
    // Parsing example
    let tokens = ["user", "alice", "30"];
    let mut parser = TokenParser::new(&tokens);
    let user = User::parse_from_tokens(&mut parser).unwrap();
    assert_eq!(user.name, "alice");
    assert_eq!(user.age, 30);

    // Serializing example
    let user = User {
        name: "bob".to_string(),
        age: 25,
    };
    let mut writer = TokenWriter::default();
    user.serialize_tokens(&mut writer);
    assert_eq!(writer.str(), "user:bob:25");
}
```

## License

MIT