# Tokenator Developer Documentation
This document provides detailed information for developers who want to use the Tokenator library in their projects or contribute to its development.
## Core Concepts
Tokenator works with two primary concepts:
- Token Parsing: Converting a sequence of string tokens into structured data
- Token Serialization: Converting structured data into a sequence of string tokens
The library is designed to be simple, efficient, and flexible for working with delimited string formats.
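To make this concrete, here is a minimal round trip sketched against the API documented below; the input string and its `:` delimiter are our own example, not part of the library:

```rust
use tokenator::{TokenParser, TokenWriter};

// Tokens come from splitting the input on the delimiter.
let input = "user:alice";
let tokens: Vec<&str> = input.split(':').collect();
let mut parser = TokenParser::new(&tokens);

// Expect the literal "user", then pull the name that follows it.
if parser.parse_token("user").is_ok() {
    if let Ok(name) = parser.pull_token() {
        // Serialization is the mirror image: write tokens, read a string back.
        let mut writer = TokenWriter::default(); // ":" delimiter
        writer.write_token("user");
        writer.write_token(name);
        assert_eq!(writer.str(), "user:alice");
    }
}
```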
## API Reference

### TokenParser
TokenParser is responsible for parsing tokens from a slice of string references.
```rust
pub struct TokenParser<'a> {
    tokens: &'a [&'a str],
    index: usize,
}
```
Key methods:
- `new(tokens: &'a [&'a str]) -> Self`: Creates a new parser from a slice of string tokens
- `pull_token() -> Result<&'a str, ParseError<'a>>`: Gets the next token and advances the index
- `peek_token() -> Result<&'a str, ParseError<'a>>`: Looks at the next token without advancing the index
- `parse_token(expected: &'static str) -> Result<&'a str, ParseError<'a>>`: Checks if the next token matches the expected value
- `alt<R>(parser: &mut TokenParser<'a>, routes: &[fn(&mut TokenParser<'a>) -> Result<R, ParseError<'a>>]) -> Result<R, ParseError<'a>>`: Tries each parser in `routes` until one succeeds
- `parse_all<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Ensures all tokens are consumed after parsing
- `try_parse<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Attempts to parse and backtracks on failure
- `is_eof() -> bool`: Checks if there are any tokens left to parse
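A short sketch of the core cursor methods in action (the token values here are illustrative):

```rust
use tokenator::TokenParser;

let tokens = ["SET", "key", "value"];
let mut parser = TokenParser::new(&tokens);

// peek_token looks ahead without consuming anything.
assert!(matches!(parser.peek_token(), Ok("SET")));

// parse_token consumes the token only if it equals the expected literal.
assert!(parser.parse_token("SET").is_ok());

// pull_token consumes unconditionally.
assert!(matches!(parser.pull_token(), Ok("key")));
assert!(matches!(parser.pull_token(), Ok("value")));

// is_eof reports whether every token has been consumed.
assert!(parser.is_eof());
```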
### TokenWriter
TokenWriter is responsible for serializing tokens into a string with the specified delimiter.
```rust
pub struct TokenWriter {
    delim: &'static str,
    tokens_written: usize,
    buf: Vec<u8>,
}
```
Key methods:
- `new(delim: &'static str) -> Self`: Creates a new writer with the specified delimiter
- `default() -> Self`: Creates a new writer with `":"` as the delimiter
- `write_token(token: &str)`: Appends a token to the buffer
- `str() -> &str`: Gets the current buffer as a string
- `buffer() -> &[u8]`: Gets the current buffer as a byte slice
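For example, a minimal sketch of writing tokens and inspecting the buffer:

```rust
use tokenator::TokenWriter;

// default() uses ":" as the delimiter.
let mut writer = TokenWriter::default();
writer.write_token("note");
writer.write_token("hello");

// str() views the buffer as text; buffer() exposes the raw bytes.
assert_eq!(writer.str(), "note:hello");
assert_eq!(writer.buffer(), b"note:hello");
```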
### TokenSerializable
TokenSerializable is a trait that types can implement to be serialized to and parsed from tokens.
```rust
pub trait TokenSerializable: Sized {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>>;
    fn serialize_tokens(&self, writer: &mut TokenWriter);
}
```
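Once a type implements the trait, parsing and serialization compose into a round trip. The `round_trip` helper below is our own illustrative sketch, not part of the library; it assumes the default `:` delimiter:

```rust
use tokenator::{TokenParser, TokenSerializable, TokenWriter};

// Illustrative helper: serialize a value, parse it back, and report
// whether the two operations are inverses of each other.
fn round_trip<T: TokenSerializable + PartialEq>(value: &T) -> bool {
    let mut writer = TokenWriter::default();
    value.serialize_tokens(&mut writer);

    // Re-tokenize the serialized form using the same ":" delimiter.
    let serialized = writer.str().to_string();
    let tokens: Vec<&str> = serialized.split(':').collect();
    let mut parser = TokenParser::new(&tokens);

    match T::parse_from_tokens(&mut parser) {
        Ok(parsed) => parsed == *value,
        Err(_) => false,
    }
}
```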
## Error Handling
The library provides detailed error types:
- `ParseError<'a>`: Represents errors that can occur during parsing
  - `Incomplete`: Not done parsing yet
  - `AltAllFailed`: All parsing options failed
  - `DecodeFailed`: General decoding failure
  - `HexDecodeFailed`: Hex decoding failure
  - `UnexpectedToken`: Encountered an unexpected token
  - `EOF`: No more tokens
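A sketch of matching on these variants; this assumes the `UnexpectedToken` payload carries the public `expected` and `found` fields used in the integration example below:

```rust
use tokenator::{ParseError, TokenParser};

let tokens = ["GET"];
let mut parser = TokenParser::new(&tokens);
let _ = parser.pull_token(); // consume the only token

// Pulling past the end surfaces an error we can match on.
match parser.pull_token() {
    Ok(token) => println!("got {token}"),
    Err(ParseError::EOF) => println!("no more tokens"),
    Err(ParseError::UnexpectedToken(ut)) => {
        println!("expected {}, found {}", ut.expected, ut.found);
    }
    Err(_) => println!("parse failed for another reason"),
}
```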
## Advanced Usage

### Backtracking and Alternative Parsing
One of the powerful features of Tokenator is its support for backtracking and alternative parsing paths:
```rust
// Try multiple parsing strategies.
let result = TokenParser::alt(&mut parser, &[
    |p| parse_strategy_a(p),
    |p| parse_strategy_b(p),
    |p| parse_strategy_c(p),
]);
```
```rust
// Attempt to parse but backtrack on failure.
let result = parser.try_parse(|p| {
    let token = p.parse_token("specific_token")?;
    // More parsing...
    Ok(token)
});
```
### Parsing Hex Data
The library includes utilities for parsing hexadecimal data:
```rust
use tokenator::parse_hex_id;

// Parse a 32-byte hex string from the next token.
let hash: [u8; 32] = parse_hex_id(&mut parser)?;
```
### Custom Delimiters
You can use custom delimiters when serializing tokens:
```rust
// Create a writer with a custom delimiter.
let mut writer = TokenWriter::new("|");
writer.write_token("user");
writer.write_token("alice");
// Result: "user|alice"
```
## Best Practices

- Implement `TokenSerializable` for your types: This ensures consistency between parsing and serialization logic.
- Use `try_parse` for speculative parsing: When trying different parsing strategies, wrap them in `try_parse` to ensure proper backtracking.
- Handle all error cases: The detailed error types provided by Tokenator help identify and handle specific parsing issues.
- Consider memory efficiency: The parser works with string references to avoid unnecessary copying.
- Validate input: Always validate input tokens before attempting to parse them into your data structures; see the sketch after this list.
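As an illustration of the last point, a hypothetical structural check before handing tokens to a parser (the fixed-arity `SET` format is our own example):

```rust
use tokenator::TokenParser;

let tokens: Vec<&str> = "SET:name".split(':').collect();

// A SET command carries exactly three tokens (SET, key, value);
// reject malformed input before attempting to parse it.
if tokens.first() == Some(&"SET") && tokens.len() != 3 {
    eprintln!("malformed SET: expected 3 tokens, got {}", tokens.len());
} else {
    let mut parser = TokenParser::new(&tokens);
    // ... proceed with parsing, e.g. Command::parse_from_tokens(&mut parser)
}
```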
## Integration Examples

### Custom Protocol Parser
```rust
use tokenator::{TokenParser, TokenWriter, TokenSerializable, ParseError};

enum Command {
    Get { key: String },
    Set { key: String, value: String },
    Delete { key: String },
}

impl TokenSerializable for Command {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>> {
        let cmd = parser.pull_token()?;
        match cmd {
            "GET" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Get { key })
            }
            "SET" => {
                let key = parser.pull_token()?.to_string();
                let value = parser.pull_token()?.to_string();
                Ok(Command::Set { key, value })
            }
            "DEL" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Delete { key })
            }
            _ => Err(ParseError::UnexpectedToken(tokenator::UnexpectedToken {
                expected: "GET, SET, or DEL",
                found: cmd,
            })),
        }
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        match self {
            Command::Get { key } => {
                writer.write_token("GET");
                writer.write_token(key);
            }
            Command::Set { key, value } => {
                writer.write_token("SET");
                writer.write_token(key);
                writer.write_token(value);
            }
            Command::Delete { key } => {
                writer.write_token("DEL");
                writer.write_token(key);
            }
        }
    }
}
```
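A usage sketch tying the two halves together, assuming the default `:` delimiter:

```rust
let cmd = Command::Set {
    key: "name".to_string(),
    value: "alice".to_string(),
};

// Serialize: the default writer joins tokens with ":".
let mut writer = TokenWriter::default();
cmd.serialize_tokens(&mut writer);
assert_eq!(writer.str(), "SET:name:alice");

// Parse the same wire form back into a Command.
let serialized = writer.str().to_string();
let tokens: Vec<&str> = serialized.split(':').collect();
let mut parser = TokenParser::new(&tokens);
assert!(Command::parse_from_tokens(&mut parser).is_ok());
```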
## Contributing
Contributions to Tokenator are welcome! Here are some areas that could be improved:
- Additional parsing utilities
- Performance optimizations
- More comprehensive test coverage
- Example implementations for common use cases
- Documentation improvements
When submitting a pull request, please ensure:
- All tests pass
- New functionality includes appropriate tests
- Documentation is updated to reflect changes
- Code follows the existing style conventions