Lua in Rust: Combinatory parsing (cont.)
Continued exploration of combinatory parsing in Rust.
Extracting parsed string
Playing around some more, I realized I need a parser that would run a given parser and return a consumed string.
pub fn parsed_string<'a, T>(p: impl Parser<'a, T>) -> impl Parser<'a, (T, &'a str)> {
move |input| {
let (p_result, p_input) = p(input)?;
Some(((p_result, ???), p_input))
}
}
It’s going to be useful in two places: parsing long brackets and numbers. Long brackets
are a form of “raw” string literals: they do not contain any escape characters or anything.
So, I’d be able to write a parser that does nothing and only waits for the closing bracket to appear and
then wrap it in parsed_string
which would give me precisely the string inside the brackets.
As for the numbers: I’d be able to parse numbers as Lua defines them and then give the resulting string
to Rust stdlib to parse the number for me.
#[derive(Copy, Clone)]
struct ParserState<'a> {
input: &'a str,
index: usize,
}
trait Parser<'a, T>: Fn(ParserState<'a>) -> Option<(T, ParserState<'a>)> {}
impl<'a, T, F: Fn(ParserState<'a>) -> Option<(T, ParserState<'a>)>> Parser<'a, T> for F {}
fn run_parser<'a, T>(input: &'a str, p: impl Parser<'a, T>) -> Option<T> {
let s = ParserState { input, index: 0 };
match p(s) {
Some((result, s)) => Some(result),
None => None,
}
}
fn satisfies<'a>(f: impl Fn(char) -> bool) -> impl Parser<'a, char> {
move |s: ParserState<'a>| {
let c = s.input[s.index..].chars().next()?;
if f(c) {
let mut out_s = s;
out_s.index += 1;
return Some((c, out_s));
}
return None;
}
}
fn parsed_string<'a, T>(p: impl Parser<'a, T>) -> impl Parser<'a, (T, &'a str)> {
move |s| {
let (p_result, p_s) = p(s)?;
Some(((p_result, &s.input[s.index..p_s.index]), p_s))
}
}
Instead of passing input around I now pass ParserState
around, which contains an immutable input
and
an index
into it. satisfies
now moves an index
and parsed_string
is able to use indices before and
after p
to get parsed string. And I also added a helper run_parser
that hides ParserState
from the clients.
parser_lib
Here’s all the code I ended up with. There’re also some additional combinators I didn’t mension here:
fmap(f, p)
(I just named it like it’s in Haskell) that applies a functionf
to the result of parserp
. If parserp
fails,fmap(f, p)
also fails; if parserp
givesx
,fmap(f, p)
givesf(x)
and consumes exactly whatp
consumes.choice(p1, p2)
that selects the first passing parser of the two.many(p)
that appliesp
as many times as it can collecting the results and returning them in a vector.many1(p)
that is just likemany
but requiresp
to pass at least once.string_parser(expected_str)
that expects input to start withexpected_str
.
Everything is a Box
Tired but feeling a sense of accomplishment, I tried one last thing:
1
2
3
4
5
6
7
8
9
10
11
#[test]
fn test_complex_parser() {
let mut p = fmap(|_| 3, empty_parser());
assert_eq!(run_parser_impl("abc def", &p), Some((3, "abc def")));
p = fmap(|x: (i32, ())| x.0, seq(p, fail_parser()));
assert_eq!(run_parser_impl("abc def", &p), None);
let q = fmap(|_| 5, string_parser("abc"));
assert_eq!(run_parser_impl("abc def", &q), Some((5, " def")));
p = choice(p, q);
assert_eq!(run_parser_impl("abc def", &p), Some((5, " def")));
}
And got a lovely message:
p = fmap(|x: (i32, ())| x.0, seq(p, fail_parser()));
^^^^^^^^^^^^^^^^^^^^^ expected opaque type, found a different opaque type
Right, so there’s a problem with opaque types. At line 3 p
gets assigned some type from fmap
. And then
at line 5 we’re trying to assign a value of a different opaque type into p
. This is obviously wrong.
The natural thing to do here is to store parser in a heap and make p
point to it. We then would do just that
on line 3, on line 5 and 9 we would create a new parser, put it in an another place in a heap and make p
point to it instead. I tried to do just that, but probably made some mistake as it didn’t work out.
Since I was tired (protip: never do anything when you’re tired), my solution is rather heavy handed:
everything is a Box.
All it does is changes impl Parser<'a, T>
to Box<dyn Parser<'a, T>
. Which puts every parser in a heap.
It also added new lifetime requirement:
fn satisfies<'a, 'b>(f: impl Fn(char) -> bool + 'b) -> Box<dyn Parser<'a, char> + 'b> {
Box::new(move |s: ParserState<'a>| {
let c = s.input[s.index..].chars().next()?;
if f(c) {
let mut out_s = s;
out_s.index += 1;
return Some((c, out_s));
}
None
})
}
f
is now moved into a Box
and so we need to guarantee that stuff inside f
lives at least as long as Parser
inside
the returned box.
The downside of the approach - every parser allocates a place in a heap. It’s suboptimal, I’d look at this again when time comes for performance optimisations.
parser_lib again
And now just a bit more utility parsers to make parser_lib
ready to work on Lua. Code
seq_bind(p, f)
runs parserp
, extracts the resultx
and returns parserf(x)
.seq1(p1, p2)
,seq2(p1, p2)
,seq_(p1, p2)
that extract only the first result (seq1
), the second result (seq2
) or ignore the results completely (seq_
)seqs_[p1, p2, ...]
that’s implemented as a declarative macro and looks like a standardvec!
macro. It just appliesseq_
repeatedly (and now remember that every time a new heap allocation is made… Ouch.).choices_(ps)
that accepts a vector of parsers and chooses the first that passes. It also just applieschoice
in a loop. The reason thatseqs_
couldn’t use a vector and this could -seq_
doesn’t care what return typesp1, p2, ...
have, whilechoices_
requires that all parsers return the same type.try_parser(p)
tries to run parserp
and if it returnsx
, returnsSome(x)
and if it fails, returnsNone
without consuming input.not_parser(p)
that tries to run parserp
and if it succeeds, fails, and if it fails, returns()
without consuming input.