The first order of business is to parse text and turn it into Lua syntax tree.

Combinatory parsing

From my days of Haskell programming I know an instrument called combinatory parsing for creating recursive descent parsers. Parser combinators allow to produce very readable parsers without using parser generator tools like flex and yacc.

The main idea is that parser is a function from an input to the resulting value together with the rest of input. And since it’s just a function, you can write it by hand, or you can take 2 parsers and compose them together, or (if your language treats functions as first-class citizens) you can write a function that takes parser as an input and outputs another parser.

Rust implementation

In Rust we can express parsers like that:

fn parse<'a, T>(input: &'a str) -> Option<(T, &'a str)> {
  ...
}

A function that takes a string slice, and optionally returns a T with the rest of the slice. I choose to explicitly annotate lifetimes because it makes it easier for me to understand them. Here by using the same 'a on both slices I want to express that they point to one and the same data (albeit to different portions of it).

Some basic parsers:

  • Empty parser: always succeeds, does not consume any input.
fn empty_parser<'a>(input: &'a str) -> Option<((), &'a str)> {
  Some(((), input))
}
  • Failing parser: never succeeds.
fn fail_parser<'a>(input: &'a str) -> Option<((), &'a str)> {
  None
}
  • A parser that checks if the next character in input satisfies f (returning the character if it does)
fn satisfies<'a>(f: impl Fn(char) -> bool,
                 input: &'a str) -> Option<(char, &'a str)> {
    // Here I mean to say "take next character if any". And if there isn't any,
    // the function immediately returns None. That's what ? does.
    let c = input[0..].chars().next()?;
    if f(c) {
        // If character c satisfies f, return it and forward input 1 character.
        Some((c, &input[1..]))
    } else {
        // Otherwise just fail.
        None
    }
}

impl Fn(char) -> bool means “any type that implements trait Fn(char) -> bool” and trait Fn(char) -> bool is a trait that is implemented by closures that take char argument and produce bool.

  • A parser that takes two parsers and applies them one after another.
fn seq<'a, T1, T2>(p1: impl Fn(&'a str) -> Option<(T1, &'a str)>,
                   p2: impl Fn(&'a str) -> Option<(T2, &'a str)>,
                   input: &'a str) -> Option<((T1, T2), &'a str)> {
    // Try running p1.
    let (p1_result, input) = p1(input)?;
    // Try running p2 on the rest of the input from p1.
    let (p2_result, input) = p2(input)?;
    // And aggregate the result returning the rest of the input from p2.
    Some(((p1_result, p2_result), input))
}

Just like a previous parser, this one takes functions as arguments, however, this time these functions are parsers. So seq is the first example of a parser combinator.

Now, let’s try to use this parser library.

fn char_parser<'a>(c: char, input: &'a str) -> Option<(char, &'a str)> {
  satisfies(|in_c| in_c == c, input)
}

// This parser expects input to start with string "ab"
fn ab_parser<'a>(input: &'a str) -> Option<((char, char), &'a str)> {
    seq(|input| char_parser('a', input), |input| char_parser('b', input), input)
}

I don’t like this ab_parser much: it’s too verbose with explicit input passing.

Making it more functional

So, what if our parser functions stop being parsers themselves and would return parsers instead?

fn empty_parser<'a>() -> impl Fn(&'a str) -> Option<((), &'a str)> {
    |input| Some(((), input))
}

fn fail_parser<'a>() -> impl Fn(&'a str) -> Option<((), &'a str)> {
    |input| None
}

fn satisfies<'a>(f: impl Fn(char) -> bool) ->
                 impl Fn(&'a str) -> Option<(char, &'a str)> {
    // `move` here means to move f into the closure instead of borrowing it.
    move |input| {
        let c = input[0..].chars().next()?;
        if (f(c)) {
            Some((c, &input[1..]))
        } else {
            None
        }
    }
}

fn seq<'a, T1, T2>(p1: impl Fn(&'a str) -> Option<(T1, &'a str)>,
                   p2: impl Fn(&'a str) -> Option<(T2, &'a str)>) ->
                   impl Fn(&'a str) -> Option<((T1, T2), &'a str)> {
    move |input| {
        let (p1_result, input) = p1(input)?;
        let (p2_result, input) = p2(input)?;
        Some(((p1_result, p2_result), input))
    }
}

What changed here is input: &'a str has migrated to the return type with impl Fn(&'a str) and into the function body as |input|. This means that empty_parser, fail_parser, satisfies and seq now return closures that take &'a str and return Option<(..., &'a str)>. impl on the return type position is interesting: it’s so-called opaque type. It’s in fact some concrete type but users of the function only know that implements a trait Fn(...) -> ....

The library usage now looks like:

fn char_parser<'a>(c: char) -> impl Fn(&'a str) -> Option<(char, &'a str)> {
    satisfies(move |in_c| in_c == c)
}

fn ab_parser<'a>() -> impl Fn(&'a str) -> Option<((char, char), &'a str)> {
    seq(char_parser('a'), char_parser('b'))
}

Way nicer.

Day 1 and it’s already time for hacks

What’s not so nice is all the impl Fn(&'a str) -> Option<(..., &'a str)> lying around. What I really want is to make a trait alias, but there’s no such thing yet (see here). So, time for a hack:

trait Parser<'a, T>: Fn(&'a str) -> Option<(T, &'a str)> {}
impl<'a, T, F : Fn(&'a str) -> Option<(T, &'a str)>> Parser<'a, T> for F {}

I’m creating a brand new empty trait Parser<'a, T> that extends Fn(&'a str) -> Option<(T, &'a str)> and also add an implementation of Parser<'a, T> to all types F that implement Fn(&'a str) -> Option<(T, &'a str)>. This way all types that implement Parser<'a ,T> implement Fn(&'a str) -> Option<(T, &'a str)> (because of inheritance) and all types that implement Fn(&'a str) -> Option<(T, &'a str)> implement Parser<'a, T> (because impl rule explicitly says so).

fn empty_parser<'a>() -> impl Parser<'a, ()> {
    |input| Some(((), input))
}

fn fail_parser<'a>() -> impl Parser<'a, ()> {
    |input| None
}

fn satisfies<'a>(f: impl Fn(char) -> bool) -> impl Parser<'a, char> {
    move |input| {
        let c = input[0..].chars().next()?;
        if (f(c)) {
            Some((c, &input[1..]))
        } else {
            None
        }
    }
}

fn seq<'a, T1, T2>(p1: impl Parser<'a, T1>,
                   p2: impl Parser<'a, T2>) ->
                   impl Parser<'a, (T1, T2)> {
    move |input| {
        let (p1_result, input) = p1(input)?;
        let (p2_result, input) = p2(input)?;
        Some(((p1_result, p2_result), input))
    }
}

Noise at the type level is eliminated.

Outline

  1. Lua in Rust: Introduction
  2. (This one) Lua in Rust: Combinatory parsing
  3. Lua in Rust: Combinatory parsing (cont.)
  4. Lua in Rust: Lua lexemes
  5. Lua in Rust: More parsing
  6. Lua in Rust: Left recursion