Fred on Programming: Tools for Writing a Parser

People have been writing tools to generate scanners and parsers for decades. YACC is probably the most famous. It was created in the 1970s, and, since it stands for Yet Another Compiler Compiler, it probably wasn't the first attempt at the problem.

YACC is a pain to use, though. It uses a parsing algorithm that has great worst-case performance but causes massive headaches for programmers. Essentially, you have to ensure that your grammar conforms to the LALR(1) rules. You'd better know what that means before you use it, too!

Well, I wanted to accomplish a task instead of learning about parsing algorithms. I looked around for alternatives. Wow! There are a lot of possibilities out there!

If you look through them, though, most use the same algorithm as YACC or a close relative of it. I ended up looking at two that didn't: Accent and Spirit.

I've used Spirit a few times before, and it works great. It's a C++ library created by Joel de Guzman and others. It lets you write the scanner and parser directly in C++ using expression templates. It's extremely flexible, fairly slow, and mostly easy to use. I decided against it, though, because I was curious what it was competing against.

Accent appears to be its major competition if I didn't want to learn about LL(1) grammars and left/reduce vs reduce/reduce conflicts. (Yeah, I got that wrong. Feel free to prove me wrong by explaining why in a few simple sentences!)

Accent is a fairly small program that generates some pretty nice C code. (I tried to embed some C++ in the semantic actions and failed. Just accept that you'll have to manage memory yourself for a few moments.) Anyway, it's pretty easy to use. On my first try, I created a grammar that turned out to be "ambiguous". I didn't bother to figure out its algorithm. I just added a bit of logic to disambiguate it, and I moved on with my life. Very nice!
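If "ambiguous" is a fuzzy word for you, here's a rough sketch of what it means. This is not my actual grammar and it is not Accent syntax. It's a made-up rule, expr : expr '-' expr | NUMBER, with two hypothetical helper functions (all_parses and left_assoc) written in plain C++. The first one collects the value of every possible parse tree. The second one applies a single disambiguating rule: subtraction is left-associative.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical ambiguous grammar (for illustration only):
//   expr : expr '-' expr | NUMBER ;
// The input "n0 - n1 - ... - nk" is represented by its numbers alone.

// Collect the value of every parse tree for nums[lo..hi] (inclusive).
std::set<int> all_parses(const std::vector<int>& nums,
                         std::size_t lo, std::size_t hi) {
    std::set<int> results;
    if (lo == hi) {
        results.insert(nums[lo]);  // expr : NUMBER
        return results;
    }
    // Any '-' between lo and hi may sit at the root of the tree.
    for (std::size_t split = lo; split < hi; ++split) {
        for (int left : all_parses(nums, lo, split)) {
            for (int right : all_parses(nums, split + 1, hi)) {
                results.insert(left - right);  // expr : expr '-' expr
            }
        }
    }
    return results;
}

// Disambiguated reading: '-' is left-associative, so fold left to right.
// Assumes nums is not empty.
int left_assoc(const std::vector<int>& nums) {
    int value = nums[0];
    for (std::size_t i = 1; i < nums.size(); ++i) {
        value -= nums[i];
    }
    return value;
}
```

For the input 1 - 2 - 3, all_parses finds two different answers, (1 - 2) - 3 and 1 - (2 - 3), and that's exactly what an ambiguous grammar is: more than one tree for the same text. The extra rule in left_assoc picks one of them.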

Oh, one more thing. Spirit uses the same technology to build both the scanner and the parser. It's really pretty slick. If you're creating an AST, it's even possible to make a trivial scanner and do both the tokenizing and the parsing in one grammar. This really slows things down a lot, but it is possible! (Don't use Spirit if speed is your goal. Code generators are still better. Spirit 2.0 may change that, though. We'll see.)
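Here's a minimal sketch of what "tokenizing and parsing in one grammar" looks like. To be clear, this is not Spirit code. It's a tiny hand-written recursive descent parser, and the grammar, the CharParser class, and the evaluate_sum function are all made up for illustration. The point is that the rules are written over characters, so whitespace and digits get handled inside the grammar instead of in a separate scanner.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical grammar, written over characters instead of tokens:
//   sum    : number ( '+' number )* ;
//   number : space* digit+ space* ;
class CharParser {
public:
    explicit CharParser(const std::string& text) : text_(text), pos_(0) {}

    // sum : number ( '+' number )* ;  Returns false on a syntax error.
    bool parse_sum(long& out) {
        long value = 0;
        if (!parse_number(value)) return false;
        while (pos_ < text_.size() && text_[pos_] == '+') {
            ++pos_;  // consume '+'
            long rhs = 0;
            if (!parse_number(rhs)) return false;
            value += rhs;
        }
        out = value;
        return pos_ == text_.size();  // reject trailing garbage
    }

private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    static bool is_digit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    void skip_spaces() {
        while (pos_ < text_.size() && is_space(text_[pos_])) ++pos_;
    }

    // number : space* digit+ space* ;  (overflow is not checked)
    bool parse_number(long& out) {
        skip_spaces();
        const std::size_t start = pos_;
        long value = 0;
        while (pos_ < text_.size() && is_digit(text_[pos_])) {
            value = value * 10 + (text_[pos_] - '0');
            ++pos_;
        }
        if (pos_ == start) return false;  // needed at least one digit
        skip_spaces();
        out = value;
        return true;
    }

    const std::string& text_;
    std::size_t pos_;
};

// Convenience wrapper: parse and evaluate a whole string in one call.
bool evaluate_sum(const std::string& text, long& out) {
    return CharParser(text).parse_sum(out);
}
```

Calling evaluate_sum(" 1 + 20+3 ", result) succeeds and sets result to 24, while "1 + + 2" is rejected. Notice that every rule has to think about whitespace. That's the work a separate scanner would normally do once, up front.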

Anyway, Accent uses Flex to generate the scanner. It's another file format you'll have to learn, but it's extremely simple. Spirit isn't really gaining much there.

Which one is better? Spirit is incredibly flexible, but Accent produces much faster code. They're both pretty easy to use.

This originally appeared in PC-Doctor's blog.


Wednesday, February 13, 2008
