Wednesday, February 13, 2008

Tools for Writing a Parser

People have been writing tools to generate scanners and parsers for decades. YACC is probably the most famous. It was created in the 1970s, and, since it stands for Yet Another Compiler Compiler, it probably wasn't the first attempt at the problem.

YACC is a pain to use, though. It uses a parsing algorithm that has great worst-case performance but causes massive headaches for programmers. Essentially, you have to ensure that your grammar conforms to the LALR(1) rules. You'd better know what that means before you use it, too!


Well, I wanted to accomplish a task instead of learning about parsing algorithms. I looked around for alternatives. Wow! There are a lot of possibilities out there!

If you look at that link, though, most of them use the same algorithm as YACC or a similar one. I ended up looking at two that didn't: Accent and Spirit.

I've used Spirit a few times before, and it works great. It's a great C++ library created by Joel de Guzman and others. It lets you write the scanner and parser using a C++ expression template library. It's extremely flexible, fairly slow, and mostly easy to use. I decided against it, though, because I was curious what it was competing against.
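If you haven't seen a grammar written as C++ expressions, here's a toy sketch of the general idea. To be clear, this is not Spirit's API and it doesn't use real expression templates (it leans on std::function to stay short). Every name in it (Parser, ch, digit, parse_all, number_list) is made up for illustration. It only shows how overloaded operators can make C++ code read like a grammar, and how the same building blocks handle both characters and structure.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// A parser tries to match input starting at `pos`.
// On success it advances `pos` and returns true; on failure it leaves `pos` unchanged.
struct Parser {
    std::function<bool(const std::string&, std::size_t&)> run;
};

// Match one specific character.
Parser ch(char c) {
    return Parser{[c](const std::string& s, std::size_t& pos) -> bool {
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }};
}

// Match a single decimal digit.
Parser digit() {
    return Parser{[](const std::string& s, std::size_t& pos) -> bool {
        if (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
            ++pos;
            return true;
        }
        return false;
    }};
}

// Sequence: a >> b matches a followed by b, or nothing at all.
Parser operator>>(Parser a, Parser b) {
    return Parser{[a, b](const std::string& s, std::size_t& pos) -> bool {
        std::size_t start = pos;
        if (a.run(s, pos) && b.run(s, pos)) return true;
        pos = start;  // undo any partial match
        return false;
    }};
}

// Alternative: a | b tries a first, then b.
Parser operator|(Parser a, Parser b) {
    return Parser{[a, b](const std::string& s, std::size_t& pos) -> bool {
        return a.run(s, pos) || b.run(s, pos);
    }};
}

// Repetition: *a matches a zero or more times and always succeeds.
Parser operator*(Parser a) {
    return Parser{[a](const std::string& s, std::size_t& pos) -> bool {
        std::size_t before;
        do { before = pos; } while (a.run(s, pos) && pos != before);
        return true;
    }};
}

// Repetition: +a matches a one or more times.
Parser operator+(Parser a) { return a >> *a; }

// Succeeds only if the parser consumes the whole input.
bool parse_all(const Parser& p, const std::string& input) {
    std::size_t pos = 0;
    return p.run(input, pos) && pos == input.size();
}

// Grammar: number (',' number)*, where number is one or more digits.
Parser number_list() {
    Parser number = +digit();
    return number >> *(ch(',') >> number);
}

// A couple of quick checks on the grammar above.
void self_check() {
    assert(parse_all(number_list(), "1,22,333"));
    assert(!parse_all(number_list(), "1,,2"));
}
```

Notice that there's no separate tokenizer here: the "number" rule works directly on characters. Calling self_check() from main is enough to try it out. A real library does a lot more (semantic actions, recursion, error reporting), so treat this as a picture of the style and nothing else.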

Accent appears to be its major competition if I didn't want to learn about LL(1) grammars and left/reduce vs reduce/reduce conflicts. (Yeah, I got that wrong. Feel free to prove me wrong by explaining why in a few simple sentences!)

Accent is a fairly small program that generates some pretty nice C code. (I tried to embed some C++ in the semantic actions and failed. Just accept that you'll have to manually manage memory yourself for a few moments.) Anyway, it's pretty easy to use. On my first try, I created a grammar that turned out to be "ambiguous". I didn't bother to figure out its algorithm. I just added a bit of logic to disambiguate it, and I moved on with my life. Very nice!

Oh, one more thing. Spirit uses the same technology to build both the scanner and the parser. It's really pretty slick. If you're creating an AST, it's even possible to make a trivial scanner and do both the tokenizing and the parsing in one grammar. This really slows things down a lot, but it is possible! (Don't use Spirit if speed is your goal. Code generators are still better. Spirit 2.0 may change that, though. We'll see.)

Anyway, Accent uses Flex to generate the scanner. It's another file format you'll have to learn, but it's extremely simple. Spirit isn't really gaining much there.

Which one is better? Spirit is incredibly flexible, but Accent produces much faster code. They're both pretty easy to use.

This originally appeared in PC-Doctor's blog.

Tuesday, February 5, 2008

Interfaces, Friends, and the .NET Framework

The .NET Framework has a lot of really great things in it. I've just started playing with a few corners of it, and I love the amount of stuff that it's got in it. Some things really irritate me, though, and it's a lot more satisfying to talk about that stuff!

C# and the CLR make it really hard to hide information. First of all, the .NET Framework is built around inheritance. Everything is inherited from something else, and if you want to extend an object, then you're going to inherit, too. Inheritance hides almost nothing, but you already knew that, and, presumably, you don't use it as much as the programmers in Redmond.

That's not what I want to talk about here. I'm going to complain about the member access rights that C# and the CLR support.

In C#, there are five different member access rights. The usual private, protected, and public member access rights are supported. C# also supports "internal", which is the same as not exporting a function from your DLL in C++, and "protected internal", which grants access to derived classes and to anything else in the same assembly. The CLR supports one additional one called "Family and Assembly", which is the same as a protected member that is not exported from the DLL.

It's missing two extremely useful access rights, however: Java's package level scope and C++'s friendship. Both of these allow a programmer to grant access to a limited number of functions and classes. Java's package level scope is the optimal one, in my opinion. It allows a programmer to give limited access to a limited number of classes. Friendship allows complete access to individual classes and functions.
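For anyone who hasn't used C++ friendship, here's a minimal sketch of what I mean. The Account class and the audit function are hypothetical names invented for this example. The point is only that one outside function gets access to the private data while everything else stays locked out.

```cpp
#include <cassert>
#include <string>

class Account;

// The one outside function that is allowed to see Account's internals.
std::string audit(const Account& account);

class Account {
public:
    explicit Account(int balance) : balance_(balance) {}
    void deposit(int amount) { balance_ += amount; }

private:
    int balance_;  // hidden from everyone except members and friends

    // Grant access to exactly one function; nothing else gains access.
    friend std::string audit(const Account& account);
};

// Allowed to read balance_ because Account declared it a friend.
std::string audit(const Account& account) {
    return "balance=" + std::to_string(account.balance_);
}

// A non-friend function gets no such access:
// int peek(const Account& a) { return a.balance_; }  // would not compile

// A quick check that the friend really can read the private member.
void friend_demo() {
    Account account(10);
    account.deposit(5);
    assert(audit(account) == "balance=15");
}
```

The class itself decides who its friends are, so the access can't be claimed from the outside. The friend does get to see everything in the class, though, which is the "complete access" tradeoff.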

In the C# world, you're expected to grant public access to functions that should only be used by one other function outside your class. You're supposed to be happy about it, too!

This originally appeared on the PC-Doctor blog.