Wednesday, February 10, 2010

Reproduction

I recently posted some thoughts and status updates to the mailing list, which I thought I'd reproduce here.

The big wave in the small world of alternative PHP implementations is Facebook's announcement of HipHop. I haven't seen the source code yet, so rather than comment on it, I'll just note that Facebook (for better or worse) didn't attempt any sort of cooperation with Roadsend or phc (or anyone else, as far as I know) when starting their project. I'll also highlight a humorous (but true) comment that Paul Biggar (author of phc) makes in a recent blog post [1]:

"... I’m also slightly annoyed that people all of a sudden care about PHP compilers. I worked on one for 4 years and I could not convince anyone to give a shit. But now that its got the Facebook logo on it, all of a sudden PHP compilers are the greatest thing ever. Bah."

Anyway, when we see the source, we can determine how closely the goals of HipHop coincide with those of Raven. I'm intrigued that they (apparently) opted for their own C++ runtime (as we did with rphp).

For now though, here's what we've been up to and where we're heading.

Recent work has been all about our frontend. We've been working towards two goals: a fast, memory-efficient parser, and a parse tree (with matching transformation API) that is mostly compatible with the one used by phc. For the former, we've adopted an AST design very similar to the one used in Clang (using LLVM's memory pool, string pool, child node iteration technique, etc.). The latter matters because we plan to use the PHP source-level analyzer and optimizer that is part of the phc project. While fleshing out these changes, we produced a new static analysis tool (rphp-analyzer), which parses and analyzes PHP source files, runs the available passes, produces messages, and dumps to XML.
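To give a feel for the AST design mentioned above, here is a minimal sketch of the three ideas (pooled node memory, interned strings, and child iteration through plain pointers). It is not the actual rphp or Clang code: `Arena`, `StringPool`, `Node`, `makeNode` and `countNodes` are hypothetical names made up for this illustration, and the real thing uses LLVM's own allocator and string pool rather than these stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <new>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical bump allocator: memory is carved out of fixed-size slabs and
// released all at once when the arena is destroyed (no per-node delete).
class Arena {
public:
    void* allocate(std::size_t size, std::size_t align) {
        // Sketch-level limits: power-of-two alignment, request fits in a slab.
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t) && size <= kSlabSize);
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (slabs_.empty() || offset + size > kSlabSize) {
            slabs_.push_back(std::make_unique<unsigned char[]>(kSlabSize));
            offset = 0;
        }
        used_ = offset + size;
        return slabs_.back().get() + offset;
    }

private:
    static constexpr std::size_t kSlabSize = 4096;
    std::vector<std::unique_ptr<unsigned char[]>> slabs_;
    std::size_t used_ = 0;  // bytes used in the newest slab
};

// Hypothetical string pool: each distinct string is stored once, so equal
// identifiers can be compared by pointer.
class StringPool {
public:
    const std::string* intern(std::string_view s) {
        // Pointers to unordered_set elements stay valid across rehashing.
        return &*pool_.insert(std::string(s)).first;
    }

private:
    std::unordered_set<std::string> pool_;
};

// Hypothetical AST node. Children live in an arena-allocated array and are
// visited through a begin/end pointer pair.
struct Node {
    enum Kind { Echo, BinaryOp, Variable };
    Kind kind;
    const std::string* text;  // interned spelling
    Node** children;
    std::size_t numChildren;

    Node** child_begin() { return children; }
    Node** child_end() { return children + numChildren; }
};

// Builds a node and its child array inside the arena.
Node* makeNode(Arena& arena, Node::Kind kind, const std::string* text,
               std::initializer_list<Node*> kids = {}) {
    Node* n = new (arena.allocate(sizeof(Node), alignof(Node)))
        Node{kind, text, nullptr, 0};
    if (kids.size() != 0) {
        n->children = static_cast<Node**>(
            arena.allocate(sizeof(Node*) * kids.size(), alignof(Node*)));
        std::uninitialized_copy(kids.begin(), kids.end(), n->children);
        n->numChildren = kids.size();
    }
    return n;
}

// Example pass: counts the nodes in a subtree using only child iteration.
std::size_t countNodes(Node* n) {
    std::size_t total = 1;
    for (Node** it = n->child_begin(); it != n->child_end(); ++it)
        total += countNodes(*it);
    return total;
}
```

With something like this, a tree for `echo $a + $a;` would be built bottom-up with `makeNode`, both `$a` nodes would share one interned string, and the whole tree would go away when the `Arena` does.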

We're now at the point where the parser is largely complete for PHP 5.2, and we've already begun porting phc lowering passes to rphp. Unfortunately, LLVM IR code generation has been suspended while this work was going on, but I think we're almost at the point where simple code generation can resume.

So now that a lot of groundwork has been completed, we're hoping that the real guts of the phc analyzer and optimizer can be ported into rphp. I say "ported" because, even though both projects are in C++, for purposes of speed and efficiency[2] we've made several design decisions that make our data structures and API a bit different from those in phc. This means there is some work involved in moving, say, a pass from phc to rphp, but on the whole we've tried to minimize this overhead and it seems to be working out well so far.

Otherwise, we're setting up a testing framework for the passes, and the grammar is still waiting for PHP 5.3/6 language constructs and general improvements. Another important goal I'd like to get to soon is making sure we can produce good diagnostics and error messages during parse and transform.

Because we've been focused on the frontend, the runtime hasn't changed recently. Now that we're almost ready to resume code generation, though, we'll get back into that as well.

In a nutshell, that's where we're at, and what we'll be working on for the time being.

Finally, if you've been following Roadsend PHP at all and are dismayed at the speed with which we're progressing, it's worth noting that since this project went open source in 2007, it has had no financial support and is simply a grassroots labor of love[3] for those who contribute. If you're interested in lending a hand, join us in #roadsend on irc.freenode.net!

[1] http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/
[2] Ostensibly, anyway ... we don't have real benchmarks yet
[3] OK, "enjoyable distraction", at least :)