Monday, August 9, 2010

Current Status

Hey folks, since several people have spoken up about the lack of movement on the project, I thought I'd give a quick update.

We've got plenty of people who are interested in seeing this project move forward; unfortunately, few of them are willing or able to help with the actual development. I had hoped to pick up at least one or two other developers by now. This project is sufficiently audacious that I can't hope to achieve the goals I set for it by myself in the time I have available. I'm happy with the work that's been done so far and I'd like to see it continue. If and when other developers start contributing, I'll be happy to get back to work on it. Until then, my contributions will be spotty.

So, if you know some C++ and you'd like to see this move forward, let me know!

If you're interested in language development and you're willing to try something new, you can also check out another project I'm involved with, Crack: http://code.google.com/p/crack-language/

Wednesday, February 10, 2010

Reproduction

I made a recent mailing list post with some thoughts and status updates, which I thought I'd reproduce here.
The big wave in the small world of alternative PHP implementations is the announcement of HipHop from Facebook. I haven't seen the source code yet, so rather than comment on it, I'll just note that Facebook (for better or worse) didn't attempt any sort of cooperation with either Roadsend or phc (or any others, as far as I know) when starting their project, and highlight a humorous (but true) comment that Paul Biggar (author of phc) made in a recent blog post [1]:

"... I’m also slightly annoyed that people all of a sudden care about PHP compilers. I worked on one for 4 years and I could not convince anyone to give a shit. But now that its got the Facebook logo on it, all of a sudden PHP compilers are the greatest thing ever. Bah."

Anyway, when we see the source, we can determine how closely the goals of HipHop coincide with those of Raven. I'm intrigued that they (apparently) opted for their own C++ runtime (as we did with rphp).

For now though, here's what we've been up to and where we're heading.

Recent work is all about our frontend. We've been working towards two goals: a fast, memory-efficient parser, and a parse tree (with a matching transformation API) that is mostly compatible with the one used by phc. For the former, we've adopted an AST design that's very similar to the one used in Clang (using LLVM's memory pool, string pool, child node iteration technique, etc.). For the latter, the motivation is that we plan to use the PHP source-level analyzer and optimizer that is part of the phc project. While fleshing out these changes, we produced a new static analysis tool, rphp-analyzer, which parses and analyzes PHP source files, runs the available passes, produces messages, and dumps its results to XML.
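To illustrate the Clang-style child iteration mentioned above, here's a rough C++ sketch. It is not rphp's actual AST: the class names (Node, Var, Assign, Block, ChildRange) and the collectVars pass are hypothetical. The idea is that each node type exposes its children as a uniform range pointing into its own storage, so a generic pass can walk any tree without knowing each node's layout.

```cpp
#include <string>
#include <utility>
#include <vector>

class Node;

// A half-open range of child pointers, usable in range-based for loops.
struct ChildRange {
    Node* const* first;
    Node* const* last;
    Node* const* begin() const { return first; }
    Node* const* end() const { return last; }
};

// Hypothetical AST base class. Each subclass reports its children by pointing
// a ChildRange into its own storage, so passes never need the concrete layout.
class Node {
public:
    virtual ~Node() = default;
    virtual ChildRange children() { return {nullptr, nullptr}; }  // leaves have none
};

class Var : public Node {
public:
    explicit Var(std::string n) : name(std::move(n)) {}
    std::string name;
};

class Assign : public Node {
public:
    Assign(Node* lhs, Node* rhs) : subs_{lhs, rhs} {}
    ChildRange children() override { return {subs_, subs_ + 2}; }
private:
    Node* subs_[2];  // [0] = assignment target, [1] = assigned value
};

class Block : public Node {
public:
    explicit Block(std::vector<Node*> stmts) : stmts_(std::move(stmts)) {}
    ChildRange children() override {
        return {stmts_.data(), stmts_.data() + stmts_.size()};
    }
private:
    std::vector<Node*> stmts_;
};

// Example analysis pass: pre-order walk that records every variable name.
void collectVars(Node* n, std::vector<std::string>& out) {
    if (auto* v = dynamic_cast<Var*>(n)) out.push_back(v->name);
    for (Node* child : n->children()) collectVars(child, out);
}
```

A pass written this way keeps working when new node types are added, as long as each new type overrides children().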

We're now at the point where the parser is fairly complete for PHP 5.2, and we've already begun porting phc lowering passes to rphp. Unfortunately, LLVM IR code generation was suspended while this work was going on. I think we're almost at the point where simple code generation can resume.

So now that a lot of ground work has been completed, we're hoping that the real guts of the phc analyzer and optimizer can be ported into rphp. I say "ported", because even though both projects are in C++, for purposes of speed and efficiency[2] we've made several design decisions that make our data structures and API a bit different from those in phc. This means there is some work involved in moving e.g. a pass from phc to rphp, but on the whole we've tried to minimize this overhead and it seems to be working out well so far.

Otherwise, we're setting up a testing framework for the passes, and the grammar is still waiting for PHP 5.3/6 language constructs and general improvements. Another important goal I'd like to get to soon is making sure we can produce good diagnostics and error messages during parse and transform.

Because we've been focused on the frontend, the runtime hasn't changed recently. Now that we're almost ready to resume code generation, though, we'll get back into that as well.

In a nutshell, that's where we're at, and what we'll be working on for the time being.

Finally, if you've been following Roadsend PHP at all and are dismayed at how slowly we're progressing, it's worth noting that since this project went open source in 2007, it has had no financial support and is simply a grassroots labor of love[3] for those who contribute. If you're interested in lending a hand, join us in #roadsend on irc.freenode.net!
[1] http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/
[2] Ostensibly, anyway... we don't have real benchmarks yet.
[3] Ok, "enjoyable distraction", at least :)

Tuesday, January 26, 2010

Front Ended

Quick update:

Between 12/14 and 1/14 we managed 90 commits' worth of parser and other front end improvements.

rphp-analyzer is now created during the build. This is a new tool just for static analysis, which we're using to help perfect the parser. We can now parse quite a bit of standard PHP5 code, and we've started to port several passes from the phc project into the rphp framework.

We expect basic code generation to find its way back into the build soon!

Monday, December 7, 2009

Dataflow

Attention has turned back to the compiler frontend. Some recent work includes:
  • Removal of the preprocessor. This was used (pre-parse) to convert double-quoted strings into their single-quoted equivalents, interpreting escape codes and interpolated variables in the process. It has been removed in favor of doing the same work after parsing, with an AST transform (which is not yet complete).
  • A new parsing context allocates memory for AST nodes using an LLVM BumpPtr allocator (memory pool). This design is similar to the one used in Clang.
  • We've also adopted Clang's use of a common iterator class for the children of AST nodes.
  • Identifiers in the AST are now allocated from a common StringPool, which saves memory by storing each unique string only once; every occurrence of a common identifier points to that single copy.
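Here's a simplified, self-contained sketch of the two memory techniques above. It deliberately doesn't use LLVM: BumpArena is a hypothetical stand-in for llvm::BumpPtrAllocator, and StringPool, VarNode and ParseContext are illustrative names rather than rphp's real classes. AST nodes are carved out of large slabs by bumping a pointer, and identifiers are interned so that repeated names share one string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

constexpr std::size_t kSlabSize = 4096;  // default slab size (arbitrary choice)

// Hypothetical, simplified stand-in for llvm::BumpPtrAllocator: memory is handed
// out by bumping a pointer through slabs and released only with the whole arena.
class BumpArena {
public:
    void* allocate(std::size_t size, std::size_t align) {
        assert((align & (align - 1)) == 0 && "alignment must be a power of two");
        std::uintptr_t aligned = alignUp(reinterpret_cast<std::uintptr_t>(cur_), align);
        if (cur_ == nullptr || aligned + size > reinterpret_cast<std::uintptr_t>(end_)) {
            // Start a new slab; oversized requests get a slab of their own.
            std::size_t slabSize = size + align > kSlabSize ? size + align : kSlabSize;
            slabs_.push_back(std::make_unique<char[]>(slabSize));
            cur_ = slabs_.back().get();
            end_ = cur_ + slabSize;
            aligned = alignUp(reinterpret_cast<std::uintptr_t>(cur_), align);
        }
        cur_ = reinterpret_cast<char*>(aligned + size);
        return reinterpret_cast<void*>(aligned);
    }
    std::size_t slabCount() const { return slabs_.size(); }

private:
    static std::uintptr_t alignUp(std::uintptr_t p, std::size_t align) {
        return (p + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    }
    std::vector<std::unique_ptr<char[]>> slabs_;
    char* cur_ = nullptr;
    char* end_ = nullptr;
};

// Each distinct identifier spelling is stored once; nodes hold pointers to it.
class StringPool {
public:
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;  // element addresses survive rehashing
    }
    std::size_t size() const { return pool_.size(); }
private:
    std::unordered_set<std::string> pool_;
};

struct VarNode {
    const std::string* name;  // shared, interned spelling
};

// Hypothetical parse context owning both pools. Nodes are placement-new'd into
// the arena and never destroyed individually, so keep them trivially destructible.
class ParseContext {
public:
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        void* mem = arena_.allocate(sizeof(T), alignof(T));
        return new (mem) T{std::forward<Args>(args)...};
    }
    StringPool& idents() { return idents_; }
    std::size_t slabCount() const { return arena_.slabCount(); }
private:
    BumpArena arena_;
    StringPool idents_;
};
```

Two VarNodes for "count" end up as distinct nodes in the arena but share a single interned string, so comparing identifiers reduces to comparing pointers.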
The next exciting development relates to another open source PHP compiler effort, phc. We've been working with the project's maintainer, Paul Biggar, to see how we can combine efforts towards common goals. The phc project itself has a different philosophy from Roadsend PHP, namely that it uses the PHP runtime rather than a custom one. This lets it take advantage of all of the current PHP extensions and maintains very good compatibility with Zend PHP, at the expense of being limited by Zend engine/runtime design decisions.

However, there's room for cooperation on the frontend. The phc authors have put much effort into parsing and source level analysis of PHP (Paul recently completed his PhD thesis on the subject). We're currently investigating how we can take advantage of this analysis in Roadsend PHP. The goal would be to use the same (or similar) IR structure and reuse the applicable passes that analyze and optimize PHP source, allowing us to generate more efficient low level LLVM IR.

More on these developments soon!

Thursday, November 5, 2009

Big Integers

The runtime recently underwent its first change that is not 100% compatible with Zend based PHP: arbitrary precision integers.

In Zend PHP, an integer (represented as a C long, which is generally 32 or 64 bits wide, depending on the platform) will overflow to a float. This happens when you specify a large literal integer in the source code, or as the result of certain arithmetic operations.

We've decided to integrate seamless "bignums" into Roadsend PHP instead. This means you can specify an integer as big as you want (limited only by memory), and arithmetic operations will yield accurate results even when the numbers get large.

Internally, integers that fit into a hardware word are still represented that way, for speed. They are converted to a bignum (using the GMP library) only when necessary.
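Here's a minimal sketch of that fast path, assuming 64-bit machine integers and the GCC/Clang overflow builtins. The helper names are hypothetical, and the actual bignum step (GMP in rphp, e.g. mpz_add or mpz_mul) is left to the caller whenever a helper returns std::nullopt. For contrast, floatFallbackAdd shows what a Zend-style overflow to a double loses.

```cpp
#include <cstdint>
#include <optional>

// Fast-path arithmetic on machine words. std::nullopt means "overflowed":
// the caller should redo the operation on arbitrary-precision values.
// __builtin_add_overflow / __builtin_mul_overflow are GCC/Clang builtins.
std::optional<std::int64_t> fastAdd(std::int64_t a, std::int64_t b) {
    std::int64_t r;
    if (__builtin_add_overflow(a, b, &r)) return std::nullopt;  // promote
    return r;
}

std::optional<std::int64_t> fastMul(std::int64_t a, std::int64_t b) {
    std::int64_t r;
    if (__builtin_mul_overflow(a, b, &r)) return std::nullopt;  // promote
    return r;
}

// For contrast: computing an overflowing sum as a double, roughly what Zend
// PHP does. A double has a 53-bit mantissa, so large results get rounded.
double floatFallbackAdd(std::int64_t a, std::int64_t b) {
    return static_cast<double>(a) + static_cast<double>(b);
}
```

For example, adding 1 or 2 to the largest 64-bit integer produces the same double, whereas the checked path reports the overflow so the exact result can be computed as a bignum.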

Zend PHP 5.x and the current PHP 6 development branch support this only through the GMP or BC Math extensions. This is awkward, since you have to juggle the numbers as resources (GMP) or strings (BC Math). But, according to recent developer notes, this same functionality may end up in Zend PHP at some point as well.

So, either we're a little ahead of the curve, or we're diverging slightly. Either way we think this is a win for the language as a whole.

Thursday, September 17, 2009

Summer Work

We've had a few contributions over the last few months, and I've been able to get some work done on the Error Manager as well.

Corni contributed parsing and code generation support for basic if-else blocks. He's also added some comments and cleaned up some of the code generation. In addition, he's set up a git repository that tracks the main svn trunk. If you're interested in using that, it's available at http://gitorious.org/rphp/rphp

I've added the Error Manager class to the runtime, which is in charge of:
  • Keeping track of current source location and PHP stack (when available)
  • Handling PHP runtime warnings, notices, fatal errors and exceptions
  • PHP-level, user-defined error and exception handlers
It's also used by the target system for emitting warnings and notices. This is because the compiler will be called from the runtime (during e.g. include or eval) and needs to be able to redirect its output to the current runtime instance.
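Here's a rough sketch of how such an Error Manager might be structured. The class, the severity levels and the handler signature are hypothetical, not rphp's actual API. Notices and warnings go to a user-defined handler if one is set (mirroring PHP's set_error_handler, where a handler returning false falls through to the default handling), and fatal errors become a C++ exception.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical severity levels, loosely modeled on E_NOTICE/E_WARNING/E_ERROR.
enum class Severity { Notice, Warning, Fatal };

// Thrown for fatal errors so the runtime can unwind back to the driver.
struct FatalError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class ErrorManager {
public:
    // Returns true if the error was handled; false means "use the default".
    using Handler = std::function<bool(Severity, const std::string&)>;

    // Tracks the current source location for error messages.
    void setLocation(std::string file, int line) {
        file_ = std::move(file);
        line_ = line;
    }
    void setUserHandler(Handler h) { userHandler_ = std::move(h); }

    void raise(Severity sev, const std::string& msg) {
        std::string full = msg + " in " + file_ + " on line " + std::to_string(line_);
        if (sev == Severity::Fatal) throw FatalError(full);  // not user-handleable
        if (userHandler_ && userHandler_(sev, full)) return;
        log_.push_back(full);  // default handling: record (a real runtime would print)
    }
    const std::vector<std::string>& log() const { return log_; }

private:
    std::string file_ = "unknown";
    int line_ = 0;
    Handler userHandler_;
    std::vector<std::string> log_;
};
```

Because the compiler's warnings would go through the same object, output from an include or eval lands wherever the current runtime instance sends it.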

When a runtime error occurs, a C++ exception is thrown. For this to work under the JIT, an exception thrown by the runtime (which is called from the generated LLVM IR) has to propagate back through the JIT-compiled code and be caught above the JIT engine. We had a tough time getting this to work, and ended up creating a new sandbox of simplified test cases in which to test this (and, in the future, other) bits of code.
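The control flow we're after is sketched below in plain C++. This only illustrates the structure: generated_main is a hypothetical stand-in for JIT-compiled code (reached, as with a real JIT, through a function pointer), and all names are illustrative. With real generated code, the frames between the throw and the catch are JIT-compiled, so the unwinder needs unwind information for them, which is what makes this harder in practice than it looks here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical runtime error type thrown from runtime library functions.
struct PhpRuntimeError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// A runtime entry point that generated code would call.
long rt_divide(long a, long b) {
    if (b == 0) throw PhpRuntimeError("Division by zero");
    return a / b;
}

// Stand-in for a JIT-compiled PHP function; in the real system this would be
// generated LLVM IR that calls into the runtime.
long generated_main(long a, long b) { return rt_divide(a, b) + 1; }

using EntryFn = long (*)(long, long);

// Driver: calls the "generated" code and catches runtime errors above it.
std::string runScript(EntryFn entry, long a, long b) {
    try {
        return std::to_string(entry(a, b));
    } catch (const PhpRuntimeError& e) {
        return std::string("Fatal error: ") + e.what();
    }
}
```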

In the end, it also led to porting the code to the latest (and, as of this post, unreleased) version of LLVM, 2.6, which is in pre-release and should be available soon. Unfortunately, this means the current trunk doesn't support 2.5, but the port did fix our issues with exceptions.

I'll be attending the LLVM Dev Meeting in a few weeks, which I'm really looking forward to. If you're going or will be in the area and want to chat about rphp, let me know!

Finally, I've set up a mailing list for rphp development. You can find it at http://mail.roadsend.com/mailman/listinfo/rphp-devel

Monday, July 20, 2009

On Target

Work has progressed lately on the target system. This is a set of classes used to convey information, such as configuration or command line options, to the various parts of the backend that do the real work (such as compiling or JITing a single script), and also to the runtime. This paves the way for a very modular, dynamic driver system, which will be necessary to use rphp from as many different front ends as are planned. For example, we're planning on supporting at least:

  1. Generating native (static file) binaries and libraries
  2. Just In Time compilation and execution of command line scripts
  3. A web interface to the JIT, which you can fire up and point to the root directory of your PHP application
  4. A library interface to the analyzer, which can be called from e.g. an IDE
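To make the idea a little more concrete, here's a hypothetical C++ sketch of a target hierarchy. TargetOptions, Target, NativeCompileTarget, JITTarget and makeTarget are illustrative names, not rphp's actual classes, and execute() just returns a description instead of doing real work. Each front end would fill in the options and request a mode, and the driver picks the matching target.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical option bag passed from a front end to the backend and runtime.
struct TargetOptions {
    std::string inputFile;
    int optLevel = 0;
    std::map<std::string, std::string> extra;  // free-form configuration values
};

// Base class: each target knows how to carry out one kind of job.
class Target {
public:
    explicit Target(TargetOptions opts) : opts_(std::move(opts)) {}
    virtual ~Target() = default;
    virtual std::string execute() = 0;  // returns a description of the work
    const TargetOptions& options() const { return opts_; }
protected:
    TargetOptions opts_;
};

// Would produce a native binary or library (front end 1 in the list above).
class NativeCompileTarget : public Target {
public:
    using Target::Target;
    std::string execute() override {
        return "compile " + opts_.inputFile + " -O" + std::to_string(opts_.optLevel);
    }
};

// Would JIT-compile and run a script (front ends 2 and 3 above).
class JITTarget : public Target {
public:
    using Target::Target;
    std::string execute() override { return "jit " + opts_.inputFile; }
};

// The driver maps a requested mode to a target; unknown modes yield nullptr.
std::unique_ptr<Target> makeTarget(const std::string& mode, TargetOptions opts) {
    if (mode == "compile") return std::make_unique<NativeCompileTarget>(std::move(opts));
    if (mode == "jit") return std::make_unique<JITTarget>(std::move(opts));
    return nullptr;
}
```

The point of this shape is that the command line tool, the web interface and an IDE plugin differ only in how they fill in TargetOptions, while the backend code stays the same.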

Speaking of the analyzer, we've also had some progress there as well. More on that soon.