Monday, December 7, 2009

Dataflow

Attention has turned back to the compiler frontend. Some recent work includes:
  • Removal of the preprocessor. It was used before parsing to convert double quoted strings into their single quoted equivalents, interpreting escape codes and interpolated variables in the process. This work will now be done after parsing, by an AST transform (which is not yet complete).
  • A new parsing context allocates memory for AST nodes using an LLVM BumpPtrAllocator (a memory pool). This design is similar to the one used in Clang.
  • We've also adopted Clang's approach of using a common iterator class for the children of AST nodes.
  • Identifiers in the AST are now allocated from a common StringPool, which saves memory by storing each unique string only once and having every identifier with that name point to the shared copy.
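To make the allocation and interning ideas above a little more concrete, here is a minimal sketch in C++. The `Arena` and `StringPool` classes are hypothetical illustrations, not rphp's code and not LLVM's actual `BumpPtrAllocator` or `StringPool` APIs: allocation just bumps a pointer inside a large slab and everything is released at once, and each distinct identifier is copied into the arena only once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <string_view>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena: allocation just advances a pointer inside
// the current slab, and all memory is released at once when the arena dies.
class Arena {
public:
    explicit Arena(std::size_t slabSize = 4096) : slabSize_(slabSize) {}

    void* allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        std::uintptr_t p = alignUp(cur_, align);
        if (slabs_.empty() || p + size > end_) {
            newSlab(size + align);  // room for the object plus worst-case padding
            p = alignUp(cur_, align);
        }
        cur_ = p + size;
        return reinterpret_cast<void*>(p);
    }

    // Construct an AST node in the arena. Destructors never run, so only
    // trivially destructible node types are allowed.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible_v<T>, "arena nodes are never destroyed");
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

private:
    static std::uintptr_t alignUp(std::uintptr_t p, std::size_t align) {
        return (p + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
    }

    void newSlab(std::size_t minSize) {
        std::size_t size = minSize > slabSize_ ? minSize : slabSize_;
        slabs_.push_back(std::make_unique<char[]>(size));
        cur_ = reinterpret_cast<std::uintptr_t>(slabs_.back().get());
        end_ = cur_ + size;
    }

    std::size_t slabSize_;
    std::vector<std::unique_ptr<char[]>> slabs_;
    std::uintptr_t cur_ = 0;
    std::uintptr_t end_ = 0;
};

// Hypothetical string pool: each distinct spelling is copied into the arena
// once, and later lookups return a view of that same copy.
class StringPool {
public:
    explicit StringPool(Arena& arena) : arena_(arena) {}

    std::string_view intern(std::string_view s) {
        auto it = pool_.find(s);
        if (it != pool_.end()) return *it;  // already stored: reuse it
        char* copy = static_cast<char*>(arena_.allocate(s.size() + 1, 1));
        s.copy(copy, s.size());
        copy[s.size()] = '\0';  // NUL-terminate for C APIs
        std::string_view stored(copy, s.size());
        pool_.insert(stored);
        return stored;
    }

private:
    Arena& arena_;
    std::unordered_set<std::string_view> pool_;
};
```

Because interned identifiers share storage, two names can be compared by checking `a.data() == b.data()` instead of comparing them character by character.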
The next exciting development relates to another open source PHP compiler effort, phc. We've been working with the project's maintainer, Paul Biggar, to see how we can combine efforts towards common goals. The phc project has a different philosophy than Roadsend PHP: it uses the PHP runtime rather than a custom one. This lets it take advantage of all of the current PHP extensions and maintain very good compatibility with Zend PHP, at the expense of being limited by the design decisions of the Zend engine and runtime.

However, there's room for cooperation on the frontend. The phc authors have put a great deal of effort into parsing and source-level analysis of PHP (Paul recently completed his PhD thesis on the subject). We're currently investigating how we can take advantage of this analysis in Roadsend PHP. The goal would be to use the same (or a similar) IR structure and reuse the applicable passes that analyze and optimize PHP source, allowing us to generate more efficient low level LLVM IR.

More on these developments soon!

Thursday, November 5, 2009

Big Integers

The runtime recently underwent its first change that is not 100% compatible with Zend based PHP: arbitrary precision integers.

In Zend PHP, an integer (represented as a C long, which is generally 32 or 64 bits wide, depending on the platform) will overflow to a float. This happens when you specify a large integer literal in the source code, or as the result of certain arithmetic operations.

We've decided to integrate seamless "bignums" into Roadsend PHP instead. This means you can specify an integer as large as you want (limited only by memory), and arithmetic operations will yield exact results even when the numbers get large.

Internally, integers that fit into a hardware word are still stored as native machine integers, for speed. They are converted to a bignum (using the GMP library) only when necessary.
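A rough sketch of the "native fast path, promote on overflow" idea follows. It assumes the GCC/Clang builtins `__builtin_add_overflow` and `__builtin_mul_overflow`, and to stay self-contained it uses a 128-bit integer as a stand-in for the GMP-backed bignum (the `BigInt`, `PhpInt`, `add`, `mul` and `toString` names are hypothetical). Unlike a real bignum, the stand-in is only wide enough for the result of a single operation on two 64-bit values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for a GMP-backed bignum. A 128-bit integer (a GCC/Clang
// extension) can hold any sum or product of two 64-bit operands; a real
// runtime would use GMP's mpz_t and grow without limit.
struct BigInt {
    __int128 value;
};

// A PHP integer: a native machine word on the fast path, a bignum otherwise.
using PhpInt = std::variant<std::int64_t, BigInt>;

PhpInt add(std::int64_t a, std::int64_t b) {
    std::int64_t r;
    if (!__builtin_add_overflow(a, b, &r)) return r;  // fits: stay native
    return BigInt{static_cast<__int128>(a) + b};      // overflowed: promote
}

PhpInt mul(std::int64_t a, std::int64_t b) {
    std::int64_t r;
    if (!__builtin_mul_overflow(a, b, &r)) return r;
    return BigInt{static_cast<__int128>(a) * b};
}

std::string toString(const PhpInt& v) {
    if (auto n = std::get_if<std::int64_t>(&v)) return std::to_string(*n);
    __int128 x = std::get<BigInt>(v).value;
    bool neg = x < 0;
    // Work on the unsigned magnitude so the most negative value is safe.
    unsigned __int128 mag = neg ? -static_cast<unsigned __int128>(x)
                                : static_cast<unsigned __int128>(x);
    std::string digits;
    do {
        digits.insert(digits.begin(), static_cast<char>('0' + static_cast<int>(mag % 10)));
        mag /= 10;
    } while (mag != 0);
    if (neg) digits.insert(digits.begin(), '-');
    return digits;
}
```

For example, `add(INT64_MAX, 1)` yields an exact `BigInt` holding 9223372036854775808, where Zend PHP would overflow to a float.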

Zend PHP 5.x and the current PHP 6 development tree support this only through the GMP or BC Math extensions. This is awkward, since you have to juggle the numbers as resources (GMP) or strings (BC Math) rather than as ordinary integers. However, according to recent developer notes, this same functionality may end up in Zend PHP at some point as well.

So, either we're a little ahead of the curve, or we're diverging slightly. Either way we think this is a win for the language as a whole.

Thursday, September 17, 2009

Summer Work

We've had a few contributions over the last few months, and I've been able to get some work done on the Error Manager as well.

Corni contributed parsing and code generation support for basic if-else blocks. He has also added comments to, and cleaned up, some of the code generation. In addition, he has set up a git repository that updates from the main svn trunk. If you're interested in using it, it's available at http://gitorious.org/rphp/rphp

I've added the Error Manager class to the runtime, which is in charge of:
  • Keeping track of current source location and PHP stack (when available)
  • Handling PHP runtime warnings, notices, fatal errors and exceptions
  • PHP level, user defined error and exception handlers
It's also used by the target system for emitting warnings and notices. This is because the compiler will be called from the runtime (during e.g. include or eval) and needs to be able to redirect its output to the current runtime instance.
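A stripped-down sketch of those responsibilities might look like the following. Everything here (`ErrorManager`, `ErrorLevel`, `FatalError`, `SourceLoc`, the message format) is a hypothetical illustration, not rphp's actual API. It follows PHP's convention that a user handler installed with `set_error_handler()` can suppress notices and warnings by returning true, while fatal errors bypass the user handler and are turned into a C++ exception.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class ErrorLevel { Notice, Warning, Fatal };

// Thrown for fatal errors so control unwinds out of the running script.
struct FatalError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct SourceLoc {
    std::string file;
    int line = 0;
};

// Hypothetical error manager: tracks location and PHP call stack, routes
// notices/warnings to a user handler, and throws on fatal errors.
class ErrorManager {
public:
    // Like set_error_handler(): returning false falls through to the default.
    using Handler = std::function<bool(ErrorLevel, const std::string&, const SourceLoc&)>;

    explicit ErrorManager(std::ostream& out) : out_(out) {}

    void setLocation(SourceLoc loc) { loc_ = std::move(loc); }
    void pushFrame(std::string fn) { stack_.push_back(std::move(fn)); }
    void popFrame() { assert(!stack_.empty()); stack_.pop_back(); }
    std::size_t depth() const { return stack_.size(); }
    void setUserHandler(Handler h) { user_ = std::move(h); }

    void raise(ErrorLevel level, const std::string& msg) {
        if (level == ErrorLevel::Fatal) {
            out_ << format(level, msg);  // fatal errors skip the user handler
            throw FatalError(msg);
        }
        if (user_ && user_(level, msg, loc_)) return;  // handled by user code
        out_ << format(level, msg);
    }

private:
    std::string format(ErrorLevel level, const std::string& msg) const {
        static const char* names[] = {"Notice", "Warning", "Fatal error"};
        std::ostringstream s;
        s << names[static_cast<int>(level)] << ": " << msg << " in " << loc_.file
          << " on line " << loc_.line << "\n";
        return s.str();
    }

    std::ostream& out_;
    SourceLoc loc_;
    std::vector<std::string> stack_;
    Handler user_;
};
```

Passing the output stream in, rather than writing to a global, is one way to let the compiler's warnings (during include or eval) land in whichever runtime instance invoked it.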

When a runtime error occurs, a C++ exception is thrown. For this to work with the JIT, an exception thrown by the runtime (which is called from the generated LLVM IR) has to propagate up through the JIT-compiled code and be caught by C++ code above the JIT engine. We had a tough time getting this to work, and ended up creating a new sandbox of simplified test cases in which to test this (and, in the future, other) bits of code.
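The control flow the JIT has to support can be sketched in plain C++. In this hypothetical example, `scriptMain` stands in for a function compiled from LLVM IR and obtained from the execution engine as a function pointer, and `rt_check_divisor` stands in for a runtime function it calls. In the real system the middle frames are JIT-generated, which is exactly what makes unwinding through them tricky; here they are ordinary C++, so the sketch only shows where the exception is thrown and where it must be caught.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical runtime exception type for PHP fatal errors.
struct PhpFatalError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical runtime function that generated code calls into.
void rt_check_divisor(long divisor) {
    if (divisor == 0) throw PhpFatalError("Division by zero");
}

// Stand-in for the JIT-compiled script body. In the real system this code
// would come from LLVM IR, so the exception must unwind through JIT frames.
void scriptMain(long x) {
    rt_check_divisor(x);
    // ... rest of the compiled script ...
}

using ScriptEntry = void (*)(long);

// The driver sits above the JIT-compiled frames and is where the runtime's
// exception has to end up.
int runScript(ScriptEntry entry, long arg, std::ostream& err) {
    assert(entry != nullptr);
    try {
        entry(arg);
        return 0;
    } catch (const PhpFatalError& e) {
        err << "PHP Fatal error: " << e.what() << "\n";
        return 255;  // non-zero status to signal failure
    }
}
```

A call such as `runScript(scriptMain, 0, std::cerr)` would report the error and return a non-zero status instead of terminating the process.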

In the end, this also led to porting the code to the latest (and, as of this post, unreleased) version of LLVM: 2.6. It's in pre-release and should be available soon. Unfortunately, this means the current trunk no longer supports 2.5, but the port did fix our issues with exceptions.

I'll be attending the LLVM Dev Meeting in a few weeks, which I'm really looking forward to. If you're going or will be in the area and want to chat about rphp, let me know!

Finally, I've set up a mailing list for rphp development. You can find it at http://mail.roadsend.com/mailman/listinfo/rphp-devel

Monday, July 20, 2009

On Target

Work has progressed lately on the target system. This is a set of classes used to convey information, such as configuration or command line options, to the various parts of the backend that do the real work (such as compiling or JITing a single script), and also to the runtime. This paves the way for a very modular, dynamic driver system, which will be necessary to use rphp from the many different front ends we have planned. For example, we're planning on supporting at least:

  1. Generating native (static file) binaries and libraries
  2. Just In Time compilation and execution of command line scripts
  3. A web interface to the JIT, which you can fire up and point to the root directory of your PHP application
  4. A library interface to the analyzer, which can be called from, e.g., an IDE
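As a rough illustration of the target idea, the sketch below bundles the options gathered by a front end into one object and lets a driver pick the backend. All of the names (`TargetOptions`, `Target`, `NativeCompileTarget`, `JitTarget`, `makeTarget`) and fields are hypothetical, and `describe()` merely stands in for the method that would do the real work.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical options object filled in by a front end (command line,
// web UI, IDE) and handed to whichever backend does the work.
struct TargetOptions {
    std::string inputFile;
    std::string outputFile;  // empty when no file is produced
    std::vector<std::string> includePaths;
    std::map<std::string, std::string> iniSettings;
    int verbosity = 0;
};

class Target {
public:
    explicit Target(TargetOptions opts) : opts_(std::move(opts)) {}
    virtual ~Target() = default;
    virtual std::string describe() const = 0;  // stand-in for execute()
    const TargetOptions& options() const { return opts_; }

protected:
    TargetOptions opts_;
};

class NativeCompileTarget : public Target {
public:
    using Target::Target;
    std::string describe() const override {
        return "compile " + opts_.inputFile + " -> " + opts_.outputFile;
    }
};

class JitTarget : public Target {
public:
    using Target::Target;
    std::string describe() const override { return "jit " + opts_.inputFile; }
};

// The driver picks a target from the front end's request; backend code only
// ever sees a Target.
std::unique_ptr<Target> makeTarget(const std::string& mode, TargetOptions opts) {
    if (mode == "compile") return std::make_unique<NativeCompileTarget>(std::move(opts));
    if (mode == "jit") return std::make_unique<JitTarget>(std::move(opts));
    return nullptr;
}
```

The point of this shape is that a command line driver, a web frontend, or an IDE plugin only has to fill in the options; the backend never needs to know which front end it is serving.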

Speaking of the analyzer, we've made some progress there as well. More on that soon.

Tuesday, June 2, 2009

Uptime

I recently picked up a Mac Mini, my first move into the Mac world. Of course, one of my first tests was to see if Raven would compile. It didn't, but I was surprised at how little work it took to get it building (hats off to CMake, MacPorts, and LLVM).

I will now be testing builds on Linux (x86 and x86_64) and OS X.

In addition to committing a patch that fixes the build on OS X, I've updated the tree to include the latest lexertl release, and removed the unnecessary "driver" library by integrating it into the IR library.

Thursday, May 14, 2009

Downtime

Hi folks, just a quick post to let you know that the project has not been abandoned. I've simply spent the last month moving continents, doing some design work along the way. Expect some progress and commits soon.

Saturday, April 4, 2009

March Madness

March has come and gone, and rphp advanced a little more.

LLVM 2.5, released in early March, is now supported. Since rphp was mentioned in the release notes (thanks Chris!), a few people were interested in trying to compile an early version. To make this easier, we switched to using environment variables so that rphp can find its library files, and updated the docs on the wiki.

A new rphp-specific test suite was also added. It is based on the pcc (classic) test suite. It only contains two tests so far, but it tests both JIT-interpreted scripts and natively compiled binaries. This will help prevent regressions as the code undergoes heavy development.

The biggest item that made it into code generation was basic user-defined functions. These now work with up to five parameters (not yet including pass-by-reference, type hints, default values, and so on).

Attention also turned back to the runtime. Hash tables now take unicode keys. We switched back to using the ICU UnicodeString as the main unicode string object in the runtime. The header files went through a large cleanup, and received some much needed documentation (more left to do here). Some basic benchmarks were added to examine the speed of runtime variable creation and destruction.
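To illustrate what unicode keys mean for PHP arrays, here is a hypothetical sketch of a key type that is either an integer or a UTF-16 string, with `std::u16string` standing in for ICU's `UnicodeString`. It also applies PHP's rule that a string key which is a canonical decimal integer (such as "5", but not "05" or "-0") is stored as an integer key. For brevity it uses `std::unordered_map`, so unlike a real PHP array it does not preserve insertion order, and it simply leaves very long numeric strings as string keys instead of handling overflow. The names `Key`, `normalizeKey` and `PhpArray` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <variant>

// Stand-in for ICU's UnicodeString, which also stores UTF-16 code units.
using UString = std::u16string;

// A PHP array key is either an integer or a (unicode) string.
using Key = std::variant<std::int64_t, UString>;

// Strings that are canonical decimal integers become integer keys.
Key normalizeKey(const UString& s) {
    std::size_t i = (!s.empty() && s[0] == u'-') ? 1 : 0;
    std::size_t n = s.size() - i;
    // Simplification: more than 18 digits stays a string, so the
    // accumulation below can never overflow int64_t.
    if (n == 0 || n > 18) return s;
    if (s[i] == u'0' && (n > 1 || i == 1)) return s;  // "05", "-0"
    std::int64_t v = 0;
    for (std::size_t k = i; k < s.size(); ++k) {
        if (s[k] < u'0' || s[k] > u'9') return s;
        v = v * 10 + (s[k] - u'0');
    }
    return i ? -v : v;
}

// Hypothetical, unordered sketch of a PHP array holding int values.
class PhpArray {
public:
    void set(const UString& key, int value) { map_[normalizeKey(key)] = value; }
    void set(std::int64_t key, int value) { map_[Key{key}] = value; }
    std::optional<int> get(const UString& key) const { return find(normalizeKey(key)); }
    std::optional<int> get(std::int64_t key) const { return find(Key{key}); }
    std::size_t size() const { return map_.size(); }

private:
    std::optional<int> find(const Key& key) const {
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    std::unordered_map<Key, int> map_;  // std::hash<std::variant> is C++17
};
```

With this normalization, `$a["5"]` and `$a[5]` refer to the same slot, while `$a["05"]` is a separate string key.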

x86-64 is now supported and tested in the build, runtime, and compiler. Both JIT and native compilation work.

Finally, the ASIO networking library headers and a sample HTTP server were added as the base for the microserver and HTTP frontend.

In other news, the Unladen Swallow project was announced. This is good news for rphp, as they plan to work on several projects that Roadsend PHP can benefit from, including a new garbage collector and thread-safe code generation in LLVM.

Saturday, February 28, 2009

Squawk

After a light January, rphp saw quite a few improvements in February.

We've switched the lexer from the beta Boost::Spirit2 to Ben Hanson's Boost::Lexer submission (lexertl). They are actually related, as Spirit2 uses lexertl itself, but as we weren't using the parse functionality of Spirit (we're using lemon), there was no reason to endure the extra compile time overhead (it's heavily template based) and complexity.

Speaking of the lexer, it now properly tokenizes (nearly) all PHP tokens, in any charset. Proper double quote string parsing is yet to be completed, however.
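For reference, the double quoted string handling that is still to be done boils down to something like the following sketch, which splits the body of a string such as `"Hello $name!\n"` into literal and variable parts while decoding escapes. It is a hypothetical simplification, not the rphp lexer: it handles only a few escapes (`\n`, `\t`, `\\`, `\$`, `\"`), keeps unknown escapes verbatim as PHP does, approximates PHP's variable-name rules, and ignores forms like `{$expr}`, `$a[0]` and `$obj->prop`. The `Part` and `splitInterpolated` names are illustrative only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One piece of an interpolated string: literal text or a variable name.
struct Part {
    bool isVar;
    std::string text;  // literal text, or variable name without the '$'
};

// Approximation of PHP identifier rules (bytes >= 0x80 allowed for UTF-8).
static bool isNameStart(unsigned char c) { return std::isalpha(c) || c == '_' || c >= 0x80; }
static bool isNameChar(unsigned char c) { return isNameStart(c) || std::isdigit(c); }

// Split the body of a double quoted string (quotes already stripped) into
// literal and simple "$name" variable parts, decoding a few escapes.
std::vector<Part> splitInterpolated(const std::string& body) {
    std::vector<Part> parts;
    std::string lit;
    auto flush = [&] {
        if (!lit.empty()) { parts.push_back({false, lit}); lit.clear(); }
    };
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c == '\\' && i + 1 < body.size()) {
            char e = body[++i];
            switch (e) {
                case 'n': lit += '\n'; break;
                case 't': lit += '\t'; break;
                case '\\': lit += '\\'; break;
                case '$': lit += '$'; break;
                case '"': lit += '"'; break;
                default: lit += '\\'; lit += e; break;  // unknown escape kept as-is
            }
        } else if (c == '$' && i + 1 < body.size() && isNameStart(body[i + 1])) {
            flush();
            std::size_t j = i + 1;
            while (j < body.size() && isNameChar(body[j])) ++j;
            parts.push_back({true, body.substr(i + 1, j - i - 1)});
            i = j - 1;  // the loop's ++i moves past the name
        } else {
            lit += c;
        }
    }
    flush();
    return parts;
}
```

An AST transform could then turn the resulting parts into a concatenation node whose children are string literals and variable references.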

Literal array support was added to the parser and code generator, which means all the basic types (null, int, float, bool, string, unicode string, array) have some support.

Some other miscellaneous advances include implementing var_dump and initial support for if blocks and function declarations in the parser.