Monday, December 7, 2009


Attention has turned back to the compiler frontend. Some recent work includes:
  • Removal of the preprocessor. It ran before parsing and converted double-quoted strings into their single-quoted equivalents, interpreting escape codes and interpolated variables in the process. That work will now happen after parsing, in an AST transform (which is not yet complete).
  • A new parsing context allocates memory for AST nodes using an LLVM BumpPtr allocator (memory pool). This design is similar to the one used in Clang.
  • We've also adopted Clang's use of a common iterator class for children of AST nodes.
  • Identifiers in the AST are now allocated from a common StringPool, which saves memory by storing each distinct identifier string only once. Every occurrence of that identifier in the AST points to the shared copy.
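
To make the parsing context and StringPool items above more concrete, here is a minimal sketch of the general technique using only the C++ standard library. It is not the Roadsend PHP code and does not use LLVM's allocator or string pool classes; the names Arena, StringPool, ParseContext and VarNode are hypothetical stand-ins chosen for illustration, and the 4096-byte slab size is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena: carves allocations out of large slabs and
// releases all of them together when the arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t slabSize = 4096) : slabSize_(slabSize) {}

    void* allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0 && "power-of-two alignment");
        void* p = cur_;
        std::size_t space = remaining_;
        // Start a new slab if there is none yet or the request does not fit.
        if (cur_ == nullptr || std::align(align, size, p, space) == nullptr) {
            const std::size_t cap = std::max(slabSize_, size + align);
            slabs_.push_back(std::make_unique<unsigned char[]>(cap));
            p = slabs_.back().get();
            space = cap;
            std::align(align, size, p, space);  // fits: cap >= size + align
        }
        cur_ = static_cast<unsigned char*>(p) + size;  // bump past this object
        remaining_ = space - size;
        return p;
    }

private:
    std::size_t slabSize_;
    std::vector<std::unique_ptr<unsigned char[]>> slabs_;
    unsigned char* cur_ = nullptr;
    std::size_t remaining_ = 0;
};

// Hypothetical string pool: each distinct identifier is stored once.
class StringPool {
public:
    const std::string* intern(const std::string& s) {
        // insert() returns the existing element if the string is already present.
        // Pointers to unordered_set elements stay valid when the set rehashes.
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};

// Hypothetical AST node: refers to its identifier through the pool.
struct VarNode {
    const std::string* name;
    int line;
};

// Hypothetical parsing context that owns both the node memory and the identifiers.
class ParseContext {
public:
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        // Destructors are never run for arena objects, so require trivial ones.
        static_assert(std::is_trivially_destructible<T>::value,
                      "arena nodes must be trivially destructible");
        void* mem = arena_.allocate(sizeof(T), alignof(T));
        return new (mem) T{std::forward<Args>(args)...};
    }
    const std::string* intern(const std::string& s) { return strings_.intern(s); }
    std::size_t uniqueIdentifiers() const { return strings_.size(); }

private:
    Arena arena_;
    StringPool strings_;
};
```

With this sketch, two nodes built as ctx.create<VarNode>(ctx.intern("count"), 1) and ctx.create<VarNode>(ctx.intern("count"), 2) are distinct objects whose name pointers compare equal, so identifier comparison becomes a pointer comparison. The trade-off is that nodes cannot be freed individually; everything is released when the context goes away.
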
The next exciting development relates to another open source PHP compiler effort, phc. We've been working with the project's maintainer, Paul Biggar, to see how we can combine efforts towards common goals. The phc project has a different philosophy from Roadsend PHP: it uses the PHP runtime rather than a custom one. This lets it take advantage of all of the current PHP extensions and maintain very good compatibility with Zend PHP, at the expense of being limited by Zend engine/runtime design decisions.

However, there's room for cooperation on the frontend. The phc authors have put much effort into parsing and source-level analysis of PHP (Paul recently completed his PhD thesis on the subject). We're currently investigating how we can take advantage of this analysis in Roadsend PHP. The goal would be to use the same (or a similar) IR structure and reuse the applicable passes that analyze and optimize PHP source, allowing us to generate more efficient low-level LLVM IR.

More on these developments soon!