Parsing and evaluating PHP in Haskell: Part 2

Tags:

Last week I wrote a post about a PHP parser / evaluator I wrote in Haskell. I explained some of the parts on how the parser itself was designed to process PHP code into an abstract syntax tree.

Continuing from where we left off in the previous part, in this post I’ll discuss the actual evaluation part.

The basics

The evaluator is split into functions to evaluate each kind of data type we introduced in the parser.

We have a function to evaluate statements and a function to evaluate expressions. A function to produce a value from an expression, some to convert values between types.. etc.

We also have some more custom types to allow us to carry around state and such more easily. We need to introduce several types of mutable variables too – Haskell variables are immutable, but it does provide the facilities for mutable variables as well.

The PHPEval monad

We introduce a custom monad transformer stack to carry along the functionality we need. For those unfamiliar with Haskell concepts like monad transformers, it’s really a rather simple concept.

Evaluating code requires us to carry along an environment: A list of variables, a list of functions, list of some settings, and things like that. For this, we can use a reader monad, which allows us to easily define some environment to carry along and read from it.

We also need a way to handle errors. For this, we can use the Error monad, which does pretty much what you’d guess – it allows the computations to either produce a result, or an error result.

A monad transformer is something that allows us to combine other monads together, so since we need both the environment and errors, we use some monad transformers to get the features we want.

If you don’t know what a monad is.. well, it’s pretty much just an interface which defines some functions and how they should behave. Different monads have different features like we saw above, but because they all share certain common behaviors, they are all called monads instead of being called something else.

In some other languages, we might be using a class to carry the environment and exception handling to do the error stuff.

Evaluating statements and expressions

The relevant code for this is in the Evaluator.hs file

We define two functions for evaluating statements and expressions: evalStmt and evalExpr

They both function in a quite similar way: They each use pattern matching to determine what kind of statement or expression is in question, and then process it.

Expressions always produce a value. The result of evaluating an expression is another expression. However, in most cases, it will always be a literal value expression – an expression produces a PHP value of some kind.

In order to get the actual PHPValue type out from an expression, the function exprVal is used. It will only work on expressions which are literals, so attempting to use it on other kinds of expressions will result in an error.

Evaluating statements works pretty much like evaluating expressions. The main difference is that statements don’t always produce a value. In the current way the evaluator is implemented, only return statements will produce a value. This is used to signify when to “short circuit” the evaluation of top level or function code. We’ll talk about this more in a bit.

Declaring custom functions

This is probably one of the more interesting bits. Of course in a scenario where you evaluate PHP code, you would want to let the user declare their own functions.

A function declaration is a statement. The parser produces a function value from it, containing a list of argument definitions and the function body, which is in turn another statement (remember, a statement can be a list of other statements as well).

The relevant functions for this are defineFunction and makeFunction, which you can find in the file here.

defineFunction is pretty much a wrapper around makeFunction, used to actually add the function into the list of available functions. You can see mutable values in Haskell are a bit more complicated to use – we need to specifically read and write them using the IORef functions seen used.

The makeFunction is the one with the interesting parts. It first defines a set of helper functions: Two for checking that the number of arguments is correct, and two for applying the function’s parameters into the local variable scope. Since you can define default values for those parameters which are missing, these too are taken into account.

The remainder simply returns a lambda function which has a closure over the function body. It’s surprisingly simple: We check the arguments, apply the arguments into the local scope and just use the evaluator function to evaluate the function’s body. The lambda function takes a list of PHPValue‘s as its argument – these are the function’s parameters.

There are also some other bits related to functions in the evalExpr function, in particular.

Evaluating a function call

Related source: Evaluator.hs line 262

A function call is evaluated by attempting to look up the function being called. Obviously if it isn’t found, we produce an error. The interesting part is what happens when it’s found:

We first create a mutable list to store all local variables. Then, we need to check for any static variables bound for the function, and apply them to the list of locals. We also need to store a reference to the global variables, in case the function uses the global statement.

Each of the function’s paramters are then evaluated. A parameter can be an expression, so we need to evaluate it to get the actual value out. Finally, we actually go ahead and call the function. However, we need to replace the local scope first – We use a feature of the reader monad here, which lets us change the environment locally to some specific function call. We define a new environment, replacing the reference the the local variables with the new list we created, and store the global variable listing and current function’s name into the new environment to keep track.

This approach allows us to easily replace the local scope when going into functions. We can nest it as deeply as we need to, and with this, the local scopes will not get leaky with other variables.

Haskell functions generally just have a single return point. The returned value from a Haskell function is the final value it produces. In PHP however, we can litter return statements all over the place. How can we stop execution if we have a return statement somewhere?

The trick is using the return value as mentioned earlier. When we have a statement that is a list of statements – such as is the case when we have a function body – we evaluate each statement one at a time. Each time a statement is evaluated, we check whether it returned some kind of value or not. If it did, we can be certain that this was a return statement, since it’s the only kind of one that returns a value. If not, we just continue with evaluating the next statement. By doing this, we can just stop the evaluation once we reach a return statement. (https://github.com/jhartikainen/hs-language-php/blob/218f5685976132d7812d53bba414f1bec43d2852/Evaluator.hs#L328)

However, if you think of it, return is not the only way to get a value out of a function. If we throw an exception, this would also kind of return a value. The evaluator does not support throwing exceptions, but if it did, we would need to use some other data type to represent statement results. We could, for example, have a custom data type which we can pattern match against, such as data StatementResult = NoValue | ReturnValue PHPValue | ExceptionValue PHPException. In a case like this, we would simply change the evaluating function to check against this type instead of the Maybe type used now.

Conditionals and looping

Condition evaluating is really quite straightforward. The parser converts if-elses into data structures which contain the condition and the bodies for each block. This is simply processed so that the condition expression is evaluated, and the resulting value is compared for truthiness.

While loops are also quite simple. Rather similar to if clauses, the conditions are also evaluated and if true, the body is evaluated. Repeat until condition no longer true.

For loops on the other hand are a bit more involved. As you know, these have three parts in the for itself, and the body. To evaluate a for loop, we first need to evaluate the initialization part and add any variables it declares into the local scope. This is done quite easily by just evaluating the expressions in the init block. After this, it behaves similar to evaluating a while-loop, with the difference of running the iterator block after the body of the loop is evaluated on each iteration.

Type juggling

As you know, Haskell uses strong typing. You can’t just compare strings with ints and expect it to work. PHP on the other hand is pretty relaxed about it.

In the previous part we talked about how we solve this using a custom data type which supports different types of values. Now that we’re evaluating the AST, we actually need to make our own comparison functions and other things that can also work against all these values.

The file Conversion.hs has a bunch of helper functions defined for this purpose.

Most of the functions use the same approach: Pattern match each parameter, and if necessary, convert to compatible type. The conversion rules and boolean comparison rules follow the PHP behaviors for these, although it is lacking some such as converting strings to numeric values.

Running the evaluator

In order for this to be easily usable to parse a PHP file, the library has the main module which compiles into a binary that can evaluate php files passed in as its parameter.

See: Main.hs

This is a pretty simple module. It simply initializes the php evaluator configuration with some builtin functions declared in the other modules and runs the evaluator on the result of the parsing.

In closing

Writing something like this was a pretty interesting experiment. I did some quick benchmarks of the evaluator against PHP itself, and it was some 8 times slower… so maybe not very useful, but considering I really didn’t think of optimizing it at all and that it was the first time I even wrote something like that, I think it’s not that bad.