Library author: Don’t provide an exploitable interface

SQL injection is a pretty big deal. Its cousin shell injection is also a common issue, demonstrated quite well by a recent post to the PHP reddit. Although some suspect it was a troll, I heard echos from a variety of people who had seen pretty much exactly the same vulnerability in production.

This got me thinking: People writing libraries for doing things like shell commands, SQL, etc., don’t actually have to provide an interface that can be easily mis-used. An interface like this could just as easily be based on some other data type besides a plain string, completely side stepping issues caused by concatenation.

“What on earth are you talking about?” – Let me explain…

Interfacing with a database

When you work with SQL (or shell) from a programming language, you have a bunch of functions you use to do that. Generally the first interface people learn is the straight SQL one which lets you execute commands pretty much the same way as you’d execute them from the shell.

The other type of interface is commonly an object oriented one, which uses some kind of abstraction to hide away the details of writing SQL queries. Perhaps it follows the ActiveRecord pattern such as in Ruby on Rails, or maybe it’s an ORM like Doctrine 2 in PHP. In either case, this interface is usually a bit more complex, but it also pretty much eliminates SQL injection issues. It’s also much harder to mis-use this object oriented interface, as you can’t simply concatenate strings to attach values. Instead you usually call methods or assign the values into properties of some object, which in turn makes sure that the value is safe to put in a query.

So as we can see, the problem is essentially that the simple string based interface allows the user to do dumb things with it very very easily.

Can’t we just teach them to do it right?

The common idea these days with regards to injection vulnerabilities like SQL injection or shell injection like in the Reddit post is that it’s because of a lack of education. In order to combat that, we should improve education of developers to make sure they know about application security.

Now this is all great – app security is an important topic and developers should know about it.

Education is certainly a part of why exploits like this are so common. However, this does not mean we shouldn’t try to find ways to reduce the likelihood of the user accidentally (due to lack of knowledge or not) causing problems.

I don’t think we can simply start telling beginners to use an ORM however. Generally they are more complex to set up, and not understanding what is going on behind the scenes can lead into difficulties later on. Sure, a beginner could probably get around using some simple ActiveRecord implementation, but eventually they would need to learn about SQL, and they would again commit the same mistakes since ActiveRecord is pretty much a black box and the user does not learn good security practices from using it.

A better interface

So is there a way we could actually make a low level SQL API, but without the easy exploitability a pure string based API affords?

In order to fix this we first need to identify what the issue in the existing API is. As you probably know, injection attacks are usually caused by insufficient escaping of values placed into the SQL (or shell) string, usually by using string concatenation.

So, we basically have two approaches we can take: Don’t allow concatenating strings, or give a different API which hides away the actual SQL string generation.

ORMs generally take the second approach. This in turn can lead to overhead and complexity in use. So for the sake of keeping things (relatively) simple, we can try to figure a way to allow building SQL yourself but without using string concatenation in the API.

The ideal solution for such would obviously be some kind of domain specific language embedded straight into the programming language in question. This is not easy or feasible in many languages, so the other way to do this would be to use values of other types than strings to construct the SQL command.

Certain languages are more suitable for providing this kind of API than others. For example, Haskell would allow you to use a relatively terse API even with more typing. Languages like PHP or JavaScript would require a bit more clunky syntax.

An example API

For example, using Haskell, we could create a custom string-like type which represents “safe” SQL – let’s call it SafeSQL – it’s some data that has been escaped so it’s safe to execute. Then, we’ll define some operators that allow us to easily concatenate snippets of this SafeSQL type with snippets of unsafe SQL, or in other words, plain string data.

query $ "SELECT * FROM example WHERE foo = " ?> someVal <? " AND bar = " ?> someOtherVal

That’s a reasonably simple syntax, right? Looks a lot like string concatenation, but by clever use of types with the concatenation operators ?> and <?, we can actually force a type check and conversion. Essentially what the above would do is it lets us plonk the values from someVal and someOtherVal straight into the SQL but so that it would be automatically escaped.

The query function would not actually accept a standard string. It would require a value of the SafeSQL type. Using the two concatenation operators would allow a string-concatenation style interface for generating values of the SafeSQL type.

This brings us another great thing: if the user was to try and use plain old strings, the compiler would complain about it. It would also complain about using the operators wrong, or using wrong kinds of data with the operators. This would make it more guaranteed that the user doesn’t do anything silly, and it would also actually encourage the user to understand why this is done.

The user would probably attempt to concatenate strings or wonder why they can’t use standard strings with this. This could make them seek the information, which in turn would most likely expose them to why this is done (eg. to make sure the SQL gets escaped correctly).

So in other words, not only did we make an API that is more secure and about the same in complexity, but it actually guides the user to learn about security!

In closing

So by spending some time to think about the possible ways an interface can be exploited, we can actually design it in a way that makes it harder to misuse it.

Now obviously there are some challenges with this in most other languages – many don’t let you do the sort of type trickery as Haskell does, making it harder to design an interface like this. It might be doable, but most likely the syntax you would end up with would be much more clunky.

Perhaps this is simply another thing where Haskell’s strengths really come into play? This is the sort of thing where having a strong typing system helps afterall, and having the flexibility to allow a syntax that doesn’t make it super clunky for the user.

What do you think? Please leave your thoughts in the comments, I would be interested in hearing any opinions on this topic.