What makes an abstraction good, and why should I care?

You probably know what abstraction means – making a complex process simpler – but do you know what makes an abstraction good, and why it’s important?

If you are writing code for a project that lives a bit longer, or has multiple developers working on it, having a good abstraction matters. Lack of a good and consistent abstraction will reduce the cohesion of your classes – it’ll make them more difficult to use and understand.

Let’s take a look at what makes or breaks an abstraction!

Role of abstractions

You want abstraction in your code because it makes complex tasks seem simpler.

If you have ever written a linked list or other data-structure in C with pointers, and later on used a higher-level language like Python to do something similar, you’ve probably noticed how much simpler that is in Python.

The reason it’s simpler in a high-level language is that high-level languages abstract away pointers – usually behind so-called references, which can be copied around like any value, and you won’t have to worry (much) about them.
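To make this concrete, here is a minimal sketch of a linked list, written in Java (the language of the other snippets in this post) rather than Python. The Node and IntList names are made up for illustration. The links between nodes are plain references, so there is no address arithmetic and no manual allocation or freeing.

```java
// Hypothetical singly linked list of ints; all names are illustrative only.
class Node {
    final int value;
    Node next; // a reference to the next node, not a raw memory address

    Node(int value) {
        this.value = value;
    }
}

class IntList {
    private Node head;
    private Node tail;
    private int size;

    // Appends a value to the end of the list.
    void add(int value) {
        Node node = new Node(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // Returns the value at the given index by following the references.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        Node current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.value;
    }

    int size() {
        return size;
    }
}
```

Copying a Node variable copies the reference, not the node, and the runtime reclaims nodes that are no longer reachable.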

Similarly, in your own code you want to put common operations behind a simple interface – you abstract the complex parts so you can concentrate on more important issues.

A good example in many applications is database access. When working with it, you rarely want to deal directly in SQL – it’s easier to just say getUsersByFirstName('Pete') than to write the appropriate SQL:

SELECT id, first_name, last_name, email, phone, address, postal_code, city, country, create_date, last_login_time FROM users WHERE first_name == 'Pete'

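Oh, and it just happens there’s a mistake in that SQL clause: standard SQL compares with =, not ==. Higher-level abstractions also help prevent errors like this, because the query is written and checked in one place.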
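As a rough sketch of that idea, the Java below puts getUsersByFirstName behind an interface. Everything except the method name is an assumption made for illustration: the User fields, the UserDirectory interface, and the in-memory implementation, which stands in for a database so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal user type; a real one would carry more columns.
class User {
    final int id;
    final String firstName;

    User(int id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Callers depend only on this interface, never on how users are stored.
interface UserDirectory {
    List<User> getUsersByFirstName(String firstName);
}

// In-memory stand-in. A SQL-backed version would implement the same
// interface and keep the query text in exactly one place.
class InMemoryUserDirectory implements UserDirectory {
    private final List<User> users = new ArrayList<>();

    void add(User user) {
        users.add(user);
    }

    @Override
    public List<User> getUsersByFirstName(String firstName) {
        List<User> matches = new ArrayList<>();
        for (User user : users) {
            if (user.firstName.equals(firstName)) {
                matches.add(user);
            }
        }
        return matches;
    }
}
```

Code that calls getUsersByFirstName("Pete") stays the same whether the users come from memory, a file, or a database.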

What does “level” mean with regard to abstraction?

The previous example of C with pointers and Python with references demonstrates two levels of abstraction.

  1. In Python, when working with classes, you have references. They work similarly to pointers, but they are simpler to use.
  2. In C, you use pointers. They are more complex, but you can do various tricks with them that range from very useful to useless.

In Python, references are used to hide away pointers – the language doesn’t give you access to raw memory addresses. C uses pointers to make addressing memory a bit easier than doing it in assembly. Python’s level of abstraction for pointers is higher than C’s, which is closer to the hardware.

What makes an abstraction good?

An abstraction is defined by the public interface of a class – the methods other code can use to work with the class. Various factors affect how good the abstraction is, such as the level and consistency, and additional things like encapsulation.

The main point is that the abstraction should provide a consistent set of methods. This goes hand in hand with encapsulation: if the encapsulation is poor, the abstraction cannot be good either.

Poor encapsulation leads to a leaky abstraction. This means that you can take a peek at what’s going on behind the interface, or worse, touch the privates of the class from outside the accepted interface. This is not a good thing, as it allows other code to use the class in inappropriate ways.
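Here is a small sketch of the difference, in Java. The LeakyAccount and Account classes are invented for this illustration. Both try to enforce the same rule (deposits must be positive), but only one of them can actually guarantee it.

```java
// Leaky: the balance is public, so callers can bypass deposit() entirely.
class LeakyAccount {
    public int balanceInCents;

    public void deposit(int cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
    }
}

// Encapsulated: the only way to change the balance is through deposit().
class Account {
    private int balanceInCents;

    public void deposit(int cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
    }

    public int getBalanceInCents() {
        return balanceInCents;
    }
}
```

With LeakyAccount, any code can write account.balanceInCents = -500 and skip the check. With Account, the compiler rejects that assignment, so the rule in deposit() always holds.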

Consistency in abstraction means that the abstraction stays at the same level. Consider the following example:

class UserRepository {
  public User getUser(int id)
  public void addUser(User u)
  public SQLRow getUserRow(int id)
  ...
}

What’s wrong with it? Yep, we are returning an SQLRow from one of the methods, which is at a lower level of abstraction than the rest. We should not expose details of how things work on different levels.
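One way to keep the levels apart is to make the row-to-object mapping a private detail of the repository. The sketch below is only an illustration under stated assumptions: SQLRow here is a hypothetical stand-in built on a map, not a real library type, and a map of rows plays the role of the users table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal user type.
class User {
    final int id;
    final String firstName;

    User(int id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }
}

// Hypothetical stand-in for a database row: column name -> value.
class SQLRow {
    private final Map<String, Object> columns = new HashMap<>();

    SQLRow set(String column, Object value) {
        columns.put(column, value);
        return this;
    }

    Object get(String column) {
        return columns.get(column);
    }
}

class UserRepository {
    // Stand-in for the users table, keyed by id.
    private final Map<Integer, SQLRow> rows = new HashMap<>();

    public void addUser(User u) {
        rows.put(u.id, new SQLRow().set("id", u.id).set("first_name", u.firstName));
    }

    // Returns the User with the given id, or null if there is none.
    public User getUser(int id) {
        SQLRow row = rows.get(id);
        return row == null ? null : toUser(row);
    }

    // The row-level detail stays private; callers only ever see User.
    private User toUser(SQLRow row) {
        return new User((Integer) row.get("id"), (String) row.get("first_name"));
    }
}
```

Every public method now speaks in terms of User, so the storage format can change without touching the callers.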

Another good example of a poor level of abstraction involves generic lists in languages which support them, such as Java. It’s been a while since I’ve programmed in Java, so pardon any mistakes in the following snippet – feel free to point them out though:

public class UserRepository {
  private List<User> _users
 
  public User getUser(int id)
  public void addUser(User u)
  public List<User> getUsers()
}

At first, there doesn’t seem to be anything wrong with this, unless you’re familiar with this type of issue. The problem here is that getUsers returns a List, which exposes the kind of data structure used internally. In the worst case, it might return the actual list used internally by the class!

If the code does return the internal list, anyone can modify the internal representation of the data. For example, you might want the addUser method to reject a second user with the same ID, but if you return a reference to the internal list, nothing stops someone from adding a duplicate directly to it.
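The following sketch shows that failure in action. It assumes a stripped-down User with only an ID and a repository named LeakyUserRepository, both invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal user type with only an ID.
class User {
    final int id;

    User(int id) {
        this.id = id;
    }
}

class LeakyUserRepository {
    private final List<User> users = new ArrayList<>();

    // Rejects a second user with the same ID.
    public void addUser(User u) {
        for (User existing : users) {
            if (existing.id == u.id) {
                throw new IllegalArgumentException("duplicate id: " + u.id);
            }
        }
        users.add(u);
    }

    // Hands out the internal list itself rather than a copy.
    public List<User> getUsers() {
        return users;
    }
}
```

After repo.addUser(new User(1)), a second repo.addUser(new User(1)) is rejected, but repo.getUsers().add(new User(1)) goes straight through, and the repository now holds two users with the same ID.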

In a case such as this, a better alternative is to convert the internal list into an array of the correct type, resulting in an interface like this:

public class UserRepository {
  private List<User> _users
 
  public User getUser(int id)
  public void addUser(User u)
  public User[] getUsers()
}

In addition to making the abstraction consistent, this prevents the issue mentioned above, where other code can modify the internal data without going through the defined interface. Returning copies of internal structures like this is a good defensive programming practice.
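A minimal runnable version of the array-returning interface could look like this. The User type is again a stripped-down assumption, and the duplicate check in addUser is the example rule from earlier in the post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal user type with only an ID.
class User {
    final int id;

    User(int id) {
        this.id = id;
    }
}

class UserRepository {
    private final List<User> users = new ArrayList<>();

    // Rejects a second user with the same ID.
    public void addUser(User u) {
        for (User existing : users) {
            if (existing.id == u.id) {
                throw new IllegalArgumentException("duplicate id: " + u.id);
            }
        }
        users.add(u);
    }

    // Returns a fresh array on every call; changing the array
    // does not touch the internal list.
    public User[] getUsers() {
        return users.toArray(new User[0]);
    }
}
```

Note that the copy is shallow: the array is new, but the User objects in it are shared, so this protects the list and not the users themselves. If you would rather keep a List in the interface, List.copyOf (Java 10 and later) returns an unmodifiable copy and gives similar protection.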

The level of abstraction (high, low, etc.) mainly describes how different your interface is from the implementation used behind the scenes. If the interface maps one-to-one to the code used inside the class, it may be a bit pointless to have the class at all.

What about cohesion?

As a side effect, poor cohesion leads to poor abstraction, and poor abstraction often leads to poor cohesion.

Cohesion – how well a class is focused on a single task – breaks easily with poor abstraction, because poor abstraction often means there are methods or even public properties that are unrelated to the task of the class. Going the other way around, if the cohesion is poor – the class has unrelated methods, such as utilities to do something else – the abstraction naturally goes away with it.
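As a small illustration, compare a class that mixes two jobs with a pair of focused ones. All the names below are hypothetical, and users are reduced to plain names to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Low cohesion: storing users and formatting dates are unrelated jobs.
class MixedUserRepository {
    private final List<String> names = new ArrayList<>();

    public void addUser(String name) {
        names.add(name);
    }

    public int countUsers() {
        return names.size();
    }

    // Unrelated utility that just happens to live here.
    public String formatDate(int year, int month, int day) {
        return String.format("%04d-%02d-%02d", year, month, day);
    }
}

// Higher cohesion: each class has one job and a small, consistent interface.
class UserNames {
    private final List<String> names = new ArrayList<>();

    public void addUser(String name) {
        names.add(name);
    }

    public int countUsers() {
        return names.size();
    }
}

class DateFormatter {
    public static String format(int year, int month, int day) {
        return String.format("%04d-%02d-%02d", year, month, day);
    }
}
```

Someone reading the interface of UserNames can tell what the class is for. With MixedUserRepository, the date method muddies that picture.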

In closing

Abstractions are an important part of software development. It can be difficult to always get them right, especially if you don’t have a lot of experience, but as usual, spending some time thinking pays off. With good abstraction, your code will be easier to understand and use, since your classes are more focused and provide a more consistent interface.

The book Code Complete has good information on this topic, amongst others.

Also, remember the programming error tracking challenge! You should check it out if you haven’t yet!