Have you ever wondered why sometimes fixing bugs seems to take much longer than it should? When you finally find the problem, it turns out all you need is one small change. Yet, it took a lot of time to find what’s going on. This happens to me more often than I’d like.
On the other hand, when you’re writing new code and a quick test shows it doesn’t work correctly, fixing the bug is really quick. You jump back into your editor, whip up a line of code and the problem is solved.
Why is it that sometimes fixing bugs takes a lot of work even when the problem is simple, and other times, it’s really quick to fix the problem – maybe even when it isn’t so trivial? Is there something we can learn from the bugs that are easy to fix, so that we could spend less time fixing bugs in general?
Let’s talk about that, and look at ways to solve this problem so we can stop pulling our hair out over hard-to-find bugs.
A typical bug fixing process
To identify what takes so long when we fix bugs, let’s first look at the steps involved in fixing one.
- First we need to understand the problem. This means we need to know what’s going wrong and where, and what is supposed to happen instead.
- Next, we need to reproduce the bug. Typically this means going into the app we’re working on and clicking around on a few things to see what happens.
- Then, we need to figure out which part of the code is causing the problem. We can usually start with debugging tools – for example, using Chrome’s debugger to step through code on the page where we’ve reproduced the problem.
- Once we’ve found the piece of code that’s causing the problem, we need to identify the root cause. Depending on the complexity of the problem, the difficulty of this can vary drastically.
- After we’ve identified the root cause, we can finally fix the bug.
- Finally, we need to ensure the bug is actually fixed. This is usually done by trying to reproduce it again.
Hmm, I think we can already start seeing a problem here. Fixing the bug itself is only one step out of six!
But before we jump to conclusions, let’s look at these steps in more detail. We can then see what in each of these steps takes time, and find ways to make them faster.
Step 1: Understanding the problem
The first step in the bug fixing process is understanding the problem. We need to gather enough information so that we know what’s going on, and what’s supposed to be happening instead.
The biggest contributor to this step taking a long time is awful, terrible bug reports.
Users never give good bug reports. It’s an undeniable fact of life.
I might be exaggerating a bit, but I’m sure you’ve heard the words “it does not work” more often than you’d like.
“X doesn’t work, it needs to be fixed yesterday!”
And then you proceed to ask a number of questions in the hope that you can glean some more useful information than “it doesn’t work” out of the reporter.
Of course sometimes when the planets and the stars align correctly, you get a good bug report. You get a clear description of what went wrong, precise steps to reproduce it, and maybe even information on what browser and operating system the user had! That’s when you can jump right into step 2 and be on your way towards fixing the bug.
But in the worst case, you only get a vague idea of what’s happening, which means all the more effort is required in step 2.
Step 2: Reproducing the bug
If you got a good bug report in step 1, this part can be easy. You follow the steps from the bug report and you can reproduce the bug right away. Fantastic! Now you can move to finding the broken code.
Sadly, more often than not, this step is not such smooth sailing.
Because of vague bug reports, this step often involves a lot of guesswork.
Maybe the user was using Firefox, or maybe they were using Chrome. They weren’t sure what they did before clicking this button. I wonder if I should just press buttons randomly and hope for the best?
Sometimes you have to go between steps 1 and 2 repeatedly after trying to reproduce the issue without any results. Hopefully you can get more information from the user and then try again.
It’s clear at this point that in order to speed up steps 1 and 2, we need to gather as much information as possible. The more information we have, the easier it is to understand and reproduce the problem.
Step 3: Finding the problematic piece of code
Once we’ve reproduced the bug, we need to find the specific part of our code that’s causing the problem.
The difficulty of this step varies, mostly depending on two factors, with a third playing a smaller role:
- The amount of code you have
- Your familiarity with the codebase
- (And to a lesser extent, your knowledge of debugging tools)
The amount of code affects this because the number of possible bugs increases with each line of code. Thankfully being familiar with the codebase can narrow it down significantly.
Finding the problem usually begins by taking an educated guess.
“OK, this is the problem, and this is how I can reproduce it, so I think the problem is somewhere in part Y of the code”
The more familiar you are with the codebase, the better your guess will be. This allows you to narrow down the amount of code you need to look through, possibly by a large amount.
“OK, I was working on function Z recently, and it had code related to this. I better check it first.”
Depending on the type of issue, you may also be able to make use of debugging tools to help you locate the problematic code more easily.
“Right, the bug appears when I click this. I’ll set up a breakpoint into the event handler and go from there.”
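As a rough sketch of that last approach, a `debugger` statement at the top of an event handler pauses execution there whenever developer tools are open, and is ignored otherwise. The handler name `handleSaveClick` and the shape of the form data are hypothetical, made up for illustration.

```javascript
// Hypothetical click handler; the name and the form fields are made up.
function handleSaveClick(formData) {
  // With developer tools open, execution pauses on this line so you can
  // step through the rest of the handler. Without them it does nothing.
  debugger;

  const name = (formData.name || "").trim();
  if (name === "") {
    return { ok: false, error: "Name is required" };
  }
  return { ok: true, name };
}

// In a browser you would call it from the click listener of the button
// that triggers the bug. Here it is called directly so the sketch runs alone.
console.log(handleSaveClick({ name: "  Jani " }));
```

Setting the breakpoint from the debugger's own UI achieves the same thing without touching the code, which avoids accidentally committing a stray `debugger` line.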
At this step of the process, the biggest time sink is finding the exact place where the problem is happening. It could be a misbehaving function, a bad value from the user, or any number of things, and you need to locate the source of the problem before you can continue.
Step 4: Identifying the root cause
This is possibly the most important step of the process, but it’s often completely skipped!
It could get skipped because of perceived time constraints, or simply because a less experienced developer might not know they should do this. Either way, skipping this step usually means your code slowly starts filling up with hacks and kludges.
Notice that I said perceived time constraints. Often you might feel pressured to do a quick fix.
“Just fix it quickly, the customer is waiting. You can do a proper job later.”
So you slap in some piece of code to fix the broken code and skip finding the root cause. Of course it’s very likely you’ll never get around to fixing it properly, because there’s always something else that needs to be done.
But the result of quick fixes on top of quick fixes is the same as fixing leaky plumbing with duct tape. Even though in Finland we call duct tape “Jesus tape” for its miraculous ability to fix anything, at some point the tape fix starts leaking and you need to apply some more duct tape to it. Before long, you just have a huge mess on your hands and you have to tear it all down.
In the end, you need to spend more time fixing the quick fix than it would have taken to do a proper job of it in the first place.
But I digress.
Identifying the root cause means you need to find the real source of the bug. Let me give you an example.
Let’s say some value on your website is being displayed incorrectly. You could fix this by changing the display code, but more often than not, the piece of code exhibiting the symptoms is not the root cause.
If you dig more deeply into the problem, you might find that the data in your database is wrong too. Digging further, you’d find that the code that saves the value is broken. This is the root cause of the problem. The original piece of code you found was simply showing the symptoms of a problem elsewhere.
If you had simply fixed the symptom, the real problem would remain. It would continue to cause problems in the future, all the while you keep fixing more symptoms.
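To make the example concrete, here is a minimal sketch of the same situation. Everything in it is invented for illustration: the `Map` stands in for a database, and the function names are hypothetical. The wrong value shows up in the display code, but the broken code is the one that saves it.

```javascript
// A stand-in for the database; a real app would use an actual data store.
const db = new Map();

// Root cause: parseInt stops at the decimal point, so "19.99" is saved as 1900.
function savePriceBroken(id, input) {
  db.set(id, parseInt(input, 10) * 100);
}

// Root-cause fix: keep the decimals when converting to cents.
function savePriceFixed(id, input) {
  db.set(id, Math.round(parseFloat(input) * 100));
}

// The display code is where the symptom shows up, but it is correct.
function displayPrice(id) {
  return (db.get(id) / 100).toFixed(2);
}

savePriceBroken("book", "19.99");
console.log(displayPrice("book")); // 19.00, wrong, yet displayPrice is not at fault

savePriceFixed("book", "19.99");
console.log(displayPrice("book")); // 19.99
```

No change to `displayPrice` could bring back the lost cents, because the stored data is already wrong by the time it runs.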
Unlike finding the symptom, this step does not require much guesswork. You have a starting point, from where you can follow the trail back to the root cause, so you don’t need to guess.
Despite that, this step can be very time consuming, because you often need to dig into the code on several levels. How much time largely depends on where the root cause is in comparison to the symptom – sometimes they might even be the same, but as shown in the example, it could be several levels down.
Step 5: Fixing the bug
Finally we can fix the bug. We’ve reproduced the bug, found where the symptoms occur and discovered the root cause.
After doing all the work up to this point, this step is often fairly trivial. We know what went wrong, what should have happened and what the symptoms were. Bugs don’t often need large changes to fix, so the implementation part tends to be quick.
Step 6: Ensuring the bug is fixed
And as the last step of the process, we need to make sure the bug is well and truly buried.
This can be done simply by repeating the steps you had earlier to reproduce the problem.
Occasionally the bug still reproduces. In that case, you typically need to go back to step 4 or 5 and continue from there.
What are the problems in this process?
Now that we’ve looked at each step in the bug fixing process, we can identify these key pain points:
- Lack of information about the problem: Bug reports are often missing vital information, which makes it hard to understand the problem and reproduce it. The less information we have, the more time it takes.
- Guesswork: We often have to take a number of guesses. We don’t have all the information we need for the problem and we don’t have a way of pinpointing the problem in our code. Therefore we need to guess, which is by its very nature error-prone.
- Pressure of time constraints: We’re often required to fix bugs quickly, because users and customers, and therefore the business, depend on the software working. This in turn can make it tempting to not analyze problems properly, which leads to even more bugs and problems later down the line.
All of these contribute towards both slowing down bug fixing and making it a tedious process. The part where we write code to fix the bug is rarely the biggest time sink!
However, despite these, sometimes we can fix bugs very quickly. This usually happens when we’re adding new features and working on brand new code.
Why is this?
If we think of the typical situation when working on brand new code, we’re often very familiar with what we just wrote. The person who finds the bug in a situation like this is often the same person who wrote the code.
This almost completely eliminates the guesswork!
- We know the code we just wrote
- We often test our code in small gradual steps, so there’s less code which could be causing the error
As a result, fixing the bug is a breeze. More often than not, you’ll simply alt-tab back to the editor, immediately spot the problem and have a fix in place faster than you can say “It doesn’t work”.
Gathering better information
We can safely say that the lack of information and the resulting guesswork are the biggest reasons fixing bugs is slow.
What can we do to improve the situation?
Let’s first look at what we can do to get more information from the user and into the bug report. The first step towards that is asking the user some specific questions. This helps guide the user to give us the information we need to solve the problem more efficiently.
Here are some questions you can use. I don’t remember where I saw these the first time, but I tend to use this format myself and it works quite well.
- Short description of the problem
- What happened?
- What should have happened instead?
- How to reproduce this issue?
The first one is not absolutely necessary, but it’s useful if you use tools like JIRA as you need a name for the issue. The second is fairly obvious, and the third is very useful for understanding what the user expected to happen instead. Although you can figure this out yourself, it’s good to know up front – especially as sometimes it might not be a technical problem, but simply a result of confusion.
The fourth point is perhaps the most important, but users don’t always know how to fill it in. If possible, it’s best to give them a small sample of how you want it filled in, such as “1. I was on page X 2. I clicked on button Y 3. I typed in value Z”.
Additional information that is helpful especially with web apps is the user’s browser and operating system. Depending on the user, they might not always know that, so being able to gather this information automatically can be valuable. For that, you can consider integrating services such as Usersnap, which can help collect more data into bug reports.
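As a sketch of how the four questions and the automatically gathered details could come together, the code below builds a plain-text report. The function names and field names (`summary`, `actual`, `expected`, `steps`) are assumptions made for this example, not part of any particular bug tracker. It reads the browser details from the standard `navigator` object when one is available.

```javascript
// Builds a plain-text bug report from the four questions plus environment
// details. All field names here are assumptions made for this sketch.
function buildBugReport(answers, env) {
  const steps = answers.steps
    .map((step, index) => `${index + 1}. ${step}`)
    .join("\n");

  return [
    `Summary: ${answers.summary}`,
    `What happened: ${answers.actual}`,
    `What should have happened: ${answers.expected}`,
    "Steps to reproduce:",
    steps,
    `Browser: ${env.userAgent}`,
    `Platform: ${env.platform}`,
  ].join("\n");
}

// In a browser, the environment can be read without asking the user.
// Where there is no navigator object, fall back to "unknown".
function detectEnvironment() {
  const nav = typeof navigator !== "undefined" ? navigator : {};
  return {
    userAgent: nav.userAgent || "unknown",
    platform: nav.platform || "unknown",
  };
}

console.log(
  buildBugReport(
    {
      summary: "Saving a profile shows an error",
      actual: "An error message appeared after clicking Save",
      expected: "The profile should have been saved",
      steps: ["I was on page X", "I clicked on button Y", "I typed in value Z"],
    },
    detectEnvironment()
  )
);
```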
These steps are a good starting point in improving the process. However they do not really solve everything. For example, users can and will keep sending confusing bug reports regardless of how much you try.
I would know. I repeatedly tell users that they need to include the steps to reproduce the problem, or we can’t do anything, and I still keep getting the dreaded “it doesn’t work” bug report.
“Jani, X is broken fix it”
So what else is there we can do that doesn’t depend on the finicky users so much?
Logging runtime information
Logging is often overlooked as a tool. Perhaps it’s the lack of good libraries to do so, or the lack of good tools to consume log output – because let’s face it, who wants to go through a huge log file by hand looking for that specific event? But done correctly and with good tools, logs can provide valuable information.
Most developers only use logging as a temporary measure. After all, it’s really easy to slap a bunch of `console.log`s into our code just to see what’s happening – I do it a lot.
But when I say logging, I’m not just talking about debug logs or error logs. I’m talking about logging in general – information about what’s happening in the code, what inputs are being sent, etc.
Logging in a more systematic way like that requires some work. We need to make a point of including logging in our code, and we need to make sure we log information that can be useful. So how does this help speed up bug fixing?
- Sometimes issues can occur when there is no user to tell us about it. Automatic processes are essentially opaque without logging, and you’ll never know what went wrong.
- If you have good logs, they can help you locate the problem more quickly. Depending on what and how you log, they can point you in the general direction of the problem, or even give more specific information such as showing you what values are wrong.
Especially in case of automated processes, logging is important. There is often no way of following what happens in such processes unless you have logs. The logs should be verbose enough to give us a reasonable idea of what’s happening.
Using logs like this helps reduce the amount of guesswork involved by providing us with more useful information.
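One way to move from throwaway `console.log` calls to something more systematic is to write every entry in the same structured shape, so tools can search and filter it later. The following is only a minimal sketch: `createLogger` and the entry fields are invented for this example, and a real project would more likely pick an existing logging library.

```javascript
// A tiny logger factory. "sink" is wherever entries end up; it defaults to
// the console, but could just as well write to a file or a log service.
function createLogger(sink = (line) => console.log(line)) {
  function write(level, message, details = {}) {
    const entry = {
      time: new Date().toISOString(),
      level,
      message,
      ...details,
    };
    // One JSON object per line is easy for log tools to parse.
    sink(JSON.stringify(entry));
    return entry;
  }

  return {
    info: (message, details) => write("info", message, details),
    error: (message, details) => write("error", message, details),
  };
}

const log = createLogger();
log.info("Order submitted", { orderId: 1234, itemCount: 3 });
log.error("Payment failed", { orderId: 1234, reason: "card declined" });
```

Because each entry carries the values involved (here a made-up `orderId`), the log can show which inputs were wrong rather than only that something failed.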
As much as I like making fun of Java for being terrible, logging is one thing the Java world does well. It has many libraries and established practices for logging. If you’ve ever had a problem with a Java application, you might’ve looked at the logs or even increased the log verbosity. Many Java applications output a lot of logs.
What is useful to log? It’s not absolutely vital for every application to have logs, or to log all the things, but here are some examples:
- Log individual requests. For example, if you’ve used Apache or Nginx, they both have a log file which gets one line for each request performed to the server, along with information about it. This info alone is not necessarily useful, but if you log multiple pieces of information for each request, this can be a good one to log first.
- Log steps in processes or transactions. Let’s imagine your users can upload images, but you resize them. You could log each step in the process, such as “Image uploaded”, “Resizing image” and so on.
- Log interactions with external tools/services. Access to databases, APIs, or in our image resizing example, if you call a tool such as ImageMagick to perform some action.
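The image resizing example could be logged roughly like this. It is a sketch under assumptions: `processUpload` is a hypothetical function, and the `resize` argument stands in for a call to an external tool such as ImageMagick, which is not actually invoked here.

```javascript
// Logs each step of a hypothetical image upload. "resize" stands in for a
// call to an external tool; a fake one is supplied below.
function processUpload(image, resize, log = console.log) {
  log(`Image uploaded: ${image.name} (${image.width}x${image.height})`);

  log(`Resizing image: ${image.name} to width 800`);
  try {
    const resized = resize(image, 800);
    log(`Resize finished: ${resized.width}x${resized.height}`);
    return resized;
  } catch (error) {
    // Log the failure together with the input that caused it, then rethrow.
    log(`Resize failed for ${image.name}: ${error.message}`);
    throw error;
  }
}

// A fake resizer so the sketch runs on its own.
function fakeResize(image, width) {
  const height = Math.round((image.height * width) / image.width);
  return { ...image, width, height };
}

processUpload({ name: "cat.png", width: 1600, height: 1200 }, fakeResize);
```

If the resize step fails in production, the log shows which image was being processed and how far the process got, without anyone having to reproduce the upload.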
Depending on how useful you find the logs, it can also be a good idea to implement a way to enable/disable certain types of logs, or enable/disable logs on a per-user basis.
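A simple way to sketch that kind of switch is a settings object that is checked before anything is written. The category names, the user ids and the `logSettings` structure below are all invented for this example.

```javascript
// Settings deciding which log entries are written. The category names and
// user ids are invented for this sketch.
const logSettings = {
  enabledCategories: new Set(["request", "upload"]),
  verboseUsers: new Set(["user-42"]),
};

function shouldLog(category, userId, settings = logSettings) {
  // Users on the verbose list get every category, which helps when
  // chasing a bug that only one person can reproduce.
  if (settings.verboseUsers.has(userId)) {
    return true;
  }
  return settings.enabledCategories.has(category);
}

function logEvent(category, userId, message, sink = console.log) {
  if (!shouldLog(category, userId)) {
    return false;
  }
  sink(`[${category}] user=${userId} ${message}`);
  return true;
}

logEvent("upload", "user-7", "Image uploaded"); // written
logEvent("database", "user-7", "Query took 12 ms"); // skipped
logEvent("database", "user-42", "Query took 12 ms"); // written, user-42 is verbose
```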
Detecting bugs sooner
All these measures we’ve looked at so far don’t address our key point. Both bug reports and logs, although helpful, only give us more information after the fact.
Remember what we found earlier: the important difference between bugs that are fixed quickly and bugs that take a long time to fix is how early we detect the problem.
Whenever we are actively working on a piece of code and we find bugs during the development process, they are much faster to fix. We have a fresh memory of the code and we have all the information in our heads already, so we don’t need to do the kind of archaeology otherwise necessary.
None of the things so far help with this, even though this is one of the biggest contributors to how much time it takes.
How could we detect more bugs sooner, or even during our development process?
Obviously we could hire two dozen QA specialists to go through changes with a microscope. This is not very practical for most teams however, and even with a QA process, it can take some time for issues to be discovered, at which point we’ve already moved on to something else, so it wouldn’t help as much.
Something that can help us discover bugs during the development process is test automation.
- We can run automated tests on our development machine
- Unit test suites can be run quickly and automatically even after small code changes
- Well designed tests give us precise information about the problem
Test automation is a much more realistic goal. It doesn’t require big up-front investments: You can start using it gradually and with each step, you gain more and more benefits from doing so.
How does unit testing speed up bug fixing?
Whenever we change code, we risk introducing bugs. But if our newly added code has bugs, we’ll often spot and fix them easily.
Newly added or changed code is easy to fix because we have it in our heads – we just spent time working on it, after all! Any issues in it can be found and fixed easily because we don’t need to start digging through code to find them. We can still remember where things are.
So, we can agree that the sooner we find a bug, the easier and faster it is to fix.
But how does unit testing help us do that?
First of all, a unit test usually has the following parts:
- A descriptive name of what is being tested, such as “it should validate user’s name”
- A short code snippet to initialize and run the test
- Verification. Each test has an assertion to verify the code snippet produced the correct result.
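Those three parts can be seen in a small example. The `it` and `assertEqual` helpers below are minimal stand-ins written for this sketch; a real test framework provides its own versions with more features. The function under test, `isValidUserName`, and its rules are also made up.

```javascript
// The code under test: a hypothetical validator, made up for this example.
// It accepts 3 to 20 characters, starting with a letter.
function isValidUserName(name) {
  return /^[A-Za-z][A-Za-z0-9_]{2,19}$/.test(name);
}

// Minimal stand-ins for what a test framework provides.
function assertEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(`expected ${expected} but got ${actual}`);
  }
}

function it(name, body) {
  try {
    body();
    console.log(`PASS: it ${name}`);
    return true;
  } catch (error) {
    console.log(`FAIL: it ${name} - ${error.message}`);
    return false;
  }
}

// 1. A descriptive name
it("should validate user's name", () => {
  // 2. A short snippet to set up and run the code under test
  const result = isValidUserName("jani_h");
  // 3. Verification
  assertEqual(result, true);
});
```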
Then, if such a test fails, we get the following pieces of information:
- The name of the failed test
- Specific information on how the assertion failed, often giving us a specific comparison such as “the user’s name did not match the expected format”
- A stack trace, which gives us the specific line of code which caused the failure
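To see where those pieces of information come from, here is a sketch of a test failing on purpose. As before, `runTest` and `assertEqual` are simplified stand-ins for a real test framework, and `normalizeUserName` is a hypothetical function with a deliberate bug.

```javascript
// A deliberately buggy function: it forgets to trim surrounding whitespace.
function normalizeUserName(name) {
  return name.toLowerCase();
}

function assertEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(
      `expected ${JSON.stringify(expected)} but got ${JSON.stringify(actual)}`
    );
  }
}

// Runs one test and returns what a test runner would report on failure.
function runTest(name, body) {
  try {
    body();
    return { name, passed: true };
  } catch (error) {
    return { name, passed: false, message: error.message, stack: error.stack };
  }
}

const report = runTest("it should normalize user's name", () => {
  assertEqual(normalizeUserName("  Jani "), "jani");
});

console.log(report.name); // which test failed
console.log(report.message); // what happened versus what should have happened
console.log(report.stack); // where in the code the failure was raised
```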
Let’s compare these to what we want from a bug report:
- Description of what happened
- Description of what should have happened
- List of steps to reproduce the problem
Can you see the comparison with what we get from a test? We get exactly the information we want from a good bug report! And not only that, a test gives us this information during the development process!
A unit test gives you immediate feedback with the exact information you need, when you’re still working on the code, so everything is fresh in your mind.
All of this contributes towards ensuring we have enough information to fix a bug quickly. We know what happened, what should have happened, we can reproduce the bug by running the test again… and we can even verify our bugfix code works correctly by running the test again.
As an additional benefit, tests catch more bugs elsewhere in the code too. Quite often as a result of our changes, we’ll accidentally cause a bug somewhere else. These easily go unnoticed and end up being hard to fix, but if you have unit tests, no problem – once you write a test, you can keep it around and it keeps catching bugs and being useful.
With fixing bugs, the biggest time sink is not writing the fix. It’s all the work we need to do before we can even start writing any code to fix the bug. Most of it is caused by lack of information – bad bug reports, large amounts of code, even poor code can contribute to it.
We can improve the situation by trying to get more information in bug reports, but the best way to fix bugs faster is to find bugs earlier.
Bugs caught during development are the fastest to fix because we’re actively working on the code in question, and we have the information we need in our heads. This means we don’t need to start digging through bug reports or code to figure out what’s happening.
The best way to catch more issues during development is unit testing. Unit tests address all three problems:
- We don’t have enough information. Solved – Failing test tells us what went wrong and what should’ve happened, pointing us to the exact function. The test itself acts as the reproduction for the bug.
- Unless we’re very familiar with the whole codebase, we need to do some work to find the source of the problem. Solved – A failing test tells us which function failed so there is no guesswork involved. Even better, tests often fail during development, so all the code is still fresh in our memory.
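- Due to time constraints, finding the root cause is sometimes skipped, leading to bad code. Solved – A unit test exercises one small piece of code, so when it fails, it usually points at the code that is actually broken rather than at a symptom somewhere else, giving us the place we need to fix.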
You also don’t need a lot of effort with tests. You can start adding tests one by one, for example when you fix bugs. With every test you add, you’ll get more and more benefits.
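For instance, a test added while fixing a bug might look like the sketch below. The `cartTotal` function and the bug it once had are hypothetical, and `check` is a bare-bones stand-in for a test framework.

```javascript
// Hypothetical function that once had a bug: reduce was called without a
// starting value, which throws on an empty cart.
function cartTotal(items) {
  // The fix: start from 0 so an empty array simply returns 0.
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

function check(name, actual, expected) {
  const passed = actual === expected;
  console.log(`${passed ? "PASS" : "FAIL"}: ${name}`);
  return passed;
}

// Written while fixing the bug, and kept so the bug cannot quietly return.
check("it should return 0 for an empty cart", cartTotal([]), 0);
check(
  "it should sum price times quantity",
  cartTotal([
    { price: 5, quantity: 2 },
    { price: 3, quantity: 1 },
  ]),
  13
);
```

The test doubles as the reproduction steps and as the final check that the bug is fixed, and it costs nothing to keep running afterwards.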
The main issue with tests is that it can be difficult to get started. However, once you learn the concepts, they won’t go out of fashion – unlike the popular libraries of the moment, testing has been around for a long time, and the exact same principles can be used no matter what library or language you use.