Optimizing SQL: Removing queries inside loops

I mentioned shortly on twitter that you should avoid using SQL queries inside loops. There was some questions on how, so here’s a short post that hopefully explains some tricks that you can use.

Why?

Before how, let’s quickly look at why.

When you have an SQL query inside a loop it means it’s ran more than once, which is pretty obvious. So by moving the query outside the loop, so it’s executed only once, usually nets a performance boost.

Another reason why queries inside loops are bad is that often the number of iterations depends on how many objects another query returns. Sure, with a small amount of data your loop is probably pretty fast, but when you end up with more data, it will start becoming slower and slower.

How?

Now, let’s check out some simple solutions.

Using JOINs

If you are not familiar with SQL’s joins, I suggest familiarizing yourself with them right away, as they are immensely useful. With joins, you can, for example, select from multiple tables in a single query, so you could for example join related data from another table that you need. This is similar to the approach demonstrated in the next example…

Using GROUP BY

You often use aggregate functions in queries in loops – in other words, you may be COUNTing something, or selecting a MAX something and so on.

A typical counting scenario:

SELECT * FROM products
 
- for each product
    print product details
    SELECT COUNT(id) FROM sold_products WHERE product_id = product's ID
    print how many of this product has been sold

We can run this in one go by using a JOIN and a GROUP BY:

SELECT products.*, COUNT(sold_products.id) AS total_sold
FROM products
LEFT JOIN sold_products ON sold_products.product_id = product.id
GROUP BY sold_products.product_id

By using a query like this, each of the product rows gets an additional column called “total_sold” which has the count from the other table we joined. By grouping the query by the product ID, the database knows how it should perform the counting.

This same approach can be used for other types of queries using aggregate functions as well.

Using subselects

A subselect is essentially a SELECT inside a SELECT. They can be used in the column listing, joins and where clause (depends on your DB really but these are common).

SELECT a_column,
           (SELECT b FROM c) AS other_value
...

In a where, subselect values can be used for comparison, for example…

SELECT ...
FROM ...
WHERE some_column = (SELECT ...)

It should be noted that joins can often achieve the same result with better speed.

Using batch inserts

Not all looped queries are selects. Inserting multiple rows is also a common thing to do.

To speed up inserting a lot of data, you can batch your rows into a single insert. The syntax to do that is as follow:

INSERT INTO some_table (col1, col2) 
VALUES ('row1-col1', 'row1-col2'), 
       ('row2-col1', 'row2-col2'), 
       (... AND so ON ...)

In closing

I hope the examples shown here explained this optimizing technique a bit better. These are still just scratching the surface on what you can do with joins, grouping and such, so I would suggest you consult your database of choice’s manual for a deeper explanation.

I’m sure these are not all the ways you could avoid querying in loops. If you know any more, feel free to share