Optimizing SQL: Removing queries inside loops

October 7, 2010 – 7:36 pm Tags: , ,

I mentioned shortly on twitter that you should avoid using SQL queries inside loops. There was some questions on how, so here’s a short post that hopefully explains some tricks that you can use.

Why?

Before how, let’s quickly look at why.

When you have an SQL query inside a loop it means it’s ran more than once, which is pretty obvious. So by moving the query outside the loop, so it’s executed only once, usually nets a performance boost.

Another reason why queries inside loops are bad is that often the number of iterations depends on how many objects another query returns. Sure, with a small amount of data your loop is probably pretty fast, but when you end up with more data, it will start becoming slower and slower.

How?

Now, let’s check out some simple solutions.

Using JOINs

If you are not familiar with SQL’s joins, I suggest familiarizing yourself with them right away, as they are immensely useful. With joins, you can, for example, select from multiple tables in a single query, so you could for example join related data from another table that you need. This is similar to the approach demonstrated in the next example…

Using GROUP BY

You often use aggregate functions in queries in loops – in other words, you may be COUNTing something, or selecting a MAX something and so on.

A typical counting scenario:

SELECT * FROM products
 
- for each product
    print product details
    SELECT COUNT(id) FROM sold_products WHERE product_id = product's ID
    print how many of this product has been sold

We can run this in one go by using a JOIN and a GROUP BY:

SELECT products.*, COUNT(sold_products.id) AS total_sold
FROM products
LEFT JOIN sold_products ON sold_products.product_id = product.id
GROUP BY sold_products.product_id

By using a query like this, each of the product rows gets an additional column called “total_sold” which has the count from the other table we joined. By grouping the query by the product ID, the database knows how it should perform the counting.

This same approach can be used for other types of queries using aggregate functions as well.

Using subselects

A subselect is essentially a SELECT inside a SELECT. They can be used in the column listing, joins and where clause (depends on your DB really but these are common).

SELECT a_column,
           (SELECT b FROM c) AS other_value
...

In a where, subselect values can be used for comparison, for example…

SELECT ...
FROM ...
WHERE some_column = (SELECT ...)

It should be noted that joins can often achieve the same result with better speed.

Using batch inserts

Not all looped queries are selects. Inserting multiple rows is also a common thing to do.

To speed up inserting a lot of data, you can batch your rows into a single insert. The syntax to do that is as follow:

INSERT INTO some_table (col1, col2) 
VALUES ('row1-col1', 'row1-col2'), 
       ('row2-col1', 'row2-col2'), 
       (... AND so ON ...)

In closing

I hope the examples shown here explained this optimizing technique a bit better. These are still just scratching the surface on what you can do with joins, grouping and such, so I would suggest you consult your database of choice’s manual for a deeper explanation.

I’m sure these are not all the ways you could avoid querying in loops. If you know any more, feel free to share

Share this:
  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • DZone
  • email
  • LinkedIn
  • Pownce
  • Reddit
  • StumbleUpon
  • Technorati

RSS feed Subscribe to my RSS feed

  1. 8 Responses to “Optimizing SQL: Removing queries inside loops”

  2. This is not about making queries in loop. But I’ve seen this in loops as well!

    One of the mistakes beginners often make:

    $sql = "select * from customer";
    $result = mysql_query($sql);
    $result_set = mysql_fetch_array($result)
    $total = count($result_set);

    If you only need to know the count, use the count() function of the database.

    $sql = "select count(*) from customer";
    $result = mysql_query($sql);
    $result_set = mysql_fetch_array($result)
    $total = $result_set[0];

    The first example will not work on huge databases. PHP will run out of memory. Delegate tasks to database as much as possible. Having good knowledge of SQL helps a lot. When you are developing an application, test your code with huge databases. Use data generators.

    The code is for illustration purpose only. The actual code was slightly different. In essence, it was doing the same as in the above example.

    By Sudheer on Oct 7, 2010

  3. For example if you need to get specific products based on their id’s, beginners call them in a loop, but there is a better solutions.

    First example: Get the id’s you need with one query, then make another query and in then define the where statement like this:
    WHERE id IN (list, of, ids) (in php implode is nice function for this)

    Second example: subselect in the where in statement:
    SELECT * FROM products WHERE id IN (SELECT id FROM othertable)

    By Coder on Oct 8, 2010

  4. Another good habit is using specific columns in selects instead stars.

    By Rick on Oct 8, 2010

  5. For multiple inserts, using prepared statements can make a huge difference. In PHP, you can use the PDO or mysqli db extensions to use them.

    By Oscar on Oct 8, 2010

  6. Using batch inserts:

    if the sql statement is too long, mysql_query($sql) will get failed.

    By huarong on Oct 11, 2010

  7. Use ‘select count(*)’ is better than ‘select count(column name)’,
    select ‘column name’ is better than ‘select *’

    By Anonymous on Oct 12, 2010

  8. Thanks for sharing your tips all.

    huarong, you’re correct. Not long ago I ran into that myself… Solved it by just splitting the batch insert into smaller batches :)

    Anonymous, the speed of COUNT(*) vs COUNT(column) depends on your database. You should check your DB’s manual to find which is best approach.

    By Jani Hartikainen on Oct 12, 2010

  9. Sudheer, there’s probably a mistake in the code you submitted. The first snippet counts the result set columns, the second returns the number of rows. I think you meant

    $sql = "select * from customer";
    $result = mysql_query($sql);
    $total = mysql_num_rows($result);

    By Francesco on Oct 20, 2010

Post a Comment

You can use some HTML (a, em, strong, etc.). If you want to post code, use <pre lang="PHP">code here</pre> (you can replace PHP with the language you are posting)