Anti-Patterns: Cursors

You probably got the memo: cursors are the evil, performance-sucking spawn of the devil and should be avoided at all costs. You either faithfully adhere to the dogma or blithely ignore it.

SQL Server is designed to work with sets of data. Simply put, a set is a collection of distinct objects that is manipulated as a group, without regard to how many objects are in the group. It’s usually cheaper/faster/easier to manipulate the set as a whole than it would be to manipulate each item individually. This is why we pack things in boxes before moving them. Even though the box adds some overhead, it’s easier to carry the box of items than to carry each item individually. Using a cursor is like carrying each item one at a time instead of carrying a box full of them. We call this RBAR: Row By Agonizing Row.

So clearly the wisdom of the ancients is good and cursors only exist to tempt us away from righteousness so that our colleagues can mock us, right? Well… no. Cursors are just one implementation of RBAR, and it’s the row-by-row processing we want to avoid if we can. Here’s an example of RBAR that doesn’t use a cursor:
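
A minimal sketch of the pattern, assuming a #ToDo temp table with an ID key and a Processed flag (those names are just illustrative):

DECLARE @ID int;

WHILE EXISTS (SELECT 1 FROM #ToDo WHERE Processed = 0)
BEGIN
    -- Grab one unprocessed row.
    SELECT TOP (1) @ID = ID
    FROM #ToDo
    WHERE Processed = 0;

    -- ...do the per-row "work" with @ID here...

    -- Flag the row so the next SELECT skips it.
    UPDATE #ToDo
    SET Processed = 1
    WHERE ID = @ID;
END;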

This will actually cause more I/O than a cursor, in part because you have to UPDATE a flag column so that the row isn’t returned in subsequent SELECTs. Incredible, you say? Don’t take my word for it, try it yourself. Here’s a script that’ll run in the AdventureWorks2014 database:
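
Something along these lines, with the object names just placeholders:

-- Batch 1: set-up. Everything lands in a temp table so nothing in AdventureWorks2014 changes.
USE AdventureWorks2014;

CREATE TABLE #ProductList (
    ProductID int NOT NULL,
    Name nvarchar(50) NOT NULL,
    Processed bit NOT NULL DEFAULT 0
);

CREATE CLUSTERED INDEX CIX_ProductList ON #ProductList (Processed, ProductID);

INSERT INTO #ProductList (ProductID, Name)
SELECT ProductID, Name
FROM Production.Product;
GO

-- Batch 2: RBAR without a cursor (select a row, work on it, flag it, repeat).
DECLARE @ProductID int, @Name nvarchar(50);

WHILE EXISTS (SELECT 1 FROM #ProductList WHERE Processed = 0)
BEGIN
    SELECT TOP (1) @ProductID = ProductID, @Name = Name
    FROM #ProductList
    WHERE Processed = 0
    ORDER BY ProductID;

    PRINT SOUNDEX(@Name);       -- the stand-in "work"

    UPDATE #ProductList
    SET Processed = 1
    WHERE ProductID = @ProductID;
END;
GO

-- Batch 3: the same work done with a cursor.
DECLARE @Name nvarchar(50);

DECLARE ProductCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Name FROM #ProductList;

OPEN ProductCursor;
FETCH NEXT FROM ProductCursor INTO @Name;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT SOUNDEX(@Name);       -- the same stand-in "work"
    FETCH NEXT FROM ProductCursor INTO @Name;
END;

CLOSE ProductCursor;
DEALLOCATE ProductCursor;

DROP TABLE #ProductList;
GO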

The first batch is just set-up work. I whip up a temporary table because I don’t want to make any permanent changes to the database, and we’re not interested in the execution statistics for that batch. The “work” in the examples is just printing the SOUNDEX() of the product name, but imagine something more complex that truly doesn’t have a set-based alternative. Since we’re evaluating the design patterns and not the work itself, it doesn’t matter that we’re not actually doing anything.

To get the I/O statistics, fire up Profiler and start a standard trace. You can omit every event except SQL:BatchCompleted, and filter on the database name, the SPID, or whatever lets you ignore any other activity. Execute the script and you should see something similar to this:

Profiler Trace

Wow! That’s a six-fold difference in I/O between the two methods, plus twice the CPU and three times the duration. I even tried to optimize the non-cursor method by giving my temporary table a good clustered index.

So, cursors are okay then? Well… mostly. You still want to use set operations whenever possible. Each FETCH NEXT results in at least one I/O, and you have one FETCH per row, no matter what size the row is. A set-based UPDATE, by contrast, needs one I/O per 8 KB data page, and you probably have several rows per page.
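
To put rough, made-up numbers on that: if #ProductList held 10,000 rows packed about 100 to a page, the set-based statement below touches on the order of 100 pages, while the row-by-row version pays for at least 10,000 fetches plus 10,000 single-row updates:

-- Set-based: one statement; I/O is counted in 8 KB pages, roughly 100 of them here.
UPDATE #ProductList
SET Processed = 1;

-- Row-by-row: the same change, but at least one I/O per FETCH, about 10,000 of them.
DECLARE @ProductID int;

-- STATIC: iterate over a snapshot so the updates don't shuffle rows under the cursor.
DECLARE PageMathCursor CURSOR LOCAL STATIC FOR
    SELECT ProductID FROM #ProductList;

OPEN PageMathCursor;
FETCH NEXT FROM PageMathCursor INTO @ProductID;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE #ProductList SET Processed = 1 WHERE ProductID = @ProductID;
    FETCH NEXT FROM PageMathCursor INTO @ProductID;
END;

CLOSE PageMathCursor;
DEALLOCATE PageMathCursor;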

When you do have to use a cursor, the type of cursor matters. I’ve never had to use anything other than a FAST_FORWARD. This is also called a “firehose cursor” because it’s optimized to deliver as much data as possible as quickly as possible.
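
Declaring one is no different from any other cursor; the option just goes in the DECLARE (LOCAL isn’t required, it only scopes the cursor to the current batch or procedure):

DECLARE ProductCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT Name
    FROM Production.Product;
-- FAST_FORWARD means FORWARD_ONLY plus READ_ONLY, with performance optimizations:
-- you can only FETCH NEXT, and you can't UPDATE ... WHERE CURRENT OF through it.

OPEN ProductCursor;
-- ...FETCH NEXT / @@FETCH_STATUS loop goes here...
CLOSE ProductCursor;
DEALLOCATE ProductCursor;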

Things are almost never black or white. While you should avoid using a cursor whenever possible, sometimes it really is the right tool for the job. A number of routines in the DBAdmin Toolkit use a cursor because it’s the best way to step through a list of database objects and perform individual operations on them. Thoroughly evaluate your problem to determine if a cursor really is necessary, and if it is, go forth boldly and DECLARE CURSOR without fear of rebuke, for you know that the true wisdom of the ancients is “It depends.”
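
For example, a maintenance loop like this one (a generic sketch, not one of the Toolkit’s actual routines) is hard to express as a single set operation, because each database needs its own command built and run separately:

-- Run a consistency check against every online user database, one at a time.
DECLARE @DatabaseName sysname, @Command nvarchar(400);

DECLARE DatabaseCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE state_desc = 'ONLINE'
      AND database_id > 4;      -- skip the system databases

OPEN DatabaseCursor;
FETCH NEXT FROM DatabaseCursor INTO @DatabaseName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Command = N'DBCC CHECKDB (' + QUOTENAME(@DatabaseName) + N') WITH NO_INFOMSGS;';
    EXEC sys.sp_executesql @Command;

    FETCH NEXT FROM DatabaseCursor INTO @DatabaseName;
END;

CLOSE DatabaseCursor;
DEALLOCATE DatabaseCursor;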
