Friday, January 25, 2013

Optimizing SQL Server queries

If you’ve worked with SQL Server databases for any amount of time, it’s practically a given that you’ve run into slow-running queries. Sometimes the slow response time is caused by the load on the system, but more often, in my experience, it’s because the query is not written to perform as efficiently as possible. Knowing what to look for in your query designs can help you improve performance.

1. Operate on small sets

The fewer rows you are dealing with, the quicker you will find what you want. If the query optimizer sees that the number of rows in a table is sufficiently small, it will not use any indexes on that table; it realizes that scanning the rows in the table is faster than traversing an index structure to find them.
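As a minimal sketch of the idea (the join to a Customers table and its CustomerID column are illustrative assumptions), filtering as early as possible keeps every later step of the query working on a smaller set of rows:
-- The date filter narrows the set up front, so the join and the
-- aggregation only have to process one month of sales
SELECT c.CustomerID, COUNT(*) AS SaleCount
FROM SalesHistory s
INNER JOIN Customers c ON c.CustomerID = s.CustomerID
WHERE s.SaleDate >= '8/1/2005' AND s.SaleDate < '9/1/2005'
GROUP BY c.CustomerID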

2. Limit the columns returned

When returning data in your applications, the less data that is returned, the faster the information is transmitted over the network; this goes for the number of rows returned, as well as the number of columns. This is why I am against using the SELECT * statement, especially in a production environment. In my experience with database administration and development, I have seen very few situations that warranted using a SELECT * statement, and the reason is twofold. First, it doesn’t make sense to bring back columns that you are not going to use. Second (and I feel this is more important), using SELECT * can break existing code.
Consider the following example.
I have an INSERT statement in my production environment that uses a SELECT * statement as its data source. This isn’t a big deal at first, because SourceTable has the same number of columns as DestinationTable.
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT *
FROM SourceTable

A business situation arises in which I need to add a field to my SourceTable table.
ALTER TABLE SourceTable
ADD Fld4 INT
Adding this new field breaks my INSERT statement: SELECT * now returns four columns, while the column list on DestinationTable expects only three, which will cause problems in my production environment.
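A minimal sketch of the safer form: naming the source columns explicitly keeps the INSERT working even after Fld4 is added to SourceTable.
INSERT INTO DestinationTable
(Fld1, Fld2, Fld3)
SELECT Fld1, Fld2, Fld3
FROM SourceTable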

3. Searching for rows

The manner in which rows are searched for in a database table will always be one of the more vital implementation details in your database environment. The SQL Server query optimizer will handle some WHERE clauses much more efficiently than others, depending on how the clause is written, even when the statements return the same results.
The following example uses the IN() operator to specify a series of values to search for. For this example, assume the OrderID column has a nonclustered index.
SELECT * FROM ProductSales
WHERE OrderID IN(4494, 4495, 4496)

This statement is equivalent to using the OR operator to specify the three values being searched for. Either form can cause SQL Server to skip the index on the column and cycle through the rows in the table searching for the values. Since the values used in the example are contiguous, I can use the BETWEEN operator instead, which allows the query optimizer to use the index effectively.
SELECT * FROM ProductSales
WHERE OrderID BETWEEN 4494 AND 4496
In general, most types of exclusion conditions in your WHERE clause will prevent SQL Server from using an index. The following are some additional examples: <>, !=, OR, NOT IN, NOT EXISTS.
The manner in which LIKE conditions are written also makes a difference. If you are able to specify the beginning character(s) of the string you are searching for, you have a better chance of the statement using an index. If you place a wildcard (%) before the search string, the optimizer will be unable to use an index.
SELECT * FROM Customers WHERE LastName LIKE '%TR%'
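For contrast, a pattern anchored at the start of the string gives the optimizer a chance to seek an index on LastName (assuming such an index exists):
SELECT * FROM Customers WHERE LastName LIKE 'TR%'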

4. Date searches

Date searches are sometimes a little tricky to perform in the database; for instance, I have seen numerous situations where date functions are used for date searches. In the following example, the sales records for August 15, 2005 are being retrieved from the SalesHistory table:
SELECT SaleID
FROM SalesHistory
WHERE
MONTH(SaleDate) = 8 AND
YEAR(SaleDate) = 2005 AND
DAY(SaleDate) = 15
The functions in the WHERE clause force SQL Server to evaluate the function against every row being searched, which means the index will not be used. This is why I discourage using the return values of functions as criteria in queries. The following code shows how you can rewrite the statement so that an index is used and the results are returned much more quickly.
SELECT SaleID
FROM SalesHistory
WHERE
SaleDate >= '8/15/2005' AND
SaleDate < '8/16/2005'

5. Use JOINs rather than subqueries

If possible (and if it makes sense), I suggest using JOIN statements rather than subqueries to improve performance. When a subquery is used as criteria in a SELECT statement, the values returned from the subquery must be made distinct. Returning a distinct list of values requires additional processing, which can slow down your queries.
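Here is a minimal sketch of the rewrite; the Customers and Orders tables are hypothetical:
-- Subquery form: the optimizer must reduce the subquery to a distinct list
SELECT c.CustomerID, c.LastName
FROM Customers c
WHERE c.CustomerID IN (SELECT o.CustomerID FROM Orders o)

-- JOIN form: equivalent when each CustomerID appears at most once in Orders;
-- if Orders can repeat a CustomerID, add DISTINCT to avoid duplicate rows
SELECT c.CustomerID, c.LastName
FROM Customers c
INNER JOIN Orders o ON o.CustomerID = c.CustomerID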

6. Use explicit transactions

When data manipulation occurs in the database, the actions are written to the transaction log. If your code executes many DML statements, it might be a good idea to place them inside an explicit transaction for performance purposes; this prevents each statement from being committed to the transaction log individually. For example, the following statements take approximately seven seconds to execute on my laptop:
CREATE TABLE InsertTable
(
IDCol INT IDENTITY(1,1),
ColVal INT
)
GO
DECLARE @Counter INT
SET @Counter = 1

WHILE @Counter < 15000
BEGIN
INSERT INTO InsertTable(ColVal)
SELECT DATEPART(ms, GETDATE())

SET @Counter = @Counter + 1
END
If I wrap the INSERT statements inside a transaction, the batch takes a little less than two seconds to execute. Because the individual statements no longer commit one at a time, SQL Server does not have to flush the transaction log to disk after each INSERT; the log is flushed once, when the transaction commits, which reduces the number of writes to the log.
DECLARE @Counter INT
SET @Counter = 1
BEGIN TRAN
WHILE @Counter < 15000
BEGIN
INSERT INTO InsertTable(ColVal)
SELECT DATEPART(ms, GETDATE())

SET @Counter = @Counter + 1
END
COMMIT TRAN
Note: I advise you to use this approach with care. If there are too many statements inside a transaction, the transaction’s duration increases, which increases the amount of time locks are held on the objects involved in the transaction.

7. Use UNION ALL instead of UNION

When you use the UNION clause to concatenate the results from two or more SELECT statements, duplicate records are removed. Removing the duplicates requires additional processing. If you are not concerned that your results may include duplicate records, use the UNION ALL clause, which concatenates the full results from the SELECT statements.
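As a minimal sketch (the table names are hypothetical), only the first form pays the cost of removing duplicates:
-- UNION: sorts or hashes the combined results to remove duplicate rows
SELECT ProductID FROM CurrentSales
UNION
SELECT ProductID FROM ArchivedSales

-- UNION ALL: simply appends the second result set to the first
SELECT ProductID FROM CurrentSales
UNION ALL
SELECT ProductID FROM ArchivedSales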

8. Use EXISTS when possible

When you need to check for the presence of certain conditions, it is usually faster to use the EXISTS predicate than COUNT(*). This is because COUNT(*) has to scan all the records returned by the statement, while EXISTS returns true as soon as it finds a record that meets the criteria.
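A minimal sketch, reusing the SalesHistory table from earlier (the date filter is illustrative):
-- COUNT(*) form: counts every qualifying row before making the comparison
IF (SELECT COUNT(*) FROM SalesHistory WHERE SaleDate >= '8/15/2005') > 0
    PRINT 'Sales found'

-- EXISTS form: stops scanning as soon as one qualifying row is found
IF EXISTS (SELECT * FROM SalesHistory WHERE SaleDate >= '8/15/2005')
    PRINT 'Sales found'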

9. STATISTICS IO

There are different ways to determine how best to write your queries. Two of my favorite methods are looking at the number of logical reads produced by a query and looking at the graphical execution plans provided by SQL Server Management Studio. For determining the number of logical reads, you can turn the STATISTICS IO option ON. Consider this query:
SET STATISTICS IO ON
SELECT * FROM SalesHistory
The following is returned in the Messages window in SQL Server Management Studio:
Table 'SalesHistory'. Scan count 1, logical reads 33, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
There are several bits of data returned by STATISTICS IO, but I am really only concerned with the logical reads portion because it will tell me the number of pages read from the data cache. This is the most helpful to me because it will stay constant when I run the same query, which is important because there are sometimes external factors that might vary the execution time of my queries, such as locking by other queries.
When I’m tuning my queries, my goal is to get the number of logical reads as low as possible. Fewer logical reads typically leads to faster execution times.
