Wednesday, June 6, 2018

Improve stored procedure performance in SQL Server (T-SQL Best Practices)

Tips and optimization to improve stored procedure performance.

Use SET NOCOUNT ON, NO LOCK, Avoid use SELECT *


When performing DML operations (i.e. INSERT, DELETE, SELECT, and UPDATE), SQL Server always returns the number of rows affected. In complex queries with a lot of joins, this becomes a huge performance issue. Using SET NOCOUNT ON will improve performance because it will not count the number of rows affected.

Use WITH(NOLOCK) will improve the performance of the select query or use JOIN

Use Database.Schema


It helps SQL Server to find the object.
A fully qualified object name is database.schema.object. When stored procedure is called as schema.object, SQL Server can swiftly find the compiled plan instead of looking for procedure in other schemas when schema is not specified. This may not be a great boost to the performance but should be followed as best practice. All objects inside procedure should also be referred as database.schema.object.

Use JOIN, avoid subqueries or nested queries


Using JOIN is better for the performance than using subqueries or nested queries.

Using IF EXISTS AND SELECT


IF EXISTS is used to check existence of a record, object etc..
And is a handy statement to improve performance of queries where in one only wants to check existence of a record in a table instead of using that record/row in the query.
When doing so use IF EXISTS(SELECT 1 from table) instead of IF EXISTS(Select * from table) as only thing we are interested in is to check the presence of record/s.
So, if the query return 1 then record is present else it’s not. It’s needless to return all column values.

Use set based queries wherever possible.


T-SQL is a set based language and thus loops don’t work well in here. Cursors and while loop are to be used only when a set based query is either expensive or can’t be formulated.

Nullable Columns


Do not use NOT IN when comparing with nullable columns. Use NOT EXISTS instead.
When NOT IN is used in the query (even if the query doesn’t return rows with null values), SQL Server will check each result to see if it is null or not. Using NOT EXISTS will not do the comparison with nulls.

Avoid begin stored procedure’s name with sp_


When the stored procedure is named sp_ or SP_, SQL Server always checks in the system/master database even if the Owner/Schema name is provided. Providing a name without SP_  to a stored procedure avoids this unnecessary check in the system/master database in SQL Server.

Avoid use GROUP BY, ORDER BY, and DISTINCT


Avoid using GROUP BY, ORDER BY, and DISTINCT as much as possible

When using GROUP BY, ORDER BY, or DISTINCT, SQL Server engine creates a work table and puts the data on the work table. After that, it organizes this data in work table as requested by the query, and then it returns the final result.

Use GROUP BY, ORDER BY, or DISTINCT in your query only when absolutely necessary.

Avoid use the COUNT() aggregate in a subquery


Do not use:
SELECT column_list FROM table WHERE 0 < (SELECT count(*) FROM table2 WHERE ..)
Instead, use:
SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)
  • When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
  • When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.

Avoid joining between two types of columns


When joining between two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower is the one that is converted.
If you are joining tables with incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:

SELECT column_list FROM small_table, large_table WHERE
smalltable.float_column = large_table.int_column 
In this case, SQL Server converts the integer column to float, because int is lower in the hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on smalltable.float_column.

Use FULL-TEXT SEARCH


Write full-text queries by using the predicates CONTAINS and FREETEXT and the rowset-valued functions CONTAINSTABLE and FREETEXTTABLE with a SELECT statement.
  • To match words and phrases, use CONTAINS and CONTAINSTABLE.
  • To match the meaning, but not the exact wording, use FREETEXT and FREETEXTTABLE.
Full text searches always outperform LIKE searches.
  • Full text searches will enable you to implement complex search criteria that can't be implemented using a LIKE search, such as searching on a single word or phrase (and optionally, ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word.
  • Implementing full text search is easier to implement than LIKE search (especially in the case of complex search requirements).
  • For more info on full text search, see http://msdn.microsoft.com/en-us/library/ms142571(SQL.90).aspx

Table Variables and Joins


Temporary tables usually increase a query’s complexity. It’s suggested to avoid the temporary tables.
Do not use table variables in joins. Use temporary tables, CTEs (Common Table Expressions), or derived tables in joins instead.

Even though table variables are very fast and efficient in a lot of situations, the SQL Server engine sees it as a single row. Due to this, they perform horribly when used in joins. CTEs and derived tables perform better with joins compared to table variables.

Try to use UNION to implement an "OR" operation


  • Try not to use "OR" in a query. Instead use "UNION" to combine the result set of two distinguished queries. This will improve query performance.
  • Better use UNION ALL if a distinguished result is not required. UNION ALL is faster than UNION as it does not have to sort the result set to find out the distinguished values.


Use sp_executesql instead of Execute for dynamic queries


The sp_executesql allows for cache plan reuse and protects from SQL Injection. Let’s see an example of the plan reuse.

DBCC FREEPROCCACHE
GO
Declare
@dynamic_sql varchar(max), @salesorderid int
SET @salesorderid=43660
SET @dynamic_sql=' SELECT * FROM Sales.SalesOrderDetail where SalesOrderID='
+ CAST(@salesorderid AS VARCHAR(100)) 
EXECUTE(@dynamic_sql)
The above query executes a dynamic query using EXECUTE command for two values of salesorderid 43660 and 43661. Let’s analyze the cached plans.


As shown in above snapshot, there are two separate plans for the two salesorderids. Let’s now execute the same query with sp_execute SQL and analyze the cached plans.
DECLARE @dynamic_sql NVARCHAR(100)
SET @dynamic_sql = N'SELECT * FROM Sales.SalesOrderDetail where SalesOrderID=@salesorderid'
EXECUTE sp_executesql @dynamic_sql, N'@salesorderid int', @salesorderid = 43661
The above query uses sp_executesql to execute the dynamic query for 2 different values of salesorderid. Let’s analyze the cached plans.

As shown in above snapshot, only one plan is cached and is used for different values of salesorderid.

Unless really required, avoid the use of dynamic SQL because:

  • Dynamic SQL is hard to debug and troubleshoot.
  • If the user provides the input to the dynamic SQL, then there is possibility of SQL injection attacks.

Keep transaction short and crisp


The longer the transaction the longer the locks will be held based on isolation level. This may result in deadlocks and blocking. Open a new query window and execute the below query
use AdventureWorks2014
GO
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
SELECT * FROM Sales.SalesOrderDetail
Note the session id for the query. Open a new query window and execute the below query. Note down the session id of the query.
begin tran
Update Sales.SalesOrderDetail
SET OrderQty=50 WHERE SalesOrderDetailID=1

The above update query will wait on the select query on shared lock. Let’s analyze the locks for these two sessions.

As shown in above snapshot, session 58 the update query is waiting on shared lock taken by session 57.

0 comments: