Saturday, June 9, 2018

Understanding the query execution plan

Whenever you issue a SQL statement to the SQL Server engine, SQL Server first has to determine the best possible way to execute it. To do this, the Query Optimizer (the system that generates the optimal query execution plan before executing the query) uses several pieces of information, such as data distribution statistics, index structure, and metadata, to analyze multiple possible execution plans and finally select the one that is likely to be the best most of the time.

Did you know? You can use SQL Server Management Studio to preview and analyze the estimated execution plan for the query that you are going to issue. After writing the SQL in SQL Server Management Studio, click on the estimated execution plan icon (see below) to see the execution plan before actually executing the query.

(Note: Alternatively, you can switch the actual execution plan option "on" before executing the query. If you do this, Management Studio will include the actual execution plan that was executed, along with the result set, in the result window.)

Understanding the query execution plan in detail

Each icon in the execution plan graph represents an action item (Operator) in the plan. The execution plan has to be read from right to left, and each action item has a percentage of cost relative to the total execution cost of the query (100%).

In the above execution plan graph, the icon in the rightmost part represents a "Clustered Index Scan" operation (reading all primary key index values in the table) on the HumanResources.Employee table (which requires 100% of the total query execution cost), and the leftmost icon in the graph represents a SELECT operation (which requires only 0% of the total query execution cost).

Following are the important icons and their corresponding operators you are going to see frequently in the graphical query execution plans:


(Each icon in the graphical execution plan represents a particular action item in the query. For a complete list of the icons and their corresponding action items, go to http://technet.microsoft.com/en-us/library/ms175913.aspx.)

Note the "Query cost" in the execution plan given above. It has 100% cost relative to the batch. That means, this particular query has 100% cost among all queries in the batch as there is only one query in the batch. If there were multiple queries simultaneously executed in the query window, each query would have its own percentage of cost (less than 100%).

To see more details for each particular action item in the query plan, hover the mouse pointer over the item/icon. You will see a window that looks like the following:

This window provides detailed estimated information about a particular query item in the execution plan. The above window shows the estimated details for the clustered index scan, which looks for rows with Gender = 'M' in the Employee table in the HumanResources schema in the AdventureWorks database. The window also shows the estimated I/O cost, CPU cost, number of rows, size of each row, and other costs that are used to compare this plan with other possible execution plans to select the optimal one.

I found an article that can help you further understand and analyze TSQL execution plans in detail. You can take a look at it here: http://www.simple-talk.com/sql/performance/execution-plan-basics/.

What information do we get by viewing the execution plans?

Whenever one of your queries performs slowly, you can view the estimated (and, if required, actual) execution plan and identify the item that is taking the largest share of the cost (in terms of percentage). When you start reviewing any TSQL for optimization, most of the time the first thing you will want to do is view the execution plan. You will most likely quickly identify the part of the SQL that is creating the bottleneck.

Watch for the following costly operators in the execution plan of your query. If you find one of these, you are likely to have problems in your TSQL, and you need to re-factor the TSQL to improve performance (a sample fix follows the list).

Table Scan: Occurs when the corresponding table does not have a clustered index. Most likely, creating a clustered index or defragmenting the existing indexes will enable you to get rid of it.

Clustered Index Scan: Sometimes considered equivalent to Table Scan. Takes place when a non-clustered index on an eligible column is not available. Most of the time, creating a non-clustered index will enable you to get rid of it.

Hash Join: Often the most expensive joining methodology. It takes place when the joining columns between two tables are not indexed. Creating indexes on those columns will enable you to get rid of it.

Nested Loops: In most cases, this happens when a non-clustered index does not include (cover) a column that is used in the SELECT column list. In this case, for each member in the non-clustered index column, the database server has to seek into the clustered index to retrieve the other column values specified in the SELECT list. Creating a covering index will enable you to get rid of it.

RID Lookup: Takes place when you have a non-clustered index but the same table does not have any clustered index. In this case, the database engine has to look up the actual row using the row ID, which is an expensive operation. Creating a clustered index on the corresponding table would enable you to get rid of it.
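For instance, a sketch of a covering index that can remove a Nested Loops/Key Lookup pattern (the table and columns are from AdventureWorks, but the query shape is an assumption):

-- Suppose the slow query is:
--   SELECT ProductID, OrderQty FROM Sales.SalesOrderDetail WHERE ProductID = 777
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_OrderQty
ON Sales.SalesOrderDetail (ProductID)
INCLUDE (OrderQty)   -- the INCLUDE column lets the index "cover" the SELECT list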

Wednesday, June 6, 2018

Improve stored procedure performance in SQL Server (Indexing)

Implement computed columns and create an index on these


You might have written application code where you select a result set from the database and do a calculation for each row in the result set to produce the final information to show in the output. For example, you might have a query that retrieves Order information from the database, and in the application, you might have written code to calculate the total Order price by doing arithmetic operations on Product and Sales data. But why not do all this processing in the database?

Take a look at the following figure. You can specify a database column as a "computed column" by specifying a formula. When your TSQL includes the computed column in the select list, the SQL engine applies the formula to derive the value for this column. So, while executing the query, the database engine calculates the Order total price and returns the result for the computed column.


Sounds good. Using a computed column in this way would allow you to do the entire calculation in the back-end. But sometimes, this might be expensive if the table contains a large number of rows. The situation might get worse if the computed column is specified in the WHERE clause in a SELECT statement. In this case, to match the specified value in the WHERE clause, the database engine has to calculate the computed column's value for each row in the table. This is a very inefficient process because it always requires a table or full clustered index scan.

So, we need to improve performance on computed columns. How? The solution is to create an index on the computed columns. When an index is built on a computed column, SQL Server calculates the results in advance and builds an index over them. Additionally, when the column values that the computed column depends on are updated, the index values on the computed column are also updated. So, while executing the query, the database engine does not have to execute the computation formula for every row in the result set. Rather, the pre-calculated values for the computed column are simply selected and returned from the index. As a result, creating an index on a computed column gives you an excellent performance boost.

Note: If you want to create an index on a computed column, you must make sure that the computed column formula does not contain any "nondeterministic" function (for example, getdate() is a nondeterministic function because it returns a different value each time you call it).
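For instance, a minimal sketch (the table and column names here are hypothetical):

-- Hypothetical order-detail table; the formula must be deterministic
ALTER TABLE dbo.OrderDetail
ADD LineTotal AS (UnitPrice * (1.0 - UnitPriceDiscount) * OrderQty)

-- Index the computed column so its values are pre-calculated and stored
CREATE NONCLUSTERED INDEX IX_OrderDetail_LineTotal
ON dbo.OrderDetail (LineTotal)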

Create "Indexed Views"

Did you know that you can create indexes on views (with some restrictions)? Well, if you have come this far, let us learn about indexed views!

Why do we use Views?

As we all know, Views are nothing but compiled SELECT statements residing as objects in a database. If you implement your common and expensive TSQLs using Views, you can obviously re-use them across your data access routines. Doing this enables you to join Views with other tables/views to produce an output result set; the database engine merges the view definition with the SQL you provide and generates an execution plan to execute. Thus, Views sometimes allow you to re-use common complex SELECT queries across your data access routines, and also let the database engine re-use execution plans for some portions of your TSQLs.

Take my word: Views don't give you any significant performance benefit. In my early SQL days, when I first learned about Views, I got excited thinking that Views were something that "remembers" the result of the complex SELECT query they are built upon. But soon I was disappointed to learn that Views are nothing but compiled queries, and Views just can't remember any result set. (Poor me! I bet many of you got the same wrong idea about Views in your first SQL days.)

But now, I may have a surprise for you! You can do something to a View so that it can truly "remember" the result set of the SELECT query it is composed of. How? It's not hard; you just have to create indexes on the View.

Well, if you apply indexing on a View, the View becomes an "indexed view". For an indexed View, the database engine processes the SQL and stores the result in the data file just like a clustered table. SQL Server automatically maintains the index when data in the base table changes. So, when you issue a SELECT query on the indexed View, the database engine simply selects values from an index, which obviously performs very fast. Thus, creating indexes on views gives you excellent performance benefits.

Please note that nothing comes free. While creating indexed Views gives you a performance boost, when data in the base table changes, the database engine has to update the index as well. So, you should consider creating indexed Views when the View has to process many rows with aggregate functions, and when the data in the base tables does not change often.

How to create an indexed View?

1. Create/modify the view specifying the SCHEMABINDING option:
CREATE VIEW dbo.vOrderDetails
WITH SCHEMABINDING
AS
  SELECT...
2. Create a unique clustered index on the View.
3. Create non-clustered indexes on the View as required.
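Putting it all together, a minimal sketch of an indexed View (the view name and aggregation are assumptions; the base table is from AdventureWorks):

CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT SalesOrderID,
       COUNT_BIG(*) AS OrderLineCount,   -- COUNT_BIG(*) is required when the view uses GROUP BY
       SUM(OrderQty) AS TotalQty
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
GO
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals
ON dbo.vOrderTotals (SalesOrderID)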
Wait! Don't get too excited about indexed Views. You can't always create indexes on Views. The restrictions are:

The View has to be created with the SCHEMABINDING option. In this case, the database engine will not allow you to change the underlying table schema.
The View cannot contain nondeterministic functions, DISTINCT clause, or subquery.
The underlying tables in the View must have a clustered index (primary keys).
Try finding the expensive TSQLs in your application that are already implemented using Views or that could be implemented using Views. Try creating indexes on these Views to boost your data access performance.

Create indexes on User Defined Functions (UDF)

Did you know this? You can create indexes on User Defined Functions, too, in SQL Server. But you can't do this in a straightforward way. To create an index on a UDF, you have to create a computed column specifying the UDF as the formula, and then create an index on that computed column.

Here are the steps to follow:
Create the function (if it does not already exist) and make sure that the function (that you want to create the index on) is deterministic. Add the SCHEMABINDING option to the function definition and make sure that there is no nondeterministic function/operator (getdate(), etc.) in the function definition.
For example:
CREATE FUNCTION [dbo].[ufnGetLineTotal]
(
-- Add the parameters for the function here
@UnitPrice [money],
@UnitPriceDiscount [money],
@OrderQty [smallint]
)
RETURNS money
WITH SCHEMABINDING
AS
BEGIN
    return (((@UnitPrice*((1.0)-@UnitPriceDiscount))*@OrderQty))
END
Add a computed column to your desired table and specify the function (with parameters) as the value of the computed column.

(Figure: specifying the UDF as the computation formula for the computed column)
Create an index on the computed column (a sketch of steps 2 and 3 follows).
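Putting steps 2 and 3 together, a sketch (the computed column and index names are assumptions; the table and function are the AdventureWorks ones above):

-- Step 2: add a computed column that uses the schemabound, deterministic UDF
ALTER TABLE Sales.SalesOrderDetail
ADD LineTotalComputed AS dbo.ufnGetLineTotal(UnitPrice, UnitPriceDiscount, OrderQty)

-- Step 3: create an index on the computed column
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_LineTotalComputed
ON Sales.SalesOrderDetail (LineTotalComputed)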
We have already seen that we can create an index on computed columns to retrieve faster results on computed columns. But, what benefit could we achieve by using a UDF in the computed columns and creating an index on those?

Well, doing this can give you a tremendous performance benefit when you include the UDF in a query, especially if you use UDFs in the join conditions between different tables/views. I have seen lots of join queries written using UDFs in the joining conditions. I had always thought UDFs in join conditions were bound to be slow (if the number of rows to process is significantly large) and that there had to be a way to optimize them. Creating indexes on functions in the computed columns is the solution.

Create indexes on XML columns

Create indexes on XML columns if there are any. XML columns are stored as binary large objects (BLOBs) in SQL Server (SQL Server 2005 and later), which can be queried using XQuery, but querying XML data types can be very time-consuming without an index. This is especially true for large XML instances, because SQL Server has to shred the binary large object containing the XML at runtime to evaluate the query.

To improve query performance on XML data types, XML columns can be indexed. XML indexes fall in two categories:

Primary XML indexes

When the primary index on an XML column is created, SQL Server shreds the XML content and creates several rows of data that include information like element and attribute names, the path to the root, node types, and values. So, creating the primary index enables SQL Server to support XQuery requests more easily.

Following is the syntax for creating a primary XML index:

CREATE PRIMARY XML INDEX
index_name
ON <object> ( xml_column )  
Secondary XML indexes
Creating a primary XML index improves XQuery performance because the XML data is already shredded. But SQL Server still needs to scan through the shredded data to find the desired result. To further improve query performance, secondary XML indexes should be created on top of the primary XML index.

There are three types of secondary XML indexes:

"Path" secondary XML indexes: Useful when using the .exist() methods to determine whether a specific path exists.
"Value" secondary XML indexes: Used when performing value-based queries where the full path is unknown or includes wildcards.
"Property" secondary XML indexes: Used to retrieve property values when the path to the value is known.
Following is the syntax for creating secondary XML indexes:

CREATE XML INDEX
index_name
ON <object> ( xml_column )
USING XML INDEX primary_xml_index_name
FOR { VALUE | PATH | PROPERTY }
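For example, a sketch against a hypothetical dbo.Orders table with an XML column OrderXml (the table must already have a clustered primary key):

CREATE PRIMARY XML INDEX PXML_Orders_OrderXml
ON dbo.Orders (OrderXml)
GO
CREATE XML INDEX SXML_Orders_OrderXml_Path
ON dbo.Orders (OrderXml)
USING XML INDEX PXML_Orders_OrderXml
FOR PATH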
Please note that the above guidelines are the basics. Creating indexes blindly on each and every table for the mentioned columns may not always result in performance optimization, because sometimes you may find that creating indexes on particular columns in particular tables slows down data insert/update operations in that table (particularly if the table has low selectivity on a column). Also, if the table is a small one containing a small number of rows (say, <500), creating an index on it might in turn hurt data retrieval performance (because, for smaller tables, a table scan is often faster). So, we should be judicious when determining the columns to create indexes on.

Top 5 Ways to Find Slow Queries (Performance Tuning)

Here are some tips for how developers can find slow SQL queries and do performance tuning in SQL Server.

Find Slow Queries With SQL DMVs


One of the great features of SQL Server is all of the dynamic management views (DMVs) that are built into it. There are dozens of them and they can provide a wealth of information about a wide range of topics.

There are several DMVs that provide data about query stats, execution plans, recent queries and much more. These can be used together to provide some amazing insights.

For example, this query below can be used to find the queries that use the most reads, writes, worker time (CPU), etc.

SELECT TOP 10 SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.TEXT)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS query_text,
qs.execution_count,
qs.total_logical_reads, qs.last_logical_reads,
qs.total_logical_writes, qs.last_logical_writes,
qs.total_worker_time,
qs.last_worker_time,
qs.total_elapsed_time/1000000 AS total_elapsed_time_in_S,
qs.last_elapsed_time/1000000 AS last_elapsed_time_in_S,
qs.last_execution_time,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_logical_reads DESC -- logical reads
-- ORDER BY qs.total_logical_writes DESC -- logical writes
-- ORDER BY qs.total_worker_time DESC -- CPU time
The result of the query will look something like this below. The image below is from a marketing app I made. You can see that one particular query (the top one) takes up all the resources.

By looking at this, I can copy that SQL query and see if there is some way to improve it, add an index, etc.
(Figure: finding slow SQL queries with DMVs)


  • Pros: Always available; provides basic rollup statistics.
  • Cons: Doesn’t tell you what is calling the queries. Can’t visualize when the queries are being called over time.

Query Reporting via APM Solutions


SQL Server Profiler (DEPRECATED!)


The SQL Server Profiler has been around for a very long time. It is very useful if you are trying to see in real time what SQL queries are being executed against your database.

NOTE: Microsoft has announced that SQL Server Profiler is being deprecated!

SQL Profiler captures very detailed events about your interaction with SQL Server:


  • Login connections, disconnections, and failures
  • SELECT, INSERT, UPDATE, and DELETE statements
  • RPC batch status calls
  • Start and end of stored procedures
  • Start and end of statements within a stored procedure
  • Start and end of a SQL batch
  • Errors written to the SQL Server error log
  • A lock acquired or released on a database object
  • An opened cursor
  • Security permission checks


SQL Server Extended Events
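Extended Events are Microsoft's replacement for Profiler. As a minimal sketch, a session that captures statements running longer than one second might look like this (the session name and target file are assumptions):

CREATE EVENT SESSION SlowQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (ACTION (sqlserver.sql_text)
     WHERE duration > 1000000)   -- duration is measured in microseconds
ADD TARGET package0.event_file (SET filename = N'SlowQueries.xel')
GO
ALTER EVENT SESSION SlowQueries ON SERVER STATE = START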


SQL Azure Query Performance Insights




Improve stored procedure performance in SQL Server (T-SQL Best Practices)

Tips and optimizations to improve stored procedure performance.

Use SET NOCOUNT ON and WITH (NOLOCK); avoid SELECT *


When performing DML operations (i.e., INSERT, DELETE, UPDATE) and SELECT statements, SQL Server always returns a message with the number of rows affected. In stored procedures that execute many statements, this extra messaging adds network overhead and can become a real performance issue. Using SET NOCOUNT ON will improve performance because SQL Server will not send the rows-affected count for each statement.

Using the WITH (NOLOCK) table hint can also improve the performance of SELECT queries, because the reads neither take nor wait for shared locks; be aware, though, that it allows dirty reads.
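A sketch combining these tips (the procedure name is an assumption; the table is from AdventureWorks):

CREATE PROCEDURE dbo.usp_GetOrderDetails
AS
BEGIN
    SET NOCOUNT ON   -- suppress the "rows affected" messages

    SELECT SalesOrderID, ProductID, OrderQty    -- name the columns instead of SELECT *
    FROM Sales.SalesOrderDetail WITH (NOLOCK)   -- no shared locks, but dirty reads are possible
END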

Use Database.Schema


It helps SQL Server find the object.
A fully qualified object name is database.schema.object. When a stored procedure is called as schema.object, SQL Server can swiftly find the compiled plan instead of looking for the procedure in other schemas, as it must when the schema is not specified. This may not be a great boost to performance, but it should be followed as a best practice. All objects inside the procedure should also be referred to as database.schema.object.
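For example (the procedure name is hypothetical):

EXEC AdventureWorks.dbo.usp_GetOrderDetails   -- fully qualified: no schema search needed
-- instead of:
EXEC usp_GetOrderDetails                      -- forces SQL Server to resolve the schema first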

Use JOIN, avoid subqueries or nested queries


Using JOINs generally performs better than subqueries or nested queries. For example:
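A sketch against AdventureWorks tables:

-- Subquery version:
SELECT p.Name
FROM Production.Product p
WHERE p.ProductID IN (SELECT sod.ProductID FROM Sales.SalesOrderDetail sod)

-- JOIN version, which typically optimizes better:
SELECT DISTINCT p.Name
FROM Production.Product p
INNER JOIN Sales.SalesOrderDetail sod ON sod.ProductID = p.ProductID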

Using IF EXISTS and SELECT


IF EXISTS is used to check the existence of a record, object, etc., and it is a handy statement for improving the performance of queries where one only wants to check the existence of a record in a table instead of using that record/row in the query.
When doing so, use IF EXISTS (SELECT 1 FROM table) instead of IF EXISTS (SELECT * FROM table), as the only thing we are interested in is checking the presence of the record(s).
So, if the query returns 1, the record is present; otherwise it is not. There is no need to return all column values. For example:
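A quick sketch against AdventureWorks:

IF EXISTS (SELECT 1 FROM Sales.SalesOrderDetail WHERE SalesOrderID = 43660)
    PRINT 'Order exists'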

Use set based queries wherever possible.


T-SQL is a set-based language, and thus loops don't work well in it. Cursors and WHILE loops should be used only when a set-based query is either too expensive or can't be formulated.
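For instance, a set-based sketch (AdventureWorks names; the filter value is illustrative):

-- Instead of a cursor or WHILE loop that updates one row at a time,
-- update the whole set in a single statement:
UPDATE Sales.SalesOrderDetail
SET OrderQty = OrderQty + 1
WHERE ProductID = 777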

Nullable Columns


Do not use NOT IN when comparing with nullable columns; use NOT EXISTS instead.
When NOT IN is used in the query (even if the query doesn't return rows with null values), SQL Server will check each result to see whether it is null or not. Using NOT EXISTS avoids this comparison with nulls. For example:
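A sketch against AdventureWorks (treat the subquery column as potentially nullable):

-- NOT IN version:
SELECT c.CustomerID
FROM Sales.Customer c
WHERE c.CustomerID NOT IN (SELECT soh.CustomerID FROM Sales.SalesOrderHeader soh)

-- NOT EXISTS version, which avoids the null comparisons:
SELECT c.CustomerID
FROM Sales.Customer c
WHERE NOT EXISTS (SELECT 1 FROM Sales.SalesOrderHeader soh
                  WHERE soh.CustomerID = c.CustomerID)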

Avoid beginning stored procedure names with sp_


When a stored procedure is named with the sp_ (or SP_) prefix, SQL Server always checks the system/master database first, even if the owner/schema name is provided. Giving a stored procedure a name without the sp_ prefix avoids this unnecessary check of the system/master database.

Avoid using GROUP BY, ORDER BY, and DISTINCT


Avoid using GROUP BY, ORDER BY, and DISTINCT as much as possible.

When using GROUP BY, ORDER BY, or DISTINCT, the SQL Server engine creates a work table and puts the data into it. It then organizes the data in the work table as requested by the query, and then returns the final result.

Use GROUP BY, ORDER BY, or DISTINCT in your query only when absolutely necessary.

Avoid using the COUNT() aggregate in a subquery


Do not use:
SELECT column_list FROM table WHERE 0 < (SELECT count(*) FROM table2 WHERE ..)
Instead, use:
SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)
  • When you use COUNT(), SQL Server does not know that you are doing an existence check. It counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
  • When you use EXISTS, SQL Server knows you are doing an existence check. When it finds the first matching value, it returns TRUE and stops looking. The same applies to using COUNT() instead of IN or ANY.

Avoid joining between columns of different data types


When joining two columns of different data types, one of the columns must be converted to the type of the other. The column whose type is lower in the precedence hierarchy is the one that is converted.
If you join tables on columns of incompatible types, one of them can use an index, but the query optimizer cannot choose an index on the column that it converts. For example:

SELECT column_list FROM small_table, large_table WHERE
small_table.float_column = large_table.int_column
In this case, SQL Server converts the integer column to float, because int is lower in the type-precedence hierarchy than float. It cannot use an index on large_table.int_column, although it can use an index on small_table.float_column.
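One possible workaround, sketched with the same placeholder names (it assumes the float values are whole numbers, which may not hold in your schema):

-- Convert explicitly on the small table's side so the index
-- on large_table.int_column stays usable:
SELECT column_list FROM small_table, large_table WHERE
CAST(small_table.float_column AS int) = large_table.int_column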

Use FULL-TEXT SEARCH


Write full-text queries by using the predicates CONTAINS and FREETEXT and the rowset-valued functions CONTAINSTABLE and FREETEXTTABLE with a SELECT statement.
  • To match words and phrases, use CONTAINS and CONTAINSTABLE.
  • To match the meaning, but not the exact wording, use FREETEXT and FREETEXTTABLE.
Full-text searches generally outperform LIKE searches (see the sketch after this list).
  • Full text searches will enable you to implement complex search criteria that can't be implemented using a LIKE search, such as searching on a single word or phrase (and optionally, ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word.
  • Full-text search is easier to implement than a LIKE search (especially in the case of complex search requirements).
  • For more info on full text search, see http://msdn.microsoft.com/en-us/library/ms142571(SQL.90).aspx
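A sketch of a CONTAINS query (it assumes a full-text index already exists on the Description column of Production.ProductDescription in AdventureWorks):

SELECT Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, '"lightweight" OR "aluminum"')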

Table Variables and Joins


Temporary result sets usually add complexity to a query, so avoid them when a plain set-based query will do.
When you do need one in a join, do not use table variables. Use temporary tables, CTEs (Common Table Expressions), or derived tables in joins instead.

Even though table variables are very fast and efficient in a lot of situations, SQL Server maintains no statistics on them and estimates that a table variable holds a single row. Because of this, they perform horribly when used in joins. CTEs, temporary tables, and derived tables perform better in joins compared to table variables.
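A sketch of the comparison against AdventureWorks:

-- Table variable: no statistics, so the optimizer assumes roughly one row
DECLARE @ids TABLE (ProductID int PRIMARY KEY)
INSERT @ids SELECT DISTINCT ProductID FROM Sales.SalesOrderDetail WHERE OrderQty > 10
SELECT p.Name FROM Production.Product p INNER JOIN @ids i ON i.ProductID = p.ProductID

-- Temporary table: has statistics, so the optimizer can pick a better join strategy
CREATE TABLE #ids (ProductID int PRIMARY KEY)
INSERT #ids SELECT DISTINCT ProductID FROM Sales.SalesOrderDetail WHERE OrderQty > 10
SELECT p.Name FROM Production.Product p INNER JOIN #ids i ON i.ProductID = p.ProductID
DROP TABLE #ids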

Try to use UNION to implement an "OR" operation


  • Try not to use "OR" in a query. Instead, use "UNION" to combine the result sets of two separate queries. This can improve query performance.
  • Better yet, use UNION ALL if a distinct result is not required. UNION ALL is faster than UNION, as it does not have to sort the result set to eliminate duplicate values. A sketch follows this list.
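A sketch against AdventureWorks (the filter values are illustrative):

-- OR version (can prevent efficient index use on both predicates):
SELECT SalesOrderID FROM Sales.SalesOrderDetail
WHERE ProductID = 777 OR OrderQty > 40

-- UNION ALL version (each branch can use its own index; note that rows
-- matching both conditions will appear twice):
SELECT SalesOrderID FROM Sales.SalesOrderDetail WHERE ProductID = 777
UNION ALL
SELECT SalesOrderID FROM Sales.SalesOrderDetail WHERE OrderQty > 40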


Use sp_executesql instead of Execute for dynamic queries


sp_executesql allows for cached plan reuse and protects against SQL injection. Let's see an example of the plan reuse.

DBCC FREEPROCCACHE
GO
Declare
@dynamic_sql varchar(max), @salesorderid int
SET @salesorderid=43660
SET @dynamic_sql=' SELECT * FROM Sales.SalesOrderDetail where SalesOrderID='
+ CAST(@salesorderid AS VARCHAR(100)) 
EXECUTE(@dynamic_sql)
The above batch executes a dynamic query using the EXECUTE command. Run it once with salesorderid 43660 and again with 43661, then analyze the cached plans.
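A hedged way to list the cached plans yourself (filtering on the query text is an assumption):

SELECT st.text, cp.usecounts
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE '%SalesOrderDetail%'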


As shown in the above snapshot, there are two separate plans for the two salesorderids. Let's now execute the same query with sp_executesql and analyze the cached plans.
DECLARE @dynamic_sql NVARCHAR(100)
SET @dynamic_sql = N'SELECT * FROM Sales.SalesOrderDetail where SalesOrderID=@salesorderid'
EXECUTE sp_executesql @dynamic_sql, N'@salesorderid int', @salesorderid = 43661
The above query uses sp_executesql to execute the dynamic query; run it for two different values of salesorderid and then analyze the cached plans.

As shown in above snapshot, only one plan is cached and is used for different values of salesorderid.

Unless really required, avoid the use of dynamic SQL because:

  • Dynamic SQL is hard to debug and troubleshoot.
  • If the user provides the input to the dynamic SQL, then there is a possibility of SQL injection attacks.

Keep transactions short and crisp


The longer the transaction, the longer locks will be held, depending on the isolation level. This may result in deadlocks and blocking. Open a new query window and execute the query below:
USE AdventureWorks2014
GO
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT * FROM Sales.SalesOrderDetail
Note the session id for this query. Then open a new query window, execute the query below, and note down its session id.
begin tran
Update Sales.SalesOrderDetail
SET OrderQty=50 WHERE SalesOrderDetailID=1

The above UPDATE query will wait, because the SELECT query under SERIALIZABLE holds its shared locks until the transaction ends. Let's analyze the locks for these two sessions.
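A hedged query to inspect the locks (run it in a third window):

SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('AdventureWorks2014')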

As shown in the above snapshot, the UPDATE query in session 58 is waiting on the shared lock taken by session 57.