Implement computed columns and create an index on these
You might have written application code where you select a result set from the database and do a calculation for each row in the result set to produce the ultimate information to show in the output. For example, you might have a query that retrieves Order information from the database, and in the application, you might have written code to calculate the total Order price by doing arithmetic operations on Product and Sales data. But, why don't you do all this processing in the database?
Take a look at the following figure. You can specify a database column as a "computed column" by specifying a formula. While your TSQL includes the computed column in the select list, the SQL engine will apply the formula to derive the value for this column. So, while executing the query, the database engine will calculate the Order total price and return the result for the computed column.
Sounds good. Using a computed column in this way would allow you to do the entire calculation in the back-end. But sometimes, this might be expensive if the table contains a large number of rows. The situation might get worse if the computed column is specified in the WHERE clause in a SELECT statement. In this case, to match the specified value in the WHERE clause, the database engine has to calculate the computed column's value for each row in the table. This is a very inefficient process because it always requires a table or full clustered index scan.
So, we need to improve performance on computed columns. How? The solution is, you need to create an index on the computed columns. When an index is built on a computed column, SQL Server calculates the result in advance and builds an index over them. Additionally, when the corresponding column values are updated (that the computed column depends on), the index values on the computed column are also updated. So, while executing the query, the database engine does not have to execute the computation formula for every row in the result set. Rather, the pre-calculated values for the computed column are just selected and returned from the index. As a result, creating an index on a computed column gives you excellent performance boost.
Note: If you want to create an index on a computed column, you must make sure that the computed column formula does not contain any "nondeterministic" function (for example, getdate() is a nondeterministic function because each time you call it, it returns a different value).
Create "Indexed Views"
Did you know that you can create indexes on views (with some restrictions)? Well, if you have come this far, let us learn about indexed views!
Why do we use Views?
As we all know, Views are nothing but compiled SELECT statements residing as objects in a database. If you implement your common and expensive TSQLs using Views, it's obvious that you can re-use these across your data access routines. Doing this will enable you to join Views with other tables/views to produce an output result set, and the database engine will merge the view definition with the SQL you provide and will generate an execution plan to execute. Thus, sometimes Views allow you to re-use common complex SELECT queries across your data access routines, and also let the database engine to re-use execution plans for some portion of your TSQLs.
Take my word. Views don't give you any significant performance benefit. In my early SQL days, when I first learned about views, I got exited thinking that Views were something that "remembers" the result for the complex SELECT query it is built upon. But soon, I was disappointed to know that Views are nothing but compiled queries, and Views just can't remember any result set. (Poor me! I can bet many of you got the same wrong idea about Views in your first SQL days.)
But now, I may have a surprise for you! You can do something on a View so that it can truly "remember" the result set for the SELECT query it is composesd of. How? It's not hard; you just have to create indexes on the View.
Well, if you apply indexing on a View, the View becomes an "indexed view". For an indexed View, the database engine processes the SQL and stores the result in the data file just like a clustered table. SQL Server automatically maintains the index when data in the base table changes. So, when you issue a SELECT query on the indexed View, the database engine simply selects values from an index, which obviously performs very fast. Thus, creating indexes on views gives you excellent performance benefits.
Please note that nothing comes free. As creating indexed Views gives you performance boost, when data in the base table changes, the database engine has to update the index also. So, you should consider creating indexed Views when the view has to process too many rows with aggregate functions, and when data and the base table do not change often.
How to create an indexed View?
Create/modify the view specifying the SCHEMABINDING option:
CREATE VIEW dbo.vOrderDetails
WITH SCHEMABINDING
AS
SELECT...
Create a unique clustered index on the View.
Create a non-clustered index on the View as required.
Wait! Don't get too much exited about indexed Views. You can't always create indexes on Views. Following are the restrictions:
The View has to be created with the SCHEMABINDING option. In this case, the database engine will not allow you to change the underlying table schema.
The View cannot contain nondeterministic functions, DISTINCT clause, or subquery.
The underlying tables in the View must have a clustered index (primary keys).
Try finding the expensive TSQLs in your application that are already implemented using Views or that could be implemented using Views. Try creating indexes on these Views to boost up your data access performance.
Create indexes on User Defined Functions (UDF)
Did you know this? You can create indexes on User Defined Functions too in SQL Server. But, you can't do this in a straightforward way. To create an index on a UDF, you have to create a computed column specifying a UDF as the formula, and then you have to create an index on the computed column field.
Here are the steps to follow:
Create the function (if not exists already) and make sure that the function (that you want to create the index on) is deterministic. Add the SCHEMABINDING option in the function definition and make sure that there is no non-deterministic function/operator (getdate() or distinct etc.) in the function definition.
For example:
CREATE FUNCTION [dbo.ufnGetLineTotal]
(
-- Add the parameters for the function here
@UnitPrice [money],
@UnitPriceDiscount [money],
@OrderQty [smallint]
)
RETURNS money
WITH SCHEMABINDING
AS
BEGIN
return (((@UnitPrice*((1.0)-@UnitPriceDiscount))*@OrderQty))
END
Add a computed column in your desired table and specify the function with parameters as the value of the computed column.
Hide Copy Code
CREATE FUNCTION [dbo.ufnGetLineTotal]
(
-- Add the parameters for the function here
@UnitPrice [money],
@UnitPriceDiscount [money],
@OrderQty [smallint]
)
RETURNS money
WITH SCHEMABINDING
AS
BEGIN
return (((@UnitPrice*((1.0)-@UnitPriceDiscount))*@OrderQty))
END
Specifying UDF as computation formula for the computed column
Create an index on the computed column.
We have already seen that we can create an index on computed columns to retrieve faster results on computed columns. But, what benefit could we achieve by using a UDF in the computed columns and creating an index on those?
Well, doing this would give you a tremendous performance benefit when you include the UDF in a query, especially if you use UDFs in the join conditions between different tables/views. I have seen lots of join queries written using UDFs in the joining conditions. I've always thought UDFs in join conditions are bound to be slow (if the number of results to process is significantly large), and there has to be a way to optimize it. Creating indexes on functions in the computed columns is the solution.
Create indexes on XML columns
Create indexes on XML columns if there is any. XML columns are stored as binary large objects (BLOBs) in SQL Server (SQL Server 2005 and later) which can be queried using XQuery, but querying XML data types can be very time consuming without an index. This is true especially for large XML instances because SQL Server has to shred the binary large object containing the XML at runtime to evaluate the query.
To improve query performance on XML data types, XML columns can be indexed. XML indexes fall in two categories:
Primary XML indexes
When the primary index on an XML column is created, SQL Server shreds the XML content and creates several rows of data that includes information like element and attribute names, the path to the root, node types and values, and so on. So, creating the primary index enables SQL server to support XQuery requests more easily.
Following is the syntax for creating a primary XML index:
CREATE PRIMARY XML INDEX
index_name
ON <object> ( xml_column )
Secondary XML indexes
Creating primary XML indexes improves XQuery performance because the XML data is shredded already. But, SQL Server still needs to scan through the shredded data to find the desired result. To further improve query performance, secondary XML index should be created on top of primary XML indexes.
Three types of secondary XML indexes are there. These are:
"Path" secondary XML indexes: Useful when using the .exist() methods to determine whether a specific path exists.
"Value" secondary XML indexes: Used when performing value-based queries where the full path is unknown or includes wildcards.
"Property" secondary XML indexes: Used to retrieve property values when the path to the value is known.
Following is the syntax for creating secondary XML indexes:
CREATE XML INDEX
index_name
ON <object> ( xml_column )
USING XML INDEX primary_xml_index_name
FOR { VALUE | PATH | PROPERTY }
Please note that the above guidelines are the basics. But, creating indexes blindly on each and every table on the mentioned columns may not always result in performance optimization, because sometimes, you may find that creating indexes on particular columns in particular tables result in slowing down data insert/update operations in that table (particularly if the table has a low selectivity on a column). Also, if the table is a small one containing a small number of rows (say, <500), creating an index on the table might in turn increase the data retrieval performance (because, for smaller tables, a table scan is faster). So, we should be judicious while determining the columns to create indexes on.