Dot Net Interview Question Answer: Difference between covering and included Index

Difference between covering and included Indxe
A covering index is an index that contains all columns referenced in the query. A clustered index is a covering index by definition, but the adjetive is mainly used for non-clustered indexes. The columns in an index can be part of the key, or can be included and be part of the leaf nodes. The columns participating in the key are by default in the leaf nodes too (B-Tree).
If an index is not covering, then SQL Server will have to go to the clustered index, or the table if it is a heap, to pull the values for the rest of the columns that are not part of the index.
There is no such concept as included index, but an index can use the INCLUDE clause, to incorporate in the leaf nodes, columns that are not part of the key. Before SS 2005, the only way a column can be part of an index was participating in the key.
The columns participating in the key, are referenced, in a query, in the clauses JOIN, WHERE, GROUP BY, HAVING, ORDER BY, etc and the ones that have been included are mainly referenced in the list of columns on the SELECT clause.
Example:
T(a, b, c, d)
select a, b, c, d
from T
where c = @c;
For this query, we need an index by [c] as part of the key, so SQL Server can create statistics an use them to estimate cardinality or selectivity of the filter expression, and columns (a, b, d) as part of the leaf nodes (included) so the engine can bring their values.
select a, b, c, d
from T
where b = @b and c = @c;
For this query, the optimal index will have columns (b, c) as part of the key, the order of the columns will depend on the number of unique values in the column, and columns (a, d) as included.
There is nothing stopping us from also using an index by (a, b, c, d) or any other order, but if you are not planning to join / filter by [a] or [d], then it is better having them as included, instead beeing part of the key, unless the combination of them can be used to uniquely identify a row in which case you will use all columns in the filter or join. Remember, for the key, SQL Server create intermediate nodes on the B-Tree in order to know the path of each value in the leaf nodes.
To finish, a covering index is an index that includes all columns referenced in a query. The columns could be part of the index key, or can included and be part of the leaf nodes only.
How do you include a column?
Using the INCLUDE clause, in the CREATE INDEX statement.
create nonclustered index idx_nc_T_b_c
on T(b, c) include (a, d);
i.e.
In the first one Column2 gets added to the index key. In the second one it might not ^(*) get added to the key in which case it will only appear in the index leaf pages. This can allow the index to be searched by Column1 but avoid the need to go back to the base table (bookmark lookup/key lookup) to retrieve the value for Column2.
i.e. it makes index2 "covering" for queries such as

SELECT Column1,Column2
FROM [dbo].[MyTable] 
WHERE Column1 = 'X'

And it also covers queries such as

SELECT Column1,Column2
FROM [dbo].[MyTable] 
WHERE Column1 = 'X' AND Column2 = 'Y'

But index1 may well perform better for the second query as it can seek on the two columns directly (as opposed to only being able to search on Column1 then needing to evaluate all matching rows at the index leaf level to see if they meet the Column2 predicate). If Column2 is never used as a search predicate against that index and your queries against that index wouldn't benefit from having Column2 ordered then it should be added as an INCLUDE-d column to keep the size of the key down and reduce the number of pages in the index.

A key column is the column(s) that the index is created on, the non-key column are included columns

CREATE NONCLUSTERED INDEX idx1 ON Table1 (Col1, Col2) INCLUDE (Col3, Col4)

In the above example, Col1 and Col2 are key columns, Col3 and Col4 are non-key columns

CREATE CLUSTERED INDEX cidx ON Table1 (Col1)

In the above example Col1 is the key column, and all other columns in the table are classed as non-key columns, as the clustered index is the table.

Clustered Index
Only 1 allowed per table
Physically rearranges the data in the table to conform to the index constraints
For use on columns that are frequently searched for ranges of data
For use on columns with low selectivity

Non-Clustered Index
Up to 249 allowed per table
Creates a separate list of key values with pointers to the location of the data in the data pages
For use on columns that are searched for single values
For use on columns with high selectivity

A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows

Included Column Index (New in SQL Server 2005)
In SQL Server 2005, the functionality of non-clustered indexes is extended by adding non-key columns to the leaf level of the non-clustered index. Non-key columns, can help to create cover indexes.By including non-key columns, you can create non-clustered indexes that cover more queries. The Database Engine does not consider non-key columns when calculating the number of index key columns or index key size. Non-key columns can be included in non-clustered index to avoid exceeding the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes. Another advantage is that using non-key column in index we can have index data types not allowed as index key columns generally.

In following example column FileName is varchar(400), which will increase the size of the index key bigger than it is allowed. If we still want to include in our cover index to gain performance we can do it by using the Keyword INCLUDE.

USE AdventureWorks

GO

CREATE INDEX IX_Document_Title

ON Production.Document (Title, Revision)

INCLUDE (FileName)

Non-key columns can be included only in non-clustered indexes. Columns can’t be defined in both the key column and the INCLUDE list. Column names can’t be repeated in the INCLUDE list. Non-key columns can be dropped from a table only after the non-key index is dropped first. For Included Column Index to exist there must be at least one key column defined with a maximum of 16 key columns and 1023 included columns.

Avoid adding unnecessary columns. Adding too many index columns, key or non-key as they will affect negatively on performance. Fewer index rows will fit on a page. This could create I/O increases and reduced cache efficiency. More disk space will be required to store the index. Index maintenance may increase the time that it takes to perform modifications, inserts, updates, or deletes, to the underlying table or indexed view.

Another example to test:
Create following Index on Database AdventureWorks in SQL SERVER 2005

USE AdventureWorks

GO

CREATE NONCLUSTERED INDEX IX_Address_PostalCode

ON Person.Address (PostalCode)

INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID)

GO

Test the performance of following query before and after creating Index. The performance improvement is significant.

SELECT AddressLine1, AddressLine2, City, StateProvinceID, PostalCode

FROM Person.Address

WHERE PostalCode BETWEEN '98000'

AND '99999';

GO

Limitations and Restrictions of indexes

Nonkey columns can only be defined on nonclustered indexes.
All data types except text, ntext, and image can be used as nonkey columns.
Computed columns that are deterministic and either precise or imprecise can be nonkey columns. For more information, see Indexes on Computed Columns.
Computed columns derived from image, ntext, and text data types can be nonkey columns as long as the computed column data type is allowed as a nonkey index column.
Nonkey columns cannot be dropped from a table unless that table’s index is dropped first.
Nonkey columns cannot be changed, except to do the following:
- Change the nullability of the column from NOT NULL to NULL.
- Increase the length of varchar, nvarchar, or varbinary columns.

Dot Net Interview Question Answer

Tuesday, 25 March 2014

Difference between covering and included Index

Limitations and Restrictions of indexes

No comments:

Post a Comment

About Me