SQL World: When we will go for clustered and Non-clustered Indexes

1. Indexes.

Microsoft SQL Server index is a structure associated with a table that speeds retrieval of the rows in the table. An index contains keys built from one or more columns in the table. These keys are stored in a structure that allows SQL Server to find the row or rows associated with the key values quickly and efficiently.

If a table is created with no indexes, the data rows are not stored in any particular order. This structure is called a heap.

The two types of SQL Server indexes are:

Clustered

Clustered indexes sort and store the data rows in the table based on their key values. Because the data rows are stored in sorted order on the clustered index key, clustered indexes are efficient for finding rows. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. The data rows themselves form the lowest level of the clustered index.

The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. If a table has no clustered index, its data rows are stored in a heap.

Nonclustered

Nonclustered indexes have a structure that is completely separate from the data rows. The lowest rows of a nonclustered index contain the nonclustered index key values and each key value entry has pointers to the data rows containing the key value. The data rows are not stored in order based on the nonclustered key.

The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or are clustered. For a heap, a row locator is a pointer to the row. For a table with a clustered index, the row locator is the clustered index key

2. When we will go for clustered and Non-clustered Indexes.

Indexes assist when a query:

· Searches for rows that match a specific search key value (an exact match query). An exact match comparison is one in which the query uses the WHERE statement to specify a column entry with a given value. For example:

WHERE emp_id = 'VPA30890F'

· Searches for rows with search key values in a range of values (a range query). A range query is one in which the query specifies any entry whose value is between two values. For example:

WHERE job_lvl BETWEEN 9 and 12

or,

WHERE job_lvl >= 9 and job_lvl <= 12

· Searches for rows in a table T1 that match, based on a join predicate, a row in another table T2 (an index nested loops join).

· Produces sorted query output without an explicit sort operation, in particular for sorted dynamic cursors.

· Scans rows in a sorted order to permit an order-based operation, such as merge join and stream aggregation, without an explicit sort operation.

· Scans all rows in a table with better performance than a table scan, due to the reduced column set and overall data volume to be scanned (a covering index for the query at hand).

· Searches for duplicates of new search key values in insert and update operations, to enforce PRIMARY KEY and UNIQUE constraints.

· Searches for matching rows between two tables for which a FOREIGN KEY constraint is defined.

Queries using LIKE comparisons can benefit from an index if the pattern starts with a specific character string, for example 'abc%', but not if the pattern starts with a wildcard search, for example '%xyz'.

Consider using nonclustered indexes for:

· Columns that contain a high number of distinct values, such as a combination of last name and first name (if a clustered index is used for other columns). If there are very few distinct values, such as only 1 and 0, no index should be created.

· Queries that do not return large result sets.

· Columns frequently involved in search conditions of a query (WHERE clause) that return exact matches.

· Decision Support System applications for which joins and grouping are frequently required. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.

· Covered queries

Consider using a clustered index for:

· Columns that contain a limited number of distinct values, such as a state column that contains only 50 distinct state codes. However, if there are very few distinct values, such as only 1 and 0, no index should be created.

· Queries that return a range of values using operators such as BETWEEN, >, >=, <, and <=.

· Columns that are accessed sequentially.

· Queries that return large result sets.

· Columns that are frequently accessed by queries involving join or GROUP BY clauses; typically these are foreign key columns. An index on the column(s) specified in the ORDER BY or GROUP BY clause eliminates the need for SQL Server to sort the data because the rows are already sorted. This improves query performance.

· OLTP-type applications where very fast single row lookup is required, typically by means of the primary key. Create a clustered index on the primary key.

Clustered indexes are not a good choice for:

· Columns that undergo frequent changes because this results in the entire row moving (because SQL Server must keep the row’s data values in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.

· Covered queries. The more columns within the search key, the greater the chance for the data in the indexed column to change, resulting in additional I/O.

Index Optimization tips

Every index increases the time in takes to perform INSERTS, UPDATES and DELETES, so the number of indexes should not be very much. Try to use maximum 4-5 indexes on one table, not more. If you have read-only table, then the number of indexes may be increased.
Keep your indexes as narrow as possible. This reduces the size of the index and reduces the number of reads required to read the index.
Try to create indexes on columns that have integer values rather than character values.
If you create a composite (multi-column) index, the orders of the columns in the key are very important. Try to order the columns in the key as to enhance selectivity, with the most selective columns to the left most of the key.
If you want to join several tables, try to create surrogate integer keys for this purpose and create indexes on their columns.
Create surrogate integer primary key (identity for example) if your table will not have many insert operations.
Clustered indexes are more preferable than no clustered, if you need to select by a range of values or you need to sort results set with GROUP BY or ORDER BY.
If your application will be performing the same query over and over on the same table, consider creating a covering index on the table.
You can use the SQL Server Profiler Create Trace Wizard with "Identify Scans of Large Tables" trace to determine which tables in your database may need indexes. This trace will show which tables are being scanned by queries instead of using an index.
You can use sp_MSforeachtable undocumented stored procedure to rebuild all indexes in your database. Try to schedule it to execute during CPU idle time and slow production periods.
sp_MSforeachtable @command1="print '?' DBCC DBREINDEX ('?')"