I have been reading lately about how clustered index
and non-clustered index
works. My understanding in simple terms (correct me if wrong):
The data structure that backs clustered
and non-clustered index
is B-Tree
Clustered Index
: physically sorts the data based on the index column (or key). you can have only one clustered Index
per table
. If no index
is specified during table creation, SQL
server will automatically create a clustered Index
on the primary key column
.
Q1: Since data is physically sorted based on index, there is no extra space needed here. is this correct? so what happens when I drop the index I created?
Non-clustered Index
: In non-clustered indexes
, the leaf-node
of the tree contains the columns values and a pointer (row locator) to the actual row in the database. Here there is extra space needed for storing this non-clustered index table
physically on disk. However, one is not limited by the number of non-clustered Indexes.
Q2: Does it mean a query on non-clustered index column will not result in the sorted data?
Q3: There is an extra look-up associated here to locate the actual row data using the pointer at the leaf node. How much performance difference would this be when compared to a clustered index?
Excercise:
consider an Employee table:
CREATE TABLE Employee
(
PersonID int PRIMARY KEY,
Name varchar(255),
age int,
salary int
);
Now I created an employee table ( a default clustered index on employee is created).
Two frequent queries on this table happen only on age and salary columns. For sake of simplicity,
lets assume that the table is NOT frequently updated
for example:
select * from employee where age > XXX;
select * from employee where salary > XXXX and salary < YYYY;
Q4 : what is the best way to construct indexes, so that queries on both these column have similar performance. If I have clustered index on age queries on age column will be faster but than on salary column will be slower.
Q5: On a related note, I have repeatedly seen that indexes (both clustered and non-clustered) should be created on column with unique constraints. why is that? what will happen on failure to do this?
Thank you very much
Posts I read are here:
http://msdn.microsoft.com/en-us/library/ms190457.aspx
What do Clustered and Non clustered index actually mean?
What are the differences between a clustered and a non-clustered index?
How does database indexing work?
Answer
For SQL Server
Q1 Extra space is only needed for the clustered index if it is not unique. SQL Server will add a 4 byte uniquifier internally to a non-unique clustered index. This is because it uses the cluster key as a rowid in non-clustered indexes.
Q2 A non-clustered index can be read in order. That may aid queries where you specify an order. It may also make merge joins attractive. It will also help with range queries (x < col and y > col).
Q3 SQL Server does an extra "bookmark lookup" when using a non-clustered index. But, this is only if it needs a column that isn't in the index. Note also, that you can include
extra columns in the leaf level of indexs. If an index can be used without the additional lookup it is called a covering index.
If a bookmark lookup is required, it doesn't take a high percentage of rows until it's quicker just to scan the whole clustered index. The level depends on row size, key size etc. But 5% of rows is a typical cut off.
Q4 If the most important thing in your application was making both these queries as fast as possible, you could create covering index on both of them:
create index IX_1 on employee (age) include (name, salary);
create index IX_2 on employee (salary) include (name, age);
Note you don't have to specifically include the cluster key, as the non-clustered index has it as the row pointer.
Q5 This is more important for cluster keys than non-cluster keys due to the uniquifier. The real issue though is whether an index is selective or not for your queries. Imagine an index on a bit
value. Unless the distribution of data is very skewed, such an index is unlikely to be used for anything.
More info about the uniquifier. Imagine you and a non unique clustered index on age, and a non-clustered index on salary. Say you had the following rows:
age | salary | uniqifier
20 | 1000 | 1
20 | 2000 | 2
Then the salary index would locate rows like so
1000 -> 20, 1
2000 -> 20, 2
Say you ran the query select * from employee where salary = 1000
, and the optimizer chose to use the salary index. It would then find the pair (20, 1) from the index lookup, then lookup this value in the main data.
No comments:
Post a Comment