|
|
||||||
|
#1
|
|
|
|
|
i have a table that has no index on a date column and a regular index
on a group_id and another on a customer_id column. the group_id will be somewhat volatile, lots of updates and and deletes on this table based on this column. this is the design i inherited and i need a little advice. table: orderqueue (id int identity primary key, group_id int, customer_id int not null, queuedate datetime not null default getdate()) clustered index on queuedate (new index i was going to add) non clustered index on customer_id non clustered index on group_id rows are inserted for customers and an outside process reads them and stamps them with a group_id. several customer records can share a group_id (anywhere from 1 to 50 per group_id). once processing is completed for customers in the group the rows are deleted queries being issued select * from orderqueue where customer_id is null order by queuedate insert into orderqueue (group_id, customer_id) values (null, @customer_id) update orderqueue set group_id = @group_id where group_id is null delete from orderqueue where group_id = @group_id the updates/deletes will be about 2:1 to selects. i didn't want to cluster on group_id/queuedate because i was afraid of blocking on the updates to the cluster index page. questions what is the effect on a cluster index when a non-clustered index column is updated (e.g. group_id is set to a value)? looking at query plan for the update, the majority of the time is being spent by an update in actual updating of the cluster index data page.... is that "good"? one test i ran showed about 35% of the time seeking rows and 65% of the time doing a "Clustered Index Update/Update". it was an update of 5 rows out of 15,000. any help is appreciated |
|
|
|
#2
|
|
|
|
|
Derek (gepetto_2000) writes:
> i have a table that has no index on a date column and a regular index > on a group_id and another on a customer_id column. the group_id will > be somewhat volatile, lots > of updates and and deletes on this table based on this column. > this is the design i inherited and i need a little advice. > > table: > orderqueue (id int identity primary key, group_id int, customer_id int > not null, queuedate > datetime not null default getdate()) > > clustered index on queuedate (new index i was going to add) > non clustered index on customer_id > non clustered index on group_id You did not say whether PK is clustered or not. But both id and queuedate are good choices for clustered indexes, since they are monotonically growing. But maybe it's better to use the id, and leave out queuedate as an un-indexed to keep down the number of indexes. One advantage of this is that id is four bytes, and an index on queuedate is would be twelve bytes (datetime + uniqifier), so this keeps down the size of the NC indexes. (Recall that the clustered index key is the row locator for the NC indexes.) > queries being issued > > select * from orderqueue where customer_id is null order by queuedate I assume that the WHERE condition is "group_id IS NULL"? Else the query does not make much sense, since customer_id is not nullable. It seems to me that the query would be more efficient, if the process querying this table, tracks the last id. When you could add AND id > @last_id > update orderqueue set group_id = @group_id where group_id is null As well as for this query. (Which would bring a change, since in the current process, it appears that rows inserted after the SELECT would also be assigned a group_id.) > the updates/deletes will be about 2:1 to selects. i didn't want to > cluster on group_id/queuedate because i was afraid of blocking on > the updates to the cluster index page. Yes, you would get a lot of page moves this way, since you also would have to update the index on customer_id and id. So keeping group_id out from the clustered index is a good idea. > what is the effect on a cluster index when a non-clustered index column > is updated (e.g. group_id is set to a value)? None. Of course, the data page is updated, but the index as such is not affected. > looking at query plan for the update, the majority of the time is being > spent by an update in actual updating of the cluster index data page.... > is that "good"? one test i ran showed about 35% of the time seeking > rows and 65% of the time doing a "Clustered Index Update/Update". it > was an update of 5 rows out of 15,000. It's probably a good thing if the update takes more time than seeking. That indicates that the indexes are good for locating the rows. |
|
#3
|
|
|
|
|
hi erland.... thank you and happy holidays to you
the pk is not indexed at all at the moment. i'm not strong on database since i am actually a java developer. you are right select * from orderqueue where customer_id is null order by queuedate is a mistake. it is actually select * from orderqueue where group_id is null order by queuedate. it is used to find all the queue records that need to be marked with a group_id. if i perform queries using order by queuedate should i make the pk clustered and the queuedate non clustered? or just leave out the queuedate altogether as an index? the results per retrieval would be small (between 1 - 100) for each select on the table. is it better to leave out the index and let the database do the sort? i am going to experiment a bit but i was looking for some advice on it. Erland Sommarskog wrote: [..] |
|
#4
|
|
|
|
|
Derek (gepetto_2000) writes:
> the pk is not indexed at all at the moment. i'm not strong on database > since i am actually a java developer. If you have create the table with id as a primary, then there is an index on id, because that is how PK constraints are implemented in SQL Server. And that index is clustered, since PKs are created as clustered by default (unless there is already is a clustered index at the time.) You can use sp_helpindex on the table to check this. > if i perform queries using order by queuedate should i make the pk > clustered and the queuedate non clustered? or just leave out the > queuedate altogether as an index? You could just as well order by id, which should be correlated with queuedate. Did you consider the alternative of saving the id for each fetch and adding it to the WHERE condition. |
|
#5
|
|
|
|
|
hi erland
yes it looks like the pk is a cluster index. i didn't know it was. i can't check for last id because there will be multiple java threads that can execute the queries and they won't know what the last id is. my new question for you, if my query is select * from orderqueue where group_id is null order by queuedate i was thinking of this scheme based on your recommendation id - (pk of table) - clustered customer_id - non clustered group_id - non clustered queuedate - non clustered assumptions select will be WHERE group_id is null ORDER BY queuedate updates/deletes will be WHERE group_id = @group_id in your professional opinion is this a good index design? or does it make sense to cluster to the queuedate and make the pk a non-cluster or will a cluster pk be sufficient enough to make the sorting happen fast? i understand the queuedate would make the non clusters bigger because of the datetime type. Erland Sommarskog wrote: [..] |
|
#6
|
|
|
|
|
Derek (gepetto_2000) writes:
> yes it looks like the pk is a cluster index. i didn't know it was. > > i can't check for last id because there will be multiple java threads > that can execute the queries and they won't know what the last id > is. Each thread should know what was the last id it got. (It does a bit funny to me. So multiple threads reads data from this queue and assign group ids? As far as I can understand multiple threads can read the same unassigned entries, even if only one of them will actually set a group id for it? By the way, are you on SQL 2000 or SQL 2005? If you are on SQL 2005, did you consider using Service Broker?) > my new question for you, if my query is > > select * from orderqueue where group_id is null order by queuedate I still think you should do order by id. If I understand this correctly, when you add new rows, queuedate is always assigned getdate(). This means that id and queuedate are fully correlated. But id would be better to sort on, because it's a smaller column. > i was thinking of this scheme based on your recommendation > > id - (pk of table) - clustered > customer_id - non clustered > group_id - non clustered > queuedate - non clustered Thinking of it, the index on group_id should probably be (group_id, id). Then this index can be used to retrieve the rows in the desired order. (This assumes that you order by id and not queuedate.) Note that since id is the clustered index, it is in the index of group_id anyway. Then again, if you only retrieve 50-100 rows at a time, I would not expect sorting to be very expensive. |
|
#7
|
|
|
|
|
hi erland thank you for your replies.
the java threads periodically check this queue table for new entries that have been inserted. the group_id is null on the insert. when the threads check the table it stamps them with a group_id so they are processed together by another program. when they are done being processed they are deleted (delete from orderqueue where group_id = @group_id) i did have a follow up question (i'm learning a lot - thank you!) if i do a lot of deletes, updates, and inserts on the table would the cluster on id (or queuedate) be ok since the data pages won't have to move as dramatically it is because the index is growing sequentially anyway and new rows are added to the end. ? do i understand that correctly? Erland Sommarskog wrote: [..] |
|
#8
|
|
|
|
|
Derek (gepetto_2000) writes:
> if i do a lot of deletes, updates, and inserts on the table would the > cluster on id (or queuedate) be ok since the data pages won't have to > move as dramatically it is because the index is growing sequentially > anyway and new rows are added to the end. ? do i understand that > correctly? Yes. You can still experience problems with fragmentation, is some old rows are left around and never deleted, but that fragmenation would only appear at the tail of the table. |
|
#9
|
|
|
|
|
thank you. i can not express enough how appreciative i am for your
advice. i have really learned a lot! is there a good book you can recommend on sql server internals? i need something that can explain to a non database person how data paging works, fragmentation, and so on. i am working on sql server 2005. Erland Sommarskog wrote: [..] |
|
#10
|
|
|
|
|
Derek (gepetto_2000) writes:
> thank you. i can not express enough how appreciative i am for your > advice. i have > really learned a lot! > > is there a good book you can recommend on sql server internals? i need > something > that can explain to a non database person how data paging works, > fragmentation, and so on. i am working on sql server 2005. The obvious recommendation would be "Inside SQL Server 2005: The Storage Engine" by Kalen Delaney. This book is certainly not for a beginner. Then again, you asked for an internals book... The book is actually the third in a series of four. The first two are about T-SQL Querying and T-SQL Programming and written by Itizk Ben-Gan. (The fourth is about tuning and optimization, but has not been published yet, I believe.) Itzik touches topics like indexing but does not go into the gory details. (He does go into the gory details of querying and programming, though!) |
|
|
| Similar Threads | |
| Advice on setting up a cluster Hello, I am currently setting up a 2 node W2K3 R2 cluster and hope someone could advise me on best practices for connecting the shared disk shelf. Each node has a SAS 5e... |
|
| Failed SQL Cluster Install Advice A quick rant - After days of study, hours of install, and some killer directions for Brad M McGehee I began to install a SQL Cluster on a Windows 2003 enterprise server... |
|
| Advice on cluster on windows Hi, First of all I need to say that I'm coming from java world so maybe there is a simple answer to my questions that I'm not aware of. What we need is to run existing... |
|
| re-Index advice We periodically rebuild our indexes when we notice defragmentation of indexes however after rebuilding the indexes we do not recompile the stored procedure. My experience has... |
|
| Advice Needed: Pocket Media Players (software advice) (If responding via email, remove _spam from my address shown) I seek advice (and crackz) on the best software for managing and playing MP3 files and movies on my HP Ipaq... |
|
|
All times are GMT. The time now is 11:06 PM. | Privacy Policy
|