keyongtech


  keyongtech > sqlserver.* > sqlserver.programming > 12/2006

 #1  
12-24-06, 01:56 PM
Derek
i have a table that has no index on a date column and a regular index
on a group_id and another on a customer_id column. the group_id will
be somewhat volatile, lots
of updates and and deletes on this table based on this column.
this is the design i inherited and i need a little advice.

table:
orderqueue (id int identity primary key, group_id int, customer_id int
not null, queuedate
datetime not null default getdate())

clustered index on queuedate (new index i was going to add)
non clustered index on customer_id
non clustered index on group_id

rows are inserted for customers and an outside process reads them and
stamps
them with a group_id. several customer records can share a group_id
(anywhere from 1 to 50 per
group_id). once processing is completed for customers in the group the
rows are deleted

queries being issued

select * from orderqueue where customer_id is null order by queuedate
insert into orderqueue (group_id, customer_id) values (null,
@customer_id)
update orderqueue set group_id = @group_id where group_id is null
delete from orderqueue where group_id = @group_id

the updates/deletes will be about 2:1 to selects. i didn't want to
cluster on group_id/queuedate because i was afraid of blocking on
the updates to the cluster index page.

questions

what is the effect on a cluster index when a non-clustered index column
is
updated (e.g. group_id is set to a value)?

looking at query plan for the update, the majority of the time is being
spent by an
update in actual updating of the cluster index data page.... is that
"good"? one
test i ran showed about 35% of the time seeking rows and 65% of the
time
doing a "Clustered Index Update/Update". it was an update of 5 rows
out of
15,000.

any help is appreciated
 #2  
12-24-06, 03:25 PM
Erland Sommarskog
Derek (gepetto_2000) writes:
> i have a table that has no index on a date column and a regular index
> on a group_id and another on a customer_id column. the group_id will
> be somewhat volatile, lots
> of updates and and deletes on this table based on this column.
> this is the design i inherited and i need a little advice.
>
> table:
> orderqueue (id int identity primary key, group_id int, customer_id int
> not null, queuedate
> datetime not null default getdate())
>
> clustered index on queuedate (new index i was going to add)
> non clustered index on customer_id
> non clustered index on group_id


You did not say whether PK is clustered or not. But both id and queuedate
are good choices for clustered indexes, since they are monotonically
growing. But maybe it's better to use the id, and leave out queuedate
as an un-indexed to keep down the number of indexes. One advantage of
this is that id is four bytes, and an index on queuedate is would be
twelve bytes (datetime + uniqifier), so this keeps down the size of
the NC indexes. (Recall that the clustered index key is the row locator
for the NC indexes.)

> queries being issued
>
> select * from orderqueue where customer_id is null order by queuedate


I assume that the WHERE condition is "group_id IS NULL"? Else the
query does not make much sense, since customer_id is not nullable.

It seems to me that the query would be more efficient, if the process
querying this table, tracks the last id. When you could add

AND id > @last_id

> update orderqueue set group_id = @group_id where group_id is null


As well as for this query. (Which would bring a change, since in the
current process, it appears that rows inserted after the SELECT would
also be assigned a group_id.)

> the updates/deletes will be about 2:1 to selects. i didn't want to
> cluster on group_id/queuedate because i was afraid of blocking on
> the updates to the cluster index page.


Yes, you would get a lot of page moves this way, since you also would
have to update the index on customer_id and id. So keeping group_id
out from the clustered index is a good idea.

> what is the effect on a cluster index when a non-clustered index column
> is updated (e.g. group_id is set to a value)?


None. Of course, the data page is updated, but the index as such is not
affected.

> looking at query plan for the update, the majority of the time is being
> spent by an update in actual updating of the cluster index data page....
> is that "good"? one test i ran showed about 35% of the time seeking
> rows and 65% of the time doing a "Clustered Index Update/Update". it
> was an update of 5 rows out of 15,000.


It's probably a good thing if the update takes more time than seeking.
That indicates that the indexes are good for locating the rows.
 #3  
12-24-06, 08:27 PM
Derek
hi erland.... thank you and happy holidays to you

the pk is not indexed at all at the moment. i'm not strong on database
since i am
actually a java developer.

you are right select * from orderqueue where customer_id is null order
by queuedate
is a mistake.
it is actually select * from orderqueue where group_id is null order by
queuedate. it is used
to find all the queue records that need to be marked with a group_id.

if i perform queries using order by queuedate should i make the pk
clustered and
the queuedate non clustered? or just leave out the queuedate altogether
as an index?
the results per retrieval would be small (between 1 - 100) for each
select on the table. is it better to leave out the index and let the
database do the sort?
i am going to experiment a bit but i was looking for some advice on it.

Erland Sommarskog wrote:
[..]
 #4  
12-24-06, 09:27 PM
Erland Sommarskog
Derek (gepetto_2000) writes:
> the pk is not indexed at all at the moment. i'm not strong on database
> since i am actually a java developer.


If you have create the table with id as a primary, then there is an
index on id, because that is how PK constraints are implemented in SQL
Server. And that index is clustered, since PKs are created as clustered
by default (unless there is already is a clustered index at the time.)

You can use sp_helpindex on the table to check this.

> if i perform queries using order by queuedate should i make the pk
> clustered and the queuedate non clustered? or just leave out the
> queuedate altogether as an index?


You could just as well order by id, which should be correlated with
queuedate. Did you consider the alternative of saving the id for each
fetch and adding it to the WHERE condition.
 #5  
12-25-06, 03:02 PM
Derek
hi erland

yes it looks like the pk is a cluster index. i didn't know it was.

i can't check for last id because there will be multiple java threads
that can execute the queries and they won't know what the last id
is.

my new question for you, if my query is

select * from orderqueue where group_id is null order by queuedate


i was thinking of this scheme based on your recommendation

id - (pk of table) - clustered
customer_id - non clustered
group_id - non clustered
queuedate - non clustered

assumptions
select will be WHERE group_id is null ORDER BY queuedate
updates/deletes will be WHERE group_id = @group_id

in your professional opinion is this a good index design?

or does it make sense to cluster to the queuedate and make the pk
a non-cluster or will a cluster pk be sufficient enough to make the
sorting happen fast? i understand the queuedate would make the
non clusters bigger because of the datetime type.

Erland Sommarskog wrote:
[..]
 #6  
12-25-06, 04:44 PM
Erland Sommarskog
Derek (gepetto_2000) writes:
> yes it looks like the pk is a cluster index. i didn't know it was.
>
> i can't check for last id because there will be multiple java threads
> that can execute the queries and they won't know what the last id
> is.


Each thread should know what was the last id it got.

(It does a bit funny to me. So multiple threads reads data from this
queue and assign group ids? As far as I can understand multiple threads
can read the same unassigned entries, even if only one of them will
actually set a group id for it? By the way, are you on SQL 2000 or
SQL 2005? If you are on SQL 2005, did you consider using Service
Broker?)

> my new question for you, if my query is
>
> select * from orderqueue where group_id is null order by queuedate


I still think you should do order by id. If I understand this
correctly, when you add new rows, queuedate is always assigned getdate().
This means that id and queuedate are fully correlated. But id would be
better to sort on, because it's a smaller column.

> i was thinking of this scheme based on your recommendation
>
> id - (pk of table) - clustered
> customer_id - non clustered
> group_id - non clustered
> queuedate - non clustered


Thinking of it, the index on group_id should probably be (group_id, id).
Then this index can be used to retrieve the rows in the desired order.
(This assumes that you order by id and not queuedate.) Note that since id
is the clustered index, it is in the index of group_id anyway.

Then again, if you only retrieve 50-100 rows at a time, I would not expect
sorting to be very expensive.
 #7  
12-25-06, 10:11 PM
Derek
hi erland thank you for your replies.

the java threads periodically check this queue table for new entries
that have
been inserted. the group_id is null on the insert. when the threads
check
the table it stamps them with a group_id so they are processed together
by
another program. when they are done being processed they are deleted
(delete from orderqueue where group_id = @group_id)

i did have a follow up question (i'm learning a lot - thank you!)

if i do a lot of deletes, updates, and inserts on the table would the
cluster on id (or queuedate) be ok since the data pages won't have to
move as dramatically it is because the index is growing sequentially
anyway and new rows are added to the end. ? do i understand that
correctly?



Erland Sommarskog wrote:
[..]
 #8  
12-25-06, 11:19 PM
Erland Sommarskog
Derek (gepetto_2000) writes:
> if i do a lot of deletes, updates, and inserts on the table would the
> cluster on id (or queuedate) be ok since the data pages won't have to
> move as dramatically it is because the index is growing sequentially
> anyway and new rows are added to the end. ? do i understand that
> correctly?


Yes.

You can still experience problems with fragmentation, is some old rows
are left around and never deleted, but that fragmenation would only appear
at the tail of the table.
 #9  
12-26-06, 02:35 PM
Derek
thank you. i can not express enough how appreciative i am for your
advice. i have
really learned a lot!

is there a good book you can recommend on sql server internals? i need
something
that can explain to a non database person how data paging works,
fragmentation, and so on. i am working on sql server 2005.


Erland Sommarskog wrote:
[..]
 #10  
12-26-06, 03:11 PM
Erland Sommarskog
Derek (gepetto_2000) writes:
> thank you. i can not express enough how appreciative i am for your
> advice. i have
> really learned a lot!
>
> is there a good book you can recommend on sql server internals? i need
> something
> that can explain to a non database person how data paging works,
> fragmentation, and so on. i am working on sql server 2005.


The obvious recommendation would be "Inside SQL Server 2005: The Storage
Engine" by Kalen Delaney. This book is certainly not for a beginner. Then
again, you asked for an internals book...

The book is actually the third in a series of four. The first two are about
T-SQL Querying and T-SQL Programming and written by Itizk Ben-Gan. (The
fourth is about tuning and optimization, but has not been published yet,
I believe.) Itzik touches topics like indexing but does not go into the
gory details. (He does go into the gory details of querying and programming,
though!)
Similar Threads
Advice on setting up a cluster

Hello, I am currently setting up a 2 node W2K3 R2 cluster and hope someone could advise me on best practices for connecting the shared disk shelf. Each node has a SAS 5e...

Failed SQL Cluster Install Advice

A quick rant - After days of study, hours of install, and some killer directions for Brad M McGehee I began to install a SQL Cluster on a Windows 2003 enterprise server...

Advice on cluster on windows

Hi, First of all I need to say that I'm coming from java world so maybe there is a simple answer to my questions that I'm not aware of. What we need is to run existing...

re-Index advice

We periodically rebuild our indexes when we notice defragmentation of indexes however after rebuilding the indexes we do not recompile the stored procedure. My experience has...

Advice Needed: Pocket Media Players (software advice)

(If responding via email, remove _spam from my address shown) I seek advice (and crackz) on the best software for managing and playing MP3 files and movies on my HP Ipaq...


All times are GMT. The time now is 11:06 PM. | Privacy Policy