keyongtech


  keyongtech > comp.programming

 #1  
05-26-07, 10:23 AM
A.J.
Hi all,

I'm looking for a data structure for an associative array with the
following properties:

- Maintains key/value pairs.
- Remembers the sequence order in which the keys arrive, similar to a
static array.
- Allows lookup, insertion, and deletion by both key and integer
index.
- Exhibits high performance (around O(log n) is fine) for lookups,
inserts, and deletes.

The best that I've able to come up with for such a data structure is a
pair of binary search trees. The first tree is an ordinary self-
balancing tree (e.g. a red-black tree) where the nodes are sorted
based on the key. This is the tree that actually contains the data in
the collection. The second tree is a dynamic order statistics tree
(like the one described at http://pine.cs.yale.edu/pinewiki/OrderStatisticsTree)
that contains a collection of keys. This second tree allows lookups
by integer index to take place.

The challenges that I'm faced with when implementing this data
structure are as follows:
- The order statistics tree assumes that the keys will be stored in a
sorted order, as in an ordinary binary search tree. The tree needs to
be modified so that the keys will be sorted based on the index in
which the keys originally arrive, preserving the insertion order when
inserts and deletes take place. I could sort the nodes using a
sequence number, but I'll have to be able to handle inserting elements
in the middle of the tree without bumping up sequence numbers beyond
the insertion point (since that will degrade the performance of
insertion down to O(n)).
- The use of two trees for such a data structure requires two nodes to
be stored in memory per key/value pair. I'm concerned about the
memory use when storing a larger number of nodes in memory.

I'm sure that this problem has been tackled before. I'm open to
suggestions.

Thanks,
--naijireja
 #2  
05-26-07, 02:58 PM
CBFalconer
"A.J." wrote:
>
> I'm looking for a data structure for an associative array with the
> following properties:
>
> - Maintains key/value pairs.
> - Remembers the sequence order in which the keys arrive, similar to a
> static array.
> - Allows lookup, insertion, and deletion by both key and integer
> index.
> - Exhibits high performance (around O(log n) is fine) for lookups,
> inserts, and deletes.


You can do all that with hashlib, and have O(1) performance. Take
a look at how it is used in id2id-20. These are all in standard C,
and available at:

<http://cbfalconer.home.att.net/download/>
 #3  
05-26-07, 04:31 PM
Gene
On May 26, 5:23 am, "A.J." <naijir> wrote:
[..]
> insertion down to O(n)).
> - The use of two trees for such a data structure requires two nodes to
> be stored in memory per key/value pair. I'm concerned about the
> memory use when storing a larger number of nodes in memory.
>
> I'm sure that this problem has been tackled before. I'm open to
> suggestions.
>
> Thanks,
> --naijireja


CBFalconer's hashlib is excellent, but I'm guessing from your design
that you need deletions to "collapse" the array index space so there
are no holes. I can't see a way this can be done in O(1). The order
stats tree is a nice O(log n) solution. You pay O(log n) on insert
(with respect to a hash table e.g.) in order to allow deletions to be
O(log n) rather than O(n).

If you implement a self-balancing order stats tree, you're there.
Since order stats need full descendent counts anyway, you can use
these to enforce balance rules rather than the normal couple of bits.

Your problem is really a non-problem. The _order stat_ is effectively
the key, and that's never stored. The nodes contain no separate key
field, just the descendent count. You always insert the newest node as
the last (rightmost) element. When you do an index lookup, you are
walking down the tree choosing the direction - left or right - that
ensures the node with your desired index is somewhere below you.

The main improvement I can suggest, as CBFalconer does, is to use a
hash table for the key->data map rather than a second tree. You can
keep hash table fields, tree link fields, descendent counts, and data
all in the same node. CBF's hashlib will support this.

Roughly this scheme is implemented in the back end of http://bridgecontest.usma.edu/
, where it is used to compute a given team's world-wide standing
quickly, even with millions of teams in the database. The array index
itself is the standing. Teams can be disqualified, which requires
deletion. I needed persistence, so implementation is over BerkeleyDB,
which has an option to create B-Trees with order-stat lookup.

HTH
 #4  
05-26-07, 07:17 PM
CBFalconer
Gene wrote:
> On May 26, 5:23 am, "A.J." <naijir> wrote:
> >
> > I'm looking for a data structure for an associative array with the
> > following properties:
> >

.... snip ...
>
> CBFalconer's hashlib is excellent, but I'm guessing from your design
> that you need deletions to "collapse" the array index space so there
> are no holes. I can't see a way this can be done in O(1). The order
> stats tree is a nice O(log n) solution. You pay O(log n) on insert
> (with respect to a hash table e.g.) in order to allow deletions to be
> O(log n) rather than O(n).


Thanks. However hashlib can do O(1) deletions. The hashwalk
function allows compression to a single list at any time (O(N)).
The result is invalid after further insertion/deletions, but will
survive searches. It also seems to be accurate, in that I have had
no reports of flaws in several years.
 #5  
05-28-07, 10:22 PM
Gene
On May 26, 2:17 pm, CBFalconer <cbfalco> wrote:
[..]
>
> --
> <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
> <[..]
> <http://www.aaxnet.com/editor/edit043.html>
> <http://kadaitcha.cx/vista/dogsbreakfast/index.html>
> cbfalconer at maineline dot net
>
> --
> Posted via a free Usenet account fromhttp://www.teranews.com


Thanks. But I guess I'm not clear on what you're proposing. If you
add A, C, D to initially empty x then x[1]=A, x[2]=C, x[3]=D. Now
delete A. You are saying that moving C and D to positions x[1] and
x[2] will require O(n). Yes? With the order statistics tree, the
deletion remains O(log n). That's what I was getting at.
 #6  
05-29-07, 01:04 AM
CBFalconer
Gene wrote:
> On May 26, 2:17 pm, CBFalconer <cbfalco> wrote:
>
> Thanks. But I guess I'm not clear on what you're proposing. If you
> add A, C, D to initially empty x then x[1]=A, x[2]=C, x[3]=D. Now
> delete A. You are saying that moving C and D to positions x[1] and
> x[2] will require O(n). Yes? With the order statistics tree, the
> deletion remains O(log n). That's what I was getting at.


A hash table is NOT an array. Things don't normally get moved.
Get it and look. It's not that big.
 #7  
05-29-07, 03:23 AM
Gene
On May 28, 8:04 pm, CBFalconer <cbfalco> wrote:
[..]
> <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
> <http://www.securityfocus.com/columnists/423>
> <http://www.aaxnet.com/editor/edit043.html>
> <http://kadaitcha.cx/vista/dogsbreakfast/index.html>
> cbfalconer at maineline dot net
>
> --
> Posted via a free Usenet account fromhttp://www.teranews.com- Hide quoted text -
>
> - Show quoted text -


Thanks. I guess I'm not communicating very well. I had looked at your
package long before this conversation. It's nice. I have implemented
similar designs myself.

The OP however is asking for a data structure that can be indexed in
order of insertion _with deletion_. When item K is deleted from an
array of N items, items at index positions K+1 to N must be "moved
down" to fill the hole to maintain the orginal order, and the new
array length is N-1. With a simple array, this is an O(N)
operation.

With an order stats tree index, you don't need O(N). Time O(log N)
per operation is enough to delete from a balanced tree -- a standard
tradeoff. Inserts get more expensive than the array, O(1) to O(log
N), but deletes get cheaper, O(N) to O(log N).

With a hash table mapping indices to values, deleting a hash element
ought to be O(1). It obviously is with yours, since you just mark
items deleted in a closed table. But still leaves the "move down". I
don't see how this can be accomplished with anything less than O(N) --
looking up items K+1 to N, deleting, and reinserting with new keys.

Again, this is the OP's requirement, not mine.

If you have a way, I'm truly interested.
Similar Threads
Regular Expression Result as Associative Array Index

Hi Everyone. $layout=ereg_replace("\[\[([a-z:0-9]+)\]\]", $dillo['\\1'], $layout); I'm trying to use the result of a Regular Expression Match as an Index for the...

How to query the index value of associative array

Hello PowerShellers, How do I return the index value of an associative array? I'm referencing "Pro Windows PowerShell" by Hristo Deshev, p. 18-24 Problem 1: Array and...

finding the index name of an associative array

myarray=array() myarray['a']=1 myarray['b']=1 myarray['c']=1 Is there an iterative way to find out the array index values ('a', 'b' and 'c') of myarray?

Creating Tree Structure from associative array

Hi all, I have an associative array, which contains parent and child relationships. I've searched the web for creating a tree structure from this and found a few good sites...

data structure, fast lookup for recently found items?

Hello Is there is any data structure that once you search for an item, it will put that item at the top. If you keep searching for that same item it will be returned...


All times are GMT. The time now is 06:32 PM. | Privacy Policy