|
|
||||||
|
#1
|
|
|
|
|
Hi all,
I'm looking for a data structure for an associative array with the following properties: - Maintains key/value pairs. - Remembers the sequence order in which the keys arrive, similar to a static array. - Allows lookup, insertion, and deletion by both key and integer index. - Exhibits high performance (around O(log n) is fine) for lookups, inserts, and deletes. The best that I've able to come up with for such a data structure is a pair of binary search trees. The first tree is an ordinary self- balancing tree (e.g. a red-black tree) where the nodes are sorted based on the key. This is the tree that actually contains the data in the collection. The second tree is a dynamic order statistics tree (like the one described at http://pine.cs.yale.edu/pinewiki/OrderStatisticsTree) that contains a collection of keys. This second tree allows lookups by integer index to take place. The challenges that I'm faced with when implementing this data structure are as follows: - The order statistics tree assumes that the keys will be stored in a sorted order, as in an ordinary binary search tree. The tree needs to be modified so that the keys will be sorted based on the index in which the keys originally arrive, preserving the insertion order when inserts and deletes take place. I could sort the nodes using a sequence number, but I'll have to be able to handle inserting elements in the middle of the tree without bumping up sequence numbers beyond the insertion point (since that will degrade the performance of insertion down to O(n)). - The use of two trees for such a data structure requires two nodes to be stored in memory per key/value pair. I'm concerned about the memory use when storing a larger number of nodes in memory. I'm sure that this problem has been tackled before. I'm open to suggestions. Thanks, --naijireja |
|
|
|
#2
|
|
|
|
|
"A.J." wrote:
> > I'm looking for a data structure for an associative array with the > following properties: > > - Maintains key/value pairs. > - Remembers the sequence order in which the keys arrive, similar to a > static array. > - Allows lookup, insertion, and deletion by both key and integer > index. > - Exhibits high performance (around O(log n) is fine) for lookups, > inserts, and deletes. You can do all that with hashlib, and have O(1) performance. Take a look at how it is used in id2id-20. These are all in standard C, and available at: <http://cbfalconer.home.att.net/download/> |
|
#3
|
|
|
|
|
On May 26, 5:23 am, "A.J." <naijir> wrote:
[..] > insertion down to O(n)). > - The use of two trees for such a data structure requires two nodes to > be stored in memory per key/value pair. I'm concerned about the > memory use when storing a larger number of nodes in memory. > > I'm sure that this problem has been tackled before. I'm open to > suggestions. > > Thanks, > --naijireja CBFalconer's hashlib is excellent, but I'm guessing from your design that you need deletions to "collapse" the array index space so there are no holes. I can't see a way this can be done in O(1). The order stats tree is a nice O(log n) solution. You pay O(log n) on insert (with respect to a hash table e.g.) in order to allow deletions to be O(log n) rather than O(n). If you implement a self-balancing order stats tree, you're there. Since order stats need full descendent counts anyway, you can use these to enforce balance rules rather than the normal couple of bits. Your problem is really a non-problem. The _order stat_ is effectively the key, and that's never stored. The nodes contain no separate key field, just the descendent count. You always insert the newest node as the last (rightmost) element. When you do an index lookup, you are walking down the tree choosing the direction - left or right - that ensures the node with your desired index is somewhere below you. The main improvement I can suggest, as CBFalconer does, is to use a hash table for the key->data map rather than a second tree. You can keep hash table fields, tree link fields, descendent counts, and data all in the same node. CBF's hashlib will support this. Roughly this scheme is implemented in the back end of http://bridgecontest.usma.edu/ , where it is used to compute a given team's world-wide standing quickly, even with millions of teams in the database. The array index itself is the standing. Teams can be disqualified, which requires deletion. I needed persistence, so implementation is over BerkeleyDB, which has an option to create B-Trees with order-stat lookup. HTH |
|
#4
|
|
|
|
|
Gene wrote:
> On May 26, 5:23 am, "A.J." <naijir> wrote: > > > > I'm looking for a data structure for an associative array with the > > following properties: > > .... snip ... > > CBFalconer's hashlib is excellent, but I'm guessing from your design > that you need deletions to "collapse" the array index space so there > are no holes. I can't see a way this can be done in O(1). The order > stats tree is a nice O(log n) solution. You pay O(log n) on insert > (with respect to a hash table e.g.) in order to allow deletions to be > O(log n) rather than O(n). Thanks. However hashlib can do O(1) deletions. The hashwalk function allows compression to a single list at any time (O(N)). The result is invalid after further insertion/deletions, but will survive searches. It also seems to be accurate, in that I have had no reports of flaws in several years. |
|
#5
|
|
|
|
|
On May 26, 2:17 pm, CBFalconer <cbfalco> wrote:
[..] > > -- > <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt> > <[..] > <http://www.aaxnet.com/editor/edit043.html> > <http://kadaitcha.cx/vista/dogsbreakfast/index.html> > cbfalconer at maineline dot net > > -- > Posted via a free Usenet account fromhttp://www.teranews.com Thanks. But I guess I'm not clear on what you're proposing. If you add A, C, D to initially empty x then x[1]=A, x[2]=C, x[3]=D. Now delete A. You are saying that moving C and D to positions x[1] and x[2] will require O(n). Yes? With the order statistics tree, the deletion remains O(log n). That's what I was getting at. |
|
#6
|
|
|
|
|
Gene wrote:
> On May 26, 2:17 pm, CBFalconer <cbfalco> wrote: > > Thanks. But I guess I'm not clear on what you're proposing. If you > add A, C, D to initially empty x then x[1]=A, x[2]=C, x[3]=D. Now > delete A. You are saying that moving C and D to positions x[1] and > x[2] will require O(n). Yes? With the order statistics tree, the > deletion remains O(log n). That's what I was getting at. A hash table is NOT an array. Things don't normally get moved. Get it and look. It's not that big. |
|
#7
|
|
|
|
|
On May 28, 8:04 pm, CBFalconer <cbfalco> wrote:
[..] > <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt> > <http://www.securityfocus.com/columnists/423> > <http://www.aaxnet.com/editor/edit043.html> > <http://kadaitcha.cx/vista/dogsbreakfast/index.html> > cbfalconer at maineline dot net > > -- > Posted via a free Usenet account fromhttp://www.teranews.com- Hide quoted text - > > - Show quoted text - Thanks. I guess I'm not communicating very well. I had looked at your package long before this conversation. It's nice. I have implemented similar designs myself. The OP however is asking for a data structure that can be indexed in order of insertion _with deletion_. When item K is deleted from an array of N items, items at index positions K+1 to N must be "moved down" to fill the hole to maintain the orginal order, and the new array length is N-1. With a simple array, this is an O(N) operation. With an order stats tree index, you don't need O(N). Time O(log N) per operation is enough to delete from a balanced tree -- a standard tradeoff. Inserts get more expensive than the array, O(1) to O(log N), but deletes get cheaper, O(N) to O(log N). With a hash table mapping indices to values, deleting a hash element ought to be O(1). It obviously is with yours, since you just mark items deleted in a closed table. But still leaves the "move down". I don't see how this can be accomplished with anything less than O(N) -- looking up items K+1 to N, deleting, and reinserting with new keys. Again, this is the OP's requirement, not mine. If you have a way, I'm truly interested. |
|
|
| Similar Threads | |
| Regular Expression Result as Associative Array Index Hi Everyone. $layout=ereg_replace("\[\[([a-z:0-9]+)\]\]", $dillo['\\1'], $layout); I'm trying to use the result of a Regular Expression Match as an Index for the... |
|
| How to query the index value of associative array Hello PowerShellers, How do I return the index value of an associative array? I'm referencing "Pro Windows PowerShell" by Hristo Deshev, p. 18-24 Problem 1: Array and... |
|
| finding the index name of an associative array myarray=array() myarray['a']=1 myarray['b']=1 myarray['c']=1 Is there an iterative way to find out the array index values ('a', 'b' and 'c') of myarray? |
|
| Creating Tree Structure from associative array Hi all, I have an associative array, which contains parent and child relationships. I've searched the web for creating a tree structure from this and found a few good sites... |
|
| data structure, fast lookup for recently found items? Hello Is there is any data structure that once you search for an item, it will put that item at the top. If you keep searching for that same item it will be returned... |
|
|
All times are GMT. The time now is 06:32 PM. | Privacy Policy
|