Slow Collections (VB)

Grimfort

Regular
Joined
Mar 6, 2003
Messages
57
I have a collection in my project (it's very large!) that acts as a queue
at runtime.
I add a few thousand objects to it, and later on remove them. This happens
many, many times (hundreds of thousands) over a long period. I can measure
how many loops my program has been through and how long each one takes, and
after 12 hours of running the same loop, each iteration is taking longer and
longer.

I completely empty the collection of data each time the loop starts, and it
still gets slower and slower over time. If, after hours of running, I set
my collection to a New Collection, it starts running at full speed again.
I've tested things all over the system and know it's the collection slowing
it down. It takes a good few hours (and many, many jobs) for it to slow
down, but my software is built to run whole infrastructures and cannot
afford to slow down. It runs 4000 jobs in the first hour; by the time you
get 12 hours in, it's down to 1000 jobs an hour. Resetting the collection
to a new one makes it go back up to 4000. If need be, I'll create a new
collection once an hour and shift things over, but I thought I would check
in here first to see if anyone has seen or heard of anything like this.

Also to note, the memory is fine. About 200 MB constant; it's designed to
grow to an adequate size and then stay there, so that's not an issue.
 
Sometimes the garbage collector needs a little help, so if, after you've removed a lot of items, you make a call to GC.Collect(), that might well help things along too.
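For example, a minimal sketch of that suggestion (the batch size and keys are made up for illustration, and note that forcing a collection is usually a last resort):

```vb
Dim work As New Collection()

' Queue up a batch of items keyed by job id.
For i As Integer = 1 To 1000
    work.Add(New Object(), "job" & i.ToString())
Next

' Drain the batch again.
For i As Integer = 1 To 1000
    work.Remove("job" & i.ToString())
Next

' Nudge the garbage collector after the bulk removal.
GC.Collect()
GC.WaitForPendingFinalizers()
```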
 
It's just the standard Collection object. I'll give both your ideas a twirl, though; it takes ages to test :).

The collection is just a holder of current actions; things get added and removed constantly. Memory doesn't go up, so I can only guess that objects are being unloaded properly. Even if I completely empty the collection and then start the loop again, it's still slow. Hmmmm....
 
Removing the items will set the objects up for destruction, but the collection, as I understand it, will still maintain all the pointers it needed to manage the collection at its largest size.

btw. 'standard collection object'? If it's a FIFO-type collection, why not use Queue? And what class are you using, as I don't see a Collection class.
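To illustrate the suggestion, a quick sketch of Queue for a genuinely first-in/first-out pattern (job names are invented):

```vb
Imports System.Collections

' If the access pattern really were first-in/first-out,
' Queue would be the natural fit:
Dim pending As New Queue()
pending.Enqueue("job1")
pending.Enqueue("job2")

' Dequeue returns items in the order they were added.
Dim nextJob As Object = pending.Dequeue() ' "job1"
```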
 
ActionsTableWaiting = New Collection()

^ one of those :) The actions are not really used as FIFO or FILO; it's random add/remove, by key.
 
I suppose the following:

As the collection allows searching by key, it needs supporting data for efficient searching: a balanced B*-tree or something like that. Such a tree does not require much space; even a regular balanced binary tree would only need a depth of about 20 for 1 million objects (2^20 ≈ 1,000,000).

However, after many insert/remove operations the tree will probably get out of balance. That means the search algorithms lose much of their efficiency, and inserting new keys may be even worse.

I suppose that is what you are experiencing.


***

Conclusion:
Wait for .NET to handle this problem better.
OR
Give it a try with another object, something better suited to your needs than a Collection
OR
Refresh / Renew every hour.
 
I give MS more credit than that. I don't claim to know what the underlying structure of the Collection class is (but I doubt it's a tree). Even if it is a B*Tree or one of the many variants, they are *very* mature structures that stay balanced by definition (which is why they are behind most commercial RDBMSs) -- so methinks it's highly unlikely that's the issue.
 
If you're storing key/value pairs, you could consider the Hashtable class. Any list class will likely be better than the Collection class, which is only provided in VB for compatibility :)
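A minimal sketch of what that swap might look like (the variable and key names here are illustrative, not from the original project):

```vb
Imports System.Collections

' Hypothetical replacement for the keyed Collection usage described above.
Dim actionsWaiting As New Hashtable()

' Add and remove by key; both are O(1) on average,
' regardless of how many items have passed through before.
actionsWaiting.Add("job42", New Object())
actionsWaiting.Remove("job42")
```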
 
Surely some sort of tree would provide the most efficient way to search a collection by key. And B*-trees being mature structures does not necessarily imply that MS has implemented them properly.

Whatever means MS has chosen to allow efficient "key searching" in collections, it appears to me that this very mechanism is a possible and plausible cause of the delays, because it consumes more and more computing power over time.
 
The problem with that argument is that if the key-searching algorithm were the pig, then the rate of decrease in performance would be directly related to the rate of growth of the data, and likewise when the data shrinks. He's not seeing that. He takes away all the elements, re-adds some, and it is this process that seems to be causing the performance issues.

C'mon, I think MS has figured out by now how to implement a B*-tree properly. That's surely kid's play to them.
 
Ahhh, all the theories are coming out now :) This is good stuff.
As it takes so long to test this thing, I'll try them out over the weekend on a few different machines. I'll post any info I find.
 
After a lot of messing about, I've replaced my Collection with a Hashtable. It does have downsides, like the fact that you have to say

For Each Blah In HashTable.Values

and that took me ages to figure out :), and also the fact that no error is thrown if you look up an entry by a key that does not exist. Anything else I should know?
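For reference, a small sketch of those two gotchas (table contents are made up for illustration):

```vb
Imports System.Collections

Dim table As New Hashtable()
table.Add("a", "first action")
table.Add("b", "second action")

' Iterating the stored items goes through .Values (or .Keys,
' or DictionaryEntry pairs) rather than the table itself:
For Each item As Object In table.Values
    Console.WriteLine(item)
Next

' A lookup on a missing key returns Nothing instead of throwing,
' so guard with ContainsKey (or check for Nothing) where it matters:
Dim missing As Object = table("no-such-key")
If missing Is Nothing Then
    Console.WriteLine("key not found")
End If
```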
 