String lookup / dictionary

Mondeo

Centurion
Joined
Nov 10, 2006
Messages
128
Location
Sunny Lancashire
Hi

I need a function to check if a string contains any 1 of 800 different values, if it does I need to return the matched value.

I just wondered what collection I should use for best performance, i'm assuming I should use code similar to the following

Function Check(byval strInp as string) as string
For each s as string in mycollection
if strinp.contains(s) Then
return s
End If
Next
End Function

Is this the best way?

Thanks
 
How about using hash tables?

Judging from your code. It's like you're planning to use arrays as your collection. I don't think that would be wise since you have to iterate everytime you need to check your collection.

CMIIW.
 
Amir, how could one check to see if a string contains one of 800 values without iterating through those values? Not that it can't be done, but is there an algorithm or technique you would propose?
 
You could probably apply a tree structure to great advantage here. You would progress down a tree until you've reach a leaf and then return true if everything matched up to the leaf.

Example:

root of tree = ""
level 1 = a b c
level 2a = m s t //yields substrings as, at, and am -- all are leaf nodes
level 2b = e i //yields substring be -- e is a leaf
level 2c = a
level 3b = g //yields substring big -- g is a leaf attached to i
level 3c = n p //yields substrings can and cap -- both n and p are leaf nodes

An attempt at an ascii art representation:

Code:
                  ""
             /    |     \
           a      b      c
         / | \    | \      \
        m s  t   e   i     a
                      |    | \
                      g    n  p
This should give you speed somewhere in the neighborhood of an O(log n) lookup versus an of O(n).

So if I have the string "tucan" an algorithm might go something like:

starting with t from "tucan" and skipping the root node, "".
do I have a 't' at level 1 in my tree? No, go to 'u'
do I have a 'u' at level 1 in my tree? No, go to 'c'
do I have a 'c' at level 1 in my tree? Yes, go to level 2 for 'c', go to 'a'
do I have a 'a' at level 2 letter c in my tree? yes, go to level 3 for 'a', go to 'n'
do I have a 'n' at level 3 letter a in my tree? yes, it's a leaf, return substring "can".

It looks like you might be able to get this implementation for free by using a Generic sorted list. In this case your key would be the string and the value would be null or "". You might also be able to find an implementation of a tree data structure somewhere on the web.
 
Amir, how could one check to see if a string contains one of 800 values without iterating through those values? Not that it can't be done, but is there an algorithm or technique you would propose?

I was thinking of proposing hash tables to handle the problem. But I guess I got it wrong. I thought the problem was checking the evaluated string to the collection. But I guess it goes the opposite. Mondeo was asking how to check each item in the collection to the string. :D

My deepest apology. :D
 
Tries

You might also be able to find an implementation of a tree data structure somewhere on the web.

Excellent suggestion. This sort of data structure is called a trie (usually pronounced try). It would certainly suit this kind of application, but the implementation may need to be coded from scratch, although that should not pose too much of a problem. Properly implemented, it will perform considerably faster than a simple iterative approach. You can find out more about tries here (Wikipedia).

Mondeo, one thing you neglect to mention is what should happen if the string contains multiple keywords, or if some keywords could contain others (eg here and there), as this will affect the implementation and perhaps the efficiency of the algorithm.

;)
 
Back
Top