Threads Posted April 29, 2003

Does anyone know of any sites that compare the use of Regular Expressions versus the InStr and Mid$ functions when extracting data? I already know how to use InStr and Mid$ quite well, and I know Regular Expressions can extract data too (with more power, since they allow richer pattern matching), but I am curious what kind of performance hit I may take if I start using Regular Expressions instead. Any help would be appreciated :) Thanks.

mov ax, 13h int 10h
Moderators Robby Posted April 29, 2003

I won't address the performance difference between the two, but I can tell you: don't use Mid, Left, or InStr. All the string functions can be done using the new .NET methods:

Mid ... myString.Substring()
Replace ... myString.Replace()
InStr ... myString.IndexOf()

and many more.
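As a rough illustration of the mapping above, here is a minimal C# sketch. The sample string and the helper names (LastName, FirstName, Swap) are hypothetical and not from the thread. One thing to watch when porting: the VB functions are 1-based and InStr returns 0 when nothing is found, while the .NET methods are 0-based and IndexOf returns -1.

```csharp
using System;

public static class StringMethodDemo
{
    // VB: Left(s, InStr(s, ",") - 1). IndexOf is 0-based, -1 if not found.
    public static string LastName(string s)
    {
        int comma = s.IndexOf(',');
        return comma < 0 ? s : s.Substring(0, comma);
    }

    // VB: Mid$(s, InStr(s, ",") + 2). Substring takes a 0-based start index.
    public static string FirstName(string s)
    {
        int comma = s.IndexOf(',');
        return comma < 0 ? "" : s.Substring(comma + 1).Trim();
    }

    // VB: Replace(s, ", ", " / ")
    public static string Swap(string s)
    {
        return s.Replace(", ", " / ");
    }

    public static void Main()
    {
        string s = "Smith, John"; // hypothetical sample input
        Console.WriteLine(LastName(s));
        Console.WriteLine(FirstName(s));
        Console.WriteLine(Swap(s));
    }
}
```

Strings in .NET are immutable, so Replace and Substring return new strings rather than changing myString in place.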
*Experts* Nerseus Posted April 29, 2003

Out of curiosity, I created a small test program. The results were somewhat expected (Regular Expression matching is slower), but there are a number of factors to consider.

First, Regular Expressions are VERY powerful. Besides doing matching, they can do validation. Also, you can create very powerful expressions much more easily than you could with IndexOf and Substring.

I'll make two notes about the sample code. First, I wrote the regular expression code in about 5 minutes. Writing the IndexOf and Substring code took about 15 minutes. Also, my first regular expression is MUCH more robust than the IndexOf/Substring method. For instance, the expression will automatically trim off any spaces or whitespace along with weird characters.

Also, the code for regular expressions is MUCH more readable since each match is named. To get the last name, I simply use:

lastName = match.Groups["LastName"].Value;

Using IndexOf, I had to use:

 = 2; 2 = smallData.IndexOf(' ', +1); lastName = smallData.Substring( +1, 2 - - 1);

Without comments, it's hard to say what's going on. Which code would you rather look at a year from now?

Having said that, the speed really depends on what you need to do. If you need to parse through a 4 gig text file, I'd go with the fastest method possible and hard-code as many settings as possible. If you're parsing a string or two, I'd go with whatever is easier to maintain, as Regular Expressions and IndexOf are going to be perceptibly the same to the user.

Here are the results (in ms) after running the project in Debug mode in the IDE on my machine:

short Data RegEx: 828
large Data RegEx: 2578
short Data Substring: 31
large Data Substring: 94
Press ENTER to close

Keep in mind this is for 100,000 iterations. For 1,000 iterations, all 4 tests come in at 0 ms on my machine.

-Nerseus

regextest.zip

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut
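For readers without the attachment, a comparison along these lines might look like the following C# sketch. It is not the code from regextest.zip: the record layout ("last first"), the pattern, and the names ExtractDemo, LastNameRegex, and LastNameIndexOf are all assumptions made for illustration. It also uses Stopwatch, which only arrived in a later .NET version than was current when this thread was written. Timings will vary by machine and build configuration, so none are shown.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ExtractDemo
{
    // Hypothetical record layout: "<last> <first>", separated by whitespace.
    // Named groups let the calling code ask for a field by name.
    static readonly Regex NamePattern =
        new Regex(@"^\s*(?<LastName>\w+)\s+(?<FirstName>\w+)");

    public static string LastNameRegex(string data)
    {
        Match match = NamePattern.Match(data);
        return match.Success ? match.Groups["LastName"].Value : null;
    }

    // Hand-rolled equivalent: take everything up to the first space.
    // Unlike the pattern above, it does not skip leading whitespace.
    public static string LastNameIndexOf(string data)
    {
        int end = data.IndexOf(' ');
        return end < 0 ? data : data.Substring(0, end);
    }

    public static void Main()
    {
        const string data = "Smith John"; // hypothetical sample record
        const int iterations = 100000;    // same count as in the post

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) LastNameRegex(data);
        sw.Stop();
        Console.WriteLine("RegEx:     {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) LastNameIndexOf(data);
        sw.Stop();
        Console.WriteLine("Substring: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

The two helpers agree on clean input but diverge on messy input: given "  Smith John", the regex version still returns "Smith", while the IndexOf version returns an empty string. That is the robustness trade-off described above.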
Threads Posted April 29, 2003

Thank you, Nerseus. That was very, very helpful. I'll also use your example to compare regular expressions against the other methods for speed in VB 6. Since the purpose of this is usually to parse HTML pages that I've downloaded into the application, the regular expressions may be much easier to maintain. For time-critical parts, I may go back to my old methods depending on what the tests show me. Thanks again for all your help.

mov ax, 13h int 10h
*Gurus* Derek Stone Posted April 29, 2003

Let it be noted that the RegEx times will decrease by almost half once the application is running without debug symbols. Obviously the raw string manipulation is still many times faster, however.
Threads Posted April 30, 2003

Using Regular Expressions sure is easier than the methods I was using before. I'm now able to parse the page with ease, and because I can use pattern matching, I don't have to worry about little things changing on the page, like the background color of cells in a table. I don't notice a slowdown once it is compiled (Derek was definitely right about it being at least twice as fast). Anyway, thanks again.

mov ax, 13h int 10h
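To illustrate the point about cell attributes, here is a minimal C# sketch of a pattern that ignores whatever attributes a table cell carries. The HTML fragments, the pattern, and the names CellDemo and CellValues are hypothetical, not the poster's code. A pattern like this suits simple, flat markup and would likely need rework for nested tables.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CellDemo
{
    // [^>]* skips any attributes (bgcolor, align, ...) inside the <td> tag,
    // so a change in cell styling does not affect the match.
    static readonly Regex CellPattern = new Regex(
        @"<td\b[^>]*>(?<Value>.*?)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> CellValues(string html)
    {
        List<string> values = new List<string>();
        foreach (Match m in CellPattern.Matches(html))
        {
            values.Add(m.Groups["Value"].Value.Trim());
        }
        return values;
    }

    public static void Main()
    {
        // Hypothetical page fragments before and after a styling change.
        string before = "<tr><td bgcolor=\"#FFFFFF\">42</td><td>ok</td></tr>";
        string after = "<tr><td bgcolor=\"#EEEEEE\" align=\"right\"> 42 </td><td>ok</td></tr>";

        Console.WriteLine(string.Join(",", CellValues(before).ToArray()));
        Console.WriteLine(string.Join(",", CellValues(after).ToArray()));
    }
}
```

Both fragments yield the same two values, which is the kind of tolerance an IndexOf search for a literal `<td bgcolor="#FFFFFF">` would not have.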