Xtreme .Net Talk

Posted
Does anyone know of any sites that compare Regular Expressions with the InStr and Mid$ functions for extracting data? I already know how to use InStr and Mid$ quite well, and I know Regular Expressions can extract data too (with more power, since they allow richer pattern matching), but I am curious what kind of performance hit I may take if I start using Regular Expressions instead. Any help would be appreciated :) Thanks.

mov ax, 13h

int 10h

  • Moderators
Posted

I won't address the performance difference between the two, but I can tell you: don't use Mid, Left, or InStr.

All the string functions can be done using the new .NET methods.

 

Mid ... myString.Substring()

Replace ... myString.Replace()

InStr ... myString.IndexOf()

and many more....
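To make the mapping above concrete, here is a minimal C# sketch (the same String methods are available from VB.NET). The class and method names are made up for illustration. The main thing to watch is that Mid and InStr are 1-based, while Substring and IndexOf are 0-based, and that IndexOf returns -1 rather than 0 when nothing is found.

```csharp
using System;

static class StringMethodDemo
{
    // Mid(s, start, length) counts from 1; Substring counts from 0.
    public static string MidEquivalent(string s, int start, int length)
    {
        return s.Substring(start - 1, length);
    }

    // InStr returns a 1-based position, or 0 when not found.
    // IndexOf returns a 0-based index, or -1 when not found,
    // so adding 1 reproduces the InStr convention.
    public static int InStrEquivalent(string s, string value)
    {
        return s.IndexOf(value, StringComparison.Ordinal) + 1;
    }

    // Replace maps over directly.
    public static string ReplaceEquivalent(string s, string oldValue, string newValue)
    {
        return s.Replace(oldValue, newValue);
    }
}
```

One difference to keep in mind: Mid quietly returns whatever is left when the requested length runs past the end of the string, whereas Substring throws an ArgumentOutOfRangeException in that case.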

Visit...Bassic Software
  • *Experts*
Posted

Out of curiosity, I created a small test program. The results were somewhat expected (Regular Expression matching is slower), but there are a number of factors to consider.

 

First, Regular Expressions are VERY powerful. Besides doing matching, they can do validation. Also, you can create very powerful expressions much more easily than you could with IndexOf and Substring.

 

I'll make two notes about the sample code. First, I wrote the regular expression code in about 5 minutes. Writing the IndexOf and Substring took about 15 minutes. Also, my first regular expression is MUCH more robust than the IndexOf/Substring method. For instance, the expression will automatically trim off any spaces or whitespace along with weird characters.

 

Also, the code for regular expressions is MUCH more readable since each match is named. To get the last name, I simply use:

lastName = match.Groups["LastName"].Value;

Using IndexOf, I had to use:

// pos and pos2 are placeholder names for the two position variables
pos = pos2;
pos2 = smallData.IndexOf(' ', pos + 1);
lastName = smallData.Substring(pos + 1, pos2 - pos - 1);

Without comments, it's hard to say what's going on. Which code would you rather look at a year from now?
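For anyone who wants to see the two styles side by side, here is a rough sketch. It is not the code from the attached project: the input format (a first name and a last name separated by whitespace), the pattern, and the class and method names are all assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class NameParser
{
    // Hypothetical pattern: optional leading whitespace, then two words
    // separated by any amount of whitespace, each captured by name.
    static readonly Regex NamePattern = new Regex(
        @"^\s*(?<FirstName>\w+)\s+(?<LastName>\w+)");

    // Returns the last name, or an empty string when the pattern does not match.
    public static string LastNameViaRegex(string data)
    {
        Match match = NamePattern.Match(data);
        return match.Success ? match.Groups["LastName"].Value : string.Empty;
    }

    // Takes the text between the first and second space
    // (or up to the end of the string when there is no second space).
    public static string LastNameViaIndexOf(string data)
    {
        int firstSpace = data.IndexOf(' ');
        if (firstSpace < 0) return string.Empty;
        int secondSpace = data.IndexOf(' ', firstSpace + 1);
        if (secondSpace < 0) secondSpace = data.Length;
        return data.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
    }
}
```

Both methods return "Smith" for "John Smith 42". For "  John   Smith", the regex version still returns "Smith", while this IndexOf version returns an empty string because it treats the very first space as the delimiter. That is the kind of robustness difference described above.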

 

Having said that, how much the speed matters really depends on what you need to do. If you need to parse through a 4 GB text file, I'd go with the fastest method possible and hard-code as many settings as possible. If you're parsing a string or two, I'd go with whatever is easier to maintain, as both Regular Expressions and IndexOf are going to be perceptibly the same to the user.

 

Here are the results (in milliseconds) after running the project in Debug mode in the IDE on my machine:

short Data RegEx: 828
large Data RegEx: 2578
short Data Substring: 31
large Data Substring: 94

Press ENTER to close

 

Keep in mind this is for 100,000 iterations. For 1000 iterations, all 4 tests come in at 0ms on my machine.
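If you want to run a comparison like this yourself, a timing harness along the following lines should do. This is only a sketch and not the contents of the attached project: the sample string, the pattern, and all class and method names are assumptions. It also relies on System.Diagnostics.Stopwatch (.NET 2.0 or later) and lambdas (C# 3.0 or later).

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ParseBenchmark
{
    // Runs the action the given number of times and returns elapsed milliseconds.
    public static long TimeIt(int iterations, Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Times both extraction styles on the same hypothetical input.
    public static void Run()
    {
        const string sample = "John Smith"; // made-up test data
        const int iterations = 100000;
        Regex pattern = new Regex(@"^(?<FirstName>\w+)\s+(?<LastName>\w+)$");

        long regexMs = TimeIt(iterations, () =>
        {
            string lastName = pattern.Match(sample).Groups["LastName"].Value;
        });

        long substringMs = TimeIt(iterations, () =>
        {
            int space = sample.IndexOf(' ');
            string lastName = sample.Substring(space + 1);
        });

        Console.WriteLine("RegEx: " + regexMs);
        Console.WriteLine("Substring: " + substringMs);
    }
}
```

Calling ParseBenchmark.Run() prints the two timings. The absolute numbers will vary by machine and by Debug versus Release build, so treat them as relative figures only.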

 

-Nerseus

regextest.zip

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut
Posted

Thank you, Nerseus. That was very very helpful. I'll use your example also to compare the regular expression -vs- other method speeds in VB 6 as well.

 

Since the purpose of this is usually to parse HTML pages that I've downloaded into the application, the regular expressions may be much more easily maintained. For time-critical parts, I may go back to my old methods, depending on what the tests show me.

 

Thanks again for all your help.


Posted
Using Regular Expressions sure is easier than the other methods I was using before. I'm now able to parse the page with ease and I don't have to worry about little things changing on the page like the background color of cells in a table since I can use pattern matching. I don't notice a slowdown once it is compiled (Derek was definitely right about it being at least twice as fast). Anyway, thanks again.
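As an illustration of why pattern matching tolerates small page changes, here is a hedged sketch that pulls the text out of table cells no matter which attributes (such as bgcolor) the tag carries. The pattern, the class name, and the sample markup are assumptions, not the code actually used here, and a simple pattern like this does not cope with nested tables.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CellExtractor
{
    // Matches <td ...>text</td> and ignores whatever attributes the tag has.
    // Singleline lets "." span line breaks inside a cell.
    static readonly Regex CellPattern = new Regex(
        @"<td\b[^>]*>(?<Cell>.*?)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed contents of every cell found in the HTML.
    public static List<string> ExtractCells(string html)
    {
        List<string> cells = new List<string>();
        foreach (Match match in CellPattern.Matches(html))
        {
            cells.Add(match.Groups["Cell"].Value.Trim());
        }
        return cells;
    }
}
```

Changing a cell's bgcolor from "#ffffff" to "red" leaves the extracted values untouched, because the attribute text is absorbed by `[^>]*` and never captured.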

