Xtreme .Net Talk

Posted
I have a VB.NET application with a string variable that contains a mix of text and numbers. The text is always at the beginning of the string, but the string can vary in length. I want to remove the text-only portion of the string, leaving only the numbers. I'm not quite sure how to accomplish this. Any help would be greatly appreciated.
Posted

One way to do this would be to use a regular expression - or more specifically use the Replace method of a regular expression object. Here's a C# example...the VB.Net syntax would be very similar:

 

Regex oRegex = new Regex(@"\D");

string sTest = "abcXYZ123abc";

sTest = oRegex.Replace(sTest, "");

MessageBox.Show(sTest);  //  displays  "123"

 

The pattern for the regex is set up as "\D", which matches any non-digit character. (In C# the @ before the string literal tells the compiler to take the string contents literally, since backslash is otherwise an escape character. You don't need that @ in VB.Net.) The pattern is applied to the target string when the Replace method is invoked, and in this case each non-digit occurrence is replaced with an empty string, so you end up with a string stripped of all non-digit characters.
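
Since the question says the text is always at the start of the string, it may be enough to strip only the leading run of non-digits rather than every non-digit. Here is a minimal VB.NET sketch of that variation, using an anchored pattern. The module and function names (LeadingTextDemo, StripLeadingText) are made up for illustration and are not from the posts in this thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LeadingTextDemo
    ' Hypothetical helper: removes only the run of non-digit
    ' characters at the very start of the string.
    Function StripLeadingText(value As String) As String
        ' ^ anchors the match to the start of the string;
        ' \D+ matches one or more non-digit characters.
        Return Regex.Replace(value, "^\D+", "")
    End Function

    Sub Main()
        Console.WriteLine(StripLeadingText("abcXYZ123"))     ' 123
        Console.WriteLine(StripLeadingText("abcXYZ123abc"))  ' 123abc
    End Sub
End Module
```

Unlike the plain "\D" pattern, this leaves any characters after the first digit untouched, so pick whichever behaviour matches your data.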

 

For more help with regular expressions you might visit that particular forum here or check out 'regular expressions, syntax' (and other topics) in MSDN.

 

Good luck,

Paul

Posted

I wouldn't jump into RegEx. RegEx is certainly useful, but sometimes it is overkill. It might even produce less code, but the runtime cost can be much much greater.

 

If you have a whole lot of strings, you should probably use RegEx, but it doesn't sound like you have that many strings.

 

If you have a single string or only a few, I would recommend doing something like the following:

 

Public Structure SplitString
   Public Text As String
   Public Number As Integer 'Make this a Double / Long if you need to
End Structure

Public Function SplitMyString(Text As String) As SplitString
   Dim Result As SplitString
   'Start just past the last character and walk backwards over the trailing digits
   Dim FirstNumericChar As Integer = Text.Length
   Dim Chars() As Char = Text.ToCharArray()

   Do While FirstNumericChar > 0 AndAlso Char.IsDigit(Chars(FirstNumericChar - 1))
       FirstNumericChar -= 1
   Loop

   'Assumes the string ends with at least one digit; otherwise Integer.Parse throws
   Result.Number = Integer.Parse(Text.Substring(FirstNumericChar))
   Result.Text = Text.Substring(0, FirstNumericChar)

   Return Result
End Function

 

I haven't tested it, but you get the idea.

Posted
I wouldn't jump into RegEx.

 

I'm not a regex fanatic but I think the use is appropriate here. Why use 10 lines of code to handle a specific situation when 3 lines of code that are much more flexible do the job? In this case the regex use is extremely simple, it's not like a highly complex pattern is in use or isolating substrings is occurring.

 

In your sample you create a structure and a special function that converts text to char, loops, and processes - and you're saying using regex is overkill? Ok...

 

but the runtime cost can be much much greater.

 

On any machine that can run .Net reasonably well I seriously doubt the performance hit of using regex is going to be an issue.

 

I haven't tested it, but you get the idea.

 

Well, I did test my code by running it and it works flawlessly.

 

I love doing custom parsing but for a situation like this I see the regex solution as much simpler, cleaner (and easier to read) than converting, looping, and processing. But I guess that's just me. I suppose if you're completely unfamiliar with regex it could be tough to read, even as simple as it is.

 

Cheers,

Paul

Posted
Still not clear how to accomplish this task, could you provide me with a VB example?

There are so many .NET examples on the web written in C# that you will almost certainly need to learn the syntax.

 

Here is the example posted by PWNettle in VB syntax:

Dim oRegex As Regex = New Regex("\D")
Dim sTest As String = "abcXYZ123abc"
sTest = oRegex.Replace(sTest, "")
MessageBox.Show(sTest) ' displays "123"

Make sure you use Imports System.Text.RegularExpressions

 

The framework SDK is a great resource as well. Just look up RegEx and you will get many examples, including an example of the Replace method.

 

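For your particular situation you might be tempted by the insanely simple Val() function you get with VB, but be aware that Val stops reading at the first character it can't recognize as part of a number, so it only helps when the digits come first. Here is an example of it:

Dim sTest As String = "123abc"
MessageBox.Show(Val(sTest).ToString())

This will display "123" in a MessageBox. With the text at the start, as in "abcXYZ123abc", Val returns 0, so for your strings the regex approach is the one to use.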

Posted

PWNettle, you make it sound like I'm calling RegEx the devil (or making a personal attack on you).

 

Originally Posted by marble_eater

I haven't tested it, but you get the idea.

 

Well, I did test my code by running it and it works flawlessly.

 

I wasn't providing copy and paste code, but illustrating a method to parse a simple string.

 

In your sample you create a structure and a special function that converts text to char, loops, and processes - and you're saying using regex is overkill? Ok...

 

Yes, a structure with not one, but two whole members. And a special function that converts text to char? That's part of the string class. Loops and processes, oh my!

 

What I said was "RegEx might be overkill," not "RegEx is the devil, eats all your RAM, and freezes the CPU, and once you use it you will never be the same."

 

I'm just trying to give options and different ideas for programmers. Yes, my code might have been three times as big, but if someone were to put just a little effort into optimizing and writing the extra six lines of code, it will take up less memory and CPU. Sometimes less is more. Will Regex work here? Sure! Will it really make much of a difference? Probably not. But maybe if the project becomes more advanced, or bigger, the developer, having read my post, will decide that my solution may be an applicable and effective optimization.

Posted

OK, how about we test it out?

 

Here's the test app:

namespace TestIt
{
   using System;
   using System.Diagnostics;
   using System.Text.RegularExpressions;

   class Program
   {
       static void Main()
       {
           string test;
           int iterations = 10000;
           Stopwatch stopWatch = new Stopwatch();

           while ((test = Console.ReadLine()) != "")
           {
               int iterator = iterations;
               stopWatch.Start();
               while (iterator-- != 0) SplitWithStruct(test);
               stopWatch.Stop();
               Console.WriteLine(stopWatch.ElapsedTicks);
               iterator = iterations;
               stopWatch.Reset();
               stopWatch.Start();
               while (iterator-- != 0) SplitWithRegex(test);
               stopWatch.Stop();
               Console.WriteLine(stopWatch.ElapsedTicks);
               stopWatch.Reset();
           }
       }

       static Regex regex = new Regex(@"\D");
       static string SplitWithRegex(string text)
       {
           return regex.Replace(text, "");
       }

       struct SplitString
       {
           public string Text;
           public int Number;
       }

       static SplitString SplitWithStruct(string text)
       {
           int firstNumericChar = text.Length;
           char[] chars = text.ToCharArray();

           // Walk backwards over the trailing digits (assumes the input ends with a digit).
           while (Char.IsNumber(chars[--firstNumericChar]) && (firstNumericChar > 0));

           SplitString result;
           result.Number = int.Parse(text.Substring(firstNumericChar + 1));
           result.Text = text.Substring(0, firstNumericChar);
           return result;
       }
   }
}

With minimal input (1 char and 1 digit) Regex is ~4.5 times faster. As input size doubles the Struct method run time nearly doubles, but Regex time increases at a much slower rate.

 

Why? There are two reasons. One is that regex is highly optimized. For the second reason let's look at the MSIL:

 

.method private hidebysig static string  SplitWithRegex(string text) cil managed
{
 // Code size       17 (0x11)
 .maxstack  8
 IL_0000:  ldsfld     class [system]System.Text.RegularExpressions.Regex ConsoleApplication1.Program::regex
 IL_0005:  ldarg.0
 IL_0006:  ldstr      ""
 IL_000b:  callvirt   instance string [system]System.Text.RegularExpressions.Regex::Replace(string,
                                                                                            string)
 IL_0010:  ret
} // end of method Program::SplitWithRegex

.method private hidebysig static valuetype ConsoleApplication1.Program/SplitString 
       SplitWithStruct(string text) cil managed
{
 // Code size       70 (0x46)
 .maxstack  4
 .locals init (int32 V_0,
          char[] V_1,
          valuetype ConsoleApplication1.Program/SplitString V_2)
 IL_0000:  ldarg.0
 IL_0001:  callvirt   instance int32 [mscorlib]System.String::get_Length()
 IL_0006:  stloc.0
 IL_0007:  ldarg.0
 IL_0008:  callvirt   instance char[] [mscorlib]System.String::ToCharArray()
 IL_000d:  stloc.1
 IL_000e:  ldloc.1
 IL_000f:  ldloc.0
 IL_0010:  ldc.i4.1
 IL_0011:  sub
 IL_0012:  dup
 IL_0013:  stloc.0
 IL_0014:  ldelem.u2
 IL_0015:  call       bool [mscorlib]System.Char::IsNumber(char)
 IL_001a:  brfalse.s  IL_0020
 IL_001c:  ldloc.0
 IL_001d:  ldc.i4.0
 IL_001e:  bgt.s      IL_000e
 IL_0020:  ldloca.s   V_2
 IL_0022:  ldarg.0
 IL_0023:  ldloc.0
 IL_0024:  ldc.i4.1
 IL_0025:  add
 IL_0026:  callvirt   instance string [mscorlib]System.String::Substring(int32)
 IL_002b:  call       int32 [mscorlib]System.Int32::Parse(string)
 IL_0030:  stfld      int32 ConsoleApplication1.Program/SplitString::Number
 IL_0035:  ldloca.s   V_2
 IL_0037:  ldarg.0
 IL_0038:  ldc.i4.0
 IL_0039:  ldloc.0
 IL_003a:  callvirt   instance string [mscorlib]System.String::Substring(int32,
                                                                         int32)
 IL_003f:  stfld      string ConsoleApplication1.Program/SplitString::Text
 IL_0044:  ldloc.2
 IL_0045:  ret
} // end of method Program::SplitWithStruct

See all those callvirts? They add overhead that Regex doesn't have.

 

So it appears that Regex is much much faster. Could you write a faster function? Maybe, but it would be very difficult and the gains would be minimal.
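
For anyone who wants to try a hand-written alternative anyway, here is a minimal VB.NET sketch that filters the digits in a single pass with a StringBuilder. The names (DigitFilterDemo, KeepDigits) are hypothetical and not from the thread, and I have not timed it against either method above, so treat it as something to benchmark rather than a claim about speed.

```vbnet
Imports System
Imports System.Text

Module DigitFilterDemo
    ' Hypothetical helper: returns only the digit characters of the input,
    ' in their original order (the same result as replacing "\D" with "").
    Function KeepDigits(value As String) As String
        ' Pre-size the builder so it never needs to grow.
        Dim builder As New StringBuilder(value.Length)
        For Each c As Char In value
            If Char.IsDigit(c) Then
                builder.Append(c)
            End If
        Next
        Return builder.ToString()
    End Function

    Sub Main()
        Console.WriteLine(KeepDigits("abcXYZ123abc"))  ' 123
    End Sub
End Module
```

It returns a string rather than a parsed number, so it can be dropped into the test app in place of SplitWithRegex for a like-for-like comparison.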

 

BTW, I'm not a fan of Regex myself. The Regex filter strings are nearly impossible to read by the uninitiated.

Posted
Most of the time, I prefer easy-to-maintain code (achieving the same end result in fewer lines of code). Maybe the OP wants that too, even if it means a decrease in speed at runtime.
