Xtreme .Net Talk


Posted

I have the following HTML/text file (only one line of it is shown):

<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>

 

(It's one line, not two, in the source file.)

I want to make a pattern which gives this: 11:40 and this: 1.54

 

I know how to do it, but with two separate patterns. My question is how to do it in one pattern.

I have the pattern which takes 11:40 out. Here it is:

("(\>\s\d{2}):(\d{2})")

But how do I take both out in one pattern?

 

10x in advance.

Posted

This can be done using groups. A group lets you name a part of the regular expression, as you would a variable, and then retrieve only that part by name. It is a very powerful feature of regular expressions. This may be a bit of a jump from what you were doing, so if you have any problems I'd be happy to help.

 

Your text.

<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>

 

This expression will match the whole of the text. I've labeled the bits you are interested in 'itemone' and 'itemtwo'. The \s* in front of 'itemone' allows for the space before 11:40 in your line.

\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>\s*(?<itemone>[0-9:]{1,})\<\/FONT\>\<\/TD\> \<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>(?<itemtwo>[0-9.]{1,})\<\/FONT\>\<\/TD\>

 

To get the part of the regular expression you are interested in you can use some code like this. Not great but gives you the idea. :)

 

   ' Needs: Imports System.Collections.Generic and Imports System.Text.RegularExpressions
   Public Function ReturnValues(ByVal RegularExpression As String, ByVal mytext As String, ByVal item As String) As String()
       Dim myRegExp As New Regex(RegularExpression, RegexOptions.IgnoreCase)
       Dim matchedValues As New List(Of String)

       ' Walk every match of the whole pattern in the text
       For Each currentMatch As Match In myRegExp.Matches(mytext)
           ' Collect every capture recorded for the named group in this match
           For Each currentItem As Capture In currentMatch.Groups(item).Captures
               matchedValues.Add(currentItem.Value)
           Next
       Next

       Return matchedValues.ToArray()
   End Function

 

and call it like this:

       Dim myPattern As String = "\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>\s*(?<itemone>[0-9:]{1,})\<\/FONT\>\<\/TD\> \<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>(?<itemtwo>[0-9.]{1,})\<\/FONT\>\<\/TD\>"
       Dim myText As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"
       Dim oneValues() As String = ReturnValues(myPattern, myText, "itemone")
       Dim twoValues() As String = ReturnValues(myPattern, myText, "itemtwo")

 

In this example the oneValues array will contain only "11:40" and the twoValues array only "1.54", but if there were more lines matching the pattern, e.g. the rows of a table, then you'd get a list of all the matching values in that column.
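
If you'd rather read both values in a single pass, here is a minimal sketch of the same named-group idea. The second row, the module name and the looser pattern (with \s* for optional whitespace and [^']+ for the colour) are my own assumptions for illustration, not something from your file.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupsDemo
    Sub Main()
        ' Two rows: the first is the line from the question, the second is made up.
        Dim rows As String = _
            "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>" & _
            Environment.NewLine & _
            "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 12:10</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.61</FONT></TD>"

        ' \s* tolerates optional whitespace before each value and between the two cells.
        Dim pattern As String = _
            "<TD><FONT COLOR='[^']+'>\s*(?<itemone>\d{1,2}:\d{2})</FONT></TD>\s*" & _
            "<TD><FONT COLOR='[^']+'>\s*(?<itemtwo>\d+\.\d+)</FONT></TD>"

        For Each m As Match In Regex.Matches(rows, pattern, RegexOptions.IgnoreCase)
            ' Both named groups live on the same Match object, one Match per row.
            Console.WriteLine(m.Groups("itemone").Value & " -> " & m.Groups("itemtwo").Value)
        Next
    End Sub
End Module
```

On these two rows it should print "11:40 -> 1.54" and "12:10 -> 1.61". Reading both groups from one Match keeps each time paired with its value, which two separate calls per column don't guarantee if a row is skipped.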

 

This is what I spend my time doing, using regular expressions to read tables of data and do calculations on it :-)

Posted (edited)

Thanks for that! I've tried something like this:

 

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim fw As New StreamReader("d:\wave2.txt")
    Dim str As String = ""
    Dim reg As New Regex("(\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<(?<name>(\noop)) COLOR='#FFCC33'\> (?<time>\d{3}\:\d{2})\<\/FONT\>\<\/TD\>)")

    str = fw.ReadToEnd
    Dim match As MatchCollection
    match = reg.Matches(str)
    Dim mt As Match
    For Each mt In match
        MsgBox(mt.Groups("time").Value.ToString)
    Next
End Sub
End Class

 

and it doesn't display anything. Is something wrong? Bah, I have to learn this "grouping" thing.

Edited by FlyBoy
Posted

I don't think it is returning nothing. It shows the whole string because you have this:

MsgBox(str)

 

Your expression is missing an '*'

Try:

Dim expres As New Regex("(.)(?<tm>\d{2}:\d{2})(.*)(?<ht>\d{1}\.\d{2})(.)")
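
A minimal sketch of how that pattern behaves, assuming the line from the first post. The "21.54" value and the module name are made up, only to show one thing to watch for with the greedy (.*) in the middle.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ShortPatternDemo
    Sub Main()
        Dim line As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"

        ' The suggested pattern: two named groups joined by a greedy (.*)
        Dim expres As New Regex("(.)(?<tm>\d{2}:\d{2})(.*)(?<ht>\d{1}\.\d{2})(.)")

        Dim m As Match = expres.Match(line)
        If m.Success Then
            Console.WriteLine(m.Groups("tm").Value)   ' 11:40
            Console.WriteLine(m.Groups("ht").Value)   ' 1.54
        End If

        ' Hypothetical row whose value has two digits before the dot.
        Dim wide As String = line.Replace("1.54", "21.54")

        ' The greedy (.*) swallows the leading 2, so ht is still only "1.54".
        Console.WriteLine(expres.Match(wide).Groups("ht").Value)
    End Sub
End Module
```

If your values can have more than one digit before the dot, something like >\s*(?<ht>\d+\.\d{2})< tied to the surrounding tags is probably safer than relying on (.*).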

Posted

10x again.!!!! :cool:

 

OK, I figured out what is going on: when I use more than one group in one pattern, it's not returning anything :( :(

For example:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim fw As New StreamReader("d:\wave2.txt")
    Dim str As String = ""
    Dim reg As New Regex("(\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<FONT COLOR='#FFCC33'\> (?<time>\d{3}\:\d{2})\<\/FONT\>\<\/TD\>\<TD\>\<FONT COLOR='#FFCC33'\>(?<ht>\d{1}\.\d{2})\<\/FONT\>\<\/TD\>)")

    str = fw.ReadToEnd
    Dim match As MatchCollection
    match = reg.Matches(str)
    Dim mt As Match
    For Each mt In match
        MsgBox(mt.Groups("time").Value.ToString)
    Next
End Sub
End Class

 

This doesn't return the time. But when I take the "<ht>" group back out, the time group is displayed again.

And it's not that my "<ht>" group has a syntax mistake; it doesn't have any.

(?<ht>\d{1}\.\d{2}) is supposed to match 1.23.

What is wrong with it?
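
A minimal sketch of one way to narrow this down, assuming the file's lines look exactly like the one quoted in the first post. Against that line the pattern differs in two places: it asks for three digits before the colon, and it leaves no room for the space between </TD> and <TD>. The relaxed pattern and the module name are hypothetical, and \d{1,3} and \s* are guesses about the data.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TwoGroupsCheck
    Sub Main()
        ' Line as quoted in the first post (note the space between </TD> and <TD>).
        Dim line As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"

        ' Same shape as the pattern above: \d{3} before the colon, no gap between the cells.
        Dim strict As String = _
            "<TD><FONT COLOR='#FFCC33'> (?<time>\d{3}:\d{2})</FONT></TD>" & _
            "<TD><FONT COLOR='#FFCC33'>(?<ht>\d{1}\.\d{2})</FONT></TD>"

        ' Hypothetical relaxed variant: accepts 1 to 3 digits and optional whitespace.
        Dim relaxed As String = _
            "<TD><FONT COLOR='#FFCC33'>\s*(?<time>\d{1,3}:\d{2})</FONT></TD>\s*" & _
            "<TD><FONT COLOR='#FFCC33'>\s*(?<ht>\d\.\d{2})</FONT></TD>"

        ' The strict pattern cannot match this line at all.
        Console.WriteLine(Regex.IsMatch(line, strict))   ' False

        Dim m As Match = Regex.Match(line, relaxed)
        If m.Success Then
            ' Both groups come from the same match.
            Console.WriteLine(m.Groups("time").Value & " / " & m.Groups("ht").Value)
        End If
    End Sub
End Module
```

When any part of a pattern fails, the whole match fails, so every group in it comes back empty. That would explain why adding a second cell seems to break the "time" group even though the "<ht>" group itself is written correctly.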
