FlyBoy Posted September 11, 2004

I have the following HTML/text-based file (only one line from it is shown):

    <TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>

(It is one line, not two, in the source file.)

I want to write a pattern that returns this: 11:40 and this: 1.54. I know how to do it with two separate patterns; my question is how to do it in one pattern. I have a pattern that extracts 11:40; here it is:

    ("(\>\s\d{2}):(\d{2})")

But how do I extract both with one pattern? Thanks in advance.
John_0025 Posted September 15, 2004

This can be done using groups. A group lets you name a part of the regular expression, as you would a variable, and then retrieve only that part by name. It is a very powerful feature of regular expressions. This may be a bit of a jump from what you were doing, so if you have any problems I'd be happy to help.

Your text:

    <TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>

This expression will match the whole of the text. I've labeled the bits you are interested in 'itemone' and 'itemtwo':

    \<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>\s*(?<itemone>[0-9:]{1,})\<\/FONT\>\<\/TD\> \<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>(?<itemtwo>[0-9.]{1,})\<\/FONT\>\<\/TD\>

To get the part of the regular expression you are interested in, you can use some code like this. Not great, but it gives you the idea. :)

    Imports System.Collections
    Imports System.Text.RegularExpressions

    ' Returns every value captured by the named group "item" across all matches.
    Public Function ReturnValues(ByVal RegularExpression As String, ByVal mytext As String, ByVal item As String) As String()
        Dim myRegExp As New Regex(RegularExpression, RegexOptions.IgnoreCase)
        Dim matches As MatchCollection = myRegExp.Matches(mytext)
        Dim currentMatch As Match
        Dim matchedValues As New ArrayList
        For Each currentMatch In matches
            ' A group can capture more than once per match, so walk all of its captures.
            Dim myCaptures As CaptureCollection = currentMatch.Groups(item).Captures
            Dim currentItem As Capture
            For Each currentItem In myCaptures
                matchedValues.Add(currentItem.Value)
            Next
        Next
        Return CType(matchedValues.ToArray(GetType(String)), String())
    End Function

and call it with:

    Dim myPattern As String = "\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>\s*(?<itemone>[0-9:]{1,})\<\/FONT\>\<\/TD\> \<TD\>\<FONT COLOR='[a-zA-Z0-9#]{1,}'\>(?<itemtwo>[0-9.]{1,})\<\/FONT\>\<\/TD\>"
    Dim myText As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'>11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"
    Dim oneValues() As String = ReturnValues(myPattern, myText, "itemone")
    Dim twoValues() As String = ReturnValues(myPattern, myText, "itemtwo")

In this example the oneValues array will contain only "11:40" and the twoValues array only "1.54", but if more lines matched the pattern, i.e. a table, you'd get a list of all the matching numbers in that column. This is what I spend my time doing: using regular expressions to read tables of data and do calculations on them. :-)
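For the original question (both values from one pattern), the two named groups can also be read from a single Match object without a helper function. The following is only a minimal sketch: the shortened pattern and the module name NamedGroupsSketch are illustrative assumptions, not the pattern posted above, and the input is the sample row from the first post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupsSketch
    Sub Main()
        ' Sample row from the first post (a single line in the source file).
        Dim row As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"

        ' Hypothetical shortened pattern: two named groups, with a lazy .*?
        ' skipping whatever markup sits between the two cells.
        Dim pattern As String = "(?<itemone>\d{2}:\d{2})</FONT>.*?<FONT[^>]*>(?<itemtwo>\d\.\d{2})"

        Dim m As Match = Regex.Match(row, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Console.WriteLine(m.Groups("itemone").Value) ' prints 11:40
            Console.WriteLine(m.Groups("itemtwo").Value) ' prints 1.54
        End If
    End Sub
End Module
```

A shorter pattern like this is more tolerant of small markup differences, but it is also less strict about what counts as a table row, so the full pattern may be the safer choice for messy input.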
FlyBoy (Author) Posted September 15, 2004 (edited)

Thanks for that! I've tried something like this:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim fw As New StreamReader("d:\wave2.txt")
        Dim str As String = ""
        Dim reg As New Regex("(\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<(?<name>(\noop)) COLOR='#FFCC33'\> (?<time>\d{3}\:\d{2})\<\/FONT\>\<\/TD\>)")
        str = fw.ReadToEnd
        Dim match As MatchCollection
        match = reg.Matches(str)
        Dim mt As Match
        For Each mt In match
            MsgBox(mt.Groups("time").Value.ToString)
        Next
    End Sub
    End Class

and it doesn't display anything... is something wrong? Bah... I have to learn this "grouping" thing.

Edited September 15, 2004 by FlyBoy
John_0025 Posted September 15, 2004

I think it isn't returning anything. It returns the whole string because you have this:

    MsgBox(str)

Your expression is missing an '*'. Try:

    Dim expres As New Regex("(.)(?<tm>\d{2}:\d{2})(.*)(?<ht>\d{1}\.\d{2})(.)")
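As a rough check of the suggested expression, it can be run against the sample row from the first post. This is a sketch under that assumption only; the module name GreedyGapSketch and the use of Console.WriteLine instead of MsgBox are illustrative choices.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyGapSketch
    Sub Main()
        ' Sample row from the first post (a single line in the source file).
        Dim row As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"

        ' The expression suggested above: (.*) bridges the markup between the two groups.
        Dim expres As New Regex("(.)(?<tm>\d{2}:\d{2})(.*)(?<ht>\d{1}\.\d{2})(.)")

        For Each mt As Match In expres.Matches(row)
            ' Both named groups come from the same match.
            Console.WriteLine(mt.Groups("tm").Value & " / " & mt.Groups("ht").Value) ' prints 11:40 / 1.54
        Next
    End Sub
End Module
```

Note that (.*) is greedy: if several table rows ever share one physical line, it would run to the last number on that line. Replacing it with the lazy form (.*?) is one way to stop at the nearest number instead.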
FlyBoy (Author) Posted September 15, 2004

Thanks again! :cool: OK, I figured out what is going on: when I use more than one group in one pattern, it doesn't return anything. :( :( For example:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim fw As New StreamReader("d:\wave2.txt")
        Dim str As String = ""
        Dim reg As New Regex("(\<TR ALIGN=CENTER VALIGN=CENTER\>\<TD\>\<FONT COLOR='#FFCC33'\> (?<time>\d{3}\:\d{2})\<\/FONT\>\<\/TD\>\<TD\>\<FONT COLOR='#FFCC33'\>(?<ht>\d{1}\.\d{2})\<\/FONT\>\<\/TD\>)")
        str = fw.ReadToEnd
        Dim match As MatchCollection
        match = reg.Matches(str)
        Dim mt As Match
        For Each mt In match
            MsgBox(mt.Groups("time").Value.ToString)
        Next
    End Sub
    End Class

This doesn't return the time. But when I remove the "<ht>" group and go back to the previous pattern, the time group is displayed. And it's not that my "<ht>" group has a syntax mistake; it doesn't have any. (?<ht>\d{1}\.\d{2}) is supposed to match 1.23. What is wrong with it?
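The thread does not record what the fix turned out to be. Comparing the pattern above with the sample row from the first post suggests two possible mismatches: the sample has a space between </TD> and the next <TD> that the pattern does not allow for, and \d{3} asks for three digits before the colon where the sample has two. The sketch below assumes those two adjustments and the sample row as input; the module name TwoGroupsSketch is hypothetical, and the real file may differ.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TwoGroupsSketch
    Sub Main()
        ' Sample row from the first post (a single line in the source file).
        Dim row As String = "<TR ALIGN=CENTER VALIGN=CENTER><TD><FONT COLOR='#FFCC33'> 11:40</FONT></TD> <TD><FONT COLOR='#FFCC33'>1.54</FONT></TD>"

        ' Assumed adjustments: \d{2} instead of \d{3}, and \s* wherever the
        ' markup may contain whitespace (before the time and between the cells).
        Dim reg As New Regex("<TD><FONT COLOR='#FFCC33'>\s*(?<time>\d{2}:\d{2})</FONT></TD>\s*<TD><FONT COLOR='#FFCC33'>(?<ht>\d\.\d{2})</FONT></TD>")

        For Each mt As Match In reg.Matches(row)
            Console.WriteLine(mt.Groups("time").Value) ' prints 11:40
            Console.WriteLine(mt.Groups("ht").Value)   ' prints 1.54
        Next
    End Sub
End Module
```

Because a pattern matches only when every part of it matches, one mismatch anywhere (here, possibly the whitespace between the cells) makes the whole match fail, which would explain why adding a second group appeared to break the first.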
FlyBoy (Author) Posted September 16, 2004

OK, problem solved! Thanks for the help, many many thanks!