Regular Expression Help please

burak

Hi,

I am trying to parse the attributes of HTML tags.

For tags with quoted attribute values, like

<input type="submit" value="order bed">

I am using

\s*=\s*\"*\'*[^"'>]*

and for tags with unquoted attribute values, like

<td align=right SIZE=5 >

I am using

\s*=\s*[^\s]*


Is there a way to combine the two expressions? When I tried to combine them as follows,

\s*=\s*\"*\'*[^"'s>]*

I did not get good results.

Thank you,

Burak
 
The pattern I'd use would be "<\s*(?<TagName>[\w]+)\s*((?<Attrib>[^=\s]+)\s*=\s*(?<AttribValue>("[^"]+"|[^\s>]+))\s*)+"

If you need help with how to use it, just ask!

Andreas
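
A minimal sketch of how the pattern above might be used from VB.NET, assuming the goal is to list each attribute with its value. The module name `AttributeDemo` and the sample input are illustrative and not taken from the post. Because the attribute group repeats, `Groups("Attrib")` on its own only holds the last attribute; the `Captures` collection holds all of them.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AttributeDemo
    Sub Main()
        ' Pattern from the post above. Inside a VB string literal every
        ' double quote is written twice.
        Dim pattern As String = "<\s*(?<TagName>[\w]+)\s*((?<Attrib>[^=\s]+)\s*=\s*(?<AttribValue>(""[^""]+""|[^\s>]+))\s*)+"
        Dim html As String = "<input type=""submit"" value=""order bed"" size=5>"

        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Console.WriteLine("Tag: " & m.Groups("TagName").Value)

            ' One capture per repetition of the attribute group.
            Dim names As CaptureCollection = m.Groups("Attrib").Captures
            Dim values As CaptureCollection = m.Groups("AttribValue").Captures
            For i As Integer = 0 To names.Count - 1
                ' Quoted values are captured with their quotes, so trim them.
                Console.WriteLine(names(i).Value & " = " & values(i).Value.Trim(""""c))
            Next
        End If
    End Sub
End Module
```

For the sample input this should list `type`, `value` and `size` with the values `submit`, `order bed` and `5`. The pattern only matches tags that have at least one attribute, and it does not consume the closing `>`.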
 
Hamburger1984 said:
the pattern I'd use would be "<\s*(?<TagName>[\w]+)\s*((?<Attrib>[^=\s]+)\s*=\s*(?<AttribValue>("[^"]+"|[^\s>]+))\s*)+"

If you need help about how to use it - just ask!

Andreas

Hello Andreas,

Could you take a look at how I am using the regular expression you wrote above in my code, and let me know if I am on the right track?

Thanks,

Burak

------------------------------------------

Dim strIn, strPatrn, RetStr As String

' escape the quotes
strIn = "<input type=""submit"" value=""order bed"" size=5>"

' escape the quotes in the pattern as well
' what do I put in for AttribValue???
strPatrn = "<\s*(?input[\w]+)\s*((?type[^=\s]+)\s*=\s*(?<AttribValue>(\""[^\""]+"|[^\s>]+))\s*)+\""

Dim regEx As New Regex(strPatrn, RegexOptions.IgnoreCase)

Dim Matches As MatchCollection = regEx.Matches(strIn)
Dim Match As Match
' RetStr is already declared above; keep only the first match
For Each Match In Matches
    RetStr = Match.Value
    Exit For
Next
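
The pattern string in the snippet above will not work as posted, for two reasons. First, `TagName`, `Attrib` and `AttribValue` are just labels for the capture groups, so nothing needs to be substituted for them; writing `(?input...)` is not valid group syntax. Second, each double quote inside a VB string literal has to be doubled rather than backslash-escaped. Below is a minimal corrected sketch under those assumptions. The helper name `FirstTagMatch` is hypothetical, and the `'[^']+'` alternative for single-quoted values is an addition that is not part of the pattern as originally posted.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FirstTagDemo
    ' Hypothetical helper: returns the text of the first match, or "" if none.
    Function FirstTagMatch(ByVal strIn As String) As String
        ' Group names stay exactly as written; they only label the captures.
        ' Assumption: '[^']+' is added here to accept single-quoted values.
        Dim strPatrn As String = "<\s*(?<TagName>[\w]+)\s*((?<Attrib>[^=\s]+)\s*=\s*(?<AttribValue>(""[^""]+""|'[^']+'|[^\s>]+))\s*)+"
        Dim regEx As New Regex(strPatrn, RegexOptions.IgnoreCase)

        ' Regex.Match returns only the first match, which replaces the
        ' For Each / Exit For loop.
        Dim m As Match = regEx.Match(strIn)
        If m.Success Then Return m.Value
        Return ""
    End Function

    Sub Main()
        Console.WriteLine(FirstTagMatch("<input type=""submit"" value=""order bed"" size=5>"))
        Console.WriteLine(FirstTagMatch("<td align=right SIZE=5 >"))
    End Sub
End Module
```

In both calls the returned text stops before the closing `>`, since the pattern never matches that character. The individual attribute names and values are available through `m.Groups("Attrib").Captures` and `m.Groups("AttribValue").Captures`.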
 