VB.NET regular expression question

burak (Centurion; joined Jun 17, 2003; 127 messages)

Hello,

I am working on a .NET application that extracts all the hyperlinks from a web page.

Here is my code:

--------

Dim href_pattern As String = "<a\s+(.*?)href\s*=\s*""*'*\s*(?<hrf>.+?)""*'*\s*>\s*(?<content>.*?)</a>"
Dim rx As New Regex(href_pattern, RegexOptions.IgnoreCase)

'strng is the string which I read from a web page
Dim mt As Match = rx.Match(strng)
Dim i As Integer = 0

While mt.Success
    'a Match exposes exactly one capture (itself), so no inner loop is needed
    arr_href(i) = Trim(mt.Groups("hrf").Value)
    arr_text(i) = mt.Groups("content").Value
    i = i + 1
    mt = mt.NextMatch()
End While

--------

I sometimes get the hyperlink URLs right, but other times "target=..." is returned along with the href URL.

Is there a better regular expression for parsing the URL and description of a hyperlink?

Thank you,

Burak
 