Xtreme .Net Talk


Posted

Hello,

 

I am using Visual Basic 2005. I found the following function on the web; it retrieves the HTML of a web page. It is very useful, but unfortunately it does not work with redirected pages: when I pass it the URL of a page that redirects, it returns nothing or "error". I added ".AllowAutoRedirect = True", but it still did not work. How can I make it work for redirected pages?

 

I appreciate the help.

Public Function GetPageHTML(ByVal URL As String, _
     Optional ByVal TimeoutSeconds As Integer = 10) _
    As String
       ' Retrieves the HTML from the specified URL,
       ' using a default timeout of 10 seconds
       Dim objRequest As Net.HttpWebRequest
       Dim objResponse As Net.HttpWebResponse = Nothing
       Dim objStreamReceive As System.IO.Stream
       Dim objEncoding As System.Text.Encoding
       Dim objStreamRead As System.IO.StreamReader

       Try
           ' Set up our Web request (explicit cast, so this also
           ' compiles with Option Strict On)
           objRequest = DirectCast(Net.WebRequest.Create(URL), _
               Net.HttpWebRequest)
           objRequest.Method = "GET"
           objRequest.KeepAlive = True
           objRequest.AllowAutoRedirect = True
           objRequest.Timeout = TimeoutSeconds * 1000
           ' Retrieve data from request
           objResponse = DirectCast(objRequest.GetResponse(), _
               Net.HttpWebResponse)
           objStreamReceive = objResponse.GetResponseStream()
           ' Assumes the page is UTF-8 encoded
           objEncoding = System.Text.Encoding.GetEncoding( _
               "utf-8")
           objStreamRead = New System.IO.StreamReader( _
               objStreamReceive, objEncoding)
           ' Return the page contents
           Return objStreamRead.ReadToEnd()
       Catch
           ' Any failure (bad URL, timeout, HTTP error) is
           ' reported as the literal string "error"
           Return "error"
       Finally
           ' Close the response even if reading failed
           If objResponse IsNot Nothing Then
               objResponse.Close()
           End If
       End Try
   End Function

  • 4 weeks later...
Posted

Hi,

Pages that redirect send an HTTP header back to the client giving the location of the page being redirected to. Using VB 2008, I found that the following expression gives you this location:

 

objResponse.ResponseUri.AbsoluteUri

 

You could then do the following:

 

If Not URL = objResponse.ResponseUri.AbsoluteUri Then
    Return GetPageHTML(objResponse.ResponseUri.AbsoluteUri)
End If

 

Hope this helps.
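
For anyone who wants to see the redirect header itself, here is a minimal sketch of following the redirect by hand instead of relying on AllowAutoRedirect. It is not the code from either post above: the module name RedirectSketch, the helpers ResolveRedirectTarget and IsRedirectStatus, the function GetPageHTMLFollowing, and the maxHops limit are all made up for illustration. It assumes the server sends a standard Location header with a 3xx status code.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module RedirectSketch
    ' Hypothetical helper: resolves a Location header value, which may
    ' be relative, against the URL that produced it.
    Public Function ResolveRedirectTarget(ByVal currentUrl As String, _
        ByVal location As String) As String
        Return New Uri(New Uri(currentUrl), location).AbsoluteUri
    End Function

    ' Hypothetical helper: True for the common redirect status codes.
    Public Function IsRedirectStatus(ByVal code As HttpStatusCode) As Boolean
        Select Case CInt(code)
            Case 301, 302, 303, 307, 308
                Return True
            Case Else
                Return False
        End Select
    End Function

    ' Follows at most maxHops redirects by hand and returns the HTML of
    ' the first non-redirect response. Errors are left to propagate.
    Public Function GetPageHTMLFollowing(ByVal url As String, _
        Optional ByVal maxHops As Integer = 5) As String
        For hop As Integer = 0 To maxHops
            Dim request As HttpWebRequest = _
                DirectCast(WebRequest.Create(url), HttpWebRequest)
            ' Turn automatic handling off so the 3xx response is visible
            request.AllowAutoRedirect = False
            Using response As HttpWebResponse = _
                DirectCast(request.GetResponse(), HttpWebResponse)
                Dim location As String = response.Headers("Location")
                If IsRedirectStatus(response.StatusCode) _
                    AndAlso Not String.IsNullOrEmpty(location) Then
                    ' Redirect: request the new location on the next pass
                    url = ResolveRedirectTarget(url, location)
                Else
                    Using reader As New StreamReader(response.GetResponseStream())
                        Return reader.ReadToEnd()
                    End Using
                End If
            End Using
        Next
        Throw New WebException("Too many redirects")
    End Function
End Module
```

Calling GetPageHTMLFollowing with the original URL should then return the HTML of the page at the end of the redirect chain. Note that this only covers HTTP-level redirects; a page that redirects through a meta refresh tag or JavaScript returns a normal 200 response, so neither this sketch nor AllowAutoRedirect will follow it.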
