Extract HTML from a redirected page

HD666 (Newcomer, joined Nov 24, 2011, 1 message)

Hello,

I am using Visual Basic 2005. I found the following function on the web; it extracts the HTML from web pages. It is very useful, but unfortunately it does not work with redirected pages: when I pass it the URL of a page that redirects, it returns either nothing or "error". I added ".AllowAutoRedirect = True", but it still did not work. How can I make it work for redirected pages?

I appreciate the help.
Code:
Public Function GetPageHTML(ByVal URL As String, _
      Optional ByVal TimeoutSeconds As Integer = 10) _
     As String
        ' Retrieves the HTML from the specified URL,
        ' using a default timeout of 10 seconds.
        ' Returns "error" if the request fails.
        Try
            ' Set up our Web request
            Dim objRequest As Net.HttpWebRequest = _
                CType(Net.WebRequest.Create(URL), Net.HttpWebRequest)
            objRequest.Method = "GET"
            objRequest.KeepAlive = True
            objRequest.AllowAutoRedirect = True
            objRequest.Timeout = TimeoutSeconds * 1000
            ' Retrieve data from request; Using closes the response
            ' and the reader even if reading fails
            Using objResponse As Net.HttpWebResponse = _
                CType(objRequest.GetResponse(), Net.HttpWebResponse)
                Using objStreamRead As New System.IO.StreamReader( _
                    objResponse.GetResponseStream(), _
                    System.Text.Encoding.UTF8)
                    ' Return the page body, decoded as UTF-8
                    Return objStreamRead.ReadToEnd()
                End Using
            End Using
        Catch
            ' Any failure (bad URL, timeout, HTTP error) ends up here
            Return "error"
        End Try
    End Function
 
Hi,
A page that redirects sends an HTTP header (Location) back to the client giving the address of the page it redirects to. Using VB 2008, I found that the following property gives you the address the response actually came from:

Code:
objresponse.ResponseUri.AbsoluteUri

You could then do the following:

Code:
' Place this inside GetPageHTML, after GetResponse() has returned
If URL <> objResponse.ResponseUri.AbsoluteUri Then
    Return GetPageHTML(objResponse.ResponseUri.AbsoluteUri, TimeoutSeconds)
End If
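
If the automatic redirect handling still gives you nothing, another option is to follow the Location header by hand. The following is only a rough sketch, not tested against your pages: the function name GetPageHTMLFollowingRedirects and the MaxHops parameter are made up for illustration, and it only handles HTTP 3xx redirects (not redirects done in the page itself by a meta refresh tag or by script).

```vb
Imports System
Imports System.IO
Imports System.Net

Public Module RedirectSketch
    ' Hypothetical helper: follows HTTP redirects manually, up to MaxHops times.
    Public Function GetPageHTMLFollowingRedirects(ByVal URL As String, _
            Optional ByVal TimeoutSeconds As Integer = 10, _
            Optional ByVal MaxHops As Integer = 5) As String
        Dim currentUri As New Uri(URL)

        ' One initial request plus at most MaxHops redirected requests
        For hop As Integer = 0 To MaxHops
            Dim request As HttpWebRequest = _
                CType(WebRequest.Create(currentUri), HttpWebRequest)
            request.Method = "GET"
            ' Turn off automatic handling so the 3xx response is visible
            request.AllowAutoRedirect = False
            request.Timeout = TimeoutSeconds * 1000

            Using response As HttpWebResponse = _
                    CType(request.GetResponse(), HttpWebResponse)
                Dim status As Integer = CInt(response.StatusCode)
                Dim location As String = response.Headers("Location")

                If status >= 300 AndAlso status < 400 _
                        AndAlso Not String.IsNullOrEmpty(location) Then
                    ' Location may be relative, so resolve it against
                    ' the address that was just requested
                    currentUri = New Uri(currentUri, location)
                Else
                    ' Not a redirect: read and return the body
                    Using reader As New StreamReader( _
                            response.GetResponseStream())
                        Return reader.ReadToEnd()
                    End Using
                End If
            End Using
        Next

        ' Assumption: treat an overly long redirect chain as an error
        Throw New WebException("Too many redirects")
    End Function
End Module
```

You would call it the same way as the original, for example GetPageHTMLFollowingRedirects(URL, 10), and wrap the call in Try/Catch if you want the "error" return value back. Stepping through it in the debugger also shows each address in the redirect chain, which may help you see where the original function gives up.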

Hope this helps.
 