Obtaining Source of Web Page

JWA (Newcomer, joined Jan 13, 2006, 2 messages)
Guys & Gals

I have a new project and I'm not really sure how to start. Basically, I will be iterating through a set of web pages and performing actions depending on each page's source. So my first task is to find out how to get a copy of the HTML behind a page on the net.

The only idea I have had is to use the WebBrowser control, but even then I have no idea whether it has the methods I need to obtain the source code of a page.

Thoughts?

JWA
 
The easiest option is probably the System.Net.WebClient class. Its DownloadData method returns the contents of a URL as a byte array, which can easily be turned into a string with a StreamReader or with one of the encoding classes in System.Text.

Visual Basic:
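        ' Download the raw bytes of the page
        Using x As New System.Net.WebClient()
            Dim b() As Byte = x.DownloadData("http://www.microsoft.com")

            ' Wrap the bytes in a stream so a StreamReader can decode them
            ' (StreamReader assumes UTF-8 unless a byte-order mark says otherwise)
            Using ms As New System.IO.MemoryStream(b)
                Using sr As New System.IO.StreamReader(ms)
                    Dim s As String = sr.ReadToEnd()

                    MessageBox.Show(s)
                End Using
            End Using
        End Using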
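If you are on .NET 2.0 or later, WebClient also has a DownloadString method that skips the stream step entirely, and the byte array can also be decoded directly with a System.Text.Encoding object. A minimal sketch of both, assuming the pages are UTF-8 encoded (the module and function names are made up for illustration):

```vb
Imports System
Imports System.Net
Imports System.Text

' Hypothetical helper module; the names are illustrative only.
Module PageSource

    ' Returns the HTML of a page as a String using WebClient.DownloadString.
    ' The Encoding property tells WebClient how to decode the bytes; UTF-8 is
    ' an assumption here, so change it if the site uses another character set.
    Function GetPageSource(ByVal url As String) As String
        Using client As New WebClient()
            client.Encoding = Encoding.UTF8
            Return client.DownloadString(url)
        End Using
    End Function

    ' Alternative: download the raw bytes and decode them with a
    ' System.Text.Encoding object instead of a StreamReader.
    ' Note: unlike StreamReader, Encoding.GetString keeps a leading byte-order mark.
    Function GetPageSourceFromBytes(ByVal url As String) As String
        Using client As New WebClient()
            Dim data() As Byte = client.DownloadData(url)
            Return Encoding.UTF8.GetString(data)
        End Using
    End Function

End Module
```

For example, `Dim html As String = GetPageSource("http://www.microsoft.com")`. If a page comes back with garbled accented characters, it is probably not UTF-8 and the Encoding value needs changing.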
 
Brilliant, this does the trick... to an extent. On some sites I get 502 errors back from the web server!

Any ideas?

JWA
 
If you are going through a proxy server, a 502 error could be caused by the proxy configuration.
Otherwise it can indicate a bad gateway between you and the server: a 502 means a gateway or proxy received an invalid response from the server further upstream.
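To tell a 502 (or any other HTTP error) apart from other failures, you can catch the WebException that WebClient throws and read the status code from its Response. The sketch below also hands your current Windows credentials to the system proxy, on the assumption that an authenticating proxy is the culprit; `PageFetcher` and `TryGetPageSource` are hypothetical names, not part of the framework.

```vb
Imports System
Imports System.Net

' Hypothetical helper module; the names are illustrative only.
Module PageFetcher

    ' Tries to download a page. If the server or a proxy answers with an HTTP
    ' error such as 502 Bad Gateway, the numeric status is returned through
    ' statusCode. Returns Nothing when the request fails for any reason.
    Function TryGetPageSource(ByVal url As String, ByRef statusCode As Integer) As String
        statusCode = 0
        Using client As New WebClient()
            ' Assumption: the proxy wants Windows authentication, so pass the
            ' logged-on user's credentials to the system proxy for this client only.
            Dim proxy As IWebProxy = WebRequest.GetSystemWebProxy()
            proxy.Credentials = CredentialCache.DefaultCredentials
            client.Proxy = proxy

            Try
                Return client.DownloadString(url)
            Catch ex As WebException
                ' ProtocolError means an HTTP response arrived but carried an error status.
                Dim response As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
                If ex.Status = WebExceptionStatus.ProtocolError AndAlso response IsNot Nothing Then
                    statusCode = CInt(response.StatusCode)
                End If
                Return Nothing
            End Try
        End Using
    End Function

End Module
```

Usage would look like `Dim code As Integer` followed by `Dim html As String = TryGetPageSource(url, code)`. If code comes back as 502 only when you go through the proxy, the proxy settings are the first thing to check.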
 