JWA Posted January 13, 2006
Guys and gals, I have a new project and I'm not really sure how to start. Basically I will be iterating through a set of web pages and performing actions depending on each page's source. So my first task is to work out how to get a copy of the HTML behind a page on the net. The only thought I've had is to use the WebBrowser control, but even then I have no idea whether it has the methods I'd need to obtain a page's source code. Thoughts? JWA
PlausiblyDamp (Administrator) Posted January 13, 2006
The easiest option is probably the System.Net.WebClient class. Its DownloadData method fetches a URL and returns the response as a byte array, which can easily be turned into a string via a StreamReader or one of the classes under System.Text:

Dim x As New System.Net.WebClient()
Dim b() As Byte = x.DownloadData("http://www.microsoft.com")
Dim ms As New System.IO.MemoryStream(b)
Dim sr As New System.IO.StreamReader(ms)
Dim s As String = sr.ReadToEnd()
MessageBox.Show(s)

Intellectuals solve problems; geniuses prevent them. -- Albert Einstein
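A slightly shorter variant, if you are on .NET 2.0, is WebClient's DownloadString method, which decodes the bytes for you so no MemoryStream or StreamReader is needed. This is a sketch, assuming the page is served in a UTF-8-compatible encoding (adjust the Encoding property if not):

```vbnet
' Sketch: fetch a page's HTML as a string in one call.
Imports System.Net
Imports System.Text

Module HtmlFetcher
    Public Function GetPageSource(ByVal url As String) As String
        Dim client As New WebClient()
        ' DownloadString decodes the response bytes using client.Encoding.
        ' UTF-8 is an assumption here; match it to the site you are reading.
        client.Encoding = Encoding.UTF8
        Return client.DownloadString(url)
    End Function

    Sub Main()
        Dim html As String = GetPageSource("http://www.microsoft.com")
        Console.WriteLine(html.Length)
    End Sub
End Module
```

On .NET 1.x, where DownloadString is not available, the DownloadData-plus-StreamReader approach above is the way to go.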
JWA Posted January 13, 2006 (Author)
Brilliant, this does the trick... to an extent. On some sites I get 502 errors back from the web server! Any ideas? JWA
PlausiblyDamp (Administrator) Posted January 14, 2006
If you are running this through a proxy server, a 502 error could be caused by the proxy configuration. Alternatively, it can indicate a bad gateway between you and the server.
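To narrow down which it is, you can catch the WebException that WebClient throws on an HTTP failure and inspect the status the server actually returned. A sketch (the User-Agent value and URL are illustrative; some servers also reject requests that carry no User-Agent header at all, which is worth ruling out):

```vbnet
Imports System.Net

Module FetchWithDiagnostics
    Sub Main()
        Dim client As New WebClient()
        ' Illustrative header: some servers refuse requests without a User-Agent.
        client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible)")
        Try
            Dim html As String = client.DownloadString("http://www.example.com")
            Console.WriteLine(html.Length)
        Catch ex As WebException
            ' For HTTP-level failures (such as a 502) the exception's Response
            ' carries the status code and description the server sent back.
            Dim resp As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If resp IsNot Nothing Then
                Console.WriteLine("Server returned: " & CInt(resp.StatusCode) _
                    & " " & resp.StatusDescription)
            Else
                ' No response at all: a connection, DNS or proxy problem.
                Console.WriteLine("Request failed: " & ex.Status.ToString())
            End If
        End Try
    End Sub
End Module
```

If the failures only happen on certain sites while others work, the server-side explanation (bad gateway or a fussy server) is more likely than a local proxy misconfiguration.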