JWA Posted January 13, 2006
Guys and gals, I have a new project and I'm not really sure how to start. Basically I will be iterating through a set of web pages and performing actions depending on each page's source. So my first task is to work out how to get a copy of the HTML behind a page on the net. The only thought I've had is to use the WebBrowser control, but even then I have no idea whether it has the methods I'd need to obtain a page's source code. Thoughts? JWA
PlausiblyDamp (Administrator) Posted January 13, 2006
The easiest option is probably the System.Net.WebClient class. Its DownloadData method fetches a URL and returns the response as a byte array, which can easily be turned into a string via a StreamReader or one of the classes under System.Text:

Dim x As New System.Net.WebClient()
Dim b() As Byte = x.DownloadData("http://www.microsoft.com")
Dim ms As New System.IO.MemoryStream(b)
Dim sr As New System.IO.StreamReader(ms)
Dim s As String = sr.ReadToEnd()
MessageBox.Show(s)

Intellectuals solve problems; geniuses prevent them. -- Albert Einstein
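A slightly shorter variant, if you are on .NET 2.0, is WebClient's DownloadString method, which decodes the bytes for you so no MemoryStream or StreamReader is needed. This is a sketch, assuming the page is served in a UTF-8-compatible encoding (adjust the Encoding property if not):

```vbnet
' Sketch: fetch a page's HTML as a string in one call.
Imports System.Net
Imports System.Text

Module HtmlFetcher
    Public Function GetPageSource(ByVal url As String) As String
        Dim client As New WebClient()
        ' DownloadString decodes the response bytes using client.Encoding.
        ' UTF-8 is an assumption here; match it to the site you are reading.
        client.Encoding = Encoding.UTF8
        Return client.DownloadString(url)
    End Function

    Sub Main()
        Dim html As String = GetPageSource("http://www.microsoft.com")
        Console.WriteLine(html.Length)
    End Sub
End Module
```

On .NET 1.x, where DownloadString is not available, the DownloadData-plus-StreamReader approach above is the way to go.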
JWA Posted January 13, 2006 (Author)
Brilliant, this does the trick... to an extent. On some sites I get 502 errors back from the web server! Any ideas? JWA
PlausiblyDamp (Administrator) Posted January 14, 2006
If you are running this through a proxy server, a 502 error could be caused by the proxy configuration. Alternatively, it can indicate a bad gateway between you and the server.
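To narrow down which it is, you can catch the WebException that WebClient throws on an HTTP failure and inspect the status the server actually returned. A sketch (the User-Agent value and URL are illustrative; some servers also reject requests that carry no User-Agent header at all, which is worth ruling out):

```vbnet
Imports System.Net

Module FetchWithDiagnostics
    Sub Main()
        Dim client As New WebClient()
        ' Illustrative header: some servers refuse requests without a User-Agent.
        client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible)")
        Try
            Dim html As String = client.DownloadString("http://www.example.com")
            Console.WriteLine(html.Length)
        Catch ex As WebException
            ' For HTTP-level failures (such as a 502) the exception's Response
            ' carries the status code and description the server sent back.
            Dim resp As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If resp IsNot Nothing Then
                Console.WriteLine("Server returned: " & CInt(resp.StatusCode) _
                    & " " & resp.StatusDescription)
            Else
                ' No response at all: a connection, DNS or proxy problem.
                Console.WriteLine("Request failed: " & ex.Status.ToString())
            End If
        End Try
    End Sub
End Module
```

If the failures only happen on certain sites while others work, the server-side explanation (bad gateway or a fussy server) is more likely than a local proxy misconfiguration.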