Arokh
Posted December 20, 2006

I have searched through the forum for a method to download the html code from webpages and found some threads easily. So far I have:

    Public Function GetURLSource(ByVal URL As String) As String
        Dim wClient = New System.Net.WebClient()
        Dim buffer As Byte()
        buffer = wClient.DownloadData(URL)
        GetURLSource = System.Text.Encoding.Default.GetString(buffer, 0, buffer.Length)
    End Function

On most webpages it works as it should, but on the webpage I want to read from it doesn't. For example:

http://anidb.info/perl-bin/animedb.pl?show=anime&aid=96

I want to read the page so I can get the episode names from the series. But all the function returns is "���". Why? And is there any solution to my problem?

MrPaul
Posted December 20, 2006

GZip encoded

According to the page headers, the content is GZip compressed, so decoding the raw bytes as text produces garbage. If you are using version 2.0 of the framework, you can use System.IO.Compression.GZipStream to decompress it. Besides that, I would recommend using the System.Net.WebRequest class rather than System.Net.WebClient. This way you can handle different content encodings, content types, and so forth:

    Imports System.IO
    Imports System.Net
    Imports System.Text

    Public Function GetURLSource(ByVal URL As String) As String
        Dim httpReq As WebRequest = WebRequest.Create(URL)
        Dim httpRes As HttpWebResponse = DirectCast(httpReq.GetResponse(), HttpWebResponse)
        'Perhaps check status code here

        Dim stm As Stream = httpRes.GetResponseStream()

        'Check content encoding
        If String.Equals(httpRes.ContentEncoding, "gzip", StringComparison.OrdinalIgnoreCase) Then
            'Content is GZip'ed, must extract first
            stm = New Compression.GZipStream(stm, Compression.CompressionMode.Decompress)
        End If
        'Not GZip compressed: the stream can be read as it is

        'Code here to determine character encoding (eg UTF-8)
        'Do NOT assume UTF-8, should check with returned data;
        'UTF-8 is only a placeholder here
        Dim reader As New StreamReader(stm, Encoding.UTF8)
        Try
            Return reader.ReadToEnd()
        Finally
            reader.Close()
            httpRes.Close()
        End Try
    End Function

Good luck

Never trouble another for what you can do for yourself.
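
As an alternative to decompressing by hand, framework 2.0 can also undo GZip transparently through the HttpWebRequest.AutomaticDecompression property. The following is only a rough sketch under a few assumptions: the URL is an http/https address (so the cast to HttpWebRequest succeeds), the server reports its character set in the Content-Type header, and falling back to UTF-8 is acceptable when that character set is missing or unknown. The names PageSource, ResolveEncoding and GetUrlSourceAuto are made up for illustration and do not come from the posts above.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Module PageSource

    ' Hypothetical helper: maps a charset name from the response headers
    ' to an Encoding. The UTF-8 fallback is an assumption, not a rule.
    Public Function ResolveEncoding(ByVal charset As String) As Encoding
        If String.IsNullOrEmpty(charset) Then Return Encoding.UTF8
        Try
            ' Some servers quote the charset value, so strip quotes and spaces.
            Return Encoding.GetEncoding(charset.Trim(""""c, " "c))
        Catch ex As ArgumentException
            ' Unknown or malformed charset name.
            Return Encoding.UTF8
        End Try
    End Function

    ' Hypothetical variant of GetURLSource that lets the framework
    ' decompress gzip/deflate responses before the body is read.
    Public Function GetUrlSourceAuto(ByVal url As String) As String
        ' Assumes an http/https URL; other schemes would fail this cast.
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            ' CharacterSet reflects the charset of the Content-Type header.
            Dim pageEncoding As Encoding = ResolveEncoding(response.CharacterSet)
            Using reader As New StreamReader(response.GetResponseStream(), pageEncoding)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

End Module
```

Calling GetUrlSourceAuto with the page address should then return readable HTML whether or not the server compressed the response. The character set announced in the headers can still disagree with the one declared inside the page itself, so the earlier advice about checking the returned data still applies.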