Xtreme .Net Talk

Posted

I have searched through the forum for a method to download the HTML source of web pages and found some threads easily.

 

So far I have:

    Public Function GetURLSource(ByVal URL As String) As String
       Dim wClient = New System.Net.WebClient()
       Dim buffer As Byte()

       buffer = wClient.DownloadData(URL)
       GetURLSource = System.Text.Encoding.Default.GetString(buffer, 0, buffer.Length)
   End Function

 

On most webpages it works as it should, but on the webpage I want to read from it doesn't.

For example http://anidb.info/perl-bin/animedb.pl?show=anime&aid=96

I want to read the page so I can get the episode names from the series.

But all the function returns is "���".

 

 

Why? And is there any solution to my problem?

Posted

GZip encoded

 

According to the response headers, the content is GZip compressed (Content-Encoding: gzip). If you are using version 2.0 of the framework, you can use System.IO.Compression.GZipStream to decompress it. Besides that, I would recommend using the System.Net.WebRequest class rather than System.Net.WebClient. This way you can handle different content encodings, content types, and so forth:

 

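    Imports System.IO
    Imports System.IO.Compression
    Imports System.Net
    Imports System.Text

    Public Function GetURLSource(ByVal URL As String) As String
        Dim httpReq As WebRequest = WebRequest.Create(URL)

        Using httpRes As HttpWebResponse = DirectCast(httpReq.GetResponse(), HttpWebResponse)
            'Perhaps check httpRes.StatusCode here

            Dim stm As Stream = httpRes.GetResponseStream()

            'Check encoding
            If String.Equals(httpRes.ContentEncoding, "gzip", StringComparison.OrdinalIgnoreCase) Then
                'Content is GZip'ed, must extract first
                stm = New GZipStream(stm, CompressionMode.Decompress)
            End If
            'Not GZip compressed: the raw response stream is read as-is

            'Copy the (possibly decompressed) stream into a byte buffer
            Dim buffer As Byte()
            Using ms As New MemoryStream()
                Dim chunk(4095) As Byte
                Dim bytesRead As Integer = stm.Read(chunk, 0, chunk.Length)
                While bytesRead > 0
                    ms.Write(chunk, 0, bytesRead)
                    bytesRead = stm.Read(chunk, 0, chunk.Length)
                End While
                buffer = ms.ToArray()
            End Using
            stm.Close()

            'Code here to determine character encoding (eg UTF-8)

            'Do NOT assume UTF-8, should check with returned data
            Return Encoding.UTF8.GetString(buffer)
        End Using
    End Function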
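
If you would rather not handle the GZip stream yourself, here is a rough sketch of an alternative, assuming .NET 2.0 or later and an http/https URL. It sets HttpWebRequest.AutomaticDecompression so the framework decompresses the response, and it picks the text encoding from HttpWebResponse.CharacterSet instead of hard-coding UTF-8. The names GetURLSourceAuto, ResolveEncoding and PageDownloader are made up for this example, and the UTF-8 fallback is an assumption, not something the page guarantees:

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Public Module PageDownloader
    ' Hypothetical helper: map the charset reported by the server to an Encoding.
    ' Falls back to UTF-8 (an assumption) when the charset is missing or unknown.
    Private Function ResolveEncoding(ByVal charSet As String) As Encoding
        If String.IsNullOrEmpty(charSet) Then Return Encoding.UTF8
        Try
            Return Encoding.GetEncoding(charSet)
        Catch ex As ArgumentException
            Return Encoding.UTF8
        End Try
    End Function

    ' Hypothetical variant of GetURLSource; assumes URL uses http or https,
    ' otherwise the cast to HttpWebRequest fails.
    Public Function GetURLSourceAuto(ByVal URL As String) As String
        Dim httpReq As HttpWebRequest = DirectCast(WebRequest.Create(URL), HttpWebRequest)

        ' Let the framework undo gzip/deflate content encoding transparently
        httpReq.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using httpRes As HttpWebResponse = DirectCast(httpReq.GetResponse(), HttpWebResponse)
            Dim enc As Encoding = ResolveEncoding(httpRes.CharacterSet)
            Using reader As New StreamReader(httpRes.GetResponseStream(), enc)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

The charset in the Content-Type header can still be wrong or absent, so for pages that declare their encoding only in a meta tag you would have to inspect the returned data as well.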

 

Good luck :cool:

