Getting title of an html page.

rbulph

How would I get the title tag of an HTML file? In VB6 I used two APIs: InternetOpenUrl and InternetReadFile. But APIs don't seem to feature very heavily in .NET, so I expect there's a more straightforward way. Any idea how to do this?
 
Hmm, it seems you can declare APIs in .NET in much the same way as in VB6, so I've done the following, just adding the word "Auto" to each declaration and changing all Long parameters and return types to Integer. But sBuffer comes back empty. Any thoughts?

Code:
Module Module2

    Private Const INTERNET_FLAG_RELOAD = &H80000000

    Private Declare Auto Function InternetOpenUrl Lib "wininet" Alias "InternetOpenUrlA" (ByVal hInternetSession As Integer, ByVal lpszUrl As String, ByVal lpszHeaders As String, ByVal dwHeadersLength As Integer, ByVal dwFlags As Integer, ByVal dwContext As Integer) As Integer
    Private Declare Auto Function InternetReadFile Lib "wininet" (ByVal hFile As Integer, ByVal sBuffer As String, ByVal lNumBytesToRead As Integer, ByVal lNumberofBytesRead As Integer) As Integer
    Private Const INTERNET_OPEN_TYPE_DIRECT = 1
    Declare Auto Function InternetOpen Lib "wininet" Alias "InternetOpenA" (ByVal sAgent As String, ByVal lAccessType As Integer, ByVal sProxyName As String, ByVal sProxyBypass As String, ByVal lFlags As Integer) As Integer
    Friend Declare Auto Function InternetCloseHandle Lib "wininet" (ByRef hInet) As Integer

    Friend hOpen As Integer

    Friend Sub ShowTitle()


        hOpen = InternetOpen("App1", INTERNET_OPEN_TYPE_DIRECT, vbNullString, vbNullString, 0)

        Dim sBuffer As String = Space(1000)   '1000 characters must surely be enough.

        Dim hFile As Integer
        Dim Ret As Integer

        Dim sPath As String = "http://www.google.co.uk/"

        hFile = InternetOpenUrl(hOpen, sPath, vbNullString, 0&, INTERNET_FLAG_RELOAD, 0&)
        InternetReadFile(hFile, sBuffer, 1000, Ret)
        InternetCloseHandle(hFile)
        Debug.Print(sBuffer)

        Dim t1 As Long
        Dim t2 As Long
        t1 = InStr(sBuffer, "<TITLE>") + 7
        t2 = InStr(sBuffer, "</TITLE>")
        If t1 <> 0 And t2 <> 0 Then Debug.Print(Mid$(sBuffer, t1, t2 - t1))

        InternetCloseHandle(hOpen)

    End Sub
End Module
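
One possible explanation for the empty buffer, offered as a sketch rather than a confirmed diagnosis: `Declare Auto` marshals strings as Unicode on NT-based Windows, yet the aliases name the ANSI ("A") entry points, so `InternetOpenUrlA` would probably see only the first character of the URL and return a null handle. In addition, `lNumberofBytesRead` is passed `ByVal`, so `Ret` can never be updated, and `InternetCloseHandle` takes its handle `ByRef`. The version below drops the aliases, uses `IntPtr` for handles and reads into a byte array. The names `WinInetSketch` and `ReadStart` are hypothetical, and decoding the bytes as ASCII is an assumption about the page.

```vb
Imports System
Imports System.Text

Module WinInetSketch

    Private Const INTERNET_OPEN_TYPE_DIRECT As Integer = 1
    Private Const INTERNET_FLAG_RELOAD As Integer = &H80000000

    ' No Alias here: with Auto, the runtime chooses the "A" or "W" entry point
    ' that matches the way it marshals the String parameters.
    Private Declare Auto Function InternetOpen Lib "wininet.dll" (ByVal sAgent As String, ByVal lAccessType As Integer, ByVal sProxyName As String, ByVal sProxyBypass As String, ByVal lFlags As Integer) As IntPtr
    Private Declare Auto Function InternetOpenUrl Lib "wininet.dll" (ByVal hInternetSession As IntPtr, ByVal lpszUrl As String, ByVal lpszHeaders As String, ByVal dwHeadersLength As Integer, ByVal dwFlags As Integer, ByVal dwContext As IntPtr) As IntPtr

    ' These two take no strings. The byte count is ByRef so the API can write to it.
    Private Declare Function InternetReadFile Lib "wininet.dll" (ByVal hFile As IntPtr, ByVal buffer() As Byte, ByVal numBytesToRead As Integer, ByRef numBytesRead As Integer) As Boolean
    Private Declare Function InternetCloseHandle Lib "wininet.dll" (ByVal hInet As IntPtr) As Boolean

    ' Hypothetical helper: returns up to maxBytes from the start of the page,
    ' or an empty string if any call fails.
    Friend Function ReadStart(ByVal url As String, ByVal maxBytes As Integer) As String
        Dim hOpen As IntPtr = InternetOpen("App1", INTERNET_OPEN_TYPE_DIRECT, Nothing, Nothing, 0)
        If hOpen.Equals(IntPtr.Zero) Then Return ""

        Dim result As String = ""
        Dim hFile As IntPtr = InternetOpenUrl(hOpen, url, Nothing, 0, INTERNET_FLAG_RELOAD, IntPtr.Zero)
        If Not hFile.Equals(IntPtr.Zero) Then
            Dim buffer(maxBytes - 1) As Byte
            Dim bytesRead As Integer = 0
            If InternetReadFile(hFile, buffer, buffer.Length, bytesRead) Then
                ' Assumption: the start of the page is ASCII-compatible text.
                result = Encoding.ASCII.GetString(buffer, 0, bytesRead)
            End If
            InternetCloseHandle(hFile)
        End If

        InternetCloseHandle(hOpen)
        Return result
    End Function

End Module
```

Two further points about the search that follows the read: `InStr(sBuffer, "<TITLE>") + 7` can never be 0, so the `t1 <> 0` test always passes, and `InStr` compares case-sensitively by default, so a lowercase `<title>` tag would be missed. A single `InternetReadFile` call may also return fewer bytes than requested.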
 
Well, I used the System.Net.HttpWebRequest class to create this simple example for you:
Visual Basic:
Dim req As Net.HttpWebRequest = DirectCast(Net.HttpWebRequest.Create("http://google.com/"), Net.HttpWebRequest)
 
Dim res As Net.HttpWebResponse = DirectCast(req.GetResponse, Net.HttpWebResponse)
 
Dim sReader As New IO.StreamReader(res.GetResponseStream)
Dim html As String = sReader.ReadToEnd
 
sReader.Close()
res.Close()
 
Dim title As String = System.Text.RegularExpressions.Regex.Split(html, "(<title>)|(</title>)", System.Text.RegularExpressions.RegexOptions.IgnoreCase)(2)
 
 
Console.WriteLine(title)
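
Taking index 2 of the split assumes the page contains a plain `<title>` tag. If it does not, the indexing throws. A slightly more defensive variant, as a sketch only: `Regex.Match` with a capture group tolerates attributes on the tag and lets the code return an empty string when no title is found. The names `TitleSketch`, `ExtractTitle` and `GetTitle` are hypothetical, and the `Using` blocks assume VB 2005 or later.

```vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module TitleSketch

    ' Hypothetical helper: returns the text between <title ...> and </title>,
    ' or an empty string if the page has no title element.
    Function ExtractTitle(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, "<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return ""
    End Function

    ' Hypothetical helper: downloads the page and extracts its title.
    ' Using disposes the response and reader even if an exception is thrown.
    Function GetTitle(ByVal url As String) As String
        Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        Using res As WebResponse = req.GetResponse()
            Using reader As New StreamReader(res.GetResponseStream())
                Return ExtractTitle(reader.ReadToEnd())
            End Using
        End Using
    End Function

    Sub Main()
        Console.WriteLine(GetTitle("http://google.com/"))
    End Sub

End Module
```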
 
dynamic_sysop said:
Well, I used the System.Net.HttpWebRequest class to create this simple example for you ...

Thanks.

 
This procedure works, but it is a bit slow. It takes up to a third of a second, which becomes a problem when I have a number of pages to get the titles of. The call that seems to take the time is req.GetResponse. It would help if I could run this in the background: set req up to get a response, and have an event fire when the response has arrived. But HttpWebRequest has no events. Any ideas?
 
I thought I posted a reply to this, but it's not there, so I'll post again.

Yes, the HttpWebRequest class does have asynchronous methods such as BeginGetResponse. But I ran into problems with accessing controls outside of their thread when doing that, so I took to using BackgroundWorker components, one for each web page. And that works fine. It was quite easy in fact.
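
The one-worker-per-page approach described above might look roughly like this. It is a sketch, not the poster's actual code: the form, the `ListBox` and all names (`TitleForm`, `FetchTitles`, `titleList`) are hypothetical, and it assumes `FetchTitles` is called on the UI thread after the form has been created. `DoWork` runs on a thread-pool thread, and `RunWorkerCompleted` is then raised back on the UI thread, which is what avoids the cross-thread control access problem.

```vb
Imports System
Imports System.ComponentModel
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

Public Class TitleForm
    Inherits Form

    Private ReadOnly titleList As New ListBox()

    Public Sub New()
        titleList.Dock = DockStyle.Fill
        Controls.Add(titleList)
    End Sub

    ' Starts one BackgroundWorker per page.
    Public Sub FetchTitles(ByVal urls As String())
        For Each url As String In urls
            Dim worker As New BackgroundWorker()
            AddHandler worker.DoWork, AddressOf Worker_DoWork
            AddHandler worker.RunWorkerCompleted, AddressOf Worker_RunWorkerCompleted
            worker.RunWorkerAsync(url)
        Next
    End Sub

    ' Runs on a background thread: do the slow request here, touch no controls.
    Private Sub Worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs)
        Dim url As String = DirectCast(e.Argument, String)
        Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        Using res As WebResponse = req.GetResponse()
            Using reader As New StreamReader(res.GetResponseStream())
                Dim m As Match = Regex.Match(reader.ReadToEnd(), "<title[^>]*>(.*?)</title>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
                If m.Success Then
                    e.Result = m.Groups(1).Value.Trim()
                Else
                    e.Result = ""
                End If
            End Using
        End Using
    End Sub

    ' Raised on the UI thread, so updating controls is safe here.
    Private Sub Worker_RunWorkerCompleted(ByVal sender As Object, ByVal e As RunWorkerCompletedEventArgs)
        If e.Error IsNot Nothing Then
            ' Exceptions thrown in DoWork are reported through e.Error.
            titleList.Items.Add("Error: " & e.Error.Message)
        Else
            titleList.Items.Add(DirectCast(e.Result, String))
        End If
        DirectCast(sender, BackgroundWorker).Dispose()
    End Sub

End Class
```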
 