Retrieving data from web sites

yami (Newcomer · Joined Aug 14, 2003 · Messages: 5)
I am writing an application that retrieves stock information from a financial website. Presently I am using the WebClient class to retrieve the HTML, and then I parse it using string functions. The general structure of my code looks like this:
Visual Basic:
    Public Function StockDownload()
        Dim strStocks() As String   'array of stock symbols, filled in elsewhere
        Dim intAllStocks As Integer 'total number of stock symbols
        Dim strData As String
        Dim wc As New Net.WebClient()
        Dim url As String
        Dim i As Integer

        'Main code
        For i = 1 To intAllStocks
            'Get data from website
            url = "web address" & "/q?s=" & strStocks(i - 1)
            Dim b() As Byte = wc.DownloadData(url)
            strData = System.Text.Encoding.ASCII.GetString(b)

            'Parse Data
            'Here I have my code to parse the data
        Next
    End Function

My code works fine, but is (I think) kind of slow. To download data for 500 stocks takes over ten minutes. I think my parsing code is quite efficient, so most of the time is taken up in getting the data from the website.

Is there a more efficient way to do this? Perhaps a faster method than WebClient? In particular, I'm wondering whether some advantage can be gained from the fact that all of the HTTP requests go to the same website. The only thing that changes with each request is the query string on the end of the URL. In other words, once the connection to the server is established, perhaps you can repeatedly retrieve information from the server without opening a new connection for each stock symbol?
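To make the idea concrete, here's roughly the kind of reuse I'm imagining. I understand that HttpWebRequest has a KeepAlive property (True by default), so maybe the framework can already hold one connection open across requests to the same host. This is just a sketch; the host name is a placeholder:
Visual Basic:
    'Sketch only: the host name is a placeholder, and this assumes
    'the server supports HTTP/1.1 keep-alive.
    Public Sub DownloadWithKeepAlive(ByVal strStocks() As String)
        Dim i As Integer
        For i = 0 To strStocks.Length - 1
            'KeepAlive defaults to True; requests to the same host can
            'then share one pooled, persistent connection.
            Dim req As Net.HttpWebRequest = CType( _
                Net.WebRequest.Create("http://finance.example.com/q?s=" & strStocks(i)), _
                Net.HttpWebRequest)
            req.KeepAlive = True

            Dim resp As Net.HttpWebResponse = CType(req.GetResponse(), Net.HttpWebResponse)
            Dim reader As New IO.StreamReader(resp.GetResponseStream())
            Dim strData As String = reader.ReadToEnd()
            reader.Close()
            resp.Close()

            'Parse strData here, same as before
        Next
    End Sub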

Any suggestions or advice from you experts out there would be much appreciated! Thanks!
 
Of course it will take time to download them all. Even if it takes only 1 second to download the info for each stock, it will take 500 seconds, which is approximately 8 minutes and 20 seconds.

So no, there's no faster way to do it than to get a really really fast internet connection. :p
 
I realize that a little over one second is not an unreasonable amount of time for downloading data. It's when you repeat the process 500 times that it starts to become a problem. I guess my real question is whether you can streamline the process of retrieving multiple pages from the same server. It just seems to me that, fundamentally, once you have established a connection with a server, you should be able to retrieve multiple pages from that server without establishing a new connection for each page. Maybe it doesn't really matter, but it seems to me that most of the time is used up in establishing the connection. The actual download of the HTML (~30 KB) should, theoretically, take only a fraction of a second with a DSL connection, I think.
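One way I can think of to check where the time actually goes is to time the two phases of a single request separately, something like this (the URL and symbol are placeholders):
Visual Basic:
    'Crude timing sketch: how long connection/request setup takes
    'versus actually downloading the body. URL is a placeholder.
    Public Sub TimeOneRequest()
        Dim t0 As Integer = Environment.TickCount
        Dim req As Net.HttpWebRequest = CType( _
            Net.WebRequest.Create("http://finance.example.com/q?s=MSFT"), _
            Net.HttpWebRequest)
        Dim resp As Net.HttpWebResponse = CType(req.GetResponse(), Net.HttpWebResponse)
        Dim t1 As Integer = Environment.TickCount 'connection made, headers received

        Dim reader As New IO.StreamReader(resp.GetResponseStream())
        Dim strData As String = reader.ReadToEnd()
        reader.Close()
        resp.Close()
        Dim t2 As Integer = Environment.TickCount 'body downloaded

        Console.WriteLine("Connect/headers: " & (t1 - t0) & " ms")
        Console.WriteLine("Body download:   " & (t2 - t1) & " ms")
    End Sub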

Any additional thoughts or suggestions are much appreciated!
 
I don't think establishing the connection is what takes the time. It's sending the request (the HTTP header) and receiving the data. You could try using sockets to connect to the server manually and send the HTTP headers yourself without disconnecting each time, but I'm not sure you can do that, or that it would help you much.
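If you wanted to experiment with it anyway, the socket version would look something like the sketch below. The host and path are placeholders, and the hard part is knowing where each response ends: a real version would have to parse each response's Content-Length (or chunked encoding) to split the replies apart, which I've skipped here.
Visual Basic:
    'Sketch of the manual-socket idea: one TCP connection, several
    'HTTP requests sent over it. Host and path are placeholders.
    Public Sub FetchOverOneConnection(ByVal strStocks() As String)
        Dim client As New Net.Sockets.TcpClient("finance.example.com", 80)
        Dim stream As Net.Sockets.NetworkStream = client.GetStream()
        Dim i As Integer
        For i = 0 To strStocks.Length - 1
            Dim request As String = _
                "GET /q?s=" & strStocks(i) & " HTTP/1.1" & vbCrLf & _
                "Host: finance.example.com" & vbCrLf & _
                "Connection: keep-alive" & vbCrLf & vbCrLf
            Dim bytes() As Byte = System.Text.Encoding.ASCII.GetBytes(request)
            stream.Write(bytes, 0, bytes.Length)

            'Read whatever has arrived so far. A real implementation
            'must loop and use Content-Length to find the end of each
            'response; this just grabs the first chunk.
            Dim buffer(8191) As Byte
            Dim read As Integer = stream.Read(buffer, 0, buffer.Length)
            Dim strData As String = System.Text.Encoding.ASCII.GetString(buffer, 0, read)
            'Parse strData here
        Next
        client.Close()
    End Sub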
 