Xtreme .Net Talk

Posted

I have code that performs an HTML scrape:

Dim myConnection As New SqlConnection("server=(local);database=xxxx;Trusted_Connection=yes")

Dim strAddress As String = ("http:" & straddress & ".html")

Dim oRequest As WebRequest = WebRequest.Create(strAddress)
Dim oResponse As WebResponse = oRequest.GetResponse()

Dim oStream As Stream = oResponse.GetResponseStream()
Dim oStreamReader As New StreamReader(oStream, Encoding.UTF8)

Dim strData As String = oStreamReader.ReadToEnd()

Dim regGames As New Regex("[regexpattern]", RegexOptions.Singleline)

Dim mGames As Match = regGames.Match(strData)
'***************************************
While mGames.Success
    Dim writeGames As New SqlCommand( _
        "INSERT [insert statement]", myConnection)

    writeGames.Connection.Open()
    writeGames.ExecuteNonQuery()
    writeGames.Connection.Close()

    Dim strGame As String = mGames.Groups(1).Value & " " & _
        mGames.Groups(2).Value & " @ "

    'next game
    mGames = mGames.NextMatch

    strGame = strGame & mGames.Groups(1).Value & " " & _
        mGames.Groups(2).Value

    'print
    Response.Write(strGame & "<br>")

    'next set and loop
    mGames = mGames.NextMatch
End While

 

Once I have applied the regex to the scraped page, I want to get the total number of elements (games) before doing any further processing.

 

AFAIK there is no way to get a count of the matches from the Match object, but there is with MatchCollection?

 

Could someone tell me if there is a way to get a count from the regex before I do any processing?

Posted

Matches vs Match

 

I'm not sure how you expect to count the number of matches before actually performing the matching process. As you rightly point out, the MatchCollection object does expose a Count property, which raises the question of why you're not using it. If you use the Matches method rather than Match, you can collect all the matches at once and then loop through them:

 

Dim mAllGames As MatchCollection = regGames.Matches(strData)

Response.Write("There are " & mAllGames.Count & " games")

For Each mGame As Match In mAllGames
   'Process mGame here
Next
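
Since the original loop consumes two matches per game (one team, then its opponent), the Count of the collection would be twice the number of games. Here is a minimal console sketch of that idea. The sample data and the pattern are made up for illustration and are not the real scrape or regex:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchCountDemo
    Sub Main()
        ' Hypothetical sample data and pattern, for illustration only.
        Dim strData As String = "Lions 21, Bears 14, Hawks 7, Colts 10"
        Dim regGames As New Regex("(\w+) (\d+)")

        ' Matches collects every match; Count is available before processing.
        Dim mAllGames As MatchCollection = regGames.Matches(strData)

        ' Each game spans two consecutive matches, so halve the count.
        Console.WriteLine("Matches: " & mAllGames.Count)
        Console.WriteLine("Games: " & (mAllGames.Count \ 2))

        ' Walk the collection in pairs, mirroring the two NextMatch calls
        ' per iteration in the original While loop.
        For i As Integer = 0 To mAllGames.Count - 2 Step 2
            Dim first As Match = mAllGames(i)
            Dim second As Match = mAllGames(i + 1)
            Console.WriteLine(first.Groups(1).Value & " " & first.Groups(2).Value & _
                " @ " & second.Groups(1).Value & " " & second.Groups(2).Value)
        Next
    End Sub
End Module
```

With this sample input the sketch should report 4 matches and 2 games. Indexing in steps of two also means an odd trailing match is skipped rather than paired with an empty one.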

 

I would also suggest you open the database connection before entering the loop, and close it after the loop, rather than opening and closing on each iteration.
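
As a rough sketch of that suggestion, the connection can be opened once and a single command reused for every match. The helper name, table name, column names, and parameter names below are placeholders (assumptions), since the real INSERT statement was not posted:

```vbnet
Imports System.Data
Imports System.Data.SqlClient
Imports System.Text.RegularExpressions

Module SaveGamesSketch
    ' Hypothetical helper: "Games", "Team", "Score" and the parameter
    ' names are placeholders, not the original INSERT statement.
    Sub SaveGames(ByVal connectionString As String, ByVal allGames As MatchCollection)
        Using conn As New SqlConnection(connectionString)
            Using cmd As New SqlCommand( _
                "INSERT INTO Games (Team, Score) VALUES (@team, @score)", conn)

                cmd.Parameters.Add("@team", SqlDbType.NVarChar, 50)
                cmd.Parameters.Add("@score", SqlDbType.NVarChar, 10)

                ' Open once, before the loop.
                conn.Open()

                For Each game As Match In allGames
                    cmd.Parameters("@team").Value = game.Groups(1).Value
                    cmd.Parameters("@score").Value = game.Groups(2).Value
                    cmd.ExecuteNonQuery()
                Next
            End Using
        End Using ' Disposing the connection closes it, after the loop.
    End Sub
End Module
```

The Using blocks close the connection even if an insert throws, and the parameters keep scraped text out of the SQL string itself.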

 

Good luck :)

