Xtreme .Net Talk

Increase speed of reading and writing text files in VB.NET 2008


Posted

Dear all,

 

I have written some VB.NET code that reads data from text files. The files come from a list, and a For loop steps through them one at a time; the data from each file is written out to new text files. Execution is very slow. I am using a quad-core processor, but CPU utilization is only 20-30% while my code runs. Is there any way to increase the speed of reading and writing? Reading just 125 files takes 10 minutes or more, which is far too slow, because in the end I need to read and write thousands of files. Each file is approximately 30-50 KB.

 

Here is my code.

 

   Public Sub ReadRMRDataFileIntoTextFiles()

       'Read in the customerids once up front
       Dim customerids As Collections.Generic.List(Of String)
       Dim idFileName As String = customer_id_file
       If IO.File.Exists(idFileName) Then
           customerids = IO.File.ReadAllLines(idFileName).ToList()
       Else
           customerids = New Collections.Generic.List(Of String)()
       End If

       'now process files
       current_rmrfile = 0
       For Each curFile As String In rmr_files
           Dim customer_name As String
           customer_name = ""

           'Open rmr data file 
           Dim RmrData() As String = IO.File.ReadAllLines(curFile)

           For Each curLine As String In RmrData
               'RemoveEmptyEntries option takes care of Pack() and Trim() 
               'If line has proper data inside to be read
               If InStr(curLine, "METER") > 0 Then
                   If InStr(curLine, ":") > 0 Then
                       Dim newcurLine() As String = curLine.Replace(" ", "").Split(":")
                       customer_name = Trim(newcurLine(1))

                       'If customer already added in list - do not add
                       'Else if customer not added - add into list
                       If Not customerids.Contains(customer_name) Then
                           customerids.Add(customer_name)
                           IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                       End If
                   ElseIf InStr(curLine, "=") > 0 Then
                       Dim newcurLine() As String = curLine.Replace(" ", "").Split("=")
                       customer_name = Trim(newcurLine(1))

                       'If customer already added in list - do not add
                       'Else if customer not added - add into list
                       If Not customerids.Contains(customer_name) Then
                           customerids.Add(customer_name)
                           IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                       End If
                   End If
               End If


               'Split and Join string to apply "Trim" and "Pack"
               words = curLine.Trim(" ").Split(vbTab)

               'Count occurrences of "/" and ":" in the line
               countchar1 = CountOccurrences(curLine, "/", False)
               countchar2 = CountOccurrences(curLine, ":", False)


               'If data has started, then read it
               'words(1) is read below, so at least two tab-separated fields are required
               If countchar1 = 2 AndAlso countchar2 = 1 AndAlso words.Length >= 2 Then

                   'Get data from line
                   Dim trimwords As String = String.Join(" ", words)
                   Dim datewrite As String = trimwords.Substring(0, 10)
                   Dim timewrite As String = trimwords.Substring(11, 5)
                   Dim kwhwrite As String = words(1)


                   'Splitting date
                   Dim day_write As String = datewrite.Substring(3, 2)
                   Dim month_write As String = datewrite.Substring(0, 2)
                   Dim year_write As String = datewrite.Substring(6, 4)
                   datewrite = String.Format("{0}-{1}-{2}", day_write, month_write, year_write)

                   ''''Time
                   If timewrite = "24:00" Then
                       timewrite = "00:00:00"
                   Else
                       timewrite = String.Format("{0}:{1}", timewrite, "00")
                   End If

                   Dim writetofile As String = String.Format("{0},{1},{2}", datewrite, timewrite, kwhwrite & vbCrLf)
                   IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)
               Else
                   'If data has not yet started, skip the initial lines
                   Continue For
               End If

           Next curLine
           current_rmrfile = current_rmrfile + 1
           UpdateProgressBar()
       Next curFile
       System.Threading.Thread.Sleep(3000)
       Me.Close()

   End Sub

   Function CountOccurrences(ByVal p_strStringToCheck, ByVal p_strSubString, ByVal p_boolCaseSensitive)
       Dim arrstrTemp
       Dim strBase, strToFind

       If p_boolCaseSensitive Then
           strBase = p_strStringToCheck
           strToFind = p_strSubString
       Else
           strBase = LCase(p_strStringToCheck)
           strToFind = LCase(p_strSubString)
       End If

       arrstrTemp = Split(strBase, strToFind)
       CountOccurrences = UBound(arrstrTemp)
   End Function

 

 

One of the sample data files to read.

 

Service Point ID=060430_00001587
AKAUN=601011
METER=28509864
DATE/TIME=01/05/2009 00:00 TO 30/06/2009 00:00

A= KWH IMPORT
B= KWH EXPORT
C= KVARH IMPORT
D= KVARH IMPORT

 DATE        TIME   A            B           C           D
05/01/2009 00:30	74	50	0	0
05/01/2009 01:00	77	61	0	0
05/01/2009 01:30	76	62	0	0
05/01/2009 02:00	77	60	0	0
05/01/2009 02:30	76	61	0	0
05/01/2009 03:00	76	61	0	0
05/01/2009 03:30	77	62	0	0
05/01/2009 04:00	76	61	0	0
05/01/2009 04:30	76	51	0	0
05/01/2009 05:00	73	49	0	0
05/01/2009 05:30	75	50	0	0
05/01/2009 06:00	74	50	0	0
05/01/2009 06:30	74	49	0	0
05/01/2009 07:00	75	50	0	0
05/01/2009 07:30	73	48	0	0
05/01/2009 08:00	74	50	0	0
05/01/2009 08:30	76	62	0	0
05/01/2009 09:00	72	59	0	0
05/01/2009 09:30	71	59	0	0

 

All help is appreciated.

  • Administrators
Posted

I should have more time to take a proper look at this later. At a glance, though, you seem to be leaving a lot of method parameters and return types as Object. For example, CountOccurrences should really return an Integer, and its parameters should be declared as String, String and Boolean. This can avoid a lot of runtime type checks and coercions.

 

As a quick idea, try adding Option Explicit to the top of the source file (and ideally Option Strict On, which is the setting that actually flags missing data types) and fix any errors it generates. It may not make a big difference, but if this code is being executed repeatedly in a loop it could be a noticeable improvement.


Intellectuals solve problems; geniuses prevent them.

-- Albert Einstein

Posted
I have added some explanations and cut the code down so you can see what I am doing. Maybe that will give you an idea of how I could run multiple threads using this kind of mechanism.

 

 

'Go through each in file list
For Each curFile As String In rmr_files

           'Open file and read all data
           Dim RmrData() As String = IO.File.ReadAllLines(curFile)

           'For each line in file, read data and process
           For Each curLine As String In RmrData

           'Do some data processing here
           .........................................
           .........................................

           'Write processed data to a new text file - USE APPEND
           IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)
           
           'Move to the next line in the file till some criteria is met
           Next curLine

       'Move to the next file in the list
       Next curFile

   End Sub

 

Hope this helps. I already have Option Explicit turned on; I just didn't paste it because the code was getting too long. Here is the rest of it.

 

Option Strict Off
Option Explicit On

Public Class Form1
   Dim app_dir As String 'Location to application directory
   Dim tempfiles_dir As String 'Location to \TempFiles directory
   Dim customer_id_file As String 'Location to file \customerids.txt
   Dim rmr_file_list As String 'Location to rmrfiles.txt file
   Dim rmr_files() As String 'Array containing directories and rmr data file names
   Dim count_rmrfiles As Long 'Number of rmr data files
   Dim current_rmrfile As Integer 'Current file being read

   Dim customerids As Collections.Generic.List(Of String)
   Dim idFileName As String = customer_id_file
   Dim words() As String
   Dim countchar1 As Integer 'Number of "/" characters in the current line
   Dim countchar2 As Integer 'Number of ":" characters in the current line


   Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
       Dim th As New System.Threading.Thread(AddressOf ReadRMRDataFileIntoTextFiles)

       'Set directories and files
       app_dir = Application.StartupPath
       tempfiles_dir = app_dir & "\TempFiles\"
       rmr_file_list = app_dir & "\TempFiles\loadprofilefiles.txt"
       customer_id_file = app_dir & "\customerids.txt"

       'Read file names from a text file into an array.. this only takes a sec or so..
       Call ReadLoadProfileFilesIntoArray()

       ProgressBar1.Minimum = 0
       ProgressBar1.Maximum = count_rmrfiles
       ProgressBar1.Value = 0

       'Start reading rmr data files
       th.Start()

   End Sub

   Private Sub UpdateProgressBar()
       If Me.InvokeRequired Then
           Me.Invoke(New MethodInvoker(AddressOf UpdateProgressBar))
       Else
           ProgressBar1.Value = current_rmrfile
       End If
   End Sub

 

As for the CountOccurrences function, I found it somewhere on the Internet. I also found a regular-expression version that does the same thing, but I found regular expression matching to be slower.
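
One more thing that may be worth checking in the loop above: IO.File.AppendAllText opens and closes the output file on every call, and it is called once per data line. Below is a minimal sketch of an alternative, assuming it is acceptable to keep one StreamWriter open per customer file until processing ends. CustomerWriterCache and its members are hypothetical names, not part of the posted code, and whether this is the main bottleneck has not been measured here.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical helper (not from the posted code): keeps one open StreamWriter
' per output file so each file is opened once instead of once per line.
Public Class CustomerWriterCache
    Implements IDisposable

    Private ReadOnly _outputDir As String
    Private ReadOnly _writers As New Dictionary(Of String, StreamWriter)

    Public Sub New(ByVal outputDir As String)
        _outputDir = outputDir
    End Sub

    ' Append one line to <outputDir>\<customerName>.txt, opening the file on first use.
    Public Sub AppendLine(ByVal customerName As String, ByVal line As String)
        Dim writer As StreamWriter = Nothing
        If Not _writers.TryGetValue(customerName, writer) Then
            Dim fileName As String = Path.Combine(_outputDir, customerName & ".txt")
            writer = New StreamWriter(fileName, True) ' True = append to existing file
            _writers.Add(customerName, writer)
        End If
        writer.WriteLine(line)
    End Sub

    ' Flush and close every open writer.
    Public Sub Dispose() Implements IDisposable.Dispose
        For Each w As StreamWriter In _writers.Values
            w.Dispose()
        Next
        _writers.Clear()
    End Sub
End Class

Module WriterCacheDemo
    Sub Main()
        ' Assumed demo location; the posted code would pass app_dir instead.
        Dim outDir As String = Path.Combine(Path.GetTempPath(), "rmr_demo")
        Directory.CreateDirectory(outDir)

        Using cache As New CustomerWriterCache(outDir)
            cache.AppendLine("28509864", "01-05-2009,00:30:00,74")
            cache.AppendLine("28509864", "01-05-2009,01:00:00,77")
        End Using
    End Sub
End Module
```

In ReadRMRDataFileIntoTextFiles, the idea would be to wrap the outer For Each in a Using block and replace the per-line AppendAllText call with cache.AppendLine(customer_name, ...). With thousands of distinct customers this would hold that many files open at once, so a cap on open writers may be needed.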

  • Administrators
Posted

Will have another look later but running a profiler shows that changing your CountOccurrences to

   Function CountOccurrences(ByVal p_strStringToCheck As String, ByVal p_strSubString As String, ByVal p_boolCaseSensitive As Boolean) As Integer
       Dim arrstrTemp() As String
       Dim strBase, strToFind As String

       If p_boolCaseSensitive Then
           strBase = p_strStringToCheck
           strToFind = p_strSubString
       Else
           strBase = p_strStringToCheck.ToLower()
           strToFind = p_strSubString.ToLower()
       End If

       'Split on the whole substring so multi-character search strings also work
       arrstrTemp = strBase.Split(New String() {strToFind}, StringSplitOptions.None)
       'Number of pieces minus one = number of occurrences
       Return arrstrTemp.GetUpperBound(0)
   End Function

results in big improvements in that one routine. If it is called many times in a loop, that could be a big win.
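
If the routine still shows up in the profiler after that change, one further option is to count matches without building an array at all. The sketch below is an assumption-laden alternative, not the version that was profiled above: CountOccurrencesByIndexOf is a hypothetical name, and it counts non-overlapping matches using String.IndexOf with an ordinal comparison.

```vb
Imports System
Imports Microsoft.VisualBasic

Module OccurrenceCounter
    ' Hypothetical alternative to CountOccurrences: walks the string with IndexOf
    ' and counts non-overlapping matches, so no lowered copy or array is allocated.
    Public Function CountOccurrencesByIndexOf(ByVal source As String, ByVal subString As String, ByVal caseSensitive As Boolean) As Integer
        If String.IsNullOrEmpty(source) OrElse String.IsNullOrEmpty(subString) Then
            Return 0
        End If

        Dim comparison As StringComparison
        If caseSensitive Then
            comparison = StringComparison.Ordinal
        Else
            comparison = StringComparison.OrdinalIgnoreCase
        End If

        Dim count As Integer = 0
        Dim position As Integer = source.IndexOf(subString, 0, comparison)
        While position >= 0
            count += 1
            ' Resume the search just past the match that was found.
            position = source.IndexOf(subString, position + subString.Length, comparison)
        End While
        Return count
    End Function

    Sub Main()
        ' One data line in the format of the sample file (tab-separated values).
        Dim line As String = "05/01/2009 00:30" & vbTab & "74" & vbTab & "50"
        Console.WriteLine(CountOccurrencesByIndexOf(line, "/", False)) ' prints 2
        Console.WriteLine(CountOccurrencesByIndexOf(line, ":", False)) ' prints 1
    End Sub
End Module
```

For the "/" and ":" checks in the posted loop, this returns the same counts as the Split-based version, so the existing condition on countchar1 and countchar2 would not need to change.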


Posted
Thank you for the valuable help. I will try it out and let you know. :)
