Increase speed of reading and writing text files in VB.NET 2008


Feb 26, 2009
Dear all,

I have written some VB.NET code to read data from text files. Data is read from a list of files, moving to the next file on each iteration of a For loop, and is written to new text files. Each file is read one by one and written the same way. The problem is that execution is very slow: I am using a quad-core processor, yet CPU utilization is only 20-30% while my code runs. Is there any way I can increase the speed of reading and writing? Reading just 125 files takes 10 minutes or more, which is very slow indeed, because in the end I need to read and write thousands of files. Each file is approximately 30-50 KB.

Here is my code.

    Public Sub ReadRMRDataFileIntoTextFiles()

        'Read in the customerids once up front
        Dim customerids As Collections.Generic.List(Of String)
        Dim idFileName As String = customer_id_file
        If IO.File.Exists(idFileName) Then
            customerids = IO.File.ReadAllLines(idFileName).ToList()
        Else
            customerids = New Collections.Generic.List(Of String)()
        End If

        'now process files
        current_rmrfile = 0
        For Each curFile As String In rmr_files
            Dim customer_name As String
            customer_name = ""

            'Open rmr data file 
            Dim RmrData() As String = IO.File.ReadAllLines(curFile)

            For Each curLine As String In RmrData
                'RemoveEmptyEntries option takes care of Pack() and Trim() 
                'If line has proper data inside to be read
                If InStr(curLine, "METER") > 0 Then
                    If InStr(curLine, ":") > 0 Then
                        Dim newcurLine() As String = curLine.Replace(" ", "").Split(":")
                        customer_name = Trim(newcurLine(1))

                        'If customer already added in list - do not add
                        'Else if customer not added - add into list
                        If Not customerids.Contains(customer_name) Then
                            customerids.Add(customer_name)
                            IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                        End If
                    ElseIf InStr(curLine, "=") > 0 Then
                        Dim newcurLine() As String = curLine.Replace(" ", "").Split("=")
                        customer_name = Trim(newcurLine(1))

                        'If customer already added in list - do not add
                        'Else if customer not added - add into list
                        If Not customerids.Contains(customer_name) Then
                            customerids.Add(customer_name)
                            IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                        End If
                    End If
                End If

                'Split and Join string to apply "Trim" and "Pack"
                words = curLine.Trim(" ").Split(vbTab)

                'Count occurences of string
                countchar1 = CountOccurrences(curLine, "/", False)
                countchar2 = CountOccurrences(curLine, ":", False)

                'If data has started, then read it
                If countchar1 = 2 And countchar2 = 1 And words.Length >= 1 Then

                    'Get data from line
                    Dim trimwords As String = String.Join(" ", words)
                    Dim datewrite As String = trimwords.Substring(0, 10)
                    Dim timewrite As String = trimwords.Substring(11, 5)
                    Dim kwhwrite As String = words(1)

                    'Splitting date
                    Dim day_write As String = datewrite.Substring(3, 2)
                    Dim month_write As String = datewrite.Substring(0, 2)
                    Dim year_write As String = datewrite.Substring(6, 4)
                    datewrite = String.Format("{0}-{1}-{2}", day_write, month_write, year_write)

                    If timewrite = "24:00" Then
                        timewrite = "00:00:00"
                    Else
                        timewrite = String.Format("{0}:{1}", timewrite, "00")
                    End If

                    Dim writetofile As String = String.Format("{0},{1},{2}", datewrite, timewrite, kwhwrite & vbCrLf)
                    IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)
                Else
                    'If data has not yet started, skip the initial lines
                    Continue For
                End If

            Next curLine
            current_rmrfile = current_rmrfile + 1
        Next curFile

    End Sub

    Function CountOccurrences(ByVal p_strStringToCheck, ByVal p_strSubString, ByVal p_boolCaseSensitive)
        Dim arrstrTemp
        Dim strBase, strToFind

        If p_boolCaseSensitive Then
            strBase = p_strStringToCheck
            strToFind = p_strSubString
        Else
            strBase = LCase(p_strStringToCheck)
            strToFind = LCase(p_strSubString)
        End If

        arrstrTemp = Split(strBase, strToFind)
        CountOccurrences = UBound(arrstrTemp)
    End Function
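One likely bottleneck, which is an assumption and was not profiled in this thread, is that IO.File.AppendAllText opens, appends to and closes the output file for every single data line. That pattern is disk-bound rather than CPU-bound, which would fit the low CPU usage described above. A minimal sketch of buffering the output in memory and writing each file once follows; the module and the helper names QueueLine and FlushAll are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module BufferedOutputSketch
    ' Output text collected in memory, keyed by output file path, so that each
    ' file is opened once per flush instead of once per data line.
    Private ReadOnly pending As New Dictionary(Of String, StringBuilder)()

    ' Hypothetical helper: call this where the original code calls IO.File.AppendAllText.
    ' AppendLine adds the line break, so callers pass the line without vbCrLf.
    Public Sub QueueLine(ByVal outputPath As String, ByVal line As String)
        Dim sb As StringBuilder = Nothing
        If Not pending.TryGetValue(outputPath, sb) Then
            sb = New StringBuilder()
            pending.Add(outputPath, sb)
        End If
        sb.AppendLine(line)
    End Sub

    ' Hypothetical helper: writes all buffered text (one append per output file)
    ' and empties the buffer. Call it after each input file or after all of them.
    Public Sub FlushAll()
        For Each kvp As KeyValuePair(Of String, StringBuilder) In pending
            File.AppendAllText(kvp.Key, kvp.Value.ToString())
        Next
        pending.Clear()
    End Sub
End Module
```

In ReadRMRDataFileIntoTextFiles the per-line call could become QueueLine(app_dir & "\" & customer_name & ".txt", String.Format("{0},{1},{2}", datewrite, timewrite, kwhwrite)), with FlushAll() called after the inner For Each curLine loop. The same idea applies to the appends to customerids.txt.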

One of the sample data files to read.

Service Point ID=060430_00001587
DATE/TIME=01/05/2009 00:00 TO 30/06/2009 00:00


  DATE        TIME   A            B           C           D
05/01/2009 00:30	74	50	0	0
05/01/2009 01:00	77	61	0	0
05/01/2009 01:30	76	62	0	0
05/01/2009 02:00	77	60	0	0
05/01/2009 02:30	76	61	0	0
05/01/2009 03:00	76	61	0	0
05/01/2009 03:30	77	62	0	0
05/01/2009 04:00	76	61	0	0
05/01/2009 04:30	76	51	0	0
05/01/2009 05:00	73	49	0	0
05/01/2009 05:30	75	50	0	0
05/01/2009 06:00	74	50	0	0
05/01/2009 06:30	74	49	0	0
05/01/2009 07:00	75	50	0	0
05/01/2009 07:30	73	48	0	0
05/01/2009 08:00	74	50	0	0
05/01/2009 08:30	76	62	0	0
05/01/2009 09:00	72	59	0	0
05/01/2009 09:30	71	59	0	0
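To make the transformation concrete, here is a sketch of what the loop does to one tab-separated data line from the sample above. ParseDataLine is a hypothetical name; like the original code, it assumes the date field is MM/dd/yyyy and it keeps only the first value column. It does not skip header or blank lines, which the original filters out first with its "/" and ":" counts.

```vbnet
Imports Microsoft.VisualBasic

Module DataLineSketch
    ' Converts e.g. "05/01/2009 00:30<TAB>74<TAB>50<TAB>0<TAB>0" into "01-05-2009,00:30:00,74",
    ' mirroring the Substring logic in ReadRMRDataFileIntoTextFiles.
    Public Function ParseDataLine(ByVal line As String) As String
        Dim fields() As String = line.Trim().Split(ControlChars.Tab)
        Dim stamp As String = fields(0)                ' "05/01/2009 00:30"
        Dim monthPart As String = stamp.Substring(0, 2)
        Dim dayPart As String = stamp.Substring(3, 2)
        Dim yearPart As String = stamp.Substring(6, 4)
        Dim timePart As String = stamp.Substring(11, 5) ' "00:30"

        ' "24:00" is written as midnight; every other time gets ":00" seconds appended.
        If timePart = "24:00" Then
            timePart = "00:00:00"
        Else
            timePart = timePart & ":00"
        End If

        Return String.Format("{0}-{1}-{2},{3},{4}", dayPart, monthPart, yearPart, timePart, fields(1))
    End Function
End Module
```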

All help is appreciated.
I should have more time for a proper look at this later. At a glance, though, you seem to be leaving a lot of method parameters and return types as Object. For example, CountOccurrences should really return an Integer, and its parameters should be declared as String, String and Boolean. This avoids a lot of runtime data type checks and coercions.

As a quick idea, try adding Option Strict On to the top of the source file and fix the errors it reports for missing data types. It may not make a big difference, but if this code is executed repeatedly in a loop it could be a noticeable improvement.

I have added some explanations and cut the code down to show what I am doing. Maybe then you can suggest how I could fire multiple threads using this kind of mechanism.

    Public Sub ReadRMRDataFileIntoTextFiles()

        'Go through each file in the list
        For Each curFile As String In rmr_files

            'Open file and read all data
            Dim RmrData() As String = IO.File.ReadAllLines(curFile)

            'For each line in file, read data and process
            For Each curLine As String In RmrData

                'Do some data processing here

                'Write processed data to a new text file - USE APPEND
                IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)

                'Move to the next line in the file until some criterion is met
            Next curLine

            'Move to the next file in the list
        Next curFile

    End Sub
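The multi-threading question is not answered later in the thread. As a rough sketch under stated assumptions: VB 2008 targets .NET 3.5, which has no Parallel.ForEach, so the file list can be split across a fixed number of Thread objects. ProcessOneFile is a hypothetical placeholder for the body of the For Each curFile loop. This is only safe if no two threads append to the same output file at the same time (for example, if each input file belongs to a different customer) and if shared state such as customerids and customerids.txt is protected with SyncLock. Because the work is probably disk-bound, more threads may not help much.

```vbnet
Imports System
Imports System.Threading

Module ParallelFilesSketch
    Private files() As String      ' full file list, read by all workers
    Private workerCount As Integer ' number of worker threads
    Private processed As Integer   ' updated with Interlocked from several threads

    ' Hypothetical placeholder for the per-file work. Here it only counts the file;
    ' real code must not append to a file another thread may be writing.
    Private Sub ProcessOneFile(ByVal fileName As String)
        Interlocked.Increment(processed)
    End Sub

    ' Worker number "index" handles files index, index + workerCount, index + 2 * workerCount, ...
    Private Sub Worker(ByVal state As Object)
        Dim index As Integer = CInt(state)
        For i As Integer = index To files.Length - 1 Step workerCount
            ProcessOneFile(files(i))
        Next
    End Sub

    ' Starts threadCount threads, waits for all of them, and returns how many files were processed.
    Public Function ProcessAllFiles(ByVal fileList() As String, ByVal threadCount As Integer) As Integer
        files = fileList
        workerCount = threadCount
        processed = 0
        Dim threads(threadCount - 1) As Thread
        For t As Integer = 0 To threadCount - 1
            threads(t) = New Thread(New ParameterizedThreadStart(AddressOf Worker))
            threads(t).Start(t)
        Next
        For Each th As Thread In threads
            th.Join()
        Next
        Return processed
    End Function
End Module
```

A call such as ProcessAllFiles(rmr_files, Environment.ProcessorCount) would use one thread per core.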

Hope this helps. I had already included Option Explicit for the data types; I just didn't paste it because the code was getting too long. Here is the rest of it.

Option Strict Off
Option Explicit On

Public Class Form1
    Dim app_dir As String 'Location to application directory
    Dim tempfiles_dir As String 'Location to \TempFiles directory
    Dim customer_id_file As String 'Location to file \customerids.txt
    Dim rmr_file_list As String 'Location to rmrfiles.txt file
    Dim rmr_files() As String 'Array containing directories and rmr data file names
    Dim count_rmrfiles As Long 'Number of rmr data files
    Dim current_rmrfile As Integer 'Current file being read

    Dim customerids As Collections.Generic.List(Of String)
    Dim idFileName As String = customer_id_file
    Dim words() As String
    Dim countchar1 As Integer
    Dim countchar2 As Integer

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim th As New System.Threading.Thread(AddressOf ReadRMRDataFileIntoTextFiles)

        'Set directories and files
        app_dir = Application.StartupPath
        tempfiles_dir = app_dir & "\TempFiles\"
        rmr_file_list = app_dir & "\TempFiles\loadprofilefiles.txt"
        customer_id_file = app_dir & "\customerids.txt"

        'Read file names from a text file into an array.. this only takes a sec or so..
        Call ReadLoadProfileFilesIntoArray()

        ProgressBar1.Minimum = 0
        ProgressBar1.Maximum = count_rmrfiles
        ProgressBar1.Value = 0

        'Start reading rmr data files
        th.Start()

    End Sub

    Private Sub UpdateProgressBar()
        If Me.InvokeRequired Then
            Me.Invoke(New MethodInvoker(AddressOf UpdateProgressBar))
        Else
            ProgressBar1.Value = current_rmrfile
        End If
    End Sub

As for CountOccurrences, I found it somewhere on the Internet. I also found a RegExp version that does the same thing, but I found regular expression matching to be slower.
Will have another look later, but running a profiler shows that changing your CountOccurrences to

    Function CountOccurrences(ByVal p_strStringToCheck As String, ByVal p_strSubString As String, ByVal p_boolCaseSensitive As Boolean) As Integer
        Dim arrstrTemp() As String
        Dim strBase, strToFind As String

        If p_boolCaseSensitive Then
            strBase = p_strStringToCheck
            strToFind = p_strSubString
        Else
            strBase = p_strStringToCheck.ToLower
            strToFind = p_strSubString.ToLower
        End If

        'Split on the whole substring (not just its first character)
        arrstrTemp = strBase.Split(New String() {strToFind}, StringSplitOptions.None)
        CountOccurrences = arrstrTemp.GetUpperBound(0)
    End Function

results in big improvements in that one routine; if it is called many times in a loop, that could be a big win.
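Both call sites in the question only count a single character ("/" and ":"), and case does not matter for those, so a plain loop over the string's characters could replace the Split call entirely and avoid allocating a temporary array for every line. This is a sketch that has not been profiled here; CountChar is a hypothetical name.

```vbnet
Module CountCharSketch
    ' Counts how many times one character occurs in a string, without the
    ' temporary array that String.Split allocates.
    Public Function CountChar(ByVal text As String, ByVal target As Char) As Integer
        Dim count As Integer = 0
        For Each c As Char In text
            If c = target Then count += 1
        Next
        Return count
    End Function
End Module
```

In the loop this would read countchar1 = CountChar(curLine, "/"c) and countchar2 = CountChar(curLine, ":"c).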

Thank you for the valuable help. Will try it out and let you know. :)