Increase speed of reading and writing text files in VB.NET 2008

awyeah

Newcomer
Joined
Feb 26, 2009
Messages
8
Dear all,

I have written a VB.NET routine that reads data from a list of text files, iterating over the files with a For loop, and writes the results to new text files. The files are read and written one at a time. Execution is very slow, even though I am on a quad-core processor that shows only 20-30% CPU utilization while the code runs. Is there any way to speed up the reading and writing? Reading just 125 files takes 10 minutes or more, and in the end I need to process thousands of files. Each file is approximately 30-50 KB.

Here is my code.

Code:
    Public Sub ReadRMRDataFileIntoTextFiles()

        'Read in the customerids once up front
        Dim customerids As Collections.Generic.List(Of String)
        Dim idFileName As String = customer_id_file
        If IO.File.Exists(idFileName) Then
            customerids = IO.File.ReadAllLines(idFileName).ToList()
        Else
            customerids = New Collections.Generic.List(Of String)()
        End If

        'now process files
        current_rmrfile = 0
        For Each curFile As String In rmr_files
            Dim customer_name As String
            customer_name = ""

            'Open rmr data file 
            Dim RmrData() As String = IO.File.ReadAllLines(curFile)

            For Each curLine As String In RmrData
                'RemoveEmptyEntries option takes care of Pack() and Trim()
                'If line has proper data inside to be read
                If InStr(curLine, "METER") > 0 Then
                    If InStr(curLine, ":") > 0 Then
                        Dim newcurLine() As String = curLine.Replace(" ", "").Split(":")
                        customer_name = Trim(newcurLine(1))

                        'If customer already added in list - do not add
                        'Else if customer not added - add into list
                        If Not customerids.Contains(customer_name) Then
                            customerids.Add(customer_name)
                            IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                        End If
                    ElseIf InStr(curLine, "=") > 0 Then
                        Dim newcurLine() As String = curLine.Replace(" ", "").Split("=")
                        customer_name = Trim(newcurLine(1))

                        'If customer already added in list - do not add
                        'Else if customer not added - add into list
                        If Not customerids.Contains(customer_name) Then
                            customerids.Add(customer_name)
                            IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                        End If
                    End If
                End If


                'Split and Join string to apply "Trim" and "Pack"
                words = curLine.Trim(" ").Split(vbTab)

                'Count occurrences of string
                countchar1 = CountOccurrences(curLine, "/", False)
                countchar2 = CountOccurrences(curLine, ":", False)


                'If data has started, then read it
                If countchar1 = 2 And countchar2 = 1 And words.Length >= 1 Then

                    'Get data from line
                    Dim trimwords As String = String.Join(" ", words)
                    Dim datewrite As String = trimwords.Substring(0, 10)
                    Dim timewrite As String = trimwords.Substring(11, 5)
                    Dim kwhwrite As String = words(1)


                    'Splitting date
                    Dim day_write As String = datewrite.Substring(3, 2)
                    Dim month_write As String = datewrite.Substring(0, 2)
                    Dim year_write As String = datewrite.Substring(6, 4)
                    datewrite = String.Format("{0}-{1}-{2}", day_write, month_write, year_write)

                    ''''Time
                    If timewrite = "24:00" Then
                        timewrite = "00:00:00"
                    Else
                        timewrite = String.Format("{0}:{1}", timewrite, "00")
                    End If

                    Dim writetofile As String = String.Format("{0},{1},{2}", datewrite, timewrite, kwhwrite & vbCrLf)
                    IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)
                Else
                    'If data has not yet started, skip the initial lines
                    Continue For
                End If

            Next curLine
            current_rmrfile = current_rmrfile + 1
            UpdateProgressBar()
        Next curFile
        System.Threading.Thread.Sleep(3000)
        Me.Close()

    End Sub

    Function CountOccurrences(ByVal p_strStringToCheck, ByVal p_strSubString, ByVal p_boolCaseSensitive)
        Dim arrstrTemp
        Dim strBase, strToFind

        If p_boolCaseSensitive Then
            strBase = p_strStringToCheck
            strToFind = p_strSubString
        Else
            strBase = LCase(p_strStringToCheck)
            strToFind = LCase(p_strSubString)
        End If

        arrstrTemp = Split(strBase, strToFind)
        CountOccurrences = UBound(arrstrTemp)
    End Function


One of the sample data files to read.

Code:
Service Point ID=060430_00001587
AKAUN=601011
METER=28509864
DATE/TIME=01/05/2009 00:00 TO 30/06/2009 00:00

A= KWH IMPORT
B= KWH EXPORT
C= KVARH IMPORT
D= KVARH IMPORT

  DATE        TIME   A            B           C           D
05/01/2009 00:30	74	50	0	0
05/01/2009 01:00	77	61	0	0
05/01/2009 01:30	76	62	0	0
05/01/2009 02:00	77	60	0	0
05/01/2009 02:30	76	61	0	0
05/01/2009 03:00	76	61	0	0
05/01/2009 03:30	77	62	0	0
05/01/2009 04:00	76	61	0	0
05/01/2009 04:30	76	51	0	0
05/01/2009 05:00	73	49	0	0
05/01/2009 05:30	75	50	0	0
05/01/2009 06:00	74	50	0	0
05/01/2009 06:30	74	49	0	0
05/01/2009 07:00	75	50	0	0
05/01/2009 07:30	73	48	0	0
05/01/2009 08:00	74	50	0	0
05/01/2009 08:30	76	62	0	0
05/01/2009 09:00	72	59	0	0
05/01/2009 09:30	71	59	0	0

All help is appreciated.
 
I should have more time for a proper look later. At a glance, though, you are leaving a lot of method parameters and return types as Object. For example, CountOccurrences should return an Integer, and its parameters should be declared as String, String and Boolean. Declaring the types avoids a lot of runtime type checks and coercions.

As a quick idea, try adding Option Strict On to the top of the source file (Option Explicit On only forces variables to be declared, not typed) and fix any errors it generates from missing data types. It may not make a big difference, but if this code is executed repeatedly in a loop the improvement could be noticeable.
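
To illustrate the point about declared types, here is a minimal sketch of a fully typed routine that compiles under Option Strict On, the setting that makes the compiler reject untyped declarations. The module and the CountChar helper are hypothetical names, not part of the posted code, and counting a single character with a plain loop is an assumption about what the caller needs (the posted code only looks for "/" and ":").

Visual Basic:
Option Strict On
Option Explicit On

Imports System

Module TypedCountSketch
    ' Hypothetical helper: counts how often one character occurs in a line.
    ' Parameters and return value are all typed, so nothing is late bound.
    Function CountChar(ByVal text As String, ByVal target As Char) As Integer
        Dim count As Integer = 0
        For Each c As Char In text
            If c = target Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        ' Date and time portion of a sample data line from the first post.
        Dim sample As String = "05/01/2009 00:30"
        Console.WriteLine(CountChar(sample, "/"c)) ' prints 2
        Console.WriteLine(CountChar(sample, ":"c)) ' prints 1
    End Sub
End Module

With typed results like this, the fields that hold the counts could also be declared As Integer rather than As String, which would avoid a string-to-number conversion on every comparison.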
 

I have included some explanations and reduced the code to show what I am doing. Maybe that gives you an idea of how I could run multiple threads with this kind of mechanism.


Code:
'Go through each file in the list
For Each curFile As String In rmr_files

    'Open file and read all data
    Dim RmrData() As String = IO.File.ReadAllLines(curFile)

    'For each line in file, read data and process
    For Each curLine As String In RmrData

        'Do some data processing here
        .........................................
        .........................................

        'Write processed data to a new text file - USE APPEND
        IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)

    'Move to the next line in the file till some criteria is met
    Next curLine

'Move to the next file in the list
Next curFile

End Sub

Hope this helps. I have already included Option Explicit for the data types; I just didn't paste it because the code was getting too long. Here is the rest of it.

Code:
Option Strict Off
Option Explicit On

Public Class Form1
    Dim app_dir As String 'Location to application directory
    Dim tempfiles_dir As String 'Location to \TempFiles directory
    Dim customer_id_file As String 'Location to file \customerids.txt
    Dim rmr_file_list As String 'Location to rmrfiles.txt file
    Dim rmr_files() As String 'Array containing directories and rmr data file names
    Dim count_rmrfiles As Long 'Number of rmr data files
    Dim current_rmrfile As Integer 'Current file being read

    Dim customerids As Collections.Generic.List(Of String)
    Dim idFileName As String = customer_id_file
    Dim words() As String
    Dim countchar1 As String
    Dim countchar2 As String


    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim th As New System.Threading.Thread(AddressOf ReadRMRDataFileIntoTextFiles)

        'Set directories and files
        app_dir = Application.StartupPath
        tempfiles_dir = app_dir & "\TempFiles\"
        rmr_file_list = app_dir & "\TempFiles\loadprofilefiles.txt"
        customer_id_file = app_dir & "\customerids.txt"

        'Read file names from a text file into an array.. this only takes a sec or so..
        Call ReadLoadProfileFilesIntoArray()

        ProgressBar1.Minimum = 0
        ProgressBar1.Maximum = count_rmrfiles
        ProgressBar1.Value = 0

        'Start reading rmr data files
        th.Start()

    End Sub

    Private Sub UpdateProgressBar()
        If Me.InvokeRequired Then
            Me.Invoke(New MethodInvoker(AddressOf UpdateProgressBar))
        Else
            ProgressBar1.Value = current_rmrfile
        End If
    End Sub

As for the CountOccurrences function, I found it somewhere on the Internet. I also found a RegExp approach that does the same thing, but I found regular expression matching to be slower.
 
Will have another look later but running a profiler shows that changing your CountOccurrences to
Visual Basic:
  Function CountOccurrences(ByVal p_strStringToCheck As String, ByVal p_strSubString As String, ByVal p_boolCaseSensitive As Boolean) As Integer
        Dim arrstrTemp() As String
        Dim strBase, strToFind As String

        If p_boolCaseSensitive Then
            strBase = p_strStringToCheck
            strToFind = p_strSubString
        Else
            strBase = p_strStringToCheck.ToLower
            strToFind = p_strSubString.ToLower
        End If

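        'Note: on the .NET Framework used by VB 2008, String.Split treats this
        'argument as Char separators, so the count is only reliable when
        'strToFind is a single character such as "/" or ":".
        arrstrTemp = strBase.Split(strToFind)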
        CountOccurrences = arrstrTemp.GetUpperBound(0)
    End Function
results in big improvements in that one routine. If it is called many times in a loop, that could be a big win.
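
Another suspect, offered as an assumption since it was not profiled here: IO.File.AppendAllText opens and closes the output file once for every data line. Below is a minimal sketch of collecting the output for one input file in a StringBuilder and appending it with a single call. The module name, the AppendLinesBuffered helper and the temp-file path are hypothetical, and the sketch assumes each input file maps to one customer file, as the METER header in the sample suggests.

Visual Basic:
Option Strict On
Option Explicit On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module BufferedAppendSketch
    ' Hypothetical helper: builds all output lines for one input file in memory,
    ' then appends them with one AppendAllText call instead of one call per line.
    Sub AppendLinesBuffered(ByVal outputPath As String, ByVal lines As IEnumerable(Of String))
        Dim buffer As New StringBuilder()
        For Each outLine As String In lines
            buffer.AppendLine(outLine)
        Next
        If buffer.Length > 0 Then
            File.AppendAllText(outputPath, buffer.ToString())
        End If
    End Sub

    Sub Main()
        ' Assumed output location, for illustration only.
        Dim outputPath As String = Path.Combine(Path.GetTempPath(), "customer_sketch.txt")

        ' Rows in the date,time,kwh format that the posted code writes.
        Dim rows As New List(Of String)
        rows.Add("01-05-2009,00:30:00,74")
        rows.Add("01-05-2009,01:00:00,77")

        AppendLinesBuffered(outputPath, rows)
    End Sub
End Module

In the posted loop this would mean adding each formatted row to a list (or directly to the StringBuilder) inside For Each curLine and writing once just before Next curFile. Whether that is where the time actually goes is worth confirming with the profiler.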
 

Thank you for the valuable help. Will try it out and let you know. :)
 