Dear all,


I am trying to complete a project. My task is to read large TXT files (300 files, 2.2 GB in total), read the customers inside, perform some calculations on the data, and write each customer's data to a separate TXT file.


I wrote the code in VB6. It runs fine, but it takes 8 days to run on a quad-core processor and uses only 20-30% of the CPU.


I have now upgraded the code and am running it under VB.NET 2008, the latest version. The speed is unchanged, and the CPU usage is about the same.


Is there any way I can make this process faster? I have a quad-core PC with 8 GB of RAM. Is it possible to make VB use all of the CPU, say 100%, so that it runs at least 2 to 3 times faster?


Here is my code in VB.NET 2008:



Option Strict Off
Option Explicit On
Imports VB = Microsoft.VisualBasic

Public Class Form1
Inherits System.Windows.Forms.Form
Dim rmr_files() As String 'Array containing directories and rmr data file names

Public Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

   Call ReadRMRFileNamesIntoArray

   'Read RMR data files
   Call ReadRMRDataFileIntoTextFiles

End Sub

Public Sub ReadRMRDataFileIntoTextFiles()

   Dim str_Renamed As String
   Dim str2 As String
   Dim sJoin As String
   Dim row As Integer
   Dim customer_name As String
   Dim line As Integer
   Dim line2 As Integer
   Dim count As Integer
   Dim count2 As Integer
   Dim countdata1 As Integer
   Dim countdata2 As Integer
   Dim writetofile As String
   Dim writecustid As String
   Dim custrecidfile As Boolean
   Dim custpresent As Boolean

   Dim datewrite As String
   Dim timewrite As String
   Dim kwhwrite As String
   Dim tempdate As String
   Dim temptime As String
   Dim tempkwh As String
   Dim custfileexists As Boolean
   Dim origfilepath As String

   Dim i As Integer
   Dim sArray As Object
   Dim custArray As Object
   Dim myObjFs1 As Scripting.FileSystemObject
   Dim objWrite1 As Object
   Dim myObjFs2 As Scripting.FileSystemObject
   Dim objWrite2 As Object
   Dim col As Integer
   Dim tempArray As Object
   Dim myObjFs3 As Scripting.FileSystemObject
   Dim objWrite3 As Object
   Dim year_Renamed As String
   Dim month_Renamed As String
   Dim day_Renamed As String
   Dim splitrmrdata1() As String
   Dim splitrmrdata2() As String
   Dim customerids() As String

   For count = LBound(rmr_files) To UBound(rmr_files)

       'Open rmr data file and begin to read
       splitrmrdata1 = Split(returnContents(rmr_files(count)), vbNewLine)

       row = 1
       line = 1
       For countdata1 = LBound(splitrmrdata1) To UBound(splitrmrdata1)
           str_Renamed = Trim(Pack(StripOut(splitrmrdata1(countdata1), """")))

           'Split string into separate words and characters

           sArray = Split(str_Renamed, " ")
           For i = LBound(sArray) To UBound(sArray)
               sArray(i) = """" & sArray(i) & """"
           Next i

           'Join back array to convert into csv format
           sJoin = Join(sArray, ",")
           If UCase(Mid(sJoin, 2, 8)) = "RECORDER" Then
               sJoin = Replace(sJoin, "RECORDER"",""ID", "RECORDER ID")
           End If

           'New customer found
           If InStr(sJoin, "RECORDER") <> 0 Then
               row = 1
           End If

           'Open new file
           'Get the name of the customer from second line after the "RECORDER" line
           If row = 1 Then
               splitrmrdata2 = Split(returnContents(rmr_files(count)), vbNewLine)

               line2 = 1
               For countdata2 = LBound(splitrmrdata2) To UBound(splitrmrdata2)
                   str2 = Trim(Pack(StripOut(splitrmrdata2(countdata2), """")))

                   If line2 = line + 1 Then
                       'Split string into separate words and characters

                       custArray = Split(str2, " ")
                       'Get the name of customer (Recorder ID)
                       customer_name = custArray(0)

                       Exit For
                   End If
                   line2 = line2 + 1
               Next countdata2
           End If

           'Check if customer recorder id file exists
           custrecidfile = FileExists(my.Application.Info.DirectoryPath & "\customerids.txt")

           'If file does not exist - create the file and add the first customer id
           If custrecidfile = False Then

               myObjFs1 = New Scripting.FileSystemObject

               'Create an empty text file
               myObjFs1 = CreateObject("Scripting.FileSystemObject")
               objWrite1 = myObjFs1.CreateTextFile(my.Application.Info.DirectoryPath & "\customerids.txt")

               'Write to the text file and close it
               objWrite1.WriteLine(customer_name)
               objWrite1.Close()
           End If

           'Get contents of customer recorder id file list
           customerids = Split(returnContents(my.Application.Info.DirectoryPath & "\customerids.txt"), vbNewLine)

           'Check if customer present in customer id list or not
           custpresent = False
           For count2 = LBound(customerids) To UBound(customerids)
               If StrComp(customerids(count2), Trim(Pack(customer_name)), 1) = 0 Then
                   custpresent = True
                   Exit For
               End If
           Next count2

           'If customer already added in list - do not add
           'Else if customer not added - add into list
           If custpresent = False Then

               myObjFs2 = New Scripting.FileSystemObject

               'Create an empty text file
               myObjFs2 = CreateObject("Scripting.FileSystemObject")
               objWrite2 = myObjFs2.OpenTextFile(my.Application.Info.DirectoryPath & "\customerids.txt", Scripting.IOMode.ForAppending, True)

               'Write to the text file and close it
               objWrite2.WriteLine(customer_name)
               objWrite2.Close()
           End If

           'If line is not empty, only then proceed
           If sJoin <> "" Then

               'If row does not contain names like RECORDER ID, DATE, HOUR etc. then continue
               If row <> 1 Then

                   'Write data into text file rows

                   tempArray = Split(sJoin, ",")
                   tempdate = tempArray(1)
                   temptime = tempArray(2)
                   tempkwh = tempArray(6)

                   tempdate = Trim(Pack(Replace(tempdate, """", "")))
                   temptime = Trim(Pack(Replace(temptime, """", "")))
                   tempkwh = Trim(Pack(Replace(tempkwh, """", "")))

                   'Splitting date into proper format
                   'Splitting date into: dd/mm/yy
                   day_Renamed = Microsoft.VisualBasic.Strings.Left(tempdate, 2)
                   month_Renamed = Microsoft.VisualBasic.Strings.Mid(tempdate, 3, 2)
                   year_Renamed = Microsoft.VisualBasic.Strings.Right(tempdate, 2)

                   'Adjust dd/mm/yy to dd-mm-yyyy
                   If CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 8 Or CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 9 Then
                       year_Renamed = "19" & year_Renamed & ""
                   ElseIf CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 0 Or CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 1 Then
                       year_Renamed = "20" & year_Renamed & ""
                   End If

                   'Set date format: dd-mm-yyyy
                   datewrite = "" & day_Renamed & "-" & month_Renamed & "-" & year_Renamed & ""
                   timewrite = temptime
                   kwhwrite = tempkwh

                   'If file does not exist create it
                   'If file exists - open it, write to it and close it.
                   origfilepath = my.Application.Info.DirectoryPath & "\" & customer_name & ".txt"

                   myObjFs3 = New Scripting.FileSystemObject
                   custfileexists = FileExists(origfilepath)

                   'If the customer file does not exist, create it; otherwise open it for appending
                   If custfileexists = False Then
                       myObjFs3 = CreateObject("Scripting.FileSystemObject")
                       objWrite3 = myObjFs3.CreateTextFile(origfilepath)
                   Else
                       myObjFs3 = CreateObject("Scripting.FileSystemObject")
                       objWrite3 = myObjFs3.OpenTextFile(origfilepath, Scripting.IOMode.ForAppending, True)
                   End If

                   'Write to text file and close it
                   writetofile = "" & datewrite & "," & timewrite & "," & kwhwrite & ""
                   objWrite3.WriteLine(writetofile)
                   objWrite3.Close()

               End If
           End If

           'Increment the row
           row = row + 1
           line = line + 1
       Next countdata1
   Next count

End Sub

You might want to start by replacing the use of FileSystemObject with the built-in .NET classes and methods found under System.IO, e.g.


objWrite1 = myObjFs1.CreateTextFile(My.Application.Info.DirectoryPath & "\customerids.txt")

'could be replaced with (needs Imports System.IO at the top of the file)
Dim sw As StreamWriter
sw = File.CreateText(My.Application.Info.DirectoryPath & "\customerids.txt")


myObjFs2 = New Scripting.FileSystemObject

'Create an empty text file
myObjFs2 = CreateObject("Scripting.FileSystemObject")
objWrite2 = myObjFs2.OpenTextFile(My.Application.Info.DirectoryPath & "\customerids.txt", Scripting.IOMode.ForAppending, True)

'could be replaced by (needs Imports System.IO at the top of the file)
Dim sw As StreamWriter
sw = File.AppendText(My.Application.Info.DirectoryPath & "\customerids.txt")


I would also look at replacing the various functions found under Microsoft.VisualBasic.Strings with the methods of the string class directly.


Without having access to the Pack, StripOut and ReturnContents methods I couldn't say if any optimisations could also be made there.


That should get you started. If there are still issues, feel free to post back here.

Thank you for your expert advice. I will use the StreamWriter as you suggested.


Yes, I used the Left(), Right() and Mid() functions in VB6, and I am not sure how to correctly replace the Microsoft.VisualBasic.Strings functions with the methods of the String class. Any tips for replacing these?


I have also considered removing "DoEvents" now. In VB6 the GUI might not respond without it, but in .NET the code seems to run a bit faster now.


As per your request, here are the three functions, returnContents, Pack and StripOut.



   'Read the entire text file into a single string
   Public Function returnContents(ByVal strFile As String) As String
       Dim filenum As Short
       filenum = FreeFile()
       FileOpen(filenum, strFile, OpenMode.Input)
       'Read from the handle that was just opened, then release it
       returnContents = InputString(filenum, LOF(filenum))
       FileClose(filenum)
   End Function

   'Remove extra white spaces in string
   Public Function Pack(ByRef str_Renamed As String) As String
       Dim words As Object
       Dim X As Integer
       Dim temp As String

       words = Split(str_Renamed, " ")
       For X = LBound(words) To UBound(words)
           If words(X) <> "" Then
               temp = temp & " " & words(X)
           End If
       Next X
       Pack = temp
   End Function

   Public Function FileExists(ByRef OrigFile As String) As Object
       Dim fs As Object
       fs = CreateObject("Scripting.FileSystemObject")
       FileExists = fs.FileExists(OrigFile)
   End Function

   Public Function StripOut(ByRef From As String, ByRef What As String) As String
       Dim i As Short

       StripOut = From
       For i = 1 To Len(What)
           StripOut = Replace(StripOut, Mid(What, i, 1), "")
       Next i
   End Function

The string handling could use String.Substring, e.g.

Dim s As String
Dim h As String = "Hello World"
s = Left(h, 3)   'first 3 characters
s = Right(h, 3)  'last 3 characters
s = Mid(h, 2, 3) '3 characters starting at position 2 (1-based)
'could be written as (Substring positions are 0-based)
s = h.Substring(0, 3)
s = h.Substring(h.Length - 3)
s = h.Substring(1, 3)

The FileExists method could be replaced with System.IO.File.Exists(), which saves instantiating the FileSystemObject every time.
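
As a rough sketch of how that might look together with the StreamWriter change: File.AppendText creates the file when it is missing, so in many places the separate "does it exist?" branch may not be needed at all. The module name, the AppendLine helper and the demo file name below are made up for illustration and are not part of the original code.

```vbnet
Imports System.IO

Module FileCheckSketch
    'Hypothetical helper: appends one line to a text file.
    'File.AppendText creates the file if it is missing, so the caller
    'does not need a separate existence check before writing.
    Public Sub AppendLine(ByVal fileName As String, ByVal text As String)
        Using sw As StreamWriter = File.AppendText(fileName)
            sw.WriteLine(text)
        End Using 'the writer is flushed and closed here
    End Sub

    Sub Main()
        'Demo file in the temp folder; the name is invented for this example.
        Dim demoFile As String = Path.Combine(Path.GetTempPath(), "customerids_demo.txt")
        If File.Exists(demoFile) Then File.Delete(demoFile)

        AppendLine(demoFile, "CUST001") 'creates the file
        AppendLine(demoFile, "CUST002") 'appends to it

        Console.WriteLine(File.Exists(demoFile))
        Console.WriteLine(File.ReadAllLines(demoFile).Length)
    End Sub
End Module
```

The Using block closes the file even if WriteLine throws, which the FileSystemObject version has to do by hand.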


The ReturnContents could be replaced with

Public Function returnContents(ByVal strFile As String) As String
    'Using closes the reader even if an exception is thrown (needs Imports System.IO)
    Using sr As New StreamReader(strFile)
        Return sr.ReadToEnd()
    End Using
End Function


In the Pack method you might want to declare words as a String array (Dim words() As String) rather than as Object.
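
A minimal sketch of that change, offered as an untested suggestion rather than a drop-in replacement. Split returns an array of strings, so the declaration needs the parentheses: Dim words() As String. The version below additionally assumes that the String.Split overload taking StringSplitOptions.RemoveEmptyEntries is acceptable here; it skips the empty entries that the original loop filtered out by hand. The module name is hypothetical, and unlike the original Pack this one returns no leading space.

```vbnet
Module PackSketch
    'Collapses runs of spaces into single spaces and drops
    'leading and trailing spaces.
    Public Function Pack(ByVal text As String) As String
        'Split returns an array, so words is declared as String(), not String.
        'RemoveEmptyEntries discards the empty strings produced by repeated spaces.
        Dim words() As String = text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(" ", words)
    End Function

    Sub Main()
        Console.WriteLine("[" & Pack("  RECORDER    ID   12345 ") & "]")
    End Sub
End Module
```

Because the result is already trimmed, a call such as Trim(Pack(...)) could then be shortened to Pack(...).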


I'm not sure what the StripOut method is supposed to be doing from just glancing at it - any chance you could give an explanation?

Thanks for all these wonderful suggestions. I have implemented the StreamWriter and am in the process of doing the rest now.


The "StripOut" function removes certain characters from a string. It calls Replace with an empty replacement string (""), so every occurrence of each listed character in the entire string is deleted.

Is there any faster method than Replace for the StripOut function? From what I saw, Replace takes one of the longest times to execute.



Also, I see that I join (concatenate) strings in a rather clumsy way.

Is there a better way to achieve this?


datewrite = "" & day_Renamed & "-" & month_Renamed & "-" & year_Renamed & ""
writetofile = "" & datewrite & "," & timewrite & "," & kwhwrite & ""
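
One possible tidy-up, as a sketch only: the leading and trailing "" add nothing and can simply be dropped, and String.Join makes the separator explicit. The module, function and parameter names below are invented for the example, and whether this is measurably faster than plain & for three short pieces has not been tested.

```vbnet
Module ConcatSketch
    'Builds "dd-mm-yyyy,time,kwh" from fields that are already plain strings.
    Public Function BuildLine(ByVal dayPart As String, ByVal monthPart As String, _
                              ByVal yearPart As String, ByVal timePart As String, _
                              ByVal kwhPart As String) As String
        'Join the date pieces with "-" and then the three columns with ",".
        Dim datewrite As String = String.Join("-", New String() {dayPart, monthPart, yearPart})
        Return String.Join(",", New String() {datewrite, timePart, kwhPart})
    End Function

    Sub Main()
        Console.WriteLine(BuildLine("01", "02", "2008", "0015", "1.25"))
    End Sub
End Module
```

For a handful of short pieces the simple form day_Renamed & "-" & month_Renamed & "-" & year_Renamed is also perfectly reasonable; a StringBuilder mainly pays off when many pieces are appended inside a loop.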


Thanks, everything works well now that I have implemented the changes you mentioned. Just one thing:


In the "Pack" function when I Dim "words" as String rather than Object, I get the following errors and it doesn't compile:


Error	1	Value of type '1-dimensional array of String' cannot be converted to 'String'.	C:\Users\jawad\Desktop\WindowsApplication1\WindowsApplication1\Form1.vb	418	17	WindowsApplication1
Error	2	Value of type 'String' cannot be converted to 'System.Array'.	C:\Users\jawad\Desktop\WindowsApplication1\WindowsApplication1\Form1.vb	419	24	WindowsApplication1
Error	3	Value of type 'String' cannot be converted to 'System.Array'.	C:\Users\jawad\Desktop\WindowsApplication1\WindowsApplication1\Form1.vb	419	41	WindowsApplication1

I haven't really looked at the performance of String.Replace under .NET, especially in the later versions. It might be worth trying it and seeing whether the performance really does suffer.


I suppose you could also try something like

       Dim res As String = String.Empty

       'Keep every character of From that does not appear in What
       For Each c As Char In From
           If What.IndexOf(c) < 0 Then
               res &= c
           End If
       Next

       Return res

and see how that compares. I certainly wouldn't take my code as an improvement without doing some real performance testing, though. An alternative might be to investigate using Regex.Replace instead. Again, I have no idea how this will affect the performance, but it is worth considering.
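
Another variant that may be worth timing, again only as an unbenchmarked sketch: appending to a String with &= allocates a new string for every kept character, whereas a StringBuilder grows a single buffer. The module name and parameter names are hypothetical; the behaviour is meant to match StripOut, removing every character listed in the second argument.

```vbnet
Imports System.Text

Module StripOutSketch
    'Removes from source every character that appears in what.
    Public Function StripOut(ByVal source As String, ByVal what As String) As String
        'Pre-size the buffer; the result can never be longer than the input.
        Dim sb As New StringBuilder(source.Length)
        For Each c As Char In source
            If what.IndexOf(c) < 0 Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        'Strip the double quotes from a sample line.
        Console.WriteLine(StripOut("""12345"" ""010208""", """"))
    End Sub
End Module
```

When only a single character is being removed, source.Replace("""", "") does the same job in one call, so it makes sense to measure both on a real input file before choosing.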

