awyeah, posted August 10, 2009

Dear all,

I am trying to complete a project. My task is to read large TXT files (300 files, 2.2 GB in total), read the customers inside, perform some data calculations, and write each customer's data to a separate TXT file.

I wrote the code in VB6. It runs fine, but it takes 8 days to run on a quad-core processor, using 20-30% of the CPU. I have now upgraded the code and am running it on VB.NET 2008, the latest version. The speed remains the same, and the CPU usage is also about the same.

Is there any way I can make this process faster? I have a quad-core PC with 8 GB of RAM. Is it possible to make VB use all of the CPU, say 100%, so that it executes at least 2 to 3 times faster?

Here is my code in VB.NET 2008:

    Option Strict Off
    Option Explicit On

    Imports VB = Microsoft.VisualBasic

    Public Class Form1
        Inherits System.Windows.Forms.Form

        Dim rmr_files() As String 'Array containing directories and rmr data file names

        Public Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            Call ReadRMRFileNamesIntoArray
            'Read RMR data files
            Call ReadRMRDataFileIntoTextFiles
        End Sub

        Public Sub ReadRMRDataFileIntoTextFiles()
            Dim str_Renamed As String
            Dim str2 As String
            Dim sJoin As String
            Dim row As Integer
            Dim customer_name As String
            Dim line As Integer
            Dim line2 As Integer
            Dim count As Integer
            Dim count2 As Integer
            Dim countdata1 As Integer
            Dim countdata2 As Integer
            Dim writetofile As String
            Dim writecustid As String
            Dim custrecidfile As Boolean
            Dim custpresent As Boolean
            Dim datewrite As String
            Dim timewrite As String
            Dim kwhwrite As String
            Dim tempdate As String
            Dim temptime As String
            Dim tempkwh As String
            Dim custfileexists As Boolean
            Dim origfilepath As String
            Dim i As Integer
            Dim sArray As Object
            Dim custArray As Object
            Dim myObjFs1 As Scripting.FileSystemObject
            Dim objWrite1 As Object
            Dim myObjFs2 As Scripting.FileSystemObject
            Dim objWrite2 As Object
            Dim col As Integer
            Dim tempArray As Object
            Dim myObjFs3 As Scripting.FileSystemObject
            Dim objWrite3 As Object
            Dim year_Renamed As String
            Dim month_Renamed As String
            Dim day_Renamed As String
            Dim splitrmrdata1() As String
            Dim splitrmrdata2() As String
            Dim customerids() As String

            For count = LBound(rmr_files) To UBound(rmr_files)
                'Open rmr data file and begin to read
                splitrmrdata1 = Split(returnContents(rmr_files(count)), vbNewLine)
                row = 1
                line = 1

                For countdata1 = LBound(splitrmrdata1) To UBound(splitrmrdata1)
                    str_Renamed = Trim(Pack(StripOut(splitrmrdata1(countdata1), """")))

                    'Split string into separate words and characters
                    sArray = Split(str_Renamed, " ")
                    For i = LBound(sArray) To UBound(sArray)
                        sArray(i) = """" & sArray(i) & """"
                    Next

                    'Join back array to convert into csv format
                    sJoin = Join(sArray, ",")
                    If UCase(Mid(sJoin, 2, 8)) = "RECORDER" Then
                        sJoin = Replace(sJoin, "RECORDER"",""ID", "RECORDER ID")
                    End If

                    'New customer found
                    If InStr(sJoin, "RECORDER") <> 0 Then
                        row = 1
                    End If

                    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
                    'Open new file
                    'Get the name of the customer from second line after the "RECORDER" line
                    If row = 1 Then
                        splitrmrdata2 = Split(returnContents(rmr_files(count)), vbNewLine)
                        line2 = 1
                        For countdata2 = LBound(splitrmrdata2) To UBound(splitrmrdata2)
                            str2 = Trim(Pack(StripOut(splitrmrdata2(countdata2), """")))
                            If line2 = line + 1 Then
                                'Split string into separate words and characters
                                custArray = Split(str2, " ")
                                'Get the name of customer (Recorder ID)
                                customer_name = custArray(0)
                                Exit For
                            End If
                            line2 = line2 + 1
                            System.Windows.Forms.Application.DoEvents()
                        Next
                    End If
                    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

                    'Check if customer recorder id file exists
                    custrecidfile = FileExists(my.Application.Info.DirectoryPath & "\customerids.txt")

                    'If file does not exist - create the file and add the first customer id
                    If custrecidfile = False Then
                        myObjFs1 = New Scripting.FileSystemObject
                        'Create an empty text file
                        myObjFs1 = CreateObject("Scripting.FileSystemObject")
                        objWrite1 = myObjFs1.CreateTextFile(my.Application.Info.DirectoryPath & "\customerids.txt")
                        'Write to the text file and close it
                        objWrite1.WriteLine (customer_name)
                        objWrite1.Close()
                    End If

                    'Get contents of customer recorder id file list
                    customerids = Split(returnContents(my.Application.Info.DirectoryPath & "\customerids.txt"), vbNewLine)

                    'Check if customer present in customer id list or not
                    custpresent = False
                    For count2 = LBound(customerids) To UBound(customerids)
                        If StrComp(customerids(count2), Trim(Pack(customer_name)), 1) = 0 Then
                            custpresent = True
                            Exit For
                        End If
                        System.Windows.Forms.Application.DoEvents()
                    Next

                    'If customer already added in list - do not add
                    'Else if customer not added - add into list
                    If custpresent = False Then
                        myObjFs2 = New Scripting.FileSystemObject
                        'Create an empty text file
                        myObjFs2 = CreateObject("Scripting.FileSystemObject")
                        objWrite2 = myObjFs2.OpenTextFile(my.Application.Info.DirectoryPath & "\customerids.txt", Scripting.IOMode.ForAppending, True)
                        'Write to the text file and close it
                        objWrite2.WriteLine (customer_name)
                        objWrite2.Close()
                    End If

                    'If line is not empty, only then proceed
                    If sJoin <> "" Then
                        'If row does not contain names like RECORDER ID, DATE, HOUR etc then continue
                        If row <> 1 Then
                            'Write data into text file rows
                            tempArray = Split(sJoin, ",")
                            tempdate = tempArray(1)
                            temptime = tempArray(2)
                            tempkwh = tempArray(6)
                            tempdate = Trim(Pack(Replace(tempdate, """", "")))
                            temptime = Trim(Pack(Replace(temptime, """", "")))
                            tempkwh = Trim(Pack(Replace(tempkwh, """", "")))

                            'Splitting date into proper format
                            'Splitting date into: dd/mm/yy
                            day_Renamed = Microsoft.VisualBasic.Strings.Left(tempdate, 2)
                            month_Renamed = Microsoft.VisualBasic.Strings.Mid(tempdate, 3, 2)
                            year_Renamed = Microsoft.VisualBasic.Strings.Right(tempdate, 2)

                            'Adjust dd/mm/yy to dd-mm-yyyy
                            If CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 8 Or CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 9 Then
                                year_Renamed = "19" & year_Renamed & ""
                            ElseIf CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 0 Or CDbl(Microsoft.VisualBasic.Strings.Left(year_Renamed, 1)) = 1 Then
                                year_Renamed = "20" & year_Renamed & ""
                            End If

                            'Set date format: dd-mm-yyyy
                            datewrite = "" & day_Renamed & "-" & month_Renamed & "-" & year_Renamed & ""
                            timewrite = temptime
                            kwhwrite = tempkwh

                            'If file does not exist create it
                            'If file exists - open it, write to it and close it.
                            origfilepath = my.Application.Info.DirectoryPath & "\" & customer_name & ".txt"
                            myObjFs3 = New Scripting.FileSystemObject
                            custfileexists = FileExists(origfilepath)

                            'If temp file does not exist, create empty text file
                            If custfileexists = False Then
                                myObjFs3 = CreateObject("Scripting.FileSystemObject")
                                objWrite3 = myObjFs3.CreateTextFile(origfilepath)
                            Else
                                myObjFs3 = CreateObject("Scripting.FileSystemObject")
                                objWrite3 = myObjFs3.OpenTextFile(origfilepath, Scripting.IOMode.ForAppending, True)
                            End If

                            'Write to text file and close it
                            writetofile = "" & datewrite & "," & timewrite & "," & kwhwrite & ""
                            objWrite3.WriteLine (writetofile)
                            objWrite3.Close()
                        End If
                    End If

                    'Increment the row
                    row = row + 1
                    line = line + 1
                    System.Windows.Forms.Application.DoEvents()
                Next
                System.Windows.Forms.Application.DoEvents()
            Next
        End Sub
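
Reading the posted loop, three things stand out as likely costs: the source file is read again in full for every "RECORDER" header, customerids.txt is read again for every input line, and the customer's output file is opened and closed for every data row. The following is only a minimal sketch of one way to avoid the last two, assuming the number of distinct customers is small enough to keep one open file per customer. The class name CustomerWriterCache and its members are hypothetical and do not appear in the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

'Hypothetical helper (not from the original post): keeps one StreamWriter
'open per customer instead of reopening the output file for every data row.
Public Class CustomerWriterCache
    Implements IDisposable

    Private ReadOnly outputFolder As String
    'Case-insensitive keys, mirroring the text-mode StrComp in the original loop.
    Private ReadOnly writers As New Dictionary(Of String, StreamWriter)(StringComparer.OrdinalIgnoreCase)

    Public Sub New(ByVal folder As String)
        outputFolder = folder
    End Sub

    'Appends one line to <folder>\<customer>.txt, opening the file on first use.
    Public Sub WriteLine(ByVal customer As String, ByVal text As String)
        Dim sw As StreamWriter = Nothing
        If Not writers.TryGetValue(customer, sw) Then
            sw = File.AppendText(Path.Combine(outputFolder, customer & ".txt"))
            writers.Add(customer, sw)
        End If
        sw.WriteLine(text)
    End Sub

    'Number of distinct customers seen so far; the dictionary lookup stands in
    'for the repeated scan of customerids.txt.
    Public ReadOnly Property CustomerCount() As Integer
        Get
            Return writers.Count
        End Get
    End Property

    'Flushes and closes every open file.
    Public Sub Dispose() Implements IDisposable.Dispose
        For Each sw As StreamWriter In writers.Values
            sw.Dispose()
        Next
        writers.Clear()
    End Sub
End Class
```

In the main loop, a single instance created in a Using block before the outer For could replace the FileExists / CreateTextFile / OpenTextFile / Close sequence with one call such as cache.WriteLine(customer_name, writetofile). Whether this gives the hoped-for speed-up would need to be measured on the real data.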

PlausiblyDamp (Administrators), posted August 10, 2009

You might want to start by replacing the use of FileSystemObject with the built-in .NET classes and methods found under System.IO, e.g.

    objWrite1 = myObjFs1.CreateTextFile(My.Application.Info.DirectoryPath & "\customerids.txt")

could be replaced with

    Dim sw As StreamWriter
    sw = File.CreateText(My.Application.Info.DirectoryPath & "\customerids.txt")

and

    myObjFs2 = New Scripting.FileSystemObject
    'Create an empty text file
    myObjFs2 = CreateObject("Scripting.FileSystemObject")
    objWrite2 = myObjFs2.OpenTextFile(My.Application.Info.DirectoryPath & "\customerids.txt", Scripting.IOMode.ForAppending, True)

could be replaced by

    Dim sw As StreamWriter
    sw = File.AppendText(My.Application.Info.DirectoryPath & "\customerids.txt")

I would also look at replacing the various functions found under Microsoft.VisualBasic.Strings with the methods of the String class directly. Without access to the Pack, StripOut and ReturnContents methods, I can't say whether any optimisations could also be made there.

That should get you started; if there are still issues, feel free to post back here.

Intellectuals solve problems; geniuses prevent them. -- Albert Einstein
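
As a small illustration of the StreamWriter suggestion, here is a sketch that wraps the append in a Using block so the file is closed even if the write throws. File.AppendText creates the file when it does not exist, so the separate "does the file exist" branch should not be needed. The module and procedure names are hypothetical.

```vbnet
Imports System.IO

Public Module CustomerIdFile
    'Appends one customer ID to the list file, creating the file if needed.
    'The Using block disposes (and therefore closes) the writer on exit.
    Public Sub AppendCustomerId(ByVal listPath As String, ByVal customerId As String)
        Using sw As StreamWriter = File.AppendText(listPath)
            sw.WriteLine(customerId)
        End Using
    End Sub
End Module
```

A call such as AppendCustomerId(My.Application.Info.DirectoryPath & "\customerids.txt", customer_name) would then stand in for both FileSystemObject branches in the original loop.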

awyeah (Author), posted August 10, 2009

Thank you for your expert advice. I will use the StreamWriter as you have mentioned.

Yes, I use the Left(), Right() and Mid() functions from VB6, and I am not sure how to correctly replace those Microsoft.VisualBasic.Strings functions with the methods of the String class. Any tips for replacing these?

I am also considering removing the DoEvents calls now. In VB6 the GUI might not respond without them, but in .NET the code seems to run a bit faster.

As per your request, here are the functions returnContents, Pack, FileExists and StripOut.

    'Read all data in the text file into array
    Public Function returnContents(ByVal strFile As String) As String
        Dim filenum As Short
        filenum = FreeFile()
        FileOpen(filenum, strFile, OpenMode.Input)
        returnContents = InputString(1, LOF(filenum))
        FileClose(filenum)
    End Function

    'Remove extra white spaces in string
    Public Function Pack(ByRef str_Renamed As String) As String
        Dim words As Object
        Dim X As Integer
        Dim temp As String
        words = Split(str_Renamed, " ")
        For X = LBound(words) To UBound(words)
            If words(X) <> "" Then
                temp = temp & " " & words(X)
            End If
        Next X
        Pack = temp
    End Function

    Public Function FileExists(ByRef OrigFile As String) As Object
        Dim fs As Object
        fs = CreateObject("Scripting.FileSystemObject")
        FileExists = fs.FileExists(OrigFile)
    End Function

    Public Function StripOut(ByRef From As String, ByRef What As String) As String
        Dim i As Short
        StripOut = From
        For i = 1 To Len(What)
            StripOut = Replace(StripOut, Mid(What, i, 1), "")
        Next i
    End Function

PlausiblyDamp (Administrators), posted August 10, 2009

The string handling stuff could use String.Substring, e.g.

    Dim s As String
    Dim h As String = "Hello World"
    s = Left(h, 3)
    s = Right(h, 3)
    s = Mid(h, 2, 3)

could be written as

    s = h.Substring(0, 3)          'Left(h, 3)
    s = h.Substring(h.Length - 3)  'Right(h, 3)
    s = h.Substring(1, 3)          'Mid(h, 2, 3); Substring is zero-based

The FileExists method could be replaced with System.IO.File.Exists(), which will save instantiating the FileSystemObject every time.

The ReturnContents function could be replaced with

    'Requires Imports System.IO at the top of the file
    Public Function returnContents(ByVal strFile As String) As String
        Dim sr As New StreamReader(strFile)
        Dim s As String = sr.ReadToEnd()
        sr.Close()
        Return s
    End Function

In the Pack method you might want to declare words as String rather than Object.

I'm not sure what the StripOut method is supposed to be doing from just glancing at it. Any chance you could give an explanation?
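
One detail to watch when moving from Left, Right and Mid to Substring: Substring is zero-based and throws an ArgumentOutOfRangeException when the requested range runs past the end of the string, whereas the VB functions simply return what is available. The helpers below are a sketch of Substring-based equivalents that keep the forgiving behaviour. The names LeftOf, RightOf and MidOf are made up for illustration, and the sketch assumes the input string is not Nothing, the length is not negative, and the start position is at least 1.

```vbnet
Imports System

Public Module SubstringHelpers
    'Equivalent of Left(s, n): the first n characters, or all of s if it is shorter.
    Public Function LeftOf(ByVal s As String, ByVal n As Integer) As String
        If n >= s.Length Then Return s
        Return s.Substring(0, n)
    End Function

    'Equivalent of Right(s, n): the last n characters, or all of s if it is shorter.
    Public Function RightOf(ByVal s As String, ByVal n As Integer) As String
        If n >= s.Length Then Return s
        Return s.Substring(s.Length - n)
    End Function

    'Equivalent of Mid(s, start, n), where start is 1-based as in VB6.
    Public Function MidOf(ByVal s As String, ByVal start As Integer, ByVal n As Integer) As String
        Dim index As Integer = start - 1
        If index >= s.Length Then Return String.Empty
        Return s.Substring(index, Math.Min(n, s.Length - index))
    End Function
End Module
```

With h = "Hello World", LeftOf(h, 3) gives "Hel", RightOf(h, 3) gives "rld", and MidOf(h, 2, 3) gives "ell". If the date field is known to be exactly six characters long, the plain Substring calls are enough and the length checks can be dropped.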

awyeah (Author), posted August 10, 2009

Thanks for all these wonderful suggestions. I have implemented the StreamWriter and am in the process of doing the rest now.

The StripOut function removes certain characters from a string. It works like Replace with an empty replacement string (""), so every occurrence of each listed character is deleted from the string.

Is there a faster method than Replace for the StripOut function? From what I saw, Replace is one of the slowest calls in the program.

I also see that I join (concatenate) strings in a rather clumsy way. Is there a better way to do this?

    datewrite = "" & day_Renamed & "-" & month_Renamed & "-" & year_Renamed & ""
    writetofile = "" & datewrite & "," & timewrite & "," & kwhwrite & ""
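
On the concatenation question, here is a sketch of how the two lines could be written with String.Concat and String.Join. As far as I know the VB compiler already turns a chain of & operators on strings into String.Concat calls, so this is more a readability change than a guaranteed speed-up; the leading and trailing "" in the original add nothing and can simply be dropped. The module, function and parameter names are hypothetical.

```vbnet
Imports System

Public Module OutputLine
    'Builds "dd-mm-yyyy,time,kwh" from fields that have already been split out.
    Public Function BuildLine(ByVal dd As String, ByVal mm As String, ByVal yyyy As String, ByVal timeText As String, ByVal kwhText As String) As String
        'Date part in dd-mm-yyyy form.
        Dim datePart As String = String.Concat(dd, "-", mm, "-", yyyy)
        'Comma-separated output row.
        Return String.Join(",", New String() {datePart, timeText, kwhText})
    End Function
End Module
```

For example, BuildLine("10", "08", "2009", "0130", "12.5") returns "10-08-2009,0130,12.5".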

awyeah (Author), posted August 10, 2009

Thanks, everything works well now that I have implemented the changes you mentioned.

Just one thing: in the Pack function, when I Dim words As String rather than Object, I get the following errors and it doesn't compile:

    Error 1  Value of type '1-dimensional array of String' cannot be converted to 'String'.
             C:\Users\jawad\Desktop\WindowsApplication1\WindowsApplication1\Form1.vb  418  17  WindowsApplication1
    Error 2  Value of type 'String' cannot be converted to 'System.Array'.
             C:\Users\jawad\Desktop\WindowsApplication1\WindowsApplication1\Form1.vb  419  24  WindowsApplication1
    Error 3  Value of type 'String' cannot be converted to 'System.Array'.
             C:\Users\jawad\Desktop\WindowsApplication1\WindowsApplication1\Form1.vb  419  41  WindowsApplication1

PlausiblyDamp (Administrators), posted August 10, 2009

I have not really looked at the performance of String.Replace under .NET, especially in the later versions. It might be worth trying it and seeing whether the performance really does suffer. I suppose you could also try something like

    Dim res As String = String.Empty
    For Each c As Char In From
        If c <> What Then
            res &= c
        End If
    Next
    Return res

and see how that compares. I certainly wouldn't take my code as an improvement without doing some real performance testing, though.

An alternative might be to investigate using Regex.Replace instead. Again, I have no idea how this will affect the performance, but it is worth considering.
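
A variation on the loop above, offered as an untested-for-speed sketch: appending to a String with &= creates a new string on every iteration, so collecting the characters in a StringBuilder is usually the cheaper way to build the result. This version also handles more than one unwanted character, as the original StripOut does. The parameter names source and unwanted are mine, and the sketch assumes neither argument is Nothing.

```vbnet
Imports System
Imports System.Text

Public Module StripOutSketch
    'Removes every character listed in unwanted from source in a single pass.
    Public Function StripOut(ByVal source As String, ByVal unwanted As String) As String
        Dim sb As New StringBuilder(source.Length)
        For Each c As Char In source
            'IndexOf returns -1 when c is not one of the unwanted characters.
            If unwanted.IndexOf(c) < 0 Then
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

For the single-character case in the posted code (stripping the double quote), a plain source.Replace("""", "") may well be just as fast, so it is worth timing both against a real input file.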

PlausiblyDamp (Administrators), posted August 10, 2009

That was my fault. I meant declare it as a string array, i.e.

    words() As String
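
With words declared as a string array, Pack can also be shortened. The sketch below uses String.Split with StringSplitOptions.RemoveEmptyEntries to drop the empty entries that runs of spaces produce, then String.Join to put the words back together. One behavioural difference from the posted Pack: this version does not add a leading space, so the Trim that the callers wrap around it becomes unnecessary (though harmless). The parameter name text is mine, and the sketch assumes the argument is not Nothing.

```vbnet
Imports System

Public Module PackSketch
    'Collapses runs of spaces into single spaces and drops leading/trailing spaces.
    Public Function Pack(ByVal text As String) As String
        Dim words() As String = text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Return String.Join(" ", words)
    End Function
End Module
```

For example, Pack("  a   b c ") returns "a b c".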