awyeah Posted September 1, 2009

Dear all,

I have written some code in VB.NET to read data from text files. The data is read from a list of files, iterating over the list with a For loop, and written to new text files. Each file is read one by one and written in the same way.

My speed of execution is very slow. I am using a quad-core processor and see only 20-30% CPU utilization when my code runs. Is there any way I can increase the speed of reading and writing? Reading only 125 files takes 10 minutes or more, which is very slow indeed, because in the end I need to read and write thousands of files. Each file is approximately 30-50 KB.

Here is my code.

    Public Sub ReadRMRDataFileIntoTextFiles()
        'Read in the customerids once up front
        Dim customerids As Collections.Generic.List(Of String)
        Dim idFileName As String = customer_id_file
        If IO.File.Exists(idFileName) Then
            customerids = IO.File.ReadAllLines(idFileName).ToList()
        Else
            customerids = New Collections.Generic.List(Of String)()
        End If

        'now process files
        current_rmrfile = 0
        For Each curFile As String In rmr_files
            Dim customer_name As String
            customer_name = ""

            'Open rmr data file
            Dim RmrData() As String = IO.File.ReadAllLines(curFile)
            For Each curLine As String In RmrData
                'RemoveEmptyEntries option takes care of Pack() and Trim()
                'If line has proper data inside to be read
                If InStr(curLine, "METER") > 0 Then
                    If InStr(curLine, ":") > 0 Then
                        Dim newcurLine() As String = curLine.Replace(" ", "").Split(":")
                        customer_name = Trim(newcurLine(1))
                        'If customer already added in list - do not add
                        'Else if customer not added - add into list
                        If Not customerids.Contains(customer_name) Then
                            customerids.Add(customer_name)
                            IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                        End If
                    ElseIf InStr(curLine, "=") > 0 Then
                        Dim newcurLine() As String = curLine.Replace(" ", "").Split("=")
                        customer_name = Trim(newcurLine(1))
                        'If customer already added in list - do not add
                        'Else if customer not added - add into list
                        If Not customerids.Contains(customer_name) Then
                            customerids.Add(customer_name)
                            IO.File.AppendAllText(idFileName, customer_name & vbCrLf)
                        End If
                    End If
                End If

                'Split and Join string to apply "Trim" and "Pack"
                words = curLine.Trim(" ").Split(vbTab)
                'Count occurrences of string
                countchar1 = CountOccurrences(curLine, "/", False)
                countchar2 = CountOccurrences(curLine, ":", False)

                'If data has started, then read it
                If countchar1 = 2 And countchar2 = 1 And words.Length >= 1 Then
                    'Get data from line
                    Dim trimwords As String = String.Join(" ", words)
                    Dim datewrite As String = trimwords.Substring(0, 10)
                    Dim timewrite As String = trimwords.Substring(11, 5)
                    Dim kwhwrite As String = words(1)

                    'Splitting date
                    Dim day_write As String = datewrite.Substring(3, 2)
                    Dim month_write As String = datewrite.Substring(0, 2)
                    Dim year_write As String = datewrite.Substring(6, 4)
                    datewrite = String.Format("{0}-{1}-{2}", day_write, month_write, year_write)

                    'Time
                    If timewrite = "24:00" Then
                        timewrite = "00:00:00"
                    Else
                        timewrite = String.Format("{0}:{1}", timewrite, "00")
                    End If

                    Dim writetofile As String = String.Format("{0},{1},{2}", datewrite, timewrite, kwhwrite & vbCrLf)
                    IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)
                Else
                    'If data has not yet started, skip the initial lines
                    Continue For
                End If
            Next curLine

            current_rmrfile = current_rmrfile + 1
            UpdateProgressBar()
        Next curFile

        System.Threading.Thread.Sleep(3000)
        Me.Close()
    End Sub

    Function CountOccurrences(ByVal p_strStringToCheck, ByVal p_strSubString, ByVal p_boolCaseSensitive)
        Dim arrstrTemp
        Dim strBase, strToFind
        If p_boolCaseSensitive Then
            strBase = p_strStringToCheck
            strToFind = p_strSubString
        Else
            strBase = LCase(p_strStringToCheck)
            strToFind = LCase(p_strSubString)
        End If
        arrstrTemp = Split(strBase, strToFind)
        CountOccurrences = UBound(arrstrTemp)
    End Function

One of the sample data files to read.

    Service Point ID=060430_00001587
    AKAUN=601011
    METER=28509864
    DATE/TIME=01/05/2009 00:00 TO 30/06/2009 00:00
    A= KWH IMPORT
    B= KWH EXPORT
    C= KVARH IMPORT
    D= KVARH IMPORT
    DATE TIME A B C D
    05/01/2009 00:30 74 50 0 0
    05/01/2009 01:00 77 61 0 0
    05/01/2009 01:30 76 62 0 0
    05/01/2009 02:00 77 60 0 0
    05/01/2009 02:30 76 61 0 0
    05/01/2009 03:00 76 61 0 0
    05/01/2009 03:30 77 62 0 0
    05/01/2009 04:00 76 61 0 0
    05/01/2009 04:30 76 51 0 0
    05/01/2009 05:00 73 49 0 0
    05/01/2009 05:30 75 50 0 0
    05/01/2009 06:00 74 50 0 0
    05/01/2009 06:30 74 49 0 0
    05/01/2009 07:00 75 50 0 0
    05/01/2009 07:30 73 48 0 0
    05/01/2009 08:00 74 50 0 0
    05/01/2009 08:30 76 62 0 0
    05/01/2009 09:00 72 59 0 0
    05/01/2009 09:30 71 59 0 0

All help is appreciated.
PlausiblyDamp (Administrators) Posted September 1, 2009

I should have more time to have a proper look at this later. At a glance, though, you seem to be leaving a lot of method parameters and return types as Object; e.g. CountOccurrences should really return an Integer, and its parameters should be declared as String, String and Boolean. This can avoid a lot of runtime data type checks and coercions.

As a quick idea, try adding Option Explicit to the top of the source file and fix any errors it generates due to unspecified data types. It may not make a big difference, but if this code is being executed repeatedly in a loop it could be a noticeable improvement.

Intellectuals solve problems; geniuses prevent them. -- Albert Einstein
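A side note on the option named above: as far as I know, Option Explicit only forces variables to be declared, while Option Strict is the setting that rejects missing `As` clauses, late binding and implicit narrowing conversions, so it is the one that would flag untyped parameters. A minimal sketch of a fully typed helper under Option Strict On follows; the module name `StrictSketch`, the helper `CountChar` and the sample string are made up for illustration and are not part of the original code.

```vbnet
Option Strict On
Option Explicit On

Imports System

Module StrictSketch
    ' Hypothetical typed helper: counts how often one character occurs in a string.
    ' Every parameter and the return value carry an explicit type, so nothing is
    ' resolved through Object at run time.
    Function CountChar(ByVal text As String, ByVal target As Char) As Integer
        Dim count As Integer = 0
        For Each c As Char In text
            If c = target Then count += 1
        Next
        Return count
    End Function

    Sub Main()
        Dim line As String = "05/01/2009 00:30"
        ' Counters are Integers, so comparing them with 2 or 1 needs no conversion.
        Dim slashes As Integer = CountChar(line, "/"c)
        Dim colons As Integer = CountChar(line, ":"c)
        Console.WriteLine("{0} slashes, {1} colons", slashes, colons)
    End Sub
End Module
```

With Option Strict On, an untyped declaration such as `Dim arrstrTemp` or a parameter without an `As` clause should fail to compile, which makes the remaining Object-typed spots easy to find.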
awyeah (Author) Posted September 1, 2009

I have included some explanations and reduced the code to show what I am doing. Then maybe you can get an idea of how I can fire multiple threads using this type of mechanism.

        'Go through each in file list
        For Each curFile As String In rmr_files
            'Open file and read all data
            Dim RmrData() As String = IO.File.ReadAllLines(curFile)

            'For each line in file, read data and process
            For Each curLine As String In RmrData
                'Do some data processing here
                .........................................
                .........................................

                'Write processed data to a new text file - USE APPEND
                IO.File.AppendAllText(app_dir & "\" & customer_name & ".txt", writetofile)

                'Move to the next line in the file till some criteria is met
            Next curLine

            'Move to the next file in the list
        Next curFile
    End Sub

Hope this helps. I have already included Option Explicit for the data types; I just didn't paste it since the code was getting too long. Here is the rest of it.
    Option Strict Off
    Option Explicit On

    Public Class Form1

        Dim app_dir As String               'Location to application directory
        Dim tempfiles_dir As String         'Location to \TempFiles directory
        Dim customer_id_file As String      'Location to file \customerids.txt
        Dim rmr_file_list As String         'Location to rmrfiles.txt file
        Dim rmr_files() As String           'Array containing directories and rmr data file names
        Dim count_rmrfiles As Long          'Number of rmr data files
        Dim current_rmrfile As Integer      'Current file being read
        Dim customerids As Collections.Generic.List(Of String)
        Dim idFileName As String = customer_id_file
        Dim words() As String
        Dim countchar1 As String
        Dim countchar2 As String

        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            Dim th As New System.Threading.Thread(AddressOf ReadRMRDataFileIntoTextFiles)

            'Set directories and files
            app_dir = Application.StartupPath
            tempfiles_dir = app_dir & "\TempFiles\"
            rmr_file_list = app_dir & "\TempFiles\loadprofilefiles.txt"
            customer_id_file = app_dir & "\customerids.txt"

            'Read file names from a text file into an array.. this only takes a sec or so..
            Call ReadLoadProfileFilesIntoArray()

            ProgressBar1.Minimum = 0
            ProgressBar1.Maximum = count_rmrfiles
            ProgressBar1.Value = 0

            'Start reading rmr data files
            th.Start()
        End Sub

        Private Sub UpdateProgressBar()
            If Me.InvokeRequired Then
                Me.Invoke(New MethodInvoker(AddressOf UpdateProgressBar))
            Else
                ProgressBar1.Value = current_rmrfile
            End If
        End Sub

As for the CountOccurrences function, I found it somewhere on the Internet. I also found a RegExp version doing the same thing; however, I found that regular expression matching is slower.
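One thing that stands out in the loop above is that IO.File.AppendAllText opens and closes the output file once for every data line, which is a likely reason the run is slow while CPU utilization stays low. A minimal sketch of buffering the rows in memory and appending once per customer file follows. The module name `BufferedWriteSketch`, the helpers `AddRow` and `FlushBuffers`, and the `rmr_sketch` folder are hypothetical names introduced for illustration; the sample rows follow the output format of the posted code.

```vbnet
Option Strict On
Option Explicit On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module BufferedWriteSketch
    ' Hypothetical helper: adds one output row to the in-memory buffer of a customer.
    ' AppendLine ends the row with Environment.NewLine (CRLF on Windows).
    Sub AddRow(ByVal buffers As Dictionary(Of String, StringBuilder), ByVal customerName As String, ByVal row As String)
        Dim sb As StringBuilder = Nothing
        If Not buffers.TryGetValue(customerName, sb) Then
            sb = New StringBuilder()
            buffers.Add(customerName, sb)
        End If
        sb.AppendLine(row)
    End Sub

    ' Hypothetical helper: writes every buffer with a single append per customer file,
    ' then empties the dictionary so it can be reused for the next input file.
    Sub FlushBuffers(ByVal buffers As Dictionary(Of String, StringBuilder), ByVal outputDir As String)
        For Each pair As KeyValuePair(Of String, StringBuilder) In buffers
            File.AppendAllText(Path.Combine(outputDir, pair.Key & ".txt"), pair.Value.ToString())
        Next
        buffers.Clear()
    End Sub

    Sub Main()
        ' Assumption: output goes to a scratch folder under the temp directory.
        Dim outputDir As String = Path.Combine(Path.GetTempPath(), "rmr_sketch")
        Directory.CreateDirectory(outputDir)

        Dim buffers As New Dictionary(Of String, StringBuilder)()
        AddRow(buffers, "28509864", "01-05-2009,00:30:00,74")
        AddRow(buffers, "28509864", "01-05-2009,01:00:00,77")
        FlushBuffers(buffers, outputDir)
    End Sub
End Module
```

In the posted loop, `AddRow` would take the place of the per-line AppendAllText call and `FlushBuffers` would run after `Next curLine`, so each input file costs one write per customer instead of one per row. Whether that alone is enough before adding threads is something only a timing run on the real files can show.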
PlausiblyDamp (Administrators) Posted September 1, 2009

Will have another look later, but running a profiler shows that changing your CountOccurrences to

    Function CountOccurrences(ByVal p_strStringToCheck As String, ByVal p_strSubString As String, ByVal p_boolCaseSensitive As Boolean) As Integer
        Dim arrstrTemp() As String
        Dim strBase, strToFind As String
        If p_boolCaseSensitive Then
            strBase = p_strStringToCheck
            strToFind = p_strSubString
        Else
            strBase = p_strStringToCheck.ToLower
            strToFind = p_strSubString.ToLower
        End If
        arrstrTemp = strBase.Split(strToFind)
        CountOccurrences = arrstrTemp.GetUpperBound(0)
    End Function

results in big improvements in that one routine. If this is called many times in a loop, that could be a big win.
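Two hedged observations on the typed version above. As far as I can tell, String.Split in this framework version splits on characters rather than on a whole substring, so the routine is only reliable for single-character search strings such as "/" and ":". It also builds a lowercased copy and an array on every call. A sketch of an alternative that walks the string with IndexOf and allocates nothing is shown below; the module name `CountSketch` and the function name `CountOccurrencesIndexOf` are hypothetical, and no timing is claimed for it.

```vbnet
Option Strict On
Option Explicit On

Imports System

Module CountSketch
    ' Hypothetical alternative: counts non-overlapping occurrences of subString in text
    ' by repeated IndexOf calls, without creating lowercase copies or split arrays.
    Function CountOccurrencesIndexOf(ByVal text As String, ByVal subString As String, ByVal caseSensitive As Boolean) As Integer
        If String.IsNullOrEmpty(text) OrElse String.IsNullOrEmpty(subString) Then Return 0

        ' Ordinal comparisons avoid culture lookups; the flag selects case handling.
        Dim comparison As StringComparison = StringComparison.OrdinalIgnoreCase
        If caseSensitive Then comparison = StringComparison.Ordinal

        Dim count As Integer = 0
        Dim pos As Integer = text.IndexOf(subString, 0, comparison)
        While pos >= 0
            count += 1
            ' Continue searching just past the match that was found.
            Dim nextStart As Integer = pos + subString.Length
            If nextStart >= text.Length Then Exit While
            pos = text.IndexOf(subString, nextStart, comparison)
        End While
        Return count
    End Function

    Sub Main()
        Console.WriteLine(CountOccurrencesIndexOf("05/01/2009 00:30", "/", False))
        Console.WriteLine(CountOccurrencesIndexOf("05/01/2009 00:30", ":", False))
    End Sub
End Module
```

For the data-line test in the original loop this yields the same two counts (two slashes, one colon), so it could be swapped in without changing the condition; profiling both versions on real files would be the way to confirm any gain.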
awyeah (Author) Posted September 1, 2009

Thank you for the valuable help. Will try it out and let you know. :)