kcwallace Posted April 18, 2006

I have a flat file that can contain up to 20,000 lines. Each line represents a record that needs to be parsed and added to a database. I have code that works, but I was wondering if there is a faster way to get the data into the database. Currently I am looping through each line and executing an INSERT command for each row. I have also tried updating the database with a DataSet and DataAdapter. Both work. Is there a better way that I am missing?

Go Beavs!!!
Nerseus (*Experts*) Posted April 18, 2006

Going faster can mean lots of things. Let us know what you're currently doing so we can offer more help. Here are just a few things that may come up:

1. DataSets with DataAdapters will be slower than straight Command objects (using ExecuteNonQuery, for example).
2. Parsing the file with Substring will be faster than using regular expressions.
3. If you're using SQL Server 2000, maybe you can use bcp or DTS (Bulk Copy Program and Data Transformation Services).
4. If you're using SQL Server 2005, maybe you can use Integration Services (the new version of DTS that lets you build packages in a Visual Studio designer and run them inside SQL Server).
5. If you're using .NET 2.0 with SQL Server, you may be able to use the new SqlBulkCopy object (see the sketch at the end of this post).

If you just want tweaks to what you currently have, let us see some of the code you're using so we can offer specific help.
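To give you an idea of option 5, here's a rough, untested sketch (I haven't used SqlBulkCopy myself yet). The table name, column names, and field widths are all placeholders - swap in your own schema and parsing:

' Untested sketch: parse the flat file into a DataTable, then push it
' to SQL Server in one bulk operation instead of one INSERT per line.
' "MyTable", "Field1"/"Field2", and the widths are placeholders.
Imports System.Data
Imports System.Data.SqlClient
Imports System.IO

Module BulkLoadSketch
    Sub LoadFile(ByVal path As String, ByVal connStr As String)
        ' In-memory table that mirrors the destination columns
        Dim table As New DataTable("MyTable")
        table.Columns.Add("Field1", GetType(String))
        table.Columns.Add("Field2", GetType(String))

        ' Parse each fixed-width line into a DataRow
        Using sr As New StreamReader(path)
            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                If line.Length >= 12 Then
                    Dim row As DataRow = table.NewRow()
                    row("Field1") = line.Substring(0, 8)
                    row("Field2") = line.Substring(8, 4)
                    table.Rows.Add(row)
                End If
                line = sr.ReadLine()
            End While
        End Using

        ' One bulk copy to the server instead of thousands of round trips
        Using cnn As New SqlConnection(connStr)
            cnn.Open()
            Using bulk As New SqlBulkCopy(cnn)
                bulk.DestinationTableName = "MyTable"
                bulk.ColumnMappings.Add("Field1", "Field1")
                bulk.ColumnMappings.Add("Field2", "Field2")
                bulk.WriteToServer(table)
            End Using
        End Using
    End Sub
End Module

-ner

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut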
kcwallace (Author) Posted April 18, 2006 (edited)

The current code is below. I know I can tweak it more by using parameterized insert statements, but that is just a tweak. What are "Bulk Copy Program and Data Transformation Services", and what is "parsing the file with Substring"? I am using VS2005 with SQL Server 2000.

While Not input Is Nothing
    PB.Maximum = RowCnt
    Cnt += 1
    strCnt = 1
    WriteToLog(FileLog.Current, "Evaluating the Length of the Line. Len(input)=" & Len(input))
    If Len(input) > 15 Then
        SQL = "INSERT INTO badges (AssmDate, Acc, Loc, WearNum, WearDate, UID, PID, BT, BR, BP, Freq, WearPeriod, Daily, WearerName, BU, BadgeID, WhoAssm, FName) VALUES ("
        'FileTmplt(X) is an array containing the structure of the data file, e.g. AssmDate = characters 1-8
        For X = 1 To UBound(FileTmplt)
            If DTypeArr(X) = 1 Then 'string field
                TmpVal = Mid(input, strCnt, FileTmplt(X))
                TmpVal = ADO.RepSngQut(TmpVal) 'removes all single quotes from the field data
                SQL = SQL & "'" & TmpVal & "', "
            ElseIf DTypeArr(X) = 3 Then 'date field
                TmpVal = Mid(input, strCnt, FileTmplt(X))
                TmpVal = "'" & Mid(TmpVal, 5, 2) & "/" & Mid(TmpVal, 7, 2) & "/" & Mid(TmpVal, 1, 4) & "'"
                SQL = SQL & TmpVal & ", "
            Else 'number field
                SQL = SQL & Mid(input, strCnt, FileTmplt(X)) & ", "
            End If
            strCnt += FileTmplt(X)
        Next X
        SQL = SQL & "'" & FileSrch & "')"
        ADO.ActionSQL(SQL, Cnn) 'creates a SqlCommand and performs an ExecuteNonQuery
        WriteToLog(FileLog.Current, "Reading Line")
    End If
    input = sr.ReadLine()
    If Not input Is Nothing Then
        WriteToLog(FileLog.Current, input)
    Else
        WriteToLog(FileLog.Current, "End of File")
    End If
    Label2.Text = "Loaded " & Cnt & " of " & RowCnt & " Records"
    PB.Value = Cnt
    Me.Refresh()
    Application.DoEvents()
End While

Edited April 18, 2006 by kcwallace

Go Beavs!!!
Nerseus (*Experts*) Posted April 18, 2006

Ah, code! First, the questions you asked about...

You are already doing "Substring" parsing - you use Mid(), a VB6 leftover. Some people define a line from their flat file in terms of regular expressions (using the Regex object) and let that object do the parsing. It makes for very readable code (maybe not the regular expression itself), but it's slower than pulling chunks out of a string with Mid.

bcp is the fastest way to get data into the database. It is a command line tool for SQL Server that takes an input file and a layout file (which defines the fields, the delimiter, whether strings are quoted, etc.) and sucks the file into the table - VERY fast.

DTS is the next best thing. In fact, if you design a DTS package correctly it will use bcp behind the scenes. DTS also lets you "transform" the data by sticking a piece of VBScript in the middle. It will generate a default script for you along the lines of:

Sub Main()
    DTSDestination("Field1") = DTSSource("Field1")
    DTSDestination("Field2") = DTSSource("Field2")
End Sub

In there you can write your own logic, including subroutines and more. DTS is also very fast and lets you tweak the reading/writing. For jobs of small to medium complexity, I'd use DTS over a custom EXE.

Since you're using VS2005, you may be able to use the new SqlBulkCopy object. I admit I haven't had a chance to try it out yet, but it supposedly lets you write your VB.NET application the way you want and still take advantage of bcp-style speed.

Now, for your sample code: one of the first things I would do is change all the string variables to use the StringBuilder object. Every time you concatenate a string, .NET must allocate space for the new string and then copy the old string plus the new piece to the new location. Here's your code with some "dings" for each string copy:

' Ding 1 for the ReadLine() method
If Len(input) > 15 Then
    ' Ding 2
    SQL = "INSERT INTO badges (..."
    For X = 1 To UBound(FileTmplt)
        If DTypeArr(X) = 1 Then
            ' Ding 3
            TmpVal = Mid(input, strCnt, FileTmplt(X))
            ' Ding 4 (maybe more, inside of RepSngQut)
            TmpVal = ADO.RepSngQut(TmpVal)
            ' Ding 5 (I think only one ding, although multiple strings are concatenated)
            SQL = SQL & "'" & TmpVal & "', "
        Else
            ' ...
        End If
    Next X
    ' Ding 6
    SQL = SQL & "'" & FileSrch & "')"
    ADO.ActionSQL(SQL, Cnn)
    WriteToLog(FileLog.Current, "Reading Line")
End If

As the example shows, there are six dings (string copies) for each line in the file. If your file is 10 meg (relatively small for some jobs), that's like reading and copying around 60 meg of data. Before making any changes, I'd do some profiling to see how things look before and after - some changes may help, some may not.

As you pointed out, you may also try using SqlParameter objects - you only have to build the collection of parameters one time, then reset the values on each loop.

You may also try good old do-it-yourself batching: save up 20 to 50 of the final SQL statements into one string and execute them all at once, so the string looks like:

INSERT INTO badges (AssmDate, Acc, Loc, WearNum...
INSERT INTO badges (AssmDate, Acc, Loc, WearNum...
INSERT INTO badges (AssmDate, Acc, Loc, WearNum...

Then one call to ADO.ActionSQL(SQL, Cnn) commits them all. It may take a little extra work for logging and for handling failed inserts (like a separate routine that loops through the big string and commits each INSERT one at a time in case of bad data).
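To make the SqlParameter idea concrete, here's a rough, untested sketch. The column list is cut down to three fields and the widths are made up - fill in the rest to match your badges table. It assumes your open connection Cnn and reader sr, plus Imports System.Data and System.Data.SqlClient:

' Untested sketch: build the command and its parameters once,
' then just reset the values for each line of the file.
Dim cmd As New SqlCommand( _
    "INSERT INTO badges (AssmDate, Acc, Loc) VALUES (@AssmDate, @Acc, @Loc)", Cnn)
cmd.Parameters.Add("@AssmDate", SqlDbType.VarChar, 8)
cmd.Parameters.Add("@Acc", SqlDbType.VarChar, 10)
cmd.Parameters.Add("@Loc", SqlDbType.VarChar, 10)

Dim input As String = sr.ReadLine()
While input IsNot Nothing
    If Len(input) > 15 Then
        ' No RepSngQut call needed - parameters handle embedded quotes safely
        cmd.Parameters("@AssmDate").Value = Mid(input, 1, 8)
        cmd.Parameters("@Acc").Value = Mid(input, 9, 10)
        cmd.Parameters("@Loc").Value = Mid(input, 19, 10)
        cmd.ExecuteNonQuery()
    End If
    input = sr.ReadLine()
End While

If you wrap each batch of a few hundred rows in a transaction (BeginTransaction on the connection, assign it to cmd.Transaction, Commit at the end of the batch), you should also avoid paying for an implicit transaction on every single INSERT.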
-ner

"I want to stand as close to the edge as I can without going over. Out on the edge you see all the kinds of things you can't see from the center." - Kurt Vonnegut