Runtime_error Posted July 19, 2004

Hi everyone. I have a 90 MB CSV file which the application needs to read. I can do it using the OleDb or ODBC data classes, but I am stuck on deciding what the best way is. For example, when we read a large amount of data into a DataSet, doesn't it just use up a lot of memory? What is the best way to process a large amount of data from a source such as text files or CSV files? Do we read the records line by line, or do we read the entire file into a DataSet or some buffer and then process it? Please advise as to what the best approach would be. Thanks.
Arch4ngel Posted July 19, 2004

It may vary. Well... it depends on what you have to do. Loading the whole file into memory will make future access to it much faster, while reading line by line could cause some slowdown because of the "slow" HD (a HD takes MUCH more time to access data than RAM does). But the DataSet solution isn't the best in every situation... You'll have to explain to us a little bit more about what you are loading... and how you will use it.

"If someone say : "Die mortal !"... don't stay to see if he isn't." - Unknown
"Learning to program is like going out with a new girl friend. There's always something that wasn't mentioned in the documentation..." - Me
"A drunk girl is like an animal... it scream at everything like a cat and roll in the grass like a dog." - Me after seeing my girlfriend drunk and some of her drunk friend.
C# TO VB TRANSLATOR
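To make the trade-off concrete, here is a minimal C# sketch of both approaches (the file name and the ProcessLine helper are placeholders, and the comma split is naive -- it won't handle commas inside quoted fields):

using System;
using System.IO;

class CsvReadDemo
{
    static void Main()
    {
        // Approach 1: pull the whole file into memory at once.
        // Later access is fast, but a 90 MB file costs at least 90 MB of RAM.
        StreamReader whole = new StreamReader("c:\\data.csv");
        string everything = whole.ReadToEnd();
        whole.Close();
        Console.WriteLine("loaded {0} characters", everything.Length);

        // Approach 2: stream the file one line at a time.
        // Only the current line lives in memory; the runtime's buffered
        // reads keep the disk from being hit once per line.
        StreamReader reader = new StreamReader("c:\\data.csv");
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            ProcessLine(line); // placeholder for whatever you do per record
        }
        reader.Close();
    }

    static void ProcessLine(string line)
    {
        string[] fields = line.Split(','); // naive: ignores quoting
        Console.WriteLine(fields[0]);
    }
}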
Runtime_error Posted July 19, 2004 (Author)

It's a standard CSV file; the size is around 90 MB and will soon be going up and up.

File structure:
"column1","column2","column3","column4"

So, if I go with the line-by-line option, it will be slower. Loading the entire file into a DataSet or any other object is not efficient, as a lot of memory will be used at once. So are there any other ways? What about file streams? As for what it's for, well, each line of the file will be processed and then written to a SQL database.
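If you do go line by line straight into SQL Server, a rough sketch looks like this (the connection string, table, and column names are invented; for a 90 MB file you would also want to batch the inserts inside a SqlTransaction, or use the bulk-load route suggested further down):

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvToSql
{
    static void Main()
    {
        SqlConnection conn = new SqlConnection("your connection string here");
        conn.Open();

        // one parameterized command, reused for every row
        SqlCommand cmd = new SqlCommand(
            "INSERT INTO ImportedRows (col1, col2, col3, col4) " +
            "VALUES (@c1, @c2, @c3, @c4)", conn);
        cmd.Parameters.Add("@c1", SqlDbType.VarChar, 255);
        cmd.Parameters.Add("@c2", SqlDbType.VarChar, 255);
        cmd.Parameters.Add("@c3", SqlDbType.VarChar, 255);
        cmd.Parameters.Add("@c4", SqlDbType.VarChar, 255);

        StreamReader reader = new StreamReader("c:\\bigfile.csv");
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] f = line.Split(',');
            for (int i = 0; i < f.Length; i++)
                f[i] = f[i].Trim('"'); // strip the surrounding quotes

            cmd.Parameters["@c1"].Value = f[0];
            cmd.Parameters["@c2"].Value = f[1];
            cmd.Parameters["@c3"].Value = f[2];
            cmd.Parameters["@c4"].Value = f[3];
            cmd.ExecuteNonQuery(); // one round trip per record
        }
        reader.Close();
        conn.Close();
    }
}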
Arch4ngel Posted July 19, 2004

Are you exporting your CSV file to a SQL database?
bri189a Posted July 19, 2004

I think along the lines of Arch4ngel: export to a SQL database and let it do the hard work... something that large you really shouldn't be trying to handle with file I/O; SQL is designed for that kind of load.
Joe Mamma Posted July 19, 2004

The ideal would be to change the file-creating program into a console app that writes to standard output, and have your app be a console app that reads from standard input. Pipe the output of the former into the latter. Barring that, follow the example here: [mshelp=ms-help://MS.VSCC.2003/MS.MSDNQTR.2003FEB.1033/cpguide/html/cpconreadingtextfromfile.htm]Reading Text from a File[/mshelp]

Joe Mamma
Amendment 4: The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.
Amendment 9: The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people.
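The stdin-reading half of that pipe might look like this (a minimal sketch; the file and executable names are placeholders -- even without control over the producer, you can pipe an existing file in with the "type" command):

using System;

class Consumer
{
    // Reads records from standard input, one line at a time,
    // so only the current record is ever held in memory.
    static void Main()
    {
        int count = 0;
        string line;
        while ((line = Console.In.ReadLine()) != null)
        {
            count++; // process the record here
        }
        Console.Error.WriteLine("{0} records read", count);
    }
}

// usage from a command prompt:
//   producer.exe | consumer.exe
// or, piping an existing file:
//   type c:\data.csv | consumer.exe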
Arch4ngel Posted July 19, 2004

To follow up on my point of view... you should really consider doing it in SQL Server. Something that big is considered "huge" in file I/O terms, and SQL Server can handle it faster: it supports SQL SELECT, UPDATE, DELETE... Don't reinvent the wheel, buddy... if you don't have SQL Server, at least use MS Access or MySQL. I'm against using a 90 MB CSV file that is still growing...
Runtime_error Posted July 19, 2004 (Author)

Sorry, I gave the wrong impression. The CSV file comes from another source which we don't have any control over. Our system is SQL Server 2000 based. The client gets huge CSV files that the system needs to read and process. The manual process will be limited to just moving the files to a chosen directory; then BizTalk Server and our application will do the rest. Basically, the data that the client gets from another company needs to be processed and entered into our SQL database. If it were just data from one source and in one standard format, it wouldn't be bad for someone to manually import the CSV files into the tables. But the requirement is to read the data, process it (merge it with data from other data sources), and enter it into our SQL database.
Joe Mamma Posted July 19, 2004

My suggestion is to create a temporary table (a table whose name is prefixed by '#') with the layout of your data file, execute BULK INSERT against it, and do your processing against the temp table.

For example, if c:\data.txt contains:

0,"Value0",0
1,"Value1",0.000283687943262411
2,"Value2",0.000567375886524823
3,"Value3",0.000851063829787234

my code would look something like:

// create a temp table matching the file layout
cmd.CommandText = "create table #importTable (i int, label varchar(255), n float)";
cmd.ExecuteNonQuery();

// note: BULK INSERT resolves the path on the SQL Server machine, not the client;
// the quotes around the second field are kept in the data and may need trimming
cmd.CommandText = "BULK INSERT #importTable FROM 'c:\\data.txt' " +
                  "WITH (FIELDTERMINATOR = ',', KEEPNULLS)";
cmd.ExecuteNonQuery();

Then I would do whatever I needed to do with #importTable. Remember that #importTable is dropped when the connection closes, because it is temporary.
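The processing pass against the temp table can then be plain set-based SQL; for instance (the destination table and column names here are invented for illustration):

// move the staged rows into the real table in one set-based statement,
// trimming the quotes BULK INSERT left around the text field
cmd.CommandText =
    "INSERT INTO MyRealTable (Id, Label, Amount) " +
    "SELECT i, REPLACE(label, '\"', ''), n FROM #importTable";
cmd.ExecuteNonQuery();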
Joe Mamma Posted July 19, 2004

By the way, doing the above imported a 100 MB file in 36 seconds on my 3.4 GHz HT machine with 1 GB of memory.
joelny Posted April 1, 2005

Same problem. Hi all, I am trying to accomplish the same thing, except the file isn't in CSV format. Here's my case: I have a 25 MB file with 3000 employee records. Each record takes up exactly 5001 characters in the file, with information such as an employee's name, address, phone number, salary, paycheck information, etc. I get a file like that bi-weekly, which I need to import into an Oracle database through the ASP.NET app. I need to read the entire file, pick out the information I need from each 5001-character record, and call a stored procedure to save the information. The information will eventually be broken down into smaller chunks and stored in different tables (I have a table for just personal information, another for tax information, one for retirement plan information, one for paycheck data, etc.).

Right now I'm doing the following:
1. Store the whole text file in an array, with each element holding exactly 5001 characters, then
2. Use a StringBuilder object to hold one block of the array, then
3. Use smaller strings to save the information I need from the StringBuilder, and
4. Call stored procedures to save that information, and
5. Go back to step 2 and repeat 3000 times.

This eats up a lot of RAM and sometimes even crashes my computer. I am sure the above method is a very stupid one... but I'm a beginner programmer and this is the first time I've worked on an app with such intensive text parsing. For now I have had to ask users to open the text file in Excel, save it as a spreadsheet, and have my application read it as a database so it reads one record at a time. This has caused them inconvenience, and I really need an alternative solution.

The information in that text file isn't separated by any symbols, so I am breaking it down solely by character position. Should I still use bulk insert, put it into a temp table in Oracle, and break down the chunks on the database side? And if I do break it down in the database, how can I tell (from the application) that the whole process is finished? Thank you...
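One way to avoid holding the whole file in RAM is to stream it one fixed-length record at a time, roughly like this (the file name and field offsets are invented; substitute your real layout):

using System;
using System.IO;

class FixedWidthImport
{
    const int RecordLength = 5001;

    static void Main()
    {
        StreamReader reader = new StreamReader("c:\\payroll.dat");
        char[] buffer = new char[RecordLength];

        while (true)
        {
            // Read() may return fewer characters than requested,
            // so keep reading until the record is full or the file ends.
            int filled = 0;
            while (filled < RecordLength)
            {
                int got = reader.Read(buffer, filled, RecordLength - filled);
                if (got == 0) break; // end of file
                filled += got;
            }
            if (filled == 0) break;           // clean end of file
            if (filled < RecordLength) break; // partial trailing record: log/handle

            string record = new string(buffer, 0, RecordLength);

            // slice fields by position -- these offsets are made up
            string name  = record.Substring(0, 30).Trim();
            string phone = record.Substring(30, 10).Trim();

            // ... call the Oracle stored procedure with these values ...
            // Only one 5001-character record is in memory at any time.
        }
        reader.Close();
    }
}

Since the loop runs in your own code, you also know exactly when the import is finished -- when the loop exits -- which answers the "how can I tell it's done" question without having to poll the database.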
limeyredneck Posted July 19, 2005

Loading text files to SQL Server using DTS. Being basically lazy myself, and averse to writing any more "grunt" code than absolutely necessary, I suggest you use DTS to load the data into a "work-in-progress" (WIP) table in SQL Server. DTS gives you for free all the stuff you would otherwise write, such as the ability to define the file source, format, and layout, select the fields to keep, and create and load the table. I might suggest (for fun and experience) that you write a handler that drives DTS programmatically and then does any post-processing you want. Here's how:
http://sqldev.net/DTS/DotNETCookBook.htm
http://www.c-sharpcorner.com/Code/2002/Sept/DTSnCS.asp
http://www.15seconds.com/issue/030909.htm
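If full programmatic control of DTS is more than you need, a low-tech alternative is simply shelling out to the dtsrun command-line utility from your app (the server and package names here are placeholders):

using System.Diagnostics;

class RunDtsPackage
{
    static void Main()
    {
        // /S = server, /N = package name, /E = trusted (Windows) connection
        ProcessStartInfo psi = new ProcessStartInfo(
            "dtsrun", "/S MYSERVER /N LoadCsvPackage /E");
        psi.UseShellExecute = false;

        Process p = Process.Start(psi);
        p.WaitForExit(); // block until the package finishes

        // by convention, a zero exit code means the package succeeded
        System.Console.WriteLine("dtsrun exit code: {0}", p.ExitCode);
    }
}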