booler Posted October 7, 2005 (edited)

Binary File Reading Code Optimization

Hi guys,

I am having a problem with my binary file reading, and wonder if anybody knows a better way to achieve what I am getting at.

I am trying to read a binary database file record by record. Each record is split into fields, and each record contains different data types (which are known at runtime). I have to cast each field to an appropriate .NET type and perform a calculation on each one. So far, so good, except the performance is not what I had hoped: the database holds around 50 million records of 32 bytes each, and I need to complete the full read in under 45 seconds. So far I cannot get it to run in less than 150 seconds.

I am reading the records like this (the BinaryReader is already assigned):

```vbnet
Public Function Read() As Boolean
    If Me.cursor >= Me.recordcount Then
        Return False
    End If
    Try
        ' Instantiate the custom structure that holds the byte array for a record
        Me.currentRecord = New FoxproDataRecord(Me.recordsize)

        ' The buffer is a System.Collections.Queue holding the next 100 records
        If Me.buffer.Count = 0 Then
            Me.RefillBuffer(Me.buffer)
        End If

        ' Assign the byte array inside the custom structure by pulling
        ' the next byte array from the queue
        Me.currentRecord.data = CType(Me.buffer.Dequeue(), Byte())

        ' Increment the record counter
        Me.cursor += 1
        Return True
    Catch ex As Exception
        ' Preserve the original exception as the inner exception
        Throw New System.Data.DataException("File is not accessible.", ex)
    End Try
End Function
```

So the idea is that a custom structure points to the current record, and a queue reads and holds 100 records, which is incrementally dequeued and then refilled.
This is the code that refills the queue:

```vbnet
' Declared as a Sub since nothing is returned
Public Sub RefillBuffer(ByVal buffer As Queue)
    For i As Integer = 0 To 99
        ' Add a record to the queue if any records remain
        If Me.currentfillpointer < Me.recordcount Then
            ' ReadBytes already returns Byte(), so no cast is needed
            buffer.Enqueue(Me.dbfReader.ReadBytes(Me.recordsize))
            Me.currentfillpointer += 1
        Else
            Exit For
        End If
    Next
End Sub
```

And finally, this is the code for the custom structure that holds the data for individual records:

```vbnet
Public Structure FoxproDataRecord
    Public data As Byte()
    Private length As Integer

    ' Constructor to pass in the record length
    Public Sub New(ByVal dataLength As Integer)
        length = dataLength
        ' In VB the array bound is an upper index, not a length,
        ' so dataLength - 1 yields exactly dataLength elements
        data = New Byte(dataLength - 1) {}
    End Sub
End Structure
```

The actual data casts run reasonably quickly, but the data reading is just not fast enough. Does anyone have any ideas on how I can speed this up?

Thanks,
Adam

Edited October 7, 2005 by booler
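One possible tweak, sketched here as an assumption rather than a tested fix: since RefillBuffer currently hits the stream once per record (100 small ReadBytes calls per refill), the whole refill could be done with a single large ReadBytes call and then split. This hypothetical variant reuses the poster's dbfReader, recordsize, recordcount, and currentfillpointer members:

```vbnet
' Hypothetical variant of RefillBuffer: one large read per refill,
' then the block is sliced into per-record arrays.
Public Sub RefillBufferChunked(ByVal queue As Queue)
    Dim remaining As Integer = Me.recordcount - Me.currentfillpointer
    Dim toRead As Integer = Math.Min(100, remaining)
    If toRead <= 0 Then Return

    ' A single call to the stream instead of 100 small ones
    Dim block As Byte() = Me.dbfReader.ReadBytes(toRead * Me.recordsize)

    For i As Integer = 0 To toRead - 1
        Dim record(Me.recordsize - 1) As Byte
        ' Copy one 32-byte slice out of the block
        Buffer.BlockCopy(block, i * Me.recordsize, record, 0, Me.recordsize)
        queue.Enqueue(record)
    Next
    Me.currentfillpointer += toRead
End Sub
```

The per-record arrays and the Queue still cost allocations; whether this alone reaches the 45-second target would need measuring.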
Diesel Posted October 7, 2005

Check out binary serialization. It's probably faster and will do most of the work for you. Best thing since random access files.
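For reference, a round trip with BinaryFormatter looks roughly like the sketch below. Note the caveat: BinaryFormatter only understands its own wire format, so this applies to files written by the same serializer, not to an existing foreign file such as a FoxPro table. The file name and SampleRecord structure are made up for illustration.

```vbnet
Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

<Serializable()> _
Public Structure SampleRecord
    Public Id As Integer
    Public Value As Double
End Structure

Module SerializationDemo
    Sub Main()
        Dim formatter As New BinaryFormatter

        ' Write several records back to back into one file
        Dim outStream As New FileStream("records.bin", FileMode.Create)
        Try
            For i As Integer = 0 To 2
                Dim r As SampleRecord
                r.Id = i
                r.Value = i * 1.5
                formatter.Serialize(outStream, r)
            Next
        Finally
            outStream.Close()
        End Try

        ' Each Deserialize call reads exactly one object and
        ' advances the stream past it
        Dim inStream As New FileStream("records.bin", FileMode.Open)
        Try
            While inStream.Position < inStream.Length
                Dim r As SampleRecord = _
                    CType(formatter.Deserialize(inStream), SampleRecord)
                Console.WriteLine("{0}: {1}", r.Id, r.Value)
            End While
        Finally
            inStream.Close()
        End Try
    End Sub
End Module
```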
booler (Author) Posted October 10, 2005

Quoting Diesel: "Check out binary serialization. It's probably faster and will do most of the work for you. Best thing since random access files."

Hi! Thanks for the reply. I have had a look at the BinaryFormatter class - is this what you mean? As far as I can see, it has a Deserialize method to which you pass a FileStream object. However, I cannot deserialize the whole file without using some kind of buffer, because the file is 2 GB. Do you know of any way to deserialize a file in smaller pieces?

I can see that this approach could be quick if I were able to create something like a custom structure to cast the returned data to. My other problem with this is that, although the data structure is known at runtime, it is not known at design time, so this limits my options for constructing a custom container for the data. Do you have any ideas how I might get around this?

Thanks for your help,
Adam
PlausiblyDamp (Administrator) Posted October 10, 2005

The Deserialize method accepts a stream as a parameter and deserialises the next object at the current file position - it does not attempt to deserialise the entire file in one go.

If the structures sit at known boundaries (which seems to be the case if they are all 32 bytes long), you could read a chunk of the file into a byte array and process that, then read the next chunk, and so forth.