Streaming Techniques

joe_pool_is

What are the pros and cons of performing a stream operation in chunks, as opposed to performing that same operation on the full file?

"In Chunks" Example:
Code:
void ReadFileInChunks(string file) {
  using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read)) {
    int len;
    byte[] buffer = new byte[1024];
    // Read returns the number of bytes actually read; 0 means end of file.
    while ((len = fs.Read(buffer, 0, buffer.Length)) > 0) {
      Console.WriteLine("Read {0} bytes of data.", len);
    }
    // No explicit fs.Close() needed: the using block disposes the stream.
  }
}
"Full File" Example:
Code:
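void ReadFileAtOnce(string file) {
  using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read)) {
    // Allocates the entire file in memory at once.
    byte[] buffer = new byte[fs.Length];
    int total = 0;
    int len;
    // Read may return fewer bytes than requested, so loop until the buffer is full.
    while (total < buffer.Length &&
           (len = fs.Read(buffer, total, buffer.Length - total)) > 0) {
      total += len;
    }
    if (total == buffer.Length) {
      Console.WriteLine("Finished!");
    }
  }
}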
I just want to make sure I use a technique that gives me the best results without putting my code into a dangerous situation.

I'm guessing the ReadFileAtOnce method works fine as long as there is enough RAM to hold the entire file. Passing ReadFileAtOnce a 6 GB ZIP backup of a DVD would likely cause the method to fail.

Any other thoughts on the subject?
 
Reading in chunks gives you the ability to "resume" where you left off. If you are doing this in a single thread, it also lets you call Application.DoEvents() between chunks to refresh your UI (you may want to investigate threading, though). It also keeps memory use bounded, which prevents OutOfMemoryException from being thrown on large files.

IMO, the "chunk" method is best.
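
Here is a rough sketch of the "resume" idea, in case it helps. ResumableReader, ReadSomeChunks, and the 4096-byte buffer are names and values I made up for illustration; the caller is assumed to keep the returned offset somewhere and pass it back in later.

```csharp
using System;
using System.IO;

static class ResumableReader {
  // Hypothetical helper: reads at most maxChunks chunks starting at startOffset
  // and returns the offset to resume from on the next call.
  public static long ReadSomeChunks(string file, long startOffset, int maxChunks) {
    using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read)) {
      fs.Seek(startOffset, SeekOrigin.Begin);
      byte[] buffer = new byte[4096];
      long position = startOffset;
      int chunks = 0;
      int len;
      while (chunks < maxChunks && (len = fs.Read(buffer, 0, buffer.Length)) > 0) {
        // Process the first len bytes of buffer here.
        position += len;
        chunks++;
      }
      return position;
    }
  }
}
```

Calling it repeatedly with the previous return value walks through the file; when the return value equals the offset you passed in, nothing more was read and you have reached the end.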
 
I've seen 4096 (4k) used a lot. I'm not sure what the most efficient number of bytes is. A larger size will generally be faster because there are fewer drive operations, but with too large a size you'll end up waiting on the drive for each read.

Play with some numbers and see what works best for your situation.
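
One way to play with the numbers is a quick timing loop like the sketch below. BufferSizeTimer, TimeRead, and the list of sizes are just my own picks, and it assumes the file path is passed as the first command-line argument. Keep in mind the OS file cache will make later runs look faster than the first, so treat the numbers as a rough guide only.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class BufferSizeTimer {
  // Hypothetical helper: reads the whole file using the given buffer size
  // and returns the elapsed time in milliseconds.
  public static long TimeRead(string file, int bufferSize) {
    byte[] buffer = new byte[bufferSize];
    Stopwatch sw = Stopwatch.StartNew();
    using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read,
                                          FileShare.Read, bufferSize)) {
      // Discard the data; only the read time matters here.
      while (fs.Read(buffer, 0, buffer.Length) > 0) { }
    }
    sw.Stop();
    return sw.ElapsedMilliseconds;
  }

  static void Main(string[] args) {
    // Sizes chosen arbitrarily for comparison: 1 KB, 4 KB, 64 KB, 1 MB.
    foreach (int size in new int[] { 1024, 4096, 65536, 1048576 }) {
      Console.WriteLine("{0,8} bytes: {1} ms", size, TimeRead(args[0], size));
    }
  }
}
```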
 
The origins of 4k

I believe the size of 4096 (4kB) most likely came into common usage as it is historically the default size of a memory page on x86, meaning it represented a good size/speed tradeoff. It is a value I generally use for data buffers.

The chunk approach is definitely preferable - in certain situations I even go as far as to copy data a byte at a time.

Good luck :cool:
 