Calculate progress of a thread or built-in method, possible?

Nazgulled

I have this code that calculates a file's MD5 hash on a separate thread, and I want to know if it's possible to get the progress of this calculation so I can add a progress bar to the application (right now I have a simple animation letting the user know the application is doing something).

Here's the code of this calculation:

Visual Basic:
Imports System.IO
Imports System.Security.Cryptography
Imports System.Threading

'Members of the WaitForm class
Delegate Sub DisplayHashCallback(ByVal text As String)

Dim fileStream As Stream
Dim hashComp As New Thread(AddressOf HashComputation)

'Runs on the worker thread
Private Sub HashComputation()
    Dim MD5 As New MD5CryptoServiceProvider
    Dim bHash As Byte() = Nothing
    Dim sHash As String

    'Blocks until the whole stream has been hashed
    bHash = MD5.ComputeHash(fileStream)
    sHash = BitConverter.ToString(bHash)
    sHash = sHash.Replace("-", "").ToUpper

    DisplayHash(sHash)
End Sub

Private Sub DisplayHash(ByVal text As String)
    If frmMain.FileHash.InvokeRequired Then
        'Called from the worker thread: marshal the call to the UI thread
        Dim d As New DisplayHashCallback(AddressOf DisplayHash)
        Me.Invoke(d, New Object() {text})
    Else
        frmMain.FileHash.Enabled = True
        frmMain.FileHash.Text = text

        Me.Close()
    End If
End Sub

This code is part of the WaitForm, the form with the activity animation. The WaitForm is shown when the user loads a file in the MainForm (frmMain). The WaitForm, as you can see, is the one that will compute the file hash, set the results in the MainForm and close itself.

Is it possible to do what I want?
 
If you are using .NET 2.0, the BackgroundWorker component is definitely the way to go. If not, the same principles apply; you'll just need to do a little more work. Define a custom event and raise it every time you want to report progress, and have the display thread handle that event.
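
A minimal sketch of that approach, assuming a hypothetical class named ProgressDemo whose work can be split into ten steps (the Sleep call only stands in for real work). It is an illustration of BackgroundWorker progress reporting, not code from the application discussed here; a single blocking call inside DoWork would leave nothing to report between start and finish.

Visual Basic:
Imports System
Imports System.ComponentModel
Imports System.Threading

'Hypothetical demo class, not part of the original application
Public Class ProgressDemo
    Private WithEvents worker As New BackgroundWorker()

    Public Sub Start()
        'ReportProgress throws unless this flag is set
        worker.WorkerReportsProgress = True
        worker.RunWorkerAsync()
    End Sub

    'Runs on a background thread
    Private Sub worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) Handles worker.DoWork
        For stepNumber As Integer = 1 To 10
            Thread.Sleep(100) 'placeholder for one slice of real work
            worker.ReportProgress(stepNumber * 10)
        Next
    End Sub

    'Raised once for every ReportProgress call
    Private Sub worker_ProgressChanged(ByVal sender As Object, ByVal e As ProgressChangedEventArgs) Handles worker.ProgressChanged
        Console.WriteLine("Completed {0}%", e.ProgressPercentage)
    End Sub
End Class

When RunWorkerAsync is called from the UI thread of a Windows Forms application, the ProgressChanged handler runs on that UI thread, so it could set a ProgressBar's Value directly instead of writing to the console.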
 
Yes, I'm using .NET 2.0 but that didn't work. I saw your example in the other thread and tried to implement it in my program, but it doesn't work the way I want it to. As you can see in my code in the first post, the line "bHash = MD5.ComputeHash(fileStream)" is the one that will calculate the md5 hash for the file.

If the calculation takes, say, five minutes, the following lines of code won't run until that ComputeHash() method has finished...
 
I did not realize that was the blocking call. Sadly, because it is a call into the .NET Framework, you can't really get any more information out. I've had similar problems in the past with this as well. I ended up using the marquee style progress bar as a workaround. It doesn't give the user a percentage complete, but it does let the user know your program is still cranking and hasn't frozen.
 
Yeah, that's what I had before. Actually, I had an animation of a progress bar I designed myself (with PNG files), because when I first started to develop this application, .NET Framework 2.0 wasn't out yet, so...

But now I've moved on to 2.0 and I'm going to use the marquee style...
 
TransformBlock

If, instead of ComputeHash, you use TransformBlock and TransformFinalBlock then you can easily monitor progress:

C#:
    public byte[] ComputeHashMultiBlock(byte[] input, int size)
    {
        MD5CryptoServiceProvider md5prov = new MD5CryptoServiceProvider();
        int offset = 0;

        while (input.Length - offset >= size) {
            offset += md5prov.TransformBlock(input, offset, size, input, offset);
            Console.WriteLine("Completed " + ((offset * 100) / input.Length) + "%");
        }

        md5prov.TransformFinalBlock(input, offset, input.Length - offset);
        return md5prov.Hash;
    }

By varying size you can change the responsiveness of the method: smaller size values require more calls to TransformBlock but increase responsiveness. The hash produced is identical to that produced by ComputeHash.

Good luck :cool:
 
VB-o-matic

Sorry about that, was in C# mode:

Visual Basic:
    Public Function ComputeHashMultiBlock(ByVal input As Byte(), ByVal size As Integer) As Byte()

        Dim md5prov As New MD5CryptoServiceProvider()
        Dim offset As Integer = 0

        While (input.Length - offset >= size)
            offset = offset + md5prov.TransformBlock(input, offset, size, input, offset)
            Console.WriteLine("Completed " & ((offset * 100) / input.Length) & "%")
        End While

        md5prov.TransformFinalBlock(input, offset, input.Length - offset)
        Return md5prov.Hash

    End Function
 
Thanks.

But I wasn't able to use the code correctly. What exactly are the input and size parameters received by the ComputeHashMultiBlock method?

This is what I have:

Visual Basic:
Dim fsBytes(fileStream.Length) As Byte
        fileStream.Read(fsBytes, 0, fileStream.Length)

        bHash = ComputeHashMultiBlock(fsBytes, fileStream.Length)

        sHash1 = BitConverter.ToString(bHash)
        sHash1 = sHash.Replace("-", "").ToUpper

Visual Basic:
Dim MD5 As New MD5CryptoServiceProvider
        Dim bHash As Byte() = Nothing
        Dim sHash As String

        bHash = MD5.ComputeHash(fileStream)

        sHash2 = BitConverter.ToString(bHash)
        sHash2 = sHash.Replace("-", "").ToUpper

But sHash1 is different from sHash2.

What's wrong?
 
ComputeHashMultiBlock parameters

The input parameter is, as you've used it, the bytes to compute the hash of. The size parameter specifies the block size used when computing the hash. Smaller values for size mean the method will take slightly longer but will be more responsive (provide more progress feedback). You should not pass the length of your data, as this negates the whole purpose of using this method. I would recommend using a block size of something like 2048 or 4096.

As far as the computed hashes not being identical, I have not experienced this before, but I do not normally pass a Stream to ComputeHash. I find the following code produces two identical hash values:

Visual Basic:
Dim input As Byte() = New Byte(19999) {} '20000 elements
Dim output As Byte()

'Generate some random data to test hashing
Dim rnd As RandomNumberGenerator = RandomNumberGenerator.Create()
rnd.GetBytes(input)

'Hash using ComputeHash
Dim md5prov As New MD5CryptoServiceProvider()
output = md5prov.ComputeHash(input)
Console.WriteLine("ComputeHash  : {0}", BytesToStr(output))

'Hash using ComputeHashMultiBlock
output = ComputeHashMultiBlock(input, 2048)
Console.WriteLine("ComputeHashMultiBlock  : {0}", BytesToStr(output))

The BytesToStr method is a utility function for outputting the hex value of a byte array:

Visual Basic:
Public Function BytesToStr(ByVal bytes As Byte()) As String
    Dim str As New StringBuilder()

    For i As Integer = 0 To bytes.Length - 1
        str.AppendFormat("{0:X2}", bytes(i))
    Next

    Return str.ToString()
End Function

Apologies if there are any mistakes in my translation from C#.

Good luck :cool:
 
Sorry, but it was my problem... I was calculating the hash with the 2 different methods in the same block of code and the byte array for the ComputeHashMultiBlock method was having only zeros, that's why the hash was different. I think the stream was being read for the ComputeHash from .NET and then it was at the end of the stream, so no bytes could be read... The array was being created with the needed amount of bytes but their values were all zero. Also, the fsBytes array needed to be initialized with fileStream.Length minus 1, otherwise, the hash was wrong.
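
A small sketch of the two pitfalls described above, using a MemoryStream and a hypothetical module named RewindDemo in place of the real file stream. It assumes a seekable stream: ComputeHash reads the stream to its end, so a later Read returns nothing until the position is reset, and a VB array declaration takes the upper bound rather than the element count.

Visual Basic:
Imports System
Imports System.IO
Imports System.Security.Cryptography

'Hypothetical demo module, not part of the original application
Module RewindDemo
    Sub Main()
        Dim data As Byte() = New Byte() {1, 2, 3, 4, 5}
        Dim source As New MemoryStream(data)
        Dim md5prov As New MD5CryptoServiceProvider()

        'ComputeHash reads until the stream is exhausted
        md5prov.ComputeHash(source)

        'Upper bound is Length - 1, so the array holds exactly Length bytes
        Dim buffer As Byte() = New Byte(CInt(source.Length) - 1) {}

        'The stream is at its end, so nothing is read and buffer stays all zeros
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        Console.WriteLine("Read after ComputeHash: {0} bytes", bytesRead)

        'Rewind before reading the same stream a second time
        source.Position = 0
        bytesRead = source.Read(buffer, 0, buffer.Length)
        Console.WriteLine("Read after rewinding: {0} bytes", bytesRead)
    End Sub
End Module

Declaring the array as Dim fsBytes(fileStream.Length) allocates one element too many, and that extra zero byte becomes part of the hashed data.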

But now I'm having a different kind of problem with your code. I tried to compute the hash for a 50 MB file and I got an exception on this:

((offset * 100) / input.Length)

System.OverflowException was unhandled
Message="Arithmetic operation resulted in an overflow."
Source="MD5 Fingerprint"
StackTrace:
at MD5_Fingerprint.WaitForm.ComputeHashMultiBlock(Byte[] input, Int32 size) in C:\Documents and Settings\Nazgulled\My Documents\Visual Studio 2005\Projects\MD5 Fingerprint\WaitForm.vb:line 130
at MD5_Fingerprint.WaitForm.HashComputation() in C:\Documents and Settings\Nazgulled\My Documents\Visual Studio 2005\Projects\MD5 Fingerprint\WaitForm.vb:line 109
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
 
Learn basic debugging

The purpose of that particular line was to demonstrate how you could monitor the progress of the hash value computation. You would probably want to replace it with raising an event, invoking a delegate or calling a method.

The overflow is caused by the multiplication: offset * 100 exceeds the range of a 32-bit Integer once offset passes roughly 21 million bytes. It can be avoided by using a larger data type for offset, by rearranging the statement so the multiplication occurs after the division, or both. This simple problem is both easy to diagnose and to fix, and in my opinion is something you should learn to be able to sort out yourself. I'm not saying this just to get at you - learning how to fix these small issues will save you a lot of time and effort in the long term.
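
As an illustration of the larger-data-type option, here is a minimal sketch using a hypothetical helper named PercentDone (the name and the module are assumptions, not part of the code posted earlier). For a 50 MB file, 52,428,800 * 100 = 5,242,880,000, which is beyond Integer.MaxValue (2,147,483,647) but well within the range of Long.

Visual Basic:
Imports System

'Hypothetical helper module, not part of the original application
Public Module ProgressMath
    'Percentage of total that is done, computed without Integer overflow
    Public Function PercentDone(ByVal done As Long, ByVal total As Long) As Integer
        If total <= 0 Then Return 100 'nothing to process counts as finished
        'Long arithmetic keeps done * 100 in range for any realistic file size
        Return CInt((done * 100L) \ total)
    End Function
End Module

Note that dividing first only helps with floating-point division. In VB the / operator always produces a floating-point result, so (offset / input.Length) * 100 works, whereas integer division (the \ operator in VB, or / on two ints in C#) would yield 0 until the very last block.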

Good luck :cool:
 
A delegate is what I have right now, I just didn't feel the need to post the code here to the problem I'm having. I just have that overflow problem...

You say it can be fixed by using a larger data type. Would Long do, or is there a bigger one? As for rearranging the statement, I don't know how to do that. Math is one thing I was never good at, and I really don't see how I can have the multiplication after the division...

I can't really test anything right now, nor search the MSDN database but will later...
 
It seems to be fixed just using Long for the offset instead of Integer.

However, I think I'll use the marquee style instead, and I'm going to tell you why. I don't mind that your method takes a bit longer to calculate the hash, but I do mind if the whole process takes twice as long.

For small files this doesn't matter, but for large files it's a different story. If the files are small, a real (read: not marquee style) progress bar serves no purpose at all because it will be gone in no time. If the files are really large, it's good to have a progress bar, but this is where the problem is.

Let's say that I want to compute the hash of a 100 MB file. With all of the above code this will take twice as long as it would with the normal method. Why? Because of the line: fileStream.Read(fsBytes, 0, fileStream.Length). The progress bar only starts to fill when computing the hash, not while reading the file bytes into an array. And I can easily see that it takes a while to read all the bytes before the progress bar starts to move, but once it does, computing the hash for the 100 MB file is quick with your method. What takes time is reading the bytes.

What if the file is something like 500 MB? Well, I tried, and it takes far too long to read all the bytes. Computing the hash may not take that long with your method or .NET's native one (it doesn't matter which), but reading the bytes (which I have to do to use your method) takes too long...

There's no solution for this, right? To speed up the process of reading the bytes, I mean. I could read the bytes in parts and also update the progress bar, but that would only serve the progress bar's purpose, and I want speed on top of that...
 
Don't read all at once

Reading an entire 100 MB file into a byte array is always going to be a time-consuming process, and is very wasteful due to the huge amount of memory that must be allocated. You will certainly see a speed increase if you process the file in chunks, and the progress bar will be responsive throughout.

(Untested code)

Visual Basic:
Dim md5prov As New MD5CryptoServiceProvider()
Dim chunk As Byte() = New Byte(8191) {} 'Size 8192
Dim totalDone As Long = 0

While ((fileStream.Length - totalDone) > chunk.Length)
    'Read some data
    fileStream.Read(chunk, 0, chunk.Length)
    'Update hash calculation
    totalDone = totalDone + md5prov.TransformBlock(chunk, 0, chunk.Length, chunk, 0)
    'Invoke delegate here
End While

'Process final chunk (CInt makes the Long-to-Integer conversion explicit for Option Strict On)
Dim remaining As Integer = CInt(fileStream.Length - totalDone)
fileStream.Read(chunk, 0, remaining)
md5prov.TransformFinalBlock(chunk, 0, remaining)

'Return hash value
Return md5prov.Hash

Good luck :)
 
Hum... and is a block size of 8191 fast enough for a 100 MB file?

I haven't tested your code yet, but in the Read call you pass 0 for the offset. Shouldn't this be totalDone, since we are reading in chunks instead of the whole file?
 
Block sizes and Read methods

Hum... and is a block size of 8191 fast enough for a 100 MB file?

Perhaps 8192 is a little small, you could try larger block sizes. You would have to experiment to find optimal block sizes. Try to stick to powers of 2.

I haven't tested your code yet, but in the Read call you pass 0 for the offset. Shouldn't this be totalDone, since we are reading in chunks instead of the whole file?

The second parameter of the Read method specifies where in the destination array to place the read data. In this case we want to overwrite the previous data, filling the array from the beginning (index 0). The read position within the FileStream is automatically advanced.
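
One related detail: Read returns the number of bytes it actually placed in the array, which may be fewer than requested and is 0 at the end of the stream. A hedged sketch of a chunked loop built around that return value follows; the module ChunkedHash, the function ComputeMd5WithProgress and the delegate ProgressCallback are hypothetical names, and the stream is assumed to be seekable so that Length is available.

Visual Basic:
Imports System
Imports System.IO
Imports System.Security.Cryptography

'Hypothetical helper module, not part of the original application
Public Module ChunkedHash
    Public Delegate Sub ProgressCallback(ByVal percent As Integer)

    Public Function ComputeMd5WithProgress(ByVal source As Stream, ByVal chunkSize As Integer, ByVal report As ProgressCallback) As Byte()
        Dim md5prov As New MD5CryptoServiceProvider()
        Dim chunk As Byte() = New Byte(chunkSize - 1) {}
        Dim total As Long = source.Length 'assumes a seekable stream
        Dim totalDone As Long = 0

        'Always write at index 0 of chunk; the stream advances its own position
        Dim bytesRead As Integer = source.Read(chunk, 0, chunk.Length)
        While bytesRead > 0
            'Hash only the bytes that were actually read
            md5prov.TransformBlock(chunk, 0, bytesRead, chunk, 0)
            totalDone += bytesRead
            If report IsNot Nothing AndAlso total > 0 Then
                report(CInt((totalDone * 100L) \ total))
            End If
            bytesRead = source.Read(chunk, 0, chunk.Length)
        End While

        'Finish the hash with an empty final block
        md5prov.TransformFinalBlock(chunk, 0, 0)
        Return md5prov.Hash
    End Function
End Module

The callback runs on whichever thread calls the function, so a form would still need to marshal it to the UI thread (for example with Invoke) before touching a progress bar.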

Good luck :)
 
Thanks for everything, now it's all working fine :) and it's not that slow anymore...

Maybe I'll add 3 different block sizes depending on the file size...
 
.NET Matters article

Although Nazgulled's particular issue is resolved, I would like to add another suggestion.

In the December 2006 issue of MSDN Magazine, the .NET Matters column deals with a very similar issue - displaying the progress of a BinaryFormatter as it deserializes objects from a Stream. In this case, the BinaryFormatter does not provide any means to monitor progress. The solution presented is to wrap the source Stream with a custom Stream that raises an event when data is read.

Note that the solution would also work for the situation described in this thread, and similar Stream-based processes.
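
A rough sketch of the idea, written from the description above rather than taken from the article. ProgressStream and its ProgressChanged event are hypothetical names, and the wrapped stream is assumed to be seekable (as a FileStream is) so that Position and Length can be queried.

Visual Basic:
Imports System
Imports System.IO

'Hypothetical read-only wrapper that reports progress on every Read
Public Class ProgressStream
    Inherits Stream

    Private ReadOnly inner As Stream

    Public Event ProgressChanged(ByVal bytesDone As Long, ByVal bytesTotal As Long)

    Public Sub New(ByVal source As Stream)
        inner = source
    End Sub

    Public Overrides Function Read(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer) As Integer
        Dim n As Integer = inner.Read(buffer, offset, count)
        'Raised on the calling thread after every read
        RaiseEvent ProgressChanged(inner.Position, inner.Length)
        Return n
    End Function

    Public Overrides ReadOnly Property CanRead() As Boolean
        Get
            Return inner.CanRead
        End Get
    End Property

    Public Overrides ReadOnly Property CanSeek() As Boolean
        Get
            Return inner.CanSeek
        End Get
    End Property

    Public Overrides ReadOnly Property CanWrite() As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property Length() As Long
        Get
            Return inner.Length
        End Get
    End Property

    Public Overrides Property Position() As Long
        Get
            Return inner.Position
        End Get
        Set(ByVal value As Long)
            inner.Position = value
        End Set
    End Property

    Public Overrides Sub Flush()
        inner.Flush()
    End Sub

    Public Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
        Return inner.Seek(offset, origin)
    End Function

    Public Overrides Sub SetLength(ByVal value As Long)
        Throw New NotSupportedException()
    End Sub

    Public Overrides Sub Write(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer)
        Throw New NotSupportedException()
    End Sub
End Class

With something like this, the original one-line call could stay as it is: pass New ProgressStream(fileStream) to ComputeHash and attach a handler with AddHandler. The handler runs on the hashing thread, so it would still have to use Invoke to update the form.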

:cool:
 