Xtreme .Net Talk

Calculate progress of a thread or built-in method, possible?



Posted

I have this code that calculates a file's MD5 hash on a separate thread, and I want to know if it's possible to get the progress of this calculation so I can add a progress bar to the application (right now I have a simple animation letting the user know the application is doing something).

 

Here's the code of this calculation:

 

' Part of WaitForm (needs Imports System.IO, System.Security.Cryptography and System.Threading)
Delegate Sub DisplayHashCallback(ByVal text As String)

Dim fileStream As Stream
Dim hashComp As New Thread(AddressOf HashComputation)

' Runs on the hashComp worker thread
Private Sub HashComputation()
	Dim MD5 As New MD5CryptoServiceProvider
	Dim bHash As Byte() = Nothing
	Dim sHash As String

	' Blocks until the whole stream has been read and hashed
	bHash = MD5.ComputeHash(fileStream)
	sHash = BitConverter.ToString(bHash)
	sHash = sHash.Replace("-", "").ToUpper

	DisplayHash(sHash)
End Sub

' Called from the worker thread; re-invokes itself on the UI thread before touching controls
Private Sub DisplayHash(ByVal text As String)
	If frmMain.FileHash.InvokeRequired Then
		Dim d As New DisplayHashCallback(AddressOf DisplayHash)
		Me.Invoke(d, New Object() {text})
	Else
		frmMain.FileHash.Enabled = True
		frmMain.FileHash.Text = text

		Me.Close()
	End If
End Sub

 

This code is part of the WaitForm, the form with the activity animation. The WaitForm is shown when the user loads a file in the MainForm (frmMain). The WaitForm, as you can see, is the one that will compute the file hash, set the results in the MainForm and close itself.

 

Is it possible to do what I want?

Posted
If you are using .NET 2.0, the BackgroundWorker component is definitely the way to go. If not, the same principles apply; you'll just need to do a little more work. Define a custom event and raise it every time you want to report progress, and have the display (UI) thread handle that event.
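A minimal sketch of the BackgroundWorker approach, assuming a WinForms project on .NET 2.0. The form name `ProgressDemoForm` and the `Thread.Sleep` loop standing in for real work are hypothetical. `DoWork` runs on a thread-pool thread and calls `ReportProgress`; because `RunWorkerAsync` is called from the UI thread, the `ProgressChanged` handler runs on that thread and can update the ProgressBar directly.

```vb
Imports System
Imports System.ComponentModel
Imports System.Threading
Imports System.Windows.Forms

' Hypothetical demo form: a ProgressBar driven by a BackgroundWorker.
Public Class ProgressDemoForm
    Inherits Form

    Private WithEvents worker As New BackgroundWorker()
    Private progress As New ProgressBar()

    Public Sub New()
        progress.Dock = DockStyle.Top      ' default range is 0 to 100
        Controls.Add(progress)
        worker.WorkerReportsProgress = True
    End Sub

    Protected Overrides Sub OnShown(ByVal e As EventArgs)
        MyBase.OnShown(e)
        worker.RunWorkerAsync()            ' started from the UI thread
    End Sub

    ' Worker thread: do the slow work here and never touch controls directly.
    Private Sub worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) Handles worker.DoWork
        For i As Integer = 1 To 10
            Thread.Sleep(200)              ' stand-in for one unit of real work
            worker.ReportProgress(i * 10)
        Next
    End Sub

    ' UI thread: safe to update the ProgressBar here.
    Private Sub worker_ProgressChanged(ByVal sender As Object, ByVal e As ProgressChangedEventArgs) Handles worker.ProgressChanged
        progress.Value = e.ProgressPercentage
    End Sub

    Private Sub worker_RunWorkerCompleted(ByVal sender As Object, ByVal e As RunWorkerCompletedEventArgs) Handles worker.RunWorkerCompleted
        Close()
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New ProgressDemoForm())
    End Sub
End Class
```

For hashing, the loop body would have to be one chunk of work that can be measured; a single blocking call such as ComputeHash gives nothing to report until it returns.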
Posted

Yes, I'm using .NET 2.0, but that didn't work. I saw your example in the other thread and tried to implement it in my program, but it doesn't work the way I want. As you can see in my code in the first post, the line "bHash = MD5.ComputeHash(fileStream)" is the one that calculates the MD5 hash of the file.

If the calculation takes, say, 5 minutes, the lines after it won't run until the ComputeHash() method returns...

Posted
I did not realize that was the blocking call. Sadly, because it is a call into the .NET Framework, you can't get any progress information out of it. I've had similar problems with this in the past. I ended up using a marquee-style progress bar as a workaround. It doesn't give the user a percentage complete, but it does let the user know your program is still working and hasn't frozen.
Posted

Yeah, that's what I had before. Actually, I had an animation of a progress bar I designed myself (with PNG files), because when I first started developing this application, .NET Framework 2.0 wasn't out yet...

But now I've moved on to 2.0, so I'm going to use the marquee style...

Posted

TransformBlock

 

If, instead of ComputeHash, you use TransformBlock and TransformFinalBlock then you can easily monitor progress:

 

    public byte[] ComputeHashMultiBlock(byte[] input, int size)
   {
       MD5CryptoServiceProvider md5prov = new MD5CryptoServiceProvider();
       int offset = 0;

       while (input.Length - offset >= size) {
           offset += md5prov.TransformBlock(input, offset, size, input, offset);
           Console.WriteLine("Completed " + ((offset * 100) / input.Length) + "%");
       }

       md5prov.TransformFinalBlock(input, offset, input.Length - offset);
       return md5prov.Hash;
   }

 

By varying size you can change the responsiveness of the method: smaller values require more calls to TransformBlock but give more frequent progress updates. The hash produced is identical to that produced by ComputeHash.

 

Good luck :cool:

Never trouble another for what you can do for yourself.
Posted

VB-o-matic

 

Sorry about that, was in C# mode:

 

   Public Function ComputeHashMultiBlock(ByVal input As Byte(), ByVal size As Integer) As Byte()

       Dim md5prov As New MD5CryptoServiceProvider()
       Dim offset As Integer = 0

       While (input.Length - offset >= size)
           offset = offset + md5prov.TransformBlock(input, offset, size, input, offset)
           Console.WriteLine("Completed " & ((offset * 100) / input.Length) & "%")
       End While

       md5prov.TransformFinalBlock(input, offset, input.Length - offset)
       Return md5prov.Hash

   End Function

Posted

Thanks.

 

But I wasn't able to use the code correctly. What exactly are the input and size parameters received by the ComputeHashMultiBlock method?

 

This is what I have:

 

Dim fsBytes(fileStream.Length) As Byte
       fileStream.Read(fsBytes, 0, fileStream.Length)

       bHash = ComputeHashMultiBlock(fsBytes, fileStream.Length)

       sHash1 = BitConverter.ToString(bHash)
       sHash1 = sHash.Replace("-", "").ToUpper

 

Dim MD5 As New MD5CryptoServiceProvider
       Dim bHash As Byte() = Nothing
       Dim sHash As String

       bHash = MD5.ComputeHash(fileStream)

       sHash2 = BitConverter.ToString(bHash)
       sHash2 = sHash.Replace("-", "").ToUpper

 

But sHash1 is different from sHash2.

 

What's wrong?

Posted

ComputeHashMultiBlock parameters

 

The input parameter is, as you've used it, the bytes to compute the hash of. The size parameter specifies the block size used when computing the hash. Smaller values for size mean the method will take slightly longer but will be more responsive (provide more progress feedback). You should not pass the length of your data, as this defeats the whole purpose of using this method. I would recommend a block size of something like 2048 or 4096.

 

As far as the computed hashes not being identical, I have not experienced this before, but I do not normally pass a Stream to ComputeHash. I find the following code produces two identical hash values:

 

Dim input As Byte() = New Byte(19999) {} '20000 elements
Dim output As Byte()

'Generate some random data to test hashing
Dim rnd As RandomNumberGenerator = RandomNumberGenerator.Create()
rnd.GetBytes(input)

'Hash using ComputeHash
Dim md5prov As New MD5CryptoServiceProvider()
output = md5prov.ComputeHash(input)
Console.WriteLine("ComputeHash  : {0}", BytesToStr(output))

'Hash using ComputeHashMultiBlock
output = ComputeHashMultiBlock(input, 2048)
Console.WriteLine("ComputeHashMultiBlock  : {0}", BytesToStr(output))

 

The BytesToStr method is a utility function for outputting the hex value of a byte array:

 

Public Function BytesToStr(ByVal bytes As Byte()) As String
   Dim str As New StringBuilder()

   For i As Integer = 0 To bytes.Length - 1
       str.AppendFormat("{0:X2}", bytes(i))
   Next

   Return str.ToString()
End Function

 

Apologies if there are any mistakes in my translation from C#.

 

Good luck :cool:

Posted

Sorry, it was my mistake. I was calculating the hash with both methods in the same block of code, and the byte array for the ComputeHashMultiBlock method contained only zeros, which is why the hashes differed. I think .NET's ComputeHash read the stream first and left its position at the end, so there were no bytes left to read: the array was created with the right number of elements, but every value was zero. Also, the fsBytes array needed to be declared with fileStream.Length minus 1 (in VB the number in an array declaration is the upper bound, not the length); otherwise the array had one extra zero byte and the hash was wrong.
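A minimal sketch of both fixes, assuming the data comes from any seekable Stream (a MemoryStream here, so it runs without a file); the helper name `HashBothWays` is hypothetical. It rewinds the stream with `Position = 0` after ComputeHash has consumed it, sizes the array with an upper bound of `Length - 1`, and loops because Read may return fewer bytes than requested.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module StreamRewindSketch
    ' Hypothetical helper: hashes the same stream twice and returns True if both hashes agree.
    Function HashBothWays(ByVal fileStream As Stream) As Boolean
        Dim md5prov As New MD5CryptoServiceProvider()

        ' First pass: ComputeHash reads the stream to the end.
        Dim hash1 As Byte() = md5prov.ComputeHash(fileStream)

        ' Rewind; otherwise the next Read returns 0 bytes and the array stays all zeros.
        fileStream.Position = 0

        ' VB array bounds are inclusive, so Length - 1 gives exactly Length elements.
        Dim fsBytes(CInt(fileStream.Length) - 1) As Byte
        Dim total As Integer = 0
        While total < fsBytes.Length
            Dim n As Integer = fileStream.Read(fsBytes, total, fsBytes.Length - total)
            If n = 0 Then Exit While
            total += n
        End While

        Dim hash2 As Byte() = md5prov.ComputeHash(fsBytes)
        Return BitConverter.ToString(hash1) = BitConverter.ToString(hash2)
    End Function

    Sub Main()
        Dim data As Byte() = New Byte(19999) {}
        RandomNumberGenerator.Create().GetBytes(data)
        Using ms As New MemoryStream(data)
            Console.WriteLine(HashBothWays(ms))   ' prints True
        End Using
    End Sub
End Module
```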

 

But now I'm having a different kind of problem with your code. I tried to compute the hash of a 50 MB file and got an exception on this line:

 

((offset * 100) / input.Length)

 

System.OverflowException was unhandled

Message="Arithmetic operation resulted in an overflow."

Source="MD5 Fingerprint"

StackTrace:

at MD5_Fingerprint.WaitForm.ComputeHashMultiBlock(Byte[] input, Int32 size) in C:\Documents and Settings\Nazgulled\My Documents\Visual Studio 2005\Projects\MD5 Fingerprint\WaitForm.vb:line 130

at MD5_Fingerprint.WaitForm.HashComputation() in C:\Documents and Settings\Nazgulled\My Documents\Visual Studio 2005\Projects\MD5 Fingerprint\WaitForm.vb:line 109

at System.Threading.ThreadHelper.ThreadStart_Context(Object state)

at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)

at System.Threading.ThreadHelper.ThreadStart()

Posted

Learn basic debugging

 

The purpose of that particular line was to demonstrate how you could monitor the progress of the hash value computation. You would probably want to replace it with raising an event, invoking a delegate or calling a method.

 

The overflow is caused by the multiplication operation, and can therefore be avoided by either using a larger data type for offset, and/or rearranging the statement so the multiplication occurs after the division. This simple problem is both easy to diagnose and to fix, and in my opinion is something you should learn to be able to sort out yourself. I'm not saying this just to get at you - learning how to fix these small issues will save you a lot of time and effort in the long term.
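A minimal sketch of both suggestions, assuming a callback replaces the Console.WriteLine call; the `reportPercent` parameter and the `ShowPercent` handler are hypothetical, and `Action(Of Integer)` is available from .NET 2.0 onwards. Widening offset to Long before multiplying stops `offset * 100` from overflowing an Integer once more than about 21.4 million bytes (Integer.MaxValue / 100) have been hashed, which is why a 50 MB file failed.

```vb
Imports System
Imports System.Security.Cryptography

Module MultiBlockSketch
    ' Hypothetical variant of ComputeHashMultiBlock that reports progress
    ' through a callback instead of writing to the console.
    Function ComputeHashMultiBlock(ByVal input As Byte(), ByVal size As Integer, _
                                   ByVal reportPercent As Action(Of Integer)) As Byte()
        Dim md5prov As New MD5CryptoServiceProvider()
        Dim offset As Integer = 0

        While input.Length - offset >= size
            offset += md5prov.TransformBlock(input, offset, size, input, offset)
            ' Widen to Long before multiplying so offset * 100 cannot overflow.
            ' (Alternative: CInt(offset / input.Length * 100), since VB's / works in Double.)
            reportPercent(CInt(CLng(offset) * 100 \ input.Length))
        End While

        md5prov.TransformFinalBlock(input, offset, input.Length - offset)
        reportPercent(100)
        Return md5prov.Hash
    End Function

    Private lastPercent As Integer = -1

    ' Hypothetical handler: prints only when the whole-number percentage changes.
    Sub ShowPercent(ByVal percent As Integer)
        If percent <> lastPercent Then
            lastPercent = percent
            Console.WriteLine("Completed {0}%", percent)
        End If
    End Sub

    Sub Main()
        Dim data As Byte() = New Byte(49999999) {}   ' 50 MB of zeros
        Dim hash As Byte() = ComputeHashMultiBlock(data, 4096, AddressOf ShowPercent)
        Console.WriteLine(BitConverter.ToString(hash))
    End Sub
End Module
```

In the WaitForm, ShowPercent would be replaced by the existing delegate that updates the progress bar through Invoke.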

 

Good luck :cool:

Posted

A delegate is what I have right now; I just didn't think that code was relevant to the problem I'm having. The only issue is that overflow...

You say it can be fixed by using a larger data type: do you mean Long, or is there a bigger one? As for rearranging the statement, I don't know how to do that. Math was never my strong point, and I really don't see how I can do the multiplication after the division...

I can't really test anything right now, nor search MSDN, but I will later...

Posted

It seems to be fixed just using Long for the offset instead of Integer.

 

However, I think I'll use the marquee style instead, and here's why. I don't mind that your method takes a bit longer to calculate the hash, but I do mind if the whole process takes twice as long.

For small files this doesn't matter, but for large files it's a different story. If the files are small, a real progress bar (read: not marquee style) serves no purpose because it will be gone in no time; if the files are really large, a progress bar is useful, but this is where the problem lies.

Let's say I want to compute the hash of a 100 MB file. With all of the above code, this takes twice as long as the normal method. Why? Because of the line fileStream.Read(fsBytes, 0, fileStream.Length). The progress bar only starts to fill once the hash computation begins, not while the file's bytes are being read into the array. I can easily see that it takes a while to read all the bytes before the progress bar starts to move, but once it does, computing the hash of the 100 MB file with your method is quick. What takes time is reading the bytes.

What if the file is 500 MB? I tried it, and reading all the bytes takes a very long time. Computing the hash may not take that long with either your method or .NET's built-in one (it doesn't matter which), but reading the bytes, which I have to do to use your method, takes too long...

So there's no solution for this, right? To speed up reading the bytes, I mean. I could read the bytes in parts and update the progress bar as I go, but that would only serve the progress bar, and I want speed on top of that...

Posted

Don't read all at once

 

Reading an entire 100 MB file into a byte array is always going to be time-consuming, and it is very wasteful because of the huge amount of memory that must be allocated. You will certainly see a speed increase if you process the file in chunks, and the progress bar will be responsive throughout.

 

(Untested code)

 

Dim md5prov As New MD5CryptoServiceProvider()
Dim chunk As Byte() = New Byte(8191) {} 'Size 8192
Dim totalDone As Long = 0
Dim bytesRead As Integer = fileStream.Read(chunk, 0, chunk.Length)

While bytesRead > 0
   'Update hash calculation with only the bytes actually read
   md5prov.TransformBlock(chunk, 0, bytesRead, chunk, 0)
   totalDone = totalDone + bytesRead
   'Invoke delegate here (e.g. report totalDone out of fileStream.Length)
   'Read the next chunk; Read returns 0 at the end of the stream
   bytesRead = fileStream.Read(chunk, 0, chunk.Length)
End While

'Finish the hash; all data has been processed, so the final block is empty
md5prov.TransformFinalBlock(chunk, 0, 0)

'Return hash value
Return md5prov.Hash

 

Good luck :)

Posted

Hmm... is a block size of 8191 fast enough for a 100 MB file?

I haven't tested your code yet, but in the Read call you have 0 for the offset. Shouldn't this be totalDone, since we are reading in chunks instead of the whole file?

Posted

Block sizes and Read methods

 

Hmm... is a block size of 8191 fast enough for a 100 MB file?

 

Perhaps 8192 is a little small; you could try larger block sizes. You would have to experiment to find the optimal size. Try to stick to powers of 2.

 

I haven't tested your code yet, but in the Read call you have 0 for the offset. Shouldn't this be totalDone, since we are reading in chunks instead of the whole file?

 

The second parameter of the Read method specifies where in the destination array to place the read data. In this case we want to overwrite the previous data, filling the array from the beginning (index 0). The read position within the FileStream is automatically advanced.
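A small sketch of that behaviour, using a MemoryStream with made-up data in place of the file: the destination index stays at 0, the stream's Position advances by itself, and the return value says how many bytes in the buffer are actually valid.

```vb
Imports System
Imports System.IO

Module ReadOffsetSketch
    Sub Main()
        Dim data As Byte() = New Byte() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
        Using ms As New MemoryStream(data)
            Dim chunk As Byte() = New Byte(3) {}      ' 4-byte buffer
            Dim n As Integer = ms.Read(chunk, 0, chunk.Length)
            ' n = 4, chunk = {1, 2, 3, 4}, ms.Position = 4
            n = ms.Read(chunk, 0, chunk.Length)
            ' Same buffer index 0, but the stream moved on:
            ' n = 4, chunk = {5, 6, 7, 8}, ms.Position = 8
            n = ms.Read(chunk, 0, chunk.Length)
            ' Only 2 bytes were left: n = 2, chunk = {9, 10, 7, 8} (last two are stale)
            Console.WriteLine("{0} {1} {2}", n, chunk(0), ms.Position)   ' prints "2 9 10"
        End Using
    End Sub
End Module
```

The stale 7 and 8 left in the buffer after the last read are why a chunked loop should hash only the number of bytes Read returned, not chunk.Length.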

 

Good luck :)

  • 3 weeks later...
Posted

.NET Matters article

 

Although Nazgulled's particular issue is resolved, I would like to add another suggestion.

 

In the December 2006 issue of MSDN Magazine, the .NET Matters column deals with a very similar issue - displaying the progress of a BinaryFormatter as it deserializes objects from a Stream. In this case, the BinaryFormatter does not provide any means to monitor progress. The solution presented is to wrap the source Stream with a custom Stream that raises an event when data is read.

 

Note that the solution would also work for the situation described in this thread, and similar Stream-based processes.
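A minimal sketch of that idea applied to this thread's problem. It is not the column's code; the `ProgressStream` class and its `BytesRead` event are hypothetical names. The wrapper forwards reads to the real stream and raises an event with the running byte count, so the unmodified `ComputeHash(stream)` call can drive a progress bar without the file being read into memory first.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

' Hypothetical read-only wrapper that reports how many bytes have passed through it.
Public Class ProgressStream
    Inherits Stream

    Private ReadOnly inner As Stream
    Private total As Long = 0

    ' Raised after each non-empty Read with the running total and the inner stream's length.
    Public Event BytesRead(ByVal totalBytes As Long, ByVal streamLength As Long)

    Public Sub New(ByVal inner As Stream)
        Me.inner = inner
    End Sub

    Public Overrides Function Read(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer) As Integer
        Dim n As Integer = inner.Read(buffer, offset, count)
        total += n
        If n > 0 Then RaiseEvent BytesRead(total, inner.Length)
        Return n
    End Function

    ' The remaining members forward to the inner stream or refuse (read-only, forward-only).
    Public Overrides ReadOnly Property CanRead() As Boolean
        Get
            Return inner.CanRead
        End Get
    End Property
    Public Overrides ReadOnly Property CanSeek() As Boolean
        Get
            Return False
        End Get
    End Property
    Public Overrides ReadOnly Property CanWrite() As Boolean
        Get
            Return False
        End Get
    End Property
    Public Overrides ReadOnly Property Length() As Long
        Get
            Return inner.Length
        End Get
    End Property
    Public Overrides Property Position() As Long
        Get
            Return inner.Position
        End Get
        Set(ByVal value As Long)
            Throw New NotSupportedException()
        End Set
    End Property
    Public Overrides Sub Flush()
        inner.Flush()
    End Sub
    Public Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
        Throw New NotSupportedException()
    End Function
    Public Overrides Sub SetLength(ByVal value As Long)
        Throw New NotSupportedException()
    End Sub
    Public Overrides Sub Write(ByVal buffer As Byte(), ByVal offset As Integer, ByVal count As Integer)
        Throw New NotSupportedException()
    End Sub
End Class

Module ProgressStreamDemo
    Sub ShowProgress(ByVal totalBytes As Long, ByVal streamLength As Long)
        If streamLength > 0 Then Console.WriteLine("{0}%", totalBytes * 100 \ streamLength)
    End Sub

    Sub Main()
        Dim data As Byte() = New Byte(99999) {}   ' 100,000 bytes of zeros
        Using ms As New MemoryStream(data)
            Dim ps As New ProgressStream(ms)
            AddHandler ps.BytesRead, AddressOf ShowProgress
            Dim hasher As New MD5CryptoServiceProvider()
            ' ComputeHash is unchanged; progress comes from the wrapper's Read calls.
            Console.WriteLine(BitConverter.ToString(hasher.ComputeHash(ps)))
        End Using
    End Sub
End Module
```

In the WaitForm, fileStream would be wrapped before calling ComputeHash, and the handler would marshal the percentage to the UI thread (for example with Invoke, as DisplayHash already does), since the event is raised on the hashing thread.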

 

:cool:

