Xtreme .Net Talk

Posted (edited)

Hi, I am really new to VB .NET 2003, to the point where I don't know much syntax yet, but I'm learning by working through some problems and looking at examples. Anyway, I was hoping somebody could help me by working through the full code for this with me:

 

I have a directory with many files. Some of these files are duplicates of each other, the only difference being the name of the file itself. I would like to loop through all the files and, if a file is unique, move it into a separate directory. So when done, the second directory will contain one copy of each file, without any duplicates.

 

There are a few other things I want to do, but any help getting me started would be great. For example, I want to rename every file placed into the unique directory with a sequential number, counting up from 1. So the first file would be 1.dat, the second 2.dat...

 

I know this is asking for a lot, feel free to limit your response.

 

Oh, I forgot, these files are binary files, NOT text files so a text compare will not work.

Edited by talahaski
  • *Experts*
Posted

The forum is not inactive; some people simply don't know the answer or don't notice a particular thread :).

 

You could generate hashes of the files using the cryptography classes, which make it easy to compute a hash from an IO stream. All you would then have to do is compare the results. This method is used a lot to check downloaded files for corruption (mostly by the open source community).

An example:

  'create a new object that will compute the hash for you
  Dim h As New Security.Cryptography.MD5CryptoServiceProvider
  'open the file as a stream; ComputeHash takes an IO stream as an argument, just what you need
  Dim fs As New IO.FileStream("path to the file", IO.FileMode.Open, IO.FileAccess.Read)
  'declare an array of bytes that will store the produced hash
  Dim res As Byte() = h.ComputeHash(fs)
  'close the stream so the file is not left open
  fs.Close()
  'convert the bytes to a hex string so you can easily compare them later
  '(decoding them as ASCII would mangle every byte above 127)
  MessageBox.Show(BitConverter.ToString(res))
  

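This should work for you: two different files are extremely unlikely to produce the same hash by accident. Collisions are possible in principle, though, and some people are working on producing them deliberately :), so if you need to be certain, compare the files byte by byte whenever their hashes match.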
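To make the two ideas above concrete, here is a minimal sketch rather than a definitive implementation. The names `HashSketch`, `FileHashHex` and `FilesAreIdentical` are made up for illustration, and `MD5.Create()` is used in place of `MD5CryptoServiceProvider` only because it is the more general factory call; either should give the same hash. It sticks to Try/Finally on the assumption that VB .NET 2003 has no `Using` statement.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module HashSketch
    ' Returns the MD5 hash of a file as a hex string such as "D4-1D-8C-...".
    Function FileHashHex(ByVal fileName As String) As String
        Dim hasher As MD5 = MD5.Create()
        Dim fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
        Try
            Return BitConverter.ToString(hasher.ComputeHash(fs))
        Finally
            fs.Close() ' always release the file handle, even on errors
        End Try
    End Function

    ' Byte-by-byte comparison, for confirming a match when two hashes are equal.
    Function FilesAreIdentical(ByVal fileA As String, ByVal fileB As String) As Boolean
        Dim a As FileStream = Nothing
        Dim b As FileStream = Nothing
        Try
            a = New FileStream(fileA, FileMode.Open, FileAccess.Read)
            b = New FileStream(fileB, FileMode.Open, FileAccess.Read)
            ' Files of different length can never be identical.
            If a.Length <> b.Length Then Return False
            Dim x As Integer
            Dim y As Integer
            Do
                x = a.ReadByte() ' returns -1 at end of file
                y = b.ReadByte()
                If x <> y Then Return False
            Loop While x <> -1
            Return True
        Finally
            If Not b Is Nothing Then b.Close()
            If Not a Is Nothing Then a.Close()
        End Try
    End Function
End Module
```

Comparing the hex strings from `FileHashHex` with `=` is the cheap first check; `FilesAreIdentical` is only worth calling for pairs whose hashes already match.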

  • 2 weeks later...
Posted

So what you're saying is that I would need to create this hash for my primary file, then loop through every file in the folder, create a hash for each of them, and compare the hash values?

 

'get primary file
Dim FolderToSearch As String = "c:\temp"
ofd.ShowDialog()
Dim PrimaryHash As New Security.Cryptography.MD5CryptoServiceProvider
Dim PrimaryHashBytes As Byte() = PrimaryHash.ComputeHash(New IO.FileStream(ofd.FileName, IO.FileMode.Open))

'Loop for each file in folder FolderToSearch -- Not sure yet how to do this loop
'(NextFileName stands for the current file in that loop)
Dim TempHash As New Security.Cryptography.MD5CryptoServiceProvider
Dim TempHashBytes As Byte() = TempHash.ComputeHash(New IO.FileStream(NextFileName, IO.FileMode.Open))

'byte arrays cannot be compared with =, so compare their hex strings instead
If BitConverter.ToString(PrimaryHashBytes) = BitConverter.ToString(TempHashBytes) Then
    MessageBox.Show("Files are the same")
End If

 

 

Does this appear correct? Can you help with the loop through the folder, please?

 

Also, what kind of clean-up do I need to perform? If the folder has a lot of files, I'm guessing that opening and hashing all of them is going to create a lot of overhead.
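One possible shape for the folder loop and the clean-up, as a sketch only. `UniqueFilesSketch`, `FileHashHex` and `CopyUniqueFiles` are hypothetical names, and the code copies rather than moves files so that nothing in the source folder is lost while testing; swapping `File.Copy` for `File.Move` would give the move behaviour from the first post. It assumes the target folder is empty and is not the source folder itself. `Directory.GetFiles` supplies the file names, a `Hashtable` remembers which hashes have already been seen, and the only clean-up is closing each stream as soon as its hash has been computed.

```vb
Imports System
Imports System.Collections
Imports System.IO
Imports System.Security.Cryptography

Module UniqueFilesSketch
    ' Returns the MD5 hash of a file as a hex string, closing the file afterwards.
    Function FileHashHex(ByVal fileName As String) As String
        Dim hasher As MD5 = MD5.Create()
        Dim fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
        Try
            Return BitConverter.ToString(hasher.ComputeHash(fs))
        Finally
            fs.Close() ' the clean-up: release the file handle right away
        End Try
    End Function

    ' Copies one file per distinct hash from sourceDir into targetDir,
    ' naming the copies 1.dat, 2.dat, ... Returns the number of unique files.
    Function CopyUniqueFiles(ByVal sourceDir As String, ByVal targetDir As String) As Integer
        Dim seen As New Hashtable() ' hash string -> first file found with that hash
        Dim count As Integer = 0
        Directory.CreateDirectory(targetDir)
        For Each fileName As String In Directory.GetFiles(sourceDir)
            Dim hash As String = FileHashHex(fileName)
            If Not seen.ContainsKey(hash) Then
                count += 1
                seen.Add(hash, fileName)
                ' File.Copy throws if the target name already exists.
                File.Copy(fileName, Path.Combine(targetDir, count.ToString() & ".dat"))
            End If
        Next
        Return count
    End Function
End Module
```

Each file is opened and read once, and only its short hash string is kept, so the cost is mostly disk reads rather than memory. A call such as `CopyUniqueFiles("c:\temp", "c:\temp_unique")` would be one way to try it; the second path is just an example.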
