talahaski Posted April 29, 2004 (edited April 29, 2004)

Hi, I am really new to VB .NET 2003, to the point where I don't know much syntax yet, but I'm learning by working through some problems and looking at examples. I was hoping somebody could help me by working through the full code for the following.

I have a directory with many files. Some of these files are duplicates of each other, the only difference being the name of the file itself. I would like to loop through all the files and, if a file is unique, move it into a separate directory. When done, the second directory will contain one copy of each file without any duplicates.

There are a few other things I want to do, but any help getting me started would be great. One of them is to rename every file placed into the unique directory with a sequential number, so the first file would be 1.dat, the second 2.dat, and so on. I know this is asking for a lot, so feel free to limit your response.

Oh, I forgot: these are binary files, NOT text files, so a text compare will not work.
talahaski Posted May 2, 2004

Anybody have any ideas that can help? It appears this forum is not very active.
*Experts* mutant Posted May 2, 2004

The forum is not inactive; some people simply don't know the answer or don't notice a particular thread :).

You could generate hashes of the files using some of the cryptography classes, which make it easy to compute a hash from an IO stream. All you would then have to do is compare the results. This method is used a lot to check downloaded files for corruption (mostly by the open source community). An example:

    'create a new object that will compute the hash for you
    Dim h As New Security.Cryptography.MD5CryptoServiceProvider
    'open the file as a stream; ComputeHash takes an IO stream as an argument, just what you need
    Dim fs As New IO.FileStream("path to the file", IO.FileMode.Open, IO.FileAccess.Read)
    'declare an array of bytes that will store the produced hash
    Dim res As Byte() = h.ComputeHash(fs)
    'close the stream so the file is not left open
    fs.Close()
    'convert the bytes to a hex string if you want, so you can easily compare them later
    '(decoding raw hash bytes as ASCII text would lose information)
    MessageBox.Show(BitConverter.ToString(res))

This should work for you: two different files are extremely unlikely to produce the same hash by accident. Collisions are possible in principle, though, and some people are trying to find them :).
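The hashing step described above could be wrapped in a small function so that each file yields one comparable string. This is only a sketch: the module name `HashSketch` and the function name `GetFileHash` are made up for illustration, and it uses `MD5.Create()` rather than constructing `MD5CryptoServiceProvider` directly.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module HashSketch
    ' Hypothetical helper (not from the thread): returns the MD5 hash of a
    ' file as a hex string, so two files can be compared with a plain "=".
    Function GetFileHash(ByVal filePath As String) As String
        Dim hasher As MD5 = MD5.Create()
        Dim fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        Try
            ' ComputeHash reads the whole stream and returns 16 bytes for MD5.
            Return BitConverter.ToString(hasher.ComputeHash(fs))
        Finally
            ' Close the stream even if hashing throws, so the file is released.
            fs.Close()
        End Try
    End Function
End Module
```

Two files could then be compared with something like `If GetFileHash(fileA) = GetFileHash(fileB) Then ...`, where `fileA` and `fileB` are placeholder paths.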
talahaski Posted May 14, 2004

So what you're saying is that I would need to create this hash for my primary file, then loop through every file in the folder, create a hash for each of them, and compare the hash values.

    'get primary file (ofd is an OpenFileDialog on the form)
    Dim FolderToSearch As String = "c:\temp"
    ofd.ShowDialog()
    Dim PrimaryHash As New Security.Cryptography.MD5CryptoServiceProvider
    Dim PrimaryHashBytes As Byte() = PrimaryHash.ComputeHash(New IO.FileStream(ofd.FileName, IO.FileMode.Open))
    'Loop for each file in folder FolderToSearch -- Not sure yet how to do this loop
    'NextFileName stands for the current file in that loop
    Dim TempHash As New Security.Cryptography.MD5CryptoServiceProvider
    Dim TempHashBytes As Byte() = TempHash.ComputeHash(New IO.FileStream(NextFileName, IO.FileMode.Open))
    'Byte arrays cannot be compared with "=", so compare their hex strings
    If BitConverter.ToString(PrimaryHashBytes) = BitConverter.ToString(TempHashBytes) Then
        MessageBox.Show("Files are the same")
    End If

Does this appear correct? Can you help with the loop through the folder, please?

Also, what kind of clean-up do I need to perform? If the folder has a lot of files, I'm guessing opening all these files and hashing them is going to create a lot of overhead.
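One possible shape for the folder loop and the clean-up, offered as a sketch rather than a definitive answer: hash each file once, remember the hashes already seen in a `Hashtable`, and copy only the first file for each hash into the second directory under a sequential name. The names `DedupeSketch`, `GetFileHash` and `CopyUniqueFiles` are hypothetical, and the sketch copies files instead of moving them so the source folder is left untouched; swapping `File.Copy` for `File.Move` would match the original request more literally.

```vbnet
Imports System
Imports System.Collections
Imports System.IO
Imports System.Security.Cryptography

Module DedupeSketch
    ' Hypothetical helper: MD5 of a file as a hex string; the stream is always closed.
    Function GetFileHash(ByVal filePath As String) As String
        Dim hasher As MD5 = MD5.Create()
        Dim fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        Try
            Return BitConverter.ToString(hasher.ComputeHash(fs))
        Finally
            fs.Close()
        End Try
    End Function

    ' Copies one file per distinct hash from sourceDir into uniqueDir,
    ' naming the copies 1.dat, 2.dat, ... Returns the number of unique files.
    Function CopyUniqueFiles(ByVal sourceDir As String, ByVal uniqueDir As String) As Integer
        Directory.CreateDirectory(uniqueDir)
        Dim seen As New Hashtable()
        Dim counter As Integer = 0
        ' GetFiles returns only the files directly inside sourceDir.
        For Each fileName As String In Directory.GetFiles(sourceDir)
            Dim hash As String = GetFileHash(fileName)
            If Not seen.ContainsKey(hash) Then
                ' First time this content is seen: remember it and copy it.
                seen.Add(hash, fileName)
                counter += 1
                File.Copy(fileName, Path.Combine(uniqueDir, counter.ToString() & ".dat"))
            End If
        Next
        Return counter
    End Function
End Module
```

A call such as `CopyUniqueFiles("c:\temp", "c:\temp\unique")` (the second path is an assumed example) reads every file once and closes it immediately after hashing, so the only clean-up needed is the `fs.Close()` in the `Finally` block. Note that `File.Copy` throws if the target name already exists, so the unique directory is assumed to start out empty.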