Identical Compile

neodammer

I was wondering whether it is realistic to believe that every time you compile a program, it comes out with the exact same data in the exact same order, put together in the exact same way. For example, are two programs compiled from the same code 100% identical to each other? I never really gave it much thought till now. I see checksums, but really I'm still guessing.
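One way to stop guessing is to hash both build outputs and, if the hashes differ, find where they first diverge. Below is a minimal sketch of that check. The function names `file_digest` and `first_difference` are made up for this example, and any paths you pass in are whatever your own two builds produced.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    for offset, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            return offset
    if len(data_a) != len(data_b):
        # One file is a strict prefix of the other.
        return min(len(data_a), len(data_b))
    return None
```

Equal digests mean the two files are byte-for-byte identical for all practical purposes. If they differ, the offset from `first_difference` gives a starting point for looking at the bytes in a hex viewer.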
 
Is this for curiosity, or for a "real" reason? I wouldn't count on the two being equal even if you compile and rename the DLL and compile again - I've seen that produce two different binaries. Why? No idea... but it happens.

-ner
 
It depends on the compiler. It's been my experience that they are usually identical, all other things being equal. .NET may be an exception with its ample use of GUIDs.
 
The main reason I wonder is basically just curiosity and statistical analysis. Given that even the random function isn't 100% random, I wondered how closely two compiles of the exact same code would match each other. Does the processor handle both EXEs the exact same way, with each register doing the same thing at the same given time? Curiously, I have been wondering: if two compiles are not 100% alike, would that make things easier or harder for security purposes? Interesting. I think some compilers, like you stated, are probably more exact than others.
 
Since there's nothing random about a compiler (aside from GUIDs), you can bet things will be the same. Would different binaries help security? Yes, but that's not very feasible if you plan on wide distribution.
 
My friends and I have tried to develop a very weak and simple program that takes input code and tries to spit it back out in a different but functionally identical form. It works with basic things so far, but nothing remarkable. I think it will be a breakthrough in security if you can, like you said, make a program's binary differ and still keep compatibility. VB.NET so far is the easiest to manipulate because it has the easiest code of the .NET languages, syntax-wise (in my opinion).

This leads to the idea of creating a type of compression much like today's compression methods, but instead of taking the random hex patterns and just applying them to uncompress, it would try to rearrange them while still carrying out the functions that the part that was cut out was intended to perform. Compress it, and somewhere in the compression field I suppose you could have the key for the compression: a unique compression algorithm for each program you compress. Easier said than done; so far it has only worked with simple programs like Hello World. Hello World programs can be manipulated to the extremes :p
 
I would expect the binaries to be different. If you create two identical applications on two identical machines, they are still going to have different GUIDs. They might be the same size, but they won't look the same. I'm sure that the order of most of the binary code would be the same though. Why wouldn't it be? The compiler looks at the code in a certain way, and I doubt that it looks at it differently with each compile.
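The "same size, different bytes" point can be shown with a toy model. This is only a sketch: `fake_build` is a hypothetical stand-in for a compiler, and the layout (a 16-byte GUID followed by the code bytes) is an assumption made for illustration, not how any real compiler arranges its output.

```python
import uuid


def fake_build(code_bytes):
    """Hypothetical 'compiler': stamp the same code with a fresh GUID."""
    build_id = uuid.uuid4().bytes  # 16 random bytes, new on every call
    return build_id + code_bytes


code = b"\x90\x90\xc3"  # placeholder bytes standing in for compiled code

first = fake_build(code)
second = fake_build(code)

same_size = len(first) == len(second)    # both outputs have equal length
same_bytes = first == second             # whole files compared byte for byte
same_code = first[16:] == second[16:]    # everything after the GUID
```

Here `same_size` and `same_code` come out true while `same_bytes` comes out false: the part produced from the source is stable, and only the embedded identifier changes between builds.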
 
I've messed with this stuff a little bit. There was a small business innovative research solicitation out that asked for something that would do this (and I think a similar one this round that wants a solution for Ada). The whole concept is flawed in my opinion...attempting to stop the all-knowing attacker by preventing a "diff-attack" makes little sense.

Obfuscation does quite a lot, but two programs will still have the same control flow, and I think that is what neodammer wants to manipulate in order to alter the binary signature, yes? This is hardly the weakest link in preventing security holes in software. And deployment would be a wreck. Think about all the testing you would have to do on every single binary you create.

If you are really interested, you should check out the DoD SBIR program. If you find one that matches this and you have some good ideas, you can write a proposal and possibly receive a grant for $50,000 for 6 months' work, the right to a phase II proposal (if you win, up to $750,000 over two years) and then you get the rights to any commercial products you create -- the soliciting agency gets free rights. It's a cool program with a lot of potential.
 
Bad:

1. Testing every compile would take forever. (Hopefully we can eliminate the need for testing every compile with an algorithm we can trust. :eek: )

2. Knowing that control flow is the same regardless of the binary difference, security will not be enhanced greatly. (Perhaps it would slow down key-gens etc.. )


Good:

1. Creation of key-gens etc. should be slowed
2. Could develop into a new way to distribute BIOSes (yet again, hopefully with an algorithm that can be trusted to do so :rolleyes: otherwise it would be chaos)
3. A way for Microsoft to release new software that doesn't do anything different for more $ :p I can see the ads now.. lol


I have thought of many ways to approach this situation.

1. A program that uses the algorithm and rearranges the binary of another program/file etc. Poses the problem of creating an omni program :rolleyes:

2. A control or library for the algorithm that programs could use for different binary compiles. We still need time to research how VB.NET (our testing compiler) actually takes code and what it does with it at each stage rofl did that make sense? It's very hard to try to fool the compiler; I don't know if it can even be accomplished.

3. Have an in-memory program that reads what the compiler is doing when it compiles the program, copies it, rearranges the data, and outputs its own file using the algorithm to randomize etc.. Crazy idea involving a lot of bug-prone areas (yes, that's what we need in this project, more random bugs lol)

4. Tried thinking of a way to have the program itself, at runtime, change its own code, or an external file it depends on, at the binary level. Not a bad idea, but it causes a lot of confusion and I don't think it's possible to change binary data in the RAM. I believe that's write-over material and cannot be changed "directly". Kinda like changing a text file on a HD; it cannot be done in RAM. Or can it?
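On the "or can it?" in item 4: code that is already loaded can be changed while the program runs, at least in some environments. The sketch below shows an interpreter-level analogue in Python, where a function swaps its own compiled code object for another one. It is not native machine-code patching (there, the OS normally maps code pages read-only and the program has to ask for write access first), and the names `greet`, `_replacement`, and `patch_self` are invented for this example.

```python
def greet():
    """The function as originally loaded."""
    return "original"


def _replacement():
    """Donor function whose compiled code will be swapped in."""
    return "patched"


def patch_self():
    """Replace greet's in-memory code object while the program is running."""
    greet.__code__ = _replacement.__code__


before = greet()   # runs the original code
patch_self()
after = greet()    # same function object, now running the replaced code
```

Nothing on disk changes here. Only the copy of the code held in memory is replaced, so the next run of the script starts from the original again.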
 
Obfuscators (the good ones) DO change the control flow. In fact, you have to specify which functions to skip because it slows performance on your tightly looped functions.

I do think that until we're using built-in processor encryption, anything done to curb cracking is viewed as a call to a challenge more than a hindrance. The point of obfuscation is to keep your code closed-source, not to protect it from cracks.
 
Some programs (I'm not sure whether all do) have useless binary data in them: data that could be pulled out and the program would still function properly. It would be easier to code an algorithm for binary diffs if you could send commands directly to the processor. Binary code, I suppose, is the closest thing, but who programs in binary? lol


You'd think it was possible to give commands directly to the processor :p

I agree, Ingis, that the more security a program or set of programs has, the more it shouts "crack me." But I guess I can hope to curb some folks with new security methods. Granted, elite crackers (what irony :D ) will crack whatever it is eventually.
 
Not that I'm approving of or encouraging anything, but you should do some research on polymorphic and metamorphic viruses. This sort of stuff is extremely useful for a virus maker because the virus (in theory) will have a different signature each time it mutates, making it impossible for anti-virus software to find it and eliminate it. In actuality, this is nearly (completely?) impossible because the entry point for the binary needs to remain the same -- otherwise the OS will have no idea how to run the virus.

It's pretty easy to find sample code and tutorials for some of the more famous viruses that use these techniques (there's a handful of extremely smart folks in that community that innovate and publish). You'll have to brush up on your assembly language skills and l33t speak. It's almost impossible to read some of the "real hackers'" publications because they are all written like pure pwnage...entire papers written in l33t. It's amazing.


IngisKahn said:
Obfuscators (the good ones) DO change the control flow.
I didn't realize that, but now that you say it, I think I did know that. Unfortunately, most of the obfuscators that I have played with only do simple string substitutions. Sometimes you get so tangled up in trying to think of new or different things that you forget where you come from and what is already available. I was thinking of randomly scrambling assembly using a variety of means -- jumping to different parts of the assembly in a rearranged way, loop unrolling/rerolling, using extra operations, etc.
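The "extra operations" idea can be tried out at small scale. The following is a rough sketch using Python bytecode as a stand-in for assembly, which is an assumption made purely for illustration: two hand-written variants of the same function are compiled, their compiled code is hashed, and their behaviour is compared. The names `VARIANT_A`, `VARIANT_B`, `build`, and `signature` are invented for this example.

```python
import hashlib

# Two sources that compute the same result through different instructions.
VARIANT_A = "def double(x):\n    return x * 2\n"
VARIANT_B = "def double(x):\n    junk = x - x\n    return x + x + junk\n"


def build(source):
    """Compile a source string and return the function named 'double'."""
    namespace = {}
    exec(compile(source, "<variant>", "exec"), namespace)
    return namespace["double"]


def signature(func):
    """Hash the raw bytecode of a function, as a crude 'binary signature'."""
    return hashlib.sha256(func.__code__.co_code).hexdigest()


double_a = build(VARIANT_A)
double_b = build(VARIANT_B)

same_behaviour = all(double_a(n) == double_b(n) for n in range(-5, 6))
same_signature = signature(double_a) == signature(double_b)
```

For integer inputs the two variants return the same values while their bytecode hashes differ. The check only covers the inputs it tries, which is the testing burden raised earlier in the thread: every scrambled variant has to be shown equivalent somehow.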
 
Having one program observe the flow of another in an attempt to snare and destroy it will cause chaos. Note that I said the flow, not the actual binary, because as you said, that can change via morphing code. You can use a simple if statement at the beginning of any program to change the binary in RAM at any given time, but changing it on file is something stranger. I must research this material carefully. Thank you for the information.
 