Alright, let's jump straight into the subject. If you want some background on what we are talking about, please read my earlier post: Crack document password – recover word document password
This article is for learning purposes only; it shows the vulnerability of the legacy 40-bit RC4 encryption used in documents.
As explained in my previous hub, we will brute force the encryption key instead of the password, since that is the easiest feasible approach. So we need to test each possible key in the key space against the verifier data stored in the RC4 encryption header of the document (Word/Excel).
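Why is the key the better target? Whatever the password is, it ends up as just 40 bits of key material. The Python sketch below (Python is used only for brevity) shows the password-to-key derivation as I read the MS-OFFCRYPTO description of non-CryptoAPI RC4 encryption. The function names derive_40bit_key and block_key are my own, so check the details against the specification before relying on them.

```python
import hashlib


def derive_40bit_key(password: str, salt: bytes) -> bytes:
    """Turn a password into 5 bytes (40 bits) of key material.

    Sketch of the non-CryptoAPI RC4 key derivation described in MS-OFFCRYPTO.
    """
    # H0: MD5 of the password encoded as UTF-16LE (no terminator).
    h0 = hashlib.md5(password.encode("utf-16-le")).digest()
    # 16 repetitions of (first 5 bytes of H0 + 16-byte salt) = 336 bytes.
    intermediate = (h0[:5] + salt) * 16
    h1 = hashlib.md5(intermediate).digest()
    # Only the first 40 bits of H1 survive; this is what we brute force.
    return h1[:5]


def block_key(key40: bytes, block: int) -> bytes:
    """RC4 key for one block: MD5(key40 || 32-bit little-endian block number)."""
    return hashlib.md5(key40 + block.to_bytes(4, "little")).digest()
```

Because only `h1[:5]` reaches the cipher, a long password gives no more protection than a short one: trying all 2^40 values of that 5-byte output covers every possible password.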
RC4 Encryption Header
Now let's look at the RC4 header structure and see what is stored there (source: MSDN).
EncryptionVersionInfo (4 bytes): Version information of the encryption. It has two 2-byte parts, version major (vMajor) and version minor (vMinor), and both must be 1 (0x0001), which tells us this is RC4 encryption.
Salt (16 bytes): A randomly generated array of bytes used as the salt when the password is hashed into the encryption key.
EncryptedVerifier (16 bytes): An additional, randomly generated 16-byte verifier, encrypted using a 40-bit RC4 cipher.
EncryptedVerifierHash (16 bytes): The MD5 hash of the verifier used to generate the EncryptedVerifier field, also encrypted with the 40-bit RC4 cipher.
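Simply put, we use these fields to test each candidate key in the key space (brute forcing). For a candidate key, we derive the RC4 key, decrypt EncryptedVerifier and EncryptedVerifierHash, and compute the MD5 hash of the decrypted verifier. If that hash equals the decrypted verifier hash, the candidate is our actual key and can be used to decrypt the document content. (The Salt is only needed when testing passwords, because it is mixed in while a password is turned into the 40-bit key; it is not needed when brute forcing the key directly.)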
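A minimal Python sketch of that check, assuming the MS-OFFCRYPTO layout described above: the block-0 RC4 key is MD5 of the 5-byte candidate followed by four zero bytes, and one continuous RC4 keystream decrypts the verifier and then its hash. rc4_crypt and key_matches are hypothetical helper names, and the code is illustrative rather than optimized.

```python
import hashlib


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def key_matches(key40: bytes, encrypted_verifier: bytes,
                encrypted_verifier_hash: bytes) -> bool:
    """True if the 5-byte candidate key decrypts the verifier consistently."""
    # Block 0 key: MD5(key40 || 00 00 00 00). The Salt is not needed here.
    rc4_key = hashlib.md5(key40 + bytes(4)).digest()
    # One continuous RC4 keystream covers the verifier and then its hash.
    plain = rc4_crypt(rc4_key, encrypted_verifier + encrypted_verifier_hash)
    verifier, verifier_hash = plain[:16], plain[16:]
    return hashlib.md5(verifier).digest() == verifier_hash
```

A wrong key produces two unrelated 16-byte values, so an accidental match is astronomically unlikely; one match is enough to stop the search.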
How to read the document header?
Microsoft Word and Excel files are compound (OLE) documents: a single file contains several sections (streams), and each one carries a different type of information. So the RC4 header is stored in one stream, the encrypted content in another, and so on (this container format is called OLE structured storage).
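One quick way to confirm that a file really is a compound (OLE) document is its fixed 8-byte signature. The header also records the sector size; streams are stored in chains of such sectors that need not be contiguous, which is one reason an OLE API beats seeking to a fixed offset. Here is a Python sketch based on the published compound file header layout ([MS-CFB]); compound_file_info is a hypothetical helper, and it only looks at the first 512 bytes of the file.

```python
import struct

# Fixed signature at offset 0 of every compound (OLE) file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def compound_file_info(header: bytes) -> dict:
    """Read a few fields from the 512-byte compound file header."""
    if header[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound (OLE) file")
    # Offsets 0x18..0x21: minor version, major version, byte order mark,
    # sector shift, mini sector shift (all little-endian uint16).
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,     # 512 for version 3, 4096 for version 4
        "mini_sector_size": 1 << mini_shift,  # normally 64
    }
```

Typical use is `compound_file_info(open(path, "rb").read(512))` before handing the file to a real OLE reader.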
It is a good idea to read the file through an OLE API, so that we can open the stream holding the RC4 header directly instead of searching and seeking through the file to find it.
Each stream has a unique name that can be used to access it. In a Word document, the RC4 header sits at the start of the table stream, “1Table” (a document can use “0Table” instead; the fWhichTblStm flag in the FIB at the start of the “WordDocument” stream says which one). In our code we open this stream through OLE by its name.
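Which table stream holds the header can be read from the FIB at the start of the “WordDocument” stream. This Python sketch assumes the FibBase layout from the MS-DOC specification: wIdent 0xA5EC at offset 0, and a 16-bit flag word at offset 0x0A with fEncrypted = 0x0100, fWhichTblStm = 0x0200 and fObfuscated = 0x8000. table_stream_info is a hypothetical helper, and you still need an OLE reader to get the stream bytes in the first place.

```python
import struct


def table_stream_info(word_document: bytes) -> dict:
    """Inspect the FibBase at the start of the WordDocument stream."""
    w_ident, _n_fib = struct.unpack_from("<HH", word_document, 0)
    if w_ident != 0xA5EC:
        raise ValueError("WordDocument stream does not start with a Word FIB")
    # 16 bits of flags at offset 0x0A of FibBase.
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)
    return {
        "table_stream": "1Table" if flags & 0x0200 else "0Table",  # fWhichTblStm
        "encrypted": bool(flags & 0x0100),                        # fEncrypted
        # fObfuscated is only meaningful when fEncrypted is set.
        "xor_obfuscated": bool(flags & 0x8000),
    }
```

If `xor_obfuscated` comes back true, the document uses XOR obfuscation rather than RC4, and the header described in this article will not be there.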
For .NET programming, we can use the interop services in System.Runtime.InteropServices to call the Win32 structured storage API in “ole32.dll”. If you are comfortable with another OLE implementation, that is fine too; the choice is yours. And if this is not just for testing and you want to develop something robust, I suggest C or C++, perhaps with Visual C++ .NET.
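```csharp
// Only this fragment survived; the function name is an assumption and the
// parameter list was elided in the original post.
[DllImport("ole32.dll", CharSet = CharSet.Unicode)]
static extern int StgOpenStorage(/* -- parameters -- */);
```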
Once we have read the content of the “1table” stream, we take its first 52 bytes, which hold all the details we need for the brute force.
The first 4 bytes hold the version major and version minor. As mentioned above, both must be 1 (0x0001) to confirm that we have the right kind of encryption header.
The next 16 bytes are the Salt.
The next 16 bytes are the EncryptedVerifier, and
the last 16 bytes are the EncryptedVerifierHash.
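The layout above maps directly onto a 52-byte structure. Here is a Python sketch, assuming the table stream bytes have already been read through OLE; RC4EncryptionHeader and parse_rc4_header are hypothetical names. The version check rejects anything other than 1.1, since CryptoAPI RC4 headers use other version numbers.

```python
import struct
from typing import NamedTuple


class RC4EncryptionHeader(NamedTuple):
    """The 52-byte RC4 encryption header at the start of the table stream."""
    v_major: int
    v_minor: int
    salt: bytes
    encrypted_verifier: bytes
    encrypted_verifier_hash: bytes


def parse_rc4_header(table_stream: bytes) -> RC4EncryptionHeader:
    if len(table_stream) < 52:
        raise ValueError("stream too short for an RC4 encryption header")
    # Bytes 0-3: vMajor and vMinor, two little-endian 16-bit integers.
    v_major, v_minor = struct.unpack_from("<HH", table_stream, 0)
    if (v_major, v_minor) != (1, 1):
        raise ValueError(f"not an RC4 header (version {v_major}.{v_minor})")
    return RC4EncryptionHeader(
        v_major,
        v_minor,
        salt=table_stream[4:20],                      # bytes 4-19
        encrypted_verifier=table_stream[20:36],       # bytes 20-35
        encrypted_verifier_hash=table_stream[36:52],  # bytes 36-51
    )
```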
Well, now we have all the information required to brute force the key. We use these details to decrypt and check the verifier for each key in the key space.
So there are mainly two things to do:
1. Write an algorithm that produces every key in the key space. You may search the net for a piece of code that outputs all the keys of a 40-bit key space one by one (2^40, about 1.1 trillion keys), or you can write your own; it is just a loop, yes, our usual “for (int i = 0; …)” stuff.
2. Write the code that, for each candidate key, decrypts the verifier data from the header (EncryptedVerifier and EncryptedVerifierHash) so that the key can be validated. I have given a link below to some sample code; go through it and try it yourself. My time is limited now; when I get time, I will probably write fully optimized code to test this and add a link here.
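The two tasks fit together in a short loop. This Python sketch enumerates the 2^40 key space as 5-byte little-endian integers (the byte order is an arbitrary choice; any consistent mapping covers the same space) and applies the verifier check to each candidate, assuming the MS-OFFCRYPTO block-0 key derivation. iter_keys, brute_force and the start/stop range parameters are hypothetical, and pure Python is far too slow for the whole space, so treat it as a description of the logic rather than a usable tool.

```python
import hashlib
from typing import Iterator, Optional

KEYSPACE = 1 << 40  # 1,099,511,627,776 possible 40-bit keys


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def iter_keys(start: int = 0, stop: int = KEYSPACE) -> Iterator[bytes]:
    """Yield 5-byte candidate keys for the integer range [start, stop)."""
    for n in range(start, stop):
        yield n.to_bytes(5, "little")


def brute_force(encrypted_verifier: bytes, encrypted_verifier_hash: bytes,
                start: int = 0, stop: int = KEYSPACE) -> Optional[bytes]:
    """Return the first key in [start, stop) that validates, or None."""
    blob = encrypted_verifier + encrypted_verifier_hash
    for key40 in iter_keys(start, stop):
        # Block 0 RC4 key, then decrypt verifier + verifier hash in one stream.
        plain = rc4_crypt(hashlib.md5(key40 + bytes(4)).digest(), blob)
        if hashlib.md5(plain[:16]).digest() == plain[16:]:
            return key40
    return None
```

The start/stop parameters exist so the range can be split into chunks, one per process or machine; serious tools do the same work in optimized native or GPU code.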
Then, for each key, check whether the MD5 hash of the decrypted verifier matches the decrypted verifier hash. If it does, we have found the key to decrypt the document content. Use an RC4 decryption algorithm with that key to decrypt the content, and save the changes once it is decrypted. Our document should now be unprotected; enjoy.
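Decrypting the content reuses the same key derivation, but as I understand MS-OFFCRYPTO the data is not one long RC4 stream: it is processed in 512-byte blocks, and each block gets a fresh key, MD5(key40 || block number). A Python sketch under that assumption; decrypt_stream is a hypothetical name. Word stores some bytes unencrypted (for example the start of the FIB and the encryption header itself), so check MS-DOC for those ranges before writing anything back.

```python
import hashlib


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA); the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def decrypt_stream(key40: bytes, data: bytes, block_size: int = 512) -> bytes:
    """RC4-decrypt a stream, re-keying at the start of every block."""
    out = bytearray()
    for block, offset in enumerate(range(0, len(data), block_size)):
        # Fresh RC4 state per block: MD5(key40 || little-endian block number).
        rc4_key = hashlib.md5(key40 + block.to_bytes(4, "little")).digest()
        out += rc4_crypt(rc4_key, data[offset:offset + block_size])
    return bytes(out)
```

Since RC4 is symmetric, running `decrypt_stream` twice with the same key returns the original bytes, which is a handy sanity check.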
Here is the link to the sample source code. Note that this code accesses the Word file through a direct file stream (File.OpenRead) rather than through OLE. When I tried it, the code failed to show me the RC4 encryption header details, most likely because a stream inside a compound file is stored in sectors that need not be contiguous, so its bytes are not simply at a fixed file offset. After some searching on the net, I changed the file reading to OLE, read the “1table” stream, and it worked well. I also had to make some minor changes. So test it yourself and learn; it is interesting (to me at least ;-))
And final words: there are tools called guaword and guaexcel that do all of this. You can download demo versions of them, and the beta version is free. But no source, obviously!
I am not very sure, but I think older secured PDF documents also use the same method to protect the document. Do you know how to remove PDF security? Can we use the same technique?
(First image courtesy: “Stuart Miles” / FreeDigitalPhotos.net)