While reverse engineering an archive or an unknown file, you may notice that it uses compression, either from some parameters found in the index table or from its "scrambled" content:
Usually there are some tricks to recognize a known compression algorithm: for example, zlib data usually starts with 0x78, lzma with 0x5d followed by some zeroes, while lzss and lzo leave parts of the uncompressed content visible, and so on.
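The first two tricks can be sketched in a few lines of Python. This is only a rough heuristic and not part of QuickBMS: the function name guess_compression is made up for this example, and the extra zlib check (the first two bytes, read as a big-endian number, must be a multiple of 31) comes from the zlib header format rather than from the text above.

```python
def guess_compression(data: bytes) -> str:
    """Return a rough guess of the compression format from the first bytes."""
    # zlib: the first byte is usually 0x78, and the two header bytes read as
    # a big-endian 16-bit number must be a multiple of 31.
    if len(data) >= 2 and data[0] == 0x78 and ((data[0] << 8) | data[1]) % 31 == 0:
        return "zlib"
    # lzma: properties byte 0x5d, followed by a 4-byte dictionary size that
    # usually contains several zero bytes (assumed here: at least two).
    if len(data) >= 5 and data[0] == 0x5D and data[1:5].count(0) >= 2:
        return "lzma"
    # lzss and lzo have no fixed signature, so they are not detected here.
    return "unknown"
```

Feeding it the first bytes of dump.dat gives a quick hint before running a full scan. A result of "unknown" does not mean the data is uncompressed.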
But if we don't know the algorithm, want to be sure of its name, or want to find which result is closest to the original uncompressed file, we need the following script and bat file:
http://aluigi.org/papers/bms/comtype_scan2.bat
http://aluigi.org/papers/bms/comtype_scan2.bms
This is the situation in our folder, where dump.dat is our compressed file:
And this is the runtime help of comtype_scan2.bat:
Let's enter this command line to start the scan:
comtype_scan2.bat comtype_scan2.bms dump.dat output
Please note that if we already know the uncompressed size, it's HIGHLY recommended to add it to the command line, as in this example:
comtype_scan2.bat comtype_scan2.bms dump.dat output 0x7cf
During the scan QuickBMS will show a lot of messages and errors.
That's perfectly normal.
Usually you will notice that it freezes like in this case:
No problem, press CTRL-C and type 'n' when asked whether to terminate the batch job, so that the scan moves on to the next algorithm:
Finally we reach the end of the scanning:
The next step is the manual checking of the results dumped in the output folder.
There are some ways to automate this process, but the simplest is to sort the files by size in descending order:
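If you prefer a script to the file manager, the same ordering can be sketched in Python. The function name list_by_size is invented for this example, and skipping empty files is my own assumption (a zero-byte dump cannot be a valid result).

```python
import os

def list_by_size(folder):
    """Return (size, name) pairs for the files in folder, biggest first."""
    entries = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        if size > 0:  # assumption: empty dumps are failed attempts
            entries.append((size, name))
    # Sort by descending size, then by name to keep the order stable.
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return entries
```

Calling list_by_size("output") after the scan returns the candidates in the same order you would check them by hand.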
And then open them one-by-one with a hex editor:
That 8.dmp seems to contain valid PNG data, let's try to open it with an image viewer:
Bingo, that's the correct algorithm.
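The hex editor check can be partly automated when the original file is a well-known format. The following Python sketch looks for a few common signatures at the start of every dumped file. The signature list is only a small sample chosen for this example, and the names identify and scan_folder are hypothetical.

```python
import os

# A few well-known file signatures (sample list, extend it as needed).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"RIFF": "RIFF container (WAV, AVI, ...)",
    b"OggS": "Ogg stream",
    b"DDS ": "DirectDraw Surface texture",
}

def identify(data):
    """Return the label of the first signature matching data, or None."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return None

def scan_folder(folder):
    """Map each file name in folder to the known format it starts with."""
    hits = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            head = f.read(16)  # longest signature above is 8 bytes
        label = identify(head)
        if label is not None:
            hits[name] = label
    return hits
```

In the example above, scan_folder("output") would be expected to report 8.dmp as a PNG image. A hit is only a hint: some algorithms copy the first bytes unchanged, so it's still worth opening the file with a viewer.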
Now open the defs.h text file in the QuickBMS source code (src folder in quickbms.zip) and check which algorithm corresponds to number 8:
Yeah, the algorithm is lzo1x.
Don't think that it's always so easy to find the correct algorithm: sometimes you don't know the name of the file, and its content is a custom format or raw audio/image data.