ZenHAX

Free Game Research Forum | Official QuickBMS support | twitter @zenhax | SSL HTTPS://zenhax.com

All times are UTC




PostPosted: Thu Aug 07, 2014 3:23 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
During the reverse engineering of an archive or an unknown file, you may notice that it uses compression, either because of some parameters in the index table or because its content looks "scrambled".


There are usually some tricks for recognizing a known compression algorithm: for example, zlib data starts with 0x78, lzma starts with 0x5d followed by some zeroes, lzss and lzo often show fragments of the uncompressed content, and so on.
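
The first-bytes tricks above can be sketched in Python. This is only a heuristic. It assumes zlib streams with the usual 0x78 header and lzma data in the "alone" (.lzma) format with a typical dictionary size. The function name guess_compression is hypothetical, and a match does not prove the algorithm.

```python
import lzma
import zlib


def guess_compression(data: bytes) -> str:
    """Guess from the first bytes only: a hint, never a proof."""
    if len(data) >= 2 and data[0] == 0x78 and ((data[0] << 8) | data[1]) % 31 == 0:
        # zlib header: CMF 0x78 (deflate, 32 KiB window) followed by an
        # FLG byte that makes CMF * 256 + FLG a multiple of 31
        return "zlib"
    if len(data) >= 3 and data[0] == 0x5D and data[1:3] == b"\x00\x00":
        # lzma "alone" header: properties byte 0x5d (lc=3, lp=0, pb=2), then a
        # little-endian dictionary size whose low bytes are usually zero
        return "lzma"
    return "unknown"


sample = b"some repetitive text " * 20
for blob in (zlib.compress(sample),
             lzma.compress(sample, format=lzma.FORMAT_ALONE),
             sample):
    print(guess_compression(blob))
```

lzss and lzo have no fixed header, so this kind of check cannot identify them. For those algorithms, a hex editor is still the quickest way to spot readable fragments.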

But if we don't know the algorithm, want to confirm its name, or want to find the result closest to the original uncompressed file, we can use the following script and bat file:
http://aluigi.org/papers/bms/comtype_scan2.bat
http://aluigi.org/papers/bms/comtype_scan2.bms
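
Conceptually, the scanner tries every supported algorithm on the entire file and dumps whatever comes out. The sketch below reproduces that idea in Python with only a handful of standard-library decompressors (zlib, raw deflate, gzip, bzip2, lzma). It is not how comtype_scan2.bms is implemented. The candidate list, its numbering and the scan() name are assumptions for illustration.

```python
import bz2
import lzma
import os
import zlib

# Hypothetical candidate table: a few standard-library codecs, numbered in
# the order listed (QuickBMS tries far more and numbers them differently).
CANDIDATES = [
    ("zlib", zlib.decompress),
    ("deflate", lambda d: zlib.decompress(d, -15)),  # raw deflate, no header
    ("gzip", lambda d: zlib.decompress(d, 31)),      # 16 + 15: gzip wrapper
    ("bzip2", bz2.decompress),
    ("lzma", lzma.decompress),                       # .xz or .lzma ("alone")
]


def scan(path: str, out_dir: str) -> list:
    """Try every candidate on the WHOLE file, dumping each result as N.dmp."""
    with open(path, "rb") as f:
        data = f.read()
    os.makedirs(out_dir, exist_ok=True)
    hits = []
    for index, (name, decompress) in enumerate(CANDIDATES, start=1):
        try:
            result = decompress(data)
        except Exception:
            continue  # wrong algorithms raising errors is the normal case
        with open(os.path.join(out_dir, f"{index}.dmp"), "wb") as f:
            f.write(result)
        hits.append((index, name))
    return hits
```

For example, scan("dump.dat", "output") returns an (index, name) pair for every codec that did not raise an error. Each of those codecs leaves a numbered dump in the output folder.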

Our working folder contains dump.dat, the compressed file we want to analyze, together with comtype_scan2.bat and comtype_scan2.bms.


The runtime help of comtype_scan2.bat lists its arguments: the .bms script, the compressed file, the output folder and an optional size.


To start the scan, run this command line:
Code:
comtype_scan2.bat comtype_scan2.bms dump.dat output

Note that if we already know the uncompressed size, it's HIGHLY recommended to add it to the command line, as in this example:
Code:
comtype_scan2.bat comtype_scan2.bms dump.dat output 0x7cf
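
Why does a known size help? The sketch below illustrates the general principle using zlib only and a hypothetical decompress_exact() helper. The output is capped just above the expected size, and a candidate is accepted only if the stream ends exactly there with no leftover input. This is an assumption about the principle, not QuickBMS's internal logic.

```python
import zlib
from typing import Optional


def decompress_exact(data: bytes, expected_size: int) -> Optional[bytes]:
    """Accept a zlib result only if it is exactly expected_size bytes long."""
    d = zlib.decompressobj()
    try:
        # Cap the output one byte above the expected size, so an output that
        # is too long shows up as expected_size + 1 bytes.
        out = d.decompress(data, expected_size + 1)
    except zlib.error:
        return None  # corrupt or not zlib at all
    if len(out) != expected_size:
        return None  # too short or too long: wrong size or wrong algorithm
    if not d.eof or d.unused_data:
        return None  # stream truncated, or extra bytes after the stream
    return out
```

A truncated output is rejected here rather than silently kept, which is the kind of false positive that a known size helps to rule out.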


During the scan QuickBMS will show a lot of messages and errors.
That's perfectly normal.
You will also often notice that it seems to freeze on some algorithm.


No problem: press CTRL-C and type 'n' (do not terminate the batch job) so the scan goes on.


Eventually the scan reaches the end.


The next step is to manually check the results dumped in the output folder.
There are ways to automate this process, but the simplest is to sort the files by size in descending order.
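
The sorting step can also be scripted. Here is a small Python sketch that lists the dumps biggest first. The helper name dumps_by_size is an assumption.

```python
import os


def dumps_by_size(out_dir: str) -> list:
    """Return (filename, size) pairs for the dumps, biggest first."""
    entries = []
    for name in os.listdir(out_dir):
        path = os.path.join(out_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    # Descending size; equal sizes fall back to name order for a stable list.
    entries.sort(key=lambda entry: (-entry[1], entry[0]))
    return entries
```

Usage: `for name, size in dumps_by_size("output"): print(size, name)`.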


Then open them one by one with a hex editor.


The file 8.dmp seems to contain valid PNG data, so let's try opening it with an image viewer.


Bingo, that's the correct algorithm.
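
Before opening a viewer, a quick check is whether a dump begins with the 8-byte PNG signature (89 50 4E 47 0D 0A 1A 0A). The Python sketch below scans the output folder for such files using a hypothetical find_png_dumps() helper. A matching signature only says the file starts like a PNG, so the viewer remains the real test.

```python
import os

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 89 50 4E 47 0D 0A 1A 0A


def find_png_dumps(out_dir: str) -> list:
    """Names of files in out_dir that start with the PNG signature."""
    found = []
    for name in sorted(os.listdir(out_dir)):
        path = os.path.join(out_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                if f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE:
                    found.append(name)
    return found
```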

Now open the defs.h text file in the QuickBMS source code (the src folder in quickbms.zip) and check which algorithm corresponds to number 8.


Yeah, the algorithm is lzo1x.

Don't expect it to always be this easy: sometimes you don't know the name of the file, and its content is a custom format or raw audio/image data.


Attachments:
img9.png [8.86 KiB]
img8.png [19.11 KiB]
img7.png [37.13 KiB]
img6.png [29.28 KiB]
img5.png [10 KiB]
img4.png [10.39 KiB]
img3.png [10.02 KiB]
img2.png [10.32 KiB]
img1.png [18.6 KiB]
img0.png [18.82 KiB]
PostPosted: Thu Aug 07, 2014 5:10 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
Ah, I have attached the original dump.dat in case someone wants to run their own tests.

You can also create it yourself with quickbms:
Code:
comtype lzo1x_compress
get SIZE asize
clog "dump.dat" 0 SIZE SIZE
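
Python's standard library has no lzo1x, so the sketch below swaps in zlib to build a similar practice file. The whole input is compressed, so the compressed data runs from offset 0 to the end of dump.dat, which is what the scanner expects. The make_test_dump() name is hypothetical.

```python
import zlib


def make_test_dump(src_path: str, dump_path: str = "dump.dat") -> int:
    """Compress a whole file into dump_path; return the uncompressed size."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dump_path, "wb") as f:
        # Nothing but compressed data, from offset 0 to the end of the file
        f.write(zlib.compress(data))
    return len(data)  # worth passing to the scanner as the optional size
```

For example, hex(make_test_dump("image.png")) gives the value for the optional size argument, where image.png stands for any file you want to practice with.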


Attachments:
dump.zip [959 Bytes]
PostPosted: Tue Jan 17, 2017 8:15 am 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
I want to stress that the comtype scanner should be used only if you really know what you are doing.

Very quickly:

- do you have a file that may contain chunks of compressed data?
DO NOT USE the comtype scanner

- do you have a raw file that may contain anything?
DO NOT USE the comtype scanner

- do you have a raw file that you are sure contains compressed data from offset 0 to its end?
YES, USE the comtype scanner

- is the comtype scanner a way to find compressed chunks of data in a file?
NO

- is the comtype scanner a way to find what algorithm is used on a specific piece of data?
YES, but the compressed data must cover the whole file: if the file is 0x123 bytes long and the compressed data spans only offsets 0 to 0x10, or 0x10 to 0x123, the scan will fail!

- example: if you use the comtype scanner on a ZIP archive, you will find absolutely NOTHING

- example: if you use the comtype scanner on the compressed part of a ZIP archive, it will succeed (deflate algorithm)
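
The ZIP examples can be reproduced in Python. Feeding the whole archive to a raw deflate decoder fails, while the bytes sliced out using the member's local file header decompress fine. This is a minimal sketch that assumes a single deflated member written by Python's zipfile to a seekable buffer, so the sizes are stored in the local header (no data descriptor). The zip_first_member_raw() name is hypothetical.

```python
import io
import struct
import zipfile
import zlib


def zip_first_member_raw(archive: bytes) -> bytes:
    """Slice out the compressed bytes of the first member of a ZIP."""
    # Local file header: signature, version, flags, method, time, date,
    # crc32, compressed size, uncompressed size, name length, extra length
    fields = struct.unpack_from("<IHHHHHIIIHH", archive, 0)
    if fields[0] != 0x04034B50:
        raise ValueError("not a ZIP local file header")
    comp_size, name_len, extra_len = fields[7], fields[9], fields[10]
    start = 30 + name_len + extra_len  # data follows the 30-byte header
    return archive[start:start + comp_size]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", b"hello comtype scanner " * 40)
archive = buf.getvalue()

try:
    zlib.decompress(archive, -15)  # the whole ZIP as raw deflate
except zlib.error as exc:
    print("whole archive fails:", exc)

print(zlib.decompress(zip_first_member_raw(archive), -15)[:22])
```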

In general, the rule is: don't use the scanner unless you want to waste your time and resources. That's up to you, but then don't blame quickbms for your own mistakes.


PostPosted: Wed Nov 14, 2018 6:36 am 

Joined: Tue Nov 13, 2018 7:34 pm
Posts: 11
Does comtype_scan2.bat not work on Win 10?


PostPosted: Wed Nov 14, 2018 12:41 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
@usabdt
It works with Win10 too. Do you get an error, and if so, which one?


PostPosted: Wed Nov 14, 2018 12:51 pm 

Joined: Tue Nov 13, 2018 7:34 pm
Posts: 11
aluigi wrote:
@usabdt
It works with win10 too, do you get any error and what error?

I can't run comtype_scan2.bat. Could you capture your steps on Win 10?


PostPosted: Wed Nov 14, 2018 6:26 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
usabdt wrote:
can not run the file comtype_scan2.bat .

Details?
Anyway, that tool is meant only for advanced users. If you want support for a format or a compression algorithm, ask on the forum and do NOT try it yourself, as written in my FAQ post above.


PostPosted: Mon Dec 17, 2018 12:28 pm 

Joined: Fri Dec 15, 2017 1:42 pm
Posts: 30
Version 0.1.2 of the bms script contains these lines:
Code:
set NAME string QUICKBMS_COMTYPE
if NAME & "_COMTYPE"    # check if the variable is set
    set NAME string i
endif

But with QuickBMS v0.9.2 the output NAME is always the numeric index, because QUICKBMS_COMTYPE is treated as the literal string "QUICKBMS_COMTYPE". Is that normal?
aluigi wrote:
Please note that if we already know what is the uncompressed size, it's HIGHLY recommended to add it to the command-line

I found that for some algorithms like oodle, if you don't specify the uncompressed size, it just throws an error, even if the buffer is always larger than the uncompressed size.
But in most cases I'd rather not add the uncompressed size and instead use it as a filter to pick the right result, since the number of outputs doesn't seem to shrink after adding it anyway.
So is it possible to get both benefits at the same time? :roll:


PostPosted: Wed Dec 19, 2018 5:59 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
Yes, because it's an internal variable used by quickbms :)
Just like QUICKBMS_HASH and QUICKBMS_CRC.
So if QUICKBMS_COMTYPE contains the string "_COMTYPE", it means the variable has not been set by quickbms.
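
The check in the script can be modeled in Python to make the logic explicit. This is an illustration, not QuickBMS. It assumes, as described in this thread, that an internal variable not set by quickbms evaluates to its own name. The quickbms_vars dictionary and its contents are hypothetical.

```python
def output_name(index: int, quickbms_vars: dict) -> str:
    # Assumption from this thread: an internal variable that quickbms did not
    # set evaluates to its own name, i.e. the string "QUICKBMS_COMTYPE".
    name = quickbms_vars.get("QUICKBMS_COMTYPE", "QUICKBMS_COMTYPE")
    if "_COMTYPE" in name:  # mirrors: if NAME & "_COMTYPE"
        name = str(index)   # not set: fall back to the numeric scan index
    return name


print(output_name(8, {}))                             # variable not set
print(output_name(8, {"QUICKBMS_COMTYPE": "lzo1x"}))  # hypothetical value
```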

You're right about oodle; that's simply how oodle is meant to work. The library requires the exact compressed and uncompressed sizes, or it returns an error.
To bypass these "picky" compression algorithms I implemented the -e option in quickbms, which ignores any error from the algorithms, but it forces you to check every scanned file by hand... 700 files! :D


PostPosted: Thu Dec 20, 2018 2:11 pm 

Joined: Fri Dec 15, 2017 1:42 pm
Posts: 30
aluigi wrote:
So if QUICKBMS_COMTYPE contains the string "_COMTYPE" it means that the variable has not been set by quickbms.

But if so, that condition will always be true, which means NAME will always be set to a numeric index (or was it never meant to assign the algorithm name to the output?), whereas the indexes have been removed from "comtype.h".
aluigi wrote:
The problem with oodle is correct, it's just like oodle is meant to work. The library requires the exact compressed and uncompressed size or it gives an error.
For bypassing these "picky" compression algorithm I implemented the -e option in quickbms that ignores any error from the algorithms, but it forces you to check every scanned file by hand... 700 files! :D

If the decompression succeeds and the size is correct (I haven't actually tested it yet, though), it's easy to narrow the search based on the uncompressed size stored in the raw file.
The point is, if the compression you're dealing with happens to be one of those "picky" algorithms and none of the outputs matches, it's futile to check all of them. Perhaps it's better to specify the size and try again then? :mrgreen:


PostPosted: Thu Jan 10, 2019 4:21 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
BCGhost wrote:
aluigi wrote:
So if QUICKBMS_COMTYPE contains the string "_COMTYPE" it means that the variable has not been set by quickbms.

But if so then that condition will always be true, which means the Name will be set as a numeric index(or it's never meant to assign the algo name to the output?), whereas the indexs are removed in "comtype.h".

What's the exact problem?
I mean, do you have a script or command-line proof of concept that reproduces it?
Here everything works perfectly.


BCGhost wrote:
If the decompression succeed and the size is correct -- havn't actually tested it yet though, it's easy to narrow the search basing on the uncompressed size in the raw file.
Point is, if the compression you're dealing with happens to be one of those "picky" algorithms and none of the outputs matches, it's just futile to check all of them. Perhaps it's better to specify the size and take another chance then? :mrgreen:

comtype_scan2.bat already has an optional field for specifying the decompressed size. Do you mean that one?
Code:
comtype_scan2 c:\comtype_scan2.bms c:\dump.dat c:\output_folder [max_size]


PostPosted: Thu Jan 17, 2019 11:28 am 

Joined: Fri Dec 15, 2017 1:42 pm
Posts: 30
aluigi wrote:
What's the exact problem?
I mean do you have a script or command proof-of-concept that gives a problem?
Here everything works perfectly.

Here's the thing: since a QuickBMS update (I'm not sure from which version on), the enum of algorithms has been moved into a separate file, comtype.h, and the scan-ID comments that appeared every 5 algorithms in the original defs.h are gone. So I assumed the comtype scanner had been updated as well, and based on that extra code I thought it might use the algorithm names directly as output names instead of their IDs. But in practice I still have to look up the IDs in the OLD defs.h to find the algorithm names, so that I don't have to count them one by one in comtype.h. If no new feature has been added to the comtype scanner, though, then everything does work perfectly.

aluigi wrote:
comtype_scan2.bat already has an optional field for specifying the decompressed size, do you mean that one?

What I meant is that, when I know the decompressed size, I'd rather not specify this field and instead use the size to filter the results after the comtype scanner has done its job.
On the one hand, not specifying the decompressed size and using it as a filter can increase the chance of finding the correct output, but then I have to make sure that the algorithm I'm dealing with doesn't require a decompressed size to decompress correctly. On the other hand, if I do specify the decompressed size, some wrong outputs might just be truncated to that size when the decompression stops.
Of course, since I don't know much about compression algorithms, all these questions are based on my assumptions, which is why I'd like someone familiar enough with this area to answer them.


PostPosted: Thu Jan 17, 2019 8:11 pm 
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10262
The numeric IDs are no longer available in the source code because they are no longer used in comtype_scan2.bat/bms.
I guess you are using an old version of the bat/bms.
If you have an old dump from a previous version of comtype_scan2, you can easily recover the ID by counting the lines, starting with QUICKBMS_COMP(ZLIB) as ID 1.
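
Counting the lines can be automated. The Python sketch below assumes that each algorithm appears in comtype.h as QUICKBMS_COMP(NAME), in enum order. It also assumes that entries excluded by preprocessor conditions are not an issue. comtype_ids() is a hypothetical helper.

```python
import re


def comtype_ids(header_text: str) -> dict:
    """Map IDs to names by counting QUICKBMS_COMP(...) entries, ZLIB = 1."""
    names = re.findall(r"QUICKBMS_COMP\((\w+)\)", header_text)
    start = names.index("ZLIB")  # ValueError if ZLIB is not listed
    return {i: name for i, name in enumerate(names[start:], start=1)}
```

For example, comtype_ids(text)[8] would give the name for ID 8, where text is the content of comtype.h. Use the comtype.h from the same QuickBMS version that produced the dumps, since the order may change between releases.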

Regarding the decompressed size, you can filter by size in Windows or use the hexdump_scanner.bms script, which reports the exact size of each file:
http://aluigi.org/bms/hexdump_scanner.bms
Usage:
Code:
quickbms.exe hexdump_scanner.bms c:\folder_with_dumps_from_comtype_scan2
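
If you prefer a script to sorting in Windows, the Python sketch below is a rough equivalent of the size filter. It keeps only the dumps that match the expected size. It is not hexdump_scanner.bms itself, and dumps_with_size() is a hypothetical name.

```python
import os


def dumps_with_size(out_dir: str, expected_size: int) -> list:
    """Names of dumped files whose size is exactly expected_size."""
    matches = []
    for name in sorted(os.listdir(out_dir)):
        path = os.path.join(out_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) == expected_size:
            matches.append(name)
    return matches
```

Usage: dumps_with_size("output", 0x7cf), with the size taken from the example command line earlier in this thread.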


From my experience, I suggest always specifying the exact decompressed size, which avoids both false positives and false negatives (lz4, oodle and so on).


PostPosted: Fri Jan 18, 2019 2:08 am 

Joined: Fri Dec 15, 2017 1:42 pm
Posts: 30
aluigi wrote:
I guess you are using an old version of the bat/bms.

Something like that. It turns out I forgot to replace the old QuickBMS executable (v0.9.0) in my working directory, since I hadn't added QuickBMS to the environment variables. Now everything works just fine. Thanks for this new update!
aluigi wrote:
From my experience I suggest to specify ever the exact decompressed size once avoiding false positives and false negatives (lz4, oodle and so on).

Um, I guess that's the optimal solution from now on. :)


Powered by phpBB® Forum Software © phpBB Limited