ZenHAX


All times are UTC




Posted: Sun Nov 11, 2018 6:07 am

Joined: Sun Nov 11, 2018 5:54 am
Posts: 2
Hello! I've hit a wall while trying to rip the assets from Harvest Moon - A Wonderful Life on the GameCube. Most of the assets are compressed in .clz files, which are an absolute mystery to me. I assumed it was some variation of the LZ compression that seems to be common for a lot of Nintendo stuff, but the tools available for that confirm it's a non-standard compression. As far as I can tell, this compression was only used in this game, the release of the same game with a female main character, and the PS2 port.

I'm sorry this isn't terribly specific. If someone could point me in a helpful direction, that would be great. I'm still playing around with some tools from QuickBMS, but I have a growing sense of dread that I'll need to reverse engineer the compression. Luckily, there are test files included on the disc that have the same information, where only one is compressed! I've included one example here. Optimistically, this would make determining the file's compression easy.

I'm sure you could tell, but this is my first exploration into reverse engineering something like this. Please pardon my ignorance.

Thanks! :)


Attachments:
File comment: Same file, uncompressed
Dummy_Uncompressed.txt [188 Bytes]
File comment: CLZ compressed file
Dummy_Compressed.txt [65 Bytes]
   
Posted: Mon Nov 12, 2018 9:21 am
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10284
Can you provide bigger compressed samples?
The only good results I obtained (bcl_rice, lzfu_raw and SCUMMVM39) look like false positives, so I don't think I have a ready solution.

Dummy_Uncompressed.txt is not related to Dummy_Compressed.txt.


   
Posted: Mon Nov 12, 2018 5:54 pm

Joined: Sun Nov 11, 2018 5:54 am
Posts: 2
Sure, most of the data for the game is stored in .arc.clz files. I think an .arc is similar to a tarball, so after decompression the result should be a valid .arc file. I've used the comtype_scan2.bat tool and similarly found a couple of outputs that were close to being a valid .arc file, but none that worked.

Thank you so much for looking at this, and sorry for the unrelated files. I made a correlation between file sizes, which in hindsight was a bad assumption.


Attachments:
mainchapter1.txt [4.27 MiB]
   
Posted: Sun Dec 02, 2018 6:48 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
I've also been looking into the clz compression from A Wonderful Life.

From what I can tell, the file header is composed of several parts.
  1. 4 bytes at 0x00000000 which is the CLZ identifier (i.e. 43 4C 5A 00)
  2. 4 bytes at 0x00000004 holding the size (in bytes) of the decompressed data as a big-endian integer (e.g. 00 53 54 90 = 0x00535490 = 5,461,136 bytes, about 5.46 MB, for AWL’s commonall.arc)
    This is only speculation for now. I can't confirm what this field actually is until I successfully decompress a clz file.
  3. 4 bytes at 0x00000008 of blank space (i.e. 00 00 00 00)
  4. A repeat at 0x0000000C of the same size value (e.g. 00 53 54 90 for the above file)
  5. One null byte at 0x00000010 (e.g. 00)
  6. The compressed file data starting at 0x00000011 (e.g. 55 AA 38 2D as this file contains a U8 [arc] Archive)
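
The layout listed above can be sketched in Python as follows. This is only a reading of the speculated header, not a confirmed format: the function name `parse_clz_header` and the field names (`size_a`, `reserved`, `size_b`, `pad`) are made up for illustration, and the sizes are assumed to be big-endian because the example bytes 00 53 54 90 only come out at about 5.46 MB when read that way.

```python
import struct

CLZ_MAGIC = b"CLZ\x00"   # 43 4C 5A 00
HEADER_SIZE = 0x11       # payload is assumed to start at 0x11

def parse_clz_header(blob: bytes) -> dict:
    """Split a .clz file into the speculated header fields and the payload."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("file too short for the speculated CLZ header")
    # '>' = big-endian, no padding: 4s + I + I + I + B = 17 bytes (0x11)
    magic, size_a, reserved, size_b, pad = struct.unpack_from(">4sIIIB", blob, 0)
    if magic != CLZ_MAGIC:
        raise ValueError("missing CLZ identifier")
    return {
        "size_a": size_a,        # speculated decompressed size
        "reserved": reserved,    # observed as 00 00 00 00
        "size_b": size_b,        # observed repeat of size_a
        "pad": pad,              # observed as 00
        "payload": blob[HEADER_SIZE:],
    }
```

Running it on commonall.arc.clz should, if the speculation holds, report `size_a == size_b == 0x00535490`. A mismatch between the two size fields on some other file would be a hint that they do not mean the same thing.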

Image

I ran signsrch on the game executables and got the following results:
Quote:
A Wonderful Life: dvdroot/&&systemdata/Start.dol
Code:
  offset   num  description [bits.endian.size]
  --------------------------------------------
  0024bc70 3049 DMC compression [32.be.16&]
  0024bee1 1038 padding used in hashing algorithms (0x80 0 ... 0) [..64]
  002521c8 2304 zinflate_distanceExtraBits [32.be.120]
  002521cb 2303 zinflate_distanceExtraBits [32.le.120]
  0028e19b 1040 SSL3 #define [32.le.176&]
  0028e7a8 2417 MBC2 [32.le.248&]
  0028e7ab 2418 MBC2 [32.be.248&]
  002939c8 1563 libavcodec ff_zigzag_direct [..64]

- 8 signatures found in the file in 1 seconds


Another Wonderful Life (girl version of the game): dvdroot/&&systemdata/Start.dol
Code:
  offset   num  description [bits.endian.size]
  --------------------------------------------
  0023bd54 2417 MBC2 [32.le.248&]
  0023c36b 2418 MBC2 [32.be.248&]
  0024d3c4 3049 DMC compression [32.be.16&]
  0024d5d1 1038 padding used in hashing algorithms (0x80 0 ... 0) [..64]
  00250dd0 2304 zinflate_distanceExtraBits [32.be.120]
  00250dd3 2303 zinflate_distanceExtraBits [32.le.120]
  0028ebb8 1563 libavcodec ff_zigzag_direct [..64]

- 7 signatures found in the file in 1 seconds


Interestingly, the PS2 version of A Wonderful Life Special Edition contains both a compressed and uncompressed version of what appears to be the same file (mainchapter0.arc.clz and mainchapter0.arc).


   
Posted: Mon Dec 10, 2018 12:26 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
I ran another one of the files (preload.arc.clz) through comtype_scan2 and it seems like the best candidate would be some variant of either LZFU (most likely) or FIN (less likely).


Attachments:
File comment: LZFU.dmp output from comtype_scan2 analysis of preload.arc.clz
LZFU.dmp.zip [275.75 KiB]
File comment: FIN.dmp output from comtype_scan2 analysis of preload.arc.clz
FIN.dmp.zip [461.51 KiB]
File comment: CLZ-Compressed version of preload.arc (U8 archive).
preload.arc.clz.zip [372.7 KiB]
   
Posted: Mon Dec 10, 2018 12:49 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
I also tried scanning the above file (preload.arc.clz) using offzip, and got the following results:
Attachment:
File comment: Output of "offzip.exe -z -15 -S preload.arc.clz 0x00000010"
offzip_output_preload.arc.clz.txt [4.24 KiB]


Summary of valid compressed streams:
Code:
+------------+-----+----------------------------+----------------------+
| hex_offset | ... | zip -> unzip size / offset | spaces before | info |
+------------+-----+----------------------------+----------------------+
  0x00000fd1  61201 -> 61187 / 0x0000fee2 _ 4049
  0x00019447  45209 -> 45199 / 0x000244e0 _ 38245
  0x0002d8b0  36618 -> 36608 / 0x000367ba _ 37840
  0x00037206  46 -> 321 / 0x00037234 _ 2636
  0x0003a263  65375 -> 65365 / 0x0004a1c2 _ 12335
  0x0004aeca  42 -> 342 / 0x0004aef4 _ 3336
  0x0004d330  34 -> 347 / 0x0004d352 _ 9276
  0x00052f48  55 -> 665 / 0x00052f7f _ 23542
  0x00057771  37 -> 47 / 0x00057796 _ 18418
  0x0005ab5a  36 -> 85 / 0x0005ab7e _ 13252
  0x0005f9b6  34 -> 103 / 0x0005f9d8 _ 20024
 
- 11 valid compressed streams found
- 0x00032f2f -> 0x0003355d bytes covering the 51% of the file


   
Posted: Mon Dec 10, 2018 7:46 am
Site Admin
User avatar

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10284
Raw deflate is prone to many false positives because it's just the compressed data, without any checksum or header (zlib adds both).
So you can ignore those results.
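
To illustrate the difference, here is a small Python sketch using the standard `zlib` module. It builds a zlib-wrapped stream and a raw deflate stream from the same made-up sample data, then checks the two things zlib adds: the 2-byte header and the trailing Adler-32 checksum. The helper name `looks_like_zlib_header` and the sample bytes are only for illustration.

```python
import zlib

def looks_like_zlib_header(buf: bytes) -> bool:
    """Check the 2-byte zlib header (CMF, FLG) for internal consistency."""
    if len(buf) < 2:
        return False
    cmf, flg = buf[0], buf[1]
    return (
        (cmf & 0x0F) == 8                    # compression method 8 = deflate
        and (cmf >> 4) <= 7                  # window size of at most 32 KiB
        and ((cmf << 8) | flg) % 31 == 0     # header check bits
    )

sample = b"mainchapter0.arc " * 16           # made-up test data

# zlib stream: header + deflate data + Adler-32 of the uncompressed data
wrapped = zlib.compress(sample)
stored_adler = int.from_bytes(wrapped[-4:], "big")

# raw deflate stream: negative wbits means no header and no checksum
enc = zlib.compressobj(wbits=-15)
raw = enc.compress(sample) + enc.flush()

print(looks_like_zlib_header(wrapped))       # header is verifiable
print(stored_adler == zlib.adler32(sample))  # so is the checksum
print(zlib.decompress(raw, -15) == sample)   # raw deflate has nothing to verify
```

A scanner working on zlib streams can reject most random offsets using the header and checksum alone. With raw deflate the only test is whether the bytes happen to inflate without error, which short random runs often do.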


   
Posted: Mon Dec 17, 2018 12:21 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
After running mainchapter0 through the speculated decompression formats, I'm beginning to think that I was incorrect in thinking it might be LZFU.

When viewing the output, I examined the section of the file (in this case, a compressed U8 archive) which would include a list of filenames.
A lot of the data seems to be missing when attempting to decompress using either LZFU or MSLZSS1.

Original compressed data:
Image

Expected output:
Image

LZFU:
Image

LZFU_RAW:
Image

MSLZSS1:
Image

Overall, there seems to be an issue with repeated strings (e.g. "_0.arc"), where they'll show up once, but then be missing in subsequent entries.

I'll try examining some other decompressed dumps from comtype and will update if I find anything of significance.


   
Posted: Sun Dec 30, 2018 1:31 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
If it helps, I was able to decompile the game's main executable (Start.dol) into a python-formatted script using RetDec 3.0.

There are some references to clz files, but I can't quite make sense of it.


Attachments:
File comment: Start.dol GCN executable
Start.zip [1.07 MiB]
   
Posted: Sun Dec 30, 2018 1:33 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
I also tried opening Start.dol in BrawlBox's memory editor, and found that the string mainchapter%d.arc.clz seems to be related to some sort of SceneInit function.
Image

There are also references to preload.arc.clz and commonall.arc.clz.
Image


   
Posted: Sun Jan 13, 2019 3:29 am

Joined: Sun Dec 02, 2018 5:53 am
Posts: 7
I'm wondering if the algorithm could be some variant of LZSS.

Is there a way to batch-test possible lzss configurations, similar to how comtype runs through the different compression types?


   
Posted: Sun Jan 13, 2019 1:09 pm
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 10284
Yes and no. It's extremely rare for lzss to be used with settings other than the usual "12 4 2 2 0x20" or "12 4 2 2 0" (lzss0).

In 10 years the only non-standard lzss has been the following:
comtype lzss "11 5 2 2 0"

You can build a sort of fuzzer by generating the first 4 fields of the settings, but it's just a waste of time since 99.9% of the time it's not the classical lzss algorithm.
It's probably faster to analyze or debug (via emulator) the game.
Anyway I have no additional suggestions.
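
For anyone who still wants to try the fuzzer idea, below is a rough Python sketch. It is not QuickBMS code: `lzss_decode` is a generic byte-oriented LZSS decoder written for illustration, and the reading of the settings as (offset bits, length bits, threshold, start-position adjustment, window fill byte) is an assumption. It only covers two-byte match tokens (offset bits + length bits = 16), ties the fourth field to the third to keep the search small, and `scan_settings` compares each result against a known uncompressed copy, such as the mainchapter0.arc mentioned earlier for the PS2 version.

```python
def lzss_decode(data, ei=12, ej=4, p=2, rless=2, init_chr=0x20, limit=None):
    """Generic byte-oriented LZSS decoder (illustrative; parameter meanings assumed)."""
    if ei + ej != 16:
        raise ValueError("sketch only handles 2-byte match tokens (ei + ej == 16)")
    n, f = 1 << ei, 1 << ej
    mask = n - 1
    window = bytearray([init_chr]) * n   # sliding window pre-filled with init_chr
    r = ((n - f) - rless) & mask         # initial write position in the window
    out = bytearray()
    pos = 0
    flags = 0

    def emit(c):
        nonlocal r
        out.append(c)
        window[r] = c
        r = (r + 1) & mask

    while pos < len(data) and (limit is None or len(out) < limit):
        flags >>= 1
        if not (flags & 0x100):          # all 8 flag bits consumed: load a new flag byte
            flags = data[pos] | 0xFF00
            pos += 1
        if flags & 1:                    # flag bit 1: literal byte
            if pos >= len(data):
                break
            emit(data[pos])
            pos += 1
        else:                            # flag bit 0: (offset, length) pair
            if pos + 1 >= len(data):
                break
            i, j = data[pos], data[pos + 1]
            pos += 2
            i |= (j >> ej) << 8          # high offset bits live in the second byte
            length = (j & (f - 1)) + p + 1
            for k in range(length):      # byte by byte, so overlapping copies work
                emit(window[(i + k) & mask])
    return bytes(out)

def scan_settings(payload, reference):
    """Return every (ei, ej, p) whose output matches the reference exactly."""
    hits = []
    for ei in range(8, 15):
        for p in range(0, 4):
            got = lzss_decode(payload, ei, 16 - ei, p, rless=p,
                              limit=len(reference) + 1)
            if got == reference:
                hits.append((ei, 16 - ei, p))
    return hits
```

A hypothetical call would be `scan_settings(clz_bytes[0x11:], arc_bytes)`, with the payload offset taken from the header layout described earlier in the thread. An empty result only rules out this family of byte-oriented LZSS layouts, not LZSS-like schemes in general.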

