ZenHAX

Free Game Research Forum | Official QuickBMS support | twitter @zenhax
All times are UTC




 Post subject: custom lz4 compression
PostPosted: Mon Feb 14, 2022 8:41 pm 

Joined: Sat Sep 28, 2019 7:00 pm
Posts: 647
Originally, this game used light obfuscation (XOR in particular) for compressed assets, but since the official release they've switched to something that looks like a modified implementation of LZ4.

Personally I'm not interested in the game, so the question is: is it possible to guess something from the compressed data instead of reversing the compression code? I took a quick look at how LZ4 blocks are formed (according to the official documentation), and it appears that some sequences in a block are malformed or handled differently (a different approach to overlapping matches?). For example, in the uasset block from the samples, the first 4 sequences seem fine, but in the 5th the suggested sequence size is bigger than the actual one, so the decompressor fails when reading the offset part of that sequence. Also, I've noticed that the function returns the actual position in the file where the decompressor fails in the quickbms error, which is quite convenient.
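To make the "fails on reading the offset part" symptom concrete, here is a minimal Python sketch of a walker for a standard LZ4 block. It does not decompress anything; it only follows tokens, lengths and offsets and reports the compressed position where the structure stops making sense. The names `walk_lz4_block`, `Sequence` and `BlockError`, and the tiny hand-built test block, are illustrative assumptions and have nothing to do with the game's files or with quickbms internals.

```python
from typing import List, NamedTuple, Optional, Tuple


class BlockError(Exception):
    """Parse failure, carrying the compressed-stream position where it happened."""

    def __init__(self, pos: int, msg: str):
        super().__init__(f"{msg} (compressed offset {pos})")
        self.pos = pos


class Sequence(NamedTuple):
    pos: int                  # position of the token byte
    literal_len: int
    offset: Optional[int]     # None for the final, literals-only sequence
    match_len: Optional[int]


def read_length(data: bytes, pos: int, nibble: int) -> Tuple[int, int]:
    """A nibble of 15 means: keep adding the following bytes until one is not 255."""
    length = nibble
    if nibble == 15:
        while True:
            if pos >= len(data):
                raise BlockError(pos, "length bytes run past end of block")
            byte = data[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
    return length, pos


def walk_lz4_block(data: bytes):
    """Return (sequences, output_size, error); error is None on a clean parse."""
    seqs: List[Sequence] = []
    pos = out_len = 0
    try:
        while pos < len(data):
            start = pos
            token = data[pos]
            lit_len, pos = read_length(data, pos + 1, token >> 4)
            if pos + lit_len > len(data):
                raise BlockError(pos, "literal run exceeds block")
            pos += lit_len
            out_len += lit_len
            if pos == len(data):          # last sequence carries literals only
                seqs.append(Sequence(start, lit_len, None, None))
                break
            if pos + 2 > len(data):
                raise BlockError(pos, "truncated match offset")
            offset = data[pos] | (data[pos + 1] << 8)   # 16-bit little endian
            if offset == 0 or offset > out_len:
                raise BlockError(pos, f"offset {offset} reaches outside {out_len} output bytes")
            match_len, pos = read_length(data, pos + 2, token & 0x0F)
            match_len += 4                # minimum match length in LZ4
            out_len += match_len
            seqs.append(Sequence(start, lit_len, offset, match_len))
    except BlockError as err:
        return seqs, out_len, err
    return seqs, out_len, None


# Hand-built block: 4 literals "abcd", a 6-byte match at offset 4, then literal "e".
good = bytes([0x42]) + b"abcd" + bytes([0x04, 0x00, 0x10]) + b"e"
bad = bytes([0x62]) + good[1:]            # token altered: literal length 4 -> 6
for name, block in (("good", good), ("bad", bad)):
    seqs, size, err = walk_lz4_block(block)
    print(name, len(seqs), size, err)
```

In the altered block the token claims more literals than are really there, so the walker swallows the real offset bytes as literals and then reads garbage as the offset, which is the same kind of failure described above.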

Regardless, any thoughts would be appreciated, just for research purposes. Note that the sample names contain the suggested uncompressed size, and one of the samples can actually be decompressed (not sure whether correctly or not).


Attachments:
lz4_samples.zip [61.56 KiB]
Downloaded 66 times

PostPosted: Fri Mar 18, 2022 2:41 pm 

Joined: Thu Mar 10, 2022 10:05 pm
Posts: 3
I'm looking into this and my current hypothesis is that there's no alternative LZ4 compression handling logic. Instead, I believe that it is regular LZ4 compression altered after the fact by some kind of encryption or obfuscation. Perhaps something like the XOR that you mentioned the game used to use.

The obfuscation seems to alter some of the bytes, leaving most of the bytes in the clear.

When this obfuscation alters bytes that are compression "literals" or "offsets", a regular decompression routine will output wrong values but continue to decompress without noticing a problem.
When this obfuscation alters compression "tokens" or "lengths", a regular decompressor will likely get lost, hit invalid sequences, and maybe give up.
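The difference between the two cases can be shown with a small sketch. The decoder below is a plain, unoptimized Python implementation of the standard LZ4 block format, and the test block is hand-built for illustration; `lz4_block_decode` and `alter` are hypothetical helper names, not anything taken from the game or from quickbms.

```python
def read_length(data: bytes, pos: int, nibble: int):
    """Extend a 4-bit length: 15 means add following bytes until one is not 255."""
    length = nibble
    if nibble == 15:
        while True:
            if pos >= len(data):
                raise ValueError(f"length bytes run past end at {pos}")
            byte = data[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
    return length, pos


def lz4_block_decode(data: bytes) -> bytes:
    """Decode one standard LZ4 block, raising ValueError on invalid structure."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        token = data[pos]
        lit_len, pos = read_length(data, pos + 1, token >> 4)
        if pos + lit_len > len(data):
            raise ValueError(f"literal run exceeds block at {pos}")
        out += data[pos:pos + lit_len]
        pos += lit_len
        if pos == len(data):              # final sequence has no match part
            break
        if pos + 2 > len(data):
            raise ValueError(f"truncated offset at {pos}")
        offset = data[pos] | (data[pos + 1] << 8)
        if offset == 0 or offset > len(out):
            raise ValueError(f"bad offset {offset} at {pos}")
        match_len, pos = read_length(data, pos + 2, token & 0x0F)
        for _ in range(match_len + 4):    # byte-wise copy so overlaps work
            out.append(out[-offset])
    return bytes(out)


def alter(block: bytes, pos: int, value: int) -> bytes:
    """Return a copy of block with one byte replaced."""
    return block[:pos] + bytes([value]) + block[pos + 1:]


# 4 literals "abcd", a 6-byte match at offset 4, then the final literal "e".
block = bytes([0x42]) + b"abcd" + bytes([0x04, 0x00, 0x10]) + b"e"

print(lz4_block_decode(block))                       # intact block
print(lz4_block_decode(alter(block, 2, ord("X"))))   # altered literal
print(lz4_block_decode(alter(block, 5, 2)))          # altered offset
try:
    lz4_block_decode(alter(block, 0, 0x62))          # altered token
except ValueError as err:
    print("decoder gave up:", err)
```

The altered literal and the altered offset both decode to the full length with wrong bytes in the output, while the altered token desynchronizes the parser. An altered token does not always trigger an error right away, so the point where a decoder gives up may lie some distance after the byte that was actually changed.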

So far, I haven't seen any obvious pattern in which bytes get altered or how they change. Sometimes there are 10 clear bytes in a row, then an altered one. Sometimes the clear run is as short as 4 bytes. I haven't yet spotted a definite case of two altered bytes in a row.

I've been focusing on the asset file you mentioned because it has plenty of ASCII strings, which let me spot holes and guess what was there originally.
Here's a list of some offsets in that compressed asset file, my guess as to the clear byte that was there before encryption, and the byte that I really found there. (Numbers are in decimal and CSV format so you can import them into a spreadsheet or script for analysis.)

Code:
compressed_pos,guessed_clear_value,found_obfuscated_value
60,14,50
124,105,138
135,108,251
146,87,165
167,117,107
213,85,38
220,47,48
228,116,10
234,50,34
242,110,16
308,111,26
342,105,131
387,108,218
428,108,206
457,105,204
471,99,49
477,67,101
529,111,193
571,101,219
579,109,182
638,108,211
766,104,41
777,100,86
891,97,118
900,101,0
907,101,115
1196,105,72



Notice that the byte value 108 seems to be represented in the encrypted stream as at least four different values: 251, 218, 206, 211.

On the other hand, I have yet to find any of the encrypted values reused.

Also notice that the byte value 101 encodes to 0 at offset 900, while 7 bytes later the same input value encodes to 115. What's the relationship between these numbers? I don't know.
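For anyone who wants to poke at these numbers, here is a small Python sketch that loads the table above and derives a few columns that are often useful when hunting for a key stream: the XOR of clear and found values, their difference modulo 256, and the position modulo 256. Which of these (if any) is meaningful is an open question; because the same clear byte maps to several found values, a single fixed one-byte key is already ruled out. The column names come from the CSV header, and the helper names are made up for this example.

```python
import csv
import io
from collections import defaultdict

GUESSES_CSV = """\
compressed_pos,guessed_clear_value,found_obfuscated_value
60,14,50
124,105,138
135,108,251
146,87,165
167,117,107
213,85,38
220,47,48
228,116,10
234,50,34
242,110,16
308,111,26
342,105,131
387,108,218
428,108,206
457,105,204
471,99,49
477,67,101
529,111,193
571,101,219
579,109,182
638,108,211
766,104,41
777,100,86
891,97,118
900,101,0
907,101,115
1196,105,72
"""


def load_rows(text):
    """Parse the CSV into (pos, clear, found) integer tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(r["compressed_pos"]),
         int(r["guessed_clear_value"]),
         int(r["found_obfuscated_value"]))
        for r in reader
    ]


def summarize(rows):
    """Group found values by clear value and derive candidate key columns."""
    by_clear = defaultdict(list)
    for _, clear, found in rows:
        by_clear[clear].append(found)
    found_values = [found for _, _, found in rows]
    return {
        "by_clear": dict(by_clear),
        "found_reused": len(set(found_values)) != len(found_values),
        "xor_keys": [clear ^ found for _, clear, found in rows],
        "add_keys": [(found - clear) % 256 for _, clear, found in rows],
    }


rows = load_rows(GUESSES_CSV)
info = summarize(rows)
for (pos, clear, found), x, a in zip(rows, info["xor_keys"], info["add_keys"]):
    print(f"{pos:5d} clear={clear:3d} found={found:3d} "
          f"xor={x:3d} diff={a:3d} pos%256={pos % 256:3d}")
print("any found value reused:", info["found_reused"])
```

Keep in mind that the clear values are guesses, so any pattern that shows up in the derived columns is only as reliable as those guesses.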


As for detecting an LZ4 file with these modifications, I suppose an algorithm could look for "unreasonable" offset values (seeking further back than the decompressed bytes go). Theoretically, it could even try to get back on track by trying out some nearby possibilities.
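A rough Python sketch of that idea follows. It parses the block structure, notes where parsing breaks, then tries every value for each byte in a window around that point and keeps the patches that let the parser get further. Everything here is an assumption made for illustration: the helper names, the window size, the single-altered-byte model, and the use of the expected uncompressed size (which the sample names are said to contain) as an extra filter.

```python
def _length(data, pos, nibble):
    """Extended LZ4 length; returns (None, pos) if it runs off the block."""
    total = nibble
    if nibble == 15:
        while True:
            if pos >= len(data):
                return None, pos
            byte = data[pos]
            pos += 1
            total += byte
            if byte != 255:
                break
    return total, pos


def parse(data):
    """Return (extent, out_len). extent is len(data) on a clean parse, otherwise
    the position of the token whose sequence could not be parsed."""
    pos = out_len = 0
    n = len(data)
    while pos < n:
        start = pos
        token = data[pos]
        lit_len, pos = _length(data, pos + 1, token >> 4)
        if lit_len is None or pos + lit_len > n:
            return start, out_len
        pos += lit_len
        out_len += lit_len
        if pos == n:                       # final literals-only sequence
            return n, out_len
        if pos + 2 > n:
            return start, out_len
        offset = data[pos] | (data[pos + 1] << 8)
        if offset == 0 or offset > out_len:    # the "unreasonable offset" test
            return start, out_len
        match_len, pos = _length(data, pos + 2, token & 0x0F)
        if match_len is None:
            return start, out_len
        out_len += match_len + 4
    return n, out_len


def repair_candidates(data, window=8, expected_size=None):
    """List (extent, pos, value) single-byte patches that parse further."""
    fail, _ = parse(data)
    if fail == len(data):
        return []
    found = []
    for p in range(max(0, fail - window), min(len(data), fail + window)):
        for value in range(256):
            if value == data[p]:
                continue
            patched = data[:p] + bytes([value]) + data[p + 1:]
            extent, size = parse(patched)
            if extent <= fail:
                continue
            # A complete parse must also match the known output size, if given.
            if extent == len(data) and expected_size is not None and size != expected_size:
                continue
            found.append((extent, p, value))
    found.sort(reverse=True)               # patches that parse furthest first
    return found


good = bytes([0x42]) + b"abcd" + bytes([0x04, 0x00, 0x10]) + b"e"
bad = bytes([0x62]) + good[1:]             # one altered token
for extent, p, value in repair_candidates(bad, expected_size=11)[:5]:
    print(f"patch byte {p} -> {value:#04x}, parses up to {extent}")
```

Structural validity is a weak test, so this kind of search produces many false positives. On real data it would need extra constraints, such as the known output size or plausible text in the decoded literals, to narrow the list down.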


Attachments:
century_uasset_encryption_guesses.csv [361 Bytes]
Downloaded 37 times