ZenHAX

Free Game Research Forum | Official QuickBMS support | twitter @zenhax | SSL HTTPS://zenhax.com

All times are UTC




Posted: Mon Jan 28, 2019 9:34 pm

Joined: Mon Jan 28, 2019 9:16 pm
Posts: 7
Hello ZenHax!

I'm trying to parse toc and dag files from Ratchet & Clank (PS4) and Sunset Overdrive.

Here's what I already know:
- Both Ratchet & Clank (PS4) and Sunset Overdrive use the same engine (I'm guessing this because there are a lot of similarities between these files, and the filenames are also the same)
- The TOC file looks like this:
Code:
struct TOC
{
   uint Magic; // 0xAF12AF17 in both games (does not fit in a signed int)
   int DecompressedTOCSize; // Decompressed size (in bytes) of CompressedTOC
   byte[FileSize - 8] CompressedTOC; // Runs to the end of the file (FileSize is the physical file size; minus 8 for the two 4-byte fields)
}

Note: This is not a valid struct; I wrote it just to show what I think the layout looks like.

CompressedTOC is zlib-compressed data (it starts with the familiar zlib header 78 DA).
After decompressing it, I saw another header: "1TAD".
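A minimal C# sketch of unpacking this container, assuming the header fields are little-endian (which is what `BinaryReader` reads) and that `DecompressedTOCSize` is the exact inflated length. The class name `TocContainer` is made up for illustration. .NET's `DeflateStream` only understands raw deflate data, so the sketch validates and skips the 2-byte zlib header itself:

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: uint magic, int decompressed size, then a zlib stream.
static class TocContainer
{
    public const uint TocMagic = 0xAF12AF17;

    public static byte[] Decompress(byte[] file)
    {
        using var reader = new BinaryReader(new MemoryStream(file));
        uint magic = reader.ReadUInt32();
        if (magic != TocMagic)
            throw new InvalidDataException($"Unexpected magic 0x{magic:X8}");
        int decompressedSize = reader.ReadInt32();

        // A zlib stream is a 2-byte header (78 DA here), raw deflate data and an
        // Adler-32 trailer. Check the header (method 8, header % 31 == 0), then skip it.
        int cmf = reader.ReadByte(), flg = reader.ReadByte();
        if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("Payload does not start with a zlib header");

        var output = new byte[decompressedSize];
        using var inflater = new DeflateStream(reader.BaseStream, CompressionMode.Decompress, leaveOpen: true);
        int total = 0;
        while (total < output.Length)
        {
            int read = inflater.Read(output, total, output.Length - total);
            if (read == 0) break; // deflate data ended early
            total += read;
        }
        if (total != output.Length)
            throw new InvalidDataException($"Expected {decompressedSize} bytes, got {total}");
        return output;
    }
}
```

If the assumptions hold, the returned buffer should begin with the "1TAD" bytes described in this post.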

And then there is some data:
Code:
        public struct TAD_Header
        {
            public int Unknown;
            public int FileEndOffset;
            public int Unknown2; // Always 6? (Version?)
            public int Unknown3;
            public int ArchiveFilesBlockOffset;
            public int Unknown4; // 480? What?
            public int Unknown5;
            public int Unknown6; // 1024? (Block size?)
            public int UnknownOffset; // Most probably offset
            public int Unknown7;
            public int UnknownOffset2; // Also most probably offset
            public int UnknownOffset3;
            public int Unknown8;
            public int UnknownOffset4;
            public int Unknown9; // 5628??
            public uint Unknown10;
            public int UnknownOffset5;
            public int UnknownOffset6;
            public uint Unknown11;
            public int Unknown12; // 112?
            public int Unknown13; // 912?
        }

Note: This is not a valid struct; I wrote it just to show what I think the layout looks like.
Note 2: Please ignore my comments :P

I haven't looked closely at the dag file, but it seems to also have a magic, *some data*, and compressed data.
After decompressing the dag file, I noticed what look like filenames.
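Since the dag header layout is still unknown, one heuristic way to probe it is to search for the first plausible zlib header, inflate from there, and list runs of printable ASCII (like the Unix `strings` tool) to look for filenames. A hedged C# sketch; `DagProbe` and its methods are hypothetical helpers, and a byte pair that merely looks like a zlib header can give false positives:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

static class DagProbe
{
    // Offset of the first byte pair that forms a valid zlib header with a 32K
    // window and no preset dictionary (78 01, 78 5E, 78 9C, 78 DA), or -1.
    public static int FindZlibHeader(byte[] data)
    {
        for (int i = 0; i + 1 < data.Length; i++)
        {
            int cmf = data[i], flg = data[i + 1];
            if (cmf == 0x78 && (flg & 0x20) == 0 && ((cmf << 8) | flg) % 31 == 0)
                return i;
        }
        return -1;
    }

    // Inflates the raw deflate data that follows the 2-byte header at `offset`.
    public static byte[] Inflate(byte[] data, int offset)
    {
        using var input = new MemoryStream(data, offset + 2, data.Length - offset - 2);
        using var inflater = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflater.CopyTo(output);
        return output.ToArray();
    }

    // Runs of printable ASCII (0x20..0x7E) at least minLength characters long.
    public static List<string> PrintableStrings(byte[] data, int minLength = 4)
    {
        var result = new List<string>();
        var current = new StringBuilder();
        foreach (byte b in data)
        {
            if (b >= 0x20 && b < 0x7F)
            {
                current.Append((char)b);
                continue;
            }
            if (current.Length >= minLength) result.Add(current.ToString());
            current.Clear();
        }
        if (current.Length >= minLength) result.Add(current.ToString());
        return result;
    }
}
```

Comparing the resulting strings against the paths in layout.csv would show how much of it the dag actually covers.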

Can somebody help me parse this data? I'm making some tools to make modding these games easy.

I almost forgot: here are both the decompressed & compressed files from both games: https://drive.google.com/file/d/1XmzV8iE2GazF9HwhaLldtGNPlgZw3kpO/view?usp=sharing

There are also some files that the game doesn't use, but (I think) they're useful for us:
- layout.csv
- scan.csv

(I'll upload them if needed)

Files in the asset_archive folder of Ratchet & Clank (PS4):
Quote:
2018-01-22 03:13 3 219 804 160 g00s001
2018-01-22 03:13 127 696 896 a00s003.us
2018-01-22 03:13 205 807 616 a00s004.dk
2018-01-22 03:13 205 791 232 a00s005.nl
2018-01-22 03:13 206 594 048 a00s006.fi
2018-01-22 03:13 206 540 800 a00s007.fr
2018-01-22 03:13 207 151 104 a00s008.de
2018-01-22 03:13 206 016 512 a00s009.it
2018-01-22 03:13 219 938 816 a00s011.no
2018-01-22 03:13 221 827 072 a00s012.pl
2018-01-22 03:13 206 778 368 a00s013.pt
2018-01-22 03:13 205 426 688 a00s014.ru
2018-01-22 03:13 188 735 488 a00s015.es
2018-01-22 03:13 206 430 208 a00s016.se
2018-01-22 03:13 206 946 304 a00s019.ar
2018-01-22 03:13 218 836 992 a00s020.tr
2018-01-22 03:13 662 472 archive_input.json
2018-01-22 03:13 603 chunkmap.txt
2018-01-22 03:13 3 388 548 dag
2018-01-22 03:13 2 702 249 984 g00s000
2018-01-22 03:13 1 067 180 032 g00s002
2018-01-22 03:13 2 311 626 752 g01s000
2018-01-22 03:13 1 062 604 800 g02s000
2018-01-22 03:13 2 294 554 624 g03s000
2018-01-22 03:13 1 063 329 792 g04s000
2018-01-22 03:13 813 359 104 g05s000
2018-01-22 03:13 986 714 112 g06s000
2018-01-22 03:13 887 775 232 g07s000
2018-01-22 03:13 1 244 053 504 g08s000
2018-01-22 03:13 802 992 128 g09s000
2018-01-22 03:13 499 343 360 g10s000
2018-01-22 03:13 2 984 398 848 g11s000
2018-01-22 03:13 599 597 056 g11s001
2018-01-22 03:13 1 386 311 680 g12s000
2018-01-22 03:13 1 159 507 968 g13s000
2018-01-22 03:13 41 573 019 layout.csv
2018-02-14 15:16 2 826 240 p000035
2018-02-14 15:16 28 049 408 p000036
2018-02-14 15:16 2 060 288 p000037
2018-02-14 15:16 1 929 216 p000038
2018-02-14 15:16 1 765 376 p000039
2018-02-14 15:16 1 249 280 p000040
2018-02-14 15:16 712 704 p000041
2018-02-14 15:16 1 961 984 p000042
2018-02-14 15:16 585 728 p000043
2018-02-14 15:16 606 208 p000044
2018-02-14 15:16 1 118 208 p000045
2018-02-14 15:16 901 120 p000046
2018-02-14 15:16 684 032 p000047
2018-02-14 15:16 20 480 p000048
2018-01-22 03:13 26 997 867 scan.csv
2018-02-14 15:16 2 998 864 toc


Thank you in advance :)


Posted: Tue Jan 29, 2019 11:50 am

Joined: Mon Jan 28, 2019 9:16 pm
Posts: 7
Small update:
I noticed that . already made a BMS script for the Insomniac Engine, "edge_of_nowhere.bms".

Based on that script, I updated my structs:

Code:
        public struct TAD_Header
        {
            public uint ID;
            public int EndOffset;
            public int PartsCount;
            public TAD_Header_Part[] Parts;
        }

        public struct TAD_Header_Part
        {
            public uint ID;
            public int Offset;
            public int Size;
        }


I'm reading it via this code:
Code:

        // compressedReader must wrap the *decompressed* TOC (the data that starts
        // with "1TAD"); BinaryReader always reads little-endian values.
        private void LoadHeader()
        {
            tocHeader.ID = compressedReader.ReadUInt32();
            tocHeader.EndOffset = compressedReader.ReadInt32();
            tocHeader.PartsCount = compressedReader.ReadInt32();
            tocHeader.Parts = new TAD_Header_Part[tocHeader.PartsCount];

            Console.WriteLine($"TOC Header ID: 0x{tocHeader.ID:X8}");
            Console.WriteLine("TOC Header EndOffset: " + tocHeader.EndOffset);
            Console.WriteLine("TOC Header Parts: " + tocHeader.PartsCount);

            // Each part entry is 12 bytes: ID, Offset, Size.
            for (int i = 0; i < tocHeader.Parts.Length; i++)
            {
                Console.WriteLine("--- TOC Header Part " + i + " ---");

                tocHeader.Parts[i].ID = compressedReader.ReadUInt32();
                tocHeader.Parts[i].Offset = compressedReader.ReadInt32();
                tocHeader.Parts[i].Size = compressedReader.ReadInt32();

                Console.WriteLine($"ID: 0x{tocHeader.Parts[i].ID:X8}");
                Console.WriteLine("Offset: " + tocHeader.Parts[i].Offset);
                Console.WriteLine("Size: " + tocHeader.Parts[i].Size);
            }
        }


It works very well, but has anybody parsed DAG file(s)?
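Once the header is read, each part's bytes can be cut out of the decompressed TOC. A C# sketch that keeps header order, so index N corresponds to "part N" as the parts are numbered in this thread. It assumes `Offset` is relative to the start of the decompressed TOC, which isn't confirmed here; `TocParts.Split` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TocParts
{
    // Reads ID, EndOffset, PartsCount, then PartsCount x {ID, Offset, Size}
    // and copies out each part's bytes in header order.
    public static List<(uint Id, byte[] Data)> Split(byte[] toc)
    {
        using var reader = new BinaryReader(new MemoryStream(toc));
        reader.ReadUInt32();            // header ID ("1TAD")
        reader.ReadInt32();             // EndOffset (not needed here)
        int count = reader.ReadInt32(); // PartsCount
        if (count < 0)
            throw new InvalidDataException($"Negative part count {count}");

        var parts = new List<(uint Id, byte[] Data)>();
        for (int i = 0; i < count; i++)
        {
            uint id = reader.ReadUInt32();
            int offset = reader.ReadInt32();
            int size = reader.ReadInt32();
            // Reject entries that point outside the buffer instead of crashing later.
            if (offset < 0 || size < 0 || (long)offset + size > toc.Length)
                throw new InvalidDataException($"Part {i} (ID 0x{id:X8}) lies outside the TOC");
            var data = new byte[size];
            Array.Copy(toc, offset, data, 0, size);
            parts.Add((id, data));
        }
        return parts;
    }
}
```

The bounds check doubles as a quick test of the offset assumption: if it throws on real samples, the offsets are probably relative to something else.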


Posted: Fri Feb 01, 2019 9:18 am

Joined: Mon Jan 28, 2019 9:16 pm
Posts: 7
Okay guys.

It seems that the DAG file only lists files that have full names and are not generated by the engine(?)
I noticed that every "readable" file (with names and paths) is in the gXXsXXX archives; based on layout.csv, NONE of these files seem to be in the audio archives (aXXsXXX.LANG).
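To sort the files in asset_archive by these naming patterns, a small C# sketch using regular expressions. The patterns are inferred only from the directory listing in the first post (gXXsXXX, aXXsXXX.LANG with two-letter language codes, and the pXXXXXX files, whose purpose isn't established in this thread); `ArchiveNames.Classify` is a hypothetical helper:

```csharp
using System;
using System.Text.RegularExpressions;

static class ArchiveNames
{
    static readonly Regex Game  = new Regex(@"^g\d{2}s\d{3}$");           // e.g. g00s001
    static readonly Regex Audio = new Regex(@"^a\d{2}s\d{3}\.([a-z]{2})$"); // e.g. a00s004.dk
    static readonly Regex PFile = new Regex(@"^p\d{6}$");                 // e.g. p000035

    // Returns "game", "audio:<lang>", "p-archive" or "other".
    public static string Classify(string fileName)
    {
        if (Game.IsMatch(fileName)) return "game";
        Match audio = Audio.Match(fileName);
        if (audio.Success) return "audio:" + audio.Groups[1].Value;
        if (PFile.IsMatch(fileName)) return "p-archive";
        return "other";
    }
}
```

Files such as toc, dag, layout.csv and scan.csv fall into "other".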

., you did a great job on the "edge_of_nowhere.bms" script; unfortunately, it doesn't parse DAG files at all.

Will somebody help me?
I want to extract the files with their correct names.


Posted: Fri Feb 01, 2019 9:30 am

Joined: Mon Jan 28, 2019 9:16 pm
Posts: 7
Small update:
Audio files are listed in the TOC file, and it seems that each entry looks like this:

- ID of the audio archive file?
- Name of the audio file, with a .wem extension



Attachments:
Bez tytułu.png [430.33 KiB]
Posted: Fri Feb 01, 2019 11:14 am

Joined: Mon Jan 28, 2019 9:16 pm
Posts: 7
Okay, I finally uploaded what I call "Help Files", which may help you parse the toc & dag files (archive_input.json, chunkmap.txt, layout.csv, scan.csv):

https://drive.google.com/file/d/1YlZEGj ... sp=sharing

Can anybody help?


Posted: Sat Feb 02, 2019 5:51 pm
Site Admin

Joined: Wed Jul 30, 2014 9:32 pm
Posts: 9764
Just downloaded the samples and tried the script.
The format seems correct, and the script works if you rename toc_rac to toc.
The only change I can make to the script is to avoid opening "TOC" if you have already selected a toc* file; let me know if that helps.

Regarding the filenames, there are no names stored in the archive.


Posted: Sat Feb 02, 2019 7:03 pm

Joined: Mon Jan 28, 2019 9:16 pm
Posts: 7
., filenames and other data are stored in the DAG file, but it seems that not every file has a filename.

Some of the files (like audio files) have a static path, and their filename is basically the built path, stored as an integer in layout.csv.

As you know, the TOC & DAG files are split into parts; your script reads part 0 (archive names), part 2 (file sizes), and part 4 (file offsets & archive).

I realized that part 1 contains the built file names (the ones listed in layout.csv); also, remember the endianness.

Back to the topic: it seems that some file names are generated from these built file names (these files aren't listed in the DAG file, BUT they exist in layout.csv).

Also, in your script you wrote that the DAG file has 1/4 of the file names; I'm wondering why.

It seems that layout.csv is generated FROM these DAG & TOC files.

I know how to parse TOC files (even the built-files part: just read 8 bytes at a time until the end of the part), but I want filenames (they're much better than just xxxxxx.dat, huh?)
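Reading part 1 as described (8 bytes at a time until the end of the part) can be sketched like this in C#. Because the byte order of these values relative to layout.csv isn't pinned down in this thread, the hypothetical `BuiltNames.Read` helper takes a flag so both orders can be tried when matching:

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

static class BuiltNames
{
    // Splits a TOC part into consecutive unsigned 8-byte values.
    // bigEndian selects the byte order; which one matches layout.csv is an open question.
    public static List<ulong> Read(byte[] part, bool bigEndian = false)
    {
        if (part.Length % 8 != 0)
            throw new ArgumentException($"Part length {part.Length} is not a multiple of 8", nameof(part));

        var values = new List<ulong>(part.Length / 8);
        for (int i = 0; i < part.Length; i += 8)
        {
            ReadOnlySpan<byte> entry = part.AsSpan(i, 8);
            values.Add(bigEndian
                ? BinaryPrimitives.ReadUInt64BigEndian(entry)
                : BinaryPrimitives.ReadUInt64LittleEndian(entry));
        }
        return values;
    }
}
```

Each value can then be printed as a decimal integer (`value.ToString()`) and compared with the integers stored in layout.csv, trying both byte orders.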


Powered by phpBB® Forum Software © phpBB Limited