ZenHAX

All times are UTC




Posted: Sat Mar 04, 2023 10:12 am

Joined: Sat Mar 04, 2023 8:41 am
Posts: 6
Hello. I've been rummaging through Metal Gear Solid games to publish complete, isolated collections of voiced audio. I'm interested in extracting the dialogue from this game's cutscenes, as it seems to be stored separately from the accompanying music in the game's files. I popped open an ISO of the game, and the material I'm looking for was in a folder called oss. It seems to contain both cutscene music and cutscene dialogue. The music is helpfully labeled bgm, while the voiced dialogue seems to be ordered chronologically, both of which are a big help for what I want to do.

But this file format is giving me trouble. After looking at the files in the HxD hex editor, I started by deleting all the data before the header containing RIFF...WAVE, and then saving the file with an .at3 extension. I was just guessing at that format, since it's common for PSP games. From there I opened it up in foobar and got an audible, convertible soundbite. But it was only 1-2 seconds of material from each sep file. Taking another look at the data, I found that the RIFF header repeated several times as I scrolled down the page. It occurred to me that these must be container files, with the audio files stacked in a "tower" one after another. When I deleted everything up to that second header, sure enough I got the next soundbite in that character's sentence.

Here's one of the .sep files unedited: https://drive.google.com/file/d/1VIpTcP ... sp=sharing

By putting the first two RIFF headers at the top and converting them separately, I got the soundbites "Don't worry" and "I'm not here to give you a new mission". They correspond with a line of dialogue in the game's first cutscene, timestamped in this link: https://youtu.be/5ljUp_4Crf0?t=83

So does anyone have any advice on how to extract all the files from this container type? Theoretically I could go down the list of each file container grabbing one 1-2 second piece of audio at a time, but that would take many hours of monotonous work. Maybe a script could be written to do it? I'll confess I'm a total amateur and have only gotten this far via googling odd things. Any answers you could provide in plain terminology would be very helpful to me.
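
The manual carving described above can be automated. Here is a minimal Python sketch, assuming each embedded file starts with a standard RIFF header whose 4-byte little-endian size field (bytes 4 to 7) is accurate; the function name `carve_riff_chunks` is made up for illustration and nothing here is specific to the .sep format itself.

```python
import struct


def carve_riff_chunks(data):
    """Return every embedded RIFF/WAVE blob found in `data`, in file order."""
    chunks = []
    pos = 0
    while True:
        pos = data.find(b"RIFF", pos)
        if pos < 0 or pos + 12 > len(data):
            break
        # The RIFF size field counts everything after the first 8 bytes.
        (riff_size,) = struct.unpack_from("<I", data, pos + 4)
        end = pos + 8 + riff_size
        if data[pos + 8:pos + 12] == b"WAVE" and end <= len(data):
            chunks.append(data[pos:end])
            pos = end
        else:
            # Not a usable header (assumption: treat it as a false hit).
            pos += 4
    return chunks
```

Each returned blob could then be written to its own .at3 file, for example with `open("000.at3", "wb").write(chunks[0])`. If the size fields in these files turn out to be unreliable, the scan would have to cut at the next RIFF marker instead.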


Posted: Sat Mar 04, 2023 11:40 am

Joined: Fri Mar 10, 2017 7:23 am
Posts: 396
Here's a BMS script to extract your sample:

Code:
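# Magic bytes at the start of the sample file
IDstring "SP{\x0"
# Header fields: SEP_SZ (presumably the container size),
# the offset of the entry table, and the offset of the data area
get SEP_SZ long
get INFO_OFF long
get START_OFF long
goto INFO_OFF

for i = 0
   # The entry table ends where the data area begins
   savepos INFO_OFF
   if INFO_OFF == START_OFF
      break
   endif
   # One 32-byte table entry; only OFFSET is used
   get UNK long
   get UNK2 long
   get OFFSET long
   get ZERO long
   get ZERO longlong
   get UNK3 long
   get UNK4 long
   savepos TMP
   # 0xffffffff marks an entry that is skipped
   if OFFSET == 0xffffffff
      continue
   else
      # OFFSET is relative to the data area; a 4-byte size precedes the audio
      math OFFSET + START_OFF
      goto OFFSET
      get SIZE long
      savepos OFFSET
      string NAME p "%03i.at3" i
      log NAME OFFSET SIZE
      # Go back to the next table entry
      goto TMP
   endif
next i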


Posted: Sat Mar 04, 2023 8:10 pm

Joined: Sat Mar 04, 2023 8:41 am
Posts: 6
BloodRaynare wrote:
Here's BMS script to extract your sample:


I pasted that into a txt document and renamed the file extension to bms so that I could select it in the QuickBMS GUI. It extracted 22 at3 files from the sep file I provided, which I can convert to .wav in Foobar. This is exactly what I need.

Unfortunately it won't work with the other sep files. Maybe it's because there's a difference in the text beyond "SP" at the start. Is there any way to get code that satisfies all of these files? If not, how would I manually adjust the script to account for the differences in each one? Here's all of them so you can see:

https://drive.google.com/file/d/15uao_- ... sp=sharing


Posted: Thu Mar 09, 2023 12:14 am

Joined: Sat Mar 04, 2023 8:41 am
Posts: 6
BloodRaynare wrote:
Here's BMS script to extract your sample:



Looks like I flubbed the quote system and it didn't send a notification of my reply. But I tried changing the top line of the code (the IDstring portion before "\x0") and was able to extract from more sep files just like the example. s_01_01b seems to contain all the files you could get from the others that begin with s_01_01: it's every voice file in the game's first cutscene. All I had to do was match the top line with what HxD showed at the start of the file.

However this method doesn't seem to work on all of them because I can't interpret what their hex code is. HxD just calls it "...". I consulted a dictionary like this one and copied the result into the BMS script: http://www.unit-conversion.info/texttools/hexadecimal/

But it still returned a discrepancy when running the bms script. Almost like my computer can't "write" the appropriate symbol unless the result is an alphabetical letter. Any advice on how to interpret files like oss_04_06_02 correctly?

I'm also curious if you could have multiple ID strings in a bms script so that one .bms file could satisfy every file. I don't mind adjusting the script for each file, but it would sure be more convenient for somebody following in my footsteps.
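
On the "..." problem: HxD prints a dot for any byte that has no printable character, so the real value is only visible in the hex column. The sketch below shows one way to turn the first bytes of a file into the same `\x`-style notation the script's top line already uses. The helper name `magic_as_escapes` is made up for illustration, and whether QuickBMS accepts every such escape inside IDstring is an assumption, not something confirmed here.

```python
def magic_as_escapes(data, count=4):
    """Render the first `count` bytes as text, using \\xNN for unprintable bytes."""
    parts = []
    for b in data[:count]:
        # Keep plain printable ASCII, but escape quotes and backslashes too.
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            parts.append(chr(b))
        else:
            parts.append("\\x%02x" % b)
    return "".join(parts)
```

For example, `magic_as_escapes(open("some_file.sep", "rb").read(4))` (with a hypothetical file name) returns a string that could be pasted between the quotes of the IDstring line.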


Posted: Thu Mar 09, 2023 1:41 am

Joined: Fri Mar 10, 2017 7:23 am
Posts: 396
ZappBranniglenn wrote:

Any advice on how to interpret files like oss_04_06_02 correctly?

I'm also curious if you could have multiple ID strings in a bms script so that one .bms file could satisfy every file.


You need to change the IDstring value to just "SP" and then add "get ID short", like this:

Code:
IDstring "SP"
get ID short
...


However, there's another problem. Some of the SEP files combine Sony ADPCM audio data with the AT3 files themselves, and those files have a very confusing offset calculation. That's why the script fails once it reaches those kinds of files. I'm still working on a fix for the script.
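
For readers who prefer Python, the same idea (accept any file that starts with "SP", ignore the two bytes after it, then walk the entry table) can be sketched as follows. The layout is taken only from the reads in the BMS script in this thread, the fields are assumed to be little-endian, and the names `extract_sep` and `ENTRY_SIZE` are made up for illustration. Like the BMS script, this sketch does not handle the mixed ADPCM files.

```python
import struct

# 4+4+4+4+8+4+4 bytes, matching the reads in the BMS loop
ENTRY_SIZE = 32


def extract_sep(data):
    """Return a list of (name, payload) pairs from one SEP container."""
    if data[:2] != b"SP":
        raise ValueError("file does not start with 'SP'")
    # Bytes 2..3 differ between files (the "get ID short" step), so skip them.
    sep_size, info_off, start_off = struct.unpack_from("<III", data, 4)
    files = []
    index = 0
    pos = info_off
    # The entry table runs up to the start of the data area.
    while pos + ENTRY_SIZE <= start_off:
        (offset,) = struct.unpack_from("<I", data, pos + 8)
        if offset != 0xFFFFFFFF:  # 0xFFFFFFFF entries are skipped
            size_pos = start_off + offset
            (size,) = struct.unpack_from("<I", data, size_pos)
            payload = data[size_pos + 4:size_pos + 4 + size]
            files.append(("%03d.at3" % index, payload))
        pos += ENTRY_SIZE
        index += 1
    return files
```

Because the check only looks at the first two bytes, one script can be run over a whole folder of .sep files without editing it per file.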


Posted: Sun Mar 12, 2023 6:43 am

Joined: Sat Mar 04, 2023 8:41 am
Posts: 6
BloodRaynare wrote:

You need to change the IDString value to just "SP" then add "get ID short".


Now we're rolling! I've spent a few hours extracting from oss_s01_b.sep down to oss_s19_01tel.sep. It's got the cutscene dialogue that I want. It appears you may be right though: after cataloguing the files by scene, I think some are being missed. It's not a lot. I've looked at about half the cutscenes of the game (they are organized chronologically) and I'm going to guess the rate of missing dialogue is about 20%. Some cutscenes just don't have an associated sep file, while other scenes are just missing a handful of dialogue pieces. If they have anything in common, it seems to be the more action-y sequences, where dialogue, music, and sounds of battle are ramping up. In previous MGS games, some story-prominent codec calls would have the music baked into the dialogue, so maybe this is Portable Ops' version of that.

Of course there is a possibility that some of the files I've yet to look at are numbered out of order and contain what is missing. Or maybe the script works on the files with "bgm" or "se" in their name and the missing content is contained somewhere within them. I'm crossing my fingers as I keep looking.

As for why the script is failing, I wish I had an answer. I can only offer a potential clue. I've seen .sep files referenced among PS1 games as containers for "seq" files, including Konami games like MGS1 and the PSX port of Snatcher. Maybe my assumption that these are meant to be Sony ATRAC3 format was off base? I wouldn't know how to tell what a file is supposed to be just by looking at it, but this webpage talks about headers: https://loveemu.hatenablog.com/entry/20 ... SEQ_Format
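
One way to tell what an extracted file claims to be is to read the format tag from its RIFF "fmt " chunk. The sketch below does only that. The tag values in `KNOWN_TAGS` are the commonly documented WAVE format tags (0x0270 is the one usually associated with ATRAC3), but treat the table as an assumption to verify against your own files; the function names are made up for illustration.

```python
import struct

# Commonly documented WAVE format tags (assumption: verify against real files)
KNOWN_TAGS = {
    0x0001: "PCM",
    0x0270: "ATRAC3",
    0xFFFE: "WAVE_FORMAT_EXTENSIBLE",
}


def wave_format_tag(data):
    """Return the 16-bit format tag of a RIFF/WAVE blob, or None if not found."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt " and pos + 10 <= len(data):
            return struct.unpack_from("<H", data, pos + 8)[0]
        # RIFF chunks are padded to an even length.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None


def describe(data):
    """Return a short human-readable guess at the audio format."""
    tag = wave_format_tag(data)
    if tag is None:
        return "not a RIFF/WAVE file"
    return KNOWN_TAGS.get(tag, "unknown tag 0x%04X" % tag)
```

A file that does not begin with RIFF...WAVE at all comes back as "not a RIFF/WAVE file", which would be a hint that it holds some other kind of data.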


Posted: Sun Mar 19, 2023 4:31 am

Joined: Sat Mar 04, 2023 8:41 am
Posts: 6
After running the script on all of them, I'd say the collection rate of cutscene dialogue was about 70%. Interestingly, one small piece of dialogue was hidden within one of the sound effect sep containers, but it's just the one instance of that. Probably just a genuine mistake on the part of the audio engineers. The script only seemed to fail when running through the sound effects, se_se_04_06_05 for instance. Is that where our problem audio files are? Many other sound effects came out as "unsupported format" in foobar.

I'm just trying to get a better idea of what the issue is, as I'm not sure where to even look for the problem. My goal is still the dialogue, but I'm expecting the remaining missing dialogue to have been mixed with associated music or possibly sound effects. They did that with previous MGS games. Usually those files don't cause an issue, but MGS1's have proven to be a problem to this day, never quite sounding right after extraction. Even if the missing dialogue of these files comes out sounding the same as what you could get recording off game footage, I'd still like to get it for the sake of completeness.

