Hey there people!
I've started researching Game Maker Studio's data.win file, and thought I'd share some stuff with you.
Lately I've been into translating Game Maker Studio games, but I quickly realised that it's not an easy task. I still want to keep going, though. Looking around on the Internet I found out that some people have managed to translate Undertale into Spanish. Since Undertale is a GMS game, I wanted to ask the person in charge of the localization how he did it. To my surprise he had written a complete tutorial on it, and I've read it thoroughly.
Sadly I can't test my theories yet, but I'll post whether what I've discovered holds up or not. I'm fairly sure about the things below, though, and someone more talented than me could create a universal program for data.win files (IF I'm not mistaken, that is).
The game I have in mind is Fran Bow, and I'd localize that first. But I'm not looking for a tool for just this one game, I'm looking for a universal one for all GMS games (well, most of them, anyway).
Here's what a data.win file looks like from a translator's point of view, or rather what I need to look for:
1. If I open the file in a hex editor, it starts with the text "FORM". The next 4 bytes store the file's length minus 8 bytes, in reversed byte order (i.e. as a little-endian Uint32). In my case Fran Bow's data.win is 474 042 532 bytes, and the bytes after FORM are "9C 50 41 1C". Reversed, that becomes "1C 41 50 9C", which is 474 042 524 in decimal. So the program could get the file's size in bytes, subtract 8, convert that to hex, reverse the bytes, and overwrite the 4 bytes after FORM with the result. Another way is to go to the very end of the file and step back 8 bytes: the offset we're standing on is the value we need (we still have to reverse it). To see where we're standing, in Hex Workshop put the cursor on a byte and look at the "Caret" field in the bottom right corner of the program. That's the value to reverse and put after FORM.
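For what it's worth, the size fix from step 1 could be sketched in Python roughly like this. It's only a sketch under my assumptions: the "reversed" bytes are treated as a little-endian Uint32, and form_size_field / patch_form_size are made-up helper names, not part of any existing tool.

```python
import struct

def form_size_field(file_size: int) -> bytes:
    """Return the 4 bytes that belong right after "FORM" for a file of this size."""
    # "<I" = little-endian unsigned 32-bit, i.e. the "reversed" byte order
    return struct.pack("<I", file_size - 8)

def patch_form_size(data: bytearray) -> None:
    """Rewrite the size field of an in-memory data.win image in place."""
    if data[:4] != b"FORM":
        raise ValueError("file does not start with FORM")
    data[4:8] = form_size_field(len(data))

# Fran Bow example from above: 474 042 532 bytes on disk
print(form_size_field(474_042_532).hex(" ").upper())  # 9C 50 41 1C
```

Letting struct do the packing means nobody has to reverse the hex digits by hand.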
2. Then I need to look for the STRG section of the file. It has 2 parts: first the pointers, which determine where each string starts, and then the strings themselves.
Each string is also made up of 3 parts:
First there's a 4-byte header that stores the string's length as a Uint32. For strings shorter than 256 bytes only the 1st byte is used and the next 3 bytes are just nulls.
Then comes the actual string, which is exactly as long as the header says.
Finally it closes with a null character, and then the next string follows (again header, actual string and a null character).
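Here's a small Python sketch of reading one such string record. The assumptions are mine: the header is a little-endian Uint32, the text is UTF-8 (I haven't confirmed the encoding), and read_string is just an illustrative name.

```python
import struct

def read_string(data: bytes, offset: int) -> tuple:
    """Decode the string record starting at `offset`.

    Returns (text, offset_of_next_record).
    """
    (length,) = struct.unpack_from("<I", data, offset)  # 4-byte length header
    start = offset + 4
    raw = data[start:start + length]                     # the string itself
    if data[start + length] != 0:                        # the closing null
        raise ValueError("missing null terminator")
    return raw.decode("utf-8"), start + length + 1       # UTF-8 is an assumption

# Two made-up records, back to back
record = b"\x05\x00\x00\x00Hello\x00" + b"\x02\x00\x00\x00Hi\x00"
text, nxt = read_string(record, 0)
print(text, nxt)  # Hello 10
```

Calling it again with the returned offset walks on to the next record.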
Okay, but there's a catch: The pointers.
After the "STRG" marker (which you can search for just like "FORM", although that one is at the very beginning of the file) there are lots of bytes.
I think we don't need to edit the first 8 bytes after STRG, just the ones after it.
Every 4 bytes store the offset at which one of our strings starts. There's no separator between them, they come right after each other. In my case the first header byte of the very first string is at offset "02 06 A0 08" (according to the Caret field). So starting from the 9th byte after STRG I need to write the REVERSE of this, "08 A0 06 02", then take the offset of the next string's first header byte, write that into the next 4 bytes, and so on.
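Writing the pointers could look something like this in Python. Again just a sketch: pack_pointers and unpack_pointers are hypothetical helper names, and I'm assuming the offsets are plain little-endian Uint32 values.

```python
import struct

def pack_pointers(offsets):
    """Turn a list of absolute file offsets into the back-to-back pointer table."""
    return b"".join(struct.pack("<I", off) for off in offsets)

def unpack_pointers(table: bytes):
    """The inverse: split a pointer table back into a list of offsets."""
    return [off for (off,) in struct.iter_unpack("<I", table)]

# Offset of the first string's header from the example above
table = pack_pointers([0x0206A008])
print(table.hex(" ").upper())  # 08 A0 06 02
```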
So the program should determine how many strings the game has (based on the number of pointers), and generate the strings accordingly.
It inserts the strings into the game, then the pointers to them. Note that the STRG section can end with lots of null characters, and then a new section called TXTR (I assume this means Texture) begins. We don't need to worry about that one. We just need to edit the 4 bytes after FORM, the pointer bytes (starting from the 9th byte after STRG), and then the actual strings.
The only thing I don't understand yet (and I think we don't even need to edit it) is what those 8 bytes after STRG mean. The number of strings in the game? The whole length of the section? It could be something like that, because right where the last pointer ends (4 bytes, like I said before) a string begins (with header, string and null). So nothing marks where the pointers end and the strings start. It has to be in the first 8 bytes after STRG...
Okay, I know what it is (or at least I assume so): the first 8 bytes are actually two separate 4-byte Uint32 values.
The first 4-byte value is the overall length of the STRG section, counted from right after STRG's G to the very last byte before TXTR (so before the T), minus 4. I selected all of the bytes after STRG's G (so from the 5th byte, S being the first) allllll the way to TXTR, and the size I got is 2D 1F DC. I noticed that this is almost exactly what the 4 bytes after STRG say, except that those also contain a null. That null is presumably not a separator at all, just the unused top byte of the 4-byte value.
After that null there's another Uint32 value, which says 63319. This is the number of strings (and pointers!!!) in the file. It's also a 4-byte value: 63319 only needs 2 bytes, so the other 2 are nulls. Right after it the very first pointer starts.
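Reading those two header values could be sketched like this in Python. The function name read_strg_header is made up, and the search for "STRG" is deliberately naive (same as searching in a hex editor), so it would be fooled if those 4 letters happened to appear earlier in the file.

```python
import struct

def read_strg_header(data: bytes):
    """Locate "STRG" and return (marker_offset, section_length, string_count)."""
    pos = data.find(b"STRG")  # naive search, first hit wins
    if pos < 0:
        raise ValueError("no STRG section found")
    # Two little-endian Uint32 values right after the 4-letter marker
    length, count = struct.unpack_from("<II", data, pos + 4)
    return pos, length, count

# Why 63319 shows up with 2 nulls in the hex editor:
print(struct.pack("<I", 63319).hex(" ").upper())  # 57 F7 00 00
```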
That's all I could come up with for now, but I'll continue my research tomorrow with live tests. If you guys have any ideas on this, or you could create a program that does all of this (and not just for Fran Bow, but for all GMS games with data.win files), then I think we've just made something big.
To sum up: data.win starts with FORM, and the 4 bytes after FORM store the file's overall length minus 8 (as a Uint32).
Somewhere in the file there's the STRG section, followed by the TXTR section.
The first 4 bytes after STRG store the overall length of this section as a Uint32, counted from the 5th byte (counting from STRG's S) to the last byte before TXTR's T, minus 4.
The next 4 bytes store how many strings there are.
Immediately after these 4 bytes the pointers start. These are 4-byte values holding the offset of a string's beginning in reversed byte order (so if a string is at "00 01 AC 3B", its pointer is written as "3B AC 01 00"). The pointers come one after another, with no separator between them.
After every pointer has been inserted, a null character is written as a separator, then the strings begin.
A string has 3 parts: the first 4 bytes store its length, the second part is the string itself, and it ends with a null character. After that the next string follows, and so on until the last one. Note that because STRG's first header value says how long the section is, the section has to keep that length, but we can fill up the end with null characters if we need to. For example, if the original 5412 strings took up 32010 bytes and ours only take up 32000, we can insert 10 null characters at the end before TXTR, so that the whole section is as long as its header says.
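Putting the summary together, rebuilding a STRG section with null padding might look roughly like this in Python. This is untested against a real game and rests on several assumptions of mine: all values are little-endian Uint32, text is UTF-8, pointers are absolute file offsets, and the first string record starts directly after the last pointer (if there really is a separator byte there, the starting cursor needs to be one higher). build_strg and its parameters are made-up names.

```python
import struct

def build_strg(strings, chunk_offset, target_length=None):
    """Build a whole STRG section: "STRG" + length + count + pointers + records.

    chunk_offset  : absolute file offset where the "S" of STRG will sit
    target_length : original value of the length field; if given, pad with nulls up to it
    """
    records = [struct.pack("<I", len(raw)) + raw + b"\x00"
               for raw in (s.encode("utf-8") for s in strings)]
    # First record starts after "STRG"(4) + length(4) + count(4) + one pointer per string
    cursor = chunk_offset + 12 + 4 * len(records)
    pointers = []
    for rec in records:
        pointers.append(cursor)  # absolute offset of this record's length header
        cursor += len(rec)
    body = struct.pack("<I", len(records))
    body += b"".join(struct.pack("<I", p) for p in pointers)
    body += b"".join(records)
    if target_length is not None:
        if len(body) > target_length:
            raise ValueError("new strings do not fit in the original section")
        body += bytes(target_length - len(body))  # null padding before TXTR
    return b"STRG" + struct.pack("<I", len(body)) + body

section = build_strg(["Hi", "Bye"], chunk_offset=0x100, target_length=32)
print(len(section))  # 40
```

Passing the original length as target_length keeps the section the same size, so nothing after it (TXTR and the FORM size) has to move.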
Now I'm off to sleep
Added images here to demonstrate what I'm saying.