Docs and Thoughts

LittleChungi edited this page Aug 6, 2022 · 8 revisions

Thoughts

I'm going to put my thoughts here, along with progress notes and additional info about these formats. This library is still a WIP, and, well, it's badly coded; I'm new to this, so you'll see a lot of weird stuff in the code. As for the codecs themselves, I plan to add more, covering all Criware formats from old to new, if I have the resources and the time. I got tired of juggling multiple tools to get things done, so I'm trying to consolidate them in one place using my limited knowledge.

Codecs and Options

ADX

In fact, this is not as complete as I want it: it still only supports ADX version 1 (same as AHX?), but it can encode as well as decode using a C++ extension. I want to add looping support and much more to come, but I mainly made this lib to pair it with the (still incomplete) USM parser, in hopes of getting things tightly integrated.

Now for the codec itself: the header is fairly simple, at least at the bare level, when not supporting any kind of looping.

struct AdxHeader{
    unsigned short signature;
    unsigned short dataoffset;
    unsigned char encoding;
    unsigned char blocksize;
    unsigned char bitdepth;
    unsigned char channelcount;
    unsigned int samplerate;
    unsigned int samplecount;
    unsigned short highpassfrequency;
    unsigned char adxversion;
    unsigned char flags;
    // No looping support yet.
};
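A minimal sketch of reading that header in Python (this mirrors the struct above, not the library's actual API; ADX headers are big-endian, and 0x8000 is the usual signature):

```python
import struct

# Big-endian, 20 bytes total, field-for-field the struct above.
ADX_HEADER = struct.Struct(">HHBBBBIIHBB")

FIELDS = ("signature", "dataoffset", "encoding", "blocksize", "bitdepth",
          "channelcount", "samplerate", "samplecount", "highpassfrequency",
          "adxversion", "flags")

def parse_adx_header(data: bytes) -> dict:
    header = dict(zip(FIELDS, ADX_HEADER.unpack_from(data)))
    if header["signature"] != 0x8000:
        raise ValueError("not an ADX header")
    return header

# A fabricated stereo 48 kHz header, just to exercise the parser.
raw = ADX_HEADER.pack(0x8000, 0x011C, 3, 0x12, 4, 2, 48000, 480000, 500, 4, 0)
header = parse_adx_header(raw)
```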

The code that I shamelessly copied (then modified) from Nyagamon/bnnm's CRID extractor turned out pretty good. It supports encoding for any blocksize and channel count given a correct WAV file (didn't test it), and it also decodes any given blocksize, unlike vgmstream, which surprised me when I noticed it doesn't. But it still doesn't support arbitrary bit depths, so that is on my TODO list.

As for my lib, you can specify some other stuff when encoding.

def encode(self, Blocksize = 0x12, AdxVersion = 0x4, DataOffset = 0x011C) -> bytearray:

As you can see, you can specify Blocksize, AdxVersion (ideally only change it to 0x3 or 0x5, since they seem to behave the same), or DataOffset, which I defaulted to 0x011C just because it's what USMs use.
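For context on what Blocksize controls: as far as I understand, each ADX frame is Blocksize bytes, starting with a 2-byte scale followed by bit-packed samples, so the default 0x12 yields 32 samples per frame at the usual 4-bit depth:

```python
def samples_per_frame(blocksize: int, bitdepth: int = 4) -> int:
    # 2 bytes of scale per frame; the remaining bytes are bit-packed samples.
    return (blocksize - 2) * 8 // bitdepth

print(samples_per_frame(0x12))  # the default 0x12 blocksize -> 32 samples
```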

I would still need to test ADX decoding for other versions and encodings; for now, at least, it works for ADX encoding types 2, 3, and 4. Encoding 2 uses pre-calculated coefficients, encoding 3 is the standard encoding, and encoding 4 uses an exponential scale. Adding more encodings is on my TODO list. The code might crash a lot, possibly due to WAV files having extra data in the header; if it does, report it to me.

CPK

Finished this one recently, but it took so long thanks to building the @UTF chunk parser and standardizing it against Donmai's WannaCri. My code is insanely bad, so I advise you not to look at it if possible.

But CPK extraction and building are now supported for CpkMode 0, 1, 2, and 3 (building for modes 2 and 3 isn't done yet, but will be finished quickly). CPKs are made of "tables" and "contents", where each table has a @UTF chunk inside. There are, up until now, 7 known tables:

class CPKChunkHeaderType(Enum):
    CPK   = b"CPK "
    TOC   = b"TOC "
    ITOC  = b"ITOC"
    ETOC  = b"ETOC"
    GTOC  = b"GTOC"
    HTOC  = b"HTOC"
    HGTOC = b"HGTOC"

and their data is in little endian (contrary to the rest of the Criware formats). The chunk header is 16 bytes long, containing possibly 4 elements:

struct CpkHeader{
    unsigned int header;
    unsigned int unk04;
    unsigned int packet_size;
    unsigned int unk0C;
};
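A little sketch of reading one of those chunk headers in Python, following the struct above (the unknown fields stay unknown; this is not the library's actual code):

```python
import struct

# Little-endian, 16 bytes: 4-byte magic + three uints, per the struct above.
CPK_CHUNK_HEADER = struct.Struct("<4sIII")

def parse_chunk_header(data: bytes):
    magic, unk04, packet_size, unk0C = CPK_CHUNK_HEADER.unpack_from(data)
    return magic, packet_size

# Fabricated example bytes, not from a real file.
raw = struct.pack("<4sIII", b"TOC ", 0xFF, 0x800, 0)
magic, packet_size = parse_chunk_header(raw)
```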

Right after each of those headers follows a @UTF chunk that holds the needed information.

CpkMode 0 is the rarest I could find, used by "Nichijou - Uchujin". It only uses an ITOC table, so files are sorted by size, with no filenames to match.

CpkMode 1 has just a TOC; I found this in one of "Tekken 7"'s CPKs.

CpkMode 2 has both a TOC and an ITOC, with an ETOC at the end as well.

CpkMode 3 includes a TOC, a GTOC, and an ETOC. I still haven't found a game that uses an HTOC or HGTOC; perhaps modes 4 and 5, if they exist.
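To summarize the modes above in one place (this mapping just restates my findings so far; the HTOC/HGTOC modes are speculative and omitted):

```python
# Tables observed per CpkMode so far.
CPK_MODE_TABLES = {
    0: ("ITOC",),                 # no filenames, e.g. "Nichijou - Uchujin"
    1: ("TOC",),                  # e.g. Tekken 7
    2: ("TOC", "ITOC", "ETOC"),
    3: ("TOC", "GTOC", "ETOC"),
}

def has_filenames(mode: int) -> bool:
    # Only archives with a TOC carry filename entries.
    return "TOC" in CPK_MODE_TABLES[mode]
```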

This lib also supports CRILAYLA decompression, taken from code made by tpu (and greatly modified). It does not yet support CRILAYLA compression when building a CPK file.

UTF

This was a pain to code, and it is abysmally bad in my codebase; however, it works. Given a UTF file or UTF chunk bytes, the PyCriCodecs.UTF class will parse it in two different ways: one is used internally by my CPK tools (and needs to be scrapped), and the other is a standardized array of dictionaries used to build custom UTF tables. The standardization comes from WannaCri's lib (although my approach is largely different): a list of dictionaries, where each dictionary maps a key to a tuple of (type, value);

CpkHeader = [
                {
                    "UpdateDateTime": (UTFTypeValues.ullong, 0),
                    "ContentOffset": (UTFTypeValues.ullong, ContentOffset),
                    "ContentSize": (UTFTypeValues.ullong, ContentSize),
                    "TocOffset": (UTFTypeValues.ullong, 0x800),
                    # etc....
                },
                {
                    ....
                }
            ]
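Every entry has to keep that (type, value) tuple shape. Here is a hypothetical sanity checker for hand-written payloads (not part of the library; the placeholder strings stand in for UTFTypeValues members):

```python
def validate_payload(dictarray: list) -> None:
    # Each row must be a dict mapping column names to (type, value) tuples.
    for i, row in enumerate(dictarray):
        if not isinstance(row, dict):
            raise TypeError(f"row {i} is not a dict")
        for key, pair in row.items():
            if not (isinstance(pair, tuple) and len(pair) == 2):
                raise ValueError(f"{key!r} in row {i} must be a (type, value) tuple")

# A fabricated payload using plain strings instead of UTFTypeValues members.
validate_payload([{"TocOffset": ("ullong", 0x800)}])  # passes silently
```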

This format allows us to parse any data into a UTF table, which is useful for modding these files. I use it to build CPK archives, and possibly USM and ACB/AWB files as well. Using it is fairly simple, too:

from PyCriCodecs import *
# Parsing UTF
data = UTF("filename_or_bytes.dat")
data.table # This will give you the internal parsed table that is used by my CPK lib.
payload = data.get_payload() # Returns a list of dictionaries which can be used to build UTF tables; you can modify them as well.
data.encoding # Returns the encoding of the UTF data; it's not specified in the UTF table, so the code will try all 3 possible encodings.
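I won't claim this is exactly how the library's check works, but a plausible sketch of that try-all-encodings heuristic looks like this (hypothetical helper, not the actual implementation):

```python
CANDIDATE_ENCODINGS = ("utf-8", "shift-jis", "utf-16")

def guess_encoding(raw: bytes):
    # Strings in @UTF pools are NUL-terminated; strip the terminator,
    # then accept the first candidate that decodes without errors.
    raw = raw.rstrip(b"\x00")
    for enc in CANDIDATE_ENCODINGS:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

guess_encoding("データ".encode("shift-jis"))  # -> "shift-jis"
```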

You can also use these payloads or make your own to build UTF data as well:

def __init__(self, dictarray: list[dict], encrypt: bool = False, encoding: str = "utf-8", table_name: str = "PyCriCodecs_table") -> None:

As seen above, the UTFBuilder class only needs a list of dictionaries to build any table; ideally you would set your table_name as well. The encrypt option encrypts the UTF, as some games have that feature; it's fairly simple to implement, so why not. The encoding option sets the encoding of the strings in the UTF data, which can be UTF-8, Shift-JIS, or UTF-16; for all three, the code will check whether they produce any null bytes.

utfObj = UTFBuilder(payload, table_name="my_table")
utfObj.parse() # returns a bytearray of the UTF binary data to use.

Closing thoughts.

That's all for now. I will update this wiki with the C++ API usage, as well as any other encodings and whatnot.
