[Proposal] Features to get maximum possible encode length of given mnemonic(or +given operands) #548
There is no simple way to determine the instruction length, so it would be almost as complex as encoding it fully. Why do you have to pre-allocate per instruction at all? Just have 15 bytes on the stack and then copy the result into your real buffer.
Well, even the "max possible size" of a given mnemonic, as a database, is not going to be easy? I am writing a recompiler. Because I don't know the max size of each instruction, I just preallocate 15 bytes per instruction and then shrink hierarchically by actually trying to assemble. It would have been fantastic if there were a database like the above; this is just a feature proposal. By the way, I am testing the recompiler with an Unreal Engine exe, which is almost 100 MB. Without the ability to determine a reasonable max size for instructions, a 'shrink' algorithm (by actually attempting the encoding) was essential to prevent the recompiler's output code from being prohibitively large.
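A minimal sketch of that shrink loop, under the assumption that the recompiler exposes some `encodeAt()` helper that reports the real size of an instruction once its address is known; the `Instr` type and the stub body are placeholders, not Zydis APIs:

```cpp
#include <cstdint>
#include <vector>

// Placeholder instruction record: every slot starts at the 15-byte worst case.
struct Instr { uint8_t size = 15; };

// Placeholder for the recompiler's real encoder; it would re-encode the
// instruction at the given address and report the actual byte count.
uint8_t encodeAt(const Instr&, uint64_t /*address*/) { return 5; /* stub */ }

void shrink(std::vector<Instr>& instrs) {
    bool changed = true;
    while (changed) {                      // repeat until the layout is stable
        changed = false;
        uint64_t address = 0;
        for (Instr& ins : instrs) {
            const uint8_t newSize = encodeAt(ins, address);
            if (newSize != ins.size) { ins.size = newSize; changed = true; }
            address += ins.size;           // later instructions shift as earlier ones shrink
        }
    }
}
```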
It's important to understand why that statement is true. After figuring out that the instruction is possible to encode, finding the definition that matches the operands, validating that the operands obey all the rules imposed by that definition, and computing all the necessary parts of the instruction (ModR/M, SIB, prefixes, etc.), you are basically done encoding the whole thing. The instruction length only becomes known at the very end of the process. If you really need the length, you can just encode and discard the output.
I don't see how that's essential at all; it seems rather unconventional. If you have 100 MB of instructions to recompile, you can start with the assumption that you will generate around 100 MB of code. That would be your initial buffer allocation (later you can add N% to this based on empirical observations). You keep encoding instructions, append them to this buffer, and simply grow the buffer if it ever runs out of space. In other words, you are working with a std::vector: a structure that grows when necessary rather than one that shrinks.
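A minimal sketch of that grow-on-demand pattern, assuming some per-instruction `encodeOne()` helper that returns at most 15 bytes; the helper and its stub body are placeholders, not part of Zydis:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Encoded { uint8_t size; uint8_t bytes[15]; };

// Placeholder encoder: a real recompiler would encode the i-th instruction here.
Encoded encodeOne(std::size_t /*i*/) { return { 1, { 0x90 } }; }  // stub: emits a NOP

std::vector<uint8_t> recompile(std::size_t inputBytes, std::size_t instrCount) {
    std::vector<uint8_t> out;
    out.reserve(inputBytes + inputBytes / 10);  // initial guess: input size + 10%
    for (std::size_t i = 0; i < instrCount; ++i) {
        const Encoded e = encodeOne(i);
        out.insert(out.end(), e.bytes, e.bytes + e.size);  // vector grows if the guess was short
    }
    return out;
}
```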
Feature Proposal: Maximum Instruction Size Lookup for ZydisEncoder

First, I want to express my appreciation for Zydis. The library's exceptional code quality has saved us significant development time, and its completeness is evident: it is the only external dependency in our project.

This is a feature proposal aimed at enhancing ZydisEncoder for compiler-frontend use cases, particularly when encoding multiple functions. While I've already implemented a workaround for my specific needs, I believe this feature could benefit others who want to compile multiple functions at once with ZydisEncoder, especially in parallel compilation scenarios.

Current Challenge:
Proposed Solution:
I acknowledge that implementing this may be more complex than initially apparent, given all the instruction-encoding steps you described (checking encodability, operand validation, ModR/M/SIB/prefix computation, etc.). However, precomputing those and building a table of sizes sounds possible to me.

Current Workaround: There are of course various tricks and workarounds, but it would be more convenient if we had such a precomputed table.

Common value of this feature:
For example: I gave up on building this manual table and just went with a sequential compile plus multiple shrink passes, lol
Why can't you first encode it into a stack buffer and then expand your function/global buffer? A memcpy of 16 bytes is fairly cheap: it is most likely two mov instructions, but it can also be done with just one. In one of my projects I have the local buffer defined as:

```cpp
struct EncodeResult {
    uint8_t size;
    uint8_t bytes[15];
};
static_assert(sizeof(EncodeResult) == 16);
```

So with that I just do the following:

```cpp
const auto encodedInfo = encodeInstr(...);
finalBuffer.insert(finalBuffer.end(), std::begin(encodedInfo.bytes), std::begin(encodedInfo.bytes) + encodedInfo.size);
```

Also, perhaps you might want to have a look at https://github.com/zyantific/zasm. I used it for a binary rewriter as well, so I know it works fairly well.
Thank you for the attention. Let me provide a more concrete example case:
## Core Problem Visualization
```nasm
Block0:
push reg1
push reg2
sub rsp, const1 ; size uncertainty
jmp Block1 ; Size uncertainty
Block1:
jmp Block2
Block2:
...
```

## First Pass (Naive Parallel)
Maximum allocation approach enables parallelism with space overhead:
```nasm
Block0:
nop * 15 ; Space for push reg1
nop * 15 ; Space for push reg2
nop * 15 ; Space for sub rsp, const1
nop * 15 ; Space for jmp Block1
; Pre-determined addresses enable parallel compilation
Block1:
...
Block2:
...
```
Key Points:

## Second Pass (Sequential Shrinking)
```nasm
Block0:
push reg1 ; Size resolved through encoding
push reg2 ; Size resolved
sub rsp, const1 ; Size determined by const1
nop * 15 ; RIP-relative uncertainty remains
Block1:
...
```
Challenge:

## Third Pass & Loop Shrink
```nasm
Block0:
push rbp ; Known size
push r15 ; Known size
sub rsp, 28 ; Short sub
jmp block2 ; 2-byte short jmp possible
Block2:
...
```
Iterative Approach:

## Ideal First Pass (With Size Estimation)
Efficient single-pass approach using size prediction:
```nasm
Block0:
nop * 2-3 ; push reg1 size predictable
nop * 2-3 ; push reg2 size predictable
sub rsp, const1 ; 4-byte immediate assumed
jmp Block1 ; 5 bytes allocated (near jump)
Block1:
...
```
Key Benefits:
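To make the "ideal first pass" concrete, here is a minimal sketch of computing block start addresses from per-instruction upper bounds; `upperBound()` is an assumed user-supplied estimator (falling back to the architectural 15-byte maximum is always safe), not an existing Zydis API:

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxInstrLen = 15;  // architectural worst case

// Assumed estimator: a tighter per-mnemonic bound would go here.
std::size_t upperBound(/* mnemonic / operands */) { return kMaxInstrLen; }

// instrsPerBlock[b] = number of instructions in block b.
std::vector<std::size_t> blockOffsets(const std::vector<std::size_t>& instrsPerBlock) {
    std::vector<std::size_t> offsets(instrsPerBlock.size());
    std::size_t cursor = 0;
    for (std::size_t b = 0; b < instrsPerBlock.size(); ++b) {
        offsets[b] = cursor;                        // block address known before any encoding
        cursor += instrsPerBlock[b] * upperBound(); // reserve worst-case space, pad the rest with NOPs
    }
    return offsets;  // blocks can now be encoded independently / in parallel
}
```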
So I wrote this small test to see what the cost of encoding 1,000,000 instructions is with zasm:

```cpp
TEST(SerializationTests, Encode1MillionRandom)
{
    Program program(zasm::MachineMode::AMD64);
    x86::Assembler assembler(program);
    const auto count = 1'000'000;
    for (int64_t i = 0; i < count; ++i)
    {
        const auto& instr = tests::data::Instructions[i % std::size(tests::data::Instructions)];
        instr.emitter(assembler);
    }
    zasm::Serializer serializer{};
    ASSERT_EQ(serializer.serialize(program, 0x140015000), ErrorCode::None);
}
```

This runs in under 500 ms and peak memory usage is 312,668 K (301.4 MiB); I would say that is fairly efficient already. Like before, getting the instruction length is not trivial, and since you have to consider all the corner cases you will end up with the same complexity as just encoding it. Anyway, if you just need the instruction size you could do the following:

```cpp
ZyanUSize ZydisEncoderGetInstructionSize(ZydisEncoderRequest* req, ZyanU64 address)
{
    ZyanU8 buf[15];
    ZyanUSize len = sizeof(buf);
    ZyanStatus res = ZydisEncoderEncodeInstructionAbsolute(req, buf, &len, address);
    if (res != ZYAN_STATUS_SUCCESS)
    {
        return 0;
    }
    return len;
}
```

This is probably how the API would work if Zydis were ever to add something like this; it would have at least near-identical complexity minus the writes to the buffer, which are trivial already.
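For illustration, a hedged usage sketch of the helper above; the mov-with-64-bit-immediate request and the runtime address below are arbitrary example values, and the request fields are the standard Zydis 4 encoder fields:

```cpp
#include <Zydis/Zydis.h>
#include <cstdio>
#include <cstring>

// Defined in the comment above.
ZyanUSize ZydisEncoderGetInstructionSize(ZydisEncoderRequest* req, ZyanU64 address);

int main()
{
    ZydisEncoderRequest req;
    std::memset(&req, 0, sizeof(req));
    req.machine_mode = ZYDIS_MACHINE_MODE_LONG_64;
    req.mnemonic = ZYDIS_MNEMONIC_MOV;
    req.operand_count = 2;
    req.operands[0].type = ZYDIS_OPERAND_TYPE_REGISTER;
    req.operands[0].reg.value = ZYDIS_REGISTER_RAX;
    req.operands[1].type = ZYDIS_OPERAND_TYPE_IMMEDIATE;
    req.operands[1].imm.u = 0x1122334455667788;
    std::printf("encoded size: %u bytes\n",
                (unsigned)ZydisEncoderGetInstructionSize(&req, 0x140015000));
}
```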
I still want to argue that this feature would be valuable:
Other than that, I came up with a new workaround:
About Zydis:
Currently I am preallocating 15 bytes (the absolute maximum size) per instruction for recompilation.
However, most instructions are not that large, and the preallocation ends up far bigger than it needs to be.
I tried ZydisGetEncodableInstructions, but there was no straightforward length information, and it seems the Zydis encoder is designed to figure out the length by actually performing the encoding.
If there were a ZydisEncoderGetMaxLengthForMnemonic() or ZydisEncoderGetMaxLength(mnem, operands), it would be very helpful.