15: Chapter 8 | LAB Exercise Playbook
In this exercise we will focus on call stack analysis and compare the call stacks of all the loaders. We will compare the techniques of direct and indirect syscalls in the context of EDR evasion. We will look at why direct syscalls can be detected by EDRs (depending on the EDR), how indirect syscalls can help in this case, and the limitations of indirect syscalls.
The main part of this exercise is about how EDRs can use or analyse the callstack of a loader, or more precisely a function, to check whether the return address appears to be legitimate or not. In this chapter we will analyse the callstack of each loader (Win32, Native, Direct Syscalls and Indirect Syscalls). You can use Process Hacker to analyse the callstack.
- The tasks in this chapter require you to have completed all the previous chapters, and to use the shellcode loaders you have created.
Task Nr. | Task Description |
---|---|
1 | Run a standard application such as cmd.exe and analyse the call stack. |
2 | Run your win32, native, direct syscall and indirect syscall loaders. Compare their call stacks with each other, and also with cmd.exe's stack. Which do you think has the most legitimate call stack? |
3 | Based on your call stack analysis, why might indirect syscalls help bypass syscall and return address checking EDRs compared to direct syscall loaders? |
4 | Compare the callstack between the native loader and the indirect syscall loader. Could the native loader also be used to bypass EDRs? |
Before we start the call stack analysis exercises, what are the Indicators of Compromise (IOCs) that might help us identify malware in memory, or that might be used by EDR vendors to identify malware? You can use these IOCs as a guide to identify IOCs in your loaders.
- The `syscall` and `return` instructions should always be executed from a memory region in `ntdll.dll`, so that when the shellcode execution is complete, `ntdll.dll` is placed on top of the stack as the last element with the lowest memory address.
- A native function, for example `ZwWaitForSingleObject`, that is executed outside of a memory region in `ntdll.dll` is an IOC. Native functions are part of `ntdll.dll` and should always be executed from memory in `ntdll.dll`.
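
The first IOC boils down to a range check: does a given return address fall inside the mapped image of `ntdll.dll`? The following is a minimal sketch of that check, not the logic of any particular EDR. The `Module` type, the function names and all addresses are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Module:
    name: str
    base: int  # first address of the mapped image
    size: int  # size of the mapped image in bytes

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


def module_for_address(address: int, modules: List[Module]) -> Optional[Module]:
    """Return the module whose mapped range contains the address, if any."""
    for module in modules:
        if module.contains(address):
            return module
    return None


def return_address_in_ntdll(address: int, modules: List[Module]) -> bool:
    """True if the return address lies inside the ntdll.dll image."""
    module = module_for_address(address, modules)
    return module is not None and module.name.lower() == "ntdll.dll"


# Hypothetical module layout; real base addresses differ on every boot.
MODULES = [
    Module("loader.exe", 0x7FF610000000, 0x20000),
    Module("ntdll.dll", 0x7FFB20000000, 0x1F0000),
]

print(return_address_in_ntdll(0x7FFB20001234, MODULES))  # address inside ntdll.dll
print(return_address_in_ntdll(0x7FF610000500, MODULES))  # address inside loader.exe
```

When you inspect a thread in Process Hacker, the module name shown for the top stack frame answers the same question.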
As additional information: this is not directly an IOC in the context of the call stack itself, but in the context of illegitimate memory regions you should also look for unbacked memory regions related to the Meterpreter payload. An unbacked memory region, sometimes referred to as "anonymous memory", is a region of memory that is not associated with a file on disk, such as an executable (.exe) or dynamic link library (.dll). For example, if you look at legitimate memory areas with Process Hacker, you will see that they are of the type "image" and point to the associated image. If you look at a Meterpreter payload in memory, you will see that there are also some memory areas of type "private" that do not point to an image. For example, the 4 kB Meterpreter stager can be identified this way. Executable memory areas of this kind are called "unbacked executable sections" and are usually classified as malicious by EDRs.
Similarly, from an EDR's point of view, it is rather unusual for a thread to have memory areas marked as read (R), write (W) and executable (X) at the same time. By default, the `.text` (code) section of a PE is mapped as readable and executable, but not writable. When using a Meterpreter payload, this no longer holds, because the memory allocated with the Windows API `VirtualAlloc` is additionally marked as writable and executable, or the affected memory area is marked as `RWX` in its entirety (`PAGE_EXECUTE_READWRITE`). See the following section for more details.
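
The two memory-region IOCs described above can be written as simple predicates over a region's type and protection. This is a rough sketch only. The `PAGE_*` and `MEM_*` values are the documented Windows constants, but the `Region` type, the `region_iocs` function and the sample regions are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Documented Windows constants (memory protection and memory type).
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
MEM_PRIVATE = 0x20000
MEM_IMAGE = 0x1000000

ANY_EXECUTE = (PAGE_EXECUTE | PAGE_EXECUTE_READ
               | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    mem_type: int  # MEM_IMAGE, MEM_PRIVATE, ...
    protect: int   # PAGE_* value


def region_iocs(region: Region) -> List[str]:
    """Return the memory-region IOCs that apply to one region."""
    findings = []
    executable = bool(region.protect & ANY_EXECUTE)
    if executable and region.mem_type == MEM_PRIVATE:
        findings.append("unbacked executable memory")
    if region.protect & PAGE_EXECUTE_READWRITE:
        findings.append("RWX protection")
    return findings


# Made-up regions: an image-backed code section, a private heap,
# and a 4 kB private RWX region like the stager described in the text.
TEXT_SECTION = Region(0x7FF610001000, 0x8000, MEM_IMAGE, PAGE_EXECUTE_READ)
HEAP = Region(0x1F0A0100000, 0x10000, MEM_PRIVATE, PAGE_READWRITE)
STAGER = Region(0x1F0A0000000, 0x1000, MEM_PRIVATE, PAGE_EXECUTE_READWRITE)

for region in (TEXT_SECTION, HEAP, STAGER):
    print(hex(region.base), region_iocs(region) or "no IOC")
```

In Process Hacker, the same information is available in the Memory tab through the Type and Protection columns.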
As a first step, we want to compare the call stack of a standard application like `cmd.exe` with the call stack of the Win32 loader. So we need to run an instance of `cmd.exe` and the Win32 loader and look at their call stacks, more specifically at the stack frames from the main function. As mentioned earlier, we use Process Hacker to analyse the call stack. To see how Process Hacker can be used for call stack analysis, check out the detail section below.
You can double-click `cmd.exe` or right-click it and select Properties.
Then we select a thread; again we can double-click, or right-click and select Inspect.
When analysing `cmd.exe` with Process Hacker, we were unable to identify any IOCs. This is expected for a legitimate application, but let's write down our findings anyway.
- No native functions executed outside of `ntdll.dll` memory
- `ntdll.dll` is on top of the call stack, which is an indicator of a legitimate stack
- No unbacked memory regions
- No `RWX` regions in the `.text` section
These results from analysing the default application can be used as a reference or guide when analysing your shellcode loaders.
In this step we want to analyse the call stack of the Win32-API loader and compare it with the call stack of `cmd.exe` from the previous step. Remember that in the Win32-API loader the control flow is `loader.exe` -> `kernel32.dll` -> `kernelbase.dll` -> `ntdll.dll` -> `syscall`. Based on that, what should we expect, and what should the order of the stack frames look like? In the case of the Win32-API loader we want to analyse the main thread `mainCRTStartup`. By analysing the Win32-API loader and comparing it to `cmd.exe`, the following results can be observed.
Results
Due to the technical principle of the Win32-API loader, the call stack, or rather the order of the stack frames, looks legitimate. `ntdll.dll` is placed on top of the stack, which indicates that the `return` instruction is executed from memory of `ntdll.dll`. Also, the Win32 API is executed from memory of `kernel32.dll` or `kernelbase.dll`, and the native function `ZwWaitForSingleObject` is executed from memory of `ntdll.dll`. Both of these observations are indicators of non-malicious behaviour.
From this point of view we could say that this is a stack with high legitimacy that should pass an EDR's `return` address check on the call stack. But don't forget that as soon as an EDR uses user-mode hooking or a similar mechanism to analyse executed code in the context of APIs, and this is more or less always the case today, your Win32-API loader will normally be detected by the EDR.
Looking at the memory regions of the Win32-API loader, things get more interesting. Perhaps not a strong indicator, but still useful: we can identify the Meterpreter payload in memory. The default Meterpreter stager is about 4 kB and the stage loaded afterwards is about 200 kB. By analysing these in-memory regions, we can identify two clear IOCs that point to malicious in-memory behaviour.
- Unbacked memory regions
- `RWX` committed private memory in the `.text` section
In this step we want to analyse the call stack of the NTAPI-Loader and compare it with the call stack of the Win32-API loader from the previous step. Remember that in the NTAPI-Loader the control flow is `loader.exe` -> `ntdll.dll` -> `syscall`. Based on that, what should we expect, and what should the order of the stack frames look like? In this case, too, we want to analyse the main thread `mainCRTStartup`. When analysing the NTAPI-Loader, the following results can be observed.
Results
Comparing the call stack of the NTAPI-Loader with the stack of the Win32-API loader or the default application, the call stack doesn't look totally weird in this case either. In my opinion, a possible IOC could be that `ZwWaitForSingleObject` is executed directly, without first going through the corresponding Win32 API `WaitForSingleObject`. But in general, it's not uncommon for some native Windows functions to be executed directly from `ntdll.dll` memory.
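
The possible IOC described above, a native function called without its Win32 wrapper, can be expressed as a check on the two topmost stack frames. The sketch below is only an illustration of that idea. The frame strings mimic the `module!symbol+offset` notation of Process Hacker, but the offsets and the function name `native_call_without_win32` are invented and not taken from the lab output.

```python
from typing import List, Tuple

WIN32_MODULES = {"kernel32.dll", "kernelbase.dll"}


def split_frame(frame: str) -> Tuple[str, str]:
    """Split 'module!symbol+offset' into a lowercased module and the symbol part."""
    module, _, symbol = frame.partition("!")
    return module.lower(), symbol


def native_call_without_win32(frames: List[str]) -> bool:
    """True if the top frame is an ntdll Zw*/Nt* function whose direct
    caller is not kernel32.dll or kernelbase.dll.

    `frames` is ordered from the top of the stack downwards.
    """
    if len(frames) < 2:
        return False
    top_module, top_symbol = split_frame(frames[0])
    if top_module != "ntdll.dll" or not top_symbol.startswith(("Zw", "Nt")):
        return False
    caller_module, _ = split_frame(frames[1])
    return caller_module not in WIN32_MODULES


# Illustrative stacks (offsets are made up).
WIN32_STACK = [
    "ntdll.dll!ZwWaitForSingleObject+0x14",
    "KernelBase.dll!WaitForSingleObjectEx+0x8e",
    "loader.exe!main+0x1a2",
]
NTAPI_STACK = [
    "ntdll.dll!ZwWaitForSingleObject+0x14",
    "loader.exe!main+0x1a2",
]

print(native_call_without_win32(WIN32_STACK))
print(native_call_without_win32(NTAPI_STACK))
```

As noted in the text, such a rule would have to be limited to specific functions, because many native functions are legitimately called directly.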
From this point of view we could say that this is a stack with high legitimacy that should pass an EDR's `return` address check on the call stack. But in this case, too, don't forget that as soon as an EDR uses user-mode hooking, your NTAPI-Loader will normally be detected by the EDR.
For the NTAPI-Loader, too, we can identify the same memory-region IOCs as with the Win32-API loader. The default Meterpreter stager is about 4 kB and the stage loaded afterwards is about 200 kB. By analysing these in-memory regions, we can identify two clear IOCs that point to malicious in-memory behaviour.
- Unbacked memory regions
- `RWX` committed private memory in the `.text` section
In this step we want to analyse the call stack of the direct syscall loader and compare it with the previous ones. Remember that in the direct syscall loader the whole syscall stub of the native function used is implemented directly in the loader itself, and because of this the control flow is `loader.exe` -> `syscall`. Based on that, what should we expect, and what should the order of the stack frames look like? In this case, too, we want to analyse the main thread `mainCRTStartup`. When analysing the direct syscall loader, the following results can be observed.
Results
Comparing the call stack from the direct syscall loader with the call stack from the Win32-API loader or the NTAPI-Loader, we could observe that the call stack from the direct syscall loader looks totally weird. The following clear IOCs can be observed.
- The `return` from the native function `ZwWaitForSingleObject` is not executed in the memory of `ntdll.dll`; otherwise we would find `ntdll.dll` at the top of the stack, or more precisely the stack frame `ntdll.dll!ZwWaitForSingleObject`. Instead, the `return` comes from a memory region in the assembly (.exe), which is a clear IOC for illegitimate behaviour.
- Furthermore, in the context of `ZwWaitForSingleObject`, we cannot identify a call to the corresponding Win32 API `WaitForSingleObject` before the native function `ZwWaitForSingleObject` is executed.
Based on these IOCs, and depending on the EDR you are facing, your payload will be detected in memory with a very high probability.
As we also use the same x64 staged meterpreter payload for the direct syscall loader, we have the same IOCs in the context of analysing the memory regions.
- Unbacked memory regions
- `RWX` committed private memory in the `.text` section
In this step we want to analyse and compare the call stack of the indirect syscall loader. Remember that in the indirect syscall loader only part of the syscall stub of a native function is implemented directly in the loader itself. The `syscall` instruction is replaced by `jmp qword ptr`, so we jump into memory of `ntdll.dll` and execute the `syscall` and `return` instructions from that memory region. Based on this, what should we expect, and what should the order of the stack frames look like? Again, we want to analyse the main thread `mainCRTStartup`. When analysing the indirect syscall loader, the following results can be observed.
Results
![](https://private-user-images.githubusercontent.com/50073731/246624271-9f2d912a-ddbe-4c8e-af00-f007fc1d202a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjQ0NjcsIm5iZiI6MTczOTIyNDE2NywicGF0aCI6Ii81MDA3MzczMS8yNDY2MjQyNzEtOWYyZDkxMmEtZGRiZS00YzhlLWFmMDAtZjAwN2ZjMWQyMDJhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIxNDkyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk4MmE3ZDgyMzFhZTQ2ZDk3OTRlNjJiZDY2MTExNmVkMzgzODQzZGZlNTlhMmNlZDBlNjJkYWVjNmUzNTk2MTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.J-sELhW1ilglxt0V7NPN-3DhQFAupqegFphtrnyrSzY)
![](https://private-user-images.githubusercontent.com/50073731/246624286-53a8d581-4ae0-41df-ac78-7c88d09aa753.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMjQ0NjcsIm5iZiI6MTczOTIyNDE2NywicGF0aCI6Ii81MDA3MzczMS8yNDY2MjQyODYtNTNhOGQ1ODEtNGFlMC00MWRmLWFjNzgtN2M4OGQwOWFhNzUzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDIxNDkyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU5MzZiY2FmNzc1ZmFhYjQxMDA4MjBmNjA3MDYxYzY1N2MxMmRiMzVjNzkwMGUxMzQ4NTA3ZTk0NWUyODQ2YzkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.0Arm56_tYfRecd02k1GX_WTpB4fXMqF_4tp-LO3ATKA)
If we compare the call stack of the indirect syscall loader with the call stack of the direct syscall loader, we can see that the call stack looks completely different. Furthermore, if we compare the indirect syscall call stack with the legitimate stack of `cmd.exe`, we can see that the stack of the indirect syscall loader has a good level of legitimacy. Compared to the direct syscall loader we got rid of the following IOCs.
- The `return` from the native function `ZwWaitForSingleObject` is executed in the memory of `ntdll.dll`, which puts `ntdll.dll` at the top of the stack and matches the legitimate stack of `cmd.exe`.
This means that by replacing direct syscalls with indirect syscalls, we can successfully spoof the `return` address of a native function that we use in our indirect syscall loader and, depending on the EDR, bypass its `return` address check.
- The `syscall` instruction is executed in memory of `ntdll.dll`, so if an EDR checks from which memory region the `syscall` instruction is executed, this check could be successfully bypassed.
Don't forget that, because of the Meterpreter payload used, you will still be detected by the EDR with high probability, based on the memory region IOCs from before.
- Unbacked memory regions
- `RWX` committed private memory in the `.text` section
Based on the results of our analysis, we could say that indirect syscalls are a good improvement over direct syscalls. However, even indirect syscalls are not a silver bullet for EDR evasion and have their limitations.
- The first limitation is that we are able to spoof the `return` address of a native function, but if an EDR uses full stack analysis, it would probably still be able to identify malicious behaviour.
- Furthermore, by looking at the stack frame order, we should be able to see that the native function `ZwWaitForSingleObject` was executed directly without using the corresponding Win32 API `WaitForSingleObject`. Depending on the API this may not be an IOC, but in the context of e.g. `ZwWaitForSingleObject` it is.
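
To make the first limitation more concrete, here is a rough sketch of the difference between checking only the top return address and walking the full stack. What a real EDR inspects during full stack analysis is not specified here. The heuristic below (flag any return address that does not belong to a loaded module) is just one assumed example, and all names and addresses are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Module(NamedTuple):
    name: str
    base: int
    size: int


def owner(address: int, modules: List[Module]) -> Optional[str]:
    """Name of the module that contains the address, or None if unbacked."""
    for module in modules:
        if module.base <= address < module.base + module.size:
            return module.name
    return None


def analyse_stack(return_addresses: List[int], modules: List[Module]) -> List[str]:
    """Walk every frame (top first) instead of checking only the top one."""
    findings = []
    for depth, address in enumerate(return_addresses):
        name = owner(address, modules)
        if name is None:
            findings.append(f"frame {depth}: return address in unbacked memory")
        elif depth == 0 and name.lower() != "ntdll.dll":
            findings.append(f"frame 0: top of stack is {name}, not ntdll.dll")
    return findings


# Hypothetical layout and a made-up stack whose top frame looks legitimate.
MODULES = [
    Module("loader.exe", 0x7FF610000000, 0x20000),
    Module("ntdll.dll", 0x7FFB20000000, 0x1F0000),
]
SPOOFED_TOP = [0x7FFB2009D144, 0x1F0A0000123, 0x7FF610001500]

print(owner(SPOOFED_TOP[0], MODULES))       # a top-only check sees ntdll.dll
print(analyse_stack(SPOOFED_TOP, MODULES))  # the full walk still reports frame 1
```

In this made-up stack, a check limited to the top frame passes, while the full walk still flags the frame that returns into private memory.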
So indirect syscalls might help to make the call stack a bit more logical or legitimate, but regardless we still have the problem of unbacked memory regions and `RWX` committed memory pages. The latter is more a problem of the Meterpreter payload itself; in this case it might help to switch the memory pages from `RWX` to `RW` or `RX` using the `VirtualProtect` API. The unbacked memory problem is a bit more complicated and cannot be solved directly by using indirect syscalls. To get rid of these unbacked regions you need to use a technique like module stomping. Thanks to @NinjaParanoid, @KlezVirus and @ShitSecure for the great discussion and for teaching me about this.