In the most recent phishing attacks, PowerShell is usually employed to load and execute position-independent shellcode via a macro-enabled Office document.

Infection process

So, to know what actions are being carried out, the truly interesting part here is the shellcode being executed. However, to slow down analysis or lower detection rates, the shellcode is usually encoded, with shikata ga nai being the most used encoder (at least in the samples I have observed).

Shikata ga nai

Shikata ga nai is a polymorphic encoder based on a decoder stub. The decoder stub XORs the encoded bytes with an incremental key. Having an incremental key means that after each decoded block (the x86 stub works on 4-byte words) the XOR key is modified with an add instruction. For more information about shikata ga nai you may read this blogpost about the matter.
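To make the incremental key idea concrete, here is a minimal sketch in Python. It is a simplified model and not the real stub: it assumes 4-byte little-endian blocks, assumes the key is updated by adding the decoded word, and pads the input with NOPs. The function names and the sample key are made up for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF

def feedback_xor_encode(data: bytes, key: int) -> bytes:
    """Encode 4-byte blocks with a key that changes after every block."""
    # Pad with NOPs (0x90) so the length is a multiple of 4 (assumption)
    data = data + b"\x90" * (-len(data) % 4)
    out = bytearray()
    for (plain,) in struct.iter_unpack("<I", data):
        out += struct.pack("<I", plain ^ key)
        # Additive feedback: the next key depends on the plain word
        key = (key + plain) & MASK32
    return bytes(out)

def feedback_xor_decode(data: bytes, key: int) -> bytes:
    """Reverse feedback_xor_encode given the same initial key."""
    out = bytearray()
    for (enc,) in struct.iter_unpack("<I", data):
        plain = enc ^ key
        out += struct.pack("<I", plain)
        # Same key update as the encoder, using the word just decoded
        key = (key + plain) & MASK32
    return bytes(out)

payload = b"\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5"
encoded = feedback_xor_encode(payload, 0xDEADBEEF)
decoded = feedback_xor_decode(encoded, 0xDEADBEEF)
print(decoded.startswith(payload))
```

Because every key depends on all previously decoded words, a single static XOR key cannot be applied to the whole blob, which is why letting the stub run is the easier route.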

Generic decoding

A (shikata) encoded shellcode has the following structure: a decoder stub followed by the block of encoded bytes.

The stub modifies the encoded bytes at runtime. So the main idea is to detect modifications to the encoded block at runtime (via emulation or sandboxing) and dump the modified bytes.
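The idea can be sketched without any emulator: given a snapshot of the memory before and after the stub ran, the decoded block is the smallest range that covers every changed byte. This is only an illustration of the principle; the helper name and the sample bytes are hypothetical.

```python
def modified_region(before: bytes, after: bytes):
    """Return (start, end) covering all changed bytes, or None if nothing changed.

    `end` is exclusive, so after[start:end] is the modified block.
    """
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
    if not changed:
        return None
    return changed[0], changed[-1] + 1

# Toy snapshots: a 4-byte "stub" followed by 4 bytes that get rewritten
before = b"STUB" + bytes([0x11, 0x22, 0x33, 0x44])
after = b"STUB" + bytes([0xFC, 0xE8, 0x82, 0x00])

region = modified_region(before, after)
if region is not None:
    start, end = region
    print(after[start:end].hex())
```

A byte that happens to decode to its own encoded value at the edge of the block would be missed by a pure diff, which is one reason to track the writes themselves instead.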

Using unicorn emulator for decoding

To do this we can hook memory writes in Unicorn to detect self-modifying code.

Note: Code is partly based on a public decoder on GitHub which I could no longer find to give proper credit

import sys
import logging

from unicorn import *
from unicorn.x86_const import *
from unicorn.arm_const import *
from unicorn.arm64_const import *
from unicorn.mips_const import *

log = logging.getLogger()

class DecodeEngine:
    def __init__(self, opts=None):
        """
        Initializes the engine. If no options dict is supplied the
        engine will be initialized for x86 code emulation.
        Available options:
        arch: Unicorn architecture (default UC_ARCH_X86)
        mode: Unicorn mode (default UC_MODE_32)
        instr: Unicorn register used as instruction pointer (default UC_X86_REG_EIP)
        stack: Unicorn register used as stack pointer (default UC_X86_REG_ESP)
        debug: Enable debugging (default False)
        Keyword arguments:
        opts -- A dict with options for the engine (optional)
        """
        # Avoid a mutable default argument
        opts = opts or {}
        self.opts = {
            "arch": opts.get("arch", UC_ARCH_X86),
            "mode": opts.get("mode", UC_MODE_32),
            "instr": opts.get("instr", UC_X86_REG_EIP),
            "stack": opts.get("stack", UC_X86_REG_ESP),
            "debug": opts.get("debug", False),
        }

        self.write_bounds = [None, None]

    def decode(self, bin_code):
        """
        Decode an encoded shellcode by using the Unicorn engine for emulation.
        bin_code: (bytes) The raw shellcode to be decoded
        """
        # Emulation memory
        MEM_SIZE = 2 * 1024 * 1024  # 2MB
        # Start base address
        ADDRESS = 0x1000

        emu = Uc(self.opts["arch"], self.opts["mode"])
        emu.opts = self.opts
        emu.mem_map(ADDRESS, MEM_SIZE)
        emu.mem_write(ADDRESS, bin_code)
        # Write INT 3 instructions near the end of the code to force a stop
        emu.mem_write(ADDRESS + len(bin_code) + 0xff, b"\xcc\xcc\xcc\xcc")

        emu.hook_add(UC_HOOK_MEM_INVALID, self.hook_mem_invalid)
        emu.hook_add(UC_HOOK_MEM_WRITE, self.hook_mem_write)
        emu.hook_add(UC_HOOK_INTR, self.hook_intr)

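        # Init stack to half-way the mapped mem (integer division keeps the address an int)
        emu.reg_write(self.opts["stack"], ADDRESS + MEM_SIZE // 2)

        try:
            # The second argument is the address at which emulation stops
            emu.emu_start(ADDRESS, ADDRESS + len(bin_code))
        except UcError as e:
            log.error("ERROR: %s" % e)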

        if self.write_bounds[0] is not None:
            # Read & return the modified code
            return emu.mem_read(self.write_bounds[0],
                            (self.write_bounds[1] - self.write_bounds[0]))
        else:
            log.warning("No self-modifying code detected, could not do anything")
            return None

    ########### HOOKS ############

    def hook_intr(self, uc, intno, user_data):
        """Stop on INT 3 instruction"""
        if intno == 0x3:
            # Request an explicit stop; the return value alone may not halt emulation
            uc.emu_stop()
            return False
        else:
            return True

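    def hook_mem_invalid(self, uc, access, address, size, value, user_data):
        """Print errors for invalid memory accesses"""
        eip = uc.reg_read(uc.opts["instr"])

        # UC_HOOK_MEM_INVALID reports the *_UNMAPPED / *_PROT access types
        if access in (UC_MEM_WRITE_UNMAPPED, UC_MEM_WRITE_PROT):
            print("invalid WRITE of 0x%x at 0x%X, data size = %u, data value = 0x%x" % (address, eip, size, value))
        if access in (UC_MEM_READ_UNMAPPED, UC_MEM_READ_PROT):
            print("invalid READ of 0x%x at 0x%X, data size = %u" % (address, eip, size))

        # Returning False tells Unicorn not to continue after the invalid access
        return False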

    def hook_mem_write(self, uc, access, address, size, value, user_data):
        """Hook memory write instructions to detect self modifying code"""
        # Maximum distance between the instruction pointer and a written address
        MAX_LEN = 0x200  # 512B
        instr_ptr = uc.reg_read(uc.opts["instr"])

        if abs(instr_ptr - address) < MAX_LEN:
            # Exclusive upper bound, so the last written bytes are dumped too
            end = address + size
            if self.write_bounds[0] is None:
                # Initialize bounds to the written range
                self.write_bounds[0] = address
                self.write_bounds[1] = end
            else:
                if address < self.write_bounds[0]:
                    # Expand lower bound
                    self.write_bounds[0] = address
                if end > self.write_bounds[1]:
                    # Expand higher bound
                    self.write_bounds[1] = end

if __name__ == '__main__':
    # sys.argv[1] raises IndexError when missing, so check the length instead
    if len(sys.argv) < 2:
        print("You need to specify a file as first argument")
        sys.exit(1)

    for filename in sys.argv[1:]:
        with open(filename, "rb") as inf:
            bin_code = inf.read()
        decoded = DecodeEngine().decode(bin_code)
        if decoded:
            print("Shellcode decoded!")
            outf = filename + ".dcded"
            with open(outf, "wb") as out:
                out.write(decoded)
            print("Decoded shellcode has been written to %s" % outf)
        else:
            print("Could not decode the file")


We can test it by generating two Metasploit shellcodes, one encoded and one without any encoding.

❯ msfvenom -a x86 --platform Windows -p windows/shell/reverse_tcp -f raw -o reverse.raw 
No encoder or badchars specified, outputting raw payload
Payload size: 333 bytes
Saved as: reverse.raw

❯ yara ~/malware_analysis/yara/metasploit.yar reverse.raw 
meterpreter_reverse_tcp_shellcode reverse.raw
meterpreter_reverse_tcp_shellcode_rev1 reverse.raw

❯ msfvenom -a x86 --platform Windows -p windows/shell/reverse_tcp -e x86/shikata_ga_nai -b '\x00' -i 1 -f raw -o reverse_enc.raw
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 360 (iteration=0)
x86/shikata_ga_nai chosen with final size 360
Payload size: 360 bytes
Saved as: reverse_enc.raw

❯ yara ~/malware_analysis/yara/metasploit.yar reverse_enc.raw

We can see that the unencoded shellcode is detected as meterpreter_reverse_tcp_shellcode by Yara while the encoded one is not. So it looks like a good target to test the decoder on.

scripts❯ python decoder.py ~/reverse_enc.raw   
Shellcode decoded!
Decoded shellcode has been written to /home/fernando/reverse_enc.raw.dcded

scripts❯ yara ~/malware_analysis/yara/metasploit.yar ~/reverse_enc.raw.dcded
meterpreter_reverse_tcp_shellcode /home/fernando/reverse_enc.raw.dcded

Back to being detected!

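We can even have a look at the decoded contents. Note that ndisasm defaults to 16-bit mode, which is why the listings below show 16-bit operands; passing -b 32 gives the proper 32-bit disassembly: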

scripts❯ ndisasm ~/reverse.raw | more
00000000  FC                cld
00000001  E88200            call word 0x86
00000004  0000              add [bx+si],al
00000006  60                pushaw
00000007  89E5              mov bp,sp
00000009  31C0              xor ax,ax
0000000B  648B5030          mov dx,[fs:bx+si+0x30]
0000000F  8B520C            mov dx,[bp+si+0xc]
00000012  8B5214            mov dx,[bp+si+0x14]
00000015  8B7228            mov si,[bp+si+0x28]
00000018  0F                db 0x0f
00000019  B74A              mov bh,0x4a
0000001B  2631FF            es xor di,di
0000001E  AC                lodsb
0000001F  3C61              cmp al,0x61

scripts❯ ndisasm ~/reverse_enc.raw.dcded | more
00000000  0FE2F5            psrad mm6,mm5
00000003  FC                cld
00000004  E88200            call word 0x89
00000007  0000              add [bx+si],al
00000009  60                pushaw
0000000A  89E5              mov bp,sp
0000000C  31C0              xor ax,ax
0000000E  648B5030          mov dx,[fs:bx+si+0x30]
00000012  8B520C            mov dx,[bp+si+0xc]
00000015  8B5214            mov dx,[bp+si+0x14]
00000018  8B7228            mov si,[bp+si+0x28]
0000001B  0F                db 0x0f
0000001C  B74A              mov bh,0x4a
0000001E  2631FF            es xor di,di
00000021  AC                lodsb
00000022  3C61              cmp al,0x61
00000024  7C02              jl 0x28
00000026  2C20              sub al,0x20
00000028  C1CF0D            ror di,byte 0xd
0000002B  01C7              add di,ax
0000002D  E2F2              loop 0x21
0000002F  52                push dx
00000030  57                push di

They are the same except for the first instruction. This is because shikata ga nai also encodes the last instruction (or last bytes) of the stub, usually the loop instruction. This instruction needs to be the first thing decoded for the stub to work properly, so it makes it into the dump as well.
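If a clean dump is needed, the leftover stub bytes can be trimmed by searching for a known prefix of the payload. The sketch below is only an illustration: the helper name is made up, and it assumes you already know how the unencoded payload starts (here the first bytes of the two listings above are used).

```python
def trim_to_signature(dump: bytes, signature: bytes) -> bytes:
    """Drop everything before the first occurrence of `signature` in `dump`.

    Returns the dump unchanged when the signature is not present.
    """
    offset = dump.find(signature)
    if offset == -1:
        return dump
    return dump[offset:]

# First bytes of the decoded dump and of the unencoded payload shown above
dump = bytes.fromhex("0fe2f5fce8820000006089e5")
signature = bytes.fromhex("fce882000000")

clean = trim_to_signature(dump, signature)
print(clean.hex())  # fce8820000006089e5
```

For Yara scanning this step is not required, since the rule matched the untrimmed dump anyway, but it helps when diffing the dump against a reference payload.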