SIMPLE IS BETTER: Monitor/Trap Software Interrupt INT 80h (System Call) with x86/Intel Virtualization Technology

On Unix-like systems, before x86 processors introduced the SYSCALL/SYSRET and SYSENTER/SYSEXIT instructions, the software interrupt "INT 80h" served as the system call interface. Unlike my previous post, this one covers how to monitor this older type of system call.

The INT n instruction is the general mnemonic for executing a software-generated call to an interrupt handler. When INT 80h is executed in a user application, the processor immediately switches to kernel mode (normally ring 0, depending on the settings of the 0x80 interrupt descriptor) and jumps to the corresponding interrupt handler. After the handler has served the request in kernel mode, it executes IRET (interrupt return), and the processor switches back to user mode and continues executing the user application.

In an x86 hardware virtualization environment, Intel VT-x unfortunately provides no means to make a software interrupt raised by the INT n instruction generate a VM exit, which means that the INT 80h system call cannot be monitored directly by VMM software.

However, a small trick still lets the VMM monitor and trap every INT 80h system call. The idea is basically the same as the generic solution summarized in my previous post.

If you take a look at the Intel Architecture Instruction Set Reference Manual, you will see that it lists a special exception for INT instruction execution in protected mode, as shown below.

#NP(error_code)

If code segment, interrupt-, trap-, or task gate, or TSS is not present.

Here, #NP is the Segment Not Present exception, one of the Intel-reserved exception types, with vector 11. According to the Intel Software Developer's Manual:

"A not-present indication in a gate descriptor, however, does not indicate that a segment is not present (because gates do not correspond to segments). The operating system may use the present flag for gate descriptors to trigger exceptions of special significance to the operating system"

In other words, if we intentionally clear the "present" bit in the interrupt or trap gate descriptor for vector 80h in the IDT, every software interrupt raised by an INT 80h system call triggers a #NP exception, and the CPU transfers control to the fault handler at vector 11 (#NP) of the IDT instead.
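As a minimal sketch of what clearing the "present" bit means, the code below models a 64-bit mode IDT gate descriptor and toggles its P flag (bit 7 of the type/attribute byte). The struct and the helper names (`idt_gate64`, `gate_set_present`, `gate_handler`) are made up for illustration. A real VMM would apply this to the guest's IDT in guest memory, which it has to locate and map first.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit mode IDT gate descriptor (16 bytes). Field names are illustrative. */
struct idt_gate64 {
    uint16_t offset_low;   /* handler offset bits 15:0  */
    uint16_t selector;     /* code segment selector     */
    uint8_t  ist;          /* bits 2:0 = IST index      */
    uint8_t  type_attr;    /* bits 3:0 type, bits 6:5 DPL, bit 7 P (present) */
    uint16_t offset_mid;   /* handler offset bits 31:16 */
    uint32_t offset_high;  /* handler offset bits 63:32 */
    uint32_t reserved;
};
static_assert(sizeof(struct idt_gate64) == 16, "IDT gate must be 16 bytes");

#define GATE_PRESENT 0x80u /* P flag in type_attr */

/* Returns nonzero if the gate's present bit is set. */
static inline int gate_is_present(const struct idt_gate64 *g)
{
    return (g->type_attr & GATE_PRESENT) != 0;
}

/* Sets or clears only the present bit; type and DPL are left untouched. */
static inline void gate_set_present(struct idt_gate64 *g, int present)
{
    if (present)
        g->type_attr = (uint8_t)(g->type_attr | GATE_PRESENT);
    else
        g->type_attr = (uint8_t)(g->type_attr & ~GATE_PRESENT);
}

/* Reassembles the handler address, which is still intact (and needed for
 * emulation) after the present bit has been cleared. */
static inline uint64_t gate_handler(const struct idt_gate64 *g)
{
    return ((uint64_t)g->offset_high << 32) |
           ((uint64_t)g->offset_mid << 16) |
           (uint64_t)g->offset_low;
}
```

Calling `gate_set_present(&idt[0x80], 0)` on a mapped guest IDT would arm the trap. Only the P flag changes, so `gate_handler()` still returns the guest's original system call entry point.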

Meanwhile, by checking the "error_code" pushed on the exception stack (for a non-present IDT gate, its IDT flag is set and its index field holds the vector, here 80h) and decoding the instruction at the address where the #NP exception was raised, we can easily tell that the #NP fault was generated intentionally by an "INT 80h" system call.
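The check described above might be sketched as follows. The selector-style error code layout (EXT in bit 0, IDT in bit 1, index in bits 15:3) comes from the Intel manual. The function names are hypothetical, and matching only the plain two-byte encoding `CD 80` is a simplifying assumption that ignores instruction prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERR_EXT        (1u << 0)               /* event external to the program */
#define ERR_IDT        (1u << 1)               /* index refers to an IDT gate   */
#define ERR_INDEX(ec)  (((ec) >> 3) & 0x1FFFu) /* selector index / vector       */

/* Error code expected when INT 80h hits a not-present gate: IDT=1, EXT=0. */
#define NP_ERRCODE_INT80 ((0x80u << 3) | ERR_IDT)
static_assert(NP_ERRCODE_INT80 == 0x402u, "vector 0x80 via IDT encodes as 0x402");

/* True if the error code names the given IDT vector and the fault was not
 * raised while delivering an external event (EXT is clear for INT n). */
static inline int errcode_names_vector(uint32_t ec, uint8_t vector)
{
    return (ec & ERR_IDT) != 0 && (ec & ERR_EXT) == 0 &&
           ERR_INDEX(ec) == vector;
}

/* True if the bytes at the faulting instruction pointer are "INT 80h". */
static inline int is_int80_opcode(const uint8_t *code, size_t len)
{
    return len >= 2 && code[0] == 0xCD && code[1] == 0x80;
}

/* Both conditions together identify the intentional fault. */
static inline int np_fault_is_int80(uint32_t ec, const uint8_t *code, size_t len)
{
    return errcode_names_vector(ec, 0x80) && is_int80_opcode(code, len);
}
```

Because #NP is a fault, the saved instruction pointer still points at the INT instruction itself, which is why the opcode bytes can be read from that address.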

Similarly, in a virtualization environment, VMM software can set the #NP bit (bit 11) in the exception bitmap field of the VMCS so that such a #NP fault generates a VM exit.
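For illustration, the bitmap update is a single bit operation on a 32-bit value, one bit per exception vector. The helpers below are hypothetical and work on a plain integer. In a real VMM the value would be read from and written back to the current VMCS with VMREAD/VMWRITE, using the exception bitmap field encoding shown in the comment.

```c
#include <assert.h>
#include <stdint.h>

#define NP_VECTOR             11u          /* Segment Not Present */
#define VMCS_EXCEPTION_BITMAP 0x00004004u  /* VMCS field encoding (Intel SDM) */
static_assert(NP_VECTOR < 32, "exception bitmap covers vectors 0..31");

/* Returns the bitmap with the given exception vector set to cause a VM exit. */
static inline uint32_t bitmap_trap(uint32_t bitmap, unsigned vector)
{
    return bitmap | (1u << vector);
}

/* Returns the bitmap with the given vector delivered to the guest again. */
static inline uint32_t bitmap_untrap(uint32_t bitmap, unsigned vector)
{
    return bitmap & ~(1u << vector);
}

/* Nonzero if the given exception vector currently causes a VM exit. */
static inline int bitmap_traps(uint32_t bitmap, unsigned vector)
{
    return (bitmap >> vector) & 1u;
}
```

The read-modify-write form matters because the VMM may already trap other exceptions, and those bits must be preserved.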

So, finally here is the idea to monitor and trap the Linux/Unix system call through software interrupt INT 0x80:

1. The VMM locates the interrupt descriptor for vector 0x80 in the guest IDT and clears its "present" bit.
2. The VMM also configures the exception bitmap in the VMCS to trap guest #NP faults, so that every guest #NP exception causes a VM exit instead of being handled directly by the guest's #NP handler in the guest IDT.
3. At runtime, every "INT 0x80" system call from a user-mode application generates a #NP fault, because the "present" bit of the 0x80 descriptor was cleared in step 1.
4. That #NP fault then causes a VM exit because of the setting made in step 2.
5. The VMM then checks the VM-exit information fields in the VMCS (e.g. the VM-exit interruption information, VM-exit interruption error code, IDT-vectoring information, and IDT-vectoring error code fields) and decodes the faulting instruction to determine whether this #NP VM exit was caused intentionally by an "INT 0x80" system call:

- If so, the VMM emulates the "INT 0x80" behavior, discards the #NP exception, and resumes the guest with RIP pointing to the system call handler defined in the vector 0x80 interrupt descriptor of the guest IDT.
- Otherwise, the VMM injects the exception back into the guest OS and lets the guest handle the #NP fault as usual.
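The decision in the last step can be sketched roughly as below. The bit layout of the VM-exit interruption-information field (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31) follows the Intel manual. The enum, the function name, and the idea of passing the already-read field values as parameters are assumptions made for illustration. The sketch deliberately looks only at the interruption information and its error code; a fuller check would also decode the faulting instruction and could consult the IDT-vectoring fields.

```c
#include <assert.h>
#include <stdint.h>

/* VM-exit interruption-information field layout. */
#define INTR_INFO_VECTOR(i)      ((i) & 0xFFu)
#define INTR_INFO_TYPE(i)        (((i) >> 8) & 0x7u)
#define INTR_INFO_ERRCODE_VALID  (1u << 11)
#define INTR_INFO_VALID          (1u << 31)
#define INTR_TYPE_HW_EXCEPTION   3u
#define NP_VECTOR                11u

/* #NP error code for a not-present IDT gate 0x80: index 0x80, IDT=1, EXT=0. */
#define NP_ERRCODE_INT80         ((0x80u << 3) | (1u << 1))
static_assert(NP_ERRCODE_INT80 == 0x402u, "unexpected error code encoding");

enum np_exit_action {
    EXIT_NOT_NP_FAULT,   /* exit was not caused by a #NP exception       */
    EXIT_REINJECT_NP,    /* genuine guest #NP: inject it back unchanged  */
    EXIT_EMULATE_INT80   /* our trap fired: emulate INT 0x80 for guest   */
};

/* Classifies an exception VM exit from the two VMCS exit fields. */
static inline enum np_exit_action
classify_np_exit(uint32_t intr_info, uint32_t intr_error_code)
{
    if (!(intr_info & INTR_INFO_VALID) ||
        INTR_INFO_TYPE(intr_info) != INTR_TYPE_HW_EXCEPTION ||
        INTR_INFO_VECTOR(intr_info) != NP_VECTOR ||
        !(intr_info & INTR_INFO_ERRCODE_VALID))
        return EXIT_NOT_NP_FAULT;

    /* Only the lower 16 bits of the error code are architecturally defined. */
    if ((intr_error_code & 0xFFFFu) == NP_ERRCODE_INT80)
        return EXIT_EMULATE_INT80;

    return EXIT_REINJECT_NP;
}
```

On `EXIT_EMULATE_INT80` the VMM would go on to emulate the interrupt delivery; on `EXIT_REINJECT_NP` it would hand the same vector and error code back to the guest on the next VM entry.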

With this approach, the VMM is notified immediately whenever a user application makes a system call through INT 0x80.

Update:
The same idea can be applied to trap any external interrupt (vectors 32 to 255), e.g. ATA device or audio interrupts.

Friday, August 15, 2014
