When the CR4.SMEP bit is set on an x86 processor, system software executing in supervisor mode (CPL < 3) cannot fetch instructions from any linear address whose translation has the U/S flag set to 1 (User) in every paging-structure entry controlling the translation. In other words, if SMEP is enabled, software operating in supervisor mode cannot fetch instructions from linear addresses that are accessible in user mode. When such an instruction fetch occurs, a SMEP-capable processor generates a page-fault exception (#PF).
So, how can we implement a software-based SMEP feature?
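To make the rule concrete, here is a minimal sketch of the SMEP condition as a predicate. The function name and the way the paging-structure entries are passed in are my own assumptions for illustration; only the bit positions (CR4 bit 20, U/S bit 2) come from the architecture.

```python
from typing import Sequence

CR4_SMEP = 1 << 20   # CR4.SMEP is bit 20 of CR4
PTE_US = 1 << 2      # U/S flag: bit 2 of every paging-structure entry


def smep_blocks_fetch(cr4: int, cpl: int, entries: Sequence[int]) -> bool:
    """Return True if SMEP would raise #PF for this instruction fetch.

    `entries` (hypothetical parameter) holds the paging-structure entries,
    PML4E down to PTE, that control the translation of the fetched address.
    All of them are assumed to be present.
    """
    smep_on = bool(cr4 & CR4_SMEP)
    supervisor = cpl < 3
    # "User" page: U/S = 1 in every entry controlling the translation.
    user_page = bool(entries) and all(e & PTE_US for e in entries)
    return smep_on and supervisor and user_page
```

For example, `smep_blocks_fetch(CR4_SMEP, 0, [0x7, 0x7, 0x7, 0x5])` is true, while clearing U/S in any one entry, running at CPL 3, or clearing CR4.SMEP makes it false.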
This paper (SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes, from CyLab/CMU) presents a great idea: create two separate EPT-protected memory views, one for guest kernel mode (ring 0) and one for guest user mode (ring 3), with different EPT permissions for the corresponding GPA->HPA translations, and then switch between these two EPT views by intercepting kernel<->user mode switches. On Intel x86 processors, the hypervisor can configure different EPTP values in the VMCS (each pointing to a different set of Extended Page Tables) and switch among them at the appropriate time.
To make the discussion easier, let's call these two guest memory translation tables (pointed to by two different EPTP values) protected memory views: the one used in guest kernel mode is the Kernel View, and the one used in user mode is the User View.
Besides, as the paper indicates, an identity map (GPA=HPA) is created by default in the EPT page tables of both views. However, the EPT entry permissions for the same GPA may differ between the two views. That difference is the key to emulating SMEP behavior, and I will come back to it later.
By intercepting guest kernel/user mode switches, we can do the following in the hypervisor:
- Switch to the Kernel View when the guest logical processor enters kernel mode;
As we know, on x86 processors there are several ways for a logical processor to enter kernel mode. On Windows, for example, these are interrupts/faults/traps (through the IDT) and the syscall instructions. Based on my previous project experience, other mechanisms such as task gates (used only for NMI on 32-bit OSes) and call gates are never used on Windows.
- Switch to the User View when the guest logical processor leaves kernel mode (that is, enters user mode);
SecVisor does it as shown in the pictures below (snapshots from this link): in the User View, execute permission is removed from the EPT entries for kernel code pages. Whenever the processor enters kernel mode and fetches the entry-point instruction from a kernel code page, an EPT violation VM exit occurs and control is transferred to the hypervisor (SecVisor), which can then switch to the Kernel View by updating the EPTP in the VMCS. Similarly, we can switch to the User View whenever the processor leaves kernel mode.
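The view switching described above can be modelled in a few lines. This is only a toy simulation under my own assumptions, not SecVisor's code: each view is a dictionary from guest page frame number to EPT permission bits, the two page frame numbers are made up, and "switching the EPTP" is modelled by changing a string.

```python
from dataclasses import dataclass

R, W, X = 1, 2, 4  # EPT read / write / execute permission bits

KERNEL_CODE_GFN = 0x100  # hypothetical guest frame holding kernel code
USER_CODE_GFN = 0x200    # hypothetical guest frame holding user code


def make_views():
    """Build the two views: same GPAs, different permissions."""
    return {
        # Kernel View: kernel code executable, user code not executable.
        "kernel": {KERNEL_CODE_GFN: R | X, USER_CODE_GFN: R | W},
        # User View: user code executable, kernel code not executable.
        "user": {KERNEL_CODE_GFN: R, USER_CODE_GFN: R | X},
    }


@dataclass
class Vcpu:
    views: dict
    current: str = "user"  # which view the modelled EPTP selects


def on_exec_ept_violation(vcpu, gfn):
    """Hypervisor side: handle an instruction-fetch EPT violation."""
    target = "kernel" if vcpu.current == "user" else "user"
    if vcpu.views[target].get(gfn, 0) & X:
        vcpu.current = target  # a real hypervisor would write the EPTP in the VMCS
        return "switched"
    return "denied"            # not executable in either view


def fetch(vcpu, gfn):
    """Guest side: simulate an instruction fetch from frame `gfn`."""
    if vcpu.views[vcpu.current].get(gfn, 0) & X:
        return "ok"
    return on_exec_ept_violation(vcpu, gfn)
```

Starting in the User View, `fetch(vcpu, KERNEL_CODE_GFN)` returns `"switched"` and leaves `vcpu.current == "kernel"`; a later fetch from `USER_CODE_GFN` switches back. Note that this handler never looks at the CPL, so it cannot tell a legitimate return to user mode from kernel-mode code jumping into a user page.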
Now it becomes clear how to emulate SMEP behavior.
Assume that the guest logical processor is running in kernel mode, so the EPTP points to the mapping tables of the Kernel View, and that only approved code (e.g. the kernel and trusted LKM modules) has EPT execute permission in the Kernel View (see the picture below). Assume also that there is a kernel vulnerability that malware can exploit to execute arbitrary user-mode code. When the logical processor starts to execute user-accessible code in kernel mode, an EPT violation is generated, because user-mode code is not executable in the EPT Kernel View.
When the hypervisor gets control, it can apply the following policy to check whether the execution (instruction fetch) violates SMEP semantics:
- Read the current CPL from the guest-state area of the VMCS (it is the DPL in the guest SS access rights) and check whether it is zero (kernel mode);
- Get the current guest CR3 value (also from the VMCS) and the faulting guest linear address (for an EPT violation caused by an instruction fetch, this is the guest RIP), then walk the guest page tables to check whether the U/S flag (accessible in user mode) is 1 in every paging-structure entry.
If both conditions are true, we have caught a SMEP-like violation in guest kernel mode.
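The two checks above can be sketched as follows. This is a simulation under stated assumptions rather than real hypervisor code: guest physical memory is a plain dictionary from address to 64-bit value (hypothetical, standing in for reads through the identity-mapped EPT), the guest uses 4-level paging, and `map_4k` is a helper I made up only to build test mappings. The sketch treats any CPL below 3 as supervisor mode, which includes the CPL 0 case from the list above.

```python
PRESENT = 1 << 0                   # P flag
US = 1 << 2                        # U/S flag
PS = 1 << 7                        # page-size flag (PDPTE / PDE only)
ADDR_MASK = 0x000F_FFFF_FFFF_F000  # physical address bits 51:12
SHIFTS = (39, 30, 21, 12)          # PML4, PDPT, PD, PT index positions


def walk_us(mem, cr3, lin):
    """Walk the guest page tables for linear address `lin`.

    Returns True if U/S=1 in every entry, False if some entry is
    supervisor-only, and None if the address is not mapped.
    """
    table = cr3 & ADDR_MASK
    for level, shift in enumerate(SHIFTS):
        index = (lin >> shift) & 0x1FF
        entry = mem.get(table + index * 8, 0)
        if not entry & PRESENT:
            return None
        if not entry & US:
            return False
        if level in (1, 2) and entry & PS:  # 1 GiB or 2 MiB page ends the walk
            return True
        table = entry & ADDR_MASK
    return True


def cpl_from_ss_access_rights(ss_ar):
    """CPL is the DPL field (bits 6:5) of the guest SS access rights."""
    return (ss_ar >> 5) & 3


def is_smep_like_violation(mem, ss_ar, cr3, rip):
    """Apply both checks to an instruction-fetch EPT violation."""
    supervisor = cpl_from_ss_access_rights(ss_ar) < 3
    return supervisor and walk_us(mem, cr3, rip) is True


def map_4k(mem, cr3, lin, flags):
    """Test helper (hypothetical): install a 4 KiB mapping for `lin`,
    using flags[i] at level i and placing tables on consecutive pages."""
    table = cr3 & ADDR_MASK
    for level, shift in enumerate(SHIFTS):
        index = (lin >> shift) & 0x1FF
        next_table = table + 0x1000
        mem[table + index * 8] = next_table | flags[level]
        table = next_table
```

With a mapping whose four entries all have U/S=1 and an SS access-rights value of `0x93` (DPL 0), `is_smep_like_violation` returns true; with `0xF3` (DPL 3), or with U/S cleared at any level, it returns false.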
However, implementing this software-based SMEP feature with virtualization technology poses many challenges.
- Performance impacts.
Because the design in that paper uses two EPT memory protection views (Kernel View and User View) and switches back and forth between them at run time, the hypervisor has to trap every entry into and exit from the kernel. This introduces a significant performance cost, because kernel/user mode switches are normally very frequent.
I think one way to switch EPTP values (views) without a VM exit is to leverage newer virtualization features such as Virtualization Exception (#VE) and the EPTP-switching VM function (VMFUNC), described in my previous post, together with the IDT shadowing/virtualization technique from another post of mine, to trap every kernel/user mode switch caused by interrupt/trap/fault events. However, machines that support #VE and VMFUNC also support SMEP :-)
As for mode switches caused by syscall/sysret, you can brainstorm how to handle them without a VM exit!
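As a rough illustration of why the EPTP-switching VM function helps, here is a toy model of VMFUNC leaf 0. The class and the EPTP values are hypothetical, and the validity check is a simplification: the model treats a zero entry as invalid, whereas real hardware performs more detailed checks on the EPTP value. What the model does reflect is that the EPTP list has 512 entries, that ECX selects the entry, and that a valid switch happens without a VM exit.

```python
EPTP_LIST_ENTRIES = 512  # the EPTP list is one 4 KiB page of 8-byte entries


class VmExit(Exception):
    """Raised when the modelled VMFUNC would cause a VM exit."""


class Vcpu:
    def __init__(self, eptps):
        if len(eptps) > EPTP_LIST_ENTRIES:
            raise ValueError("EPTP list holds at most 512 entries")
        # Unused slots are zero, which this model treats as invalid.
        self.eptp_list = list(eptps) + [0] * (EPTP_LIST_ENTRIES - len(eptps))
        self.eptp = self.eptp_list[0]
        self.vm_exits = 0

    def vmfunc(self, eax, ecx):
        """Model VMFUNC executed by the guest with the given EAX and ECX."""
        if eax != 0:
            raise NotImplementedError("only leaf 0 (EPTP switching) is modelled")
        if ecx >= EPTP_LIST_ENTRIES or self.eptp_list[ecx] == 0:
            self.vm_exits += 1
            raise VmExit("invalid EPTP index %d" % ecx)
        self.eptp = self.eptp_list[ecx]  # switch views, no VM exit


KERNEL_VIEW_EPTP = 0x1000  # hypothetical EPTP values
USER_VIEW_EPTP = 0x2000
```

With `vcpu = Vcpu([KERNEL_VIEW_EPTP, USER_VIEW_EPTP])`, calling `vcpu.vmfunc(0, 1)` selects the User View and leaves `vcpu.vm_exits` at zero, while an out-of-range or empty index raises `VmExit`.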
- In the Kernel View, we mark all kernel code as executable in the EPT tables. When an LKM module is loaded or unloaded, we must immediately update the module's memory to be executable in the Kernel View and non-executable in the User View.
The authors of the paper solve this by adding code to the load_module() and free_module() functions.
However, without guest kernel code changes, I think we can handle module loading lazily. For example, when a newly loaded LKM module runs for the first time in kernel mode, an EPT violation occurs; the hypervisor can then check whether it is a trusted LKM module and, if so, make that LKM code page executable in the Kernel View and remove its execute permission in the User View. But how do we update the EPT permissions of the LKM code pages in the Kernel View when such a module is unloaded from the kernel?
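The lazy approach for module loading could look roughly like this. How the hypervisor decides that a page is trusted is not specified above, so the SHA-256 allowlist used here is purely a hypothetical stand-in, as are the function name and the dictionary-based views (guest page frame number to EPT permission bits).

```python
import hashlib

R, W, X = 1, 2, 4  # EPT read / write / execute permission bits


def approve_on_first_exec(views, trusted_hashes, gfn, page_bytes):
    """Handle the first kernel-mode fetch from a not-yet-approved page.

    views          : {"kernel": {gfn: perms}, "user": {gfn: perms}}
    trusted_hashes : set of hex SHA-256 digests (hypothetical trust policy)
    Returns True if the page was approved and the views were updated.
    """
    digest = hashlib.sha256(page_bytes).hexdigest()
    if digest not in trusted_hashes:
        return False  # leave both views unchanged; the fetch stays blocked
    # Executable in the Kernel View ...
    views["kernel"][gfn] = views["kernel"].get(gfn, 0) | X
    # ... and not executable in the User View.
    views["user"][gfn] = views["user"].get(gfn, 0) & ~X
    return True
```

A check like this only means something if the approved page cannot be modified afterwards, so a real design would presumably also have to write-protect it; that part is left out of the sketch.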
- When memory is low, will Linux page out or swap out LKM code pages to disk?
I know this is true on Windows, but I have no idea whether Linux does the same thing. (Can anybody tell me?)
If Linux does this too, then without guest kernel hooks it is also a challenge to keep the LKM code page permissions up to date in the EPT Kernel View and User View.
Note that what I'm talking about in this post is just for fun. I don't think it is worth doing all of this only to emulate a SMEP-like feature with virtualization technology :(. As a matter of fact, I have yet another way to implement a software-based SMEP feature without virtualization or a hypervisor. Please stay tuned for my next post.
Question: think about how to implement a software SMAP (Supervisor Mode Access Prevention) with virtualization technology.
Reference: SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes, and its presentation link.