SIMPLE IS BETTER: Thoughts on Hardware Virtualization Exception (#VE)

The latest Intel SDM lists a new exception type, the Virtualization Exception (#VE), in Chapter 6 (Interrupt and Exception Handling) of Volume 3.

What is it? How do you use it?
x86 hardware virtualization introduces a second-stage memory translation mechanism that translates guest-physical addresses to (host or machine) physical addresses in real RAM. On Intel VT-x it is called Extended Page Tables (EPT).

According to the specification, when EPT is in use, certain addresses that would normally be treated as physical addresses (and used to access memory) are instead treated as guest-physical addresses. Guest-physical addresses are translated by traversing a set of EPT paging structures to produce physical addresses that are used to access memory.

In addition to translating a guest-physical address to a physical address, EPT specifies the privileges that software is allowed when accessing the address. Attempts at disallowed accesses in VMX non-root mode (e.g. a write to a read-only guest-physical address, or an access to a non-present address) are called EPT violations. In earlier VT-x implementations without #VE, they cause VM exits into VMX root mode (the host). When #VE is available and enabled, however, such a violation can instead raise an exception inside the guest, still in VMX non-root mode, which is handled by the ISR at vector 20 of the guest IDT (interrupt descriptor table).
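To make the two possible outcomes concrete, here is a minimal sketch in C. It is a simplified model, not real hypervisor code: the names `classify_access`, `enum access_type` and `enum outcome` are made up for illustration. The read/write/execute bits (bits 0 to 2) and the "suppress #VE" bit (bit 63) follow the EPT entry layout in the SDM, but the model ignores EPT misconfigurations and the other conditions under which the processor falls back to a VM exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Permission bits of an EPT leaf entry, plus the "suppress #VE" bit. */
#define EPT_READ        (1ULL << 0)
#define EPT_WRITE       (1ULL << 1)
#define EPT_EXEC        (1ULL << 2)
#define EPT_SUPPRESS_VE (1ULL << 63)

enum access_type { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

enum outcome {
    ACCESS_ALLOWED,           /* no violation, the access proceeds    */
    EPT_VIOLATION_VM_EXIT,    /* control goes to the VMM (root mode)  */
    VIRTUALIZATION_EXCEPTION  /* vector 20 is raised inside the guest */
};

/* Map an access type to the permission bit it needs. */
static uint64_t required_bit(enum access_type type)
{
    switch (type) {
    case ACCESS_READ:  return EPT_READ;
    case ACCESS_WRITE: return EPT_WRITE;
    case ACCESS_EXEC:  return EPT_EXEC;
    }
    assert(!"unknown access type");
    return 0;
}

/* Hypothetical helper: decide what a guest access leads to.
 * ve_enabled models the VMM having turned on #VE for this guest.
 * A non-present entry has all three permission bits clear, so any
 * access to it is a violation. */
static enum outcome classify_access(uint64_t ept_entry,
                                    enum access_type type,
                                    bool ve_enabled)
{
    if (ept_entry & required_bit(type))
        return ACCESS_ALLOWED;
    if (ve_enabled && !(ept_entry & EPT_SUPPRESS_VE))
        return VIRTUALIZATION_EXCEPTION;
    return EPT_VIOLATION_VM_EXIT;
}
```

For example, a write to an entry that has only `EPT_READ` set yields `EPT_VIOLATION_VM_EXIT` when `ve_enabled` is false and `VIRTUALIZATION_EXCEPTION` when it is true.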

Some key points for understanding the Virtualization Exception (#VE):

Vector 20 (#VE) remains reserved when the processor has no VT-x support, VMX is off, or #VE is disabled or unavailable.
Normally, it is the guest's responsibility to configure and set up the #VE ISR (Interrupt Service Routine).
Even when #VE is enabled, not all EPT violations cause virtualization exceptions. See the Intel SDM for more details.
As with other exceptions, the processor provides exception information for the ISR. For #VE it is written to the Virtualization-Exception Information Area and includes, e.g., the kind of access that violated the permissions and the guest linear and physical addresses. The processor populates this area when the exception happens. However, the VMM or the guest's #VE setup code is responsible for pre-allocating physical memory for that area (currently 4KB) before activating #VE.
Even though an EPT violation can cause #VE, that doesn't mean only EPT violations can cause it. It seems Intel wants #VE to cover other virtualization event types (e.g. CPU events) in the future. For now, though, EPT violations are the only source.
An EPT-violation VM exit incurs the overhead of a VMX mode switch (a VM exit followed by a VM entry via VMRESUME). #VE, by contrast, involves no VMX mode switch, so it can achieve better performance when handling such an event.
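As a rough illustration of what a #VE ISR works with, the sketch below lays out the first fields of the Virtualization-Exception Information Area as a C struct and decodes one record. The field offsets, the exit reason 48 for EPT violations, the access-type bits in the exit qualification, and the rule that the processor writes FFFFFFFFH at offset 4 (and delivers no further #VE until software clears it) are as I read them in the SDM. The names `struct ve_info`, `struct ve_decoded`, `ve_info_layout_ok` and `ve_handle` are my own, and a real handler would run in guest kernel context on the mapped information page rather than on a plain struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VE_REASON_EPT_VIOLATION 48u
#define VE_BUSY                 0xFFFFFFFFu

/* Access-type bits in the exit qualification of an EPT violation. */
#define VE_QUAL_READ  (1ULL << 0)
#define VE_QUAL_WRITE (1ULL << 1)
#define VE_QUAL_FETCH (1ULL << 2)

/* Start of the 4KB information area as the processor fills it. */
struct ve_info {
    uint32_t exit_reason;          /* offset 0                        */
    uint32_t busy;                 /* offset 4: FFFFFFFFH on delivery */
    uint64_t exit_qualification;   /* offset 8                        */
    uint64_t guest_linear_addr;    /* offset 16                       */
    uint64_t guest_physical_addr;  /* offset 24                       */
    uint16_t eptp_index;           /* offset 32                       */
};

/* What this sketch extracts for the rest of the handler. */
struct ve_decoded {
    bool     was_read, was_write, was_fetch;
    uint64_t gpa;
    uint16_t eptp_index;
};

/* Check that the compiler laid the struct out at the expected offsets. */
static bool ve_info_layout_ok(void)
{
    return offsetof(struct ve_info, busy) == 4 &&
           offsetof(struct ve_info, exit_qualification) == 8 &&
           offsetof(struct ve_info, guest_linear_addr) == 16 &&
           offsetof(struct ve_info, guest_physical_addr) == 24 &&
           offsetof(struct ve_info, eptp_index) == 32;
}

/* Hypothetical ISR body: decode the record, then clear the busy word
 * so the processor may deliver the next #VE. Returns 0 on success and
 * -1 if the record is not an EPT violation (left untouched then). */
static int ve_handle(struct ve_info *info, struct ve_decoded *out)
{
    assert(info && out);
    if (info->exit_reason != VE_REASON_EPT_VIOLATION)
        return -1;
    out->was_read   = (info->exit_qualification & VE_QUAL_READ) != 0;
    out->was_write  = (info->exit_qualification & VE_QUAL_WRITE) != 0;
    out->was_fetch  = (info->exit_qualification & VE_QUAL_FETCH) != 0;
    out->gpa        = info->guest_physical_addr;
    out->eptp_index = info->eptp_index;
    info->busy = 0;  /* re-arm #VE delivery */
    return 0;
}
```

Clearing the busy word last matters in this model: until it is zero again, a further EPT violation would not be delivered as #VE.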

So, how might it be used?

Whatever you are using EPT for, you can consider using #VE to catch EPT violations and handle them directly in the guest OS, without the extra EPT-violation VMEXIT/VMRESUME overhead. Used this way, I don't think there is any difference other than the performance improvement.

However, if you look at the #VE description in the Intel SDM, you will probably notice this:

"After the virtualization exception handler has corrected the violation (for example, by executing the EPTP-switching VM function), execution of the program or task can be resumed."

... here the manual says "EPTP-switching VM function".

This refers to a new VMX instruction, VMFUNC, which can only be executed in the guest OS (VMX non-root mode). According to the Intel manual, this instruction allows software in VMX non-root operation (guest) to invoke a VM function, which is processor functionality enabled and configured by software in VMX root operation (host). So far, Intel seems to define only one VM function: EPTP switching.

The EPTP-switching VM function allows software in VMX non-root operation to load a new value for the EPT pointer (EPTP), thereby establishing a different EPT paging-structure hierarchy. However, software is limited to selecting from a list of potential EPTP values configured in advance by software in VMX root operation. Think of a traditional system configuration: CR3 (the page-directory base register) points to a virtual-address translation hierarchy, and switching CR3 means switching between virtual address spaces. The EPT pointer is analogous to CR3. The difference is that the EPT pointer roots the translation from guest-physical to host-physical addresses, whereas CR3 roots the translation from guest-virtual to guest-physical addresses.

So, in a typical system, different VMs are configured to use different EPT paging structures, each referenced by its own EPT pointer. When the running guest changes, the host VMM switches to the corresponding EPTP, so each VM has its own view of the memory map (memory isolation). However, even a single VM (guest OS) can be configured to use several EPT paging structures. For example, the host VMM can set up two different EPT mappings referenced by two different EPTPs, one "privileged" and the other "unprivileged". When code running under the "unprivileged" mapping attempts to access guest-physical memory that is only accessible under the "privileged" mapping, an EPT-violation VM exit may happen, and the VMM can then switch EPTPs to let the access succeed if it is legitimate.

The EPTP-switching VM function was introduced to perform such a switch without transferring control to VMX root mode (the host). Because guest software can do it directly in VMX non-root mode, it reduces performance overhead as well.

Note that guest software cannot directly see the real value of any EPTP. It sees only an EPTP index (0~511) that corresponds to one of the available EPTP values. It is therefore the VMM's responsibility to maintain the mapping between EPTP indexes and real EPTP values in an EPTP-list structure (one 4KB page, so at most 512 entries currently).
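The division of labour between VMM and guest can be modelled in a few lines of C. This is only a user-space simulation under stated assumptions: `struct vcpu_model`, `vmm_set_eptp` and `guest_vmfunc_eptp_switch` are hypothetical names, and treating an all-zero list entry as invalid is a simplification of the EPTP validity checks the processor performs. On real hardware the guest side is the VMFUNC instruction with EAX = 0 (EPTP switching) and ECX = the EPTP index, and a failed check causes a VM exit instead of returning a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One 4KB page of 8-byte entries. */
#define EPTP_LIST_ENTRIES (4096 / sizeof(uint64_t))

struct vcpu_model {
    uint64_t eptp_list[EPTP_LIST_ENTRIES]; /* written by the VMM only  */
    uint64_t current_eptp;                 /* EPTP currently in effect */
};

/* VMM side (VMX root): publish an EPTP value under a given index.
 * Returns false if the index does not fit in the list. */
static bool vmm_set_eptp(struct vcpu_model *v, uint32_t index, uint64_t eptp)
{
    assert(v);
    if (index >= EPTP_LIST_ENTRIES)
        return false;
    v->eptp_list[index] = eptp;
    return true;
}

/* Guest side (VMX non-root): model of VMFUNC with EAX = 0, ECX = index.
 * The guest supplies only an index and never sees the EPTP value.
 * Returns true if the switch completes without leaving the guest,
 * false where real hardware would take a VM exit. */
static bool guest_vmfunc_eptp_switch(struct vcpu_model *v, uint32_t index)
{
    assert(v);
    if (index >= EPTP_LIST_ENTRIES)
        return false;              /* index out of range */
    uint64_t eptp = v->eptp_list[index];
    if (eptp == 0)
        return false;              /* simplification: 0 means "no entry" */
    v->current_eptp = eptp;
    return true;
}
```

In the "privileged"/"unprivileged" example above, the VMM would publish the two views under two indexes (say 0 and 1), and guest code would move between them by index alone.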

So now, when you combine these two new features, VMFUNC (EPTP switching) and #VE, you can probably understand the sentence

"After the virtualization exception handler has corrected the violation (for example, by executing the EPTP-switching VM function), execution of the program or task can be resumed."

from the Intel manual. In the end, the combination serves two purposes:

handle EPT violations directly in the guest OS context, without an EPT-violation VMEXIT;
switch the EPTP (and hence the EPT paging-structure hierarchy) directly in the guest OS context, without transferring control or exiting to the VMM.

Performance matters!!!

But wait... a question: do VMware, Xen, or any other well-known hypervisors use it, and for what? Does anyone know?

Sunday, May 18, 2014

Thoughts on Hardware Virtualization Exception (#VE)

2 comments: