Hello community,
here is the log from the commit of package xen for openSUSE:Factory checked in at 2018-03-30 12:00:34
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/xen (Old)
and /work/SRC/openSUSE:Factory/.xen.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "xen"
Fri Mar 30 12:00:34 2018 rev:245 rq:591751 version:4.10.0_16
Changes:
--------
--- /work/SRC/openSUSE:Factory/xen/xen.changes 2018-03-20 21:50:48.542316318 +0100
+++ /work/SRC/openSUSE:Factory/.xen.new/xen.changes 2018-03-30 12:00:43.480265750 +0200
@@ -1,0 +2,12 @@
+Mon Mar 26 08:20:45 MDT 2018 - carnold@suse.com
+
+- Upstream patches from Jan (bsc#1027519) and fixes related to
+ Page Table Isolation (XPTI). See also bsc#1074562 XSA-254
+ 5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch
+ 5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch
+ 5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch
+ 5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch
+ 5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch
+ 5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch
+
+-------------------------------------------------------------------
New:
----
5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch
5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch
5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch
5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch
5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch
5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ xen.spec ++++++
--- /var/tmp/diff_new_pack.XjVnAv/_old 2018-03-30 12:00:46.148169274 +0200
+++ /var/tmp/diff_new_pack.XjVnAv/_new 2018-03-30 12:00:46.152169129 +0200
@@ -126,7 +126,7 @@
BuildRequires: pesign-obs-integration
%endif
-Version: 4.10.0_14
+Version: 4.10.0_16
Release: 0
Summary: Xen Virtualization: Hypervisor (aka VMM aka Microkernel)
License: GPL-2.0
@@ -211,13 +211,19 @@
Patch48: 5a843807-x86-spec_ctrl-fix-bugs-in-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch
Patch49: 5a856a2b-x86-emul-fix-64bit-decoding-of-segment-overrides.patch
Patch50: 5a856a2b-x86-use-32bit-xors-for-clearing-GPRs.patch
-Patch51: 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch
-Patch52: 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch
-Patch53: 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch
-Patch54: 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch
-Patch55: 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch
-Patch56: 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch
-Patch57: 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch
+Patch51: 5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch
+Patch52: 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch
+Patch53: 5a95373b-x86-PV-avoid-leaking-other-guests-MSR_TSC_AUX.patch
+Patch54: 5a95571f-memory-dont-implicitly-unpin-in-decrease-res.patch
+Patch55: 5a95576c-gnttab-ARM-dont-corrupt-shared-GFN-array.patch
+Patch56: 5a955800-gnttab-dont-free-status-pages-on-ver-change.patch
+Patch57: 5a955854-x86-disallow-HVM-creation-without-LAPIC-emul.patch
+Patch58: 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch
+Patch59: 5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch
+Patch60: 5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch
+Patch61: 5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch
+Patch62: 5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch
+Patch63: 5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch
# Our platform specific patches
Patch400: xen-destdir.patch
Patch401: vif-bridge-no-iptables.patch
@@ -465,6 +471,12 @@
%patch55 -p1
%patch56 -p1
%patch57 -p1
+%patch58 -p1
+%patch59 -p1
+%patch60 -p1
+%patch61 -p1
+%patch62 -p1
+%patch63 -p1
# Our platform specific patches
%patch400 -p1
%patch401 -p1
++++++ 5a856a2b-x86-xpti-hide-almost-all-of-Xen-image-mappings.patch ++++++
# Commit 422588e88511d17984544c0f017a927de3315290
# Date 2018-02-15 11:08:27 +0000
# Author Andrew Cooper
# Committer Andrew Cooper
x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
The current XPTI implementation isolates the directmap (and therefore a lot of
guest data), but a large quantity of CPU0's state (including its stack)
remains visible.
Furthermore, an attacker able to read .text is in a vastly superior position
to normal when it comes to fingerprinting Xen for known vulnerabilities, or
scanning for ROP/Spectre gadgets.
Collect together the entrypoints in .text.entry (currently 3x4k frames, but
can almost certainly be slimmed down), and create a common mapping which is
inserted into each per-cpu shadow. The stubs are also inserted into this
mapping by pointing at the in-use L2. This allows stubs allocated later (SMP
boot, or CPU hotplug) to work without further changes to the common mappings.
Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
# Commit d1d6fc97d66cf56847fc0bcc2ddc370707c22378
# Date 2018-03-06 16:46:27 +0100
# Author Jan Beulich
# Committer Jan Beulich
x86/xpti: really hide almost all of Xen image
Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
.data/.rodata/.bss mappings") carefully limited the Xen image cloning to
just entry code, but then overwrote the just allocated and populated L3
entry with the normal one again covering both Xen image and stubs.
Drop the respective code in favor of an explicit clone_mapping()
invocation. This in turn now requires setup_cpu_root_pgt() to run after
stub setup in all cases. Additionally, with (almost) no unintended
mappings left, the BSP's IDT now also needs to be page aligned.
The moving ahead of cleanup_cpu_root_pgt() is not strictly necessary
for functionality, but things are more logical this way, and we retain
cleanup being done in the inverse order of setup.
Signed-off-by: Jan Beulich
Acked-by: Andrew Cooper
# Commit 044fedfaa29b5d5774196e3fc7d955a48bfceac4
# Date 2018-03-09 15:42:24 +0000
# Author Andrew Cooper
# Committer Andrew Cooper
x86/traps: Put idt_table[] back into .bss
c/s d1d6fc97d "x86/xpti: really hide almost all of Xen image" accidentally
moved idt_table[] from .bss to .data by virtue of using the page_aligned
section. We also have .bss.page_aligned, so use that.
Signed-off-by: Andrew Cooper
Reviewed-by: Jan Beulich
Reviewed-by: Wei Liu
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1897,9 +1897,6 @@ mode.
Override default selection of whether to isolate 64-bit PV guest page
tables.
-** WARNING: Not yet a complete isolation implementation, but better than
-nothing. **
-
### xsave
`= <boolean>`
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -644,13 +644,24 @@ static int clone_mapping(const void *ptr
{
unsigned long linear = (unsigned long)ptr, pfn;
unsigned int flags;
- l3_pgentry_t *pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
- l3_table_offset(linear);
+ l3_pgentry_t *pl3e;
l2_pgentry_t *pl2e;
l1_pgentry_t *pl1e;
- if ( linear < DIRECTMAP_VIRT_START )
- return 0;
+ /*
+ * Sanity check 'linear'. We only allow cloning from the Xen virtual
+ * range, and in particular, only from the directmap and .text ranges.
+ */
+ if ( root_table_offset(linear) > ROOT_PAGETABLE_LAST_XEN_SLOT ||
+ root_table_offset(linear) < ROOT_PAGETABLE_FIRST_XEN_SLOT )
+ return -EINVAL;
+
+ if ( linear < XEN_VIRT_START ||
+ (linear >= XEN_VIRT_END && linear < DIRECTMAP_VIRT_START) )
+ return -EINVAL;
+
+ pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
+ l3_table_offset(linear);
flags = l3e_get_flags(*pl3e);
ASSERT(flags & _PAGE_PRESENT);
@@ -742,6 +753,10 @@ static __read_mostly int8_t opt_xpti = -
boolean_param("xpti", opt_xpti);
DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
+static root_pgentry_t common_pgt;
+
+extern const char _stextentry[], _etextentry[];
+
static int setup_cpu_root_pgt(unsigned int cpu)
{
root_pgentry_t *rpt;
@@ -762,8 +777,23 @@ static int setup_cpu_root_pgt(unsigned i
idle_pg_table[root_table_offset(RO_MPT_VIRT_START)];
/* SH_LINEAR_PT inserted together with guest mappings. */
/* PERDOMAIN inserted during context switch. */
- rpt[root_table_offset(XEN_VIRT_START)] =
- idle_pg_table[root_table_offset(XEN_VIRT_START)];
+
+ /* One-time setup of common_pgt, which maps .text.entry and the stubs. */
+ if ( unlikely(!root_get_intpte(common_pgt)) )
+ {
+ const char *ptr;
+
+ for ( rc = 0, ptr = _stextentry;
+ !rc && ptr < _etextentry; ptr += PAGE_SIZE )
+ rc = clone_mapping(ptr, rpt);
+
+ if ( rc )
+ return rc;
+
+ common_pgt = rpt[root_table_offset(XEN_VIRT_START)];
+ }
+
+ rpt[root_table_offset(XEN_VIRT_START)] = common_pgt;
/* Install direct map page table entries for stack, IDT, and TSS. */
for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE )
@@ -773,6 +803,8 @@ static int setup_cpu_root_pgt(unsigned i
rc = clone_mapping(idt_tables[cpu], rpt);
if ( !rc )
rc = clone_mapping(&per_cpu(init_tss, cpu), rpt);
+ if ( !rc )
+ rc = clone_mapping((void *)per_cpu(stubs.addr, cpu), rpt);
return rc;
}
@@ -781,6 +813,7 @@ static void cleanup_cpu_root_pgt(unsigne
{
root_pgentry_t *rpt = per_cpu(root_pgt, cpu);
unsigned int r;
+ unsigned long stub_linear = per_cpu(stubs.addr, cpu);
if ( !rpt )
return;
@@ -825,6 +858,16 @@ static void cleanup_cpu_root_pgt(unsigne
}
free_xen_pagetable(rpt);
+
+ /* Also zap the stub mapping for this CPU. */
+ if ( stub_linear )
+ {
+ l3_pgentry_t *l3t = l4e_to_l3e(common_pgt);
+ l2_pgentry_t *l2t = l3e_to_l2e(l3t[l3_table_offset(stub_linear)]);
+ l1_pgentry_t *l1t = l2e_to_l1e(l2t[l2_table_offset(stub_linear)]);
+
+ l1t[l2_table_offset(stub_linear)] = l1e_empty();
+ }
}
static void cpu_smpboot_free(unsigned int cpu)
@@ -848,6 +891,8 @@ static void cpu_smpboot_free(unsigned in
if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+ cleanup_cpu_root_pgt(cpu);
+
if ( per_cpu(stubs.addr, cpu) )
{
mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
@@ -865,8 +910,6 @@ static void cpu_smpboot_free(unsigned in
free_domheap_page(mfn_to_page(mfn));
}
- cleanup_cpu_root_pgt(cpu);
-
order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
free_xenheap_pages(per_cpu(gdt_table, cpu), order);
@@ -922,9 +965,6 @@ static int cpu_smpboot_alloc(unsigned in
set_ist(&idt_tables[cpu][TRAP_nmi], IST_NONE);
set_ist(&idt_tables[cpu][TRAP_machine_check], IST_NONE);
- if ( setup_cpu_root_pgt(cpu) )
- goto oom;
-
for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
if ( cpu_online(i) && cpu_to_node(i) == node )
@@ -938,6 +978,9 @@ static int cpu_smpboot_alloc(unsigned in
goto oom;
per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
+ if ( setup_cpu_root_pgt(cpu) )
+ goto oom;
+
if ( secondary_socket_cpumask == NULL &&
(secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
goto oom;
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -102,7 +102,8 @@ DEFINE_PER_CPU_READ_MOSTLY(struct desc_s
DEFINE_PER_CPU_READ_MOSTLY(struct desc_struct *, compat_gdt_table);
/* Master table, used by CPU0. */
-idt_entry_t idt_table[IDT_ENTRIES];
+idt_entry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
+ idt_table[IDT_ENTRIES];
/* Pointer to the IDT of every CPU. */
idt_entry_t *idt_tables[NR_CPUS] __read_mostly;
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -13,6 +13,8 @@
#include
#include
+ .section .text.entry, "ax", @progbits
+
ENTRY(entry_int82)
ASM_CLAC
pushq $0
@@ -270,6 +272,9 @@ ENTRY(compat_int80_direct_trap)
call compat_create_bounce_frame
jmp compat_test_all_events
+ /* compat_create_bounce_frame & helpers don't need to be in .text.entry */
+ .text
+
/* CREATE A BASIC EXCEPTION FRAME ON GUEST OS (RING-1) STACK: */
/* {[ERRCODE,] EIP, CS, EFLAGS, [ESP, SS]} */
/* %rdx: trap_bounce, %rbx: struct vcpu */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -14,6 +14,8 @@
#include
#include
+ .section .text.entry, "ax", @progbits
+
/* %rbx: struct vcpu */
ENTRY(switch_to_kernel)
leaq VCPU_trap_bounce(%rbx),%rdx
@@ -357,6 +359,9 @@ int80_slow_path:
subq $2,UREGS_rip(%rsp)
jmp handle_exception_saved
+ /* create_bounce_frame & helpers don't need to be in .text.entry */
+ .text
+
/* CREATE A BASIC EXCEPTION FRAME ON GUEST OS STACK: */
/* { RCX, R11, [ERRCODE,] RIP, CS, RFLAGS, RSP, SS } */
/* %rdx: trap_bounce, %rbx: struct vcpu */
@@ -487,6 +492,8 @@ ENTRY(dom_crash_sync_extable)
jmp asm_domain_crash_synchronous /* Does not return */
.popsection
+ .section .text.entry, "ax", @progbits
+
ENTRY(common_interrupt)
SAVE_ALL CLAC
@@ -846,8 +853,7 @@ GLOBAL(trap_nop)
-.section .rodata, "a", @progbits
-
+ .pushsection .rodata, "a", @progbits
ENTRY(exception_table)
.quad do_trap
.quad do_debug
@@ -873,9 +879,10 @@ ENTRY(exception_table)
.quad do_reserved_trap /* Architecturally reserved exceptions. */
.endr
.size exception_table, . - exception_table
+ .popsection
/* Table of automatically generated entry points. One per vector. */
- .section .init.rodata, "a", @progbits
+ .pushsection .init.rodata, "a", @progbits
GLOBAL(autogen_entrypoints)
/* pop into the .init.rodata section and record an entry point. */
.macro entrypoint ent
@@ -884,7 +891,7 @@ GLOBAL(autogen_entrypoints)
.popsection
.endm
- .text
+ .popsection
autogen_stubs: /* Automatically generated stubs. */
vec = 0
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -60,6 +60,13 @@ SECTIONS
_stext = .; /* Text and read-only data */
*(.text)
*(.text.__x86_indirect_thunk_*)
+
+ . = ALIGN(PAGE_SIZE);
+ _stextentry = .;
+ *(.text.entry)
+ . = ALIGN(PAGE_SIZE);
+ _etextentry = .;
+
*(.text.cold)
*(.text.unlikely)
*(.fixup)
++++++ 5a8be788-x86-nmi-start-NMI-watchdog-on-CPU0-after-SMP.patch ++++++
--- /var/tmp/diff_new_pack.XjVnAv/_old 2018-03-30 12:00:46.384160741 +0200
+++ /var/tmp/diff_new_pack.XjVnAv/_new 2018-03-30 12:00:46.384160741 +0200
@@ -28,10 +28,8 @@
Signed-off-by: Igor Druzhinin
Reviewed-by: Jan Beulich
-Index: xen-4.10.0-testing/xen/arch/x86/apic.c
-===================================================================
---- xen-4.10.0-testing.orig/xen/arch/x86/apic.c
-+++ xen-4.10.0-testing/xen/arch/x86/apic.c
+--- a/xen/arch/x86/apic.c
++++ b/xen/arch/x86/apic.c
@@ -682,7 +682,7 @@ void setup_local_APIC(void)
printk("Leaving ESR disabled.\n");
}
@@ -41,11 +39,9 @@
setup_apic_nmi_watchdog();
apic_pm_activate();
}
-Index: xen-4.10.0-testing/xen/arch/x86/smpboot.c
-===================================================================
---- xen-4.10.0-testing.orig/xen/arch/x86/smpboot.c
-+++ xen-4.10.0-testing/xen/arch/x86/smpboot.c
-@@ -1241,7 +1241,10 @@ int __cpu_up(unsigned int cpu)
+--- a/xen/arch/x86/smpboot.c
++++ b/xen/arch/x86/smpboot.c
+@@ -1284,7 +1284,10 @@ int __cpu_up(unsigned int cpu)
void __init smp_cpus_done(void)
{
if ( nmi_watchdog == NMI_LOCAL_APIC )
@@ -56,11 +52,9 @@
setup_ioapic_dest();
-Index: xen-4.10.0-testing/xen/arch/x86/traps.c
-===================================================================
---- xen-4.10.0-testing.orig/xen/arch/x86/traps.c
-+++ xen-4.10.0-testing/xen/arch/x86/traps.c
-@@ -1669,7 +1669,7 @@ static nmi_callback_t *nmi_callback = du
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -1670,7 +1670,7 @@ static nmi_callback_t *nmi_callback = du
void do_nmi(const struct cpu_user_regs *regs)
{
unsigned int cpu = smp_processor_id();
@@ -69,7 +63,7 @@
bool handle_unknown = false;
++nmi_count(cpu);
-@@ -1677,6 +1677,16 @@ void do_nmi(const struct cpu_user_regs *
+@@ -1678,6 +1678,16 @@ void do_nmi(const struct cpu_user_regs *
if ( nmi_callback(regs, cpu) )
return;
@@ -86,7 +80,7 @@
if ( (nmi_watchdog == NMI_NONE) ||
(!nmi_watchdog_tick(regs) && watchdog_force) )
handle_unknown = true;
-@@ -1684,7 +1694,6 @@ void do_nmi(const struct cpu_user_regs *
+@@ -1685,7 +1695,6 @@ void do_nmi(const struct cpu_user_regs *
/* Only the BSP gets external NMIs from the system. */
if ( cpu == 0 )
{
++++++ 5a956747-x86-HVM-dont-give-wrong-impression-of-WRMSR-success.patch ++++++
--- /var/tmp/diff_new_pack.XjVnAv/_old 2018-03-30 12:00:46.420159439 +0200
+++ /var/tmp/diff_new_pack.XjVnAv/_new 2018-03-30 12:00:46.424159294 +0200
@@ -19,6 +19,20 @@
Reviewed-by: Andrew Cooper
Reviewed-by: Boris Ostrovsky
+# Commit 59c0983e10d70ea2368085271b75fb007811fe52
+# Date 2018-03-15 12:44:24 +0100
+# Author Jan Beulich
+# Committer Jan Beulich
+x86: ignore guest microcode loading attempts
+
+The respective MSRs are write-only, and hence attempts by guests to
+write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
+impression of WRMSR succeeding") no longer ignored. Restore original
+behavior for the two affected MSRs.
+
+Signed-off-by: Jan Beulich
+Reviewed-by: Andrew Cooper
+
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2106,6 +2106,13 @@ static int svm_msr_write_intercept(unsig
@@ -51,3 +65,43 @@
case 1:
break;
default:
+--- a/xen/arch/x86/msr.c
++++ b/xen/arch/x86/msr.c
+@@ -128,6 +128,8 @@ int guest_rdmsr(const struct vcpu *v, ui
+
+ switch ( msr )
+ {
++ case MSR_AMD_PATCHLOADER:
++ case MSR_IA32_UCODE_WRITE:
+ case MSR_PRED_CMD:
+ /* Write-only */
+ goto gp_fault;
+@@ -181,6 +183,28 @@ int guest_wrmsr(struct vcpu *v, uint32_t
+ /* Read-only */
+ goto gp_fault;
+
++ case MSR_AMD_PATCHLOADER:
++ /*
++ * See note on MSR_IA32_UCODE_WRITE below, which may or may not apply
++ * to AMD CPUs as well (at least the architectural/CPUID part does).
++ */
++ if ( is_pv_domain(d) ||
++ d->arch.cpuid->x86_vendor != X86_VENDOR_AMD )
++ goto gp_fault;
++ break;
++
++ case MSR_IA32_UCODE_WRITE:
++ /*
++ * Some versions of Windows at least on certain hardware try to load
++ * microcode before setting up an IDT. Therefore we must not inject #GP
++ * for such attempts. Also the MSR is architectural and not qualified
++ * by any CPUID bit.
++ */
++ if ( is_pv_domain(d) ||
++ d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL )
++ goto gp_fault;
++ break;
++
+ case MSR_SPEC_CTRL:
+ if ( !cp->feat.ibrsb )
+ goto gp_fault; /* MSR available? */
++++++ 5a9eb7f1-x86-xpti-dont-map-stack-guard-pages.patch ++++++
# Commit d303784b68237ff3050daa184f560179dda21b8c
# Date 2018-03-06 16:46:57 +0100
# Author Jan Beulich
# Committer Jan Beulich
x86/xpti: don't map stack guard pages
Other than for the main mappings, don't even do this in release builds,
as there are no huge page shattering concerns here.
Note that since we don't run on the restricted page tables while HVM
guests execute, the non-present mappings won't trigger the triple fault
issue AMD SVM is susceptible to with our current placement of STGI vs
TR loading.
Signed-off-by: Jan Beulich
Acked-by: Andrew Cooper
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5538,6 +5538,14 @@ void memguard_unguard_stack(void *p)
memguard_unguard_range(p, PAGE_SIZE);
}
+bool memguard_is_stack_guard_page(unsigned long addr)
+{
+ addr &= STACK_SIZE - 1;
+
+ return addr >= STACK_SIZE - PRIMARY_STACK_SIZE - PAGE_SIZE &&
+ addr < STACK_SIZE - PRIMARY_STACK_SIZE;
+}
+
void arch_dump_shared_mem_info(void)
{
printk("Shared frames %u -- Saved frames %u\n",
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -797,7 +797,8 @@ static int setup_cpu_root_pgt(unsigned i
/* Install direct map page table entries for stack, IDT, and TSS. */
for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE )
- rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
+ if ( !memguard_is_stack_guard_page(off) )
+ rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
if ( !rc )
rc = clone_mapping(idt_tables[cpu], rpt);
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -519,6 +519,7 @@ void memguard_unguard_range(void *p, uns
void memguard_guard_stack(void *p);
void memguard_unguard_stack(void *p);
+bool __attribute_const__ memguard_is_stack_guard_page(unsigned long addr);
struct mmio_ro_emulate_ctxt {
unsigned long cr2;
++++++ 5a9eb85c-x86-slightly-reduce-XPTI-overhead.patch ++++++
# Commit 9d1d31ad9498e6ceb285d5774e34fed5f648c273
# Date 2018-03-06 16:48:44 +0100
# Author Jan Beulich
# Committer Jan Beulich
x86: slightly reduce Meltdown band-aid overhead
I'm not sure why I didn't do this right away: By avoiding the use of
global PTEs in the cloned directmap, there's no need to fiddle with
CR4.PGE on any of the entry paths. Only the exit paths need to flush
global mappings.
The reduced flushing, however, requires that we now have interrupts off
on all entry paths until after the page table switch, so that flush IPIs
can't be serviced while on the restricted pagetables, leaving a window
where a potentially stale guest global mapping can be brought into the
TLB. Along those lines the "sync" IPI after L4 entry updates now needs
to become a real (and global) flush IPI, so that inside Xen we'll also
pick up such changes.
Signed-off-by: Jan Beulich
Tested-by: Juergen Gross
Reviewed-by: Juergen Gross
Reviewed-by: Andrew Cooper
# Commit c4dd58f0cf23cdf119bbccedfb8c24435fc6f3ab
# Date 2018-03-16 17:27:36 +0100
# Author Jan Beulich
# Committer Jan Beulich
x86: correct EFLAGS.IF in SYSENTER frame
Commit 9d1d31ad94 ("x86: slightly reduce Meltdown band-aid overhead")
moved the STI past the PUSHF. While this isn't an active problem (as we
force EFLAGS.IF to 1 before exiting to guest context), let's not risk
internal confusion by finding a PV guest frame with interrupts
apparently off.
Signed-off-by: Jan Beulich
Acked-by: Andrew Cooper
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3782,18 +3782,14 @@ long do_mmu_update(
{
/*
* Force other vCPU-s of the affected guest to pick up L4 entry
- * changes (if any). Issue a flush IPI with empty operation mask to
- * facilitate this (including ourselves waiting for the IPI to
- * actually have arrived). Utilize the fact that FLUSH_VA_VALID is
- * meaningless without FLUSH_CACHE, but will allow to pass the no-op
- * check in flush_area_mask().
+ * changes (if any).
*/
unsigned int cpu = smp_processor_id();
cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
cpumask_andnot(mask, pt_owner->domain_dirty_cpumask, cpumask_of(cpu));
if ( !cpumask_empty(mask) )
- flush_area_mask(mask, ZERO_BLOCK_PTR, FLUSH_VA_VALID);
+ flush_mask(mask, FLUSH_TLB_GLOBAL);
}
perfc_add(num_page_updates, i);
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -737,6 +737,7 @@ static int clone_mapping(const void *ptr
}
pl1e += l1_table_offset(linear);
+ flags &= ~_PAGE_GLOBAL;
if ( l1e_get_flags(*pl1e) & _PAGE_PRESENT )
{
@@ -1046,8 +1047,17 @@ void __init smp_prepare_cpus(unsigned in
if ( rc )
panic("Error %d setting up PV root page table\n", rc);
if ( per_cpu(root_pgt, 0) )
+ {
get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
+ /*
+ * All entry points which may need to switch page tables have to start
+ * with interrupts off. Re-write what pv_trap_init() has put there.
+ */
+ _set_gate(idt_table + LEGACY_SYSCALL_VECTOR, SYS_DESC_irq_gate, 3,
+ &int80_direct_trap);
+ }
+
set_nr_sockets();
socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets);
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -202,7 +202,7 @@ ENTRY(compat_post_handle_exception)
/* See lstar_enter for entry register state. */
ENTRY(cstar_enter)
- sti
+ /* sti could live here when we don't switch page tables below. */
CR4_PV32_RESTORE
movq 8(%rsp),%rax /* Restore %rax. */
movq $FLAT_KERNEL_SS,8(%rsp)
@@ -222,11 +222,12 @@ ENTRY(cstar_enter)
jz .Lcstar_cr3_okay
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
neg %rcx
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
.Lcstar_cr3_okay:
+ sti
- GET_CURRENT(bx)
+ __GET_CURRENT(bx)
movq VCPU_domain(%rbx),%rcx
cmpb $0,DOMAIN_is_32bit_pv(%rcx)
je switch_to_kernel
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -150,7 +150,7 @@ UNLIKELY_END(exit_cr3)
* %ss must be saved into the space left by the trampoline.
*/
ENTRY(lstar_enter)
- sti
+ /* sti could live here when we don't switch page tables below. */
movq 8(%rsp),%rax /* Restore %rax. */
movq $FLAT_KERNEL_SS,8(%rsp)
pushq %r11
@@ -169,9 +169,10 @@ ENTRY(lstar_enter)
jz .Llstar_cr3_okay
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
neg %rcx
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
.Llstar_cr3_okay:
+ sti
__GET_CURRENT(bx)
testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
@@ -254,7 +255,7 @@ process_trap:
jmp test_all_events
ENTRY(sysenter_entry)
- sti
+ /* sti could live here when we don't switch page tables below. */
pushq $FLAT_USER_SS
pushq $0
pushfq
@@ -270,14 +271,17 @@ GLOBAL(sysenter_eflags_saved)
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
GET_STACK_END(bx)
+ /* PUSHF above has saved EFLAGS.IF clear (the caller had it set). */
+ orl $X86_EFLAGS_IF, UREGS_eflags(%rsp)
mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
neg %rcx
jz .Lsyse_cr3_okay
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
neg %rcx
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
.Lsyse_cr3_okay:
+ sti
__GET_CURRENT(bx)
cmpb $0,VCPU_sysenter_disables_events(%rbx)
@@ -324,9 +328,10 @@ ENTRY(int80_direct_trap)
jz .Lint80_cr3_okay
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
neg %rcx
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
.Lint80_cr3_okay:
+ sti
cmpb $0,untrusted_msi(%rip)
UNLIKELY_START(ne, msi_check)
@@ -510,7 +515,7 @@ ENTRY(common_interrupt)
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
neg %rcx
.Lintr_cr3_load:
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
xor %ecx, %ecx
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
testb $3, UREGS_cs(%rsp)
@@ -552,7 +557,7 @@ GLOBAL(handle_exception)
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
neg %rcx
.Lxcpt_cr3_load:
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
xor %ecx, %ecx
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
testb $3, UREGS_cs(%rsp)
@@ -748,7 +753,7 @@ ENTRY(double_fault)
jns .Ldblf_cr3_load
neg %rbx
.Ldblf_cr3_load:
- write_cr3 rbx, rdi, rsi
+ mov %rbx, %cr3
.Ldblf_cr3_okay:
movq %rsp,%rdi
@@ -783,7 +788,7 @@ handle_ist_exception:
mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
neg %rcx
.List_cr3_load:
- write_cr3 rcx, rdi, rsi
+ mov %rcx, %cr3
movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
.List_cr3_okay:
++++++ 5a9eb890-x86-remove-CR-reads-from-exit-to-guest-path.patch ++++++
# Commit 31bf55cb5fe3796cf6a4efbcfc0a9418bb1c783f
# Date 2018-03-06 16:49:36 +0100
# Author Jan Beulich
# Committer Jan Beulich
x86: remove CR reads from exit-to-guest path
CR3 is - during normal operation - only ever loaded from v->arch.cr3,
so there's no need to read the actual control register. For CR4 we can
generally use the cached value on all synchronous entry and exit paths.
Drop the write_cr3 macro, as the two use sites are probably easier to
follow without its use.
Signed-off-by: Jan Beulich
Tested-by: Juergen Gross
Reviewed-by: Juergen Gross
Reviewed-by: Andrew Cooper
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -88,6 +88,7 @@ void __dummy__(void)
OFFSET(VCPU_kernel_ss, struct vcpu, arch.pv_vcpu.kernel_ss);
OFFSET(VCPU_iopl, struct vcpu, arch.pv_vcpu.iopl);
OFFSET(VCPU_guest_context_flags, struct vcpu, arch.vgc_flags);
+ OFFSET(VCPU_cr3, struct vcpu, arch.cr3);
OFFSET(VCPU_arch_msr, struct vcpu, arch.msr);
OFFSET(VCPU_nmi_pending, struct vcpu, nmi_pending);
OFFSET(VCPU_mce_pending, struct vcpu, mce_pending);
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -45,7 +45,7 @@ restore_all_guest:
mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
/* Copy guest mappings and switch to per-CPU root page table. */
- mov %cr3, %r9
+ mov VCPU_cr3(%rbx), %r9
GET_STACK_END(dx)
mov STACK_CPUINFO_FIELD(pv_cr3)(%rdx), %rdi
movabs $PADDR_MASK & PAGE_MASK, %rsi
@@ -67,8 +67,13 @@ restore_all_guest:
sub $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
rep movsq
+ mov STACK_CPUINFO_FIELD(cr4)(%rdx), %rdi
mov %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
- write_cr3 rax, rdi, rsi
+ mov %rdi, %rsi
+ and $~X86_CR4_PGE, %rdi
+ mov %rdi, %cr4
+ mov %rax, %cr3
+ mov %rsi, %cr4
.Lrag_keep_cr3:
/* Restore stashed SPEC_CTRL value. */
@@ -124,7 +129,12 @@ restore_all_xen:
* so "g" will have to do.
*/
UNLIKELY_START(g, exit_cr3)
- write_cr3 rax, rdi, rsi
+ mov %cr4, %rdi
+ mov %rdi, %rsi
+ and $~X86_CR4_PGE, %rdi
+ mov %rdi, %cr4
+ mov %rax, %cr3
+ mov %rsi, %cr4
UNLIKELY_END(exit_cr3)
/* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -207,15 +207,6 @@ void ret_from_intr(void);
#define ASM_STAC ASM_AC(STAC)
#define ASM_CLAC ASM_AC(CLAC)
-.macro write_cr3 val:req, tmp1:req, tmp2:req
- mov %cr4, %\tmp1
- mov %\tmp1, %\tmp2
- and $~X86_CR4_PGE, %\tmp1
- mov %\tmp1, %cr4
- mov %\val, %cr3
- mov %\tmp2, %cr4
-.endm
-
#define CR4_PV32_RESTORE \
667: ASM_NOP5; \
.pushsection .altinstr_replacement, "ax"; \
++++++ 5aa2b6b9-cpufreq-ondemand-CPU-offlining-race.patch ++++++
# Commit 185413355fe331cbc926d48568838227234c9a20
# Date 2018-03-09 17:30:49 +0100
# Author Jan Beulich
# Committer Jan Beulich
cpufreq/ondemand: fix race while offlining CPU
Offlining a CPU involves stopping the cpufreq governor. The on-demand
governor will kill the timer before letting generic code proceed, but
since that generally isn't happening on the subject CPU,
cpufreq_dbs_timer_resume() may run in parallel. If that managed to
invoke the timer handler, that handler needs to run to completion before
dbs_timer_exit() may safely exit.
Make the "stoppable" field a tristate, changing it from +1 to -1 around
the timer function invocation, and make dbs_timer_exit() wait for it to
become non-negative (still writing zero if it's +1).
Also adjust coding style in cpufreq_dbs_timer_resume().
Reported-by: Martin Cerveny
Signed-off-by: Jan Beulich
Tested-by: Martin Cerveny
Reviewed-by: Wei Liu
--- a/xen/drivers/cpufreq/cpufreq_ondemand.c
+++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
@@ -204,7 +204,14 @@ static void dbs_timer_init(struct cpu_db
static void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
{
dbs_info->enable = 0;
- dbs_info->stoppable = 0;
+
+ /*
+ * The timer function may be running (from cpufreq_dbs_timer_resume) -
+ * wait for it to complete.
+ */
+ while ( cmpxchg(&dbs_info->stoppable, 1, 0) < 0 )
+ cpu_relax();
+
kill_timer(&per_cpu(dbs_timer, dbs_info->cpu));
}
@@ -369,23 +376,22 @@ void cpufreq_dbs_timer_suspend(void)
void cpufreq_dbs_timer_resume(void)
{
- int cpu;
- struct timer* t;
- s_time_t now;
-
- cpu = smp_processor_id();
+ unsigned int cpu = smp_processor_id();
+ int8_t *stoppable = &per_cpu(cpu_dbs_info, cpu).stoppable;
- if ( per_cpu(cpu_dbs_info,cpu).stoppable )
+ if ( *stoppable )
{
- now = NOW();
- t = &per_cpu(dbs_timer, cpu);
- if (t->expires <= now)
+ s_time_t now = NOW();
+ struct timer *t = &per_cpu(dbs_timer, cpu);
+
+ if ( t->expires <= now )
{
+ if ( !cmpxchg(stoppable, 1, -1) )
+ return;
t->function(t->data);
+ (void)cmpxchg(stoppable, -1, 1);
}
else
- {
- set_timer(t, align_timer(now , dbs_tuners_ins.sampling_rate));
- }
+ set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
}
}
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -225,8 +225,8 @@ struct cpu_dbs_info_s {
struct cpufreq_frequency_table *freq_table;
int cpu;
unsigned int enable:1;
- unsigned int stoppable:1;
unsigned int turbo_enabled:1;
+ int8_t stoppable;
};
int cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event);
++++++ 5aaa9878-x86-vlapic-clear-TMR-bit-for-edge-triggered-intr.patch ++++++
# Commit 12a50030a81a14a3c7be672ddfde707b961479ec
# Date 2018-03-15 16:59:52 +0100
# Author Liran Alon
# Committer Jan Beulich
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR
According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."
Before this patch, the TMR bit was cleared on LAPIC EOI, which is not what
real hardware does. This was also confirmed in KVM upstream commit
a0c9a822bf37 ("KVM: dont clear TMR on EOI").
Behavior after this patch is aligned with both Intel SDM and KVM
implementation.
Signed-off-by: Liran Alon
Signed-off-by: Boris Ostrovsky
Reviewed-by: Jan Beulich
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -161,6 +161,8 @@ void vlapic_set_irq(struct vlapic *vlapi
if ( trig )
vlapic_set_vector(vec, &vlapic->regs->data[APIC_TMR]);
+ else
+ vlapic_clear_vector(vec, &vlapic->regs->data[APIC_TMR]);
if ( hvm_funcs.update_eoi_exit_bitmap )
hvm_funcs.update_eoi_exit_bitmap(target, vec, trig);
@@ -434,7 +436,7 @@ void vlapic_handle_EOI(struct vlapic *vl
{
struct domain *d = vlapic_domain(vlapic);
- if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
+ if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
vioapic_update_EOI(d, vector);
hvm_dpci_msi_eoi(d, vector);