CPU and Memory Performance: Xen Dom0 vs DomU Comparison
This article documents performance benchmarks for Xen virtualization, comparing CPU and memory performance between Dom0 (privileged domain), DomU (unprivileged guest), and bare metal. While Xen remains in production use (AWS EC2, Citrix Hypervisor, Oracle Cloud), most Linux distributions default to KVM for new deployments. For contemporary virtualization needs, evaluate both Xen and KVM based on your specific requirements.
The test platform is Fedora 41 with Xen and Linux kernel 6.x. See the companion article on setting up Xen Dom0 on Fedora for environment details.
Test Design
Three performance dimensions were measured: CPU-bound operations, memory reads, and memory writes. Each test was executed 50 times, and the results were analyzed for average, standard deviation, and coefficient of variation.
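For reference, here is a minimal sketch of how those statistics can be computed from the per-run times. It is not the original test harness, and the values in run_times are placeholders, not measured data.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Placeholder values; the real harness collects 50 measured times (in ms) per test.
    std::vector<double> run_times = {13252.0, 13258.5, 13261.0 /* ..., 50 samples */};

    double sum = 0.0;
    for (double t : run_times) sum += t;
    double mean = sum / run_times.size();

    double sq_diff = 0.0;
    for (double t : run_times) sq_diff += (t - mean) * (t - mean);
    double stddev = std::sqrt(sq_diff / run_times.size());   // population standard deviation

    double cv = stddev / mean * 100.0;                        // coefficient of variation, in percent
    printf("avg=%.1f ms  stddev=%.2f ms  cv=%.3f%%\n", mean, stddev, cv);
    return 0;
}
```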
CPU-Bound Test
Measures raw computational throughput using nested loops:
```cpp
// Build with g++ -O0 (with optimization enabled, the compiler may remove the loops entirely).
#include <cstdio>
#include <ctime>

int main() {
    const int test_limit = 100000;
    clock_t begin_time = clock();
    long long a = 0;                       // long long: the sum overflows a 32-bit int
    for (int i = 0; i < test_limit; i++) {
        for (int j = 0; j < test_limit; j++) {
            a += i + j;
        }
    }
    clock_t end_time = clock();
    int time_ms = (end_time - begin_time) / (CLOCKS_PER_SEC / 1000);
    printf("cpu: %d ms (checksum=%lld)\n", time_ms, a);
    return 0;
}
```
Memory Read Test
Benchmarks strided memory reads across a large array (one billion ints, about 4 GB):
```cpp
// Build with g++ -O0 so the read loop is not optimized away; needs roughly 4 GB of free RAM.
#include <cstdio>
#include <ctime>
#include <new>

int main() {
    const int test_times = 1000000;
    const int data_size = 1000000000;      // 10^9 ints, about 4 GB
    const int data_interval = 1000000;     // stride of one million ints (4 MB)
    int* array_read = new (std::nothrow) int[data_size];
    if (!array_read) { fprintf(stderr, "allocation failed\n"); return 1; }
    int read_value = 0;
    int data_range = data_size - test_times;
    clock_t read_begin = clock();
    for (int t = 0; t < test_times; t++) {
        for (int i = 0; i < data_range; i += data_interval) {
            read_value = array_read[i + t];
        }
    }
    clock_t read_end = clock();
    int time_ms = (read_end - read_begin) / (CLOCKS_PER_SEC / 1000);
    printf("memread: %d ms (last=%d)\n", time_ms, read_value);
    delete[] array_read;
    return 0;
}
```
Memory Write Test
Evaluates memory write performance using the same array structure:
```cpp
// Build with g++ -O0; needs roughly 4 GB of free RAM.
#include <cstdio>
#include <ctime>
#include <new>

int main() {
    const int test_times = 1000000;
    const int data_size = 1000000000;      // 10^9 ints, about 4 GB
    const int data_interval = 1000000;     // stride of one million ints (4 MB)
    int* array_write = new (std::nothrow) int[data_size];
    if (!array_write) { fprintf(stderr, "allocation failed\n"); return 1; }
    int data_range = data_size - test_times;
    clock_t write_begin = clock();
    for (int t = 0; t < test_times; t++) {
        for (int i = 0; i < data_range; i += data_interval) {
            array_write[i + t] = 528283;   // arbitrary constant written at every strided location
        }
    }
    clock_t write_end = clock();
    int time_ms = (write_end - write_begin) / (CLOCKS_PER_SEC / 1000);
    printf("memwrite: %d ms\n", time_ms);
    delete[] array_write;
    return 0;
}
```
Results
Physical Machine (Fedora, bare metal)
| Metric | CPU (ms) | Memory Read (ms) | Memory Write (ms) |
|---|---|---|---|
| Average | 13257.6 | 21449.6 | 22243.8 |
| Std Dev | 4.72 | 17.2 | 18.96 |
| Coefficient of Variation | 0.036% | 0.080% | 0.085% |
Dom0 (SUSE kernel, privileged domain)
| Metric | CPU (ms) | Memory Read (ms) | Memory Write (ms) |
|---|---|---|---|
| Average | 13283 | 23059 | 23856.6 |
| Std Dev | 4.58 | 22.38 | 18.07 |
| Coefficient of Variation | 0.034% | 0.097% | 0.076% |
| Relative Performance | 99.81% | 93.02% | 93.24% |
DomU (Fedora kernel, unprivileged guest)
| Metric | CPU (ms) | Memory Read (ms) | Memory Write (ms) |
|---|---|---|---|
| Average | 13307.6 | 23667.8 | 24459.2 |
| Std Dev | 13.2 | 33.96 | 37.19 |
| Coefficient of Variation | 0.099% | 0.144% | 0.152% |
| Relative Performance | 99.62% | 90.63% | 90.94% |
Analysis
CPU Performance: Both Dom0 and DomU deliver near-native CPU throughput (99.6–99.8%), with minimal virtualization overhead. This reflects Xen’s efficient handling of CPU-bound workloads through hardware support (Intel VT-x, AMD-V).
Memory Performance: Memory operations show measurable overhead, particularly in DomU:
- Dom0 memory reads/writes: ~7% slower than bare metal. This represents the overhead of managing guest memory translation and hypervisor involvement in privileged operations.
- DomU memory reads/writes: ~9–10% slower than bare metal. The additional overhead stems from Xen’s memory validation, balloon driver interactions, and guest-to-hypervisor context switches required for memory management in unprivileged domains.
The higher variability in DomU results (coefficient of variation ~0.14–0.15%, vs. 0.03–0.10% on bare metal and in Dom0) reflects scheduling jitter and memory pressure management by the hypervisor.
Practical Considerations
These are microbenchmarks run under controlled conditions. Real-world workload performance depends on:
- Memory access patterns: The tests above use a fixed stride; fully random access patterns may show larger virtualization overhead (a random-access variant is sketched after this list).
- CPU cache effects: The nested loop test remains in L1/L2 cache, masking memory subsystem effects.
- I/O operations: Network and disk I/O introduce additional virtualization overhead not captured here.
- Scheduling: Overcommitted systems show larger performance degradation than measured here.
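As a rough illustration of the first point, the read test could be reworked to use random indices. This is a hypothetical sketch, not part of the original benchmark; the read count, seed, and output format are arbitrary, and the timing includes the cost of the random number generator, so it is only suitable for coarse comparisons.

```cpp
// Hypothetical random-access variant of the read test (not part of the original benchmark).
// Build with g++ -O0; needs roughly 4 GB of free RAM.
#include <cstdio>
#include <ctime>
#include <new>
#include <random>

int main() {
    const int data_size = 1000000000;                 // same ~4 GB int array as above
    const long total_reads = 1000000000L;             // roughly as many reads as the strided test
    int* array_read = new (std::nothrow) int[data_size];
    if (!array_read) { fprintf(stderr, "allocation failed\n"); return 1; }
    std::mt19937 rng(42);                             // fixed seed for repeatable runs
    std::uniform_int_distribution<int> pick(0, data_size - 1);
    int read_value = 0;
    clock_t begin = clock();
    for (long r = 0; r < total_reads; r++) {
        read_value = array_read[pick(rng)];           // random index defeats caching and prefetching
    }
    clock_t end = clock();
    printf("random memread: %ld ms (last=%d)\n",
           (long)((end - begin) / (CLOCKS_PER_SEC / 1000)), read_value);
    delete[] array_read;
    return 0;
}
```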
For production workloads, validate performance with your actual application profile using tools like perf, sysbench, or application-level metrics. Xen remains suitable for latency-sensitive applications where CPU overhead must be minimized, though modern KVM implementations are competitive for most deployments.

Comments

nice information. thanks.
The memory read and write tests don't work for me. It crashed for memwrite, and for memread I got time=0.
@Teo
Please share the error messages from memwrite and memread.
Also, no optimization should be applied by gcc when compiling the code. If -O0 is not used, the reported time may be 0: the loop result is only read once at the end, so the compiler may optimize the whole loop down to a single operation.
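For illustration, a variant of the CPU loop with the accumulator declared volatile keeps the work even when optimization is enabled, because every volatile access is an observable side effect. This is only a sketch, not the code used for the published numbers, and the extra load and store per iteration mean its timings are not comparable to the -O0 runs above.

```cpp
#include <cstdio>
#include <ctime>

int main() {
    const int test_limit = 100000;
    volatile long long a = 0;          // volatile accesses cannot be elided, even at -O2
    clock_t begin_time = clock();
    for (int i = 0; i < test_limit; i++) {
        for (int j = 0; j < test_limit; j++) {
            a = a + (i + j);           // each iteration must read and write 'a'
        }
    }
    clock_t end_time = clock();
    printf("cpu: %d ms (a=%lld)\n",
           (int)((end_time - begin_time) / (CLOCKS_PER_SEC / 1000)), (long long)a);
    return 0;
}
```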