Monday, January 12, 2009

Stumped stumped stumped

I've been running around in circles for months now about how our Linux
(RHEL5) VMs are gobbling up memory. We've got old(er) Intel 64-bit dual
dual-core boxes with HT and new shiny AMD 64-bit NUMA RVI dual quad-core
machines. Memory usage on the AMD boxen is through the roof, as in
80-90% utilized when left to its own devices, while the Intel boxes stay
around 50-60%. RVI is supposed to save the world by dumping one layer
of memory virtualization. Virtualized MMU (hardware page table
virtualization) should 'just work'. The host should see roughly what
the guest sees.

So how come some guest cron jobs take most of the available (guest)
physical memory and then shove it into cache? The host still thinks the
memory is needed, so it's not readily available for other guests. I ran
across the wonders of drop_caches, and an echo 1 to
/proc/sys/vm/drop_caches brings my physical memory usage from 450MB to
90MB (and dumps the buffers / caches). The host still shows the memory
as in use by the guest (esxtop, vc stats) but it's not. No swapping, and
no idea what the hell is going on.
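
A minimal sketch, in Python, of how to split guest memory into "really
used" versus buffers/cache by reading /proc/meminfo. The function names
are made up for illustration, and it assumes the usual MemTotal,
MemFree, Buffers and Cached fields are present.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_breakdown(info):
    """Separate memory held by buffers/page cache from everything else."""
    total = info["MemTotal"]
    free = info["MemFree"]
    # Buffers + Cached is what drop_caches can (mostly) give back.
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total_kb": total,
        "free_kb": free,
        "cache_kb": cache,
        "used_kb": total - free - cache,
    }


if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            breakdown = memory_breakdown(parse_meminfo(f.read()))
        for name, value in sorted(breakdown.items()):
            print("%-9s %10d kB" % (name, value))
```

Run inside the guest before and after the drop_caches echo (as root),
cache_kb should fall while used_kb stays roughly where it was. Comparing
that used_kb figure against what esxtop reports for the same guest is
one way to put a number on the gap.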

I would love to blame Red Hat and their shitty setup (come on,
bluetooth running on a minimal install?), but a Debian box has a
similar issue. SELinux? Things are not as horrible with SELinux off,
but that is not the root cause.

But my real question with allathis is: why the hell can't I find
anyone else with a beef about how this is working? I have Google kungfu
skillz and can't find anything that talks about this as a performance
problem (or as not a problem).

yech.