He VMware ESXi hypervisor attracts a wide range of customers and is deployed in domains ranging from desktop computing to server computing. While the software systems are increasingly moving towards consolidation, hardware has already transitioned into multi-socket Non-Uniform Memory Access (NUMA)-based systems. The marriage of increasing consolidation and the multi-socket based systems warrants low-overhead, simple and practical mechanisms to detect and address performance bottlenecks, without causing additional contention for shared resources such as performance counters. In this paper, we propose a simple, practical and highly accurate, dynamic memory latency probing mechanism to detect memory congestion in a NUMA system. Using these dynamic probed latencies, we propose congestion-aware memory allocation, congestion-aware memory migration, and a combination of these two techniques. These proposals, evaluated on Intel Westmere (8 nodes) and Intel Haswell (2 nodes) using various workloads, improve the overall performance on an average by 7.2% and 9.5% respectively.