Chip Multiprocessors (CMPs) and Non-Uniform Cache Architectures (NUCAs) represent two emerging trends in computer architecture. Targeting future CMP based systems with NUCA type L2 caches, this paper proposes a novel data migration algorithm for parallel applications and evaluates it. The goal of this migration scheme is to determine a suitable location for each data block within a large L2 space at any given point during execution. A unique characteristic of the proposed scheme is that it models the problem of optimal data placement in the L2 cache space as a two-dimensional post office placement problem, presents a practical architectural implementation of this model, and gives a detailed evaluation of the proposed implementation. In our experimental evaluation, we also compare our approach to a previously-proposed NUCA management scheme using applications from the specomp suite, oltp, specjbb, and specweb. These experiments show that our migration approach generates about 35% improvement, on average, in average L2 access latency over the previous migration scheme, and these L2 latency savings translate, on average, to 9.5% improvement in IPC (instructions per cycle).We also observed during our experiments that both the careful initial placement of data (which itself triggers migrations within the L2 space) and subsequent migrations (due to interprocessor data sharing) play an important role in achieving our performance improvements.