Data locality optimization is a critical issue for NoC (network-on-chip) based multicore systems. In this paper, focusing on a two-dimensional NoC-based multicore and dataintensive multithreaded applications, we first discuss a data locality aware scheduling algorithm for any given computation-to-core mapping, and then propose an integrated mapping+scheduling algorithm that performs both tasks together. Both our algorithms consider temporal (time-wise) and spatial (neighborhood-aware) data reuse, and try to minimize distance-to-data in on-chip cache accesses. We test the effectiveness of our compiler algorithms using a set of twelve application programs. Our experiments indicate that the proposed algorithms achieve significant improvements in data access latencies (42.7% on average) and overall execution times (24.1% on average). We also conduct a sensitivity analysis where we change the number of cores, on-chip cache capacities, and data movement (migration) strategies. These experiments show that our proposed algorithms generate consistently good results.