Application programs that exhibit strong locality of reference lead to minimized cache misses and better performance in different architectures. In this paper, we target task-based programs, and propose a novel compiler-based approach that consists of four complementary steps. First, we partition the original tasks in the target application into sub-tasks and build a data reuse graph at a sub-task granularity. Second, based on the intensity of temporal and spatial data reuses among sub-tasks, we generate new tasks where each such (new) task includes a set of sub-tasks that exhibit high data reuse among them. Third, we assign the newly-generated tasks to cores in an architecture-aware fashion with the knowledge of data location. Finally, we re-schedule the execution order of sub-tasks within new tasks such that sub-tasks that belong to different tasks but share data among them are executed in close proximity in time. The experiments show that, when targeting a state of the art manycore system, our compiler-based approach improves the performance of 10 multithreaded programs by 23.4% on average.