As a result of process parameter variations, scaled technologies exhibit large variability in circuit delay. This latency variation problem is particularly pressing for memory components because of the minimum-sized transistors used to build them. Current memory design techniques mostly cope with such variations by adopting a worst-case design option, which simply assumes that all memory locations operate at the worst possible latency, whereas in reality some memory locations can be much faster than others. Note that uniformly assuming any latency other than the worst case for all memory locations can lead to reliability problems, since the data may not be ready when the assumed latency has elapsed. Instead of operating under the worst-case design option, this paper proposes and experimentally evaluates a compiler-driven approach that operates an on-chip scratch-pad memory (SPM) assuming different latencies for different SPM lines. Our goal is to reduce execution cycles without creating any reliability problems due to variations in access latencies. The proposed scheme achieves this goal by evaluating the reuse of different data items and adopting a reuse- and latency-aware data-to-SPM placement. It also employs data migration within the SPM when doing so further reduces the number of execution cycles. We also discuss an alternate scheme that can reduce the latency of select SPM locations by controlling a circuit-level mechanism in software to further improve performance. We implemented our approach within an optimizing compiler and tested its effectiveness through extensive simulations. Our experiments with twelve embedded application codes show that the proposed approach performs much better than the worst-case design paradigm (16.2% improvement on average) and comes close (within 5.7%) to a hypothetical best-case design (i.e., one with no process variation) in which all SPM locations uniformly have low latency.
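The reuse- and latency-aware placement idea can be illustrated with a minimal greedy sketch. This is our own illustration, not the paper's compiler algorithm, and all names are hypothetical: given per-line SPM latencies characterized after fabrication and per-block reuse counts estimated by the compiler, the most frequently reused blocks are assigned to the fastest lines.

```python
# Hypothetical sketch of reuse- and latency-aware data-to-SPM placement
# (a simple greedy heuristic for illustration; the paper's actual compiler
# pass may differ). Each SPM line has a post-fabrication access latency in
# cycles; each data block has a compiler-estimated reuse count.

def place_blocks(line_latencies, block_reuse):
    """Map each data block to an SPM line, pairing the most-reused
    blocks with the lowest-latency lines to reduce total access cycles."""
    # SPM line indices sorted from fastest to slowest.
    fast_lines = sorted(range(len(line_latencies)), key=lambda i: line_latencies[i])
    # Block IDs sorted from most-reused to least-reused.
    hot_blocks = sorted(block_reuse, key=block_reuse.get, reverse=True)
    return {b: l for b, l in zip(hot_blocks, fast_lines)}

def total_cycles(placement, line_latencies, block_reuse):
    """Total access cycles under this simple reuse * latency cost model."""
    return sum(block_reuse[b] * line_latencies[l] for b, l in placement.items())
```

For example, with line latencies `[3, 1, 2]` and reuse counts `{"A": 100, "B": 10, "C": 50}`, the hot block `A` lands on the 1-cycle line and the cold block `B` on the 3-cycle line, giving 230 total cycles versus 430 for the opposite assignment.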