Spin-transfer torque magnetic RAM (STT MRAM) has emerged as a promising candidate for on-chip memory in future computing platforms. We present a cross-layer (device-circuit-architecture) approach to energy-efficient cache design using STT MRAM. At the device and circuit levels, we consider different genres of MTJs and bitcells, and evaluate their impact on the area, energy and performance of caches. In addition, we propose micro-architectural techniques viz. sequential cache read and partial cache line update, which exploit the non-volatility of STT MRAM to further improve energy efficiency of STT MRAM caches. A detailed comparison of STT MRAM caches with SRAM-based caches is also presented. Our results indicate that the proposed optimizations significantly enhance the efficiency of STT MRAM for designing lower level caches.