P. Kogge, Exascale computing study: Technology challenges in achieving exascale systems, DARPA Information Processing Techniques Office, 2008.

J. Dongarra, pp.3-60, 2009.

A survey of architectural techniques for improving cache power efficiency, Sustainable Computing: Informatics and Systems.

Scaling the bandwidth wall: challenges in and avenues for CMP scaling.

Phase change memory: From devices to systems.

A software approach for combating asymmetries of non-volatile memories, ISLPED, 2012.

High-endurance hybrid cache design in CMP architecture with cache partitioning and access-aware policy, GLSVLSI, 2013.

S. Mittal, Energy Saving Techniques for Phase Change Memory (PCM).

X. Wu, Hybrid cache architecture with disparate memory technologies, ISCA, pp.34-45, 2009.
DOI : 10.1145/1555815.1555761

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.463.8280

M. Chang, Technology Comparison for Large Last-Level Caches (L3Cs): Low-Leakage SRAM, Low Write-Energy STT-RAM, and Refresh-Optimized eDRAM, pp.143-154, 2013.

Y. Chen, Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design, DATE, pp.45-50, 2012.

C. W. Smullen, Relaxing non-volatility for fast and energy-efficient STT-RAM caches, 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 2011.
DOI : 10.1109/HPCA.2011.5749716

J. Zhao, Bandwidth-aware reconfigurable cache design with hybrid memory technologies, 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2011.
DOI : 10.1109/ICCAD.2011.6105304

X. Dong, Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement, Proceedings of the 45th annual conference on Design automation, DAC '08, pp.554-559, 2008.
DOI : 10.1145/1391469.1391610

M. H. Kryder, After Hard Drives—What Comes Next?, IEEE Transactions on Magnetics, vol.45, issue.10, pp.3406-3413, 2009.
DOI : 10.1109/TMAG.2009.2024163

R. Venkatesan, TapeCache, Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, pp.185-190, 2012.
DOI : 10.1145/2333660.2333707

M. Rasquinha, An energy efficient cache design using spin torque transfer (STT) RAM, Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, ISLPED '10, 2010.
DOI : 10.1145/1840845.1840931

R. Kalla, Power7: IBM's next-generation server processor, pp.7-15, 2010.
DOI : 10.1109/mm.2010.38

S. Iyer, Embedded DRAM: Technology platform for the Blue Gene/L chip, IBM Journal of Research and Development, vol.49, issue.2.3, pp.333-350, 2005.
DOI : 10.1147/rd.492.0333

N. Kurd, 5.9 Haswell: A family of IA 22nm processors, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pp.112-113, 2014.
DOI : 10.1109/ISSCC.2014.6757361

J. Barth, A 500 MHz Random Cycle, 1.5 ns Latency, SOI Embedded DRAM Macro Featuring a Three-Transistor Micro Sense Amplifier, IEEE Journal of Solid-State Circuits, vol.43, issue.1, pp.86-95, 2008.
DOI : 10.1109/JSSC.2007.908006

C. Wilkerson, Reducing cache power with low-cost, multi-bit error-correcting codes, pp.83-93, 2010.

M. Alizadeh, Versatile refresh: low complexity refresh scheduling for high-throughput multi-banked eDRAM, ACM SIGMETRICS PER, pp.247-258, 2012.

Y. Huai, Spin-transfer torque MRAM (STT-MRAM): Challenges and prospects, AAPPS Bulletin, vol.18, issue.6, pp.33-40, 2008.

A. Jog, Cache revive, Proceedings of the 49th Annual Design Automation Conference on, DAC '12, pp.243-252, 2012.
DOI : 10.1145/2228360.2228406

H. Li and Y. Chen, An overview of non-volatile memory technology and the implication for tools and architectures, DATE, pp.731-736, 2009.

Y. Kim, Bi-layered RRAM with unlimited endurance and extremely uniform switching, VLSIT, pp.52-53, 2011.

X. Dong, NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Non-volatile Memory, IEEE TCAD, vol.31, issue.7, pp.994-1007, 2012.
DOI : 10.1007/978-1-4419-9551-3_2

J. Wang, WAP: Improving non-volatile cache lifetime by reducing inter-and intra-set write variations, HPCA, pp.234-245, 2013.

Y. Joo, Energy-and endurance-aware design of phase change memory caches, DATE, pp.136-141, 2010.

R. Bishnoi, Avoiding unnecessary write operations in STT-MRAM for low power implementation, Fifteenth International Symposium on Quality Electronic Design, 2014.
DOI : 10.1109/ISQED.2014.6783375

R. Bishnoi, Asynchronous asymmetrical write termination (AAWT) for a low power STT-MRAM, DATE, 2014.

J. Lira, Implementing a hybrid SRAM / eDRAM NUCA architecture, 2011 18th International Conference on High Performance Computing, pp.1-10, 2011.
DOI : 10.1109/HiPC.2011.6152738

A. Valero, An hybrid eDRAM/SRAM macrocell to implement first-level data caches, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, 2009.
DOI : 10.1145/1669112.1669140

A. Valero, Exploiting reuse information to reduce refresh energy in on-chip eDRAM caches, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, pp.491-492, 2013.
DOI : 10.1145/2464996.2467278

A. Valero, Analyzing the optimal ratio of SRAM banks in hybrid caches, 2012 IEEE 30th International Conference on Computer Design (ICCD), pp.297-302, 2012.
DOI : 10.1109/ICCD.2012.6378655

W. R. Reohr, Memories: Exploiting Them and Developing Them, 2006 IEEE International SOC Conference, pp.303-310, 2006.
DOI : 10.1109/SOCC.2006.283903

V. Lorente, Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes, DATE, pp.83-88, 2013.

L. Jiang, Constructing large and fast multi-level cell STT-MRAM based cache for embedded processors, Proceedings of the 49th Annual Design Automation Conference on, DAC '12, pp.907-912, 2012.
DOI : 10.1145/2228360.2228521

J. Ahn and K. Choi, Lower-bits cache for low power STT-RAM caches, 2012 IEEE International Symposium on Circuits and Systems, pp.480-483, 2012.
DOI : 10.1109/ISCAS.2012.6272069

J. Wang, X. Dong, and Y. Xie, OAP: An Obstruction-Aware Cache Management Policy for STT-RAM Last-Level Caches, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp.847-852, 2013.
DOI : 10.7873/DATE.2013.179

X. Bi, Z. Sun, H. Li, and W. Wu, Probabilistic design methodology to improve run-time stability and performance of STT-RAM caches, Proceedings of the International Conference on Computer-Aided Design, ICCAD '12, pp.88-94, 2012.
DOI : 10.1145/2429384.2429401

J. Jung, Energy-efficient Spin-Transfer Torque RAM cache exploiting additional all-zero-data flags, ISQED, pp.216-222, 2013.

Q. Li, Compiler-Assisted Refresh Minimization for Volatile STT-RAM Cache, ASP-DAC, pp.273-278, 2013.
DOI : 10.1109/TC.2014.2360527

B. Quan, Prediction Table based Management Policy for STT-RAM and SRAM Hybrid Cache, pp.1092-1097, 2012.

S. Yazdanshenas, Coding Last Level STT-RAM Cache for High Endurance and Low Power, IEEE Computer Architecture Letters, vol.13, issue.2, 2013.
DOI : 10.1109/L-CA.2013.8

J. Li, C. J. Xue, and Y. Xu, STT-RAM based energy-efficiency hybrid cache for CMPs, 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, pp.31-36, 2011.
DOI : 10.1109/VLSISoC.2011.6081626

Q. Li, MAC, Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, pp.351-356, 2012.
DOI : 10.1145/2333660.2333738

URL : https://hal.archives-ouvertes.fr/hal-00529679

Q. Li, Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache, Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems, LCTES '12, pp.109-118, 2012.
DOI : 10.1145/2248418.2248434

Z. Sun, Multi retention level STT-RAM cache designs with a dynamic refresh scheme, Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-44 '11, pp.329-338, 2011.
DOI : 10.1145/2155620.2155659

G. Sun, A novel architecture of the 3D stacked MRAM L2 cache for CMPs, 2009 IEEE 15th International Symposium on High Performance Computer Architecture, pp.239-249, 2009.
DOI : 10.1109/HPCA.2009.4798259

H. Sun, Design techniques to improve the device write margin for MRAM-based cache memory, Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI, GLSVLSI '11, pp.97-102, 2011.
DOI : 10.1145/1973009.1973030

X. Bi, H. Li, and J. Kim, Analysis and Optimization of Thermal Effect on STT-RAM Based 3-D Stacked Cache Design, 2012 IEEE Computer Society Annual Symposium on VLSI, pp.374-379, 2012.
DOI : 10.1109/ISVLSI.2012.56

Y. Joo and S. Park, A Hybrid PRAM and STT-RAM Cache Architecture for Extending the Lifetime of PRAM Caches, IEEE Computer Architecture Letters, vol.12, issue.2, 2012.
DOI : 10.1109/L-CA.2012.24

B. Lee and G. Park, Performance and energy-efficiency analysis of hybrid cache memory based on SRAM-MRAM, ISOCC, pp.247-250, 2012.

M. Mao, Coordinating prefetching and STT-RAM based last-level cache management for multicore systems, Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI, GLSVLSI '13, pp.55-60, 2013.
DOI : 10.1145/2483028.2483060

Y. Li and A. K. Jones, Cross-Layer Techniques for Optimizing Systems Utilizing Memories with Asymmetric Access Characteristics, 2012 IEEE Computer Society Annual Symposium on VLSI, pp.404-409, 2012.
DOI : 10.1109/ISVLSI.2012.65

H. Sun, Using Magnetic RAM to Build Low-Power and Soft Error-Resilient L1 Cache, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.20, issue.1, pp.19-28
DOI : 10.1109/TVLSI.2010.2090914

Y. Chen, Static and dynamic co-optimizations for blocks mapping in hybrid caches, Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, 2012.
DOI : 10.1145/2333660.2333717

G. Sun, Exploring the vulnerability of CMPs to soft errors with 3D stacked non-volatile memory, ICCD, pp.366-372, 2011.

H. Naeimi, STTRAM scaling and retention failure, Intel Technology Journal, vol.17, issue.1, p.54, 2013.

J. Li, Cache Coherence Enabled Adaptive Refresh for Volatile STT-RAM, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp.1247-1250, 2013.
DOI : 10.7873/DATE.2013.258

J. Ahn, S. Yoo, and K. Choi, Selectively protecting error-correcting code for area-efficient and reliable STT-RAM caches, ASP-DAC, pp.285-290, 2013.

Z. Sun, Process variation aware data management for STT-RAM cache design, Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, pp.179-184, 2012.
DOI : 10.1145/2333660.2333706

Y. Zhou, Asymmetric-access aware optimization for STT-RAM caches with process variations, GLSVLSI, pp.143-148, 2013.

S. Lee, Hybrid cache architecture replacing SRAM cache with future memory technology, 2012 IEEE International Symposium on Circuits and Systems, pp.2481-2484, 2012.
DOI : 10.1109/ISCAS.2012.6271803

V. Saripalli, Exploiting Heterogeneity for Energy Efficiency in Chip Multiprocessors, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol.1, issue.2, pp.109-119, 2011.
DOI : 10.1109/JETCAS.2011.2158343

H. Li, Performance, Power, and Reliability Tradeoffs of STT-RAM Cell Subject to Architecture-Level Requirement, IEEE Transactions on Magnetics, vol.47, issue.10, pp.2356-2359, 2011.
DOI : 10.1109/TMAG.2011.2159262

Y. Chen, On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations, J. Emerg. Technol. Comput. Syst., vol.9, issue.16, pp.1-1622, 2013.

P. Zhou, Energy reduction for STT-RAM using early write termination, Proceedings of the 2009 International Conference on Computer-Aided Design, ICCAD '09, pp.264-268, 2009.
DOI : 10.1145/1687399.1687448

S. P. Park, Future cache design using STT MRAMs for improved energy efficiency, Proceedings of the 49th Annual Design Automation Conference on, DAC '12, pp.492-497, 2012.
DOI : 10.1145/2228360.2228447

A. Jadidi, High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement, IEEE/ACM International Symposium on Low Power Electronics and Design, pp.79-84, 2011.
DOI : 10.1109/ISLPED.2011.5993611

L. V. Cargnini, Embedded memory hierarchy exploration based on magnetic RAM, 2013 IEEE Faible Tension Faible Consommation, pp.1-4, 2013.
DOI : 10.1109/FTFC.2013.6577780

URL : https://hal.archives-ouvertes.fr/lirmm-01419132

A. Sharifi and M. Kandemir, Automatic Feedback Control of Shared Hybrid Caches in 3D Chip Multiprocessors, 2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp.393-400, 2011.
DOI : 10.1109/PDP.2011.83

K. Swaminathan, Design space exploration of workload-specific last-level caches, Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, pp.243-248, 2012.
DOI : 10.1145/2333660.2333718

J. Ahn and K. Choi, LASIC: Loop-Aware Sleepy Instruction Caches Based on STT-RAM Technology, IEEE TVLSI, 2013.

K. Kwon, AWARE (Asymmetric Write Architecture With REdundant Blocks): A High Write Speed STT-MRAM Cache Architecture, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.22, issue.4, 2013.
DOI : 10.1109/TVLSI.2013.2256945

J. Ahn, Write intensity prediction for energy-efficient non-volatile caches, International Symposium on Low Power Electronics and Design (ISLPED), pp.223-228, 2013.
DOI : 10.1109/ISLPED.2013.6629298

Z. Sun, A dual-mode architecture for fast-switching STT-RAM, Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED '12, pp.45-50, 2012.
DOI : 10.1145/2333660.2333673

R. Venkatesan, DWM-TAPESTRI - an energy efficient all-spin cache using domain wall shift based writes, DATE, pp.1825-1830, 2013.

N. Goswami, Power-performance co-optimization of throughput core architecture using resistive memory, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pp.342-353, 2013.
DOI : 10.1109/HPCA.2013.6522331

Y. Li, C1C, ACM Transactions on Architecture and Code Optimization, vol.10, issue.4, pp.1-5222, 2013.
DOI : 10.1145/2541228.2555308

URL : https://hal.archives-ouvertes.fr/in2p3-00118705

J. Zhao and Y. Xie, Optimizing bandwidth and power of graphics memory with hybrid memory technologies and adaptive data migration, Proceedings of the International Conference on Computer-Aided Design, ICCAD '12, pp.81-87, 2012.
DOI : 10.1145/2429384.2429400

N. Strikos, Low-current probabilistic writes for power-efficient STT-RAM caches, ICCD, pp.511-514, 2013.

P. Mangalagiri, A low-power phase change memory based hybrid cache architecture, Proceedings of the 18th ACM Great Lakes symposium on VLSI, GLSVLSI '08, pp.395-398, 2008.
DOI : 10.1145/1366110.1366204

S. Guo, Wear-Resistant Hybrid Cache Architecture with Phase Change Memory, 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage, pp.268-272, 2012.
DOI : 10.1109/NAS.2012.37

M. Sharad, Multi-level magnetic RAM using domain wall shift for energy-efficient, high-density caches, International Symposium on Low Power Electronics and Design (ISLPED), pp.64-69, 2013.
DOI : 10.1109/ISLPED.2013.6629268

Z. Sun, Cross-layer racetrack memory design for ultra high density and low power consumption, Proceedings of the 50th Annual Design Automation Conference on, DAC '13, 2013.
DOI : 10.1145/2463209.2488799

J. Li, Exploiting set-level write non-uniformity for energy-efficient NVM-based hybrid cache, 2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia, pp.19-28, 2011.
DOI : 10.1109/ESTIMedia.2011.6088521

H. Noguchi, D-MRAM Cache: Enhancing Energy Efficiency with 3T-1MTJ DRAM / MRAM Hybrid Memory, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp.1813-1818, 2013.
DOI : 10.7873/DATE.2013.363

A. Maashri, 3D GPU architecture using cache stacking: Performance, cost, power and thermal analysis, 2009 IEEE International Conference on Computer Design, pp.254-259, 2009.
DOI : 10.1109/ICCD.2009.5413147

P. Satyamoorthy, STT-RAM for Shared Memory in GPUs, 2011.

X. Guo, Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing, ISCA, pp.371-382, 2010.

X. Bi, Unleashing the potential of MLC STT-RAM caches, 2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp.429-436, 2013.
DOI : 10.1109/ICCAD.2013.6691153

Y. Xie, Future Memory and Interconnect Technologies, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp.964-969, 2013.
DOI : 10.7873/DATE.2013.202

S. Kaxiras, Cache decay: exploiting generational behavior to reduce cache leakage power, ISCA, pp.240-251, 2001.

G. Sun, Moguls: a model to explore the memory hierarchy for bandwidth improvements, ISCA, pp.377-388, 2011.

S. Mittal received the B.Tech. degree in electronics and communications engineering from IIT, Roorkee, India and the Ph.D. degree in computer engineering from Iowa State University, USA. He is currently working as a Post-Doctoral Research Associate at ORNL. His research interests include non-volatile memory, memory system power efficiency