Data-centric computing frontiers: A survey on processing-in-memory, Int'l Symposium on Memory Systems (MEMSYS), pp.295-308, 2016.
Near-data processing: Insights from a MICRO-46 workshop, IEEE Micro, vol.34, issue.4, pp.36-42, 2014.
Computing's energy problem (and what we can do about it), Int'l Solid-State Circuits Conference (ISSCC), pp.10-14, 2014.
Exascale computing - a fact or a fiction? (keynote), Int'l Parallel & Distributed Processing Symposium (IPDPS), 2013.
GPUs and the future of parallel computing, IEEE Micro, vol.31, issue.5, pp.7-17, 2011.
STREAM: Sustainable memory bandwidth in high performance computers, 2016.
Scaling analog circuits into deep nanoscale CMOS: Obstacles and ways to overcome them, IEEE Custom Integrated Circuits Conference (CICC), 2015.
Enabling scientific computing on memristive accelerators, Int'l Symposium on Computer Architecture (ISCA), pp.367-382, 2018.
Design and evaluation of a processing-in-memory architecture for the smart memory cube, Int'l Conference on Architecture of Computing Systems (ARCS), pp.19-31, 2016.
A scalable processing-in-memory accelerator for parallel graph processing, Int'l Symposium on Computer Architecture (ISCA), pp.105-117, 2015.
HRL: Efficient and flexible reconfigurable logic for near-data processing, Int'l Symposium on High Performance Computer Architecture (HPCA), pp.126-137, 2016.
Active memory cube: A processing-in-memory architecture for exascale systems, IBM Journal of Research and Development, vol.59, issue.2/3, 2015.
Is dark silicon useful? Design Automation Conference (DAC), pp.1131-1136, 2012.
Overcoming system memory challenges with persistent memory and NVDIMM-P, JEDEC Server Forum, 2017.
JEDEC Solid State Technology Association, committee JC-45, vol.6, p.2261, 2017.
Core Specification, 2018.
An Introduction to CCIX, 2018.
BLAS (Basic Linear Algebra Subprograms), 2017.
An architecture for near-data processing systems, ACM Int'l Conf. on Computing Frontiers (CF), pp.357-360, 2016.
LazyPIM: An efficient cache coherence mechanism for processing-in-memory, IEEE Computer Architecture Letters, vol.16, issue.1, pp.46-50, 2017.
NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules, Int'l Symposium on High Performance Computer Architecture (HPCA), pp.283-295, 2015.
AIM: Accelerating computational genomics through scalable and noninvasive accelerator-interposed memory, Int'l Symposium on Memory Systems (MEMSYS), pp.3-14, 2017.
Application-transparent near-memory processing architecture with memory channel network, IEEE/ACM Int'l Symposium on Microarchitecture (MICRO), pp.803-815, 2018.
Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems, IEEE/ACM Int'l Symposium on Microarchitecture (MICRO), 2016.
The Mondrian data engine, Int'l Symposium on Computer Architecture (ISCA), pp.639-651, 2017.
Highly scalable near memory processing with migrating threads on the Emu system architecture, Workshop on Irregular Applications: Architecture and Algorithms (IA3), pp.2-9, 2016.
Beyond the wall: Near-data processing for databases, Int'l Workshop on Data Management on New Hardware (DaMoN), 2015.
Near memory data structure rearrangement, Int'l Symposium on Memory Systems (MEMSYS).
Data reorganization in memory using 3D-stacked DRAM, Int'l Symposium on Computer Architecture (ISCA), pp.131-143, 2015.
Exploring the processing-in-memory design space, Elsevier Journal of Systems Architecture, vol.75, pp.59-67, 2017.
A 512GB 1.1V managed DRAM solution with 16GB ODP and media controller, Int'l Solid-State Circuits Conference (ISSCC), pp.384-385, 2019.
JEDEC Solid State Technology Association, pp.82-112, 2014.
JEDEC Solid State Technology Association, pp.82-102, 2009.
ARM Architecture Reference Manual - ARMv8, 2018.
ARM Architecture Reference Manual Supplement, The Scalable Vector Extension (SVE), for ARMv8-A, ARM DDI 0584a.d (ID122117) ed., ARM Ltd, 2017.
World Intellectual Property Organization (WIPO), Int'l Publication Number WO, 2018.
Practical near-data processing for in-memory analytics frameworks, Int'l Conference on Parallel Architecture and Compilation (PACT), pp.113-124, 2015.
A bare machine application development methodology, FCS Int'l Journal of Computers and Their Applications (IJCA), vol.19, issue.1, pp.10-25, 2012.
Kirin 950 takes performance lead, Mobile Chip Report, 2015.
Understanding the energy consumption of dynamic random access memories, IEEE/ACM Int'l Symposium on Microarchitecture (MICRO).
Calculating Memory Power for DDR4 SDRAM, 2017.
POWER9: A processor family optimized for cognitive computing with 25Gb/s accelerator links and 16Gb/s PCIe Gen4, Int'l Solid-State Circuits Conference (ISSCC), pp.50-51, 2017.
The Xeon processor E5-2600 v3: A 22nm 18-core product family, Int'l Solid-State Circuits Conference (ISSCC), pp.1-3, 2015.
Analyzing the silicon: Die size estimates and arrangements: The Intel Skylake-X review, 2017.
A 16nm FinFET CMOS technology for mobile SoC and computing applications, Int'l Electron Devices Meeting (IEDM), 2013.
An integrated vector-scalar design on an in-order ARM core, ACM Transactions on Architecture and Code Optimization (TACO), vol.14, issue.2, 2017.
High-performance and low-power consumption vector processor for LTE baseband LSI, Fujitsu Scientific and Technical Journal (FSTJ), vol.50, issue.1, pp.132-137, 2014.
Hwacha preliminary evaluation results, v3.8.1, 2015.
In-datacenter performance analysis of a tensor processing unit, Int'l Symposium on Computer Architecture (ISCA), pp.1-12, 2017.
Cooling solution for computing and storage server, IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), pp.840-849, 2017.
DDR4 Registering Clock Driver (DDR4RCD01), JEDEC JESD82-31, 2016.
DDR4 Data Buffer Definition (DDR4DB01), JEDEC JESD82-32, 2016.
Intel C112/C114 Scalable Memory Buffer (SMB) data sheet, pp.332444-332445, 2015.
POWER8 Memory Buffer Datasheet for DDR3 Applications, 2016.
Cost drivers in PCB production, NCAB Group Seminars, 2015.
Samsung Galaxy S8, vol.10, 2017.
Unleashing fury: A new paradigm for 3-D design and test, IEEE Design & Test, vol.34, issue.1, pp.8-15, 2017.
The cost of HBM2 vs. GDDR5 & why AMD had to use it, 2017.
Cortex-A76 rev amps core design, Microprocessor Report, 2018.
Lmbench - system benchmarks, 2007.
The state of data science & machine learning, 2017.
LIBLINEAR: A library for large linear classification, Journal of Machine Learning Research, vol.9, pp.1871-1874, 2008.
LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, vol.2, issue.3, pp.1-27, 2011.
SSD: Single shot multibox detector, European Conference on Computer Vision (ECCV), pp.21-37, 2016.
XGBoost: A scalable tree boosting system, 22nd ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining, pp.785-794, 2016.
Improving the energy efficiency of big cores, Int'l Symposium on Computer Architecture (ISCA), pp.493-504, 2014.
Near data processing: Impact and optimization of 3D memory system architecture on the uncore, Int'l Symposium on Memory Systems (MEMSYS), pp.11-21, 2015.
The gem5 simulator, ACM SIGARCH Computer Architecture News, vol.39, issue.2, pp.1-7, 2011.
Quantifying the performance impact of memory latency and bandwidth for big data workloads, Int'l Symposium on Workload Characterization (IISWC), pp.213-224, 2015.
Overview - Dreams - AWB/Leap projects, 2013.
A view of the parallel computing landscape, Communications of the ACM, vol.52, issue.10, pp.56-67, 2009.
It's time to think about an operating system for near data processing architectures, 16th Workshop on Hot Topics in Operating Systems (HotOS), pp.56-61, 2017.