Date of Award

2023

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Computer Science (PhD)

Administrative Home Department

Department of Computer Science

Advisor 1

Jianhui Yue

Committee Member 1

Soner Onder

Committee Member 2

Zhenlin Wang

Committee Member 3

Qinghui Chen

Abstract

The emergence of new non-volatile memory (NVM) technology and deep neural network (DNN) inferences bring challenges related to off-chip memory access. Ensuring crash consistency leads to additional memory operations and exposes memory update operations on the critical execution path. DNN inference execution on some accelerators suffers from intensive off-chip memory access. The focus of this dissertation is to tackle the issues related to off-chip memory in these high performance computing systems.

The logging operations, required by the crash consistency, impose a significant performance overhead due to the extra memory access. To mitigate the persistence time of log requests, we introduce a load-aware log entry allocation scheme that allocates log requests to the address whose bank has the lightest workload. To address the problem of intra-record ordering, we propose to buffer log metadata in a non-volatile ADR buffer until the corresponding log can be removed. Moreover, the recently proposed LAD introduced unnecessary logging operations on multicore CPU. To reduce these unnecessary operations, we have devised two-stage transaction execution and virtual ADR buffers.

To tackle the challenge of low response time and high computational intensity associated with DNN inferences, these computations are often executed on customized accelerators. However, data loading from off-chip memory typically takes longer than computing, thereby reducing performance in some scenarios, especially on edge devices. To address this issue, we propose an optimization of the widely adopted Weight Stationary dataflow to remove redundant accesses to IFMAP in off-chip memory by reordering the loops in the standard convolution operation. Furthermore, to enhance the off-chip memory throughput, we introduce the load-aware placement for data tiles on off-chip memory that reduces intra/inter contentions caused by concurrent accesses from multiple tiles and improves the off-chip memory device parallelism during access.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Lu, Zhiyuan, "MEMORY OPTIMIZATIONS FOR HIGH-THROUGHPUT COMPUTER SYSTEMS", Open Access Dissertation, Michigan Technological University, 2023.

https://doi.org/10.37099/mtu.dc.etdr/1660

00 Publishing Agreement 9-2021.pdf (77 kB)

Download

Included in

Computer and Systems Architecture Commons

COinS

Dissertations, Master's Theses and Master's Reports

MEMORY OPTIMIZATIONS FOR HIGH-THROUGHPUT COMPUTER SYSTEMS

Date of Award

Document Type

Degree Name

Administrative Home Department

Advisor 1

Committee Member 1

Committee Member 2

Committee Member 3

Abstract

Creative Commons License

Recommended Citation

Included in

LINKS

Browse

Search

Author Corner

Dissertations, Master's Theses and Master's Reports

MEMORY OPTIMIZATIONS FOR HIGH-THROUGHPUT COMPUTER SYSTEMS

Author

Date of Award

Document Type

Degree Name

Administrative Home Department

Advisor 1

Committee Member 1

Committee Member 2

Committee Member 3

Abstract

Creative Commons License

Recommended Citation

Included in

Share

LINKS

Browse

Search

Author Corner