A neural network model for cache and memory prediction of neural networks

Document Type

Conference Proceeding

Publication Date



Department of Computer Science


Neural networks have been widely applied in various research and production fields. However, most recent research focuses on establishing and selecting a specific neural network model. Less attention is paid to their system overhead, despite their massive demand for computing and storage resources. This research pursues a relatively new direction: modeling the system-level memory and cache demand of neural networks. We use a neural network to learn and predict the hit ratio curve and memory footprint of neural networks, taking their hyper-parameters as input. The prediction results drive cache partitioning and memory partitioning to optimize the co-execution of multiple neural networks. To demonstrate the effectiveness of our approach, we model four common networks: the BP (backpropagation) neural network, convolutional neural network, recurrent neural network, and autoencoder. We investigate the influence of each model's hyper-parameters on last-level cache and memory demand. We use the BP algorithm as the learning tool to predict the last-level cache hit ratio curve and memory usage. Our experimental results show that cache and memory allocation schemes guided by our predictions optimize for a wide range of performance targets.
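
To illustrate how predicted hit ratio curves could drive cache partitioning, the following is a minimal sketch, not the paper's method. The abstract does not say which partitioning algorithm is used, so the sketch assumes a simple greedy marginal-gain allocator. The function name `partition_cache`, the workload names, and the curve values are all hypothetical; in the approach described above, the curves would come from the trained predictor rather than being written by hand.

```python
def partition_cache(hit_curves, total_ways):
    """Greedily assign cache ways to workloads (illustrative assumption only).

    hit_curves maps a workload name to a list where curve[w] is the
    predicted hit ratio when the workload owns w cache ways
    (curve[0] is the hit ratio with no ways).
    """
    alloc = {name: 0 for name in hit_curves}
    for _ in range(total_ways):
        best_name, best_gain = None, -1.0
        for name, curve in hit_curves.items():
            w = alloc[name]
            if w + 1 >= len(curve):
                continue  # curve has no data point for one more way
            gain = curve[w + 1] - curve[w]  # marginal hit ratio gain
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # every workload is already at the end of its curve
        alloc[best_name] += 1
    return alloc


# Hypothetical predicted curves for two co-running networks (made-up values).
curves = {
    "cnn": [0.0, 0.30, 0.50, 0.62, 0.66],
    "rnn": [0.0, 0.10, 0.15, 0.18, 0.20],
}
alloc = partition_cache(curves, total_ways=4)
```

A greedy allocator of this kind maximizes the sum of hit ratios only when every curve is concave; other performance targets, such as fairness, would need a different objective.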

Publisher's Statement

Copyright © 2018, IEEE. Publisher’s version of record: https://doi.org/10.1109/BDCloud.2018.00142

Publication Title

2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)