Resistive random-access memory (ReRAM)-based architectures can be used to accelerate Convolutional Neural Network (CNN) training. However, existing architectures either do not support normalization at all or support only a limited version of it. Moreover, it is common practice for CNNs to add a normalization layer after every convolution layer. In this work, we show that while normalization layers are necessary to train deep CNNs, only a few such layers are sufficient for effective training. Adding a large number of normalization layers does not improve prediction accuracy; instead, it necessitates additional hardware and creates performance bottlenecks. To address this problem, we propose DeepTrain, a heterogeneous architecture enabled by a Bayesian optimization (BO) methodology; together, they provide adequate hardware and software support for normalization operations. The proposed BO methodology determines the minimum number of normalization operations necessary for a given CNN. Experimental evaluation indicates that the BO-enabled DeepTrain architecture achieves up to 15x speedup over a conventional GPU when training CNNs, with no accuracy loss, while utilizing only a few normalization layers.
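The search for the minimum number of normalization layers can be illustrated with a toy Bayesian optimization loop. The sketch below is not the paper's actual methodology: it uses a minimal NumPy Gaussian-process surrogate with an upper-confidence-bound acquisition over candidate layer counts, and `mock_accuracy` is a hypothetical stand-in for the (expensive) step of training the CNN with a given number of normalization layers.

```python
import numpy as np

def rbf_kernel(a, b, length=2.0, var=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, noise=1e-4):
    """GP posterior mean and std-dev at candidate points."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_cand)
    Kss = rbf_kernel(x_cand, x_cand)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y_obs
    cov = Kss - Ks.T @ Kinv @ Ks
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def mock_accuracy(n_norm_layers, rng):
    # Hypothetical proxy for "train the CNN with n_norm_layers
    # normalization layers and report validation accuracy":
    # accuracy saturates after only a few layers, as in the paper.
    return 0.9 - 0.2 * np.exp(-n_norm_layers) + 0.001 * rng.standard_normal()

def bo_min_norm_layers(max_layers=16, n_iters=8, kappa=2.0, seed=0):
    """Toy BO loop: find the smallest layer count near peak accuracy."""
    rng = np.random.default_rng(seed)
    candidates = np.arange(0, max_layers + 1, dtype=float)
    x_obs = list(rng.choice(candidates, size=2, replace=False))
    y_obs = [mock_accuracy(x, rng) for x in x_obs]
    for _ in range(n_iters):
        mu, sigma = gp_posterior(np.array(x_obs), np.array(y_obs), candidates)
        ucb = mu + kappa * sigma           # explore high-uncertainty counts
        x_next = candidates[int(np.argmax(ucb))]
        x_obs.append(x_next)
        y_obs.append(mock_accuracy(x_next, rng))
    # Smallest observed layer count within 1% of the best accuracy seen.
    best = max(y_obs)
    return int(min(x for x, y in zip(x_obs, y_obs) if y >= best - 0.01))
```

With the saturating `mock_accuracy` above, the loop converges on a small layer count, mirroring the paper's observation that a few normalization layers suffice; in the real system the surrogate would model accuracy measured on actual training runs.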