High-Throughput Training of Deep CNNs on ReRAM-based Heterogeneous Architectures via Optimized Normalization Layers


Resistive random-access memory (ReRAM)-based architectures can be used to accelerate Convolutional Neural Network (CNN) training. However, existing architectures either do not support normalization at all or they support only a limited version of it. Moreover, it is common practice for CNNs to add normalization layers after every convolution layer. In this work, we show that while normalization layers are necessary to train deep CNNs, only a few such layers are sufficient for effective training. A large number of normalization layers do not improve prediction accuracy; it necessitates additional hardware and gives rise to performance bottlenecks. To address this problem, we propose DeepTrain, a heterogeneous architecture enabled by a Bayesian optimization (BO) methodology; together, they provide adequate hardware and software support for normalization operations. The proposed BO methodology determines the minimum number of normalization operations necessary for a given CNN. Experimental evaluation indicates that the BO-enabled DeepTrain architecture achieves up to 15X speed-up compared to a conventional GPU for training CNNs with no accuracy loss while utilizing only a few normalization layers.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)