Jeff (Jun) Zhang, Zahra Ghodsi, Kartheek Rangineni and Siddharth Garg
Due to the success of deep neural networks (DNNs) in achieving and surpassing state-of-the-art results across a range of machine learning applications, there is growing interest in the design of high-performance hardware accelerators for DNN execution. Further, as DNN hardware accelerators are increasingly deployed in datacenters, accelerator power and energy efficiency have become key design metrics. In this paper, we seek to enhance the energy efficiency of high-performance systolic-array-based DNN accelerators, such as the recently released Google TPU, using voltage-underscaling-based timing speculation, a powerful energy-reduction technique that enables digital logic to execute below its nominal supply voltage.
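The core tradeoff behind voltage-underscaling-based timing speculation can be illustrated with a small simulation. The sketch below is purely illustrative and makes assumptions not taken from the paper: per-operation critical-path delays are drawn from a Beta distribution, delay scales with supply voltage via the alpha-power law, dynamic energy scales as V², and each detected timing error is corrected by a one-cycle replay at nominal voltage (in the spirit of Razor-style detect-and-recover). All function names and parameters are hypothetical.

```python
import random

def simulate_underscaling(v_nom=1.0, v_scaled=0.85, v_th=0.3,
                          t_clk=1.0, n_ops=100_000, seed=0):
    """Toy model of voltage-underscaling-based timing speculation.

    Illustrative assumptions (not from the paper): path delay follows
    the alpha-power law ~ V / (V - Vth)^2, dynamic energy scales as
    V^2, and each timing error triggers a one-cycle replay at the
    nominal voltage. Returns (timing-error rate, fractional energy
    savings versus always running at nominal voltage).
    """
    rng = random.Random(seed)
    # Per-op critical-path delay at nominal voltage: most operations
    # finish early; a small tail approaches the clock period.
    delays_nom = [t_clk * rng.betavariate(2, 5) for _ in range(n_ops)]

    def scale(d):
        # Delay stretch when moving from v_nom down to v_scaled.
        f = ((v_scaled / (v_scaled - v_th) ** 2)
             / (v_nom / (v_nom - v_th) ** 2))
        return d * f

    errors = sum(1 for d in delays_nom if scale(d) > t_clk)
    error_rate = errors / n_ops

    # Energy: all ops at the scaled voltage, plus one replay at
    # nominal voltage per detected timing error.
    e_base = n_ops * v_nom ** 2
    e_spec = n_ops * v_scaled ** 2 + errors * v_nom ** 2
    return error_rate, 1.0 - e_spec / e_base
```

Under these assumptions, modest underscaling (here 1.0 V to 0.85 V) keeps the timing-error rate below one percent while cutting dynamic energy by roughly a quarter, which is the regime timing speculation targets: the replay overhead stays far smaller than the quadratic energy savings.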