Academic Journal

FlashMAC: A Time-Frequency Hybrid MAC Architecture With Variable Latency-Aware Scheduling for TinyML Systems.

Bibliographic Details
Title: FlashMAC: A Time-Frequency Hybrid MAC Architecture With Variable Latency-Aware Scheduling for TinyML Systems.
Authors: Gweon, Surin; Kang, Sanghoon; Kim, Kwantae; Yoo, Hoi-Jun
Source: IEEE Journal of Solid-State Circuits; Oct 2022, Vol. 57 Issue 10, p2944-2956, 13p
Subject Terms: ARTIFICIAL neural networks, ENERGY consumption, PULSE width modulation
Abstract: With the widespread adoption of deep neural networks (DNNs) in diverse applications, tiny platforms such as Internet-of-Things devices are starting to adopt DNNs. Because of their extreme energy and form-factor constraints, conventional digital-only implementations of multiply-and-accumulate (MAC) acceleration face fundamental limitations. As a result, investigation into mixed-signal computing architectures is growing rapidly. Motivated by the flash ADC, this article proposes the FlashMAC architecture, which natively supports multibit multiplication. In addition, by fusing time- and frequency-domain computing methods without power-hungry oscillators, it enables low-latency accumulation with low power consumption. The proposed time-frequency hybrid architecture therefore achieves high energy efficiency while supporting complex DNN models that require higher precision. To make the mixed-signal architecture more robust to process, voltage, and temperature (PVT) variation, a frequency calibration loop is integrated. In addition, motivated by the data-dependent performance of the FlashMAC architecture, variable latency-aware scheduling is proposed. FlashMAC does not skip MAC operations as zero-skipping architectures do, but the latency of an operation can be lower when the operands are smaller in magnitude. Addressing this through software and hardware co-optimization, a loose synchronization architecture and magnitude-aware weight reordering increase DNN benchmark performance by achieving higher utilization of the parallel FlashMAC array. The proposed features are integrated into a test chip fabricated in 65-nm logic CMOS technology. The silicon chip achieves 56.52 TOPS/W peak energy efficiency and a peak operating frequency of 90 MHz. Tested with the VGG16 benchmark trained on the ImageNet dataset, it achieved 17.04-ms latency while showing 11.15 TOPS/W energy efficiency. As a result, compared to the previous state of the art, the proposed FlashMAC achieved 3.15× higher normalized energy efficiency. [ABSTRACT FROM AUTHOR]
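Illustrative note: the abstract says that operation latency drops for small-magnitude operands and that magnitude-aware weight reordering raises utilization of the parallel array. The sketch below is a toy model of why that could help. It is not the authors' scheduler. The latency formula (`1 + |w|` cycles), the lock-step grouping, and all function names are assumptions made for illustration only.

```python
def lane_latency(weight):
    # Hypothetical model: one base cycle plus one cycle per unit of magnitude.
    return 1 + abs(weight)


def array_latency(weights, lanes):
    # Assumed lock-step array: each group of `lanes` weights finishes only
    # when its slowest lane finishes, so the group costs the maximum latency.
    total = 0
    for start in range(0, len(weights), lanes):
        group = weights[start:start + lanes]
        total += max(lane_latency(w) for w in group)
    return total


def reorder_by_magnitude(weights):
    # Group similar magnitudes together so fast lanes wait less on slow ones.
    # Accumulation is commutative, so the order does not change the sum,
    # provided each weight stays paired with its own activation.
    return sorted(weights, key=abs)


weights = [7, 0, 1, 6, 0, 7, 1, 0]
lanes = 4
baseline = array_latency(weights, lanes)
reordered = array_latency(reorder_by_magnitude(weights), lanes)
print(baseline, reordered)  # prints: 16 10
```

In this toy setting the original order mixes large and small weights in every group, so both groups pay the worst-case latency. After sorting, the small weights share one short group. The real design also uses loose synchronization between lanes, which this sketch does not model.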
Database: Complementary Index
Description
ISSN: 0018-9200
DOI: 10.1109/JSSC.2022.3182699