Academic Journal

ParaML: A Polyvalent Multicore Accelerator for Machine Learning.

Bibliographic Details
Title: ParaML: A Polyvalent Multicore Accelerator for Machine Learning.
Authors: Zhou, Shengyuan (1) zhousy@ict.ac.cn, Guo, Qi (1) guoqi@ict.ac.cn, Du, Zidong (1) duzidong@cambricon.com, Liu, Daofu (1) liudaofu@cambricon.com, Chen, Tianshi (1) chentianshi@ict.ac.cn, Li, Ling (2) liling@ict.ac.cn, Liu, Shaoli (1) liushaoli@cambricon.com, Zhou, Jinhong (1) zhoujinhong@cambricon.com, Temam, Olivier (3) olivier.temam@inria.fr, Feng, Xiaobing (4) fxb@ict.ac.cn, Zhou, Xuehai (5) xhzhou@ustc.edu.cn, Chen, Yunji (1) cyj@ict.ac.cn
Source: IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems. Sep 2020, Vol. 39, Issue 9, p1764-1777. 14p.
Subject Terms: *COMPUTER architecture, MACHINE learning, MULTICORE processors, SUPPORT vector machines, K-nearest neighbor classification, PRINCIPAL components analysis, VECTOR quantization
Abstract: In recent years, machine learning (ML) techniques have proven to be powerful tools in various emerging applications. Traditionally, ML techniques are processed on general-purpose CPUs and GPUs, but the energy efficiency of these platforms is limited by their excessive support for flexibility. Hardware accelerators are an efficient alternative to CPUs/GPUs, yet they remain limited in that they often accommodate only a single ML technique (family). However, different problems may require different ML techniques, which implies that such accelerators may achieve poor learning accuracy or even be ineffective. In this paper, we present a polyvalent accelerator architecture integrated with multiple processing cores, called ParaML, which accommodates ten representative ML techniques: k-means, k-nearest neighbors (k-NN), naive Bayes (NB), support vector machine (SVM), linear regression (LR), classification tree (CT), deep neural network (DNN), learning vector quantization (LVQ), Parzen window (PW), and principal component analysis (PCA). Benefiting from our thorough analysis of the computational primitives and locality properties of the different ML techniques, the single-core ParaML can perform up to 1056 GOP/s (e.g., additions and multiplications) in an area of 3.51 mm² while consuming only 596 mW, as estimated with ICC and PrimeTime PX, respectively, on the post-synthesis netlist. Compared with the NVIDIA K20M GPU (28-nm process), the single-core ParaML (65-nm process) is 1.21× faster and reduces energy by 137.93×. We also compare the single-core ParaML with other accelerators. Compared with PRINS, the single-core ParaML achieves 72.09× and 2.57× energy benefits for k-NN and k-means, respectively, and speeds up each k-NN query by 44.76×. Compared with EIE, the single-core ParaML achieves a 5.02× speedup and a 4.97× energy benefit with 11.62× less area when evaluated on a dense DNN. Compared with the TPU, the single-core ParaML achieves 2.45× better power efficiency (5647 GOP/W versus 2300 GOP/W) with 321.36× less area. Compared to the single-core version, the 8-core ParaML further improves the speedup, up to 3.98×, with an area of 13.44 mm² and a power of 2036 mW. [ABSTRACT FROM AUTHOR]
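Reader's note (illustrative, not from the paper): the "computational primitives" the abstract credits for ParaML's generality are largely shared vector operations. The following minimal Python sketch, with hypothetical names and structure, shows how several of the ten listed techniques reduce to the same two primitives, multiply-accumulate for SVM/LR/DNN-style models and subtract-multiply-accumulate for distance-based methods, which is what lets one datapath serve them all.

    # Illustrative sketch (hypothetical; not the paper's design or code).
    import numpy as np

    def dot_product(w, x):
        # Multiply-accumulate: the primitive shared by SVM, LR, and DNN layers.
        return float(np.sum(w * x))

    def squared_distance(a, b):
        # Subtract-multiply-accumulate: the primitive shared by k-NN, k-means,
        # LVQ, and Parzen-window estimation.
        d = a - b
        return float(np.sum(d * d))

    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)     # input feature vector
    w = rng.standard_normal(8)     # weight vector or stored prototype
    print(dot_product(w, x))       # e.g., a linear-model score / neuron pre-activation
    print(squared_distance(w, x))  # e.g., a nearest-neighbor distance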
Database: Business Source Index
Description
ISSN: 0278-0070
DOI: 10.1109/TCAD.2019.2927523