Throughput Prediction of Asynchronous SGD in TensorFlow

Bibliographic Details
Title: Throughput Prediction of Asynchronous SGD in TensorFlow
Authors: Leana Golubchik, Marco Paolieri, Wumo Yan, Zhuojin Li
Source: ICPE
Publication Info: ACM, 2020.
Publication Year: 2020
Subject Terms: FOS: Computer and information sciences; Computer Science - Machine Learning; Computer Science - Performance; Artificial neural network; Computer science; Distributed computing; 020206 networking & telecommunications; 020207 software engineering; 02 engineering and technology; Machine Learning (cs.LG); Scheduling (computing); Performance (cs.PF); Stochastic gradient descent; Computer Science - Distributed, Parallel, and Cluster Computing; Asynchronous communication; Node (computer science); 0202 electrical engineering, electronic engineering, information engineering; Overhead (computing); Distributed, Parallel, and Cluster Computing (cs.DC); Throughput (business); Data transmission
Description: Modern machine learning frameworks can train neural networks using multiple nodes in parallel, each computing parameter updates with stochastic gradient descent (SGD) and sharing them asynchronously through a central parameter server. Due to communication overhead and bottlenecks, the total throughput of SGD updates in a cluster scales sublinearly, saturating as the number of nodes increases. In this paper, we present a solution to predicting training throughput from profiling traces collected from a single-node configuration. Our approach is able to model the interaction of multiple nodes and the scheduling of concurrent transmissions between the parameter server and each node. By accounting for the dependencies between received parts and pending computations, we predict overlaps between computation and communication and generate synthetic execution traces for configurations with multiple nodes. We validate our approach on TensorFlow training jobs for popular image classification neural networks, on AWS and on our in-house cluster, using nodes equipped with GPUs or only with CPUs. We also investigate the effects of data transmission policies used in TensorFlow and the accuracy of our approach when combined with optimizations of the transmission schedule.
Open Access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bc66214bcae31ef9a8950efd124435a3
https://doi.org/10.1145/3358960.3379141
Rights: OPEN
Accession Number: edsair.doi.dedup.....bc66214bcae31ef9a8950efd124435a3
Database: OpenAIRE