Academic Journal

Pipeline Parallelism With Elastic Averaging

Bibliographic Details
Title: Pipeline Parallelism With Elastic Averaging
Authors: Bongwon Jang, In-Chul Yoo, Dongsuk Yook
Source: IEEE Access, Vol 12, Pp 5477-5489 (2024)
Publisher Information: IEEE, 2024.
Publication Year: 2024
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms: Deep learning, stochastic gradient descent (SGD), parallel processing, pipeline processing, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description: To accelerate the training of massive DNN models on large-scale datasets, distributed training techniques, including data parallelism and model parallelism, have been extensively studied. In particular, pipeline parallelism, which is derived from model parallelism, has been attracting attention. It splits the model parameters across multiple computing nodes and processes multiple mini-batches simultaneously. However, naive pipeline parallelism suffers from weight inconsistency and delayed gradients: the model parameters used in the forward and backward passes do not match, which causes unstable training and low performance. In this study, we propose a novel pipeline parallelism technique called EA-Pipe to address the weight inconsistency and delayed gradient problems. EA-Pipe applies an elastic averaging method, which has been studied in the context of data parallelism, to pipeline parallelism. The proposed method maintains multiple model replicas to solve the weight inconsistency problem and synchronizes the replicas using an elasticity-based moving average method to mitigate the delayed gradient problem. To verify the efficacy of the proposed method, we conducted three image classification experiments on the CIFAR-10/100 and ImageNet datasets. The experimental results show that EA-Pipe not only accelerates training but also learns more stably than existing pipeline parallelism techniques. In particular, in the experiments using the CIFAR-100 and ImageNet datasets, EA-Pipe recorded error rates that were 2.58% and 2.19% lower, respectively, than the baseline pipeline parallelization method.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2169-3536
Relation: https://ieeexplore.ieee.org/document/10381706/; https://doaj.org/toc/2169-3536
DOI: 10.1109/ACCESS.2024.3350193
Open Access: https://doaj.org/article/b8cd67204d4b4e04a982bd6eb9102078
Accession Number: edsdoj.b8cd67204d4b4e04a982bd6eb9102078
Database: Directory of Open Access Journals
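
Illustrative note: the abstract describes keeping several model replicas and synchronizing them with an elasticity-based moving average. The sketch below shows the generic elastic averaging update as it is usually stated for data parallelism (each replica is pulled toward a shared center, and the center drifts toward the replicas). It is not the EA-Pipe algorithm from the article: the function names, the toy quadratic loss, and the values of `lr` and `rho` are assumptions chosen only for illustration, and pipeline staging is not modeled at all.

```python
import random


def elastic_step(replicas, center, grads, lr=0.1, rho=0.5):
    """One synchronous elastic-averaging step over flat parameter lists.

    replicas: list of parameter vectors, one per model replica
    center:   shared center (moving-average) parameter vector
    grads:    one gradient vector per replica
    lr, rho:  learning rate and elasticity strength (illustrative values)
    """
    alpha = lr * rho  # strength of the elastic pull
    # Each replica takes a gradient step and is pulled toward the center.
    new_replicas = [
        [xi - lr * gi - alpha * (xi - ci) for xi, gi, ci in zip(x, g, center)]
        for x, g in zip(replicas, grads)
    ]
    # The center moves toward the replicas (using their pre-update values).
    new_center = [
        ci + alpha * sum(x[j] - ci for x in replicas)
        for j, ci in enumerate(center)
    ]
    return new_replicas, new_center


def quadratic_grad(x, target):
    """Gradient of the toy loss 0.5 * ||x - target||^2 (hypothetical stand-in)."""
    return [xi - ti for xi, ti in zip(x, target)]


def run_demo(steps=300, n_replicas=4, seed=0):
    """Run the toy problem; returns the final replicas and center."""
    rng = random.Random(seed)
    target = [1.0, -2.0]
    replicas = [[rng.uniform(-5, 5) for _ in target] for _ in range(n_replicas)]
    center = [0.0, 0.0]
    for _ in range(steps):
        grads = [quadratic_grad(x, target) for x in replicas]
        replicas, center = elastic_step(replicas, center, grads)
    return replicas, center


if __name__ == "__main__":
    final_replicas, final_center = run_demo()
    print("center:", final_center)
```

On this toy loss the replicas and the center all approach the same minimizer, which is the intuition behind using the elastic term to keep replicas from drifting apart. How EA-Pipe schedules the replicas across pipeline stages is described only in the article itself.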