Showing 1 - 4 of 4 results for search '"Lee, Sungjae"', query time: 0.64s
  1. Academic Journal

    Description: The emergence of large-scale AI models, like GPT-4, has significantly impacted academia and industry, driving the demand for high-performance computing (HPC) to accelerate workloads. To address this, we present HPCClusterScape, a visualization system that enhances the efficiency and transparency of shared HPC clusters for large-scale AI models. HPCClusterScape provides a comprehensive overview of system-level (e.g., partitions, hosts, and workload status) and application-level (e.g., identification of experiments and researchers) information, allowing HPC operators and machine learning researchers to monitor resource utilization and identify issues through customizable violation rules. The system includes diagnostic tools to investigate workload imbalances and synchronization bottlenecks in large-scale distributed deep learning experiments. Deployed in industrial-scale HPC clusters, HPCClusterScape incorporates user feedback and meets operator-specific requirements. This paper outlines the challenges and prerequisites for efficient HPC operation, introduces the interactive visualization system, and highlights its contributions in addressing pain points and optimizing resource utilization in shared HPC clusters. ; Comment: IEEE VDS 2023 ACM 2012 CCS - Human-centered computing, Visualization, Visualization design and evaluation methods
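    The paper does not publish HPCClusterScape's rule interface, so the following is only a minimal Python sketch of what a customizable violation rule could look like; the HostMetrics fields and the ViolationRule type are hypothetical names invented for illustration, not the system's actual schema.

    ```python
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    # Hypothetical per-host metric record; field names are assumptions.
    @dataclass
    class HostMetrics:
        host: str
        gpus_allocated: int
        gpu_utilization: float  # mean utilization over the sampling window, 0..1
        idle_minutes: int       # consecutive minutes below the utilization floor

    # A violation rule is just a named predicate over host metrics.
    @dataclass
    class ViolationRule:
        name: str
        predicate: Callable[[HostMetrics], bool]

    # Example rule: allocated GPUs idling for 30+ minutes at under 5% utilization.
    idle_allocation = ViolationRule(
        name="idle-gpu-allocation",
        predicate=lambda m: (m.gpus_allocated > 0
                             and m.gpu_utilization < 0.05
                             and m.idle_minutes >= 30),
    )

    def scan(hosts: List[HostMetrics],
             rules: List[ViolationRule]) -> List[Tuple[str, str]]:
        """Return (host, rule-name) pairs for every triggered violation."""
        return [(m.host, r.name) for m in hosts for r in rules if r.predicate(m)]

    print(scan([HostMetrics("node-017", 8, 0.02, 45),
                HostMetrics("node-018", 8, 0.91, 0)],
               [idle_allocation]))  # [('node-017', 'idle-gpu-allocation')]
    ```

    Keeping rules as plain predicates lets operators add or tune checks without touching the monitoring pipeline, which is the spirit of "customizable" in the abstract.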

  2. Academic Journal

    Description: Training a Convolutional Neural Network (CNN) model typically requires significant computing power, and cloud computing resources are widely used as a training environment. However, it is difficult for CNN algorithm developers to keep up with system updates and apply them to their training environment due to quickly evolving cloud services. Thus, it is important for cloud computing service vendors to design and deliver an optimal training environment for various training tasks, lessening the system operation and management overhead of algorithm developers. To achieve this goal, we propose PROFET, which can predict the training latency of arbitrary CNN implementations on various Graphics Processing Unit (GPU) devices to develop a cost-effective and time-efficient training cloud environment. Unlike previous training latency prediction work, PROFET does not rely on the implementation details of the CNN architecture, making it suitable for use in a public cloud environment. Thorough evaluations reveal the superior prediction accuracy of PROFET compared to the state-of-the-art related work, and the demonstration service presents the practicality of the proposed system. ; Comment: 9 pages
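    PROFET's feature set and model are not spelled out in the abstract, so the snippet below only sketches the problem framing: predicting a target GPU's training latency from a reference GPU's measurement alone, with no access to the CNN's implementation. The data is synthetic and the linear regressor is a placeholder assumption, not PROFET's actual predictor.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in data: per-batch training latency (ms) of assorted CNNs
    # measured on a reference GPU (x) and on a target GPU (y).
    rng = np.random.default_rng(0)
    ref_latency = rng.uniform(50, 400, size=(64, 1))                     # reference GPU, ms
    target_latency = 0.6 * ref_latency + 12 + rng.normal(0, 5, (64, 1))  # target GPU, ms

    model = LinearRegression().fit(ref_latency, target_latency)

    # Predict the target-GPU latency of an unseen model from its reference-GPU
    # measurement alone; no architecture details required.
    print(model.predict([[220.0]]))  # roughly 0.6 * 220 + 12 = 144 ms
    ```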

  3. Academic Journal

    Description: The recent advancements in self-supervised learning, combined with the Transformer architecture, have enabled natural language processing (NLP) to achieve remarkably low perplexity. However, powerful NLP models necessitate increasing model size, leading to substantial computational and memory requirements. In this paper, we introduce an efficient inference framework tailored for large-scale generative language models. To reduce the model size, we employ a weight-only quantization strategy while preserving full precision for activations. As a result, we attain sub-4-bit quantization for each weight through non-uniform or uniform quantization techniques. Our proposed kernel, LUT-GEMM, then accelerates the quantized matrix multiplications, offering a flexible balance between compression ratio and accuracy. Unlike earlier matrix multiplication kernels that accommodated weight-only quantization, LUT-GEMM eliminates the resource-demanding dequantization step for both uniform and non-uniform quantization methods. By reducing the latency of individual GPUs and of the overall inference process for large-scale language models, LUT-GEMM provides significant performance improvements in inference. Its impact comes from combining the high compression ratios of low-bit quantization with efficient LUT-based operations, which reduces the number of GPUs required. For the OPT-175B model with 3-bit quantization, we show that LUT-GEMM accelerates the latency for generating each token by 2.1x compared to OPTQ, which requires costly dequantization. Consequently, LUT-GEMM enables inference of the OPT-175B model on a single GPU without noticeable degradation in accuracy or performance, whereas the non-quantized OPT-175B model requires a minimum of 8 GPUs. ; Comment: Extension of "nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models"
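    The dequantization-free lookup idea can be shown in a few lines of NumPy. With binary-coding quantization a weight row is α·B with B ∈ {-1, +1}^n, so B·x reduces to table lookups: split x into chunks of μ entries, precompute the 2^μ possible signed sums per chunk once, and let every output row index the tables by its bit pattern. This toy single-bit sketch conveys the technique only; the paper's q-bit case sums q such products with per-bit scales, and the real contribution is the GPU kernel, not this reference code.

    ```python
    import numpy as np

    MU = 8  # activations covered by one lookup table

    def build_lut(x_group):
        """All 2**MU signed sums of one activation chunk: bit k of the
        pattern selects +x[k] (set) or -x[k] (clear)."""
        bits = (np.arange(2 ** MU)[:, None] >> np.arange(MU)) & 1
        return (2 * bits - 1) @ x_group              # shape (2**MU,)

    def lut_matvec(B, alpha, x):
        """Compute alpha * (B @ x) for B in {-1,+1} without materializing
        full-precision weights: one table per chunk, one lookup per row."""
        assert x.size % MU == 0
        bits = ((B + 1) // 2).astype(np.int64)       # {-1,+1} -> {0,1}
        y = np.zeros(B.shape[0])
        for g in range(0, x.size, MU):
            lut = build_lut(x[g:g + MU])
            pattern = bits[:, g:g + MU] @ (1 << np.arange(MU))
            y += lut[pattern]
        return alpha * y

    rng = np.random.default_rng(0)
    B = rng.choice([-1, 1], size=(4, 32))
    x = rng.standard_normal(32)
    print(np.allclose(lut_matvec(B, 0.05, x), 0.05 * (B @ x)))  # True
    ```

    Because the tables depend only on the activations, their cost is amortized across all output rows, and the quantized weights are consumed directly as bit patterns; no dequantized weight matrix ever exists.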

  4. Academic Journal

    Description: Public cloud service vendors provide surplus computing resources at a cheaper price as spot instances. Despite the cheaper price, a spot instance can be forced to shut down at any moment whenever the surplus resources run short. To enhance spot instance usage, vendors provide diverse spot instance datasets. Among them, the spot price information has been most widely used so far. However, the tendency of spot prices to change only rarely weakens the applicability of the spot price dataset. Besides the price dataset, the recently introduced spot instance availability and interruption ratio datasets can help users better utilize spot instances, but they are rarely used in practice. Through a thorough analysis, we uncovered major hurdles to using the new datasets: lack of historical information, query constraints, and limited query interfaces. To overcome them, we develop SpotLake, a spot instance data archive web service that provides historical information for various spot instance datasets. We present novel heuristics for collecting the datasets and a data-serving architecture. Through real-world spot instance availability experiments, we demonstrate the applicability of the proposed system. SpotLake is publicly available as a web service to speed up cloud systems research and to improve spot instance usage and availability while reducing cost. ; Comment: 14 pages, 11 figures. This paper is accepted to IISWC 2022
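    The query constraints the abstract alludes to are easy to see in the raw vendor APIs that an archive like SpotLake has to wrap. Purely as an illustration (this is the standard boto3 EC2 call, not SpotLake's own interface), fetching spot price history for one instance type in one region looks like this; AWS serves only roughly the trailing 90 days of history, which is exactly the missing-history hurdle an archive addresses.

    ```python
    import datetime
    import boto3

    # Standard boto3 query against the EC2 spot price history API: the kind of
    # per-region, time-bounded, paginated call a collector service would wrap.
    ec2 = boto3.client("ec2", region_name="us-east-1")

    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=7)

    paginator = ec2.get_paginator("describe_spot_price_history")
    for page in paginator.paginate(
        InstanceTypes=["p3.2xlarge"],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=start,
        EndTime=end,
    ):
        for rec in page["SpotPriceHistory"]:
            print(rec["AvailabilityZone"], rec["Timestamp"], rec["SpotPrice"])
    ```

    Assembling long-term, multi-region, multi-dataset history from such calls is what motivates running collectors continuously and archiving the results, as SpotLake does.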