Academic Journal

Analysis and Optimization of Yee Bench using hardware performance counters

Bibliographic Details
Title: Analysis and Optimization of Yee Bench using hardware performance counters
Authors: Ulf Andersson, Philip Mucci
Contributors: The Pennsylvania State University CiteSeerX Archives
Source: http://icl.cs.utk.edu/news_pub/submissions/parco05.pdf
Collection: CiteSeerX
Description: In this paper, we report on our analysis and optimization of a serial Fortran 90 benchmark called Yee bench. This benchmark has been run on a variety of architectures and its performance is reasonably well understood. However, on AMD Opteron-based machines, we found unexpected dips in the delivered MFLOPS of the code for a seemingly random set of problem sizes. Through the use of the Opteron’s on-chip hardware performance counters and PapiEx, a PAPI-based tool, we discovered that these drops were directly related to high L1 cache miss rates for these problem sizes. The high miss rates could be attributed to the fact that in the two core regions of the code we have references to three dynamically allocated arrays which compete for the same set in the Opteron’s 2-way set associative cache. We validated this conclusion by accurately predicting those problem sizes that exhibit this problem. We were able to alleviate these performance anomalies using variable intra-array padding to effectively accomplish inter-array padding. We conclude with some comments on the general applicability of this method as well as how one might improve the implementation of the Fortran 90 ALLOCATE intrinsic to handle this case.
Document Type: text
File Description: application/pdf
Language: English
Relation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.81.7232; http://icl.cs.utk.edu/news_pub/submissions/parco05.pdf
Availability: http://icl.cs.utk.edu/news_pub/submissions/parco05.pdf
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession Number: edsbas.8D4C900A
Database: BASE
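
Illustrative note: the abstract attributes the performance dips to three dynamically allocated arrays competing for the same set of a 2-way set associative L1 cache, and reports that padding removes the conflict. The sketch below is one way that reasoning might be modelled; it is not the authors' prediction method. It assumes a 64 KiB, 2-way cache with 64-byte lines, 8-byte array elements, and arrays laid out back to back in memory with no allocator overhead. None of these figures appear in this record, and the names `set_index`, `base_addresses`, and `predict_conflict` are hypothetical.

```python
from collections import Counter

# Assumed cache geometry (not stated in the record).
CACHE_BYTES = 64 * 1024
WAYS = 2
LINE_BYTES = 64
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 512 sets under these assumptions


def set_index(addr):
    """Cache set that a byte address maps to."""
    return (addr // LINE_BYTES) % NUM_SETS


def base_addresses(nx, ny, nz, n_arrays=3, elem_bytes=8, pad_elems=0):
    """Base addresses of n_arrays arrays stored back to back.

    Simplification: a real allocator adds headers and alignment, so actual
    base addresses will differ. pad_elems models extra elements added to
    each array so that the following array's base address shifts.
    """
    size_bytes = (nx * ny * nz + pad_elems) * elem_bytes
    return [k * size_bytes for k in range(n_arrays)]


def predict_conflict(nx, ny, nz, pad_elems=0):
    """True if more arrays than WAYS start in the same cache set.

    When the arrays are traversed in lockstep at the same index, bases that
    share a set keep colliding throughout the traversal.
    """
    bases = base_addresses(nx, ny, nz, pad_elems=pad_elems)
    counts = Counter(set_index(b) for b in bases)
    return max(counts.values()) > WAYS


if __name__ == "__main__":
    print(predict_conflict(16, 16, 16))               # unpadded arrays
    print(predict_conflict(16, 16, 16, pad_elems=8))  # one cache line of padding each
```

In this model a 16 x 16 x 16 array of 8-byte elements occupies exactly 32 KiB, which equals one cache way, so all three base addresses fall in the same set and a 2-way cache cannot hold them together. Adding one cache line of padding per array moves the bases into different sets.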