DC Field: Value
dc.title: A practical approach for performance analysis of shared-memory programs
dc.contributor.author: Tudor, B.M.
dc.contributor.author: Teo, Y.M.
dc.identifier.citation: Tudor, B.M., Teo, Y.M. (2011). A practical approach for performance analysis of shared-memory programs. Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011: 652-663. ScholarBank@NUS Repository.
dc.description.abstract: Parallel programming has transcended from HPC into mainstream, enabled by a growing number of programming models, languages and methodologies, as well as the availability of multicore systems. However, performance analysis of parallel programs is still difficult, especially for large and complex programs, or applications developed using different programming models. This paper proposes a simple analytical model for studying the speedup of shared-memory programs on multicore systems. The proposed model derives the speedup and speedup loss from data dependency and memory overhead for various configurations of threads, cores and memory access policies in UMA and NUMA systems. The model is practical because it uses only generally available and non-intrusive inputs derived from the trace of the operating system run-queue and hardware events counters. Using six OpenMP HPC dwarfs from the NPB benchmark, our model differs from measurement results on average by 9% for UMA and 11% on NUMA. Our analysis shows that speedup loss is dominated by memory contention, especially for larger problem sizes. For the worst performing structured grid dwarf on UMA, memory contention accounts for up to 99% of the speedup loss. Based on this insight, we apply our model to determine the optimal number of cores that alleviates memory contention, maximizing speedup and reducing execution time. © 2011 IEEE.
dc.subject: analytical model
dc.subject: data dependency
dc.subject: memory contention
dc.subject: speedup loss
dc.subject: speedup performance
dc.type: Conference Paper
dc.contributor.department: COMPUTER SCIENCE
dc.description.sourcetitle: Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
Appears in Collections: Staff Publications
