Please use this identifier to cite or link to this item: https://doi.org/10.1145/1880037.1880038
Title: PiPA: Pipelined profiling and analysis on multicore systems
Authors: Zhao, Q.
Cutcutache, I.
Wong, W.-F. 
Keywords: Analysis
Dynamic instrumentation
Multicore systems
Parallel cache simulation
Pipelining
Profiling
Issue Date: 2010
Source: Zhao, Q., Cutcutache, I., Wong, W.-F. (2010). PiPA: Pipelined profiling and analysis on multicore systems. Transactions on Architecture and Code Optimization 7 (3). ScholarBank@NUS Repository. https://doi.org/10.1145/1880037.1880038
Abstract: Profiling and online analysis are important tasks in program understanding and feedback-directed optimization. However, fine-grained profiling and online analysis tend to slow down the application seriously. To cope with the slowdown, one may have to terminate the process early or resort to sampling. The former tends to distort the result because of warm-up effects. The latter risks missing important effects because sampling was turned off while they occurred. A promising approach is to use the parallel processing capabilities of the now ubiquitous multicore processors to speed up profiling and analysis. In this article, we present Pipelined Profiling and Analysis (PiPA), a novel technique for parallelizing dynamic program profiling and analysis by taking advantage of multicore systems. In essence, the application under examination is profiled using a dynamic instrumentation tool. Optimized instrumentation code outputs the profile information to buffers in a succinct format, which we call the REP format. This lightweight trace compression minimizes the processing overhead imposed on the application whenever a buffer is full. Another thread recovers the required information from the REP buffer. The recovered full profile is then divided up and passed to multiple threads for further analysis. To achieve the best performance, the entire system has to be well balanced. We have implemented prototypes of PiPA using two dynamic instrumentation systems, namely DynamoRIO and Pin, thereby demonstrating its portability. Our experiments show that PiPA is able to speed up the overall profiling and analysis tasks significantly. Compared to the more than 100× slowdown of Cachegrind and the 32× slowdown of Pin dcache, we achieved a mere 10.2× slowdown on an 8-core system. In this article, we also describe the insights we gained in obtaining the balance needed for PiPA to perform optimally. © 2010 ACM.
Source Title: Transactions on Architecture and Code Optimization
URI: http://scholarbank.nus.edu.sg/handle/10635/39881
ISSN: 1544-3566
DOI: 10.1145/1880037.1880038
Appears in Collections: Staff Publications

Files in This Item:
There are no files associated with this item.

Scopus™ citations: 8 (checked on Dec 5, 2017)
Web of Science™ citations: 2 (checked on Nov 2, 2017)
Page view(s): 84 (checked on Dec 9, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.