Compiler driver memory system optimization using speculative execution

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/14193

DC Field	Value
dc.title	Compiler driver memory system optimization using speculative execution
dc.contributor.author	HARIHARAN SANDANAGOBALANE
dc.date.accessioned	2010-04-08T10:40:45Z
dc.date.available	2010-04-08T10:40:45Z
dc.date.issued	2004-08-30
dc.identifier.citation	HARIHARAN SANDANAGOBALANE (2004-08-30). Compiler driver memory system optimization using speculative execution. ScholarBank@NUS Repository.
dc.identifier.uri	http://scholarbank.nus.edu.sg/handle/10635/14193
dc.description.abstract	Wide-issue microprocessors are capable of remarkable execution rates, but they generally achieve only a fraction of their peak instruction throughput on real programs. This discrepancy is due to performance-degrading events, largely branch mispredictions and cache misses. In this work we address the performance degradation due to the latter through the use of Program Embedded Precomputation using Speculative Execution (PEPSE). To this end, we introduce the Load Dependence Graph (LDG), the sub-graph of the traditional Program Dependence Graph (PDG) that computes the address of a load instruction. In the context of data prefetching, we illustrate how PEPSE can accurately predict and effectively prefetch future memory references with negligible overhead, both for regular array-based applications and for irregular pointer-based applications. We use profiling to identify delinquent loads, and LDGs are created only for those loads. Subsequently, speculative versions of the LDG operations are statically scheduled along with a prefetch instruction for the computed address, so that these instructions execute and prefetch the value before the actual load is encountered. This either eliminates or reduces the processor stall cycles caused by the load instruction. Our prototype implementation of the optimizations within the Open Research Compiler (ORC) delivered encouraging results. On a 900 MHz Itanium 2 server, we achieved speedups ranging from 1.05 to 2.14 for several benchmarks from the SPEC and OLDEN suites.
dc.language.iso	en
dc.subject	Microprocessors, Cache misses, Program Dependence Graph, Prefetching, Scheduling, Optimizations
dc.type	Thesis
dc.contributor.department	COMPUTER SCIENCE
dc.contributor.supervisor	WONG WENG FAI
dc.description.degree	Master's
dc.description.degreeconferred	MASTER OF SCIENCE
dc.identifier.isiut	NOT_IN_WOS
Appears in Collections:	Master's Theses (Open)
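
As a rough illustration of the idea in the abstract (not the thesis's ORC implementation), the sketch below hand-writes what an embedded precomputation might look like for an indirect array access. The load of `a[idx[i]]` plays the role of a delinquent load; its address computation (load `idx[i]`, then index into `a`) is the part an LDG would capture. The function name `sum_indirect`, the constant `PREFETCH_DISTANCE`, and the use of the GCC/Clang builtin `__builtin_prefetch` are all assumptions made for this example, and a bounds check stands in for hardware speculative loads.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. A real compiler
   would presumably derive it from profiled miss latency. */
#define PREFETCH_DISTANCE 8

/* Sums a[idx[i]] for i in [0, n). The load of a[idx[i]] stands in for a
   delinquent load: its address depends on the earlier load of idx[i]. */
long sum_indirect(const long *a, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Precomputation slice: recompute the address that the load will
           use PREFETCH_DISTANCE iterations from now, and prefetch it.
           The bounds check keeps the early read of idx[] safe; it is a
           software substitute for a speculative load. */
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&a[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += a[idx[i]]; /* the original (delinquent) load */
    }
    return sum;
}
```

The prefetch only changes cache behavior, so the function returns the same result with or without it; any benefit depends on the access pattern, the chosen distance, and the target machine.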

Files in This Item:

File	Description	Size	Format	Access Settings	Version
thesis.pdf		307.87 kB	Adobe PDF	OPEN	None	View/Download

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.