Please use this identifier to cite or link to this item:
Title: Instruction cache optimizations for embedded systems
Authors: LIANG YUN
Keywords: embedded systems, instruction cache, cache modeling, design space exploration, cache locking, procedure placement
Issue Date: 1-Jul-2010
Citation: LIANG YUN (2010-07-01). Instruction cache optimizations for embedded systems. ScholarBank@NUS Repository.
Abstract: The application specific nature of embedded systems creates the opportunity to design a customized system-on-chip (SoC) platform for a particular application or an application domain. Cache memory subsystem bears significant importance as it bridges the performance gap between the fast processor and the slow main memory. In particular, instruction cache, which is employed by most embedded systems, is one of the foremost power consuming and performance determining microarchitectural features as instructions are fetched almost every clock cycle. Thus, careful tuning and optimization of instruction cache memory can lead to significant performance gain and energy saving. The objective of this thesis is to exploit application characteristics for instruction cache optimizations. The application characteristics we use include branch probability, loop bound, temporal reuse profile and intermediate blocks profile. These application characteristics are identified through profiling and exploited by our subsequent analytical approach. We consider both hardware and software solutions. The first part of the thesis focuses on hardware optimization --- identifying best cache configurations to match the specific temporal and spatial localities of a given application through analytical approach. We first develop a static program analysis to accurately model the cache behavior of a specific cache configuration. Then, we extend our analysis by taking the structural relations among the related cache configurations into account. Our analysis can estimate the cache hit rates for a set of cache configurations with varying number of sets and associativity in one pass as long as the cache line size remains constant. The input to our analysis is simply the branch probability and loop bounds, which is significantly more compact compared to the memory address traces required by trace-driven simulators and other trace based analytical works. The second part of the thesis focuses on software optimizations. We propose techniques to tailor the program to the underlying instruction cache parameters. First, we develop a framework to improve the average-case program performance through static instruction cache locking. We introduce temporal reuse profile to accurately and efficiently model the cost and benefit of locking memory blocks in the cache. We propose two cache locking algorithms : an optimal algorithm based on branch-and-bound search and a heuristic approach. Second, we propose an efficient algorithm to place procedures in memory for a specific cache configuration such that cache conflicts are minimized. As a result, both performance and energy consumption are improved. Our efficient algorithm is based on intermediate blocks profile that accurately but compactly models cost-benefit of procedure placement for both direct mapped and set associative caches. Finally, we propose an integrated instruction cache optimization framework by combining all the techniques together.
Appears in Collections:Ph.D Theses (Open)

Show full item record
Files in This Item:
File Description SizeFormatAccess SettingsVersion 
LiangY.pdf6.82 MBAdobe PDF



Page view(s)

checked on Apr 19, 2019


checked on Apr 19, 2019

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.