Vivy Suhendra
Email Address
dcsvivy@nus.edu.sg
8 results
Publication Search Results
Now showing 1 - 8 of 8
Publication WCET centric data allocation to scratchpad memory(2005) Suhendra, V.; Mitra, T.; Roychoudhury, A.; Chen, T.; COMPUTER SCIENCEScratchpad memory is a popular choice for on-chip storage in real-time embedded systems. The allocation of code/data to scratchpad memory is performed at compile time leading to predictable memory access latencies. Current scratchpad memory allocation techniques improve the average-case execution time of tasks. For hard real-time systems, on the other hand, worst case execution time (WCET) is a key metric. In this paper, we propose scratchpad allocation techniques for data memory that aim to minimize a task's WCET. We first develop an integer linear programming (ILP) based solution which constructs the optimal allocation assuming that all program paths are feasible. Next, we employ branch-and-bound search to more accurately construct the optimal allocation by exploiting infeasible path information. However, the branch-and-bound search is too time-consuming in practice. Therefore, we design fast heuristic searches that achieve near-optimal allocations for all our benchmarks. © 2005 IEEE.Publication Integrated scratchpad memory optimization and task scheduling for MPSoC architectures(2006) Suhendra, V.; Raghavan, C.; Mitra, T.; COMPUTER SCIENCEMultiprocessor system-on-chip (MPSoC) is an integrated circuit containing multiple instruction-set processors on a single chip that implements most of the functionality of a complex electronic system. An MPSoC architecture is, in general, customized for an embedded application. A critical component of this customization process is the on-chip memory system configuration. Embedded systems increasingly employ software-controlled scratchpad memory(SPM) due to its inherent advantages in terms of area, energy, and timing predictability compared to caches. An application-specific flexible partitioning of the on-chip SPM budget among the processors is critical for performance optimization. Moreover, scheduling the tasks of an application on to the processors and partitioning the SPM are inter-dependent even though these steps are decoupled in the traditional design space exploration process. In this work, we design an integrated task mapping, scheduling, SPM partitioning, and data allocation technique based on Integer Linear Programming(ILP)formulation. Our ILP formulation explores the optimal performance limit and shows that integrated task schedul-ing and SPM optimization improves performance by up to 80% for embedded applications. Copyright 2006 ACM.Publication Exploring locking & partitioning for predictable shared caches on multi-cores(2008) Suhendra, V.; Mitra, T.; COMPUTER SCIENCEMulti-core architectures consisting of multiple processing cores on a chip have become increasingly prevalent. Synthesizing hard realtime applications onto these platforms is quite challenging, as the contention among the cores for various shared resources leads to inherent timing unpredictability. This paper proposes the use of shared cache in a predictable manner through a combination of locking and partitioning mechanisms. We explore possible design choices and evaluate their effects on the worst-case application performance. Our study reveals certain design principles that strongly dictate the performance of a predictable memory hierarchy. Copyright 2008 ACM.Publication Scratchpad allocation for concurrent embedded software(2008) Suhendra, V.; Roychoudhury, A.; Mitra, T.; COMPUTER SCIENCESoftware-controlled scratchpad memory is increasingly employed in embedded systems as it offers better timing predictability compared to caches. Previous scratchpad allocation algorithms typically consider single process applications. But embedded applications are mostly multi-tasking with real-time constraints, where the scratchpad memory space has to be shared among interacting processes that may preempt each other. In this paper, we develop a novel dynamic scratchpad allocation technique that takes these process interferences into account to improve the performance and predictability of the memory system. We model the application as a Message Sequence Chart (MSC) to best capture the interprocess interactions. Our goal is to optimize the worst-case response time (WCRT) of the application through runtime reloading of the scratchpad memory content at appropriate execution points. We propose an iterative allocation algorithm that consists of two critical steps: (1) analyze the MSC along with the existing allocation to determine potential interference patterns, and (2) exploit this interference information to tune the scratchpad reloading points and content so as to best improve the WCRT. We evaluate our memory allocation scheme on a real-world embedded application controlling an Unmanned Aerial Vehicle (UAV). Copyright 2008 ACM.Publication Efficient detection and exploitation of infeasible paths for software timing analysis(2006) Suhendra, V.; Mitra, T.; Roychoudhury, A.; Chen, T.; COMPUTER SCIENCEAccurate estimation of the worst-case execution time (WCET) of a program is important for real-time embedded software. Static WCET estimation involves program path analysis and architectural modeling. Path analysis is complex due to the inherent difficulty in detecting and exploiting infeasible paths in a program's control flow graph. In this paper, we propose an efficient method to exploit infeasible path information for WCET estimation without resorting to exhaustive path enumeration. We demonstrate the efficiency of our approach for some real-life control-intensive applications. Copyright 2006 ACM.Publication Timing analysis of concurrent programs running on shared cache multi-cores(2009) Li, Y.; Suhendra, V.; Liang, Y.; Mitra, T.; Roychoudhury, A.; COMPUTER SCIENCEMemory accesses form an important source of timing unpredictability. Timing analysis of real-time embedded software thus requires bounding the time for memory accesses. Multiprocessing, a popular approach for performance enhancement, opens up the opportunity for concurrent execution. However due to contention for any shared memory by different processing cores, memory access behavior becomes more unpredictable, and hence harder to analyze. In this paper, we develop a timing analysis method for concurrent software running on multi-cores with a shared instruction cache. Communication across tasks is by message passing where the message mailboxes are accessed via interrupt service routines. We do not handle data cache, shared memory synchronization and code sharing across tasks. Our method progressively improves the lifetime estimates of tasks that execute concurrently on multiple cores, in order to estimate potential conflicts in the shared cache. Possible conflicts arising from overlapping task lifetimes are accounted for in the hit-miss classification of accesses to the shared cache, to provide safe execution time bounds. We show that our method produces lower worst-case response time (WCRT) estimates than existing shared-cache analysis on a real-world embedded application. © 2009 IEEE.Publication Exploiting branch constraints without exhaustive path enumeration(2007) Chen, T.; Mitra, T.; Roychoudhury, A.; Suhendra, V.; COMPUTER SCIENCEStatically estimating the worst case execution time (WCET) of a program is important for real-time software. This is difficult even in the programming language level due to the inherent difficulty in detecting and exploiting infeasible paths in a program's control flow graph. In this paper, we propose an efficient method to exploit infeasible path information for WCET estimation of a loop without resorting to exhaustive path enumeration. The efficiency of our approach is demonstrated with a real-life control-intensive program.Publication Heuristic search with reachability tests for automated generation of test programs(2004) Leow, W.K.; Khoo, S.C.; Loh, T.H.; Suhendra, V.; COMPUTER SCIENCEOur research complements the current research on automated specification-based testing by proposing a scheme that combines the setup process, test execution, and test validation into a single test program for testing the behavior of object-oriented classes. The test program can be generated automatically given the desired test cases and closed algebraic specifications of the classes. The core of the test program generator is a partial-order planner which plans the sequence of instructions required in the test program. A first-cut implementation of the planner has been presented in [4] based on simple depth-first search. This paper presents a more efficient and effective heuristic search algorithm that performs reachability tests using the Omega Calculator. Test results show that heuristic search with reachability tests significantly reduce the search time required to generate a valid sequence of instructions. © 2004 IEEE.