Permanent URI for this collection
Browse
566 results
Search results
Publication String matching and indexing with suffix data structures(2008-02-07) WONG SWEE SEONG; COMPUTER SCIENCE; SUNG WING KIN; WONG LIM SOONThis thesis studies methods for indexing a text so that the occurrences of any given query string in the text can be located efficiently. An occurrence or match may be imprecise, allowing some deviations from the actual query. This gives rise to a family of interesting string matching problems like exact and approximate string matching, and sequence alignment. We consider two different computing models to handle the problem. The first is to compress the index so that it is small enough to be stored in the main memory. Another computing model is to make use of secondary disk, where the index resides on the hard disk. Blocks or chunks of the index are fetched into memory upon request. In this case, we are concern with the number of IO accesses to perform string search on the index. In both scenarios, it is essential to have efficient computation algorithms to support various string search. Mixed computing model is also possible with multiple levels of indexing, combining both in-memory and disk-based indices.Publication Geographic Routing for Point to Point Data Delivery in Wireless Sensor Network(2010-07-26) SHAO TAO; COMPUTER SCIENCE; ANANDA, AKKIHEBBAL L; CHAN MUN CHOONAs the design of sensor network applications diversifies, besides flooding and converge-cast, the point-topoint delivery is often required to support more complex communication schemes. Due to the constraints in the current sensor platforms, the traditional ad hoc routing protocols are not scalable. Geographic routing is an attractive option, because its localized packet forwarding procedure obviates the requirement of routing tables. While wireless networks are generally deployed in three-dimensional environments, most geographic routing protocols are designed and evaluated in a two-dimensional space. Existing face routing protocols rely on a graph planarization procedure, which is not applicable to 3D networks. The greedy forwarding methods are vulnerable to local minimum cases, leading to frequent delivery failures in sparse networks. We present a tree-based routing protocol for 3D wireless networks named Spherical Coordinate Routing (SCR), that uses connectivity-based greedy forwarding to obtain efficient routing paths and a spherical coordinate tree to guarantee packet delivery. SCR can deploy multiple recovery trees simultaneously for better routing efficiency and resilience against network dynamics. Hop count vector-based routing protocols can also be integrated with tree-based routing to improve routing efficiency. However, when the entire hop count vector is used to address each node, the communication and storage overhead in the packets is often too high to be employed in a large scale. We apply a dimension reduction technique, Principal Component Analysis (PCA), to reduce the control overhead in data packets. Compared to the original hop count vector, the embedding coordinates generated from the PCA algorithm preserve the network geometry with much lower overhead, making their use more practical on the current sensor platforms. Simulation results show that the coordinates computed by PCA can achieve higher packet delivery ratio, lower hop stretch and shorter flooding range with the same packet overhead.Publication CREDIBILITY ASSESSMENT IN MICROBLOGS(2020-06-25) LIM WEE YONG; COMPUTER SCIENCE; Mong Li LeeMicroblogging platforms such as Twitter allow users to post publicly viewable short messages. Posts shared by users pertaining to real-world events or themes can provide a rich ``on-the-ground'' live update of the events for the benefit of everyone. Unfortunately, not all posted information may be credible, and the same platform can be exploited to spread misinformation across unsuspecting users on the platform. In this thesis, we propose to assess the credibility of microblogs at a finer granularity, using evidence from external sources, and considering the relations between the claims. We source for external independent sources of evidence in the form of Web Search Results (WSRs), identify dependencies between related claims, and use these dependencies to adjust the likelihood estimates of a claim being credible, not credible or inconclusive. We further extend our notion of claims to include temporal information and develop an end-to-end framework for evaluating time-sensitive claims.Publication Computational Analysis of 3D Protein Structures(2006-11-16) ZEYAR AUNG; COMPUTER SCIENCE; TAN KIAN LEEAnalysis of 3-dimensional (3D) protein structures plays an important role in bioinformatics. Here, we present four methods for four different types of protein structure analyses: alignment, database search, classification, and clustering. Firstly, we propose a new method that carries out precise structural alignment by means of aligning their distance profiles, followed by an iterative refinement. Secondly, we propose a new index-based method for rapid structural database searching. It builds an inverted index of secondary structure element (SSE) pairs, and uses this index for ranking of the database proteins with respect to a query. Thirdly, we develop a new protein structure classification method based on a nearest neighbor scheme integrated with active learning. It adopts the filter-and-refine strategy, and utilizes a two-tier abstract representation of protein structures. Finally, we propose a method for clustering protein-protein interfaces. We carefully choose a set of representative interfaces from PDB (Protein Data Bank); characterize them as interface matrices; encode them as feature vectors based on the different submatrix types contained in them; and cluster them using a version of nearest-neighbor clustering algorithm. We can discover a number of statistically and biologically significant clusters.Publication LOW-POWER COMPILER CONTROLLED PROGRAMMABLE ACCELERATORS FOR NEXT GENERATION WEARABLES(2020-01-02) DISSANAYAKA MUDIYANSELAGE EMIL MANUPA KARUNARATNE; COMPUTER SCIENCE; Tulika Mitra; Peh Li ShiuanRecently, the wearable device market is rapidly expanding, spanning a wide variety of usage scenarios. The main requirement of such devices would be superior computational ability under a stringent energy budget. Accelerator-rich SoCs (System-on-Chips) have been the mainstream solution. However, it is not feasible to design an accelerator for each wearable application due to the prohibitively high cost and exacting time-to-market constraints. Coarse-Grained Reconfigurable Arrays (CGRA) are a promising alternative for accelerators that provide a good balance between flexibility, performance, and power. Most of the existing work on CGRA, solely focus on either on the architectural aspects or compilation aspects. However, the close inter-dependence among the architecture and compiler makes it exceedingly difficult to efficiently map wearable applications to CGRAs, just by focusing on improving the architecture or the compiler individually. In this thesis, we focus on expanding the research frontiers of CGRAs by adopting hardware/software co-designed approaches.Publication Instruction cache optimizations for embedded systems(2010-07-01) LIANG YUN; COMPUTER SCIENCE; TULIKA MITRAThe application specific nature of embedded systems creates the opportunity to design a customized system-on-chip (SoC) platform for a particular application or an application domain. Cache memory subsystem bears significant importance as it bridges the performance gap between the fast processor and the slow main memory. In particular, instruction cache, which is employed by most embedded systems, is one of the foremost power consuming and performance determining microarchitectural features as instructions are fetched almost every clock cycle. Thus, careful tuning and optimization of instruction cache memory can lead to significant performance gain and energy saving. The objective of this thesis is to exploit application characteristics for instruction cache optimizations. The application characteristics we use include branch probability, loop bound, temporal reuse profile and intermediate blocks profile. These application characteristics are identified through profiling and exploited by our subsequent analytical approach. We consider both hardware and software solutions. The first part of the thesis focuses on hardware optimization --- identifying best cache configurations to match the specific temporal and spatial localities of a given application through analytical approach. We first develop a static program analysis to accurately model the cache behavior of a specific cache configuration. Then, we extend our analysis by taking the structural relations among the related cache configurations into account. Our analysis can estimate the cache hit rates for a set of cache configurations with varying number of sets and associativity in one pass as long as the cache line size remains constant. The input to our analysis is simply the branch probability and loop bounds, which is significantly more compact compared to the memory address traces required by trace-driven simulators and other trace based analytical works. The second part of the thesis focuses on software optimizations. We propose techniques to tailor the program to the underlying instruction cache parameters. First, we develop a framework to improve the average-case program performance through static instruction cache locking. We introduce temporal reuse profile to accurately and efficiently model the cost and benefit of locking memory blocks in the cache. We propose two cache locking algorithms : an optimal algorithm based on branch-and-bound search and a heuristic approach. Second, we propose an efficient algorithm to place procedures in memory for a specific cache configuration such that cache conflicts are minimized. As a result, both performance and energy consumption are improved. Our efficient algorithm is based on intermediate blocks profile that accurately but compactly models cost-benefit of procedure placement for both direct mapped and set associative caches. Finally, we propose an integrated instruction cache optimization framework by combining all the techniques together.Publication On forward pruning in game-tree search(2007-08-14) LIM YEW JIN; COMPUTER SCIENCE; LEE WEE SUNThis thesis is focused on the theoretical understanding and practical applications of forward pruning in game-tree search, also known as selective search. Our research focuses on three main areas: (1) Solving Tigers and Goats - using forward pruning techniques in addition to other advanced search techniques to prove that Tigers and Goats is a draw using modern desktop computers. (2) Practical Application - developing a domain-independent forward pruning technique called RankCut for game-tree search. We show the effectiveness of RankCut in open source Chess programs, even with implemented together with other forward pruning techniques. (3) Theoretical Understanding - forming theoretical frameworks of forward pruning to identify two factors, the player to move and the depth of a node, that affect the performance of selective search. We also formulate risk-management strategies for forward pruning techniques to maximize performance based on predictions by the theoretical frameworks.Publication AUTOMATED TEMPORAL LOGIC SYNTHESIS OF TIME SERIES DATA AND ITS APPLICATIONS(2019-05-06) ZHOU JUN; COMPUTER SCIENCE; WONG WENG FAITemporal logic is important for many applications, such as formal verification where the system is formally verified against specifications defined as temporal logic formulas. However, given a system, producing temporal logic specifications for it is a challenging task. In this thesis, we propose a framework to synthesize bounded linear temporal logic (BLTL) formulas from time series data and focus on making the formulas human-understandable. The framework adopts a commonly used two-step workflow: structure synthesis to generate parameterized formulas and parameter synthesis to concretize the parameters. For structure synthesis, we propose two methods for different use cases. For parameter synthesis, we solve it as an optimization problem by simulated annealing. We propose an innovative objective function to guide the optimization that balances both the statistical significance and interpretation quality of the synthesized formulas. In the direction of interpretation, we apply the framework to biological pathway models to synthesize specifications describing their behaviors in biochemical reactions. In the direction of classification, we apply the framework to classify electrocardiogram (ECG) beats. Experiments show that our framework outperforms existing machine learning methods in individual patient classification where data are few.Publication A Design Framework for Reactive and Time-triggered Embedded Systems via the UML-SystemC bridge(2009-03-12) KATHY NGUYEN DANG; COMPUTER SCIENCE; P. S. THIAGARAJANEmbedded systems are increasingly complex due to the large number of internal components and their interactions. This calls for more effective design methods. System level design methodologies have been proposed in this context as the means to cope with complex large scale embedded systems. The aim of this research is to use UML notations to support system level design of systems in which control ow is event-triggered or time-triggered. We use SystemC as an intermediate representation to do design validation. Our main contributions are: The identification of a subset of UML using which the structure, behavior and requirements for a system can be captured. In addition, we identify the neces- sary UML extension mechanisms and the level of abstraction to facilitate the e cient SystemC-based simulation. A translation framework in which the UML model can be used to generate SystemC code automatically. The generated SystemC code has been proven to offer good simulation speed. The first steps towards tool-supported model association in which UML-based test cases and requirements can be validated at the SystemC level and simula- tion traces can be displayed at the UML level. Case studies to con rm the efficacy of our design approach both in event- triggered and time-triggered settings.Publication Semi-Lazy Learning Approach to Dynamic Spatio-Temporal Data Analysis(2014-08-14) ZHOU JINGBO; COMPUTER SCIENCE; TUNG KUM HOE, ANTHONYWith a wide range of applications, spatio-temporal data analysis has been a timely and popular research topic in recent years. In this thesis, we investigate problems concerning dynamic spatio-temporal data analysis. Data analysis methods can be categorized into two classes: the eager learning approach and the lazy learning approach. However, none of the existing approaches are able to achieve eligible performance that is suitable for dynamic spatio-temporal data analysis. The main aim of this thesis is to propose a new approach to dynamic spatio-temporal data analysis. After carefully cogitating how the features of the eager learning and lazy learning approaches could influence analysis performance, we perceived, to our pleasure, that their strong points and weak points are just complementary. Hence, it would be highly imperative and persuasive to adopt their strong points to contrive a new approach. Consequently, we devised a novel semi-lazy learning approach which can take the dynamic factor into account in a similar fashion to the lazy learning approach and still keep good analysis functions like the eager learning approach. Based on the semi-lazy learning approach, we exploited three concrete dynamic spatio-temporal data analysis problems, which are trajectory prediction, time series prediction and itinerary recommendation respectively.