Please use this identifier to cite or link to this item: http://scholarbank.nus.edu.sg/handle/10635/33355
Title: On the performance characterization and evaluation of RNA structure prediction algorithms for high performance systems
Authors: S. P. T. KRISHNAN
Keywords: High Performance Computing, Multi-core processing, Cloud computing, Google App Engine, RNA secondary structure prediction, Performance studies
Issue Date: 31-Jul-2011
Source: S. P. T. KRISHNAN (2011-07-31). On the performance characterization and evaluation of RNA structure prediction algorithms for high performance systems. ScholarBank@NUS Repository.
Abstract: Scientific problems in domains such as bioinformatics demand high performance computing (HPC) based solutions. Yet many existing algorithms were designed during the era of single-core CPU computing. These algorithms have traditionally benefited from the performance scaling of the single CPU, typically through higher clock speeds, with no code changes. The current trend among processor manufacturers is to achieve performance scaling by adding computing cores rather than by making individual cores more powerful. This requires that existing algorithms be redesigned to run efficiently on this new generation of parallel computers. It also emphasizes the need to consider parallelization at the design stage itself, so that new algorithms can scale from single-core to many-core computers automatically. In this thesis, we design and analyze several parallelization methods and apply them to highly recursive, dynamic programming based RNA secondary structure prediction algorithms. We have implemented the parallelized versions of the algorithm on three different high performance computing architectures. By conducting large-scale experiments using different system configurations on these three architectures, we are able to characterize the performance trends on today's parallel computers. The parallelization techniques that we have explored and used are data parallelization, including wavefront parallelization, code parallelization and hybrid parallelization. The three high performance computing architectures that we have used in our experiments are the Intel x64, the IBM Cell Broadband Engine and the Google App Engine (GAE). Each of these systems was chosen for its unique characteristics.
The Intel architecture is a homogeneous ISA (Instruction Set Architecture) multi-core system of Uniform Memory Access (UMA) type, while the Cell is a heterogeneous ISA multi-core system of Non-Uniform Memory Access (NUMA) type. GAE is a task-based multi-system parallel computing platform that is highly scalable for very large workloads. Secondly, we designed a novel parallel-by-design RNA secondary structure prediction algorithm. The algorithm contains no features that would inhibit its parallel execution, and it is designed to scale from single-core to many-core systems automatically. We have implemented optimized versions of this algorithm on the three HPC architectures described above. Using real RNA primary sequences, we conducted large-scale experiments for both of these algorithms on the three HPC architectures. We modified the system configuration and repeated the experiments for each of these architectures. This generated a large number of data points, comprising program runtimes and other performance metrics. We subsequently analyzed this dataset and computed performance trends such as Speedup, Incremental Speedup and Performance gain. The large-scale study has helped identify the best parallelization technique for existing highly recursive, dynamic programming based algorithms. It has also helped identify the performance bottlenecks, system limits and programming challenges of the various high performance computing systems.
URI: http://scholarbank.nus.edu.sg/handle/10635/33355
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: SPTKrishnan.pdf
Size: 5.48 MB
Format: Adobe PDF
Access Settings: OPEN
Version: None

Page view(s): 240 (checked on Dec 11, 2017)
Download(s): 250 (checked on Dec 11, 2017)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.