ScholarBank@NUShttps://scholarbank.nus.edu.sgThe DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.Thu, 22 Feb 2024 23:26:57 GMT2024-02-22T23:26:57Z50121- Polynomial-time algorithm for designing digital filters with power-of-two coefficientshttps://scholarbank.nus.edu.sg/handle/10635/72865Title: Polynomial-time algorithm for designing digital filters with power-of-two coefficients
Authors: Li, Dongning; Song, Jianjian; Lim, Yong Ching
Abstract: This paper presents an algorithm for designing digital filters with coefficients expressible as sums of signed power-of-two (SPT) terms. For each filter gain, the time complexity of the algorithm is a second-order polynomial in the filter order and is a first-order polynomial in the filter wordlength. Unlike conventional methods where each coefficient is allocated a fixed number of SPT terms, our method allows the number of SPT terms for each coefficient to vary subject to the number of SPT terms for the entire filter. This provides us with the possibility of finding a better filter without increasing the number of adders, which determines the realization cost for a given filter length. Application of the algorithm to FIR filter designs shows that it achieves up to 8.9 dB improvement over simulated annealing and mixed integer linear programming on the normalized peak ripples of example filters.
Fri, 01 Jan 1993 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/728651993-01-01T00:00:00Z
- An improvement algorithm on general connectivity computation for VLSI placementhttps://scholarbank.nus.edu.sg/handle/10635/115382Title: An improvement algorithm on general connectivity computation for VLSI placement
Authors: Shen, Z.X.; Song, J.; Zhuang, W.J.
Abstract: A model of general connectivity and its application to placement was presented previously [Zhuang94]. Its performance on the resulting circuit designs was better than traditional models by 19.2% to 37.8%. However, the computation time of the previous algorithm in the worst case was O(Nk+4), where N is the number of cells in a connection graph and k is the order of general connectivity. An example of previous computation for the 4th-order general connectivities and clustering iteration on a circuit with 1906 cells took 202 CPU hours on an IBM RS6000 workstation. In this paper a new algorithm, called Concurrent Group Search Algorithm (CGSA), is proposed. The algorithm is O(N) times faster than the old one. Experimental results show that it can result in up to 12 times speedup for an example circuit.
Wed, 01 Jan 1997 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1153821997-01-01T00:00:00Z
- Signed power-of-two (SPT) term allocation scheme for the design of digital filtershttps://scholarbank.nus.edu.sg/handle/10635/50640Title: Signed power-of-two (SPT) term allocation scheme for the design of digital filters
Authors: Lim, Yong-Ching; Yang, Rui; Li, Dongning; Song, Jianjian
Abstract: It is well known that if each coefficient value of a digital filter is a sum of SPT terms, the filter can be implemented without using multipliers. In the past decade, several methods had been developed for the design of filters whose coefficient values are sums of SPT terms. Most of these methods are for the design of filters where all the coefficient values have the same number of SPT terms. In this paper, we present a new method for allocating different number of SPT terms to each coefficient value keeping the total number of SPT terms fixed. Our technique yields excellent results.
Thu, 01 Jan 1998 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/506401998-01-01T00:00:00Z
- Application-level load migration and its implementation on top of PVMhttps://scholarbank.nus.edu.sg/handle/10635/115598Title: Application-level load migration and its implementation on top of PVM
Authors: Song, J.; Choo, H.K.; Lee, K.M.
Abstract: The development and experiment of a load (process) migration scheme conceptually similar to moving house is described. The basic idea is to migrate a process by starting a new process on another processor with checkpoint data prepared by the old process itself but transferred automatically by the migration system. The new process will then unpack the data and resume the computation. The migration mechanism of our facility is implemented by a set of library calls on top of PVM. It performs functions such as freezing and unfreezing communications, checking load conditions, selecting destination processors, starting new processes and receiving migrated data. Before migrating, a process needs to freeze communication, handle pending messages in the receive buffer and pack checkpoint data. Besides the usual merits of concurrency, location transparency and the absence of residual dependency, our scheme solves the incoming message problem at the application level and is portable and easy to use in a heterogeneous environment. Our experiment shows that our facility can help to utilize 74% of idle CPU cycles of a network of workstations with less than 6% overhead on their normal operations.
Wed, 01 Jan 1997 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1155981997-01-01T00:00:00Z
- A distributed-termination experiment on a mesh-connected array of processorshttps://scholarbank.nus.edu.sg/handle/10635/54085Title: A distributed-termination experiment on a mesh-connected array of processors
Authors: Song, J.
Abstract: In totally asynchronous computation any number of processing elements (PEs) may start simultaneously. Termination detection for it is difficult because of asynchronousness and lack of single root (initiator) to accumulate termination information. This paper presents a symmetric, asynchronous, and distributed solution to the above problem for a mesh-connected array of PEs and its implementation and comparison on a parallel computer. Our algorithm produces less messages for termination purpose than the various counter token schemes. The worst time for it to finish is O(√n), where n is the number of PEs on a square mesh. An experiment on a 64-node NCUBE, parallel computer showed that it was at least twice as fast as a centralized termination detection algorithm. The algorithm makes three assumptions: only nearest-neighbor communication exists; activation messages are always broadcast to all neighboring PEs to avoid deadlock; and any number of PEs can initiate the computation. It defines tokens and dependency arcs: tokens to collect termination states of regions of the mesh and dependency arcs to establish terminating precedence among connected PEs. They together reduce the number of messages for termination detection. Two sufficient conditions are shown to guarantee that any one of the PEs may detect termination. Although the algorithm was originated for use with totally asynchronous computations for the finite element analysis, it is applicable to any iterative computations on a mesh. © 1992.
Wed, 01 Jul 1992 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/540851992-07-01T00:00:00Z
- A partially asynchronous and iterative algorithm for distributed load balancinghttps://scholarbank.nus.edu.sg/handle/10635/113237Title: A partially asynchronous and iterative algorithm for distributed load balancing
Authors: Song, J.
Abstract: Defining task as independent entities with identical execution time and the workload of a processor as the number of tasks, load balancing is to distribute tasks among processors of a network so that the resulting workload of every processor will be as close to the average over all the workloads as possible. We propose in this paper a partially asynchronous and iterative algorithm for distributed load balancing, show its properties, and report its simulation results. The algorithm converges geometrically as assured by a theorem for balancing continuous workload. We prove that the algorithm can achieve the maximum load imbalance of no more than ⌈ d 2⌉ tasks, where d is the diameter of a network. Our simulation not only validated the properties but also showed that the algorithm could produce much smaller load imbalances for hypercubes. The obtained imbalances for hypercubes of order up to ten were no more than two tasks and 56% of the sample runs produced only one task difference, as opposed to the theoretical maximum of six tasks. © 1994.
Wed, 01 Jun 1994 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1132371994-06-01T00:00:00Z
- Implementation of a parallel genetic algorithm for floorplan optimization on IBM SP2https://scholarbank.nus.edu.sg/handle/10635/113238Title: Implementation of a parallel genetic algorithm for floorplan optimization on IBM SP2
Authors: Foo, Han Yang; Song, Jianjian; Zhuang, Wenjun; Esbensen, Henrik; Kuh, Ernest S.
Abstract: A Multi-Selection-Multi-Evolution (MSME) scheme for parallelizing a genetic algorithm for floorplan optimization is presented and its implementation with MPI and its experimental results are discussed in this paper. Our experimental results on a 16-node IBM SP2 scaleable parallel computer have shown that the scheme is effective in improving performance of floorplanning over that of a sequential implementation. The parallel version could obtain better results with more than 90% of probability. Given 1000 second wall-clock time, our parallel program could reduce both chip area and maximum path delay by more than 8% with 8 processors and 12% with 12 processors. Parallel computing can also speed up the evolution process so that there could be higher probability of obtaining a better solution within a given time interval.
Wed, 01 Jan 1997 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1132381997-01-01T00:00:00Z
- Speedup improvement on general connectivity computation by algorithmic techniques and parallel processinghttps://scholarbank.nus.edu.sg/handle/10635/113239Title: Speedup improvement on general connectivity computation by algorithmic techniques and parallel processing
Authors: Shen, Zhaoxuan; Song, Jianjian; Zhuang, Wenjun
Abstract: Algorithmic techniques and parallel processing are proposed to speed up general connectivity computation, which was shown to be effective but time-consuming. A new algorithm, called Concurrent Group Search Algorithm (CGSA), will divide N(N-1)/2 vertex pairs into N-1 groups. Within each group, general connectivities of all pairs can be calculated concurrently. Our placement results show that this technique can provide speedup of up to 12 times for one circuit. In addition, since each group is independent of the others, group computations are parallelized on a 16-node IBM SP2. Speedup of 14 times over its serial counterpart is observed. Combining the two approaches could result in the total speedup of up to 170 times, reducing CPU time from over 200 hours to 1.2 hour for a circuit. This new development makes it more practical to apply our general connectivity concept to large industrial design within short time period.
Wed, 01 Jan 1997 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1132391997-01-01T00:00:00Z
- Performance improvement of a genetic algorithm for floorplanning with parallel computing technologyhttps://scholarbank.nus.edu.sg/handle/10635/113241Title: Performance improvement of a genetic algorithm for floorplanning with parallel computing technology
Authors: Foo, Han Yang; Esbensen, Henrik; Song, Jianjian; Zhuang, Wenjun; Kuh, Ernest S.
Abstract: This paper presents a parallel implementation of a genetic algorithm for floorplanning. It discusses the strategy and implementation and proposes a Multi-Selection-Multi-Evolution (MSME) parallelization scheme. The initial experimental results on a 12-node IBM SP2 scalable parallel computer have shown that the scheme is effective in improving performance of floorplanning over that of a serial implementation. The parallel version could obtain better results with more than 80% of probability and the improvements on chip area and maximum path delay could be more than 20% in the best cases.
Wed, 01 Jan 1997 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1132411997-01-01T00:00:00Z
- Investigation of measurement uncertainties and errors in a radiated emission test systemhttps://scholarbank.nus.edu.sg/handle/10635/123478Title: Investigation of measurement uncertainties and errors in a radiated emission test system
Authors: Song, Jian Jian; Hui, Hon Tat; Sim, Zhi Wei.
Thu, 01 Jan 2015 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/1234782015-01-01T00:00:00Z
- Polynomial-time algorithm for designing digital filters with power-of-two coefficientshttps://scholarbank.nus.edu.sg/handle/10635/81678Title: Polynomial-time algorithm for designing digital filters with power-of-two coefficients
Authors: Li, Dongning; Song, Jianjian; Lim, Yong Ching
Abstract: This paper presents an algorithm for designing digital filters with coefficients expressible as sums of signed power-of-two (SPT) terms. For each filter gain, the time complexity of the algorithm is a second-order polynomial in the filter order and is a first-order polynomial in the filter wordlength. Unlike conventional methods where each coefficient is allocated a fixed number of SPT terms, our method allows the number of SPT terms for each coefficient to vary subject to the number of SPT terms for the entire filter. This provides us with the possibility of finding a better filter without increasing the number of adders, which determines the realization cost for a given filter length. Application of the algorithm to FIR filter designs shows that it achieves up to 8.9 dB improvement over simulated annealing and mixed integer linear programming on the normalized peak ripples of example filters.
Fri, 01 Jan 1993 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/816781993-01-01T00:00:00Z
- Signed power-of-two (SPT) term allocation scheme for the design of digital filtershttps://scholarbank.nus.edu.sg/handle/10635/81739Title: Signed power-of-two (SPT) term allocation scheme for the design of digital filters
Authors: Lim, Yong-Ching; Yang, Rui; Li, Dongning; Song, Jianjian
Abstract: It is well known that if each coefficient value of a digital filter is a sum of SPT terms, the filter can be implemented without using multipliers. In the past decade, several methods had been developed for the design of filters whose coefficient values are sums of SPT terms. Most of these methods are for the design of filters where all the coefficient values have the same number of SPT terms. In this paper, we present a new method for allocating different number of SPT terms to each coefficient value keeping the total number of SPT terms fixed. Our technique yields excellent results.
Thu, 01 Jan 1998 00:00:00 GMThttps://scholarbank.nus.edu.sg/handle/10635/817391998-01-01T00:00:00Z