ScholarBank@NUS
https://scholarbank.nus.edu.sg
The DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.
Feed generated: Sat, 21 Sep 2019 20:26:16 GMT

- Title: Improved bounds on the sample complexity of learning
https://scholarbank.nus.edu.sg/handle/10635/43158
Authors: Li, Yi; Long, Philip M.; Srinivasan, Aravind
Abstract: We present two improved bounds on the sample complexity of learning. First, we present a new general upper bound on the number of examples required to estimate all of the expectations of a set of random variables uniformly well. The quality of the estimates is measured using a variant of the relative error proposed by Haussler and Pollard. We also show that our bound is within a constant factor of the best possible. Our upper bound implies improved bounds on the sample complexity of learning according to Haussler's decision theoretic model. Next, we prove a lower bound on the sample complexity for learning according to the prediction model that is optimal to within a factor of 1+o(1).
Date: Sat, 01 Jan 2000
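The record above concerns estimating many expectations uniformly well. For a finite family of [0,1]-valued random variables, the classical additive-error analogue of such a bound follows from Hoeffding's inequality plus a union bound; a minimal sketch (the function name and the explicit constant are ours, and this is the simpler additive-error criterion, not the paper's relative-error variant):

```python
import math

def hoeffding_sample_bound(num_vars: int, eps: float, delta: float) -> int:
    """Smallest m such that m i.i.d. samples estimate the means of
    num_vars random variables in [0,1] to within additive error eps,
    with total failure probability at most delta.

    Per variable, Hoeffding gives P(|mean_hat - mean| > eps) <=
    2*exp(-2*m*eps^2); a union bound over num_vars variables and
    solving for m yields the expression below."""
    return math.ceil(math.log(2 * num_vars / delta) / (2 * eps ** 2))

m = hoeffding_sample_bound(num_vars=100, eps=0.05, delta=0.01)
```

For 100 variables at eps = 0.05 and delta = 0.01 this gives roughly two thousand samples; note the 1/eps^2 dependence the abstract's relative-error bound improves upon in favorable regimes.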
- Title: Identification of Discriminators of Hepatoma by Gene Expression Profiling Using a Minimal Dataset Approach
https://scholarbank.nus.edu.sg/handle/10635/31426
Authors: Neo, S.Y.; Leow, C.K.; Vega, V.B.; Long, P.M.; Islam, A.F.M.; Liu, E.T.; Ren, E.C.; Lai, P.B.S.
Date: Thu, 01 Jan 2004
- Title: Approximating hyper-rectangles: Learning and pseudorandom sets
https://scholarbank.nus.edu.sg/handle/10635/99195
Authors: Auer, P.; Long, P.M.; Srinivasan, A.
Abstract: The PAC learning of rectangles has been studied because they have been found experimentally to yield excellent hypotheses for several applied learning problems. Also, pseudorandom sets for rectangles have been actively studied recently because (i) they are a subproblem common to the derandomization of depth-2 (DNF) circuits and derandomizing randomized logspace, and (ii) they approximate the distribution of n independent multivalued random variables. We present improved upper bounds for a class of such problems of "approximating" high-dimensional rectangles that arise in PAC learning and pseudorandomness. © 1998 Academic Press.
Date: Tue, 01 Dec 1998
- Title: On the Complexity of Learning from Drifting Distributions
https://scholarbank.nus.edu.sg/handle/10635/99361
Authors: Barve, R.D.; Long, P.M.
Abstract: We consider two models of on-line learning of binary-valued functions from drifting distributions due to Bartlett. We show that if each example is drawn from a joint distribution which changes in total variation distance by at most O(ε^3/(d log(1/ε))) between trials, then an algorithm can achieve a probability of a mistake at most ε worse than the best function in a class of VC-dimension d. We prove a corresponding necessary condition of O(ε^3/d). Finally, in the case that a fixed function is to be learned from noise-free examples, we show that if the distributions on the domain generating the examples change by at most O(ε^2/(d log(1/ε))), then any consistent algorithm learns to within accuracy ε. © 1997 Academic Press.
Date: Sat, 01 Nov 1997
- Title: Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions
https://scholarbank.nus.edu.sg/handle/10635/99384
Authors: Bartlett, P.L.; Long, P.M.
Abstract: We present a new general-purpose algorithm for learning classes of [0, 1]-valued functions in a generalization of the prediction model and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi, and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant factor in general. We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire's fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. For each ε > 0, we establish weaker sufficient and stronger necessary conditions for a class of [0, 1]-valued functions to be agnostically learnable to within ε and to be an ε-uniform Glivenko-Cantelli class. © 1998 Academic Press.
Date: Wed, 01 Apr 1998
- Title: PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples
https://scholarbank.nus.edu.sg/handle/10635/114368
Authors: Long, Philip M.; Tan, Lei
Abstract: We describe a polynomial-time algorithm for learning axis-aligned rectangles in Q^d with respect to product distributions from multiple-instance examples in the PAC model. Here, each example consists of n elements of Q^d together with a label indicating whether any of the n points is in the rectangle to be learned. We assume that there is an unknown product distribution D over Q^d such that all instances are independently drawn according to D. The accuracy of a hypothesis is measured by the probability that it would incorrectly predict whether one of n more points drawn from D was in the rectangle to be learned. Our algorithm achieves accuracy ε with probability 1 − δ in O((d^5 n^12/ε^20) log^2(nd/(εδ))) time.
Date: Mon, 01 Jan 1996
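The multiple-instance example oracle described in the abstract above can be sketched directly. This illustrates only the data model (with uniform-on-[0,1]^d as a stand-in product distribution), not the paper's learning algorithm; the function name is ours:

```python
import random

def mi_example(rect_lo, rect_hi, n, rng):
    """One multiple-instance example: n points drawn i.i.d. from a
    product distribution (uniform on [0,1]^d here as a stand-in),
    labeled 1 iff ANY point lands inside the axis-aligned target
    rectangle [rect_lo, rect_hi]."""
    d = len(rect_lo)
    bag = [[rng.random() for _ in range(d)] for _ in range(n)]

    def inside(p):
        return all(lo <= v <= hi for lo, hi, v in zip(rect_lo, rect_hi, p))

    label = int(any(inside(p) for p in bag))
    return bag, label

rng = random.Random(0)
bag, label = mi_example([0.2, 0.2], [0.8, 0.8], n=5, rng=rng)
```

Note the asymmetry that makes this model hard: a positive bag tells the learner only that at least one of its n points is inside the rectangle, not which one.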
- Title: PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples
https://scholarbank.nus.edu.sg/handle/10635/114367
Authors: Long, P.M.; Tan, L.
Abstract: We describe a polynomial-time algorithm for learning axis-aligned rectangles in Q^d with respect to product distributions from multiple-instance examples in the PAC model. Here, each example consists of n elements of Q^d together with a label indicating whether any of the n points is in the rectangle to be learned. We assume that there is an unknown product distribution D over Q^d such that all instances are independently drawn according to D. The accuracy of a hypothesis is measured by the probability that it would incorrectly predict whether one of n more points drawn from D was in the rectangle to be learned. Our algorithm achieves accuracy ε with probability 1 − δ in O((d^5 n^12/ε^20) log^2(nd/(εδ))) time. © 1998 Kluwer Academic Publishers.
Date: Thu, 01 Jan 1998
- Title: Text compression via alphabet re-representation
https://scholarbank.nus.edu.sg/handle/10635/99603
Authors: Long, Philip M.; Natsev, Apostol I.; Vitter, Jeffrey Scott
Abstract: We consider re-representing the alphabet so that a representation of a character reflects its properties as a predictor of future text. This enables us to use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that uses this idea in conjunction with neural networks. The performance of this implementation is compared to other compression methods, such as UNIX compress, gzip, PPMC, and an alternative neural network approach.
Date: Wed, 01 Jan 1997
- Title: On the sample complexity of learning functions with bounded variation
https://scholarbank.nus.edu.sg/handle/10635/99572
Authors: Long, Philip M.
Abstract: We show that the class FBV of [0, 1]-valued functions with total variation at most 1 can be agnostically learned with respect to the absolute loss in polynomial time from O((1/ε^2) log(1/δ)) examples, matching a known lower bound to within a constant factor. We establish a bound of O(1/m) on the expected error of a polynomial-time algorithm for learning FBV in the prediction model, also matching a known lower bound to within a constant factor. Applying a known algorithm transformation to our prediction algorithm, we obtain a polynomial-time PAC learning algorithm for FBV with a sample complexity bound of O((1/ε) log(1/δ)); this also matches a known lower bound to within a constant factor.
Date: Thu, 01 Jan 1998
- Title: On-line evaluation and prediction using linear functions
https://scholarbank.nus.edu.sg/handle/10635/99573
Authors: Long, Philip M.
Abstract: We propose a model for situations where an algorithm needs to make a sequence of choices to minimize an evaluation function, but where the evaluation function must be learned on-line as it is being used. We describe algorithms for learning linear evaluation functions in this model, and prove performance bounds for them that hold in the worst case. Each bound is on the expectation, with respect to an algorithm's randomization, of the sum of differences between the costs of the choices the algorithm makes and the best choices available. The bounds are in terms of the extent to which a linear model is appropriate, the number of alternatives to choose from, and the number of choices that need to be made. Ideas from the above analysis yield new absolute loss bounds for learning linear functions in the standard on-line prediction model. These bounds are on the difference between the sum of absolute prediction errors made by the learning algorithm, and the best sum of absolute prediction errors that can be obtained by fixing a linear function in the given class. Known results imply that our bounds on this difference cannot be improved by more than a constant factor.
Date: Wed, 01 Jan 1997
- Title: Complexity of learning according to two models of a drifting environment
https://scholarbank.nus.edu.sg/handle/10635/99486
Authors: Long, Philip M.
Abstract: The problem of learning functions from some set X to {0, 1} using two models of a drifting environment is studied. It is shown that a bound of cε^3/VCdim(F) on the rate of drift of the distribution generating the examples, for a constant c > 0, is sufficient for agnostic learning to relative accuracy ε; this matches a known necessary condition to within a constant factor. A sufficient condition of cε^2/VCdim(F) is established for the realizable case, also matching a known necessary condition to within a constant factor. A relatively simple proof of a bound of O((1/ε^2)(VCdim(F) + log(1/δ))) on the sample complexity of agnostic learning in a fixed environment is presented.
Date: Thu, 01 Jan 1998
- Title: Adaptive disk spindown via optimal rent-to-buy in probabilistic environments
https://scholarbank.nus.edu.sg/handle/10635/39737
Authors: Krishnan, P.; Long, P.M.; Vitter, J.S.
Abstract: In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used we need to decide when to buy the resource, given that we can rent the resource for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy problem is motivated by important systems applications, specifically, problems arising from deciding when to spin down disks to conserve energy in mobile computers [4], [13], [15], thread blocking decisions during lock acquisition in multiprocessor applications [7], and virtual circuit holding times in IP-over-ATM networks [11], [19]. We develop a provably optimal and computationally efficient algorithm for the rent-to-buy problem. Our algorithm uses O(√t) time and space, and its expected cost for the tth resource use converges to optimal as O(√(log t/t)), for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe the experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spin down a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields significantly improved power/response time performance over the nonadaptive 2-competitive algorithm which is optimal in the worst-case competitive analysis model.
Date: Fri, 01 Jan 1999
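The threshold strategies underlying the rent-to-buy problem above, and the flavor of adapting the threshold to past resource-use times, can be sketched as follows. This is a simplification: the paper's algorithm attains its O(√t) time and space bound with machinery omitted here, and the function names are ours:

```python
def cost(threshold, idle, buy_cost):
    """Cost of the rule 'rent for `threshold` time units, then buy'
    on a single resource use of length `idle`: rent $1 per unit while
    waiting, plus `buy_cost` if the use outlasts the threshold.
    threshold == buy_cost gives the classic 2-competitive strategy."""
    if idle <= threshold:
        return idle                    # never bought: pure rent
    return threshold + buy_cost        # rented, then bought

def best_empirical_threshold(idle_history, buy_cost):
    """Threshold minimizing total cost on the observed history; an
    optimum always lies at 0 or at one of the observed use times."""
    candidates = [0.0] + sorted(idle_history)
    return min(candidates,
               key=lambda t: sum(cost(t, x, buy_cost) for x in idle_history))
```

For disk spindown, `idle` is an idle period, `buy_cost` is the (normalized) energy cost of spinning the disk back up, and the empirical-optimal threshold shifts toward 0 when idle periods tend to be long and toward "never spin down" when they are short.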
- Title: Improved bounds about on-line learning of smooth functions of a single variable
https://scholarbank.nus.edu.sg/handle/10635/39553
Authors: Long, P.M.
Abstract: We consider the complexity of learning classes of smooth functions formed by bounding different norms of a function's derivative. The learning model is the generalization of the mistake-bound model to continuous-valued functions. Suppose F_q is the set of all absolutely continuous functions f from [0,1] to R such that ||f′||_q ≤ 1, and opt(F_q, m) is the best possible bound on the worst-case sum of absolute prediction errors over sequences of m trials. We show that for all q ≥ 2, opt(F_q, m) = Θ(√(log m)), and that opt(F_2, m) ≤ √(log_2 m)/2 + O(1), matching a known lower bound of √(log_2 m)/2 − O(1) to within an additive constant. © 2000 Elsevier Science B.V. All rights reserved.
Date: Sat, 01 Jan 2000
- Title: Identification of Discriminators of Hepatoma by Gene Expression Profiling Using a Minimal Dataset Approach
https://scholarbank.nus.edu.sg/handle/10635/29442
Authors: Neo, S.Y.; Leow, C.K.; Vega, V.B.; Long, P.M.; Islam, A.F.M.; Liu, E.T.; Ren, E.C.; Lai, P.B.S.
Date: Thu, 01 Jan 2004
- Title: Structural results about on-line learning models with and without queries
https://scholarbank.nus.edu.sg/handle/10635/39266
Authors: Auer, P.; Long, P.M.
Abstract: We solve an open problem of Maass and Turan, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of mistakes plus membership queries required by an algorithm that can ask membership queries. Previously known results imply that the constant factor in our bound is best possible. We then show that, in a natural generalization of the mistake-bound model, the usefulness to the learner of arbitrary 'yes-no' questions between trials is very limited. We show that several natural structural questions about relatives of the mistake-bound model can be answered through the application of this general result. Most of these results can be interpreted as saying that learning in apparently less powerful (and more realistic) models is not much more difficult than learning in more powerful models.
Date: Fri, 01 Jan 1999
- Title: Complexity of learning according to two models of a drifting environment
https://scholarbank.nus.edu.sg/handle/10635/39144
Authors: Long, P.M.
Abstract: We show that a cε^3/VCdim(F) bound on the rate of drift of the distribution generating the examples is sufficient for agnostic learning to relative accuracy ε, where c > 0 is a constant; this matches a known necessary condition to within a constant factor. We establish a cε^2/VCdim(F) sufficient condition for the realizable case, also matching a known necessary condition to within a constant factor. We provide a relatively simple proof of a bound of O((1/ε^2)(VCdim(F) + log(1/δ))) on the sample complexity of agnostic learning in a fixed environment.
Date: Fri, 01 Jan 1999
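The explicit bounds quoted in the abstract above translate directly into small calculators. The constant c stands in for the unspecified constant hidden in the O(·) and cε^k notation, so treat the numbers as order-of-magnitude only:

```python
import math

def agnostic_sample_bound(vc_dim, eps, delta, c=1.0):
    """Sample-complexity form from the abstract,
    O((1/eps^2) * (VCdim(F) + log(1/delta))), with an explicit
    constant c standing in for the hidden constant (assumption)."""
    return math.ceil(c / eps ** 2 * (vc_dim + math.log(1.0 / delta)))

def max_drift_rate(vc_dim, eps, c=1.0):
    """Tolerable drift rate c * eps^3 / VCdim(F) from the abstract
    (agnostic case); the realizable case improves eps^3 to eps^2."""
    return c * eps ** 3 / vc_dim
```

The cubic dependence on ε in the drift tolerance is the notable part: halving the target accuracy shrinks the allowed per-trial drift by a factor of eight.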
- Title: The relaxed online maximum margin algorithm
https://scholarbank.nus.edu.sg/handle/10635/39176
Authors: Li, Y.; Long, P.M.
Abstract: We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms is similar to that of the perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed-threshold SVM hypothesis.
Date: Tue, 01 Jan 2002
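A sketch of the ROMMA update, reconstructed from the abstract's description of the relaxed constraints: on each violating example the new weight vector is the minimum-norm solution of w·w_old ≥ ||w_old||^2 and y(w·x) ≥ 1. The closed form below is our own derivation from those two constraints, and the margin-based trigger corresponds to the aggressive variant, so treat it as illustrative rather than the paper's exact algorithm:

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def romma_update(w, x, y):
    """Minimum-norm w satisfying the two relaxed constraints
    w . w_old >= ||w_old||^2  and  y * (w . x) >= 1
    (closed form via KKT conditions; our derivation)."""
    wx, ww, xx = dot(w, x), dot(w, w), dot(x, x)
    denom = ww * xx - wx * wx
    if denom < 1e-12:                 # x (anti)parallel to w: perceptron step
        return [wi + y * xi for wi, xi in zip(w, x)]
    c = (ww * xx - y * wx) / denom
    d = ww * (1.0 - y * wx) / denom
    return [c * wi + d * y * xi for wi, xi in zip(w, x)]

def romma_train(examples, epochs=20):
    """Cycle over labeled examples (x, y in {-1,+1}); update whenever
    the margin constraint y * (w . x) >= 1 is violated."""
    w = None
    for _ in range(epochs):
        for x, y in examples:
            if w is None:
                w = [y * xi for xi in x]     # first step: perceptron-style
            elif y * dot(w, x) < 1.0:
                w = romma_update(w, x, y)
    return w
```

On linearly separable data the updates stop once every example clears margin 1, mimicking the maximum-margin behavior the abstract describes.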
- Title: Fat-shattering and the learnability of real-valued functions
https://scholarbank.nus.edu.sg/handle/10635/99288
Authors: Bartlett, P.L.; Long, P.M.; Williamson, R.C.
Abstract: We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a real-valued function class in terms of a generalization of the Vapnik-Chervonenkis dimension, the fat-shattering function, introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fat-shattering function is finite. With different (also quite mild) restrictions, satisfied for example by Gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fat-shattering function grows polynomially. We prove analogous results in an agnostic setting, where there is no assumption of an underlying function class. © 1996 Academic Press, Inc.
Date: Sat, 01 Jun 1996
- Title: Dictionary selection using partial matching
https://scholarbank.nus.edu.sg/handle/10635/39738
Authors: Hoang, D.T.; Long, P.M.; Vitter, J.S.
Abstract: This work concerns the search for text compressors that compress better than existing dictionary coders, but run faster than statistical coders. We describe a new method for text compression using multiple dictionaries, one for each context of preceding characters, where the contexts have varying lengths. The context to be used is determined using an escape mechanism similar to that of prediction by partial matching (PPM) methods. We describe modifications of three popular dictionary coders along these lines and experiments evaluating their effectiveness using the text files in the Calgary corpus. Our results suggest that modifying LZ77, LZFG, and LZW along these lines yields improvements in compression of about 3%, 6%, and 15%, respectively.
Date: Fri, 01 Jan 1999
- Title: Improved bounds on the sample complexity of learning
https://scholarbank.nus.edu.sg/handle/10635/43067
Authors: Li, Y.; Long, P.M.; Srinivasan, A.
Abstract: We present a new general upper bound on the number of examples required to estimate all of the expectations of a set of random variables uniformly well. The quality of the estimates is measured using a variant of the relative error proposed by Haussler and Pollard. We also show that our bound is within a constant factor of the best possible. Our upper bound implies improved bounds on the sample complexity of learning according to Haussler's decision theoretic model.
Date: Mon, 01 Jan 2001
- Title: On-line learning with linear loss constraints
https://scholarbank.nus.edu.sg/handle/10635/39220
Authors: Helmbold, D.P.; Littlestone, N.; Long, P.M.
Abstract: We consider a generalization of the mistake-bound model (for learning {0, 1}-valued functions) in which the learner must satisfy a general constraint on the number M+ of incorrect 1 predictions and the number M- of incorrect 0 predictions. We describe a general-purpose optimal algorithm for our formulation of this problem. We describe several applications of our general results, involving situations in which the learner wishes to satisfy linear inequalities in M+ and M-. © 2000 Academic Press.
Date: Sat, 01 Jan 2000
- Title: Apple tasting
https://scholarbank.nus.edu.sg/handle/10635/39219
Authors: Helmbold, D.P.; Littlestone, N.; Long, P.M.
Abstract: In the standard on-line model the learning algorithm tries to minimize the total number of mistakes made in a series of trials. On each trial the learner sees an instance, makes a prediction of its classification, then finds out the correct classification. We define a natural variant of this model ("apple tasting") where:
• the classes are interpreted as the good and bad instances,
• the prediction is interpreted as accepting or rejecting the instance, and
• the learner gets feedback only when the instance is accepted.
We use two transformations to relate the apple tasting model to an enhanced standard model where false acceptances are counted separately from false rejections. We apply our results to obtain a good general-purpose apple tasting algorithm as well as nearly optimal apple tasting algorithms for a variety of standard classes, such as conjunctions and disjunctions of n boolean variables. We also present and analyze a simpler transformation useful when the instances are drawn at random rather than selected by an adversary. © 2000 Academic Press.
Date: Sat, 01 Jan 2000
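The feedback asymmetry described above is easy to simulate: labels are revealed only on acceptance, so a learner that always rejects learns nothing unless it sometimes accepts anyway. A minimal exploration wrapper in that spirit (the paper's transformations are more refined; the names and the explore_p mechanism are ours):

```python
import random

def apple_taste(predict, update, stream, explore_p, rng):
    """Run a base learner in the apple-tasting model: the true label of
    an instance is revealed ONLY when we accept it. When the learner
    would reject, we accept anyway with probability explore_p so that
    labeled data keeps arriving. Returns the (instance, label) pairs
    for which feedback was actually observed."""
    feedback = []
    for x, y in stream:
        if predict(x) == 1 or rng.random() < explore_p:
            feedback.append((x, y))   # accepted: label observed
            update(x, y)              # base learner sees the label
    return feedback
```

With explore_p = 0 an always-rejecting learner receives no feedback at all; the whole design question in the paper is how to schedule such exploratory acceptances to balance false accepts against missed good instances.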
- Title: Efficient cost measures for motion estimation at low bit rates
https://scholarbank.nus.edu.sg/handle/10635/99258
Authors: Hoang, D.T.; Long, P.M.; Vitter, J.S.
Abstract: We present and compare methods for choosing motion vectors for block-based motion-compensated video coding. The primary focus is on videophone and videoconferencing applications, where low bit rates are necessary, where motion is usually limited, and where the amount of computation is also limited. In a typical block-based motion-compensated video coding system, motion vectors are transmitted along with a lossy encoding of the residuals. As the bit rate decreases, the proportion required to transmit the motion vectors increases. We provide experimental evidence that choosing motion vectors explicitly to minimize rate (including motion vector coding), subject to implicit constraints on distortion, yields better rate-distortion tradeoffs than minimizing some measure of prediction error. Minimizing a combination of rate and distortion yields further improvements. Although these explicit-minimization schemes are computationally intensive, they provide invaluable insight which we use to develop practical algorithms. We show that minimizing a simple heuristic function of the prediction error and the motion vector code length results in rate-distortion performance comparable to explicit-minimization schemes while being computationally feasible. Experimental results are provided for coders that operate within the H.261 standard. © 1998 IEEE.
Date: Thu, 01 Jan 1998
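The "combination of rate and distortion" heuristic in the abstract above can be sketched as a Lagrangian block-matching search: pick the motion vector minimizing SAD distortion plus λ times an estimated motion-vector code length. The rate model below is a crude stand-in, not the actual H.261 motion-vector code:

```python
def mv_bits(mv):
    """Crude rate model: larger motion vectors cost more bits (a
    stand-in for the real variable-length motion-vector code)."""
    return sum(1 + 2 * abs(c).bit_length() for c in mv)

def sad(cur, ref, top, left):
    """Sum of absolute differences between block `cur` and the
    same-size window of frame `ref` anchored at (top, left)."""
    return sum(abs(cur[i][j] - ref[top + i][left + j])
               for i in range(len(cur)) for j in range(len(cur[0])))

def best_mv(cur, ref, top, left, search=4, lam=1.0):
    """Choose the motion vector minimizing SAD + lam * mv_bits over a
    square search window: the Lagrangian rate-distortion cost in the
    spirit of the abstract's heuristic cost measure."""
    h, w = len(cur), len(cur[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue
            c = sad(cur, ref, y, x) + lam * mv_bits((dy, dx))
            if c < best_cost:
                best, best_cost = (dy, dx), c
    return best
```

Raising lam biases the search toward short, cheaply coded vectors, which is exactly the tradeoff that matters at the low bit rates the paper targets.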
- Title: Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003
https://scholarbank.nus.edu.sg/handle/10635/143618
Authors: Vega V.B.; Ruan Y.; Liu J.; Lee W.H.; Wei C.L.; Se-Thoe S.Y.; Tang K.F.; Zhang T.; Kolatkar P.R.; Ooi E.E.; Ling A.E.; Stanton L.W.; Long P.M.; Liu E.T.
Date: Thu, 01 Jan 2004
- Title: Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection
https://scholarbank.nus.edu.sg/handle/10635/107685
Authors: Ruan, Y.; Wei, C.L.; Ee, L.A.; Vega, V.B.; Thoreau, H.; Yun, S.T.S.; Chia, J.-M.; Ng, P.; Chiu, K.P.; Lim, L.; Tao, Z.; Peng, C.K.; Ean, L.O.L.; Lee, N.M.; Sin, L.Y.; Ng, L.F.P.; Ren, E.C.; Stanton, L.W.; Long, P.M.; Liu, E.T.
Abstract: Background: The cause of severe acute respiratory syndrome (SARS) has been identified as a new coronavirus. Whole genome sequence analysis of various isolates might provide an indication of potential strain differences of this new virus. Moreover, mutation analysis will help to develop effective vaccines. Methods: We sequenced the entire SARS viral genome of cultured isolates from the index case (SIN2500) presenting in Singapore, from three primary contacts (SIN2774, SIN2748, and SIN2677), and one secondary contact (SIN2679). These sequences were compared with the isolates from Canada (TOR2), Hong Kong (CUHK-W1 and HKU39849), Hanoi (URBANI), Guangzhou (GZ01), and Beijing (BJ01, BJ02, BJ03, BJ04). Findings: We identified 129 sequence variations among the 14 isolates, with 16 recurrent variant sequences. Common variant sequences at four loci define two distinct genotypes of the SARS virus. One genotype was linked with infections originating in Hotel M in Hong Kong, the second contained isolates from Hong Kong, Guangzhou, and Beijing with no association with Hotel M (p<0.0001). Moreover, other common sequence variants further distinguished the geographical origins of the isolates, especially between Singapore and Beijing. Interpretation: Despite the recent onset of the SARS epidemic, genetic signatures are emerging that partition the worldwide SARS viral isolates into groups on the basis of contact source history and geography. These signatures can be used to trace sources of infection. In addition, a common variant associated with a non-conservative amino-acid change in the S1 region of the spike protein suggests that immunological pressures might be starting to influence the evolution of the SARS virus in human populations.
Date: Sat, 24 May 2003
- Title: The one-inclusion graph algorithm is near-optimal for the prediction model of learning
https://scholarbank.nus.edu.sg/handle/10635/43369
Authors: Li, Y.; Long, P.M.; Srinivasan, A.
Abstract: Haussler, Littlestone, and Warmuth described a general-purpose algorithm for learning according to the prediction model, and proved an upper bound on the probability that their algorithm makes a mistake in terms of the number of examples seen and the Vapnik-Chervonenkis (VC) dimension of the concept class being learned. We show that their bound is within a factor of 1 + o(1) of the best possible such bound for any algorithm.
Date: Mon, 01 Jan 2001
- Title: Text compression via alphabet re-representation
https://scholarbank.nus.edu.sg/handle/10635/39304
Authors: Long, P.M.; Natsev, A.I.; Vitter, J.S.
Abstract: This article introduces the concept of alphabet re-representation in the context of text compression. We consider re-representing the alphabet so that a representation of a character reflects its properties as a predictor of future text. This enables us to use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that uses this idea in conjunction with neural networks. The performance of our implementation is compared to other compression methods, such as UNIX compress, gzip, PPMC, and an alternative neural network approach.
Date: Fri, 01 Jan 1999