ScholarBank@NUS

PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples

Thu, 01 Jan 1998 00:00:00 GMT

Title: PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples Authors: Long, P.M.; Tan, L. Abstract: We describe a polynomial-time algorithm for learning axis-aligned rectangles in $Q^d$ with respect to product distributions from multiple-instance examples in the PAC model. Here, each example consists of n elements of $Q^d$ together with a label indicating whether any of the n points is in the rectangle to be learned. We assume that there is an unknown product distribution D over $Q^d$ such that all instances are independently drawn according to D. The accuracy of a hypothesis is measured by the probability that it would incorrectly predict whether one of n more points drawn from D was in the rectangle to be learned. Our algorithm achieves accuracy ε with probability 1 - δ in $O\left(\frac{d^5 n^{12}}{\varepsilon^{20}} \log^2 \frac{nd}{\varepsilon\delta}\right)$ time. © 1998 Kluwer Academic Publishers.
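The multiple-instance labeling described in this abstract can be illustrated with a minimal sketch. It only shows how a bag of n points receives its label; it is not the learning algorithm from the paper. The function names are hypothetical, and the uniform distribution on the unit cube (with floats standing in for rationals) is an assumed example of a product distribution.

```python
import random

def in_rectangle(point, lows, highs):
    """True if the point lies inside the axis-aligned rectangle [lows, highs]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lows, highs))

def draw_bag(n, d, rng):
    """Draw n points in d dimensions; coordinates are independent (assumed uniform)."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]

def label_bag(bag, lows, highs):
    """A bag is positive if ANY of its points falls in the target rectangle."""
    return any(in_rectangle(p, lows, highs) for p in bag)

# Hypothetical target rectangle in two dimensions.
lows, highs = [0.4, 0.4], [0.6, 0.6]
bag = draw_bag(n=5, d=2, rng=random.Random(0))
label = label_bag(bag, lows, highs)
```

The learner sees only `bag` and `label`, never which individual point (if any) was inside the rectangle.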

PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples

Mon, 01 Jan 1996 00:00:00 GMT

Title: PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples Authors: Long, Philip M.; Tan, Lei Abstract: We describe a polynomial-time algorithm for learning axis-aligned rectangles in $Q^d$ with respect to product distributions from multiple-instance examples in the PAC model. Here, each example consists of n elements of $Q^d$ together with a label indicating whether any of the n points is in the rectangle to be learned. We assume that there is an unknown product distribution D over $Q^d$ such that all instances are independently drawn according to D. The accuracy of a hypothesis is measured by the probability that it would incorrectly predict whether one of n more points drawn from D was in the rectangle to be learned. Our algorithm achieves accuracy ε with probability 1 - δ in $O\left(\frac{d^5 n^{12}}{\varepsilon^{20}} \log^2 \frac{nd}{\varepsilon\delta}\right)$ time.

Apple tasting

Sat, 01 Jan 2000 00:00:00 GMT

Title: Apple tasting Authors: Helmbold, D.P.; Littlestone, N.; Long, P.M. Abstract: In the standard on-line model the learning algorithm tries to minimize the total number of mistakes made in a series of trials. On each trial the learner sees an instance, makes a prediction of its classification, then finds out the correct classification. We define a natural variant of this model ("apple tasting") where • the classes are interpreted as the good and bad instances, • the prediction is interpreted as accepting or rejecting the instance, and • the learner gets feedback only when the instance is accepted. We use two transformations to relate the apple tasting model to an enhanced standard model where false acceptances are counted separately from false rejections. We apply our results to obtain a good general-purpose apple tasting algorithm as well as nearly optimal apple tasting algorithms for a variety of standard classes, such as conjunctions and disjunctions of n boolean variables. We also present and analyze a simpler transformation useful when the instances are drawn at random rather than selected by an adversary. © 2000 Academic Press.
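The feedback protocol of the apple tasting model can be sketched as a short simulation. This is an illustration of the protocol only, under assumed names: `MemorizingTaster` is a hypothetical toy learner, not one of the algorithms from the paper. The key point is that the learner is told the true class only for instances it accepts.

```python
class MemorizingTaster:
    """Toy learner (hypothetical): accepts everything it has not tasted as bad."""

    def __init__(self):
        self.known_bad = set()

    def accepts(self, instance):
        return instance not in self.known_bad

    def update(self, instance, good):
        if not good:
            self.known_bad.add(instance)

def run_apple_tasting(learner, instances, labels):
    """Return (false_accepts, false_rejects) for one pass over the trials."""
    false_accepts = 0
    false_rejects = 0
    for instance, good in zip(instances, labels):
        if learner.accepts(instance):
            if not good:
                false_accepts += 1
            # Feedback arrives only because the instance was accepted.
            learner.update(instance, good)
        elif good:
            # A false rejection is counted, but the learner never learns of it.
            false_rejects += 1
    return false_accepts, false_rejects

instances = ["a", "b", "a", "c", "b"]
labels = [True, False, True, True, False]  # True means a good instance
result = run_apple_tasting(MemorizingTaster(), instances, labels)
```

Counting the two error types separately mirrors the enhanced standard model mentioned in the abstract, where false acceptances and false rejections are tracked apart.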

On-line learning with linear loss constraints

Sat, 01 Jan 2000 00:00:00 GMT

Title: On-line learning with linear loss constraints Authors: Helmbold, D.P.; Littlestone, N.; Long, P.M. Abstract: We consider a generalization of the mistake-bound model (for learning {0, 1}-valued functions) in which the learner must satisfy a general constraint on the number M+ of incorrect 1 predictions and the number M- of incorrect 0 predictions. We describe a general-purpose optimal algorithm for our formulation of this problem. We describe several applications of our general results, involving situations in which the learner wishes to satisfy linear inequalities in M+ and M-. © 2000 Academic Press.

Complexity of learning according to two models of a drifting environment

Fri, 01 Jan 1999 00:00:00 GMT

Title: Complexity of learning according to two models of a drifting environment Authors: Long, P.M. Abstract: We show that a $c\varepsilon^3/\mathrm{VCdim}(F)$ bound on the rate of drift of the distribution generating the examples is sufficient for agnostic learning to relative accuracy ε, where c > 0 is a constant; this matches a known necessary condition to within a constant factor. We establish a $c\varepsilon^2/\mathrm{VCdim}(F)$ sufficient condition for the realizable case, also matching a known necessary condition to within a constant factor. We provide a relatively simple proof of a bound of $O\left(\frac{1}{\varepsilon^2}\left(\mathrm{VCdim}(F) + \log\frac{1}{\delta}\right)\right)$ on the sample complexity of agnostic learning in a fixed environment.

The relaxed online maximum margin algorithm

Tue, 01 Jan 2002 00:00:00 GMT

Title: The relaxed online maximum margin algorithm Authors: Li, Y.; Long, P.M. Abstract: We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms are similar to those of the perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed threshold SVM hypothesis.
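For orientation, the classical perceptron, which this abstract uses as its point of comparison, can be sketched as follows. This is the baseline algorithm only and deliberately not ROMMA: the ROMMA update rule is not given in the abstract and is not reproduced here. The function name, the `epochs` parameter, and the toy data are assumptions for illustration.

```python
def perceptron(examples, epochs=10):
    """Train a linear threshold function through the origin.

    examples: list of (x, y) pairs with x a list of floats and y in {-1, +1}.
    Returns the weight vector and the number of mistakes (updates) made.
    """
    dim = len(examples[0][0])
    w = [0.0] * dim
    mistakes = 0
    for _ in range(epochs):
        clean_pass = True
        for x, y in examples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin <= 0:
                # Mistake: move the weight vector toward y * x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return w, mistakes

# Small linearly separable toy data set (assumed for illustration).
examples = [([1.0, 1.0], 1), ([-1.0, -1.0], -1), ([1.0, 0.5], 1), ([-0.5, -1.0], -1)]
weights, mistakes = perceptron(examples)
```

The perceptron stops at any separating hyperplane; a maximum-margin method would instead look for the shortest weight vector satisfying the constraints y (w · x) ≥ 1 on the examples seen so far.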

Improved bounds on the sample complexity of learning

Sat, 01 Jan 2000 00:00:00 GMT

Title: Improved bounds on the sample complexity of learning Authors: Li, Yi; Long, Philip M.; Srinivasan, Aravind Abstract: We present two improved bounds on the sample complexity of learning. First, we present a new general upper bound on the number of examples required to estimate all of the expectations of a set of random variables uniformly well. The quality of the estimates is measured using a variant of the relative error proposed by Haussler and Pollard. We also show that our bound is within a constant factor of the best possible. Our upper bound implies improved bounds on the sample complexity of learning according to Haussler's decision theoretic model. Next, we prove a lower bound on the sample complexity for learning according to the prediction model that is optimal to within a factor of 1+o(1).

Improved bounds on the sample complexity of learning

Mon, 01 Jan 2001 00:00:00 GMT

Title: Improved bounds on the sample complexity of learning Authors: Li, Y.; Long, P.M.; Srinivasan, A. Abstract: We present a new general upper bound on the number of examples required to estimate all of the expectations of a set of random variables uniformly well. The quality of the estimates is measured using a variant of the relative error proposed by Haussler and Pollard. We also show that our bound is within a constant factor of the best possible. Our upper bound implies improved bounds on the sample complexity of learning according to Haussler's decision theoretic model.

The one-inclusion graph algorithm is near-optimal for the prediction model of learning

Mon, 01 Jan 2001 00:00:00 GMT

Title: The one-inclusion graph algorithm is near-optimal for the prediction model of learning Authors: Li, Y.; Long, P.M.; Srinivasan, A. Abstract: Haussler, Littlestone, and Warmuth described a general-purpose algorithm for learning according to the prediction model, and proved an upper bound on the probability that their algorithm makes a mistake in terms of the number of examples seen and the Vapnik-Chervonenkis (VC) dimension of the concept class being learned. We show that their bound is within a factor of 1 + o(1) of the best possible such bound for any algorithm.

Approximating hyper-rectangles: Learning and pseudorandom sets

Tue, 01 Dec 1998 00:00:00 GMT

Title: Approximating hyper-rectangles: Learning and pseudorandom sets Authors: Auer, P.; Long, P.M.; Srinivasan, A. Abstract: The PAC learning of rectangles has been studied because they have been found experimentally to yield excellent hypotheses for several applied learning problems. Also, pseudorandom sets for rectangles have been actively studied recently because (i) they are a subproblem common to the derandomization of depth-2 (DNF) circuits and derandomizing randomized logspace, and (ii) they approximate the distribution of n independent multivalued random variables. We present improved upper bounds for a class of such problems of "approximating" high-dimensional rectangles that arise in PAC learning and pseudorandomness. © 1998 Academic Press.

Text compression via alphabet re-representation

Wed, 01 Jan 1997 00:00:00 GMT

Title: Text compression via alphabet re-representation Authors: Long, Philip M.; Natsev, Apostol I.; Vitter, Jeffrey Scott Abstract: We consider re-representing the alphabet so that a representation of a character reflects its properties as a predictor of future text. This enables us to use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that uses this idea in conjunction with neural networks. The performance of this implementation is compared to other compression methods, such as UNIX compress, gzip, PPMC, and an alternative neural network approach.

Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions

Wed, 01 Apr 1998 00:00:00 GMT

Title: Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions Authors: Bartlett, P.L.; Long, P.M. Abstract: We present a new general-purpose algorithm for learning classes of [0, 1]-valued functions in a generalization of the prediction model and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi, and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant factor in general. We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire's fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. For each ε > 0, we establish weaker sufficient and stronger necessary conditions for a class of [0, 1]-valued functions to be agnostically learnable to within ε and to be an ε-uniform Glivenko-Cantelli class. © 1998 Academic Press.

Efficient cost measures for motion estimation at low bit rates

Thu, 01 Jan 1998 00:00:00 GMT

Title: Efficient cost measures for motion estimation at low bit rates Authors: Hoang, D.T.; Long, P.M.; Vitter, J.S. Abstract: We present and compare methods for choosing motion vectors for block-based motion-compensated video coding. The primary focus is on videophone and videoconferencing applications, where low bit rates are necessary, where motion is usually limited, and where the amount of computation is also limited. In a typical block-based motion-compensated video coding system, motion vectors are transmitted along with a lossy encoding of the residuals. As the bit rate decreases, the proportion required to transmit the motion vectors increases. We provide experimental evidence that choosing motion vectors explicitly to minimize rate (including motion vector coding), subject to implicit constraints on distortion, yields better rate-distortion tradeoffs than minimizing some measure of prediction error. Minimizing a combination of rate and distortion yields further improvements. Although these explicit-minimization schemes are computationally intensive, they provide invaluable insight which we use to develop practical algorithms. We show that minimizing a simple heuristic function of the prediction error and the motion vector code length results in rate-distortion performance comparable to explicit-minimization schemes while being computationally feasible. Experimental results are provided for coders that operate within the H.261 standard. © 1998 IEEE.

Fat-shattering and the learnability of real-valued functions

Sat, 01 Jun 1996 00:00:00 GMT

Title: Fat-shattering and the learnability of real-valued functions Authors: Bartlett, P.L.; Long, P.M.; Williamson, R.C. Abstract: We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. With mild conditions on independent observation noise, we provide characterizations of the learnability of a real-valued function class in terms of a generalization of the Vapnik-Chervonenkis dimension, the fat-shattering function, introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fat-shattering function is finite. With different (also quite mild) restrictions, satisfied for example by Gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fat-shattering function grows polynomially. We prove analogous results in an agnostic setting, where there is no assumption of an underlying function class. © 1996 Academic Press, Inc.

On the Complexity of Learning from Drifting Distributions

Sat, 01 Nov 1997 00:00:00 GMT

Title: On the Complexity of Learning from Drifting Distributions Authors: Barve, R.D.; Long, P.M. Abstract: We consider two models of on-line learning of binary-valued functions from drifting distributions due to Bartlett. We show that if each example is drawn from a joint distribution which changes in total variation distance by at most $O(\varepsilon^3/(d \log(1/\varepsilon)))$ between trials, then an algorithm can achieve a probability of a mistake at most ε worse than the best function in a class of VC-dimension d. We prove a corresponding necessary condition of $O(\varepsilon^3/d)$. Finally, in the case that a fixed function is to be learned from noise-free examples, we show that if the distributions on the domain generating the examples change by at most $O(\varepsilon^2/(d \log(1/\varepsilon)))$, then any consistent algorithm learns to within accuracy ε. © 1997 Academic Press.

Complexity of learning according to two models of a drifting environment

Thu, 01 Jan 1998 00:00:00 GMT

Title: Complexity of learning according to two models of a drifting environment Authors: Long, Philip M. Abstract: The problem of learning functions from some set X to {0, 1} using two models of a drifting environment is studied. It is shown that a bound on the rate of drift of the distribution generating the examples is sufficient for learning to relative accuracy; this matches a known necessary condition to within a constant factor. A sufficient condition is established for the realizable case, also matching a known necessary condition to within a constant factor. A relatively simple proof of a bound on the sample complexity of agnostic learning in a fixed environment is presented.

On-line evaluation and prediction using linear functions

Wed, 01 Jan 1997 00:00:00 GMT

Title: On-line evaluation and prediction using linear functions Authors: Long, Philip M. Abstract: We propose a model for situations where an algorithm needs to make a sequence of choices to minimize an evaluation function, but where the evaluation function must be learned on-line as it is being used. We describe algorithms for learning linear evaluation functions in this model, and prove performance bounds for them that hold in the worst case. Each bound is on the expectation, with respect to an algorithm's randomization, of the sum of differences between the costs of the choices the algorithm makes and the best choices available. The bounds are in terms of the extent to which a linear model is appropriate, the number of alternatives to choose from, and the number of choices that need to be made. Ideas from the above analysis yield new absolute loss bounds for learning linear functions in the standard on-line prediction model. These bounds are on the difference between the sum of absolute prediction errors made by the learning algorithm and the best sum of absolute prediction errors that can be obtained by fixing a linear function in the given class. Known results imply that our bounds on this difference cannot be improved by more than a constant factor.

On the sample complexity of learning functions with bounded variation

Thu, 01 Jan 1998 00:00:00 GMT

Title: On the sample complexity of learning functions with bounded variation Authors: Long, Philip M. Abstract: We show that the class $F_{BV}$ of [0, 1]-valued functions with total variation at most 1 can be agnostically learned with respect to the absolute loss in polynomial time from $O\left(\frac{1}{\varepsilon^2} \log \frac{1}{\delta}\right)$ examples, matching a known lower bound to within a constant factor. We establish a bound of $O(1/m)$ on the expected error of a polynomial-time algorithm for learning $F_{BV}$ in the prediction model, also matching a known lower bound to within a constant factor. Applying a known algorithm transformation to our prediction algorithm, we obtain a polynomial-time PAC learning algorithm for $F_{BV}$ with a sample complexity bound of $O\left(\frac{1}{\varepsilon} \log \frac{1}{\delta}\right)$; this also matches a known lower bound to within a constant factor.

Adaptive disk spindown via optimal rent-to-buy in probabilistic environments

Fri, 01 Jan 1999 00:00:00 GMT

Title: Adaptive disk spindown via optimal rent-to-buy in probabilistic environments Authors: Krishnan, P.; Long, P.M.; Vitter, J.S. Abstract: In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used, we need to decide when to buy the resource, given that we can rent the resource for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy problem is motivated by important systems applications, specifically, problems arising from deciding when to spin down disks to conserve energy in mobile computers [4], [13], [15], thread blocking decisions during lock acquisition in multiprocessor applications [7], and virtual circuit holding times in IP-over-ATM networks [11], [19]. We develop a provably optimal and computationally efficient algorithm for the rent-to-buy problem. Our algorithm uses $O(\sqrt{t})$ time and space, and its expected cost for the t-th resource use converges to optimal as $O(\sqrt{\log t / t})$, for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe the experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spin down a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields significantly improved power/response time performance over the nonadaptive 2-competitive algorithm, which is optimal in the worst-case competitive analysis model.
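The cost structure of a single rent-to-buy decision can be made concrete with a small sketch. It shows the standard nonadaptive threshold strategy (rent until the rent paid equals the purchase price, then buy), which never costs more than twice the offline optimum. The second helper, which picks the threshold with the lowest average cost on past use times, is only an assumed illustration of adapting to the distribution; it is not the algorithm analyzed in the paper. All function names are hypothetical.

```python
def threshold_cost(use_time, buy_cost, threshold):
    """Cost of renting at 1 per unit time until `threshold`, then buying if still in use."""
    if use_time <= threshold:
        return use_time
    return threshold + buy_cost

def offline_cost(use_time, buy_cost):
    """Cost with full knowledge of the use time: rent throughout or buy at once."""
    return min(use_time, buy_cost)

def best_empirical_threshold(history, buy_cost):
    """Illustrative only: the candidate threshold with the lowest total cost on past use times."""
    candidates = [0.0] + sorted(set(history))
    return min(
        candidates,
        key=lambda th: sum(threshold_cost(t, buy_cost, th) for t in history),
    )

buy_cost = 10.0
use_times = [0.5, 1.0, 5.0, 10.0, 10.1, 100.0]
# Nonadaptive strategy: set the threshold equal to the purchase price.
ratios = [threshold_cost(t, buy_cost, buy_cost) / offline_cost(t, buy_cost) for t in use_times]
adaptive = best_empirical_threshold([1.0, 1.0, 1.0, 100.0], buy_cost)
```

In the disk setting, "renting" corresponds to keeping the disk spinning while idle and "buying" to paying the fixed cost of spinning it down and back up.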

Dictionary selection using partial matching

Fri, 01 Jan 1999 00:00:00 GMT

Title: Dictionary selection using partial matching Authors: Hoang, D.T.; Long, P.M.; Vitter, J.S. Abstract: This work concerns the search for text compressors that compress better than existing dictionary coders, but run faster than statistical coders. We describe a new method for text compression using multiple dictionaries, one for each context of preceding characters, where the contexts have varying lengths. The context to be used is determined using an escape mechanism similar to that of prediction by partial matching (PPM) methods. We describe modifications of three popular dictionary coders along these lines and experiments evaluating their effectiveness using the text files in the Calgary corpus. Our results suggest that modifying LZ77, LZFG, and LZW along these lines yields improvements in compression of about 3%, 6%, and 15%, respectively.
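As background for the dictionary coders named in this abstract, plain LZW compression can be sketched in a few lines. This is the unmodified single-dictionary baseline; the per-context dictionaries and the PPM-style escape mechanism described in the abstract are not implemented here. The sketch assumes input characters with code points below 256 and uses a hypothetical function name.

```python
def lzw_compress(text):
    """Return the list of LZW codes for `text` (characters must have code points < 256)."""
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the current phrase while it is still known.
            current = candidate
        else:
            # Emit the longest known phrase and register the new, longer one.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

codes = lzw_compress("ABABABA")
```

A multi-dictionary variant along the lines of the abstract would keep one such dictionary per context of preceding characters instead of a single global one.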

Text compression via alphabet re-representation

Fri, 01 Jan 1999 00:00:00 GMT

Title: Text compression via alphabet re-representation Authors: Long, P.M.; Natsev, A.I.; Vitter, J.S. Abstract: This article introduces the concept of alphabet re-representation in the context of text compression. We consider re-representing the alphabet so that a representation of a character reflects its properties as a predictor of future text. This enables us to use an estimator from a restricted class to map contexts to predictions of upcoming characters. We describe an algorithm that uses this idea in conjunction with neural networks. The performance of our implementation is compared to other compression methods, such as UNIX compress, gzip, PPMC, and an alternative neural network approach.

Improved bounds about on-line learning of smooth-functions of a single variable

Sat, 01 Jan 2000 00:00:00 GMT

Title: Improved bounds about on-line learning of smooth-functions of a single variable Authors: Long, P.M. Abstract: We consider the complexity of learning classes of smooth functions formed by bounding different norms of a function's derivative. The learning model is the generalization of the mistake-bound model to continuous-valued functions. Suppose $F_q$ is the set of all absolutely continuous functions f from [0,1] to R such that $\|f'\|_q \le 1$, and $\mathrm{opt}(F_q, m)$ is the best possible bound on the worst-case sum of absolute prediction errors over sequences of m trials. We show that for all $q \ge 2$, $\mathrm{opt}(F_q, m) = \Theta(\sqrt{\log m})$, and that $\mathrm{opt}(F_2, m) \le \sqrt{\log_2 m}/2 + O(1)$, matching a known lower bound of $\sqrt{\log_2 m}/2 - O(1)$ to within an additive constant. © 2000 Elsevier Science B.V. All rights reserved.

Identification of Discriminators of Hepatoma by Gene Expression Profiling Using a Minimal Dataset Approach

Thu, 01 Jan 2004 00:00:00 GMT

Title: Identification of Discriminators of Hepatoma by Gene Expression Profiling Using a Minimal Dataset Approach Authors: Neo, S.Y.; Leow, C.K.; Vega, V.B.; Long, P.M.; Islam, A.F.M.; Liu, E.T.; Ren, E.C.; Lai, P.B.S.

Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection

Sat, 24 May 2003 00:00:00 GMT

Title: Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection Authors: Ruan, Y.; Wei, C.L.; Ee, L.A.; Vega, V.B.; Thoreau, H.; Yun, S.T.S.; Chia, J.-M.; Ng, P.; Chiu, K.P.; Lim, L.; Tao, Z.; Peng, C.K.; Ean, L.O.L.; Lee, N.M.; Sin, L.Y.; Ng, L.F.P.; Ren, E.C.; Stanton, Lawrence Walter; Long, P.M.; Liu, E.T. Abstract: Background: The cause of severe acute respiratory syndrome (SARS) has been identified as a new coronavirus. Whole genome sequence analysis of various isolates might provide an indication of potential strain differences of this new virus. Moreover, mutation analysis will help to develop effective vaccines. Methods: We sequenced the entire SARS viral genome of cultured isolates from the index case (SIN2500) presenting in Singapore, from three primary contacts (SIN2774, SIN2748, and SIN2677), and one secondary contact (SIN2679). These sequences were compared with the isolates from Canada (TOR2), Hong Kong (CUHK-W1 and HKU39849), Hanoi (URBANI), Guangzhou (GZ01), and Beijing (BJ01, BJ02, BJ03, BJ04). Findings: We identified 129 sequence variations among the 14 isolates, with 16 recurrent variant sequences. Common variant sequences at four loci define two distinct genotypes of the SARS virus. One genotype was linked with infections originating in Hotel M in Hong Kong, the second contained isolates from Hong Kong, Guangzhou, and Beijing with no association with Hotel M (p<0.0001). Moreover, other common sequence variants further distinguished the geographical origins of the isolates, especially between Singapore and Beijing. Interpretation: Despite the recent onset of the SARS epidemic, genetic signatures are emerging that partition the worldwide SARS viral isolates into groups on the basis of contact source history and geography. These signatures can be used to trace sources of infection. 
In addition, a common variant associated with a non-conservative amino acid change in the S1 region of the spike protein suggests that immunological pressures might be starting to influence the evolution of the SARS virus in human populations.

Structural results about on-line learning models with and without queries

Fri, 01 Jan 1999 00:00:00 GMT

Title: Structural results about on-line learning models with and without queries Authors: Auer, P.; Long, P.M. Abstract: We solve an open problem of Maass and Turan, showing that the optimal mistake-bound when learning a given concept class without membership queries is within a constant factor of the optimal number of mistakes plus membership queries required by an algorithm that can ask membership queries. Previously known results imply that the constant factor in our bound is best possible. We then show that, in a natural generalization of the mistake-bound model, the usefulness to the learner of arbitrary "yes-no" questions between trials is very limited. We show that several natural structural questions about relatives of the mistake-bound model can be answered through the application of this general result. Most of these results can be interpreted as saying that learning in apparently less powerful (and more realistic) models is not much more difficult than learning in more powerful models.

Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003

Thu, 01 Jan 2004 00:00:00 GMT

Title: Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003 Authors: Vega, V.B.; Ruan, Y.; Liu, J.; Lee, W.H.; Wei, C.L.; Se-Thoe, S.Y.; Tang, K.F.; Zhang, T.; Kolatkar, P.R.; Ooi, E.E.; Ling, A.E.; Stanton, Lawrence Walter; Long, P.M.; Liu, E.T.