Please use this identifier to cite or link to this item:
Title: Identifying clusters from positive data
Authors: Case, J.
Jain, S. 
Martin, E.
Sharma, A.
Stephan, F. 
Keywords: Clustering
Hypothesis space
Inductive inference
Topological and geometric properties of clusterable classes
Turing degree
Issue Date: 2006
Citation: Case, J., Jain, S., Martin, E., Sharma, A., Stephan, F. (2006). Identifying clusters from positive data. SIAM Journal on Computing 36 (1) : 28-55. ScholarBank@NUS Repository.
Abstract: The present work studies clustering from an abstract point of view and investigates its properties in the framework of inductive inference. Any class S considered is given by a hypothesis space, i.e., numbering, A 0, A 1,... of nonempty recursively enumerable (r.e.) subsets of ℕ or ℚ k. A clustering task is a finite and nonempty set of r.e. indices of pairwise disjoint such sets. The class S is said to be clusterable if there is an algorithm which, for every clustering task I, converges in the limit on any text for ∪ i∈I A i to a finite set J of indices of pairwise disjoint clusters such that ∪ j∈J A j = ∪ i∈I A i. A class is called semiclusterable if there is such an algorithm which finds a J with the last condition relaxed to ∪ j∈J A j ⊇ ∪ i∈I A i. The relationship between natural topological properties and clusterability is investigated. Topological properties can provide sufficient or necessary conditions for clusterability, but they cannot characterize it. On the one hand, many interesting conditions make use of both the topological structure of the class and a well-chosen numbering. On the other hand, the clusterability of a class does not depend on which numbering of the class is used as a hypothesis space for the clusterer. These ideas are demonstrated in the context of naturally geometrically defined classes. Besides the text for the clustering task, clustering of many of these classes requires the following additional information: the class of convex hulls of finitely many points in a rational vector space can be clustered with the number of clusters as additional information. Interestingly, the class of polygons (together with their interiors) is clusterable if the number of clusters and the overall number of vertices of these clusters is given to the clusterer as additional information. Intriguingly, this additional information is not sufficient for classes including figures with holes. While some classes are unclusterable due to their topological structure, others are only computationally intractable. An oracle might permit clustering all computationally intractable clustering tasks but fail on some classes which are topologically difficult. It is shown that an oracle E permits clustering all computationally difficult classes iff E ≥ T K Λ E′ ≥ T K″. Furthermore, no 1-generic oracle below K and no 2-generic oracle permits clustering any class which is not clusterable without an oracle. © 2006 Society for Industrial and Applied Mathematics.
Source Title: SIAM Journal on Computing
ISSN: 00975397
DOI: 10.1137/050629112
Appears in Collections:Staff Publications

Show full item record
Files in This Item:
There are no files associated with this item.

Page view(s)

checked on Jan 13, 2019

Google ScholarTM



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.