Please use this identifier to cite or link to this item: https://doi.org/10.1145/3406325.3451066
DC Field: Value
dc.title: Near-optimal learning of tree-structured distributions by Chow-Liu
dc.contributor.author: Bhattacharyya, Arnab
dc.contributor.author: Gayen, Sutanu
dc.contributor.author: Price, Eric
dc.contributor.author: Vinodchandran, NV
dc.date.accessioned: 2021-06-30T08:16:51Z
dc.date.available: 2021-06-30T08:16:51Z
dc.date.issued: 2021-06-15
dc.identifier.citation: Bhattacharyya, Arnab, Gayen, Sutanu, Price, Eric, Vinodchandran, NV (2021-06-15). Near-optimal learning of tree-structured distributions by Chow-Liu. Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, abs/2011.04144. ScholarBank@NUS Repository. https://doi.org/10.1145/3406325.3451066
dc.identifier.uri: https://scholarbank.nus.edu.sg/handle/10635/192525
dc.description.abstract: We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$ nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a $T$-structured distribution $Q$ such that $D(P\;||\;Q)$ is at most $\varepsilon$ more than the divergence achieved by the best possible tree-structured distribution for $P$. We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information with $\widetilde{O}(|\Sigma|^3 n\varepsilon^{-1})$ i.i.d. samples outputs an $\varepsilon$-approximate tree for $P$ with constant probability. In contrast, for a general $P$ (which may not be tree-structured), $\Omega(n^2\varepsilon^{-2})$ samples are necessary to find an $\varepsilon$-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables $X,Y,Z$ each over $\Sigma$, testing if $I(X; Y \mid Z)$ is $0$ or $\geq \varepsilon$ is possible with $\widetilde{O}(|\Sigma|^3/\varepsilon)$ samples. Finally, we show that for a specific tree $T$, with $\widetilde{O}(|\Sigma|^2 n\varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence by applying the add-1 estimator at each node.
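As a rough illustration of the procedure the abstract analyses (and not code taken from the paper), the sketch below runs the classical Chow-Liu algorithm with the plug-in estimator: it computes empirical pairwise mutual informations from i.i.d. samples and returns a maximum-weight spanning tree. The function names and the use of NumPy/NetworkX are illustrative assumptions; samples are assumed to be an m x n integer array over the alphabet {0, ..., |Sigma|-1}.

    # Minimal Chow-Liu sketch with the plug-in mutual information estimator.
    # Illustrative only; names and libraries are not from the paper.
    import itertools
    import numpy as np
    import networkx as nx

    def empirical_mutual_information(samples, i, j, alphabet_size):
        """Plug-in estimate of I(X_i; X_j) from the rows of `samples`."""
        m = samples.shape[0]
        joint = np.zeros((alphabet_size, alphabet_size))
        for row in samples:
            joint[row[i], row[j]] += 1.0
        joint /= m                      # empirical joint distribution of (X_i, X_j)
        p_i = joint.sum(axis=1)         # empirical marginal of X_i
        p_j = joint.sum(axis=0)         # empirical marginal of X_j
        mi = 0.0
        for a in range(alphabet_size):
            for b in range(alphabet_size):
                if joint[a, b] > 0:
                    mi += joint[a, b] * np.log(joint[a, b] / (p_i[a] * p_j[b]))
        return mi

    def chow_liu_tree(samples, alphabet_size):
        """Edges of a maximum-weight spanning tree under empirical mutual information."""
        n = samples.shape[1]
        g = nx.Graph()
        g.add_nodes_from(range(n))
        for i, j in itertools.combinations(range(n), 2):
            w = empirical_mutual_information(samples, i, j, alphabet_size)
            g.add_edge(i, j, weight=w)
        return list(nx.maximum_spanning_tree(g).edges())

Per the abstract, the remaining step of learning the closest $T$-structured distribution once the tree is fixed would use the add-1 (Laplace) estimator at each node rather than the raw empirical counts.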
dc.publisher: ACM
dc.source: Elements
dc.subject: cs.DS
dc.subject: cs.IT
dc.subject: cs.LG
dc.subject: math.IT
dc.type: Conference Paper
dc.date.updated: 2021-06-30T04:22:54Z
dc.contributor.department: DEPARTMENT OF COMPUTER SCIENCE
dc.description.doi: 10.1145/3406325.3451066
dc.description.sourcetitle: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing
dc.description.volume: abs/2011.04144
dc.published.state: Published
Appears in Collections: Staff Publications, Elements

Files in This Item:
2011.04144v1.pdf (Post-print, Adobe PDF, 397.06 kB, Open Access)