Title: Distributed multivariate regression based on influential observations
Authors: Yu, H. 
Chang, E.-C. 
Keywords: Distributed data mining
Learning curve
Multivariate linear regression
Issue Date: 2003
Citation: Yu, H., Chang, E.-C. (2003). Distributed multivariate regression based on influential observations. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining : 679-684. ScholarBank@NUS Repository.
Abstract: Large-scale data sets are sometimes logically and physically distributed in separate databases. The issues of mining these data sets are not just their sizes, but also their distributed nature. The complication is that communicating all the data to a central database would be too slow. To reduce communication costs, one could compress the data during transmission. Another method is random sampling. We propose an approach for distributed multivariate regression based on sampling and discuss its relationship with the compression method. The central idea is motivated by the observation that, although communication is limited, each individual site can still scan and process all the data it holds. Thus it is possible for the site to communicate only influential samples without seeing data in other sites. We exploit this observation and derive a method that provides a tradeoff between communication cost and accuracy. Experimental results show that it is better than the compression method and random sampling. Copyright 2003 ACM.
Source Title: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
DOI: 10.1145/956750.956839
Appears in Collections: Staff Publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.