|Title:||Distributed multivariate regression based on influential observations|
|Authors:||Yu, H.; Chang, E.-C.|
|Keywords:||Distributed data mining; Multivariate linear regression|
|Source:||Yu, H., Chang, E.-C. (2003). Distributed multivariate regression based on influential observations. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 679-684. ScholarBank@NUS Repository. https://doi.org/10.1145/956750.956839|
|Abstract:||Large-scale data sets are sometimes logically and physically distributed across separate databases. The difficulty in mining these data sets lies not just in their size but also in their distributed nature. The complication is that communicating all the data to a central database would be too slow. To reduce communication costs, one could compress the data during transmission. Another method is random sampling. We propose an approach for distributed multivariate regression based on sampling and discuss its relationship with the compression method. The central idea is motivated by the observation that, although communication is limited, each individual site can still scan and process all the data it holds. Thus it is possible for a site to communicate only its influential samples without seeing the data at other sites. We exploit this observation and derive a method that provides a tradeoff between communication cost and accuracy. Experimental results show that it performs better than both the compression method and random sampling. Copyright 2003 ACM.|
|Source Title:||Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Appears in Collections:||Staff Publications|