XClust: Clustering XML schemas for effective integration

Authors: Lee, M.L.; Yang, L.H.; Hsu, W.; Yang, X.
Department: Computer Science
Type: Conference Paper
Date issued: 2002
Repository record date: 2013-07-04

Citation: Lee, M.L., Yang, L.H., Hsu, W., Yang, X. (2002). XClust: Clustering XML schemas for effective integration. International Conference on Information and Knowledge Management, Proceedings: 292-299. ScholarBank@NUS Repository.
URL: https://scholarbank.nus.edu.sg/handle/10635/40561

Abstract: It is increasingly important to develop scalable integration techniques for the growing number of XML data sources. A practical starting point for the integration of large numbers of Document Type Definitions (DTDs) of XML sources would be to first find clusters of DTDs that are similar in structure and semantics. Reconciling similar DTDs within such a cluster will be an easier task than reconciling DTDs that are different in structure and semantics as the latter would involve more restructuring. We introduce XClust, a novel integration strategy that involves the clustering of DTDs. A matching algorithm based on the semantics, immediate descendents and leaf-context similarity of DTD elements is developed. Our experiments to integrate real world DTDs demonstrate the effectiveness of the XClust approach.

Keywords: Clustering; Data integration; Schema matching; XML schema