A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs

Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.knosys.2020.106244

DC Field	Value
dc.title	A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs
dc.contributor.author	YUJIAO HU
dc.contributor.author	YUAN YAO
dc.contributor.author	LEE WEE SUN
dc.date.accessioned	2020-10-13T01:17:30Z
dc.date.available	2020-10-13T01:17:30Z
dc.date.issued	2020-09-27
dc.identifier.citation	YUJIAO HU, YUAN YAO, LEE WEE SUN (2020-09-27). A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs. Knowledge-Based Systems 204 : 106244. ScholarBank@NUS Repository. https://doi.org/10.1016/j.knosys.2020.106244
dc.identifier.issn	09507051
dc.identifier.uri	https://scholarbank.nus.edu.sg/handle/10635/177387
dc.description.abstract	This paper proposes a learning-based approach to optimize the multiple traveling salesman problem (MTSP), which is one classic representative of cooperative combinatorial optimization problems. The MTSP is interesting to study because the problem arises in numerous practical applications, and efficient approaches to optimizing the MTSP can potentially be adapted to other cooperative optimization problems. However, the MTSP is rarely researched in the deep learning domain because of certain difficulties, including the huge search space, the lack of training data labeled with optimal solutions, and the lack of architectures that extract interactive behaviors among agents. This paper constructs an architecture consisting of a shared graph neural network and distributed policy networks to learn a common policy representation that produces near-optimal solutions for the MTSP. We use a reinforcement learning approach to train the model, removing the need for data labeled with ground truth. We use a two-stage approach, where reinforcement learning is used to learn an allocation of agents to vertices, and a regular optimization method is used to solve the single-agent traveling salesman problems associated with each agent. We introduce an S-samples batch training method to reduce the variance of the gradient, improving the performance significantly. Experiments demonstrate that our approach successfully learns a strong policy representation that outperforms integer linear programming and heuristic algorithms, especially on large-scale problems.
dc.language.iso	en
dc.publisher	Elsevier B.V.
dc.subject	multi-agent reinforcement learning
dc.subject	combinatorial optimization problems
dc.subject	multiple traveling salesman problems
dc.subject	graph neural networks
dc.subject	policy networks
dc.type	Article
dc.contributor.department	DEAN'S OFFICE (SCHOOL OF COMPUTING)
dc.description.doi	10.1016/j.knosys.2020.106244
dc.description.sourcetitle	Knowledge-Based Systems
dc.description.volume	204
dc.description.page	106244
dc.published.state	Published
Appears in Collections:	Staff Publications Elements
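
The two-stage approach described in the abstract (first allocate vertices to agents, then solve one single-agent TSP per agent) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the allocation is supplied by hand rather than produced by the learned policy, the brute-force solver stands in for the unspecified "regular optimization method", and the shared `depot` vertex and all function names are assumptions made only for illustration.

```python
import itertools
import math


def tour_length(points, order):
    """Length of the closed tour that visits `order` and returns to its start."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def solve_single_tsp(points, depot, cities):
    """Exact single-agent tour by brute force (assumption: only a few cities)."""
    if not cities:
        return [depot], 0.0
    best_order, best_len = None, math.inf
    for perm in itertools.permutations(cities):
        order = [depot, *perm]
        length = tour_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


def evaluate_allocation(points, depot, allocation, num_agents):
    """Stage 2: turn a city -> agent allocation into one (tour, length) per agent."""
    tours = []
    for agent in range(num_agents):
        cities = [city for city, owner in allocation.items() if owner == agent]
        tours.append(solve_single_tsp(points, depot, cities))
    return tours


# Hypothetical instance: vertex 0 is the assumed shared depot.
points = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1)]
# Stage 1 stand-in: a hand-written allocation in place of the learned policy.
allocation = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
tours = evaluate_allocation(points, depot=0, allocation=allocation, num_agents=2)
longest_tour = max(length for _, length in tours)
```

In a training loop, a quantity such as `longest_tour` could serve as the cost signal for the allocation policy; whether the paper uses exactly this objective is not stated in the abstract.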

Files in This Item:

File	Description	Size	Format	Access Settings	Version
MinMaxMTSPpreprint.pdf		1.9 MB	Adobe PDF	OPEN	Pre-print	View/Download
