https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/Head
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://www.nanopub.org/nschema#hasAssertion
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/assertion
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://www.nanopub.org/nschema#hasProvenance
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/provenance
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://www.nanopub.org/nschema#hasPublicationInfo
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/pubinfo
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.nanopub.org/nschema#Nanopublication
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/assertion
https://doi.org/10.1145/3712256.3726452
http://purl.org/dc/terms/creator
https://orcid.org/0000-0001-9487-5622
https://doi.org/10.1145/3712256.3726452
http://purl.org/dc/terms/publisher
https://ror.org/021nxhr62
https://doi.org/10.1145/3712256.3726452
http://purl.org/dc/terms/subject
http://edamontology.org/topic_3316
https://doi.org/10.1145/3712256.3726452
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
https://w3id.org/fair/ff/terms/article
https://doi.org/10.1145/3712256.3726452
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
https://w3id.org/fdof/ontology#FAIRDigitalObject
https://doi.org/10.1145/3712256.3726452
http://www.w3.org/2000/01/rdf-schema#comment
Ant Colony Optimization (ACO) has served for decades as a widely used metaheuristic algorithm for solving combinatorial optimization problems. Since its initial construction, ACO has seen a wide variety of modifications and connections to Reinforcement Learning (RL). Substantial parallels can be seen as early as 1995 with Ant-Q's relationship with Q-learning, through 2022 with ADACO's connection with Policy Gradient. In this work, we describe ACO, more specifically the Stochastic Gradient Descent ACO algorithm (ACOSGD), explicitly as an off-policy Policy Gradient (PG) method. We also incorporate experience replay into several ACO algorithm variants, including AS, MaxMin-ACO, ACOSGD, ADACO, and our two policy gradient-based versions, PGACO and PPOACO, drawing the connection to elitist ACO strategies. We show that our implementation of PG in ACO with experience replay and a baselined reward update strategy, applied to eight TSP problems of varying sizes, performs competitively with both fundamental ACO and SGD-based ACO versions. We also show through an ablation study that the replay buffer seems to unilaterally improve the performance of ACO algorithms.
https://doi.org/10.1145/3712256.3726452
http://www.w3.org/2000/01/rdf-schema#label
Ant Colony Optimization with Policy Gradients and Replay
https://doi.org/10.1145/3712256.3726452
https://w3id.org/fdof/ontology#hasMetadata
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
https://doi.org/10.1145/3712256.3726452
https://www.w3.org/ns/dcat#contactPoint
john.sheppard@montana.edu
https://doi.org/10.1145/3712256.3726452
https://www.w3.org/ns/dcat#endDate
July 13 2025
https://doi.org/10.1145/3712256.3726452
https://www.w3.org/ns/dcat#startDate
2024
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/provenance
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0009-0008-8411-2742
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/pubinfo
https://orcid.org/0009-0008-8411-2742
http://xmlns.com/foaf/0.1/name
Emily Regalado
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://purl.org/dc/terms/created
2026-04-30T21:39:47.426Z
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://purl.org/dc/terms/creator
https://orcid.org/0009-0008-8411-2742
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://purl.org/dc/terms/license
https://creativecommons.org/licenses/by/4.0/
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://purl.org/nanopub/x/introduces
https://doi.org/10.1145/3712256.3726452
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
http://purl.org/nanopub/x/wasCreatedAt
https://nanodash.knowledgepixels.com/
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
https://w3id.org/np/o/ntemplate/wasCreatedFromProvenanceTemplate
https://w3id.org/np/RA7lSq6MuK_TIC6JMSHvLtee3lpLoZDOqLJCLXevnrPoU
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
https://w3id.org/np/o/ntemplate/wasCreatedFromPubinfoTemplate
https://w3id.org/np/RACJ58Gvyn91LqCKIO9zu1eijDQIeEff28iyDrJgjSJF8
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
https://w3id.org/np/o/ntemplate/wasCreatedFromPubinfoTemplate
https://w3id.org/np/RAukAcWHRDlkqxk7H2XNSegc1WnHI569INvNr-xdptDGI
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
https://w3id.org/np/o/ntemplate/wasCreatedFromTemplate
https://w3id.org/np/RArM5GTwgxg9qslGX-XiQ-KTTUwdoM0KB1YqmT4GqTizA
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig
http://purl.org/nanopub/x/hasAlgorithm
RSA
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig
http://purl.org/nanopub/x/hasPublicKey
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxzr6UBGMW6c8tegz0babaledWUEQ0PLDE4tp7Iinbe2DZtAtY5JUptKYuStWDZx+QER4808P8dejNWRnBDzgthYJm/AyNSXflHSJhz2+NC+h7RylOLxbwLEQocmyKKiYxa2gT85m6ajVL2M6TnfG67nnK+K2f7iCGL6wYXRITD1q+7+5SWqBdDXIV921W4IKWaD2GJk+NRBoOqQhbsrk8Tn5XsNd7DMYVHk47oMDGbeBnrOIoRPsbBgAcoCsxxhiB9yN6Lf8EUbnlXVEDzJuZk048L1BDZL+6nkA8btTQGP2ijUFWA7rTrod3LjUDQWLZS95njjl867dtmv/znYkzwIDAQAB
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig
http://purl.org/nanopub/x/hasSignature
QK0Uq0dM8EDClZWwK1iypzM5Jofx7eS22L4Yyk8y1QSVx7lJke+W4p4J+YgX6SyQ5ArHEcpoJHzdiV/fM2BzLoBO5d4TqI2fXMpyAdEa3MCZBkv2VnG7G27xSBbEEuYQQfKCdCuLpxFTUfq7u6U9225ODch4R53l2xXGGJPhzvwuwAFxphAzJcrDZo8NzhyHbYq3Mp7Y0FZUbbAF6GBwK/qxrRVuUNuhVE6+EMSo9o3cATE/pb5B5YMkOSY2GYfsThybCKX0FETh5T5L8pp4AY3kA8aCW42ZpH0511DkuMpDNvyDArvBmj85jLc7wJaJPV8n2NtpbChXFOrjlMWIug==
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig
http://purl.org/nanopub/x/hasSignatureTarget
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78
https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig
http://purl.org/nanopub/x/signedBy
https://orcid.org/0009-0008-8411-2742