https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/Head https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://www.nanopub.org/nschema#hasAssertion https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/assertion https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://www.nanopub.org/nschema#hasProvenance https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/provenance https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://www.nanopub.org/nschema#hasPublicationInfo https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/pubinfo https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.nanopub.org/nschema#Nanopublication https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/assertion https://doi.org/10.1145/3712256.3726452 http://purl.org/dc/terms/creator https://orcid.org/0000-0001-9487-5622 https://doi.org/10.1145/3712256.3726452 http://purl.org/dc/terms/publisher https://ror.org/021nxhr62 https://doi.org/10.1145/3712256.3726452 http://purl.org/dc/terms/subject http://edamontology.org/topic_3316 https://doi.org/10.1145/3712256.3726452 http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://w3id.org/fair/ff/terms/article https://doi.org/10.1145/3712256.3726452 http://www.w3.org/1999/02/22-rdf-syntax-ns#type https://w3id.org/fdof/ontology#FAIRDigitalObject https://doi.org/10.1145/3712256.3726452 http://www.w3.org/2000/01/rdf-schema#comment Ant Colony Optimization (ACO) has served as a widely-utilized metaheuristic algorithm for decades for solving combinatorial optimization problems. Since its initial construction, ACO has seen a wide variety of modifications and connections to Reinforcement Learning (RL). Substantial parallels can be seen as early as 1995 with Ant-Q's relationship with Q-learning, through 2022 with ADACO's connection with Policy Gradient. In this work, we describe ACO, more specifically the Stochastic Gradient Descent ACO algorithm (ACOSGD), explicitly as an off-policy Policy Gradient (PG) method. We also incorporate experience replay into several ACO algorithm variants, including AS, MaxMin-ACO, ACOSGD, ADACO, and our two policy gradient-based versions: PGACO and PPOACO, drawing the connection to elitist ACO strategies. We show that our implementation of PG in ACO with experience replay and a baselined reward update strategy applied to eight TSP problems of varying sizes performs competitively with both fundamental ACO and SGD-based ACO versions. We also show that the replay buffer seems to unilaterally improve the performance of ACO algorithms through an ablation study https://doi.org/10.1145/3712256.3726452 http://www.w3.org/2000/01/rdf-schema#label Ant Colony Optimization with Policy Gradients and Replay https://doi.org/10.1145/3712256.3726452 https://w3id.org/fdof/ontology#hasMetadata https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 https://doi.org/10.1145/3712256.3726452 https://www.w3.org/ns/dcat#contactPoint john.sheppard@montana.edu https://doi.org/10.1145/3712256.3726452 https://www.w3.org/ns/dcat#endDate July 13 2025 https://doi.org/10.1145/3712256.3726452 https://www.w3.org/ns/dcat#startDate 2024 https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/provenance https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/assertion http://www.w3.org/ns/prov#wasAttributedTo https://orcid.org/0009-0008-8411-2742 https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/pubinfo https://orcid.org/0009-0008-8411-2742 http://xmlns.com/foaf/0.1/name Emily Regalado https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://purl.org/dc/terms/created 2026-04-30T21:39:47.426Z https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://purl.org/dc/terms/creator https://orcid.org/0009-0008-8411-2742 https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://purl.org/dc/terms/license https://creativecommons.org/licenses/by/4.0/ https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://purl.org/nanopub/x/introduces https://doi.org/10.1145/3712256.3726452 https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 http://purl.org/nanopub/x/wasCreatedAt https://nanodash.knowledgepixels.com/ https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 https://w3id.org/np/o/ntemplate/wasCreatedFromProvenanceTemplate https://w3id.org/np/RA7lSq6MuK_TIC6JMSHvLtee3lpLoZDOqLJCLXevnrPoU https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 https://w3id.org/np/o/ntemplate/wasCreatedFromPubinfoTemplate https://w3id.org/np/RACJ58Gvyn91LqCKIO9zu1eijDQIeEff28iyDrJgjSJF8 https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 https://w3id.org/np/o/ntemplate/wasCreatedFromPubinfoTemplate https://w3id.org/np/RAukAcWHRDlkqxk7H2XNSegc1WnHI569INvNr-xdptDGI https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 https://w3id.org/np/o/ntemplate/wasCreatedFromTemplate https://w3id.org/np/RArM5GTwgxg9qslGX-XiQ-KTTUwdoM0KB1YqmT4GqTizA https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig http://purl.org/nanopub/x/hasAlgorithm RSA https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig http://purl.org/nanopub/x/hasPublicKey MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxzr6UBGMW6c8tegz0babaledWUEQ0PLDE4tp7Iinbe2DZtAtY5JUptKYuStWDZx+QER4808P8dejNWRnBDzgthYJm/AyNSXflHSJhz2+NC+h7RylOLxbwLEQocmyKKiYxa2gT85m6ajVL2M6TnfG67nnK+K2f7iCGL6wYXRITD1q+7+5SWqBdDXIV921W4IKWaD2GJk+NRBoOqQhbsrk8Tn5XsNd7DMYVHk47oMDGbeBnrOIoRPsbBgAcoCsxxhiB9yN6Lf8EUbnlXVEDzJuZk048L1BDZL+6nkA8btTQGP2ijUFWA7rTrod3LjUDQWLZS95njjl867dtmv/znYkzwIDAQAB https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig http://purl.org/nanopub/x/hasSignature QK0Uq0dM8EDClZWwK1iypzM5Jofx7eS22L4Yyk8y1QSVx7lJke+W4p4J+YgX6SyQ5ArHEcpoJHzdiV/fM2BzLoBO5d4TqI2fXMpyAdEa3MCZBkv2VnG7G27xSBbEEuYQQfKCdCuLpxFTUfq7u6U9225ODch4R53l2xXGGJPhzvwuwAFxphAzJcrDZo8NzhyHbYq3Mp7Y0FZUbbAF6GBwK/qxrRVuUNuhVE6+EMSo9o3cATE/pb5B5YMkOSY2GYfsThybCKX0FETh5T5L8pp4AY3kA8aCW42ZpH0511DkuMpDNvyDArvBmj85jLc7wJaJPV8n2NtpbChXFOrjlMWIug== https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig http://purl.org/nanopub/x/hasSignatureTarget https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78 https://w3id.org/np/RAbv_E_U02qVYAHDisjKEUhi7qQYFsjhGqL24QEbWRP78/sig http://purl.org/nanopub/x/signedBy https://orcid.org/0009-0008-8411-2742