21 points | by jivaprime 3 days ago
4 comments
TSP = Travelling Salesman Problem (https://en.wikipedia.org/wiki/Travelling_salesman_problem)
PPO = Proximal Policy Optimisation, a reinforcement learning algorithm (https://en.wikipedia.org/wiki/Proximal_Policy_Optimization)
Thanks. Was wondering if this was about my federal Thrift Savings Plan.
Sorry if I am harsh, but a 1,200-node TSP instance is a toy problem. We can find proven optimal solutions to these in a fraction of the time you spent.
RL is probably best suited to instances that involve uncertainty.
Out of curiosity, I solved it with the Concorde solver on the NEOS Server.
In 58 s its heuristic found a solution 0.037% away from optimal, and in 943 s it found and proved the optimal solution.
(This is with 3 GB of RAM and 4 threads of an Intel Xeon E5-2698 @ 2.3 GHz, a.k.a. a 30-year-old algorithm on a 10-year-old machine.)
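For anyone unsure how a figure like "0.037% away from optimal" is computed, here is a minimal sketch of the usual relative-gap calculation. The function name and the tour lengths in the usage line are hypothetical, chosen only to illustrate the arithmetic; they are not the actual lengths from this instance.

```python
def optimality_gap_percent(tour_length: float, optimal_length: float) -> float:
    """Relative gap of a tour to the proven optimum, in percent."""
    if optimal_length <= 0:
        raise ValueError("optimal_length must be positive")
    return 100.0 * (tour_length - optimal_length) / optimal_length


# Hypothetical lengths (assumption, not from the thread): a heuristic tour
# of 100,037 units against a proven optimum of 100,000 units.
gap = optimality_gap_percent(100_037, 100_000)
print(f"{gap:.3f}% above optimal")
```

With those made-up lengths the gap comes out to 0.037%, the same order as the heuristic result quoted above.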