Today I read a paper titled “Reinforcement Learning for Adaptive Routing”
The abstract is:
Reinforcement learning means learning a policy–a mapping of observations into actions–based on feedback from the environment.
The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment.
We present an application of gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication and compare the performance of this algorithm to other routing methods on a benchmark problem.