Today I read a paper titled “Bayesian Opponent Exploitation in Imperfect-Information Games”
The abstract is:
The two most fundamental problems in computational game theory are computing a Nash equilibrium and learning to exploit opponents given observations of their play (aka opponent exploitation).
The latter is perhaps even more important than the former: Nash equilibrium does not have a compelling theoretical justification in game classes other than two-player zero-sum, and furthermore for all games one can potentially do better by exploiting perceived weaknesses of the opponent than by following a static equilibrium strategy throughout the match.
The natural setting for opponent exploitation is the Bayesian setting where we have a prior model that is integrated with observations to create a posterior opponent model that we respond to.
The most natural, and a well-studied prior distribution is the Dirichlet distribution.
An exact polynomial-time algorithm is known for best-responding to the posterior distribution for an opponent assuming a Dirichlet prior with multinomial sampling in the case of normal-form games; however, for the case of imperfect-information games the best known algorithm is a sampling algorithm based on approximating an infinite integral without theoretical guarantees.
The main result is the first exact algorithm for accomplishing this in imperfect-information games.
We also present an algorithm for another natural setting where the prior is the uniform distribution over a polyhedron.