As a branch of machine learning, reinforcement learning is an approach to training a machine to find an optimal policy for a stochastic control system without explicitly building a model of the system. In this setting, the search for an optimal policy is organized around the search for the optimal value function.
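To make "without explicitly building a model" concrete, here is a minimal sketch of tabular Q-learning, one standard model-free algorithm. The learner never sees transition probabilities; it only calls a `step` function and uses each sampled transition to improve its estimate of the optimal action-value function, from which the policy is read off greedily. The two-state environment, its rewards, and all parameter values are hypothetical and chosen purely for illustration.

```python
import random


def step(state, action, rng):
    """Hypothetical two-state environment; the learner only sees its samples."""
    # Action 1 ("switch") aims for the other state, action 0 ("stay") aims to
    # remain; with probability 0.1 the transition goes the other way.
    intended = 1 - state if action == 1 else state
    next_state = intended if rng.random() < 0.9 else 1 - intended
    # Reward 1 only for staying in state 1.
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    return next_state, reward


def q_learning(n_steps=50_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate the optimal action-value function from samples."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: q[state][a])
        next_state, reward = step(state, action, rng)
        # Bellman optimality target built from a single sampled transition.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q = q_learning()
# The policy is read off the learned value function greedily.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(policy)
```

With these settings the greedy policy should be to switch out of state 0 and stay in state 1; the point is that this policy emerges from the value estimates alone, with no model of the dynamics.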
Finance is a particularly rich area for this optimal-policy framework, because a number of central problems in finance can be cast as the search for an optimal policy. One example is a derivative that can be priced by replication: its price is given by the value function of the dynamic replicating portfolio strategy.
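In the frictionless case this value function can be computed by backward induction. The sketch below assumes a recombining binomial tree with constant up and down factors `u`, `d` and a constant per-step rate `r` (all hypothetical inputs). At each node it solves for the stock-and-bond portfolio that matches the claim's values at the two successor nodes; the cost of that portfolio is the node's value.

```python
def replication_price(s0, u, d, r, n_steps, payoff):
    """Price a European claim in a recombining binomial tree by replication.

    Working backward, the value at each node is the cost of the stock-and-bond
    portfolio that reproduces the claim's values at the two successor nodes.
    """
    growth = 1.0 + r  # gross risk-free return per step (assumed constant)
    # Terminal values: node j has j up-moves and n_steps - j down-moves.
    values = [payoff(s0 * u**j * d**(n_steps - j)) for j in range(n_steps + 1)]
    for step in range(n_steps - 1, -1, -1):
        new_values = []
        for j in range(step + 1):
            s = s0 * u**j * d**(step - j)
            v_up, v_down = values[j + 1], values[j]
            delta = (v_up - v_down) / (s * (u - d))              # shares of stock
            bond = (u * v_down - d * v_up) / ((u - d) * growth)  # risk-free holding
            new_values.append(delta * s + bond)
        values = new_values
    return values[0]


def call_payoff(s):
    """Hypothetical example claim: a call struck at 100."""
    return max(s - 100.0, 0.0)


print(replication_price(100.0, 1.1, 0.9, 0.0, 1, call_payoff))
print(replication_price(100.0, 1.1, 0.9, 0.0, 2, call_payoff))
```

For `u = 1.1`, `d = 0.9` and a zero rate, the one- and two-step prices come out at 5 and 5.25 up to floating-point rounding, matching the risk-neutral expectation with up-probability 0.5.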
Another fruitful area for reinforcement learning is optimal execution, including the Almgren–Chriss model and its extensions. The same techniques can be used to train agents that execute or hedge optimally. Part of Ritter's research examines the performance of the resulting strategies, and he finds that the optimization continues to hold in the presence of market impact, regardless of the particular model under consideration.
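For reference, the Almgren–Chriss model has a closed-form solution that a learned execution policy could be compared against. A trader liquidating `X` shares over horizon `T` in `N` steps minimizes expected cost plus `risk_aversion` times the variance of cost. The sketch below assumes linear temporary impact with coefficient `eta` and no permanent impact; under those assumptions the optimal holdings follow a hyperbolic-sine curve. The function name and parameter values are illustrative.

```python
import math


def almgren_chriss_trajectory(X, T, N, sigma, eta, risk_aversion):
    """Holdings x_0, ..., x_N when liquidating X shares over horizon T in N steps.

    Assumes arithmetic Brownian price noise with volatility sigma, linear
    temporary impact with coefficient eta, and no permanent impact.
    """
    tau = T / N
    if risk_aversion == 0:
        # Risk-neutral trader: sell at a constant rate (a straight line).
        return [X * (N - j) / N for j in range(N + 1)]
    kappa_tilde_sq = risk_aversion * sigma**2 / eta
    # Discrete-time urgency kappa solves (2 / tau^2) * (cosh(kappa * tau) - 1) = kappa_tilde^2.
    kappa = math.acosh(1.0 + 0.5 * kappa_tilde_sq * tau**2) / tau
    denom = math.sinh(kappa * N * tau)
    return [X * math.sinh(kappa * (N - j) * tau) / denom for j in range(N + 1)]


risk_averse = almgren_chriss_trajectory(1_000_000, 1.0, 10, 1.0, 1.0, 4.0)
risk_neutral = almgren_chriss_trajectory(1_000_000, 1.0, 10, 1.0, 1.0, 0.0)
print(risk_averse[5], risk_neutral[5])
```

Higher risk aversion front-loads the selling: halfway through the horizon, the risk-averse trader holds noticeably fewer shares than the straight-line schedule, trading extra impact cost for less exposure to price risk.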
Much of traditional finance theory rests on assumptions that do not hold in the real world. No-arbitrage pricing, for example, is typically derived in a frictionless world without transaction costs. The problem is that this analysis breaks down for real-world hedging: perfect replication requires continuous rebalancing, which becomes prohibitively expensive once every trade incurs a cost. Reinforcement learning algorithms, by contrast, can take these frictions and other details of the trading environment into account, and so can produce dynamic hedging strategies that are optimal even in a world with frictions.
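One way to expose a learning agent to frictions is to fold trading costs into its per-period reward. The sketch below is a hypothetical reward for an agent that is short one option and hedges with stock: it charges a proportional cost per share traded and applies a mean-variance style penalty `kappa` to the change in wealth. The functional form, the cost model, and all names are illustrative assumptions rather than a prescribed method.

```python
def hedging_reward(option_old, option_new, stock_old, stock_new,
                   shares_held, shares_next, cost_per_share, kappa):
    """One-period reward for an agent short one option and hedging with stock."""
    # P&L from the short option plus the stock position held over the period.
    pnl = -(option_new - option_old) + shares_held * (stock_new - stock_old)
    # Friction (assumed proportional): pay a cost on every share traded at rebalancing.
    cost = cost_per_share * abs(shares_next - shares_held)
    delta_w = pnl - cost
    # Mean-variance style reward: credit wealth gains, penalize squared changes.
    return delta_w - 0.5 * kappa * delta_w**2


# A perfect hedge with no rebalancing earns zero; rebalancing 0.1 shares costs something.
no_trade = hedging_reward(10.0, 10.5, 100.0, 101.0, 0.5, 0.5, 0.05, 1.0)
rebalance = hedging_reward(10.0, 10.5, 100.0, 101.0, 0.5, 0.6, 0.05, 1.0)
print(no_trade, rebalance)
```

Because costs enter the reward directly, an agent maximizing the sum of these rewards faces the trade-off that frictionless theory ignores: each rebalance reduces hedging error but pays a cost.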
Order execution also shows how closely these textbook problems mirror everyday trading. In the optimal execution problem of Almgren and Chriss, the goal is to execute an order of size X while optimizing a mean-variance form of expected utility, trading off the expected cost of execution against its variance. This may sound abstract, but many buy-side quant traders are in fact optimizing some form of mean-variance objective in the presence of trading costs. They are therefore solving a problem quite similar to optimal order execution, with the added complication that their return predictions must be weighed against the transaction costs of acting on them.
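A single-period, single-asset version of the buy-side problem shows how return predictions and trading costs interact. The sketch assumes a return forecast `alpha`, return variance `variance`, a risk-aversion coefficient, and a quadratic trading cost on the change from the current holding; all of these modeling choices and names are hypothetical. Setting the derivative of the objective to zero gives a closed-form target position.

```python
def mean_variance_objective(h, alpha, variance, risk_aversion, cost, current):
    """Expected return minus risk penalty minus quadratic trading cost."""
    return (alpha * h
            - 0.5 * risk_aversion * variance * h**2
            - 0.5 * cost * (h - current) ** 2)


def optimal_position(alpha, variance, risk_aversion, cost, current):
    """Closed-form maximizer of mean_variance_objective over h."""
    # First-order condition: alpha - risk_aversion*variance*h - cost*(h - current) = 0.
    return (alpha + cost * current) / (risk_aversion * variance + cost)


# Without costs the trader jumps to the Markowitz position alpha / (risk_aversion * variance);
# with costs the trader moves only part of the way from the current holding.
frictionless = optimal_position(0.02, 0.04, 2.0, 0.0, 0.0)
with_costs = optimal_position(0.02, 0.04, 2.0, 0.08, 0.0)
print(frictionless, with_costs)
```

Here the frictionless target is 0.25 while the cost-aware target is 0.125, half the distance from the current holding of zero. A reinforcement learning agent would face a multi-period version of this trade-off, where today's trade also affects tomorrow's costs.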