J.P. Morgan: reinforcement learning in electronic trading

The globalization of asset trading, the emergence of ultrafast information technology, and lightning-fast communications have made it impossible for humans to compete efficiently in routine, low-level decision making. Today most micro-level trading decisions in equities and electronic futures contracts are made by algorithms: they decide where to trade, at what price, and in what quantity.

In a recent paper, a research team from J.P. Morgan outlines the idiosyncrasies of neural information processing and machine learning in quantitative finance. The team also presents some of the approaches it takes towards solving the fundamental challenges the field faces.

Data modelling culture

This culture is characterized by the belief that nature (and financial markets) can be described as a black box containing a relatively simple model that actually generates the observational data. The task of quantitative finance is to find a plausible functional approximation of this data-generating process, a quantitative model, and to extract its parameters from the data. The output of the model is then fed into quantitative decision-making processes. The complexity of markets and of the behaviour of market participants presents the main challenge to the data modelling culture: simple models do not necessarily capture all essential properties of the environment. One can argue that simple models often give a false sense of certainty and are for this reason prone to abject failures.
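
As a minimal illustration of this culture (a sketch, not taken from the paper), one might posit a simple square-root market-impact model for hypothetical trade data and extract its single parameter by least squares:

```python
import numpy as np

# Hypothetical observations: child order sizes and the price moves they caused.
rng = np.random.default_rng(0)
order_size = rng.uniform(100, 10_000, size=500)                       # shares
price_move = 0.002 * np.sqrt(order_size) + rng.normal(0, 0.01, 500)  # price units

# Data modelling culture: posit a simple generating process,
#   move = k * sqrt(size) + noise,
# and extract its single parameter k from the data (OLS through the origin).
x = np.sqrt(order_size)
k = (x @ price_move) / (x @ x)

print(f"estimated impact coefficient k = {k:.5f}")
# k then feeds a downstream decision rule, e.g. capping order size so that
# the predicted impact k * sqrt(size) stays below a tolerance.
```

The appeal is interpretability: the single parameter has a direct economic reading. The danger is exactly the one described above: if the true process is not square-root shaped, the model fits anyway and fails silently.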

Machine learning culture

The machine learning culture takes an agnostic approach to the question of whether nature and financial markets are simple. Researchers have good reason to suspect that they are not: empirically, the world of finance looks more Darwinian than Newtonian. It is constantly evolving, and observed processes, including trading in electronic markets, are best described as emergent behaviours rather than data-generating machines. In the machine learning culture, complex and sometimes opaque functions are used to model the observations, with no claim that these functions reveal the nature of the underlying processes. As in the data modelling culture, models are built and their output is fed into decision-making processes. But complex models are prone to failures as well: the risk of model failure increases with complexity.
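
Continuing the toy impact example above (again a hypothetical sketch, not the paper's method), the machine learning culture would replace the hand-chosen square-root form with a flexible, opaque approximator, here a small neural network trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(100, 10_000, size=(500, 1))
y = 0.002 * np.sqrt(x) + rng.normal(0, 0.01, (500, 1))

# Normalise inputs so gradient descent behaves.
xn = (x - x.mean()) / x.std()

# Machine learning culture: no assumed functional form; a small neural
# network (one hidden tanh layer) learns the mapping from the data.
W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(2000):
    h = np.tanh(xn @ W1 + b1)        # hidden layer
    pred = h @ W2 + b2               # network output
    err = pred - y
    loss = (err ** 2).mean()
    # Backpropagation by hand (full batch).
    g_pred = 2 * err / len(y)
    gW2 = h.T @ g_pred; gb2 = g_pred.sum(0)
    g_h = (g_pred @ W2.T) * (1 - h ** 2)
    gW1 = xn.T @ g_h; gb1 = g_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(f"final training MSE: {loss:.6f}")
# The fit can be good, yet the weights say nothing interpretable about *why*
# impact grows with size -- the opaqueness the text describes.
```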

Algorithmic decision-making culture

Here researchers focus on decision-making rather than on model-building. They bypass the stage of learning "how the world works" and proceed directly to training electronic agents to distinguish good decisions from bad ones. The challenge of this approach lies in understanding and explaining the decisions the algorithmic agent takes, making sense of its policies, and ensuring that the agent produces sensible actions in all environments, including hypothetical ones. In the algorithmic decision-making culture, the agent learns that certain actions are bad because they lead to negative outcomes. But researchers still have to inject the values, rules, and constraints that steer the agent away from actions they view as prohibited (malum prohibitum) but which the agent cannot learn to avoid from its environment and history; a sketch of that interplay follows below.
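
Here is a minimal sketch of that interplay (the execution task, compliance rule, and numbers are hypothetical, not from the paper): a tabular Q-learning agent slicing a parent order learns that large child orders are bad from their quadratic impact cost, while a hard action mask encodes a prohibited-action rule the rewards alone would never teach it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy execution task: sell INV units over T steps; action = child order size.
INV, T = 4, 4
ACTIONS = [0, 1, 2]

def allowed(t, inv):
    """Hard constraint injected by the designer (malum prohibitum): a
    hypothetical compliance rule capping order size at 1 in the final step."""
    cap = 1 if t == T - 1 else 2
    return [a for a in ACTIONS if a <= min(cap, inv)]

def step(t, inv, a):
    """Environment: quadratic impact cost; unsold inventory penalised at the
    end. The agent *learns* these are bad from negative rewards alone."""
    inv -= a
    reward = -a ** 2
    if t == T - 1:
        reward -= 5 * inv    # leftover-inventory penalty
    return inv, reward

Q = np.zeros((T, INV + 1, len(ACTIONS)))
eps, alpha, gamma = 0.1, 0.2, 1.0

for episode in range(20_000):
    inv = INV
    for t in range(T):
        acts = allowed(t, inv)
        # Epsilon-greedy over *allowed* actions only: the mask, not the
        # reward signal, keeps the agent away from prohibited actions.
        if rng.random() < eps:
            a = acts[rng.integers(len(acts))]
        else:
            a = max(acts, key=lambda a_: Q[t, inv, a_])
        new_inv, r = step(t, inv, a)
        target = r
        if t < T - 1:
            target += gamma * max(Q[t + 1, new_inv, a_]
                                  for a_ in allowed(t + 1, new_inv))
        Q[t, inv, a] += alpha * (target - Q[t, inv, a])
        inv = new_inv

# Greedy policy after training: the learned schedule of child orders.
inv, plan = INV, []
for t in range(T):
    a = max(allowed(t, inv), key=lambda a_: Q[t, inv, a_])
    plan.append(a); inv -= a
print("learned schedule:", plan)   # expect roughly even slicing, e.g. [1, 1, 1, 1]
```

Note the division of labour: the impact and inventory penalties are learned from outcomes, while the final-step cap never produces a negative reward at all; it must be imposed as a rule, exactly the distinction the paragraph above draws.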

In the paper, the researchers show the interplay between an agent's constraints and rewards in one practical application of reinforcement learning. They also give an overview of specific challenges and of how they tackle them, drawing on computational resources and on the achievements of other AI teams across industry and academia.

Read the full paper
