Time-consistent reinforcement learning for optimal consumption applied to SPY

The question of how much of a resource to consume today versus save for future consumption remains one of the most fundamental economic problems. It affects a wide array of consumer and business use cases, from budgeting and retirement planning to economic policy and climate change.

A central challenge in formulating new utility functions for stochastic consumption problems is time inconsistency, which can lead to non-robust reinforcement learning approximations of the optimal policy. Time consistency has emerged as a central concept in the theory of monetary risk measures.
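One common way to state time consistency for a dynamic utility functional (U_t) is the monotonicity condition below, written for two payoff streams X and Y that coincide up to time t. The notation is illustrative and is not taken from the paper.

```latex
U_{t+1}(X) \ge U_{t+1}(Y) \ \text{almost surely} \quad\Longrightarrow\quad U_t(X) \ge U_t(Y)
```

In words: if one stream is preferred to another in every state at time t+1, it is also preferred at time t, so today's ranking of plans agrees with the rankings that will be made tomorrow.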

Researchers from the Illinois Institute of Technology and EDHEC-Risk Institute present a class of least-squares reinforcement learning algorithms for optimal consumption under preferences that capture both the elasticity of intertemporal substitution and risk aversion. They cast the classical Epstein-Zin utility preferences into a dynamic utility functional framework and show that they exhibit time consistency.
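For reference, the textbook form of the Epstein-Zin recursion shows how the two preference parameters are separated. The notation is an assumption and may differ from the paper's: C_t is consumption, beta the discount factor, psi = 1/rho the elasticity of intertemporal substitution, and gamma the coefficient of relative risk aversion.

```latex
V_t = \Bigl[(1-\beta)\,C_t^{\,1-\rho} + \beta\,\bigl(\mathrm{CE}_t(V_{t+1})\bigr)^{1-\rho}\Bigr]^{\frac{1}{1-\rho}},
\qquad
\mathrm{CE}_t(V_{t+1}) = \Bigl(\mathbb{E}_t\bigl[V_{t+1}^{\,1-\gamma}\bigr]\Bigr)^{\frac{1}{1-\gamma}},
\qquad \rho \neq 1,\ \gamma \neq 1
```

Setting gamma = rho recovers time-separable CRRA expected utility. Because V_t is increasing in CE_t(V_{t+1}) and the certainty equivalent is monotone, a ranking of continuation values at t+1 carries over to t, which is the intuition behind the time-consistency property.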

Treating preferences as a dynamic utility function, they robustly approximate the optimal consumption problem as a discrete-time Markov Decision Process (MDP). They present a least-squares Q-learning algorithm suitable for non-linear monotone certainty equivalents and benchmark the convergence of its policy estimates on an optimal wealth consumption problem against Least-Squares Monte Carlo and binomial tree methods. Finally, the researchers demonstrate their least-squares Q-learning algorithm on an optimal consumption problem using SPDR S&P 500 ETF Trust (SPY) data.
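The flavor of backward least-squares Q-learning with a non-linear certainty equivalent can be sketched on a toy wealth consumption problem. This is not the authors' algorithm. It assumes a single asset with lognormal gross return, a grid of consumption fractions, Epstein-Zin preferences with hypothetical parameters beta, rho and gamma, and that all wealth is consumed at the horizon. To handle the power certainty equivalent, it regresses the transformed continuation value V^(1-gamma) on a basis function and then inverts the power, which is an assumption of this sketch. The single basis W^(1-gamma) is exact only because the toy problem is homothetic; a realistic implementation would need a richer basis. The function names are hypothetical.

```python
import numpy as np


def ez_aggregate(consumption, ce_next, beta, rho):
    """Epstein-Zin aggregator [(1-beta) C^(1-rho) + beta CE^(1-rho)]^(1/(1-rho)), rho != 1."""
    return ((1.0 - beta) * consumption ** (1.0 - rho)
            + beta * ce_next ** (1.0 - rho)) ** (1.0 / (1.0 - rho))


def lsq_q_learning(T=5, n_samples=5000, beta=0.96, rho=2.0, gamma=5.0,
                   mu=0.05, sigma=0.2, actions=None, seed=0):
    """Hypothetical backward least-squares Q-learning on a toy consumption MDP.

    State: wealth W. Action: fraction a of wealth consumed now; the rest is
    invested in one asset with lognormal gross return R (an assumption).
    At the horizon all wealth is consumed, so V_T(W) = W.
    Returns the greedy consumption fraction for each t = 0..T-1.
    """
    rng = np.random.default_rng(seed)
    if actions is None:
        actions = np.linspace(0.05, 0.95, 19)
    k_next = 1.0                      # V_{t+1}(W) = k_next * W; k_T = 1
    policy = np.empty(T)
    for t in range(T - 1, -1, -1):
        q_scale = np.empty(len(actions))
        for i, a in enumerate(actions):
            # Sample states and one-step returns for this action.
            w = rng.uniform(0.5, 2.0, n_samples)
            r = np.exp(mu - 0.5 * sigma**2 + sigma * rng.standard_normal(n_samples))
            v_next = k_next * (1.0 - a) * w * r
            # Regress the transformed target V_{t+1}^(1-gamma) on the basis
            # W^(1-gamma); this basis is exact only because the toy is homothetic.
            y = v_next ** (1.0 - gamma)
            phi = (w ** (1.0 - gamma))[:, None]
            coef, *_ = np.linalg.lstsq(phi, y, rcond=None)
            # Invert the power: CE_t(V_{t+1}) is estimated as ce_scale * W.
            ce_scale = coef[0] ** (1.0 / (1.0 - gamma))
            # By degree-1 homogeneity, Q_t(W, a) = W * ez_aggregate(a, ce_scale).
            q_scale[i] = ez_aggregate(a, ce_scale, beta, rho)
        best = int(np.argmax(q_scale))
        policy[t] = actions[best]
        k_next = q_scale[best]        # greedy value: V_t(W) = max_a Q_t(W, a)
    return policy


print(lsq_q_learning())
```

Regressing V^(1-gamma) rather than V itself keeps the least-squares step an estimate of a conditional expectation, which is what the power certainty equivalent requires; the non-linearity is applied only afterwards. With sigma = 0 the regression is exact, which gives a simple check against brute-force backward induction over the same action grid.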

The least-squares Q-learning algorithm is general enough to approximate a wide class of optimal consumption problems. Combined with other control variables, the approach is expected to be relevant to a broad class of wealth management problems where a model-free approach and the ability to express a client’s elasticity of intertemporal substitution are important. Robo-advisor applications, for example, could use it to customize solutions and financial products in anticipation of different interest rate regimes and stock market environments.
