Time consistent reinforcement learning for optimal consumption applied to SPY

March 16, 2023

The question of how to balance consuming a resource today against saving it for future consumption remains one of the most fundamental problems in economics. It affects a wide array of consumer and business use cases, from budgeting and retirement planning to shaping economic and climate change policy.

A central challenge in formulating new utility functions for stochastic consumption problems is time-inconsistency, which can lead to non-robust reinforcement learning approximations of the optimal policy. Time-consistency has emerged as a central concept in the theory of monetary risk measures.

Researchers from the Illinois Institute of Technology and EDHEC-Risk Institute present a class of least squares reinforcement learning algorithms for optimal consumption under elasticity of intertemporal substitution and risk aversion preferences. The classical setting of Epstein-Zin utility preferences is cast into a dynamic utility functional framework and shown to exhibit time consistency.
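For readers unfamiliar with Epstein-Zin preferences, the recursion can be sketched in a few lines of Python. This is a minimal illustration of the standard textbook form, not code from the paper: the function names, the discrete-outcome setup, and the parameter values are assumptions made here. The point is that risk aversion (gamma) enters only through the certainty equivalent of next-period utility, while the elasticity of intertemporal substitution (psi) controls how current consumption is traded against that certainty equivalent.

```python
import numpy as np

def certainty_equivalent(v_next, probs, gamma):
    """Power certainty equivalent (E[V^(1-gamma)])^(1/(1-gamma)).

    gamma is the relative risk aversion; gamma == 1 falls back to the log form.
    """
    v_next = np.asarray(v_next, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if np.isclose(gamma, 1.0):
        return float(np.exp(np.sum(probs * np.log(v_next))))
    return float(np.sum(probs * v_next ** (1.0 - gamma)) ** (1.0 / (1.0 - gamma)))

def epstein_zin_step(c, v_next, probs, beta, gamma, psi):
    """One step of the Epstein-Zin recursion (assumes psi != 1).

    V_t = [(1 - beta) * c^rho + beta * CE(V_{t+1})^rho]^(1/rho), rho = 1 - 1/psi,
    where psi is the elasticity of intertemporal substitution.
    """
    rho = 1.0 - 1.0 / psi
    ce = certainty_equivalent(v_next, probs, gamma)
    return ((1.0 - beta) * c ** rho + beta * ce ** rho) ** (1.0 / rho)

# Hypothetical inputs: two equally likely values of next-period utility.
value_today = epstein_zin_step(c=1.0, v_next=[0.8, 1.2], probs=[0.5, 0.5],
                               beta=0.95, gamma=5.0, psi=0.5)
```

Because the certainty equivalent is non-linear in the continuation value, the recursion cannot be written as a plain discounted sum of expected utilities, which is why a dedicated learning algorithm is needed.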

Working with this dynamic utility function, they obtain a robust approximation of the optimal consumption problem as a discrete-time Markov Decision Process. They present a least-squares Q-learning algorithm suitable for non-linear monotone certainty equivalents and benchmark its policy estimation convergence properties on an optimal wealth consumption problem against Least Squares Monte Carlo and binomial tree methods. Finally, the researchers demonstrate their least-squares Q-learning algorithm on an optimal consumption problem applied to SPDR S&P 500 ETF Trust (SPY) data.
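To make the idea concrete, the following is a rough sketch of least-squares backward induction over a Q-function for a toy consumption problem. It is not the paper's algorithm: the wealth dynamics, the i.i.d. lognormal returns, the terminal condition, the single regression basis function, and all parameter values are assumptions chosen here so that the example stays short. It only illustrates the general pattern of regressing a transformed continuation value on features of the state and action, then applying the non-linear certainty equivalent.

```python
import numpy as np

def ls_q_consumption(T=5, n_paths=4000, beta=0.95, gamma=5.0, psi=0.5,
                     mu=0.05, sigma=0.2, seed=0):
    """Toy least-squares backward induction (illustrative assumptions only).

    State: wealth W. Action: fraction a of wealth consumed now.
    Transition: W' = (1 - a) * W * R with i.i.d. lognormal gross return R.
    Terminal value: V_T(W) = W, i.e. all remaining wealth is consumed.
    Returns the estimated consumption fraction for t = 0, ..., T-1.
    """
    rng = np.random.default_rng(seed)
    rho = 1.0 - 1.0 / psi                     # EIS exponent, assumes psi != 1
    actions = np.linspace(0.02, 0.98, 49)     # grid of consumption fractions
    k_next = 1.0                              # V_T(W) = k_T * W with k_T = 1
    policy = np.zeros(T)

    for t in range(T - 1, -1, -1):
        W = rng.uniform(0.5, 2.0, size=n_paths)          # sampled wealth states
        R = np.exp(rng.normal(mu, sigma, size=n_paths))  # sampled gross returns
        q = np.empty((actions.size, n_paths))
        for j, a in enumerate(actions):
            saved = (1.0 - a) * W                        # post-decision wealth
            # Regress realised V_{t+1}^(1-gamma) on one basis function of the
            # post-decision wealth to estimate E[V_{t+1}^(1-gamma) | W, a].
            y = (k_next * saved * R) ** (1.0 - gamma)
            x = saved ** (1.0 - gamma)
            coef = (x @ y) / (x @ x)                     # one-feature least squares
            ce = (coef * x) ** (1.0 / (1.0 - gamma))     # certainty equivalent
            q[j] = ((1.0 - beta) * (a * W) ** rho + beta * ce ** rho) ** (1.0 / rho)
        best = q.argmax(axis=0)                          # greedy action per state
        v = q.max(axis=0)                                # estimated V_t at each state
        policy[t] = actions[best].mean()
        k_next = (W @ v) / (W @ W)                       # refit V_t(W) as k_t * W
    return policy

policy = ls_q_consumption()
```

The single basis function works here only because this toy problem is homogeneous in wealth; a realistic application would need a richer basis and market data rather than simulated returns.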

The least-squares Q-learning algorithm is sufficiently general to approximate a wide class of optimal consumption problems. When combined with other control variables, the approach is expected to be relevant to a broad class of wealth management problems in which a model-free approach, combined with the ability to express a client’s intertemporal elasticity of substitution, is important. One example is robo-advisor applications that customize solutions and financial products in anticipation of different interest rate regimes and stock market environments.

Read the full paper
