The question of how to balance consuming a resource today against saving it for future consumption remains one of the most fundamental problems in economics, with applications ranging from household budgeting and retirement planning to economic and climate policy.
A central challenge in formulating new utility functions for stochastic consumption problems is time-inconsistency, which can lead to non-robust reinforcement learning approximations of the optimal policy. Time-consistency has emerged as a central concept in the theory of monetary risk measures.
Researchers from the Illinois Institute of Technology and EDHEC-Risk Institute present a class of least squares reinforcement learning algorithms for optimal consumption under elasticity of intertemporal substitution and risk aversion preferences. The classical setting of Epstein-Zin utility preferences is cast into a dynamic utility functional framework and shown to exhibit time consistency.
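For readers unfamiliar with Epstein-Zin preferences, the sketch below shows one step of the textbook recursion, in which current consumption is combined with a power certainty equivalent of next-period utility. It is an illustration only, not the authors' formulation: the function names, the discrete set of next-period states, and the parameter symbols `beta` (discount factor), `gamma` (risk aversion) and `psi` (elasticity of intertemporal substitution) are assumptions made here, and the limiting cases `gamma == 1` and `psi == 1` are not handled.

```python
import numpy as np

def certainty_equivalent(v_next, probs, gamma):
    """Power certainty equivalent (E[V^(1-gamma)])^(1/(1-gamma)).

    Assumes gamma != 1 and strictly positive utilities (illustrative only).
    """
    if gamma == 1.0:
        raise ValueError("gamma == 1 (log case) is not handled in this sketch")
    v_next = np.asarray(v_next, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.dot(probs, v_next ** (1.0 - gamma)) ** (1.0 / (1.0 - gamma)))

def epstein_zin_step(c, v_next, probs, beta, gamma, psi):
    """One step of the textbook Epstein-Zin recursion.

    V_t = [(1-beta) c^rho + beta * CE(V_{t+1})^rho]^(1/rho), rho = 1 - 1/psi.
    Assumes psi != 1 (hypothetical helper, not from the paper).
    """
    if psi == 1.0:
        raise ValueError("psi == 1 is not handled in this sketch")
    rho = 1.0 - 1.0 / psi
    ce = certainty_equivalent(v_next, probs, gamma)
    return ((1.0 - beta) * c ** rho + beta * ce ** rho) ** (1.0 / rho)

# Two equally likely next-period utilities; higher gamma lowers the
# certainty equivalent, while psi controls the trade-off across time.
v_today = epstein_zin_step(c=1.0, v_next=[1.0, 3.0], probs=[0.5, 0.5],
                           beta=0.95, gamma=5.0, psi=1.5)
```

Keeping `gamma` and `psi` as separate arguments reflects the feature the summary emphasizes: risk aversion and intertemporal substitution can be set independently.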
Using this dynamic utility formulation, they obtain a robust approximation of the optimal consumption problem, posed as a discrete-time Markov Decision Process. They present a least-squares Q-learning algorithm suitable for non-linear monotone certainty equivalents and benchmark its policy estimation convergence properties on an optimal wealth consumption problem against Least Squares Monte Carlo and binomial tree methods. Finally, the researchers demonstrate their least-squares Q-learning algorithm on an optimal consumption problem applied to SPDR S&P 500 ETF Trust (SPY) data.
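To give a feel for the general idea of fitting Q-functions by least squares, here is a toy backward regression on simulated transitions. It is not the authors' algorithm: it swaps Epstein-Zin preferences for additive, undiscounted log utility so that the result can be checked against a known answer (consume a fraction 1/N of wealth when N consumption dates remain). The basis functions, the lognormal return model, the sampling ranges and the name `fit_policy` are all assumptions chosen for this illustration.

```python
import numpy as np

def features(w, a):
    """Regression basis in wealth w and consumption fraction a.

    Hand-picked for the log-utility toy problem (an assumption of this sketch).
    """
    w, a = np.broadcast_arrays(np.asarray(w, float), np.asarray(a, float))
    return np.stack([np.ones_like(w), np.log(w), np.log(a), np.log1p(-a)], axis=-1)

def fit_policy(n_steps=2, n_samples=5000, seed=0):
    """Backward least-squares Q-regression; returns greedy fractions per step."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.01, 0.99, 99)      # candidate consumption fractions
    thetas = [None] * n_steps
    next_value = np.log                     # final date: consume all wealth
    for t in reversed(range(n_steps)):
        # Sample states, exploratory actions and gross returns (all simulated).
        w = rng.uniform(0.5, 2.0, n_samples)
        a = rng.uniform(0.01, 0.99, n_samples)
        gross = np.exp(rng.normal(0.05, 0.2, n_samples))
        w_next = (w - a * w) * gross
        # Regression target: reward today plus fitted value of next-period wealth.
        y = np.log(a * w) + next_value(w_next)
        theta, *_ = np.linalg.lstsq(features(w, a), y, rcond=None)
        thetas[t] = theta
        # Value at this step = maximum of the fitted Q over the action grid.
        next_value = lambda x, th=theta: (
            features(x[:, None], grid[None, :]) @ th).max(axis=1)
    # Greedy action at each step (independent of wealth in this toy problem).
    return [float(grid[np.argmax(features(1.0, grid) @ th)]) for th in thetas]

fractions = fit_policy()
```

With two decision dates followed by a final date at which all wealth is consumed, the greedy fractions should land close to 1/3 and 1/2. Handling a non-linear certainty equivalent, as the paper's algorithm does, would require replacing the additive target in `y`, which this sketch does not attempt.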
The least-squares Q-learning algorithm is sufficiently general to approximate a wide class of optimal consumption problems. When combined with other control variables, the approach is expected to be relevant to a broad class of wealth management problems in which a model-free approach, together with the ability to express a client’s intertemporal elasticity of substitution, is important. One example is robo-advisor applications, which could then customize solutions and financial products in anticipation of different interest rate regimes and stock market environments.