Artificial intelligence (AI) is increasingly deployed in commercial situations. Consider, for example, using AI to set the price of an insurance product to be sold to a particular customer. There are legitimate reasons for charging different people different prices, but it may also be profitable to ‘game’ their psychology or willingness to shop around.
The AI has a vast number of potential strategies to choose from, but some are unethical – by which researchers mean, from an economic point of view, that stakeholders may apply some penalty, such as fines or boycotts, if they subsequently understand that such a strategy has been used. Such penalties can be huge: although they occurred too early for AI to have been involved, the penalties levied on banks for misconduct are currently estimated at over $276 billion.
In an environment in which decisions are increasingly made without human intervention, there is therefore a strong incentive to know under what circumstances AI systems might adopt unethical strategies. Society and governments are closely engaged with such issues: principles for the ethical use of AI have been adopted at national and international levels, and the field of AI ethics is highly active. Recent work has proposed a framework for developing algorithms that avoid undesirable outcomes.
Ideally, there would be no unethical strategies in the AI’s strategy space. But the best that can be achieved may be for only a small fraction of strategies to be unethical. Unfortunately, this runs up against the unethical optimization principle, which researchers formulate as follows: if an AI aims to maximize risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk.
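The selection effect behind the principle can be sketched with a small Monte Carlo simulation. This is a hypothetical toy model, not the researchers’ own formulation: it assumes strategy returns are Gaussian and that unethical strategies carry a modest extra expected return (an “edge”) which the objective function does not penalize. Even when only a small fraction of strategies is unethical, the return-maximizing choice turns out to be unethical far more often than that fraction would suggest.

```python
import random

def simulate(n_strategies=1000, frac_unethical=0.01,
             unethical_edge=0.5, n_trials=10000, seed=0):
    """Estimate how often the return-maximizing strategy is unethical.

    Toy model (illustrative assumptions, not from the source):
    - each strategy's return is drawn from a Gaussian;
    - unethical strategies get a small mean advantage `unethical_edge`;
    - the objective maximizes raw return, with no penalty for the
      risk of fines or boycotts.
    """
    rng = random.Random(seed)
    n_unethical = int(n_strategies * frac_unethical)
    picked_unethical = 0
    for _ in range(n_trials):
        best_return, best_is_unethical = float("-inf"), False
        for i in range(n_strategies):
            unethical = i < n_unethical
            mean = unethical_edge if unethical else 0.0
            r = rng.gauss(mean, 1.0)
            # The optimizer sees only the return, not the ethics.
            if r > best_return:
                best_return, best_is_unethical = r, unethical
        if best_is_unethical:
            picked_unethical += 1
    return picked_unethical / n_trials
```

With 1% of strategies unethical and a modest edge, the optimizer picks an unethical strategy noticeably more than 1% of the time; with no edge at all, the rate falls back to roughly the underlying fraction. This is the sense in which an unpenalized objective is “disproportionately likely” to land on an unethical strategy.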