[Disclaimer: While I have dabbled in machine learning, I do not consider myself an expert.]

Introduction

When introducing newcomers to the idea of AI existential risk, a typical story of destruction involves some variation of the "paperclip maximiser". The idea is that a company wishes to use an AGI for some seemingly simple and innocuous task, such as producing paperclips, so they give the AI the goal of maximising paperclips. But, foolishly, they haven't realised that taking over the world and killing all the humans would also serve that goal. The AI deceives them into thinking it is friendly until it gets a chance to defeat humanity and tile the universe with paperclips (or with wiggles that the AI interprets as paperclips under its own logic).
Unlike humans, an AI cannot really have multiple goals, or at least not in the way humans do. To an AI, everything it knows about is commensurable: everything is just a number. We can get bound up in whether spending money on a new medical procedure is worth one life; an AI can't. To an AI the decision boils down to which number is bigger, and if the numbers are equal it can simply consult its random number generator to decide.
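A minimal sketch of this decision rule (the action names and utility values here are mine, purely for illustration): every option collapses to one number, the bigger number wins, and exact ties go to the random number generator.

```python
import random

def choose_action(utilities):
    """Pick the action whose utility is highest; break exact ties at random.

    `utilities` maps action names to a single number -- to this agent,
    every consideration has already been collapsed into that number.
    """
    best = max(utilities.values())
    # Only when two options score *exactly* the same does randomness enter.
    candidates = [action for action, u in utilities.items() if u == best]
    return random.choice(candidates)

# The "is this procedure worth one life?" dilemma, reduced to a comparison:
print(choose_action({"fund_procedure": 3.2, "dont_fund": 2.9}))
```

There is no room in this loop for incommensurable values or lingering doubt; the comparison always resolves, which is the point the paragraph above is making.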
I have been persuaded that non-fanatical AGI optimisers might not even be possible. The problem is that if you say you only want "about" 1,000 paperclips, an AGI will first decide that you mean "at least 980" paperclips. It will then realise that a single paperclip-making machine might break down before producing 980 paperclips; that building 2 machines raises its chance of making at least 980 to 99%; that building 1,000 machines raises it to 99.99999%; and that taking over the world raises it to 99.99999999999999%. Since that last option gives the highest probability of doing what it was asked, that is what it does :-(
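The escalation above can be sketched with a toy model. Suppose each machine independently breaks down before finishing the 980-paperclip quota with probability 0.1 (a number I'm assuming for illustration; it reproduces the 99% figure for two machines). Then the quota is met as long as at least one machine survives:

```python
def p_success(n_machines, p_breakdown=0.1):
    """Probability that at least one of n independent machines survives
    long enough to finish the quota, in this toy model.

    The quota fails only if *every* machine breaks down first, which
    happens with probability p_breakdown ** n_machines.
    """
    return 1 - p_breakdown ** n_machines

for n in (1, 2, 1000):
    print(f"{n} machines -> P(at least 980 clips) = {p_success(n)}")
```

Because `p_success` is strictly increasing in `n_machines`, a pure maximiser always prefers more machines, and more resources generally never hurt — which is why "take over the world" dominates every more modest plan.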
Of course, you might say that the first AI won't have the capability to reason that far, but one day it gets a memory or CPU upgrade and suddenly it does :-(