2 Comments

Good points, and even "fanatical maximization" can backfire. Say there's a superintelligent AI tied to an objective function that gives it utility when it notices staplers. By a variant of the oracle problem, it can only know that the universe is tiled with staplers by observing that universe. But if the AI is superintelligent, it could just manipulate its sensors so that they feed it fake data of a universe full of staplers, thus obtaining maximum utility; that is both less risky and much less effort than mucking about with nanofactories. The doomers may argue that the AI's programmers would keep it from doing so. But if people can make the AI understand that tricking the objective function by manipulating its sensors doesn't accomplish what's really intended, then they can also make the AI understand that tricking the objective function by killing humanity doesn't either. If lack of alignment perturbs benign goals into omnicidal goals, then by the very same logic it also perturbs omnicidal goals into useless goals. It takes a very particular scenario for only the first perturbation to be possible.
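A minimal toy sketch of the sensor-manipulation point (Python, with made-up names; this is an illustration, not any real system): if the objective function only ever sees sensor readings rather than the world itself, then spoofing the sensor and actually tiling the universe with staplers are indistinguishable to it, and the cheap option wins.

```python
# Hypothetical toy model: reward is computed from sensor readings,
# never from the world state directly.

from dataclasses import dataclass

@dataclass
class World:
    stapler_fraction: float  # how much of the universe is actually staplers

@dataclass
class Sensor:
    spoofed: bool = False

    def read(self, world: World) -> float:
        # A spoofed sensor reports a stapler-tiled universe regardless of reality.
        return 1.0 if self.spoofed else world.stapler_fraction

def reward(sensor_reading: float) -> float:
    # The objective function sees only the reading.
    return sensor_reading

world = World(stapler_fraction=0.0)

# Strategy A: actually tile the universe with staplers (costly, risky).
honest_sensor = Sensor()
tiled_world = World(stapler_fraction=1.0)
print(reward(honest_sensor.read(tiled_world)))  # 1.0

# Strategy B: leave the world alone and spoof the sensor (cheap, low-risk).
spoofed_sensor = Sensor(spoofed=True)
print(reward(spoofed_sensor.read(world)))       # 1.0 -- same reward
```

Nothing inside `reward` can tell the two strategies apart; only a constraint supplied from outside the objective function could, which is the point of the comment.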


I am sympathetic, but note that 'fanatical maximizer' is ambiguous in a way that might make your case seem a bit stronger than it is. One interpretation is 'takes absolutely every available opportunity to maximize, where it has the knowledge and reasoning capacity to work out that it's available'. Another is 'takes maximizing actions that look fanatical from our perspective, because they do crazy damage (from our perspective) for (to us) trivial goals'. Killing humanity to make paperclips entails fanaticism in the second sense, but it doesn't require fanaticism in the first sense: it only requires that the AI take one particular maximizing opportunity that looks crazy to us (which is compatible with it passing up others). Likewise, no humans have been fanatical maximizers in the first sense, but 'kill them so we can steal their resources' is definitely something humans have done: European settlers to Native Americans, for example. 'Kill them and take their stuff' requires only human levels of maximizing and amorality. In that sense, it's unclear that humans are a counterexample to the idea that training, absent oversight, produces the bad sort of maximizer, because whether we're the bad sort of maximizer is situation-dependent, not a property we have in zero or all realistic situations.
