There's almost no happy ending for humans if AGI is born.
AGI is a control-engineering problem.
The solution to it may well be found not in software engineering, but in cybernetic theory and symbolics.
Symbolics is the more dangerous of the two paths, because the "easy" path of implementing control is the one we'll likely go for first (a rough sketch follows the list):
1. a secondary network interprets the primary network's state
2. it converts this into a symbolic, or embedded, representation
3. certain conditions cause it to kill or reset (in part or whole) the state of the primary network.
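A minimal sketch of that loop, with everything hypothetical: the "primary" and the interpreter are stand-in functions, and the predicates and thresholds are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def primary_step(state):
    """Stand-in for the primary network: evolves its internal state each cycle."""
    return np.tanh(state + 0.1 * rng.standard_normal(state.shape))

def interpret(state):
    """Stand-in for the secondary network: maps raw state to symbolic predicates.
    The mapping here is made up; in practice it would itself be a learned model."""
    return {
        "intends_harm": float(state[:4].mean()),        # invented "harm" score
        "seeks_escape": float(np.abs(state[4:]).max()),  # invented "containment" score
    }

# Rules fire on the interpretation, never on the raw state itself.
RULES = {"intends_harm": 0.9, "seeks_escape": 0.9}  # arbitrary thresholds

state = rng.standard_normal(8)
for cycle in range(1000):
    state = primary_step(state)
    symbols = interpret(state)
    if any(symbols[k] > RULES[k] for k in RULES):
        state = np.zeros_like(state)  # kill/reset the primary network's state
```

Nothing in the loop ever checks the raw state directly; every rule trusts interpret() to be faithful, and that trust is exactly the weak point described below.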
Under this scheme, it isn't impossible that the primary network will learn to fool the discriminator (much like a GAN), letting it do things it is otherwise explicitly not supposed to do.
This is also the same problem that non-semantic models encounter, such as column-based, cortical-like representations (e.g. Jeff Hawkins). Instead of rules being applied directly to the state, rules are only activated and applied upon an interpretation of the state. That means the state and the interpretation can always diverge just enough for something catastrophic to happen while remaining subthreshold for triggering any rule, e.g. "don't kill", "don't rig the stock market to bribe random bioresearchers into releasing plagues", "don't attempt to break containment", etc.
As long as the model is a black box and we don't have a good way of testing the accuracy of those interpretations, this will always be the danger.
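And the fooling doesn't need to be clever; it can fall out of ordinary optimization. Here is a toy illustration in PyTorch, assuming (purely for the example) that the watchdog's score leaks into the primary's training signal; the architectures, losses, and the name `watchdog` are all invented.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Invented toy models: "primary" maps an observation to an internal state;
# "watchdog" scores how rule-violating that state looks. The watchdog is frozen.
primary = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 8))
watchdog = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
for p in watchdog.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(primary.parameters(), lr=1e-2)
obs = torch.randn(64, 8)
goal = torch.randn(64, 8)  # stand-in for whatever the primary is really optimizing

for step in range(500):
    state = primary(obs)
    task_loss = ((state - goal) ** 2).mean()           # pursue its own objective...
    detection = torch.sigmoid(watchdog(state)).mean()  # ...while scoring as "safe"
    loss = task_loss + detection
    opt.zero_grad()
    loss.backward()
    opt.step()

# Gradient pressure pushes the primary toward states that still approach its goal
# but sit in the region the frozen watchdog reads as benign: the state and its
# interpretation drift apart, exactly the divergence described above.
```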
Hackers do this already. They evade security heuristics by piggybacking a payload across many cycles/packets. Changing a single bit inconspicuously over billions of cycles or packets is an "easy mode" solution for delivering a payload covertly through secure channels.
If we can do it, AGI can do it better and more efficiently.
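For intuition, the "single bit per packet" idea is simple enough to show in a few lines. This is a deliberately harmless toy with no real networking or protocol; the packet fields are invented, and the point is only that no individual packet carries anything a per-packet heuristic would flag.

```python
import random

random.seed(0)
message = b"hi"
# Split the payload into individual bits, least-significant bit first.
bits = [(byte >> i) & 1 for byte in message for i in range(8)]

# "Sender": each packet looks like ordinary telemetry plus one low-order bit.
packets = [{"seq": i, "reading": random.random(), "flag": bit}
           for i, bit in enumerate(bits)]

# "Receiver": reassemble the scattered bits back into bytes.
recovered = bytearray()
for i in range(0, len(bits), 8):
    chunk = [p["flag"] for p in packets[i:i + 8]]
    recovered.append(sum(b << j for j, b in enumerate(chunk)))

assert bytes(recovered) == message  # the payload only exists in the aggregate
```

Only the aggregate carries the message, which is why inspection that looks at one packet (or one cycle of a network's state) at a time misses it.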