Provably Safe Artificial General Intelligence via Interactive Proofs

Author(s):  
Kristen Carlson

Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation AGI₁ rapidly triggers a succession of more powerful AGIₙ that differ dramatically in their computational capabilities (AGIₙ ≪ AGIₙ₊₁). No proof exists that AGI will benefit humans, nor that any value-alignment method is sound. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2⁻¹⁰⁰). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGIₙ ↔ AGIₙ₊₁ interaction hazards to an acceptably low level.
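To make the Verifier/Prover mechanism concrete, here is a minimal sketch of a textbook interactive proof, the graph non-isomorphism protocol. It is not the method proposed in the abstract, only an illustration of how repeated random challenges shrink the chance of deception: when the Prover's claim is false, each round catches it with probability 1/2, so k rounds leave a deception probability of 2⁻ᵏ. All names (`make_graph`, `relabel`, `honest_prover`, `verifier_accepts`) are hypothetical, and the brute-force isomorphism test stands in for the Prover's greater computational power.

```python
import itertools
import random


def make_graph(edges):
    """Represent an undirected graph as a frozenset of 2-element frozensets."""
    return frozenset(frozenset(e) for e in edges)


def relabel(graph, perm):
    """Apply a vertex permutation (perm[old] = new) to every edge."""
    return frozenset(frozenset(perm[v] for v in edge) for edge in graph)


def isomorphic(g, h, n):
    """Brute-force isomorphism test over all n! relabelings (Prover-only power)."""
    return any(relabel(g, p) == h for p in itertools.permutations(range(n)))


def honest_prover(g0, g1, n):
    """Prover that answers which input graph a challenge was derived from."""
    def answer(challenge):
        return 0 if isomorphic(g0, challenge, n) else 1
    return answer


def verifier_accepts(g0, g1, n, prover, rounds, rng):
    """Accept the claim 'g0 and g1 are NOT isomorphic' only if the Prover
    identifies the Verifier's secret coin in every round."""
    for _ in range(rounds):
        coin = rng.randrange(2)              # secret choice of graph
        perm = list(range(n))
        rng.shuffle(perm)                    # secret random relabeling
        challenge = relabel((g0, g1)[coin], perm)
        if prover(challenge) != coin:
            return False                     # Prover caught: reject the claim
    return True


path = make_graph([(0, 1), (1, 2), (2, 3)])
star = make_graph([(0, 1), (0, 2), (0, 3)])
rng = random.Random(0)
print(verifier_accepts(path, star, 4, honest_prover(path, star, 4), 100, rng))
```

If the two graphs were in fact isomorphic, the challenge would carry no information about the coin, so no Prover, however powerful, could pass 100 rounds except with probability 2⁻¹⁰⁰.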

Philosophies ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 83


Impact ◽  
2019 ◽  
Vol 2019 (10) ◽  
pp. 30-32
Author(s):  
Tomoyuki Morimae

In cloud quantum computing, a classical client delegates quantum computing to a remote quantum server. An important property of cloud quantum computing is verifiability: the client can check the integrity of the server. Whether such classical verification of quantum computing is possible is one of the most important open problems in quantum computing. We tackle this problem from the viewpoint of quantum interactive proof systems. Dr Tomoyuki Morimae is part of the Quantum Information Group at the Yukawa Institute for Theoretical Physics at Kyoto University, Japan. He leads a team that is concerned with two main research subjects: quantum supremacy and the verification of quantum computing.


2021 ◽  
pp. 1-6
Author(s):  
Scott McLean ◽  
Gemma J. M. Read ◽  
Jason Thompson ◽  
P. A. Hancock ◽  
Paul M. Salmon

2021 ◽  
Vol 13 (1) ◽  
pp. 1-25
Author(s):  
Dmitry Itsykson ◽  
Alexander Okhotin ◽  
Vsevolod Oparin

The partial string avoidability problem is stated as follows: given a finite set of strings with possible “holes” (wildcard symbols), determine whether there exists a two-sided infinite string containing no substrings from this set, assuming that a hole matches every symbol. The problem is known to be NP-hard and in PSPACE, and this article establishes its PSPACE-completeness. Next, string avoidability over the binary alphabet is interpreted as a version of the conjunctive normal form satisfiability problem, where each clause has infinitely many shifted variants. Non-satisfiability of these formulas can be proved using variants of classical propositional proof systems, augmented with derivation rules for shifting proof lines (such as clauses, inequalities, polynomials, etc.). First, it is proved that there is a particular formula that has a short refutation in Resolution with a shift rule but requires classical proofs of exponential size. At the same time, it is shown that exponential lower bounds for classical proof systems can be translated to their shifted versions. Finally, it is shown that superpolynomial lower bounds on the size of shifted proofs would separate NP from PSPACE; a connection to lower bounds on circuit complexity is also established.
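The problem statement above can be illustrated with a small brute-force checker. This is only a sketch and not the article's algorithm: it enumerates every window of length m (the longest pattern), so it takes time exponential in m. It assumes holes are written as `?`, that the pattern set is non-empty, and that all function names are hypothetical. The idea is that a two-sided infinite avoiding string exists exactly when the graph of allowed length-m windows contains a cycle.

```python
from itertools import product


def matches_at(window, pattern, start, hole="?"):
    """True if pattern occurs in window at offset start; a hole matches any symbol."""
    return all(p == hole or p == window[start + i] for i, p in enumerate(pattern))


def contains_forbidden(window, patterns, hole="?"):
    """True if any pattern occurs anywhere inside the window."""
    return any(
        matches_at(window, p, s, hole)
        for p in patterns
        for s in range(len(window) - len(p) + 1)
    )


def is_avoidable(patterns, alphabet="01", hole="?"):
    """Decide whether some two-sided infinite string avoids every pattern."""
    m = max(len(p) for p in patterns)
    # Nodes are words of length m-1; each allowed window w of length m
    # contributes an edge from w[:-1] to w[1:].
    edges = {}
    for w in map("".join, product(alphabet, repeat=m)):
        if not contains_forbidden(w, patterns, hole):
            edges.setdefault(w[:-1], set()).add(w[1:])
    # Repeatedly drop nodes with no surviving successor; anything left
    # lies on or leads into a cycle, which yields a periodic infinite string.
    alive = set(edges)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if not (edges[u] & alive):
                alive.discard(u)
                changed = True
    return bool(alive)


print(is_avoidable(["00", "11"]))   # avoided by ...010101...
print(is_avoidable(["0?", "11"]))   # no 0 may be followed by anything, and 11 is banned
```

The first call prints `True` and the second prints `False`: forbidding `0?` rules out every 0, and the remaining all-ones string contains `11`.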

