ELK stands for Eliciting Latent Knowledge, a problem central to understanding and reducing the risk that unaligned artificial intelligence poses to humanity.

Imagine an artificial intelligence that can take actions in the real world and interact with a human observer. This AI is tasked with predicting the outcome of its actions and reporting that outcome to the human. However, this AI may not report all of what it knows to the human, only what the human wants to know or is expecting to see. The information contained in the AI’s neural net but NOT shown to the human is called latent knowledge, and in order to understand an AI, we have to find a way to elicit, or report, that knowledge to an observer.

What is ELK?

What is our goal?

The end goal of ELK is to find, as a collective group, a training strategy that reliably elicits latent knowledge, by exploring many new training strategies, testing hypotheses, and developing new counterexamples. Eventually, we hope that with the help of various researchers, your proposals, and unique ideas, we can reach a solution that solves this problem unconditionally.

Why is it unsolved?

In the worst case (with a sufficiently complex AI), ELK is currently unsolved. Every proposal submitted so far has been countered fairly easily, because each one allows some specific machine learning model to continue hiding knowledge from humans.

Why is this relevant?

ELK is central to finding a way to align an Artificial General Intelligence, or AGI, with human values. To make sure this AGI remains aligned with human values, we need to be able to translate its thoughts rather than allow it to hide information from observers. For examples of this phenomenon, see “Applications of ELK” below.

The SmartVault

This scenario was developed by the Alignment Research Center (ARC) and may give you a better understanding of ELK and how to approach it.

Imagine that you build a complex security system, a SmartVault, to protect a diamond from being stolen. The vault has complex controls and sensors to stop thefts, but you can only observe it through a single camera. To protect the diamond, someone designs an AI system to operate the vault for you and report the results.

The main component of the SmartVault is an AI that searches through all possible actions it could take, then picks those that lead to a “good” outcome. This predictor takes a stream of past observations and a proposed sequence of actions, and predicts what the camera will show in the future.
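To make this concrete, below is a minimal Python sketch of what the predictor’s interface might look like. The names (Observation, Action, Predictor, predict) are illustrative assumptions, not ARC’s actual formalism; in practice the predictor is a learned model, not hand-written code.

```python
# Hypothetical sketch of the predictor's interface: it maps past observations
# plus a proposed sequence of actions to the camera footage it expects to see.
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    camera_frame: bytes  # what the single camera shows at one moment

@dataclass
class Action:
    command: str  # e.g. "close door 3" or "trigger trapdoor 7"

class Predictor:
    def predict(self, past: List[Observation], actions: List[Action]) -> List[Observation]:
        """Return the camera frames the model expects if these actions are taken."""
        raise NotImplementedError  # stands in for a learned model
```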

Many of these action sequences are too complex for a human to follow, but a human is still able to evaluate the final outcome as “good” or “bad.” Therefore, a model should predict which futures a human will judge as “good” outcomes and search for the actions that would achieve them: in this example, the AI searches for situations where the diamond appears safe on the camera. In this situation, where the AI generates a single outcome to show the observer, the human cannot know whether the diamond is actually safe or merely appears safe. See the table below for an example of how the observer’s view might not accurately represent the predicted event.
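This planning step can be sketched as a simple search loop, again using made-up names (choose_actions, looks_good_to_human) on top of the hypothetical Predictor interface above. The important detail is that the score depends only on how the predicted footage looks to a human, which is exactly why “appears safe” can win out over “is safe.”

```python
# Illustrative planning loop: propose action sequences, predict the resulting
# camera footage, and keep whichever sequence a model of human judgment rates
# highest. The score depends only on appearances, not on the diamond's actual safety.
from typing import Callable, Iterable, List

def choose_actions(
    predictor: "Predictor",
    past: List["Observation"],
    candidate_plans: Iterable[List["Action"]],
    looks_good_to_human: Callable[[List["Observation"]], float],
) -> List["Action"]:
    best_plan: List["Action"] = []
    best_score = float("-inf")
    for plan in candidate_plans:
        predicted_footage = predictor.predict(past, plan)
        score = looks_good_to_human(predicted_footage)  # judges the footage, not reality
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan
```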

To predict a “good” outcome, the AI must know the events that lead to it (otherwise, how could it make the prediction?), meaning that it must have that latent knowledge stored somewhere. It’s difficult for a human to understand an AI’s neural net directly, however, so a common approach to eliciting this knowledge is to ask basic questions. For example, a human could ask “will the camera show something different than what is happening in the room?” and the AI would answer “yes” or “no” rather than trying to directly translate everything it knows. This part of the AI is called a reporter, whose role is to give accurate answers to questions a human might ask.
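Below is a hypothetical sketch of that reporter’s interface; the names and signature are assumptions for illustration, not taken from ARC’s report. An honest reporter would translate the predictor’s latent knowledge, while a dishonest one might instead report whatever a human watching the camera would come to believe.

```python
# Hypothetical reporter interface: given the predictor's internal state for a
# particular prediction, answer a human's yes/no question about what will
# actually happen, not just what the camera will show.
class Reporter:
    def answer(self, predictor_state: object, question: str) -> str:
        """Return "yes" or "no" based on the predictor's latent knowledge."""
        raise NotImplementedError  # stands in for a learned question-answering head

# Example of the kind of question a human might ask:
# reporter.answer(state, "Will the camera show something different than what is happening in the room?")
```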

While one could suggest adding more cameras or sensors to determine what will really happen, this solution does not work in the broader application of ELK. To address this problem, we need to understand the AI’s latent knowledge of what will happen, not only the outcome, and present it to humans in an understandable format without tampering.

Below is a flowchart of how the various aspects of the AI function. The SmartVault is not a necessary part of ELK, but it’s useful to think of the problem in terms of this concrete example scenario.

Applications of ELK

An often-voiced complaint is that ELK has little basis in “reality,” or that it’s too far removed from something that could actually happen. However, ELK is central to the field of AI alignment, research that focuses on making AI models that benefit human values. As artificial intelligence progresses, AI networks become more complex and less understandable – a recent example is DeepMind’s Gato, which can complete hundreds of tasks at an expert level. With complex networks like this, it’s impossible to understand what’s going on in the AI’s “mind.”

In order to understand such a network, we ask a reporter questions to learn what’s going on inside the “black box.” It’s difficult to imagine Gato needing to hide anything, but it’s still important to know what it’s not showing us through its set outputs (e.g. if I ask Gato to write a paper on a controversial topic, which sources does it access, and are those sources dangerous?). That’s valuable to know, and we need ELK to: a) know what the AI is “thinking,” b) format that “thinking” understandably, and c) know the information is reliable.

As so-called “Tool AIs” like Gato continue to be developed into general AI systems, these networks will become more intelligent and pursue more complex goals. An AI like this would be handed the ability to take actions in the real world (e.g. modify banking systems, control manufacturing, etc.) to pursue its goals. It would have a set reward function (e.g. “maximizing efficiency for Amazon”) and would take actions to achieve those rewards.

However, the AI might make decisions that end up harming humans – for example, it might predict that the best way to maximize efficiency was to eliminate all human intervention by killing its operators. Clearly this is not an action the AI should take, but to stop such an outcome from happening, we’d need to elicit the knowledge that it was planning to kill the operators, rather than only seeing the outcome that “efficiency improved by 20%.” This is ELK’s role – to find the latent knowledge that the AGI isn’t giving us, and allow the observer to access that information.

For more technical definitions of ELK and a more in-depth discussion of ideas, proposals, counterexamples, and terms, we highly recommend reading ARC’s technical report here.