Training AI Systems with Real-Time Human Input
NEWS
Artificial Intelligence (AI), in the guise of Deep Neural Networks (DNNs), is ever more present in industry, with myriad AI systems applied to all sorts of problems and all kinds of situations. As pattern-recognition algorithms, though—and pattern recognition is, in effect, what DNNs do—AI systems can run into many problems. The well-publicized instances in which AI algorithms return clearly biased outcomes are just one example, but it is an unmistakable one. When such outcomes take part in a decision-making process of some kind—e.g., in deciding whether to award a loan to an applicant—it is not hard to conclude that biased outcomes are manifestly unfair, and unfair decisions can easily produce harmful states of affairs. Moreover, the underlying algorithms that put inputs and outputs together are typically not explainable, as it is usually not possible to know how an algorithm reached the decision it has returned. AI vendors thus often find themselves between the proverbial rock and a hard place: AI systems may yield many benefits, but the outcomes are sometimes simply unacceptable. Enter Anthropic and AI Redefined, two startups that aim to bring about safer, more interpretable AI systems by involving humans and human input in the training process that AI systems undertake in their pattern-finding quest. The idea is that, by providing demonstrations and feedback, humans will make for better AI systems overall; the approach incorporates humans in "reinforcement learning" training sessions alongside AI agents (avatars of an AI system that a human user can interact with).
Human-Involved AI Can Make for Better AI Agents
IMPACT
How is this supposed to work, then? Reinforcement Learning (RL), one of the main paradigms in machine learning, is concerned with how intelligent agents ought to take actions in a given environment to maximize cumulative reward. It works by finding a balance between exploration (of the environment) and exploitation (of rewards), and to this end a task needs to have well-defined rewards, actions, and so on. Such an approach has been rather effective at dealing with games (chess, PC games, etc.), but it can be generalized to many other settings. Some AI researchers even believe that reinforcement learning can underpin human learning in some cases, though this is rather doubtful. More soberly, RL offers an appropriate setting in which to implement a human-AI collaborative enterprise where the goal is for the AI agent to adapt to humans—an undertaking that can be extrapolated to fields such as robotics, as in the case of collaborative robots (or cobots) that learn and adapt from their interactions with humans.
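To make the exploration/exploitation trade-off concrete, the following is a minimal, purely illustrative sketch of tabular Q-learning on a toy corridor task. The environment, reward values, and hyperparameters are assumptions chosen for illustration; they do not come from Cogment, Anthropic, or any specific product.

```python
import random

# Toy illustration only: tabular Q-learning on a 1-D corridor.
# The agent starts at position 0 and is rewarded only for reaching position 4.

N_STATES = 5          # positions 0..4; position 4 ends the episode
ACTIONS = [-1, +1]    # step left or step right
EPSILON = 0.1         # exploration rate: how often a random action is tried
ALPHA = 0.5           # learning rate
GAMMA = 0.9           # discount factor for future rewards

q_table = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply an action; reward of 1.0 only when the goal state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Exploration vs. exploitation: occasionally act at random,
        # otherwise pick the action with the highest learned value.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_table[(state, a)])
        next_state, reward, done = step(state, action)
        # Standard Q-learning update toward reward plus discounted future value.
        best_next = max(q_table[(next_state, a)] for a in ACTIONS)
        q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])
        state = next_state

# Print the greedy action learned for each position.
print({s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES)})
```

The point of the sketch is simply that everything hinges on a well-defined reward signal, and that reward signal is precisely the place where human demonstrations and feedback can be injected.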
One result is Cogment, AI Redefined's open-source platform for human-AI collaboration—a shared environment for humans and AI agents where humans provide input to AI agents during training sessions in a real-time, continuous fashion. This is a human-in-the-loop kind of training, and AI Redefined hopes that such human-involved RL will result in better AI. In a similar vein, there is the research conducted by Anthropic, an AI safety and research company that is squarely focused on building reliable, interpretable AI systems. The aim is to create AI that is "aligned"—that is, an AI that is helpful, honest, and harmless. To this end, Anthropic has recently released research (arXiv:2112.00861 [cs.CL]) on interactions between human users and large language models, which found that human feedback can improve such models; the aim, therefore, is to take maximal advantage of human feedback data. This work was motivated by the problem of technical AI alignment, with the specific goal of training a natural language agent that is helpful, honest, and harmless, but it may have broader impacts on AI—and on language models in particular—especially if progress in the market continues at its current rapid pace. It is worth noting that AI Redefined has recently joined Confiance.ai, a research project of the French government to create trustworthy AI systems, while Anthropic has raised US$124 million in Series A financing to build more reliable AI systems.
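As a rough illustration of how pairwise human feedback can be turned into a learned reward signal, loosely in the spirit of the human-feedback work referenced above, the sketch below fits a toy linear reward model to a single comparison between two responses. The feature function, example data, and update rule are illustrative assumptions only; they are not the actual method or code behind arXiv:2112.00861 or Cogment.

```python
import math

# Hypothetical sketch: learn a scalar "preference" reward from pairwise
# human feedback by pushing the reward of the human-preferred response
# above that of the rejected one (pairwise logistic loss).

def features(response: str) -> list:
    """Toy stand-in for a learned representation of a model response."""
    return [len(response) / 100.0, float(response.count("sorry"))]

weights = [0.0, 0.0]  # parameters of a linear reward model

def reward(response: str) -> float:
    return sum(w * f for w, f in zip(weights, features(response)))

def update_from_preference(preferred: str, rejected: str, lr: float = 0.1) -> None:
    """One gradient step on loss = -log(sigmoid(r_preferred - r_rejected))."""
    margin = reward(preferred) - reward(rejected)
    grad_scale = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
    fp, fr = features(preferred), features(rejected)
    for i in range(len(weights)):
        weights[i] -= lr * grad_scale * (fp[i] - fr[i])

# A single human comparison: the first response was judged more helpful.
update_from_preference("Here is a concise, sourced answer.", "sorry, cannot help")
print(weights)
```

In a real system, a model of this kind would be trained on many such comparisons and then used as the reward signal for further training of the agent; the sketch only shows the core idea of converting human judgments into an optimizable quantity.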
Toward Safer, Interpretable AI Systems—Maybe
RECOMMENDATIONS
But will this work at all? Though the underlying aim is a laudable one—and, if achieved, it is all but guaranteed to be a commercial success—the proposed approach may be a bit of a patchwork. After all, DNNs very often reproduce what is found in the inputs they are fed; they do not generate biases and the like ab ovo, as it were. Indeed, many of the reproduced biases originate in the human population. As such, human input may or may not provide the right guidance or feedback; much depends on the humans who take part in these training sessions and on whether they are representative of society's values at large. That this will turn out well is not a given, and in any case, what the right values or decisions are is a question probably best left to philosophers.
These human-AI hybrid systems may help make AI decisions more interpretable, a factor that will be very important for the AI legislation that the European Union is currently considering—a risk-based approach to the law that will require more explainable AI systems than are currently the norm. ABI Research has recently discussed such legislation (IN-6354), and we now add that this is probably the path to better and safer AI. That is, a human-involved AI system may well produce a more interpretable AI process that can then be properly evaluated against the relevant legislation and by enforcement bodies—and hopefully by a human who will judge the output of an AI agent in terms of the law and not in terms of their own biases and feelings.