Advanced Technology Teaches Robots to Understand Human Intentions and Preferences for Safer Interaction

sumernow
Jun 17, 2026

Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed a technology that lets machines automatically learn human preferences and evaluation criteria from a limited number of video clips. The work is a step toward safer, more harmonious interaction between humans and robots, and it could widen the range of AI applications in real-world environments.

Professor Chang-Dee Yoo, a faculty member at the School of Electrical Engineering, unveiled the approach, named VOTP (Video-based Optimal TransPort Preference). It aims to give intelligent systems the ability to perceive human preferences without relying on vast quantities of pre-labeled data.

In recent years, AI has excelled at tasks such as text generation, image creation, and music composition. The field is now moving toward a phase known as "Physical AI," in which systems go beyond creating digital content and interact directly with the physical world. Prominent applications include industrial robots operating in hazardous environments, autonomous vehicles navigating complex traffic conditions, and surgical robots assisting doctors in highly delicate medical procedures.

Developing robot behavior has long faced a fundamental challenge: the AI must understand which behaviors humans consider acceptable and which they do not. Previously, developers collected tens of thousands of human evaluations, manually labeling each machine action as "appropriate" or "inappropriate."
That approach was slow, costly, and labor-intensive.

VOTP offers a more efficient and natural alternative, built on the observation that humans acquire new skills by watching a few illustrative examples. On that premise, the AI analyzes a small number of video clips showing successful and unsuccessful behaviors and extracts the criteria humans use to evaluate actions. The algorithm lets machines infer unstated human intentions and preferences. For instance, a surgical robot performing sutures, or an autonomous car navigating a crowded pedestrian intersection, can choose the best behavior from several available options based on an understanding of human expectations rather than by merely following rigid instructions.

Experiments across diverse conditions and tasks demonstrated the technology's effectiveness and its ability to apply what it learned to situations the system had not seen before.

VOTP also significantly reduces the cost of collecting data and obtaining human feedback. Intelligent systems no longer need colossal databases of evaluations; a limited set of high-quality video clips suffices, which speeds up development and eases financial burdens. The technology also opens the door to a wide range of applications, including automated handling systems in industrial facilities, humanoid robots, autonomous vehicles, smart production lines, drones, advanced surgical systems, and software agents that operate computers on behalf of users.

Researchers anticipate that VOTP will become a cornerstone of the next generation of physical AI systems, which rely on a precise understanding of human needs and preferences to make more accurate and effective decisions.
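
The article does not spell out how VOTP works beyond its name, so the following is only a rough illustration of the general idea described above: score a candidate behavior by how closely it matches a few successful clips versus a few unsuccessful ones, then pick the best-scoring option. It assumes a generic entropy-regularized optimal transport (Sinkhorn) distance, and it uses toy 2-D points in place of real video features. The names `ot_cost`, `preference_score`, and `make_clip` are hypothetical and are not taken from the KAIST work.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def ot_cost(xs, ys, reg=0.05, n_iters=300):
    """Entropy-regularized optimal transport cost between two clips.

    Each clip is a list of per-frame feature vectors; frames get uniform weight.
    This is a generic Sinkhorn sketch, not the published VOTP method.
    """
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    # Sinkhorn iterations: alternately rescale to match column and row marginals.
    for _ in range(n_iters):
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    # The transport plan entry is u[i] * kernel[i][j] * v[j].
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

def preference_score(clip, good_clips, bad_clips):
    """Higher when `clip` is closer to successful demos than to unsuccessful ones."""
    to_good = sum(ot_cost(clip, g) for g in good_clips) / len(good_clips)
    to_bad = sum(ot_cost(clip, b) for b in bad_clips) / len(bad_clips)
    return to_bad - to_good

def make_clip(slope, n_frames=5):
    """Toy stand-in for video features: 2-D points (t, slope * t), t in [0, 1]."""
    return [(k / (n_frames - 1), slope * k / (n_frames - 1)) for k in range(n_frames)]

# A handful of labeled demonstrations (assumed data, for illustration only).
good_clips = [make_clip(1.0), make_clip(0.95)]   # "successful" behaviors
bad_clips = [make_clip(0.0), make_clip(0.05)]    # "unsuccessful" behaviors

# Candidate behaviors the system must choose between.
candidates = {"A": make_clip(0.9), "B": make_clip(0.1)}

scores = {name: preference_score(clip, good_clips, bad_clips)
          for name, clip in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup, candidate A follows nearly the same path as the successful clips, so it receives a positive score and is selected, while candidate B resembles the unsuccessful clips and scores negative. A real system would replace the 2-D points with features extracted from video frames.
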
Commenting on this development, Professor Chang-Dee Yoo stated: "The essence of physical AI lies in its ability to teach machines how to understand human intentions and choose appropriate actions. Thanks to VOTP technology's capability to extract human evaluation criteria by showing a limited number of video clips, it represents a fundamental advancement that contributes to accelerating the development of robots and intelligent systems capable of making decisions that are as close as possible to human decisions."
