The breakthrough that makes robot faces feel less creepy
Humans pay enormous attention to lips during conversation, and robots have struggled badly to keep up. A new robot developed at Columbia Engineering learned realistic lip movements by watching its own reflection and studying human videos online. This allowed it to speak and sing with synchronized facial motion, without being explicitly programmed. Researchers believe this breakthrough could help robots finally cross the uncanny valley.
When people talk face to face, nearly half of their attention is drawn to the movement of the lips. Despite this, robots still have great difficulty moving their mouths in a convincing way. Even the most advanced humanoid machines often rely on stiff, exaggerated mouth motions that resemble a puppet, assuming they have a face at all.
Humans place enormous importance on facial expression, especially subtle movements of the lips. While awkward walking or clumsy hand gestures can be forgiven, even small mistakes in facial motion stand out immediately. This sensitivity contributes to what scientists call the "uncanny valley," a phenomenon in which robots appear unsettling rather than lifelike. Poor lip movement is a major reason robots can seem eerie or emotionally flat, but researchers say that may soon change.
A Robot That Learns to Move Its Lips
On January 15, a team from Columbia Engineering announced a major advance in humanoid robotics. For the first time, researchers have built a robot that can learn facial lip movements for speaking and singing. Their findings, published in Science Robotics, show the robot forming words in multiple languages and even performing a song from its AI-generated debut album "hello world."
Rather than relying on preset rules, the robot learned through observation. It began by discovering how to control its own face using 26 separate facial motors. To do this, it watched its reflection in a mirror, then later studied hours of human speech and singing videos on YouTube to understand how people move their lips.
"The more it interacts with humans, the better it will get," said Hod Lipson, James and Sally Scapa Professor of Innovation in the Department of Mechanical Engineering and director of Columbia's Creative Machines Lab, where the research took place.
Robot Watches Itself Talking
Creating natural-looking lip motion in robots is especially difficult for two main reasons. First, it requires advanced hardware, including flexible facial material and many small motors that must operate quietly and in perfect coordination. Second, lip movement is closely tied to speech sounds, which change rapidly and depend on complex sequences of phonemes.
Human faces are controlled by dozens of muscles located beneath soft skin, allowing movements to flow naturally with speech. Most humanoid robots, however, have rigid faces with limited motion. Their lip movements are typically dictated by fixed rules, which leads to mechanical, unnatural expressions that feel unsettling.
To address these challenges, the Columbia team designed a flexible robotic face with a high number of motors and allowed the robot to learn facial control on its own. The robot was placed in front of a mirror and began experimenting with thousands of random facial expressions. Much like a child exploring their reflection, it gradually learned which motor movements produced specific facial shapes. This process relied on what researchers call a "vision-to-action" language model (VLA).
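To make this self-modeling stage concrete, here is a minimal sketch of the motor-babbling idea in Python: the robot issues random motor commands, observes the resulting face in a mirror, and learns the inverse mapping from observed face shape back to the commands that produced it. Everything here is an illustrative assumption rather than the published system: the names (InverseFaceModel, collect_babbling_data), the landmark-based face representation, the network sizes, and the hypothetical robot and detector interfaces.

```python
import torch
import torch.nn as nn

NUM_MOTORS = 26        # the article's 26 facial motors
LANDMARK_DIM = 2 * 68  # assumed: 68 (x, y) face landmarks from a vision model

class InverseFaceModel(nn.Module):
    """Maps an observed facial shape to the motor commands that produce it."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LANDMARK_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_MOTORS), nn.Sigmoid(),  # motors normalized to [0, 1]
        )

    def forward(self, landmarks):
        return self.net(landmarks)

def collect_babbling_data(robot, detector, n_samples=10_000):
    """Random expressions observed in a mirror -> (landmarks, motors) pairs.
    `robot` and `detector` are hypothetical interfaces to the hardware
    and to a face-landmark extractor, stand-ins for whatever the lab used."""
    pairs = []
    for _ in range(n_samples):
        motors = torch.rand(NUM_MOTORS)      # random facial expression
        robot.set_motors(motors)             # actuate the face
        frame = robot.camera.capture()       # observe itself in the mirror
        landmarks = detector.extract(frame)  # vision -> face shape vector
        pairs.append((landmarks, motors))
    return pairs

def train(model, pairs, epochs=20):
    """Fit the inverse model by regressing motors from observed shapes."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for landmarks, motors in pairs:
            opt.zero_grad()
            loss = loss_fn(model(landmarks), motors)
            loss.backward()
            opt.step()
```

The key property of this setup is that no human labels the data: the robot's own mirror image supplies the supervision, which is why the article can describe the process as learning rather than programming.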
Learning From Human Speech and Song
After understanding how its own face worked, the robot was shown videos of people talking and singing. The AI system observed how mouth shapes changed with different sounds, allowing it to associate audio input directly with motor movement. With this combination of self-learning and human observation, the robot could convert sound into synchronized lip motion.
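One plausible way to realize that second stage, sketched below under stated assumptions: extract spectrogram features from the audio and train a sequence model to output a motor trajectory per frame, where the training targets could come from human videos via the landmark detector and the inverse model learned above. The architecture (a GRU), the mel-spectrogram features, and the names AudioToMotors and lipsync are illustrative choices, not the published design.

```python
import torch
import torch.nn as nn
import torchaudio

NUM_MOTORS = 26
N_MELS = 80

# Standard log-mel features as an assumed audio representation.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_mels=N_MELS)

class AudioToMotors(nn.Module):
    """Sequence model from mel-spectrogram frames to motor trajectories."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_MOTORS)

    def forward(self, mel_frames):          # (batch, time, n_mels)
        h, _ = self.rnn(mel_frames)
        return torch.sigmoid(self.head(h))  # (batch, time, num_motors) in [0, 1]

def lipsync(model, waveform):
    """Turn raw audio into a motor trajectory the face can replay."""
    feats = mel(waveform).clamp(min=1e-5).log()  # (n_mels, time)
    feats = feats.transpose(0, 1).unsqueeze(0)   # (1, time, n_mels)
    with torch.no_grad():
        return model(feats).squeeze(0)           # (time, num_motors)
```

Note that a model like this never needs to understand what is being said: it only maps acoustic patterns to mouth shapes, which is consistent with the article's observation that the robot can lip-sync audio whose meaning it does not know.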
The research team tested the system across multiple languages, speech styles, and musical examples. Even without understanding the meaning of the audio, the robot was able to move its lips in time with the sounds it heard.
The researchers acknowledge that the results are not flawless. "We had particular difficulties with hard sounds like 'B' and with sounds involving lip puckering, such as 'W'. But these abilities will likely improve with time and practice," Lipson said.
Beyond Lip Sync to Real Communication
The researchers stress that lip synchronization is only one part of a broader goal. Their aim is to give robots richer, more natural ways to communicate with people.
"When the lip sync ability is combined with conversational AI such as ChatGPT or Gemini, the effect adds a whole new depth to the connection the robot forms with the human," said Yuhang Hu, who led the study as part of his PhD work. "The more the robot watches humans conversing, the better it will get at imitating the nuanced facial gestures we can emotionally connect with."
"The longer the context window of the conversation, the more context-sensitive these gestures will become," Hu added.
Facial Expression as the Missing Link
The research team believes that emotional expression through the face represents a major gap in current robotics.
"Much of humanoid robotics today is focused on leg and hand motion, for activities like walking and grasping," Lipson said. "But facial affection is equally important for any robotic application involving human interaction."
Lipson and Hu expect realistic facial expressions to become increasingly important as humanoid robots are introduced into entertainment, education, healthcare, and elder care. Some economists estimate that more than one billion humanoid robots could be produced over the next decade.
"There is no future where all these humanoid robots don't have a face. And when they finally have a face, they will need to move their eyes and lips properly, or they will forever remain uncanny," Lipson said.
"We humans are just wired that way, and we can't help it. We are close to crossing the uncanny valley," Hu added.
Risks and Responsible Progress
This work builds on Lipson's long-running effort to help robots form more natural connections with people by learning facial behaviors such as smiling, eye contact, and speech. He argues that these skills must be learned through observation rather than programmed through rigid instructions.
"Something magical happens when a robot learns to smile or speak just by watching and listening to humans," he said. "I'm a jaded roboticist, but I can't help but smile back at a robot that spontaneously smiles at me."
Hu emphasized that the human face remains one of the most powerful tools for communication, and scientists are only beginning to understand how it works.
"Robots with this ability will clearly have a much better ability to connect with humans because such a significant portion of our communication involves facial body language, and that entire channel is still untapped," Hu said.
The researchers also acknowledge the ethical concerns that come with creating machines that can emotionally engage with humans.
"This will be a powerful technology. We have to go slowly and carefully, so we can reap the benefits while minimizing the risks," Lipson said.