Two new projects point to a future where the uncanny valley, the rift in our perception where an artificial character goes from being realistic to creepy, is being overcome. University of Cambridge researchers have developed Zoe, an artificial talking head, and NVIDIA has demonstrated a strikingly realistic computer graphics simulation of a face called Face Works.
The system, called "Zoe", is the result of a collaboration between researchers at Toshiba's Cambridge Research Lab and the University of Cambridge's Department of Engineering.
Zoe, or her offspring, could be used as a visible version of Siri, as a personal assistant in smartphones, or to replace mobile phone texting with “face messaging” in which you “face-message” friends.
The lifelike face can display emotions such as happiness, anger, and fear, and changes its voice to suit any feeling the user wants it to simulate. Users can type in any message, specifying the required emotion, and the face recites the text. According to its designers, it is the most expressive controllable avatar ever created, replicating human emotions with unprecedented realism.
To recreate her face and voice, researchers recorded British actress Zoe Lister’s speech and facial expressions.
The framework behind “Zoe” could in the near future enable people to upload their own faces and voices to customize and personalize their own emotionally realistic, digital assistants. A user could, for example, text the message “I’m going to be late” and set her emotion to “frustrated.” A friend would then receive a “face message” that looked like the sender, repeating the message in a frustrated way.
The team that created Zoe is currently looking for applications and is also working with a school for autistic and deaf children, where the technology could be used to help pupils “read” emotions and lip-read.
Ultimately, the system could have multiple uses, including gaming, robotics, audio-visual books, online lectures, and other user interfaces.
“This technology could be the start of a whole new generation of interfaces which make interacting with a computer much more like talking to another human being,” Professor Roberto Cipolla, from the Department of Engineering, University of Cambridge, said.
The program used to run Zoe is just tens of megabytes in size, which means that it can be easily incorporated into even the smallest computer devices, including tablets and smartphones.
It works by using a set of fundamental emotions. Zoe’s voice, for example, has six basic settings: Happy, Sad, Tender, Angry, Afraid and Neutral. The user can adjust these settings to different levels, as well as altering the pitch, speed and depth of the voice itself.
By combining these levels, it becomes possible to pre-set or create almost infinite emotional combinations. For instance, combining happiness with tenderness and slightly increasing the speed and depth of the voice makes it sound friendly and welcoming.
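As a rough illustration of how such a blend might be represented, here is a minimal Python sketch. It is not Zoe's actual code or interface: the function names, the 0-to-1 weighting scheme, and the numeric values in the "friendly" preset are all assumptions made for this example. Only the six emotion names and the pitch, speed and depth controls come from the description above.

```python
# Hypothetical sketch: names, weights and scales are assumptions,
# not the Zoe system's real API.
EMOTIONS = ("happy", "sad", "tender", "angry", "afraid", "neutral")


def normalize_emotions(levels):
    """Turn raw emotion levels into weights that sum to 1.0."""
    unknown = set(levels) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion(s): {sorted(unknown)}")
    if any(value < 0 for value in levels.values()):
        raise ValueError("emotion levels must be non-negative")
    total = sum(levels.values())
    if total == 0:
        # Nothing selected: fall back to a fully neutral voice.
        return {name: 1.0 if name == "neutral" else 0.0 for name in EMOTIONS}
    return {name: levels.get(name, 0.0) / total for name in EMOTIONS}


def friendly_preset():
    """Happiness plus tenderness, with slightly raised speed and depth."""
    return {
        "emotions": normalize_emotions({"happy": 0.6, "tender": 0.4}),
        "pitch": 1.0,   # unchanged (assumed multiplier)
        "speed": 1.1,   # slightly faster (assumed value)
        "depth": 1.1,   # slightly deeper (assumed value)
    }


if __name__ == "__main__":
    print(friendly_preset())
```

Because the weights are continuous, even six basic settings allow an effectively unlimited number of mixes, which is the point the designers make about "almost infinite" combinations.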
To make the system as realistic as possible, the research team collected a dataset of thousands of sentences spoken by Lister, which they used to train the speech model. They also used computer vision software to track Lister’s face while she was speaking. These recordings were converted into voice and face-modelling algorithms that provide the voice and image data needed to recreate expressions on a digital face directly from text alone.
Face Works
In related news, KurzweilAI.net also featured NVIDIA's “Face Works” demonstration at the annual GPU Technology Conference. The technology is made possible by NVIDIA's Titan graphics card, capable of 1TB/s of memory bandwidth. NVIDIA is able to take 32GB of facial data (the bump maps, texture maps, lighting, expressions, etc.) and compress it down to 400MB, in a new way of rendering highly realistic facial (and voice) expression.
NVIDIA co-founder and CEO Jen-Hsun Huang showed a demo of Face Works, an amazingly realistic simulation of a face that must be seen to be believed, in the second segment of the opening-day keynote at the conference.
Potential applications include animated video, videoconferencing with avatars, and virtual actors in film.
Here’s the NVIDIA demo:
SOURCE University of Cambridge, KurzweilAI.net
By 33rd Square
More here - you can't even tell the images apart in some instances:
http://gizmodo.com/5992372/which-side-of-this-picture-is-real-and-which-side-of-it-is-cgi?utm_campaign=socialflow_gizmodo_facebook&utm_source=gizmodo_facebook&utm_medium=socialflow
here too: http://www.wired.com/design/2013/03/luxion-keyshot/?viewall=true