Artificial Intelligence
Researchers have finally taught computers how to read lips. LipNet, created at the University of Oxford, is the first deep learning system to successfully lip read full sentences, including difficult pronunciations and non-intuitive sentences.
When HAL 9000 read Dave Bowman and Frank Poole's lips in 2001: A Space Odyssey, it was a key moment in the film showing the superhuman power of artificial intelligence (and its malevolence). Now a new AI system has made this a reality.
The research has been published online.
Lip reading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches to programming machines for this task split the problem into two stages: designing or learning visual features, and prediction. Until now, prior research had achieved only word classification, not sentence-level sequence prediction.
Other studies have shown that human lip reading performance increases for longer words, indicating the importance of features that capture temporal context in an ambiguous communication channel. Motivated by this observation, the researchers created LipNet, a model that maps a variable-length sequence of video frames to text. It uses spatiotemporal convolutions, a Long Short-Term Memory (LSTM) recurrent neural network, and the connectionist temporal classification (CTC) loss, and is trained entirely end-to-end.
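To give a feel for how connectionist temporal classification turns a per-frame prediction into a sentence, here is a minimal sketch of greedy CTC decoding in Python. It is not LipNet's code: the function name `ctc_greedy_decode`, the `_` blank symbol, and the toy scores are all illustrative assumptions. The idea is that the network scores every symbol (plus a special "blank") for each video frame, and decoding collapses repeated symbols and then drops the blanks.

```python
from itertools import groupby

BLANK = "_"  # assumed placeholder for the CTC "blank" label


def ctc_greedy_decode(frame_scores, alphabet):
    """Decode per-frame scores into text with the greedy CTC rule.

    frame_scores: one list of scores per video frame, aligned with `alphabet`.
    alphabet: list of symbols, including BLANK.
    """
    # 1. Pick the highest-scoring symbol for every frame.
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    # 2. Collapse runs of the same symbol into a single symbol.
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # 3. Remove blanks, which only mark boundaries between symbols.
    return "".join(symbol for symbol in collapsed if symbol != BLANK)


# Toy example: six frames over a three-symbol alphabet (made-up scores).
alphabet = [BLANK, "a", "b"]
frame_scores = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank separates the two a's
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, collapsed)
]
decoded = ctc_greedy_decode(frame_scores, alphabet)
```

The blank between the second and third "a" frames is what lets the decoder keep a genuinely doubled letter while still merging frames that show the same mouth shape, which is why the number of video frames does not need to match the number of characters in the sentence.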
"To the best of our knowledge, LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model," they report. LipNet achieves 93.4% accuracy, outperforming both experienced human lipreaders and the previous state-of-the-art accuracy of 79.6%.
Machine lipreaders have enormous practical potential, with applications in improved hearing aids, silent dictation in public spaces, covert conversations, speech recognition in noisy environments, biometric identification, and silent-movie processing.
LipNet could work as a tool for the hearing impaired, or as a way for people to communicate with their devices when they aren't comfortable speaking aloud. Imagine you are in a crowded office or an elevator and don't want to draw attention to yourself by speaking aloud, seemingly to no one: you could just mouth the words to the camera.
Given the researchers' association with Google's DeepMind, we wouldn't be surprised if LipNet sees commercial applications sooner rather than later.