Nuance Creates the Winograd Schema Challenge As Alternative to the Turing Test

Monday, July 28, 2014

After the initial report that a computer program modeled after a 13-year-old boy had passed the Turing Test was met with much skepticism and doubt, Nuance has announced a new way to measure artificial intelligence: the Winograd Schema Challenge, an annual competition hosted with CommonsenseReasoning.org to push the boundaries of AI.

Nuance Communications, the firm behind Siri, has announced an annual competition to develop programs that can solve the Winograd Schema Challenge, a test developed by Hector Levesque, Professor of Computer Science at the University of Toronto and winner of the 2013 IJCAI Award for Research Excellence.

The announcement was made at the 28th AAAI Conference in Quebec, Canada.

Nuance is sponsoring the yearly competition in cooperation with CommonsenseReasoning.org, a research group dedicated to furthering and promoting research in the field of formal commonsense reasoning. CommonsenseReasoning.org will organize, administer, and evaluate the Winograd Schema Challenge. The winning program that passes the test will receive a grand prize of $25,000. The test is designed to judge whether a program has truly modeled human level intelligence.

"The Winograd Schema Challenge provides us with a tool for concretely measuring research progress in commonsense reasoning, an essential element of our intelligent systems."


Artificial Intelligence (AI) has long been measured by the "Turing Test," proposed in 1950 by one of the great pioneers of computer science, Alan Turing, who sought a way to determine whether a computer program exhibited human-level intelligence. The test is considered passed if the program can convince a human judge that he or she is conversing with a human and not a machine. No system has ever passed the Turing Test, and most programs that have tried rely on considerable trickery to fool humans. Even the recently unveiled program modeling a 13-year-old boy, Eugene Goostman, has left many skeptical. These efforts have also suggested that the Turing Test may not be an ideal way to judge a machine's intelligence.

The Winograd Schema (WS) Challenge is an alternative to the Turing Test intended to provide a more accurate measure of genuine machine intelligence. Rather than basing the test on the sort of short, free-form conversation the Turing Test suggests, the Winograd Schema Challenge poses a set of multiple-choice questions whose answers are expected to be fairly obvious to a layperson, but ambiguous for a machine without human-like reasoning or intelligence.

The schema is named after Terry Winograd, an American professor of computer science at Stanford University, co-director of the Stanford Human-Computer Interaction Group, and author of Bringing Design to Software and Understanding Computers and Cognition: A New Foundation for Design. He is known within the philosophy of mind and artificial intelligence fields for his work on natural language using the SHRDLU program.

An example of a Winograd Schema question is the following: "The trophy would not fit in the brown suitcase because it was too big. What was too big? Answer 0: the trophy, or Answer 1: the suitcase?" A human who answers such questions correctly typically draws on spatial reasoning, knowledge of the typical sizes of objects, and other kinds of commonsense reasoning to determine the correct answer.
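To make the format concrete, here is a minimal sketch in Python of how a schema like the one above might be represented and scored programmatically. The data structure and scoring function are illustrative assumptions, not the official contest format.

# Illustrative sketch only: a hypothetical way to represent and score a
# Winograd Schema question. This is not the official contest format.
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

@dataclass
class WinogradSchema:
    sentence: str                 # sentence containing the ambiguous pronoun
    pronoun: str                  # the pronoun to be resolved
    candidates: Tuple[str, str]   # the two possible referents (Answer 0, Answer 1)
    correct: int                  # index of the correct referent

trophy = WinogradSchema(
    sentence="The trophy would not fit in the brown suitcase because it was too big.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    correct=0,  # commonsense: the thing that is too big to fit is the trophy
)

def score(schemas: Sequence[WinogradSchema],
          resolver: Callable[[WinogradSchema], int]) -> float:
    """Fraction of schemas for which the resolver picks the correct referent."""
    return sum(resolver(s) == s.correct for s in schemas) / len(schemas)

# A resolver that always guesses Answer 0 hovers around 50 percent on a balanced
# question set, which is why chance-level guessing is easy and real commonsense is hard.
print(score([trophy], lambda s: 0))  # 1.0 on this single example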

According to Levesque, "The WS challenge does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses. Assuming a subject is willing to take a WS test at all, much will be learned quite unambiguously about the subject in a few minutes." Eugene Goostman was already close to "passing" the Turing Test when Levesque wrote about the Winograd Schema in 2011, and his criticism clearly targeted AIs created to use deception to defeat the test. In the case of Goostman, the fact that its creator cast it as a young boy with English as a second language was a key factor in its fooling 30 percent of the judges it conversed with.

"There has been renewed interest in AI and Natural Language Processing (NLP) as a means of humanizing the complex technological landscape that we encounter in our day-to-day lives," said Charles Ortiz, Senior Principal Manager of AI and Senior Research Scientist, Natural Language and Artificial Intelligence Laboratory, Nuance Communications. "The Winograd Schema Challenge provides us with a tool for concretely measuring research progress in commonsense reasoning, an essential element of our intelligent systems. Competitions such as the Winograd Schema Challenge can help guide more systematic research efforts that will, in the process, allow us to realize new systems that push the boundaries of current AI capabilities and lead to smarter personal assistants and intelligent systems."

The test will be administered on a yearly basis by CommonsenseReasoning.org starting in 2015. The first submission deadline will be October 1, 2015. The 2015 Commonsense Reasoning Symposium, to be held at the AAAI Spring Symposium at Stanford from March 23-25, 2015, will include a special session for presentations and discussions on progress and issues related to this Winograd Schema Challenge. Contest details can be found at http://commonsensereasoning.org/winograd.html.

The winning program that meets the baseline for human performance will receive a grand prize of $25,000. If multiple programs meet the baseline, a panel of judges will base its choice on either further testing or examination of traces of program execution. If no program meets that threshold, a first prize of $3,000 and a second prize of $2,000 will be awarded to the two highest-scoring entries. In the case of teams, the prize will be given to the team lead, whose responsibility will be to divide it among the team members as appropriate.

Clearly defining exactly what intelligence is continues to be both a philosophical and a technical issue in designing tests for human-level artificial intelligence. By creating a variety of such tests, we will be better able to judge the progress of our AI systems and, by extension, learn more about ourselves.


SOURCE  Business Wire

By 33rd Square
