Teaching Robots How To Manipulate Objects By Having Them Watch YouTube Videos

Friday, January 2, 2015

Machine Learning
Using convolutional neural networks, an international team of researchers has taught robots how to manipulate objects by having them watch videos from the Internet.





The research, which will be presented later this month at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), was carried out by Yezhou Yang, a PhD candidate, and Yi Li at the Computer Vision Lab in the Department of Computer Science at the University of Maryland, College Park. The work was supervised by Professor Yiannis Aloimonos and Dr. Cornelia Fermuller.

"Our ultimate goal is to build a self-learning robot that is able to enrich its knowledge about fine grained manipulation actions by “watching” demo videos."


The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a parsing module built on a probabilistic manipulation-action grammar, which aims to generate visual sentences for robot manipulation.
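As a rough sketch of that two-level design, the Python below wires two stand-in recognition modules into a step that emits a scored visual sentence. The function names and labels here are hypothetical, not the authors' code, and scoring by multiplied confidences is a simplification of the paper's probabilistic parsing.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

def classify_grasp_type(hand_patch) -> Detection:
    # Stand-in for the CNN grasp-type recognition module; a real
    # implementation would run a trained network on the hand image patch.
    return Detection("power-cylindrical", 0.81)

def classify_object(object_patch) -> Detection:
    # Stand-in for the CNN object recognition module.
    return Detection("knife", 0.93)

def parse_to_visual_sentence(grasp, obj, action):
    # Higher-level step: combine the low-level detections into a visual
    # sentence (Hand, GraspType, Object, Action). Scoring by the product
    # of component confidences simplifies the paper's grammar parsing.
    score = grasp.confidence * obj.confidence
    return ("RightHand", grasp.label, obj.label, action), score

sentence, score = parse_to_visual_sentence(
    classify_grasp_type(None), classify_object(None), action="cut")
print(sentence, round(score, 2))
# ('RightHand', 'power-cylindrical', 'knife', 'cut') 0.75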

In experiments on a publicly available dataset of unconstrained videos, the team showed that the system could learn manipulation actions with high accuracy.

"Our ultimate goal is to build a self-learning robot that is able to enrich its knowledge about fine grained manipulation actions by “watching” demo videos," the team writes in the research paper.

Working with objects is not so easy

Teaching robots how to grasp objects remains a tedious and complicated task, involving multiple subsystems like computer vision, 3D scanning and biomechanics.  People, by contrast, stop consciously thinking about these actions once past toddlerhood; the biological pairing of brain and dexterous hands is simply that good.


The researchers chose to model manipulation actions at multiple levels of abstraction.  At the lower levels, symbolic quantities are grounded in perception; at the higher level, a grammatical structure represents symbolic information about objects, grasps and actions.  Their system uses CNN-based object recognition together with CNN-based grasp-type recognition.
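To give a feel for what such a grammatical structure looks like, here is a toy probabilistic grammar in the same spirit, written with the NLTK library. The production rules and probabilities below are illustrative inventions, not the grammar from the paper.

import nltk

# Probabilities for each left-hand side must sum to 1.
grammar = nltk.PCFG.fromstring("""
    S  -> HP AP   [1.0]
    HP -> H G     [1.0]
    AP -> A O     [1.0]
    H  -> 'RightHand' [0.6] | 'LeftHand' [0.4]
    G  -> 'PowerGrasp' [0.5] | 'PrecisionGrasp' [0.5]
    A  -> 'cut' [0.5] | 'pour' [0.5]
    O  -> 'cucumber' [0.5] | 'bowl' [0.5]
""")

# ViterbiParser returns the most probable parse of the token sequence.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("RightHand PowerGrasp cut cucumber".split()):
    print(tree)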

Using visual sentences like (LeftHand GraspType1 Object1 Action RightHand GraspType2 Object2), the system puts everything together into a program for the robot. Drawing on the visual information in the videos, the robot chooses the grasp type based on the object. Moreover, because the videos capture human grasping behavior, the robots are, in essence, learning by watching people.
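To make the idea concrete, here is a minimal sketch of how a parsed visual sentence might be expanded into a command program. The command names (move_to, grasp, execute) are hypothetical placeholders rather than any real robot API.

visual_sentence = {
    "left":  {"grasp": "PrecisionGrasp", "object": "cucumber"},
    "right": {"grasp": "PowerGrasp", "object": "knife"},
    "action": "cut",
}

def sentence_to_program(sentence):
    # Expand the parsed sentence into a step-by-step program: each hand
    # moves to and grasps its object, then the action is executed.
    program = []
    for hand in ("left", "right"):
        slot = sentence[hand]
        program.append(f"{hand}: move_to({slot['object']})")
        program.append(f"{hand}: grasp({slot['object']}, type={slot['grasp']})")
    program.append(f"both: execute({sentence['action']})")
    return program

for step in sentence_to_program(visual_sentence):
    print(step)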

The right grasp for the job

The system also takes into account the type of grippers the robot has to work with.  For instance, a humanoid robot equipped with one parallel gripper and one vacuum gripper should select the vacuum gripper for a stable power grasp, and the parallel gripper for a precision grasping task.
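That selection rule is simple enough to write down directly. The sketch below is a hypothetical distillation of the example above, assuming a robot with exactly those two grippers.

def select_gripper(grasp_type: str) -> str:
    # Power grasps favor the vacuum gripper for stability; precision
    # grasps favor the parallel gripper for fine control. This mapping is
    # a hypothetical reading of the example in the text, not the paper's.
    if grasp_type.lower().startswith("power"):
        return "vacuum"
    return "parallel"  # precision grasps and a conservative default

print(select_gripper("PowerGrasp"))      # vacuum
print(select_gripper("PrecisionGrasp"))  # parallel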

The researchers found that their CNN system achieved 93% accuracy on object recognition and 76% on grasp recognition.  Together these yielded an 83% success rate for the robots' manipulation actions, although the researchers admit the robot did get confused about how to handle the tofu.

"We believe this preliminary integrated system raises hope towards a fully intelligent robot for manipulation tasks that can automatically enrich its own knowledge resource by “watching” recordings from the World Wide Web," write the researchers.

Future directions

In future studies, the researchers plan to further extend the list of grasping types with a finer categorization, investigate the possibility of using the grasp type as an additional feature for action recognition, and automatically segment long demonstration videos into action clips based on changes in grasp type.
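The third idea, segmenting by grasp-type change, amounts to splitting the stream of per-frame grasp labels wherever the label changes. A minimal sketch, assuming the labels come from the grasp recognition module:

def segment_by_grasp(frame_labels):
    # Return (start_frame, end_frame, grasp_label) clips over contiguous
    # runs of a single grasp type.
    clips, start = [], 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            clips.append((start, i - 1, frame_labels[start]))
            start = i
    return clips

labels = ["rest"] * 3 + ["power"] * 4 + ["precision"] * 2
print(segment_by_grasp(labels))
# [(0, 2, 'rest'), (3, 6, 'power'), (7, 8, 'precision')]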

The team is also looking at a higher-level system that would use machine learning to construct a language of manipulation more naturally.  This parallels work in natural language processing that extends machine learning beyond words and sentences toward contextual understanding of the underlying message.


SOURCE  Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web

By 33rd Square
