33rd Square Business Tools: image recognition
Showing posts with label image recognition.

Monday, December 19, 2016

Image Processing Artificial Intelligence Learns Mostly On Its Own, Just Like a Human


Artificial Intelligence

Artificial intelligence and neuroscience researchers have taken inspiration from the human brain in creating a new deep learning system that enables computers to learn about the visual world largely on their own, just like human babies do.


Artificial intelligence and neuroscience experts from Rice University and Baylor College of Medicine, drawing inspiration from the human brain, have developed a new deep learning method that lets computers learn about the visual world largely on their own, much the same way human babies do.

In tests, the group’s “deep rendering mixture model” (DRMM) largely taught itself how to distinguish handwritten digits using a standard dataset of 10,000 digits written by federal employees and high school students. In results presented this month at the Neural Information Processing Systems (NIPS) conference in Barcelona, the researchers described how they trained their algorithm by giving it just 10 correct examples of each handwritten digit between zero and nine, then presenting it with several thousand more examples that it used to teach itself further.

The algorithm was more accurate at correctly distinguishing handwritten digits than almost all previous algorithms that were trained with thousands of correct examples of each digit.

"The DRMM is applicable to semi-supervised and unsupervised learning tasks, achieving results that are state-of-the-art in several categories on the MNIST benchmark and comparable to state of the art," conclude the authors.

“In deep learning parlance, our system uses a method known as semisupervised learning,” said lead researcher Ankit Patel, an assistant professor with joint appointments in neuroscience at Baylor and electrical and computer engineering at Rice. “The most successful efforts in this area have used a different technique called supervised learning, where the machine is trained with thousands of examples: This is a one. This is a two.”

“Humans don’t learn that way,” Patel said. “When babies learn to see during their first year, they get very little input about what things are. Parents may label a few things: ‘Bottle. Chair. Momma.’ But the baby can’t even understand spoken words at that point. It’s learning mostly unsupervised via some interaction with the world.”

Patel said he and graduate student Tan Nguyen, a co-author on the new study, set out to design a semisupervised learning system for visual data that didn’t require much “hand-holding” in the form of training examples. For instance, neural networks that use supervised learning would typically be given hundreds or even thousands of training examples of handwritten digits before they would be tested on the database of 10,000 handwritten digits in the Modified National Institute of Standards and Technology (MNIST) database.
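
The data regime is easy to picture in code. Below is a minimal sketch (assuming Python and the Keras copy of MNIST) that keeps just 10 labeled examples per digit and withholds the labels from the rest; it only sets up the split and is not the DRMM itself:

```python
# Hedged sketch: reproduce the 10-labels-per-digit regime described above.
# This prepares the data split only; it is not the DRMM.
import numpy as np
from tensorflow.keras.datasets import mnist  # assumed source for MNIST

(x_train, y_train), _ = mnist.load_data()

# Keep the first 10 examples of each digit 0-9 as the labeled set.
labeled_idx = np.concatenate(
    [np.flatnonzero(y_train == digit)[:10] for digit in range(10)]
)
x_labeled, y_labeled = x_train[labeled_idx], y_train[labeled_idx]

# Everything else becomes the unlabeled pool the model teaches itself on.
mask = np.ones(len(x_train), dtype=bool)
mask[labeled_idx] = False
x_unlabeled = x_train[mask]

print(len(x_labeled), "labeled examples,", len(x_unlabeled), "unlabeled")
```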

DRMM

The semisupervised Rice-Baylor algorithm is a “convolutional neural network,” a piece of software made up of artificial neurons whose design was inspired by biological neurons. These artificial neurons, or processing units, are organized in layers: the first layer scans an image and does simple tasks like searching for edges and color changes, and the second layer examines the output from the first and searches for more complex patterns. Mathematically, this nested method of looking for patterns within patterns within patterns is referred to as a nonlinear process.
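
As a minimal sketch of such a layered network (written here with the Keras API, an assumption; the DRMM itself is a different, probabilistic model):

```python
# Hedged sketch of a small convolutional network for 28x28 digit images.
# Early layers look for simple patterns like edges; deeper layers look for
# patterns within those patterns.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(28, 28, 1)),        # first layer: edges, color changes
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),  # second layer: more complex patterns
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),        # one output per digit, 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```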

“It’s essentially a very simple visual cortex,” Patel said of the convolutional neural net. “You give it an image, and each layer processes the image a little bit more and understands it in a deeper way, and by the last layer, you’ve got a really deep and abstract understanding of the image. Every self-driving car right now has convolutional neural nets in it because they are currently the best for vision.”

"The way the brain is doing it is far superior to any neural network that we’ve designed."
Like human brains, neural networks start out as blank slates and become fully formed as they interact with the world. For example, each processing unit in a convolutional net starts the same and becomes specialized over time as it is exposed to visual stimuli.

“Edges are very important,” Nguyen said. “Many of the lower layer neurons tend to become edge detectors. They’re looking for patterns that are both very common and very important for visual interpretation, and each one trains itself to look for a specific pattern, like a 45-degree edge or a 30-degree red-to-blue transition.

“When they detect their particular pattern, they become excited and pass that on to the next layer up, which looks for patterns in their patterns, and so on,” he said. “The number of times you do a nonlinear transformation is essentially the depth of the network, and depth governs power. The deeper a network is, the more stuff it’s able to disentangle. At the deeper layers, units are looking for very abstract things like eyeballs or vertical grating patterns or a school bus.”
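
For intuition, here is a toy stand-in for one such edge-detector unit (the 45-degree kernel below is hand-written for illustration; real units learn their filters during training):

```python
# Hedged sketch: convolve a tiny image with a hand-written 45-degree edge
# kernel to mimic what a lower-layer "edge detector" unit responds to.
import numpy as np
from scipy.signal import convolve2d

kernel_45deg = np.array([
    [ 2.0,  1.0,  0.0],
    [ 1.0,  0.0, -1.0],
    [ 0.0, -1.0, -2.0],
])  # responds strongly to diagonal light-to-dark transitions

image = np.tril(np.ones((6, 6)))  # a synthetic diagonal edge
response = convolve2d(image, kernel_45deg, mode="valid")
print(response)  # large values trace the 45-degree boundary
```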

Patel said the theory of artificial neural networks, which was refined in the NIPS paper, could ultimately help neuroscientists better understand the workings of the human brain.

“There seem to be some similarities about how the visual cortex represents the world and how convolutional nets represent the world, but they also differ greatly,” Patel said. “What the brain is doing may be related, but it’s still very different. And the key thing we know about the brain is that it mostly learns unsupervised.

“What I and my neuroscientist colleagues are trying to figure out is, What is the semisupervised learning algorithm that’s being implemented by the neural circuits in the visual cortex? and How is that related to our theory of deep learning?” he said. “Can we use our theory to help elucidate what the brain is doing? Because the way the brain is doing it is far superior to any neural network that we’ve designed.”

SOURCE  Rice University


By 33rd Square



Saturday, December 5, 2015

Google's Cloud Vision API Will Allow for Cloud Based Image, Face and Emotion Detection


Image Recognition

With the new release of a cloud-based image recognition system API from Google, developers will be empowered to build new applications that can see, and more importantly understand, the content of images. The company showed off the software with a simple robot that can recognize objects like a banana, and a user's smiling face.


Google recently announced the launch of Cloud Vision, one of the company’s image recognition technologies. It has been made available to developers as an API, with a limited preview accessible through the Google Cloud Platform.

"The uses of Cloud Vision API are game changing to developers of all types of applications and we are very excited to see what happens next," writes Ram Ramanathan, Product Manager for the Google Cloud Platform.

Google’s image recognition technology is one of the strongest around, applicable to many domains that include optical character recognition (OCR), face detection, and object recognition.

The Cloud Vision API quickly classifies images into thousands of categories, detects faces with associated emotions, and recognizes printed words in many languages. Developers using the Cloud Vision API will be able to build metadata into an image catalog, moderate offensive content, or enable new marketing scenarios through image sentiment analysis.

Google Cloud Vision


The following set of Google Cloud Vision API features can be applied in any combination on an image:

  • Label/Entity Detection picks out the dominant entity (e.g., a car, a cat) within an image, from a broad set of object categories. You can use the API to easily build metadata on your image catalog, enabling new scenarios like image based searches or recommendations.
  • Optical Character Recognition to retrieve text from an image. Cloud Vision API provides automatic language identification, and supports a wide variety of languages.
  • Safe Search Detection to detect inappropriate content within your image. Powered by Google SafeSearch, the feature enables you to easily moderate crowd-sourced content.
  • Facial Detection can detect when a face appears in photos, along with associated facial features such as eye, nose and mouth placement, and the likelihood of over 8 attributes like joy and sorrow. Google says it does not support facial recognition and does not store facial detection information on any Google server.
  • Landmark Detection to identify popular natural and manmade structures, along with the associated latitude and longitude of the landmark.
  • Logo Detection to identify product logos within an image. Cloud Vision API returns the identified product brand logo, with the associated bounding polybox.
To demonstrate a simple example of the Vision API, Google developers have built a working Raspberry Pi based platform with just a few hundred lines of Python code calling the Vision API.
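
A minimal call to the API's REST endpoint might look like the sketch below (Python with the requests library; YOUR_API_KEY and photo.jpg are placeholders, and the feature types come from the list above):

```python
# Hedged sketch: send one image to the Cloud Vision API and ask for label
# and face detection. Replace YOUR_API_KEY and photo.jpg with real values.
import base64
import requests

with open("photo.jpg", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

body = {
    "requests": [{
        "image": {"content": content},
        "features": [
            {"type": "LABEL_DETECTION", "maxResults": 5},
            {"type": "FACE_DETECTION", "maxResults": 5},
        ],
    }]
}

resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate",
    params={"key": "YOUR_API_KEY"},
    json=body,
)
print(resp.json())  # labels, face bounds, and emotion likelihoods
```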

As the video below shows, the demo robot can roam and identify objects, including smiling faces.

Cloud Vision is partially powered by Google's TensorFlow machine learning platform that was recently open-sourced.


SOURCE  Google


By 33rd Square


Saturday, June 20, 2015

Google's Inceptionism Lets Us Look at an Artificial Intelligence Hallucination

Artificial Intelligence
Did you ever look up at the sky and try to imagine the clouds as shapes or animals? A team at Google has programmed their neural networks to do something similar while recognizing images, with some intriguing results.





Google’s image recognition software can detect, analyze, and even auto-caption images by using artificial neural networks to simulate the human brain. Now, in a process they’re calling “inceptionism,” Google engineers tried to see if they could find out what these artificial intelligences “dream” of.

Google trained the neural network by feeding it millions of images, eventually teaching it to recognize specific objects within a picture. When the network is presented with an image, algorithms process it, trying to emphasize the objects the network recognizes.

Google's Inceptionism Lets Us Look at an Artificial Intelligence Hallucination

The process uses many layers of artificial neurons, which is what makes these networks “deep,” to generate greater and greater precision. At the final output layer, the network makes its interpretation of the image.

The networks typically consist of 10 to 30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the “output” layer is reached. The network’s “answer” comes from this final output layer.

It is almost as if the system were playing a visual game of twenty questions, homing in on a solution.

Google Inceptionism

"We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows," write the researchers.


"Neural networks could become a tool for artists—a new way to remix visual concepts—or perhaps even shed a little light on the roots of the creative process in general."


"For example, the first layer maybe looks for edges or corners. Intermediate layers interpret the basic features to look for overall shapes or components, like a door or a leaf. The final few layers assemble those into complete interpretations—these neurons activate in response to very complex things such as entire buildings or trees."

For this research, the team asked the network: “Whatever you see there, I want more of it!” This created a feedback loop: if a cloud looks a little bit like a dog or bird, the network would make it look more like a dog or a bird.

"We call this technique “Inceptionism” in reference to the neural net architecture used," write the researchers.

This in turn made the network recognize the bird even more strongly on the next pass, and so forth, until a highly detailed bird appeared, seemingly out of nowhere.
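
Google's post doesn't include code, but the loop can be sketched as gradient ascent on the input image: pick a layer and nudge the pixels so that whatever the layer already activates on gets amplified. The model, layer name, and step size below are illustrative assumptions, not Google's actual implementation:

```python
# Hedged sketch of the "whatever you see, I want more of it" loop via
# gradient ascent on the input image.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
layer = base.get_layer("mixed3")               # arbitrary intermediate layer
dream_model = tf.keras.Model(base.input, layer.output)

def dream_step(img, step_size=0.01):
    with tf.GradientTape() as tape:
        tape.watch(img)
        activation = tf.reduce_mean(dream_model(img))  # how much the layer "sees"
    grad = tape.gradient(activation, img)
    grad /= tf.math.reduce_std(grad) + 1e-8            # normalize the step
    return img + step_size * grad                      # amplify what it sees

img = tf.random.uniform((1, 299, 299, 3))  # start from noise (or any photo)
for _ in range(100):
    img = dream_step(img)
```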

Google Inceptionism

The results are very interesting. Google has shown that even a relatively simple neural network can over-interpret an image, much as children enjoy watching clouds and interpreting the random shapes.

This network was trained mostly on images of animals, so naturally it tends to interpret shapes as animals. But because the data is stored at such a high level of abstraction, the results are an interesting remix of these learned features.

Artificial Intelligence Hallucination

The researchers say the work has helped them understand and visualize how neural networks are able to carry out difficult classification tasks, improve network architecture, and check what the network has learned during training. "It also makes us wonder whether neural networks could become a tool for artists—a new way to remix visual concepts—or perhaps even shed a little light on the roots of the creative process in general."

Artificial Intelligence Hallucination

What do you think? Are these the dreams of artificial intelligence, a window onto the creative process, or just another filter for Photoshop? For more examples, check out the Inceptionism gallery.


SOURCE  Google Research

By 33rd Square

Monday, January 12, 2015

Disney Researchers Create System To Organize Your Vacation Photos

 Machine Learning
Computer science researchers have created an automated method to assemble story-driven photo albums from an unsorted group of images.




Taking photos has never been easier, thanks to the ubiquity of mobile phones, tablets and digital cameras. However, editing a mass of vacation photos into an album remains a chore. A new automated method developed by Disney Research could ease that task while also telling a compelling story.

The method developed by a team led by Leonid Sigal, senior research scientist at Disney Research, attempts to not only select photos based on quality and relevance, but also to order them in a way that makes narrative sense.

"Professional photographers, whether they are assembling a wedding album or a photo slideshow, know that the strict chronological order of the photos is often less important than the story that is being told," Sigal said. "But this process can be laborious, particularly when large photo collections are involved. So we looked for ways to automate it."

Sigal and his collaborators presented their findings at WACV 2015, the IEEE Winter Conference on Applications of Computer Vision, in Waikoloa Beach, Hawaii. Others involved include Disney Research’s Rafael Tena; Fereshteh Sadeghi, a computer science PhD student at the University of Washington; and Ali Farhadi, assistant professor of computer science and engineering at the University of Washington.

The team looked at ways of arranging vacation photos into a coherent album. Previous efforts on automated album creation have relied on arranging photos based largely on chronology and geo-tagging, Sigal noted.

Darth Vader in Disneyland

But when four people were asked to choose and assemble five-photo albums that told a story, the researchers noted that these individuals took photos out of chronological order about 40 percent of the time. Subsequent preference testing using Mechanical Turk showed people preferred these annotated albums over those chosen randomly or those based on chronology.

To create a computerized system capable of telling a compelling visual story, the researchers built a model that could create albums based on a variety of photo features, including the presence or absence of faces and their spatial layout; overall scene textures and colors; and the aesthetic quality of each image.

Their model also incorporated learned rules for how albums are assembled, such as preferences for certain types of photos to be placed at the beginning, in the middle and at the end of albums. An album about a Disney World visit, for instance, might begin with a family photo in front of Cinderella's castle or with Mickey Mouse. Photos in the middle might pair a wide shot with a close-up, or vice versa. Exclusionary rules, such as avoiding the use of the same type of photo more than once, were also learned and incorporated.

The researchers used a machine learning algorithm to enable the system to learn how humans use those features and what rules they use to assemble photo albums. The training sets used for this purpose were created for the study from thousands of photos from Flickr. These included 63 image collections in five topic areas: trips to Disney theme parks, beach vacations and trips to London, Paris and Washington, D.C. Each collection was annotated by four people, who were asked to assemble five-photo albums that told stories and to group images into sets of near duplicates.

The system relies purely on visual information for features and exemplar album annotations to drive the machine learning procedure.
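
A toy sketch of the scoring idea (not Disney Research's actual model) might weight simple photo features differently by album slot and search over orderings; every feature name and number below is an illustrative assumption:

```python
# Hedged sketch: score candidate five-photo orderings with slot-dependent
# weights over simple photo features, then keep the best ordering.
import itertools
import numpy as np

# Each photo: [has_faces, is_wide_shot, aesthetic_quality]
photos = np.array([
    [1, 1, 0.9],   # family wide shot in front of the castle
    [1, 0, 0.7],   # close-up with faces
    [0, 1, 0.6],   # scenery wide shot
    [0, 0, 0.4],   # filler
    [1, 1, 0.8],   # closing group shot
])

# Learned-style preferences: e.g., faces and wide shots score high at the start.
slot_weights = np.array([
    [1.0, 0.8, 0.5],   # slot 1: opening photo
    [0.5, 0.2, 0.5],
    [0.2, 0.5, 0.5],
    [0.5, 0.2, 0.5],
    [0.8, 0.8, 0.5],   # slot 5: closing photo
])

best = max(
    itertools.permutations(range(len(photos))),
    key=lambda order: sum(slot_weights[i] @ photos[p] for i, p in enumerate(order)),
)
print("best ordering of photos:", best)
```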

Once the system learned the principles of selecting and ordering photos, it was able to compose photo albums from unordered and untagged collections. Sigal noted that such a system can also learn the preferences of individuals, customizing the album creation process.


SOURCE  Disney Research via EurekAlert

By 33rd Square

Thursday, January 8, 2015

What Do You Need To Know About Deep Learning?

 Artificial Intelligence
Certainly one of the main buzzwords in technology and computer science today, 'deep learning' is having a major impact across many sectors. Why is deep learning so important?




There is no question that deep learning is one of the most talked-about trends in business and computer science today. Why is this artificial intelligence technology, which can be traced back to the 1970s and '80s, so much in the news today? For many, the long-held promise of artificial intelligence that is actually smart is coming to light through this technology.

Deep learning is used in numerous fields, and it will soon aid manufacturing, medicine, retail, the home, and beyond.

Gartner calls it “the most significant technology shift of this decade”.

Essentially, deep learning refers to machine learning algorithms that learn generalized representations of data and use them in a way that suits the application.

This isn't a new concept, but for decades it was a pipe dream, limited by computer hardware and insufficient data. Moore's Law has fixed these shortcomings: server and data storage costs are approaching zero, and Big Data keeps growing.

The implications for the not-too-distant future are enormous.

Consider a shopping recommendation system for a website: if a product is trending and a user has browsed that product's category in the last day, the user is very likely to buy the item.

These two variables are so predictive together that they can be combined into a single new variable, or feature. Finding connections between variables and packaging them into a new discrete variable is called feature engineering. In deep learning systems, feature engineering is done automatically. Another common use of the technology today is automatic tagging of images, and work is being done to do the same for video.
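
A toy sketch of that hand-built step might look like the following (the column names and values are illustrative assumptions):

```python
# Hedged sketch of manual feature engineering: combine two predictive
# signals into one new discrete feature.
import pandas as pd

df = pd.DataFrame({
    "product_is_trending":       [1, 0, 1, 1],
    "user_browsed_category_24h": [1, 1, 0, 1],
})

# New feature: both signals present at once.
df["likely_to_buy"] = df["product_is_trending"] & df["user_browsed_category_24h"]
print(df)
```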

Rethink Robotics Baxter
Deep learning may soon help robots like Baxter 'know' how to handle various objects without the help of human programming.

Image Recognition

In computer vision, too, deep learning is proving increasingly useful. Algorithms fed thousands upon thousands of images can independently learn to identify individual components within them. Along with improving search technology, this will eventually be used to let robots navigate independently and interact with objects without human assistance.

It's not about computers recognizing a particular face - we've had that for years. It's about recognizing that an object is a face to begin with. Or a car. Or a cat. In one example, a team was recently able to have their system recognize objects as well as a monkey can, and performance is only improving. This rapid progress in automatic image recognition could help blind people see and get self-driving cars to work properly.

Natural Language Processing & Speech Recognition

Speech recognition continues to get smarter.  While we aren't quite yet at the level of Samantha, from the film Her, Siri and Google Now are improving all the time.  Just a few months ago the voice search function would never work for my four-year-old.  Now he is browsing videos without help.

Skype's real-time translator is another example of what lies ahead.

Deep Learning Sentiment Analysis

Sentiment analysis—the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information—also helps businesses and organizations understand consumer emotions, regardless of the language in which they are expressed. Researchers at Stanford University have a demonstration sentiment analysis tool available online.
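
As a minimal illustration (using NLTK's VADER scorer, an assumption; the Stanford demo mentioned above uses a different, tree-structured neural model):

```python
# Hedged sketch: score the sentiment of a sentence with NLTK's VADER.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores(
    "I absolutely love this phone, but the battery is terrible."
))  # returns neg/neu/pos and a compound score
```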

Surely in the development of social robots, sentiment analysis will play a large role.

Predictive Algorithms

Creating deep neural networks has transformed businesses like Amazon and Netflix through intelligent recommendations, and will lead to better sales automation and lead generation, highly efficient marketing, predictive hiring, and algorithmic trading.

In highly trained deep neural networks, there is, and will increasingly be, an associated predictive tool built into the system. Some would argue that the ability to predict the future is one of the key parts of human intelligence.

Facebook's Yann LeCun has said that once deep learning overcomes some technical hurdles, it will open up other areas, like automatically created high-performance data analytics systems; vector-space embedding of everything (augmented reality); multimedia content understanding, search and indexing; multilingual speech dialog systems; driverless cars; and autonomous maintenance robots and personal care robots.

Looked at this way, deep learning is a foundational technology on which many future developments will be built.

Who is Building the Deep Learning Foundation?

Apart from academia (which has been plundered by the big names in tech), Google is the biggest company in this space, though Baidu will surely follow, since it recently poached deep learning pioneer Andrew Ng. IBM, Microsoft, and Facebook have made great strides as well. There are a handful of smaller companies, most notably AlchemyAPI and Cortica.

For instance, Google’s DeepMind team has published their initial efforts to build algorithm-creating systems that it calls “Neural Turing Machines”; Facebook showed off a “generic” 3D feature for analyzing videos; and Microsoft researchers concluded that quantum computing could prove a boon for certain types of deep learning algorithms.

Shivon Zilis has put together an infographic showing what she calls the Machine Intelligence Landscape of companies involved with deep learning.

Machine Intelligence Landscape
Image Source - Shivon Zilis
Deep learning advances the state of the art in pattern recognition and natural language processing, but its core feature is that it acquires generalized representations grounded in experience.

The next frontier will be building machines that can represent the deepest conceptual structures of our minds, such as what a container is, and can use that ability to understand abstract concepts through metaphor. "If we can make it all the way down so that computers have a grounded understanding of the most fundamental concepts, we will have built an intelligence that is as flexible as our own," writes Jonathan Mugan.

With this growing exposure, deep learning experts are in high demand, and an ever-wider range of applications for the technology is being introduced.

Moreover, while a lot of information has come out lately about how organizations are using deep learning, a lot of the work is behind the scenes or not released to the public.  Some of these developments may turn out to have the greatest impact in the future.  For many organizations, deep learning is about much more than tagging images.

In short, deep learning is moving us one step into the future and a giant leap closer to artificial general intelligence. Like it or not, what deep learning really means is that we are close to living with smart machines.


By 33rd Square