PROVIDENCE, R.I. [Brown University] — Why is it that artificial intelligence systems can outperform humans on some visual tasks, like facial recognition, but make egregious errors on others — such as classifying an image of an astronaut as a shovel?
Like the human brain, AI systems rely on strategies for processing and classifying images. And as with the human brain, little is known about the precise nature of those processes. Scientists at Brown University’s Carney Institute for Brain Science are making strides in understanding both systems, publishing a recent paper that helps explain computer vision in a way the researchers say is both more accessible and more useful than previous models.
“Both the human brain and the deep neural networks that power AI systems are referred to as black boxes because we don’t know exactly what goes on inside,” said Thomas Serre, a Brown professor of cognitive, linguistic and psychological sciences and computer science. “The work that we do at Carney’s Center for Computational Brain Science is trying to understand and characterize brain mechanisms related to learning, vision, and all kinds of things, and highlighting the similarities and differences with AI systems.”
Deep neural networks use learning algorithms to process images, Serre said. They are trained on massive sets of data, such as ImageNet, which has over a million images culled from the web and organized into thousands of object categories. The training mainly involves feeding data to the AI system, he explained.
“We don’t tell AI systems how to process images — for example, what information to extract from the images to be able to classify them,” Serre said. “The AI system discovers its own strategy. Then computer scientists evaluate the accuracy of what they do after they’ve been trained — for example, maybe the system achieves 90% accuracy on discriminating between a thousand image categories.”
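The evaluation step Serre describes — scoring a trained system against known labels — boils down to comparing predicted and true categories. A minimal sketch, with hypothetical labels not tied to any particular model:

```python
def top1_accuracy(predicted, true):
    """Fraction of images whose predicted category matches the true label."""
    correct = sum(p == t for p, t in zip(predicted, true))
    return correct / len(true)

# Hypothetical predictions over five images (labels invented for illustration).
predicted = ["tench", "shovel", "tench", "soccer ball", "tench"]
true      = ["tench", "astronaut", "tench", "soccer ball", "shovel"]
print(top1_accuracy(predicted, true))  # → 0.6
```

A real benchmark like ImageNet applies the same comparison across tens of thousands of held-out images and a thousand categories, but the arithmetic is identical.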
Serre collaborated with Brown Ph.D. candidate Thomas Fel and other computer scientists to develop a tool that allows users to pry open the lid of the black box of deep neural networks and illuminate what types of strategies AI systems use to process images. The project, called CRAFT — for Concept Recursive Activation FacTorization for Explainability — was a joint project with the Artificial and Natural Intelligence Toulouse Institute, where Fel is currently based. It was presented this month at the IEEE/CVF Conference on Computer Vision and Pattern Recognition in Vancouver, Canada.
Serre shared how CRAFT reveals how AI “sees” images and explained the crucial importance of understanding how the computer vision system differs from the human one.
Q. What does CRAFT show about the way AI processes images?
CRAFT provides an interpretation of the complex and high-dimensional visual representations of objects learned by neural networks, leveraging modern machine learning tools to make them more understandable to humans. This leads to a representation of the key visual concepts used by neural networks to classify objects. As an example, let’s think about a type of freshwater fish called a tench. We built a website that allows people to browse and visualize these concepts. Using the website, one can see that the AI system’s concept of a tench includes sets of fish fins, heads, tails, eyeballs and more.
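The core idea behind this kind of concept extraction — decomposing a network’s internal activations into a small set of recurring, non-negative parts — can be illustrated in simplified form with non-negative matrix factorization. This is a toy sketch of the general technique, not the authors’ actual CRAFT implementation; the matrix sizes and “concepts” below are invented for illustration:

```python
import numpy as np

def nmf(A, n_concepts, n_iter=500, eps=1e-9):
    """Factor a non-negative activation matrix A (images x features) into
    U (images x concepts) and W (concepts x features) using the classic
    Lee & Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    n, d = A.shape
    U = rng.random((n, n_concepts))
    W = rng.random((n_concepts, d))
    for _ in range(n_iter):
        W *= (U.T @ A) / (U.T @ U @ W + eps)   # update concept patterns
        U *= (A @ W.T) / (U @ W @ W.T + eps)   # update per-image concept use
    return U, W

# Toy "activations": two hidden parts (think fin-like vs. head-like
# feature groups) mixed in varying proportions across 100 fake images.
rng = np.random.default_rng(1)
parts = np.array([[1.0, 1.0, 0.0, 0.0],    # concept 1 drives features 0-1
                  [0.0, 0.0, 1.0, 1.0]])   # concept 2 drives features 2-3
A = rng.random((100, 2)) @ parts
U, W = nmf(A, n_concepts=2)
rel_err = np.linalg.norm(A - U @ W) / np.linalg.norm(A)
print(f"reconstruction error: {rel_err:.4f}")
```

Each row of `W` is a candidate “concept” (a bundle of features that fire together), and each row of `U` says how strongly each image uses each concept — the same non-negativity that makes the decomposition read as a sum of parts (fins, heads, tails) rather than as arbitrary signed combinations.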
These concepts also reveal that deep networks sometimes pick up on biases in datasets. One of the concepts associated with the tench, for example, is the face of a white male, because there are many photos online of sports fishermen holding fish that look like tench. (Yet the system can still distinguish a man from a fish.) In another example, the predominant concept associated with a soccer ball in neural networks is the presence of soccer players on the field. This is likely because the majority of internet images featuring soccer balls also include individual players rather than solely the ball itself.