Carney Institute (CI): Tell us a bit about yourself.
Michael Lepori (ML): I grew up outside of New York City and went to college at Johns Hopkins University, where I studied physics and computer science. I got into research early, mostly taking part in projects at the intersection of machine learning, cognition, perception, and linguistics.
I graduated in the middle of the pandemic and took a position with a government contractor. A few months in, a friend of mine reached out about doing some machine learning for a start-up he was getting off the ground. I started working with him just a few hours a week and soon joined full time, building a platform where students could hold synchronous, text-based discussions designed to foster constructive academic conversation. It was a clean fit: I could take the things I had learned in class and in research and apply them straightforwardly. After that, I briefly joined another, larger start-up doing natural language processing before coming to Brown.
CI: What are you focusing on here at Brown?
ML: Working with professors Thomas Serre and Ellie Pavlick, I’m investigating whether neural network representations are structured in the same way as we think human cognitive representations are structured. We have this hypothesis about how humans might solve a complex task or represent a complex property, but do models do what we think humans do?
At least one theory of human cognition holds that we understand things compositionally - we perceive individual properties and then compose them in an almost algebraic way to give the correct result. Imagine you were trying to detect a blue circle. Mentally, you decompose the image into a color property - blue - and a shape property - a circle - and then merge them into “blue circle.” But to a neural network model, it's just a bunch of pixels. The model doesn’t see objects and scenes, just a big matrix of numbers. Because of this, the model might look at that bunch of pixels and go straight to “blue circle” without understanding color or shape independently.
However, if it were implementing a compositional system, it might extract the color feature and the shape feature independently and then compose them, either giving you your blue circle or telling you it's not a blue circle at all. This representational question - whether models extract color and shape and apply them flexibly - is well studied for neural networks, but mostly in terms of behavior.
Models tend to fail at generalization tasks of this nature. Give them a red circle or blue square in this example and they’re lost, so people conclude that models can’t understand things compositionally. But compositionality is really about how the representations are structured. I want to find out if models can learn representations that could eventually help them overcome this behavioral handicap.
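The distinction can be made concrete with a toy sketch. This is purely illustrative Python of my own (the function names and dictionary representation are assumptions, not anything from the models under study): a compositional detector reuses independent color and shape extractors, while a holistic one maps the whole input to an answer with no separable parts.

```python
# Hypothetical toy sketch - two ways a "blue circle" detector
# could be organized internally. Names are illustrative only.

def color_of(obj):
    """Extract the color property on its own."""
    return obj["color"]

def shape_of(obj):
    """Extract the shape property on its own."""
    return obj["shape"]

def compositional_detector(obj):
    # Compose independently extracted properties, algebra-style.
    return color_of(obj) == "blue" and shape_of(obj) == "circle"

def holistic_detector(obj, memorized=frozenset({("blue", "circle")})):
    # Map the whole input straight to a label; no reusable notion of
    # color or shape exists inside this function.
    return (obj["color"], obj["shape"]) in memorized

blue_circle = {"color": "blue", "shape": "circle"}
red_circle = {"color": "red", "shape": "circle"}
```

The two detectors agree on the training data, but only the compositional one has color and shape as reusable pieces that could generalize to novel combinations.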
CI: Is this analogous to how children and babies learn to recognize and process new inputs and stimuli?
ML: Compositional generalization is a known property of children and adults. One place you can see it is in natural language, which we understand largely because it is compositional. You could say, “I'm drinking a cup of coffee,” and then extend it freely using the rules of English syntax, so the sentence becomes “I'm drinking a cup of coffee in the room near the light with Mary.” The meaning of the sentence I'm building is composed of the meanings of the individual phrases within it. One way to study human compositional behavior is to track the acquisition of language. We acquire language and comprehension pretty early, so there's certainly a set of compositional behaviors that are endemic to human cognition.
It's crazy that compositionality recurs as a unifying thread throughout various aspects of cognition, like vision, language, and reasoning. That's why it's really cool to have Thomas, the visual neuroscientist, and Ellie, the computational linguist, as my advisors, supporting work that is inherently cross-modal.
CI: You recently published a preprint article with professors Serre and Pavlick with the intriguing title of “NeuroSurgeon: A Toolkit for Subnetwork Analysis.” Tell us more about that.
ML: NeuroSurgeon is a package that I implemented in the programming language Python. It’s meant to be an off-the-shelf tool that helps users find functional subunits within a neural network. It borrows techniques from the model pruning literature, which focuses on compressing models as much as possible by reducing the number of parameters. For my work, NeuroSurgeon helps figure out whether your neural network has self-organized in any interesting way during training. If you’re trying to understand where GPT-2 computes a particular linguistic dependency, for example, NeuroSurgeon can help you uncover the subnetwork that computes that dependency.
So, in our “blue circle” detector example, we have these two properties that are mutually independent - the shape computation and the color computation - and they can be applied separately. NeuroSurgeon would allow us to see whether there is one subnetwork that computes shape and another subnetwork that computes color. If the two subnetworks are distinct, then this suggests that the model is representing the image in a compositional way.
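The subnetwork idea above can be sketched in a few lines. To be clear, this is not the NeuroSurgeon API - it is a hand-built illustration, with made-up weight names, of how a binary mask over parameters picks out the piece of a network that suffices for one computation.

```python
# Minimal sketch of subnetwork analysis via masking. NOT the real
# NeuroSurgeon API; weights and masks here are invented for illustration.

def apply_mask(weights, mask):
    """Zero out every parameter the mask excludes."""
    return {name: w if mask.get(name) else 0.0
            for name, w in weights.items()}

# Pretend these parameters were learned during training.
weights = {"u1": 0.9, "u2": 1.1, "u3": 0.2}

# Suppose a mask search finds that {u1, u3} suffices for the color
# judgment and {u2, u3} suffices for the shape judgment.
color_mask = {"u1": 1, "u3": 1}
shape_mask = {"u2": 1, "u3": 1}

color_subnet = apply_mask(weights, color_mask)
shape_subnet = apply_mask(weights, shape_mask)

# Small overlap between the two masks would suggest the model has
# factored color and shape into largely distinct subnetworks.
overlap = set(color_mask) & set(shape_mask)
```

If the recovered masks were instead nearly identical, the color and shape computations would be entangled in the same parameters, which is the non-compositional outcome.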
CI: Are there other projects on your immediate horizon?
ML: I have a few ideas in the works, but one project that we're concretely working on right now is about the “same-different” abstract relation that neural networks seem to learn. Imagine that you have an image with two items in it. These items can take one of 16 different shapes, 16 different colors and 16 different textures. Our goal is to train the model to determine whether two objects are the same or different. Then we’re going to recover the algorithm that the neural network implements to make this judgment.
This will tell us something about the features that the network has chosen to be important to this judgment: be it shape, color or texture. It will also tell us how different models represent abstract relations like this. Does a network trained with linguistic input use a different algorithm for same-different judgements than a network trained with just visual input? This work adapts techniques developed in the mechanistic interpretability literature from natural language processing to answer cognitively-interesting questions in vision.
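The setup above has a simple reference form. As a hedged sketch (my own simplification, assuming each object reduces to a shape/color/texture triple with 16 values per feature), a transparent version of the same-different judgment looks like this:

```python
import random

# Toy version of the same-different task: each object is a
# (shape, color, texture) triple, 16 possible values per feature.
N_VALUES = 16

def sample_object(rng):
    """Draw a random object as a (shape, color, texture) triple."""
    return tuple(rng.randrange(N_VALUES) for _ in range(3))

def same_different(a, b, features=(0, 1, 2)):
    # Transparent reference algorithm: "same" only if every attended
    # feature matches. The interpretability question is which features
    # a trained network actually attends to - all three, or a subset.
    return all(a[f] == b[f] for f in features)
```

A network that only attends to shape, for instance, would behave like `same_different(a, b, features=(0,))`, calling two objects “same” whenever their shapes match, regardless of color or texture.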
CI: Looking ahead, where do you see the field of artificial intelligence going in five years? 10 years?
ML: Ten years in computer science is like a lifetime. If you fast-forward a decade, it would be great to get human-interpretable algorithms directly from a model: a system that a human could feed problems we can’t solve, and that wouldn't simply solve the unknown but would give you an approximation of how it did so, essentially “showing its work.” For example, linguists have been arguing about the structure of English syntax for decades. Imagine we had a hypothesis generator that spits out, “Oh, here's what I do. Here's how I do syntax,” and we could see how that compares to our methods. Does that stack up against our hypotheses and theories? What might change? How might we compare humans to machines in a much deeper algorithmic way? I think that'd be amazing.