Filling-in from the new point of view about vision

Clearly there is a difference between the more “active” filling-in phenomenon observed in illusory contours, for example, and the more implicit filling-in observed in amodal completion. It therefore seems reasonable to postulate some kind of active mechanism in the illusory contour case, and there is ample evidence that neural mechanisms are involved[1].

But I want to claim that such mechanisms are not really “filling-in” mechanisms. If they were, we would be back to the problematic homuncular view of seeing, since the purpose of these mechanisms would then be to re-create a perfected image of the world that can subsequently be projected onto an internal screen and contemplated by a homunculus.

Instead I would rather say that apparently active filling-in mechanisms are the product of the way the brain extracts information about the outside world. This can be understood in the following way.

What the brain does when we see is to probe the environment, checking whether this or that thing is present. To do this, the brain uses a battery of neural processing widgets or "filters" that it applies to the sensory input. Such filters may have arisen in our brains over the course of evolution, and they may be further tuned by our everyday interaction with the world.

It seems likely that what the visual system must do in order to tune its filters to provide information about the outside world is to study the statistics of the incoming neural events, and determine which among these events are “interesting” or “useful”, and which are “irrelevant” or “noise”. Possibly a good way of defining “interesting” events is to pinpoint events which are unexpected[2]. To do this the visual system must first figure out what generally happens, and then check whether what actually is happening is different.

Suppose one night you looked up into the starry sky and saw a line of stars all arranged in a perfectly straight line, or, say, in the shape of the head of Snoopy. You would be very surprised, and you might conclude that an extraterrestrial army was approaching the earth or that some far-off intelligence was trying to signal to you.

Similarly, if you were to take a computer and generate displays consisting of millions of random pixels, it would be very unlikely that you would find cases where pixels are all lined up to form a line or a well-defined edge or border.
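
The point about random pixel displays can be checked with a small simulation. What follows is only a sketch under stated assumptions: the 32 by 32 display size, the 50% chance that a pixel is lit, and the function names are all choices made for this illustration, not anything fixed by the argument.

```python
import random

def random_display(width, height, rng):
    """Return a height x width grid of pixels, each lit (1) or dark (0) at random."""
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]

def longest_horizontal_run(display):
    """Length of the longest unbroken horizontal run of lit pixels."""
    best = 0
    for row in display:
        run = 0
        for pixel in row:
            run = run + 1 if pixel else 0
            best = max(best, run)
    return best

def prob_full_row(width, height, p=0.5):
    """Chance that at least one of `height` rows is lit across its whole width."""
    return 1 - (1 - p ** width) ** height

rng = random.Random(0)  # fixed seed so the experiment is repeatable
runs = [longest_horizontal_run(random_display(32, 32, rng)) for _ in range(200)]
print("longest run seen in 200 random displays:", max(runs))
print("chance of a full-width line per display:", prob_full_row(32, 32))
```

Even in this tiny display the chance of a full-width line is a few in a billion, and it shrinks rapidly as the display grows. A line that does turn up is therefore strong evidence that something other than chance produced it.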

So when the visual system detects that lines and borders are things that actually happen quite frequently (but not too frequently -- since that would not provide any information, nor too rarely -- since that would not be worth watching out for), it makes sense for it to take notice, and to consider that lines and borders might be convenient building blocks or might constitute a useful visual vocabulary in which to describe the outside world. Defining lines and borders as the filters with which to analyze the world would appear to be a useful thing. The neural hardware in the visual system would then tune itself so that it tends to code outside information in terms of these building blocks or visual vocabulary items.
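
One possible way to make “not too frequent, not too rare” concrete is borrowed from information theory. This is an assumption of the sketch below, not a claim about what the visual system actually computes. An event that occurs with probability p carries a surprise of -log2(p) bits each time it happens, so the information it delivers on average is p times that surprise. The function names are hypothetical.

```python
import math

def surprisal(p):
    """Bits of surprise carried by one occurrence of an event with probability p."""
    return -math.log2(p)

def expected_information(p):
    """Average information per observation contributed by that event."""
    return p * surprisal(p)

for p in (0.001, 0.1, 1 / math.e, 0.9, 0.999):
    print(f"p = {p:.3f}  surprise = {surprisal(p):6.3f} bits  "
          f"average yield = {expected_information(p):.3f} bits")
```

Under this measure very rare events are surprising but contribute little on average, very frequent events contribute little because they are barely surprising, and the yield is greatest at an intermediate frequency (p = 1/e, about 0.37).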

And indeed, we know from the work that earned David Hubel and Torsten Wiesel the Nobel Prize in 1981 that the first stages of the visual system do contain detectors that respond to oriented lines and borders, among other basic building blocks.

Later, when the visual system applies its filters to the world and is confronted with cues that are compatible with the presence of lines and borders, the filters will signal that there are lines and borders out there. This will happen even if the cues are only partially present, or even sometimes when in fact there are no lines or borders at all. Such a situation might occur when a Kanizsa or an Ehrenstein figure (see the Figure in Chapter 2) tricks the neural hardware into responding in the way that it usually does when there are lines or borders, without there actually being lines or borders present.

A better known example of how the neural hardware can be tricked in this way is cinema.

We all know that the pictures in motion pictures are not in actual motion. Movies consist in sequences of still shots, clicking away in succession at a rate of about 24 frames per second. We see this as smooth motion because the neural hardware that detects smooth motion in the retina actually happens to react in the same way when an object moves smoothly from A to B, as when the object simply jumps from A to B — provided the jump takes place sufficiently quickly, and provided A and B are sufficiently close.

A particularly interesting case is when the object changes its color as it jumps. Say a red square at A is followed in rapid succession by a green square at B. The subjective impression you have is of a red square moving to B, and changing its color to green somewhere in the middle along the way between A and B.

The philosopher Daniel Dennett has devoted considerable discussion to this example in his book Consciousness Explained. What interested Dennett was the philosophical difficulty this finding seems to present: the nervous system doesn’t know in advance that the square is going to change to green when it arrives at B. Thus, how can it generate the perception of changing to green in the middle of the path of motion, during the motion, before the information about what color will occur in the future has been received?

From the philosopher’s point of view there seems to be a problem here. But from the neuroscientist’s point of view there is no problem at all: the neural hardware responds in the same way to the jump from A to B, with the color change, as it would if there were smooth motion from A to B with a color change in the middle. The neural hardware has been tricked. Or rather, the neural hardware provides the same information to the observer as when smooth motion occurs with a change in the middle. The observer thus interprets the scene in that way.

The situation is analogous to what happens on a computer screen: what looks to you like a straight white line on the screen, for example, is actually created by the lighting up of clumps of little red, green and blue pixels (you can see these easily by lightly spitting at the screen, creating small droplets that act like lenses!). Because our eyes use only three photoreceptor types to see colors, the appearance of white can be obtained with only three pixel colors on the screen. Furthermore, provided the pixels are very close to each other, they look like a continuous straight line, even though actually there is only a series of disconnected dots.

A philosopher might want to speculate about how the visual system manages to interpolate a straight white line from a series of disconnected red, blue and green pixels. But a neuroscientist knows that when the pixels are so close together that they can’t be distinguished individually by the eye’s acuity, the neural mechanisms simply respond the same way as if they had been shown a straight white line. Again, the neural hardware has been tricked.
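
The neuroscientist's point can be shown with a toy calculation. The sketch below assumes, purely for illustration, that limited acuity can be modelled as a simple average over three neighbouring dots; real receptors are of course more complicated, and the names used are hypothetical.

```python
from fractions import Fraction

RED, GREEN, BLUE = (1, 0, 0), (0, 1, 0), (0, 0, 1)

def pooled_response(subpixels, window=3):
    """Average the light in each channel over every run of `window` neighbours.

    The box average stands in for an eye whose acuity is too coarse to
    resolve individual dots (an assumption of this sketch).
    """
    responses = []
    for start in range(len(subpixels) - window + 1):
        group = subpixels[start:start + window]
        responses.append(tuple(sum(Fraction(s[c]) for s in group) / window
                               for c in range(3)))
    return responses

# A "line" made of separate red, green and blue dots...
dotted_line = [RED, GREEN, BLUE] * 4
# ...and a line that is uniformly white, emitting the same total light.
third = Fraction(1, 3)
white_line = [(third, third, third)] * 12

print(pooled_response(dotted_line) == pooled_response(white_line))  # prints True
```

Once the dots are pooled, the two stimuli produce exactly the same response, so nothing downstream could tell them apart. With `window=1`, that is, with acuity fine enough to resolve each dot, the responses differ.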

Coming back to the question of filling-in of the blind spot or of virtual contours, we see that this way of thinking is quite different from an account in terms of “generating” borders and edges, or filling in a white triangle floating above the page. The account is different because nothing is being generated. Neural hardware which normally signals borders and edges is being stimulated (even if in fact there are none there), and this provides evidence in favor of there being borders and edges, and that is what we see.

To summarize what we’ve said about filling-in the blind spot: Under the new view of seeing being suggested here, there is no need for filling-in, because there is no internal screen to be filled in. Vision involves probing the outside world, which has no holes, with a tool (the retina) which has its defects. But the tool cannot perceive its own defects, any more than a ruler can be used to measure its own length.

Thus the fact that the retina has a blind spot and a vascular scotoma should not interfere with the feeling of wholeness you have when you look at visual scenes. Seeing consists in exploring the world with a tool, namely the retina. If the tool has defects, seeing the world becomes more difficult, but we do not confuse the defects of the tool with defects in the world.

On the other hand, there clearly are phenomena in vision, like virtual contours and amodal completion, which people have been tempted to interpret as involving some kind of filling-in. However, I claim that such phenomena are better considered as the result of basic information-extraction processes in the visual system, whose purpose is to extract regularities in the environment. These processes can, under certain circumstances, be tricked into signaling the presence of information that is in fact not present in the environment. When that happens we see lines where there are no lines, brightness changes where there are no brightness changes, and extended surfaces where there are no surfaces, for example.

Deformations and cortical magnification

Why do retinal distortions and cortical magnification not affect our perception of the geometry of the world?

To understand this better, let’s set up a little thought experiment. Let’s take a bundle of optical fibers—those hair-thin, flexible glass fibers that are used nowadays in telecommunications to transmit telephone, television and internet channels, and in medicine to do endoscopic examinations. Sometimes you see such optical fibers in decorative lamps making hundreds of little colored specks of light at their ends. Light going in one end of the fiber is trapped and carried along the fiber. It makes a speck of light at the other end where it comes out.

Imagine you hold the bunch of fibers on a sheet of paper with marks on it, and look at the pattern you get at the other end. Instead of keeping the fibers in order, however, assume they’ve been mixed up, so that the pattern you see at the other end is a mess.

Is there any way you can figure out whether there is a straight line on the paper?

The answer is yes. If you move the bundle along the straight line, the pattern you see at the other end of the fiber bundle does not change. But if you move the bundle across the straight line, the pattern changes dramatically. Note that this is true even though the optical fibers are all mixed up.
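
The thought experiment is easy to simulate. The sketch below makes several simplifying assumptions that are not part of the original description: the page carries a single horizontal line, the bundle is a 5 by 5 grid of fibers, and the scrambling is a random shuffle of the fiber order. All names are hypothetical.

```python
import random

def page(x, y):
    """Hypothetical page: an ink line running along y = 0, blank elsewhere."""
    return 1 if abs(y) < 0.5 else 0

def make_scrambled_bundle(radius=2, seed=0):
    """Fiber positions on a small grid, listed in a scrambled order."""
    offsets = [(dx, dy)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)]
    random.Random(seed).shuffle(offsets)  # the far end no longer preserves layout
    return offsets

def read_bundle(bundle, cx, cy):
    """What each fiber reports when the bundle is centred at (cx, cy)."""
    return [page(cx + dx, cy + dy) for dx, dy in bundle]

def change(before, after):
    """Number of fibers whose output differs between two readings."""
    return sum(1 for u, v in zip(before, after) if u != v)

bundle = make_scrambled_bundle()
start = read_bundle(bundle, 0, 0)
along = change(start, read_bundle(bundle, 3, 0))   # slide along the line
across = change(start, read_bundle(bundle, 0, 1))  # step across the line
print(along, across)  # prints: 0 10
```

Sliding along the line leaves every fiber's output unchanged, while a single step across it alters ten of the twenty-five outputs. Because scrambling only reorders the fibers, a different shuffle gives the same two counts, which is why the test works without knowing how the fibers are arranged.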

Now let’s look at what this means for the eye. The eye samples the retinal image very much as the bundle of optical fibers sampled the page. Like the optical fibers, the nerve fibers transmit information about specks of light on the retina into the visual cortex of the brain. But, as we have seen, the representation you get in the visual cortex is highly distorted. First, there is the distortion caused by the lens’s aberrations and by the curvature of the eyeball. Then there are distortions from the non-uniform way the retina samples, as well as from the phenomenon of cortical magnification, in which foveal information is over-represented in the visual cortex as compared to peripheral information. All this means that you would be hard put to know whether what is received in the cortex corresponds to a straight line out in the world or not.

But if we move the eye a bit, and look at the changes that this causes in the cortical representation, just as with the optical fiber, we can study the laws relating eye motions to resulting changes in the pattern of excitation. Certain such laws are characteristic of straight lines: when the eye moves along a straight line, there will be little change in the cortical excitation pattern. When it moves across a line, there will be a large change[3].

And note that it doesn’t matter if the optics of the eye produce horrendous distortions, with the image totally bent or exploded. And it doesn’t matter that the cortical representation is distorted. The retina doesn’t have to be flat: in fact it can be any shape, even corrugated. If the eye moves along a straight line, there will be little change in the excitation pattern.

 



[1] See the review by Pessoa et al., 1998; and Komatsu, H. (2006) The neural mechanisms of perceptual filling-in. Nature Reviews Neuroscience, 7:220-231.

[2] I first came across such ideas in the early, nowadays less well known but extremely interesting, work of David Marr on archicortex as an account of long-term memory (Marr, D. (1971) Simple memory: a theory for archicortex. Phil. Trans. Royal Soc. London, 262:23-81). Today the idea that the visual system extracts statistical regularities is a very fertile field of research, extensively developed by Barlow (1990), among many others. What I think is particularly interesting is that the visual system should extract events which are neither highly frequent, nor completely rare. Highly frequent events, because they occur all the time, cannot convey very much interesting information. Highly rare events, though undoubtedly interesting, don't merit your wasting too much time on them, because they occur so rarely. Only events of just-right in-between frequency are what we need to concentrate on.

[3] This idea was suggested by Platt, J.R. (1960) How we see straight lines. Scientific American, June, 121-129, and further developed by Koenderink, J.J. (1984) The concept of local sign. Pp. 495-547 in A.J. van Doorn, W.A. van de Grind and J.J. Koenderink (eds.), Limits in Perception: Essays in Honour of Maarten A. Bouman. VSP, 568 pp.