Clearly there is a difference between the more "active" filling-in phenomenon
observed in illusory contours, for example, and the more implicit filling-in
observed in amodal completion. It therefore seems reasonable to postulate some
kind of active mechanism in the illusory contour case, and there is ample
evidence that neural mechanisms are involved[1].
But I want to claim that such mechanisms are not really "filling-in"
mechanisms. If they were, this would bring us back to the problematic
homuncular view of seeing, since it would suggest that the purpose of these
mechanisms is to re-create a perfected image of the world that can
subsequently be projected onto an internal screen and contemplated by a
homunculus.
Instead I
would rather say that apparently active filling-in mechanisms are the product
of the way the brain extracts information about the outside world. This can be
understood in the following way.
What the
brain does when we see is to probe the environment, checking whether this or
that thing is present. To do this, the brain uses a battery of neural
processing widgets or "filters" that it applies to the sensory input.
Such filters may have arisen in our brains over the course of evolution, and
they may be further tuned by our everyday interaction with the world.
It seems
likely that what the visual system must do in order to tune its filters to
provide information about the outside world is to study the statistics of the
incoming neural events, and determine which among these events are
"interesting" or "useful", and which are "irrelevant" or "noise". Possibly a
good way of defining "interesting" events is to pinpoint events which are
unexpected[2].
To do this the visual system must first figure out what generally happens, and
then check whether what is actually happening is different.
Suppose
one night you looked up into the starry sky and saw a line of stars all
arranged in a perfectly straight line, or, say, in the shape of the head of
Snoopy. You would be very surprised, and you might conclude that an
extraterrestrial army was approaching the earth or that some far-off
intelligence was trying to signal to you.
Similarly,
if you were to take a computer and generate displays consisting of millions of
random pixels, it would be very unlikely that you would find cases where pixels
are all lined up to form a line or a well-defined edge or border.
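This intuition can be checked with a rough calculation. The sketch below is
only an illustration under assumptions that are not part of the argument
itself: each pixel is taken to be on or off independently with probability one
half, a "line" is taken to mean a horizontal run of lit pixels, and the
function names are invented for the example.

```python
import random


def longest_run(bits):
    """Length of the longest run of consecutive 1s in a sequence of 0/1 values."""
    best = current = 0
    for bit in bits:
        current = current + 1 if bit else 0
        best = max(best, current)
    return best


def expected_line_count(height, width, run_length, p_on=0.5):
    """Expected number of horizontal windows of `run_length` pixels that are all lit.

    Assumes independent pixels, each lit with probability `p_on`.
    """
    windows_per_row = max(width - run_length + 1, 0)
    return height * windows_per_row * p_on ** run_length


def random_display(height, width, rng):
    """A display of independent random on/off pixels (illustrative assumption)."""
    return [[rng.getrandbits(1) for _ in range(width)] for _ in range(height)]


rng = random.Random(0)
display = random_display(200, 200, rng)
longest = max(longest_run(row) for row in display)

print("longest horizontal run in a 200 x 200 random display:", longest)
print("expected number of 30-pixel lines in a 1000 x 1000 display:",
      expected_line_count(1000, 1000, 30))
```

Under these assumptions the expected number of 30-pixel lines in a million-pixel
display is far below one, so an orderly line that does turn up is a strong hint
that something other than chance is at work.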
So when
the visual system detects that lines and borders are things that actually
happen quite frequently (but not too frequently -- since that would not provide
any information, nor too rarely -- since that would not be worth watching out
for), it makes sense for it to take notice, and to consider that lines and
borders might be convenient building blocks or might constitute a useful visual
vocabulary in which to describe the outside world. Defining lines and borders
as the filters with which to analyze the world would appear to be a useful
thing. The neural hardware in the visual system would then tune itself so that
it tends to code outside information in terms of these building blocks or
visual vocabulary items.
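One possible way of making the "not too frequent, not too rare" idea concrete,
offered here purely as an assumption and not as the mechanism the visual system
actually uses, is to weigh how surprising an event is (its surprisal, -log2 p)
by how often it occurs (p). The function names below are invented for the
illustration.

```python
import math


def surprisal_bits(p):
    """Information carried by a single occurrence of an event of probability p."""
    return -math.log2(p)


def expected_information_bits(p):
    """Surprisal weighted by how often the event occurs: p * -log2(p)."""
    return p * surprisal_bits(p)


# A few example event probabilities, from very rare to almost constant.
probabilities = [0.001, 0.01, 0.1, 1 / math.e, 0.9, 0.999]

for p in probabilities:
    print(f"p = {p:.3f}  surprisal = {surprisal_bits(p):6.3f} bits  "
          f"weighted = {expected_information_bits(p):.4f} bits")

most_worthwhile = max(probabilities, key=expected_information_bits)
```

On this measure very rare events score low because they almost never happen,
and near-constant events score low because each occurrence tells you almost
nothing. The weighted value peaks at an intermediate probability (p = 1/e for
this particular measure).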
And indeed, we know from the work that earned David Hubel and Torsten Wiesel
the Nobel Prize in 1981 that the first stages of the visual system contain
detectors that respond to oriented lines and borders, among other basic
building blocks.
Later, when the visual system applies its filters to the world and is
confronted with cues that are compatible with the presence of lines and
borders, the filters will signal that there are lines and borders out there.
This will happen even if the cues are only partially present, and sometimes
even when there are in fact no lines or borders at all. Such a situation
might occur when a Kanizsa or an Ehrenstein figure (see the Figure in Chapter
2) tricks the neural hardware into responding in the way that it usually does
when there are lines or borders, without there actually being lines or borders
present.
A better
known example of how the neural hardware can be tricked in this way is cinema.
We all know that the pictures in motion pictures are not actually in motion.
Movies consist of sequences of still shots, clicking away in succession at a
rate of about 24 frames per second. We see this as smooth motion because the
neural hardware that detects smooth motion in the retina happens to react in
the same way when an object moves smoothly from A to B as when the object
simply jumps from A to B, provided the jump takes place sufficiently quickly,
and provided A and B are sufficiently close.
A
particularly interesting case is when the object changes its color as it jumps.
Say a red square at A is followed in rapid succession by a green square at B.
The subjective impression you have is of a red square moving to B, and changing
its color to green somewhere in the middle of the path between A and B.
The philosopher Daniel Dennett has devoted considerable discussion to this
example in his book Consciousness Explained. What interested Dennett was the
philosophical difficulty this finding seems to present: the nervous system
doesn't know in advance that the square is going to change to green when it
arrives at B. How, then, can it generate the perception of a change to green
in the middle of the path of motion, before the information about the
upcoming color has been received?
From the philosopher's point of view there seems to be a problem here. But from
the neuroscientist's point of view there is no problem at all: the neural
hardware responds in the
same way to the jump from A to B, with the color change, as it would if there
were smooth motion from A to B with a color change in the middle. The neural
hardware has been tricked. Or rather, the neural hardware provides the same
information to the observer as when smooth motion occurs with a change in the
middle. The observer thus interprets the scene in that way.
The
situation is analogous to what happens on a computer screen: what looks to you
like a straight white line on the screen, for example, is actually created by
the lighting up of clumps of little red, green and blue pixels (you can see
these easily by lightly spitting at the screen, creating small droplets that
act like lenses!). Because our eyes use only three photoreceptor types to see
colors, the appearance of white can be obtained with only three pixel colors on
the screen. Furthermore, provided the pixels are very close to each other,
they look like a continuous straight line, even though there is actually only
a series of disconnected dots.
A
philosopher might want to speculate about how the visual system manages to
interpolate a straight white line from a series of disconnected red, blue and
green pixels. But a neuroscientist knows that when the pixels are so close
together that they can't be distinguished individually by the eye's acuity, the
neural mechanisms simply respond the same way as if they had been shown a
straight white line. Again, the neural hardware has been tricked.
Coming back to the question of filling-in of the blind spot or of virtual
contours, we see that this way of thinking is quite different from an account
in terms of "generating" borders and edges, or filling in a white triangle
floating above the page. The account is different because nothing is being
generated.
Neural hardware which normally signals borders and edges is being stimulated
(even if in fact there are none there), and this provides evidence in favor of
there being borders and edges, and that is what we see.
To summarize what we've said about filling-in of the blind spot: under the new
view of seeing being suggested here, there is no need for filling-in, because
there is no internal screen to be filled in. Vision involves probing the
outside world, which has no holes, with a tool (the retina) that has defects.
These defects cannot themselves be perceived, any more than a ruler can be
used to measure its own length.
Thus the
fact that the retina has a blind spot and a vascular scotoma should not
interfere with the feeling of wholeness you have when you look at visual
scenes. Seeing consists in exploring the world with a tool, namely the retina.
If the tool has defects, seeing the world becomes more difficult, but we do not
confuse the defects of the tool with defects in the world.
On the other hand, there clearly are phenomena in vision, like virtual
contours and amodal completion, which people have been tempted to interpret as
involving some kind of filling-in. However, I claim that such phenomena are
better considered as the result of basic information-extraction processes in
the visual system, whose purpose is to extract regularities in the
environment. These processes can, under certain circumstances, be tricked into
signaling the presence of information that is in fact not present in the
environment. When that happens we see, for example, lines where there are no
lines, brightness changes where there are no brightness changes, and extended
surfaces where there are no surfaces.
Why do
retinal distortions and cortical magnification not affect our perception of the
geometry of the world?
To understand this better, let's set up a little thought experiment. Let's
take a bundle of optical fibers: those hair-thin, flexible glass fibers that
are used nowadays in telecommunications to transmit telephone, television and
internet channels, and in medicine to do endoscopic examinations. Sometimes you
see such optical fibers in decorative lamps making hundreds of little colored
specks of light at their ends. Light going in one end of the fiber is trapped
and carried along the fiber. It makes a speck of light at the other end where
it comes out.
Imagine you hold one end of the bunch of fibers on a sheet of paper with marks
on it, and look at the pattern you get at the other end. Instead of keeping
the fibers in order, however, assume they've been mixed up, so that the
pattern you see at the other end is a mess.
Is there
any way you can figure out whether there is a straight line on the paper?
The
answer is yes. If you move the bundle along the straight line, the pattern you see at the
other end of the fiber bundle does not change. But if you move the bundle across the straight line, the pattern
changes dramatically. Note that this is true even though the optical fibers are
all mixed up.
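The thought experiment is easy to simulate. The sketch below is a toy version
under assumptions of my own choosing: the paper carries a single horizontal
line along y = 0, each fiber reports whether its input end sits on ink, and the
fibers are listed in scrambled order so that the position of an output says
nothing about where its input is. All names are invented for the illustration.

```python
import random


def paper(x, y, half_width=0.5):
    """Ink on the page: 1.0 on a horizontal straight line along y = 0, else 0.0."""
    return 1.0 if abs(y) <= half_width else 0.0


def make_bundle(n_fibers, radius, rng):
    """Input-end positions of the fibers, stored in scrambled output order."""
    fibers = [(rng.uniform(-radius, radius), rng.uniform(-radius, radius))
              for _ in range(n_fibers)]
    rng.shuffle(fibers)  # output order carries no spatial information
    return fibers


def read_out(bundle, cx, cy):
    """Pattern seen at the output end when the bundle is centered at (cx, cy)."""
    return [paper(cx + fx, cy + fy) for fx, fy in bundle]


def change(pattern_a, pattern_b):
    """Total amount by which the output pattern differs between two readings."""
    return sum(abs(a - b) for a, b in zip(pattern_a, pattern_b))


rng = random.Random(1)
bundle = make_bundle(500, 5.0, rng)

start = read_out(bundle, 0.0, 0.0)
along = change(start, read_out(bundle, 1.0, 0.0))   # slide along the line
across = change(start, read_out(bundle, 0.0, 1.0))  # slide across the line

print("change when moving along the line: ", along)
print("change when moving across the line:", across)
```

Moving along the line leaves every fiber's reading untouched, whereas moving
across it flips many of them, and the comparison never needs to know how the
fibers are arranged.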
Now let's look at what this means for the eye. The eye samples the retinal
image very much as the bundle of optical fibers sampled the page. Like the
optical fibers, the nerve fibers transmit information about specks of light on
the retina to the visual cortex of the brain. But, as we have seen, the
representation you get in the visual cortex is highly distorted. First, there
is the distortion caused by the lens's aberrations and by the curvature of the
eyeball. Then there are distortions from the non-uniform way the retina
samples, and from cortical magnification, in which foveal information is
over-represented in the visual cortex compared to peripheral information. All
this means that you would be hard put to know whether what is received in the
cortex corresponds to a straight line out in the world or not.
But if we
move the eye a bit, and look at the changes that this causes in the cortical
representation, just as with the optical fiber, we can study the laws relating
eye motions to resulting changes in the pattern of excitation. Certain such
laws are characteristic of straight lines: when the eye moves along a straight line, there will be
little change in the cortical excitation pattern. When it moves across a line, there will be a large change[3].
And note that it doesn't matter if the optics of the eye produce horrendous
distortions, with the image totally bent or exploded. And it doesn't matter
that the cortical representation is distorted. The retina doesn't have to be
flat: in fact it can be any shape, even corrugated. If the eye moves along a
straight line, there will be little change in the excitation pattern.
[1] See the review by Pessoa et al., 1998; and Komatsu, H. (2006) The neural mechanisms of perceptual filling-in. Nature Reviews Neuroscience, 7:220-231.
[2] I first came across such ideas in the early, nowadays less well known but
extremely interesting, work of David Marr on archicortex as an account of
long-term memory (Marr, D. (1971) Simple memory: a theory for archicortex.
Phil. Trans. Royal Soc. London, 262:23-81). Today the idea that the visual
system extracts statistical regularities is a very fertile field of research,
extensively developed by Barlow (1990), among many
others. What I think is particularly interesting is that the visual system
should extract events which are neither highly frequent, nor completely rare.
Highly frequent events, because they occur all the time, cannot convey very
much interesting information. Highly rare events, though undoubtedly
interesting, don't merit your wasting too much time on them, because they occur
so rarely. Only events of just-right in-between frequency are what we need to
concentrate on.
[3] This idea was suggested by Platt, J.R. (1960) How we see straight lines. Scient. Am., June, 121-129, and further developed by Koenderink, J.J. (1984) The concept of local sign. Pp. 495-547 in Limits in Perception: Essays in Honour of Maarten A. Bouman (Andrea J. van Doorn, Wim A. van de Grind and Jan J. Koenderink, eds.). VSP, 568 pp.