



Title: The How and Why of What Went Where in Apparent Motion: Modeling Solutions to the Motion Correspondence Problem. By: Michael R. W. Dawson, Psychological Review, 0033-295X, October 1, 1991, Vol. 98, Issue 4
Database: PsycARTICLES
The How and Why of What Went Where in Apparent Motion: Modeling Solutions to the Motion Correspondence Problem


Contents
Motion Correspondence Problem
Part 1: Constraining Solutions to the Motion Correspondence Problem
Nearest Neighbor Principle
Relative Velocity Principle
Element Integrity Principle
The Need for Multiple Constraints
Part 2: A Motion Correspondence Model
Autoassociation and Problems of Underdetermination
Defining Connection Strengths
Nearest neighbor weights.
Relative velocity weights.
Element integrity weights.
Defining the overall connection matrix.
Updating the Network
Convergence Properties of the Network
Stable States Are Optimal Problem Solutions
Interpreting Converged States
Specifying the Starting State
Performance of the Model: Preliminary Remarks
Performance of the Model: Benchmark Displays
Single element translation.
Multiple element translation.
Multiple element rotation.
Nearest neighbor sensitivity.
Context sensitivity.
Motion shear.
Stationary elements.
Necessity of all three constraints.
Problems with "standard" settings.
Part 3: Motion Correspondence and Motion Perception
The Two-Process Distinction in Motion Perception
Cover Principle
Least-Change Transformations
Image-Matching Procedures
Cognitive Penetrability and Motion Correspondence
Cognitive impenetrability of correspondence matches.
Cognitive penetration and element individuation.
Visual attention and tag assignment.
Motion correspondence and tag assignment.
Part 4: Toward the Physiology of Motion Correspondence Processing
The Motion Pathway in Vision
Where in the Motion Pathway Is Correspondence Computed?
The neural substrate must be sensitive to individuated elements.
The neural substrate must be sensitive to element locations.
The neural substrate must have very large receptive fields.
The neural substrate must be involved in tracking objects.
The neural substrate must track elements defined in different sensory modalities.
The neural substrate must mediate attentional processing of tracked elements.
Summary.
Neural Measurements for Motion Correspondence Processing
The nearest neighbor principle.
The relative velocity principle.
Field effects and relative motion.
The element integrity principle.
Alternative Structures for the Motion Correspondence Network
Designing a fixed processing network.
Abandoning vector normalization.
Eliminating massively parallel connections.
Limiting the number of processing units.
Conclusion: The Two-Process Distinction Revisited
Footnotes
References:

By: Michael R. W. Dawson

University of Alberta, Edmonton, Alberta, Canada

This research was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Operating Grant A2038, by Equipment Grant 46584, by a grant from the Central Research Fund of the University of Alberta, and by a Province of Alberta Summer Temporary Employment Program (STEP) grant—all awarded to Michael R. W. Dawson—and by NSERC Operating Grant A2600 awarded to Zenon Pylyshyn. It has benefitted greatly from discussions with Zenon Pylyshyn and Richard Wright at the University of Western Ontario and from discussions with Vince Di Lollo, Walter Bischof, Charles Bourassa, Bill Rozeboom, Don Kuiken, Ngaire Nevin-Meadows, Don Schopflocher, and Nancy Digdon at the University of Alberta. Comments from Dennis Proffitt and an anonymous reviewer on an earlier version of the manuscript were also extremely useful. I would like to acknowledge Brian Harder, who died during the past year. He began work on this research with me and performed numerous simulation runs and prepared many of the figures.

Correspondence may be addressed to: Michael R. W. Dawson, Department of Psychology, University of Alberta, Edmonton, Alberta T6G 2E9 Canada. Electronic mail may be sent to mike@psych.ualberta.ca.

Many researchers have described the goal of visual perception as the construction of useful representations about the world (e.g., Horn, 1986; Marr, 1976, 1982; Ullman, 1979). These representations are derived from the information projected from a three-dimensional visual world (the distal stimulus) onto an essentially two-dimensional surface of light receptors in the eyes. The interpretation of the distal stimulus must be determined from the resulting pattern of retinal stimulation (the proximal stimulus).

However, the information represented in the proximal stimulus cannot, by itself, completely determine the nature of the distal stimulus. This is because the proximal stimulus does not preserve the full dimensionality of the physical world. The mapping from three-dimensional patterns to two-dimensional patterns is a many-to-one mapping and is not uniquely invertible (see Gregory, 1970; Horn, 1986; Marr, 1982; Richards, 1988). Retinal stimulation geometrically underdetermines interpretations of the physical world. 1

Underdetermination can also result because the information available from local measurements of the proximal stimulus is consistent with a large number of different global interpretations. Alone, the local measurements are not sufficient to determine which global interpretation is correct. One example of this is the aperture problem: local measurements of a contour's movement do not by themselves specify the contour's true velocity (e.g., Hildreth, 1983; Marr & Ullman, 1981). Another example is the stereo correspondence problem: local measurements do not by themselves specify which proximal stimulus elements on the right and left retinas were produced by the same distal element (e.g., Grimson, 1981; Marr & Poggio, 1976).

A third example is called the motion correspondence problem (e.g., Attneave, 1974; Ramachandran & Anstis, 1986b; Ullman, 1979), and this article describes a model for its solution. The article is organized as follows. First, the motion correspondence problem is defined, and three constraining principles to be used to generate solutions to this problem are discussed. Second, a model that applies these constraints is outlined. Third, the model's performance is used to focus discussion on several theoretical issues in the study of motion perception, including the relation between motion correspondence processing and the tag-assignment problem in visual cognition. Fourth, functional properties of the model are related to what is known about the physiological mechanisms that mediate motion perception by humans.

Motion Correspondence Problem

The human visual system can produce the illusion of movement from a rapid succession of static images. 2 If elements depicted in these images are displaced a large amount (i.e., a degree of visual angle or more), this illusory or apparent motion is usually presumed to be detected by the so-called long-range motion system (e.g., Anstis, 1978, 1980; Braddick, 1974, 1980; Petersik, 1989; Ullman, 1981). To generate apparent motion, the long-range system must identify an element in a position in one image (Frame 1) and another element in a different position in the next image (Frame 2) as constituting different glimpses of the same moving element. A motion correspondence match between a Frame 1 element and a Frame 2 element is such an identification.

In this article, an element is defined as an individuated component of the proximal stimulus. In other words, an element is some aspect of the proximal stimulus that can be referred to by a unique symbolic code (e.g., a token, or a FINST [to be described] in the sense of Pylyshyn, 1989). Ullman (1979, Chap. 2) presented evidence suggesting that these tokens could represent components such as oriented parts of edges, corners, and terminators (i.e., the type of information available in the primal sketch of Marr, 1982).

Measurements of Frame 1 and Frame 2 element positions underdetermine the motion correspondence matches that can be assigned to an apparent motion display: several sets of motion correspondence matches are consistent with the same set of positions. This is illustrated in Figure 1.

Figure 1. Motion correspondence as a problem of underdetermination. ([a] An apparent motion display. Outline squares represent element positions in Frame 1; solid squares represent element positions in Frame 2. [b-g] Possible motion correspondence solutions for this display, in which solid lines represent assigned motion correspondence matches. Solution b is generated by the human visual system.)

In general, if there are N elements in Frames 1 and 2, and if one assumes a one-to-one mapping between frames of view, then there are N! sets of correspondence matches that are consistent with the proximal stimulus (Ullman, 1979). If a one-to-one mapping between frames of view is not assumed, then any subset of the N × N possible matches is a candidate, and the number of possible solutions increases to 2^(N × N). In order to solve the motion correspondence problem, one set of motion correspondence matches (a global interpretation) must be selected from the many that are consistent with the position measurements.
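The counting argument can be checked directly for small displays. The Python sketch below uses illustrative function names that do not come from the article. It enumerates two sets of solutions: the one-to-one solutions, which number N!, and every subset of the N × N possible matches, which number 2^(N × N). The second count is only an upper bound, because it ignores any further restriction such as the cover principle discussed later.

```python
from itertools import permutations, product
from math import factorial


def one_to_one_solutions(n):
    """All one-to-one match sets: Frame 1 element i is matched to Frame 2 element perm[i]."""
    return [tuple(enumerate(perm)) for perm in permutations(range(n))]


def unrestricted_solutions(n):
    """All subsets of the n * n possible (frame1, frame2) matches, with no one-to-one restriction."""
    matches = [(i, j) for i in range(n) for j in range(n)]
    return [
        tuple(m for m, keep in zip(matches, mask) if keep)
        for mask in product((False, True), repeat=len(matches))
    ]


for n in (2, 3):
    # columns: N, one-to-one count, N!, unrestricted count, 2 ** (N * N)
    print(n, len(one_to_one_solutions(n)), factorial(n),
          len(unrestricted_solutions(n)), 2 ** (n * n))
```

This prints `2 2 2 16 16` and `3 6 6 512 512`. Even three elements per frame yield 512 unrestricted candidates, against only 6 one-to-one solutions.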

Because measurements of element locations are not sufficient to solve the motion correspondence problem, additional rules or principles must be exploited. These principles, in combination with the local measurements, must determine the set of motion correspondence matches assigned to a display. The discovery of the principles used by the human visual system has been an important goal of apparent-motion research (e.g., Attneave, 1974; Petersik, 1989; Ramachandran & Anstis, 1986b; Ternus, 1938; Ullman, 1979).

The model to be described solves the motion correspondence problem by exploiting principles that are based on measurements of element positions: principles that minimize changes in element positions over time and that minimize changes in element positions relative to one another. The model does not exploit principles that are based on measurements of the figural appearance of elements. There were three major reasons for designing a model that was sensitive only to element positions.

First, human observers can easily experience apparent motion for displays in which all elements are of identical appearance (e.g., all are dots or lines). For such displays, the assignment of motion correspondence matches can be based only on measurements of element positions; therefore, nonfigural principles must be important determinants of correspondence match assignments. Second, many psychophysical experiments have shown that the human visual system assigns motion correspondence matches primarily on the basis of element locations and not on the basis of element appearances. Whereas human observers are very sensitive to manipulations of element positions in apparent motion displays, they are much less sensitive to manipulations of figural properties such as shape, color, or spatial frequency (e.g., Baro & Levinson, 1988; Burt & Sperling, 1981; Cavanagh, Arguin, & von Grunau, 1989; Dawson, 1990a; Kolers, 1972; Kolers & Green, 1984; Kolers & Pomerantz, 1971; Kolers & von Grunau, 1976; Krumhansl, 1984; Navon, 1976; Ullman, 1979, Chap. 2; Victor & Conte, 1990). Third, physiological evidence supports the existence of at least two predominantly independent anatomical pathways in the visual system, one of which is sensitive to movement but not to form (e.g., Botez, 1975; Livingstone & Hubel, 1988; Maunsell & Newsome, 1987; Ungerleider & Mishkin, 1982). For example, many cells located late in this pathway are very sensitive to movement, regardless of stimulus shape, size, or contrast (e.g., Albright, 1984; Albright, Desimone, & Gross, 1984; Dubner & Zeki, 1971; Maunsell & van Essen, 1983b; Rodman & Albright, 1987; Zeki, 1974). Furthermore, lesions in this area produce significant deficits in motion perception but do not appear to affect object perception (Hess, Baker, & Zihl, 1989; Newsome & Pare, 1988; Newsome, Wurtz, Dursteler, & Mikami, 1985; Zihl, von Cramon, & Mai, 1983).
In sum, the psychophysical and physiological evidence indicates that element appearances play, at best, a minor role in the assignment of motion correspondence matches and therefore should not represent a major component of a motion correspondence model. 3

Part 1: Constraining Solutions to the Motion Correspondence Problem

Marr (1976, 1982) described a theory of a computation as an account of what is being computed and why. For a visual problem of underdetermination, such a theory describes a constraining principle to be used to choose the correct proximal stimulus interpretation from the set of possible interpretations. A constraining principle usually characterizes or exploits an attribute of a distal stimulus. The principle is applied as follows: if the constraining property is characteristic of some distal stimulus that could have caused the proximal stimulus, then this is the distal stimulus that is perceived. For instance, Ullman (1979) showed that the property "being rigid" can be used to determine three-dimensional wire-frame interpretations of dynamic (two-dimensional) proximal stimuli (for several other examples, see Marr, 1982, Chap. 3).

Three possible constraining principles are considered in the following sections for the motion correspondence problem. For each, three questions are briefly addressed: Does the principle entail the use of a general characteristic of distal stimuli? Does experimental evidence suggest that the principle is exploited by the human visual system? Can the principle be successfully applied to the correspondence problem? It is then argued that an adequate model of human motion correspondence processing requires (at least) that all three principles be applied simultaneously.

Nearest Neighbor Principle

An experimental technique called the motion competition paradigm has been used to study how the human visual system solves the correspondence problem (e.g., Ullman, 1979, Chap. 2). In the simplest motion competition display, two opposing paths of apparent motion (i.e., two opposing motion correspondence matches) compete with one another for assignment. In Frame 1 of such a display, a single element is presented in the center. In Frame 2, the Frame 1 element has disappeared, and two lateral elements are now displayed, one to the right of center and the other to the left (see Figure 2). Under appropriate temporal conditions, the Frame 1 element is seen to move either to the left or to the right. Of interest are the factors that determine the perceived direction of motion.

Figure 2. The nearest neighbor principle governs perceptions of a standard motion competition display. (a and b) The central Frame 1 element is seen to move in the direction consistent with the shorter motion correspondence match. (c) When both possible matches are of equal length, they are equiprobable. (In many cases subjects will report seeing the Frame 1 element split into two.)

In a competition display, a very strong predictor of the perceived direction of motion is element displacement (i.e., the distance between a Frame 1 element and a potentially corresponding element in Frame 2). The visual system prefers to assign correspondence matches that represent short element displacements (e.g., Burt & Sperling, 1981; Ullman, 1979, Chap. 2). For example, if motion to the left in a competition display involves a shorter element displacement than motion to the right, then motion to the left will be preferred (Figure 2a). The visual system exploits a "nearest neighbor" principle, in which motion correspondence matches are created between Frame 1 elements and their nearest neighbors in Frame 2.
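Applied to a single competition display, the nearest neighbor principle amounts to choosing the Frame 2 element at the smallest Euclidean distance from the Frame 1 element. The sketch below is a minimal illustration: the coordinates are hypothetical, and the function returns every tied candidate so that the equiprobable case of Figure 2c is visible.

```python
import math


def nearest_neighbors(frame1_pos, frame2_positions, tol=1e-9):
    """Indices of the Frame 2 elements closest to a single Frame 1 element.

    More than one index is returned when candidates are equally distant,
    as in the ambiguous competition display of Figure 2c.
    """
    distances = [math.dist(frame1_pos, q) for q in frame2_positions]
    best = min(distances)
    return [j for j, d in enumerate(distances) if d - best <= tol]


# Competition display: central Frame 1 element, one Frame 2 element on each side.
print(nearest_neighbors((0, 0), [(-2, 0), (3, 0)]))  # [0] -> leftward motion preferred
print(nearest_neighbors((0, 0), [(-2, 0), (2, 0)]))  # [0, 1] -> equal displacements
```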

The nearest neighbor principle is consistent with the geometry of the typical viewing conditions for motion (Ullman, 1979, pp. 114-118). When three-dimensional motion vectors are projected onto a two-dimensional surface (e.g., the retina), their depth component is lost. As a result, slower two-dimensional movements are much more likely to occur than are faster movements. Because a preference for the nearest neighbor is equivalent to a preference for the slowest two-dimensional velocities, it may be that the human visual system exploits this constraint because it has evolved in a visual environment in which low velocities are more frequent than high velocities.

The nearest neighbor principle is used in Ullman's (1979) minimal mapping theory of motion correspondence processing. According to this theory, a cost is associated with each possible motion correspondence match. This cost is proportional to element displacement, so that shorter motion correspondence matches have lower costs. The model selects the set of motion correspondence matches that minimizes the total cost. (The solution must also be consistent with what Ullman called the cover principle, which is discussed in detail later in this article.) A computer implementation of minimal mapping theory demonstrated, for many displays, that the nearest neighbor principle can be used to emulate the correspondence solutions of the human visual system (an alternative implementation of minimal mapping theory was described by Grzywacz & Yuille, 1988). However, minimal mapping theory cannot generate correct solutions for displays in which element interdependencies can play a role (e.g., Dawson, 1987; Ramachandran & Anstis, 1985). An additional constraining principle is required for such displays.
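The core of the minimal-cost selection can be sketched as an exhaustive search. This is a simplification and not Ullman's implementation. It assumes equal element counts in both frames, strictly one-to-one matches (so the cover principle and many-to-one matches are ignored), and a cost equal to Euclidean displacement. Because it enumerates all N! assignments, the search is only practical for small displays.

```python
import math
from itertools import permutations


def minimal_mapping(frame1, frame2):
    """Brute-force search for the one-to-one match set with the lowest total displacement cost.

    Simplifying assumptions (not Ullman's full theory): equal element counts,
    strictly one-to-one matches, and cost equal to Euclidean displacement.
    """
    if len(frame1) != len(frame2):
        raise ValueError("this sketch assumes the same number of elements in both frames")
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(len(frame2))):
        # perm[i] is the Frame 2 element matched to Frame 1 element i
        cost = sum(math.dist(frame1[i], frame2[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


print(minimal_mapping([(0, 0), (4, 0)], [(1, 0), (5, 0)]))  # ((0, 1), 2.0)
```

Each cost here depends only on its own match, which is the independence assumption questioned in the next section.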

Relative Velocity Principle

A major assumption underlying minimal mapping theory is that the cost assigned to any particular motion correspondence match in a display is independent of the costs assigned to any other possible motion correspondence matches (Ullman, 1979, pp. 84-86). This assumption is questionable in principle because the perceptible world consists primarily of coherent surfaces whose properties vary smoothly (i.e., neighboring points on the surface have, in the vast majority of cases, nearly identical visual properties, as described by Marr, 1982, pp. 44-51). To the extent that visual elements arise from physical features on such surfaces, the movement of neighboring elements should be similar. If there were a general way of characterizing the interdependent properties of the two-dimensional motion of points projected from a coherent surface, the result would be a property that could be used to constrain motion correspondence solutions.

Yuille (1983) provided an account of one such property for an apparent motion display constructed from different views of a moving, continuous contour. This contour is represented as the set of unit vectors tangent to the contour at all its points. A motion correspondence solution for this display describes the transformation that produces the Frame 2 view of the contour from the Frame 1 view. Yuille argued that the desired transformation minimizes the distortion of the contour as it moves. This transformation therefore matches each Frame 1 tangent vector to a Frame 2 tangent vector, minimizing the differences between matched vectors over the entire contour.

Yuille (1983) demonstrated that his measure of figural distortion is very strongly related to another measure called motion smoothness (Hildreth, 1983; Horn & Schunck, 1981). The smoothness of motion is measured by integrating differences between local velocities over the entire moving contour: motion is smoothest when neighboring points on a contour have nearly identical velocities. Hildreth (1983) showed that coherent objects moving arbitrarily in three-dimensional space produce unique, smooth patterns of retinal movement. The relation between figural variation and smoothness indicates that motion correspondence solutions that minimize figural distortion represent unique, physically plausible solutions.
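A discrete analogue of the smoothness measure makes the idea concrete. The sketch below replaces the integral over the contour with a sum of squared velocity differences between neighboring sample points. This is an illustrative stand-in, not Hildreth's full algorithm, which also fits the measured velocity components along the contour.

```python
def smoothness_cost(velocities):
    """Discrete stand-in for a smoothness measure: sum of squared differences
    between the velocities of neighboring points along a contour.
    Lower values mean smoother motion."""
    return sum(
        (vx2 - vx1) ** 2 + (vy2 - vy1) ** 2
        for (vx1, vy1), (vx2, vy2) in zip(velocities, velocities[1:])
    )


translation = [(1.0, 0.0)] * 4                    # every point moves identically
distorted = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]  # neighboring velocities disagree
print(smoothness_cost(translation), smoothness_cost(distorted))  # 0.0 4.0
```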

Yuille's (1983) analysis can be used to generate a hypothesis about a constraining property for displays that consist of discrete visual elements (rather than continuous contours). A motion correspondence match between discrete elements can be described as a motion vector because the match is an assertion that a Frame 1 element has moved in a particular direction, at a particular speed, to occupy a new Frame 2 position. It is hypothesized that the visual system selects the set of motion correspondence matches that minimizes the relative velocities between neighboring display elements in an attempt to minimize local changes in the configuration. Relative velocity is defined as the difference between two motion correspondence matches (interpreted as motion vectors) after they have been centered at a common origin (Figure 3). This constraint is called the relative velocity principle (Dawson, 1986; Dawson & Pylyshyn, 1986, 1988).

Figure 3. Defining the relative velocity between matches. (a) Two of four possible motion correspondence matches for a simple apparent motion display. (b) The same two matches represented as motion vectors centered at a common origin. (The distance between the endpoints of the vectors, indicated by the brace, is the relative velocity between the two matches.)
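The definition in Figure 3 can be written out directly. In the minimal sketch below, each match is read as a displacement vector per frame, taking the interframe interval as one time unit (an assumption made for illustration). The relative velocity is then the length of the difference between two such vectors. The coordinates are hypothetical.

```python
import math


def motion_vector(match):
    """A match ((x1, y1), (x2, y2)) read as a displacement vector per frame."""
    (x1, y1), (x2, y2) = match
    return (x2 - x1, y2 - y1)


def relative_velocity(match_a, match_b):
    """Distance between the endpoints of two motion vectors centered at a common origin."""
    ax, ay = motion_vector(match_a)
    bx, by = motion_vector(match_b)
    return math.hypot(ax - bx, ay - by)


# Two Frame 1 elements at (0, 0) and (0, 2); Frame 2 elements at (1, 0) and (1, 2).
parallel = (((0, 0), (1, 0)), ((0, 2), (1, 2)))
crossing = (((0, 0), (1, 2)), ((0, 2), (1, 0)))
print(relative_velocity(*parallel), relative_velocity(*crossing))  # 0.0 4.0
```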

Dawson (1987) provided empirical support for the relative velocity principle; this empirical support shows that the independence assumption in minimal mapping theory is incorrect. Human observers were presented with competition displays embedded in the context of a moving configuration (Figure 4). The presence of a moving context had a pronounced effect on the perceived direction of the Frame 1 element in comparison with control displays without contexts. There was a strong tendency to see the central element and the context move in the same direction. This demonstrates that element interdependencies are important determinants of motion correspondence matches. This result is also consistent with several others showing that the human visual system minimizes patterns of relative motion for various discrete element displays (e.g., Cutting & Proffitt, 1982; Gogel, 1974; Johansson, 1950; Proffitt & Cutting, 1979, 1980; Ramachandran & Anstis, 1985, 1986b).

Figure 4. The effect of element interdependencies on motion correspondence matches. (a) A control competition display in which two correspondence matches are equiprobable. (b and c) When the control display is embedded in an unambiguous context, there is a strong preference to see the Frame 1 component move in the direction of the context.

Computer simulations have shown that many motion correspondence problems can be solved through the use of only the relative velocity principle (Dawson, 1986; Dawson & Pylyshyn, 1986, 1988; see also Barnard & Thompson, 1980). Furthermore, when this constraint is applied to displays in which element interdependencies are important (e.g., Figure 3), the solutions generated are more similar to humans' solutions than those produced by minimal mapping theory. However, the relative velocity principle alone is not sufficient to solve a number of elementary motion correspondence problems that are solved by minimal mapping theory. For example, displays in which there is only a single element in both frames have no relative velocity information at all. This indicates that motion correspondence solutions must be constrained by the use of relative velocity information in combination with other principles (e.g., nearest neighbor).
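The limitation for single-element displays can be demonstrated by scoring candidate solutions on relative velocity alone. The sketch below makes two simplifying assumptions that are not the article's model. First, it treats every pair of matches as neighbors. Second, it sums their relative velocities. With two elements, the parallel solution wins. With one element, every candidate scores zero, so the principle cannot choose even when one match is much shorter.

```python
import math
from itertools import combinations, permutations


def total_relative_velocity(frame1, frame2, assignment):
    """Sum of pairwise relative velocities for one candidate solution.

    assignment[i] is the Frame 2 element matched to Frame 1 element i.
    All pairs are treated as neighbors (a simplifying assumption).
    """
    vectors = [
        (frame2[j][0] - frame1[i][0], frame2[j][1] - frame1[i][1])
        for i, j in enumerate(assignment)
    ]
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in combinations(vectors, 2))


def score_solutions(frame1, frame2):
    """Score every assignment of Frame 1 elements to distinct Frame 2 elements."""
    return {
        a: total_relative_velocity(frame1, frame2, a)
        for a in permutations(range(len(frame2)), len(frame1))
    }


# Two elements: the parallel solution (0, 1) beats the crossing solution (1, 0).
print(score_solutions([(0, 0), (0, 2)], [(1, 0), (1, 2)]))  # {(0, 1): 0.0, (1, 0): 4.0}
# Single-element competition display: every candidate scores 0, so nothing is decided.
print(score_solutions([(0, 0)], [(-1, 0), (3, 0)]))        # {(0,): 0, (1,): 0}
```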

Element Integrity Principle

Experimental studies of motion perception suggest that the human visual system prefers one-to-one mappings between elements in different frames of view. Figures 5a and 5b illustrate two examples of this. These displays can pose problems for motion correspondence models that are based on the two principles just described. Figure 5c depicts an incorrect solution for Figure 5a that is generated by minimal mapping theory (Ullman, 1979, p. 99). Figure 5d depicts an incorrect solution for Figure 5b generated by the relative velocity principle (Dawson, 1986, Figures 5, 6, 7, 8, and 9).

Figure 5. Preference for one-to-one matchings by the human visual system. (a and b) Correspondence solutions generated by the human visual system. (c) An incorrect solution for Part a generated by minimal mapping theory. (d) An incorrect solution for Part b generated by the relative velocity principle.

Figure 6. Motion correspondence problems. (a) An example of an apparent motion display. (b) The four possible motion correspondence matches for the display, labeled from 0 to 3. (c) An autoassociative network for this problem. The labeled circles depict processing units; each unit represents one of the correspondence matches from Part b. Lines represent the weighted connections between processing units. (d) The representation of the network in linear algebra. (The column vector represents the activation values of each processing unit, which vary over time. The square matrix represents the connection weights for the network, which are defined on the basis of the three constraining principles.)

Figure 7. Examples of the network's performance for some benchmark displays.

Figure 8. An example of a problematic display. (a) An incorrect correspondence solution generated through the use of the "standard" settings. (b) The correct solution generated by a change in the network's settings, as described in the text.

Figure 9. Solutions generated for the Ternus configuration. (a) The group motion solution, generated through the use of the standard settings. (b) The element motion solution, generated when the network's preference for short matches is increased. (c) The group motion solution, generated when the distance between elements is reduced from five units to one unit, even when preference for short matches is increased.

The two incorrect solutions depicted in Figure 5 show that for some displays, minimal mapping theory and a relative velocity model include motion correspondence matches that are discarded by the human visual system. To deal with this problem, Ullman (1979, pp. 97-101) modified the minimal mapping theory to include an element integrity principle. According to this principle, the splitting of one element into parts during movement, or the fusing together of different elements into one, should be penalized. Ullman incorporated these penalties into the nearest neighbor cost function and, as a result, extended the range of problems solved by minimal mapping theory.
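Violations of element integrity can be counted directly from a set of matches. The sketch below counts splits (extra matches leaving one Frame 1 element) and fusions (extra matches arriving at one Frame 2 element). How such counts would be weighted against displacement costs, as in Ullman's modified cost function, is not modeled here; the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def integrity_violations(matches):
    """Count splits and fusions in a set of (frame1_index, frame2_index) matches.

    A split is an extra match leaving one Frame 1 element; a fusion is an
    extra match arriving at one Frame 2 element.
    """
    out_degree = Counter(i for i, _ in matches)
    in_degree = Counter(j for _, j in matches)
    splits = sum(d - 1 for d in out_degree.values() if d > 1)
    fusions = sum(d - 1 for d in in_degree.values() if d > 1)
    return splits, fusions


print(integrity_violations([(0, 0), (0, 1)]))  # (1, 0): element 0 splits
print(integrity_violations([(0, 0), (1, 0)]))  # (0, 1): two elements fuse
# Every one-to-one solution is penalty-free.
print(all(integrity_violations(list(enumerate(p))) == (0, 0)
          for p in permutations(range(3))))     # True
```

The last line also shows why element integrity alone is a weak constraint: every one-to-one solution scores (0, 0), so the principle cannot choose among them.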

The element integrity principle is consistent with general assumptions about the physical nature of moving surfaces (see also Marr's [1982, pp. 111-114] discussion of stereopsis). Specifically, proximal stimulus elements are assumed to correspond to parts of coherent, physical stimuli, such as edge segments, physical markings, and so on (see Ullman, 1979, Chap. 2). The physical coherence of surfaces (and therefore of surface parts) suggests that the splitting or the fusing of visual elements is unlikely. In addition, one-to-one mappings between elements over time will be correct everywhere except at surface discontinuities (e.g., at an occluding edge where different elements may be suddenly appearing or disappearing). However, discontinuities make up a small proportion of scenes and images, and as a result the element integrity principle is likely to be true over most of an image.

Nevertheless, element integrity by itself is a very weak constraining principle. For instance, each of the possible motion correspondence solutions illustrated in Figure 1 is consistent with the element integrity principle. Therefore, this principle alone does not generate unique solutions to the correspondence problem. The utility of this principle, illustrated in Ullman's (1979) modified minimal mapping theory, is that it may select one solution from a set that cannot be differentiated by other constraints, as in Figure 5.

The Need for Multiple Constraints

In the preceding sections, three potential constraining principles for the motion correspondence problem were described. The application of each is supported by empirical studies of motion perception, and each is consistent with general assumptions about the nature of moving surfaces. However, it is clear that none of these constraints is by itself sufficient to emulate humans' solutions to the motion correspondence problem. The preceding examples show that if one of the constraints is omitted in a model, the model will generate incorrect solutions (i.e., solutions not generated by the human visual system) for elementary displays. Given this situation, the working hypothesis for the model to be described next was that all three must be applied simultaneously. Once this working hypothesis is adopted, the researcher's task is to design an effective procedure for this simultaneous application.

Part 2: A Motion Correspondence Model

In the following section, a model is presented for solving the motion correspondence problem by simultaneously applying the three constraints just described. The model is an autoassociative network that iteratively modifies the activation pattern of a set of simple interconnected processing units (representing local measurements) until a stable pattern is achieved (representing a global interpretation). The model's properties are described in three steps. First, it is shown how the three constraining principles are used to define the strengths of connections between processing units. Second, rules for iteratively updating the network are described and are analyzed to determine the nature of the stable states to which the network converges. Third, in order to examine the performance of the model, the solutions that it generates are compared with those generated by the human visual system.

Autoassociation and Problems of Underdetermination

Within cognitive science, there is considerable interest in connectionist models of perception and cognition. Connectionist models are networks of simple processing units (e.g., Clark, 1989; Rumelhart, Hinton, & McClelland, 1986; Smolensky, 1988). A single processing unit is characterized by a numeric activation value, which is changed as a function of the total signal processed by the unit. Connections between processing units in a network are communication channels that transmit numeric signals from one unit to another. A particular connection between two units is defined by a single number representing its strength, which is used to scale the numeric signal that it transmits.

One kind of connectionist model is an autoassociation network. It consists of a set of so-called massively parallel processing units; that is, each processing unit is connected to every other processing unit in the network. The initial pattern of network activation values produces changes in itself through a feedback loop enforced by the massively parallel connections. Given an incomplete initial pattern, an autoassociator can fill in missing details on the basis of pattern knowledge stored in its connections (e.g., Hinton & Sejnowski, 1986; Hopfield, 1982), and this knowledge can also be used to assign unique category labels to input patterns (e.g., Anderson & Mozer, 1981; Anderson, Silverstein, Ritz, & Jones, 1977).
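One well-known autoassociative feedback loop is the "brain-state-in-a-box" rule of Anderson, Silverstein, Ritz, and Jones (1977). In that rule, each unit adds a scaled copy of its weighted input to its current activation, and activations are clipped to the range [-1, 1]. The sketch below uses it only as a generic illustration of how a pattern can reshape itself until it stops changing. It is not the update rule of the present model, which is defined later in the article; the step size `alpha` and the weights are hypothetical.

```python
def bsb_step(x, W, alpha=0.5):
    """One brain-state-in-a-box update: x <- clip(x + alpha * W x, -1, 1)."""
    n = len(x)
    return [
        max(-1.0, min(1.0, x[i] + alpha * sum(W[i][j] * x[j] for j in range(n))))
        for i in range(n)
    ]


def settle(x, W, alpha=0.5, max_iter=100, tol=1e-9):
    """Iterate the feedback loop until the activation pattern stops changing."""
    for _ in range(max_iter):
        nxt = bsb_step(x, W, alpha)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Units 0 and 1 excite each other; both inhibit unit 2 (symmetric weights).
W = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
print(settle([0.2, 0.1, 0.1], W))  # [1.0, 1.0, -1.0]
```

A weak initial bias toward units 0 and 1 is amplified through the connections until the pattern saturates at a stable corner of the box.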

An autoassociator can also solve problems of underdetermination by applying constraints (e.g., Grzywacz & Yuille, 1988). Consider the general structure of such a model for the motion correspondence problem. Assume that there are N Frame 1 and M Frame 2 elements in an apparent motion display. This means that there are N × M possible motion correspondence matches. The model must select a subset of these possible matches as being true of the display. Each possible correspondence match is represented by one of the processing units in the autoassociation network. A match is included in the solution to the correspondence problem if the processing unit that represents it has a high activation value at the end of processing. Otherwise, the match is excluded from the solution.
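The unit-per-match representation can be sketched in Python. The display coordinates and the helper names `candidate_matches` and `motion_vector` are hypothetical and serve only as illustration. The sketch enumerates the N × M candidate matches, one per processing unit, together with the displacement vector that each match implies.

```python
from itertools import product

# Hypothetical display: (x, y) coordinates of each element in each frame.
frame1 = [(0.0, 0.0), (5.0, 0.0)]              # N = 2 Frame 1 elements
frame2 = [(1.0, 0.0), (6.0, 0.0), (3.0, 4.0)]  # M = 3 Frame 2 elements


def candidate_matches(f1, f2):
    """Return one (i, j) pair per processing unit.

    Pair (i, j) stands for the possible match from Frame 1 element i
    to Frame 2 element j, so there are len(f1) * len(f2) units.
    """
    return list(product(range(len(f1)), range(len(f2))))


def motion_vector(f1, f2, unit):
    """Displacement m = (dx, dy) implied by the match that a unit represents."""
    i, j = unit
    return (f2[j][0] - f1[i][0], f2[j][1] - f1[i][1])


units = candidate_matches(frame1, frame2)
motions = [motion_vector(frame1, frame2, u) for u in units]
for unit, m in zip(units, motions):
    print(unit, m)
```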

Defining Connection Strengths

The connections among processing units in the network are defined by whatever constraints are being applied to the problem. For example, imagine that only one constraint is being used. Let Ui and Uj be two processing units in the autoassociation network, each representing a different correspondence match. If both these units represent matches that are consistent with the applied constraint, the connection between them should be assigned a large, positive weight. As a result, high activity in Ui would produce high activity in Uj, and vice versa. If these two units represent matches that are not consistent with the applied constraint, the connection between the two should be assigned a large, negative weight. As a result, high activity in Ui would produce low activity in Uj, and vice versa.

In the computer simulation, the weighted connections between processing units are represented in a square matrix C with N × M rows and columns. Each entry cij in C is a numeric value that represents the connection strength between processing units Ui and Uj. C is a symmetric matrix; that is, cij = cji. Processing units can be connected to themselves; when some constraints are applied, it is possible that cii ≠ 0.00.

The connections among units, represented in the constraint matrix C, are defined by a combination of the independently determined influences of the nearest neighbor principle, the relative velocity principle, and the element integrity principle. As a result, it is convenient to describe C as the weighted sum of three connection matrices, each identical to C in size and each defined by one of the three principles being exploited. Of importance is that these three matrices are described for expository purposes only: There is only one set of connections among processing units in the network, and C is its only representation.

Nearest neighbor weights.

Let NN be a square matrix that represents network connections defined by only the nearest neighbor principle. Recall that this principle encourages the selection of shorter motion correspondence matches. NN is defined by the creation of a single excitatory connection between each unit and itself; as a result, NN is a diagonal matrix. The strength of each connection in the diagonal of NN ranges from 0.00 to 1.00 and is a function of the length of the correspondence match represented by the unit. The shorter this match is, the stronger is the excitatory connection. Because of this, units representing shorter matches (all other factors being equal) increase their activation values at a faster rate than do units representing longer matches.

An exponential function is used to transform ||mi||, the length of motion correspondence match i (which in principle can range from zero to infinity), into a connection strength within the desired range of 0.00 to 1.00. An exponential function was chosen because of Ullman's (1979, pp. 114-118) argument that the probability distribution for patterns of movement projected onto the retinas is exponential in shape. Equation 1 defines the nearest neighbor principle (nn) as operationalized in the computer simulation:

$$nn_{ii} = e^{-\alpha \lVert \mathbf{m}_i \rVert} \qquad (1)$$

The constant α is a positive parameter that defines the preference of the network for short element displacements (i.e., small values of ||mi||). When α is large, the network has a very strong preference for short element displacements and generates very small connection strengths for long element displacements. When α is small, the preference for short element displacements still exists but is not as strong. In the simulations to be described, α = 0.25.
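Here is a minimal Python sketch of the nearest neighbor weights. It assumes that Equation 1 has the form nn_ii = exp(-α·||mi||), which fits the description above: values between 0 and 1, stronger for shorter matches, and steeper as α grows. The function name `nn_matrix` and the two example displacements are hypothetical.

```python
import math

ALPHA = 0.25  # alpha: the "standard" setting reported in the text


def nn_matrix(motions, alpha=ALPHA):
    """Diagonal nearest neighbor matrix.

    Assumed form: nn_ii = exp(-alpha * ||m_i||). All off-diagonal entries
    are 0 because each unit is connected only to itself.
    """
    n = len(motions)
    nn = [[0.0] * n for _ in range(n)]
    for i, (dx, dy) in enumerate(motions):
        nn[i][i] = math.exp(-alpha * math.hypot(dx, dy))
    return nn


# Two hypothetical matches: a short one (length 2) and a long one (length 4).
motions = [(2.0, 0.0), (0.0, 4.0)]
nn = nn_matrix(motions)
# The shorter match receives the stronger self-connection.
print(round(nn[0][0], 4), round(nn[1][1], 4))
```

Under this assumed form, doubling α squares each weight, because e^(-2α||m||) = (e^(-α||m||))². This is one way to see why a larger α penalizes long displacements more sharply.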

Relative velocity weights.

Let RV be a matrix representing network connections defined by only the relative velocity principle. Recall that this principle is used to assign neighboring elements similar movements (i.e., correspondence matches of similar direction and length). In order to define weights through the use of this principle, two issues must be considered: (a) an operational definition of relative velocity and (b) an operational definition of neighboring.

Relative velocity is the distance between two vectors mi and mj, representing two motion correspondence matches, after they have been centered at a common origin (Figure 3). Because it is a distance measure, it can (in principle) range from zero to infinity. In the model, this distance is transformed into a connection strength limited to the range from -1.00 to 1.00. This means (all other factors being equal) that processing units representing similar motion correspondence matches increase each other's activation values. Processing units representing dissimilar motion correspondence matches decrease each other's activation values.

Relative velocity (rij = ||mi - mj||) is transformed to a connection strength in the desired range by the following exponential function:

[Equation 2: the exponential function that maps rij into a relative velocity weight between -1.00 and 1.00, using the constant β and the neighborhood parameter θij]

The constant β in this equation determines the preference of the network for small relative velocities, in the same manner that was described for the constant α in Equation 1. In the simulations to be described, β = 0.25. The value θij in Equation 2 is a parameter used to operationalize the notion of neighboring Frame 1 elements. The neighborhood parameter is such that the effect of two Frame 1 elements on one another (with respect to correspondence match assignment) decreases as an exponential function of their separation. This is consistent with Dawson's (1987) finding that the effect of a context decreased exponentially as it was moved farther from a motion competition display.

The neighborhood parameter θij is defined as follows: Let x be the coordinates of the Frame 1 element from which mi originates, and let y be the coordinates of the Frame 1 element from which mj originates. The distance between the two Frame 1 elements is therefore ||x - y||. (The neighborhood parameter is not computed if this distance is zero, i.e., if the two matches originate from the same Frame 1 element.) The neighborhood parameter is defined by the exponential equation

$$\theta_{ij} = e^{-\varepsilon \lVert \mathbf{x} - \mathbf{y} \rVert} \qquad (3)$$

The parameter ε determines the extent to which the assignment of a correspondence match to one Frame 1 element is affected by match assignment to other, distant Frame 1 elements. When ε is large, these distant elements have little effect; their effect increases as ε is decreased. In the simulations to be described, ε = 0.15.
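The relative velocity weights can be sketched in the same way. This copy does not preserve the exact form of Equation 2, so the sketch ASSUMES rv_ij = θij·(2·exp(-β·rij) - 1). That is one exponential mapping of rij into the stated range of -1.00 to 1.00. The sketch also assumes θij = exp(-ε·||x - y||) for the neighborhood parameter. A third assumption is that weights between matches from the same Frame 1 element are zero; this rests on the remark that θij is not computed in that case. The function name `rv_matrix` is hypothetical.

```python
import math

BETA, EPSILON = 0.25, 0.15  # standard settings reported in the text


def rv_matrix(origins, motions, beta=BETA, eps=EPSILON):
    """Relative velocity weights under an ASSUMED form of Equation 2.

    origins[i]: Frame 1 coordinates where match i starts.
    motions[i]: displacement vector m_i of match i.
    Assumed (hypothetical) forms:
        theta_ij = exp(-eps * ||x - y||)              (neighborhood parameter)
        rv_ij = theta_ij * (2 * exp(-beta * r_ij) - 1), r_ij = ||m_i - m_j||
    Pairs that start at the same Frame 1 element are left at 0.
    """
    n = len(motions)
    rv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            separation = math.dist(origins[i], origins[j])
            if separation == 0.0:
                continue  # theta is not computed for a shared Frame 1 element
            theta = math.exp(-eps * separation)
            r = math.dist(motions[i], motions[j])
            rv[i][j] = theta * (2.0 * math.exp(-beta * r) - 1.0)
    return rv


# Match 0 starts at (0, 0); matches 1 and 2 both start at (5, 0).
origins = [(0.0, 0.0), (5.0, 0.0), (5.0, 0.0)]
motions = [(3.0, 0.0), (3.0, 0.0), (-3.0, 0.0)]  # parallel vs. opposite motion
rv = rv_matrix(origins, motions)
print([[round(x, 3) for x in row] for row in rv])
```

When neighbors move in parallel (units 0 and 1), the weight is excitatory. When they move in opposite directions (units 0 and 2), it is inhibitory. Under this assumed mapping, the sign changes where 2·exp(-β·r) = 1, that is, at r = ln 2 / β ≈ 2.77 for β = 0.25.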

Element integrity weights.

Let EI be a matrix representing network connections defined by only the element integrity principle. Recall that this constraining property is used to inhibit the splitting or fusing of display elements during movement. EI is a symmetric matrix constructed in the following manner: All pairs of processing units that represent motion correspondence matches emanating from the same Frame 1 element are given mutually inhibitory connections. Specifically, if units Ui and Uj represent two matches originating from the same Frame 1 element (i.e., a split), eiij and eiji are both set to -1.00. Similarly, all pairs of processing units that represent motion correspondence matches that terminate at the same Frame 2 element (i.e., a fusion) are given mutually inhibitory connections. All other connection strengths represented in EI are assigned values of 0.00.
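The element integrity matrix follows directly from the description above. In this Python sketch, the name `ei_matrix` is hypothetical. Each unit is identified by its (Frame 1 element, Frame 2 element) pair, so a split is a shared first index and a fusion is a shared second index. Diagonal entries stay at 0.00, on the assumption that "pairs" means pairs of distinct units.

```python
def ei_matrix(units):
    """Element integrity weights.

    units[k] = (i, j) is the match from Frame 1 element i to Frame 2 element j.
    Distinct units that share a start (a split) or an end (a fusion) get -1.0;
    all other entries, including the diagonal, stay 0.0.
    """
    n = len(units)
    ei = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            split = units[a][0] == units[b][0]
            fusion = units[a][1] == units[b][1]
            if split or fusion:
                ei[a][b] = -1.0
    return ei


units = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all matches in a 2 x 2 display
for row in ei_matrix(units):
    print(row)
```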

Defining the overall connection matrix.

The connection matrix C for the network is defined as the weighted sum of the three individual constraint matrices just described. Specifically,

$$C = d\,(\omega_1\,NN + \omega_2\,RV + \omega_3\,EI) \qquad (4)$$

where ω1, ω2, and ω3 are constants specifying the relative importance of each constraining principle and d is a fraction that determines the rate of convergence of the algorithm. In the simulations to be reported, d = 0.10, and ω1, ω2, and ω3 all equal 1.00.

Because NN, RV, and EI are all symmetric matrices, C must also be symmetric. This ensures that the network converges to a stable state, as described later. Also, C is not a learned matrix, as is typically the case in autoassociative networks (e.g., Anderson et al., 1977; Hopfield, 1982). The values of C are determined a priori from the properties of the input display.
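Combining the three matrices takes a single weighted sum. The sketch below assumes that Equation 4 has the form C = d·(ω1·NN + ω2·RV + ω3·EI), with the standard value d = 0.10 and unit weights. It then checks that the result is symmetric. The small input matrices are hypothetical placeholders, not values from any display in the article, and the names `combine` and `is_symmetric` are illustrative.

```python
def combine(nn, rv, ei, d=0.10, weights=(1.0, 1.0, 1.0)):
    """Assumed Equation 4: C = d * (w1 * NN + w2 * RV + w3 * EI)."""
    w1, w2, w3 = weights
    n = len(nn)
    return [[d * (w1 * nn[i][j] + w2 * rv[i][j] + w3 * ei[i][j])
             for j in range(n)] for i in range(n)]


def is_symmetric(m, tol=1e-12):
    """True if m[i][j] equals m[j][i] (within tol) for all i, j."""
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))


# Hypothetical 2-unit example: two matches from different Frame 1 elements
# that end on the same Frame 2 element (a fusion).
nn = [[0.6, 0.0], [0.0, 0.4]]
rv = [[0.0, 0.3], [0.3, 0.0]]
ei = [[0.0, -1.0], [-1.0, 0.0]]
c = combine(nn, rv, ei)
print(is_symmetric(c), c)
```

A weighted sum of symmetric matrices is always symmetric. The check can therefore fail only if one of the inputs was built asymmetrically.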

Updating the Network

The preceding section demonstrated how the connections among the processing units in the network are defined by the three constraining principles. This section of the article describes how these connections are used to iteratively change the activation values of the processing units, so that eventually these units represent a solution to the motion correspondence problem.

Linear algebra can be used to describe how to iteratively update an autoassociative network capable of identifying the motion correspondence solution that is most consistent with the constraints used to define connection weights. (For an introduction to linear algebra in the context of connectionism, see Jordan, 1986.) Let a be a column vector of N × M entries, which represent the activation values of every processing unit in the network. The entry ai in this vector represents the activation value of Ui. The state of all the units at time k is represented as ak. Figure 6 illustrates the relation among the apparent motion display, the autoassociative network, and the representation in linear algebra.

Two equations are used to describe how the network is updated. The first describes how a is changed over time:

$$\mathbf{a}_{k+1} = W\,\mathbf{a}_k = (C + I)\,\mathbf{a}_k \qquad (5)$$

In Equation 5, the matrix W is the connection matrix C plus the identity matrix I and is used as a compact notation to describe network updating. In Appendix A this compact notation is used to prove that the network converges to a stable state.

Equation 5 is the updating rule applied in the "brain-state-in-a-box" model of pattern categorization (e.g., Anderson & Mozer, 1981; Anderson et al., 1977). At each processing step k, every unit Ui in the network adds, to its current activation value, the total signal transmitted to it by the units to which it is connected. This total signal is equal to Σj (cij · aj). Because the connection strengths dictate the consistency among the motion correspondence matches, the activation values of the units representing to-be-included matches are strengthened. The activation values of units representing to-be-excluded matches are weakened.

Equation 5 represents a feedback loop that, operating alone, would usually lead processing units to unboundedly increase or decrease their activation value (e.g., Anderson et al., 1977, p. 427). A second equation, to be applied after Equation 5 is computed, is required to restrict the growth of the activation values:

$$\mathbf{a}_{k+1} \leftarrow \frac{\mathbf{a}_{k+1}}{\lVert \mathbf{a}_{k+1} \rVert} \qquad (6)$$

Equation 6 constrains the growth of a by ensuring that after the network is updated, a has a length of 1.00. As shown later, the application of this equation also ensures that the network converges to a stable state that represents the set of activation values that are most consistent with the constraints represented in C.
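One pass of Equations 5 and 6 fits in a short Python function. Each unit adds Σj cij·aj to its activation (Equation 5, with W = C + I), and the whole vector is then rescaled to unit length (Equation 6). The function name `update` and the example matrix are illustrative assumptions.

```python
import math


def update(c, a):
    """One network iteration: Equation 5 followed by Equation 6.

    c: connection matrix C as a list of rows; a: current activation vector.
    """
    n = len(a)
    # Equation 5, a <- (I + C) a: each unit adds the total signal sum_j c_ij * a_j.
    raw = [a[i] + sum(c[i][j] * a[j] for j in range(n)) for i in range(n)]
    # Equation 6: divide by the vector's length so that ||a|| = 1.
    length = math.sqrt(sum(x * x for x in raw))
    return [x / length for x in raw]


c = [[0.06, -0.07], [-0.07, 0.04]]        # hypothetical symmetric C
a0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # equal, unit-length start
a1 = update(c, a0)
print([round(x, 4) for x in a1])
```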

The autoassociative network governed by Equations 5 and 6 is very similar to the model proposed by Anderson et al. (1977). Qualitatively speaking, the network just described is a "brain-state-in-a-sphere." The vector a is a radius of a hyperdimensional sphere of unit radius. Repeated application of the matrix W rotates a, so that it points in different directions from the origin of the hypersphere. As processing proceeds, a is rotated to point in a stable direction that represents a solution to the problem being solved.

Convergence Properties of the Network

Vector normalization is not the only means by which the growth of activation values in a could be checked. In the brain-state-in-a-box model, activation values are restricted to the range (-1.00 to 1.00). Although this approach could be adopted to solve the correspondence problem (e.g., Dawson, 1988), there are two reasons why it is not ideal. First, although it can be shown that the processing units in the brain-state-in-a-box converge to a stable state (i.e., a corner of a hyperdimensional box in which every activation value is equal to ±1.00), this requires that the nonzero eigenvalues of the constraint matrix all be positive (see Anderson et al., 1977, p. 428). Although this property follows from the learning procedure used by Anderson et al., it is not necessarily true of C, whose values are set a priori from the display rather than learned. Second, the brain-state-in-a-box can have a large number of stable states. In order to solve a problem of underdetermination, a network with fewer stable states is desirable.

Through the use of vector normalization to restrict activation value growth, it can be shown that the autoassociation network converges to a stable state. In Equation 5, both I and C are symmetric. Therefore, the matrix W is also symmetric. This ensures that there exists some vector e such that W · e = λ · e, where λ is a real scalar value called an eigenvalue and e is called an eigenvector of W. When W is used to premultiply one of its eigenvectors, the result is that the length (but not the direction) of the eigenvector is changed. Let a0 be the initial activation values of the network, and assume that a0 is normalized to be of unit length. Let e1 be the most dominant eigenvector of W (i.e., the eigenvector associated with the largest eigenvalue). Appendix A presents a proof that except in special cases, a network updated by the procedures described in Equations 5 and 6 rotates the activation vector a until it converges to either e1 or -e1, depending on the relationship of a0 to e1.
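The convergence claim can be checked numerically on a small case. The sketch below uses the hypothetical names `step`, `run`, and `dominant_eigvec_2x2` and an arbitrary symmetric 2 × 2 matrix C. It repeats Equations 5 and 6 and compares the result with the dominant eigenvector of W = I + C, which has the same eigenvectors as C. The stopping tolerance is an assumption. The sketch also shows how a0 matters: a starting vector on the opposite side of the hyperplane orthogonal to e1 converges to -e1 instead.

```python
import math


def normalize(v):
    """Scale v to unit length (Equation 6)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def step(c, a):
    """Equation 5 with W = I + C, followed by Equation 6."""
    n = len(a)
    return normalize([a[i] + sum(c[i][j] * a[j] for j in range(n)) for i in range(n)])


def run(c, a, tol=1e-24, max_iter=10_000):
    """Iterate until the sum of squared changes falls to tol (assumed tolerance)."""
    for k in range(1, max_iter + 1):
        nxt = step(c, a)
        if sum((x - y) ** 2 for x, y in zip(nxt, a)) <= tol:
            return nxt, k
        a = nxt
    return a, max_iter


def dominant_eigvec_2x2(m):
    """Largest eigenvalue and unit eigenvector of a symmetric 2 x 2 matrix.

    Assumes the off-diagonal entry is nonzero.
    """
    (p, q), (_, r) = m
    lam = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)
    return lam, normalize([lam - r, q])


C = [[0.06, -0.07], [-0.07, 0.04]]
lam, e1 = dominant_eigvec_2x2(C)
a_pos, k_pos = run(C, normalize([1.0, 1.0]))   # a0 . e1 > 0
a_neg, k_neg = run(C, normalize([0.2, 0.98]))  # a0 . e1 < 0
print(e1, a_pos, a_neg)
```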

This proof helps establish another important property of the model. The goal for overcoming a problem of underdetermination is to converge to a solution that is unique: A system should produce the same answer to the problem each time that it is presented. The current model converges to a unique solution because the vector e1 is uniquely defined for matrix W. The model fails to generate unique solutions only in rare, special cases in which W has more than one dominant eigenvector (e.g., when two eigenvectors have the same eigenvalue and this value is greater than all other eigenvalues for W). In this special case, the model always converges to a solution that is a linear composite of the two dominant eigenvectors, and as a result, identical solutions are usually not achieved when a problem is presented repeatedly (see Hall, 1963, pp. 63-66). However, this special situation is rarely encountered. It is highly unlikely that a matrix that represents naturally occurring properties would have more than one dominant eigenvector (W. W. Rozeboom, personal communication, October 15, 1990). Indeed, this situation has not been encountered for any of the displays used to test the performance of the model.

Stable States Are Optimal Problem Solutions

The results just summarized, and detailed in Appendix A, indicate that the most dominant eigenvector (multiplied by 1 or -1) of the constraint matrix W represents the stable state to which the network converges in almost all cases. It is also important to show that this stable state is meaningful with respect to the constraints defined in W. In other words, it must also be shown that this stable state represents the best solution to the problem of underdetermination, given the constraints used to create C.

Hopfield (1982) developed a measure of the cost, or energy, of patterns of activation in a Hopfield net, which is a particular example of an autoassociative network. The iterative processing in a Hopfield net serves to decrease this cost measure until a minimal-energy network state is reached. This minimal-energy state is the optimal response of the network, given the constraints specified in its connections: Changes in any of the final activation values result in a higher-energy state (i.e., a state that is less consistent with the constraints defined by network connections).

Appendix B develops a minor generalization of Hopfield's (1982) cost measure to be applied to the network characterized by Equations 5 and 6. It is shown that e1, the most dominant eigenvector of the connection matrix, represents the least-energy state of the network, as defined by this cost measure. Thus when the network converges, it has reached a state representing the solution that is most consistent with the constraints defined in C.
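Appendix B's cost measure is not reproduced in this section, so the following sketch ASSUMES the Hopfield-style form E(a) = -½·aᵀCa, restricted to unit-length vectors. For unit vectors, this E(a) is minimized by the dominant eigenvector, by the usual Rayleigh quotient argument. For unit vectors, aᵀWa with W = I + C differs from aᵀCa only by the constant 1, so the same vector minimizes either form. The Python code checks the claim on a grid of directions for a hypothetical 2 × 2 matrix. The function name `energy` is illustrative.

```python
import math


def energy(c, a):
    """ASSUMED Hopfield-style cost for a unit-length state: E(a) = -1/2 * a^T C a."""
    n = len(a)
    return -0.5 * sum(a[i] * c[i][j] * a[j] for i in range(n) for j in range(n))


C = [[0.06, -0.07], [-0.07, 0.04]]  # hypothetical symmetric connection matrix

# Dominant eigenvalue and unit eigenvector of this 2 x 2 matrix (closed form).
p, q, r = C[0][0], C[0][1], C[1][1]
lam = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)
norm = math.hypot(lam - r, q)
e1 = [(lam - r) / norm, q / norm]

# Every unit vector on a grid of 3600 directions around the circle.
samples = [[math.cos(t), math.sin(t)]
           for t in (2 * math.pi * k / 3600 for k in range(3600))]
best = min(samples, key=lambda a: energy(C, a))
print(round(energy(C, e1), 6), round(energy(C, best), 6))
```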

Of course, this least-energy state of the network is optimal in another sense. The simulation results to be described indicate that when this state is achieved, the solution represented by the network is, for a wide variety of displays, the same as the solution generated by the human visual system.

Interpreting Converged States

In order for a network to make decisions, there must be a nonlinear component to its processing (e.g., Blake & Zisserman, 1987, Section 1.2.3). The computations summarized in Equations 5 and 6 define a system that is linear.⁴ As a result, an additional component is required if the stabilized activity values are to be interpreted as decisions about the inclusion or the exclusion of correspondence matches. Specifically, continuous activity values must be translated into discrete assertions about inclusion or exclusion.

Nonlinear components are characteristic of autoassociation networks. For example, in a Hopfield net (e.g., Hopfield, 1982), changes in the state of a processing unit require that the total signal to the unit exceed a threshold. Similarly, restricting the range of activation values in the brain-state-in-a-box model introduces a nonlinear processing component (Anderson et al., 1977).

The nonlinearity that is introduced in the current model is a threshold-testing operation that only occurs after network convergence is achieved. An arbitrary threshold is selected for the network. If ai is greater than or equal to the threshold, then the match represented by processing unit Ui is included in the correspondence solution. Otherwise, the match is not included in the solution. In the simulations to be reported, the threshold was equal to 0.13.
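The post-convergence decision rule is a simple threshold test. In this sketch, the converged activations are hypothetical numbers, not output from the model; only the threshold value of 0.13 comes from the text. The function name `decide` is illustrative.

```python
THRESHOLD = 0.13  # threshold used in the reported simulations


def decide(units, activations, threshold=THRESHOLD):
    """Keep the matches whose converged activation is at or above threshold."""
    return [u for u, a in zip(units, activations) if a >= threshold]


# Hypothetical converged state for a 2 x 2 display; units are (Frame 1, Frame 2) pairs.
units = [(0, 0), (0, 1), (1, 0), (1, 1)]
final = [0.70, -0.10, 0.12, 0.70]
print(decide(units, final))  # [(0, 0), (1, 1)]
```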

Specifying the Starting State

In order for network processing to commence, the initial activation values of the processing units (i.e., vector a0) must first be specified. In the simulations to be reported, all motion correspondence matches were assumed to be equally likely at time 0. Thus each processing unit was initially assigned the same positive value (1.0), which was then scaled by normalizing a0 to unit length. In adopting this method, the model is following a procedure similar to that of relaxation labeling (e.g., Zucker, 1976): Initially, all possible labels (motion correspondence matches) are asserted to be equally true of the display, and as iterative processing proceeds, labels inconsistent with the constraints in C are discarded.
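The starting state takes only a few lines to write. Every unit receives 1.0, and the vector is then normalized, so each initial activation equals 1/√(N × M). The function name `initial_state` is hypothetical.

```python
import math


def initial_state(n_units):
    """Give every unit activation 1.0, then normalize the vector to unit length."""
    raw = [1.0] * n_units
    length = math.sqrt(sum(x * x for x in raw))
    return [x / length for x in raw]


# Initial values shrink as the number of units grows.
for n_units in (1, 4, 6):
    print(n_units, initial_state(n_units)[0])
```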

Performance of the Model: Preliminary Remarks

The performance of the motion correspondence model was examined in a series of computer simulations. The only information provided to the model is the numbers of elements in Frame 1 and in Frame 2, as well as the x and y coordinates of each element. Unless otherwise stated, the distance between nearest Frame 1 neighbors in the following figures was five arbitrary units, and the diagrams are drawn to scale. The "standard" settings noted earlier for the equation parameters were used to compute the network's connection weights for each simulation. The network was always initialized so that the processors representing different motion correspondence matches had equal, positive activation values. These values, however, varied from display to display because the vector that represented them was normalized. As a result, initial activation was a function of the number of processors in the network. Iterative processing continued until the network converged upon a solution. The convergence index that was used was the sum of the squared differences in processor activation values from iteration k to iteration k + 1. Convergence occurred when the value of this index reached zero.
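The simulation loop and its convergence index can be sketched as follows. An exact-zero stopping rule is awkward in floating-point arithmetic, so the sketch ASSUMES a small tolerance instead. The name `simulate` and both example matrices are hypothetical.

```python
import math


def simulate(c, tol=1e-24, max_iter=100_000):
    """Iterate Equations 5 and 6 from the equal-activation starting state.

    The convergence index is the sum of squared activation changes between
    iterations k and k + 1; tol stands in for "reached zero" (assumption).
    Returns (final activations, number of iterations).
    """
    n = len(c)
    a = [1.0 / math.sqrt(n)] * n
    for k in range(1, max_iter + 1):
        raw = [a[i] + sum(c[i][j] * a[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in raw))
        nxt = [x / length for x in raw]
        index = sum((x - y) ** 2 for x, y in zip(nxt, a))
        a = nxt
        if index <= tol:
            return a, k
    raise RuntimeError("network did not converge within max_iter iterations")


# Uniform start is an eigenvector of this matrix: converges immediately.
c_fast = [[0.1, -0.1], [-0.1, 0.1]]
# Uniform start is not an eigenvector here: many iterations are needed.
c_slow = [[0.06, -0.07], [-0.07, 0.04]]
for c in (c_fast, c_slow):
    activations, iterations = simulate(c)
    print(iterations, [round(x, 4) for x in activations])
```

The first matrix illustrates the special case in which a0 has no component along the dominant eigenvector (here the direction (1, -1)). In exact arithmetic the state never moves, and the loop stops after one iteration.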

Before the network's performance is considered, it is important to place claims about the model in the proper perspective. The model is not proposed as a specific performance theory of human motion correspondence processing. It is highly unlikely that human motion correspondence is performed in exactly the manner dictated by the network. For example, the model has too many degrees of freedom: Several equation parameters can be freely varied, and numerous alternative equations could be derived for each of the constraints.⁵

Although the network is not being proposed as a performance theory, it is being proposed as a general framework for human competence in motion correspondence processing. Specifically, the model is presumed to apply the same kind of constraints to the motion correspondence problem as does the human visual system. In general, then, the network's performance reflects the adequacy and the utility of the constraining principles, even if the model uses specific procedures that may differ from those of the human visual system.

When viewed as a working competence theory, the model provides a powerful qualitative tool for generating insights, and for raising questions, about human motion perception. What kinds of problems can be solved by the simultaneous application of the three constraints? Are there any emergent properties of the network, so that it solves problems that were not considered during its creation? If such qualitative questions are dealt with first, the foundation is laid for later quantitative attempts to model human perceptions of apparent-motion displays (cf. Köhler, 1947/1975, chap. 2; see also Dawson, 1990b).

With this perspective noted, the performance of the model is described in two parts. In the following section, the model is shown to generate the same qualitative solutions as the human visual system to a set of benchmark displays. In Part 3 the simulation's performance on some additional displays is used as a focal point for considering several theoretical issues that have arisen in the study of motion perception.

Performance of the Model: Benchmark Displays

When a model of motion correspondence processing is developed, it is difficult to specify an optimal procedure to test its capabilities. The number of potential displays that the model could process is infinite. The problem is to choose an interesting and informative subset of these potential tests. The strategy adopted to test the current model was to choose a set of so-called benchmark displays to investigate the capabilities of the model. For the most part, these benchmark displays were quite simple, primarily because the qualitative nature of human motion correspondence solutions is known only for relatively simple displays (see, for instance, the many examples given by Kolers, 1972). The particular benchmarks tested were selected for a variety of reasons. Many of the displays posed severe problems for ancestors of the current model (e.g., Dawson & Pylyshyn, 1986); correct solutions for these displays therefore indicated definite progress in modeling. Some of the displays tested the utility of a particular constraint or posed potential challenges because they violated a constraint exploited by the model. Still others were selected to differentiate the current model from those developed by other researchers. Figure 7 depicts the model's performance on some benchmark displays, which are described in detail in the following sections. In each case, the model assigned the same set of correspondence matches that are assigned by the human visual system.

Single element translation.

Figure 7a illustrates the simplest possible apparent motion display that could be presented to a human observer: a single element presented in different positions in Frames 1 and 2. The model solves this problem easily, converging to the correct solution (i. e. , the solution generated by the human visual system) after only one iteration. This performance is notable only in comparison with that of the model's ancestors. For instance, the network proposed by Dawson and Pylyshyn (1986) exploited only the relative velocity principle and, as a result, could not solve the correspondence problem for this elementary display; when only one element is in the display, no relative velocity information exists.

Multiple element translation.

The model also generates correct solutions when N elements are translating from Frame 1 to Frame 2. This solution can be generated when the elements are moving in parallel (65 iterations for the Figure 7b solution) or when the elements are moving in different directions (41 iterations for the Figure 7c solution). This latter display was presumed to pose a greater challenge for the model because the relative velocity principle cannot be exploited as readily as for the Figure 7b display.

Multiple element rotation.

The model can generate the correct solution when a configuration of elements is rotated about the origin. Figure 7d illustrates a display in which elements are located at the vertices of a square. This square configuration was rotated 10° clockwise about the origin (indicated by the small circle) from Frame 1 to Frame 2. The model required 48 iterations to generate the Figure 7d solution. It was viewed as a challenge for the current model because elements opposite one another across the origin move in opposite directions, which is contrary to the relative velocity principle.

Nearest neighbor sensitivity.

Figure 7e illustrates the model's performance for three motion competition displays. In the first two displays, one Frame 2 element is twice as close as the other to the central Frame 1 element. In both cases, the model assigns the shortest correspondence match after 49 iterations. In the third display, both Frame 2 elements are the same distance from the central Frame 1 element. In this case, the model generates a splitting solution after only one iteration.
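The three competition displays can be imitated with a small end-to-end sketch. It makes several assumptions. Frame 1 contains a single central element. The coordinates are hypothetical. The sketch assumes nn_ii = exp(-α·||mi||). Because every match shares the one Frame 1 element, RV is taken as zero and EI as -1.00 between every pair of units. Finally, C = d·(NN + EI). Iteration counts from this sketch are not expected to match those reported for Figure 7e. With unequal distances, the sketch keeps only the shorter match. With equal distances, the uniform starting vector is an eigenvector orthogonal to the dominant one. The state therefore does not move, processing stops after one iteration, and both matches remain above threshold, giving a splitting solution.

```python
import math

ALPHA, D, THRESHOLD = 0.25, 0.10, 0.13  # standard settings from the text


def solve_competition(frame1_pt, frame2_pts, tol=1e-24, max_iter=100_000):
    """Sketch of the model for one Frame 1 element and several Frame 2 elements.

    Assumptions: nn_ii = exp(-ALPHA * ||m_i||); all matches share the Frame 1
    element, so RV is taken as 0 and EI is -1 between every pair of units;
    C = D * (NN + EI). Returns (indices of included matches, iterations).
    """
    lengths = [math.dist(frame1_pt, p) for p in frame2_pts]
    n = len(lengths)
    c = [[D * (math.exp(-ALPHA * lengths[i]) if i == j else -1.0)
          for j in range(n)] for i in range(n)]
    a = [1.0 / math.sqrt(n)] * n  # equal positive start, unit length
    k = 0
    for k in range(1, max_iter + 1):
        raw = [a[i] + sum(c[i][j] * a[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(x * x for x in raw))
        nxt = [x / length for x in raw]
        index = sum((x - y) ** 2 for x, y in zip(nxt, a))
        a = nxt
        if index <= tol:
            break
    included = [j for j, x in enumerate(a) if x >= THRESHOLD]
    return included, k


center = (0.0, 0.0)
print(solve_competition(center, [(2.5, 0.0), (-5.0, 0.0)]))  # nearer element on the right
print(solve_competition(center, [(-5.0, 0.0), (2.5, 0.0)]))  # nearer element on the left
print(solve_competition(center, [(2.5, 0.0), (-2.5, 0.0)]))  # equidistant elements
```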

The model's solutions to these competition displays are important in two respects. First, they show that the network is implementing a nearest neighbor solution to these problems, as does the human visual system. Second, they show how the model performs when there is an unequal number of elements in the two frames of view. In the first two cases, no motion correspondence match was assigned to one of the Frame 2 elements, which can be interpreted as an assertion that this element suddenly appeared. Human observers often interpret such a display in this fashion. However, this kind of interpretation is not possible for minimal mapping theory because it must exploit the so-called cover principle, which forces a correspondence match to be assigned to every display element. The fact that the current model can function without requiring the cover principle is an important advance and is discussed in detail in Part 3.

Context sensitivity.

The major motivation for incorporating the relative velocity principle into the current model was evidence that such information is an important determinant of motion correspondence matches for human observers (Dawson, 1987). Figure 7f illustrates a solution, generated in 58 iterations, that depends on this principle. The Figure 7f display can be viewed as being identical to the third display of Figure 7e with an additional contextual element that provides disambiguating relative velocity information. The solution generated by the model illustrates that it can generate simple field effects that are not unrelated to those reported by Ramachandran and Anstis (1985). Field effects are discussed in more detail in Part 4.

Motion shear.

Figure 7g depicts a solution for a problem that is difficult for any model that uses the relative velocity principle. In this display, nearest neighbors move in exactly opposite directions. The correct solution was generated after 293 iterations. The fact that it could be generated at all indicates that the nearest neighbor and the element integrity principles combined are capable of overcoming alternative (and incorrect) solutions that are more consistent with the relative velocity principle.

Stationary elements.

Figure 7h illustrates a solution for a degenerate apparent-motion display; in this display, no motion is perceived because the presented elements do not change position over time. The solution that is generated (after 50 iterations) is consistent with human perceptions of stationary elements. This display is a benchmark because it poses tremendous difficulties for certain operationalizations of the constraining principles. For example, instead of defining the relative velocity principle as in Equation 2, one could define relative velocity as the cosine of the angle between neighboring matches (after centering at a common origin). Such a definition has the attractive advantage of providing a natural scaling of relative velocity into the range (-1 to 1) but cannot be applied to stimuli in which one or more vectors have zero length, as is the case in this display.
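The zero-length problem with a cosine-based definition is easy to demonstrate. In this sketch, `cosine_similarity` is the alternative definition mentioned above. `distance_similarity` is a distance-based mapping into (-1, 1] of the kind described for Equation 2; its exact form, 2·exp(-β·||u - v||) - 1, is an assumption. Both function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two motion vectors; undefined at zero length."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine is undefined for a zero-length motion vector")
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)


def distance_similarity(u, v, beta=0.25):
    """ASSUMED distance-based mapping into (-1, 1]: 2 * exp(-beta * ||u - v||) - 1."""
    return 2.0 * math.exp(-beta * math.dist(u, v)) - 1.0


stationary = (0.0, 0.0)  # match for an element that does not move
print(distance_similarity(stationary, stationary))  # 1.0
try:
    cosine_similarity(stationary, stationary)
except ValueError as err:
    print("cosine version fails:", err)
```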

Necessity of all three constraints.

Although not illustrated in any of the figures, it can be easily shown that the model's performance depends on the application of all three constraints (Dawson & Harder, 1989). If any one of the constraints is removed from C, there will be some displays for which the model will generate solutions that differ from those generated by humans. This is consistent with the arguments made earlier in which experimental evidence was provided for each of the constraints.

Problems with "standard" settings.

Some displays provide problems for the network when the so-called standard settings are used in the equations that define connection strengths. One example of an incorrect solution generated by the standard network is illustrated in Figure 8a. Typically, cases like this can be remedied with some minor changes of the network's settings. For instance, the correct motion correspondence solution of Figure 8b is generated by increasing the system threshold from 0.13 to 0.23.

This flexibility is indicative of the many parameters that can be freely varied in the model and would be an undesirable property if the model were to be proposed as an explanation of human motion correspondence. To increase the explanatory power of the model, it must eventually be translated from a general (qualitative) framework for motion correspondence competence to a specific (quantitative) model of motion correspondence performance. The collection of the experimental data required for such a translation is an important component of an ongoing research program (e.g., Dawson, 1987, 1990a; Dawson & Wright, 1989). However, even in its current qualitative state, the model sheds some interesting light on some specific motion perception issues, as is shown in the following discussion.

Part 3: Motion Correspondence and Motion Perception

The previous section indicated that the motion correspondence model was capable of generating the same qualitative solutions generated by human observers to a variety of benchmark apparent-motion displays. In the section that follows, additional examples of the model's performance are used to focus discussion on a number of theoretical issues that have arisen in the study of motion perception. These issues include the two-process distinction in motion perception, particular motion correspondence modeling assumptions, the role of figural properties in apparent-motion perception, and the relation between motion correspondence and visual attention.

A major theme underlying this part of the article concerns the utility of designing effective procedures for studying psychological phenomena. One tradition in the study of motion correspondence processing, illustrated earlier, is to argue that the visual system exploits a small number of constraining principles and then to design a working computer model that implements these constraints (e.g., Grzywacz & Yuille, 1988; Ullman, 1979). When such a model is constructed, one can determine the extent to which the constraining principles account for phenomena not originally considered when the model was designed. One measure of the model's strength is its ability to account for a wider range of phenomena than was originally intended. Examples of emergent explanations generated by the current model include its ability to generate both percepts of the Ternus configuration (Ternus, 1938), its ability to follow the cover principle under certain element displacement conditions, and its ability to generate least-change transformations (see the following sections).

In contrast, a second tradition in the study of motion correspondence processing is to use psychophysical experiments to compose a catalog of independent variables that affect which matches are assigned (e.g., Ramachandran & Anstis, 1986b; Sekuler et al., 1990). This tradition fulfils the important role of refining descriptions of correspondence processing. However, it does little to offer explanations of how this processing actually occurs. For example, Ramachandran and Anstis (1986b) proposed that the visual system applies a set of strategies or heuristics "from what is in effect a bag of tricks" (p. 102). However, they offered few concrete proposals for how these strategies could be realized as an effective procedure, a step that many authors believe is necessary to provide explanations of phenomena (e.g., Johnson-Laird, 1983, pp. 4-6). Furthermore, experimental results may provide misleading information about what such a bag of tricks may contain. For instance, after reviewing the experimental evidence, Attneave (1974) proposed that one rule governing correspondence match assignment is a preference for symmetric matches. However, such a rule need not be explicitly implemented. Ullman's (1979) minimal mapping theory and the current model (see Figure 8) can generate symmetric patterns of matches without explicitly assuming a symmetry rule. This kind of discovery is possible only when one explores processing with explicit proposals about effective procedures.

The Two-Process Distinction in Motion Perception

In recent years, much of the research on human motion perception has been guided by the putative distinction between two motion perception systems (for reviews, see Anstis, 1980, 1986; Braddick, 1980; Petersik, 1989). The first, called the short-range motion system, is thought to detect movements involving short element displacements (e.g., 15-30 minutes of visual angle) and brief temporal intervals (e.g., interstimulus intervals of 40 ms or less). Anstis (1980) proposed that the short-range system detects motion before the extraction of figural properties. As a result, it is typically modeled as some form of spatiotemporal correlation between image intensities that vary continuously over time (e.g., the correlator class of detector proposed by Reichardt, 1961; for related models, see Adelson & Bergen, 1985; Burr, Ross, & Morrone, 1986; Dawson & Di Lollo, 1990; Farrell & Kesler, 1988; Marr & Ullman, 1981; Morgan & Watt, 1983; van Santen & Sperling, 1984, 1985; Watson & Ahumada, 1985).
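The correlator idea can be made concrete with a minimal discrete-time sketch of an opponent, Reichardt-style detector. It assumes two point receptors, an integer sample delay, and luminance sampled at discrete time steps; the function name and signal layout are illustrative and are not taken from any of the cited models.

```python
def reichardt_response(s1, s2, delay=1):
    """Opponent correlator for two neighbouring point receptors (illustrative sketch).

    s1, s2: luminance samples over time at receptor 1 and receptor 2, where
    receptor 2 lies in the "preferred" direction from receptor 1.
    Returns a signed scalar: positive for motion from 1 toward 2, negative
    for the reverse, zero when neither direction dominates.
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    # Half-detector A: delayed receptor 1 multiplied by current receptor 2.
    forward = sum(a * b for a, b in zip(s1[:-delay], s2[delay:]))
    # Half-detector B: the mirror image, delayed receptor 2 times current receptor 1.
    backward = sum(a * b for a, b in zip(s2[:-delay], s1[delay:]))
    return forward - backward


# A spot flashed at receptor 1 at t = 2 and then at receptor 2 at t = 3.
print(reichardt_response([0, 0, 1, 0, 0], [0, 0, 0, 1, 0]))  # 1
# The same flashes in the reverse order.
print(reichardt_response([0, 0, 0, 1, 0], [0, 0, 1, 0, 0]))  # -1
```

The detector's entire output is one signed number. It states that motion occurred and in which direction, but it makes no assertion about which Frame 1 element is the same entity as which Frame 2 element.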

The second process, called the long-range motion system, is thought to detect movement involving much longer element displacements (e.g., several degrees of visual angle) and temporal intervals (e.g., interstimulus intervals of more than 100 ms). The long-range system is usually proposed as the system that mediates the perception of classical apparent motion (i.e., the motion considered by Kolers, 1972) and must solve the correspondence problem, as was assumed earlier in this article. Anstis (1980) proposed that the long-range system detects motion after some figural properties have been extracted from the stimulus. Accordingly, it is typically modeled as the matching of discrete tokens over time, whereby each token represents the properties of an individuated element (e.g., Ullman, 1979, 1981).

Recently, researchers have questioned the validity of the two-process distinction. Many of the classical differences between the two systems have not stood up to experimental inquiry (for a review, see Cavanagh & Mather, 1989; the contrasting view was presented by Petersik, 1989). Cavanagh and Mather generalized the short- and long-range distinction to one between first- and second-order motion but then argued that first- and second-order motion detectors do not differ qualitatively.

Researchers who have argued against a qualitative distinction between the short- and long-range motion systems have also proposed that all motion perception can be modeled with some variation of a Reichardt detector (e.g., Cavanagh & Mather, 1989). Token-based schemes are viewed skeptically. Adelson and Bergen (1985, p. 284) criticized feature-based schemes for not making precise claims about which figural properties are explicitly represented. Ramachandran and Cavanagh (1987) asked, "How does the visual system know which spot goes with which? Our answer is that the visual system doesn't care"; instead, the short-range motion signal derived from the low spatial frequencies of the stimulus is "spontaneously attributed to the spots themselves" (p. 105).

One problem with this position is that it mistakenly equates the detection of element motion with the maintenance of element identity. Typical correlator models (e.g., Dawson & Di Lollo, 1990; Reichardt, 1961; van Santen & Sperling, 1984) generate a numeric value that can be interpreted as asserting that "motion to the left was detected in the display" or "no motion was detected in the display." However, these assertions are quite different from the assertion that "Frame 1 element x and Frame 2 element y are the same entity," which in some cases may be completely unrelated to movement. For instance, the states of the motion correspondence network represent assertions about element identities when no movement has occurred (e.g., Figure 7a). Similarly, motion signals can be generated for displays in which elements may not have been individuated (e.g., Daugman, 1988; Mather, 1984).

A token-based model, the motion correspondence network, can generate both long- and short-range identity matches for a bistable display. One such display, the Ternus configuration (Ternus, 1938), has long been studied by apparent-motion researchers. The display consists of a group of three elements that are translated in one direction from Frame 1 to Frame 2. The amount of translation is such that two stimulus locations are occupied by an element in both Frame 1 and Frame 2 (see Figure 9). The Ternus configuration can support two motion percepts. One is the group motion percept, in which all three elements are perceived to translate in one direction as a whole group (Figure 9a). The other is the element motion percept, in which two elements remain stationary while the third moves from one end of the group to the other (Figure 9b). In the context of the two-process distinction, researchers have argued that the long-range system generates the group motion percept and that the short-range system generates the element motion percept (e.g., Braddick & Adlard, 1978; Pantle & Picciano, 1976).

When the standard settings for the motion correspondence model are used, the model generates correspondence matches that are consistent with the group motion percept for the Ternus configuration (Figure 9a). This is not surprising, given the contention that the network simulates one component of the long-range system. However, a minor adjustment of the model's settings, to increase its preference for short correspondence matches, results in the network's generating a solution that has been ascribed to the short-range system. The matches consistent with element motion were produced after the value of a in Equation 1 was increased from 0.25 to 0.50 (Figure 9b).
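Why a single parameter can tip the Ternus display from one solution to the other is easier to see with the two match sets written out. The sketch below is plain arithmetic on a one-dimensional Ternus layout, not the network's Equations 1 and 2; the spacing of five units echoes the value mentioned in the text, and the summary measures (and the function name) are hypothetical stand-ins for the model's displacement and relative velocity terms.

```python
from itertools import combinations


def match_stats(matches):
    """Summarize a set of 1-D correspondence matches (illustrative measures only).

    matches: list of (frame1_x, frame2_x) pairs. Velocity is the signed
    displacement between frames, assuming one time unit per frame pair.
    """
    velocities = [x2 - x1 for x1, x2 in matches]
    return {
        "total_displacement": sum(abs(v) for v in velocities),
        "stationary_matches": sum(1 for v in velocities if v == 0),
        # Summed pairwise velocity differences: 0 when all elements move alike.
        "velocity_spread": sum(abs(a - b) for a, b in combinations(velocities, 2)),
    }


d = 5  # hypothetical inter-element spacing, in arbitrary display units
group = [(0, d), (d, 2 * d), (2 * d, 3 * d)]    # every element shifts by d
element = [(0, 3 * d), (d, d), (2 * d, 2 * d)]  # end element jumps; two stay put

print(match_stats(group))    # {'total_displacement': 15, 'stationary_matches': 0, 'velocity_spread': 0}
print(match_stats(element))  # {'total_displacement': 15, 'stationary_matches': 2, 'velocity_spread': 30}
```

The two solutions tie on summed displacement. They differ in how that displacement is distributed (two zero-length matches versus three equal ones) and in relative velocity, which is one way to see why a stronger preference for short matches and the relative velocity principle pull in opposite directions for this display.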

The motion correspondence model also provides an alternative account of another regularity of the Ternus display. Breitmeyer and Ritter (1986) argued that group motion results when there is a reduction in the visible persistence of activation produced by Frame 1 of the configuration. The results of several experiments have supported this argument. For instance, the visible persistence of visual elements is known to decrease substantially when display elements are very near one another (e.g., Di Lollo & Hogben, 1987). When the distance between Ternus configuration elements is decreased, the group motion percept predominates (Breitmeyer & Ritter, 1986; Petersik, 1986). This finding supports the visible persistence hypothesis. However, it is also consistent with the assumptions underlying the correspondence network: When the distance between elements is decreased from five units to one unit, group motion correspondence matches are assigned, even when a = 0.50 (Figure 9c). This is because decreasing the distance between elements in Frame 1 increases the effect of the relative velocity principle through the neighborhood parameter in Equation 2.

The fact that relatively minor changes in network settings can produce both correspondence solutions to the Ternus configuration is at first glance consistent with Cavanagh and Mather's (1989) contention that separate motion detection systems may differ quantitatively but not qualitatively. However, the network's performance does not entail a rejection of a two-process distinction. This is because although the network maintains the identity of moving elements, it does not represent their movement. Later in this article it is argued that the two-process distinction may not be between different motion perception systems but instead may be between a low-level motion detector and an attentional tracking system that is part of visual cognition (see also Marr, 1982, pp. 202-204; Petersik, 1989).

Cover Principle

Ullman's (1979) minimal mapping theory exploits the nearest neighbor principle in conjunction with a second constraint, called the cover principle. The cover principle requires that a valid correspondence solution account for ("cover") every Frame 1 and Frame 2 element with at least one motion correspondence match. In other words, this constraint stipulates that Frame 1 elements cannot suddenly disappear and that Frame 2 elements cannot suddenly appear.
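Stated as a test on a candidate solution, the cover principle is a simple set-membership check. The sketch below assumes that elements carry hypothetical string labels and that a solution is a list of (Frame 1, Frame 2) pairs; the labels loosely mimic a Figure 10 layout (three stationary elements plus three new ones) but are invented for illustration.

```python
def is_cover(matches, frame1, frame2):
    """Return True if every Frame 1 and every Frame 2 element takes part in
    at least one match, i.e. no element disappears or suddenly appears.

    matches: iterable of (frame1_label, frame2_label) pairs.
    """
    matched1 = {i for i, _ in matches}
    matched2 = {j for _, j in matches}
    return set(frame1) <= matched1 and set(frame2) <= matched2


# Hypothetical labels: a, b, c stay in place; d, e, f exist only in Frame 2.
f1 = ["a", "b", "c"]
f2 = ["a2", "b2", "c2", "d", "e", "f"]

stay_only = [("a", "a2"), ("b", "b2"), ("c", "c2")]         # d, e, f suddenly appear
split = stay_only + [("a", "d"), ("b", "e"), ("c", "f")]    # each element also moves out

print(is_cover(stay_only, f1, f2))  # False
print(is_cover(split, f1, f2))      # True
```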

In minimal mapping theory, the cover principle serves a special function that differentiates it from other constraints. It forces a system governed by the nearest neighbor principle to include at least some matches in a correspondence solution. Without this principle, Ullman's (1979) model would produce the zero-cost solution that does not include any motion correspondence matches at all.

The special function of the cover principle is reflected in how it is applied in minimal mapping theory. Other constraints are included in the cost function that is minimized. Because of this, these constraints are what connectionists would call "weak"; they can be violated to a certain extent if the result is a better overall solution. The cover principle is not part of the cost function but instead defines a set of necessary conditions on cost minimization. Thus the cover principle is a "strong" constraint: it cannot be violated.
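The difference between a weak and a strong constraint shows up clearly in a brute-force search. In this sketch, a simplified stand-in for minimal mapping theory rather than Ullman's (1979) actual cost function, the weak constraint is the summed length of all matches, and the strong cover constraint filters candidates before their costs are ever compared. The two-element display coordinates and all function names are hypothetical. Switching the cover filter off reproduces the degenerate zero-cost solution with no matches.

```python
import math
from itertools import product


def cost(matches, p1, p2):
    """Weak constraint: summed Euclidean length of all matches."""
    return sum(math.dist(p1[i], p2[j]) for i, j in matches)


def covers(matches, n1, n2):
    """Strong constraint: every element of both frames is matched at least once."""
    return ({i for i, _ in matches} == set(range(n1))
            and {j for _, j in matches} == set(range(n2)))


def best_mapping(p1, p2, enforce_cover=True):
    """Exhaustively search every subset of possible matches (tiny displays only)."""
    pairs = [(i, j) for i in range(len(p1)) for j in range(len(p2))]
    best, best_cost = None, math.inf
    for mask in product([False, True], repeat=len(pairs)):
        matches = [pair for pair, keep in zip(pairs, mask) if keep]
        if enforce_cover and not covers(matches, len(p1), len(p2)):
            continue  # strong constraint: the candidate is discarded, not penalized
        c = cost(matches, p1, p2)
        if c < best_cost:
            best, best_cost = matches, c
    return best, best_cost


# Hypothetical display: two elements, each shifted one unit to the right.
frame1 = [(0.0, 0.0), (4.0, 0.0)]
frame2 = [(1.0, 0.0), (5.0, 0.0)]

print(best_mapping(frame1, frame2, enforce_cover=False))  # ([], 0)
print(best_mapping(frame1, frame2))                       # ([(0, 0), (1, 1)], 2.0)
```

Exhaustive search grows as 2 to the power of (Frame 1 elements × Frame 2 elements), so this is only a way to make the logic visible, not a practical solver.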

Adoption of the cover principle could be defended on grounds similar to those used to defend the element integrity principle (e.g., Dawson & Pylyshyn, 1988). However, evidence shows that the cover principle can be violated by the human visual system. For instance, Figures 2a and 2b represent displays in which one element is seen to move in one direction while a second element suddenly appears. This sudden appearance is a violation of the cover principle.

Element displacement appears to determine whether the cover principle is followed.

Figure 10. The cover principle and element displacement. (a) When element displacement is small, all six Frame 2 elements are covered by motion correspondence matches. (b) When element displacement is doubled, three of the Frame 2 elements are not covered by matches; they are seen by human subjects to suddenly appear.

Figure 10 illustrates two displays in which three elements are presented in the same locations in both Frames 1 and 2 and another three elements are presented only in Frame 2. If this second set of elements is close to the first set (short element displacement), then all Frame 2 elements are covered by motion correspondence matches: Human subjects see three elements move from behind the three stationary elements (Figure 10a). If this second set of elements is displaced farther from the first set, then not all Frame 2 elements are covered: Human subjects see three stationary elements, followed by the sudden appearance of another three elements (Figure 10b).

Violations of the cover principle clearly indicate that it should not be a strong constraint on motion correspondence processing. Ideally, weak constraints should be sufficient to identify a unique and correct solution. The motion correspondence network does not apply the cover principle because it does not specify that a particular subset of processing units must have high activation values. Yet it generates appropriate "covers" in the solutions to which it converges—it generates both of the Figure 10 solutions—and in so doing is sensitive to element displacement.

Least-Change Transformations

The projected movement of rigid objects in three-dimensional space should produce two-dimensional patterns of movement that are consistent with the nearest neighbor, relative velocity, and element integrity principles. As a result, when these three principles are applied to a proximal stimulus produced by the movement of a rigid distal stimulus, the resulting correspondence solution should be physically plausible. In particular, the assigned correspondence matches should not contradict the rigid structure of the distal stimulus (see Dawson & Pylyshyn, 1988).

In some cases, however, there is more than one way to perceive motion and preserve physical plausibility. Consider the three displays presented in Figure 11. Each of these could be geometrically described, and plausibly perceived, as a rigid clockwise rotation of three points about the marked origin. Whereas motion correspondence matches consistent with this interpretation are generated by the model for Figure 11a, alternative (physically plausible) solutions are produced for the other two displays.

Figure 11. Three examples of least-transformation solutions. (The outline circle represents an origin about which the group of three elements could be described as being rotated by [a] 45°, [b] 135°, or [c] 180°. See text for details.)

The Figure 11b solution is consistent with the perception of the three elements translating downward, accompanied by a 45° clockwise rotation about the middle point. The Figure 11c solution is consistent with the perception of the three elements translating downward. (The model does not generate element trajectories per se; it generates only identity matches that can be viewed as being consistent with trajectory interpretations.) The alternative matches are assigned for the Figure 11b and 11c displays because they produce lower total element displacement and relative velocity than do the matches that are consistent with a rigid clockwise rotation. Thus when the model is faced with choosing between physically plausible interpretations, it selects the interpretation that represents a "least-change" transformation: the interpretation that produces the least cost with respect to the three applied constraints.
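The displacement part of this least-change argument can be checked with simple arithmetic. The sketch below uses a hypothetical horizontal row of three elements rotated 180° about an origin below it (coordinates chosen for illustration, not measured from Figure 11), assumes one-to-one matches, and scores each assignment by summed displacement only; relative velocity, which would also favor the translation here, is not computed.

```python
import math
from itertools import permutations


def rotate_clockwise(point, degrees, origin=(0.0, 0.0)):
    """Rotate a 2-D point clockwise about origin by the given angle."""
    theta = -math.radians(degrees)  # a negative angle is a clockwise turn
    x, y = point[0] - origin[0], point[1] - origin[1]
    return (origin[0] + x * math.cos(theta) - y * math.sin(theta),
            origin[1] + x * math.sin(theta) + y * math.cos(theta))


# Hypothetical display: a horizontal row of three elements above the origin.
frame1 = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
frame2 = [rotate_clockwise(p, 180) for p in frame1]


def total_displacement(assignment):
    """Summed match length when Frame 1 element i goes to frame2[assignment[i]]."""
    return sum(math.dist(frame1[i], frame2[j]) for i, j in enumerate(assignment))


rotation = (0, 1, 2)  # each element matched to its own rotated image
least_change = min(permutations(range(3)), key=total_displacement)

print(round(total_displacement(rotation), 3))                    # 7.657
print(least_change, round(total_displacement(least_change), 3))  # (2, 1, 0) 6.0
```

The cheapest assignment, (2, 1, 0), matches each element to the position directly below it, which corresponds to a downward translation; the rotation-consistent matches are physically plausible too but cost more.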

The human visual system also prefers least-change transformations of this type (e.g., Dawson & Pylyshyn, 1988; Farrell & Shepard, 1981; Shepard & Judd, 1976). This has led some researchers to propose models of apparent-motion perception that apply explicit principles of least change (e.g., Caelli & Dodwell, 1980; Foster, 1978; Mori, 1982; Restle, 1979; Shepard, 1984).

For example, Shepard (1978, 1981, 1982, 1984) proposed a geometric theory of mental representation that can be applied to apparent-motion perception. The basic assumption of the theory is that object motion is represented as a path of activation on a representational surface called a manifold. In general, each location on the manifold represents an object in a particular position and orientation in three-dimensional space. Shepard proposed that the visual system processes apparent-motion displays as follows: Frames 1 and 2 of an apparent-motion display produce two points of activation on a specific representational manifold. One point of activation represents the figural appearance of Frame 1; the other represents the figural appearance of Frame 2. The illusion of motion is then produced by a spread of activation from the first manifold point to the second: The path of activation from the Frame 1 point to the Frame 2 point represents the appearance of the object as it moves.

However, there are infinitely many paths between the activation points that represent the two frames of view. Therefore, Shepard (e.g., 1982, 1984) assumed a "minimum principle": The visual system selects the shortest path between the two activation points. In addition, however, all points of the path must lie on the manifold because the manifold is defined by the set of physically plausible transformations that could be applied to represented objects. If the path were not on the manifold, a physically implausible transformation would be represented.
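The minimum principle can be illustrated on the simplest manifold of this kind: the orientations of a rigid planar object, which form a circle. The sketch is a toy under that assumption (unit radius, angles in degrees, hypothetical function names), not Shepard's formal treatment. It returns the rotation along the shorter arc and compares that arc with the straight chord between the same two points; the chord is shorter, but its interior points lie off the circle and so, in this toy, do not correspond to any orientation of the object.

```python
import math


def geodesic_rotation(theta1_deg, theta2_deg):
    """Signed rotation (degrees) along the shorter arc of the orientation circle.

    Positive = counterclockwise. For an exact 180-degree difference the two
    arcs tie; this sketch then returns +180.
    """
    delta = (theta2_deg - theta1_deg) % 360.0  # in [0, 360)
    return delta if delta <= 180.0 else delta - 360.0


def arc_vs_chord(theta1_deg, theta2_deg, radius=1.0):
    """Length of the on-manifold arc versus the off-manifold straight chord."""
    angle = math.radians(abs(geodesic_rotation(theta1_deg, theta2_deg)))
    arc = radius * angle
    chord = 2 * radius * math.sin(angle / 2)
    return arc, chord


print(geodesic_rotation(10, 350))  # -20.0 (a 20-degree clockwise turn, not 340 counterclockwise)
arc, chord = arc_vs_chord(0, 90)
print(round(arc, 4), round(chord, 4))  # 1.5708 1.4142
```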

In comparison with the current model, Shepard's (1978, 1981, 1982, 1984) manifold approach has the advantage of providing an explicit account of how objects change appearance during apparent movement. However, the manifold model still requires the correspondence problem to be solved. For example, an apparent-motion display composed of several objects of the same type would result in several Frame 1 and Frame 2 activations on a single manifold. The correct correspondence matches would therefore be required to direct the spread of activation between appropriate manifold locations. This suggests that solving the correspondence problem and generating the figural appearance of moving objects are functionally different components of apparent-motion perception.

Evidence does support a functional dissociation between the processes that assign correspondence matches and those that generate object appearances in apparent motion. Although correspondence matches are assigned in a two-dimensional coordinate system (e.g., Green & Odom, 1986; Mutch, Smith, & Yonas, 1983; Tarr & Pinker, 1985; Ullman, 1978), the appearance or the quality of apparent motion is sensitive to manipulations of apparent three-dimensional depth (e.g., Attneave & Block, 1973; Corbin, 1942). Apparent-motion quality can be affected by manipulations of the meaningfulness, the consistency, or the familiarity of a display (e.g., Jones & Bruner, 1954), but this is not the case for correspondence match assignment (Dawson & Wright, 1989). Similarly, the processes that generate the appearance of apparent-motion trajectories are sensitive to higher order stimulus variables, such as the topological structure of elements; motion correspondence processes are not (Dawson, 1990a).

These results indicate that although both the current model and the manifold approaches propose least-change transformations, they do so for different aspects of motion perception. Under the assumption that motion correspondence processing is a component of Braddick's (1980) long-range motion perception system, this indication leads to the speculation that the long-range system has at least two