Alignment-based Nonmonotonicities in Similarity

Robert L. Goldstone

Indiana University







Running head: SIMILARITY AND ALIGNMENT

Abstract

According to the assumption of monotonicity in similarity judgments, adding a shared feature to two items should never decrease their similarity. Violations of monotonicity are not predicted by feature- or dimension-based models but can be accommodated by alignment-based models in which the parts of one compared display are placed in correspondence with the parts of the other display. In two experiments, evidence for nonmonotonicities is obtained that is generally consistent with the alignment-based model SIAM (Similarity as Interactive Activation and Mapping; R. L. Goldstone, 1994). The calculation of similarity in this model involves an interactive activation process whereby correspondences between the parts of compared displays mutually and concurrently influence each other. As SIAM predicts, the occurrence of nonmonotonicities depends on the perceptual similarity of features and the duration of presented comparisons.

Alignment-based Nonmonotonicities in Similarity

Both feature-based (Tversky, 1977) and dimension-based (Carroll & Wish, 1974) models of similarity assume monotonicity; that is, they assume that adding a shared feature to two items should increase or leave unchanged, but should never decrease, their similarity. The current experiments demonstrate violations of monotonicity. In doing so, the experiments provide evidence against models that base similarity on the overlap or distance between simple representations, and provide evidence for models that compute similarity by a process of aligning the parts of structured representations.

In order to show that there is a true nonmonotonic relation between matching features and similarity, it is necessary to show that no reasonable interpretation of the stimuli can yield a monotonic relation. Some apparent violations of monotonicity can be accommodated by feature-based models if certain featural descriptions are invoked. For example, it is easy to imagine a situation where experiment participants judge XX to be more similar to YY than to XY (Goldstone, Gentner, & Medin, 1989; Markman & Gentner, 1993a; Medin, Goldstone, & Gentner, 1990, 1993). Although this might appear to violate monotonicity (adding an "X" feature match decreases similarity), a feature-based model can handle this situation if abstract features such as "contains two identical shapes" are permitted (Tversky, 1977). Such abstract features do appear to be used by people making similarity judgments (Gentner & Markman, 1995; Goldstone, Medin, & Gentner, 1991; Markman & Gentner, 1993a). The situation only represents a nonmonotonicity because the set of features used by the experimenter does not conform to those used by the participants. The experiments presented here demonstrate nonmonotonicities that are not explainable in terms of participants' use of abstract features.

Alignment-based Models of Similarity

Although there are important differences between dimension-based and feature-based approaches to similarity, there is also a significant commonality. Neither approach takes into account alignment, the process of creating interdependent correspondences between the parts of compared entities, in determining similarity. Previous research (Gentner & Markman, 1994, 1995; Gentner & Rattermann, 1991; Goldstone, 1994; Goldstone & Medin, 1994a, 1994b; Markman & Gentner, 1993a, 1993b; Medin et al., 1993) has shown that there are strong influences of alignment on similarity. Properties shared by objects increase similarity more if the properties belong to parts of the objects that correspond well to each other. For example, the similarity of a dog and a wolf is increased more by a matching color found on corresponding parts (e.g., black tails on both animals) than by a matching color found on noncorresponding parts (e.g., a white paw on the wolf and a white tail on the dog).

In general, display elements will tend to be placed in correspondence if they are similar to each other, if they play the same role within their respective entities, and if they are consistent with other correspondences. The constraints on correspondences are directly borrowed from work in analogy (Clement & Gentner, 1991; Gentner, 1983, 1989; Gentner & Toupin, 1986; Holyoak & Thagard, 1989; Hummel, Holyoak, & Burns, 1994; Spellman & Holyoak, 1992; Wharton, Holyoak, Downing, Lange, Wickens, & Melz, 1994), and more distantly, from work in the perception of depth and motion (Dawson, 1991; Marr & Poggio, 1979; Ullman, 1979). As hypothesized by researchers in analogical reasoning (Gentner, 1983; Holyoak & Thagard, 1989), the first way for correspondences to be consistent with each other is if the correspondences do not place an element from one display into correspondence with more than one element from the other display (Gentner's one-to-one mapping constraint). The second way for correspondences to be consistent is if they place related elements into alignment with other elements that have the same relation. By this notion of "parallel connectivity" (Markman & Gentner, 1993a, 1993b), the correspondence between bowling pins and an archery target is consistent with the correspondence between a bowling ball and arrows (Goldstone & Medin, 1994b).

Although featural and dimensional models have a primitive notion of correspondence (identical features or dimensions correspond to each other), they do not have the notion that correspondences may influence each other. As will be seen, alignment-based models of similarity can predict nonmonotonicities in similarity because the notion of consistency between correspondences implies that correspondences may influence each other. Adding a feature match between two displays may decrease their similarity if the feature match is not consistent with other well-aligned feature matches. In Figure 1, the different letters in the bodies of the stylized butterflies refer to different colors. When the top display of two butterflies is compared to the display labelled "XY → YB," the matching Y color between the displays is inconsistent with the other feature matches; the matching Y colors occur between butterflies that do not correspond well according to other features such as "wing shading," "head style," and "tail style." As such, this matching Y feature may interfere with the creation of strong correspondences between highly similar butterflies, thereby decreasing similarity. This nonmonotonicity would be evidenced by systematically lower similarity ratings between the starting display and the display labelled "XY → YB" than between the starting display and "XY → AB."

The notion that one alignment may be inconsistent with other alignments only arises because the representations of the displays are more structured than simple sets of features; the display is hierarchically composed of parts (butterflies) that contain features (e.g. body color and wing shading). Alignments between two displays' features are consistent if they place the same butterflies into alignment; alignments between features are inconsistent if they result in one butterfly part being aligned with two other butterflies. When a person creates structured representations for displays, procedures for their alignment will be required, thus complicating the process of judging similarity. However, in many cases the cost of executing these alignment procedures will be more than compensated by the person's ability to represent complex entities efficiently and to easily grasp their structural commonalities.

The Similarity as Interactive Activation and Mapping (SIAM) Model

A Brief Description of SIAM

The research reported here tests predictions made by a recent alignment-based model of similarity. The model, inspired by work in analogical reasoning (Falkenhainer, Forbus, & Gentner, 1989; Gentner, 1983; Holyoak & Thagard, 1989) and interactive activation models of perception (McClelland & Elman, 1986; McClelland & Rumelhart, 1981), is based on the principle that determining the similarity of structured displays requires putting the displays' parts into alignment, and that these alignments mutually and simultaneously affect each other. For the present purposes, a display is structured if it is composed of multiple objects that are related to each other, and each object is composed of multiple features. Complete descriptions of SIAM are provided elsewhere (Goldstone, 1994; Goldstone & Medin, 1994a). The primary processing unit is the node. Nodes send and receive activation from other nodes. As with Holyoak and Thagard's Analogical Constraint Mapping Engine (ACME), nodes represent hypotheses that two entities correspond to one another in two displays. In SIAM, there are two types of nodes: feature-to-feature nodes and object-to-object nodes.

Feature-to-feature nodes each represent a hypothesis that two features correspond to each other. There will be one node for every pair of features that belong to the same dimension (e.g., "white" and "black" both belong to the "color" dimension). The activation of a feature-to-feature node reflects the strength of correspondence between the two features referenced by the node. In addition to activation, feature-to-feature nodes also have a "Match/Mismatch value," a number between 0 and 1 that indicates the similarity of the two features' values on a given dimension. For example, a node that places orange and red colors into correspondence may have a Match value of 0.90, but the Match value may be only 0.40 if orange and blue colors are aligned. The Match value decreases monotonically as the similarity of two values decreases (Medin & Schaffer, 1978). Likewise, each object-to-object node represents a hypothesis that two objects correspond to one another.
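The node inventory described above can be sketched in a few lines. This is only an illustration: the dictionary encoding of displays and the names START, CHANGED, object_nodes, and feature_nodes are assumptions made here, not part of SIAM as published.

```python
from itertools import product

# Hypothetical encoding (not from the SIAM papers): a display is a list of
# objects, and each object maps a dimension name to a feature value.
START = [
    {"color": "X", "wing": "striped", "head": "triangle", "tail": "zig-zag"},
    {"color": "Y", "wing": "spotted", "head": "circle", "tail": "cross lines"},
]
CHANGED = [
    {"color": "Y", "wing": "striped", "head": "triangle", "tail": "zig-zag"},
    {"color": "B", "wing": "spotted", "head": "circle", "tail": "cross lines"},
]


def object_nodes(display_1, display_2):
    """One (i, j) hypothesis per pair of objects drawn from the two displays."""
    return list(product(range(len(display_1)), range(len(display_2))))


def feature_nodes(display_1, display_2):
    """One (dimension, i, j) hypothesis per pair of same-dimension features."""
    dimensions = list(display_1[0])
    return [
        (dim, i, j)
        for dim in dimensions
        for i, j in object_nodes(display_1, display_2)
    ]


if __name__ == "__main__":
    # Four dimensions and two objects per display: 4 * 2 * 2 feature nodes.
    print(len(feature_nodes(START, CHANGED)))
    print(len(object_nodes(START, CHANGED)))
```

With two four-featured butterflies per display, this enumeration yields 16 feature-to-feature nodes and 4 object-to-object nodes.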

At a broad level, SIAM works by first creating correspondences between the features of displays. At first, SIAM has "no idea" what objects belong together. Once features begin to be placed into correspondence, SIAM begins to place objects into correspondence that are consistent with the feature correspondences. Once objects begin to be put in correspondence, activation is fed back down to the feature (mis)matches that are consistent with the object alignments. In this way, object correspondences influence activation of feature correspondences at the same time that feature correspondences influence the activation of object correspondences. This local-to-global processing principle is also found in Falkenhainer et al.'s (1989) and Holyoak et al.'s (1989) models.

As in ACME and McClelland and Rumelhart's original work, activation spreads in SIAM by two principles: 1) nodes that are consistent send excitatory activation to each other, and 2) nodes that are inconsistent inhibit each other. Nodes are inconsistent if they create two-to-one alignments -- if two elements from one display would be placed into correspondence with one element of the other display. Feature-to-feature nodes also excite and are excited by object-to-object nodes. For example, the node that places Object A in correspondence with Object C is excited by the node that places a feature of A into correspondence with a feature of C. Processing in SIAM starts with a description of the displays to be compared. Displays are described in terms of objects that contain feature slots that are filled with particular feature values. Processing consists of activation passing. On each time cycle, activation spreads between nodes. Network activity starts by features being placed in correspondence according to their perceptual similarity, as determined by match values. Subsequently, nodes send activation to each other for a specified number of time cycles, in a manner specified by Goldstone (1994). The network's pattern of activation determines both the perceived similarity of the displays and the alignment of the displays' features and objects. Nodes that have high activity will be weighted highly in the similarity assessment and their elements will tend to be placed in alignment.
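The two-to-one rule for inconsistency can be illustrated with a small check. This is a sketch under an assumed representation in which a correspondence is a pair (index of an element in one display, index of an element in the other display); the function name is hypothetical, and the actual activation update rule of Goldstone (1994) is not reproduced here.

```python
def inconsistent(node_a, node_b):
    """Return True if two correspondences jointly form a two-to-one alignment.

    Each node is a pair (element index in display 1, element index in
    display 2). Two nodes conflict when they share an element on exactly one
    side, so that one element would be mapped to two different partners.
    """
    (a1, a2), (b1, b2) = node_a, node_b
    return (a1 == b1) != (a2 == b2)


if __name__ == "__main__":
    # Element 0 of display 1 mapped to both element 0 and element 1: conflict.
    print(inconsistent((0, 0), (0, 1)))
    # A one-to-one pairing of both elements: no conflict.
    print(inconsistent((0, 0), (1, 1)))
```

Under the two spreading principles, pairs for which this check is true would exchange inhibition, and pairs for which it is false (and that involve distinct elements) would exchange excitation.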

At each time cycle, the similarity of the two displays is computed by

$$\text{similarity} = \frac{\sum_{i=1}^{n} A_i M_i}{\sum_{i=1}^{n} A_i},$$

a simplified version of the similarity formula used by Goldstone (1994), where n is the number of feature-to-feature nodes required to represent two displays (n = FO^2, where F is the number of features in an object and O is the number of objects in each display), A_i is the activation of node i (0 ≤ A_i ≤ 1), and M_i is the match value associated with node i (0 ≤ M_i ≤ 1). Node activations, but not match values, change with processing. Generally speaking, nodes that represent correspondences that are consistent with many other strong correspondences will have their activation increased. Thus, similarity will be influenced by the perceptually determined featural similarities between display elements, but it will also be influenced by the activation or attention given to these similarities.
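As a minimal sketch, and assuming that the simplified formula amounts to an activation-weighted average of match values (the sum of activation times match value, normalized by total activation), similarity could be computed as follows. The function name and the numbers in the example are hypothetical and chosen only for illustration; they are not fitted values or model output.

```python
def siam_similarity(activations, match_values):
    """Activation-weighted average of match values (assumed simplified form)."""
    if len(activations) != len(match_values):
        raise ValueError("one activation and one match value per node")
    total_activation = sum(activations)
    if total_activation == 0:
        raise ValueError("at least one node must be active")
    weighted = sum(a * m for a, m in zip(activations, match_values))
    return weighted / total_activation


if __name__ == "__main__":
    # Hypothetical four-node network: two matching and two mismatching nodes.
    match_values = [1.0, 1.0, 0.4, 0.4]
    focused = siam_similarity([0.8, 0.8, 0.2, 0.2], match_values)
    diffuse = siam_similarity([0.5, 0.5, 0.5, 0.5], match_values)
    # Shifting activation toward mismatching nodes lowers similarity even
    # though no match value has changed.
    print(focused > diffuse)
```

The example shows why activation matters in this kind of measure: with match values held fixed, moving activation from matching to mismatching nodes lowers the computed similarity.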

Predicted Nonmonotonicities in Similarity

The purpose of the reported experiments is to explore a qualitative prediction of SIAM. The three experiments to be reported test SIAM's prediction of a nonmonotonic relation between shared features and judged similarity. Alignment-based models of similarity can predict that adding shared features to two displays decreases similarity because these features may compete against other feature matches. If strong alignments are created between highly similar parts of two displays, then adding feature matches that compete against these strong alignments may decrease the importance of the features that are brought into correspondence by the best alignment. An example of such a feature match is the Y color match in Figure 1 between the starting display and Display "XY → YB." Gentner and Toupin (1986) call a situation with two alignments that are inconsistent a "crossmapping."

As the modeling sections describe, whether SIAM predicts nonmonotonicities when there are crossmappings depends on the particular values given to model parameters. Two influential parameters are match value and processing time. As described earlier, every feature-to-feature correspondence node has an activation value that represents correspondence strength and a (mis)match value that represents the perceptual similarity of features placed in correspondence. Experimentally, mismatch values can be manipulated by altering the physical similarity of features (Experiment 1). Processing time can be experimentally manipulated by allowing participants to process a similarity comparison for varying amounts of time (Experiments 2A and 2B).

Experiment 1

Essentially, SIAM predicts that a nonmonotonic relation between shared features in two displays and display similarity can arise if the shared features belong to poorly aligned objects. As described earlier, the Y feature match shared between the displays labeled "Starting display" and "XY → YB" does not belong to optimally aligned butterflies. A nonmonotonic relation between shared features and similarity arises if these two displays receive a lower similarity rating than do the starting display and the display labeled "XY → AB." Assuming proper stimulus construction and color randomization, such a result would indicate that replacing the A feature (not present in the starting display) with the Y feature (present in the starting display) decreases similarity.

______________________

Insert Figure 1 about here

______________________

In SIAM, replacing the A feature with the Y feature in the display that is compared to the starting display has two effects: one effect increases similarity and the other decreases similarity. The substitution will increase the physical match value, M_i, of the node that represents the correspondence between colors of butterflies with the Y feature. SIAM's similarity estimate is based on the featural similarity, denoted by M_i values, between the displays. However, adding the Y feature match will also alter A_i (node activation) values. In particular, the shared Y feature will tend to make SIAM increase the activation of nodes that align dissimilar butterflies. In turn, as these nodes become activated, they will decrease the activation of nodes that place optimally aligned features and objects in correspondence.

SIAM usually predicts monotonicity because the similarity gain caused by increasing M_i values is larger than the similarity loss resulting from increasing the influence of mismatching features and decreasing the influence of matching features. However, depending on model parameters, SIAM can produce nonmonotonicities. In particular, specific M_i values can produce nonmonotonicities, as will be described in the Discussion. M_i values denote the physical similarity of nonidentical features. In Experiment 1 the physical similarity between nonidentical features was explicitly manipulated in order to explore the influence of M_i on nonmonotonicity.

Method

Participants. Forty-four undergraduate students from Indiana University participated in the experiment in order to fulfill a course requirement.

Materials. Participants saw 200 trials on Macintosh SI screens. Each trial contained four butterflies - two on either side of a black vertical bar subtending the middle of the screen. A display consisted of two butterflies. Each butterfly was composed of four features: Wing shading (twenty-two different values including striped, spotted, checkerboard, black, brick, etc.), Head style (triangle, square, circle, or M-shaped), Tail style (radiating lines, zig-zag, cross lines, or line with ball), and body color. The display area was 17 cm. high by 21 cm. across. Each individual butterfly was approximately 6 cm. by 4 cm. Viewing distance was not controlled but was approximately 60 cm.

Colors for different butterflies within a display were chosen to be either very similar, somewhat similar, or dissimilar. All colors were constrained to have CIE (Commission Internationale de l'Eclairage) 1976 u' values between 0.2 and 0.6, and v' values between 0.1 and 0.7. A rough measure of color similarity can be obtained by measuring the Euclidean distance in CIE space between two colors. The Euclidean CIE distance between colors designated as highly similar ranged from 0.026 units to 0.045 units. The distance between moderately similar colors ranged from 0.058 to 0.121. The distance between dissimilar colors ranged from 0.152 to 0.794. These ranges can be converted to approximate dominant wavelength differences if a reference color with a CIE u' value of 0.2105 and a v' value of 0.4737 is selected. When this is done, the wavelength differences for similar, somewhat similar, and dissimilar colors average to 39 nm, 83 nm, and 174 nm respectively. The difference between typical red and orange hues is about 100 nm.
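The distance measure and the three similarity bands can be sketched as follows. The function names are hypothetical, the band boundaries are the ranges reported above, and distances that fall between the reported ranges are left unclassified because the text does not assign them to a level.

```python
import math


def uv_distance(color_1, color_2):
    """Euclidean distance between two colors given as (u', v') pairs."""
    return math.hypot(color_1[0] - color_2[0], color_1[1] - color_2[1])


def similarity_level(distance):
    """Map a CIE distance onto the reported similarity bands, if any."""
    if 0.026 <= distance <= 0.045:
        return "highly similar"
    if 0.058 <= distance <= 0.121:
        return "moderately similar"
    if 0.152 <= distance <= 0.794:
        return "dissimilar"
    return None  # falls outside or between the reported ranges


if __name__ == "__main__":
    reference = (0.2105, 0.4737)  # reference color named in the text
    for other in [(0.2405, 0.4737), (0.3000, 0.4737), (0.4105, 0.4737)]:
        d = uv_distance(reference, other)
        print(round(d, 4), similarity_level(d))
```

The three comparison colors in the example are arbitrary points chosen to land in each band; they are not stimuli from the experiment.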

Design. On each trial, one display composed of two butterflies was constructed (the "starting display") and the other display (the "changed display") was constructed by selectively changing color features of the starting display. The terms "starting display" and "changed display" refer only to how the displays are designed, and not to their order of presentation. The letters X, Y, A, and B denote different body colors. The two butterflies within a display were always given different values on the other three dimensions of wing shading, head style, and tail style.

Colors were altered in one of the six ways shown in Figure 1. The method labeled "XY → XY" simply duplicates the starting display to create the changed display. The method labeled "XY → YX" swaps the body colors of the starting display's butterflies in creating the changed display (as indicated by the reversed order of letters in the label). Thus, for this trial, the same colors are present in the two displays, but matching colors belong to poorly aligned butterflies. The optimal alignment for a butterfly is the alignment that is part of the consistent set of correspondences between two displays that maximizes the number of matching features between corresponding objects. "XY → YB" introduces one new color B and has one matching color Y in common with the starting display. The matching color belongs to dissimilar, poorly aligned butterflies. "XY → XB" also introduces one new color B, but now the matching X color occurs between well aligned, similar butterflies. The display "XY → XX" yields one color match between similar butterflies and one color match between dissimilar butterflies. Finally, the display "XY → AB" yields no color matches at all. The description for each of the changed displays is given in Table 1.
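The six color manipulations can be summarized by counting, for each changed display, how many color matches fall on optimally aligned butterflies and how many fall on poorly aligned ones. The sketch below assumes a hypothetical two-letter encoding in which position in the string identifies the butterfly, so that same-position butterflies are the optimally aligned ones; the function name is likewise an illustrative assumption.

```python
def color_matches(start, changed):
    """Count (aligned, unaligned) body-color matches for two-butterfly displays.

    Position in each two-letter string identifies the butterfly; butterflies
    in the same position share their non-color features and are therefore
    the optimally aligned pair.
    """
    aligned = sum(s == c for s, c in zip(start, changed))
    unaligned = sum(c == start[1 - i] for i, c in enumerate(changed))
    return aligned, unaligned


if __name__ == "__main__":
    for changed in ["XY", "YX", "YB", "XB", "XX", "AB"]:
        print("XY ->", changed, color_matches("XY", changed))
```

Under this encoding, the critical comparison is between "YB", which has a single match on poorly aligned butterflies, and "AB", which has no color match at all.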

______________________

Insert Table 1 about here

______________________

When a particular trial was shown, the following factors were randomized: the left/right position of the starting and changed displays; the color values represented by the letters A, B, X, and Y; the particular values on the three non-color dimensions for the two butterflies in the starting display; the similarity of colors A, B, X, and Y; and the changed display that was shown.

The physical location of the butterflies could very likely act as a cue for aligning the butterflies from one display onto another. To control for this, three different spatial layouts were used. In the "same positions" layout, butterflies that corresponded to each other (according to their feature overlap) were placed in the same relative locations in their respective displays. In the "opposite positions" layout, butterflies that did not correspond to each other were placed in the same relative locations. In the "unrelated positions" layout, neither butterfly of one display had the same relative location as either of the butterflies of the other display. Within a display, the two butterflies were always placed diagonal to each other.

Procedure. Each trial began with the simultaneous presentation of the starting display and the changed display. The participants' task was to rate the two displays' similarity on a scale from one to nine. A rating of one indicated very low similarity; a rating of nine indicated very high similarity. It was stressed to the participants that they should rate the similarity of the whole left display to the whole right display. After the participant's rating had been displayed for two seconds, the screen was erased and the next trial began.

Results

The average similarity ratings for the six displays compared to the starting display, at each of the three levels of similarity, are shown in Table 1. On average, similarity rating differences of more than 0.13 were significant at a level of p<.05 in planned comparisons between individual cell means. Overall, there was a strong influence both of trial type (6 levels corresponding to the 6 changed displays in Figure 1), F(5,215)=9.3, p<.05, MSe=0.06, and of color similarity, F(2,86)=7.4, p<.05, MSe=0.04, and also a significant interaction between these factors, F(10,430)=2.4, p<.05, MSe=0.07. Using Fisher's post-hoc probabilistic least significant difference (PLSD) adjustment, similarity ratings for all trial types except the following two pairs were significantly different at a level of p<.05: XY → XB and XY → XX, and XY → YB and XY → AB. Using the same adjustment, all three color similarity conditions were significantly different from each other.

Some individual cell mean comparisons showed a significant nonmonotonicity, using a Fisher's PLSD criterion of p<.05. The trial with XY → YB obtained a significantly lower similarity rating than the trial with XY → AB for the intermediate level of color similarity. The relation between these two trials was significant in the opposite direction for low and high color similarity. In addition, XY → XX received a significantly lower similarity rating than XY → XB for the intermediate level of color similarity. The difference between these trials was not significant for the other two levels of color similarity.

The average similarity ratings given for trials with butterflies in same, unrelated, and opposite positions were 6.65, 6.36, and 6.29 respectively. The overall difference between these ratings was significant, F(2,86)=9.2, p<.05, MSe=0.03, and each pair of means differed significantly. As such, when featurally corresponding butterflies are in corresponding positions, similarity is maximized. When poorly aligned butterflies are in corresponding positions, similarity is reduced relative to the control condition of unrelated positions. One reason for this latter result, consistent with an alignment-based approach, is that spatially determined correspondences may induce poorly aligned butterflies to be placed in correspondence, thereby interfering with the construction of optimal alignments.

The variability of similarity ratings for the six trials can potentially provide useful information in diagnosing the cause of the nonmonotonicity. Under one view, nonmonotonicities are due to unaligned feature matches competing against proper alignments, decreasing the weight given to the properly aligned feature matches in similarity. This is the account given by the SIAM model. Under another view, participants create one of two sets of correspondences between display parts, and feature matches only increase similarity when they occur between parts that are subjectively aligned. Assessed similarity is expected to be higher when optimal alignments are formed because these alignments maximize the number of feature matches that are incorporated into the judgment. Furthermore, it is assumed that participants are more likely to create sub-optimal alignments when there are feature matches between poorly aligned objects. Consequently, low similarity ratings for trials with unaligned feature matches may be due to a mixture of two different types of trials -- trials where participants make optimal alignments and produce high similarity ratings, and trials where participants make suboptimal alignments and produce low similarity ratings. This account is similar to the account given by the Structure Mapping Engine (Falkenhainer et al, 1989) for resolving situations with multiple possible alignments (Gentner & Toupin, 1986; Markman & Gentner, 1993a). This latter account could predict greater variability in similarity ratings for trials with unaligned feature matches. In fact, the average variabilities, in standard deviations, for trials with unaligned and aligned feature matches were 1.6 and 1.3 respectively, paired t(43)=1.2, p = 0.24. Consequently, although the trend is for trials with unaligned feature matches to be more variably rated than trials with only aligned feature matches, the current results do not provide strong support for this account of the nonmonotonicities.

Discussion

In all, three of the present results might be taken as evidence of violations of the monotonicity assumption. Two of these results have readily available alternative explanations that allow an assumption of monotonicity to be preserved. The third result is more problematic for monotonicity, but can be accommodated by an alignment-based approach.

One piece of prima facie evidence against monotonicity is that trials that involved XY → XX never received higher average similarity ratings than trials that involved XY → XB, and in one case, the former trials received significantly lower similarity ratings. In other words, adding an exact color match to two displays did not increase similarity. Importantly, the color match that was added occurred between butterflies that were not similar to each other. Moreover, the color match that was added involved a color that already had a match. The reason why this result does not count as conclusive evidence for a nonmonotonicity is that an emergent feature arises whenever a display has two identical color features. Participants may represent a display with identically colored butterflies such as XX as containing the feature "Same coloration." This new emergent feature is not present in either displays with Colors X and Y, or displays with Colors X and B, and it may consequently decrease similarity to the starting display.

The second possible evidence for a nonmonotonicity is the result that displays with unrelated positions receive higher similarity ratings than displays with opposite positions. This can be taken as evidence for a nonmonotonicity in that displays with opposite positions share global features such as "butterflies are in the lower left hand corner and upper right hand corner" that have been shown to increase similarity (Goldstone, 1994). This result is not indisputable evidence for a nonmonotonicity, because a monotonic relation between shared features and similarity can account for the result if conjunctive features such as "pink object on top" are hypothesized.

The strongest evidence for a nonmonotonicity comes from situations where the displays in "XY → AB" trials are judged to be more similar than the displays in "XY → YB" trials. The Y feature match decreases similarity when 1) mismatching features have an intermediate level of similarity to each other, and 2) the Y feature match belongs to poorly aligned butterflies. Given the experimental controls and method of randomizing features, this effect cannot be explained by Feature A being more similar to X than is Y. On average, these two features will be equally similar to X.

This nonmonotonicity also cannot be explained by refusing to count particular color matches as matching features. One might try to salvage the assumption of monotonicity by claiming that a shared feature does not count as a matching feature unless it belongs to properly aligned objects. The first problem with this strategy is that the significant nonmonotonicity is still not explained; there are times when a poorly aligned matching feature not only fails to increase similarity, but also significantly decreases similarity. The second problem is that poorly aligned matching features must be counted as matching features if the typical finding of monotonicity is to be explained. Usually, displays with poorly aligned feature matches are judged to be more similar than the same displays without these feature matches (Goldstone, 1994). It is only under certain feature similarity circumstances that this major trend is reversed.

Computational Modeling of Nonmonotonicity. SIAM predicts nonmonotonicities when an added feature match decreases attention paid to properly aligned features, and increases attention paid to poorly aligned and mismatching features. As it turns out, SIAM also generally predicts that an intermediate level of similarity between mismatching dimension values results in the greatest degree of nonmonotonicity.

In applying SIAM to the data, three parameters were allowed to vary, and the other parameters were given the default values described by Goldstone (1994). The number of cycles of activation passing was set to 15, and the influence of features on each other (the parameter "feature-to-feature-weight") was set to 0.05. These two values were selected because they can accommodate nonmonotonicities, and thus can be considered free parameters of the model. However, values of these parameters were not selected to maximize the fit between human data and SIAM, but only to permit predictions of nonmonotonicities. The parameter of critical importance for modeling was "color mismatch value," which is proportional to the similarity between differing colors. The mismatch value for the other dimensions was set to 0.45.

The predictions of SIAM for each of the six changed displays compared to the starting display are shown in Figure 2. The relative similarity of these six displays varies with the color mismatch value. SIAM qualitatively captures a number of the empirical results. Most generally, SIAM predicts that whether a matching color occurs between properly or poorly aligned butterflies has a strong influence on its effect on similarity. In comparing XY → XB with XY → YB, or XY → XY with XY → YX, it is clear that the displays with properly aligned color matches receive higher similarities, as was found in Experiment 1.

More importantly, SIAM predicts the nonmonotonicities that are found. XY and XX are predicted to be less similar than are XY and XB at times. In addition, XY and YB are predicted to be less similar than are XY and AB. Furthermore, the parameter values that yield these two nonmonotonicities are similar, and correspond to intermediate color similarity values. In Experiment 1, the level of color similarity that produced the significant nonmonotonicities was the same for these two display comparisons, and was also at an intermediate level of color similarity.

_________________________

Insert Figure 2 about here

_________________________

Nonmonotonicities are only predicted for intermediate color mismatch values, because these values produce substantial activation of poorly aligned features, and the mismatch values are low enough to significantly reduce similarity. Similarity in SIAM is then determined by summing the product of node activations by their (mis)match value. Nonmonotonicities are produced when poorly aligned mismatches are strongly activated and markedly dissimilar. These requirements are jointly maximized at the intermediate level of color similarity.
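The readout step described above can be illustrated with a minimal sketch. It covers only the final summation, not the interactive activation dynamics that set the activations. The function name, the toy activations, and the division by total activation (added here so that values stay on a common scale) are assumptions for illustration; only the idea of summing activation times (mis)match value, and the 0.45 mismatch value, come from the text.

```python
def siam_similarity(nodes):
    """Activation-weighted average of (mis)match values.

    nodes: list of (activation, match_value) pairs, one per
    feature-to-feature node. Normalizing by total activation is an
    assumption of this sketch, not something stated in the text.
    """
    total_activation = sum(activation for activation, _ in nodes)
    if total_activation == 0:
        return 0.0
    weighted = sum(activation * value for activation, value in nodes)
    return weighted / total_activation


# Hypothetical activations. Match value 1.0 marks a feature match;
# 0.45 is the mismatch value used for the non-color dimensions.
before = [(0.8, 1.0), (0.8, 1.0), (0.2, 0.45)]

# Adding one poorly aligned match (0.4, 1.0) is assumed to raise the
# activation of the mismatching nodes that share its object alignment
# and to lower the activation of the properly aligned matches.
after = [(0.7, 1.0), (0.7, 1.0), (0.4, 1.0),
         (0.5, 0.45), (0.5, 0.45), (0.5, 0.45)]

similarity_before = siam_similarity(before)
similarity_after = siam_similarity(after)
```

With these illustrative numbers the added match lowers the estimate, because the activation it draws toward poorly aligned mismatches outweighs its own contribution.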

Also, both SIAM and Experiment 1 show a significant difference between trials involving two and one unaligned matching color features (Displays XY → YX and XY → YB respectively). Thus, a pair of poorly aligned color feature matches increases similarity, even though an individual poorly aligned feature match decreases similarity. SIAM predicts this nonlinearity because two unaligned features that occur along the same dimension (body color) will support each other, whereas one unaligned feature match will receive no support from other nodes. Finally, SIAM generally predicts that increasing the color mismatch value will increase display similarity. The one large exception to this prediction, for high mismatch values of XY → XY, is because increasing the color similarity of mismatching features will inhibit the activation of nodes that place all identical features into correspondence. This problem can be corrected by decreasing the competition between inconsistent correspondences, but it is not clear that this can be done without altering predictions for nonmonotonicities. SIAM may need to be supplemented with a more holistic, non-alignment perceptual process that responds to the overall level of color similarities between the scenes.

Summary. SIAM can provide an account of the nonmonotonic influences of feature matches on similarity, if the feature matches occur between poorly aligned objects. In this regard, SIAM can accommodate results from Experiment 1 that invalidate assumptions made by several feature-based and geometric models of similarity. The most important result from Experiment 1 is the evidence for reliable nonmonotonicities that cannot be explained in terms of emergent features. Furthermore, both SIAM and Experiment 1 indicate a similar influence of color similarity on the occurrence of nonmonotonicities. Although both SIAM and the data indicate the largest degree of nonmonotonicity at an intermediate level of color similarity, this resemblance should be treated as only suggestive, given that color mismatch values were not scaled to physical color similarities. Still, it is a point in SIAM's favor that it predicts an influence of color similarity on the degree of nonmonotonicity. SIAM does, however, wrongly predict some relations between color similarity and similarity estimates.

Experiment 2A

Experiment 1 found evidence for nonmonotonicities in similarity that were modulated by the physical similarity of mismatching features such as color. SIAM also predicts that nonmonotonicities should be modulated by the amount of time permitted for similarity assessment. The purpose of Experiments 2A and 2B was to explore the influence of judgment time on the presence of nonmonotonicities.

Previous research has shown judgment time to influence similarity judgments. Goldstone and Medin (1994a, 1994b) manipulated judgment time by giving participants one of three different response time deadlines. Using stimuli similar to those used in Experiment 1, they manipulated the features shared by two displays. Matching features either occurred between optimally aligned objects or not. Similarity was assumed to increase as a function of participants' percentage of incorrect responses that two displays contained identical butterflies. When participants were required to respond quickly, perceived similarity was approximately equally influenced by properly and improperly aligned feature matches. When participants were given longer deadlines, similarity was much more influenced by properly aligned feature matches.

This interaction between judgment time and type of feature match is predicted by SIAM, assuming that different response time deadlines are modeled by varying the number of cycles of activation passing. Thus, the global consistency of alignments influences similarity more as more cycles are completed. In addition to predicting that proper alignments matter relatively more for similarity as processing continues, SIAM also predicts nonmonotonicities at particular times in processing. As with featural similarity (manipulated in Experiment 1), nonmonotonicities are maximized at an intermediate duration level. At an intermediate duration, SIAM selectively weights properly aligned features, and the improperly aligned features compete strongly against these proper alignments. As such, at an intermediate duration, feature matches between unaligned objects can have the net effect of decreasing similarity. At earlier durations, proper alignments have not been established. At later durations, proper alignments have successfully attained superiority over improper alignments.

Experiment 2A tested SIAM's prediction of nonmonotonicities in similarity ratings that depend on the amount of processing time allowed for a comparison. Processing time was manipulated by presenting trials for different durations. By using the level of featural similarity that was found to produce nonmonotonicities in Experiment 1, as opposed to the arbitrary levels used by Goldstone and Medin (1994a), the probability of obtaining nonmonotonicities was maximized.

Method

Participants. Forty-six undergraduate students from Indiana University participated in the experiment in order to fulfill a course requirement.

Materials. Participants saw 200 trials on Macintosh SI screens. The intermediate level of color similarity from Experiment 1 was used. Thus, the average wavelength difference for the butterflies' body colors was 83 nm. The other dimensions and dimension values were identical to those used in Experiment 1.

Design. Experiment 2A used the six trial types (see Figure 1) used in Experiment 1. As with Experiment 1, only the body color dimension was altered between the starting and changed display. Dimension values from the other three dimensions were not changed when constructing the changed display from the starting display.

When a particular trial was shown, the following factors were randomized: the left/right order of the starting and changed displays; the actual color values represented by the letters A, B, X, and Y; the particular values on the three non-color dimensions for the two butterflies in the starting display; the presentation time for the trial; and the changed display that was shown.

Procedure. Each trial began with the simultaneous presentation of the starting display and the changed display. The displays remained on the screen for a certain amount of time, after which point the screen was erased. Participants then rated the two displays' similarity on a scale from one to nine.

The display duration was selected at random from three possible times: 1.5, 3, and 5 seconds. Three seconds was chosen as a time because analysis of Experiment 1 revealed that this was the average response time.

Results

The average similarity ratings for the six trial types, at each of the three display durations, are shown in Table 2. On average, similarity rating differences of more than 0.15 are significant at a level of p<.05 in planned comparisons between individual cell means. Overall, there was a strong influence of trial type, F(5, 225) = 8.5, p<.05, MSe=0.08, a moderately strong influence of duration, F(2,90)=4.8, p<.05, MSe=0.05, and also a significant interaction between these factors, F(10, 450)=3.3, p<.05, MSe=0.06. Using Fisher's post-hoc probabilistic least significant difference (PLSD) adjustment, all trial types except XY → XB and XY → XX were significantly different at a level of p<.05.

______________________

Insert Table 2 about here

______________________

One individual cell mean comparison showed a significant nonmonotonicity, using a Fisher's PLSD criterion of p<.05. The trial with XY → XX obtained a significantly lower similarity rating than XY → XB for the moderate display duration. The difference between these two trials was in the opposite direction for both short and long display durations (significantly so for short durations). Trials with XY → YB received a lower similarity rating than trials with XY → AB for moderate display durations, but the difference was not significant, t(43) = 1.5, p = .14. Again, the relation between these displays was in the opposite direction for the other durations, and was significantly so for the short duration displays.

The similarity estimates for short, moderate, and long display durations were 6.3, 6.32, and 6.40, respectively. Using Fisher's post-hoc probabilistic least significant difference (PLSD) adjustment, similarity for the long display duration was significantly higher than for the other two durations. Thus, there appears to be a trend for similarity ratings to increase as presentation time increases. The increase in similarity was largest for the trial with XY → XY, in which the starting and changed displays contain identical butterflies.

The general pattern of results from Goldstone and Medin (1994a) was replicated. They found that aligned matching features, relative to unaligned matching features, were more influential as judgment time increased. The six trial types of Experiment 2A can be divided into two groups as a function of whether the trial contains any unaligned feature matches. Trials with XY → YX, XY → YB, and XY → XX contain at least one unaligned feature match, and the other three trials do not. Similarity ratings for trials with unaligned feature matches on short, moderate, and long display durations were 6.14, 6.00, and 6.10, respectively. For trials with only aligned feature matches, these same ratings were 6.45, 6.63, and 6.69. These numbers show a strong interaction between display duration and feature alignment, F(2, 90)=5.7, p < .05, MSe=0.07. Trials with aligned feature matches showed a much greater similarity increase with increasing duration than did trials with unaligned matches. The average variabilities, in standard deviations, for trials with unaligned and aligned feature matches were 1.8 and 1.6 respectively, paired t(45)=0.97, p = 0.34.
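The interaction can be made concrete with a short piece of arithmetic on the six group means reported above. The variable names are introduced here only for illustration; the ratings themselves are those given in the text.

```python
# Mean similarity ratings reported for Experiment 2A, by display duration.
durations = ("short", "moderate", "long")
unaligned = {"short": 6.14, "moderate": 6.00, "long": 6.10}
aligned = {"short": 6.45, "moderate": 6.63, "long": 6.69}

# Advantage of aligned-only trials over trials with unaligned matches.
advantage = {d: round(aligned[d] - unaligned[d], 2) for d in durations}

# Change in rating from the short to the long duration within each group.
change_aligned = round(aligned["long"] - aligned["short"], 2)
change_unaligned = round(unaligned["long"] - unaligned["short"], 2)
```

The aligned advantage is 0.31 at the short duration and roughly doubles at the two longer durations (0.63 and 0.59). Ratings for aligned-only trials rise by 0.24 from short to long, whereas ratings for trials with unaligned matches change by -0.04.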

Discussion

Experiment 2A provided evidence for nonmonotonicities and for an interaction between nonmonotonicities and display duration. The one significant nonmonotonicity was found when comparing XY → XX and XY → XB for intermediate durations. A nonsignificant trend also existed for XY → AB to be judged as more similar than XY → YB at intermediate durations. Even for this nonsignificant nonmonotonicity, there was a significant interaction between duration and the type of trial (XY → AB vs. XY → YB).

Computational Modeling. The most straightforward way to model display duration differences in SIAM is by allowing SIAM to complete varying numbers of cycles before a similarity assessment is given.

The predictions of SIAM for each of the six trial types are shown in Figure 3. The relative similarity of these six trials varies with the number of cycles SIAM completes. SIAM qualitatively captures some, but not all, of the empirical results. Generally, SIAM predicts the correct ordering of the six trials, with XY → XY as the most similar trial, followed by XY → XB and XY → XX, followed by XY → YX, and finally followed by XY → YB and XY → AB. SIAM also predicts that trials with only aligned feature matches become more similar with display duration, relative to those with unaligned feature matches.

_________________________

Insert Figure 3 about here

_________________________

Most importantly for the present purposes, SIAM predicts both of the types of nonmonotonicity that are found. At times, the trial involving XY → XX receives lower similarity ratings than the trial with XY → XB, and XY → YB is predicted to receive lower similarity ratings than XY → AB. Furthermore, the parameter values that yield these two nonmonotonicities are similar, and correspond to intermediate display durations. In Experiment 2A, the duration that produced the nonmonotonicities was the same for these two display comparisons, and was also at an intermediate level.

In order to assess SIAM's fit to the actual results, it is useful to associate the three durations with particular cycles. A sample assignment that produces good fits is: short duration (1.5 sec) = 8 cycles, intermediate duration (3 sec) = 15 cycles, and long duration (5 sec) = 25 cycles. When these assignments are used, SIAM agrees with the rank orders of the six trials within each of three durations on 17 out of 18 data points.
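One way to compute such a rank-order agreement score is sketched below. The counting rule (a data point agrees when a trial type holds the same rank position in the model and in the data), the tie-breaking rule, and the function names are assumptions; the text does not specify how the 17 of 18 figure was tallied. The toy scores are placeholders chosen to exercise the code and are not values from Table 2 or Figure 3.

```python
# Cycle assignment reported in the text (duration in seconds -> cycles).
CYCLES_FOR_DURATION = {1.5: 8, 3.0: 15, 5.0: 25}


def ranks(scores):
    """Map each trial type to its rank, 1 = most similar.

    Ties are broken alphabetically by trial name (an assumption).
    """
    ordered = sorted(scores, key=lambda trial: (-scores[trial], trial))
    return {trial: position for position, trial in enumerate(ordered, start=1)}


def rank_agreement(model_scores, observed_scores):
    """Count trial types whose rank is the same in model and data."""
    model_ranks = ranks(model_scores)
    observed_ranks = ranks(observed_scores)
    return sum(model_ranks[t] == observed_ranks[t] for t in observed_scores)


# Placeholder values for a single duration, NOT the reported results.
toy_model = {"XY->XY": 0.9, "XY->XB": 0.8, "XY->XX": 0.7}
toy_observed = {"XY->XY": 7.0, "XY->XB": 6.4, "XY->XX": 6.5}
toy_matches = rank_agreement(toy_model, toy_observed)  # 1 of 3 agree
```

Applying rank_agreement to the six trial types at each of the three durations and summing the counts would give a score out of 18.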

This leads to the question "Why does SIAM predict nonmonotonicities at intermediate display durations?" SIAM predicts nonmonotonicities when feature matches are added between displays in a manner that interferes with the processing of other feature matches. Feature matches that occur between poorly aligned objects drag attention away from properly aligned feature matches because the two types of match are inconsistent. If few cycles are executed, similarity is determined mostly by the sheer number of matching features, aligned or unaligned. In this case, poorly aligned feature matches will increase similarity almost as much as properly aligned matches. If many cycles (25 or more) are executed, then similarity will be mostly determined by aligned matches, and these matches will have been placed in strong correspondence. In this case, adding poorly aligned matches does not detract from similarity because the proper alignments are firmly established. Thus, poorly aligned feature matches decrease similarity only at an intermediate number of cycles because in this situation SIAM is strongly influenced by the alignment of features, but the proper alignments are not fully established and so can be weakened by conflicting alignments.

SIAM's greatest difficulty comes from modeling the relation between display duration (number of cycles) and similarity. In SIAM, as processing continues, similarity will become increasingly based on properly aligned rather than unaligned objects, and because aligned objects tend to be similar, similarity will increase with processing. Empirically, similarity ratings in Experiment 2A did increase with display duration, and previous work (Medin, Goldstone, & Gentner, 1993) has suggested that participants more often revise their original similarity ratings by increasing them rather than by decreasing them.

However, SIAM predicts far too strong a relation between display duration and similarity. This problem led Goldstone and Medin (1994a) to augment SIAM with a boundary-crossing model for same/different judgments. The basic premise of this model is that a "same" judgment is given when SIAM's similarity estimate exceeds an upper bound, and a "different" judgment is given when SIAM's similarity estimate falls below a lower bound. The upper and lower boundaries change with processing (see also Busemeyer & Rapoport, 1988), such that increasingly stringent criteria are required for "same" judgments and increasingly lax criteria are required for "different" judgments. Such an approach could be used to correct the overinfluence of duration on similarity.
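The boundary-crossing idea can be sketched as follows. This is an illustration of the premise stated above, not the implementation of Goldstone and Medin (1994a): the linear form of the bounds, their starting values, the slope, and the function name are all assumptions. Both bounds rise with processing, so that "same" responses require ever higher similarity while "different" responses become easier to trigger.

```python
def boundary_crossing(similarity_by_cycle, upper_start=0.8,
                      lower_start=0.3, slope=0.01):
    """Return (response, cycle) for the first bound crossed.

    similarity_by_cycle: similarity estimates, one per cycle.
    The linearly rising bounds and default values are hypothetical.
    Returns (None, number_of_cycles) if no bound is crossed.
    """
    for cycle, similarity in enumerate(similarity_by_cycle, start=1):
        upper = upper_start + slope * cycle  # stricter "same" criterion
        lower = lower_start + slope * cycle  # laxer "different" criterion
        if similarity > upper:
            return "same", cycle
        if similarity < lower:
            return "different", cycle
    return None, len(similarity_by_cycle)


# Hypothetical similarity trajectories.
rising = boundary_crossing([0.5, 0.6, 0.9])    # crosses the upper bound
falling = boundary_crossing([0.5, 0.4, 0.3])   # crosses the lower bound
```

Because the bounds move upward as cycles accumulate, a general rise in the similarity estimate over time has less effect on the response than it would under fixed criteria.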

Given SIAM's highly nonlinear similarity estimates and the difficulty in finding analytic solutions for SIAM's behavior, it is useful to plot SIAM's predicted nonmonotonicities as a function of varying parameter values. Figures 2 and 3 show that two variables, color mismatch value and cycles, moderate the extent of nonmonotonicities that are found. Figure 4 shows SIAM's predictions for nonmonotonicities across the full parameter space with variations to these variables. The vertical axis of Figure 4 shows SIAM's estimate for trials with XY → YB subtracted from SIAM's estimate for trials with XY → AB. As such, if the value is positive, then SIAM predicts a nonmonotonicity (the type of nonmonotonicity that cannot be explained in terms of emergent features). Figure 4 shows that SIAM generally predicts monotonicity, and monotonic relations are generally stronger than the occasional nonmonotonicities that are found. Also, SIAM does not ever predict nonmonotonicities for low values of "cycles"; when few strong alignments have been created, additional feature matches always increase similarity. Finally, maximum nonmonotonicities are predicted for intermediate values along both parameters, as supported by Experiments 1 and 2A.

_________________________

Insert Figure 4 about here

_________________________

Experiment 2B

The purpose of Experiment 2B was to replicate the observed relation between display duration and nonmonotonicities. Although an intermediate display duration tended to produce nonmonotonicities in Experiment 2A, only one of the two tests of nonmonotonicity was significant. Experiment 2B used simpler materials than those used in Experiment 2A, and tested a wider range of display durations.

Method

Participants. Forty-eight undergraduate students from Indiana University participated in the experiment in order to fulfill a course requirement.

Materials. Participants saw 300 trials on Macintosh SI screens. Each trial consisted of two side-by-side displays, and each display was composed of two bezier curves. Each bezier curve was defined by 9 control points. Each control point was given a random horizontal and vertical location within the range 0-5 cm. The bezier curves were constrained to smoothly pass through each of the control points. One curve in a display was generated by assigning random values to the control points. The second curve was a distortion of the original curve, generated by displacing each control point by 0.62 cm. in a random direction. A sample comparison of two displays is shown in Figure 5.
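The generation of control points for a curve and its distortion can be sketched as follows. The function names and the use of Python's random module are assumptions. The text does not say whether displaced points were kept inside the 0-5 cm range, so no clipping is applied here, and the step of fitting a smooth curve through the control points is omitted.

```python
import math
import random


def random_control_points(n=9, extent_cm=5.0, rng=random):
    """Draw n control points with x and y uniform on [0, extent_cm]."""
    return [(rng.uniform(0, extent_cm), rng.uniform(0, extent_cm))
            for _ in range(n)]


def distort(points, step_cm=0.62, rng=random):
    """Displace each control point by step_cm in a random direction."""
    displaced = []
    for x, y in points:
        theta = rng.uniform(0, 2 * math.pi)  # direction of displacement
        displaced.append((x + step_cm * math.cos(theta),
                          y + step_cm * math.sin(theta)))
    return displaced


# Seeded generator so that the example is reproducible.
rng = random.Random(0)
first_curve = random_control_points(rng=rng)
second_curve = distort(first_curve, rng=rng)
```

Every point in second_curve lies exactly 0.62 cm from its counterpart in first_curve, so the two curves of a display differ by a fixed amount of distortion at each control point while the direction of distortion varies.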

_________________________

Insert Figure 5 about here

_________________________

Each of the two displays within a trial contained the same two curves. The curves were positioned using the "unrelated" spatial layout of Experiment 1. As with Experiments 1 and 2A, the unaligned matching features were produced by varying the hues of the objects within a display. The intermediate level of color similarity from Experiment 1 was used. Thus, the average wavelength difference for the bezier curves' hues was 83 nm.

Design. As before, each trial contained a starting and a changed display, and the changed display altered the hues of the starting display in one of six manners (see Figure 1). The randomizations of Experiment 2A were used.

Procedure. The similarity rating procedure of Experiment 2A was used with the single exception of a wider range of display durations. The display duration for a trial was selected at random from five possible times: 1, 2, 3, 4, and 5 seconds.

Results and Discussion

The average similarity ratings for the six trial types, at each of the five display durations, are shown in Figure 6. Overall, these ratings showed a strong influence of trial type, F(5, 235) = 12.5, p<.05, MSe = 0.10, and a significant interaction between trial type and duration, F(10, 470) = 2.4, p<.05, MSe = 0.04. Using Fisher's post-hoc probabilistic least significant difference (PLSD) adjustment, all trial types except XY → XB and XY → XX, and XY → AB and XY → YB were significantly different at a level of p<.05.

_________________________

Insert Figure 6 about here

_________________________

Two individual cell mean comparisons showed a significant nonmonotonicity, using a Fisher's PLSD criterion of p<.05. The display with XY → XX obtained a significantly lower similarity rating than XY → XB for the 2-second duration. At this same duration, the display with XY → YB obtained a significantly lower similarity rating than XY → AB. The difference (in the opposite direction) between XY → XB and XY → YB was also significant at the 5-second duration. Thus, nonmonotonicities were found at one duration, and these nonmonotonicities were eliminated, and even reversed in one case, for other durations. The duration that produced nonmonotonicities, apparently two seconds or slightly below, was shorter than the nonmonotonicity-producing duration in Experiment 2A. One reason for this difference may be the simpler nature of the displays in Experiment 2B.

The similarity ratings for 1, 2, 3, 4, and 5 second display durations were 6.0, 6.0, 6.1, 6.1, and 6.1, F(4, 188)=1.2, MSe = 0.04, p > 0.1. The previous trend for similarity ratings to increase with duration was only found for the XY → XY display, F(4, 188) = 2.7, MSe = 0.04, p <.05. The average variabilities, in standard deviations, for trials with unaligned and aligned feature matches were 1.7 and 1.4 respectively, paired t(47)=1.66, p = 0.11.

The nonmonotonicities that were found in Experiment 2B were similar to those found in Experiment 2A. In both cases, an intermediate duration produced the greatest amount of nonmonotonicity, the two separate cases of nonmonotonicity were maximized at the same duration, and no nonmonotonicities due to unaligned color features were found when the unaligned features did not compete against properly aligned color features. These three aspects were also present in SIAM's simulation of the influence of duration. As before, the largest discrepancy between SIAM's predictions (shown in Figure 3) and Experiment 2B's results was that SIAM predicted a large influence of duration on increasing similarity ratings, although empirically this trend was only found for XY → XY.

General Discussion

The three experiments explored predictions of an alignment-based approach to similarity. The ability of an alignment-based model to account for the (occasionally) nonmonotonic relation between feature matches and similarity is noteworthy because few other models can accommodate nonmonotonicities. Feature-based models (Tversky, 1977) and geometric models (Torgerson, 1965) of similarity explicitly or implicitly assume monotonicity, and thus would have difficulties accounting for subsets of the results of Experiments 1 and 2. Although some of the putative nonmonotonicities could be explained in terms of emergent features, and thus are consistent with feature-based and geometric models of similarity, at least one nonmonotonicity is difficult to explain in these terms. In this nonmonotonicity, adding a single matching color feature to two displays significantly decreased similarity when A) the matching feature was added to poorly aligned objects, B) mismatching colors had intermediate levels of physical similarity, and C) intermediate display durations were used.

Alignment-based models as a class are able to predict nonmonotonicities because they treat compared entities as hierarchically and relationally structured. An entity is structured if it is composed of parts that themselves have hierarchical organization, or parts that have identifiable relations to each other (relational organization). Alignment-based models compute similarity by first placing the parts of two entities into alignment with each other. Because the entities' representations are structured rather than simply being a flat list of features or dimensions, alignments can be consistent or inconsistent with each other. Given cooperation between consistent alignments and competition between inconsistent alignments, nonmonotonicities can be accommodated if the matching feature that is added to two scenes produces an alignment that competes against other strong alignments. For example, adding the same color to the figures in two displays may decrease the displays' similarity if the color is added to figures that are not well aligned on the basis of other features.


The SIAM Model of Similarity

A specific alignment-based model of similarity, SIAM, was developed. As was empirically found, SIAM predicts that matching color features should only be able to decrease similarity when they occur between poorly aligned objects. Only in this situation will the matches interfere with the processing of the optimally aligned matching features. SIAM also predicts influences of mismatching feature similarity and display duration on whether nonmonotonicities occur.

SIAM correctly predicts nonmonotonicities at intermediate durations and color similarities, as was suggested by the experiments. It is more than a coincidence that for both variables, an intermediate level produces the greatest degree of nonmonotonicity. Nonmonotonicities in SIAM are generated when poorly aligned matches pose significant competition to the preferred matches. When few cycles are processed or feature mismatch values are high, then the preferred matches have only slightly more influence than the less preferred matches. When many cycles are processed, or feature mismatch values are very low, then the poorly aligned matches do not strongly compete against the strong optimal alignments. Only at intermediate levels of these variables are both criteria (strong competition and strong preference for aligned matches) met.

Alternative explanations of the observed nonmonotonicity were found to be inadequate. The counterbalancing and randomization techniques eliminated explanations in terms of individual color similarities. Claiming that matching features are ignored when they occur between poorly aligned objects explained neither the significant decrease in similarity that their presence occasionally produced nor the more general result that poorly aligned matching features usually increase similarity. Emergent feature explanations also could not account for the critical nonmonotonicity between displays with one versus zero poorly aligned feature matches.

Another model that is not supported by the experiments is a conjunctive features model of similarity. According to a conjunctive features model, objects are represented in terms of conjunctions of simple features, as well as simple features themselves. Although the idea that objects may be represented by conjunctions of features has received support (Gluck & Bower, 1988; Hayes-Roth & Hayes-Roth, 1977), decreasing featural overlap between two displays does not increase the number of conjunctive features possessed by the displays, given the particular materials tested. Thus, the model cannot account for the nonmonotonicities observed here.

The modeling success of SIAM should not be overstated. SIAM does not require nonmonotonicities to occur in the circumstances when they are observed to occur. Several parameters influence whether SIAM predicts nonmonotonicities at all. The more modest claim has been that SIAM can accommodate the observed nonmonotonicities whereas several other models cannot, and can account for the moderating influence of particular experimental manipulations.

Empirical Extensions

Generalizations of the empirically observed nonmonotonicities may be limited for reasons of task or stimuli. Although nonmonotonicities were observed for the similarity rating task, they may not be observed for other measures of similarity. Although nonmonotonicities were found for the butterfly and bezier curve stimuli, they may not be found for other materials. These two concerns about the generalizability of the results will be considered separately.

There is reason to be optimistic about the possibility of obtaining nonmonotonicities with other stimulus sets, given the wide sphere of application for alignment-based models. Alignment-based models of similarity are applicable in situations where entities to be compared are hierarchically or relationally structured. In cases where compared entities are not structured, alignment-based models are not likely to offer advantages over feature-based or geometric models of similarity. Many entities, including landscapes, stories, faces, animals, and objects, are aptly described by hierarchical and/or relational descriptions. Alignment-based models have been applied to data sets obtained by several other researchers (Goldstone & Medin, 1994) including Palmer's (1978) stick figures, Corter's (1987) geometric shapes, and Proctor and Healy's (1985) letter series. However, future work is required to discover whether nonmonotonicities are observed for these materials. Palmer (1978) presents results that could be interpreted as nonmonotonicities (adding common line segments to two stick figures decreased their similarity), but are also interpretable as a monotonic relation between similarity and emergent features.

For purposes of experimental control, rather artificial entities were used in Experiments 1 and 2. In particular, entities consisted of complicated displays of multiple, similar objects. The objects only had loose relations to each other, and the parts of objects were perceptually distinct. Experiments have extended the alignment-based perspective to entities consisting of single objects such as stylized birds (Goldstone, 1994; Goldstone & Medin, 1994a). More impressively, Lynch and Medin (personal communication, May 1994) have obtained evidence suggesting that alignment may play a role in judging the similarity of objects as simple as single triangles. Their experiments provide suggestive evidence that the similarity of triangles is influenced by the similarity of corresponding lines in the triangles, and is also influenced by the similarity of non-corresponding lines. Factors such as proximity and orientation that influence the ease of establishing correct correspondences between line segments also influence the relative importance of aligned and unaligned line segment similarities. In addition, just as alignment plays a role in quite simple objects, it also seems to play a role in richer displays than those used in the current experiments. Alignment-based approaches to similarity have been extended to domains involving abstract and concrete words (Markman & Gentner, 1993b), pictures displaying causal scenes (Markman & Gentner, 1993a), famous people (Spellman & Holyoak, 1992), and stories (Gentner, Ratterman, & Forbus, 1993). Given the prominent role of alignment in these domains, there are grounds for believing that the currently obtained nonmonotonicities will generalize to coherent perceptual stimuli and to conceptually-based stimuli.

As for generalizing findings of nonmonotonicities to other tasks that measure similarity, the prognosis is somewhat mixed. Fortunately, Experiment 2 does provide suggestive evidence that the observed nonmonotonicities are not simply due to task demands on participants. Nonmonotonicities were found for intermediate but not for short or long durations. Explicit task demands would probably be expected to exert an influence for long as well as intermediate durations, and perhaps even a greater influence. However, there is reason for skepticism that nonmonotonicities would be found for a same/different task in which similarity is measured by the percentage of incorrect "same" judgments on displays with different objects. This skepticism comes from idiosyncratic aspects of the same/different task. In particular, participants can respond "different" as soon as a single feature is found in one display that does not have a match in the other display. As such, displays with no matching color features are likely to be quickly and accurately called "different." This aspect of the same/different task makes nonmonotonicities unlikely to occur, but also makes the task a problematic measure of similarity (Goldstone & Medin, 1994a); in many cases, impressions of similarity are not eliminated simply because a single mismatching feature is apparent. Other implicit measures of similarity, including similarity-based inferences and memory retrieval, would probably be more reasonable similarity measures for providing evidence for the generalizability of nonmonotonicities.

Models for Nonmonotonicities

Just as alignment is important for concrete and simple objects (see also Gentner & Markman, 1995; Markman & Gentner, 1993a), it is also important for the comparison of more abstract entities. Researchers in analogical reasoning (Gentner, 1983, 1989; Gentner & Toupin, 1986; Holyoak & Koh, 1987; Holyoak & Thagard, 1989; Hummel et al., 1994; Ross, 1987, 1989; Spellman & Holyoak, 1992; Wharton et al., 1994) have shown that similarities between stories, proverbs, and algebraic problems all depend on establishing correspondences between the parts of these entities. The current model and experiments were inspired by this previous corpus of research on alignment-based comparisons in analogical reasoning.

As such, it might be expected that models of analogical reasoning would be able to account for the observed nonmonotonicities. Two of the most influential current models of analogical reasoning are the Structure Mapping Engine (SME; Falkenhainer et al., 1989) and the Analogical Constraint Mapping Engine (ACME; Holyoak & Thagard, 1989). The difficulty in applying either of these models to Experiments 1 and 2 is in obtaining evaluations of similarity. When given two scenarios, SME produces a list of connected mappings between the parts of the scenarios. SME can be run in a "literal similarity mode" which is appropriate for modeling similarity ratings, and in this mode both superficial attributes and abstract relations will influence similarity (Forbus & Gentner, 1989; Gentner et al., 1993). Whether SME predicts nonmonotonicities, monotonicities, or no effect due to alternative mappings depends on how SME integrates evidence across the mappings. If only the largest structurally consistent mapping is considered, then poorly aligned matching features will have no influence on similarity. If inconsistent mappings detract from each other, then these same matches would decrease similarity. If all mappings are considered, then these matches would increase similarity. One potential source of evidence from the experiments that bears on whether multiple inconsistent mappings are integrated on a single trial was inconclusive. It was argued (see the Discussion to Experiment 1) that if only one consistent alignment was considered on a given trial, then trials involving inconsistent alignments might receive more variable similarity ratings than trials with completely consistent alignments. Although the results from the experiments tended in this direction, they fell short of significance. Further work with SME is necessary to reveal whether SME can account for the effects of duration and featural similarity on nonmonotonicities, and whether it can account for two unaligned matches increasing similarity even though one unaligned match decreases similarity.
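The three ways of integrating evidence across mappings can be made concrete with a small sketch. The code below is not SME and does not use any of its machinery; it is a toy illustration in which the names (FeatureMatch, integrate), the match strengths, and the inhibition weight of 0.5 are all hypothetical choices made for exposition. It only shows how the sign of the effect of one unaligned match depends on the integration rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMatch:
    strength: float    # how good the feature match is (hypothetical units)
    consistent: bool   # True if the match belongs to the dominant mapping


def integrate(matches, rule, inhibition=0.5):
    """Combine feature matches into one similarity score under a given rule."""
    aligned = sum(m.strength for m in matches if m.consistent)
    unaligned = sum(m.strength for m in matches if not m.consistent)
    if rule == "best_only":    # only the dominant consistent mapping counts
        return aligned
    if rule == "inhibit":      # inconsistent matches detract from the mapping
        return aligned - inhibition * unaligned
    if rule == "all":          # every match counts, aligned or not
        return aligned + unaligned
    raise ValueError(f"unknown rule: {rule}")


RULES = ("best_only", "inhibit", "all")

# Assumed baseline: three non-color features match in alignment.
baseline = [FeatureMatch(1.0, True)] * 3
# Same displays plus one poorly aligned color match (as in XY→YB).
with_unaligned = baseline + [FeatureMatch(1.0, False)]

effects = {
    rule: integrate(with_unaligned, rule) - integrate(baseline, rule)
    for rule in RULES
}
for rule, delta in effects.items():
    print(f"{rule:>9}: change in similarity = {delta:+.2f}")
```

Under these assumptions the unaligned match has no effect, a negative effect, or a positive effect for the three rules respectively, which mirrors the three possibilities described above. None of these fixed rules by itself yields the observed pattern in which one unaligned match lowers similarity but two raise it.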

Similar uncertainties apply to ACME. Predating SIAM, ACME operates in a similar way, by passing activation between nodes that represent correspondences between displays. ACME computes a global measure of "harmony" between the emerging mappings. However, this measure is probably not a good indicator of similarity. In particular, adding unaligned matches to two displays will always lower the harmony within the system. The harmony measure can therefore account for nonmonotonicities, but only at the cost of not predicting the more typical monotonic influence of unaligned matches. Thus, it is difficult to apply ACME to Experiments 1 and 2 until an appropriate measure of similarity within ACME is suggested.
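The reason an unaligned match lowers a harmony-like measure can be illustrated with a minimal sketch. The code assumes a generic goodness measure of the form sum of w_ij * a_i * a_j over connected units, with fixed activations rather than a settled network; the unit names, weights, and activation values are hypothetical and are not taken from ACME's published parameters.

```python
def harmony(activations, weights):
    """Sum of w_ij * a_i * a_j over each connected pair of units, counted once."""
    return sum(
        w * activations[i] * activations[j] for (i, j), w in weights.items()
    )


# Two mutually consistent correspondence hypotheses support each other.
activations = {"X->X'": 1.0, "Y->Y'": 1.0}
weights = {("X->X'", "Y->Y'"): 0.1}           # excitatory link (assumed value)
h_before = harmony(activations, weights)

# An unaligned match adds a hypothesis that violates one-to-one mapping:
# it shares a source with X->X' and a target with Y->Y', so it is linked
# to both by inhibitory connections (assumed value -0.2).
activations_plus = {**activations, "X->Y'": 0.5}
weights_plus = {
    **weights,
    ("X->X'", "X->Y'"): -0.2,
    ("Y->Y'", "X->Y'"): -0.2,
}
h_after = harmony(activations_plus, weights_plus)

print(f"harmony without unaligned match: {h_before:+.2f}")
print(f"harmony with unaligned match:    {h_after:+.2f}")
```

In this toy case every term contributed by the new unit is a product of a negative weight and two positive activations, so the total can only go down. That is the sense in which such a measure captures nonmonotonicities but cannot capture cases where unaligned matches raise similarity.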

Conclusion

Given the connection between SIAM and other alignment-based approaches to comparison, the current experiments should be taken as not only specifically supporting SIAM, but also as more generally supporting an alignment-based perspective on similarity judgments. Central to this perspective is the premise that when structured entities are compared, correspondences must be established between the entities, and these correspondences influence each other. Consistent correspondences support each other and inconsistent correspondences inhibit each other. The act of comparing displays seems to naturally involve aligning the displays' parts, and this process seems to be well described by an interactive activation process between feature and object correspondences.

References

Attneave, F. (1950). Dimensions of similarity. American Journal of Psychology, 63, 516-556.

Busemeyer, J. R., & Rapoport, A. (1988). Psychological models of deferred decision making. Perception & Psychophysics, 32, 91-134.

Carroll, J. D., & Wish, M. (1974). Models and methods for three-way multidimensional scaling. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.) Contemporary developments in mathematical psychology (Vol. 2, pp. 57-105). San Francisco: Freeman.

Clement, C., & Gentner, D. (1991). Systematicity as a selection constraint in analogical mapping. Cognitive Science, 15, 89-132.

Corter, J. E. (1987). Similarity, confusability, and the density hypothesis. Journal of Experimental Psychology: General, 116, 238-249.

Dawson, M. R. (1991). The how and why of what went where in apparent motion: Modeling solutions to the motion correspondence problem. Psychological Review, 98, 569-603.

Falkenhainer, B., Forbus, K.D., & Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41, 1-63.

Forbus, K. D., & Gentner, D. (1989). Structural evaluation of analogies: What counts? Proceedings of the Eleventh Annual Conference of the Cognitive Science Society (pp. 341-348), Ann Arbor, MI. Hillsdale, NJ: Erlbaum.

Gati, I., & Tversky, A. (1982). Representations of qualitative and quantitative dimensions. Journal of Experimental Psychology: Human Perception and Performance, 8, 325-340.

Gati, I., & Tversky, A. (1984). Weighting common and distinctive features in perceptual and conceptual judgments. Cognitive Psychology, 16, 341-370.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155-170.

Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity, analogy, and thought. New York: Cambridge University Press.

Gentner, D., & Markman, A. B. (1994). Structural alignment in comparison: No difference without similarity. Psychological Science, 5, 152-158.

Gentner, D., & Markman, A. B. (1995). Similarity is like analogy. In C. Cacciari (Ed.), Similarity in Language, Thought, and Perception. (pp. 111-148). Brussels: BREPOL.

Gentner, D., & Ratterman, M. J. (1991). Language and the career of similarity. In S. A. Gelman & J. P. Byrnes (Eds.), Perspectives on Thought and Language: Interrelations in Development (pp. 257-277). London: Cambridge University Press.

Gentner, D., Ratterman, M. J., & Forbus, K. D. (1993). The roles of similarity in transfer: Separating retrievability from inferential soundness. Cognitive Psychology, 25, 524-575.

Gentner, D., & Toupin, C. (1986). Systematicity and surface similarity in the development of analogy. Cognitive Science, 10(3), 277-300.

Goldstone, R. L. (1994). Similarity, Interactive Activation, and Mapping. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 3-28.

Goldstone, R.L., Gentner, D., & Medin, D.L. (1989). Relations Relating Relations. Proceedings of the Eleventh Annual Conference of the Cognitive Science Society. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Goldstone, R. L., & Medin, D. L. (1994a). The time course of comparison. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 29-50.

Goldstone, R.L., & Medin, D.L. (1994b). Interactive Activation, Similarity, and Mapping. In K. Holyoak & J. Barnden (Eds.), Advances in Connectionist and Neural Computation Theory, Vol. 2: Analogical Connections (pp. 321-362). New Jersey: Ablex.

Goldstone, R.L., Medin, D.L., & Gentner, D. (1991). Relations, Attributes, and the non-independence of features in similarity judgments. Cognitive Psychology. 222-264.

Gluck, M. A. (1991). Stimulus generalization and representation in adaptive network models of category learning. Psychological Science, 2, 50-55.

Gluck, M. A., & Bower, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 227-247.

Hayes-Roth, B., & Hayes-Roth, F. (1977). Concept learning and the recognition and classification of exemplars. Journal of Verbal Learning and Verbal Behavior, 16, 321-338.

Holyoak, K. J., & Koh, K. (1987). Surface and structural similarity in analogical transfer. Memory & Cognition, 15, 332-340.

Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13, 295-355.

Hummel, J. E., Burns, B., & Holyoak, K. J. (1994). Analogical mapping by dynamic binding: Preliminary investigations. In K. Holyoak & J. Barnden (Eds.), Advances in Connectionist and Neural Computation Theory, Vol. 2: Analogical Connections (pp. 416-445). New Jersey: Ablex.

James, W. (1890/1950). The principles of psychology: Volume I. New York: Dover.

Markman, A. B., & Gentner, D. (1993a). Structural alignment during similarity comparisons. Cognitive Psychology, 25, 431-467.

Markman, A. B., & Gentner, D. (1993b). Splitting the differences: A structural alignment view of similarity. Journal of Memory & Language, 32, 517-535.

Marr, D., & Poggio, T. (1979). A computational theory of human stereo vision. Proceedings of the Royal Society of London, 204, 301-328.

McClelland, J. L., & Rumelhart, D.E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

McClelland, J.L., & Elman, J.L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

Medin, D.L., Goldstone, R.L., & Gentner, D. (1990). Similarity involving attributes and relations: Judgments of similarity and difference are not inverses. Psychological Science, 1, 64-69.

Medin, D.L., Goldstone, R.L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100, 254-278.

Medin, D. L., & Schaffer, M. M. (1978). A context theory of classification learning. Psychological Review, 85, 207-238.

Palmer, S. E. (1978). Structural aspects of visual similarity. Memory & Cognition, 6, 91-97.

Proctor, R. W., & Healy, A. F. (1985). Order-relevant and order-irrelevant decision rules in multiletter matching. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 519-537.

Ross, B. H. (1987). This is like that: the use of earlier problems and the separation of similarity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 629-639.

Ross, B. H. (1989). Distinguishing types of superficial similarities: Different effects on the access and use of earlier problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 456-468.

Shepard, R. N. (1962a). The analysis of proximities: Multidimensional scaling with an unknown distance function. Part I. Psychometrika, 27, 125-140.

Shepard, R. N. (1962b). The analysis of proximities: Multidimensional scaling with an unknown distance function. Part II. Psychometrika, 27, 219-246.

Smith, L. B. (1989). A model of perceptual classification in children and adults. Psychological Review, 96, 125-144.

Spellman, B. A., & Holyoak, K. J. (1992). If Saddam is Hitler then who is George Bush? Analogical mapping between systems of social roles. Journal of Personality & Social Psychology, 62, 913-933.

Torgerson, W. S. (1965). Multidimensional scaling of similarity. Psychometrika, 30, 379-393.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.

Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89, 123-154.

Ullman, S. (1979). The interpretation of visual motion. Cambridge, MA: MIT Press.

Wharton, C. M., Holyoak, K. J., Downing, P. E., Lange, T. E., Wickens, T. D., & Melz, E. R. (1994). Below the surface: Analogical similarity and retrieval competition in reminding. Cognitive Psychology, 26, 64-101.

Author Notes

I wish to thank Douglas Medin for innumerable contributions to this research. In addition, many useful comments and suggestions were provided by Dedre Gentner, James Hampton, John Kruschke, Arthur Markman, Paula Niedenthal, Robert Nosofsky, Richard Shiffrin, Edward Smith, and Linda Smith. This research was funded by Biomedical Research and Support Grant PHS S07 RR7031N from the National Institute of Mental Health, and by National Science Foundation Grant SBR-9409232. Correspondence concerning this article should be addressed to Robert Goldstone, Psychology Department, Indiana University, Bloomington, Indiana 47405.

Table 1

Description of Displays and Similarity Ratings from Experiment 1

____________________________________________________________________________
              No. of Matching Features          Color Similarity
              ________________________    ____________________________
Trial Type    Aligned       Unaligned     High      Medium      Low
              Matching      Matching
              Colors        Colors
____________________________________________________________________________
XY→XY         2             0             7.78      7.67        7.64
XY→YX         0             2             6.49      6.13        6.22
XY→XB         1             0             6.80      6.57        6.48
XY→YB         0             1             6.20      5.44        5.32
XY→XX         1             1             6.75      6.41        6.46
XY→AB         0             0             6.03      5.59        5.06
____________________________________________________________________________

Table 2

Similarity Ratings from Experiment 2

____________________________________________________________
                            Duration
              _________________________________________
Trial Type    Short         Moderate      Long
____________________________________________________________
XY→XY         7.52          7.74          7.93
XY→YX         6.29          6.14          6.17
XY→XB         6.43          6.61          6.62
XY→YB         5.58          5.47          5.56
XY→XX         6.55          6.40          6.58
XY→AB         5.40          5.53          5.52
____________________________________________________________
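As a reading aid for Tables 1 and 2, the following sketch computes the difference that defines a nonmonotonicity in this article: the mean rating for XY→AB (no color matches) minus the mean rating for XY→YB (one poorly aligned color match). A positive difference is in the nonmonotonic direction. The ratings are copied from the tables; the variable and function names are illustrative, ASCII "->" stands in for the arrow, and the sketch only compares cell means, so it says nothing about statistical reliability.

```python
# Mean ratings copied from Table 1 (by color similarity) and Table 2 (by duration).
TABLE_1 = {
    "High":   {"XY->YB": 6.20, "XY->AB": 6.03},
    "Medium": {"XY->YB": 5.44, "XY->AB": 5.59},
    "Low":    {"XY->YB": 5.32, "XY->AB": 5.06},
}
TABLE_2 = {
    "Short":    {"XY->YB": 5.58, "XY->AB": 5.40},
    "Moderate": {"XY->YB": 5.47, "XY->AB": 5.53},
    "Long":     {"XY->YB": 5.56, "XY->AB": 5.52},
}


def nonmonotonicity(ratings):
    """Rating with no color match minus rating with one unaligned color match."""
    return round(ratings["XY->AB"] - ratings["XY->YB"], 2)


def summarize(table):
    """Map each condition to its difference score."""
    return {condition: nonmonotonicity(r) for condition, r in table.items()}


diffs_1 = summarize(TABLE_1)
diffs_2 = summarize(TABLE_2)

for name, diffs in (("Table 1", diffs_1), ("Table 2", diffs_2)):
    for condition, d in diffs.items():
        direction = "nonmonotonic" if d > 0 else "monotonic"
        print(f"{name} {condition:>8}: {d:+.2f} ({direction} direction)")
```

In these cell means the difference is positive only for medium color similarity in Table 1 and for the moderate duration in Table 2.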

Figure Captions

Figure 1. In Experiments 1 and 2, trials consisted of the starting display and one of the six changed displays. The letters X, Y, A, and B represent particular body colors for the butterflies. Body color matches between displays occurred either between properly corresponding butterflies, or between poorly corresponding butterflies.

Figure 2. Simulation results for the SIAM (Similarity as Interactive Activation and Mapping) model of similarity. SIAM predicts a nonmonotonicity whenever the trial with XY→YB (one poorly aligned color match) obtains a lower similarity estimate than the trial with XY→AB (no color matches).

Figure 3. Simulation results for SIAM, as a function of number of cycles of activation passing. Nonmonotonicities are predicted whenever the trial with XY→YB (one poorly aligned color match) obtains a lower similarity estimate than the trial with XY→AB (no color matches).

Figure 4. SIAM's predictions for nonmonotonicities across the parameter space created by varying cycles and feature mismatch value, with all other parameters (except feature-to-feature-weight = 0.05) given default values. The vertical axis shows the estimated similarity of the trial with XY→AB minus the estimated similarity of the trial with XY→YB. A nonmonotonicity is predicted wherever this difference is positive.

Figure 5. Sample stimuli from Experiment 2B. Different shading patterns indicate a different hue. This figure represents the trial involving "XY→YB" because there is an identical hue between objects that have different shapes.

Figure 6. Results from Experiment 2B. Two significant nonmonotonicities are found at the two-second display duration.