Localization And Identification Tasks Rely On Different Temporal Frequencies
Thomas A. Busey
Indiana University
In press: Vision Research
Please send correspondence to: Thomas A. Busey
Department of Psychology
Indiana University
Bloomington, IN 47405
email: busey@indiana.edu
Abstract
The temporal frequencies underlying character localization and identification tasks are measured, as suggested by a model that assumes that the two tasks are processed in different cortical pathways and receive contributions from different populations of visual cortical neurons. Data from two-pulse and temporal contrast sensitivity experiments demonstrate that character localization depends upon much higher temporal frequencies than character identification when both are tested in the periphery. Foveal presentations demonstrate that detection and identification tasks rely on the same temporal frequencies. In a control experiment, the letters were blurred to restrict the range of spatial frequencies. These stimuli replicated the earlier results, demonstrating that the use of higher temporal frequencies by the localization task cannot be attributed to the use of different spatial frequencies for different tasks. In addition, near-foveal presentations of the localization task replicate findings from the far periphery, suggesting that the localization task may be processed differently from the detection task regardless of location on the retina. Finally, the temporal frequency differences persist when a single sine-wave grating is used in localization and identification tasks. The results are consistent with any anatomical model that assumes that the neural substrates underlying localization receive or maintain a higher range of temporal frequencies than areas responsible for identification. The findings demonstrate how the time-course of different stimulus attributes can be quantified, and have implications for theories of information processing in which different stimulus attributes are combined.
When we view an object, we are aware of different properties such as the object's identity, location or onset. Much of the neurophysiological work on the functions of the human visual system suggests that information about object identity and information about object location are processed in different brain areas and may receive inputs from different classes of early visual mechanisms. The present work is motivated by these findings. The goal is to determine the conditions under which character identification, detection and localization tasks rely on different sources of visual information, as quantified by the range of temporal frequencies underlying the different tasks. If two tasks rely on different temporal frequencies, this suggests that they are processed separately, and either preserve different ranges of temporal frequencies during processing or receive different inputs from earlier visual pathways.
Psychophysical studies suggest an early segregation of visual information into different channels that vary in their spatio-temporal response properties. When asking observers to detect or differentiate different temporal frequencies, Watson and Robson (1981) found evidence for two temporal frequency channels, which they termed labeled detectors. For the low temporal frequency channel they estimated that seven distinct sets of spatial frequency channels exist, while for the high temporal frequency channel only three sets of spatial frequency channels exist. Based on these findings they suggest that the high temporal frequency channels have poorer spatial acuity, resulting from much more broadly tuned spatial frequency channel bandwidths. These two temporal channels with associated spatial frequency response properties may be the psychophysical analog to the cell classes exhibiting different spatio-temporal response properties in area V1 (Hawken, Shapley & Grosof, 1996), or perhaps the different temporal frequency characteristics of the magnocellular and parvocellular pathways (Merigan & Maunsell, 1993; but see Hawken et al., 1996).
Whatever the anatomical underpinnings, the temporal frequency channels observed by Watson and Robson (1981) may selectively influence later stages of processing, since recent neuro-anatomical evidence suggests that localization and identification tasks are processed in different brain areas. Haxby et al. (Haxby, Grady, Horwitz, Ungerleider, Mishkin et al., 1991) found that a face-matching task that required identifying faces differentially activated the temporal lobe, while a spatial vision task stressing the locations of objects differentially activated the parietal lobe. A similar result was reported by McIntosh et al. (McIntosh, Grady, Ungerleider, Haxby, Rapoport, & Horwitz, 1994), who used a face identification or dot localization task to examine the relative activation and interactions between the temporal and parietal pathways using PET scans. In addition to these studies that isolate the pathways using different stimuli, other studies have reported similar dissociations using a single stimulus. Corbetta et al. (Corbetta, Miezin, Dobmeyer, Shulman & Petersen, 1991) found that attending to the color or speed of a stimulus also produces differential activation in the superior temporal sulcus and the inferior parietal lobe respectively, although these differences were not as strong as in experiments that isolated the pathways using different stimuli. All of these results support the model that localization and identification tasks are processed in separate brain regions.
The contributions of the different temporal frequency channels to different tasks remain uncertain, despite the fact that several studies have attempted to address the contributions of these early visual channels. Gorea (1986) examined the temporal properties underlying the detection and discrimination of low and high spatial frequency gratings in the fovea and found no differences in the temporal frequency information used in each task. Kulikowski and Tolhurst (1973) describe evidence that supports a sustained channel that processes pattern information and a transient channel that processes flicker information, although later studies by Derrington & Henning (1980) dispute these conclusions. Although these studies provide little experimental evidence that different tasks rely on different temporal frequencies, all of them were carried out in the fovea. We might expect differences in the periphery given that cells have been found in the parietal lobe that have receptive fields that cover mainly the periphery, and in some cases even exclude the fovea (Motter & Mountcastle, 1981). If the role of the parietal lobe is to maintain a map of the locations of various objects in the visual scene (Merigan & Maunsell, 1993) or to detect an object in the periphery as a candidate for attention allocation (Corbetta, Miezin, Shulman & Petersen, 1993), we might expect differences between tasks only for peripheral presentations.
In summary, the psychophysical and neuroanatomical evidence suggests that localization and identification tasks are treated separately. Less clear is whether the areas that process these tasks receive different sources of visual information from earlier visual pathways. Merigan & Maunsell (1993) suggest that a partial segregation may exist between subcortical and cortical pathways, which may provide separate inputs to the two areas. However, the behavioral consequences of such segregation are controversial, and the subcortical pathways may not differ in their temporal frequency characteristics (Hawken et al. 1996). Despite this, other authors suggest that the two areas may differ in their temporal frequency characteristics. For example, Watson, Ahumada & Farrell (1986) suggest that different "windows of visibility" are made available to different cortical pathways, such that the parietal lobe pathway might take advantage of high temporal frequencies (associated with rapid velocities) while the temporal lobe pathway may be limited to lower temporal frequencies that provide a more stable percept. Although Watson et al. were concerned with motion, this model may also apply to localization and identification tasks where different aspects of the visual signal may be useful for different tasks. This provides the central question of this work: do the character identification and localization tasks that are thought to be selectively processed in different brain areas rely on different temporal frequencies?
Two Tasks Studied in Two Paradigms
The specific implementation of the two tasks is constrained by the desire to use the same stimulus for both tasks. This holds the visual input constant across the two tasks, which implies that any differences in performance across tasks result from the use of different information derived from the same visual stimulus. Thus in a location detection task, the observer indicates whether a letter appeared left or right of fixation, while an identification task requires simply identifying the letter. Experiment 2 involves foveal detection of a letter presented at fixation in one of two temporal intervals, which is a two-temporal interval task rather than the two-location task used in Experiment 1.
To provide across-paradigm verification and links to previous work, Experiments 3-6 measure the temporal frequencies underlying localization and identification using the more traditional temporal contrast sensitivity function (TCSF) paradigm. The TCSF experiments involve flickering a stimulus at different temporal frequencies around a gray background while the subject adjusts the contrast of the stimulus such that a performance criterion is met.
In the current studies we use both the two-pulse and TCSF paradigms. While the temporal contrast sensitivity function provides a direct estimate of the temporal frequencies underlying a task, measuring it requires long stimulus presentations of 500 ms or more, which introduces the possibility of contaminating eye movements. The Two-Pulse technique is a more recent design proposed by Ikeda (1965) which uses a more temporally compact stimulus and thus reduces problems from eye movements. The TCSF paradigm requires fewer conditions, which allows a wider exploration of different stimulus conditions in Experiments 3-6.
The basic question of whether localization and character identification tasks rely on different temporal frequencies can be answered based on qualitative comparisons of data from different tasks. As a result, a complete understanding of the mathematical modeling described below is not necessary, although such models do allow comparisons across the two-pulse and TCSF paradigms. A more complete treatment is found in Busey & Loftus (1994) and Busey (in press).
Characterizing the Temporal Frequencies Used in a Task
In the Two-Pulse paradigm, a stimulus is presented twice, in the same location, separated by a variable interstimulus interval (ISI). Typically the pulses are of short duration, ranging from 2 to 30 ms (e.g. Ikeda, 1965; 1986). The first pulse engenders a response in the visual system, and for short ISIs (30-45 ms) the response to the second pulse will interact with the persisting first-pulse response. Two different pulse conditions are used, each of which contains the same stimulus presented twice in the same location. In the positive/positive condition, both stimulus pulses are the same contrast (e.g. a light gray pulse followed by a second light gray pulse on a gray background). The positive/negative condition reverses the polarity of the second pulse (e.g. a light gray pulse followed by a dark gray pulse). This second condition provides an independent estimate of the amount of interaction between two pulses and thus improves the parameter estimation stage that is required to recover the temporal frequencies used in a task. The positive/negative condition also provides evidence for temporal inhibition (Watson, 1986): at inter-pulse intervals of 30-45 ms, a positive pulse followed by a negative pulse can produce better sensitivity than a positive pulse followed by another positive pulse.
When modeling the time-course of the response engendered by a stimulus to predict performance, the physical stimulus is characterized as its contrast over time (Figure 1, Inset A). Linear filter models assume a hypothetical impulse response function that determines in part how much a stimulus will persist after offset and whether it is affected by temporal inhibition (Figure 1, Inset B), and the phenomenological response engendered in the visual system is determined by convolving the impulse response function with the physical stimulus wave form. This phenomenological response is termed the sensory response function (Figure 1, Solid Curve).
Insert Figure 1 About Here
The form of the impulse response function is often assumed to be the difference of two gamma functions (e.g. Watson, 1986):
$$g(t) = \frac{(t/\tau)^{n_1-1}\, e^{-t/\tau}}{\tau\,(n_1-1)!} \;-\; z\,\frac{\left(t/(r\tau)\right)^{n_2-1}\, e^{-t/(r\tau)}}{r\tau\,(n_2-1)!} \qquad \text{(Eq. 1)}$$
The first term in Eq. 1 represents an excitatory component, while the second term represents an inhibitory component of the response, which tends to sharpen the response and allows it to respond to higher temporal frequencies. The parameter τ (written t elsewhere in the text) is the time constant for the excitatory gamma function, r is the ratio of the time constant of the inhibitory component to that of the excitatory component, and z is the magnitude of the temporal inhibition component. The parameters n1 and n2 represent the number of stages in each process and are usually fixed at integers between 5 and 10, although the shape of the impulse-response function is relatively unaffected by the precise values chosen. For the present work, n1 was fixed at 9 and n2 was fixed at 10.
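As an illustration of the difference-of-gammas impulse response described above, the following minimal sketch evaluates it numerically. It assumes the unit-area gamma form used by Watson (1986); the function names and the parameter values (time constant 5 ms, r = 1.3, z = 0 or 0.9) are hypothetical and are not estimates from the experiments.

```python
import math

def gamma_stage(t, tau, n):
    """n-stage gamma function with time constant tau, scaled to unit area."""
    if t < 0:
        return 0.0
    x = t / tau
    return x ** (n - 1) * math.exp(-x) / (tau * math.factorial(n - 1))

def impulse_response(t, tau, r, z, n1=9, n2=10):
    """Excitatory gamma minus z times a slower (r * tau) inhibitory gamma."""
    return gamma_stage(t, tau, n1) - z * gamma_stage(t, r * tau, n2)

# Hypothetical parameter values in ms, chosen for illustration only.
DT = 0.5
times = [i * DT for i in range(1000)]            # 0 to 500 ms
sustained = [impulse_response(t, 5.0, 1.3, 0.0) for t in times]
transient = [impulse_response(t, 5.0, 1.3, 0.9) for t in times]

area_sustained = sum(sustained) * DT             # approximates the integral of g(t)
area_transient = sum(transient) * DT
has_inhibitory_lobe = min(transient) < 0         # negative lobe appears when z > 0
```

With z = 0 the function never goes negative, whereas z > 0 produces the late negative lobe that the text attributes to temporal inhibition.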
Nearly all models assume some form of non-linearity applied to the results of the convolution. The critical assumption of the LST model is that information is extracted not from |a(t)| but from that part of |a(t)|, termed a_Q(t), that lies outside a sensory threshold, Q. To be precise,
$$a_Q(t) = \begin{cases} |a(t)| - Q & \text{if } |a(t)| > Q \\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 2)}$$
A fundamental consequence of this formulation is that if the sensory response a(t) is not outside the positive or negative sensory threshold, the stimulus will not be visible to the observer. Although there is evidence against such a high-threshold formulation, the psychometric function relating contrast to performance is quite steep, and thus the sensory threshold serves as an approximation to the true mechanism.
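The convolution and thresholding steps can be sketched as follows. This is only an illustration under stated assumptions: a purely excitatory unit-area gamma impulse response (z = 0), 1 ms time steps, and hypothetical values for the pulse contrast (0.5), the ISI (20 ms) and the threshold Q (0.05).

```python
import math

def gamma_stage(t, tau, n):
    """n-stage gamma function with time constant tau, scaled to unit area."""
    if t < 0:
        return 0.0
    x = t / tau
    return x ** (n - 1) * math.exp(-x) / (tau * math.factorial(n - 1))

def convolve(stimulus, kernel):
    """Discrete convolution at 1 ms steps, truncated to len(stimulus)."""
    out = []
    for i in range(len(stimulus)):
        acc = 0.0
        for j in range(min(i, len(kernel) - 1) + 1):
            acc += stimulus[i - j] * kernel[j]
        out.append(acc)
    return out

def above_threshold(a, Q):
    """Part of |a(t)| lying outside the sensory threshold Q; zero elsewhere."""
    return [max(abs(v) - Q, 0.0) for v in a]

N_MS = 400
kernel = [gamma_stage(float(t), 5.0, 9) for t in range(N_MS)]   # hypothetical filter
# Positive/positive stimulus: two 30 ms pulses separated by a 20 ms ISI.
stimulus = [0.5] * 30 + [0.0] * 20 + [0.5] * 30 + [0.0] * (N_MS - 80)
a = convolve(stimulus, kernel)          # sensory response a(t)
a_Q = above_threshold(a, Q=0.05)        # above-threshold response
```

Raising Q in this sketch shrinks the above-threshold area, and a Q larger than the peak of |a(t)| leaves nothing for later stages, which corresponds to the stimulus being invisible.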
Detection data for stimuli such as high spatial frequencies and color are often modeled by setting z in Eq. 1 to 0, which gives a monotonic impulse response function g(t), shown as a solid curve in the left panel of Figure 2. An alternative way of representing the same information is to take the magnitude of the Fourier transform of the impulse response function g(t), which results in a temporal contrast sensitivity function (TCSF). The TCSF plot shows the sensitivity of a system to different temporal frequencies. The TCSF corresponding to the solid line in the left panel of Figure 2 is given by the solid line in the right panel. These curves represent a purely sustained response and give a monotonically decreasing TCSF. This function is termed low-pass since it passes primarily low temporal frequencies.
Insert Figure 2 About Here
Detection data for stimuli containing mainly low spatial frequencies, or stimuli presented on bright backgrounds, are often modeled with z > 0. In this case the impulse response inhibits processing after an initial excitatory response, which results in an inhibitory lobe in the impulse response function g(t) (dashed line in Figure 2, left panel) and a characteristic TCSF with a decrease in sensitivity at low temporal frequencies (dashed curve in Figure 2, right panel). This function is termed band-pass since it passes primarily mid-range temporal frequencies.
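The low-pass and band-pass cases can be compared numerically by taking the magnitude of the Fourier transform of a sampled impulse response. The sketch below assumes the unit-area gamma form of Watson (1986) and hypothetical parameter values; only the qualitative shapes are meant to carry over.

```python
import math

def gamma_stage(t, tau, n):
    """n-stage gamma function with time constant tau, scaled to unit area."""
    if t < 0:
        return 0.0
    x = t / tau
    return x ** (n - 1) * math.exp(-x) / (tau * math.factorial(n - 1))

def impulse_response(t, tau, r, z, n1=9, n2=10):
    """Excitatory gamma minus z times a slower inhibitory gamma."""
    return gamma_stage(t, tau, n1) - z * gamma_stage(t, r * tau, n2)

def amplitude(freq_hz, kernel, dt_ms):
    """Magnitude of the Fourier transform of the sampled kernel at freq_hz."""
    re = 0.0
    im = 0.0
    for k, g in enumerate(kernel):
        phase = 2.0 * math.pi * freq_hz * (k * dt_ms / 1000.0)
        re += g * math.cos(phase)
        im -= g * math.sin(phase)
    return math.hypot(re, im) * dt_ms

DT = 1.0                                              # ms per sample
ts = [i * DT for i in range(500)]
low_pass_kernel = [impulse_response(t, 5.0, 1.3, 0.0) for t in ts]   # z = 0
band_pass_kernel = [impulse_response(t, 5.0, 1.3, 0.9) for t in ts]  # z > 0
freqs = [1, 2, 4, 8, 16, 32]                          # Hz, illustrative only
low_pass = [amplitude(f, low_pass_kernel, DT) for f in freqs]
band_pass = [amplitude(f, band_pass_kernel, DT) for f in freqs]
```

With these values the z = 0 curve falls steadily with frequency, while the z = 0.9 curve is more sensitive at 8 Hz than at 1 Hz, which is the band-pass signature described in the text.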
The LST model differs from other linear filter models in that it proposes an information extraction mechanism adapted from the information processing literature (e.g. Townsend, 1981; Rumelhart, 1970). The particular formulation assumes that the information extraction rate dI/dt is proportional to the product of the above-threshold sensory response function and the proportion of remaining stimulus information:
$$\frac{dI}{dt} = \frac{a_Q(t)\,[1 - I(t)]}{c_s} \qquad \text{(Eq. 3)}$$
where a_Q(t) is the height of the sensory response function above the sensory threshold at time t and I(t) represents the proportion of stimulus information already acquired by time t. The rate parameter cs determines the rate at which task-related information is extracted from the sensory response function, such that easier tasks will have a faster information acquisition rate, identified by a smaller cs value.
The dependent variable in typical Two-Pulse tasks is contrast threshold. The contrast of the pulses is systematically varied such that a performance criterion is met, such as 82% correct localization or identification. Predictions for contrast threshold data in the Two-Pulse paradigm are made by assuming that proportion correct identification equals the proportion of acquired information. In the experiment, contrast thresholds are measured by varying contrast to produce a performance criterion of 82% correct identification. In the model, the height of the physical contrast function is systematically varied until the model predicts the 82% performance criterion, and this contrast is the predicted contrast threshold. Quantitative predictions are computed via parameter estimation techniques. The parameters of the linear filter, t, r and z, determine the shape of the impulse response function and therefore the range of temporal frequencies passed by the temporal filter. Smaller t values and larger temporal inhibitory components (as determined by the z parameter) imply a relative increase in the sensitivity to higher temporal frequencies.
The sensory threshold Q is not the focus of the present study, although it does in part determine the rate at which positive/positive contrast sensitivity decreases as ISI is increased. The cs parameter determines the rate at which task-relevant information is acquired by the visual system. This can also be interpreted as a sensitivity parameter which, for two-pulse data, simply moves the contrast threshold curves up and down.
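The prediction procedure described above (vary the contrast until the model reaches the 82% criterion) can be sketched end to end. Several things here are assumptions rather than the published implementation: the unit-area gamma filter, an extraction rate of the form a_Q(t)(1 - I)/cs so that smaller cs means faster acquisition, the closed-form solution of that rate equation, the bisection search over contrasts between 0 and 1, and every parameter value and ISI.

```python
import math

def gamma_stage(t, tau, n):
    """n-stage gamma function with time constant tau, scaled to unit area."""
    if t < 0:
        return 0.0
    x = t / tau
    return x ** (n - 1) * math.exp(-x) / (tau * math.factorial(n - 1))

def unit_response(second_sign, isi_ms, tau, r, z, n_ms=400, pulse_ms=30):
    """Sensory response (1 ms steps) to a two-pulse stimulus of unit contrast."""
    kernel = [gamma_stage(float(t), tau, 9) - z * gamma_stage(float(t), r * tau, 10)
              for t in range(n_ms)]
    stim = [1.0] * pulse_ms + [0.0] * isi_ms + [float(second_sign)] * pulse_ms
    stim += [0.0] * (n_ms - len(stim))
    return [sum(stim[i - j] * kernel[j] for j in range(i + 1)) for i in range(n_ms)]

def acquired_information(contrast, unit, Q, c_s):
    """Solution of dI/dt = a_Q(t) * (1 - I) / c_s with I(0) = 0, at the end of the trial."""
    area = sum(max(abs(contrast * u) - Q, 0.0) for u in unit)
    return 1.0 - math.exp(-area / c_s)

def contrast_threshold(unit, Q, c_s, criterion=0.82):
    """Bisection on contrast in [0, 1] until the model predicts the criterion."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if acquired_information(mid, unit, Q, c_s) < criterion:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical parameter values and ISIs; not the estimates reported in Table 1.
TAU, R, Z, Q, C_S = 5.0, 1.3, 0.9, 0.02, 2.0
predicted = {}
for label, sign in (("positive/positive", 1), ("positive/negative", -1)):
    for isi in (0, 15, 30, 45, 60, 90):
        unit = unit_response(sign, isi, TAU, R, Z)
        predicted[(label, isi)] = contrast_threshold(unit, Q, C_S)
```

Because the sensory response scales linearly with contrast, the response to a unit-contrast stimulus is computed once per condition and rescaled inside the search. In this sketch, changing cs only shifts all predicted thresholds up or down, while the filter parameters change how the two conditions relate across ISI.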
Experiment 1
The goal of Experiment 1 was to measure the temporal frequencies underlying character localization and identification tasks using the Two-Pulse technique. A character (a 2 or a 5) appeared left or right of fixation on each trial, and participants made both localization (which side was it on) and identification (was it a 2 or a 5) judgments on each trial.
If we find that the localization task relies on different temporal frequencies than the identification task, we would have support for the hypothesis that different visual channels representing ranges of spatio-temporal information, perhaps originating from different classes of visual cortical neurons, contribute to the two tasks.
Methods
The Experiment 1 Methods follow the procedures of similar Two-Pulse experiments (e.g. Ikeda, 1986) to collect contrast thresholds using an adaptive search technique.
Stimuli and Apparatus
Stimulus presentation and response collection took place on a Macintosh II computer with a 14" monochrome monitor. Luminance was controlled and calibrated via a VideoAttenuator and the VideoToolbox luminance utilities (Pelli & Zhang, 1991), which provide 4096 gray levels. An oscilloscope and a Pin-10 photodiode were used to verify the lack of phosphor persistence from one pulse to the next.
Participants viewed the screen from a distance of 57 cm. The two characters (a 2 and a 5) were rendered in 24 point Times font and subtended a visual angle of 0.57° vertically and 0.39° horizontally. Participants maintained fixation on a centrally located fixation point. On each trial, one of the two characters appeared randomly 6 degrees left or right of fixation.
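The stated visual angles can be converted to physical sizes on the screen with the standard visual-angle relation. The sketch below assumes a flat display perpendicular to the line of sight; the function names are illustrative.

```python
import math

def size_cm(angle_deg, distance_cm):
    """Physical extent that subtends angle_deg when centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

def eccentric_offset_cm(angle_deg, distance_cm):
    """Distance from fixation of a point angle_deg away from the line of sight."""
    return distance_cm * math.tan(math.radians(angle_deg))

VIEWING_DISTANCE_CM = 57.0
char_height_cm = size_cm(0.57, VIEWING_DISTANCE_CM)       # character height
char_width_cm = size_cm(0.39, VIEWING_DISTANCE_CM)        # character width
offset_cm = eccentric_offset_cm(6.0, VIEWING_DISTANCE_CM) # 6 degree eccentricity
```

At a 57 cm viewing distance, one degree of visual angle corresponds to very nearly 1 cm on the screen, which is why this distance is convenient.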
Design and Procedure
Two stimulus waveform patterns form the basis of the two-pulse paradigm. In the positive/positive condition a positive-contrast 30 ms pulse of a letter is followed by a variable ISI and a second positive-contrast 30 ms pulse of the same letter. In the positive/negative condition a positive-contrast 30 ms pulse of a letter is followed by a variable ISI and a negative-contrast 30 ms pulse of the same letter. For Experiments 1-3 the contrast was defined as contrast = (Lmax - Background)/(Lmax + Background).
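The contrast definition and the two pulse sequences can be written out directly. In this sketch the background luminance (20 cd/m2), the contrast (0.2) and the ISI (15 ms) are illustrative values, and applying the same contrast formula to a stimulus darker than the background (giving a negative contrast) is an assumption about how the negative pulse is specified.

```python
def contrast(l_stim, l_background):
    """Contrast of a stimulus of luminance l_stim on a background of l_background."""
    return (l_stim - l_background) / (l_stim + l_background)

def luminance_for(c, l_background):
    """Invert the contrast definition; a negative c gives a decrement."""
    return l_background * (1.0 + c) / (1.0 - c)

def two_pulse(c, isi_ms, second_sign, pulse_ms=30):
    """Contrast at each millisecond: pulse, blank ISI, then a same- or opposite-sign pulse."""
    return [c] * pulse_ms + [0.0] * isi_ms + [second_sign * c] * pulse_ms

BACKGROUND = 20.0                                   # cd/m2, illustrative
bright = luminance_for(0.2, BACKGROUND)             # luminance of a +0.2 contrast pulse
dark = luminance_for(-0.2, BACKGROUND)              # luminance of a -0.2 contrast pulse
pos_pos = two_pulse(0.2, isi_ms=15, second_sign=+1)
pos_neg = two_pulse(0.2, isi_ms=15, second_sign=-1)
```

The two sequences differ only in the sign of the second pulse, so any difference in thresholds between them reflects how the two responses interact.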
Robust parameter estimates were assured using 6 ISIs between the two pulses. These allow estimation of the amount of interaction between the two pulses, and by inference an estimate of the persistence of the first pulse over time. Combined with the two types of presentations described above and the two tasks, the experiment consisted of 24 conditions.
On each trial the contrast of the pulses was determined by an adaptive search technique (Quest; Watson & Pelli, 1983) that finds the stimulus contrast that affords 82% correct identification over trials. The resulting contrast threshold is converted to sensitivity by taking the logarithm of the inverse of the contrast threshold. Each contrast threshold estimate is based on 96 replications of each condition. Although participants made both localization and identification judgments on each trial, only one response was recorded and separate thresholds were computed for each task at each condition.
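The design arithmetic and the threshold-to-sensitivity conversion are small enough to state explicitly. In the sketch below the six ISI values are hypothetical placeholders (the text gives only their number), and base-10 logarithms are an assumption.

```python
import itertools
import math

ISIS_MS = (0, 15, 30, 45, 60, 90)      # hypothetical values; six ISIs were used
PULSE_TYPES = ("positive/positive", "positive/negative")
TASKS = ("localization", "identification")
TRIALS_PER_CONDITION = 96

conditions = list(itertools.product(TASKS, PULSE_TYPES, ISIS_MS))
trials_per_observer = len(conditions) * TRIALS_PER_CONDITION

def log_sensitivity(contrast_threshold):
    """Log of the inverse contrast threshold (base 10 assumed)."""
    return math.log10(1.0 / contrast_threshold)
```

Under this conversion a lower contrast threshold gives a higher sensitivity, so curves plotted this way rise when performance improves.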
Participants
Three participants completed 96 trials per condition. The observers were the author and two naive observers: a female staff member of the Psychology Department and a male advanced undergraduate student. All had normal or corrected-to-normal vision.
Results and Discussion
Figure 3 shows the results from Experiment 1, plotted as log (1/contrast threshold), which is interpreted as contrast sensitivity. The pattern of the data conforms to other two-pulse data (e.g. Rashbass, 1970). Consider the Identification data. For the positive/positive condition, as the ISI is increased, performance decreases initially and then increases slightly. The LST model accounts for this decrease with the sensory threshold, Q, which causes more area to drop below threshold as the responses engendered by the two pulses separate with longer ISIs. For the positive/negative condition, performance is low for small ISIs, but then increases as ISI increases. At some intermediate ISI (around 30 ms) the positive/negative data cross the positive/positive data and the observer actually becomes more sensitive to the positive/negative stimulus. This crossover results from temporal inhibition in the response and the fact that the responses engendered by the two pulses sum prior to rectification. The inhibitory lobe from the first pulse sums with the negative-going excitatory lobe from the negative-contrast second pulse. After rectification this results in more above-threshold area and thus better sensitivity. Appendix A contains a description of the modeling procedures for all Experiments.
Insert Figure 3 About Here
The ISI at which the two curves cross is a qualitative, model-free estimate of the temporal frequencies used in a task. Tasks relying on higher temporal frequencies will produce curves that cross at shorter ISIs. This is clearly the case for the localization data in Figure 3, where the crossover point for all three observers occurs at an ISI of 5 ms or less. Although this relative measure suggests that the two tasks rely on different temporal frequencies, estimating the actual range of temporal frequencies requires the LST model and parameter estimation techniques. A direct test of the hypothesis that the localization and identification tasks rely on different temporal frequencies can be made by comparing the impulse-response functions engendered by the two tasks. These are shown in Figure 4.
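The crossover ISI can be read off two sensitivity curves by linear interpolation. The sketch below is one simple way to do this; the function name and the example sensitivities are made up for illustration and are not data from Figure 3.

```python
def crossover_isi(isis, pos_pos, pos_neg):
    """First ISI at which positive/negative sensitivity reaches positive/positive
    sensitivity, found by linear interpolation; None if the curves never cross."""
    diffs = [pn - pp for pp, pn in zip(pos_pos, pos_neg)]
    if diffs[0] >= 0:
        return float(isis[0])          # already crossed at the shortest ISI tested
    for k in range(1, len(diffs)):
        if diffs[k] >= 0:
            frac = -diffs[k - 1] / (diffs[k] - diffs[k - 1])
            return isis[k - 1] + frac * (isis[k] - isis[k - 1])
    return None

# Made-up log sensitivities at four ISIs (ms).
example_isis = [0, 15, 30, 45]
example_pos_pos = [1.5, 1.4, 1.3, 1.2]
example_pos_neg = [0.9, 1.1, 1.4, 1.5]
example_crossover = crossover_isi(example_isis, example_pos_pos, example_pos_neg)
```

A task whose curves have already crossed at the shortest ISI tested returns that ISI, which is the situation the text describes for the localization data.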
One interesting aspect of the identification data is that all three observers show clear evidence of temporal inhibition in the estimated impulse-response functions. This suggests that the response to letters is not mediated by a purely sustained mechanism. These data are not consistent with a model that assumes that the parvo pathway, with its lowpass temporal frequency profile, is the sole contributor to the putative identification area in the temporal lobe (Merigan & Maunsell, 1993).
Insert Figure 4 About Here
Table 1 shows the parameter estimates for the three observers in Experiment 1, while Appendix A contains details of the parameter fits for all Experiments. For all three observers the t, z and r parameters of the impulse response functions systematically differ across tasks, and are consistent within a task across observers. These data support the hypothesis that a localization task relies on higher temporal frequencies than the identification task. This suggests that the two tasks, perhaps mediated by different visual cortical pathways, rely on different sets of visual spatio-temporal channels, originating perhaps in different classes of visual cortical neurons.
Insert Table 1 About Here
One question that remains unanswered by Experiment 1 is whether the differences seen in the temporal frequencies used in localization and character identification tasks extend to foveal presentations and a detection task. Previous work suggests that detection and identification tasks in the fovea rely on the same temporal frequencies. Kulikowski and Tolhurst (1973) reported some differences in the fovea for a flicker detection vs. a pattern detection task for different temporal frequencies, although these findings were later challenged by Derrington & Henning (1980). In addition, Gorea (1986) found no differences in the temporal frequencies used in detection and identification in the fovea, which contradicts our Experiment 1 findings.
One major hypothesis is that the only difference between the fovea and the periphery is the spatial scale at which objects are represented (Thomas, 1987). The bandwidths of the spatial filters do not change, although the foveal stimuli provide more input to higher-spatial-frequency filters due to increased acuity. Since higher spatial frequencies are not well represented by the mechanisms that carry higher temporal frequencies (Watson & Robson, 1981), we might expect that both tasks would rely on slower temporal frequencies as we move into the fovea. Parietal cortex appears to receive much of its input from the periphery, and moving the stimulus to the fovea may cause further shifts in the temporal frequencies used in the two tasks. Experiment 2 was designed to specifically address whether the findings observed in the periphery would also be produced by foveal presentations and a detection task. If the parietal lobe is responsible for preserving spatial relations, as suggested by Ungerleider & Mishkin (1982), we might not expect results similar to those of Experiment 1, since the task is a detection rather than a localization task.
Experiment 2
Foveal presentations entail only a single location, and thus require the adoption of a two-temporal-interval presentation sequence. Tones delimited two temporal intervals, and the stimulus appeared at fixation in one of the two temporal intervals. On each trial, the observer reported both which interval contained the stimulus as well as whether it was a 2 or a 5. Experiment 2 actually consisted of two replications, done at different background levels. The results did not differ, and thus we discuss both experiments together.
Methods
Stimuli and Apparatus
The stimuli and apparatus were identical to those of Experiment 1. Characters were presented in the fovea, 1.3 degrees below a fixation point. Observers were asked to maintain fixation on the location in which the stimulus would appear, using the fixation point as a reference.
Design and Procedure
A sequence of three brief tones delimited two temporal intervals, each of which contained the stimulus with 50% probability. The participant's task on each trial was to indicate which interval contained the stimulus, and whether it was a 2 or a 5.
Participants
The participants consisted of the author, a female staff member and a psychology graduate student.
Results and Discussion
Figure 5 shows the two-pulse data for the three observers collected at two different background levels. Contrary to the findings in Experiment 1 (Figure 3), no differences are observed in the temporal frequencies used in the different tasks. All six datasets could be fit by a model that assumes that only differences in sensitivity, as expressed by different cs parameters, exist between the two tasks. The impulse-response function parameters (t, r and z) that characterize the range of temporal frequencies used in each task were identical for the two tasks. The only exception is Observer NQ's data at the higher, 20 cd/m2 background level, in which a model that assumed a slightly higher range of temporal frequencies for detection vs. identification produced a slightly better fit to the data. However, these differences are small, and thus we conclude that, in the fovea, detection and identification rely on the same range of temporal frequencies. Table 1 shows the parameter estimates for the three observers in Experiment 2. Appendix A contains the details of the parameter fits for all 6 experiments.
Insert Figure 5 About Here
Two-Pulse Discussion
Experiments 1 and 2 delimit the conditions under which detecting a letter in the periphery or fovea and identifying the letter depend on different temporal frequencies. We observe large differences for peripheral presentations, with localization relying on much higher temporal frequencies than character identification. In the fovea we see no differences in the temporal frequencies used in detection and identification tasks. The estimates of the linear filter parameters allow comparisons across tasks, and we see that identification appears to rely on the same temporal frequencies in the fovea and in the periphery, but localization relies on much higher temporal frequencies in the periphery than detection in the fovea.
These findings leave open two questions that are addressed in Experiments 3-6. First, are these findings somehow specific to the two-pulse paradigm, or would we see the same effects when measuring the temporal frequencies used in each task using the temporal contrast sensitivity paradigm? Second, do these differences in the use of the temporal frequencies depend on the use of different spatial frequencies in different tasks? This second question is addressed in Experiments 4, 5 and 6 by restricting the range of available spatial frequencies by spatially filtering the letters.
Experiments 3-6 adopt the temporal contrast sensitivity paradigm, in which the observer adjusts the contrast of a flickering stimulus until a criterion is met for each task. An advantage of this procedure is that it does not require the information processing model components that are necessary to analyze the two-pulse data (Q and cs from Eqs 2 and 3). This provides a test of the LST model as applied to the localization and identification tasks, since the impulse-response function parameters are derived from the TCS functions and allow direct comparison with the impulse-response functions derived from Two-Pulse data. Experiments 3-6 also allow comparisons with other flicker studies and provide converging evidence for the conclusions of Experiment 1.
Experiment 3
Experiment 3 measures the temporal frequencies underlying localization and character identification tasks using letters flickered at different temporal frequencies. The observer adjusted the contrast of the stimulus until the letter was either just barely detectable in one of two peripheral locations (localization task) or just barely identifiable (identification task). The resulting contrast threshold for each flicker rate is converted to a contrast sensitivity by taking the log of the reciprocal of the contrast threshold. The resulting TCSF may be directly compared to the examples given in the right panel of Figure 2. If we see differences in the shapes of the temporal contrast sensitivity functions for the localization and character identification tasks, we would confirm the Experiment 1 findings.
Methods
Contrast sensitivities to eight temporal frequencies ranging from 2 to 32 Hz were obtained by flickering a letter around a gray background according to a sine wave weighted by a Gaussian envelope. Example temporal functions are shown in Figure 6. The stimuli were presented on a Tektronix 604 oscilloscope with a fast P15 phosphor at a 4 ms (250 Hz) refresh rate. The size of the oscilloscope limited the peripheral presentations to 2.7° from fixation.
Insert Figure 6 About Here
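A temporal function of this kind can be sketched as a sine-wave carrier multiplied by a Gaussian envelope and sampled once per 4 ms frame. The envelope width (sigma_ms), the number of frames and the carrier phase below are hypothetical choices made only for illustration; they are not the values used in the experiments.

```python
import math

FRAME_MS = 4.0  # one frame every 4 ms, i.e. a 250 Hz refresh rate (from the text)

def flicker_waveform(freq_hz, n_frames=250, sigma_ms=150.0, contrast=1.0):
    """Signed contrast modulation per frame around the gray background.

    n_frames and sigma_ms are hypothetical; the carrier is in sine phase
    relative to the center of the envelope.
    """
    center_ms = (n_frames - 1) * FRAME_MS / 2.0
    samples = []
    for frame in range(n_frames):
        t_ms = frame * FRAME_MS - center_ms
        envelope = math.exp(-0.5 * (t_ms / sigma_ms) ** 2)
        carrier = math.sin(2.0 * math.pi * freq_hz * t_ms / 1000.0)
        samples.append(contrast * envelope * carrier)
    return samples

wave = flicker_waveform(8.0)  # an 8 Hz flicker lasting one second
```

Because the carrier is odd and the envelope is even about the center, the modulation averages to zero, so the mean luminance stays at the background level.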
Stimuli and Apparatus
The same apparatus was used for Experiments 3-6. Observers viewed two patches located left and right of a fixation point on the face of a Tektronix 604 oscilloscope. The background luminance was fixed at 20 cd/m2. Stimuli were a 2 and a 5, rendered in the same Times font used in Experiments 1-2. Observers viewed these stimuli from a distance of 86 cm, which resulted in the letters subtending 0.50° vertically and 0.37° horizontally. The center of the letters was located 2.7° or 2.3° left or right of the fixation point. Observer MB completed the study with the letters 2.3° from the fixation point; the stimuli were then moved further into the periphery once this became technically feasible. Observer NB completed the study at the 2.7° distance and TB completed the study at both distances.
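The relation between these visual angles and physical distances on the display can be sketched as follows, assuming a flat screen perpendicular to the line of sight. The function names are ours, and treating each letter as if it were centered on the line of sight is a small-angle simplification.

```python
import math

VIEWING_DISTANCE_CM = 86.0  # from the text

def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical size of an object subtending angle_deg (small-angle geometry)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

def offset_cm(eccentricity_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance on the screen between fixation and a point at a given eccentricity."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))

letter_height_cm = extent_cm(0.50)   # about 0.75 cm
letter_width_cm = extent_cm(0.37)    # about 0.56 cm
center_offset_cm = offset_cm(2.7)    # about 4.06 cm from fixation
```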
The display device could not support the high luminance levels required to fit contrast thresholds for identification at 32 Hz, and thus this condition was eliminated for all observers in Experiments 3 and 4.
Design and Procedure
Observers viewed a series of stimuli that appeared randomly left or right of fixation and consisted of either a 2 or a 5. The stimuli appeared about once every second. Observers maintained central fixation and adjusted the contrast of the letters until the letters were either just barely localizable or just barely identifiable, depending on the task. When they were satisfied that the current contrast met the criterion for the given task, they pressed a key to continue with the next temporal frequency and task. The order of the tasks and temporal frequencies was randomized.
Participants
Three participants, the author and two graduate students, completed 4 replications of each threshold. The author also completed replications at both the 2.7° and 2.3° distances for comparison with both observers.
Results and Discussion
The data are modeled by computing the Fourier transform of the impulse response function (Eq. 1) and fitting a model that consists only of the impulse response function parameters (t, r and z) along with a sensitivity parameter s that scales the TCSF vertically. This is the transfer function G(w):
Eq 4
where w is the temporal frequency flicker rate for a condition, s is a sensitivity parameter and t, r and z are the impulse-response function parameters from Eq. 1 that determine the range of temporal frequencies that underlie a given task. Separate parameter values were fit for the character identification and localization tasks. Often the identification TCSF curves could be fit by assuming no temporal inhibition, in which case z was set to zero, eliminating the second part of Eq. 4.
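To illustrate how z controls the shape of the predicted TCSF, the sketch below uses one common form for such a transfer function: the magnitude of the difference between an excitatory and an inhibitory cascade of low-pass stages. This form, and the stage counts n1 and n2, are assumptions made for illustration and should not be read as Eq. 4 itself; only the roles of s, t, r and z follow the description in the text.

```python
import math

def transfer_gain(w_hz, s, t, r=1.0, z=0.0, n1=9, n2=10):
    """Gain at temporal frequency w_hz for an assumed difference-of-filters model.

    s scales the curve vertically, t is the excitatory time constant (seconds),
    r scales the inhibitory time constant and z weights the inhibition.
    n1 and n2 (stage counts) are hypothetical values.
    """
    omega = 2.0 * math.pi * w_hz
    excitation = (1.0 + 1j * omega * t) ** (-n1)
    inhibition = z * (1.0 + 1j * omega * r * t) ** (-n2)
    return s * abs(excitation - inhibition)

rates = [2, 4, 8, 16, 32]

# z = 0 removes the second (inhibitory) term: the curve is low-pass.
low_pass = [transfer_gain(w, s=1.0, t=0.005) for w in rates]

# A non-zero z suppresses low frequencies: the curve becomes band-pass.
band_pass = [transfer_gain(w, s=1.0, t=0.005, r=1.3, z=0.9) for w in rates]
```

With z set to zero the gain falls monotonically with flicker rate, whereas with strong inhibition the gain at 8 Hz exceeds the gain at the lowest rates, which is the qualitative distinction drawn between the identification and localization fits.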
Figure 7 shows the TCSF data for three observers. The character identification data are characterized by a low-pass function, since the peak sensitivity is at the lowest temporal frequency. Localization appears band-pass; the peak sensitivity occurs for temporal frequencies in the range of 6-8 Hz. The localization data require a model that assumes temporal inhibition (non-zero z parameter) but the character identification data do not. This was justified for the identification data by fitting the full model (which allows z and r to vary freely) to the identification data and comparing the resulting root-mean-squared error (RMSE) to the RMSE from the reduced model (where z was set to zero). Here the RMSE is corrected for the number of free parameters, such that the sum of squared errors (SSE) is divided by the number of data points minus the number of free parameters. For all of the identification data fits, setting z to zero produced an RMSE that was smaller than the RMSE from the full model. This can happen because the SSE is approximately the same in both model fits, but the full model has a smaller denominator, giving it a larger RMSE. Thus, the identification data do not require a model that assumes temporal inhibition, while the localization data do.
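The parameter-corrected RMSE comparison can be sketched as below. The SSE values and the count of seven thresholds are hypothetical and chosen only to show how a nearly unchanged SSE can yield a larger corrected RMSE for the full model; the parameter counts assume that the reduced model fits s and t and that the full model adds r and z.

```python
import math

def corrected_rmse(sse, n_points, n_params):
    """RMSE with the SSE divided by (data points - free parameters)."""
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need more data points than free parameters")
    return math.sqrt(sse / dof)

# Hypothetical fits to seven thresholds (not values from the paper).
reduced = corrected_rmse(sse=0.040, n_points=7, n_params=2)  # z fixed at zero
full = corrected_rmse(sse=0.039, n_points=7, n_params=4)     # r and z free

# The full model barely lowers the SSE, so its smaller denominator
# leaves it with the larger corrected RMSE.
prefer_reduced = reduced < full
```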
Direct comparisons with the Experiment 1 data are possible by computing the impulse response functions using the t, r and z parameters, which are directly comparable to the impulse response functions derived from the two-pulse data (Figure 4). Note that the assumptions of Eq. 1 and the values of t, r and z precisely determine the shape of the impulse response function. These comparisons reveal that the TCSF data replicate the Experiment 1 data: localization relies on higher temporal frequencies than character identification. In general the differences between the two tasks are less extreme than those observed 6° in the periphery in Experiment 1, but the current display device only allows presentations up to 2.7° from fixation. Given that no differences exist in the fovea (Experiment 2, Figure 5), we might expect smaller differences between the two tasks as the stimuli move toward the fovea.
The temporal inhibition required to model the localization data but not the identification data is a clear demonstration of the qualitative differences between the two tasks. Table 1 contains the estimated impulse response function parameters for Experiment 3, while Appendix A contains details of the parameter fits for all experiments.
Insert Figure 7 About Here
Experiment 4
One possible explanation for the differences observed in the temporal frequencies used by localization and character identification tasks in Experiments 1 and 3 is that the two tasks rely on different spatial frequencies. Such an explanation cannot readily account for the Experiment 2 data, since the stimuli in Experiments 1 and 2 were identical and yet only Experiment 1 demonstrates a difference between the two tasks. However, a more direct test of this possibility is to restrict the range of available spatial frequencies. The mechanisms that respond well to lower spatial frequency stimuli also tend to preserve higher temporal frequencies (e.g., Robson, 1966) and stimuli above 1.5 cycles per degree tend to produce monophasic rather than biphasic impulse response functions (Ohtani and Ejima, 1988). The numbers used for Experiment 3 were low-pass filtered to restrict the range of available spatial frequencies in Experiment 4. If the differences seen in the temporal domain in Experiments 1 and 3 result from different tasks relying on different spatial frequency bands, then restricting the spatial frequencies should also restrict the temporal frequencies.
The choice of the cutoff spatial frequency was determined according to the following logic. We want to restrict the range of available spatial frequencies. However, we also require that the letters remain identifiable as characters, in order to allow comparisons with previous experiments. If the letters no longer appeared character-like, one might argue that the stimuli are somehow processed differently by the higher cortical pathways, leading to different temporal frequencies used in the task. For example, cells along the temporal lobe pathway respond to increasingly complex visual patterns as one moves down the pathway, and these cells may also differ in their temporal frequency response as they combine inputs from earlier cells (Logothetis & Sheinberg, 1996). Thus, filtering the letters beyond legibility may result in a different class of cells responding to the stimuli.
To resolve the tension between restricting the frequencies and maintaining some degree of character legibility, we chose a cutoff frequency of 1.9 cycles per letter. Solomon and Pelli (1994) determined that letters are processed by a spatial filter one octave wide, centered at 3 cycles per letter (around 3.1 cycles per degree in their display). A cutoff frequency of 1.9 cycles per letter is 1/2 octave below 3 cycles per letter, suggesting that the filter mediating letter recognition was still partially activated. This cutoff left the characters barely legible when viewed on the display device, but containing a restricted range of spatial frequencies. Our letters are rather small due to the difficulty of presenting stimuli at a 250 Hz refresh rate, which results in filtered letters that contain spatial frequencies in the range of 1.6 to 5 cycles per degree or 0.61 to 1.9 cycles per letter. Figure 8 shows the stimuli used in Experiments 4 and 5.
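The correspondence between cycles per letter and cycles per degree for our stimuli can be checked with a short calculation. The sketch assumes that the relevant letter size is the 0.37° horizontal extent reported for the apparatus; the function name is ours.

```python
LETTER_WIDTH_DEG = 0.37  # horizontal extent of the letters (from the apparatus description)

def cycles_per_degree(cycles_per_letter, letter_width_deg=LETTER_WIDTH_DEG):
    """Convert an object-based spatial frequency to a retinal spatial frequency."""
    return cycles_per_letter / letter_width_deg

low_edge_cpd = cycles_per_degree(0.61)   # about 1.6 cycles per degree
high_edge_cpd = cycles_per_degree(1.9)   # about 5.1 cycles per degree
```

Under this assumption the 0.61 to 1.9 cycles per letter passband corresponds to roughly 1.6 to 5 cycles per degree, in agreement with the values given above.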
Insert Figure 8 About Here
Methods
Stimuli and Apparatus
The Experiment 4 stimuli were low-pass filtered using an ideal filter, which produces a rectangular response in the frequency domain. Although this was a low-pass operation, the small size of the stimulus produces a filtered image with an effective passband of 0.61 to 1.9 cycles per letter. This is important in that it demonstrates that the available spatial frequencies are restricted to a narrow range. To maintain consistency with other TCSF experiments with periodic stimuli, the formula for contrast was changed to contrast = (Lmax - Lmin)/(Lmax + Lmin). This is the Michelson contrast.
Design and Procedure
The procedures were identical to those of Experiment 3, except for the use of low-pass, spatially filtered letters.
A control experiment used the same Quest adaptive threshold techniques used in Experiments 1 and 2 to verify that the method-of-adjustment thresholds were not biased. Separate thresholds were found for each task at each temporal frequency as in Experiments 1 and 2. However, unlike previous experiments, participants made only localization or only identification responses on each trial; trials were blocked so that a series of trials consisted of a single task.
Participants
Two participants, the author and a graduate student, completed 4 replications of each threshold. Observer TB also completed 80 trials at each task and temporal frequency condition in the forced-choice control experiment.
Results and Discussion
Figure 9 shows the TCSF data for two observers for Experiment 4. The display device could not support luminances high enough to obtain a measurement for identification at 32 Hz, and so this point is plotted as an arrow at the maximum allowable contrast. Despite the fact that we have severely limited the range of spatial frequencies to the lowest frequencies that still provide character legibility, we still see differences in the patterns of temporal frequencies used in the localization and character identification tasks. The shapes of the TCSFs mirror those of Experiment 3 (Figure 7). As in Experiment 3, the temporal inhibition required to model the localization data but not the identification data is a clear demonstration of the qualitative differences between the two tasks. Table 1 contains the estimated impulse response function parameters for Experiment 4. When z and r were allowed to vary freely for the identification data, the resulting fit produced an RMSE that was larger than or equivalent to that of the model obtained by setting z to zero, demonstrating that the additional two free parameters do not help the identification model fits and that the data do not show evidence of transience.
Insert Figure 9 About Here
An important control on the subjective contrast threshold measurements used in Experiments 3-5 is the use of forced-choice techniques. Derrington and Henning (1980) used forced-choice techniques and failed to replicate earlier findings by Kulikowski and Tolhurst (1973) that dissociated pattern and flicker perception mechanisms. Derrington and Henning (1980) determined that absolute identification performance lies far below the subjective threshold, and that such differences might have contributed to Kulikowski and Tolhurst's report that pattern perception relies on different information than flicker perception. To verify that this is not a problem for the current TCSF studies, we measured absolute contrast thresholds for both localization and identification using a forced-choice paradigm.
The Quest procedures used in Experiments 1 and 2 were adapted to the TCSF paradigm and the low-pass filtered letters of Experiment 4 to verify that the differences between localization and character identification are not simply a result of the use of subjective thresholds. Data from Observer TB are shown in Figure 10, and demonstrate the same qualitative pattern observed in Figure 9. The data contain more noise than those from the subjective threshold technique, but a clear loss in sensitivity at the mid and high temporal frequencies is observed in the character identification data relative to the localization data (the dark dotted line in Figure 10). Thus the differences in the temporal frequencies used in the localization and identification tasks are not a result of the experimental methods employed in Experiments 3-6.
Insert Figure 10 About Here
Based on the differences seen between the TCSFs of localization and identification in the Experiment 4 data, we conclude that differences in the spatial domain are not sufficient to account for the observed differences across tasks in the temporal domain. In addition, the low-pass nature of the temporal frequencies used in character identification and the band-pass nature of the temporal frequencies used in localization, as seen in Figure 7 and Figure 9, are not due to the use of subjective thresholds, since the same conclusions are reached using forced-choice techniques (Figure 10).
Experiment 5
Experiment 4 demonstrates that restricting the range of spatial frequencies to the lowest spatial frequencies still provides evidence that localization relies on higher temporal frequencies than does character identification. This result is consistent with a set of labeled detectors tuned to low spatial frequencies that passes higher temporal frequencies and primarily supports localization, and a set of labeled detectors that gives some response to the low spatial frequencies and passes just slower temporal frequencies to support character identification. One might ask whether these differences still exist when the range of spatial frequencies is restricted to just higher spatial frequencies. Under these conditions we might no longer see evidence of the contribution of the fast detectors tuned to just lower spatial frequencies.
Experiment 5 used band-pass filtered letters that restricted the range of spatial frequencies to an octave-wide passband centered on 3 cycles per letter. The filter included the spatial frequencies in the range of 6 to 9.5 cycles per degree or 2.3 to 3.6 cycles per letter. Figure 8 shows the stimuli used in Experiment 5.
Methods
The methods and observers were identical to Experiment 4, except that the stimuli were spatially band-pass filtered rather than low-pass filtered. This filtering reduced the power of the stimulus, and as a result the display device could not support the high luminance levels required to fit contrast thresholds at 24 and 32 Hz. These conditions were eliminated for both observers in Experiment 5.
Results and Discussion
Figure 11 shows the results from Experiment 5 for two observers. Unlike the data from Experiments 1, 3 and 4, the two tasks appear to rely on the same temporal frequencies. Table 1 shows the impulse response function parameters, which are almost identical across tasks for the two observers. The data show a peak sensitivity at the lowest measured flicker rate, implying that the mechanisms that process these stimuli were temporally low-pass and did not include temporal inhibition. This finding is consistent with two-pulse studies that show low-pass characteristics for stimuli containing just higher spatial frequencies (Ikeda, 1986).
Insert Figure 11 About Here
The similar TCSFs for localization and identification seen in the Experiment 5 data are consistent with the notion that the differences between localization and identification tasks seen in the earlier experiments derive from contributions of different sets of visual spatio-temporal channels, and are not a result of inherent differences in the tasks. When the inputs from the fast labeled detectors are available from stimuli containing lower spatial frequencies (as in Experiments 1, 3 and 4), localization but not character identification can take advantage of this information. If not, both tasks must rely on the inputs from the labeled detectors that are sensitive to the higher spatial frequencies, which only pass slower temporal frequencies and thus give the sustained TCSF curves observed in Figure 11.
Experiment 6
Contrasting Experiments 1 and 2, we see that in the periphery, localization relies on higher temporal frequencies than identification, but in the fovea, detection relies on the same temporal frequencies as identification. This comparison confounds retinal location (periphery vs. fovea) with task (localization vs. detection). To separate these two factors, Experiment 6 moves the low-pass filtered stimuli of Experiment 4 into the near-fovea, centered 0.35° left or right of the fixation point. This is as close as the letters could be moved without introducing lateral interference from the fixation point. Stimuli, procedures and observers were identical to those of Experiment 4. If we can attribute the differences between localization and identification tasks seen in Experiments 1, 3 and 4 to the peripheral presentations, then we should see no differences between the two tasks when the stimuli are moved into the near-fovea. However, if the localization task is processed differently from the two-temporal-interval detection task of Experiment 2, then the differences seen in the periphery in Experiments 1, 3 and 4 will persist for the near-foveal presentations.
Methods
The methods and observers were identical to Experiment 4, except that the center of the filtered letter appeared 0.35° from the fovea. The near edge of the letter appeared 0.12° from the fixation point and the far edge appeared 0.47° from the fixation point.
Results and Discussion
Figure 12 shows the results from Experiment 6 for two observers. Consistent with Experiments 1, 3 and 4, we see small but consistent differences between the two conditions for both observers. Both conditions required the use of temporal inhibition for both observers, which is consistent with Experiment 1.
Insert Figure 12 About Here
The differences in the temporal frequencies used in each task are much smaller than observed in Experiments 1, 3 and 4, and one might consider whether a model with a single set of impulse response function parameters could account for both localization and identification data. In particular, we see that Observer TB has values of t that are quite similar, and Localization actually gives a larger value than Identification, suggesting that slower temporal frequencies underlie Localization. However, the temporal frequencies underlying a given task are determined by all three impulse response function parameters, which can trade off in various ways. In particular, large values of z can introduce enough temporal inhibition such that the entire impulse response function might pass higher temporal frequencies despite a larger value of t. We addressed the issue of whether a single set of temporal frequency parameters could account for both tasks in two ways.
Nested Hypothesis Testing
In the first approach, we used a nested hypothesis testing procedure that fit the identification data using the temporal parameters obtained from the localization data. Since only the s sensitivity parameter was allowed to vary, this amounts to vertically shifting the localization curve downward. When the identification fit was constrained by the temporal parameters from the localization data, the RMSE of the best fit for both observers was substantially larger than that of the fit assuming separate t, r and z parameters for identification. This results primarily from the constrained model's prediction of much worse sensitivity at the low temporal frequencies than was obtained in the identification data. The full model (with separately estimated parameters for identification) produces a significantly better fit as revealed by analysis of variance tests. For Observer TB, F(3,3) = 9.75, p < 0.05; for Observer NB, F(3,3) = 84.7, p < 0.05. These analyses support the conclusion that localization and identification rely on different temporal frequencies, despite the near-foveal presentations.
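The F statistic for a nested comparison of this kind can be sketched as follows. The SSE values are hypothetical. The count of seven thresholds is also an assumption, chosen because a constrained model with one free parameter (s) and a full model with four (s, t, r and z) then give the (3, 3) degrees of freedom reported above.

```python
def nested_f(sse_reduced, sse_full, n_points, k_reduced, k_full):
    """F statistic comparing a constrained model with the full model it is nested in.

    Returns (F, numerator df, denominator df).
    """
    df_num = k_full - k_reduced   # parameters freed in the full model
    df_den = n_points - k_full    # residual degrees of freedom of the full model
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    f_value = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_value, df_num, df_den

# Hypothetical SSEs: identification data fit with localization's t, r and z
# (only s free) versus separately estimated t, r and z.
f_value, df_num, df_den = nested_f(
    sse_reduced=0.50, sse_full=0.05, n_points=7, k_reduced=1, k_full=4
)
```

The resulting F value is then compared with the critical value of the F distribution for the given degrees of freedom; that lookup is omitted here.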
Monte Carlo Simulations
A second approach to the question of whether the two tasks provide different temporal frequency estimates comes from estimates of the variability of the t, r and z parameters. These parameters trade off to some degree, and together determine the range of temporal frequencies underlying a task. A distribution of impulse response function parameters was created by using the variability estimates of the thresholds that come from the standard error of the mean (SEM) for each threshold. The procedure went as follows. From the threshold estimates and the SEM for each threshold, a new set of threshold estimates was constructed by sampling from a normal distribution with mean equal to each threshold and standard deviation equal to the SEM for that threshold. This created a new set of 8 thresholds for each task. The parameter fitting procedures were then applied to this new set of thresholds and the values of t, r, z and s were recorded. This procedure was repeated 1000 times for each task. Plotting the obtained values of t, r and z in a three-dimensional plot, as shown in Figure 13, demonstrates that while the two tasks overlap in some dimensions (e.g., the t dimension), they clearly provide estimates of t, r and z that lie in different regions of parameter space. Thus we conclude that Observer TB demonstrates evidence of different temporal frequencies used in the localization and identification tasks.
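The resampling procedure can be sketched as below. The threshold means and SEMs are hypothetical, and toy_fit is only a placeholder standing in for the actual parameter fitting procedure, which would return t, r, z and s for each simulated set of thresholds.

```python
import random

def resample_thresholds(means, sems, rng):
    """Draw one simulated threshold per condition from Normal(mean, SEM)."""
    return [rng.gauss(m, s) for m, s in zip(means, sems)]

def monte_carlo_fits(means, sems, fit, n_iter=1000, seed=0):
    """Apply a fitting function to n_iter resampled sets of thresholds."""
    rng = random.Random(seed)
    return [fit(resample_thresholds(means, sems, rng)) for _ in range(n_iter)]

def toy_fit(thresholds):
    """Placeholder for the real fit: returns a single summary value."""
    return (sum(thresholds) / len(thresholds),)

# Hypothetical thresholds and SEMs for 8 temporal frequencies (not real data).
means = [0.05, 0.04, 0.03, 0.03, 0.05, 0.09, 0.20, 0.45]
sems = [0.005] * 8

fits = monte_carlo_fits(means, sems, toy_fit)  # 1000 simulated parameter sets
```

Running the procedure separately for the localization and identification thresholds yields two clouds of parameter estimates whose overlap can then be inspected.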
Insert Figure 13 About Here
TCSF Discussion
Experiments 3-6 confirm the findings from Experiment 1, and define the conditions under which detecting a letter in one of two peripheral locations relies on higher temporal frequencies than identifying the letter. If lower spatial frequencies are present in the stimulus, the localization task can rely on higher temporal frequencies than the identification task. If the letter includes just higher spatial frequencies, the localization and identification tasks appear to rely on the same temporal frequencies.
General Discussion
Converging evidence from the two-pulse and TCSF paradigms demonstrates that, under certain circumstances, localization relies on higher temporal frequencies than identification. With peripheral presentations, localization can take advantage of higher temporal frequencies than those used in character identification, while in foveal presentations detection appears to rely on the same temporal frequencies as character identification. As the stimulus moves into the fovea from the periphery, localization relies on progressively lower temporal frequencies, while identification appears to rely on the same temporal frequencies throughout. These peripheral differences persist across paradigms (two-pulse and TCSF), in spatially filtered letters that contain lower spatial frequencies, and in localization tasks in the near-fovea. The detection results in the fovea are consistent with those reported by Derrington & Henning (1980), who found no differences in the temporal frequencies underlying detection and identification using forced-choice techniques.
There are two other possible explanations for the differences in temporal frequencies seen in localization and identification (Experiment 1, Figure 3) that need to be ruled out. First, since the stimuli in Experiments 1 and 2 were the same size, cortical magnification might play a role in shifts in the temporal frequency parameters of the localization data as the letter was moved into the periphery. Second, despite the fact that we severely restricted the range of spatial frequencies available in Experiments 4 and 6, a range of spatial frequencies was still available and the mechanisms responsible for localization may have been able to take advantage of the lower spatial frequencies. These two possibilities were investigated in control experiments performed by Observer TB. Except for differences described below, the methods and equipment were identical to those used to measure the Experiment 1 data.
Control Experiments
In order to roughly determine the effects of cortical magnification and the reduced acuity in the periphery, the size of the peripheral target used in Experiment 1 was doubled and two-pulse data were collected. As shown in Figure 14, the two-pulse data are quite similar to those of Experiment 1, suggesting that cortical magnification cannot account for the differences between Experiments 1 and 2. F-tests similar to those used in Experiment 6 demonstrate that the identification data are not well fit if the impulse response function parameters t, z and r are constrained to match those estimated from the localization data (F(3,7) = 6.50, p < 0.05). Thus the differences between localization and identification cannot result solely from cortical magnification effects.
Insert Figure 14 About Here
A second possible explanation for the differences seen peripherally for localization and identification is that the two tasks rely on different spatial frequencies. This possibility was addressed for letters using spatial filtering techniques in Experiments 4 and 6, and even though the spatial frequencies were severely restricted, differences between localization and identification persisted. However, in these experiments we preserved character legibility, which required leaving a range of available spatial frequencies. A sine-wave grating contains only a single spatial frequency, and in a control experiment the digit in Experiment 1 was replaced by a 1.5 cycle per degree (CPD) sine-wave grating oriented obliquely either left or right. The grating was placed in a Gaussian window of approximately 2.7° in extent. On each trial the observer indicated both whether the grating appeared left or right of fixation and whether it was oriented to the left or right. As shown in Figure 15, the data continue to show that localization depends upon higher temporal frequencies than identification. The differences between the two tasks were confirmed using hierarchical model fitting techniques (F(3,7) = 7.70, p < 0.05). This rules out the possibility that the differences in the temporal domain are due only to differences in the spatial domain.
Insert Figure 15 About Here
Although it is difficult to draw conclusions about the anatomy based on behavioral data, the present results can be used to suggest physiological experiments that do have the ability to address neurophysiological questions. Functional differences in the temporal domain appear as early as area V1: Hawken et al. (1996) report that direction selective cells in area V1 all maintain higher temporal frequencies, while non-direction selective cells are mixed in the range of temporal frequencies that cause the cell to fire. These early differences may extend to later areas, since MT cells are overwhelmingly selective for direction (Felleman & Van Essen, 1987). Anatomical studies suggest that the MT and V4 areas may selectively influence the parietal and temporal lobe pathways respectively, although substantial mixing does occur (Merigan & Maunsell, 1993). In addition, although it appears that the direction selective cells from layers 4B and 6 of V1 project to area MT, the direction selectivity could be generated in area MT from contributions from non-direction selective neurons (Hawken, personal communication). Determining exactly where the dorsal or ventral pathways exhibit a loss in high temporal frequencies requires a systematic measurement of the temporal frequency information expressed at various points along the two pathways leading to the parietal and temporal lobes. However, the present results suggest that a difference will be found, and the data will demonstrate both the architecture of the visual system and how it has adapted to take advantage of the stimulus attributes most informative for a particular task.
Despite the fact that we have ruled out spatial frequency effects as the only explanation for the differences in the temporal domain between localization and identification, spatial frequency may still contribute to this difference. If this were indeed the case, it would have implications for any task, such as an illusory conjunction task (e.g., Treisman, 1996), in which location and identification appear to have different time-courses. Such differences may be in part due to a reliance on different spatio-temporal channels when processing location or identification information. One way to address the issue of whether localization can take advantage of lower spatial frequencies than identification would be to adopt the noise-masking paradigm of Solomon and Pelli (1994). They used ideal observer models in conjunction with low-pass or high-pass filtered random noise to quantify the range of spatial frequencies available in a stimulus. Although Solomon and Pelli used only a letter identification task, such procedures could easily be implemented for a localization task, in which the job of the ideal observer model is to determine whether any letter appears left or right of fixation. Interestingly, such a model would make identical predictions for the two-spatial-alternative task used for localization in Experiment 1 and the two-temporal-alternative task used for detection in Experiment 2. Thus it would have difficulty in predicting the shift in temporal frequencies observed between Experiments 1 and 2 for localization and detection. The advantage of this technique is that it can be used to measure the spatial frequencies used by human observers when performing localization, detection or identification tasks. We are currently pursuing such experiments to quantify the exact spatio-temporal characteristics of letter identification and localization.
References
Busey, T.A. (in press). Temporal inhibition in character identification. Accepted by Perception and Psychophysics.
Busey, T.A. & Loftus, G.R. (1994). Sensory and cognitive components of visual information acquisition. Psychological Review, 101, 446-469.
Corbetta, M., Miezin, F. M., Dobmeyer, S., Shulman, G. L., & Petersen, S.E. (1991). Selective and divided attention during visual discriminations of shape, color and speed: Functional anatomy by positron emission tomography. Journal of Neuroscience, 11, 2382-2402.
Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S.E. (1993). A PET study of visuospatial attention. Journal of Neuroscience, 13, 1202-1226.
Derrington, A. M., & Henning, G. B. (1980). Pattern discrimination with flickering stimuli. Vision Research, 21, 597-602.
Felleman, D. J. & Van Essen, D. C. (1987). Receptive field properties of neurons in area V3 of macaque monkey extrastriate cortex. Journal of Neurophysiology, 57, 889-920.
Gorea, A. (1986). Temporal integration characteristics in spatial frequency identification. Vision Research, 26, 511-515.
Hawken, M. J., Shapley, R.M., & Grosof, D. H. (1996). Temporal-frequency selectivity in monkey visual cortex. Visual Neuroscience, 13, 477-492.
Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., et al. (1991). Dissociation of object and spatial vision processing pathways in human extrastriate cortex. Proceedings of The National Academy of Science, USA, 88, 1621-1625.
Ikeda, M. (1965). Temporal summation of positive and negative flashes in the visual system. Journal of the Optical Society of America, 55(11), 1527-1534.
Ikeda, M. (1986). Temporal impulse response. Vision Research, 26, 1431-1440.
Kulikowski, J.J., & Tolhurst, D.J. (1973). Psychophysical evidence for sustained and transient mechanisms in human vision. Journal of Physiology, 232, 149-163.
Logothetis, N. K. & Sheinberg, D. L. (1996). Visual object recognition. Annual Review of Neuroscience, 19, 577-621.
McIntosh, A. R., Grady, C.L., Ungerleider, L.G., Haxby, J.V., Rapoport, S.I., & Horwitz, B. (1994). Network analysis of cortical visual pathways mapped with PET. The Journal of Neuroscience, 14, 655-666.
Merigan, W. H., & Maunsell, J. H. R. (1993). How parallel are the primate visual pathways? Annual Review of Neuroscience, 16, 369-402.
Motter, B. C. & Mountcastle, V. B. (1981). The functional properties of the light-sensitive neurons of the posterior parietal cortex studied in waking monkeys: foveal sparing and opponent vector organization. The Journal of Neuroscience, 1, 3-26.
Ohtani, Y., & Ejima, Y. (1988). Relation between flicker and two-pulse sensitivities for sinusoidal gratings. Vision Research, 28, 145-156.
Pelli, D. G., & Zhang, L. (1991). Accurate control of contrast on microcomputer displays. Vision Research, 31, 1337-1350.
Rashbass, C. (1970). The visibility of transient changes of luminance. Journal of Physiology, 210, 165-186.
Robson, J. G. (1966). Spatial and temporal contrast sensitivity functions of the visual system. Journal of the Optical Society of America, 56, 1141-1142.
Solomon, J. & Pelli, D. (1994). The visual filter mediating letter identification. Nature, 369, 395-397.
Thomas, J. P. (1987). Effect of eccentricity on the relationship between detection and identification. Journal of the Optical Society of America A, 4, 1599-1605.
Treisman, A. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171-178.
Ungerleider, L. G. & Mishkin, M. (1982). Two cortical visual systems. In D. Ingle, M. Goodale, & R. Mansfield (Eds.), Analysis of Visual Behavior (pp. 549-586). Cambridge, MA: MIT Press.
Watson, A. B., Ahumada, A.J. & Farrell, J.E. (1986). Window of visibility: A psychophysical theory of fidelity in time-sampled visual motion displays. Journal of the Optical Society of America A, 3, 300-307.
Watson, A. B. (1986). Temporal sensitivity. In K. R. Boff, L. Kaufman, & J.P. Thomas (Eds.), Handbook of Perception and Human Performance (Vol. I). New York: Wiley.
Watson, A. B., & Robson, J. G. (1981). Discrimination at threshold: Labeled detectors in human vision. Vision Research, 21, 1115-1122.
Watson, A.B., & Pelli, D.G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.
Appendix A - Parameter Fitting Procedures
All parameter fits were performed using gradient descent minimization techniques. In all fits, n1 was fixed to 9 and n2 was set to 10. Below we detail the model fits that were required to account for the data from each experiment. When computing RMSE, we used the formula:
$$\mathrm{RMSE} = \sqrt{\frac{SSE}{n - p}} \qquad \text{(Eq. A1)}$$
where SSE is the sum of squared errors, n is the number of datapoints and p is the number of free parameters in the model fit.
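As a minimal sketch of this computation, the function below assumes that Eq. A1 has the usual degrees-of-freedom-corrected form, the square root of SSE / (n - p). The function name `rmse` and its argument names are illustrative and do not come from the original fitting code.

```python
import math

def rmse(observed, predicted, n_free_params):
    """RMSE corrected for free parameters: sqrt(SSE / (n - p)).

    Assumption: this is the form of Eq. A1; names are hypothetical.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n <= n_free_params:
        raise ValueError("need more data points than free parameters")
    # Sum of squared errors between data and model predictions.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / (n - n_free_params))

# Made-up numbers for illustration only (not data from the experiments):
# SSE = 4 with n = 5 points and p = 1 free parameter.
example = rmse([1, 2, 3, 4, 5], [1, 2, 3, 4, 7], n_free_params=1)
```

Dividing by n - p rather than n penalizes models with more free parameters, which is what allows RMSEs from the reduced and full models in Table 1 to be compared on a more even footing.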
Experiment 1
None of the three observers could be fit by a model that assumed that localization and identification tasks differ only in the rate at which task-specific information is acquired. Thus for each observer we estimated separate impulse response function parameters (t, r and z) and information extraction rate parameters (cs) for each task.
Experiment 2
For all three observers at both background luminances, we could fit both the detection and identification data with a common set of impulse response function parameters (t, r and z) while allowing separate information extraction rate parameters (cs) for each task. The exception was Observer NQ at the brighter 20 cd/m² background level, for whom a model that assumed separate impulse response function parameters as well as a separate information extraction rate parameter gave a slightly better fit.
Experiments 3 and 4
The localization data demonstrate clear evidence of a fall-off at slower temporal frequencies, while the identification data do not. Thus we estimated separate impulse response function parameters (t, r and z) and sensitivity parameters (s) for the two tasks, setting z to zero for the identification task (which makes r irrelevant). The identification data were also fit by letting r and z vary freely, but the resulting RMSEs for these fits were all either larger than those of the reduced model or virtually identical, as shown in Table 1 under the heading Transient RMSE. Thus, introducing transience via the r and z parameters does not improve the model fits to the identification data, which justifies setting z to zero.
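The role of z and r can be illustrated with a short sketch. It assumes that the impulse response is a difference of two unit-area gamma filters in the style of Watson (1986), with time constant t for the excitatory lobe, r * t for the inhibitory lobe, and z scaling the inhibitory lobe. This functional form and the use of milliseconds for t are assumptions made here, since the model equations are not restated in this appendix. The parameter values are those listed in the Figure 2 caption, and the function names are hypothetical.

```python
import math

N1, N2 = 9, 10  # stage counts fixed in all fits (Appendix A)

def gamma_stage(t_ms, tau_ms, n):
    """Unit-area n-stage gamma filter evaluated at time t_ms."""
    if t_ms < 0:
        return 0.0
    x = t_ms / tau_ms
    return x ** (n - 1) * math.exp(-x) / (tau_ms * math.factorial(n - 1))

def impulse_response(t_ms, t, r=1.0, z=0.0):
    """Assumed form: excitatory lobe minus z times a slower inhibitory lobe."""
    return gamma_stage(t_ms, t, N1) - z * gamma_stage(t_ms, r * t, N2)

def tcsf_gain(freq_hz, t, r=1.0, z=0.0):
    """Amplitude of the Fourier transform of impulse_response at freq_hz.

    Uses the closed form (1 + i*2*pi*f*tau)**(-n) for each gamma filter.
    """
    w = 2j * math.pi * freq_hz / 1000.0  # t is assumed to be in ms
    h = (1 + w * t) ** (-N1) - z * (1 + w * r * t) ** (-N2)
    return abs(h)

freqs = [0.0, 1, 2, 4, 8, 16, 32]
# Monophasic (z = 0): gain falls monotonically, i.e. a low-pass TCSF.
mono = [tcsf_gain(f, t=4.38) for f in freqs]
# Biphasic (z > 0): gain at 0 Hz is reduced to 1 - z, giving a bandpass TCSF.
bi = [tcsf_gain(f, t=3.58, r=2.0, z=0.39) for f in freqs]
```

Under this assumed form, the inhibitory term is multiplied by z, so with z = 0 the value of r cannot affect the predictions, and the low-frequency fall-off disappears.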
Experiment 5
Neither observer demonstrated evidence of temporal inhibition in this Experiment, and thus we estimated an impulse response function parameter (t) and a sensitivity parameter (s) for each task. The estimated values of t for the two tasks were virtually identical for the two observers, and thus a model that assumes a common t for both tasks for each observer will account for the data.
Experiment 6
Both observers demonstrated evidence of temporal inhibition in this Experiment for both tasks, and thus we estimated impulse response function parameters (t, r and z) and sensitivity parameters (s) for each task. We also fit a version of the model that fixes the impulse response function parameters (t, r and z) at the values estimated from the localization data and tries to fit the identification data by varying only the sensitivity parameter (s). This produces the fits labeled RMSE-Loc. IRF params. These fits are quite poor, as evidenced by the large increase in RMSE and by hierarchical model testing.
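The hierarchical comparison can be sketched as follows, assuming it takes the form of the standard F test for nested least-squares models. The appendix does not spell out the test, so this choice is an assumption. The function name and the example numbers are hypothetical and are not taken from Table 1.

```python
def nested_f_statistic(sse_reduced, p_reduced, sse_full, p_full, n):
    """F statistic for a reduced model nested within a full model.

    Assumption: the standard nested-model F test; the appendix does not
    specify which hierarchical test was used.
    """
    if p_full <= p_reduced:
        raise ValueError("the full model must have more free parameters")
    if n <= p_full:
        raise ValueError("need more data points than free parameters")
    df_num = p_full - p_reduced  # parameters freed in the full model
    df_den = n - p_full          # residual degrees of freedom, full model
    f_value = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_value, df_num, df_den

# Made-up values for illustration: a reduced model with only s free
# (p = 1) against a full model with t, r, z and s free (p = 4).
f_value, df_num, df_den = nested_f_statistic(
    sse_reduced=12.0, p_reduced=1, sse_full=3.0, p_full=4, n=16
)
# f_value is 12.0 on (3, 12) degrees of freedom
```

A large F value relative to the F distribution with (df_num, df_den) degrees of freedom indicates that the extra parameters of the full model are needed, which corresponds to the poor fit of the reduced model described above.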
Table 1. Parameter fits for all Experiments. See Appendix A for details of parameter fits.

Figures
Figure 1. Theoretical components of the linear filter model of character identification. The physical stimulus is characterized as contrast over time (Inset A), which is convolved with a hypothetical impulse response function (Inset B) that determines the amount of persistence or, equivalently, the range of temporal frequencies available in the response to the stimulus. The convolution produces the sensory response function (solid curve), which is assumed to represent the perceptual time-course of the response in the visual system. In the two-pulse task, each pulse produces a response (dashed curves), and the interaction between the responses as the ISI is varied provides a measure of the temporal response properties of the system that processes location or identity information.
Figure 2. The temporal frequencies underlying a task may be characterized by an impulse response function (left panel), which characterizes the time course of the perceptual response engendered by a stimulus, or by the temporal contrast sensitivity function (right panel), which characterizes the fidelity with which the pathways subserving a given task pass different temporal frequencies. The TCSF plots are the Fourier transforms of the impulse response functions. High spatial frequency stimuli tend to elicit monophasic impulse response functions, which have no falloff at low temporal frequencies in the TCSF plot. Stimuli containing low spatial frequencies tend to elicit biphasic impulse response functions that contain an inhibitory lobe. Parameters used: Monophasic: {t = 4.38, z = 0} Biphasic: {t = 3.58, r = 2.0, z = 0.39}
Figure 3. Two-pulse data from Experiment 1 for 3 observers 6° in the periphery. For the localization data, the positive/positive and positive/negative data cross at much shorter ISIs (0-5 ms) than for the identification data (15-20 ms). These data require a model that assumes that the localization task relies on higher temporal frequencies than the identification task.
Figure 4. Estimated impulse response functions for Localization and Identification Data from Experiment 1. These demonstrate that the localization task relies on higher temporal frequencies than identification, and allow comparisons across tasks.
Figure 5. Data for 3 observers at two background luminances for Experiment 2, measured in the fovea. No observer demonstrates differences across tasks, suggesting that the two tasks rely on the same temporal frequencies. The data were well fit by a model that assumed a single set of impulse response parameters (t, r and z) for both tasks. The exception is Observer NQ's data at the higher 20 cd/m² background luminance, which could be fit slightly better by a model with separate linear filter parameters.
Figure 6. Example temporal waveforms used in Experiments 3-5. Observers scaled the contrast of these waveforms to produce an estimated contrast threshold.
Figure 7. Temporal contrast sensitivity functions (TCSFs) for three observers for Experiment 3 measured 2.7° or 2.3° in the periphery. The localization data are bandpass, while the character identification data are lowpass. Observer TB measured contrast thresholds at 2.7° and 2.3° in the periphery, demonstrating that as the stimulus moves into the fovea, the location detection task becomes more low-pass. The insets show the recovered impulse response parameters, which can be compared with those from Experiment 1 (Figure 4) to demonstrate that the two paradigms generate the same pattern of data: localization relies on higher temporal frequencies than identification. Note that the 32 Hz Identification data could not be measured due to limitations of the display device, and the symbol with the arrow indicates that the true threshold is somewhere below the marked limit.
Figure 8. Stimuli used in Experiments 4 and 6 (top panels) and Experiment 5 (lower panels). The stimuli are enlarged somewhat, which results in the introduction of spurious high frequencies that are not present in the actual displays.
Figure 9. Temporal contrast sensitivity functions (TCSFs) for low-pass filtered letters for Experiment 4 measured 2.7° in the periphery. Localization might rely on lower spatial frequencies than identification in Experiment 3, and this might result in the Figure 7 data. The letters used in the Figure 9 data were lowpass filtered to restrict the range of spatial frequencies. Despite this, the data replicate the Experiments 1 and 3 data, demonstrating that localization relies on higher temporal frequencies than identification.
Figure 10. Temporal contrast sensitivity function for Observer TB using the QUEST threshold-finding procedures from Experiments 1 and 2. Although the data are somewhat noisier than the subjective task data, these objective procedures replicate the Figure 9 data. The dark line is the localization data shifted vertically, demonstrating the falloff of identification performance at higher temporal frequencies. Thus, the subjective techniques used in Experiments 3-6 do not affect the conclusions.
Figure 11. Temporal contrast sensitivity functions (TCSFs) for bandpass-filtered letters for Experiment 5. When the spatial frequencies are restricted to higher spatial frequencies, no differences are observed in the temporal frequencies underlying the localization and identification tasks.
Figure 12. Temporal contrast sensitivity functions (TCSFs) for lowpass-filtered letters presented in the near-fovea in Experiment 6. Although differences across tasks are not as pronounced as in Experiments 3 and 4, localization and identification rely on different temporal frequencies for near-foveal presentations. Observer NB's peak frequency differs in the two tasks (8 Hz for Localization vs. 4 Hz for Identification). Observer TB has the same peak frequency for both tasks, but less fall-off at lower temporal frequencies for the Identification task, as can be seen by the vertical translation of the Localization data (thick dashed curve in the Left Panel).
Figure 13. Parameter space for Localization and Identification tasks for Observer TB for Experiment 1. See text for explanation.
Figure 14. Two-pulse data for a larger letter presented peripherally. The data are similar to those of Figure 3, demonstrating that cortical magnification issues cannot account for the differences seen between Experiments 1 and 2. Hierarchical model testing verifies the reliability of this difference.
Figure 15. Two-pulse data for an oriented 1.5 CPD sine-wave grating. Despite containing only a single spatial frequency, the localization data still demonstrate higher temporal frequencies than the identification data. Hierarchical model fitting confirms this difference as reliable.