A neural basis for real-world visual search in human occipitotemporal cortex

Marius V. Peelen and Sabine Kastner

Center for Mind/Brain Sciences, University of Trento, 38068 Rovereto, Italy; Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544

Mammals are highly skilled in rapidly detecting objects in cluttered natural environments, a skill necessary for survival. What are the neural mechanisms mediating detection of objects in natural scenes? Here, we use human brain imaging to address the role of top-down preparatory processes in the detection of familiar object categories in real-world environments. Brain activity was measured while participants were preparing to detect highly variable depictions of people or cars in natural scenes that were new to the participants. The preparation to detect objects of the target category, in the absence of visual input, evoked activity patterns in visual cortex that resembled the response to actual exemplars of the target category. Importantly, the selectivity of multivoxel preparatory activity patterns in object-selective cortex (OSC) predicted target detection performance. By contrast, preparatory activity in early visual cortex (V1) was negatively related to search performance. Additional behavioral results suggested that the dissociation between OSC and V1 reflected the use of different search strategies, linking OSC preparatory activity to relatively abstract search preparation and V1 to more specific imagery-like preparation. Finally, whole-brain searchlight analyses revealed that, in addition to OSC, response patterns in medial prefrontal cortex distinguished the target categories based on the search cues alone, suggesting that this region may constitute a top-down source of preparatory activity observed in visual cortex. These results indicate that in naturalistic situations, when the precise visual characteristics of target objects are not known in advance, preparatory activity at higher levels of the visual hierarchy selectively mediates visual search.

attention | categorization | natural vision | object detection

The selection of complex stimuli, such as objects, from cluttered environments presents a complicated problem: during real-world visual search, objects are present at unspecified locations and may have an almost infinite number of possible visual appearances. Despite these difficulties, the detection of highly familiar object categories (e.g., people or cars) in natural scenes has been shown to be remarkably fast (1). Surprisingly, such detection is even possible in the near absence of spatial attention, in stark contrast to the detection of simple feature conjunctions (e.g., discriminating "T" from "L") under comparable task conditions (2). On the basis of these and other findings, it has been suggested that the mechanisms underlying visual search for familiar objects in natural scenes may differ from those underlying visual search for simple shapes in artificial displays, such as those typically studied in the laboratory (3).

In the present study, we investigated the brain mechanisms mediating the selection of objects in real-world scenes. Specifically, we addressed the role and nature of top-down preparatory processes in the detection of familiar object categories in daily-life environments. Theoretical accounts of visual search hold that top-down preparation is an important component of efficient target detection (4, 5). Indeed, most items in a complex scene are not fully processed without top-down attention, as shown, for example, in the change blindness paradigm (6). At a neural level, it has been suggested that search preparation may be mediated by activity of

neurons that are directly involved in the perceptual discrimination of targets from distractors (7). Evidence for this hypothesis comes from studies in which participants detected a specific shape at a small number of possible locations (8–10). Neurons in monkey inferotemporal (IT) cortex that were activated by a specified target shape also responded during the search preparation phase (9). In humans, several functional magnetic resonance imaging (fMRI) studies have demonstrated that perceptual expectation of a centrally presented isolated object gives rise to object-specific preparatory activity in visual cortex (10–12), and that such activity facilitates subsequent processing of the anticipated object.

These findings raise the possibility that similar anticipatory mechanisms may mediate visual search. However, in these previous fMRI studies, objects were presented centrally and the shape of the anticipated objects (e.g., a full-front view of a face) was known in advance, making it likely that these effects reflected explicit visual imagery during the preparation phase. By contrast, during visual search in natural vision, targets are located at unspecified locations in cluttered scenes, and are experienced in unexpected and novel viewing conditions. In other words, in real-world visual search the precise visual characteristics and location of target objects are not known in advance and thus cannot be precisely imagined. Therefore, the preparation mechanisms mediating the detection of target objects in real-world environments remain largely unknown.

In the present study, we measured brain activity using fMRI while participants detected (and prepared to detect) people or cars in natural scenes. We found that search preparation by itself evoked category-specific activity patterns in visual cortex. Importantly, category-specific preparatory activity in object-selective cortex (OSC) greatly facilitated the detection of objects, whereas such activity in early visual cortex (V1) hindered object detection. Behavioral results indicated that this dissociation might have reflected the use of different search strategies, with a more abstract search strategy being more effective than a specific imagery-like strategy. Finally, whole-brain searchlight analyses showed that response patterns in medial prefrontal cortex, like those in OSC, distinguished both the target categories present in the scenes and the categories indicated by the search cues alone, suggesting that this region may constitute the top-down source of the category-selective preparatory activity observed in OSC.

Results

Fourteen participants performed a scene categorization task involving the detection of people or cars in a large set of natural scenes that were new to the participants. The set consisted of heterogeneous photographs of cityscapes and landscapes with a subset containing people or cars that appeared in diverse locations,

Author contributions: M.V.P. and S.K. designed research; M.V.P. performed research; M.V.P. analyzed data; and M.V.P. and S.K. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1101042108/-/DCSupplemental.

PNAS Early Edition | 1 of 6

NEUROSCIENCE

Edited by Robert Desimone, Massachusetts Institute of Technology, Cambridge, MA, and approved June 15, 2011 (received for review January 20, 2011)

sizes, shapes, and viewpoints (Fig. S1). On each trial, a symbolic cue indicated the target category. On 2 out of 3 trials, the cue was followed, after a delay, by a briefly (100 ms) presented and subsequently masked scene. Critically, on 1 of 3 trials, no scene was presented, allowing for the isolation of preparatory activity in the absence of visual input (Fig. 1A). Data were analyzed using multivoxel pattern analysis, a technique that has been successfully used to reveal object category coding in extrastriate visual cortex (13, 14).

In a first analysis, responses evoked by the scene stimuli were modeled to probe (i) whether the people and cars embedded in the scenes were processed up to the category level despite the short presentation duration, diverse visual appearance, and cluttered background, and (ii) whether this processing was modulated by the task relevance of a given category. To measure categorical processing of objects, activity patterns evoked by the natural scenes were correlated with activity patterns evoked by isolated pictures of people and cars, presented in a separate category localizer experiment (Materials and Methods). Within OSC (Fig. 1B), defined based on its preference for intact compared with scrambled objects (Materials and Methods), activity patterns to the scenes containing people or cars correlated more strongly with the matching than the nonmatching categories (people or cars, respectively) from the category localizer (t13 = 5.7, P < 0.0001; Fig. 2). Furthermore, this "category information" was modulated by task relevance, with greater information regarding task-relevant than task-irrelevant objects (t13 = 2.6, P < 0.05; Fig. 2). This result indicates that the processing of the scenes was biased toward the task-relevant category, confirming a previous report (14). By contrast to OSC, no information about the two types of scenes was represented in the voxel patterns of early visual cortex (Materials and Methods): V1 (t13 = −0.3; Fig. 2) or V2/V3 (t13 = 0.3, P = 0.8; Fig. 2). There was also no modulation of category information as a function of task relevance in V1 (t13 = −1.4; Fig. 2) or V2/V3 (t13 = 1.3, P = 0.2; Fig. 2), showing that category-based attentional modulation was specific to OSC.

Fig. 1. Experimental design and analytical approach. (A) Schematic of trial structure. Participants were instructed to detect either people or cars in briefly presented natural scenes. A symbolic cue (a letter or number) indicated the target category on a trial-by-trial basis. On a proportion of trials (33%) the cue, but no scene stimulus, was presented, to isolate responses to the cue itself ("cue-only trial"). (B) OSC was localized in each individual participant by contrasting activity to intact versus scrambled objects, presented in a separate experiment. Voxels activated by this contrast were selected for multivoxel pattern analysis. (C) Multivoxel response patterns to the cue-only trials (people and cars cues) in the main experiment were correlated with response patterns evoked by exemplar pictures of people and cars, presented without other visual context in a separate category localizer experiment. The between-category correlations (diagonal comparisons) were subtracted from the within-category correlations (horizontal comparisons) to estimate the category specificity of cue-related activity.

Next, we analyzed the trials during which no scene stimulus followed the cue presentation, modeling responses evoked by the symbolic cues in the absence of visual input. These analyses were performed to investigate the neural correlates of different preparatory states. Response patterns evoked by the "detect people" and "detect cars" cues were compared with response patterns evoked by actual pictures of people and cars from the independent category localizer (Fig. 1C). A greater similarity between response patterns of matching categories than those of nonmatching categories would provide evidence for content-specific preparatory activity in visual cortex. Confirming this hypothesis, OSC showed category-specific activity patterns in response to the cues (t13 = 2.0, P = 0.06; Fig. 3A), indicating that the symbolic search instruction alone activated category-selective populations in OSC. Unexpectedly, a similar effect was found in V1 (t13 = 2.9, P < 0.05; Fig. 3A), even though this region did not discriminate between the scenes containing cars and people (Fig. 2). No significant category-specific preparatory activity was found in V2/V3 (Fig. 3A), or in face-, body-, and scene-selective regions of visual cortex (P > 0.4, for all tests; Fig. S3A).
Finally, although multivoxel activity patterns discriminated the two cue types in OSC and V1, the overall amplitude of activity in these and other ROIs did not (P > 0.3, for all tests; Fig. S4).

Subsequently, we tested whether the content specificity (i.e., being more "car-like" or more "person-like") of the preparatory activity on cue-only trials in V1 and OSC was related, across participants, to the speed and accuracy with which targets were detected on the cue-plus-scene trials. Importantly, the category specificity of cue-related responses in OSC was positively correlated with accuracy [accuracy (acc): r = 0.70, P < 0.01; Fig. 4A] and negatively correlated with response time [RT: r = −0.89,

Fig. 2. Multivoxel category information in visual cortex. Activity patterns to the scenes containing people or cars were correlated with activity patterns to the isolated pictures of people and cars presented in the category localizer. Category information was defined as the mean difference between matching (e.g., people–people) and nonmatching (e.g., people–cars) correlations. OSC activity patterns contained significant information about the object categories embedded in the natural scenes, both for task-relevant objects (P < 0.00001) and task-irrelevant objects (P < 0.05). Significantly more information was observed for task-relevant (target) objects than for task-irrelevant objects (P < 0.05). No significant category information was present in V1 or V2/V3 activity patterns, and there was no significant modulation by task relevance in these ROIs. See Fig. S2 for results in face-, body-, and scene-selective regions of interest.
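The category-information measure described above reduces to a few lines of code. The sketch below is a minimal, assumed implementation (function and variable names are ours; the study's actual pipeline used AFNI and MATLAB): each argument is a one-dimensional array holding one response value per voxel of the ROI.

```python
import numpy as np

def category_information(scene_people, scene_cars, loc_people, loc_cars):
    """Within-minus-between correlation of multivoxel patterns:
    correlate scene-evoked patterns with localizer patterns for the
    matching category (people-people, cars-cars) and for the
    nonmatching category (people-cars, cars-people), and return
    the difference of the two mean correlations."""
    def corr(a, b):
        # Pearson correlation across voxels
        return np.corrcoef(a, b)[0, 1]

    within = (corr(scene_people, loc_people) + corr(scene_cars, loc_cars)) / 2
    between = (corr(scene_people, loc_cars) + corr(scene_cars, loc_people)) / 2
    return within - between
```

A positive value indicates that patterns of the same category are more similar than patterns of different categories. The same quantity, computed from cue-only patterns instead of scene-evoked patterns, corresponds to the category-specific cue effect of Fig. 1C.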


P < 0.0001; Fig. 4B], indicating that content-specific preparatory activity in OSC was directly related to the speed and accuracy with which objects were detected. By contrast, the category specificity of cue-related responses in V1 was negatively correlated with accuracy (acc: r = −0.53, P < 0.05; Fig. 4A) and positively correlated with RT (r = 0.19, P = 0.5; Fig. 4B), suggesting that in early visual regions content-specific preparatory activity adversely affected object detection. The dissociation between OSC and V1 was confirmed in an additional analysis in which participants were divided into two groups based on their behavioral performance (Fig. 3B).

These results suggest that different participants may have used different search strategies: an effective preparation strategy reflected in OSC activity, or an ineffective preparation strategy reflected in V1 activity. What could be the difference between effective and ineffective strategies? Effective preparation likely consists of anticipating relatively general category-diagnostic features, given the large variability in the visual appearance and location of target objects, and the large overlap of specific features between target and distractor objects in natural vision. Conversely, less effective strategies may consist of the preparation to detect specific visual features, such as those associated with a specific or canonical view of a car or person [e.g., horizontal (cars) versus vertical (persons) object segments].

To test this hypothesis, we conducted a behavioral study in which participants (n = 16) performed six runs of the same search task as used in the fMRI experiment and filled out a posttest questionnaire probing their strategy (Table S1). Each of the statements related either to a general strategy, defined as the preparation to detect the target category at a relatively abstract level without vivid visual imagery (e.g., "after the person cue I formed a general idea of what a person in

the scene may look like"), or a specific strategy, defined as a strategy during which participants would visually imagine specific visual features (e.g., "after the person cue I imagined persons with a prototypical posture as seen from the front"). First, although on average participants gave higher ratings to the general (3.2) than the specific (2.3) statements (t15 = 4.2, P < 0.001), there was variability in this difference across participants (range −0.2 to 2.5), indicating that participants reported using different strategies. Next, we correlated the difference between these two averaged ratings (general minus specific) with behavioral performance across participants. This analysis revealed a strong positive correlation between the rating difference and accuracy (acc: r = 0.67, P < 0.005; RT: r = 0.03, P = 0.9), indicating that the use of the general strategy was beneficial to task performance relative to the use of the specific strategy. This correlation was a result of both a positive relation between accuracy and the rating for the general strategy (r = 0.52, P < 0.05; Fig. 5) and a negative relation between accuracy and the rating for the specific strategy (r = −0.42, P = 0.10; Fig. 5). This relation between search strategy and behavioral performance informs the interpretation of the fMRI results: the beneficial preparatory activity in OSC likely reflected the use of a relatively abstract strategy, whereas the disadvantageous preparatory activity in V1 may have reflected a more specific imagery-like strategy.

Finally, we explored the whole brain for regions that discriminated the scene types and for regions that showed facilitatory cue effects. We used a spherical searchlight approach (15) to test, for each voxel in the brain, the degree to which multivoxel patterns in a 10-mm sphere around this voxel could discriminate the scene types based on the category-specific patterns from the

Fig. 4. Relation between category-specific cue effect in visual cortex and behavioral performance. The relation between the category specificity of cue-related activity (category-specific cue effect, horizontal axis) and (A) accuracy, and (B) response time is shown for areas V1 (red diamonds) and OSC (blue triangles). Each data point represents a participant. The category specificity of cue-related activity was positively related to behavioral performance in OSC, but negatively related to behavioral performance in V1. The category-specific cue effect was calculated as in Fig. 1C.
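The brain-behavior analyses of Fig. 4 amount to correlating one scalar per participant (the category-specific cue effect) with a behavioral measure across participants. A minimal sketch with entirely hypothetical data is given below; the combined "normalized behavioral performance score" mentioned later in the text is rendered here as z-scored accuracy minus z-scored response time, which is a plausible but assumed reading (the paper's exact formula is in its Materials and Methods).

```python
import numpy as np
from scipy.stats import pearsonr

def performance_score(accuracy, rt):
    # Combine accuracy and response time into one number per participant
    # (higher = better). Assumed formula: z-scored accuracy minus z-scored RT.
    z = lambda x: (x - np.mean(x)) / np.std(x)
    return z(np.asarray(accuracy, dtype=float)) - z(np.asarray(rt, dtype=float))

# Hypothetical values for 14 participants (not the study's data)
rng = np.random.default_rng(0)
cue_effect = rng.normal(0.02, 0.01, size=14)            # category-specific cue effect
accuracy = 0.80 + 2.0 * cue_effect + rng.normal(0.0, 0.02, size=14)
rt = 0.90 - 1.5 * cue_effect + rng.normal(0.0, 0.03, size=14)

# Across-participant brain-behavior correlation, as in Fig. 4
r, p = pearsonr(cue_effect, performance_score(accuracy, rt))
```

Because the sample is small (n = 14), a single correlation of this kind is sensitive to individual participants, which is why the paper backs it up with the group-split analysis of Fig. 3B.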


Fig. 3. Category-specific cue effect in visual cortex. (A) Category-specific cue effect in V1, V2/V3, and OSC for all participants (n = 14). The category-specific cue effect was calculated as in Fig. 1C. (B) Category-specific cue effect in V1, V2/V3, and OSC separately for the good (accuracy > 82%, n = 7) and poor participants (accuracy < 82%, n = 7). A Group (good/poor) × ROI (OSC/V1) ANOVA on category-specific cue effects showed a significant interaction between Group and ROI (F1,12 = 16.1, P < 0.005). Whereas the "good" participants showed highly significant category-specific cue activity in OSC (t6 = 6.0, P < 0.001) but not in V1 (t6 = −0.7), the "poor" participants showed significant category-specific cue activity in V1 (t6 = 4.2, P < 0.005) but not in OSC (t6 = 0.7, P = 0.5).

Fig. 5. Relation between search strategy and behavioral performance. The degree to which participants reported using a relatively general search strategy was positively correlated with accuracy (blue triangles), whereas a relatively specific imagery-like search strategy was negatively correlated with accuracy (red diamonds).

independent localizer (identical to the category information measure in the ROI analysis). Results of the individual participants were spatially normalized to allow for a whole-brain group analysis testing for scene-discriminating regions throughout the brain (Materials and Methods). This analysis revealed two bilateral clusters in occipitotemporal cortex, closely overlapping the ventral and dorsal foci of OSC, thus confirming the results of the ROI analysis (Fig. 6). Three additional clusters were located in the frontal cortex: medial prefrontal cortex (mPFC; peak: xyz = 2, 43, 5; t13 = 6.1; P < 0.0001; cluster size: 594 mm3), right middle frontal gyrus (peak: xyz = 40, 11, 35; t13 = 6.0; P < 0.0001; cluster size: 729 mm3), and right precentral gyrus (peak: xyz = 59, −10, 32; t13 = 7.0; P < 0.0001; cluster size: 702 mm3). Interestingly, the mPFC cluster showed a significant category-specific cue effect (t13 = 2.5; P < 0.05), which was positively (although not significantly) correlated with accuracy (r = 0.40). No category-specific cue effect was found in the right middle frontal gyrus and precentral gyrus clusters (P > 0.5 for both tests). We used the same approach to test for regions in which the category-specific cue effect (calculated for each sphere, as in Fig. 1C) positively correlated with behavioral

Fig. 6. Whole-brain analyses. Results from whole-brain group analyses, overlaid on the group-average anatomical scan. (Left) The ventral (z = −18) and dorsal (z = 10) foci of OSC, activated by the univariate contrast between intact and scrambled objects in the OSC localizer (P < 0.001). (Center) Occipitotemporal and medial prefrontal clusters from the multivoxel searchlight analysis testing for spheres that discriminated between the two scene types (people vs. cars) on the basis of independent category localizer patterns (P < 0.001). (Right) The right occipitotemporal cluster from the multivoxel searchlight analysis testing for spheres with a positive correlation between the category-specific cue effect and behavioral performance (r > 0.78, corresponding to P < 0.001).
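The whole-brain searchlight procedure can be sketched as follows. This is a schematic, assumed implementation on toy 3-D arrays (the study used a 10-mm sphere within its AFNI pipeline; here the radius is in voxel units): for every voxel, the voxels within the surrounding sphere are pooled and the same within-minus-between correlation measure used in the ROI analyses is computed over that local pattern.

```python
import numpy as np

def searchlight_map(scene_people, scene_cars, loc_people, loc_cars, radius=2):
    """For every voxel of a 3-D response volume, compute the category-
    information measure (matching minus nonmatching pattern correlations)
    over the voxels within `radius` of that voxel, yielding a whole-volume
    map that can be entered into a group analysis."""
    shape = scene_people.shape
    coords = np.indices(shape)  # voxel coordinate grids
    out = np.zeros(shape)

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    for center in np.ndindex(shape):
        # boolean mask selecting the spherical neighborhood around `center`
        dist2 = sum((coords[d] - center[d]) ** 2 for d in range(3))
        sphere = dist2 <= radius ** 2
        sp, sc = scene_people[sphere], scene_cars[sphere]
        lp, lc = loc_people[sphere], loc_cars[sphere]
        out[center] = ((corr(sp, lp) + corr(sc, lc)) / 2
                       - (corr(sp, lc) + corr(sc, lp)) / 2)
    return out
```

In the study, the resulting per-participant maps were spatially normalized before the group test; the same machinery, applied to cue-only patterns, yields the cue-effect searchlight.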


performance (expressed by a normalized behavioral performance score incorporating both accuracy and response time; Materials and Methods). This analysis revealed a large cluster in right occipitotemporal cortex (peak: xyz = 46, −58, 8; r = 0.87; P < 0.0001; cluster size: 1161 mm3), overlapping both OSC and scene-discriminating regions (Fig. 6). The cue effect at this peak correlated significantly with both accuracy and response time (acc: r = 0.79, P < 0.001; RT: r = −0.76, P < 0.005). Furthermore, it showed significant intact versus scrambled object selectivity (t13 = 3.9; P < 0.005) and scene category information (t13 = 3.5; P < 0.005).

Together, these whole-brain analyses indicate that facilitatory preparatory activity is primarily located within object-selective regions that discriminate target from distractor scenes. Furthermore, they provide evidence for mPFC involvement in real-world visual search, as this region both contained information about the categories of objects in the scenes and responded in a category-selective manner to search cues in the absence of visual input.

Discussion

Our results provide evidence for content-specific activity patterns in visual cortex during the preparation to search for familiar object categories in cluttered natural scenes. Critically, preparatory activity in object-selective visual cortex, but not in early visual cortex, was found to facilitate the subsequent detection of never-before-seen category exemplars. These results indicate that the detection of complex objects in cluttered real-world scenes is selectively mediated by preparatory activity in higher levels of the visual hierarchy, where target scenes are discriminated from distractor scenes. In a previous study (14), we found that attending to a particular object category in briefly presented natural scenes biased the processing of the scenes in favor of the attended category, as confirmed by the present results (Fig. 2).
Notably, this effect was similar for spatially attended and spatially unattended scenes, indicating a global biasing mechanism that operates in parallel across the visual field and independent of spatial attention. Our present findings provide important evidence as to the origin of this biasing mechanism. Specifically, our results indicate that search preparation involves the priming or preactivation of neuronal populations that are selective to the target category. This internally generated activity then provides a competitive advantage for target stimuli, biasing the processing of scenes in favor of the attended category and facilitating target detection across the visual field. The finding that the degree of category selectivity of the preparatory activity in OSC positively correlated with detection performance suggests that this preparatory mechanism is critical for efficient target detection.

By contrast to the beneficial effects of preparatory activity in OSC, preparatory activity in early visual cortex was negatively correlated with behavior: participants who showed category-specific cue-evoked activity in V1 performed worse in the category detection task than participants who did not. Based on the known response properties of early visual cortex, an area selective to simple features such as line orientation, this result may indicate that participants who showed selective preparatory activity in V1 were preparing to detect specific visual features such as orientation cues, instead of (or in addition to) more abstract category-diagnostic features. Although low-level features may differentially match the features of isolated category exemplars (e.g., cars have more horizontal segments than people), searching for low-level features is unlikely to be the optimal strategy for the detection of objects in cluttered natural scenes, in which many other objects share these low-level features.
Our behavioral study confirmed this hypothesis by showing that a search strategy comprising the preparation to detect specific visual features was detrimental to detection performance.

The finding of top-down activity in visual cortex, in the absence of visual input, is not unique to visual search paradigms.


Materials and Methods

Participants. Fourteen healthy adult volunteers (six females) participated in two scanning sessions. All participants were right-handed with normal or corrected-to-normal vision and no history of neurological or psychiatric disease. All participants gave informed written consent for participation in the study, which was approved by the Institutional Review Panel of Princeton University.

Stimuli. Natural scene pictures (n = 192) were selected from an online database (30). Forty-eight pictures contained one or more people (but no cars), 48 pictures contained one or more cars (but no people), 48 pictures contained both cars and people, and 48 pictures contained no cars and no people. The pictures were mostly photographs of city streets. The position, viewpoint, and size of the people and cars in the pictures were highly variable, mimicking real-world viewing conditions (see Fig. S1 for sample pictures). Forty-eight different perceptual masks were created. Each was a colored picture of a mixture of white noise at different spatial frequencies on which a naturalistic texture was superimposed (31). Within a scanning session, each of the presented pictures was unique. The same set of pictures was used in a second scanning session (see General Procedure). All pictures were full-color photographs reduced to 480 (vertical) × 640 (horizontal) pixels. The pictures (size: 7.5° × 10.0°) were presented centrally on top of a fixation cross. A projector outside the scanner room displayed the stimuli onto a translucent screen located at the end of the scanner bore. Participants viewed the screen through a mirror attached to the head coil.

General Procedure. Each volunteer participated in two scanning sessions separated by, on average, 34 d. Each session consisted of eight runs of the main experiment, two runs of the category pattern localizer, and two runs of the OSC localizer.
Data were analyzed using the AFNI software package and MATLAB (MathWorks, Natick, MA).

Main Experiment. Each run consisted of 42 trials and lasted for 210 s. Of these 42 trials, 24 were cue-plus-scene trials, 12 were cue-only trials, and 6 were fixation-only ("null") trials. Each run started and ended with a blank screen showing a fixation cross for 12.25 s. Each experimental trial started with a 0.4-s blank screen, followed by a 0.5-s presentation of a symbolic cue indicating the target category. The cue was followed by a fixation cross that was presented for 2 s, 2.25 s, or 3 s (equiprobable). For cue-plus-scene trials, the scene picture was then presented for 0.1 s, followed directly by a 0.3-s presentation of a perceptual mask and 0.7 s of fixation until the next trial. For cue-only trials, a 0.4-s fixation cross was presented instead of the scene and mask. The average trial duration was 4.4 s (see Fig. 1 for an overview of the trial layout). Of the 36 experimental trials (24 cue-plus-scene trials and 12 cue-only trials) in each run, 18 were "detect people" trials and 18 were "detect cars" trials. The symbolic cues for people-detection trials were "B" (runs 1–4) and "2" (runs 5–8), whereas the cues for car-detection trials were "C" (runs 1–4) and "3" (runs 5–8). Of the 24 scenes presented in each run, 6 contained one or more people (but no cars), 6 contained one or more cars (but no people), 6 contained both cars and people, and 6 contained no cars and no people. Trial order was randomized (without replacement). The task was to press one button for the presence of the target category and another button for its absence. The mapping of the two buttons (index and middle finger) to present and absent responses was counterbalanced across sessions and participants.

Category Pattern Localizer. Category-selective patterns of activation were established using a separate localizer experiment.
Stimuli were presented centrally, had a size of 12° × 12°, and showed isolated objects on a white background. The experiment consisted of four conditions: human bodies, cars, outdoor scenes, and faces. One scanning run consisted of 21 blocks of 14 s each. Blocks 1, 6, 11, 16, and 21 were fixation-only baseline epochs. In each of the remaining blocks, 20 stimuli from one category were presented. Each stimulus appeared for 350 ms, followed by a blank screen for 350 ms. Twice during each block, the same picture was presented two times in succession. Participants were required to detect these repetitions and report them with a button press (1-back task). Each participant was tested with two different versions of the experiment that counterbalanced the order of the blocks. In both versions, the assignment of category to block was counterbalanced, so that the mean serial position in the scan of each condition was equated.
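The 1-back structure of each localizer block can be sketched as below. The function name and construction are our assumption; only the counts and the two-in-succession repeats come from the text, and we assume the two repetitions are included within the 20 presentations, which is what makes 20 × (350 + 350) ms match the stated 14-s block length.

```python
import random

def make_block_stream(stimuli, n_show=20, n_repeats=2, seed=None):
    """Build one block's presentation order: n_show presentations drawn
    from n_show - n_repeats distinct pictures, with n_repeats pictures
    shown twice in immediate succession (the 1-back targets)."""
    rng = random.Random(seed)
    base = rng.sample(stimuli, n_show - n_repeats)          # distinct pictures
    doubled = set(rng.sample(range(len(base)), n_repeats))  # which ones repeat
    stream = []
    for i, stim in enumerate(base):
        stream.append(stim)
        if i in doubled:
            stream.append(stim)  # immediate repetition -> 1-back target
    return stream
```

Because the doubled pictures are drawn from distinct base items, the stream always contains exactly two immediate repetitions and never a triple.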


For example, directing spatial attention to a particular location has been shown to increase activity in extrastriate regions responsive to stimuli at the attended location (16). Other studies have reported top-down activity in visual cortex in paradigms in which participants were asked to visually memorize the orientation of a Gabor stimulus (17, 18) or its color (18), were expecting a particular object shape (10–12), or were asked to vividly imagine a specific category exemplar (19, 20) or letter (21).

How do these previous findings of top-down activity relate to the preparatory activity reported here? First, an important difference between our visual search task and previous studies on attention, perceptual expectation, working memory, and visual imagery is that in our task, participants did not know what the target objects would look like or where they would appear. It is therefore unlikely that the category-specific cue effects in OSC reflect the visual imagery or memory of a specific shape, given the variability in appearance of target objects in the large number of unique scenes that were presented. Second, our findings are different from the effects of feature-based attention (22), unless one considers object category itself a feature (23). As argued above, and confirmed by our behavioral study, attention to particular low-level features (e.g., color or orientation) will not be helpful in performing the category detection task, in which target objects overlap heavily with distractor objects in terms of the low-level features that they share.
Finally, and perhaps most importantly, although it is conceivable that the top-down modulatory mechanisms involved in visual search, feature-based attention, visual working memory, and visual imagery partially overlap, our results show that top-down preparatory activity is an integral part of real-world visual search in that it biases target-selective neuronal populations in favor of target objects, thereby facilitating their detection in cluttered scenes.

What constitutes the source of the preparatory activity patterns that we observed in visual cortex? Spatial attention studies have consistently implicated a fronto-parietal network as the source of spatial attention biases in visual cortex (24). Activity in this network precedes activity in visual cortex (25), and activity within these source regions is spatially specific (26). A similar fronto-parietal network has been implicated in feature-based attention (27), and feature-specific responses have been reported in parietal cortex (22). Finally, the prefrontal cortex is thought to exert top-down modulatory influences on visual cortex during working memory maintenance (28). In the present experiment, a whole-brain searchlight analysis identified the medial prefrontal cortex (mPFC) as a putative source of preparatory activity in visual cortex. This region showed category-selective responses to the scenes, similar to OSC. Furthermore, this category selectivity was already present during the preparation phase of the task, as would be expected for a source region. The mPFC region identified here may correspond to a region labeled superior orbital sulcus (SOS) in a previous study, where it was linked to the processing of scene context (29). Consistent with the present findings, it was hypothesized that SOS may maintain an updated representation of scene context to modulate and facilitate object processing in visual cortex (29).
Future work needs to follow up on these findings using methods that are optimally suited to address the temporal flow of information in the brain.

In summary, the present study has demonstrated content-specific preparatory activity in OSC during real-world visual search. This preparatory activity biased neural population responses in favor of the task-relevant object category, thereby facilitating the detection of objects in cluttered natural scenes. Our results further suggest that preparatory visual activity is most effective when implemented at the level of visual cortex that discriminates target from distractor scenes, that is, in OSC. Finally, we identified a region in medial prefrontal cortex as a putative source of preparatory activity in visual cortex. Together, these findings provide a neural basis for visual search in our daily-life environment.

OSC Localizer. OSC was identified using a localizer scan with a design identical to that of the category pattern localizer described above, except that pictures of intact and scrambled objects were presented in alternating blocks.

Data Acquisition and Preprocessing. Functional [EPI sequence; 34 slices per volume; resolution = 3 × 3 × 3 mm with 1-mm gap; repetition time (TR) = 2.0 s; echo time (TE) = 30 ms; flip angle = 90°] and anatomical (MPRAGE sequence; 256 matrix; TR = 2.5 s; TE = 4.38 ms; flip angle = 8°; resolution = 1 × 1 × 1 mm) images were acquired with a 3T Allegra MRI scanner (Siemens, Erlangen, Germany). Functional data were slice-time corrected and motion corrected, and low-frequency drifts were removed with a temporal high-pass filter (cutoff of 0.006 Hz). Only data used for ROI definition were spatially smoothed, with a Gaussian kernel (4-mm full-width at half-maximum); no spatial smoothing was applied to data used for any of the other analyses.

ROI Definition. OSC was defined for each participant in native space by contrasting responses evoked by intact objects with responses evoked by scrambled objects, at P < 0.05 (uncorrected). V1 (Brodmann area 17) and V2/V3 (Brodmann area 18) were defined using the Talairach atlas implemented in AFNI ("TT_Daemon") and projected back to each participant's native space. Brodmann areas 17 and 18 have been shown to closely correspond to V1 and V2/V3, respectively (32). The mean sizes of the ROIs, in numbers of voxels, were: OSC, 400 (SD = 123); V1, 255 (SD = 22); V2/V3, 1,166 (SD = 94). Left and right hemisphere ROIs were combined.

Statistical Analysis. For each participant, general linear models were created for the main experiment and the category pattern localizer experiment. One predictor (convolved with a standard model of the hemodynamic response function) modeled each condition. All trials were included in the analyses.
Regressors of no interest were also included to account for differences in the mean MR signal across scans and for head motion within scans. These regression analyses resulted, for each voxel, in a t value for each condition in the main experiment and for each condition in the localizer experiment. Following previous studies (13), we normalized these t values by subtracting, for each voxel, the mean t value across the relevant conditions of an experiment (e.g., the mean of bodies and cars in the category localizer) from the t value of each individual condition of this experiment (e.g., bodies and cars). This normalization resulted in the mean t value of each voxel being zero, thereby eliminating the effect of voxelwise response differences that were unspecific to our conditions but leaving condition-related variation intact. The normalized t values of conditions in the main experiment were correlated, across the voxels of an ROI, with the normalized t values of the body and car conditions in the localizer (Fig. 1; see ref. 14). The analysis was performed for each participant and session separately. Correlations were Fisher transformed [0.5 × ln((1 + r)/(1 − r))] before averaging the two sessions and statistical testing. Differences between voxelwise correlations were then tested using repeated-measures ANOVAs and t tests (two-tailed) with participant (n = 14) as random factor.

Searchlight Analysis. A whole-brain pattern analysis was performed using a spherical searchlight (15). For each voxel in the brain, we computed voxelwise correlations in a sphere of 10-mm radius (corresponding to 121 voxels) around this voxel. The voxelwise correlations were computed as described in Statistical Analysis. The correlation values from each sphere were Fisher transformed and assigned to the center voxel of this sphere. The correlations were computed for each subject and session separately. Results were transformed into Talairach space (which included resampling to 3 × 3 × 3 mm voxels), the correlations of the two sessions were averaged for each subject, and random-effects group analyses were performed. The first searchlight analysis tested for regions that discriminated between the two scene types (containing people or cars) based on the category localizer patterns (isolated pictures of bodies or cars). The average correlation between matching categories was contrasted with the average correlation between nonmatching categories. The threshold was set to P < 0.001 (uncorrected) with a minimum cluster size of 20 (resampled to 3 × 3 × 3 mm) voxels. The second searchlight analysis tested for regions in which the category-specific cue effect was correlated with behavior. The category-specific cue effect was calculated for each sphere as illustrated in Fig. 1C. The normalized behavioral performance score was calculated by taking the mean of the normalized RT score (multiplied by −1, such that higher scores reflected better performance) and the normalized accuracy score. Normalization consisted of subtracting the group mean from each participant's value and dividing by the group SD. The threshold was set to r > 0.78 (corresponding to P < 0.001) with a minimum cluster size of 20 (resampled) voxels.

ACKNOWLEDGMENTS. This work was supported by National Institutes of Health Grants R01-EY017699 and R01-MH064043 and National Science Foundation Grant BCS-1025149.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1101042108

1. Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381:520–522.
2. Li FF, VanRullen R, Koch C, Perona P (2002) Rapid natural scene categorization in the near absence of attention. Proc Natl Acad Sci USA 99:9596–9601.
3. Wolfe JM, Võ ML, Evans KK, Greene MR (2011) Visual search in scenes involves selective and nonselective pathways. Trends Cogn Sci 15:77–84.
4. Duncan J, Humphreys GW (1989) Visual search and stimulus similarity. Psychol Rev 96:433–458.
5. Wolfe JM, Cave KR, Franzel SL (1989) Guided search: an alternative to the feature integration model for visual search. J Exp Psychol Hum Percept Perform 15:419–433.
6. Simons DJ, Rensink RA (2005) Change blindness: past, present, and future. Trends Cogn Sci 9:16–20.
7. Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Annu Rev Neurosci 18:193–222.
8. Chelazzi L, Duncan J, Miller EK, Desimone R (1998) Responses of neurons in inferior temporal cortex during memory-guided visual search. J Neurophysiol 80:2918–2940.
9. Chelazzi L, Miller EK, Duncan J, Desimone R (1993) A neural basis for visual search in inferior temporal cortex. Nature 363:345–347.
10. Stokes M, Thompson R, Nobre AC, Duncan J (2009) Shape-specific preparatory activity mediates attention to targets in human visual cortex. Proc Natl Acad Sci USA 106:19569–19574.
11. Esterman M, Yantis S (2010) Perceptual expectation evokes category-selective cortical activity. Cereb Cortex 20:1245–1253.
12. Puri AM, Wojciulik E, Ranganath C (2009) Category expectation modulates baseline and stimulus-evoked activity in human inferotemporal cortex. Brain Res 1301:89–99.
13. Haxby JV, et al. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293:2425–2430.
14. Peelen MV, Fei-Fei L, Kastner S (2009) Neural mechanisms of rapid natural scene categorization in human visual cortex. Nature 460:94–97.
15. Kriegeskorte N, Goebel R, Bandettini P (2006) Information-based functional brain mapping. Proc Natl Acad Sci USA 103:3863–3868.
16. Kastner S, Pinsk MA, De Weerd P, Desimone R, Ungerleider LG (1999) Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron 22:751–761.

17. Harrison SA, Tong F (2009) Decoding reveals the contents of visual working memory in early visual areas. Nature 458:632–635.
18. Serences JT, Ester EF, Vogel EK, Awh E (2009) Stimulus-specific delay activity in human primary visual cortex. Psychol Sci 20:207–214.
19. O'Craven KM, Kanwisher N (2000) Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J Cogn Neurosci 12:1013–1023.
20. Reddy L, Tsuchiya N, Serre T (2010) Reading the mind's eye: decoding category information during mental imagery. Neuroimage 50:818–825.
21. Stokes M, Thompson R, Cusack R, Duncan J (2009) Top-down activation of shape-specific population codes in visual cortex during mental imagery. J Neurosci 29:1565–1572.
22. Serences JT, Boynton GM (2007) Feature-based attentional modulations in the absence of direct visual stimulation. Neuron 55:301–312.
23. Treisman A (2006) How the deployment of attention determines what we see. Vis Cogn 14:411–443.
24. Kastner S, Ungerleider LG (2000) Mechanisms of visual attention in the human cortex. Annu Rev Neurosci 23:315–341.
25. Bressler SL, Tang W, Sylvester CM, Shulman GL, Corbetta M (2008) Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. J Neurosci 28:10056–10061.
26. Szczepanski SM, Konen CS, Kastner S (2010) Mechanisms of spatial attention control in frontal and parietal cortex. J Neurosci 30:148–160.
27. Egner T, et al. (2008) Neural integration of top-down spatial and feature-based information in visual search. J Neurosci 28:6141–6151.
28. Gazzaley A, Rissman J, D'Esposito M (2004) Functional connectivity during working memory maintenance. Cogn Affect Behav Neurosci 4:580–599.
29. Bar M (2004) Visual objects in context. Nat Rev Neurosci 5:617–629.
30. Russell BC, Torralba A, Murphy KP, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173.
31. Walther DB, Caddigan E, Fei-Fei L, Beck DM (2009) Natural scene categories revealed in distributed patterns of activity in the human brain. J Neurosci 29:10573–10581.
32. Wohlschläger AM, et al. (2005) Linking retinotopic fMRI mapping and anatomical probability maps of human occipital areas V1 and V2. Neuroimage 26:73–82.


Supporting Information Peelen and Kastner 10.1073/pnas.1101042108

Fig. S1. Examples of scene pictures used in the experiment. Scenes could show people or cars in natural, daily life situations. As in natural vision, the visual appearances and spatial locations of the people and cars in the scenes were highly variable. For example, a scene could show a single person sitting on a bench in a park close to the camera, or could show a group of people walking on a street at a distance.

Fig. S2. Multivoxel category information in category-selective regions of visual cortex. In addition to the main ROIs, several category-selective ROIs were defined using the data of the category localizer, all at P < 0.05 (uncorrected). The body-selective extrastriate body area (EBA) and fusiform body area (FBA) were defined by the contrast between bodies and cars. The face-selective fusiform face area (FFA) was defined by the contrast between faces and cars. Finally, the scene-selective parahippocampal place area (PPA) was defined by the contrast between scenes and the average of cars, bodies, and faces. The mean sizes of the ROIs, in numbers of voxels, were: EBA, 161 (SD = 50); FBA, 53 (SD = 30); FFA, 43 (SD = 30); PPA, 149 (SD = 36). Left and right hemisphere ROIs were combined. Significant category information was observed in EBA and PPA (P < 0.01, for both tests), but not in FBA or FFA (P > 0.08, for both tests). There was no modulation of category information as a function of task relevance in EBA, FBA, FFA, or PPA (P > 0.2, for all tests).
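The ROI definitions above all follow the same recipe: threshold a voxelwise contrast t map at an uncorrected P value and keep the suprathreshold voxels. A dependency-light sketch is shown below; the toy t map is invented, and a normal approximation stands in for the t distribution (with SciPy one would use stats.t.ppf instead).

```python
import numpy as np
from statistics import NormalDist

def define_roi(t_map, p_thresh=0.05):
    """Return a boolean ROI mask containing voxels whose contrast
    t value exceeds the one-tailed uncorrected threshold.

    Normal approximation: the critical value for p = 0.05 is ~1.645.
    """
    t_crit = NormalDist().inv_cdf(1 - p_thresh)
    return t_map > t_crit

# Toy t map for a bodies-minus-cars contrast (synthetic values).
t_map = np.array([0.2, 1.9, 2.5, -1.0, 1.7])
mask = define_roi(t_map)
# mask.sum() gives the ROI size in voxels; here voxels 1, 2, and 4 survive.
```

Left- and right-hemisphere masks defined this way can simply be combined with a logical OR, as in the combined ROIs reported above.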


Fig. S3. Category-specific cue effect in category-selective regions of visual cortex. (A) Category-specific cue effect in EBA, FBA, FFA, and PPA for all participants (n = 14). The category-specific cue effect was calculated as the difference between within-category correlations minus between-category correlations (Fig. 1C). The category-specific cue effect was not significantly different from zero in any of these ROIs (P > 0.4, for all tests). (B) Category-specific cue effect in EBA, FBA, FFA, and PPA separately for the good (accuracy > 82%, n = 7) and poor participants (accuracy < 82%, n = 7). There was no significant difference between the category-specific cue effect for good and poor participants in any of these ROIs (P > 0.2, for all tests).
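The category-specific cue effect used throughout (within-category minus between-category pattern correlations, after Fisher transformation) can be sketched in a few lines. The function and array names are illustrative assumptions; inputs stand for normalized voxelwise t patterns as described in Statistical Analysis.

```python
import numpy as np

def fisher_z(r):
    """Fisher transform: 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * np.log((1 + r) / (1 - r))

def cue_effect(cue_body, cue_car, loc_body, loc_car):
    """Category-specific cue effect for one ROI or searchlight sphere:
    mean Fisher-transformed within-category correlation (body cue vs.
    body localizer, car cue vs. car localizer) minus the mean
    between-category correlation. Inputs are 1-D voxel-pattern arrays.
    """
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    within = fisher_z(corr(cue_body, loc_body)) + fisher_z(corr(cue_car, loc_car))
    between = fisher_z(corr(cue_body, loc_car)) + fisher_z(corr(cue_car, loc_body))
    return (within - between) / 2.0

# Illustration with synthetic 200-voxel patterns: cue patterns that
# resemble the matching localizer pattern yield a positive cue effect.
rng = np.random.default_rng(0)
loc_body, loc_car = rng.standard_normal(200), rng.standard_normal(200)
effect = cue_effect(loc_body + 0.5 * rng.standard_normal(200),
                    loc_car + 0.5 * rng.standard_normal(200),
                    loc_body, loc_car)
```

An effect of zero indicates that the preparatory patterns carry no category information relative to the localizer patterns, which is the null tested in the ROI analyses above.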

Fig. S4. Mean cue-evoked activity in visual cortex. Significant decreases in the mean amplitude of activity in response to the cues were observed in V2/V3 (P < 0.05) and in OSC, EBA, FBA, FFA, and PPA (P < 0.001, for all tests). There was no significant difference between the "detect people" and "detect cars" cues in any of the ROIs (P > 0.3, for all tests).

Table S1. Statements used in the questionnaire measuring the degree to which participants used a relatively general or specific search strategy

General strategy:
After the car cue I anticipated detecting cars seen from multiple angles rather than from one angle
After the car cue I looked out for cars at the many locations where they may appear in the scene
After the person cue I formed a general idea of what a person in the scene may look like
After the person cue I looked out for persons, but I didn't have a vivid mental image of a person

Specific strategy:
After the car cue I vividly imagined a car, as if I could almost see it in front of me
After the car cue I looked out for horizontal things that were about the size of a car
After the car cue I looked out for a typical car (e.g., a sedan)
After the person cue I looked out for vertical things that were about the size of a person
After the person cue I imagined persons with a prototypical posture as seen from the front
After the person cue I thought about one particular individual

Participants rated their level of agreement with each statement on a 1 ("fully disagree") to 5 ("fully agree") scale.
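One natural way to summarize such ratings is a mean agreement score per strategy plus a difference score. This scoring scheme is an illustrative assumption, not necessarily the one used in the study; the example ratings are invented.

```python
def strategy_scores(ratings_general, ratings_specific):
    """Summarize questionnaire ratings (1 = fully disagree ..
    5 = fully agree) as mean agreement per strategy, plus a
    difference score in which positive values indicate a more
    general (less imagery-like) search strategy.
    """
    mean = lambda xs: sum(xs) / len(xs)
    general, specific = mean(ratings_general), mean(ratings_specific)
    return {"general": general,
            "specific": specific,
            "general_minus_specific": general - specific}

# Hypothetical participant: 4 general-strategy and 6 specific-strategy ratings.
scores = strategy_scores([4, 5, 4, 3], [2, 1, 2, 1, 2, 1])
# → {'general': 4.0, 'specific': 1.5, 'general_minus_specific': 2.5}
```

A positive difference score for this hypothetical participant would align with the relatively abstract preparation strategy linked to OSC in the main text.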

