<bold>Abstract</bold>
People often search for objects that are distinctive from the other objects in a scene along multiple feature dimensions, such as color and shape. A target that is distinctive along more than one dimension can make search easier, but it also increases the complexity of modeling search behavior. Building on previous research on how people search using information from two feature dimensions, we explored how search unfolds when the target and distractors differ along the dimensions of color, shape, and texture (a tridimensional search). Using a behavioral-computational approach, we found that the target-distractor distinctiveness signals along the three dimensions combine in a weighted orthogonal fashion to guide tridimensional searches. Additionally, across two sets of experiments, we demonstrated that the weight assigned to each dimension varied according to its relative usefulness. When color distinctiveness was most pronounced (Set 1), color information was prioritized much more strongly than the information carried by shape and texture. When distinctiveness was more balanced across the individual dimensions (Set 2), the weights were distributed more evenly across the three dimensions, although a color prioritization remained. These results have broad implications for cognitive neuroscience: they place constraints on how visual information from different dimensions is integrated into an overall guidance signal and demonstrate how attention can be flexibly allocated across channels in response to ecological aspects of the environment. This study should also interest modelers in cognitive science because it demonstrates an approach to understanding behavior in complex scenarios based on performance indices estimated under simpler conditions.
Toward a Better Understanding of Target Distinctiveness in Visual Search: How Color, Shape, and Texture Information Combine to Guide Search
<cn><bold>By: Zoe (Jing) Xu, Alejandro Lleras, and Simona Buetti</bold></cn>
<bold>Acknowledgement: </bold>Timothy Vickery served as action editor. This article has been posted as a preprint and is available on the Open Science Framework. The preprint can be accessed at https://osf.io/preprints/osf/857u2. All data, analysis code, and research materials are available on the Open Science Framework (https://osf.io/bmwa4/). Results in Experimental Set 1 were presented at the 2021 Annual Meeting of the Vision Sciences Society. The authors have no competing interests. This project was supported by a grant from the National Science Foundation (Grant BCS 1921735 awarded to Simona Buetti). Zoe (Jing) Xu played a lead role in data curation, formal analysis, investigation, project administration, software, validation, visualization, and writing–original draft and an equal role in conceptualization, methodology, and writing–review and editing. Alejandro Lleras played a supporting role in formal analysis, funding acquisition, investigation, project administration, resources, validation, visualization, and writing–original draft and an equal role in conceptualization, methodology, supervision, and writing–review and editing. Simona Buetti played a lead role in funding acquisition, resources, and supervision, a supporting role in formal analysis, investigation, project administration, validation, visualization, and writing–original draft, and an equal role in conceptualization, methodology, and writing–review and editing.
The concept of target template is commonly used in many visual search theories, usually defined as the mental representation of the target that people are looking for in the visual environment (e.g., Chun & Wolfe, 1996; Malcolm & Henderson, 2009; Wolfe, 2021). The majority of theories agree that the target template facilitates search behaviors by imposing top-down guidance on the search processes (e.g., Adeli et al., 2017; Buetti et al., 2016; Bundesen, 1990; Duncan & Humphreys, 1989; Hoffman, 1979; Hulleman & Olivers, 2017; Liesefeld et al., 2018; Rosenholtz et al., 2012; Wolfe, 1994, 2021; Zelinsky, 2008; for exceptions that propose search being determined solely by bottom-up signals, see Itti & Koch, 2000; Theeuwes, 1991, 1992; Ullman, 1987). Some theories propose a feature-boosting mechanism, whereby a subset of features in the target template that provides the largest discriminability between the target and distractors in the display is boosted. The resulting activation map reflects the extent to which each location contains the boosted target features, determining the likelihood of each location receiving attention and getting scrutinized (e.g., Adeli et al., 2017; Bundesen, 1990; Liesefeld et al., 2018; Wolfe, 1994, 2021; X. Yu & Geng, 2019). Other theories propose that the overall similarity (i.e., the “match”) between the search object and the target template provides the top-down guidance that helps direct attention to the objects that are most likely to be the target (e.g., Buetti et al., 2016; Duncan & Humphreys, 1989; Hoffman, 1979).
While visual search theories tend to agree on the importance of the target template in search processes, there is a lack of consensus, or even explicit discussion, regarding how complex a target template can be or how much of the visual information contained within a target template can be utilized to guide attention. The majority of visual search theories do not discuss the limits on target template complexity (Adeli et al., 2017; Buetti et al., 2016; Duncan & Humphreys, 1989; Eckstein et al., 2000; Folk et al., 1992; Gaspelin & Luck, 2018; Hoffman, 1979; Hulleman & Olivers, 2017; Najemnik & Geisler, 2005; Navalpakkam & Itti, 2007; Wolfe, 1994; X. Yu & Geng, 2019; for a special case of target template, see Rosenholtz, 2016; Rosenholtz et al., 2012). Generally, these theories assume that the target template and the search items’ representations are constructed and compared over all the relevant dimensions (i.e., the dimensions that differentiate the target from the distractors, e.g., location, motion, color, surface texture, shape, and size). For instance, Navalpakkam and Itti (2007) argued that visual input is represented in multiple feature dimensions including, but not limited to, color, orientation, and direction of motion. Zelinsky (2008) used a 72-dimensional feature vector to represent the visual signals coming from each location of the search scenes in which he modeled human eye movements. Consistent with this perspective, many empirical studies used real-world objects as search stimuli, where the target template receives little constraint in terms of its dimensionality or complexity. Wang et al. (2017) used teddy bears, reindeer, and car models as search stimuli and found that the difference between the target and distractors can guide search behaviors even when the two differ along multiple dimensions in a complex way. Henderson et al. (2009) found that people were able to search for a target among distractors that were real-world objects appearing in real-world scenes (see also Malcolm & Henderson, 2009). These latter studies measured how people search for real-world objects that contain visual information from various dimensions, which suggests that people are capable of making comparisons between search items and a complex target template.
However, the ability to search for a target defined along multiple dimensions does not guarantee that the attentional guidance of the target template occurs along all those dimensions. Even though the target and distractors may differ along multiple dimensions, it is possible that only a subset of those dimensions actually guides the search. At first glance, this proposal might seem to contradict empirical findings suggesting that more specific visual templates of the target provide stronger guidance than less specific or accurate templates (e.g., Malcolm & Henderson, 2009; Vickery et al., 2005; Wolfe et al., 2004). For instance, Vickery et al. (2005) found that cueing target images that differed in size or orientation from the actual targets led to slower searches, compared to cueing the exact target image that appeared in the search display, suggesting that the detailed visual information stored in the target template, rather than general schematic or semantic information, guides the search process. However, the conclusion that a more specific target template provides stronger guidance arises from empirical evidence at two extremes of this “specificity” continuum: from being as abstract as a word, or as distorted as an example differing from the true target in size or orientation, to being as specific and precise as the exact image of the target. The optimal solution might lie somewhere in between—there might be a threshold for the amount of information in the target template that can be effectively utilized to guide the search, such that additional details in the target template provide no additional guidance or are utilized in a less efficient way.
Indeed, several visual search theories (Alexander et al., 2019; Liesefeld et al., 2018; Williams, 1967) suggest limits in how many features guide attention. Williams (1967) showed that when targets had multiple features, eye movements were mainly guided by color, even when shape and size were known. Only in specific cases, like when the target was the largest size, did size also influence fixations. This implies a limit on feature-based guidance.
More recently, Alexander et al. (2019) found that previewing a mismatched target, which differed from the actual target in either shape or orientation, caused a slowdown in visual search—but only when the stimuli were grayscale. When the target differed from distractors along the color dimension, previewing a mismatched target in shape or orientation caused little slowdown. This result suggests that people typically use color to guide their search and that shape and orientation become guiding factors only in the absence of color information.
A separate line of work has proposed that there exist two distinct forms of target template: one that is primarily involved in providing initial guidance during the search process and a second one engaged in the actual identification of the target once an item is in the focus of attention (Hamblin-Frohman & Becker, 2021; Wolfe, 2021; X. Yu et al., 2022, 2023). The guiding template tends to be simpler or coarser than the more detailed verification template. This proposal aligns with the idea that attentional guidance during search might be based on only a subset of the information stored in the full template of the target.
Moreover, the fact that people can search for complex targets does not reveal whether visual signals from different feature dimensions contribute equally to the top-down guidance or whether signals along different dimensions are utilized differently during multidimensional search conditions. In fact, some theories have delved into this question (e.g., Bundesen, 1990; Gaspelin & Luck, 2018; Liesefeld et al., 2018; Wolfe, 1994). In the theory of visual attention, Bundesen (1990) proposed the mechanism of filtering, which refers to the prioritization of certain categories (e.g., red items) by assigning them a higher attentional weight, leading to an increased likelihood of objects belonging to these categories being selected for attention. The signal suppression hypothesis (Gaspelin & Luck, 2018; Sawaki & Luck, 2010) approaches the same idea from the other extreme: To minimize distractions from salient stimuli, features unique to those stimuli can be actively suppressed, ensuring they receive no attentional weights (also see Treisman & Sato, 1990). In both the theory of visual attention and the signal suppression hypothesis, changes in attentional weight apply to specific features (e.g., red, vertical).
The dimension weighting account addressed the attentional weighting process at the level of feature dimensions (e.g., Found & Müller, 1996; Müller et al., 2003; see also Liesefeld et al., 2018, for a review). Using an oddball search task, where the target identity varied randomly from trial to trial and differed from distractors along either color or orientation, Found and Müller (1996) found an intertrial facilitation: Search was faster when the target and distractors differed along the same dimension as on the previous trial (e.g., a color-defined target following a color-defined target) than when the target-defining dimension switched across trials. This finding suggests that the dimension that defined the target on the previous trial receives a higher attentional weight, facilitating search when that dimension remains relevant.
Overall, a number of theories have tried to address whether there is a limitation on the complexity of a target template and whether specific features or feature dimensions of the target template can be up- or downweighted to facilitate the search process. However, none of those theories quantify any target dimensionality limitations or the strength of dimensional weighting. In other words, current theories have not attempted to quantitatively capture how visual signals from different feature dimensions simultaneously contribute to the top-down guidance and affect search behavior. In the current work, we used a behavioral-computational approach to study multidimensional searches by quantitatively estimating the contribution of visual information from three different feature dimensions (color, shape, and texture) in guiding these searches and making mathematical predictions on search performance.
<h31 id="xge-155-3-839-d543e353">The Present Study</h31>We investigated how observers utilize information stored in the target template to guide attention during searches in which the target differs from distractors along three feature dimensions: color, shape, and texture.
<anchor name="fig1"></anchor>
Previous studies have adopted a behavioral-computational approach to examine feature integration in situations where the target differs from distractors along two feature dimensions: Color and shape were examined in Buetti et al. (2019) and Hughes et al. (2024), whereas shape and texture were studied in Xu, Lleras, and Buetti (2021).
In Buetti et al. (2019), search slopes were first measured in unidimensional search conditions, where the target differed from distractors along color only or along shape only. These unidimensional slopes were then used to predict search performance in bidimensional conditions, where the target differed from distractors along both color and shape.
Note that while Buetti et al. (2019) offered an approach to better understand the mathematical rules governing the combination of color and shape in guiding attention, the models considered in the study assumed that color and shape signals are utilized equally. A similar assumption is made in Xu, Lleras, and Buetti (2021), where shape and texture were presumed to be utilized equally to guide attention during bidimensional searches.
Using the same approach, here we evaluated performance in tridimensional search conditions, where the target differs from distractors along the color, shape, and texture dimensions. Our focus was to examine whether signals from all these dimensions contribute to attentional guidance, how they are integrated, and, importantly, whether signals from different feature dimensions are utilized to varying degrees to determine the overall guidance. We conducted two sets of experiments. The first set used stimuli similar to those in previous bidimensional studies (Buetti et al., 2019; Xu, Lleras, & Buetti, 2021), featuring targets that were more distinctive from distractors on the color dimension than on the other two dimensions. The second set used stimuli that were better controlled for target–distractor differences across the three feature dimensions. Importantly, these two sets of experiments allowed us not only to confirm the best performing models across different data sets but also to investigate the extent to which the most effective models and their parameters vary with different stimuli.
Experimental Set 1
A total of 12 experiments were conducted in this set, each with a naïve group of participants. We first conducted three unidimensional search experiments to measure the search slopes in conditions where the target differed from distractors along color only (Experiment 1, color search), along shape only (Experiment 2, shape search), and along texture only (Experiment 3, texture search). Furthermore, we conducted nine tridimensional search experiments where the target differed from each type of distractors along all three dimensions: color, shape, and texture (Experiments 4–12; Figure 1). The methods and experimental protocols (IRB No. 05550: Attentional mechanisms in human vision) were approved by the Institutional Review Board at the University of Illinois, Urbana–Champaign, and are in accordance with the Declaration of Helsinki.
<h31 id="xge-155-3-839-d543e439">Method</h31><bold>Transparency and Openness</bold>
For Experiments 1–24, we report how we determined our sample size, all data exclusions, all manipulations, and all measures, following Journal Article Reporting Standards (Appelbaum et al., 2018). All data, analysis code, and research materials are available on the Open Science Framework (Xu, Lleras, & Buetti, 2024) at https://osf.io/bmwa4/.
<bold>Participants</bold>
Participants were recruited from either the University of Illinois at Urbana–Champaign or Prolific, in exchange for course credit or money. Sample size was determined based on data simulations of the previous bidimensional search study (Xu, Lleras, & Buetti, 2021). We estimated the sample size required to produce a small standard error on reaction time (20.33 ms) and on the magnitude of the search slope estimate (3.17 ms/log unit) in the most variable condition (defined by a specific distractor type × set size) in that study. These simulations demonstrated that we would need to include 35 valid participants in each experiment (the sample size rationale is detailed in our preregistration report).
For each experiment, four participant inclusion criteria were used: (1) Participants should complete all the trials (i.e., the experiment was not aborted before finishing), (2) participants should make a response on at least 85% of the trials, (3) search accuracy should be higher than 90%, and (4) an individual’s average response time (RT) should fall within 2 standard deviations of the group average RT. The accuracy rate was calculated as the number of trials where participants made a correct response divided by the total number of trials where participants made a response. That is, we excluded time-out trials (i.e., trials where participants did not make any response within 5 s; see below for a detailed experimental procedure) when computing accuracy, as the experiments were conducted online, and it was impossible to ascertain the reason for time-outs. The number of recruited participants, the number of participants included in the analysis, the number of participants excluded under each criterion, included participants’ demographic information, and summary statistics of the measurements in each experiment are shown in Table 1.
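The four inclusion criteria and the time-out-excluding accuracy computation can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code, and the trial-record field names (`responded`, `correct`, `rt`) are hypothetical.

```python
def passes_inclusion(trials, n_expected, group_mean_rt, group_sd_rt):
    """trials: list of dicts with 'responded' (bool), 'correct' (bool), 'rt' (ms)."""
    # (1) All trials completed (experiment not aborted early).
    if len(trials) < n_expected:
        return False
    # (2) A response was made on at least 85% of trials.
    answered = [t for t in trials if t["responded"]]
    if len(answered) / len(trials) < 0.85:
        return False
    # (3) Accuracy higher than 90%, computed over answered trials only
    #     (time-out trials are excluded from the denominator).
    accuracy = sum(t["correct"] for t in answered) / len(answered)
    if accuracy <= 0.90:
        return False
    # (4) Mean RT within 2 standard deviations of the group mean RT.
    mean_rt = sum(t["rt"] for t in answered) / len(answered)
    return abs(mean_rt - group_mean_rt) <= 2 * group_sd_rt
```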
<anchor name="tbl1"></anchor>
<bold>Apparatus and Stimuli</bold>
All experiments were programmed in JavaScript and conducted on Pavlovia, with participants using their own computers. Because experiments were run online, we had no control over the visual angle of the stimuli on participants’ computers. To compensate for this, before the experiment, we asked participants to rescale an image of a credit card to match the real size of a credit card in order to ensure that stimuli across different computer displays maintained the same physical size (1.2 × 1.2 cm). Stimuli were randomly assigned to a location on the display with a small random jitter, based on two concentric circular grids occupying an area of 15 × 15 cm on the center of participants’ screens. The larger grid had a diameter of 13.8 cm, and the smaller grid had a diameter of 7.4 cm. This size was chosen to allow participants with screens as small as 12.5 in. to see the full search display.
The stimuli were shown on a white background. In any given trial, there was only one target and one type of distractor on the display. In other words, displays were always target-present and homogeneous in terms of distractors. The stimuli had a black square dot on either their left or their right, and the task was to report the location of the dot on the target stimulus. Stimuli used in Experiments 1–12 are shown in Figure 1.
Unidimensional Experiments
In Experiment 1 (color search), the target was a red octagon with a white cross texture inside (Figure 1). Distractors shared the shape (octagon) and texture (cross) with the target, but their color was either orange, green, or pink. In Experiment 2 (shape search), the target was a gray octagon with a white cross texture. Distractors shared the color (gray) and texture (cross) with the target, but their shape was either a triangle, a house, or a square. In Experiment 3 (texture search), the target was a gray octagon with a white cross texture. Distractors shared the color (gray) and shape (octagon) with the target, but their texture was either a dot pattern, lines forming a tilted pound key, or a solid gray fill.
Tridimensional Experiments
In Experiments 4–12, the target was always a red octagon with a white cross texture inside (the same as Experiment 1) and the distractors were constructed by combining all the distractor colors, shapes, and textures used in Experiments 1–3. There were in total 27 types of tridimensional distractors (i.e., 3 colors × 3 shapes × 3 textures). To keep the study design consistent across all experiments, we only tested three types of distractors in each experiment. Therefore, we divided the tridimensional distractors into nine experimental sessions, each containing three types of distractors. Only one type of distractor was presented along with the target on a given trial.
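The 27 distractor types and their division into nine three-distractor sessions can be illustrated with a short enumeration; the chunking below is only one possible partition, since the actual assignment of distractor triplets to Experiments 4–12 is not specified here.

```python
from itertools import product

# Feature values taken from the unidimensional experiments described above;
# texture labels are shorthand for the patterns in the text.
colors = ["orange", "green", "pink"]
shapes = ["triangle", "house", "square"]
textures = ["dot", "tilted pound key", "solid gray"]

# 3 colors x 3 shapes x 3 textures = 27 tridimensional distractor types.
distractors = list(product(colors, shapes, textures))

# One possible partition into nine sessions of three distractor types each
# (the actual assignment to Experiments 4-12 may have differed).
sessions = [distractors[i:i + 3] for i in range(0, len(distractors), 3)]
```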
<bold>Design</bold>
In each experiment, participants searched for the target among one of three types of distractors (e.g., in Experiment 1, participants searched for the red target among orange, pink, or green distractors). We also included a target-only condition where no distractors were presented. For each type of distractor, there were four distractor set sizes: 1, 4, 9, and 19. In total, each experiment contained 13 conditions (three distractor types × four set sizes, plus the target-only condition), each repeated 48 times, for a total of 624 trials. Sample displays are shown in Figure 2.
<anchor name="fig2"></anchor>
<bold>Procedure</bold>
Each trial began with a black cross appearing for 0.5 s at the center of the screen over a white background. A search display followed. Participants were asked to search for the target among distractors and report whether the black square dot was on the left or right side of the target by pressing the corresponding left or right arrow key on the keyboard. The search display remained on the screen for 5 s or until a response was made by the participants, whichever occurred first. Visual feedback (“Correct!” or “Wrong!”) was provided after each trial, lasting for 0.5 s. The trial then ended with a white background displayed for an interval of 0.5 s.
<bold>Behavioral–Computational Predictive Approach</bold>
When observers search for a known target among sufficiently different distractor items, processing occurs in parallel and simultaneously at all item locations, within a sufficiently large functional viewing field (Hulleman & Olivers, 2017). In the present study, we expected participants would perform a parallel search over the whole search display because we used stimuli similar to those in previous studies where parallel search was obtained (see Buetti et al., 2019; Xu, Lleras, & Buetti, 2021). Such parallel processing is considered to be unlimited in capacity, with search items being processed in an independent and exhaustive manner (Buetti et al., 2016). At each location, a contrast signal between the target template and the search item is computed. This contrast signal accumulates stochastically until reaching a rejection threshold, indicating that the item is no longer considered a potential target (Buetti et al., 2016; Lleras et al., 2020; Townsend & Ashby, 1983). Items that are not rejected during this parallel processing stage are then scrutinized serially until the target is identified. In easy searches, when the target is surrounded by only one type of sufficiently different distractors, the only item that survives the parallel stage is typically the target (e.g., Buetti et al., 2016, 2019; Xu, Lleras, & Buetti, 2021).
The stochastic contrast accumulation that happens in parallel at all locations across the whole display produces a signature logarithmic increase in RT as a function of set size (Buetti et al., 2016; Lleras et al., 2020; Townsend & Ashby, 1983). The logarithmic slope LS indexes the time required for a single distractor to be rejected, which is influenced by the visual distinctiveness of the target in relation to the distractor. This distinctiveness term refers to a target–distractor perceptual difference computed in a top-down fashion—that is, a computation of how perceptually different the target is from the distractor. This distinctiveness is different from the concept of purely bottom-up contrast, which is a computation of how perceptually different an element in the scene is from its immediate surroundings (i.e., the background). The more distinctive the target, meaning the more dissimilar the target and distractors are, the shorter the time needed for this type of distractor to reach the rejection threshold, and the shallower the slope LS will be. Target contrast signal theory proposed that the steepness of the logarithmic slope LS is inversely proportional to the overall top-down contrast/distinctiveness signal D (Equation 1):<anchor name="eqn1"></anchor>
LS = α/D, (1)
with LS being the logarithmic slope and α being a multiplicative constant factor (Lleras et al., 2020).
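In practice, a logarithmic slope can be estimated by regressing mean RTs onto the log of the total set size. The sketch below uses made-up RT values and assumes the RT = RT0 + LS × ln(N + 1) form described in the text.

```python
import numpy as np

set_sizes = np.array([1, 4, 9, 19])                 # distractor set sizes used here
mean_rts = np.array([520.0, 545.0, 560.0, 575.0])   # hypothetical condition means (ms)

# Total set size is distractors plus the target, hence N + 1.
x = np.log(set_sizes + 1)

# Ordinary least squares: slope is LS (ms per log unit), intercept approximates RT0.
LS, RT0 = np.polyfit(x, mean_rts, 1)
```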
In the present study, we estimated the logarithmic search slopes for each of the three target–distractor color pairs (Experiment 1, unidimensional color search), three shape pairs (Experiment 2, unidimensional shape search), and three texture pairs (Experiment 3, unidimensional texture search). We then used these logarithmic slope values to predict the tridimensional search slopes LSc,s,t, where the target and distractors differed along color, shape, and texture. We considered 10 different predictive models (discussed in the Models Retained for Model Comparison section), each based on a unique assumption about how the contrast signals along the three dimensions combine to guide attention in the tridimensional search conditions. That is, for each model, we computed the predicted search RTs for each condition of a specific distractor type and a set size level, using Equation 2 (Lleras et al., 2020):<anchor name="eqn2"></anchor>
RTpredicted = RT0 + LSc,s,t × ln(N + 1), (2)
where RT0 represents the RT in the target-only condition and LSc,s,t represents the predicted search slope in tridimensional search. The final term is the natural logarithm of the total set size, including all distractors plus the target.
Next, we compared the predicted RTs with the observed RTs across all distractor type by set size conditions in Experiments 4–12 (tridimensional search) by regressing the observed RTs onto the predicted RTs (Equation 3):<anchor name="eqn3"></anchor>
RTobserved = a + b × RTpredicted + ε, (3)
where a and b represent the intercept and slope of the regression, and ε represents the residual error. A model that accurately predicts tridimensional search performance should produce a regression slope close to 1, an intercept close to 0, and a large proportion of variance explained (R²).
In sum, for each tested model, we computed a set of predicted logarithmic slope LSc,s,t values, which, in turn, allowed us to compute a set of predicted RTs (RTpredicted) for all the conditions run in Experiments 4–12, using Equation 2. These predicted RTs were then compared to the observed RTs (using Equation 3) to determine how well each model predicted the tridimensional search performance. Overall, there were 108 mean RTs (27 tridimensional distractor types × 4 set size levels) predicted by each model. The validity of the 10 models was compared based on their ability to account for the observed RTs, as indexed by the R² values, slopes, and intercepts of the corresponding regressions.
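The evaluation step can be sketched as a simple linear regression of observed on predicted RTs (Equation 3); the helper below is an illustrative reconstruction, with R² computed from the regression residuals.

```python
import numpy as np

def evaluate_model(rt_predicted, rt_observed):
    """Regress observed mean RTs onto model-predicted RTs and summarize the fit."""
    slope, intercept = np.polyfit(rt_predicted, rt_observed, 1)
    fitted = intercept + slope * rt_predicted
    ss_res = np.sum((rt_observed - fitted) ** 2)
    ss_tot = np.sum((rt_observed - rt_observed.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Here, the slope and intercept quantify systematic over- or underestimation of the observed RTs, while R² indexes the proportion of variance explained.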
In addition to utilizing our behavioral-computational predictive approach, we also applied a curve-fitting approach based on Minkowski's generalized distance metric, in which the overall contrast signal is expressed as (Equation 4):<anchor name="eqn4"></anchor>
Dc,s,t = (Dcolor^p + Dshape^p + Dtexture^p)^(1/p), (4)
which equals:<anchor name="eqn5"></anchor>
LSc,s,t = ((1/LScolor)^p + (1/LSshape)^p + (1/LStexture)^p)^(-1/p). (5)
The optimal value of Minkowski's exponent p was estimated by fitting the observed tridimensional search data. A value of p = 1 corresponds to a collinear (city-block) combination of the unidimensional contrast signals, whereas p = 2 corresponds to an orthogonal (Euclidean) combination (Garner, 1974).
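A minimal sketch of the exponent-fitting idea, assuming the Minkowski combination rule expressed in slope space (reciprocal slopes stand in for contrast signals, so the proportionality constant cancels); a simple grid search stands in for whatever optimizer the authors used, and all slope values are hypothetical.

```python
import numpy as np

def minkowski_slope(ls_uni, p):
    """Combine unidimensional slopes (LSc, LSs, LSt) with Minkowski exponent p."""
    ls_uni = np.asarray(ls_uni, dtype=float)
    return float(np.sum((1.0 / ls_uni) ** p) ** (-1.0 / p))

def fit_p(uni_slopes, observed_slopes, p_grid=np.linspace(0.5, 4.0, 351)):
    """Grid search for the p minimizing squared error across distractor types."""
    sse = [sum((minkowski_slope(u, p) - o) ** 2
               for u, o in zip(uni_slopes, observed_slopes))
           for p in p_grid]
    return float(p_grid[int(np.argmin(sse))])
```

With p = 1 the rule reduces to collinear (city-block) integration, and with p = 2 to orthogonal (Euclidean) integration.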
<bold>Models Retained for Model Comparison</bold>
Models Relying on Guidance From One Feature Dimension
The models in this category are based on proposals from previous literature that suggest only a single feature dimension leads the multidimensional search (e.g., Alexander et al., 2019; Williams, 1967).
Model 1: Color-Only Model
This model assumes that the overall contrast signal in a tridimensional search is determined exclusively by the contrast in the color dimension (Equation 6). This model is rooted in a long-held belief that color is such a distinct feature dimension that it can overshadow other feature dimensions when they are presented together. This model is consistent with the findings of Williams (1967), who observed that when the target was characterized by color, shape, and size, it was primarily the color dimension that guided visual fixations:<anchor name="eqn6"></anchor>
Dc,s,t = Dcolor, (6)
which in turn means:<anchor name="eqn7"></anchor>
LSc,s,t = LScolor. (7)
Here, LSc,s,t represents the predicted slope for distractors that differ from the target in color, shape, and texture. LScolor refers to the slope observed in Experiment 1, where distractors differed in color from the target.
Model 2: Best Feature Guidance Model
This model assumes that performance is determined solely by the largest contrast signal among the three relevant dimensions, with the other two signals being ignored (Equation 8). It is equivalent to saying that observers identify the feature dimension that most effectively differentiates items in the scene and concentrate exclusively on it to reject distractors. In other words, the search slope in the tridimensional search task is the same as the search slope of the most efficient unidimensional condition (Equation 9).
This model is conceptually similar to Guided Search 2.0 (Wolfe, 1994), which posits that within a specific feature dimension (e.g., color), a broadly tuned channel (e.g., red) that most effectively distinguishes the target from distractors is selected to accumulate activation in that feature map (e.g., color map). However, while Guided Search assumes that such selection occurs at the feature level, the best feature guidance model posits that it happens at the dimension level. This model also aligns with the finding in Williams (1967) that when the target was defined by both color and size, and the target size was at the largest level, size could guide attention. This suggests that the feature dimension determining the overall guidance varies based on the utility of the available feature dimensions, rather than being fixed to a specific one:<anchor name="eqn8"></anchor>
Dc,s,t = max(Dcolor, Dshape, Dtexture), (8)
which translates to<anchor name="eqn9"></anchor>
LSc,s,t = min(LScolor, LSshape, LStexture). (9)
LSshape and LStexture represent the slopes where distractors have different shapes (observed in Experiment 2) or textures (observed in Experiment 3) than the target.
Note that Equation 9 is applied to each type of distractor in each of Experiments 4–12. This means that the winning feature dimension is determined independently for each type of distractor, rather than being fixed across different distractor types within or across different experiments.
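In slope space, the best feature guidance model reduces to taking the minimum unidimensional slope per distractor type (Equation 9); the slope values below are hypothetical.

```python
def best_feature_slope(ls_color, ls_shape, ls_texture):
    """Predicted tridimensional slope: the shallowest (most efficient) unidimensional slope."""
    return min(ls_color, ls_shape, ls_texture)

# Example: if color yields the shallowest slope, color alone determines guidance.
predicted = best_feature_slope(12.0, 30.0, 45.0)  # -> 12.0
```

Because the minimum is taken separately for each distractor type, different distractor types within the same experiment can be "won" by different dimensions.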
Models Including Unweighted Color, Shape, and Texture
The models in this category assume that all relevant feature dimensions contribute signals in tridimensional searches, which is consistent with a number of visual search theories (e.g., Bundesen, 1990; Wolfe, 2021; Zelinsky, 2008).
Model 3: Three-Way Orthogonal Combination Model
This model assumes that contrast signals along color, shape, and texture combine orthogonally to form the overall contrast signal (Equation 10). This type of integration was shown previously in Xu, Lleras, and Buetti (2021), where the authors found that the overall contrast between the target and distractors that differ along both shape and texture was determined by the orthogonal sum of the two unidimensional contrast vectors. We hypothesized that color contrast would add to the overall contrast in the same orthogonal fashion:<anchor name="eqn10"></anchor>
Dc,s,t = sqrt(Dcolor^2 + Dshape^2 + Dtexture^2), (10)
which translates to<anchor name="eqn11"></anchor>
LSc,s,t = 1/sqrt((1/LScolor)^2 + (1/LSshape)^2 + (1/LStexture)^2). (11)
Model 4: Three-Way Collinear Integration Model
This model assumes that contrast signals along color, shape, and texture combine collinearly to form the overall contrast signals (Equation 12). This type of integration was shown previously in Buetti et al. (2019), where the authors found that the overall contrast between the target and distractors that differ along both color and shape was determined by the collinear sum of the two unidimensional contrast vectors. We hypothesized that texture contrast would add to the overall contrast in the same collinear fashion:<anchor name="eqn12"></anchor>
which translates to<anchor name="eqn13"></anchor>
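Assuming that each unidimensional contrast is inversely proportional to its logarithmic slope (a simplification of the article's framework; the function name is ours), collinear combination amounts to summing the three contrasts directly:

```python
def collinear_slope(ls_color, ls_shape, ls_texture):
    # Three-way collinear combination (Model 4): contrasts (taken here
    # as the reciprocals of the unidimensional slopes) add linearly,
    # following a city-block metric.
    combined = sum(1.0 / ls for ls in (ls_color, ls_shape, ls_texture))
    return 1.0 / combined
```

For any given slopes, this rule predicts a more efficient search than an orthogonal pooling of the same contrasts, because linear summation yields a larger combined signal than a Euclidean one.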
Model 5: Color Collinear–Shape/Texture Orthogonal Integration Model
This model is based on two findings: that color and shape contrast signals combine collinearly (Buetti et al., 2019) and that texture and shape contrast signals combine orthogonally (Xu, Lleras, & Buetti, 2021). These results led to the hypothesis that, because texture and shape are integral features, the texture and shape contrast signals would combine orthogonally (i.e., following a Euclidean metric; Garner, 1974), and that color, being separable from shape, would simply add to this combination (i.e., following a city-block metric; Garner, 1974) to form the overall contrast (Equation 14). The name of this model emphasizes that color is collinearly added to the orthogonal combination of shape and texture in the final step:<anchor name="eqn14"></anchor>
which translates to<anchor name="eqn15"></anchor>
Model 6: Texture Orthogonal–Color/Shape Collinear Combination Model
This model is a variation of Model 5. Specifically, it assumes that color and shape contrast signals first combine collinearly, and then the texture contrast is orthogonally combined with this collinear sum (see Equation 16). The name of this model emphasizes that texture is orthogonally combined with the collinear sum of color and shape contrasts as the final step:<anchor name="eqn16"></anchor>
which solves into<anchor name="eqn17"></anchor>
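The two mixed-integration models (Models 5 and 6) differ only in which pair of contrasts is pooled first. Assuming each unidimensional contrast is the reciprocal of its logarithmic slope (function names are ours), both can be sketched as:

```python
import math

def color_collinear_slope(ls_color, ls_shape, ls_texture):
    # Model 5: shape and texture contrasts combine orthogonally
    # (Euclidean metric) first; the color contrast then adds
    # collinearly (city-block metric) to that pooled sum.
    shape_texture = math.hypot(1.0 / ls_shape, 1.0 / ls_texture)
    return 1.0 / (1.0 / ls_color + shape_texture)

def texture_orthogonal_slope(ls_color, ls_shape, ls_texture):
    # Model 6: color and shape contrasts add collinearly first; the
    # texture contrast then combines orthogonally with that sum.
    color_shape = 1.0 / ls_color + 1.0 / ls_shape
    return 1.0 / math.hypot(color_shape, 1.0 / ls_texture)
```

Even with identical unidimensional slopes, the two orderings yield different predicted tridimensional slopes, which is what allows the model comparison to discriminate between them.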
Models Including Weighted Color, Shape, and Texture
In Models 3–6, we assumed that the contrast signals along the three dimensions are equally utilized in forming the overall guidance. However, the extent to which people utilize each dimension might not be uniform (e.g., Bundesen, 1990; Gaspelin & Luck, 2018; Liesefeld et al., 2018; Wolfe, 1994). This idea was computationally explored in Xu, Lleras, Gong, and Buetti (2024), wherein the authors utilized the paradigm introduced by Buetti et al. (2019), using the search slopes in unidimensional color searches and shape searches to predict search performance in bidimensional color and shape searches. The critical manipulation in Xu, Lleras, Gong, and Buetti was an instruction manipulation in the bidimensional searches: One group of participants was instructed to search for the target color, another group was instructed to search for the target shape, and a third group was instructed to search for the target defined by both color and shape. The results showed that the manipulation of which feature dimension participants focused on was captured by corresponding changes in that dimension’s weight parameter. These findings suggested that observers might be able to allocate varying degrees of attentional priority to different feature dimensions as a function of the experimental conditions. In the context of the present study, the notion of attentional priority or attentional weight becomes relevant when one dimension provides a larger contrast signal than the others, making it more informative, or when one dimension is naturally preferred by the human visual system (e.g., color; see Alexander et al., 2019; Williams, 1967). In such cases, there might be an imbalance in the attentional weight placed on different feature dimensions, influencing the extent to which people utilize contrast signals along each dimension to guide their attention.
Therefore, for the four models that incorporate signals from all three dimensions, we considered a variation where an attentional weight parameter was added to each dimension.
Model 7: Weighted Three-Way Orthogonal Combination Model
This model introduces an attentional weight parameter to each of the color, shape, and texture components (Equation 18), building upon the original three-way orthogonal combination model (Model 3). The sum of the three weights is constrained to equal 3, ensuring comparability with the original model:<anchor name="eqn18"></anchor>
which solves into<anchor name="eqn19"></anchor>
with the constraint that<anchor name="eqn20"></anchor>
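A sketch of this weighted variant, with the caveat that whether each weight scales the contrast itself or its square is our assumption; the article's exact parameterization is given in Equations 18–20:

```python
import math

def weighted_orthogonal_slope(slopes, weights):
    # Weighted three-way orthogonal combination (Model 7): `slopes` and
    # `weights` are (color, shape, texture) triples. The weights are
    # constrained to sum to 3 so that the unweighted model is recovered
    # when all three weights equal 1. Contrasts are taken as the
    # reciprocals of the unidimensional logarithmic slopes.
    assert abs(sum(weights) - 3.0) < 1e-9, "weights must sum to 3"
    combined = math.sqrt(sum((w / ls) ** 2
                             for ls, w in zip(slopes, weights)))
    return 1.0 / combined
```

Overweighting one dimension (e.g., color) amplifies its contribution to the pooled contrast relative to the others while keeping the total weight budget fixed.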
Model 8: Weighted Three-Way Collinear Integration Model
This model also introduces an attentional weight parameter for each dimension (Equation 21), following the framework of the original three-way collinear integration model (Model 4), while maintaining the constraint that the sum of the three weights equals 3:<anchor name="eqn21"></anchor>
which solves into<anchor name="eqn22"></anchor>
Model 9: Weighted Color Collinear–Shape/Texture Orthogonal Integration Model
This model introduces weight parameters to both the color component and the combined shape and texture component (Equation 23), based on the original color collinear–shape/texture orthogonal integration model (Model 5). The sum of the two weights should be 2, maintaining consistency with the original model:<anchor name="eqn23"></anchor>
which solves into<anchor name="eqn24"></anchor>
with the constraint that<anchor name="eqn25"></anchor>
Model 10: Weighted Texture Orthogonal–Color/Shape Collinear Combination Model
This model adds weight parameters to both the combined color and shape component and the texture component (Equation 26), based on the original texture orthogonal–color/shape collinear combination model (Model 6). Similarly, the sum of these two weights should be 2:<anchor name="eqn26"></anchor>
which solves into<anchor name="eqn27"></anchor>
with the constraint that<anchor name="eqn28"></anchor>
Estimation of the Attentional Weight w
For all the weighted models, the optimal values for the attentional weight
<bold>Search Slopes Observed in Unidimensional Search Experiments</bold>
Figure 3 shows the changes in search times as a function of the stimulus set size for each target–distractor pair in Experiments 1–3. Table 2 summarizes the logarithmic slopes observed in these experiments.
<anchor name="fig3"></anchor>
<anchor name="tbl2"></anchor>
<bold>Search Slopes Observed in Tridimensional Search Experiments</bold>
Table 3 summarizes the logarithmic search slopes observed in Experiments 4–12.
<anchor name="tbl3"></anchor>
<bold>Model Comparison</bold>
Table 4 shows the performance of the 10 models tested in the study, along with the Minkowski
<anchor name="tbl4"></anchor>
Single Dimension Models
Among the models relying on one feature dimension, Model 2 (best feature guidance model;
Weighted Versus Unweighted Models
Adding weight terms substantially increased the model fit. Specifically, Model 7 (weighted three-way orthogonal model), Model 8 (weighted three-way collinear model), Model 9 (weighted color collinear–shape/texture orthogonal integration model), and Model 10 (weighted texture orthogonal–color/shape collinear combination model) were 8.8 × 10<sups>10</sups>, 2.3 × 10<sups>13</sups>, 1.4 × 10<sups>11</sups>, and 5 times more likely, respectively, than their corresponding unweighted models (Models 3–6) to account for the variability in the observed data. We can conclude that, although participants were integrating information across the three feature dimensions, not all information contributed equally to guidance. The results consistently suggested that participants weighed the information coming from the color dimension more heavily than that coming from the shape and texture dimensions (Model 7, weighted three-way orthogonal model:
Optimal Minkowski’s r
The Minkowski
In the current set of experiments, the optimal
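Collinear and orthogonal pooling can be viewed as the r = 1 and r = 2 special cases of a Minkowski metric over the unidimensional contrasts. A sketch, under our working assumption that each contrast is the reciprocal of its logarithmic slope:

```python
def minkowski_slope(slopes, r):
    # Generalized Minkowski combination of unidimensional contrasts
    # (taken as the reciprocals of the logarithmic slopes): r = 1
    # reproduces the collinear (city-block) rule, r = 2 the orthogonal
    # (Euclidean) rule, and intermediate r values interpolate between
    # the two.
    combined = sum((1.0 / ls) ** r for ls in slopes) ** (1.0 / r)
    return 1.0 / combined
```

Treating r as a continuous parameter is what allows an optimal value to be estimated from the data rather than fixed a priori at 1 or 2.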
The Winning Model
In comparing the four weighted models, the
To visualize the model performance, Figure 4 displays the observed RTs from Experiments 4–12 as a function of the predicted RTs for the four weighted models. These models were constructed based on the most current understanding of how color, shape, and texture combine—that is, color and shape are presumed to combine collinearly, and shape and texture orthogonally, with room for attentional modulation. Note that each panel displays 108 predicted RTs (i.e., four set size levels by 27 tridimensional distractor types).
<anchor name="fig4"></anchor>
<bold>Optimal Weight Stability Analysis</bold>
The weighted models (Models 7–10) all contained attentional weight parameters that were estimated on the entire data set. When comparing the weighted models to unweighted models, the implicit assumption is that weight parameters are not free parameters, but rather a characteristic of the data set that is inherent to the condition, and thus should not be counted as an additional parameter in the model. To test the validity of this assumption, we performed a split-half analysis to validate the optimal weights.
For this analysis, we estimated optimal weights of the two top-performing weighted models (Models 7 and 9) using a training data set made up of half of the total data set and assessed their predictive accuracy on the remaining half of the data (the testing set). We began by using the complete data set from the unidimensional experiments (i.e., Experiments 1–3) to estimate unidimensional search slopes. Next, we randomly sampled half of the tridimensional trials to determine optimal weight parameters during tridimensional searches and constructed the two top-performing weighted models (Models 7 and 9). We then tested these models on the remaining half of the tridimensional trials. This process was repeated 100 times to arrive at the parameter estimates and model performance presented in Table 5 and Figure 5. Results showed that both weighted models, constructed based on the training sets, successfully predicted data in the testing sets (
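The split-half procedure can be schematized as follows; the data representation, fitting interface, and function names here are hypothetical sketches, not the study's actual analysis code:

```python
import random

def split_half_r2(trials, fit, predict, n_iter=100):
    # Schematic split-half validation: repeatedly fit parameters on a
    # random half of the trials (the training set) and score the
    # predictions on the held-out half (the testing set). `fit` maps a
    # training list to parameters; `predict` maps (params, trial) to a
    # predicted RT. Returns the mean R^2 across iterations.
    r2s = []
    for _ in range(n_iter):
        shuffled = random.sample(trials, len(trials))
        half = len(trials) // 2
        train, test = shuffled[:half], shuffled[half:]
        params = fit(train)
        obs = [t["rt"] for t in test]
        pred = [predict(params, t) for t in test]
        mean_obs = sum(obs) / len(obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        r2s.append(1 - ss_res / ss_tot)
    return sum(r2s) / len(r2s)
```

Averaging over repeated random splits, as done here with 100 iterations, guards against any single lucky or unlucky partition of the trials.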
<anchor name="tbl5"></anchor>
<anchor name="fig5"></anchor>
Results from Experiments 1–12 demonstrated that during tridimensional searches, people likely incorporate information from all the feature dimensions to guide their attention, as the top three performing models all incorporate the contrast signals from all three feature dimensions. Notably, while all three dimensions contribute to search performance, there is a tendency to prioritize signals from the color dimension. This is evidenced by the large optimal weights associated with the color dimension across all weighted models. The optimal Minkowski’s
The fact that color distinctiveness produced more efficient searches than shape and texture might explain why the best feature guidance model outperformed five out of eight models that consider signals from all three dimensions, including all the unweighted tridimensional models and the weighted texture orthogonal–color/shape collinear model. It could also explain why Model 1 (color-only model) outperformed the unweighted Model 3 (three-way orthogonal model) and Model 4 (three-way collinear model). Given that color in this set of experiments tended to have overall stronger guiding signals than shape or texture, models prioritizing the contribution of color had an advantage in explaining the data, regardless of whether the model captured the underlying structure of how signals combine across different dimensions.
Next, we completed a second set of experiments where the contrast signals along the color, shape, and texture dimensions had comparable ranges. The goal was to evaluate whether people would still preferentially allocate a higher attentional weight to the color dimension or whether the weights reflect the usefulness of a feature dimension, in which case, we would expect similar weight parameters for all three dimensions.
Experimental Set 2
A set of 12 experiments was conducted using the same paradigm as before, each with a naïve group of participants, but with a new set of feature parameters that produced more comparable distinctiveness signals across the three visual dimensions, such that color no longer had a guidance advantage over shape and texture.<anchor name="b-fn1"></anchor><sups>1</sups> All data, analysis code, and research materials are available on the Open Science Framework (Xu, Lleras, & Buetti, 2024) at
<bold>Participants</bold>
Undergraduate students from the University of Illinois at Urbana–Champaign completed the experiment in exchange for course credit. Since this experimental set was conducted in person, the sample size was determined based on previous in-person experiments in our lab, which showed that averaging the data of 20 subjects produces stable estimates of the group means of reaction times and search slopes for a given search condition (e.g., Buetti et al., 2016; Madison et al., 2018; Ng et al., 2018; Wang et al., 2018) and is sufficient to obtain differentiation between models (e.g., Buetti et al., 2019; Lleras et al., 2019; Wang et al., 2017; Xu, Lleras, & Buetti, 2021; Xu, Lleras, Shao, & Buetti, 2021). Similar to Set 1, a post hoc power analysis (see the Appendix) showed that satisfying model distinguishability started around a sample size of 20, and weight parameters stabilized around a sample size of 10 for Set 2. We aimed to include 25 valid participants in each in-person experiment, but because of the nature of data collection, sometimes more participants ended up being run (e.g., participants had already signed up to participate in the experiment).
For each experiment, participant inclusion criteria were the same as in Set 1 except for the following: (a) Search accuracy was now calculated as the number of trials in which participants made a correct response divided by the total number of trials (i.e., we did not remove time-out trials before calculating accuracy), and (b) an individual’s average RT had to fall within 2.5 standard deviations of the group average RT, instead of 2 standard deviations in Set 1. The number of recruited participants, the number of participants included in the analysis, the number of participants excluded for each criterion, demographic information of included participants, and summary statistics of the measurements in each experiment are reported in Table 6.
<anchor name="tbl6"></anchor>
<bold>Apparatus and Stimuli</bold>
Experiments were again programmed in JavaScript and conducted on Pavlovia; participants completed them in the lab using Mac minis and gamma-corrected 24-in. LCD displays with a 239.75-Hz refresh rate and a resolution of 1,920 × 1,080. Stimuli were randomly displayed on a grid located at the center of the screen, spanning approximately 27 × 27 cm (approximately 25° of visual angle, with a viewing distance of 60 cm). The stimuli, measuring roughly 1.4 × 1.4 cm in physical size (1.3° × 1.3° of visual angle), were positioned with a small random jitter based on three concentric circular grids, measuring about 25.5 cm, 13.7 cm, and 7.4 cm in diameter, respectively. Examples of stimuli are shown in Figure 6.
<anchor name="fig6"></anchor>
Unidimensional Experiments
In Experiments 13–15, the target was always a red (
<anchor name="tbl7"></anchor>
Tridimensional Experiments
In Experiments 16–24, the target was the same as in Experiments 13–15 (a red octagon with a white cross texture inside), and the distractors were constructed by combining all the distractor colors, shapes, and textures used in Experiments 13–15. All other aspects were the same as in the first set of experiments.
<bold>Design</bold>
The design and procedure were the same as in the first set of experiments, except that in this set, each type of distractor appeared at five set sizes: 2, 4, 9, 19, and 31. We also included a target-only condition in which no distractors were presented. In total, each experiment contained 16 conditions, each repeated 40 times, for a total of 640 trials.
<bold>Behavioral–Computational Predictive Approach</bold>
The same predictive approach as used in Experimental Set 1 was adopted for this set of experiments. The optimal value of Minkowski’s
<bold>Search Slopes Observed in Unidimensional Search Experiments</bold>
Figure 7 shows how the search times changed as a function of the stimulus set size for each target–distractor pair in Experiments 13–15. Table 7 summarizes the logarithmic slopes observed in these experiments.<anchor name="b-fn2"></anchor><sups>2</sups>
<anchor name="fig7"></anchor>
<bold>Search Slopes Observed in Tridimensional Search Experiments</bold>
Table 8 summarizes the logarithmic search slopes observed in Experiments 16–24.
<anchor name="tbl8"></anchor>
<bold>Model Comparison</bold>
Table 9 shows the performance of the 10 models tested for this set of experiments and the Minkowski
<anchor name="tbl9"></anchor>
Single Dimension Models
Model 2 (best feature guidance model) and Model 1 (color-only model) exhibited poorer performance compared to the other eight models that combine contrast signals from all three dimensions. This result aligns with what we found in the first set of experiments, indicating that during tridimensional searches, people likely incorporate signals from more than one dimension to guide their search. This finding contrasts with theories that rely solely on color, as suggested by Williams (1967), and with those positing that only the most informative dimension guides the search.
Weighted Versus Unweighted Models
The results show that adding the weight terms increased the model fit. Specifically, weighted Models 7–10 were 4.6 × 10<sups>12</sups>, 1.4 × 10<sups>4</sups>, 2.3 × 10<sups>4</sups>, and 1.3 times more likely, respectively, than their corresponding unweighted models to explain the observed data. Similar to the first data set (Experiments 1–12), participants favored information from the color dimension, as evidenced by a systematic bias in the dimensional weights: Across all the weighted models, the weight assigned to color exceeded 1 (see Table 9). This trend is particularly evident in the winning weighted three-way orthogonal combination model (Model 7), where the color weight (
The Winning Model
The winning model was Model 7 (weighted three-way orthogonal model), which was nearly 30,000 times more likely than the second best performing Model 9 (weighted color collinear–shape/texture orthogonal model; and even more for the rest of the models) in explaining the observed data (see the full AIC comparison results in Table 9).
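The article does not spell out how "times more likely" is computed, but such evidence ratios conventionally follow from AIC differences; a sketch assuming that convention:

```python
import math

def aic_evidence_ratio(aic_better, aic_worse):
    # Relative likelihood of two models from their AIC values: the
    # lower-AIC model is exp(delta_AIC / 2) times more likely to be
    # the better (information-loss-minimizing) description of the data.
    return math.exp((aic_worse - aic_better) / 2.0)
```

Under this convention, a ratio of roughly 30,000 corresponds to an AIC difference of about 20.6 points between the two models.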
To visualize model performance, Figure 8 displays the observed RTs from Experiments 16–24 as a function of the predicted RTs for the four weighted models. Note that in each figure, there are 135 RTs being predicted (i.e., 5 set size levels by 27 tridimensional distractor types).
<anchor name="fig8"></anchor>
<bold>Optimal Weight Stability Analysis</bold>
We performed a split-half analysis to validate the optimal weights in Experimental Set 2 using the same procedure as in Set 1. As is shown in Table 10, both weighted models constructed based on the training set successfully predicted performance on the testing set, achieving almost identical
<anchor name="tbl10"></anchor>
<anchor name="fig9"></anchor>
In the second set of experiments, we adjusted the distinctiveness signals along the color dimension so that it would not have a guidance advantage over shape and texture, as it did in Experimental Set 1. This adjustment led to several differences in people’s search performance compared to the first set of experiments.
<bold>Optimal Attentional Weights</bold>
A quick visual comparison between Tables 4 and 9 reveals that, across all weighted models, the color (or color and shape integrated) weights were always substantially larger in the first set of experiments compared to the second set. These results suggest that, when the target–distractor distinctiveness across the three feature dimensions was more comparable, participants placed relatively less priority on the color dimension. However, observers still continued to prioritize color relative to the other dimensions (
We also observed that the performance of Model 2 (best feature guidance model) and Model 1 (color-only model) was worse in the second set of experiments compared to the first set. This confirms that the advantages of these two models in the first set may have stemmed from the fact that, more often than not, color produced stronger guiding signals than shape or texture in that stimulus set. Once we better controlled for the strength of guiding signals across the three dimensions, these two models lost their advantages and
<bold>Optimal Minkowski’s r</bold>
The change in priority allocated to the three dimensions is also evident in the optimal values of Minkowski’s
<bold>Winning Models</bold>
Interestingly, the weighted three-way orthogonal model and weighted color collinear–shape/texture orthogonal model were the top two performing models across both data sets (see Figures 4 and 8), despite differences in the stimuli and experimental conditions (online for the first set vs. in-lab for the second set). Furthermore, the relative changes in weights in these models as well as the change in Minkowski’s
Additional Analyses: Reliability Analyses of the Winning Models Across Two Sets
Across two data sets, we demonstrated that two models, namely, the weighted color collinear–shape/texture orthogonal model in the first set of experiments and the weighted three-way orthogonal model in the second set of experiments, had great predictive power and were substantially more likely than any other candidate models. Consequently, the mathematical laws underlying feature combination appear to vary as a function of the ecology of the experimental conditions. This dependence limits the generalizability of the results.
To better understand how the winning model varies with experimental conditions, we bootstrapped the data sets to obtain the range of the two top-performing models’
<anchor name="fig10"></anchor>
<anchor name="tbl11"></anchor>
We also sampled participants with replacement 100 times to examine the reliability of the winning model across participants (Figure 10, bottom). The results showed that the weighted three-way orthogonal model achieves a higher
The conclusion that an orthogonal combination model might be more universal is also supported by recent findings from Hughes et al. (2024), who found stronger evidence for an orthogonal combination model when using a Bayesian modeling technique to evaluate feature combination rules at the participant level across two dimensions (color and shape). That said, as Hughes et al. noted, we acknowledge that it is difficult to conclude decisively which model is best from a single calculation over participants’ aggregated data, given within- and between-participant variability as well as variability induced by stimulus selection. The two winning models produce very similar
General Discussion
Visual search in real life is almost always multidimensional. Looking for a phone among laptops, keyboards, books, and mugs, which differ in color, shape, texture, size, etc., happens much more often than looking for an object that differs from surrounding objects along only one specific feature, like size or color. When visual information is available along multiple dimensions, how does the human visual system utilize these various sorts of information to find a target? Does the visual system utilize them all? Do we prioritize some dimensions and deprioritize others? Is a multidimensional search more efficient than comparable unidimensional searches, or does the additional informational load in multidimensional searches incur a processing cost? The results from the present study provide us with an initial set of answers to these questions. Yes, the visual system can use all the information available, along at least three feature dimensions, to guide attention. Furthermore, the attentional guidance system values information along some visual dimensions, like color, more than others, like shape and texture. Finally, multidimensional search is more efficient than comparable unidimensional searches. These answers represent an initial stepping stone toward understanding attentional guidance in real-life search scenarios.
Previous research had demonstrated that, in bidimensional searches, where the target differed from distractors along two dimensions, distinctiveness signals from each dimension combine in a lawful manner to guide attention (e.g., Buetti et al., 2019; Hughes et al., 2024; Xu, Lleras, & Buetti, 2021). The present study pushed these initial investigations further to understand how distinctiveness signals along three feature dimensions—specifically, color, shape, and surface texture—integrate to produce overall top-down guidance in tridimensional searches. Across two sets of experiments, we also tested how the feature integration rules vary with different stimuli. In the first set, color tended to provide larger guidance signals; in the second set, guidance from the three feature dimensions was more comparable. As a reminder, in tridimensional searches, the target never shared any features with the distractors; therefore, the visual signals from all three feature dimensions differentiated the target from the distractors. Several findings emerged from this study.
<anchor name="fig11"></anchor>
These findings offer insights into the properties of target templates. They suggest that, at least for the three dimensions tested here, the target template representation contains information along all of them and that they are all utilized by the human visual system to guide search. This conclusion seems to stand in contrast with Williams (1967), where the author concluded that participants’ fixations during search were mainly guided by color when the target was defined along two (i.e., color and shape, or color and size) or three dimensions (i.e., color, shape, and size).
There are several possible ways of reconciling Williams’s and the current results. In Williams (1967), stimuli consisted of forms in specific colors, shapes, and sizes, each containing a two-digit number. Participants searched for a target number and were provided with a verbal description containing varying amounts of information regarding the color, shape, and size of the associated form right before each search trial. The search displays were always heterogeneously composed of 100 stimuli, each defined by a unique combination of colors, shapes, and sizes. Because the target information changed randomly throughout the experiment, there was no fixed visual target template when participants performed the search. Previous research has shown that search slows down when the target template is verbally presented rather than visually shown (e.g., Malcolm & Henderson, 2009) and when the target template (especially the feature dimensions defining the target) changes across trials (e.g., Krummenacher et al., 2001; Lleras et al., 2025; Müller et al., 2003). Due to the varying nature of the target in Williams’s study, it may have been difficult for participants to build a stable and useful target template. Consequently, participants might have adopted a minimal effort strategy (Irons & Leber, 2016, 2020), using the easiest feature dimension to narrow down the possible target locations, then serially fixating on each item until finding the target. In contrast, in our task, the target was defined by a fixed color, shape, and texture throughout the task, making it easier for participants to create and maintain a stable target template containing information from all three feature dimensions to guide the search.
Also, the stimuli selection in Williams’s study might have been biased toward the color dimension. In Williams’s study, search was faster when the target was specified by color (mean time = 7.6 s) compared to when it was specified by size (mean time = 16.4 s) or shape (mean time = 20.7 s), and as a baseline, searching based on the target number alone produced a mean time of 22.8 s. These results reflect a situation where color information might have been prioritized over the other dimensions simply because it carried larger contrast signals than the other dimensions, and it likely minimized the contribution of the latter two dimensions to attentional guidance. In the present study, we ran two separate experimental sets manipulating the relative usefulness of the color dimension, and we were able to isolate the color advantage produced by the specific color features from any inherent preference toward the color dimension in the human visual system.
Additionally, our computational methodology is likely more sensitive in detecting the contributions of various features to guidance, even in cases where one visual dimension appears more useful than the others (as observed in the first set of experiments). It is possible that Williams’s (1967) method, which primarily involved observing where the majority of eye fixations occurred, was not sensitive enough to detect contributions from other dimensions.
There is evidence in both behavioral and neuroscientific studies supporting the idea that people adjust the attentional weight associated with a feature dimension based on its relative usefulness (e.g., Found & Müller, 1996; Grubert et al., 2011; Lee & Geng, 2020; Müller et al., 2003; Xu, Lleras, Gong, & Buetti, 2024; X. Yu & Geng, 2019; J. M. Yu et al., 2025). Behaviorally, Xu, Lleras, Gong, and Buetti (2024) found that a simple instruction manipulation influenced the degree of attention paid to color and shape. Using a singleton search paradigm, Grubert et al. (2011) also found that people searched faster for a bidimensional target than a unidimensional one, but this benefit was stronger when the participants knew they were looking for a bidimensional target compared to when they were expecting a unidimensional target. This indicates that the same stimuli were better utilized when preparing to receive signals from both color and shape dimensions than when preparing to receive a difference signal along only one dimension. Neurologically, Töllner et al. (2008) showed that the N2pc component (a covert attention indicator) appeared earlier in dimension-repetition trials (where the target differed from distractors along the same dimension as the previous trial) compared to dimension-switching trials, which indicates an attentional focus change due to an intertrial effect (also see Gramann et al., 2010; Töllner et al., 2010). Using functional magnetic resonance imaging, Pollmann et al. (2000) found that in an oddball search task, when the target is defined in the same dimension within a block, the associated brain areas are activated to a higher level compared to when the target defining dimension changed across trials, indicating the possibility of attentional focus enhancement on particular feature dimensions. 
Finally, this finding of dynamic weight adjustment also aligns nicely with the concept of weights in artificial neural networks, which are adjusted dynamically based on new input to maximize performance (e.g., Krizhevsky et al., 2017; LeCun et al., 2015; Rumelhart et al., 1986; Thakur & Peethambaran, 2020).
<h31 id="xge-155-3-839-d543e2672">On the Uniqueness of Color as a Guiding Feature</h31>Our results showed that there appears to be an inherent preference for attending to color information in tridimensional searches. The higher emphasis placed on color was evident across both sets of experiments when color generally provided a larger contrast signal than the other two dimensions (Set 1) and when color provided approximately the same degree of featural contrast as shape and texture (Set 2). Since the weight represents how much of the associated unidimensional contrast signal contributes to the tridimensional search guidance, the fact that the color weight (or a combined color and shape weight) value is larger than 1 across all the weighted models indicates an inherent preference or prioritization for the color dimension. This observed preference for color is consistent with earlier findings that people tend to prioritize color signals (e.g., Alexander et al., 2019; Hulleman, 2020; Williams, 1967; Xu, Lleras, Gong, & Buetti, 2024). For instance, Xu, Lleras, Gong, and Buetti (2024) found that regardless of the dimension(s) participants were instructed to focus on (color, shape, or both), the attentional weight associated with the color dimension consistently remained above 1 (and the weight associated with shape was below 1), indicating a persistent prioritization of color regardless of search instructions. These findings on color preference align well with Conway’s (2014) proposal that color is a privileged perceptual dimension that is meant to index interest in the visual world. Color information is also likely more robust to optical and perceptual transformations than shape information. For instance, color information can survive changes in accommodation that typically blur the shape of objects that are beyond the depth of field of the current fixation. 
Color information can also help recover identity information of low-resolution objects presented in the periphery (e.g., Castelhano & Henderson, 2008; Oliva & Schyns, 2000; Oliva & Torralba, 2001; Rousselet et al., 2005; Torralba, 2009; Wurm et al., 1993).
<h31 id="xge-155-3-839-d543e2714">On Model Performance Indices</h31>In both sets of experiments, to indicate model performance, we prioritized the index of R².
Here, we observed a prediction slope of 1.03 and an intercept of 4.22 in Set 1, which would indicate a near-perfect prediction. However, in Set 2, the prediction slope of the winning model was 0.73 and the intercept was 171.74. These values deviate systematically from a perfect prediction, yet the high R² indicates that the model still captured most of the variance in the data.
We should note that we are not overly concerned by the fitted intercept values larger than zero (as was the case in Set 2). Such a constant offset in prediction times is likely to reflect non-search-related processes, such as differences in perceptual encoding, response selection, and response execution. More likely in our experiments, this offset might be capturing differences in overall RTs between groups, since unidimensional and tridimensional searches were conducted on different groups of participants.
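The slope-and-intercept diagnostic used in this section is simply an ordinary least-squares regression of observed on predicted RTs. Below is a minimal sketch with hypothetical data; the function name and the numerical values are ours, not the paper's:

```python
def ols_fit(predicted, observed):
    """Ordinary least-squares fit: observed = slope * predicted + intercept.
    A perfect model yields slope ~ 1 and intercept ~ 0; a positive intercept
    indicates a constant, non-search-related offset in RTs (e.g., differences
    in encoding or response execution between groups)."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical RTs (ms): the model underpredicts by a constant 150 ms offset.
pred = [600.0, 650.0, 700.0, 800.0]
obs = [750.0, 800.0, 850.0, 950.0]
print(ols_fit(pred, obs))  # (1.0, 150.0)
```

A pattern like this one, with a slope of 1 but a large intercept, is exactly the case that would point to a constant offset rather than a failure of the search model itself.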
A fitted slope that deviates from 1 can be seen as a “correction” to the predicted search slope LS, reflecting the extent to which the slopes measured in unidimensional searches underpredict the speed of contrast accumulation in multidimensional searches. Such a misalignment indicates that processing efficiency is higher in multidimensional than in unidimensional search. In fact, this is what we proposed in Xu, Lleras, and Buetti (2021): When shape and texture combine, the overall prediction slope was 0.75, meaning that observed search times increased at only three quarters of the predicted rate, which might arise from coactivation when combining signals from different dimensions. In Set 2, the weighted three-way orthogonal model achieves the highest R² among the candidate models.
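As a concrete illustration of the modeling logic discussed in this section, the following is a minimal sketch of a weighted orthogonal combination of unidimensional distinctiveness signals. The function name, the inverse-slope parameterization of distinctiveness, and the numerical values are our own illustrative assumptions, not the authors' code:

```python
import math

def predict_tri_slope(slope_c, slope_s, slope_t, w_c=1.0, w_s=1.0, w_t=1.0):
    """Predict the tridimensional logarithmic search slope from three
    unidimensional slopes, assuming distinctiveness D = 1 / slope and a
    weighted orthogonal (Euclidean) combination of the three signals.
    Weights above 1 express prioritization of that dimension's signal."""
    d_c, d_s, d_t = 1.0 / slope_c, 1.0 / slope_s, 1.0 / slope_t
    d_tri = math.sqrt((w_c * d_c) ** 2 + (w_s * d_s) ** 2 + (w_t * d_t) ** 2)
    return 1.0 / d_tri

# With three equal unidimensional slopes and unit weights, the predicted
# tridimensional slope shrinks by a factor of 1/sqrt(3).
print(predict_tri_slope(30.0, 30.0, 30.0))  # ~ 17.32
```

Note how increasing any single weight above 1 (e.g., the color weight) further shallows the predicted tridimensional slope, which is the sense in which a weight larger than 1 reflects prioritization of that dimension.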
A limitation of the current methodology is that incorporating signals from three dimensions almost invariably results in very rapid search, producing shallow tridimensional search slopes and thereby leaving limited room to observe differences in modeling. That being said, our best performing models did account for over 90% of the available variance in the data. Next steps might involve continued exploration of feature space with the goal of selecting stimuli that yield smaller unidimensional distinctiveness (i.e., steeper unidimensional slopes), ensuring a larger range of overall distinctiveness when combining the three dimensions.
Furthermore, in the present study, attentional weights were estimated from the observed tridimensional search data, rather than determined a priori, which limits the predictive power of the models. However, the split-half analysis in both Sets 1 and 2 suggests that the weight parameters are stable estimates of the prioritization people place on different feature dimensions. Moreover, changes in these weight values across different experimental sets (see Figure 12) reflect modifications in the properties of the search stimuli rather than mere data noise. These results open possibilities for making a priori predictions of the attentional weights. Moving forward, we aim to develop methods for quantitatively predicting the values of attentional weights based on contextual factors, such as the relative usefulness of different feature dimensions, the innate preference for certain dimensions, or the top-down emphasis placed on specific dimensions (as demonstrated in Xu, Lleras, Gong, & Buetti, 2024). By making a priori predictions about how the human visual system dynamically modulates attentional weights as a function of these contextual factors, we can enhance our ability to forecast human behavior in more complex and realistic search scenarios.
<anchor name="fig12"></anchor>
Additionally, modeling in the present study was performed on aggregated data (i.e., individual averaged RTs were computed across trials, and then group averaged RTs were computed across participants). This approach ensures stable search slope and RT estimates, but it comes at the cost of not being able to predict variability between participants. That is, we were unable to model how any given participant’s unidimensional slopes combined to predict tridimensional slopes at the individual level. (However, we did include both participant-level and trial-level bootstrapping as additional analyses to examine the variation in the winning models’ performance, mimicking between- and within-participant variability.) In other words, the current modeling results are more representative of an average participant than of any one participant. Future studies could use a multilevel modeling approach (e.g., Hughes et al., 2024) to better separate the contribution of the manipulated task factors (e.g., target–distractor distinctiveness, set size) from participant- and trial-level variability, allowing better modeling of individual differences and intertrial variation in search behavior.
Finally, in this study, target–distractor distinctiveness was indexed by the participants’ logarithmic search slope, rather than measured on a calibrated perceptual space. While we did use CIELAB space to select the colors in Set 2 (colors were taken from an iso-lightness color circle in the CIELAB space), no similar feature space was used to aid stimulus selection for shape or texture, since there is no calibrated perceptual space for measuring similarity along those dimensions. This is why, in the present study, we chose to use search efficiency as an operational definition of the perceived featural difference between two stimuli. This is not too far-fetched because search efficiency is (theoretically and empirically) related to perceptual similarity (see Duncan & Humphreys, 1989, 1992, for the theoretical link). Indeed, in a recent study from our laboratory (Lleras et al., 2025), we studied the direct relationship between search efficiency and perceptual similarity in color space, using the CIELAB space, which is a calibrated perceptual space for color. A similar relationship between search performance and color similarity was also found in Chapman and Störmer (2024), where the authors demonstrated that the search slope is directly related to the inverse of the color distance in the CIELAB space, and in Chapman and Störmer (2022), where they observed a relationship between the search RT and color distance, although these authors found these relationships only at the higher end of the similarity space. It will be important to continue to study and understand the relationship between search slopes and perceptual similarity across different feature dimensions.
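The CIELAB-based relationship described above can be sketched in a few lines. This is illustrative only: the scaling constant and the Lab coordinates are hypothetical, and the distance used is the simple Euclidean CIE76 ΔE rather than a more elaborate color-difference formula:

```python
import math

def delta_e_cie76(lab1, lab2):
    """Euclidean distance between two colors in CIELAB space (CIE76 delta E)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def predicted_slope(target_lab, distractor_lab, k=500.0):
    """Illustrative prediction: logarithmic search slope proportional to the
    inverse of the target-distractor color distance. The constant k is a
    made-up scaling factor, not a fitted value from the paper."""
    return k / delta_e_cie76(target_lab, distractor_lab)

# Hypothetical stimuli on an iso-lightness plane (L* fixed at 70):
target = (70.0, 40.0, 0.0)
near = (70.0, 30.0, 10.0)    # similar color -> steeper predicted slope
far = (70.0, -40.0, 0.0)     # distinct color -> shallower predicted slope
assert predicted_slope(target, near) > predicted_slope(target, far)
```

The inverse-distance form mirrors the pattern reported by Chapman and Störmer (2024), while keeping in mind that the empirical relationship was observed mainly at the high-similarity end of the space.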
<h31 id="xge-155-3-839-d543e2803">Constraints on Generality</h31>This study was conducted with undergraduate students at the University of Illinois, Urbana-Champaign, as well as participants recruited from Prolific, an online data collection platform frequently used in psychological research. Participants were aged 18–30 years, required to have normal visual acuity and color vision, and included 201 males, 526 females, 13 nonbinary individuals, and one participant who chose not to respond. The reported findings should generalize well to the general population within a similar age range and across genders, with normal visual acuity and color vision.
<h31>Conclusion</h31>
The present study demonstrated that visual distinctiveness signals from color, shape, and texture all contributed to predicting search performance in tridimensional search. The modeling suggested that the distinctiveness signals across these dimensions combine in a weighted three-way orthogonal manner to determine the overall distinctiveness that guides tridimensional search. We also quantitatively estimated the attentional weight parameter for each feature dimension, which captured the extent to which people prioritize the signal from that specific dimension. Finally, by manipulating the usefulness of the color dimension relative to shape and texture, we showed that people have an inherent preference for using color information to guide search. In addition to that preference, the relative usefulness of each feature dimension also influences the extent to which any one dimension is prioritized in a given search scenario.
This study provides not only scientific evidence regarding how vision manages complex, multidimensional signals but also a framework for modeling complex task performance using simpler ones. Our findings also contribute to applied fields such as product design. Understanding how humans process visual information can inform the creation of more visually intuitive and user-friendly products, such as by incorporating the most efficiently combined feature dimensions in visual elements and prioritizing important information using the inherently preferred dimension of color. The conclusions drawn from this study also have implications for the development of more biologically accurate neural networks to better understand and predict human behaviors on a larger scale.
Footnotes
<anchor name="fn1"></anchor><sups> 1 </sups> A pilot color search experiment with eight participants was run to select the target and distractor colors that produced slopes in a similar range to those for shape and texture; these colors stayed the same across Sets 1 and 2.
<anchor name="fn2"></anchor><sups> 2 </sups> Note that although Sets 1 and 2 used the same shape and texture features, there were some critical differences between the two sets of experiments that complicate direct comparison of slope values across the two sets. First, the search grid was larger in Set 2 (27 × 27 cm) compared to Set 1 (15 × 15 cm), resulting in greater average target eccentricity in Set 2. Greater eccentricity is known to reduce search efficiency (e.g., Carrasco et al., 1995; Carrasco & Frieder, 1997; Wang et al., 2018). This effect is especially pronounced in more difficult conditions (i.e., those eliciting steeper slopes), where low target–distractor discriminability makes it particularly challenging to identify targets in the far periphery. In easier conditions (e.g., distractors such as squares, triangles, and solid textures), the large discriminability signals may reduce the impact of larger eccentricity on performance. Second, the stimuli in the shape-only and texture-only conditions were gray in Set 1, but red in Set 2. It is unclear how this signal may have impacted search efficiency, but it is worth pointing out that the unidimensional perceptual comparisons were different across the two sets.
References
<anchor name="c1"></anchor>Adeli, H., Vitu, F., & Zelinsky, G. J. (2017). A model of the superior colliculus predicts fixation locations during scene viewing and visual search.
Alexander, R. G., Nahvi, R. J., & Zelinsky, G. J. (2019). Specifying the precision of guiding features for visual search.
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board Task Force Report.
Buetti, S., Cronin, D. A., Madison, A. M., Wang, Z., & Lleras, A. (2016). Towards a better understanding of parallel visual processing in human vision: Evidence for exhaustive analysis of visual information.
Buetti, S., Xu, J., & Lleras, A. (2019). Predicting how color and shape combine in the human visual system to direct attention.
Bundesen, C. (1990). A theory of visual attention.
Cant, J. S., Arnott, S. R., & Goodale, M. A. (2009). fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream.
Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates different regions of human occipitotemporal cortex.
Carrasco, M., Evert, D. L., Chang, I., & Katz, S. M. (1995). The eccentricity effect: Target eccentricity affects performance on conjunction searches.
Carrasco, M., & Frieder, K. S. (1997). Cortical magnification neutralizes the eccentricity effect in visual search.
Castelhano, M. S., & Henderson, J. M. (2008). The influence of color on the perception of scene gist.
Cavina-Pratesi, C., Kentridge, R. W., Heywood, C. A., & Milner, A. D. (2010). Separate channels for processing form, texture, and color: Evidence from FMRI adaptation and visual object agnosia.
Chapman, A. F., & Störmer, V. S. (2022). Feature similarity is non-linearly related to attentional selection: Evidence from visual search and sustained attention tasks.
Chapman, A. F., & Störmer, V. S. (2024). Target-distractor similarity predicts visual search efficiency but only for highly similar features.
Chun, M. M., & Wolfe, J. M. (1996). Just say no: How are visual searches terminated when there is no target present?
Conway, B. R. (2014). Color signals through dorsal and ventral visual pathways.
Cui, A. Y., Buetti, S., Xu, Z. J., & Lleras, A. (2025). Evaluating the contribution of parallel processing of color and shape in a conjunction search task.
Duncan, J., & Humphreys, G. (1992). Beyond the search surface: Visual search and attentional engagement.
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity.
Eckstein, M. P., Thomas, J. P., Palmer, J., & Shimozaki, S. S. (2000). A signal detection model predicts the effects of set size on visual search accuracy for feature, conjunction, triple conjunction, and disjunction displays.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings.
Found, A., & Müller, H. J. (1996). Searching for unknown feature targets on more than one dimension: Investigating a “dimension-weighting” account.
Garner, W. R. (1974).
Garner, W. R., & Felfoldy, G. L. (1970). Integrality of stimulus dimensions in various types of information processing.
Gaspelin, N., & Luck, S. J. (2018). Distinguishing among potential mechanisms of singleton suppression.
Gramann, K., Töllner, T., & Müller, H. J. (2010). Dimension-based attention modulates early visual processing.
Grubert, A., Krummenacher, J., & Eimer, M. (2011). Redundancy gains in pop-out visual search are determined by top-down task set: Behavioral and electrophysiological evidence.
Hamblin-Frohman, Z., & Becker, S. I. (2021). The attentional template in high and low similarity search: Optimal tuning or tuning to relations?
Henderson, J. M., Malcolm, G. L., & Schandl, C. (2009). Searching in the dark: Cognitive relevance drives attention in real-world scenes.
Hoffman, J. E. (1979). A two-stage model of visual search.
Hughes, A. E., Nowakowska, A., & Clarke, A. D. (2024). Bayesian multi-level modelling for predicting single and double feature visual search.
Hulleman, J. (2020). Quantitative and qualitative differences in the top-down guiding attributes of visual search.
Hulleman, J., & Olivers, C. N. L. (2017). The impending demise of the item in visual search.
Irons, J. L., & Leber, A. B. (2016). Choosing attentional control settings in a dynamically changing environment.
Irons, J. L., & Leber, A. B. (2020). Developing an individual profile of attentional control strategy.
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks.
Krummenacher, J., Müller, H. J., & Heller, D. (2001). Visual search for dimensionally redundant pop-out targets: Evidence for parallel-coactive processing of dimensions.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning.
Lee, J., & Geng, J. J. (2020). Flexible weighting of target features based on distractor context.
<anchor name="c1"></anchor>Liesefeld, H. R., Liesefeld, A. M., Pollmann, S., & Müller, H. J. (2018). Biasing allocations of attention via selective weighting of saliency signals: Behavioral and neuroimaging evidence for the dimension-weighting account. In T. Hodgson (Ed.),
Lleras, A., Buetti, S., & Xu, Z. J. (2022). Incorporating the properties of peripheral vision into theories of visual search.
Lleras, A., Wang, Z., Madison, A., & Buetti, S. (2019). Predicting search performance in heterogeneous scenes: Quantifying the impact of homogeneity effects in efficient search.
Lleras, A., Wang, Z., Ng, G. J. P., Ballew, K., Xu, J., & Buetti, S. (2020). A target contrast signal theory of parallel processing in goal-directed search.
Lleras, A., Xu, Z. J., Tan, H. J. H., Shao, Y., & Buetti, S. (2025). Quantifying the relationship between search efficiency and perceptual similarity in color space across different efficient search tasks.
Madison, A., Lleras, A., & Buetti, S. (2018). The role of crowding in parallel search: Peripheral pooling is not responsible for logarithmic efficiency in parallel search.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of target template specificity on visual search in real-world scenes: Evidence from eye movements.
Mayer, K. M., & Vuong, Q. C. (2013). Automatic processing of unattended object features by functional connectivity.
Müller, H. J., Reimann, B., & Krummenacher, J. (2003). Visual search for singleton feature targets across dimensions: Stimulus- and expectancy-driven effects in dimensional weighting.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search.
Navalpakkam, V., & Itti, L. (2007). Search goal tunes visual features optimally.
Ng, G. J. P., Lleras, A., & Buetti, S. (2018). Fixed-target efficient search has logarithmic efficiency with and without eye movements.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship.
Oliva, A., & Schyns, P. G. (2000). Diagnostic colors mediate scene recognition.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope.
Pollmann, S., Weidner, R., Müller, H. J., & von Cramon, D. Y. (2000). A fronto-posterior network involved in visual dimension changes.
Pramod, R. T., & Arun, S. P. (2014). Features in visual search combine linearly.
Pramod, R. T., & Arun, S. P. (2016). Object attributes combine additively in visual search.
Rosenholtz, R. (2016). Capabilities and limitations of peripheral vision.
Rosenholtz, R., Huang, J., Raj, A., Balas, B. J., & Ilie, L. (2012). A summary statistic representation in peripheral vision explains visual search.
Rousselet, G., Joubert, O., & Fabre-Thorpe, M. (2005). How long to get to the “gist” of real-world natural scenes?
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors.
Sawaki, R., & Luck, S. J. (2010). Capture versus suppression of attention by salient singletons: Electrophysiological evidence for an automatic attend-to-me signal.
Schurgin, M. W., Wixted, J. T., & Brady, T. F. (2020). Psychophysical scaling reveals a unified theory of visual memory strength.
Thakur, S., & Peethambaran, J. (2020).
Theeuwes, J. (1991). Cross-dimensional perceptual selectivity.
Theeuwes, J. (1992). Perceptual selectivity for color and form.
Töllner, T., Gramann, K., Müller, H. J., Kiss, M., & Eimer, M. (2008). Electrophysiological markers of visual dimension changes and response changes.
Töllner, T., Zehetleitner, M., Gramann, K., & Müller, H. J. (2010). Top-down weighting of visual dimensions: Behavioral and electrophysiological evidence.
Torralba, A. (2009). How many pixels make an image?
Townsend, J. T., & Gregory Ashby, F. (1983).
Treisman, A., & Sato, S. (1990). Conjunction search revisited.
Ullman, S. (1987). Visual routines. In M. A. Fischler & O. Firschein (Eds.),
Vickery, T. J., King, L.-W., & Jiang, Y. (2005). Setting up the target template in visual search.
Wang, Z., Buetti, S., & Lleras, A. (2017). Predicting search performance in heterogeneous visual search scenes with real-world objects.
Wang, Z., Lleras, A., & Buetti, S. (2018). Parallel, exhaustive processing underlies logarithmic search functions: Visual search with cortical magnification.
Williams, L. G. (1967). The effects of target specification on objects fixated during visual search.
Witkowski, P. P., & Geng, J. J. (2022). Attentional priority is determined by predicted feature distributions.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search.
Wolfe, J. M. (2021). Guided search 6.0: An updated model of visual search.
Wolfe, J. M., Horowitz, T. S., Kenner, N., Hyle, M., & Vasan, N. (2004). How fast can you change your mind? The speed of top-down guidance in visual search.
Wurm, L. H., Legge, G. E., Isenberg, L. M., & Luebker, A. (1993). Color improves object recognition in normal and low vision.
Xu, Z. J., Buetti, S., & Lleras, A. (2020, May 11).
Xu, Z. J., Lleras, A., & Buetti, S. (2021). Predicting how surface texture and shape combine in the human visual system to direct attention.
Xu, Z. J., Lleras, A., & Buetti, S. (2024, June 2).
Xu, Z. J., Lleras, A., Gong, Z. G., & Buetti, S. (2024). Top-down instructions influence the attentional weight on color and shape dimensions during bidimensional search.
Xu, Z. J., Lleras, A., Shao, Y., & Buetti, S. (2021). Distractor-distractor interactions in visual search for oriented targets explain the increased difficulty observed in nonlinearly separable conditions.
Xu, Z. J., Yu, J., Lleras, A., & Buetti, S. (2025). Investigating the contribution of unpredictable target features to attentional guidance.
Yu, J. M., Xu, Z. J., Lleras, A., & Buetti, S. (2025). Exploring the impact of target-distractor featural contrast on feature prioritization in efficient visual search.
Yu, X., & Geng, J. J. (2019). The attentional template is shifted and asymmetrically sharpened by distractor context.
Yu, X., Hanks, T. D., & Geng, J. J. (2022). Attentional guidance and match decisions rely on different template information during visual search.
Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E. P., & Geng, J. J. (2023). Good-enough attentional guidance.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory.
Given that the two top-performing models showed similar performance (i.e., the weighted color collinear–shape/texture orthogonal model achieves a 0.7% higher R² in the first set, and the weighted three-way orthogonal model achieves a 1.6% higher R² in the second set), we performed a post hoc power analysis, estimating the sample size necessary to observe a stable model difference.
For each tridimensional experiment, we sampled with replacement 50 times at each sample size, from 1 to 40, and calculated the distinguishability of model performance as a function of sample size. Results are reported in Figure A4. Overall, the R² values of both models keep increasing with sample size, but the model comparison consistently favors the weighted three-way orthogonal model from a sample size of 2 onward (Figure A4, left). In Set 1, the weighted three-way orthogonal model is at least 1,040 times (at a sample size of 2) and on average 1.47 × 10⁶ times more likely than the weighted color collinear–shape/texture orthogonal model to account for the variability in the data. In Set 2, the weighted three-way orthogonal model is at least 2.45 × 10⁶ times (also at a sample size of 2) and on average 1.83 × 10¹⁰ times more likely than its component model.
To estimate power, for each simulation, we computed the relative likelihood of the winning model (i.e., the weighted three-way orthogonal model) relative to the other model (i.e., the weighted color collinear–shape/texture orthogonal model). We then reported the proportion of simulations in which the winning model’s relative likelihood exceeded 10, a reasonable cutoff for concluding that there is robust evidence in favor of the winning model (see Figure A4, right). For Set 1, power increases quickly with sample size, reaching 50% with as few as two participants and exceeding 80% by a sample size of 7. The measure is noisy but stable between 0.7 and 0.9 over the range of 7–40, which we interpret as adequate power. For Set 2, there is a more systematic increase of power with sample size, with a clear positive trend starting at around a sample size of 7. Between sample sizes of 20 and 40, power varies between 0.67 and 0.88, again an adequate amount of power. Overall, these analyses suggest that we gathered sufficient data to have confidence in our conclusions regarding the winning model.
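The bootstrap power procedure described above can be approximated with the following sketch. This is our own reconstruction under stated assumptions, not the authors' code: we assume the models are compared via AIC-based relative likelihoods, and the per-participant AIC pairs passed in are placeholders:

```python
import math
import random

def relative_likelihood(aic_winner, aic_other):
    """Relative likelihood that the lower-AIC model minimizes information
    loss: exp((AIC_other - AIC_winner) / 2)."""
    return math.exp((aic_other - aic_winner) / 2.0)

def bootstrap_power(aic_pairs, n, n_boot=50, cutoff=10.0, seed=0):
    """Estimate power at sample size n: the proportion of bootstrap resamples
    (participants drawn with replacement) in which the winning model's
    relative likelihood exceeds the cutoff. `aic_pairs` holds one
    (aic_winner, aic_other) tuple per participant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(aic_pairs) for _ in range(n)]
        # Sum AICs over the resampled participants before comparing models.
        aic_w = sum(a for a, _ in sample)
        aic_o = sum(b for _, b in sample)
        if relative_likelihood(aic_w, aic_o) > cutoff:
            hits += 1
    return hits / n_boot
```

Running `bootstrap_power` over sample sizes 1 to 40 and plotting the result would reproduce the shape of a power curve like the one in Figure A4 (right), with power rising as per-participant evidence accumulates.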
For the weighted models, the weight estimates (see Figure A5) stabilized around a sample size of 5 for the weighted color collinear–shape/texture orthogonal model and, for the weighted three-way orthogonal model, around a sample size of 20 in Set 1 and 10 in Set 2. These results confirm that 20 participants should be sufficient both for distinguishing between the two top-performing models and for obtaining stable weight estimates.