<bold>Abstract</bold>
People often search for objects that are distinctive from the other objects in a scene along multiple feature dimensions, such as color and shape. A target that is distinctive along more than one dimension can make search easier, but it also increases the complexity of modeling search behavior. Building on previous research on how people search using information from two feature dimensions, we explored how search unfolds when the target and distractors differ along the dimensions of color, shape, and texture (a tridimensional search). Using a behavioral-computational approach, we found that the target-distractor distinctiveness signals along the three dimensions combine in a weighted orthogonal fashion to guide tridimensional searches. Additionally, across two sets of experiments, we demonstrated that the weight assigned to each dimension varied according to its relative usefulness. When color distinctiveness was most pronounced (Set 1), color information was prioritized much more strongly than the information carried by shape and texture. When distinctiveness was more balanced across the individual dimensions (Set 2), the weights were distributed more evenly across the three dimensions, although a color prioritization remained. These results have broad implications for cognitive neuroscience: they place constraints on how visual information from different dimensions is integrated into an overall guidance signal and demonstrate how attention can be flexibly allocated across channels in response to ecological aspects of the environment. This study should also interest modelers in cognitive science because it demonstrates an approach to understanding behavior in complex scenarios based on performance indices estimated under simpler conditions.
Toward a Better Understanding of Target Distinctiveness in Visual Search: How Color, Shape, and Texture Information Combine to Guide Search
<cn><bold>By: Zoe (Jing) Xu, Alejandro Lleras, and Simona Buetti</bold></cn>
<bold>Acknowledgement: </bold>Timothy Vickery served as action editor. This article has been posted as a preprint and is available on the Open Science Framework. The preprint can be accessed at https://osf.io/preprints/osf/857u2. All data, analysis code, and research materials are available on the Open Science Framework (https://osf.io/bmwa4/). Results in Experimental Set 1 were presented at the 2021 Annual Meeting of the Vision Sciences Society. The authors have no competing interests. This project was supported by a grant from the National Science Foundation (Grant BCS 1921735 awarded to Simona Buetti). Zoe (Jing) Xu played a lead role in data curation, formal analysis, investigation, project administration, software, validation, visualization, and writing–original draft and an equal role in conceptualization, methodology, and writing–review and editing. Alejandro Lleras played a supporting role in formal analysis, funding acquisition, investigation, project administration, resources, validation, visualization, and writing–original draft and an equal role in conceptualization, methodology, supervision, and writing–review and editing. Simona Buetti played a lead role in funding acquisition, resources, and supervision, a supporting role in formal analysis, investigation, project administration, validation, visualization, and writing–original draft, and an equal role in conceptualization, methodology, and writing–review and editing.
The concept of target template is commonly used in many visual search theories, usually defined as the mental representation of the target that people are looking for in the visual environment (e.g., Chun & Wolfe, 1996; Malcolm & Henderson, 2009; Wolfe, 2021). The majority of theories agree that the target template facilitates search behaviors by imposing top-down guidance on the search processes (e.g., Adeli et al., 2017; Buetti et al., 2016; Bundesen, 1990; Duncan & Humphreys, 1989; Hoffman, 1979; Hulleman & Olivers, 2017; Liesefeld et al., 2018; Rosenholtz et al., 2012; Wolfe, 1994, 2021; Zelinsky, 2008; for exceptions that propose search being determined solely by bottom-up signals, see Itti & Koch, 2000; Theeuwes, 1991, 1992; Ullman, 1987). Some theories propose a feature-boosting mechanism, whereby a subset of features in the target template that provides the largest discriminability between the target and distractors in the display is boosted. The resulting activation map reflects the extent to which each location contains the boosted target features, determining the likelihood of each location receiving attention and getting scrutinized (e.g., Adeli et al., 2017; Bundesen, 1990; Liesefeld et al., 2018; Wolfe, 1994, 2021; X. Yu & Geng, 2019). Other theories propose that the overall similarity (i.e., the “match”) between the search object and the target template provides the top-down guidance that helps direct attention to the objects that are most likely to be the target (e.g., Buetti et al., 2016; Duncan & Humphreys, 1989; Hoffman, 1979).
While visual search theories tend to agree on the importance of the target template in search processes, there is a lack of consensus, or even explicit discussion, regarding how complex a target template can be or how much of the visual information contained within a target template can be utilized to guide attention. The majority of visual search theories do not discuss the limits on target template complexity (Adeli et al., 2017; Buetti et al., 2016; Duncan & Humphreys, 1989; Eckstein et al., 2000; Folk et al., 1992; Gaspelin & Luck, 2018; Hoffman, 1979; Hulleman & Olivers, 2017; Najemnik & Geisler, 2005; Navalpakkam & Itti, 2007; Wolfe, 1994; X. Yu & Geng, 2019; for a special case of target template, see Rosenholtz, 2016; Rosenholtz et al., 2012). Generally, these theories assume that the target template and the search items’ representations are constructed and compared over all the relevant dimensions (i.e., the dimensions that differentiate the target from the distractors, e.g., location, motion, color, surface texture, shape, and size). For instance, Navalpakkam and Itti (2007) argued that visual input is represented in multiple feature dimensions including, but not limited to, color, orientation, and direction of motion. Zelinsky (2008) used a 72-dimensional feature vector to represent the visual signals coming from each location of the search scenes in which he modeled human eye movements. Consistent with this perspective, many empirical studies used real-world objects as search stimuli, where the target template receives little constraint in terms of its dimensionality or complexity. Wang et al. (2017) used teddy bears, reindeer, and car models as search stimuli and found that the difference between the target and distractors can guide search behaviors even when the two differ along multiple dimensions in a complex way. Henderson et al. (2009) found that people were able to search for a target among distractors that were real-world objects appearing in real-world scenes (see also Malcolm & Henderson, 2009). These latter studies measured how people search for real-world objects that contain visual information from various dimensions, which suggests that people are capable of making comparisons between search items and a complex target template.
However, the ability to search for a target defined along multiple dimensions does not guarantee that the attentional guidance of the target template occurs along all those dimensions. Even though the target and distractors may differ along multiple dimensions, it is possible that only a subset of those dimensions actually guides the search. At first glance, this proposal might seem to contradict empirical findings suggesting that more specific visual templates of the target provide stronger guidance than less specific or accurate templates (e.g., Malcolm & Henderson, 2009; Vickery et al., 2005; Wolfe et al., 2004). For instance, Vickery et al. (2005) found that cueing target images that differed in size or orientation from the actual targets led to slower searches, compared to cueing the exact target image that appeared in the search display, suggesting that the detailed visual information stored in the target template, rather than general schematic or semantic information, guides the search process. However, the conclusion that a more specific target template provides stronger guidance arises from empirical evidence at two extremes of this “specificity” continuum: from being as abstract as a word, or as distorted as an example differing from the true target in size or orientation, to being as specific and precise as the exact image of the target. The optimal solution might lie somewhere in between—there might be a threshold for the amount of information in the target template that can be effectively utilized to guide the search, such that additional details in the target template provide no additional guidance or are utilized in a less efficient way.
Indeed, several visual search theories (Alexander et al., 2019; Liesefeld et al., 2018; Williams, 1967) suggest limits in how many features guide attention. Williams (1967) showed that when targets had multiple features, eye movements were mainly guided by color, even when shape and size were known. Only in specific cases, like when the target was the largest size, did size also influence fixations. This implies a limit on feature-based guidance.
More recently, Alexander et al. (2019) found that previewing a mismatched target, which differed from the actual target in either shape or orientation, caused a slowdown in visual search—but only when the stimuli were grayscale. When the target differed from distractors along the color dimension, previewing a mismatched target in shape or orientation caused little slowdown. This result suggests that people typically use color to guide their search and that shape and orientation become guiding factors only in the absence of color information.
A separate line of work has proposed that there exist two distinct forms of target template: one that is primarily involved in providing initial guidance during the search process and a second one engaged in the actual identification of the target once an item is in the focus of attention (Hamblin-Frohman & Becker, 2021; Wolfe, 2021; X. Yu et al., 2022, 2023). The guiding template tends to be simpler or coarser than the more detailed verification template. This proposal aligns with the idea that attentional guidance during search might be based on only a subset of the information stored in the full template of the target.
Moreover, the fact that people can search for complex targets does not reveal whether visual signals from different feature dimensions contribute equally to the top-down guidance or whether signals along different dimensions are utilized differently during multidimensional search conditions. In fact, some theories have delved into this question (e.g., Bundesen, 1990; Gaspelin & Luck, 2018; Liesefeld et al., 2018; Wolfe, 1994). In the theory of visual attention, Bundesen (1990) proposed the mechanism of filtering, which refers to the prioritization of certain categories (e.g., red items) by assigning them a higher attentional weight, leading to an increased likelihood of objects belonging to these categories being selected for attention. The signal suppression hypothesis (Gaspelin & Luck, 2018; Sawaki & Luck, 2010) approaches the same idea from the other extreme: To minimize distractions from salient stimuli, features unique to those stimuli can be actively suppressed, ensuring they receive no attentional weights (also see Treisman & Sato, 1990). In both the theory of visual attention and the signal suppression hypothesis, changes in attentional weight apply to specific features (e.g., red, vertical).
The dimension weighting account addressed the attentional weighting process at the level of feature dimensions (e.g., Found & Müller, 1996; Müller et al., 2003; see also Liesefeld et al., 2018, for a review). Using an oddball search task, where the target identity varied randomly from trial to trial and differed from distractors along either color or orientation, Found and Müller (1996) found an intertrial facilitation: Search was faster when the target and distractors differed along the same dimension as on the previous trial (e.g., a color-defined target following a color-defined target) than when the target-defining dimension switched across trials. This finding suggests that the dimension that defined the target on the previous trial receives a higher attentional weight, facilitating search when that dimension remains relevant.
Overall, a number of theories have tried to address whether there is a limitation on the complexity of a target template and whether specific features or feature dimensions of the target template can be up- or downweighted to facilitate the search process. However, none of those theories quantify any target dimensionality limitations or the strength of dimensional weighting. In other words, current theories have not attempted to quantitatively capture how visual signals from different feature dimensions simultaneously contribute to the top-down guidance and affect search behavior. In the current work, we used a behavioral-computational approach to study multidimensional searches by quantitatively estimating the contribution of visual information from three different feature dimensions (color, shape, and texture) in guiding these searches and making mathematical predictions on search performance.
<h31 id="xge-155-3-839-d543e353">The Present Study</h31>We investigated how observers utilize information stored in the target template to guide attention during searches in which the target differs from distractors along three feature dimensions: color, shape, and texture.
<anchor name="fig1"></anchor>
Previous studies have adopted a behavioral-computational approach to examine feature integration in situations where the target differs from distractors along two feature dimensions: Color and shape were examined in Buetti et al. (2019) and Hughes et al. (2024), whereas shape and texture were studied in Xu, Lleras, and Buetti (2021).
In Buetti et al. (2019), search slopes were first measured in unidimensional search conditions, where the target differed from distractors along color only or along shape only. These unidimensional slopes were then used to predict search performance in bidimensional conditions, where the target differed from distractors along both color and shape.
Note that while Buetti et al. (2019) offered an approach to better understand the mathematical rules governing the combination of color and shape in guiding attention, the models considered in the study assumed that color and shape signals are utilized equally. A similar assumption is made in Xu, Lleras, and Buetti (2021), where shape and texture were presumed to be utilized equally to guide attention during bidimensional searches.
Using the same approach, here we evaluated performance in tridimensional search conditions, where the target differs from distractors along the color, shape, and texture dimensions. Our focus was to examine whether signals from all these dimensions contribute to attentional guidance, how they are integrated, and, importantly, whether signals from different feature dimensions are utilized to varying degrees to determine the overall guidance. We conducted two sets of experiments. The first set used stimuli similar to those in previous bidimensional studies (Buetti et al., 2019; Xu, Lleras, & Buetti, 2021), featuring targets that were more distinctive from distractors on the color dimension than on the other two dimensions. The second set used stimuli that were better controlled for target–distractor differences across the three feature dimensions. Importantly, these two sets of experiments allowed us not only to confirm the best performing models across different data sets but also to investigate the extent to which the most effective models and their parameters vary with different stimuli.
Experimental Set 1
A total of 12 experiments were conducted in this set, each with a naïve group of participants. We first conducted three unidimensional search experiments to measure the search slopes in conditions where the target differed from distractors along color only (Experiment 1, color search), along shape only (Experiment 2, shape search), and along texture only (Experiment 3, texture search). Furthermore, we conducted nine tridimensional search experiments where the target differed from each type of distractors along all three dimensions: color, shape, and texture (Experiments 4–12; Figure 1). The methods and experimental protocols (IRB No. 05550: Attentional mechanisms in human vision) were approved by the Institutional Review Board at the University of Illinois, Urbana–Champaign, and are in accordance with the Declaration of Helsinki.
<h31 id="xge-155-3-839-d543e439">Method</h31><bold>Transparency and Openness</bold>
For Experiments 1–24, we report how we determined our sample size, all data exclusions, all manipulations, and all measures, following Journal Article Reporting Standards (Appelbaum et al., 2018). All data, analysis code, and research materials are available on the Open Science Framework (Xu, Lleras, & Buetti, 2024) at https://osf.io/bmwa4/.
<bold>Participants</bold>
Participants were recruited from either the University of Illinois at Urbana–Champaign or Prolific, in exchange for course credit or money. Sample size was determined based on data simulations of the previous bidimensional search study (Xu, Lleras, & Buetti, 2021). We estimated the sample size required to produce a small standard error on reaction time (20.33 ms) and on the magnitude of the search slope estimate (3.17 ms/log unit) in the most variable condition (defined by a specific distractor type × set size) in that study. These simulations demonstrated that we would need to include 35 valid participants in each experiment (the sample size rationale is detailed in our preregistration report).
For each experiment, four participant inclusion criteria were used: (1) Participants should complete all the trials (i.e., the experiment was not aborted before finishing), (2) participants should make a response on at least 85% of the trials, (3) search accuracy should be higher than 90%, and (4) an individual’s average response time (RT) should fall within 2 standard deviations of the group average RT. The accuracy rate was calculated as the number of trials where participants made a correct response divided by the total number of trials where participants made a response. That is, we excluded time-out trials (i.e., trials where participants did not make any response within 5 s; see below for a detailed experimental procedure) when computing accuracy, as the experiments were conducted online, and it was impossible to ascertain the reason for time-outs. The number of recruited participants, the number of participants included in the analysis, the number of participants excluded under each criterion, included participants’ demographic information, and summary statistics of the measurements in each experiment are shown in Table 1.
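The four inclusion criteria and the time-out-excluding accuracy computation can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code, and the trial-record field names (`responded`, `correct`, `rt`) are hypothetical.

```python
def passes_inclusion(trials, n_expected, group_mean_rt, group_sd_rt):
    """trials: list of dicts with 'responded' (bool), 'correct' (bool), 'rt' (ms)."""
    # (1) All trials completed (experiment not aborted early).
    if len(trials) < n_expected:
        return False
    # (2) A response was made on at least 85% of trials.
    answered = [t for t in trials if t["responded"]]
    if len(answered) / len(trials) < 0.85:
        return False
    # (3) Accuracy higher than 90%, computed over answered trials only
    #     (time-out trials are excluded from the denominator).
    accuracy = sum(t["correct"] for t in answered) / len(answered)
    if accuracy <= 0.90:
        return False
    # (4) Mean RT within 2 standard deviations of the group mean RT.
    mean_rt = sum(t["rt"] for t in answered) / len(answered)
    return abs(mean_rt - group_mean_rt) <= 2 * group_sd_rt
```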
<anchor name="tbl1"></anchor>
<bold>Apparatus and Stimuli</bold>
All experiments were programmed in JavaScript and conducted on Pavlovia, with participants using their own computers. Because experiments were run online, we had no control over the visual angle of the stimuli on participants’ computers. To compensate for this, before the experiment, we asked participants to rescale an image of a credit card to match the real size of a credit card in order to ensure that stimuli across different computer displays maintained the same physical size (1.2 × 1.2 cm). Stimuli were randomly assigned to a location on the display with a small random jitter, based on two concentric circular grids occupying an area of 15 × 15 cm on the center of participants’ screens. The larger grid had a diameter of 13.8 cm, and the smaller grid had a diameter of 7.4 cm. This size was chosen to allow participants with screens as small as 12.5 in. to see the full search display.
The stimuli were shown on a white background. In any given trial, there was only one target and one type of distractor on the display. In other words, displays were always target-present and homogeneous in terms of distractors. The stimuli had a black square dot on either their left or their right, and the task was to report the location of the dot on the target stimulus. Stimuli used in Experiments 1–12 are shown in Figure 1.
Unidimensional Experiments
In Experiment 1 (color search), the target was a red octagon with a white cross texture inside (Figure 1). Distractors shared the shape (octagon) and texture (cross) with the target, but their color was either orange, green, or pink. In Experiment 2 (shape search), the target was a gray octagon with a white cross texture. Distractors shared the color (gray) and texture (cross) with the target, but their shape was either a triangle, a house, or a square. In Experiment 3 (texture search), the target was a gray octagon with a white cross texture. Distractors shared the color (gray) and shape (octagon) with the target, but their texture was either a dot pattern, lines forming a tilted pound key, or a solid gray fill.
Tridimensional Experiments
In Experiments 4–12, the target was always a red octagon with a white cross texture inside (the same as Experiment 1) and the distractors were constructed by combining all the distractor colors, shapes, and textures used in Experiments 1–3. There were in total 27 types of tridimensional distractors (i.e., 3 colors × 3 shapes × 3 textures). To keep the study design consistent across all experiments, we only tested three types of distractors in each experiment. Therefore, we divided the tridimensional distractors into nine experimental sessions, each containing three types of distractors. Only one type of distractor was presented along with the target on a given trial.
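The 27 distractor types and their division into nine three-distractor sessions can be illustrated with a short enumeration; the chunking below is only one possible partition, since the actual assignment of distractor triplets to Experiments 4–12 is not specified here.

```python
from itertools import product

# Feature values taken from the unidimensional experiments described above;
# texture labels are shorthand for the patterns in the text.
colors = ["orange", "green", "pink"]
shapes = ["triangle", "house", "square"]
textures = ["dot", "tilted pound key", "solid gray"]

# 3 colors x 3 shapes x 3 textures = 27 tridimensional distractor types.
distractors = list(product(colors, shapes, textures))

# One possible partition into nine sessions of three distractor types each
# (the actual assignment to Experiments 4-12 may have differed).
sessions = [distractors[i:i + 3] for i in range(0, len(distractors), 3)]
```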
<bold>Design</bold>
In each experiment, participants searched for the target among one of three types of distractors (e.g., in Experiment 1, participants searched for the red target among orange, pink, or green distractors). We also included a target-only condition where no distractors were presented. For each type of distractor, there were four distractor set sizes: 1, 4, 9, and 19. In total, each experiment contained 13 conditions (three distractor types × four set sizes, plus the target-only condition), each repeated 48 times, for a total of 624 trials. Sample displays are shown in Figure 2.
<anchor name="fig2"></anchor>
<bold>Procedure</bold>
Each trial began with a black cross appearing for 0.5 s at the center of the screen over a white background. A search display followed. Participants were asked to search for the target among distractors and report whether the black square dot was on the left or right side of the target by pressing the corresponding left or right arrow key on the keyboard. The search display remained on the screen for 5 s or until a response was made by the participants, whichever occurred first. Visual feedback (“Correct!” or “Wrong!”) was provided after each trial, lasting for 0.5 s. The trial then ended with a white background displayed for an interval of 0.5 s.
<bold>Behavioral–Computational Predictive Approach</bold>
When observers search for a known target among sufficiently different distractor items, processing occurs in parallel and simultaneously at all item locations, within a sufficiently large functional viewing field (Hulleman & Olivers, 2017). In the present study, we expected participants would perform a parallel search over the whole search display because we used stimuli similar to those in previous studies where parallel search was obtained (see Buetti et al., 2019; Xu, Lleras, & Buetti, 2021). Such parallel processing is considered to be unlimited in capacity, with search items being processed in an independent and exhaustive manner (Buetti et al., 2016). At each location, a contrast signal between the target template and the search item is computed. This contrast signal accumulates stochastically until reaching a rejection threshold, indicating that the item is no longer considered a potential target (Buetti et al., 2016; Lleras et al., 2020; Townsend & Ashby, 1983). Items that are not rejected during this parallel processing stage are then scrutinized serially until the target is identified. In easy searches, when the target is surrounded by only one type of sufficiently different distractors, the only item that survives the parallel stage is typically the target (e.g., Buetti et al., 2016, 2019; Xu, Lleras, & Buetti, 2021).
The stochastic contrast accumulation that happens in parallel at all locations across the whole display produces a signature logarithmic increase in RT as a function of set size (Buetti et al., 2016; Lleras et al., 2020; Townsend & Ashby, 1983). The logarithmic slope LS indexes the time required for a single distractor to be rejected, which is influenced by the visual distinctiveness of the target in relation to the distractor. This distinctiveness term refers to a target–distractor perceptual difference computed in a top-down fashion—that is, a computation of how perceptually different the target is from the distractor. This distinctiveness is different from the concept of purely bottom-up contrast, which is a computation of how perceptually different an element in the scene is from its immediate surroundings (i.e., the background). The more distinctive the target, meaning the more dissimilar the target and distractors are, the shorter the time needed for this type of distractor to reach the rejection threshold, and the shallower the slope LS will be. Target contrast signal theory proposed that the steepness of the logarithmic slope LS is inversely proportional to the overall top-down contrast/distinctiveness signal D (Equation 1):<anchor name="eqn1"></anchor>
LS = α/D, (1)
with LS being the logarithmic slope and α being a multiplicative constant factor (Lleras et al., 2020).
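In practice, a logarithmic slope can be estimated by regressing mean RTs onto the log of the total set size. The sketch below uses made-up RT values and assumes the RT = RT0 + LS × ln(N + 1) form described in the text.

```python
import numpy as np

set_sizes = np.array([1, 4, 9, 19])                 # distractor set sizes used here
mean_rts = np.array([520.0, 545.0, 560.0, 575.0])   # hypothetical condition means (ms)

# Total set size is distractors plus the target, hence N + 1.
x = np.log(set_sizes + 1)

# Ordinary least squares: slope is LS (ms per log unit), intercept approximates RT0.
LS, RT0 = np.polyfit(x, mean_rts, 1)
```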
In the present study, we estimated the logarithmic search slopes for each of the three target–distractor color pairs (Experiment 1, unidimensional color search), three shape pairs (Experiment 2, unidimensional shape search), and three texture pairs (Experiment 3, unidimensional texture search). We then used these logarithmic slope values to predict the tridimensional search slopes LSc,s,t, where the target and distractors differed along color, shape, and texture. We considered 10 different predictive models (discussed in the Models Retained for Model Comparison section), each based on a unique assumption about how the contrast signals along the three dimensions combine to guide attention in the tridimensional search conditions. That is, for each model, we computed the predicted search RTs for each condition of a specific distractor type and a set size level, using Equation 2 (Lleras et al., 2020):<anchor name="eqn2"></anchor>
RTpredicted = RT0 + LSc,s,t × ln(N + 1), (2)
where RT0 represents the RT in the target-only condition and LSc,s,t represents the predicted search slope in tridimensional search. The final term is the natural logarithm of the total set size, including all distractors plus the target.
Next, we compared the predicted RTs with the observed RTs across all distractor type by set size conditions in Experiments 4–12 (tridimensional search) by regressing the observed RTs onto the predicted RTs (Equation 3):<anchor name="eqn3"></anchor>
RTobserved = a + b × RTpredicted + ε, (3)
where a and b represent the intercept and slope of the regression, and ε represents the residual error. A model that accurately predicts tridimensional search performance should produce a regression slope close to 1, an intercept close to 0, and a large proportion of variance explained (R²).
In sum, for each tested model, we computed a set of predicted logarithmic slope LSc,s,t values, which, in turn, allowed us to compute a set of predicted RTs (RTpredicted) for all the conditions run in Experiments 4–12, using Equation 2. These predicted RTs were then compared to the observed RTs (using Equation 3) to determine how well each model predicted the tridimensional search performance. Overall, there were 108 mean RTs (27 tridimensional distractor types × 4 set size levels) predicted by each model. The validity of the 10 models was compared based on their ability to account for the observed RTs, as indexed by the R² values, slopes, and intercepts of the corresponding regressions.
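The evaluation step can be sketched as a simple linear regression of observed on predicted RTs (Equation 3); the helper below is an illustrative reconstruction, with R² computed from the regression residuals.

```python
import numpy as np

def evaluate_model(rt_predicted, rt_observed):
    """Regress observed mean RTs onto model-predicted RTs and summarize the fit."""
    slope, intercept = np.polyfit(rt_predicted, rt_observed, 1)
    fitted = intercept + slope * rt_predicted
    ss_res = np.sum((rt_observed - fitted) ** 2)
    ss_tot = np.sum((rt_observed - rt_observed.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Here, the slope and intercept quantify systematic over- or underestimation of the observed RTs, while R² indexes the proportion of variance explained.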
In addition to utilizing our behavioral-computational predictive approach, we also applied a curve-fitting approach based on Minkowski's generalized distance metric, in which the overall contrast signal is expressed as (Equation 4):<anchor name="eqn4"></anchor>
Dc,s,t = (Dcolor^p + Dshape^p + Dtexture^p)^(1/p), (4)
which equals:<anchor name="eqn5"></anchor>
LSc,s,t = ((1/LScolor)^p + (1/LSshape)^p + (1/LStexture)^p)^(-1/p). (5)
The optimal value of Minkowski's exponent p was estimated by fitting the observed tridimensional search data. A value of p = 1 corresponds to a collinear (city-block) combination of the unidimensional contrast signals, whereas p = 2 corresponds to an orthogonal (Euclidean) combination (Garner, 1974).
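A minimal sketch of the exponent-fitting idea, assuming the Minkowski combination rule expressed in slope space (reciprocal slopes stand in for contrast signals, so the proportionality constant cancels); a simple grid search stands in for whatever optimizer the authors used, and all slope values are hypothetical.

```python
import numpy as np

def minkowski_slope(ls_uni, p):
    """Combine unidimensional slopes (LSc, LSs, LSt) with Minkowski exponent p."""
    ls_uni = np.asarray(ls_uni, dtype=float)
    return float(np.sum((1.0 / ls_uni) ** p) ** (-1.0 / p))

def fit_p(uni_slopes, observed_slopes, p_grid=np.linspace(0.5, 4.0, 351)):
    """Grid search for the p minimizing squared error across distractor types."""
    sse = [sum((minkowski_slope(u, p) - o) ** 2
               for u, o in zip(uni_slopes, observed_slopes))
           for p in p_grid]
    return float(p_grid[int(np.argmin(sse))])
```

With p = 1 the rule reduces to collinear (city-block) integration, and with p = 2 to orthogonal (Euclidean) integration.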
<bold>Models Retained for Model Comparison</bold>
Models Relying on Guidance From One Feature Dimension
The models in this category are based on proposals from previous literature that suggest only a single feature dimension leads the multidimensional search (e.g., Alexander et al., 2019; Williams, 1967).
Model 1: Color-Only Model
This model assumes that the overall contrast signal in a tridimensional search is determined exclusively by the contrast in the color dimension (Equation 6). This model is rooted in a long-held belief that color is such a distinct feature dimension that it can overshadow other feature dimensions when they are presented together. This model is consistent with the findings of Williams (1967), who observed that when the target was characterized by color, shape, and size, it was primarily the color dimension that guided visual fixations:<anchor name="eqn6"></anchor>
Dc,s,t = Dcolor, (6)
which in turn means:<anchor name="eqn7"></anchor>
LSc,s,t = LScolor. (7)
Here, LSc,s,t represents the predicted slope for distractors that differ from the target in color, shape, and texture. LScolor refers to the slope observed in Experiment 1, where distractors differed in color from the target.
Model 2: Best Feature Guidance Model
This model assumes that performance is determined solely by the largest contrast signal among the three relevant dimensions, with the other two signals being ignored (Equation 8). It is equivalent to saying that observers identify the feature dimension that most effectively differentiates items in the scene and concentrate exclusively on it to reject distractors. In other words, the search slope in the tridimensional search task is the same as the search slope of the most efficient unidimensional condition (Equation 9).
This model is conceptually similar to Guided Search 2.0 (Wolfe, 1994), which posits that within a specific feature dimension (e.g., color), a broadly tuned channel (e.g., red) that most effectively distinguishes the target from distractors is selected to accumulate activation in that feature map (e.g., color map). However, while Guided Search assumes that such selection occurs at the feature level, the best feature guidance model posits that it happens at the dimension level. This model also aligns with the finding in Williams (1967) that when the target was defined by both color and size, and the target size was at the largest level, size could guide attention. This suggests that the feature dimension determining the overall guidance varies based on the utility of the available feature dimensions, rather than being fixed to a specific one:<anchor name="eqn8"></anchor>
Dc,s,t = max(Dcolor, Dshape, Dtexture), (8)
which translates to<anchor name="eqn9"></anchor>
LSc,s,t = min(LScolor, LSshape, LStexture). (9)
LSshape and LStexture represent the slopes where distractors have different shapes (observed in Experiment 2) or textures (observed in Experiment 3) than the target.
Note that Equation 9 is applied to each type of distractor in each of Experiments 4–12. This means that the winning feature dimension is determined independently for each type of distractor, rather than being fixed across different distractor types within or across different experiments.
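In slope space, the best feature guidance model reduces to taking the minimum unidimensional slope per distractor type (Equation 9); the slope values below are hypothetical.

```python
def best_feature_slope(ls_color, ls_shape, ls_texture):
    """Predicted tridimensional slope: the shallowest (most efficient) unidimensional slope."""
    return min(ls_color, ls_shape, ls_texture)

# Example: if color yields the shallowest slope, color alone determines guidance.
predicted = best_feature_slope(12.0, 30.0, 45.0)  # -> 12.0
```

Because the minimum is taken separately for each distractor type, different distractor types within the same experiment can be "won" by different dimensions.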
Models Including Unweighted Color, Shape, and Texture
The models in this category assume that all relevant feature dimensions contribute signals in tridimensional searches, which is consistent with a number of visual search theories (e.g., Bundesen, 1990; Wolfe, 2021; Zelinsky, 2008).
Model 3: Three-Way Orthogonal Combination Model
This model assumes that contrast signals along color, shape, and texture combine orthogonally to form the overall contrast signal (Equation 10). This type of integration was shown previously in Xu, Lleras, and Buetti (2021), where the authors found that the overall contrast between the target and distractors that differ along both shape and texture was determined by the orthogonal sum of the two unidimensional contrast vectors. We hypothesized that color contrast would add to the overall contrast in the same orthogonal fashion:<anchor name="eqn10"></anchor>
Dc,s,t = sqrt(Dcolor^2 + Dshape^2 + Dtexture^2), (10)
which translates to<anchor name="eqn11"></anchor>
LSc,s,t = 1/sqrt((1/LScolor)^2 + (1/LSshape)^2 + (1/LStexture)^2). (11)
Model 4: Three-Way Collinear Integration Model
This model assumes that contrast signals along color, shape, and texture combine collinearly to form the overall contrast signals (Equation 12). This type of integration was shown previously in Buetti et al. (2019), where the authors found that the overall contrast between the target and distractors that differ along both color and shape was determined by the collinear sum of the two unidimensional contrast vectors. We hypothesized that texture contrast would add to the overall contrast in the same collinear fashion:<anchor name="eqn12"></anchor>
which translates to<anchor name="eqn13"></anchor>
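Assuming that each unidimensional contrast is inversely proportional to its logarithmic slope (a simplification of the article's framework; the function name is ours), collinear combination amounts to summing the three contrasts directly:

```python
def collinear_slope(ls_color, ls_shape, ls_texture):
    # Three-way collinear combination (Model 4): contrasts (taken here
    # as the reciprocals of the unidimensional slopes) add linearly,
    # following a city-block metric.
    combined = sum(1.0 / ls for ls in (ls_color, ls_shape, ls_texture))
    return 1.0 / combined
```

For any given slopes, this rule predicts a more efficient search than an orthogonal pooling of the same contrasts, because linear summation yields a larger combined signal than a Euclidean one.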
Model 5: Color Collinear–Shape/Texture Orthogonal Integration Model
This model is based on two findings: that color and shape contrast signals combine collinearly (Buetti et al., 2019) and that texture and shape contrast signals combine orthogonally (Xu, Lleras, & Buetti, 2021). These results led to the hypothesis that, because texture and shape are integral features, the texture and shape contrast signals would combine orthogonally (i.e., following a Euclidean metric; Garner, 1974), and that color, being separable from shape, would simply add to this combination (i.e., following a city-block metric; Garner, 1974) to form the overall contrast (Equation 14). The name of this model emphasizes that color is collinearly added to the orthogonal combination of shape and texture in the final step:<anchor name="eqn14"></anchor>
which translates to<anchor name="eqn15"></anchor>
Model 6: Texture Orthogonal–Color/Shape Collinear Combination Model
This model is a variation of Model 5. Specifically, it assumes that color and shape contrast signals first combine collinearly, and then the texture contrast is orthogonally combined with this collinear sum (see Equation 16). The name of this model emphasizes that texture is orthogonally combined with the collinear sum of color and shape contrasts as the final step:<anchor name="eqn16"></anchor>
which solves into<anchor name="eqn17"></anchor>
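The two mixed-integration models (Models 5 and 6) differ only in which pair of contrasts is pooled first. Assuming each unidimensional contrast is the reciprocal of its logarithmic slope (function names are ours), both can be sketched as:

```python
import math

def color_collinear_slope(ls_color, ls_shape, ls_texture):
    # Model 5: shape and texture contrasts combine orthogonally
    # (Euclidean metric) first; the color contrast then adds
    # collinearly (city-block metric) to that pooled sum.
    shape_texture = math.hypot(1.0 / ls_shape, 1.0 / ls_texture)
    return 1.0 / (1.0 / ls_color + shape_texture)

def texture_orthogonal_slope(ls_color, ls_shape, ls_texture):
    # Model 6: color and shape contrasts add collinearly first; the
    # texture contrast then combines orthogonally with that sum.
    color_shape = 1.0 / ls_color + 1.0 / ls_shape
    return 1.0 / math.hypot(color_shape, 1.0 / ls_texture)
```

Even with identical unidimensional slopes, the two orderings yield different predicted tridimensional slopes, which is what allows the model comparison to discriminate between them.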
Models Including Weighted Color, Shape, and Texture
In Models 3–6, we assumed that the contrast signals along the three dimensions are equally utilized in forming the overall guidance. However, the extent to which people utilize each dimension might not be uniform (e.g., Bundesen, 1990; Gaspelin & Luck, 2018; Liesefeld et al., 2018; Wolfe, 1994). This idea was computationally explored in Xu, Lleras, Gong, and Buetti (2024), wherein the authors utilized the paradigm introduced by Buetti et al. (2019), using the search slopes in unidimensional color searches and shape searches to predict search performance in bidimensional color and shape searches. The critical manipulation in Xu, Lleras, Gong, and Buetti was an instruction manipulation in the bidimensional searches: One group of participants was instructed to search for the target color, another group was instructed to search for the target shape, and a third group was instructed to search for the target defined by both color and shape. The results showed that the manipulation of which feature dimension participants focused on was captured by corresponding changes in that dimension’s weight parameter. These findings suggested that observers might be able to allocate varying degrees of attentional priority to different feature dimensions as a function of the experimental conditions. In the context of the present study, the notion of attentional priority or attentional weight becomes relevant when one dimension provides a larger contrast signal than the others, making it more informative, or when one dimension is naturally preferred by the human visual system (e.g., color; see Alexander et al., 2019; Williams, 1967). In such cases, there might be an imbalance in the attentional weight placed on different feature dimensions, influencing the extent to which people utilize contrast signals along each dimension to guide their attention.
Therefore, for the four models that incorporate signals from all three dimensions, we considered a variation where an attentional weight parameter was added to each dimension.
Model 7: Weighted Three-Way Orthogonal Combination Model
This model introduces an attentional weight parameter to each of the color, shape, and texture components (Equation 18), building upon the original three-way orthogonal combination model (Model 3). The sum of the three weights is constrained to equal 3, ensuring comparability with the original model:<anchor name="eqn18"></anchor>
which solves into<anchor name="eqn19"></anchor>
with the constraint that<anchor name="eqn20"></anchor>
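A sketch of this weighted variant, with the caveat that whether each weight scales the contrast itself or its square is our assumption; the article's exact parameterization is given in Equations 18–20:

```python
import math

def weighted_orthogonal_slope(slopes, weights):
    # Weighted three-way orthogonal combination (Model 7): `slopes` and
    # `weights` are (color, shape, texture) triples. The weights are
    # constrained to sum to 3 so that the unweighted model is recovered
    # when all three weights equal 1. Contrasts are taken as the
    # reciprocals of the unidimensional logarithmic slopes.
    assert abs(sum(weights) - 3.0) < 1e-9, "weights must sum to 3"
    combined = math.sqrt(sum((w / ls) ** 2
                             for ls, w in zip(slopes, weights)))
    return 1.0 / combined
```

Overweighting one dimension (e.g., color) amplifies its contribution to the pooled contrast relative to the others while keeping the total weight budget fixed.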
Model 8: Weighted Three-Way Collinear Integration Model
This model also introduces an attentional weight parameter for each dimension (Equation 21), following the framework of the original three-way collinear integration model (Model 4), while maintaining the constraint that the sum of the three weights equals 3:<anchor name="eqn21"></anchor>
which solves into<anchor name="eqn22"></anchor>
Model 9: Weighted Color Collinear–Shape/Texture Orthogonal Integration Model
This model introduces weight parameters to both the color component and the combined shape and texture component (Equation 23), based on the original color collinear–shape/texture orthogonal integration model (Model 5). The sum of the two weights should be 2, maintaining consistency with the original model:<anchor name="eqn23"></anchor>
which solves into<anchor name="eqn24"></anchor>
with the constraint that<anchor name="eqn25"></anchor>
Model 10: Weighted Texture Orthogonal–Color/Shape Collinear Combination Model
This model adds weight parameters to both the combined color and shape component and the texture component (Equation 26), based on the original texture orthogonal–color/shape collinear combination model (Model 6). Similarly, the sum of these two weights should be 2:<anchor name="eqn26"></anchor>
which solves into<anchor name="eqn27"></anchor>
with the constraint that<anchor name="eqn28"></anchor>
Estimation of the Attentional Weight w
For all the weighted models, the optimal values for the attentional weight
<bold>Search Slopes Observed in Unidimensional Search Experiments</bold>
Figure 3 shows the changes in search times as a function of the stimulus set size for each target–distractor pair in Experiments 1–3. Table 2 summarizes the logarithmic slopes observed in these experiments.
<anchor name="fig3"></anchor>
<anchor name="tbl2"></anchor>
<bold>Search Slopes Observed in Tridimensional Search Experiments</bold>
Table 3 summarizes the logarithmic search slopes observed in Experiments 4–12.
<anchor name="tbl3"></anchor>
<bold>Model Comparison</bold>
Table 4 shows the performance of the 10 models tested in the study, along with the Minkowski
<anchor name="tbl4"></anchor>
Single Dimension Models
Among the models relying on one feature dimension, Model 2 (best feature guidance model;
Weighted Versus Unweighted Models
Adding weight terms substantially increased the model fit. Specifically, Model 7 (weighted three-way orthogonal model), Model 8 (weighted three-way collinear model), Model 9 (weighted color collinear–shape/texture orthogonal integration model), and Model 10 (weighted texture orthogonal–color/shape collinear combination model) were 8.8 × 10<sups>10</sups>, 2.3 × 10<sups>13</sups>, 1.4 × 10<sups>11</sups>, and 5 times more likely, respectively, than their corresponding unweighted models (Models 3–6) to account for the variability in the observed data. We can conclude that, although participants were integrating information across the three feature dimensions, not all information contributed equally to guidance. The results consistently suggested that participants weighed the information coming from the color dimension more heavily than that coming from the shape and texture dimensions (Model 7, weighted three-way orthogonal model:
Optimal Minkowski’s r
The Minkowski
In the current set of experiments, the optimal
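Collinear and orthogonal pooling can be viewed as the r = 1 and r = 2 special cases of a Minkowski metric over the unidimensional contrasts. A sketch, under our working assumption that each contrast is the reciprocal of its logarithmic slope:

```python
def minkowski_slope(slopes, r):
    # Generalized Minkowski combination of unidimensional contrasts
    # (taken as the reciprocals of the logarithmic slopes): r = 1
    # reproduces the collinear (city-block) rule, r = 2 the orthogonal
    # (Euclidean) rule, and intermediate r values interpolate between
    # the two.
    combined = sum((1.0 / ls) ** r for ls in slopes) ** (1.0 / r)
    return 1.0 / combined
```

Treating r as a continuous parameter is what allows an optimal value to be estimated from the data rather than fixed a priori at 1 or 2.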
The Winning Model
In comparing the four weighted models, the
To visualize the model performance, Figure 4 displays the observed RTs from Experiments 4–12 as a function of the predicted RTs for the four weighted models. These models were constructed based on the most current understanding of how color, shape, and texture combine—that is, color and shape are presumed to combine collinearly, and shape and texture orthogonally, with room for attentional modulation. Note that each panel displays 108 predicted RTs (i.e., four set size levels by 27 tridimensional distractor types).
<anchor name="fig4"></anchor>
<bold>Optimal Weight Stability Analysis</bold>
The weighted models (Models 7–10) all contained attentional weight parameters that were estimated on the entire data set. When comparing the weighted models to unweighted models, the implicit assumption is that weight parameters are not free parameters, but rather a characteristic of the data set that is inherent to the condition, and thus should not be counted as an additional parameter in the model. To test the validity of this assumption, we performed a split-half analysis to validate the optimal weights.
For this analysis, we estimated optimal weights of the two top-performing weighted models (Models 7 and 9) using a training data set made up of half of the total data set and assessed their predictive accuracy on the remaining half of the data (the testing set). We began by using the complete data set from the unidimensional experiments (i.e., Experiments 1–3) to estimate unidimensional search slopes. Next, we randomly sampled half of the tridimensional trials to determine optimal weight parameters during tridimensional searches and constructed the two top-performing weighted models (Models 7 and 9). We then tested these models on the remaining half of the tridimensional trials. This process was repeated 100 times to arrive at the parameter estimates and model performance presented in Table 5 and Figure 5. Results showed that both weighted models, constructed based on the training sets, successfully predicted data in the testing sets (
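The split-half procedure can be schematized as follows; the data representation, fitting interface, and function names here are hypothetical sketches, not the study's actual analysis code:

```python
import random

def split_half_r2(trials, fit, predict, n_iter=100):
    # Schematic split-half validation: repeatedly fit parameters on a
    # random half of the trials (the training set) and score the
    # predictions on the held-out half (the testing set). `fit` maps a
    # training list to parameters; `predict` maps (params, trial) to a
    # predicted RT. Returns the mean R^2 across iterations.
    r2s = []
    for _ in range(n_iter):
        shuffled = random.sample(trials, len(trials))
        half = len(trials) // 2
        train, test = shuffled[:half], shuffled[half:]
        params = fit(train)
        obs = [t["rt"] for t in test]
        pred = [predict(params, t) for t in test]
        mean_obs = sum(obs) / len(obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        r2s.append(1 - ss_res / ss_tot)
    return sum(r2s) / len(r2s)
```

Averaging over repeated random splits, as done here with 100 iterations, guards against any single lucky or unlucky partition of the trials.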
<anchor name="tbl5"></anchor>
<anchor name="fig5"></anchor>
Results from Experiments 1–12 demonstrated that during tridimensional searches, people likely incorporate information from all the feature dimensions to guide their attention, as the top three performing models all incorporate the contrast signals from all three feature dimensions. Notably, while all three dimensions contribute to search performance, there is a tendency to prioritize signals from the color dimension. This is evidenced by the large optimal weights associated with the color dimension across all weighted models. The optimal Minkowski’s
The fact that color distinctiveness produced more efficient searches than shape and texture might explain why the best feature guidance model outperformed five out of eight models that consider signals from all three dimensions, including all the unweighted tridimensional models and the weighted texture orthogonal–color/shape collinear model. It could also explain why Model 1 (color-only model) outperformed the unweighted Model 3 (three-way orthogonal model) and Model 4 (three-way collinear model). Given that color in this set of experiments tended to have overall stronger guiding signals than shape or texture, models prioritizing the contribution of color had an advantage in explaining the data, regardless of whether the model captured the underlying structure of how signals combine across different dimensions.
Next, we completed a second set of experiments where the contrast signals along the color, shape, and texture dimensions had comparable ranges. The goal was to evaluate whether people would still preferentially allocate a higher attentional weight to the color dimension or whether the weights reflect the usefulness of a feature dimension, in which case, we would expect similar weight parameters for all three dimensions.
Experimental Set 2
A set of 12 experiments was conducted using the same paradigm as before, each with a naïve group of participants, but with a new set of feature parameters that produced more comparable distinctiveness signals across the three visual dimensions, such that color no longer had a guidance advantage over shape and texture.<anchor name="b-fn1"></anchor><sups>1</sups> All data, analysis code, and research materials are available on the Open Science Framework (Xu, Lleras, & Buetti, 2024) at
<bold>Participants</bold>
Undergraduate students from the University of Illinois at Urbana–Champaign completed the experiment in exchange for course credit. Since this experimental set was conducted in person, the sample size was determined based on previous in-person experiments in our lab, which showed that averaging the data of 20 subjects produces stable estimates of the group means of reaction times and search slopes for a given search condition (e.g., Buetti et al., 2016; Madison et al., 2018; Ng et al., 2018; Wang et al., 2018) and is sufficient to obtain differentiation between models (e.g., Buetti et al., 2019; Lleras et al., 2019; Wang et al., 2017; Xu, Lleras, & Buetti, 2021; Xu, Lleras, Shao, & Buetti, 2021). Similar to Set 1, a post hoc power analysis (see the Appendix) showed that satisfying model distinguishability started around a sample size of 20, and weight parameters stabilized around a sample size of 10 for Set 2. We aimed to include 25 valid participants in each in-person experiment, but because of the nature of data collection, sometimes more participants ended up being run (e.g., participants had already signed up to participate in the experiment).
For each experiment, participant inclusion criteria were the same as in Set 1 except for the following: (a) Search accuracy was now calculated as the number of trials in which participants made a correct response divided by the total number of trials (i.e., we did not remove time-out trials before calculating accuracy), and (b) an individual’s average RT had to fall within 2.5 standard deviations of the group average RT, instead of 2 standard deviations in Set 1. The number of recruited participants, the number of participants included in the analysis, the number of participants excluded for each criterion, demographic information of included participants, and summary statistics of the measurements in each experiment are reported in Table 6.
<anchor name="tbl6"></anchor>
<bold>Apparatus and Stimuli</bold>
Experiments were again programmed in JavaScript and conducted on Pavlovia; participants completed them in the lab using Mac minis and gamma-corrected 24-in. LCD displays with a 239.75-Hz refresh rate and a resolution of 1,920 × 1,080. Stimuli were randomly displayed on a grid located at the center of the screen, spanning approximately 27 × 27 cm (approximately 25° of visual angle, with a viewing distance of 60 cm). The stimuli, measuring roughly 1.4 × 1.4 cm in physical size (1.3° × 1.3° of visual angle), were positioned with a small random jitter based on three concentric circular grids, measuring about 25.5 cm, 13.7 cm, and 7.4 cm in diameter, respectively. Examples of stimuli are shown in Figure 6.
<anchor name="fig6"></anchor>
Unidimensional Experiments
In Experiments 13–15, the target was always a red (
<anchor name="tbl7"></anchor>
Tridimensional Experiments
In Experiments 16–24, the target was the same as in Experiments 13–15 (a red octagon with a white cross texture inside), and the distractors were constructed by combining all the distractor colors, shapes, and textures used in Experiments 13–15. All other aspects were the same as in the first set of experiments.
<bold>Design</bold>
The design and procedure were the same as in the first set of experiments, except that in this set, each type of distractor appeared at five set sizes: 2, 4, 9, 19, and 31. We also included a target-only condition in which no distractors were presented. In total, each experiment contained 16 conditions, each repeated 40 times, for a total of 640 trials.
<bold>Behavioral–Computational Predictive Approach</bold>
The same predictive approach as used in Experimental Set 1 was adopted for this set of experiments. The optimal value of Minkowski’s
<bold>Search Slopes Observed in Unidimensional Search Experiments</bold>
Figure 7 shows how the search times changed as a function of the stimulus set size for each target–distractor pair in Experiments 13–15. Table 7 summarizes the logarithmic slopes observed in these experiments.<anchor name="b-fn2"></anchor><sups>2</sups>
<anchor name="fig7"></anchor>
<bold>Search Slopes Observed in Tridimensional Search Experiments</bold>
Table 8 summarizes the logarithmic search slopes observed in Experiments 16–24.
<anchor name="tbl8"></anchor>
<bold>Model Comparison</bold>
Table 9 shows the performance of the 10 models tested for this set of experiments and the Minkowski
<anchor name="tbl9"></anchor>
Single Dimension Models
Model 2 (best feature guidance model) and Model 1 (color-only model) exhibited poorer performance compared to the other eight models that combine contrast signals from all three dimensions. This result aligns with what we found in the first set of experiments, indicating that during tridimensional searches, people likely incorporate signals from more than one dimension to guide their search. This finding contrasts with theories that rely solely on color, as suggested by Williams (1967), and with those positing that only the most informative dimension guides the search.
Weighted Versus Unweighted Models
The results show that adding the weight terms increased the model fit. Specifically, weighted Models 7–10 were 4.6 × 10<sups>12</sups>, 1.4 × 10<sups>4</sups>, 2.3 × 10<sups>4</sups>, and 1.3 times more likely, respectively, than their corresponding unweighted models to explain the observed data. Similar to the first data set (Experiments 1–12), participants favored information from the color dimension, as evidenced by a systematic bias in the dimensional weights: Across all the weighted models, the weight assigned to color exceeded 1 (see Table 9). This trend is particularly evident in the winning weighted three-way orthogonal combination model (Model 7), where the color weight (
The Winning Model
The winning model was Model 7 (weighted three-way orthogonal model), which was nearly 30,000 times more likely than the second best performing Model 9 (weighted color collinear–shape/texture orthogonal model; and even more for the rest of the models) in explaining the observed data (see the full AIC comparison results in Table 9).
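The article does not spell out how "times more likely" is computed, but such evidence ratios conventionally follow from AIC differences; a sketch assuming that convention:

```python
import math

def aic_evidence_ratio(aic_better, aic_worse):
    # Relative likelihood of two models from their AIC values: the
    # lower-AIC model is exp(delta_AIC / 2) times more likely to be
    # the better (information-loss-minimizing) description of the data.
    return math.exp((aic_worse - aic_better) / 2.0)
```

Under this convention, a ratio of roughly 30,000 corresponds to an AIC difference of about 20.6 points between the two models.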
To visualize model performance, Figure 8 displays the observed RTs from Experiments 16–24 as a function of the predicted RTs for the four weighted models. Note that in each figure, there are 135 RTs being predicted (i.e., 5 set size levels by 27 tridimensional distractor types).
<anchor name="fig8"></anchor>
<bold>Optimal Weight Stability Analysis</bold>
We performed a split-half analysis to validate the optimal weights in Experimental Set 2 using the same procedure as in Set 1. As is shown in Table 10, both weighted models constructed based on the training set successfully predicted performance on the testing set, achieving almost identical
<anchor name="tbl10"></anchor>
<anchor name="fig9"></anchor>
In the second set of experiments, we adjusted the distinctiveness signals along the color dimension so that it would not have a guidance advantage over shape and texture, as it did in Experimental Set 1. This adjustment led to several differences in people’s search performance compared to the first set of experiments.
<bold>Optimal Attentional Weights</bold>
A quick visual comparison between Tables 4 and 9 reveals that, across all weighted models, the color (or color and shape integrated) weights were always substantially larger in the first set of experiments compared to the second set. These results suggest that, when the target–distractor distinctiveness across the three feature dimensions was more comparable, participants placed relatively less priority on the color dimension. However, observers still continued to prioritize color relative to the other dimensions (
We also observed that the performance of Model 2 (best feature guidance model) and Model 1 (color-only model) was worse in the second set of experiments compared to the first set. This confirms that the advantages of these two models in the first set may have stemmed from the fact that, more often than not, color produced stronger guiding signals than shape or texture in that stimulus set. Once we better controlled for the strength of guiding signals across the three dimensions, these two models lost their advantages and
<bold>Optimal Minkowski’s r</bold>
The change in priority allocated to the three dimensions is also evident in the optimal values of Minkowski’s
<bold>Winning Models</bold>
Interestingly, the weighted three-way orthogonal model and weighted color collinear–shape/texture orthogonal model were the top two performing models across both data sets (see Figures 4 and 8), despite differences in the stimuli and experimental conditions (online for the first set vs. in-lab for the second set). Furthermore, the relative changes in weights in these models as well as the change in Minkowski’s
Additional Analyses: Reliability Analyses of the Winning Models Across Two Sets
Across two data sets, we demonstrated that two models, namely, the weighted color collinear–shape/texture orthogonal model in the first set of experiments and the weighted three-way orthogonal model in the second set of experiments, had great predictive power and were substantially more likely than any other candidate models. Consequently, the mathematical laws underlying feature combination appear to vary as a function of the ecology of the experimental conditions. This dependence limits the generalizability of the results.
To better understand how the winning model varies with experimental conditions, we bootstrapped the data sets to obtain the range of the two top-performing models’
<anchor name="fig10"></anchor>
<anchor name="tbl11"></anchor>
We also sampled participants with replacement 100 times to examine the reliability of the winning model across participants (Figure 10, bottom). The results showed that the weighted three-way orthogonal model achieves a higher
The conclusion that an orthogonal combination model might be more universal is also supported by recent findings from Hughes et al. (2024), who found stronger evidence for an orthogonal combination model when using a Bayesian modeling technique to evaluate feature combination rules at the participant level across two dimensions (color and shape). That said, as Hughes et al. noted, we acknowledge that it is difficult to conclude decisively which model is best from a single calculation over participants’ aggregated data, given within- and between-participant variability as well as variability induced by stimulus selection. The two winning models produce very similar
General Discussion
Visual search in real life is almost always multidimensional. Looking for a phone among laptops, keyboards, books, and mugs, which differ in color, shape, texture, size, etc., happens much more often than looking for an object that differs from surrounding objects along only one specific feature, like size or color. When visual information is available along multiple dimensions, how does the human visual system utilize these various sorts of information to find a target? Does the visual system utilize them all? Do we prioritize some dimensions and deprioritize others? Is a multidimensional search more efficient than comparable unidimensional searches, or does the additional informational load in multidimensional searches incur a processing cost? The results from the present study provide us with an initial set of answers to these questions. Yes, the visual system can use all the information available, along at least three feature dimensions, to guide attention. Furthermore, the attentional guidance system values information along some visual dimensions, like color, more than others, like shape and texture. Finally, multidimensional search is more efficient than comparable unidimensional searches. These answers represent an initial stepping stone toward understanding attentional guidance in real-life search scenarios.
Previous research had demonstrated that, in bidimensional searches, where the target differed from distractors along two dimensions, distinctiveness signals from each dimension combine in a lawful manner to guide attention (e.g., Buetti et al., 2019; Hughes et al., 2024; Xu, Lleras, & Buetti, 2021). The present study pushed these initial investigations further to understand how distinctiveness signals along three feature dimensions—specifically, color, shape, and surface texture—integrate to produce overall top-down guidance in tridimensional searches. Across two sets of experiments, we also tested how the feature integration rules vary with different stimuli. In the first set, color tended to provide larger guidance signals; in the second set, guidance from the three feature dimensions was more comparable. As a reminder, in tridimensional searches, the target never shared any features with the distractors; therefore, the visual signals from all three feature dimensions differentiated the target from the distractors. Several findings emerged from this study.
<anchor name="fig11"></anchor>
These findings offer insights into the properties of target templates. They suggest that, at least for the three dimensions tested here, the target template representation contains information along all of them and that they are all utilized by the human visual system to guide search. This conclusion seems to stand in contrast with Williams (1967), where the author concluded that participants’ fixations during search were mainly guided by color when the target was defined along two (i.e., color and shape, or color and size) or three dimensions (i.e., color, shape, and size).
There are several possible ways of reconciling Williams’s and the current results. In Williams (1967), stimuli consisted of forms in specific colors, shapes, and sizes, each containing a two-digit number. Participants searched for a target number and were provided with a verbal description containing varying amounts of information regarding the color, shape, and size of the associated form right before each search trial. The search displays were always heterogeneously composed of 100 stimuli, each defined by a unique combination of colors, shapes, and sizes. Because the target information changed randomly throughout the experiment, there was no fixed visual target template when participants performed the search. Previous research has shown that search slows down when the target template is verbally presented rather than visually shown (e.g., Malcolm & Henderson, 2009) and when the target template (especially the feature dimensions defining the target) changes across trials (e.g., Krummenacher et al., 2001; Lleras et al., 2025; Müller et al., 2003). Due to the varying nature of the target in Williams’s study, it may have been difficult for participants to build a stable and useful target template. Consequently, participants might have adopted a minimal effort strategy (Irons & Leber, 2016, 2020), using the easiest feature dimension to narrow down the possible target locations, then serially fixating on each item until finding the target. In contrast, in our task, the target was defined by a fixed color, shape, and texture throughout the task, making it easier for participants to create and maintain a stable target template containing information from all three feature dimensions to guide the search.
Also, the stimuli selection in Williams’s study might have been biased toward the color dimension. In Williams’s study, search was faster when the target was specified by color (mean time = 7.6 s) compared to when it was specified by size (mean time = 16.4 s) or shape (mean time = 20.7 s), and as a baseline, searching based on the target number alone produced a mean time of 22.8 s. These results reflect a situation where color information might have been prioritized over the other dimensions simply because it carried larger contrast signals than the other dimensions, and it likely minimized the contribution of the latter two dimensions to attentional guidance. In the present study, we ran two separate experimental sets manipulating the relative usefulness of the color dimension, and we were able to isolate the color advantage produced by the specific color features from any inherent preference toward the color dimension in the human visual system.
Additionally, our computational methodology is likely more sensitive in detecting the contributions of various features to guidance, even in cases where one visual dimension appears more useful than the others (as observed in the first set of experiments). It is possible that Williams’s (1967) method, which primarily involved observing where the majority of eye fixations occurred, was not sensitive enough to detect contributions from other dimensions.
There is evidence in both behavioral and neuroscientific studies supporting the idea that people adjust the attentional weight associated with a feature dimension based on its relative usefulness (e.g., Found & Müller, 1996; Grubert et al., 2011; Lee & Geng, 2020; Müller et al., 2003; Xu, Lleras, Gong, & Buetti, 2024; X. Yu & Geng, 2019; J. M. Yu et al., 2025). Behaviorally, Xu, Lleras, Gong, and Buetti (2024) found that a simple instruction manipulation influenced the degree of attention paid to color and shape. Using a singleton search paradigm, Grubert et al. (2011) also found that people searched faster for a bidimensional target than a unidimensional one, but this benefit was stronger when the participants knew they were looking for a bidimensional target compared to when they were expecting a unidimensional target. This indicates that the same stimuli were better utilized when preparing to receive signals from both color and shape dimensions than when preparing to receive a difference signal along only one dimension. Neurologically, Töllner et al. (2008) showed that the N2pc component (a covert attention indicator) appeared earlier in dimension-repetition trials (where the target differed from distractors along the same dimension as the previous trial) compared to dimension-switching trials, which indicates an attentional focus change due to an intertrial effect (also see Gramann et al., 2010; Töllner et al., 2010). Using functional magnetic resonance imaging, Pollmann et al. (2000) found that in an oddball search task, when the target is defined in the same dimension within a block, the associated brain areas are activated to a higher level compared to when the target defining dimension changed across trials, indicating the possibility of attentional focus enhancement on particular feature dimensions. 
Finally, this finding of dynamic weight adjustment also aligns nicely with the concept of weights in artificial neural networks, which are adjusted dynamically based on new input to maximize performance (e.g., Krizhevsky et al., 2017; LeCun et al., 2015; Rumelhart et al., 1986; Thakur & Peethambaran, 2020).
<h31 id="xge-155-3-839-d543e2672">On the Uniqueness of Color as a Guiding Feature</h31>Our results showed that there appears to be an inherent preference for attending to color information in tridimensional searches. The higher emphasis placed on color was evident across both sets of experiments when color generally provided a larger contrast signal than the other two dimensions (Set 1) and when color provided approximately the same degree of featural contrast as shape and texture (Set 2). Since the weight represents how much of the associated unidimensional contrast signal contributes to the tridimensional search guidance, the fact that the color weight (or a combined color and shape weight) value is larger than 1 across all the weighted models indicates an inherent preference or prioritization for the color dimension. This observed preference for color is consistent with earlier findings that people tend to prioritize color signals (e.g., Alexander et al., 2019; Hulleman, 2020; Williams, 1967; Xu, Lleras, Gong, & Buetti, 2024). For instance, Xu, Lleras, Gong, and Buetti (2024) found that regardless of the dimension(s) participants were instructed to focus on (color, shape, or both), the attentional weight associated with the color dimension consistently remained above 1 (and the weight associated with shape was below 1), indicating a persistent prioritization of color regardless of search instructions. These findings on color preference align well with Conway’s (2014) proposal that color is a privileged perceptual dimension that is meant to index interest in the visual world. Color information is also likely more robust to optical and perceptual transformations than shape information. For instance, color information can survive changes in accommodation that typically blur the shape of objects that are beyond the depth of field of the current fixation. 
Color information can also help recover identity information of low-resolution objects presented in the periphery (e.g., Castelhano & Henderson, 2008; Oliva & Schyns, 2000; Oliva & Torralba, 2001; Rousselet et al., 2005; Torralba, 2009; Wurm et al., 1993).
<h31 id="xge-155-3-839-d543e2714">On Model Performance Indices</h31>In both sets of experiments, to indicate model performance, we prioritized the index of R².
Here, we observed a prediction slope of 1.03 and an intercept of 4.22 in Set 1, which would indicate a near-perfect prediction. However, in Set 2, the prediction slope of the winning model was 0.73 and the intercept was 171.74. These values deviate systematically from a perfect prediction, yet the high R² indicates that the model still captured most of the variance in the data.
We should note that we are not overly concerned by the fitted intercept values larger than zero (as was the case in Set 2). Such a constant offset in prediction times is likely to reflect non-search-related processes, such as differences in perceptual encoding, response selection, and response execution. More likely in our experiments, this offset might be capturing differences in overall RTs between groups, since unidimensional and tridimensional searches were conducted on different groups of participants.
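The slope-and-intercept diagnostic used in this section is simply an ordinary least-squares regression of observed on predicted RTs. Below is a minimal sketch with hypothetical data; the function name and the numerical values are ours, not the paper's:

```python
def ols_fit(predicted, observed):
    """Ordinary least-squares fit: observed = slope * predicted + intercept.
    A perfect model yields slope ~ 1 and intercept ~ 0; a positive intercept
    indicates a constant, non-search-related offset in RTs (e.g., differences
    in encoding or response execution between groups)."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical RTs (ms): the model underpredicts by a constant 150 ms offset.
pred = [600.0, 650.0, 700.0, 800.0]
obs = [750.0, 800.0, 850.0, 950.0]
print(ols_fit(pred, obs))  # (1.0, 150.0)
```

A pattern like this one, with a slope of 1 but a large intercept, is exactly the case that would point to a constant offset rather than a failure of the search model itself.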
A fitted slope that deviates from 1 can be seen as a “correction” to the predicted search slope LS, reflecting the extent to which the slopes measured in unidimensional searches underpredict the speed of contrast accumulation in multidimensional searches. Such a misalignment indicates that processing efficiency is higher in multidimensional than in unidimensional search. In fact, this is what we proposed in Xu, Lleras, and Buetti (2021): When shape and texture combine, the overall prediction slope was 0.75, meaning that observed search times increased at only three quarters of the predicted rate, which might arise from coactivation when combining signals from different dimensions. In Set 2, the weighted three-way orthogonal model achieves the highest R² among the candidate models.
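As a concrete illustration of the modeling logic discussed in this section, the following is a minimal sketch of a weighted orthogonal combination of unidimensional distinctiveness signals. The function name, the inverse-slope parameterization of distinctiveness, and the numerical values are our own illustrative assumptions, not the authors' code:

```python
import math

def predict_tri_slope(slope_c, slope_s, slope_t, w_c=1.0, w_s=1.0, w_t=1.0):
    """Predict the tridimensional logarithmic search slope from three
    unidimensional slopes, assuming distinctiveness D = 1 / slope and a
    weighted orthogonal (Euclidean) combination of the three signals.
    Weights above 1 express prioritization of that dimension's signal."""
    d_c, d_s, d_t = 1.0 / slope_c, 1.0 / slope_s, 1.0 / slope_t
    d_tri = math.sqrt((w_c * d_c) ** 2 + (w_s * d_s) ** 2 + (w_t * d_t) ** 2)
    return 1.0 / d_tri

# With three equal unidimensional slopes and unit weights, the predicted
# tridimensional slope shrinks by a factor of 1/sqrt(3).
print(predict_tri_slope(30.0, 30.0, 30.0))  # ~ 17.32
```

Note how increasing any single weight above 1 (e.g., the color weight) further shallows the predicted tridimensional slope, which is the sense in which a weight larger than 1 reflects prioritization of that dimension.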
A limitation of the current methodology is that incorporating signals from three dimensions almost invariably results in very rapid search, producing shallow tridimensional search slopes and thereby leaving limited room to observe differences in modeling. That being said, our best performing models did account for over 90% of the available variance in the data. Next steps might involve continued exploration of feature space with the goal of selecting stimuli that yield smaller unidimensional distinctiveness (i.e., steeper unidimensional slopes), ensuring a larger range of overall distinctiveness when combining the three dimensions.
Furthermore, in the present study, attentional weights were estimated from the observed tridimensional search data, rather than determined a priori, which limits the predictive power of the models. However, the split-half analysis in both Sets 1 and 2 suggests that the weight parameters are stable estimates of the prioritization people place on different feature dimensions. Moreover, changes in these weight values across different experimental sets (see Figure 12) reflect modifications in the properties of the search stimuli rather than mere data noise. These results open possibilities for making a priori predictions of the attentional weights. Moving forward, we aim to develop methods for quantitatively predicting the values of attentional weights based on contextual factors, such as the relative usefulness of different feature dimensions, the innate preference for certain dimensions, or the top-down emphasis placed on specific dimensions (as demonstrated in Xu, Lleras, Gong, & Buetti, 2024). By making a priori predictions about how the human visual system dynamically modulates attentional weights as a function of these contextual factors, we can enhance our ability to forecast human behavior in more complex and realistic search scenarios.
<anchor name="fig12"></anchor>
Additionally, modeling in the present study was performed on aggregated data (i.e., individual averaged RTs were computed across trials, and then group averaged RTs were computed across participants). This approach ensures stable search slope and RT estimates, but it comes at the cost of not being able to predict variability between participants. That is, we were unable to model how any given participant’s unidimensional slopes combined to predict tridimensional slopes at the individual level. (However, we did include both participant-level and trial-level bootstrapping as additional analyses to examine the variation in the winning models’ performance, mimicking between- and within-participant variability.) In other words, the current modeling results are more representative of an average participant than of any one participant. Future studies could use a multilevel modeling approach (e.g., Hughes et al., 2024) to better separate the contribution of the manipulated task factors (e.g., target–distractor distinctiveness, set size) from participant- and trial-level variability, allowing better modeling of individual differences and intertrial variation in search behavior.
Finally, in this study, target–distractor distinctiveness was indexed by the participants’ logarithmic search slope, rather than measured on a calibrated perceptual space. While we did use CIELAB space to select the colors in Set 2 (colors were taken from an iso-lightness color circle in the CIELAB space), no similar feature space was used to aid stimulus selection for shape or texture, since there is no calibrated perceptual space for measuring similarity along those dimensions. This is why, in the present study, we chose to use search efficiency as an operational definition of the perceived featural difference between two stimuli. This is not too far-fetched because search efficiency is (theoretically and empirically) related to perceptual similarity (see Duncan & Humphreys, 1989, 1992, for the theoretical link). Indeed, in a recent study from our laboratory (Lleras et al., 2025), we studied the direct relationship between search efficiency and perceptual similarity in color space, using the CIELAB space, which is a calibrated perceptual space for color. A similar relationship between search performance and color similarity was also found in Chapman and Störmer (2024), where the authors demonstrated that the search slope is directly related to the inverse of the color distance in the CIELAB space, and in Chapman and Störmer (2022), where they observed a relationship between the search RT and color distance, although these authors found these relationships only at the higher end of the similarity space. It will be important to continue to study and understand the relationship between search slopes and perceptual similarity across different feature dimensions.
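The CIELAB-based relationship described above can be sketched in a few lines. This is illustrative only: the scaling constant and the Lab coordinates are hypothetical, and the distance used is the simple Euclidean CIE76 ΔE rather than a more elaborate color-difference formula:

```python
import math

def delta_e_cie76(lab1, lab2):
    """Euclidean distance between two colors in CIELAB space (CIE76 delta E)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def predicted_slope(target_lab, distractor_lab, k=500.0):
    """Illustrative prediction: logarithmic search slope proportional to the
    inverse of the target-distractor color distance. The constant k is a
    made-up scaling factor, not a fitted value from the paper."""
    return k / delta_e_cie76(target_lab, distractor_lab)

# Hypothetical stimuli on an iso-lightness plane (L* fixed at 70):
target = (70.0, 40.0, 0.0)
near = (70.0, 30.0, 10.0)    # similar color -> steeper predicted slope
far = (70.0, -40.0, 0.0)     # distinct color -> shallower predicted slope
assert predicted_slope(target, near) > predicted_slope(target, far)
```

The inverse-distance form mirrors the pattern reported by Chapman and Störmer (2024), while keeping in mind that the empirical relationship was observed mainly at the high-similarity end of the space.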
<h31 id="xge-155-3-839-d543e2803">Constraints on Generality</h31>This study was conducted with undergraduate students at the University of Illinois, Urbana-Champaign, as well as participants recruited from Prolific, an online data collection platform frequently used in psychological research. Participants were aged 18–30 years, required to have normal visual acuity and color vision, and included 201 males, 526 females, 13 nonbinary individuals, and one participant who chose not to respond. The reported findings should generalize well to the general population within a similar age range and across genders, with normal visual acuity and color vision.
<h31>Conclusion</h31>
The present study demonstrated that visual distinctiveness signals from color, shape, and texture all contributed to predicting search performance in tridimensional search. The modeling suggested that the distinctiveness signals across these dimensions combine in a weighted three-way orthogonal manner to determine the overall distinctiveness that guides tridimensional search. We also quantitatively estimated the attentional weight parameter for each feature dimension, which captured the extent to which people prioritize the signal from that specific dimension. Finally, by manipulating the usefulness of the color dimension relative to shape and texture, we showed that people have an inherent preference for using color information to guide search. In addition to that preference, the relative usefulness of each feature dimension also influences the extent to which any one dimension is prioritized in a given search scenario.
This study provides not only scientific evidence regarding how vision manages complex, multidimensional signals but also a framework for modeling complex task performance using simpler ones. Our findings also contribute to applied fields such as product design. Understanding how humans process visual information can inform the creation of more visually intuitive and user-friendly products, such as by incorporating the most efficiently combined feature dimensions in visual elements and prioritizing important information using the inherently preferred dimension of color. The conclusions drawn from this study also have implications for the development of more biologically accurate neural networks to better understand and predict human behaviors on a larger scale.
Footnotes
<anchor name="fn1"></anchor><sups> 1 </sups> A pilot color search experiment with eight participants was run to select the target and distractor colors that produced slopes in a similar range to those for shape and texture; these colors stayed the same across Sets 1 and 2.
<anchor name="fn2"></anchor><sups> 2 </sups> Note that although Sets 1 and 2 used the same shape and texture features, there were some critical differences between the two sets of experiments that complicate direct comparison of slope values across the two sets. First, the search grid was larger in Set 2 (27 × 27 cm) compared to Set 1 (15 × 15 cm), resulting in greater average target eccentricity in Set 2. Greater eccentricity is known to reduce search efficiency (e.g., Carrasco et al., 1995; Carrasco & Frieder, 1997; Wang et al., 2018). This effect is especially pronounced in more difficult conditions (i.e., those eliciting steeper slopes), where low target–distractor discriminability makes it particularly challenging to identify targets in the far periphery. In easier conditions (e.g., distractors such as squares, triangles, and solid textures), the large discriminability signals may reduce the impact of larger eccentricity on performance. Second, the stimuli in the shape-only and texture-only conditions were gray in Set 1, but red in Set 2. It is unclear how this signal may have impacted search efficiency, but it is worth pointing out that the unidimensional perceptual comparisons were different across the two sets.
References
<anchor name="c1"></anchor>Adeli, H., Vitu, F., & Zelinsky, G. J. (2017). A model of the superior colliculus predicts fixation locations during scene viewing and visual search.
Alexander, R. G., Nahvi, R. J., & Zelinsky, G. J. (2019). Specifying the precision of guiding features for visual search.
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board Task Force Report.
Buetti, S., Cronin, D. A., Madison, A. M., Wang, Z., & Lleras, A. (2016). Towards a better understanding of parallel visual processing in human vision: Evidence for exhaustive analysis of visual information.
Buetti, S., Xu, J., & Lleras, A. (2019). Predicting how color and shape combine in the human visual system to direct attention.
Bundesen, C. (1990). A theory of visual attention.
Cant, J. S., Arnott, S. R., & Goodale, M. A. (2009). fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream.
Cant, J. S., & Goodale, M. A. (2007). Attention to form or surface properties modulates different regions of human occipitotemporal cortex.
Carrasco, M., Evert, D. L., Chang, I., & Katz, S. M. (1995). The eccentricity effect: Target eccentricity affects performance on conjunction searches.
Carrasco, M., & Frieder, K. S. (1997). Cortical magnification neutralizes the eccentricity effect in visual search.
Castelhano, M. S., & Henderson, J. M. (2008). The influence of color on the perception of scene gist.
Cavina-Pratesi, C., Kentridge, R. W., Heywood, C. A., & Milner, A. D. (2010). Separate channels for processing form, texture, and color: Evidence from FMRI adaptation and visual object agnosia.
Chapman, A. F., & Störmer, V. S. (2022). Feature similarity is non-linearly related to attentional selection: Evidence from visual search and sustained attention tasks.
Chapman, A. F., & Störmer, V. S. (2024). Target-distractor similarity predicts visual search efficiency but only for highly similar features.
Chun, M. M., & Wolfe, J. M. (1996). Just say no: How are visual searches terminated when there is no target present?
Conway, B. R. (2014). Color signals through dorsal and ventral visual pathways.
Cui, A. Y., Buetti, S., Xu, Z. J., & Lleras, A. (2025). Evaluating the contribution of parallel processing of color and shape in a conjunction search task.
Duncan, J., & Humphreys, G. (1992). Beyond the search surface: Visual search and attentional engagement.
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity.
Eckstein, M. P., Thomas, J. P., Palmer, J., & Shimozaki, S. S. (2000). A signal detection model predicts the effects of set size on visual search accuracy for feature, conjunction, triple conjunction, and disjunction displays.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings.
Found, A., & Müller, H. J. (1996). Searching for unknown feature targets on more than one dimension: Investigating a “dimension-weighting” account.
Garner, W. R. (1974).
Garner, W. R., & Felfoldy, G. L. (1970). Integrality of stimulus dimensions in various types of information processing.
Gaspelin, N., & Luck, S. J. (2018). Distinguishing among potential mechanisms of singleton suppression.
Gramann, K., Töllner, T., & Müller, H. J. (2010). Dimension-based attention modulates early visual processing.
Grubert, A., Krummenacher, J., & Eimer, M. (2011). Redundancy gains in pop-out visual search are determined by top-down task set: Behavioral and electrophysiological evidence.
Hamblin-Frohman, Z., & Becker, S. I. (2021). The attentional template in high and low similarity search: Optimal tuning or tuning to relations?
Henderson, J. M., Malcolm, G. L., & Schandl, C. (2009). Searching in the dark: Cognitive relevance drives attention in real-world scenes.
Hoffman, J. E. (1979). A two-stage model of visual search.
Hughes, A. E., Nowakowska, A., & Clarke, A. D. (2024). Bayesian multi-level modelling for predicting single and double feature visual search.
Hulleman, J. (2020). Quantitative and qualitative differences in the top-down guiding attributes of visual search.
Hulleman, J., & Olivers, C. N. L. (2017). The impending demise of the item in visual search.
Irons, J. L., & Leber, A. B. (2016). Choosing attentional control settings in a dynamically changing environment.
Irons, J. L., & Leber, A. B. (2020). Developing an individual profile of attentional control strategy.
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks.
Krummenacher, J., Müller, H. J., & Heller, D. (2001). Visual search for dimensionally redundant pop-out targets: Evidence for parallel-coactive processing of dimensions.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning.
Lee, J., & Geng, J. J. (2020). Flexible weighting of target features based on distractor context.
<anchor name="c1"></anchor>Liesefeld, H. R., Liesefeld, A. M., Pollmann, S., & Müller, H. J. (2018). Biasing allocations of attention via selective weighting of saliency signals: Behavioral and neuroimaging evidence for the dimension-weighting account. In T. Hodgson (Ed.),
Lleras, A., Buetti, S., & Xu, Z. J. (2022). Incorporating the properties of peripheral vision into theories of visual search.
Lleras, A., Wang, Z., Madison, A., & Buetti, S. (2019). Predicting search performance in heterogeneous scenes: Quantifying the impact of homogeneity effects in efficient search.
Lleras, A., Wang, Z., Ng, G. J. P., Ballew, K., Xu, J., & Buetti, S. (2020). A target contrast signal theory of parallel processing in goal-directed search.
Lleras, A., Xu, Z. J., Tan, H. J. H., Shao, Y., & Buetti, S. (2025). Quantifying the relationship between search efficiency and perceptual similarity in color space across different efficient search tasks.
Madison, A., Lleras, A., & Buetti, S. (2018). The role of crowding in parallel search: Peripheral pooling is not responsible for logarithmic efficiency in parallel search.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of target template specificity on visual search in real-world scenes: Evidence from eye movements.
Mayer, K. M., & Vuong, Q. C. (2013). Automatic processing of unattended object features by functional connectivity.
Müller, H. J., Reimann, B., & Krummenacher, J. (2003). Visual search for singleton feature targets across dimensions: Stimulus- and expectancy-driven effects in dimensional weighting.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search.
Navalpakkam, V., & Itti, L. (2007). Search goal tunes visual features optimally.
Ng, G. J. P., Lleras, A., & Buetti, S. (2018). Fixed-target efficient search has logarithmic efficiency with and without eye movements.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship.
Oliva, A., & Schyns, P. G. (2000). Diagnostic colors mediate scene recognition.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope.
Pollmann, S., Weidner, R., Müller, H. J., & von Cramon, D. Y. (2000). A fronto-posterior network involved in visual dimension changes.
Pramod, R. T., & Arun, S. P. (2014). Features in visual search combine linearly.
Pramod, R. T., & Arun, S. P. (2016). Object attributes combine additively in visual search.
Rosenholtz, R. (2016). Capabilities and limitations of peripheral vision.
Rosenholtz, R., Huang, J., Raj, A., Balas, B. J., & Ilie, L. (2012). A summary statistic representation in peripheral vision explains visual search.
Rousselet, G., Joubert, O., & Fabre-Thorpe, M. (2005). How long to get to the “gist” of real-world natural scenes?
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors.
Sawaki, R., & Luck, S. J. (2010). Capture versus suppression of attention by salient singletons: Electrophysiological evidence for an automatic attend-to-me signal.
Schurgin, M. W., Wixted, J. T., & Brady, T. F. (2020). Psychophysical scaling reveals a unified theory of visual memory strength.
Thakur, S., & Peethambaran, J. (2020).
Theeuwes, J. (1991). Cross-dimensional perceptual selectivity.
Theeuwes, J. (1992). Perceptual selectivity for color and form.
Töllner, T., Gramann, K., Müller, H. J., Kiss, M., & Eimer, M. (2008). Electrophysiological markers of visual dimension changes and response changes.
Töllner, T., Zehetleitner, M., Gramann, K., & Müller, H. J. (2010). Top-down weighting of visual dimensions: Behavioral and electrophysiological evidence.
Torralba, A. (2009). How many pixels make an image?
Townsend, J. T., & Gregory Ashby, F. (1983).
Treisman, A., & Sato, S. (1990). Conjunction search revisited.
Ullman, S. (1987). Visual routines. In M. A. Fischler & O. Firschein (Eds.),
Vickery, T. J., King, L.-W., & Jiang, Y. (2005). Setting up the target template in visual search.
Wang, Z., Buetti, S., & Lleras, A. (2017). Predicting search performance in heterogeneous visual search scenes with real-world objects.
Wang, Z., Lleras, A., & Buetti, S. (2018). Parallel, exhaustive processing underlies logarithmic search functions: Visual search with cortical magnification.
Williams, L. G. (1967). The effects of target specification on objects fixated during visual search.
Witkowski, P. P., & Geng, J. J. (2022). Attentional priority is determined by predicted feature distributions.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search.
Wolfe, J. M. (2021). Guided search 6.0: An updated model of visual search.
Wolfe, J. M., Horowitz, T. S., Kenner, N., Hyle, M., & Vasan, N. (2004). How fast can you change your mind? The speed of top-down guidance in visual search.
Wurm, L. H., Legge, G. E., Isenberg, L. M., & Luebker, A. (1993). Color improves object recognition in normal and low vision.
Xu, Z. J., Buetti, S., & Lleras, A. (2020, May 11).
Xu, Z. J., Lleras, A., & Buetti, S. (2021). Predicting how surface texture and shape combine in the human visual system to direct attention.
Xu, Z. J., Lleras, A., & Buetti, S. (2024, June 2).
Xu, Z. J., Lleras, A., Gong, Z. G., & Buetti, S. (2024). Top-down instructions influence the attentional weight on color and shape dimensions during bidimensional search.
Xu, Z. J., Lleras, A., Shao, Y., & Buetti, S. (2021). Distractor-distractor interactions in visual search for oriented targets explain the increased difficulty observed in nonlinearly separable conditions.
Xu, Z. J., Yu, J., Lleras, A., & Buetti, S. (2025). Investigating the contribution of unpredictable target features to attentional guidance.
Yu, J. M., Xu, Z. J., Lleras, A., & Buetti, S. (2025). Exploring the impact of target-distractor featural contrast on feature prioritization in efficient visual search.
Yu, X., & Geng, J. J. (2019). The attentional template is shifted and asymmetrically sharpened by distractor context.
Yu, X., Hanks, T. D., & Geng, J. J. (2022). Attentional guidance and match decisions rely on different template information during visual search.
Yu, X., Zhou, Z., Becker, S. I., Boettcher, S. E. P., & Geng, J. J. (2023). Good-enough attentional guidance.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition.
Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory.
Given that the two top-performing models showed similar performance (i.e., the weighted color collinear–shape/texture orthogonal model achieves a 0.7% higher R² in the first set, and the weighted three-way orthogonal model achieves a 1.6% higher R² in the second set), we performed a post hoc power analysis, estimating the sample size necessary to observe a stable model difference.
For each tridimensional experiment, we sampled with replacement 50 times at each sample size, from 1 to 40, and calculated the distinguishability of model performance as a function of sample size. Results are reported in Figure A4. Overall, the R² values of both models keep increasing with sample size, but the model comparison consistently favors the weighted three-way orthogonal model from a sample size of 2 onward (Figure A4, left). In Set 1, the weighted three-way orthogonal model is at least 1,040 times (at a sample size of 2) and on average 1.47 × 10⁶ times more likely than the weighted color collinear–shape/texture orthogonal model to account for the variability in the data. In Set 2, the weighted three-way orthogonal model is at least 2.45 × 10⁶ times (also at a sample size of 2) and on average 1.83 × 10¹⁰ times more likely than its component model.
To estimate power, for each simulation, we computed the relative likelihood of the winning model (i.e., the weighted three-way orthogonal model) relative to the other model (i.e., the weighted color collinear–shape/texture orthogonal model). We then reported the proportion of simulations in which the winning model’s relative likelihood exceeded 10, a reasonable cutoff for concluding that there is robust evidence in favor of the winning model (see Figure A4, right). For Set 1, power increases quickly with sample size, reaching 50% with as few as two participants and exceeding 80% by a sample size of 7. The measure is noisy but stable between 0.7 and 0.9 over the range of 7–40, which we interpret as adequate power. For Set 2, there is a more systematic increase of power with sample size, with a clear positive trend starting at around a sample size of 7. Between sample sizes of 20 and 40, power varies between 0.67 and 0.88, again an adequate amount of power. Overall, these analyses suggest that we gathered sufficient data to have confidence in our conclusions regarding the winning model.
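The bootstrap power procedure described above can be approximated with the following sketch. This is our own reconstruction under stated assumptions, not the authors' code: we assume the models are compared via AIC-based relative likelihoods, and the per-participant AIC pairs passed in are placeholders:

```python
import math
import random

def relative_likelihood(aic_winner, aic_other):
    """Relative likelihood that the lower-AIC model minimizes information
    loss: exp((AIC_other - AIC_winner) / 2)."""
    return math.exp((aic_other - aic_winner) / 2.0)

def bootstrap_power(aic_pairs, n, n_boot=50, cutoff=10.0, seed=0):
    """Estimate power at sample size n: the proportion of bootstrap resamples
    (participants drawn with replacement) in which the winning model's
    relative likelihood exceeds the cutoff. `aic_pairs` holds one
    (aic_winner, aic_other) tuple per participant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(aic_pairs) for _ in range(n)]
        # Sum AICs over the resampled participants before comparing models.
        aic_w = sum(a for a, _ in sample)
        aic_o = sum(b for _, b in sample)
        if relative_likelihood(aic_w, aic_o) > cutoff:
            hits += 1
    return hits / n_boot
```

Running `bootstrap_power` over sample sizes 1 to 40 and plotting the result would reproduce the shape of a power curve like the one in Figure A4 (right), with power rising as per-participant evidence accumulates.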
For the weighted models, the weight estimates (see Figure A5) stabilized around a sample size of 5 for the weighted color collinear–shape/texture orthogonal model and, for the weighted three-way orthogonal model, around a sample size of 20 in Set 1 and 10 in Set 2. These results confirm that 20 participants should be sufficient both for distinguishing between the two top-performing models and for obtaining stable weight estimates.