Abstract
Objectives
As crimes unfold, how is situational information channeled into decisions? Methodological tools capable of answering this question can advance criminological theory and research. This preregistered study introduces Focal Visual Attention (FVA) as a decision-making construct for measuring visual information processing during realistic, dynamic crime-related scenarios.
Methods
We recruited 262 German-speaking male participants who viewed two 5-minute immersive 360° videos depicting a personal provocation and a witnessed sexual harassment incident in a bar from a point-of-view perspective. Eye-tracking recorded ~4.35 million gaze points. Mean-shift clustering, an unsupervised machine-learning algorithm, identified regions of interest (ROIs) post hoc. Fixations within these ROIs served as proxies for FVA. Linear mixed-effects models examined temporal changes in FVA and associations with individual characteristics.
Results
Overall, visual attention to conflict-relevant stimuli increased over time, with additional avoidance fixations toward exits. Participants with histories of violence and/or prior police involvement showed distinct attention patterns. FVA also varied with self-reported emotions, cognitive appraisals of risk and benefit, and intentions to respond aggressively.
Conclusions
FVA provides a robust, quantifiable measure of moment-to-moment visual attention in crime and analogous decision making. Methodologically, this study demonstrates a scalable framework for analyzing high-density eye-tracking data using immersive 360° video and unsupervised machine learning. Substantively, the findings extend our understanding of crime decision making with potential to inform research in policing, situational crime prevention, and fear of crime.
Introduction
Information transmission and processing are central to choice-based theories of crime and the policies they inform (Anwar and Loughran 2011; Gibbs 1968; Stafford and Warr 1993). As Geerken and Gove (1975:498) argued, understanding real-world crime decisions requires a comprehensive account of how individuals acquire, interpret, and respond to information about the risks and costs of offending within specific social environments. Contemporary research increasingly examines the situational and contextual factors that shape criminal opportunities and how individuals extract and evaluate information from these opportunities to form expectations about potential outcomes (Barnum et al. 2021; Bucci 2024; Pickett et al. 2018; Pogarsky et al. 2017, 2018). Researchers have further extended this inquiry to multifaceted cues—ranging from explicit provocation to subtle non-verbal signals—that can trigger affective responses in real time and recalibrate perceptions of risk and reward (Bouffard and Miller 2014; Nagin 2007; van Gelder 2013).
Capturing these real-time evaluations of dynamic criminal opportunities, however, challenges traditional research methods. Written vignettes, for example, maximize researcher control over the stimuli respondents experience but present static scenarios that lack the realism of real-world offending. Real-life crime opportunities often involve multiple, sometimes competing stimuli that jointly shape decision processes, including peer pressure, verbal threats, ambiguous gestures, and other social and environmental cues (Cherbonneau & Jacobs, 2019; Jacobs, Topalli, & Wright, 2000; Topalli, 2005; see also Hoeben & Thomas, 2019). Such stimuli cannot be captured using written vignettes. To approximate these dynamics more effectively, criminologists are increasingly turning to immersive virtual environments that simulate realistic crime scenarios (van Gelder 2023).
Virtual reality (VR) and immersive 360° video simulations enhance realism by approximating the unfolding, multi-sensory character of real-world events (van Gelder et al., 2025). This realism, however, requires researchers to relinquish some experimental control and allow participants to determine what stimuli they focus on and for how long. For criminological theories of decision making, this presents both an opportunity and a challenge. Rational choice and related perspectives emphasize that individuals weigh risks, costs, and benefits within situational contexts (Cornish & Clarke, 1986; Nagin et al., 2015), yet these frameworks seldom specify which environmental cues are incorporated into that calculus or how such cues are initially perceived. Progress therefore requires methodological tools capable of documenting where participants allocate attention, the intensity of that allocation, and how patterns of attention evolve in situ. Such tools can make it possible to assess not only whether specific contextual inputs matter, but also whether they were perceptually registered in the first place—a necessary precondition for the inputs to influence a crime decision.
In this study, we introduce Focal Visual Attention (FVA) as an empirically tractable variable to capture attention during crime-relevant decisions. We define FVA as a behavioral index of visual attention in dynamic environments (Orquin and Mueller Loose 2013). FVA provides a new tool for tracing how situational cues are noticed—or ignored—in unfolding crime opportunities. Importantly, this approach addresses limitations of retrospective accounts and hypothetical reasoning by providing direct behavioral evidence of attentional allocation during decision making. In doing so, FVA enables pinpointing which environmental cues enter the cost–benefit calculus (Becker 1968; Clarke & Cornish, 1985; Loughran et al. 2016).
In this study, we first introduce a method for extracting meaningful measures of FVA from eye-tracking data in immersive virtual scenarios. Second, we illustrate the utility of this method by analyzing data from 262 German-speaking males who experienced two immersive 360° VR videos: (1) a barroom confrontation in which the participant was directly provoked, and (2) a sexual harassment incident in which the participant witnessed a woman being harassed by another patron. Across these immersive scenarios, eye-tracking data were recorded, yielding more than 4.35 million gaze points—providing an unparalleled level of granularity for studying visual attention during crime-relevant decision making. However, analyzing such high-density, unstructured data presents distinct challenges. Most eye-tracking studies rely on predefined regions of interest (ROIs), but in complex, rapidly unfolding VR environments, defining ROIs a priori is often infeasible (Rahal and Fiedler 2019; van Gelder et al. 2024). To address this, we apply mean-shift clustering, a machine learning algorithm that identifies ROIs post hoc based on fixation density, to index FVA.
As a formal demonstration of the framework’s potential, we use the resulting FVA measure to examine three questions: (1) How does visual attention vary over time across both scenarios? (2) How does attention shift based on situational stimuli experienced during the scenarios (e.g., provocation)? (3) Do individual differences, such as prior experiences, relate to idiosyncrasies in visual information processing? Finally, to illustrate the potential applicability of FVA for criminological research questions, we examine how FVA co-varies with established decision inputs, including integral emotions, risk and benefit perceptions, and behavioral intentions.
Our goal is to provide criminologists with a computationally feasible and replicable approach for measuring attention in realistic, dynamic settings. By unpacking the perceptual processes entailed in judgment and choice, this framework expands the methodological toolkit for testing and refining theories of crime decision making, with applications extending to areas such as policing, public safety, and informal social control.
Immersive Scenarios and Eye-Tracking Capabilities
The methodological value of immersive technologies—such as virtual reality (VR) and immersive 360° videos—for studying crime and decision-making is increasingly evident in criminological research (e.g., Herman et al. 2024; McClanahan et al. 2025; Nee, 2024; Sergiou et al. 2024; van Gelder 2023; van Gelder et al., 2017, 2019). Two features are particularly important. First, immersive simulations better approximate the sensory and emotional complexity of real-world crime opportunities. Such situations often involve ambiguous cues—provocation, nonverbal signals, peer presence, or spatial proximity—that interact in real time. Immersive simulations allow researchers to systematically manipulate these features, controlling both overt behaviors like direct threats and subtle dynamics such as eye contact, tone of voice, and body language. This facilitates standardized stimulus delivery while still evoking naturalistic emotional responses, which are difficult to elicit using static or decontextualized formats (Diniz Bernardo et al. 2021; Ruggiero et al. 2017).Footnote 1
Second, integrating eye-tracking into immersive scenarios provides precise measurement of focal visual attention (FVA), defined here as the observable pattern of gaze allocation that reveals which aspects of a dynamic scene individuals attend to and prioritize during decision making (Ford et al. 2010; Orquin and Mueller Loose 2013). In practical terms, FVA reflects the selective filtering process through which specific environmental cues are perceptually registered while others are disregarded, providing a behavioral index of which information becomes cognitively available for evaluation (e.g., Ferrer et al. 2016; Finucane 2011; Satmarean et al. 2022). This makes it possible to trace moment-to-moment shifts in visual focus and link them to other key processes such as emotional responses, cognitive inputs, and behavioral intentions.
By contrast, traditional approaches such as written vignettes or static surveys reduce complex situations to fixed snapshots, obscuring the temporal structure and fluidity of real-world decision making, where individuals must interpret overlapping cues under conditions of ambiguity and stress (Exum and Bouffard 2010). In practice, people must interpret partial and sometimes conflicting information and decide which cues to prioritize, all while managing simultaneous emotional and cognitive demands. Such dynamic and often chaotic processes cannot be approximated with static, text-based formats.
Immersive simulations, in turn, address these shortcomings by allowing participants to visually move through unfolding environments that more closely resemble everyday encounters. Within these settings, participants distribute their visual attention freely, responding in real time to stimuli such as signals of provocation or threat and subtle social cues (van Gelder, 2023; van Gelder et al. 2019, 2025). The current study leverages eye-tracking data to assess whether participants visually engage with manipulated situational stimuli and how such engagement shapes decision making in ecologically valid contexts of potential conflict. We argue that this could provide a foundation for linking visual attention to core theoretical processes—such as information acquisition, evaluation, and choice—that underlie crime-relevant behavior, a connection we develop in the following section.
Focal Visual Attention in Crime Decision Making
Crime-related decisions often unfold in environments that are dynamic, ambiguous, and emotionally charged. In such contexts, individuals’ perceptions of risk, opportunity, and provocation are not fixed, but are actively constructed through information processing (Geerken and Gove 1975; Matsueda et al. 2006; Stafford and Warr 1993). Decades of criminological research have successfully identified core subjective decision inputs—most notably perceived risks, costs, and benefits of crime—and demonstrated how these perceptions reliably predict behavioral outcomes (Klepper & Nagin, 1989; Loughran et al. 2016; Matsueda et al. 2006; Piliavin et al., 1986; see Apel, 2013, 2022 for review). This work has firmly established perceptions as central explanatory mechanisms in offender decision making. Nevertheless, because most prior studies have relied on surveys, written vignettes, or other static methodologies, existing evidence primarily documents the consequences of perceptions rather than discerning the direct processes by which they are formed in situ. As a result, the origins of these perceptions have remained largely inferred rather than directly observed.
Research indicates that decision inputs are simultaneously idiosyncratic—shaped by individuals’ histories, biases, and cognitive limits—and coherent, insofar as they remain tethered to objective features of the setting (Barnum et al. 2021). Thus, perceptions reflect the joint influence of person-level dispositions and situational conditions. What remains understudied in criminology is how situational information is channeled into those perceptions. We propose attentional allocation as a fundamental process that links objective features of the environment to the perceptions that guide choice, capturing both the pull of external cues and the filtering imposed by individual differences (Goldstein and Cacciamani 2022). Yet systematic evidence is limited regarding which situational stimuli capture attention, how long attention is sustained, and how these temporal dynamics—especially under stress and uncertainty—relate to perceptions and decision outcomes (Bradley 2009; Orquin and Mueller Loose 2013). Absent this evidence, we cannot specify which features persist into higher-order evaluation (e.g., risk, moral acceptability, anticipated benefits) and which do not ultimately influence choice.
Evidence from cognitive psychology, neuroscience, and behavioral decision science demonstrates that visual attention is central to this filtering process. For example, laboratory experiments show that visual fixations on a target reliably predict choice outcomes between two options, even when the alternatives are equivalent (Krajbich et al., 2010). Prolonged gaze corresponds with subjective weighting of information, strengthening confidence, preference, and willingness to act (Shimojo et al. 2003; Zizlsperger et al. 2012). Neuroimaging studies further demonstrate that attentional allocation modulates neural valuation processes, altering how costs and benefits are computed (Lim et al., 2011). These findings converge on a central conclusion: perceptions of risk, opportunity, and provocation are actively constructed through visual attentional dynamics.Footnote 2 Accordingly, specifying how environmental cues are registered, sustained in attention, and incorporated into evolving perceptions enriches our understanding of crime decision-making.
Leveraging immersive scenarios, we highlight four key applications of FVA for advancing criminological theory and decision making research. First, FVA serves as a process-tracing tool that verifies whether participants attended to the environmental features entailed in an experimental manipulation, and for how long (Ford et al. 2010). This addresses a common limitation in criminological experiments, where researchers infer treatment effects from assignment rather than from participants’ actual perception of the manipulated cue (e.g., Nivette et al., 2024).Footnote 3 For example, in a recent study, van Sintemaartensdijk and colleagues (2021) asked participants to appraise a virtual neighborhood for burglary opportunities while varying levels of guardianship from passive presence to active intervention. Participants in guardianship conditions reported higher perceived likelihood of being caught, leading the authors to conclude that the mere presence of a guardian deters crime. FVA offers the additional capability to determine whether participants visually registered the guardian, quantify the dose of attention (e.g., cumulative dwell), and establish temporal priority relative to subsequent appraisals of competing environmental features (Orquin and Mueller Loose 2013). We use FVA to index the moment-to-moment allocation of overt visual attention (gaze) to situational cues, thereby capturing which information is evaluated and which is filtered out; in the current study, these cues are provocation and third-party sexual harassment.
Second, FVA provides a tractable method for quantifying gaze allocation across unfolding situations, identifying when and how attention shifts among competing cues. Criminogenic settings are saturated with ambiguous, rapidly changing, and often nonverbal signals, such as facial expressions, gaze direction, hand movements, posture, and interpersonal distance, that can redirect attention in real time. A sudden movement by a provocateur, for example, may draw focus away from a victim or an exit. By revealing whether such cues are noticed, how intensively they are scrutinized, and in what sequence they guide subsequent gaze, FVA elaborates on the perceptual pathways through which heuristics, biases, priming, and contextual features shape judgment (Herman & Pogarsky, 2025; Pogarsky et al. 2017).Footnote 4 Patterns of fixation can also index latent action tendencies, with sustained dwell on a provocateur’s face or hands suggesting vigilance and a retaliatory orientation, and rapid glances toward exits or nearby bystanders aligning with avoidance or help-seeking (Ford et al. 2010).
Third, FVA illuminates individual differences in information processing by linking attentional allocation to experiential histories and stable dispositions. Developmental and social-psychological research suggests that prior victimization and related experiences shape whether attention to aggressors predicts retaliatory responding or whether attention to defenders predicts protective behavior (Troop-Gordon et al. 2019). Criminological decision-making frameworks similarly hold that past offending, sanctioning, and victimization prompt the recalibration of perceived costs and benefits and, by extension, the salience of cues (Anwar and Loughran 2011; Paternoster et al., 1983; Jacobs & Cherbonneau, 2019). Embedding FVA within immersive scenarios makes these propositions directly testable: researchers can assess whether individuals with different backgrounds systematically prioritize sanction cues, provocateurs, victims, or avenues of escape, and whether these attentional profiles mediate or moderate downstream emotions, evaluations, and intentions. Such evidence helps refine theory by showing when and for whom particular cues matter, thereby integrating heterogeneity in perception with heterogeneity in behavior.
Finally, in addition to capturing situational shifts and individual differences in FVA within our immersive 360° video scenarios, the experimental design allows for a preliminary assessment of whether visual attention to key situational stimuli—provocateurs, victims, or escape routes—systematically aligns with decision inputs emphasized in rational-choice and deterrence frameworks. Prior research has highlighted that emotional and cognitive responses to provocation during high-intensity conflicts, like those depicted in our scenarios, interact to shape fight-or-flight behaviors (Barnum and Solomon 2019; Carmichael and Piquero 2004; van Gelder et al. 2019, 2023; see also Collins, 2008; Katz 1988; van Gelder 2013). What remains largely unexplored is the extent to which these decision inputs are tied to specific elements of the conflict itself. Accordingly, we examine whether participants who focus on the conflict versus other environmental features report differential emotional experiences, expectations about the risks and benefits of using violence, and intentions to respond aggressively. Thus, the findings also provide insight into the role of affect in crime-related decision-making.Footnote 5 By directly linking what individuals attend to with what they subsequently perceive and choose, FVA connects situational structure to decision-making in real time, helping to further unpack the interplay of attention, emotion, and cognition in criminological contexts.
Current Study
In this preregistered study, we analyze data from two immersive 360° video scenarios designed to capture participants’ gaze in real time.Footnote 6 Prior work using portions of these scenarios (Barnum et al., 2024; Herman et al. 2024; van Gelder et al. 2024) and related virtual simulations (van Gelder et al. 2019; van Gelder et al. 2022) demonstrates that participants engage with these environments in ways that approximate real-world conflicts and crime opportunities. However, these earlier studies primarily relied on post hoc self-reports of expectations and intentions, limiting the ability to identify precisely which situational elements drew attention at critical decision points.
The present study builds on this foundation by leveraging the dynamic structure of our scenarios, whereby threat and provocation unfold over time, with continuous eye-tracking to derive an objective measure of FVA. This approach allows us to: (a) verify whether participants visually registered key stimuli, such as provocateurs, victims, or escape routes, (b) describe how attention shifted as the situation evolved, (c) explore individual differences in attentional processing, and (d) examine whether FVA relates to key decision-making measures.
Data and Sample
We recruited young adult German males, as the immersive scenarios featured male antagonists and were filmed in German. Between March 2023 and May 2024, we distributed recruitment flyers at bars, restaurants, and university buildings in Freiburg, Germany. Eligibility criteria included male sex, German language proficiency, and no history of epilepsy or seizure disorders. In total, 297 German-speaking males between 18 and 33 years old (M = 24.27, SD = 3.31) took part in the study. Thirty-one participants had to be excluded due to initial-session technical difficulties with the eye-tracking apparatus, primarily related to achieving precise participant-specific calibration within the immersive virtual environment. Such calibration is a prerequisite for valid gaze measurement and is widely recognized as a common source of data loss in VR eye-tracking studies (e.g., Liu et al. 2024). We excluded an additional four participants who failed an attention check, resulting in a final analytic sample of n = 262. A sensitivity analysis suggested that the minimal detectable effect size for this sample was f2 = 0.05, assuming a level of significance of α = 0.05 and a desired test power of 1-β = 0.80.
Most participants were born in Germany (88%) with an average age of 24 years (SD = 3.35; Range = 18–33). Around 35% of participants had been involved in a physical confrontation in the last 3 years, and around 10% had been arrested at least once prior to the data collection. The study was approved by the ethics council of the Max Planck Society (application number 2021_37).
Study Procedure and Immersive Scenarios
All participants completed a one-hour laboratory session at the MAXLab Freiburg, a virtual reality and behavioral research facility.Footnote 7 Participants were compensated 20 euros for their time. During the session, participants experienced two immersive 360° video scenarios and completed two surveys—one embedded within the VR experience and the other administered via Qualtrics on a laptop. Figure 1 presents an overview.
The video scenarios are part of the MAXLab Aggression and Bystander Intervention Scenario Set (MAXLab_ABISS; van Gelder et al. 2024), which was specifically designed to study the role of emotions in decisions related to crime and interpersonal violence. Participants viewed two scenarios in randomized order, with an approximately 20-minute interval between them. We refer to the first as the “fight scenario” and the second as the “harassment scenario.” van Gelder et al. (2024) offer comprehensive descriptions of the scenarios and their components, the technical specifications used in their creation, and information on how to obtain the materials for research purposes.
Both scenarios were filmed from the first-person point of view of the study participant in a crowded Irish pub. Key roles in the scenarios were portrayed by professional actors. The bar was also populated with extras to create the appearance of a lively pub. Importantly, participants can look in any direction at any time throughout the video, creating an unconstrained, naturalistic 360° viewing experience. Throughout the virtual environment, bar patrons moved around the space, talked among themselves, and ordered drinks, producing a dynamic, socially complex, and realistic setting. Moreover, to further enhance presence and realism, we embedded several subtle auditory cues using spatial audio that guide viewers’ attention toward key events in the scenario without restricting natural viewing (van Gelder et al. 2024). These include common bar sounds such as a bartender knocking on the bar, a glass breaking, and a group of women cheering loudly at a nearby table.
From the start, participants were positioned at the bar, looking outward into the barroom. The scenarios progressed through three phases that transition seamlessly. The first phase consisted of a baseline period during which no meaningful events transpired, allowing participants to acclimate to the virtual barroom. In the second phase, participants were randomly assigned to one of three possible versions of an initial interaction with other patrons per scenario.Footnote 8 For the fight scenario, participants were engaged by (1) a male who later becomes the aggressor in the fight scenario; (2) a male who is not involved in the subsequent conflict; or (3) the bartender only. For the harassment scenario, participants were engaged by (1) a female who later becomes the victim in the harassment scenario; (2) a female who is not involved in the subsequent conflict; or (3) the bartender only. The third phase involved the conflict itself: in the fight scenario, the antagonist turns toward the participant and verbalizes multiple insults, while in the harassment scenario, a male patron harasses a woman near the participant’s location in the bar. Figure 2 presents a first-person view of these conflict scenes.
Participants could not interact with or influence these individuals; they remained passive observers throughout the scenario.Footnote 9 Because viewing was unrestricted, participants were not required to fixate on these interactions, allowing us to directly assess whether and when they naturally directed attention to key social cues, particularly the emotionally charged conflicts.
Measurement
Measuring FVA with Eye-Tracking Data
As stated earlier, one objective of this study is to provide researchers with clear guidance for processing eye-tracking data in immersive environments and generating scalable metrics of visual attention. The scenarios were presented using an HTC Vive Pro Eye head-mounted display equipped with integrated eye-tracking (Vive 2022). As participants viewed each 360° video, the eye-tracker continuously recorded the direction of their gaze. The system first translated each gaze vector into a three-dimensional coordinate on the virtual image sphere, which represents the full immersive visual field (see Fig. 2). Using a technique known as ray casting, these 3D gaze points were then converted into two-dimensional pixel coordinates on the surface of the video sphere, identifying precisely where each participant was looking at any given moment. This process allowed us to capture real-time gaze behavior with high spatial and temporal resolution. On average, the system recorded 5,141 gaze points per participant in the fight scenario and 6,226 in the harassment scenario. Across all participants and scenarios, we collected approximately 4.35 million gaze data points for analysis.
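To make the ray-casting step concrete, the sketch below converts a 3D gaze direction into 2D pixel coordinates under an equirectangular projection, the standard mapping for 360° video frames. The frame resolution and axis conventions are illustrative assumptions, not the specifications of the HTC Vive Pro Eye pipeline.

```python
import numpy as np

def gaze_to_pixels(gaze_vec, width=3840, height=1920):
    """Project a 3D gaze direction onto 2D pixel coordinates of an
    equirectangular 360-degree video frame (hypothetical resolution)."""
    x, y, z = gaze_vec / np.linalg.norm(gaze_vec)  # normalize to unit vector
    lon = np.arctan2(x, z)   # longitude in [-pi, pi]; +z is "straight ahead"
    lat = np.arcsin(y)       # latitude in [-pi/2, pi/2]; +y is "up"
    u = (lon / (2 * np.pi) + 0.5) * width   # horizontal pixel coordinate
    v = (0.5 - lat / np.pi) * height        # vertical pixel (row 0 = top)
    return u, v

# A gaze straight ahead lands at the centre of the video sphere's image.
u, v = gaze_to_pixels(np.array([0.0, 0.0, 1.0]))
print(u, v)  # 1920.0 960.0
```

Each recorded gaze sample can then be stored as a (timestamp, u, v) triple, which is the flat 2D representation the clustering step below operates on.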
To transform this high-density eye-tracking data into a tangible measure of visual attention, eye-tracking studies commonly rely on the concept of “fixations” within regions of interest (ROIs). An ROI refers to a predefined area within the decision environment—such as a face or object—identified prior to data collection (Fuhl et al. 2018; Raschke et al. 2014). A fixation is defined as a sustained gaze at a specific point, typically lasting at least 60 milliseconds to distinguish it from a saccade, which is a brief, involuntary shift in eye position (Hooge et al. 2022; Horstmann et al. 2009; Inhoff and Radach 1998; Rayner 1998; Wedel et al. 2023). More fixations on a given ROI indicate more attentional allocation, whereas fewer fixations reflect less (Raschke et al. 2014; van Renswoude et al. 2018).
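The fixation logic described above can be sketched as a simple dispersion- and duration-based filter (in the spirit of the I-DT family of algorithms). The pixel and duration thresholds below are illustrative assumptions, not the parameters of the original analysis.

```python
def detect_fixations(samples, max_dispersion=50.0, min_duration=0.06):
    """Group time-ordered gaze samples into fixations.

    samples: list of (t_seconds, x_pixel, y_pixel) tuples.
    Gaze points that stay within `max_dispersion` pixels for at least
    `min_duration` seconds (60 ms) form one fixation; larger jumps are
    treated as saccades. Returns (t_start, t_end, centroid_x, centroid_y).
    """
    fixations, window = [], []
    for t, x, y in samples:
        window.append((t, x, y))
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        # dispersion = horizontal spread + vertical spread of the window
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            # the newest sample broke the dispersion limit: close the window
            if window[-2][0] - window[0][0] >= min_duration:
                pts = window[:-1]
                cx = sum(p[1] for p in pts) / len(pts)
                cy = sum(p[2] for p in pts) / len(pts)
                fixations.append((window[0][0], window[-2][0], cx, cy))
            window = [window[-1]]
    # flush a final fixation left open at the end of the recording
    if len(window) > 1 and window[-1][0] - window[0][0] >= min_duration:
        cx = sum(p[1] for p in window) / len(window)
        cy = sum(p[2] for p in window) / len(window)
        fixations.append((window[0][0], window[-1][0], cx, cy))
    return fixations
```

Counting the fixations whose centroids fall inside a given ROI then yields the per-participant attention measure described below.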
As noted, eye-tracking data are inherently complex. In the 5-minute video scenarios, we recorded more than 5,000 gaze points per participant. For our analyses, we apply a machine learning approach that does not require predefined ROIs. This method enables the extraction of meaningful metrics of FVA suitable for inferential analysis (Cios et al. 2007; Ivanová et al. 2022; Vella et al. 2017). Next, we outline a practical approach for generating FVA metrics from raw eye-tracking data using commercially available immersive technology and conventional computer programs.
Establishing Key Video Segments: Attention Phases Variable
To define meaningful ROIs from our eye-tracking data, we first needed to identify analytically tractable segments of the video scenarios, in which patterns of visual attention could be interpreted in relation to specific stimuli. This enables us to distinguish, for example, between attention to background noise, neutral stimuli, and conflict-relevant cues. We segmented the scenarios by time intervals corresponding to the three distinct phases of the narrative: baseline, initial interaction, and conflict.
Because our primary interest is whether participants attend to the main conflict in each scenario, we use the conflict phase as the anchor for identifying comparable analysis segments. Both conflict phases in the fight and harassment scenarios are equally long (~24 s). By contrast, the baseline and interaction phases vary in duration across scenarios, leading to large differences in the volume of eye-tracking data per phase. To address this, we extract exactly 24 s of eye-tracking data for each phase, i.e., baseline, initial interaction, and conflict, in both scenarios, thus equalizing the duration of data across phases.Footnote 10
We take this approach for several reasons. First, by holding the time window constant, we can more precisely isolate the effect of the primary conflict on participant attention relative to other segments, while avoiding any confounding from differing time durations. Second, it allows us to statistically account for visual “noise,” such as incidental movement like patrons walking to the bathroom, that is observable but unrelated to the core narrative. To capture this, we aggregate all gaze data outside the three 24-second intervals into a single “noise” control category. This enables direct comparison of attention during the emotionally salient conflict phase versus the less intense baseline and interaction phases, while controlling for distraction from irrelevant stimuli throughout the scenario.
To operationalize this approach, we created a four-level categorical variable, Attention Phases, to disentangle the effect of attention at various points of the scenario on key decision variables including rational choice considerations and behavioral intentions. Attention phases are coded such that baseline = 0 (reference category), initial interaction = 1, primary conflict = 2, and residual noise = 3.
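A minimal sketch of this coding scheme follows, assuming hypothetical phase onset times; the actual timestamps are scenario-specific and are not reported here.

```python
# Hypothetical onsets (seconds) of the three 24-s analysis windows.
PHASES = {"baseline": (10.0, 34.0),
          "interaction": (60.0, 84.0),
          "conflict": (200.0, 224.0)}
# Coding of the four-level Attention Phases variable.
CODES = {"baseline": 0, "interaction": 1, "conflict": 2, "noise": 3}

def label_phase(t):
    """Map a gaze-sample timestamp onto the Attention Phases variable:
    baseline = 0, initial interaction = 1, primary conflict = 2,
    residual noise = 3 (all gaze outside the three 24-s windows)."""
    for name, (start, end) in PHASES.items():
        if start <= t < end:
            return CODES[name]
    return CODES["noise"]

timestamps = [5.0, 12.0, 70.0, 210.0, 280.0]
print([label_phase(t) for t in timestamps])  # [3, 0, 1, 2, 3]
```

Once every gaze sample carries a phase code, fixation counts can be aggregated per participant, scenario, and phase for the mixed-effects models.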
Deriving Data-Driven ROIs
The goal of this analysis is to identify measurable regions of interest (ROIs) that reflect attention to emotionally salient stimuli and can be incorporated into decision-making models both as predictors and outcomes. All computations for defining ROIs and calculating fixation metrics were conducted in Python using the Jupyter Lab development environment (Kluyver et al. 2016).Footnote 11
To achieve this, we apply an unsupervised machine-learning algorithm to cluster the eye-tracking data. Unsupervised algorithms are well-suited for detecting structure in large datasets when little prior information is available regarding category boundaries or distributional form (Ghahramani 2004). Given the unstructured, continuous nature of our immersive gaze data, and the absence of predefined ROIs, we utilize mean-shift clustering, a non-parametric approach that computes clusters based on the spatial density of data. Mean-shift clustering is advantageous, as it does not require prior assumptions about the number or shape of clusters, unlike other approaches such as k-means clustering (Comaniciu and Meer 2002; Fukunaga and Hostetler 1975). This flexibility makes it particularly suitable for dynamic, naturalistic environments where visual fixations are distributed across complex stimuli.
Clusters are created by iteratively shifting data points towards local density maxima (Cheng 1995). This is achieved using a kernel density estimator, which defines the local density around each point. The general form of the kernel density function is:

$$\hat{f}(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K(x - x_i)$$
Here, n is the total number of data points, and d is the dimensionality of the data. There are two dimensions in our case, representing x- and y-pixel coordinates on the video sphere. The parameter h is the bandwidth, or the radius of the area for which the local density estimation is conducted (Comaniciu and Meer 2002). The term K(x − xi) is the kernel function that determines the weight of each data point in the density estimation. This function is typically a Gaussian kernel of the form:

$$K(x - x_i) = c \cdot \exp\left(-\frac{\lVert x - x_i \rVert^{2}}{2h^{2}}\right)$$
where xi is a specific data point from the set x, and c is a bandwidth-related scaling constant (Vert et al. 2004). The local density is recalculated at each iteration of the algorithm. Based on these calculations, data points are progressively shifted toward their local density maxima, increasing density in subsequent iterations. As a result, the number of distinct maxima decreases as more points converge around shared areas of density (Comaniciu and Meer 2002; Fashing and Tomasi 2005). The direction and magnitude of this movement are defined by the mean shift vector:

$$m(x) = \frac{\sum_{i=1}^{n} x_i \, K(x - x_i)}{\sum_{i=1}^{n} K(x - x_i)} - x$$
This process iterates until convergence, at which point the remaining local density maxima define the final cluster centers, each representing a group of gaze points directed at a high-density focal area (Aliyari Ghassabeh 2015; Fukunaga and Hostetler 1975). Applied to the eye-tracking data, these final clusters represent the ROIs. (Footnote 12)
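The paper's own clustering code is not shown; as a minimal sketch under stated assumptions, scikit-learn's MeanShift implementation applied to synthetic gaze coordinates illustrates the idea. The two focal-area locations and the point counts below are invented for illustration:

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
# Synthetic gaze points in relative x/y coordinates: two dense focal areas
# (stand-ins for conflict and exit) plus diffuse background noise.
focal_a = rng.normal([0.30, 0.50], 0.02, size=(500, 2))
focal_b = rng.normal([0.80, 0.40], 0.02, size=(500, 2))
background = rng.uniform(0.0, 1.0, size=(100, 2))
gaze = np.vstack([focal_a, focal_b, background])

# The bandwidth h is the radius of the local density estimate; the paper
# reports h = 0.11 (fight) and h = 0.20 (harassment) for the real data.
ms = MeanShift(bandwidth=0.11).fit(gaze)
roi_centers = ms.cluster_centers_  # local density maxima = candidate ROIs
```

Because mean shift infers the number of clusters from the data, no prior assumption about how many ROIs exist is required; only the bandwidth must be chosen.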
Survey Items
Beyond presenting a scalable measure of FVA using immersive technologies, we also included several survey items in the subsequent analyses, including prior experiences, emotional responses, cognitive appraisals, and behavioral intentions. Exact item wording is provided in Table 5 of the Appendix.
Prior Experience
To explore individual differences in FVA patterns, we measured two experiential variables: (1) prior involvement in physical altercations and (2) history of arrests. For both measures, we combined the non-zero categories to create two binary measures indicating whether participants had been in any physical altercations in the previous three years (0 = none; 1 = one or more) and whether participants had ever been arrested (0 = never; 1 = one or more times), respectively. (Footnote 13)
Integral Anger and Fear
We measured experienced anger and fear with a survey instrument adapted from prior work (Barnum and Solomon 2019; Herman et al. 2024; van Gelder et al. 2022). Importantly, participants reported how they “feel right now in the moment” directly within the VR headset using an in-VR survey, immediately following each scenario. This design served two purposes. First, presenting survey questions during the video would risk disrupting immersion and emotional engagement, which are central to the effectiveness of 360° environments. As such, capturing emotions directly in the headset immediately after the scenario is the closest practical approximation for capturing momentary, in situ emotions. Second, the questions were worded to limit emotive forecasting, based on research that people often have difficulty predicting how they would hypothetically feel in the future (Loewenstein 1996).
We measured anger and fear using three-item self-report scales for each. To assess anger, participants rated the extent to which they felt angry, disgusted, and annoyed; to assess fear, they rated how nervous, afraid, and stressed they felt. All ratings were provided on a 7-point Likert scale ranging from 1 (“not at all”) to 7 (“very”). We averaged the items to create a composite score for anger (αFight = 0.74; αHarassment = 0.69) and fear (αFight = 0.84; αHarassment = 0.81) such that high scores represent greater feelings of each emotion, respectively.
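The item averaging and the reported reliabilities can be illustrated with a short sketch; the ratings below are invented, and the formula is the standard Cronbach's alpha:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) rating matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)

# Invented 7-point ratings (angry / disgusted / annoyed) for five respondents.
anger_items = np.array([
    [6, 5, 6],
    [2, 2, 3],
    [7, 6, 7],
    [4, 4, 5],
    [1, 2, 1],
])
alpha = cronbach_alpha(anger_items)
anger_score = anger_items.mean(axis=1)  # composite anger score per respondent
```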
Decision-Making Considerations
Consistent with prior research on the cognitive processes underlying crime-related decisions, we measured three subjective expectations regarding the use of violence (e.g., Barnum and Solomon 2019; Loughran et al. 2016; Thomas et al. 2022). To measure the perceived risk of using violence, participants in each scenario indicated how dangerous it would be to engage in violence and estimated the likelihood that they themselves would be injured. To measure intrinsic rewards, participants indicated how fun or exciting it would be to engage in violence in either scenario. All three items were measured on a 7-point scale such that higher scores reflect more risk or benefits. Given the conceptual similarity between the two risk items, we averaged them to construct a composite risk index for each scenario (rFight = 0.52 and rHarassment = 0.59).
Behavioral Intentions
Finally, prior research suggests that visual attention may also influence behavior independently of conscious deliberation (Orquin et al. 2021). Therefore, we also explored the association between FVA and two measures tapping into intentions to use violence in both scenarios (Pogarsky 2004). First, we measured intentions to respond with verbal aggression by asking participants how likely they would be to insult the antagonist in the given scenario. Second, two items asked participants to rate how likely it is that they would use violence and how ready they would be to do so in the given scenario. All three items were rated on a 7-point Likert scale. Unsurprisingly, the two violence measures were significantly correlated within scenarios (rFight = 0.69 and rHarassment = 0.58), so we averaged them to create a general willingness to use violence index for each scenario. Higher values on the verbal item and the violence index reflect a greater intention to engage in aggressive behavior. (Footnote 14)
Results
To examine the role of visual attention in decision making during interpersonal conflict, we present a series of preregistered, theoretically informed analyses using our derived FVA metric. Our primary objective is to demonstrate the theoretical relevance and practical utility of FVA as an indicator of attentional focus during aggressive encounters. In addition to our situational analyses, we also report preliminary results on the association between FVA and established decision-making variables, including emotions, perceptions of risks and rewards, and behavioral intentions. We conduct all of the following analyses using RStudio (Posit Team 2025).
Descriptive Analyses
Gaze behavior
To illustrate participants’ gaze behavior across different phases of the fight and harassment scenarios, we first generated visual heatmaps (Fig. 3) depicting the distribution of raw gaze points prior to the derivation of fixations on the ROI.
Heatmaps of participants’ gaze across the attentional phases of the fight (a to c) and the harassment (d to f) scenarios. Red indicates areas of high gaze density, white indicates areas of low density. For clarity, a static picture from the respective 360° video environment has been included. Due to the nature of 360° videos, the picture might appear distorted. X- and y-coordinates are relative values. The y-axis is inverted, as pixel coordinates are originally counted from top to bottom
As can be seen in Fig. 3, gaze is broadly distributed across the visual field throughout all three phases. During the initial interaction and conflict phases, attention becomes more focused on the unfolding events but it remains relatively dispersed. These patterns are consistent with findings from van Gelder et al. (2024), who used heatmaps to illustrate how participants’ attention shifted in response to the auditory cues embedded in the scenarios. While heatmaps provide only coarse insights into gaze behavior, the substantial variation in gaze locations depicted in Fig. 3 underscores the need for methods capable of capturing more nuanced attentional patterns in dynamic environments.
Deriving regions of interest
Our primary variable of interest, FVA, was operationalized as fixations within regions of interest (ROIs). As described above, we applied mean-shift clustering to gaze data from the 24-second conflict segment of each scenario to derive these ROIs. (Footnote 15) This segment contains the key provocation or harassment event and is most directly relevant to our theoretical focus on decision making under strain. In both scenarios, clustering reveals two distinct high-density fixation clusters, as shown in Fig. 4. The corresponding bandwidths are h = 0.11 for the fight and h = 0.20 for the harassment scenario data.
Clusters obtained by applying mean-shift clustering to the eye-tracking data during the conflicts in the fight scenario (a) and the harassment scenario (b). Black crosses represent cluster centers. For clarity, a static picture from the respective 360° video environment has been included. Due to the nature of 360° videos, the picture might appear distorted. X- and y-coordinates are relative values
As shown in Fig. 4, the resulting clusters are broad and included considerable noise, making them difficult to interpret in relation to the underlying video content. To improve cluster specificity, we apply a density-based cut-off. For each data point, we calculate its kernel density estimate and exclude values falling below two standard deviations above the mean density. This thresholding approach, adapted from prior work on cluster optimization, enhances the interpretability and functional value of the remaining clusters (Comaniciu and Meer 2002; Ester et al. 1996). The resulting clusters, depicted in Fig. 5, are more spatially coherent and thus more interpretable. In both the fight and harassment scenarios, two regions of interest emerged: one consistently captured the location of the conflict event, while the other is directed at the pub’s exit. These refined ROIs serve as the basis for subsequent analyses of FVA, allowing us to evaluate how participants’ visual attention to theoretically relevant cues—conflict versus escape—relates to decision making.
Clusters after applying a cutoff based on kernel density (KDE = Kernel Density Estimator) for both the fight scenario (a) and harassment scenario (b). Note that, to ease interpretation, only the boundaries of each cluster are depicted. For clarity, a static picture from the respective 360° video environment has been included. Due to the nature of 360° videos, the picture might appear distorted. X- and y-coordinates are relative values
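A minimal sketch of this density-based cut-off, assuming a Gaussian kernel density estimate as implemented in scikit-learn; the synthetic point cloud below stands in for the real gaze data:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
# Synthetic stand-in for conflict-phase gaze data: one dense focal
# area plus widely dispersed background points.
focal = rng.normal([0.50, 0.50], 0.01, size=(150, 2))
background = rng.uniform(0.0, 1.0, size=(900, 2))
points = np.vstack([focal, background])

# Kernel density estimate at each gaze point (Gaussian kernel).
kde = KernelDensity(kernel="gaussian", bandwidth=0.11).fit(points)
density = np.exp(kde.score_samples(points))  # score_samples returns log-density

# Exclude points whose density falls below two standard deviations
# above the mean density, keeping only the spatially coherent core.
cutoff = density.mean() + 2.0 * density.std()
core = points[density > cutoff]
```

Thresholding on the estimated density strips diffuse gaze points from the cluster boundary, leaving a tighter region that is easier to map onto the video content.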
To determine whether participants fixated on one of the regions of interest identified through clustering, we first extract fixation events from the raw eye-tracking data. Following established conventions, we define a fixation as a sequence of gaze points that remain within a small spatial area for at least 60 milliseconds (Hooge et al. 2022; Inhoff and Radach 1998). As explained earlier, this threshold differentiates fixations, which are reflective of sustained visual attention, from transient saccades, which serve primarily to reposition the eyes rather than encode information.
Although ROIs were derived from the 24-second conflict intervals, fixation detection was applied to the full eye-tracking dataset to capture all relevant attentional behavior. Gaze sequences shorter than 60 milliseconds were excluded, as such fleeting events are not considered meaningful indicators of attention. (Footnote 16)
Next, using the spatial boundaries of each ROI cluster, we assess whether each fixation fell within or outside an ROI. This yields a binary indicator of focal visual attention (FVA), coded as 1 if the fixation occurred within an ROI and 0 otherwise. This binary FVA variable is used as an outcome in our subsequent analyses. (Footnote 17)
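The fixation extraction and binary FVA coding described above can be sketched as follows. This is a simplified dispersion-based detector under stated assumptions (a fixed spatial spread limit and the 60 ms duration threshold), not the authors' exact procedure, and the circular ROI test is an illustrative stand-in for the actual cluster boundaries:

```python
import numpy as np

def detect_fixations(t, xy, max_disp=0.02, min_dur=0.060):
    """Dispersion-based fixation detection (simplified sketch): a fixation
    is a run of consecutive gaze samples whose x/y spread stays within
    max_disp (relative coordinates) and lasts at least min_dur seconds."""
    fixations = []
    start = 0
    for i in range(1, len(t)):
        window = xy[start:i + 1]
        if np.ptp(window, axis=0).max() > max_disp:
            # Sample i broke the dispersion limit: close the run start..i-1.
            if t[i - 1] - t[start] >= min_dur:
                fixations.append((t[start], t[i - 1], xy[start:i].mean(axis=0)))
            start = i
    if t[-1] - t[start] >= min_dur:  # close the trailing run
        fixations.append((t[start], t[-1], xy[start:].mean(axis=0)))
    return fixations

def fva_indicator(centroid, roi_center, roi_radius):
    """Binary FVA: 1 if the fixation centroid lies inside the ROI boundary."""
    return int(np.linalg.norm(centroid - roi_center) <= roi_radius)

# Toy 100 Hz gaze stream: 100 ms on one spot, a few scattered saccade
# samples, then 100 ms on another spot.
t = np.arange(25) * 0.01
xy = np.array(
    [[0.30, 0.50]] * 10
    + [[0.10, 0.10], [0.90, 0.90], [0.10, 0.90], [0.50, 0.10], [0.90, 0.10]]
    + [[0.80, 0.40]] * 10
)
fixations = detect_fixations(t, xy)
```

Runs shorter than 60 ms (here, the scattered saccade samples) are discarded, so only the two sustained dwells survive as fixations, each of which can then be scored against the ROI boundaries.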
Fixations on regions of interest
In both scenarios, participants fixated significantly more on the conflict than on the exit. The average fixation count on the conflict was 9.97 in the fight and 10.27 in the harassment scenario compared to 1.25 and 0.85 on the exit, respectively. Although less pronounced, the presence of this second ROI suggests that in some instances participants may have actively disengaged from the conflict, redirecting attention toward potential exits.
Recall that each scenario included one of three possible versions of a randomly assigned initial interaction phase: (1) an interaction with the future aggressor (fight) or future victim (harassment), (2) an interaction with unrelated bystanders, or (3) an interaction with the bartender only. While this manipulation is not the primary focus of the current study, it offered an opportunity to examine whether pre-conflict interactions influenced later fixation patterns. To explore this, we plot the number of fixations on the ROIs across the entire duration of each video separated by initial interaction conditions. The results are presented in Figs. 6 and 7.
Number of fixations on the conflict (provocateur/harassment) throughout the fight (top) and harassment (bottom) scenarios and across the respective conditions (Condition 1 = interaction with the same person that is involved in the conflict, Condition 2 = interaction with an unrelated person, Condition 3 = no interaction) of these scenarios (left to right). Colored dotted lines represent the onset of the attention phases of the scenarios, i.e. a baseline (light blue), the initial interaction (gold), and the conflict (salmon)
Number of fixations on the pub’s exit throughout the fight (top) and harassment (bottom) scenarios and across the respective conditions (Condition 1 = interaction with the same person that is involved in the conflict, Condition 2 = interaction with an unrelated person, Condition 3 = no interaction) of these scenarios (left to right). Colored dotted lines represent the onset of the attention phases of the scenarios, i.e. a baseline (light blue), the initial interaction (gold), and the conflict (salmon)
As expected, the fixation counts on the conflict increased dramatically during the conflicts, reaching peaks of approximately 25 fixations per second in both scenarios. By contrast, fixation counts were lower during the initial interaction phase—15 fixations per second in the fight scenario and fewer than 10 in the harassment scenario. This discrepancy likely reflects differences in spatial layout: in the fight scenario, the initial interaction occurred in closer proximity to the eventual conflict, thereby anchoring participants’ visual orientation in that direction. Similar, though less pronounced, patterns emerged for the pub’s exit. Fixations on the exit peaked at lower levels overall, suggesting that participants were less likely to attend to escape routes than to remain visually engaged with the unfolding conflict.
A notable nuance emerged in the moments preceding the conflict. During this interval, participants were briefly engaged by the bartender, who asked whether they would like another drink. Because this interaction occurred in a direction away from the conflict, fixation counts temporarily dropped. The subsequent redirection of gaze toward the conflict ROIs once aggression began indicates that participants were actively processing both auditory and visual cues, shifting attention dynamically as the situation unfolded. These findings offer preliminary support for the role of FVA in crime-related decision making, underscoring the importance of conflict content as a key source of situational information (see also van Gelder et al. 2024). At the same time, fixations on the exit highlight a potential avenue for future research on visual attention to opportunities for avoidance and de-escalation, a point we elaborate on below. (Footnote 18)
Although there was some slight variation in fixations across the interaction conditions, no considerable differences emerged for either of the ROIs. Accordingly, we combined the data across these conditions for subsequent regression analyses, while including controls to account for possible residual influence.
Inferential Analyses
We use logistic mixed models and report odds ratios (OR) as coefficients for analyses where the outcome is the binary indicator of whether a fixation falls within an ROI. These models incorporated participant-level variation through a random intercept, while predictors of interest (e.g., prior experience with violence) were modeled as fixed effects. The model is specified as:

$$\log\left(\frac{P(Y_{ij} = 1)}{1 - P(Y_{ij} = 1)}\right) = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} \quad (4)$$
where γ00 is the fixed intercept, γ10 the fixed slope for the respective predictor variable Xij, and u0j the subject-specific random intercept. The index i represents the individual observation of the eye-tracking data that is collected for each participant, as indexed by j.
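Because the fixed effects are estimated on the log-odds scale, the reported odds ratios are obtained by exponentiating the coefficients. A small numeric sketch (the coefficient values are hypothetical, chosen only to reproduce an OR of 7.08):

```python
import numpy as np

# Hypothetical fixed-effect estimates on the log-odds scale:
# gamma_00 = intercept (baseline phase), gamma_10 = conflict-phase slope.
gamma_00 = -2.0
gamma_10 = np.log(7.08)

odds_baseline = np.exp(gamma_00)             # odds of an ROI fixation at baseline
odds_conflict = np.exp(gamma_00 + gamma_10)  # odds during the conflict phase

odds_ratio = odds_conflict / odds_baseline   # equals exp(gamma_10)
```

The intercept cancels in the ratio, so the OR for a phase dummy depends only on its slope, which is why it can be read directly from the model output.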
Temporal changes of FVA
We first examine how fixations on the ROIs change over time. To do so, we estimate a total of four logistic models as specified in Eq. (4), one for each ROI (conflict and exit) in both scenarios (fight and harassment). Predictors included (a) a continuous variable representing video playback time and (b) the categorical attention phases variable (baseline, initial interaction, conflict, noise). Participant-level random intercepts accounted for heterogeneity in baseline fixation likelihood.
In the fight scenario, fixation likelihood decreased slightly—but significantly—as video playback progressed. Relative to the baseline phase, fixation likelihood increased during the first interaction and rose sharply during the conflict. For clarity, we report odds ratios (OR) from the logistic mixed-effects models, with 95% confidence intervals (CI) and p-values in parentheses; unless stated otherwise, ORs compare each phase to the baseline phase. During the conflict phase, the odds of fixating on the conflict ROI were 7.08 times the baseline odds (95% CI: 6.96–7.20, p < 0.001), and the odds of fixating on the exit ROI were 4.04 times the baseline odds (95% CI: 3.88–4.20, p < 0.001).
In the harassment scenario, the overall effect of video playback on fixation likelihood is positive and more pronounced. As in the fight scenario, fixation likelihood increases significantly during the conflict phase. Specifically, the odds of fixating on the conflict ROI were 2.56 times the baseline odds (95% CI: 2.52–2.60, p < 0.001), and the odds of fixating on the exit ROI were 3.30 times the baseline odds (95% CI: 3.16–3.45, p < 0.001). Full model results are shown in Table 1. These results align with the descriptive trends in Figs. 6 and 7: as the situations escalate, FVA shifts toward conflict and exit cues, indicating real-time responsiveness to scenario dynamics.
FVA across scenarios
Next, we compare the fixation likelihoods on the ROIs, as well as their temporal changes, across the fight and harassment scenarios. To do so, we again estimate a logistic mixed-effects model as specified in Eq. (4), now including a binary indicator for scenario type (fight vs. harassment), the categorical attention phases variable, and their interaction as predictors. Of particular interest are the interaction terms, which indicate whether FVA patterns differ between scenarios during the conflict phase. We estimated one model per region of interest.
Overall fixation likelihood is slightly lower in the harassment scenario compared to the fight scenario for both ROIs. However, the interaction terms reveal meaningful phase variation: relative to the fight scenario, fixation probability in the harassment scenario is lower during the first interaction phase but significantly higher during the conflict phase. During the conflict phase, the odds of fixating on the conflict ROI are 1.37 times those in the fight scenario (95% CI: 1.34–1.39, p < 0.001), and the odds of fixating on the exit ROI are 1.60 times those in the fight scenario (95% CI: 1.53–1.67, p < 0.001; see Table 2). Together, these results reinforce our first core finding: even in immersive environments where participants can freely allocate attention, fixations systematically converge on the conflict events designed to elicit strong negative emotions. This pattern supports the interpretation of FVA as a decision-making mechanism that captures not just what individuals later report perceiving but how they visually register and prioritize cues in real time.
Prior Experience and FVA
We next examine whether participants’ experiences with physical confrontation and arrest relate to their fixation likelihood on the ROIs. To do so, we estimate a total of eight logistic models as specified in Eq. (4), each including either prior violent confrontation or arrest history, together with their interaction with the categorical attention phases variable, as predictors. As before, the interaction terms between the demographic variables and the attention phases variable allow us to test whether prior experiences condition FVA during specific attention phases of the scenarios. Again, the models were estimated for each scenario and region of interest. Tables 3 and 4 present full model results.
First, we highlight associations between prior violence experiences and FVA. In the fight scenario, participants with at least one prior experience of violence show 11% higher odds of fixating on the conflict ROI (95% CI: 1.08–1.13, p < 0.001) and 18% higher odds of fixating on the exit ROI (95% CI: 1.13–1.24, p < 0.001) than those without such experiences. In the harassment scenario, effects are generally weaker or nonsignificant, with one exception: during the conflict phase, participants with prior violence show 30% higher odds of fixating on the conflict ROI (95% CI: 1.21–1.39, p < 0.001).
Next, we turn to associations between prior arrest experiences and FVA. Participants with a prior arrest history exhibit substantially higher overall odds of fixating on the exit ROI—44% higher in the fight scenario (95% CI: 1.35–1.54, p < 0.001) and 54% higher in the harassment scenario (95% CI: 1.43–1.67, p < 0.001)—while their overall odds of fixating on the conflict ROI are lower or nonsignificant. Phase-specific contrasts qualify these averages: during the conflict phase of the fight scenario, those with prior arrest show 30% higher odds of fixating on the conflict ROI (95% CI: 1.24–1.36, p < 0.001) and 31% higher odds of fixating on the exit ROI (95% CI: 1.20–1.42, p < 0.001). In the harassment scenario’s conflict phase, they again show 31% higher odds of fixating on the exit ROI (95% CI: 1.19–1.44, p < 0.001).
In summary, our findings mirror the influence of prior experience on how individuals assess stressful situations. Participants with experiences of violence or arrest appear more vigilant for situational cues, either by visually monitoring the conflict itself or by searching for routes of escape. At the same time, the distinct effects of violent confrontation versus arrest suggest an important difference: while prior violence may sensitize individuals to conflict cues, prior arrest appears to shift attention toward avoidance. This may reflect heightened sensitivity to the potential consequences of direct involvement, underscoring the need for future research to further unpack the role of different experiential dimensions, such as offending versus victimization, in perception formation and decision making. Moreover, differentiating participants’ prior involvement with the justice system (e.g., prior arrest, prior sentencing, or probation experiences) would allow for more fine-grained insights into situational deterrence-related processes.
FVA and Decision-Making Variables
Finally, we provide a preliminary analysis of FVA’s role within an extended offender decision-making framework. Post-VR survey ratings captured integral emotions, such as anger and fear, decision-making considerations regarding risk and benefit, and intentions regarding verbal or violent behavior. These analyses examine whether participants who were more visually fixated on the identified ROIs differ in reported emotions, perceptions, and intentions. To this end, we assigned participants to one of two groups according to whether the respondent had below- or above-average fixation counts on both identified ROIs during the conflict phase. The latter group included participants who were most visually fixated on the environmental stimuli of interest. (Footnote 19)
Next, we conducted a series of unpaired two-tailed t-tests comparing the mean of reported decision input variables between participants below versus above average in fixations on both ROIs. Figures 8 and 9 below present descriptive results, with further information on the t-tests included in the supplementary materials. Distributions for each decision variable are visually depicted in Figs. 10, 11 and 12 in the Appendix.
Ratings of integral emotions, rational choice considerations, and behavioral intentions for participants with above (salmon) and below (light blue) average fixation numbers on the regions of interest during the conflict of the fight scenario. Points indicate mean scores. Whiskers indicate standard errors
Ratings of integral emotions, rational choice considerations, and behavioral intentions for participants with above (salmon) and below (light blue) average fixation numbers on the regions of interest during the conflict of the harassment scenario. Points indicate mean scores. Whiskers indicate standard errors
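The group comparison described above can be sketched with SciPy; the ratings below are invented, and the real analysis was run on the survey data in R:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Invented 7-point anger ratings for the below- and above-average
# fixation groups.
below_avg = rng.normal(4.0, 1.2, size=60).clip(1, 7)
above_avg = rng.normal(4.6, 1.2, size=60).clip(1, 7)

# Unpaired two-tailed t-test comparing the group means.
t_stat, p_val = stats.ttest_ind(above_avg, below_avg)

# Cohen's d from the pooled standard deviation.
pooled_sd = np.sqrt((above_avg.var(ddof=1) + below_avg.var(ddof=1)) / 2.0)
cohens_d = (above_avg.mean() - below_avg.mean()) / pooled_sd
```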
In the fight scenario, participants with above-average fixation counts on the conflict ROI report greater anger (t = 2.047, p = 0.042, d = 0.27), higher perceived benefits of violence (t = 2.390, p = 0.018, d = 0.32), and greater intentions to intervene verbally or physically (verbal: t = 2.575, p = 0.011, d = 0.34; physical: t = 2.972, p = 0.003, d = 0.40). Differences in perceived risk were observed but did not reach significance. By contrast, above-average fixations on the pub’s exit were associated with higher fear (t = 2.274, p = 0.044, d = 0.94) and noticeable but nonsignificant differences in benefit appraisals (t = 1.875, p = 0.102, d = 0.94).
In the harassment scenario, participants with above-average fixations on the harassment ROI report increased perceived benefits (t = 1.813, p = 0.071, d = 0.23) and greater intentions to engage in violence (t = 1.547, p = 0.123, d = 0.20), with notable differences in intentions for verbal intervention (t = 1.901, p = 0.057, d = 0.24). However, none of these differences reach conventional statistical significance. For participants with above-average fixations on the pub’s exit, only intentions for verbal intervention are meaningfully higher (t = 1.901, p = 0.029, d = 0.95).
Taken together, these findings suggest that FVA is relevant for decision making, though its impact may depend on context. In situations of direct provocation, as in the fight scenario, sustained visual attention to the conflict is associated with heightened anger, stronger benefit appraisals, and greater willingness to act. In scenarios involving harm to others, such as harassment, visual attention also promotes action, but to a lesser degree, with fear emerging as a particularly important correlate. Divergent patterns depending on whether participants focus on the conflict itself or on the pub’s exit highlight that FVA does not merely reflect where individuals look—it may shape how they appraise and respond to social conflict. An important direction for future research would be to examine whether prior offending experience moderates the relationship between focal visual attention and the decision-making processes identified here. For example, such analyses could help clarify whether tendencies to visually engage with the primary action versus scanning avenues of exit reflect learned, experience-based strategies associated with offender sophistication, or instead emerge from situationally induced emotions and cognitions during the conflict.
Sensitivity Analyses
All reported results depend on our definition of eye-tracking clusters underlying the two regions of interest (ROIs) and the fixations on them. As detailed in the Methods section, these clusters were defined with reference to the content of each scenario, resulting in 24-second time intervals corresponding to the duration of the conflicts in both the fight and harassment scenarios.
To evaluate the robustness of our findings, we repeated the clustering procedure using finer-grained 10-second intervals. This analysis again produced two ROIs highly similar to the original ones, which we then used to compute fixations. Further details of this process are provided in the supplementary materials. Descriptively, fixation patterns across these ROIs followed trends consistent with the original analyses: a steady increase in attention throughout the scenario and a pronounced peak during the conflicts. As expected, the total number of fixations was lower, reflecting the reduced number of gaze points within the shorter intervals (see Figs. 13 and 14, Appendix).
Re-estimating the primary models—including both video playback and scenario attention phases—yielded a very similar pattern of effects, with only slightly smaller effect sizes. Full model results are reported in Table 6 (Appendix). Together, these findings indicate that the main results are robust and not simply an artifact of our temporal operationalization of fixations on the conflict-related ROIs.
Conclusion
Leveraging immersive 360° video scenarios and continuous eye-tracking, this study introduced Focal Visual Attention (FVA) as a quantifiable construct that connects situational inputs with the perceptual and cognitive processes underlying crime-relevant decision making. Operationalized through fixation counts within machine-learned regions of interest (ROIs), FVA varied systematically with key situational stimuli and was significantly associated with emotional arousal, perceived risks and benefits, and behavioral intentions. These results validate FVA as a process-oriented measure that can capture the perceptual foundations of judgment and choice. The discussion that follows outlines the broader implications of these findings for criminological theory, experimental methods, and future research.
The core contribution of this study lies in providing a methodological solution to persistent limitations in experimental criminology, which has long relied on retrospective self-reports or static vignettes to infer in-the-moment decision processes. By combining immersive 360° video with real-time eye-tracking, we developed a scalable, high-resolution approach for directly observing how individuals visually attend to unfolding crime-relevant events. This design captures moment-to-moment fluctuations in gaze and engagement as participants process dynamic situational cues—offering a real-time window into how attention and emotion shift before, during, and after the focal event. Despite having complete freedom to view any element of the environment, participants concentrated their attention precisely where intended—on provocations and victimizations—underscoring the ecological and construct validity of the FVA framework.
Beyond verifying perceptual compliance, this approach enhances construct validity by aligning measurement with theoretical processes of interest and strengthens ecological fidelity by replicating real-world complexity under controlled conditions (Exum and Bouffard 2010; Herman et al. 2024). Integrating continuous behavioral data within immersive scenarios bridges a long-standing divide between laboratory precision and naturalistic realism in criminological research. Notably, the emergence of the exit ROI across both conflict scenarios illustrates this potential. A substantial subset of participants directed their attention toward the barroom exit during escalation—a realistic but subtle response that would be invisible in traditional survey measures. This attentional shift may reflect latent “choice sets” in which individuals actively evaluate a range of behavioral alternatives, including avoidance or withdrawal, rather than merely deciding to offend or not offend (Kijowski and Wilson 2023; Bouffard et al. 2025). Observing such micro-level attentional dynamics challenges the common dichotomy of “crime” versus “non-crime” and suggests that decision making in criminogenic contexts may unfold across a more complex spectrum of action alternatives. Future work should continue leveraging physiological measures such as FVA to unpack these nuanced choice architectures that remain hidden in conventional self-report instruments.
Together, these findings mark an important first step toward integrating visual attention into the study of crime decision making. Moving forward, researchers could extend this framework to immersive scenarios that incorporate subtler environmental cues, such as signs of disorder, guardianship, or collective efficacy (McClanahan et al. 2024; Nee et al. 2019; van Sintemaartensdijk et al. 2021), to test whether FVA similarly captures the perceptual underpinnings of decision making in less emotionally charged contexts.
Beyond linking situational elements with visual attention, our findings also contribute to enduring discussions in crime decision-making research by illuminating individual differences in perceptual processing. Participants with prior arrest or violence experiences displayed distinct patterns of visual attention when viewing crime-relevant scenarios. These differences—such as heightened focus on victims or avoidance of authority-related cues—suggest that FVA may serve as a perceptual mechanism through which prior justice system contact shapes how individuals interpret situations and form behavioral intentions (McClanahan et al. 2025; Topalli 2005; Wright et al. 1996). While the present analyses centered on experiential factors, future work could examine a wider array of influences that may drive idiosyncratic variation in attentional allocation. These include personal biases and cognitive frames (Pogarsky et al. 2017; Pickett 2018), stable personality traits (van Gelder and de Vries 2014), physiological dispositions (Armstrong and Boutwell 2012), and social relationships and situational interdependencies (Thomas, Nguyen, and Jackson 2023). Expanding attention-based research to incorporate these dimensions will allow scholars to more precisely model the intersection of individual, situational, and structural forces that govern how people perceive and act within criminogenic contexts. In doing so, researchers will be able to further disentangle the factors that may jointly shape attention and emotional and cognitive responses (e.g., trait emotionality) during conflicts and criminal opportunities.
This study also provides the first empirical demonstration linking observed FVA with emotional experiences and subjective evaluations in immersive, crime-relevant scenarios. By connecting dynamic gaze behavior to self-reported emotions and cognitive appraisals, our findings offer an initial window into how visual attention aligns with participants’ affective and evaluative engagement during interpersonal conflicts. Specifically, we show that FVA varies with key emotional states—such as anger and fear—and that these patterns correspond with differences in perceived risks, benefits, and behavioral intentions. Thus, this study situates FVA within a broader movement to integrate affective mechanisms into decision-making models of crime, a topic that has received growing theoretical and empirical attention (Barnum and Solomon 2019; Carmichael and Piquero 2004; Exum 2002; van Gelder et al. 2019; van Gelder 2023).
At the same time, it is important to emphasize that the present analyses are descriptive rather than causal. Although associations between FVA, emotional arousal, and cognitive appraisals were consistent and theoretically coherent, this design was not intended to disentangle the temporal sequencing of these processes. Research in psychology and neuroscience suggests that, in some contexts, attentional processes can motivate choice with minimal cognitive mediation (Shimojo et al. 2003; Zizlsperger et al. 2012), while in others, attention is guided by preexisting emotional or motivational states (Ferrer et al. 2016). Future work should therefore move beyond correlational analyses to experimentally manipulate visual salience or attentional focus to test causal pathways. Establishing the temporal flow among perception, affect, and appraisal will be crucial for developing fully process-based theories of crime decision making and clarifying when attention functions as a precursor, mediator, or outcome of emotional and cognitive mechanisms.
To strengthen future tests of these relationships, researchers should consider integrating psychophysiological indicators—such as heart rate variability or pupil dilation—alongside self-report measures. These physiological indices can increase measurement precision and help identify when emotional arousal peaks during evolving events. This approach would be especially useful for designs, like ours, where key decision constructs are measured immediately after the scenario. While our use of in-VR surveys mitigates issues of emotional forecasting by capturing affect in the moment, it cannot precisely link reported experiences to specific situational stimuli. The inclusion of continuous physiological measurement could further clarify the timing and intensity of affective responses and illuminate how emotion and FVA jointly shape decision processes during unfolding conflicts.
Beyond its theoretical contributions to decision-making models, this study broadens the behavioral measurement toolkit in criminology by demonstrating how FVA can capture meaningful variation in perceptual processing during crime-relevant events. As a real-time, non-intrusive indicator of where individuals direct their gaze, FVA provides continuous behavioral data on what people perceive as salient, threatening, or safe. Importantly, the logic of eye-tracking and FVA extends beyond virtual-reality laboratories. Webcam- and screen-based eye-tracking now enable comparable analyses in online experiments and field settings, allowing researchers to examine how participants allocate attention across theoretically relevant cues using standard video stimuli. As these technologies become more accessible, they offer scalable tools for process-based criminological research—bridging cognitive, affective, and environmental approaches across domains of crime, justice, and public safety.
The implications of this work extend well beyond the study of offender decision making. In policing and tactical research, immersive eye-tracking can quantify where and for how long officers fixate on potential threats, bystanders, or environmental features under realistic yet controlled conditions—revealing attentional biases and informing precision-based training in situational awareness and threat discrimination (Heusler and Sutter 2022; Huhta et al. 2022). In situational crime prevention, FVA can uncover how individuals visually navigate and prioritize environmental features such as exits, surveillance signage, or obstructed views (Nee et al. 2019). These gaze patterns reflect perceived opportunity and vulnerability and can inform the design of spaces that heighten perceived risk and reduce criminal opportunity. Similarly, in fear-of-crime and community safety research, FVA provides a behavioral lens into which cues—such as darkness, isolation, or social disorder—draw attention and elicit fear or avoidance responses. By revealing how environmental features capture attention and shape perceptions of safety, FVA-based approaches can support more empirically grounded urban design, lighting policy, and public communication strategies (Crosby and Hermens 2019; McClanahan et al. 2025).
Finally, while the current study advances eye-tracking research in criminology, several methodological considerations warrant attention. Unlike many laboratory-based eye-tracking studies, we could not implement a priori regions of interest (ROIs) due to the dynamic nature of the immersive 360° environments. Instead, ROIs were derived a posteriori using a machine-learning clustering approach, which—although flexible and scalable—can be susceptible to algorithmic bias and data noise (Raschke et al. 2014). Parameter tuning and interpretability may vary across scenarios, introducing potential uncertainty into ROI boundaries. Future work could address this challenge by employing hybrid approaches that combine data-driven clustering with theoretically guided ROI definitions or by constraining visual fields in certain contexts. Each alternative, however, carries trade-offs: limiting participants’ vision sacrifices ecological realism, while predefined ROIs require prior knowledge of gaze patterns in similar virtual settings. Additionally, fixation counts represent a relatively coarse measure of visual engagement. Incorporating finer-grained gaze metrics—such as dwell time, scan-path length, or gaze entropy—would provide a more nuanced understanding of attentional dynamics and visual search strategies (Rahal and Fiedler 2019). Addressing these refinements will enhance precision and reproducibility as immersive methods become more widely adopted in experimental criminology.
This study offers a blueprint for capturing how people see and interpret crime-relevant situations. By introducing a replicable framework for quantifying FVA in immersive environments, we demonstrate how emotion, attention, and cognition can be integrated within a single high-fidelity experimental design to illuminate the perceptual roots of criminal decision-making. In doing so, this work moves beyond traditional accounts of rational choice and deterrence to provide a process-based foundation for understanding how situational features become psychologically meaningful in real time. The next step for this line of research is to establish the temporal and causal ordering of these interrelated processes—clarifying when attention precedes, mediates, or follows affective and cognitive responses. Advancing in this direction holds promise not only for refining decision-making theory in criminology but also for informing more precise, perception-driven approaches to crime prevention and policy.
Notes
Written vignettes can also present similar details to readers. However, in a typical 150-word scenario, space for details about key variables is limited. Unnecessary or verbose detail not only distracts readers from the purpose of the vignette, and therefore the study, but also likely limits affective experiences by invoking cognitive effort related to reading comprehension.
It is important to acknowledge that visual input is not the only sensory channel relevant to decision-making. Verbal cues, for instance, rely on auditory perception, which can also shape how individuals interpret and respond to a situation. To account for this, we incorporated aural cues into our virtual environments. However, the present study focuses on visual attention, as it is directly measurable via eye-tracking and, unlike hearing, represents an active sense—one that individuals can consciously direct toward or away from particular stimuli (Goldstein and Cacciamani 2022).
By “inferring treatment uptake from assignment rather than perception,” we refer to the common practice in randomized experiments of treating condition assignment as if it guarantees that participants perceived the manipulated cue as intended. This conflates assignment with perceptual compliance and can dilute estimated effects when some participants never register the relevant stimulus. See work on manipulation/attention checks and construct validation and broader discussions of experimental noncompliance (Dafoe et al., 2018; Ejelöv & Luke, 2020; Gaines, Kuklinski, & Quirk, 2007).
Consider a barroom scenario where a provocateur makes a sudden hand movement. FVA would indicate (i) whether the observer fixated on the hand (detection), (ii) how long attention remained there (dwell time), and (iii) the sequence in which gaze shifted across elements of the scene (scan-path; e.g., hand → victim’s face → exit). This pattern of looking may reveal nuances in mental shortcuts (heuristics), enduring biases, subtle pre-exposures that heighten concept accessibility (e.g., priming), and situational properties (e.g., crowd density, distance to exits) during real-time decision making.
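To make the three metrics concrete, the following minimal sketch computes detection, dwell time, and scan-path from a fixation sequence. The ROI labels, durations, and helper names are purely illustrative and do not come from the study’s pipeline:

```python
# Illustrative only: fixations as (roi_label, duration_ms) in viewing order.
fixations = [("hand", 180), ("hand", 220), ("victim_face", 300),
             ("exit", 150), ("hand", 120)]

# (i) Detection: was the hand fixated at all?
detected = any(roi == "hand" for roi, _ in fixations)

# (ii) Dwell time: total fixation duration on the hand.
dwell_ms = sum(d for roi, d in fixations if roi == "hand")

# (iii) Scan-path: the order in which ROIs were visited (collapsing runs).
scan_path = [roi for i, (roi, _) in enumerate(fixations)
             if i == 0 or roi != fixations[i - 1][0]]

print(detected, dwell_ms, scan_path)
# → True 520 ['hand', 'victim_face', 'exit', 'hand']
```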
While emotions and FVA likely influence one another in rapid feedback loops, the present study treats FVA as an observed behavioral correlate rather than a causal intermediary between emotion and judgment.
The preregistration can be accessed here: https://osf.io/ejbsu/?view_only=dcfa5d6d3fb74ce1a427d2d6e2100a2e.
These initial interactions were designed to evoke varying levels of anger and arousal. Though participants were randomly assigned to one condition per scenario for the current data collection, the experimental element is outside the scope of the current study, which focuses on whether visual attention to the conflict shapes decision processes. Here, these initial interactions serve to enhance the realism of the scenario and create more opportunities for participants to engage with the environment. We do control for scenario randomization in the multivariate analyses. For more information on the different segments of this phase, see Herman et al. (2024).
Participants could not interact with the actors, but the use of high-fidelity 360-degree video maximized visual and social realism compared with computer-generated avatars. This deliberate trade-off—reducing interactivity to enhance ecological and emotional validity—was intended to optimize the authenticity of participants’ affective and attentional responses.
To ensure that our findings are not idiosyncratic to the selected intervals, we conduct sensitivity analyses with different intervals, as described in the Results section below.
Eye-tracking data were captured as individual .txt files for each participant by the HTC Vive Pro Eye headset. We merged these files into a single dataset and conducted standard preprocessing, including renaming and recoding variables. Detailed preprocessing procedures are documented in the study’s online supplementary materials (see https://osf.io/vzc4w/?view_only=1a91d50812c349a59b105ce763570c40).
When applying mean-shift clustering to two-dimensional eye-tracking data, the bandwidth is the sole parameter to be estimated. Conceptually, smaller bandwidths result in fine-grained regions of interest (ROIs), while larger bandwidths smooth over local variation, producing broader ROIs. Several heuristics exist for estimating this bandwidth, such as Silverman’s and Scott’s rules (Scott 2015; Silverman 2018). However, we opted for a data-driven approach, benchmarking various bandwidth estimates as proposed by Comaniciu and Meer (2002).
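The bandwidth’s effect on ROI granularity can be illustrated with scikit-learn on synthetic gaze data. The coordinates and bandwidth values below are toy choices for demonstration, not values from the study:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Synthetic 2D "gaze points": two nearby blobs plus one distant blob.
rng = np.random.default_rng(0)
gaze = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.05, size=(200, 2)),
    rng.normal(loc=(0.5, 0.0), scale=0.05, size=(200, 2)),
    rng.normal(loc=(3.0, 3.0), scale=0.05, size=(200, 2)),
])

# A small bandwidth resolves the two nearby blobs as separate ROIs ...
n_fine = len(np.unique(MeanShift(bandwidth=0.15).fit_predict(gaze)))
# ... while a large bandwidth smooths them into one broad ROI.
n_coarse = len(np.unique(MeanShift(bandwidth=1.0).fit_predict(gaze)))

print(n_fine, n_coarse)  # the fine-grained solution finds more ROIs
```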
Note that these variables initially had five levels. After inspection of their distribution, we decided to collapse the categorical variables on experiences with violence or arrest to dichotomous ones. This was due to the small number of participants in our sample who had chosen those categories. Readers can refer to Table 5 in the Appendix for the original item levels. The item distributions can be found in the supplementary materials.
Emotions are measured in random order. We also include three “filler” emotions: bored, carefree, and excited. These, however, are not the focus of the current study. Though satisfactory, the internal consistencies of the anger and fear indices are slightly lower than in prior work (Barnum and Solomon 2019; van Gelder et al. 2022). This may be due to translation of the original English items to German for this study. However, we chose to use similar items for replication purposes. Importantly, the findings presented on our subjective emotions measure are largely consistent with prior work using different geographic and sociodemographic samples.
We employed the scikit-learn library in Python (Pedregosa et al., 2011) to perform mean-shift clustering. To determine the optimal bandwidth (h), we utilized the estimate_bandwidth function, which computes the bandwidth based on pairwise distances between data points. Specifically, we selected a quantile (q) of the pairwise distance distribution, where the bandwidth corresponds to the distance at the q-th percentile. We evaluated multiple mean-shift models across quantiles ranging from 0.1 to 0.9 (with 0.5 representing the median), as well as their corresponding bandwidths (Comaniciu and Meer 2002). To assess model quality, we employed silhouette scores, which measure how similar each point is to its own cluster compared to other clusters (Rousseeuw 1987). The optimal bandwidth was selected based on the highest average silhouette score. To ensure robustness, we excluded solutions resulting in fewer than two clusters. Given the computational complexity of these operations (Huang et al. 2019), we leveraged parallel processing using the joblib library (Joblib Development Team 2024), enabling efficient utilization of multiple processor cores on our system. This allowed us to conduct all computations on a commercial laptop and without specialized hardware. Readers can refer to the supplementary materials for further details.
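The selection procedure just described can be sketched as follows. This is a simplified reconstruction on synthetic data, not the study’s actual script; the function and variable names are illustrative:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.metrics import silhouette_score

def evaluate_quantile(points, q):
    """Fit mean-shift with the bandwidth at the q-th quantile of pairwise
    distances; return (quantile, bandwidth, silhouette, labels)."""
    h = estimate_bandwidth(points, quantile=q)
    if h <= 0:
        return q, h, -1.0, None
    labels = MeanShift(bandwidth=h).fit_predict(points)
    # Exclude solutions with fewer than two clusters.
    if len(np.unique(labels)) < 2:
        return q, h, -1.0, labels
    return q, h, silhouette_score(points, labels), labels

rng = np.random.default_rng(1)
# Stand-in for one scenario's 2D gaze coordinates.
points = np.vstack([rng.normal((0, 0), 0.1, (150, 2)),
                    rng.normal((1, 1), 0.1, (150, 2))])

# Benchmark quantiles 0.1-0.9 in parallel across processor cores.
results = Parallel(n_jobs=2)(
    delayed(evaluate_quantile)(points, q) for q in np.arange(0.1, 1.0, 0.1))

# Keep the bandwidth with the highest average silhouette score.
best_q, best_h, best_score, roi_labels = max(results, key=lambda r: r[2])
print(f"best quantile={best_q:.1f}, bandwidth={best_h:.3f}, "
      f"silhouette={best_score:.3f}")
```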
The spatial threshold used to determine whether participants’ gaze remained within a single location was set to 0.1% of the screen extent in each dimension. Given the HTC Vive Pro Eye headset’s resolution of 4896 × 2448 pixels, this corresponds to a shift of approximately five pixels horizontally and two pixels vertically. Using this threshold, 17% of gaze points in the fight scenario and 17.5% in the harassment scenario were excluded because they did not meet the minimum fixation criteria.
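In code, this per-dimension threshold could be applied as in the sketch below. It is a simplified illustration with a hypothetical helper (`fixation_mask`); the study’s actual preprocessing is documented in the supplementary materials:

```python
import numpy as np

# Headset resolution and the 0.1% per-dimension spatial threshold.
WIDTH, HEIGHT = 4896, 2448
DX = 0.001 * WIDTH   # ~4.9 px horizontally
DY = 0.001 * HEIGHT  # ~2.4 px vertically

def fixation_mask(x, y):
    """Flag gaze samples whose shift from the previous sample stays
    within the spatial threshold (candidate fixation samples)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    stable = (np.abs(np.diff(x)) <= DX) & (np.abs(np.diff(y)) <= DY)
    # The first sample has no predecessor; treat it as non-stable.
    return np.concatenate([[False], stable])

# Toy trace: a stable fixation followed by a saccade.
x = [100, 101, 102, 400, 401]
y = [200, 200, 201, 500, 500]
mask = fixation_mask(x, y)
print(mask)  # the large jump to (400, 500) is flagged as non-fixation
```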
Note that variation in this metric may be influenced by individual eye characteristics, including blinking rate, dryness, vision impairments, and forms of ametropia. To mitigate these effects, we adjusted the video focus and headset lens positioning for each participant based on their interpupillary distance to ensure optimal visibility. Additionally, all participants confirmed they had not consumed alcohol or drugs prior to the study.
We would like to thank an anonymous reviewer for highlighting the theoretical relevance of this finding for future research.
To examine the robustness of our findings, we repeat the below analyses twice with either the 75th or 25th percentiles as alternative cut-off values for categorizing our data. Readers can refer to the supplementary materials for the details of these analyses.
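The percentile-based categorization might be sketched as follows, with synthetic counts and a hypothetical helper; the 50th percentile stands in for the primary cut-off, which is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
fva_counts = rng.poisson(lam=20, size=262)  # stand-in fixation counts

def categorize(values, pct):
    """Dichotomize values at the pct-th percentile cut-off (1 = above)."""
    return (values > np.percentile(values, pct)).astype(int)

# Primary categorization plus the two sensitivity variants.
for pct in (25, 50, 75):
    share_high = categorize(fva_counts, pct).mean()
    print(f"{pct}th percentile cut-off: {share_high:.2f} coded high")
```

Lower cut-offs code a larger share of participants as “high,” so comparing results across the three variants shows whether conclusions hinge on the chosen threshold.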
References
Aliyari Ghassabeh Y (2015) A sufficient condition for the convergence of the mean shift algorithm with Gaussian kernel. J Multivar Anal 135:1–10. https://doi.org/10.1016/j.jmva.2014.11.009
Anwar S, Loughran TA (2011) Testing a Bayesian learning theory of deterrence among serious juvenile offenders. Criminology 49(3):667–698. https://doi.org/10.1111/j.1745-9125.2011.00233.x
Apel R (2013) Sanctions, perceptions, and crime: implications for criminal deterrence. J Quant Criminol 29(1):67–101. https://doi.org/10.1007/s10940-012-9170-1
Apel R (2022) Sanctions, perceptions, and crime. Annu Rev Criminol 5:205–227. https://doi.org/10.1146/annurev-criminol-030920-112932
Armstrong TA, Boutwell BB (2012) Low resting heart rate and rational choice: integrating biological correlates of crime in criminological theories. Journal of Criminal Justice 40(1):31–39. https://doi.org/10.1016/j.jcrimjus.2011.11.001
Barnum TC, Herman S, van Gelder J-L, Ribeaud D, Eisner M, Nagin DS (2024) Reactive guardianship: who intervenes? How? And why? Criminology 62(3):587–618. https://doi.org/10.1111/1745-9125.12380
Barnum TC, Nagin DS, Pogarsky G (2021) Sanction risk perceptions, coherence, and deterrence. Criminology 59(2):195–223. https://doi.org/10.1111/1745-9125.12266
Barnum TC, Solomon SJ (2019) Fight or flight: integral emotions and violent intentions. Criminology 57(4):659–686. https://doi.org/10.1111/1745-9125.12222
Becker GS (1968) Crime and punishment: an economic approach. J Polit Econ 76(2):169–217
Bouffard JA, Miller HA (2014) The role of sexual arousal and overperception of sexual intent within the decision to engage in sexual coercion. J Interpers Violence 29(11):1967–1986. https://doi.org/10.1177/0886260513515950
Bouffard JA, Niebuhr N, Exum ML (2025) Incorporating subjectively-derived behavioral responses into traditional tests of criminal decision-making: a research note. Criminal Justice Studies 38(3):304–323. https://doi.org/10.1080/1478601X.2025.2531751
Bradley MM (2009) Natural selective attention: orienting and emotion. Psychophysiology 46(1):1–11. https://doi.org/10.1111/j.1469-8986.2008.00702.x
Bucci R (2024) Addressing the ‘Dirty little secret’ of deterrence: testing the effects of increased police presence on perceptions of arrest risk. J Quant Criminol 40(2):311–342. https://doi.org/10.1007/s10940-023-09570-3
Carmichael S, Piquero AR (2004) Sanctions, perceived anger, and criminal offending. J Quant Criminol 20(4):371–393. https://doi.org/10.1007/s10940-004-5869-y
Cheng Y (1995) Mean shift, mode seeking, and clustering. IEEE Trans Pattern Anal Mach Intell 17(8):790–799. https://doi.org/10.1109/34.400568
Cherbonneau M, Jacobs BA (2019) Imminent capture and noncompliance: probing deterrence in extreme environments. Justice Quarterly 36(6):1122–1143. https://doi.org/10.1080/07418825.2018.1476577
Cios KJ, Swiniarski RW, Pedrycz W, Kurgan LA (2007) Unsupervised Learning: Clustering. Data Mining. Springer US, pp 257–288. https://doi.org/10.1007/978-0-387-36795-8_9
Clarke RV, Cornish DB (1985) Modeling Offenders’ Decisions: A Framework for Research and Policy. Crime Opportunity Theories. Routledge
Collins R (2008) Violence: A Micro-sociological Theory. Princeton University Press
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619. https://doi.org/10.1109/34.1000236
Cornish DB, Clarke RV (1986) The reasoning criminal: rational choice perspectives on offending. Transaction Publishers
Crosby F, Hermens F (2019) Does it look safe? An eye tracking study into the visual aspects of fear of crime. Q J Exp Psychol 72(3):599–615. https://doi.org/10.1177/1747021818769203
Dafoe A, Zhang B, Caughey D (2018) Information equivalence in survey experiments. Political Analysis 26(4):399–416. https://doi.org/10.1017/pan.2018.9
Diniz Bernardo P, Bains A, Westwood S, Mograbi DC (2021) Mood induction using virtual reality: a systematic review of recent findings. J Technol Behav Sci 6(1):3–24. https://doi.org/10.1007/s41347-020-00152-9
Ejelöv E, Luke TJ (2020) “Rarely safe to assume”: evaluating the use and interpretation of manipulation checks in experimental social psychology. J Exp Soc Psychol 87:103937. https://doi.org/10.1016/j.jesp.2019.103937
Ester M, Kriegel H-P, Xu X (1996) A Density-Based algorithm for discovering clusters in large Spatial databases with noise. University of Munich
Exum ML (2002) The application and robustness of the rational choice perspective in the study of intoxicated and angry intentions to aggress. Criminology 40(4):933–966. https://doi.org/10.1111/j.1745-9125.2002.tb00978.x
Exum ML, Bouffard JA (2010) Testing Theories of Criminal Decision Making: Some Empirical Questions about Hypothetical Scenarios. In: Piquero AR, Weisburd D (eds) Handbook of Quantitative Criminology. Springer, pp 581–594. https://doi.org/10.1007/978-0-387-77650-7_28
Fashing M, Tomasi C (2005) Mean shift is a bound optimization. IEEE Trans Pattern Anal Mach Intell 27(3):471–474. https://doi.org/10.1109/TPAMI.2005.59
Ferrer RA, Stanley JT, Graff K, Klein WMP, Goodman N, Nelson WL, Salazar S (2016) The effect of emotion on visual attention to information and decision making in the context of informed consent process for clinical trials. J Behav Decis Mak 29(2–3):245–253. https://doi.org/10.1002/bdm.1871
Finucane AM (2011) The effect of fear and anger on selective attention. Emotion 11(4):970–974. https://doi.org/10.1037/a0022574
Ford BQ, Tamir M, Brunyé TT, Shirer WR, Mahoney CR, Taylor HA (2010) Keeping your eyes on the prize: anger and visual attention to threats and rewards. Psychol Sci 21(8):1098–1105. https://doi.org/10.1177/0956797610375450
Fuhl W, Kübler TC, Santini T, Kasneci E (2018) Automatic Generation of Saliency-based Areas of Interest for the Visualization and Analysis of Eye-tracking Data. VMV, 47–54. https://www.hci.uni-tuebingen.de/assets/pdf/publications/AGAS2018.pdf
Fukunaga K, Hostetler L (1975) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inf Theory 21(1):32–40. https://doi.org/10.1109/TIT.1975.1055330
Gaines BJ, Kuklinski JH, Quirk PJ (2007) The logic of the survey experiment reexamined. Political Analysis 15(1):1–20. https://doi.org/10.1093/pan/mpl008
Geerken MR, Gove WR (1975) Deterrence: some theoretical considerations. Law Soc Rev 9(3):497–513. https://doi.org/10.2307/3053169
Ghahramani Z (2004) Unsupervised learning. In: Bousquet O, von Luxburg U, Rätsch G (eds) Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2–14, 2003, Tübingen, Germany, August 4–16, 2003, Revised Lectures. Springer, pp 72–112. https://doi.org/10.1007/978-3-540-28650-9_5
Gibbs JP (1968) Crime, punishment, and deterrence. Southwest Soc Sci Q 48(4):515–530
Goldstein EB, Cacciamani L (2022) Sensation and perception (Eleventh edition, student edition). Cengage
Herman S, Barnum TC, Minà PE, Wozniak P, van Gelder J-L (2024) Affect, emotions, and crime decision-making: emerging insights from immersive 360° video experiments. J Exp Criminol. https://doi.org/10.1007/s11292-024-09615-y
Herman S, Pogarsky G (2025) Moral salience, situational moral evaluations, and criminal choice. Crime & Delinquency 71(9):3157–3191. https://doi.org/10.1177/00111287241259474
Heusler B, Sutter C (2022) Gaze control in law enforcement: comparing a tactical police unit to patrol officers. J Police Crim Psychol 37(4):777–793. https://doi.org/10.1007/s11896-020-09412-z
Hoeben EM, Thomas KJ (2019) Peers and offender decision-making. Criminology & Public Policy 18(4):759–784. https://doi.org/10.1111/1745-9133.12462
Hooge ITC, Niehorster DC, Nyström M, Andersson R, Hessels RS (2022) Fixation classification: how to merge and select fixation candidates. Behav Res Methods 54(6):2765–2776. https://doi.org/10.3758/s13428-021-01723-1
Horstmann N, Ahlgrimm A, Glöckner A (2009) How distinct are intuition and deliberation? An eye-tracking analysis of instruction-induced decision modes. Judgm Decis Mak 4(5):335–354. https://doi.org/10.1017/S1930297500001182
Huang F, Chen Y, Li L, Zhou J, Tao J, Tan X, Fan G (2019) Implementation of the parallel mean shift-based image segmentation algorithm on a GPU cluster. Int J Digit Earth 12(3):328–353. https://doi.org/10.1080/17538947.2018.1432709
Huhta J-M, Di Nota PM, Surakka V, Isokoski P, Ropo E (2022) Experience-dependent effects to situational awareness in police officers: an eye tracking study. Int J Environ Res Public Health 19(9):5047. https://doi.org/10.3390/ijerph19095047
Inhoff AW, Radach R (1998) Chapter 2—Definition and Computation of Oculomotor Measures in the Study of Cognitive Processes. In: Underwood G (ed) Eye Guidance in Reading and Scene Perception. Elsevier Science Ltd, pp 29–53. https://doi.org/10.1016/B978-008043361-5/50003-1
Ivanová L, Laco M, Benesova W (2022) Unsupervised clustering-based analysis of the measured eye-tracking data. Fourteenth Int Conf Mach Vis (ICMV 2021) 12084:53–60. https://doi.org/10.1117/12.2623035
Jacobs BA, Cherbonneau M (2019) Carjacking and the management of natural surveillance. Journal of Criminal Justice 61:40–47. https://doi.org/10.1016/j.jcrimjus.2019.01.002
Jacobs BA, Topalli V, Wright R (2000) Managing retaliation: drug robbery and informal sanction threats. Criminology 38(1):171–198. https://doi.org/10.1111/j.1745-9125.2000.tb00887.x
Joblib Development Team (2024) Joblib: Running Python functions as pipeline jobs [Computer software]. https://joblib.readthedocs.io/
Katz J (1988) Seductions of crime: moral and sensual attractions in doing evil. Basic Books
Kijowski MC, Wilson T (2023) Integrating subjectively-derived choice sets to expand offender decision-making. Journal of Crime and Justice 46(1):24–43. https://doi.org/10.1080/0735648X.2022.2062035
Klepper S, Nagin D (1989) The deterrent effect of perceived certainty and severity of punishment revisited. Criminology 27(4):721–746. https://doi.org/10.1111/j.1745-9125.1989.tb01052.x
Kluyver T, Ragan-Kelley B, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C, Team JD (2016) Jupyter Notebooks – a publishing format for reproducible computational workflows. Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS, pp 87–90. https://doi.org/10.3233/978-1-61499-649-1-87
Krajbich I, Armel C, Rangel A (2010) Visual fixations and the computation and comparison of value in simple choice. Nat Neurosci 13(10):1292–1298. https://doi.org/10.1038/nn.2635
Lim S-L, O'Doherty JP, Rangel A (2011) The decision value computations in the vmPFC and striatum use a relative value code that is guided by visual attention. Journal of Neuroscience 31(37):13214–13223. https://doi.org/10.1523/JNEUROSCI.1246-11.2011
Liu J, Chi J, Yang Z (2024) A review on personal calibration issues for video-oculographic-based gaze tracking. Front Psychol 15:1309047. https://doi.org/10.3389/fpsyg.2024.1309047
Loewenstein G (1996) Out of control: visceral influences on behavior. Organ Behav Hum Decis Process 65(3):272–292. https://doi.org/10.1006/obhd.1996.0028
Loughran TA, Paternoster R, Chalfin A, Wilson T (2016) Can rational choice be considered a general theory of crime? Evidence from individual-level panel data. Criminology 54(1):86–112. https://doi.org/10.1111/1745-9125.12097
Matsueda RL, Kreager DA, Huizinga D (2006) Deterring delinquents: a rational choice model of theft and violence. Am Sociol Rev 71(1):95–122. https://doi.org/10.1177/000312240607100105
McClanahan WP, Nagin DS, Otte M, Wozniak P, Van Gelder J (2025) How environmental features and perceptions influence the perceived risks and rewards of criminal opportunities. Criminology. https://doi.org/10.1111/1745-9125.12401
McClanahan WP, Sergiou CS, Siezenga AM, Gerstner D, Elffers H et al (2024) Neighborhood crime reduction interventions and perceived livability: a virtual reality study on fear of crime. Cities 147:104823. https://doi.org/10.1016/j.cities.2024.104823
Nagin DS (2007) Moving choice to center stage in criminological research and theory. Criminology 45:259
Nagin DS, Solow RM, Lum C (2015) Deterrence, criminal opportunities, and police. Criminology 53(1):74–100. https://doi.org/10.1111/1745-9125.12057
Nee C (2024) The impact of emotion on offender decision-making: advancing our understanding through virtual re-enactment. Psychol Crime Law 0(0):1–20. https://doi.org/10.1080/1068316X.2024.2305205
Nee C, van Gelder J-L, Otte M, Vernham Z, Meenaghan A (2019) Learning on the job: studying expertise in residential burglars using virtual environments. Criminology 57(3):481–511. https://doi.org/10.1111/1745-9125.12210
Nivette A, Nägel C, Stan A (2024) The use of experimental vignettes in studying police procedural justice: a systematic review. J Exp Criminol 20(1):151–186. https://doi.org/10.1007/s11292-022-09529-7
Orquin JL, Lahm ES, Stojić H (2021) The visual environment and attention in decision making. Psychol Bull 147(6):597–617. https://doi.org/10.1037/bul0000328
Orquin JL, Mueller Loose S (2013) Attention and choice: a review on eye movements in decision making. Acta Psychol 144(1):190–206. https://doi.org/10.1016/j.actpsy.2013.06.003
Paternoster R, Saltzman LE, Waldo GP, Chiricos TG (1983) Perceived risk and social control: do sanctions really deter? Law Soc Rev 17(3):457–479. https://doi.org/10.2307/3053589
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Pickett JT (2018) Using behavioral economics to advance deterrence research and improve crime policy: some illustrative experiments. Crime & Delinquency 64(12):1636–1659. https://doi.org/10.1177/0011128718763136
Pickett JT, Roche SP, Pogarsky G (2018) Toward a bifurcated theory of emotional deterrence. Criminology 56(1):27–58. https://doi.org/10.1111/1745-9125.12153
Piliavin I, Gartner R, Thornton C, Matsueda RL (1986) Crime, deterrence, and rational choice. Am Sociol Rev 51(1):101–119. https://www.jstor.org/stable/2095480
Pogarsky G (2004) Projected offending and contemporaneous rule-violation: implications for heterotypic continuity. Criminology 42(1):111–136. https://doi.org/10.1111/j.1745-9125.2004.tb00515.x
Pogarsky G, Roche SP, Pickett JT (2017) Heuristics and biases, rational choice, and sanction perceptions. Criminology 55(1):85–111. https://doi.org/10.1111/1745-9125.12129
Pogarsky G, Roche SP, Pickett JT (2018) Offender decision-making in criminology: contributions from behavioral economics. Annu Rev Criminol 1(1):379–400. https://doi.org/10.1146/annurev-criminol-032317-092036
Posit Team (2025) RStudio: integrated development environment for R. Posit Software, PBC [Computer software]
Rahal R-M, Fiedler S (2019) Understanding cognitive and affective mechanisms in social psychology through eye-tracking. J Exp Soc Psychol 85:103842. https://doi.org/10.1016/j.jesp.2019.103842
Raschke M, Blascheck T, Burch M (2014) Visual analysis of eye tracking data. In: Huang W (ed) Handbook of Human Centric Visualization. Springer, pp 391–409. https://doi.org/10.1007/978-1-4614-7485-2_15
Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372–422. https://doi.org/10.1037/0033-2909.124.3.372
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Ruggiero G, Frassinetti F, Coello Y, Rapuano M, di Cola AS, Iachini T (2017) The effect of facial expressions on peripersonal and interpersonal spaces. Psychol Res 81(6):1232–1240. https://doi.org/10.1007/s00426-016-0806-x
Satmarean TS, Milne E, Rowe R (2022) Working memory guidance of visual attention to threat in offenders. PLoS One 17(1):e0261882. https://doi.org/10.1371/journal.pone.0261882
Scott DW (2015) Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons
Sergiou C-S, Elffers H, Van Gelder J-L (2024) Waar letten inbrekers op? Hoe observeren van inbrekers in een virtual reality omgeving en hardop-denken rapportage ons inzicht kan verdiepen in waar inbrekers op letten [What do burglars pay attention to? How observing burglars in a virtual reality environment and think-aloud reporting can deepen our insight into what burglars attend to]. Tijdschrift voor Criminologie 66(2):152–172. https://doi.org/10.5553/TvC/0165182X2024066002003
Shimojo S, Simion C, Shimojo E, Scheier C (2003) Gaze bias both reflects and influences preference. Nat Neurosci 6(12):1317–1322. https://doi.org/10.1038/nn1150
Silverman BW (2018) Density Estimation for Statistics and Data Analysis. Routledge. https://doi.org/10.1201/9781315140919
Stafford MC, Warr M (1993) A reconceptualization of general and specific deterrence. J Res Crime Delinq 30(2):123–135. https://doi.org/10.1177/0022427893030002001
Thomas KJ, Baumer EP, Loughran TA (2022) Structural predictors of choice: testing a multilevel rational choice theory of crime. Criminology 60(4):606–636. https://doi.org/10.1111/1745-9125.12314
Thomas KJ, Nguyen H, Jackson EP (2023) Value orientations, life transitions, and desistance: assessing competing perspectives. Criminology 61(1):103–131. https://doi.org/10.1111/1745-9125.12325
Topalli V (2005) Criminal expertise and offender decision-making: an experimental analysis of how offenders and non-offenders differentially perceive social stimuli. Br J Criminol 45(3):269–295
Troop-Gordon W, Gordon RD, Schwandt BM, Horvath GA, Lee EE, Visconti KJ (2019) Allocation of attention to scenes of peer harassment: visual–cognitive moderators of the link between peer victimization and aggression. Dev Psychopathol 31(2):525–540. https://doi.org/10.1017/S0954579418000068
van Gelder J-L (2013) Beyond rational choice: the hot/cool perspective of criminal decision making. Psychol Crime Law 19(9):745–763. https://doi.org/10.1080/1068316X.2012.660153
van Gelder J-L (2023) Virtual reality for criminologists: a road map. Crime Justice. https://doi.org/10.1086/726691
van Gelder J-L, Barnum TC, Herman S, Wozniak P (2024) The MAXLab aggression and bystander intervention scenario set (MAXLab_ABISS): a modular scenario set for studying decision making in situations of interpersonal violence in virtual reality. J Exp Criminol. https://doi.org/10.1007/s11292-024-09645-6
van Gelder J-L, de Vries R (2014) Rational misbehavior? Evaluating an integrated dual-process model of criminal decision making. J Quant Criminol 30(1):1–27. https://doi.org/10.1007/s10940-012-9192-8
van Gelder J-L, De Vries RE, Demetriou A, Van Sintemaartensdijk I, Donker T (2019) The virtual reality scenario method: moving from imagination to immersion in criminal decision-making research. J Res Crime Delinq 56(3):451–480. https://doi.org/10.1177/0022427818819696
van Gelder J-L, de Vries RE, van Sintemaartensdijk I, Donker T (2022) Personality pathways to aggression: testing a trait-state model using immersive technology. Criminology 60(3):406–428. https://doi.org/10.1111/1745-9125.12305
van Gelder J-L, Mertens E, Nagin D, Siezenga A, Gerstner D, Webb M et al (2025) Virtual reality: what is it and should criminologists pay attention? The Criminologist 51(4):14–18
van Gelder J-L, Nee C, Otte M, Demetriou A, van Sintemaartensdijk I, van Prooijen JW (2017) Virtual burglary: exploring the potential of virtual reality to study burglary. J Res Crime Delinq 54(1):29–62. https://doi.org/10.1177/0022427816663997
van Renswoude DR, Raijmakers MEJ, Koornneef A, Johnson SP, Hunnius S, Visser I (2018) Gazepath: an eye-tracking analysis tool that accounts for individual differences and data quality. Behav Res Methods 50(2):834–852. https://doi.org/10.3758/s13428-017-0909-3
van Sintemaartensdijk I, Van Gelder JL, Van Prooijen JW, Nee C, Otte M, Van Lange P (2021) Mere presence of informal guardians deters burglars: a virtual reality study. J Exp Criminol 17(4):657–676. https://doi.org/10.1007/s11292-020-09430-1
Vella F, Infantino I, Scardino G (2017) Person identification through entropy oriented mean shift clustering of human gaze patterns. Multimedia Tools Appl 76(2):2289–2313. https://doi.org/10.1007/s11042-015-3153-9
Vert J-P, Tsuda K, Schölkopf B (2004) A primer on kernel methods. In: Kernel Methods in Computational Biology. MIT Press. https://doi.org/10.7551/mitpress/4057.003.0004
Vive (2022) VIVE Pro Eye [Computer software]. HTC Vive. https://www.vive.com/de/support/vive-pro-eye/
Wedel M, Pieters R, van der Lans R (2023) Modeling eye movements during decision making: a review. Psychometrika 88(2):697–729. https://doi.org/10.1007/s11336-022-09876-4
Wright R, Logie RH, Decker SH (1996) Criminal expertise and offender decision making: an experimental study of the target selection process in residential burglary. J Res Crime Delinq 32(1):39–53. https://doi.org/10.1177/0022427895032001002
Zizlsperger L, Sauvigny T, Haarmeier T (2012) Selective attention increases choice certainty in human decision making. PLoS One 7(7):e41136. https://doi.org/10.1371/journal.pone.0041136
Acknowledgements
We are grateful to Jessica Dietzer, Dan Nagin, and Greg Pogarsky for their insightful feedback on earlier versions of this manuscript. We also thank Justin Pickett and the anonymous reviewers for their careful and constructive comments, which greatly strengthened the paper. Any remaining errors are our own.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
The project described in this work was approved by the Max-Planck-Society Ethics Council.
This manuscript is not currently under review, forthcoming, or published (in any form) in any other scholarly outlet. The authors have no competing interests to declare that are relevant to the content of this article.
There was no external financial support for the project described in this work. The authors have no relevant financial or non-financial interests to disclose.
All supplementary materials to this manuscript can be accessed with this link: https://osf.io/vzc4w/?view_only=1a91d50812c349a59b105ce763570c40. The preregistration to this study can be accessed with this link: https://osf.io/ejbsu/?view_only=dcfa5d6d3fb74ce1a427d2d6e2100a2e.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Boxplots for ratings of anger (a) and fear (b) in the fight (salmon) and harassment (light blue) scenarios. White boxes indicate the 1st and 3rd quartiles of the ratings, black lines the median ratings, red rhombi the respective average ratings, and black dots influential outliers. Whiskers represent the range of values that are not considered outliers. Colored areas depict the rating distribution
Boxplots for ratings of perceived risks (a) and benefits (b) of reacting violently to the conflict in the fight (salmon) and harassment (light blue) scenarios. White boxes indicate the 1st and 3rd quartiles of the ratings, black lines the median ratings, red rhombi the respective average ratings, and black dots influential outliers. Whiskers represent the range of values that are not considered outliers. Colored areas depict the rating distribution
Boxplots for ratings of violent intention (combined measure of readiness and likelihood of employing violence) in the fight (salmon) and harassment (light blue) scenarios. White boxes indicate the 1st and 3rd quartiles of the ratings, black lines the median ratings, red rhombi the respective average ratings, and black dots influential outliers. Whiskers represent the range of values that are not considered outliers. Colored areas depict the rating distribution
Results of using 10- versus 24-second-long time intervals for defining regions of interest. Depicted are numbers of fixations on the first region of interest (provocateur/harassment) throughout the fight (top) and harassment (bottom) scenarios and across the respective conditions (Condition 1 = interaction with the same person who is involved in the conflict, Condition 2 = interaction with an unrelated person, Condition 3 = no interaction) of these scenarios (left to right). Colored dotted lines represent the onset of the attention phases of the scenarios, i.e., a baseline (light blue), the initial interaction (gold), and the conflict (salmon)
Results of using 10- versus 24-second-long time intervals for defining regions of interest. Depicted are numbers of fixations on the second region of interest (exit) throughout the fight (top) and harassment (bottom) scenarios and across the respective conditions (Condition 1 = interaction with the same person who is involved in the conflict, Condition 2 = interaction with an unrelated person, Condition 3 = no interaction) of these scenarios (left to right). Colored dotted lines represent the onset of the attention phases of the scenarios, i.e., a baseline (light blue), the initial interaction (gold), and the conflict (salmon)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Knabe, M., Barnum, T.C., Herman, S. et al. Focal Visual Attention in Crime Decision Making: Behavioral Insights from Immersive 360° Video Eye-Tracking. J Quant Criminol (2026). https://doi.org/10.1007/s10940-026-09657-7