Abstract
As video games continue to evolve, understanding what drives player enjoyment remains a key challenge. Player reviews provide valuable insights, but their unstructured nature makes large-scale analysis difficult. This study applies generative AI and machine learning, leveraging the Microsoft Phi-4 small language model (SLM) and Google Cloud, to quantify and analyze game reviews from the Steam and Meta Quest stores. The approach converts qualitative feedback into structured data, enabling comprehensive evaluation of key game design elements, monetization models, and platform-specific trends. The findings reveal distinct patterns in player preferences across PC and VR games, highlighting factors that contribute to higher player enjoyment. By using Google Cloud for large-scale data storage and processing, this study establishes a scalable framework for game review analysis. The study’s insights offer actionable guidance for game developers, helping optimize game mechanics, pricing strategies, and player engagement.
1 Introduction
The global video game market size is projected to grow from USD 248.52 billion in 2023 to USD 664.96 billion by 2033, according to GlobeNewswire, registering a Compound Annual Growth Rate (CAGR) of 10.32% [1]. This indicates that the video game industry is not only expanding rapidly but also has a substantial impact on the global economy.
As the video game industry continues its rapid expansion, game reviews become essential as they assess the enjoyment factor in games, which is the core element that keeps players engaged. Reviews help identify both the strengths and weaknesses of a game, providing valuable feedback for developers and players alike. They also play a crucial role in determining a game’s success or failure, influencing players’ decisions on whether to invest their time and money in a particular title. Additionally, reviews serve as a tool for understanding what makes a game enjoyable while highlighting areas for potential improvement [2, 3]. However, gamers can be a particularly demanding audience, with many holding high expectations regarding game quality [4].
Existing research has primarily focused on analyzing player feedback through text-mining and natural language processing (NLP) techniques [5, 6, 7]. For example, techniques such as Latent Dirichlet Allocation (LDA), a topic modeling method, and TextBlob, an NLP library, were used to extract valuable insights from player reviews regarding their preferences, challenges, and experiences across various video games [8, 9]. While these techniques are powerful tools for large-scale analysis, they often struggle to capture the more nuanced aspects of player feedback, such as emotional complexity, sarcasm, or cultural context, thereby limiting the depth of the insights generated. As a result, they may fail to accurately interpret emotions of enjoyment toward a particular game or contextual descriptions related to specific game design elements [10, 11].
This study aims to lay the foundation for a deeper understanding of the factors that contribute to player enjoyment in video games across both personal computer (PC) and virtual reality (VR) platforms, as well as various genres. By leveraging generative AI techniques, such as automated text summarization, sentiment analysis, and topic modeling, an empirical analysis will be conducted of player reviews from Steam and Meta Quest stores. This approach enables the identification of key elements like gameplay mechanics, story and characters, level design, and monetization strategies that correlate with positive player experiences. The insights gained from this study can inform game developers and designers about what aspects resonate most with players, guiding the creation of more engaging and enjoyable games.
Additionally, it provides a data-driven basis for future research in game design, player psychology, and user experience, ultimately contributing to advancements in the gaming industry and academic studies related to interactive media. Based on the analyzed literature, the research questions are:
- RQ1: What are the key game design features that contribute most significantly to player enjoyment in PC versus VR games?
- RQ2: In what ways do monetization strategies affect player enjoyment in free-to-play versus non-free-to-play games?
- RQ3: What differences exist in player enjoyment between PC and VR game genres?
- RQ4: How can generative AI techniques be employed to identify the primary factors influencing positive player reviews across different game genres and platforms?
Given the rapidly evolving nature of the gaming industry, this study focuses on games released between 2020 and 2024, a period marked by significant shifts in player behavior following the COVID-19 pandemic. While this temporal scope allows us to capture emerging trends, it may not fully represent longer-term patterns in gaming preferences. Moreover, user-generated reviews, while rich in insights, introduce challenges such as review bombing, where mass negative reviews may distort sentiment analysis and misrepresent player enjoyment [7]. Finally, while generative AI techniques facilitate large-scale text analysis, they may struggle with linguistic nuances like sarcasm and humor, potentially impacting sentiment classification accuracy [12].
The remainder of this paper is organized as follows. Section 2 reviews prior work on generative AI, machine learning techniques in video game review analysis, and studies on player enjoyment and monetization strategies. Section 3 describes the methodology, including data collection from Steam and Meta Quest reviews and the application of generative AI for text analysis and feature extraction. Section 4 presents the results of the statistical and machine learning analyses. Finally, Sects. 5 and 6 discuss the findings, highlight their implications for game developers, outline limitations, and suggest directions for future research.
2 Literature review
2.1 Popular video games platforms
Valve is a major player, primarily through its Steam platform, which is the largest digital distribution platform for PC games. Steam supports a wide variety of video games, and its compatibility with multiple head-mounted displays (HMDs) makes it a significant player in the personal computer virtual reality (PCVR) space [8]. PCVR refers to virtual reality (VR) experiences that rely on a personal computer for processing power. In this setup, the headset is tethered to the PC via cables and mainly serves as a display and motion-tracking device, while the intensive graphics rendering and computational tasks are performed by the PC’s high-end graphics processing unit (GPU) and central processing unit (CPU). This allows PCVR to deliver more immersive, visually detailed, and high-fidelity experiences than standalone headsets, which run applications directly on their built-in hardware [13].
It is important to note that, unlike Meta’s store, Steam does not use a numerical rating system for game reviews. Instead, players can rate a game as either ‘Recommended’ (positive review) or ‘Not Recommended’ (negative review). In contrast, Meta employs a numerical rating system ranging from 0 to 5, where 0 indicates not recommended at all, while 5 signifies highly recommended. These reviews also include a textual component where players share their personal experiences. While anyone can read these reviews, Valve Corporation provides an application programming interface (API) that enables the scraping of review data. This scraped data can then be analyzed using various methods to understand player experiences [14] or to evaluate the helpfulness of reviews.
The Meta Quest store serves as the official VR app store for Meta’s HMDs. It can be accessed both within VR and in 2D on PCs and mobile devices. The platform comprises four subcategories: Quest, Rift, Go, and GearVR. As of July 2022, the store offers a total of 3,547 VR games. Meta boasts a significant user base of over 1,490,000 users, supported by three primary types of HMDs: phone-powered VR (e.g., GearVR), PC VR (e.g., Rift, Rift S), and all-in-one standalone devices (e.g., Quest, Quest 2, Quest 3, Quest 3S). Notably, the Meta Quest store is exclusive to Meta series HMDs and is not compatible with devices from other manufacturers [15].
Meta dominates the market with around 60% of VR headset distribution (Oculus Rift S, Meta Quest 2, Meta Quest 3), as illustrated in Fig. 1. The Meta Quest store is therefore a reliable data source due to its exclusive focus on VR content. Unlike platforms such as SteamVR or PlayStation VR, which support both PC and VR games, the Meta Quest store offers only VR-specific experiences [9].
Steam users’ share of VR headsets by device [16]
2.2 Limitations of existing research
Several studies have analyzed reviews from different digital video game distribution platforms. Pagano and Maalej found that most reviews are submitted shortly after an application’s release, with the frequency declining rapidly over time. They also noted that reviews often cover multiple topics such as feature requests, user experience, and bug reports [17]. Guzsvinecz focused on PC video game reviews using textual data from Steam. Lin and colleagues analyzed reviews of early access titles and specific top-level genres (TLGs) and genres [6], while Guzsvinecz examined the effects of playtime and game mechanics on reviews within the Souls-like subgenre [5].
While Viggiato and Bezemer in [18] leveraged Transformer models like OPT-175B and Yang Yu et al. in [19] employed Bidirectional Encoder Representations from Transformers (BERT) for sentiment analysis of game reviews, they overlooked several crucial elements that influence player enjoyment, such as gameplay mechanics, story development, level design, and the immersive qualities unique to VR games.
Another limitation is the genre-specific focus of some studies. For instance, Guzsvinecz’s focus on “Souls-like” games provided valuable insights into that particular genre like correlations between positive reviews, playtime, and game design features such as graphics and style but limited the applicability of the findings to other types of games [5].
Although monetization strategies are central to the gaming industry’s economic model and heavily influence user enjoyment, only Lin and colleagues addressed monetization strategies in their analysis of Steam game reviews [6]. Their study merely noted that free-to-play video games tend to receive shorter reviews than non-free-to-play games, but it did not explore the correlation between monetization and player enjoyment.
The focus on analyzing English-language reviews in several studies, including [4, 6, 10], and [19], limits the generalizability of findings by excluding non-English-speaking players and diverse cultural perspectives. This introduces bias by focusing on Western markets, neglecting significant gaming regions such as Asia. Additionally, the linguistic nuances and sentiment expressions unique to non-English reviews are missed, potentially affecting the accuracy of sentiment analysis. As a result, these studies do not fully capture the global gaming community’s feedback, limiting the breadth of insights into player experiences.
2.3 Summary
The comparative analysis in Table 1 further reinforces the gaps identified in previous research and underscores how our study advances the field. The journal articles (JA) corresponding to Table 1 are: JA1: [4], JA2: [6], JA3: [5], JA4: [8], JA5: [10], JA6: [20], JA7: [7], JA8: [9], JA9: [18], JA10: [11], JA11: [19]. All these studies were published between 2019 and 2024.
While multiple studies have analyzed Steam game reviews, few have combined this dataset with Meta game reviews, which offer a different perspective on player sentiment. Our study is among the few that integrate both sources, allowing for a more comprehensive analysis that bridges platform-specific biases.
Moreover, while prior studies have explored correlations between reviews and game genres, game mechanics, or monetization strategies, these aspects have largely been examined in isolation. Table 1 highlights that most existing research tends to focus on only one or two of these factors rather than considering them holistically. Our study addresses this limitation by simultaneously examining correlations with game genres, gameplay elements—including story, mechanics, and artistic aspects—and monetization models. This multifaceted approach provides a richer understanding of how various design and economic factors shape player sentiment.
Another important distinction is our study’s adoption of generative AI, which remains largely unexplored in existing research. By leveraging generative AI, our study is able to conduct a more nuanced analysis of player feedback, moving beyond sentiment classification to uncover underlying trends and emergent themes in game reviews.
Additionally, Table 1 underscores the significant variation in the number of games and reviews analyzed across studies. While some studies focus on only a handful of titles, others examine tens of thousands of reviews without necessarily offering detailed insights into game design factors. Our study strikes a balance by analyzing a substantial dataset of 4,856 games and 485,600 reviews, ensuring both depth and breadth in our findings.
Finally, the inclusion of multilingual reviews in our study marks a notable departure from prior research, which has predominantly focused on English-language data. As illustrated in Table 1, only a few studies have considered reviews in multiple languages. By addressing this gap, our study provides a more globally representative perspective on player experiences, capturing diverse linguistic and cultural contexts that are often overlooked.
In sum, the comparative findings from the literature review substantiate the need for a more integrative and expansive approach to game review analysis. By leveraging a diverse dataset, incorporating generative AI, and examining multiple interrelated factors, our study offers a more comprehensive and methodologically advanced contribution to the field.
3 Methodology
3.1 Overview
To answer the RQs, reviews of all video games on the Steam and Meta Quest stores from 2020 to 2024 were scraped and analyzed. This section is organized as follows: Subsection 3.2 outlines the scraping process, Subsections 3.3 and 3.4 detail the data processing and preparation, and Subsection 3.5 describes the data visualization. All data analyses and visualizations were conducted using Google Cloud [21]. Scraping code and functions used in the analysis were sourced from GitHub [22, 23] and PyPI [24]. Figure 2 provides an overview of the methodology adopted in this study.
3.2 Data scraping
For Steam games, the process starts by loading previous datasets to avoid redundant requests. The script then retrieves a list of all games from Steam’s API, using unique game IDs (AppIDs) as references. For each game, it checks if the game has already been processed or if it remains unreleased, in which case it is skipped. If a game is new and relevant, a request to the Steam API retrieves details shown in Fig. 3 like the game’s name, release date, genres, and price.
Similarly, for VR games from the Meta Quest store, the process starts by accessing every game’s unique URL from VRDB, a comprehensive database for Meta Quest VR games on the Meta Quest store [25]. The script makes HTTP requests to VRDB’s website to retrieve pages of game listings, utilizing pagination to scrape multiple pages efficiently. Once fetched, the HTML content is parsed using BeautifulSoup to locate embedded JavaScript data containing game details. Using regular expressions, key attributes such as game ID, name, and genres are extracted, while additional details like release date, ratings, and price are captured, as can be seen in Fig. 4.
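The extraction step above can be sketched as follows: a regular expression locates the embedded JavaScript data blob and the standard-library `json` module decodes it. The variable name `window.__DATA__` and the field names are illustrative assumptions, not VRDB's actual page structure, and the production script additionally uses BeautifulSoup to navigate the HTML.

```python
import json
import re

# Synthetic stand-in for a scraped VRDB listing page; the embedded variable
# name and JSON fields are assumptions for illustration only.
SAMPLE_HTML = """
<html><body>
<script>window.__DATA__ = {"games": [{"id": "123", "name": "Demo VR Game",
"genres": ["Action"], "releaseDate": "2021-03-04", "price": 19.99}]};</script>
</body></html>
"""

def extract_games(html: str) -> list[dict]:
    """Locate the embedded JSON blob with a regular expression and parse it."""
    match = re.search(r"window\.__DATA__\s*=\s*(\{.*?\});", html, re.DOTALL)
    if not match:
        return []
    return json.loads(match.group(1)).get("games", [])

games = extract_games(SAMPLE_HTML)
print(games[0]["name"])  # -> Demo VR Game
```

Anchoring the regex on the assignment statement rather than on the `<script>` tag keeps the extraction robust to other scripts on the page.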
To ensure accuracy, the script filters out games with release dates of 2019 or earlier, gathering only data for games that meet the specified date criteria. The extracted information is then saved to a JSON file, with regular autosaves to prevent data loss. Unreleased games or those with insufficient data are logged separately to streamline future runs.
By focusing on games with release dates starting from 2020, the scraper maintains a targeted dataset, storing only high-priority information and minimizing unnecessary API calls. The script’s systematic approach ensures efficient, relevant data collection while maintaining data integrity and completeness.
Once all games are scraped, the review scraping script is executed to collect reviews for each game. The process begins by loading each game’s ID and details from the raw games metadata file. It then verifies whether each game has a minimum of 25 reviews to process, ensuring that games with a low number of reviews are excluded to avoid potential biases. If the criterion is met, the script retrieves reviews in all available languages. This approach ensures the creation of a culturally diverse and generalized dataset, mitigating biases towards any specific culture or demographic. For each game with sufficient reviews, a CSV file is generated. These files are named according to the game ID and the total number of reviews scraped, enabling the incremental storage of all collected reviews. Reviews are written in order of rating, with the highest-rated review appearing in the first row. The Python process stops collecting reviews for a game once all reviews are gathered or the specified total review count is reached. This batch-oriented design facilitates efficient and organized data collection across multiple games and languages.
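The review-collection loop above can be sketched as below. The minimum-review filter mirrors the 25-review criterion; the fetch helper targets Steam's public `appreviews` endpoint, whose cursor-based pagination is documented by Valve, though the exact parameter choices in the study's own script are not published.

```python
import json
import urllib.parse
import urllib.request

MIN_REVIEWS = 25  # games with fewer reviews are skipped to limit bias

def should_process(game_meta: dict) -> bool:
    """Apply the minimum-review criterion described above."""
    return game_meta.get("total_reviews", 0) >= MIN_REVIEWS

def fetch_review_page(app_id: int, cursor: str = "*") -> dict:
    """Fetch one page of reviews from Steam's public appreviews endpoint.

    Start with cursor="*" and pass the cursor returned in each response
    to retrieve the next page until all reviews are gathered.
    """
    params = urllib.parse.urlencode({
        "json": 1,
        "filter": "all",
        "language": "all",    # collect reviews in every available language
        "num_per_page": 100,
        "cursor": cursor,
    })
    url = f"https://store.steampowered.com/appreviews/{app_id}?{params}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Requesting `language=all` reflects the multilingual collection strategy described above.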
Tables 2 and 3 present the output of a custom Python script that processed individual CSV files to generate a descriptive summary. Of 65,686 Steam games, reviews were collected for 23,107 that met the minimum criterion of at least 25 reviews, totaling 31,832,390 reviews. Similarly, of 9,336 VR games on Meta, reviews were collected for 3,210, totaling 701,360 reviews. The resulting dataset of Steam games is openly accessible at Mendeley Data (https://doi.org/10.17632/jxy85cr3th.2) [26], while the dataset from the Meta Quest store is available upon request.
3.3 Data processing
3.3.1 Data quantification schema
The tokenization schema created for processing game and review data ensures systematic analysis by leveraging industry standards and insights from reputable sources. Age ranges are classified using the PEGI (Pan-European Game Information) rating system [27], while price ranges are based on VGinsights Steam Analytics [28], as shown in Fig. 5. These price ranges are divided into five categories:
- $0.01–$4.99 (below the ±$5 range for indie games).
- $5–$14.99 (within the ±$5 range for indie games).
- $15–$24.99 (within the ±$5 range for AA games).
- $25–$39.99 (within the ±$5 range for AAA games).
- $40+ (above the ±$5 range for Premium AAA games).
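A minimal categorizer for these price bands might look like the sketch below; the treatment of zero-priced titles as free-to-play and the exact boundary handling are assumptions consistent with, but not copied from, the study's script.

```python
def price_category(price: float) -> str:
    """Map a game's USD price onto the five schema price bands."""
    if price <= 0:
        return "Free_to_Play"   # assumption: zero price means free-to-play
    if price < 5:
        return "$0.01-$4.99"    # below indie range
    if price < 15:
        return "$5-$14.99"      # indie range
    if price < 25:
        return "$15-$24.99"     # AA range
    if price < 40:
        return "$25-$39.99"     # AAA range
    return "$40+"               # premium AAA
```

Because the bands are contiguous and ordered, a chain of upper-bound checks is sufficient; no band overlap is possible.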
Average price of selected games on Steam [28]
Also, game genres are classified based on [4, 29, 30], and include Action, Adventure, Casual, Puzzle, Role-Playing Game (RPG), Racing, Simulation, Sports, Strategy, Fighting, Horror, Battle Royale, Shooter, Survival, Music, Education, Entertainment, Meditation, and Exercise. As shown in Table 4, these genres ensure comprehensive coverage of both PC and VR games. The ‘Value’ column in Table 4 represents the presence of a given genre in a game, where 0 indicates its absence and 1 signifies its inclusion. This classification allows for a structured analysis of game genres by the generative AI model.
Furthermore, the schema incorporates critical game design elements derived from [3, 31, 32], covering ‘Gameplay’, ‘Graphics’, ‘Difficulty’, ‘Story’, ‘Audio’, ‘Avatar Customization’, ‘Controls’, ‘Monetization Model’, ‘Replayability’, ‘Community’, ‘Multiplayer’, and ‘Spatial Presence’.
As shown in Table 5, these design elements are essential in assessing the overall quality of a game, as they influence player engagement, immersion, and enjoyment. Each game review is scored on a 1–5 scale against these design elements, allowing for a structured evaluation of gameplay mechanics, difficulty, graphics, audio, and spatial presence, which contribute to the overall experience. Storytelling enhances emotional investment, while customization options enable greater player expression. Additionally, monetization models, multiplayer features, and community engagement shape long-term retention and player perception. By evaluating these aspects, the schema provides a comprehensive framework for assessing a game’s overall quality.
Building on the framework established by Lin and his team [6], we further identified distinct categories to classify each review effectively, as shown in Table 6. These attributes allow for a structured analysis of review content, capturing aspects such as helpfulness, sentiment, suggestions, and technical issues. Additionally, we introduced the ‘Recommended’ and ‘Review Language’ attributes to enhance the classification schema. The ‘Recommended’ attribute provides insight into the overall sentiment of the reviewer, indicating whether they endorse the game. Meanwhile, ‘Review Language’ was added to account for linguistic diversity in user feedback, ensuring a more comprehensive understanding of reviews across different regions and player demographics.
The ‘Value’ column in Table 6 represents whether a specific review characteristic is present, with 0 indicating its absence and 1 signifying its inclusion. For ‘Review Language’, values range from 1 to 11, denoting different languages to facilitate multilingual analysis. By incorporating these elements, the schema enables a more nuanced evaluation of player reviews, supporting deeper insights into user experiences and preferences.
The language categories were determined based on the top 10 most spoken languages on Steam, as identified in Steam’s October 2024 Monthly Survey shown in Fig. 6. These include Simplified Chinese, English, Russian, Spanish, Portuguese, German, Japanese, French, Polish, and Turkish. Additionally, a separate category, ‘Other’ was included to account for reviews written in languages outside of this list, ensuring comprehensive language classification within the schema.
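The 1–11 language encoding described above can be implemented as a simple lookup; note that the concrete number-to-language assignment below is illustrative, since the paper does not publish the mapping.

```python
# Illustrative mapping of the 11 'Review Language' codes; the concrete
# number-to-language assignment is an assumption, not taken from the paper.
LANGUAGE_CODES = {
    1: "Simplified Chinese", 2: "English", 3: "Russian", 4: "Spanish",
    5: "Portuguese", 6: "German", 7: "Japanese", 8: "French",
    9: "Polish", 10: "Turkish", 11: "Other",
}

def encode_language(name: str) -> int:
    """Return the schema code for a language, falling back to 'Other' (11)."""
    reverse = {v: k for k, v in LANGUAGE_CODES.items()}
    return reverse.get(name, 11)
```

The fallback to 'Other' mirrors the schema's catch-all category for languages outside the top ten.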
Steam’s most common languages [33]
3.3.2 Quantifying game metadata
This process involves converting raw game metadata into structured categorical and numerical features. The goal is to create a uniform dataset where each game is represented by a set of predefined attributes. For both Meta Quest and Steam stores, the same schema is applied for feature extraction and tokenization. This ensures that game data from different platforms is processed in a unified manner, facilitating cross-platform analysis.
As shown in Tables 7 and 8, categorical mapping is applied to games using a set of predefined categories, each associated with relevant keywords. These keywords are extracted directly from the raw scraped games files, where they appear under the respective key names listed in the ‘Derived From’ column. For example, on the Steam store, the ‘Action’ category includes terms like ‘Action RPG’, ‘Action Roguelike’, ‘Beat ‘em up’, and ‘Hack and Slash’, which are found in the ‘Genre’ or ‘Tag Mapping’ fields in the game’s metadata [34].
As noted in Table 8, some categories are not explicitly defined. This is because Meta does not provide specific labels for these categories. However, despite the absence of direct categorization, certain attributes can still be inferred from the raw game metadata.
For example, the ‘Free_to_Play’ category is determined based on the price value found in the raw game file. If the price is 0, the ‘Free_to_Play’ value is set to 1; otherwise, it is assigned 0. Similarly, since these games are sourced from the Meta Quest store, the ‘Is_Steam’ attribute is set to 0 by default, as they are not from Steam’s store. On the other hand, the ‘Is_VR’ attribute is automatically set to 1, as all VR games are inherently 3D. The final output of this process is a CSV file in which each row represents a single quantified game.
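These inference rules are straightforward to express in code; the sketch below follows the rules stated above, with illustrative field names.

```python
def quantify_meta_quest_game(raw: dict) -> dict:
    """Derive the inferred attributes for a Meta Quest store game.

    Field names are illustrative; the rules follow the text:
    price 0 implies free-to-play, the source store is not Steam,
    and every Meta Quest title is a VR game.
    """
    return {
        "Free_to_Play": 1 if raw.get("price", 0) == 0 else 0,
        "Is_Steam": 0,  # all games in this file come from the Meta Quest store
        "Is_VR": 1,     # every Meta Quest title is inherently VR
    }
```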
3.3.3 Quantifying game reviews
To convert unstructured player reviews into analyzable features, we use Microsoft’s Phi-4 small language model (SLM), a 14-billion-parameter Transformer-based model designed for high-quality reasoning with relatively modest computational requirements [35, 36]. Hosting Phi-4 locally allows us to process large volumes of review text while retaining control over the data. Figure 7 illustrates how Phi-4 achieves high-quality performance despite its relatively small size (14B parameters). Compared to both smaller and much larger models, Phi-4 demonstrates superior results on the Massive Multitask Language Understanding (MMLU) benchmark, highlighting its position at the frontier of “small but mighty” language models.
Phi-4 model benchmark [36]
Each review is mapped onto two complementary schemas. The first captures review-level attributes such as recommendation status, bug reports, suggestions, and language, as shown in Table 6. The second encodes ratings for 12 game-design elements including Gameplay, Graphics, Difficulty, and Spatial Presence, as defined in Table 5. The aim is to convert each free-text review shown in Fig. 8 into the standardized structure illustrated in Fig. 9, enabling all downstream analyses to treat reviews consistently as rows within a tabular dataset.
Only the review datasets were quantified, as the main file containing all games did not require generative AI quantification. Comparing the category columns in Figs. 8 and 9 reveals significant differences in rows, columns, and data values, adding complexity to the transformation process. The quantification approach employs generative AI for the transformation, while column extraction was done manually based on the literature. Although generative AI can suggest columns, adopting standard categories is generally preferable for comparability with other analyses.
The comprehensive schemas in Tables 5 and 6 are processed using Microsoft’s Phi-4 small language model hosted locally to tokenize data systematically, providing a robust foundation for advanced machine learning analysis.
The model quantifies the fields outlined in these schemas, assigning 0 to fields where there is insufficient data to evaluate. For instance, when the model processes a review such as “This game is amazing,” it generates a structured JSON output as can be seen below in Table 9.
To achieve this, we’ve created a custom Python script. The script begins by defining both schemas highlighted in Tables 5 and 6 through a Pydantic model [37], which ensures consistent encoding of each text review’s attributes and game design elements.
Table 10 shows how fields like ‘Gameplay’ and ‘Graphics’ are represented in a Pydantic model. This data processing model imposes strict type and range constraints, preventing incomplete or misleading values from entering the dataset.
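The study defines this schema with Pydantic; as a dependency-free sketch, the same type and range constraints (0 = insufficient data, 1–5 = rating) can be expressed with a standard-library dataclass. The field subset below is illustrative.

```python
from dataclasses import dataclass, fields

@dataclass
class DesignElementScores:
    """Dependency-free sketch of the constrained schema.

    The study itself uses a Pydantic model with ge/le bounds; the range
    rule enforced here is the same: 0 = insufficient data, 1-5 = rating.
    """
    Gameplay: int = 0
    Graphics: int = 0
    Difficulty: int = 0
    Spatial_Presence: int = 0

    def __post_init__(self):
        # Reject out-of-range or non-integer values at construction time,
        # so invalid scores never enter the dataset.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not 0 <= value <= 5:
                raise ValueError(f"{f.name} must be an integer in 0-5, got {value!r}")
```

Failing fast at construction time is what prevents incomplete or misleading values from propagating into the tokenized dataset.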
For each review, the script sends a prompt to the locally hosted Phi-4 SLM and expects a JSON object that conforms to the Pydantic schema. The prompt, summarized in Table 11, instructs the model to output integer-valued fields only. A LangChain-based wrapper converts the schema into format instructions embedded in the prompt so that the generated JSON can be parsed and validated automatically.
The SLM assigns scores to fields such as ‘Gameplay’, ‘Graphics’, and ‘Difficulty’, and classifies reviews by attributes like language and recommendation status. The script sanitizes the text and stores successfully parsed reviews, together with their original IDs, in a new CSV file. This workflow yields a transparent, machine-readable dataset that preserves data integrity while capturing qualitative insights from player reviews.
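The parse-and-validate step for the model's reply can be sketched with the standard library alone; in the actual pipeline, Pydantic and LangChain perform this validation automatically, and the field names shown are an illustrative subset of Tables 5 and 6.

```python
import json

# Illustrative subsets of the two schemas (Tables 5 and 6).
ELEMENT_FIELDS = {"Gameplay", "Graphics", "Difficulty"}

def parse_model_output(raw: str) -> dict:
    """Parse the SLM's JSON reply and enforce the integer/range rules.

    A sketch of the validation step, not the exact production code:
    a field missing from the reply defaults to 0 (insufficient data).
    """
    data = json.loads(raw)
    for field in ELEMENT_FIELDS:
        value = data.get(field, 0)
        if not isinstance(value, int) or not 0 <= value <= 5:
            raise ValueError(f"{field} out of range: {value!r}")
        data[field] = value
    return data

# A hypothetical reply for the review "This game is amazing".
reply = '{"Recommended": 1, "Review_Language": 2, "Gameplay": 5, "Graphics": 0, "Difficulty": 0}'
parsed = parse_model_output(reply)
```

Rejecting malformed replies here, before they are written to CSV, is what keeps the downstream tabular dataset clean.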
We quantified the 100 most upvoted reviews from a total of 4,856 PC and VR games. Specifically, we analyzed the top 3,860 games with the highest number of ratings from the Steam store and 996 from the Meta Quest store. By focusing on the most-rated reviews from games with the highest number of ratings, we aimed to ensure a representative analysis of player sentiment while minimizing biases that may arise from less-reviewed or niche titles.
3.3.4 Evaluation
In order to evaluate the proposed approach, we analyzed the top-rated 1,000 text reviews from the game HELLDIVERS 2, which had the highest number of reviews among all the games scraped. The tokenization process described earlier transformed these reviews into structured numerical data. This tokenized dataset, stored in a .csv file, was uploaded to Google Looker Studio [38] to generate visualizations.
Figure 10 illustrates the proportion of recommended versus non-recommended reviews, as well as the alignment between the SLM-based tokenization assessments and the actual raw data. The discrepancy between the SLM-based assessment and the actual recommendation data from Steam can be attributed to the fact that SLM-based assessment determines whether a review is positive or negative based on text sentiment analysis, which involves evaluating word choice, tone, and context. However, on Steam, users explicitly click ‘Recommended’ or ‘Not Recommended’, which may not always align with the sentiment expressed in the text.
Figure 11 illustrates the Phi-4 model’s attempt to label the languages of the top 1,000 highest-rated reviews. The results align with Gamalytic’s statistics [39], which indicate that 40.2% of the player base originates from English-speaking countries, such as the United States and the United Kingdom, while 8.4% are from China.
To evaluate the effectiveness of Phi-4 in quantifying qualitative player feedback, we applied it to the dataset of 1,000 textual reviews from HELLDIVERS 2. As seen in the bar charts in Fig. 12, Phi-4 successfully scored and quantified the reviews based on the 12 key game design elements shown in Table 5, demonstrating its ability to extract meaningful insights from unstructured text. The structured rating distributions suggest that the model is highly effective in translating subjective player experiences into a numerical assessment.
For the ‘Difficulty’ element in HELLDIVERS 2, the bar chart on the left in Fig. 13 reveals that most ratings are concentrated around scores 2 and 3. This indicates that players generally perceive the difficulty as mostly fixed and unfair, as outlined in the schema (Table 5).
Interestingly, this assessment corresponds closely with data from a GameFAQs survey [40] of 46 players on the difficulty of HELLDIVERS 2, shown in the pie chart on the right in Fig. 13. Both the SLM’s assessment and the GameFAQs statistic underscore the game’s challenging nature. The GameFAQs data highlights subjective player sentiments such as ‘Tough’ or ‘Unforgiving,’ whereas Phi-4 focuses on structured evaluations of the difficulty design, including fairness and pacing. These perspectives are complementary, converging on the notion that the game is demanding and overly difficult. This alignment with player-reported perceptions supports the validity of the model’s evaluation.
From Fig. 14, it’s evident that core elements such as ‘Gameplay,’ ‘Graphics,’ and ‘Story’ earn high ratings, while mid-range aspects like ‘Difficulty’ and ‘Replayability’ suggest room for improvement. ‘Monetization’ ranks lowest, indicating that it is the aspect players enjoy least and view as most in need of improvement. To provide an overall assessment of HELLDIVERS 2, an aggregate score of 3.5/5 was calculated by averaging the individual design element ratings shown in Fig. 14, aligning well with published ratings. Notably, this SLM-derived score does not conflict with the 6.6/10 overall user rating on Metacritic [41], highlighting a general consistency with user perceptions.
3.4 Data Preparation
Before conducting the analyses, we first prepared our quantified data. We began by combining the 4,856 individual quantified review files, each representing reviews for a single game, into a single dataset. The objective was to calculate the average rating for each game design element and store it as a single row per game in a CSV file named ‘Tokenized_Reviews_Averages.csv’. Each of the 12 game design elements was rated on a 1–5 scale (1 = very low, 5 = very high). The elements are:
- Gameplay.
- Difficulty.
- Graphics.
- Story.
- Audio.
- Avatar Customization.
- Controls.
- Monetization Model.
- Replayability.
- Community.
- Multiplayer.
- Spatial Presence.
The Python script iterated through all review CSVs, calculated average scores per game, and merged them into a single dataset. This dataset was later uploaded to BigQuery for further analysis.
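As an illustration, the per-game averaging step can be sketched as follows. The column names and CSV layout here are assumptions for illustration; the actual quantified files follow the schema in Table 5.

```python
import csv
import io
import statistics

# The 12 design-element columns (names assumed to match the Table 5 schema)
ELEMENTS = [
    "Gameplay", "Difficulty", "Graphics", "Story", "Audio",
    "Avatar Customization", "Controls", "Monetization Model",
    "Replayability", "Community", "Multiplayer", "Spatial Presence",
]

def average_game_reviews(csv_text, game_id):
    """Collapse one game's quantified review CSV into a single row of
    per-element averages, the form stored in one row of
    'Tokenized_Reviews_Averages.csv'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {"Game_ID": game_id}
    for element in ELEMENTS:
        scores = [float(r[element]) for r in rows if r.get(element)]
        summary[element] = round(statistics.mean(scores), 2) if scores else None
    return summary
```

Applying such a function to each of the 4,856 per-game files and writing one summary row per game yields the merged dataset described above.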
Finally, all game data was structured and stored in Google Cloud Storage, a cloud-based service for storing data [42]. This step was necessary to ensure that the dataset could be processed seamlessly, allowing for structured exploration and comparative analysis across different game platforms. The data uploaded to Google Cloud Storage consisted of two key datasets: ‘Tokenized Game Metadata.csv’ and ‘Tokenized Reviews Data.csv’, both of which served as the foundation for all subsequent analytical steps.
The next step was to use Google BigQuery, a platform that allows large-scale data analysis and management [21], to write queries on our data and derive insights and visualizations. The first action taken was to merge both datasets into a single table in BigQuery, with the price attribute transformed from a numerical value into a categorical attribute, as outlined in the data quantification schema (Sect. 3.3.1). This transformation made it easier to run queries and analyze game metadata alongside review data.
3.5 Creating visuals in Google Looker
3.5.1 Correlation between positive ratings and game design elements
Once we had our dataset in Google BigQuery, we began running queries to visualize the data from different perspectives. We created a query that calculates the correlation between price category and high rating percentages across different game design elements for both PC and VR games using the CORR() function in BigQuery.
First, the query converts Price_Category into numerical values (0 to 5), where 0 represents free-to-play games and 5 represents premium AAA games, enabling statistical correlation analysis. It then computes the percentage of high ratings (4+) across 12 game design elements (such as ‘Graphics’, ‘Audio’, ‘Gameplay’, and ‘Replayability’) and averages them to obtain a total high rating percentage for each game.
Using the CORR() function, the query determines the Pearson correlation coefficient between the encoded price category and the total high rating percentage, separately for VR games (Is_VR = 1) and PC (non-VR) games (Is_VR = 0). This analysis provides insights into whether pricing influences overall game ratings differently in VR and PC gaming environments.
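For readers without BigQuery access, the same statistic can be reproduced in plain Python. This is a minimal sketch of Pearson's r as computed by CORR(), using illustrative (not actual) values:

```python
from math import sqrt

def pearson_corr(xs, ys):
    """Pearson correlation coefficient, equivalent to BigQuery's CORR()."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Encoded price categories (0 = free-to-play ... 5 = premium AAA) paired
# with each game's total high-rating percentage (illustrative values only)
prices = [0, 1, 2, 3, 4, 5]
high_pct = [30.0, 35.0, 42.0, 47.0, 55.0, 61.0]
r = pearson_corr(prices, high_pct)  # near +1 for this monotone sample
```

In the study, this computation is run twice, once over the VR subset (Is_VR = 1) and once over the PC subset (Is_VR = 0).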
3.5.2 Comparing player enjoyment across VR and PC games
A BigQuery query was written to calculate the percentage of PC and VR games that receive high ratings (4+ on a 1–5 scale) across 12 game design elements. It first groups the dataset by Is_VR (where 1 represents VR games and 0 represents PC games) to analyze the difference in rating distributions between these two categories. For each design element, it computes the percentage of games with high ratings by dividing the count of games rated 4 or higher by the total number of games in that category. Additionally, the query calculates a total high rating percentage by averaging the high ratings across all 12 design elements, providing an overall metric for game quality in VR vs. PC games. This helps in identifying whether VR games tend to receive higher or lower ratings compared to non-VR games across different aspects of game design.
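A minimal Python equivalent of this grouping logic, using hypothetical rows and only two of the 12 elements, might look like:

```python
def high_rating_percentages(games, elements, threshold=4):
    """For each platform group (Is_VR = 0 or 1), compute the share of
    games rated >= threshold on each design element, plus the average
    of those shares as a total high-rating percentage."""
    result = {}
    for is_vr in (0, 1):
        group = [g for g in games if g["Is_VR"] == is_vr]
        stats = {}
        for elem in elements:
            high = sum(1 for g in group if g[elem] >= threshold)
            stats[elem] = 100.0 * high / len(group)
        stats["Total"] = sum(stats[e] for e in elements) / len(elements)
        result[is_vr] = stats
    return result

# Illustrative rows covering two of the 12 elements
sample = [
    {"Is_VR": 0, "Gameplay": 4, "Graphics": 3},
    {"Is_VR": 0, "Gameplay": 2, "Graphics": 5},
    {"Is_VR": 1, "Gameplay": 5, "Graphics": 4},
]
pct = high_rating_percentages(sample, ["Gameplay", "Graphics"])
```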
3.5.3 Highest-rated game design elements by game genre
A BigQuery query was developed to analyze how different game genres influence player ratings across game design elements. The games metadata dataset originally stored genres as separate binary columns (Table 4), with a value of 1 indicating membership. To enable more flexible analysis, the query first unpivots these columns into a long-format structure, where each row represents a game–genre combination. This allows each game to be analyzed under multiple genres while preserving key metadata such as VR support, 3D capability, and multiplayer mode. The second stage unpivots game design elements, producing a dataset where each row corresponds to a rating for a game within a given genre. This structure supports detailed comparisons of how genres affect design aspect ratings, revealing whether some genres excel in specific areas. The final output provides a structured view of the interactions among game genre, platform type (VR vs. PC), and gameplay features.
To measure variance, we calculated standard deviations at both the genre and platform (PC vs. VR) levels. For each game design element, we first found the average rating for each genre (e.g., the ‘Gameplay’ rating across all ‘Action’ games), then applied the STDDEV_SAMP function to these averages to see how much ratings varied across genres. We repeated the procedure for platforms: ratings were grouped by platform, genre, and design element; platform-level averages were calculated; and the standard deviation of the genre-level averages was taken to capture how much player opinions varied within each platform. The standard deviation between platforms for a given design element is calculated as shown in (1). Here, \(\:{x}_{PC}\) and \(\:{x}_{VR}\) represent the mean ratings of the design element on PC and VR platforms, respectively. Since there are only two platforms (\(\:n=2\)), the general sample standard deviation formula reduces to the absolute difference between the two platform means divided by \(\:\sqrt{2}\).
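Assuming formula (1) is the ordinary sample standard deviation, the n = 2 case can be checked numerically (the sample standard deviation of two values equals their absolute difference divided by √2). The platform means below are illustrative, not taken from the study's tables:

```python
from math import sqrt
from statistics import stdev  # sample standard deviation, like STDDEV_SAMP

def platform_spread(mean_pc, mean_vr):
    """Spread of one design element across the two platforms: the sample
    standard deviation of the PC and VR means. With only n = 2 values,
    this reduces to |x_PC - x_VR| / sqrt(2)."""
    return stdev([mean_pc, mean_vr])

# Illustrative means for one element on PC and VR
spread = platform_spread(2.80, 3.50)
```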
3.5.4 Applying machine learning using XGBoost
To extend the analysis beyond descriptive statistics and explore predictive modeling, a machine learning approach was implemented using XGBoost, a high-performance gradient boosting algorithm [43]. XGBoost was selected for this study due to its ability to handle non-linear interactions, categorical variables, and missing data more effectively than traditional regression models [44]. A boosted tree model predicts the output \(\:\widehat{{y}_{i}}\) for an input \(\:{x}_{i}\) as the sum of multiple decision trees as expressed in (2) where M is the total number of trees, \(\:{F}_{m}\)(\(\:{x}_{i}\)) is the prediction from the m-th decision tree [44].
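The additive prediction in (2) can be illustrated with a toy ensemble. The trees and feature names below are hypothetical stand-ins for fitted regression trees, not the model trained in this study:

```python
def boosted_predict(x, trees, base_score=0.0):
    """Prediction of a boosted tree ensemble as in Eq. (2):
    y_hat = base + sum over m of F_m(x), where each F_m is one
    regression tree (here represented as a plain Python function)."""
    return base_score + sum(tree(x) for tree in trees)

# Two toy 'trees' splitting on hypothetical features
tree1 = lambda x: 0.4 if x["Is_VR"] == 1 else 0.1
tree2 = lambda x: 0.2 if x["Price_Level"] >= 3 else -0.1

pred = boosted_predict({"Is_VR": 1, "Price_Level": 4},
                       [tree1, tree2], base_score=2.5)
```

Gradient boosting fits each successive tree to the residual errors of the ensemble so far; the final prediction is simply this sum.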
Game ratings are influenced by intricate dependencies between features, such as the interplay between price, genre, and multiplayer functionality. For instance, while VR games might receive high ratings for spatial presence, they do not necessarily score higher on community engagement. Similarly, higher-priced games may exhibit strong graphics quality but lower multiplayer engagement. These types of non-linear relationships are naturally handled by XGBoost’s decision tree-based structure.
Additionally, XGBoost provides robustness against outliers [44], a common challenge in game rating analysis. Some free-to-play games, for example, receive extreme ratings, either overwhelmingly positive or negative, based on factors unrelated to gameplay quality, such as microtransactions. Linear Regression is highly sensitive to such extreme values, whereas XGBoost assigns less influence to outliers, improving prediction stability.
Lastly, XGBoost scales efficiently to large datasets, making it particularly well-suited for analyzing thousands of games and millions of player reviews. The dataset used in this study contained a diverse set of features with complex interdependencies, making XGBoost’s ability to handle correlated features a critical advantage.
The objective is to predict overall game ratings based on game metadata, player reviews, and pricing categories. By leveraging XGBoost, we aim to uncover complex relationships between game attributes and player enjoyment, allowing for data-driven insights into what factors contribute to highly rated games. To ensure accurate predictions, we performed one-hot encoding for the Price Category attribute and created a new table without irrelevant attributes such as ID, Is Steam, Required Age, and Is Early Access. Table 12 presents the BigQuery query used to prepare the dataset for machine learning.
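The one-hot encoding step can be sketched as follows; the category names are hypothetical placeholders, since the actual labels follow the data quantification schema (Sect. 3.3.1):

```python
def one_hot_price(category, categories):
    """One-hot encode the Price Category attribute into 0/1 columns,
    mirroring the dataset preparation performed before XGBoost training."""
    if category not in categories:
        raise ValueError(f"unknown price category: {category}")
    return {f"Price_{c}": int(c == category) for c in categories}

CATS = ["Free", "Budget", "Mid", "Premium"]  # hypothetical category names
row = one_hot_price("Mid", CATS)
# exactly one of the four indicator columns is set to 1
```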
Following this, we initiated the training of XGBoost by executing the query shown in Table 13 on the newly prepared dataset.
4 Analysis
4.1 Overview of quantified data
Figure 15 presents the top 10 genre distribution for PC vs. VR games, showcasing the percentage of the quantified dataset of games metadata occupied by each genre across the two platforms. The left side of the figure illustrates PC games, while the right side highlights VR games. The distributions are based on the total dataset of 4,856 games, capturing the most commonly occurring genres in each platform.
In PC games, the most prevalent genre is Casual, which makes up the largest portion of the dataset, followed closely by Adventure. Action games hold the third-highest share, while genres such as Battle Royale, Role-Playing Games, and Simulation have a moderate presence. The remaining genres—Strategy, Horror, Puzzle, and Survival—appear less frequently but are still among the most common.
In VR games, Action emerges as the most dominant genre, significantly surpassing other categories. Adventure and Casual games also hold substantial shares but trail behind Action. Simulation follows closely, while Shooter, Survival, and Horror genres show moderate representation. The lowest-ranking genres in this top 10 list include Puzzle, Sports, and Battle Royale.
This distribution provides an essential context for the data analysis, offering insight into how genre representation differs between PC and VR platforms. The differences in distribution suggest that certain genres are more prevalent in VR than in PC and vice versa.
4.2 Correlation between price and players’ positive ratings
4.2.1 PC games
Figure 16 presents the correlation values between various game design elements and price in PC games. ‘Avatar Customization’ (0.97) and ‘Audio’ (0.96) exhibit the highest positive correlations with price, meaning that as game prices increase, these features are rated more favorably. ‘Spatial Presence’ (0.94) and ‘Replayability’ (0.85) also show strong positive correlations, followed by ‘Graphics’ (0.62), ‘Controls’ (0.37), and ‘Gameplay’ (0.36), which maintain moderate positive relationships with price.
‘Story’ (0.19) and ‘Community’ (0.08) display weak positive correlations, indicating that these aspects have little direct association with price. ‘Difficulty’ (−0.59) is the first element with a negative correlation, meaning that as game price increases, difficulty ratings tend to decrease. ‘Multiplayer’ (−0.97) and ‘Monetization Model’ (−0.99) have the strongest negative correlations, suggesting that these elements are rated lower in higher-priced games.
4.2.2 VR games
As shown in Fig. 17, in VR games, ‘Spatial Presence’ (0.93) exhibits the highest positive correlation with price, meaning that more expensive VR games tend to provide significantly better immersive experiences. ‘Story’ (0.86) and ‘Audio’ (0.84) also show strong positive correlations with price, suggesting that higher-budget VR games are more likely to feature engaging narratives and high-quality sound design. ‘Graphics’ (0.82), ‘Controls’ (0.79), and ‘Replayability’ (0.64) follow closely behind, maintaining notable positive relationships with price. ‘Gameplay’ (0.55) and ‘Multiplayer’ (0.54) also correlate positively with price, though to a slightly lesser degree.
‘Monetization Model’ (−0.17) and ‘Community’ (0.06) exhibit minimal correlation with price, indicating that monetization strategies and community-driven aspects do not strongly depend on a game’s cost in VR. ‘Difficulty’ (−0.99) has the strongest negative correlation with price, meaning that as the price of a VR game increases, difficulty ratings tend to decrease significantly.
4.2.3 PC vs. VR games
The comparison of Figs. 16 and 17 highlights key differences in how price correlates with player ratings across various game design elements in PC and VR games. While both platforms share some similarities, notable differences emerge in how game price influences design priorities and player reception.
One of the most significant differences is in the correlation between price and Story, which is high in VR games (0.86) but low in PC games (0.19). This means that higher-priced VR games are strongly associated with better storytelling, whereas story quality in PC games does not show a meaningful relationship with price. Audio quality, on the other hand, shows a strong correlation with price in both PC (0.96) and VR (0.84) games, indicating that sound design is consistently valued in premium titles regardless of platform.
Averaging the percentage of high ratings across the 12 game design elements and computing the correlation between price and these ratings separately for VR and PC games yielded the two correlation values shown in Fig. 18. VR games display a much higher total positive correlation with price (0.8) than PC games (0.5).
4.3 Comparing player enjoyment across VR and PC games
Table 14 presents the percentage of PC and VR games that received high ratings (4+ out of 5) across various game design elements. The overall average high-rating percentage is higher for VR games (49.03%) than for PC games (33.82%), indicating that VR games generally receive more favorable reviews across most design aspects. However, the degree of enjoyment varies depending on the specific elements of game design.
Among the individual design elements, ‘Graphics’ stands out as one of the highest-rated features on both platforms. A higher proportion of VR games (73.11%) receive strong ratings in this category than PC games (62.51%), a 10.6% difference in favor of VR. In contrast, audio design follows a different trend. While PC games (61.81%) receive a relatively high percentage of positive ratings, VR games (54.73%) fall slightly behind, marking the only category where PC games outperform VR.
Gameplay mechanics show the most pronounced disparity between the two platforms. VR games (83.08%) receive overwhelmingly higher ratings than PC games (45.42%), representing a 37.66% difference—one of the largest gaps in the dataset. The largest difference is observed in spatial presence, where VR games (83.93%) significantly outperform PC games (35.54%), with a 48.39% gap—the most substantial observed. Similarly, VR games also hold a major advantage in replayability, with 65.46% of VR titles receiving high ratings compared to 27.06% of PC games, marking a 38.4% difference.
Storytelling elements show little variation between the two formats, with PC games (36.7%) and VR games (33.85%) receiving comparable ratings, showing only a 2.85% difference. Monetization models also show a higher percentage of positive ratings in VR, with 49.14% of VR games receiving high ratings compared to 37.3% of PC games, a difference of 11.84%. Similarly, the multiplayer experience is rated more favorably in VR, with 33.51% of VR games receiving high ratings compared to 14.65% of PC games, showing an 18.86% gap. Finally, difficulty ratings remain fairly consistent across both platforms, with VR games (4.55%) and PC games (4.39%) receiving similarly low percentages of high ratings, showing only a 0.16% difference.
4.4 Highest-rated game design elements per game genre
4.4.1 PC games
This analysis highlights the key game design elements that matter most across different genres in both PC and VR platforms, offering valuable insights for developers, designers, and publishers to optimize game features and enhance player experience.
Table 15 presents the average ratings of various game design elements across different PC game genres. Each row represents a genre, while each column corresponds to a specific game design element. The values in the table indicate the average rating given to each element within that genre.
‘Graphics’ and ‘Audio’ tend to have relatively high scores across all genres: ‘Graphics’ peaks in ‘Music’ (3.84) and ‘Puzzle’ (3.72), while ‘Audio’ receives its highest scores in ‘Music’ (4.01) and ‘Entertainment’ (3.55). ‘Avatar Customization’ is consistently rated between 3.3 and 3.8 across genres, with the highest value found in ‘Entertainment’ (3.83). ‘Gameplay’ scores are relatively consistent across genres, with ‘Entertainment’ (3.53) and ‘Strategy’ (3.41) at the higher end, while ‘Spatial Presence’ scores range between 3.1 and 3.4, with ‘Shooter’ (3.37) and ‘Horror’ (3.45) being among the highest-rated.
‘Story’ ratings show variation across genres, with ‘Horror’ (3.39), ‘RPG’ (3.26), and ‘Adventure’ (3.29) having the highest values, whereas genres like ‘Shooter’ (3.13) and ‘Sports’ (3.12) have lower ratings. ‘Community’ ratings are in the 3.0 to 3.4 range, with ‘Education’ (3.42) and ‘Adventure’ (3.15) scoring higher, while ‘Entertainment’ (2.25) has the lowest rating.
‘Monetization Model’ generally has the lowest scores among all elements, with ‘Fighting’ (2.74) and ‘Strategy’ (2.91) being at the lower end, while ‘Puzzle’ (3.30) and ‘Casual’ (3.13) score slightly higher. ‘Replayability’ ratings vary, with ‘Entertainment’ (3.54) having the highest value, while genres like ‘Horror’ (2.94) and ‘Puzzle’ (2.97) have lower scores. ‘Difficulty’ ratings are relatively close across genres, with ‘Entertainment’ (3.43) and ‘Music’ (3.00) at the higher end, and ‘Strategy’ (2.95) and ‘Simulation’ (2.95) at the lower end.
‘Multiplayer’ ratings are highest in ‘Sports’ (2.98) and ‘Fighting’ (2.95), while ‘Music’ (2.69) and ‘Horror’ (2.80) rate it the lowest. ‘Controls’ ratings fluctuate, with ‘Strategy’ (3.17) and ‘Racing’ (3.15) having the highest values, and ‘Survival’ (2.56) and ‘Horror’ (2.74) ranking lower.
4.4.2 VR games
Table 16 parallels Table 15, presenting the average ratings of game design elements across VR game genres. Unlike in PC games, ‘Spatial Presence’ scores highly across all VR genres, with ‘Strategy’ (4.23), ‘Entertainment’ (4.33), and ‘Shooter’ (4.06) receiving the highest ratings, while even the lowest-scoring genres, ‘Education’ (3.94) and ‘Battle Royale’ (3.92), remain relatively high. This contrasts with PC games, where ‘Spatial Presence’ varied more significantly across genres, showing that immersion is a more universal expectation in VR. ‘Gameplay’ ratings remain strong across genres, similar to PC, with ‘Entertainment’ (4.00), ‘Sports’ (3.91), and ‘Strategy’ (3.85) leading, while games tagged with ‘Adventure’ (3.76) and ‘Role-Playing Game’ (3.71) have slightly lower ratings. In VR, the spread of ‘Gameplay’ ratings is narrower than in PC games, where RPGs showed more differentiation.
‘Graphics’ ratings are consistently high across all genres, with ‘Entertainment’ (4.00), ‘Strategy’ (3.86), and ‘Puzzle’ (3.90) leading, while ‘Battle Royale’ (3.57) and ‘Horror’ (3.64) are among the lower-rated. The highest PC game ratings for ‘Graphics’ were around 3.84, meaning VR games tend to have slightly higher visual ratings overall, likely due to their dependence on high-quality visuals for immersion.
‘Replayability’ in VR shows a different distribution compared to PC, with ‘Sports’ (3.85), ‘Music’ (3.82), and ‘Shooter’ (3.68) being the highest-rated, while ‘Entertainment’ (2.50) is significantly lower than in PC. This suggests that VR entertainment applications often provide one-time immersive experiences rather than repeatable ‘Gameplay’ sessions.
‘Audio’ ratings in VR are highest in ‘Music’ (3.98), ‘Adventure’ (3.60), and ‘Education’ (3.47), whereas in PC, ‘Shooter’ (3.57) and ‘Fighting’ (3.58) had stronger ‘Audio’ ratings. While VR ratings remain more concentrated, PC games had a broader range of ‘Audio’ scores, showing that audio quality is prioritized differently between the two platforms.
‘Monetization Model’ ratings remain among the lowest-rated elements in both VR and PC. The highest VR scores appear in ‘Puzzle’ (3.63), ‘Shooter’ (3.63), and ‘Sports’ (3.46), while ‘Strategy’ (3.40) and ‘Fighting’ (3.50) rank lower, a pattern similar to PC games, where monetization was one of the least favored aspects across most genres.
‘Avatar Customization’ is rated highly in ‘Entertainment’ (4.00), ‘Sports’ (3.56), and ‘Education’ (3.71), while ‘Fighting’ (3.51) and ‘Battle Royale’ (3.43) are at the lower end. In PC games, ‘Avatar Customization’ followed a similar distribution, though the difference between highest- and lowest-rated genres is more pronounced in VR.
‘Community’ ratings in VR are strongest in ‘Education’ (3.85) and ‘Entertainment’ (3.50), while ‘Fighting’ (3.37) and ‘Shooter’ (3.36) have lower ratings. This follows the same pattern as PC games, where ‘Education’ and ‘Entertainment’ showed higher ‘Community’ engagement compared to competitive genres.
‘Story’ ratings are highest in ‘Entertainment’ (4.00), ‘Role-Playing Game’ (3.48), and ‘Strategy’ (3.33), while ‘Sports’ (3.26) and ‘Fighting’ (3.28) score lower. While ‘Role-Playing Games’ and ‘Adventure’ games had similarly high ‘Story’ ratings in both VR and PC, ‘Entertainment’ games score significantly higher in VR, suggesting a stronger focus on immersive storytelling.
‘Multiplayer’ ratings in VR are strongest in ‘Role-Playing Game’ (3.29), ‘Adventure’ (3.21), and ‘Sports’ (3.33), while ‘Fighting’ (3.16) and ‘Shooter’ (3.19) score slightly lower. Compared to PC, ‘Multiplayer’ scores in ‘Role-Playing Games’ are higher in VR, suggesting that social VR role-playing experiences are more emphasized than in traditional PC games.
‘Controls’ ratings are highest in ‘Shooter’ (3.28), ‘Strategy’ (3.35), and ‘Racing’ (3.28), while ‘Education’ (2.76) and ‘Horror’ (2.84) have the lowest ratings. PC games also showed high ‘Controls’ ratings for ‘Strategy’ and ‘Racing’, though ‘Education’ did not rank as low as it does in VR.
‘Difficulty’ ratings in VR are relatively even across all genres, with ‘Education’ (3.06) ranking highest, while ‘Fighting’ (2.97) and ‘Horror’ (2.95) rank lowest. This differs from PC games, where ‘Fighting’ had higher ‘Difficulty’ ratings, suggesting that VR developers may prioritize accessibility over challenge in these genres.
Figures 19 and 20 illustrate the standard deviations of the averages presented in Tables 15 and 16. The results indicate that variation in ratings across genres is generally small, with values ranging from 0.06 to 0.51. This suggests that most elements are rated fairly consistently across different genres. For example, ‘Difficulty’ (0.06) shows almost no variation, meaning players perceive challenge levels similarly regardless of genre on both PC and VR. In contrast, ‘Multiplayer’ (0.51) exhibits the widest spread on PC, making it highly genre-dependent.
In comparison, differences across platforms are more moderate but still meaningful. As shown in Table 17, values range from as low as 0.01 up to about 0.70, indicating that certain ‘Design Elements’ vary more noticeably between PC and VR experiences. For instance, ‘Multiplayer’ (0.70) and ‘Spatial Presence’ (0.57) show the strongest platform differences, suggesting that social interaction and immersive presence are perceived quite differently depending on the medium (PC or VR). Conversely, ‘Difficulty’ (0.01) and ‘Audio’ (0.07) remain highly consistent, with little perceptual difference between PC and VR players.
4.5 Predicting what game genres have the biggest impact on ratings
Table 18 presents the predicted average overall rating for different game genres in VR and PC gaming, based on the XGBoost regression model. The values represent the expected user enjoyment scores for each genre on their respective platforms.
In the VR Games category, the highest predicted ratings are for ‘Role-Playing Game’ (3.14) and ‘Strategy’ (3.14), both receiving the same expected enjoyment level. ‘Battle Royale’ (3.13) follows closely behind, with only a minor difference from the top-rated genres. ‘Music’ (3.09), ‘Shooter’ (3.09), and ‘Fighting’ (3.09) share identical predicted ratings, forming the next group of highly-rated genres, slightly below the leading ones. ‘Sports’ (3.08) and ‘Survival’ (3.08) also have nearly identical predicted values, ranking just below the previous group. ‘Racing’ (3.07) and ‘Simulation’ (3.06) follow closely, maintaining a small gap from the higher-scoring genres. Further down, ‘Action’ (3.04), ‘Casual’ (3.04), and ‘Adventure’ (3.02) show slight decreases in predicted ratings, with ‘Puzzle’ (3.01) positioned just below them. ‘Horror’ (2.97) ranks slightly lower than ‘Puzzle’, while ‘Education’ (2.95) follows closely. The lowest-rated genre in this category is ‘Entertainment’ (2.36), which is significantly lower than all other genres.
While ‘Role-Playing Game’ and ‘Strategy’ rank at the top in VR gaming, ‘Battle Royale’ (2.89) receives the highest predicted rating in PC games. ‘Fighting’ (2.88), ‘Sports’ (2.87), and ‘Music’ (2.86) follow closely, all within a narrow range. ‘Shooter’ (2.85) and ‘Action’ (2.85) maintain identical predicted scores, positioning them just below the leading group. Further down, ‘Racing’ (2.84), ‘Strategy’ (2.82), and ‘Role-Playing Game’ (2.81) show small differences, indicating a slightly lower expected enjoyment compared to their VR counterparts (3.14 for both ‘Role-Playing Game’ and ‘Strategy’). ‘Casual’ (2.79), ‘Adventure’ (2.78), and ‘Simulation’ (2.78) are positioned closely together, showing minimal variation in their expected ratings. ‘Survival’ (2.77) and ‘Horror’ (2.75) rank just below them, while ‘Puzzle’ (2.73) is next in line. At the lower end of the predictions, ‘Education’ (2.73) and ‘Entertainment’ (2.64) receive the lowest ratings, with ‘Entertainment’ ranking the lowest among all PC game genres. While ‘Entertainment’ games score slightly higher in PC gaming (2.64) compared to VR (2.36), they remain the least favorably rated genre overall.
5 Discussion
5.1 Discussion of key findings
5.1.1 Research question 1: What are the key game design features that contribute most significantly to player enjoyment in PC versus VR games?
RQ1 examines which design features matter most on each platform. Our results point to ‘Spatial Presence’, ‘Graphics’, ‘Audio’, and ‘Gameplay’ as the most consistent drivers of high ratings, with clear platform-specific patterns.
‘Spatial Presence’ stands out in VR, where immersion is valued more strongly than on PC. This supports the common view that presence is central to virtual reality [45]. Audio relates closely to overall reception on both platforms, though PC titles tend to edge VR slightly. This may reflect higher expectations for spatial and three-dimensional sound in immersive settings. ‘Gameplay’ and ‘Replayability’ also favor VR. Players often describe the physical, hands-on interaction as fresh and engaging across sessions. This kind of deeply engaging, interactive play is reminiscent of a flow state, a condition of optimal experience where challenge and skill are balanced, which can greatly enhance enjoyment [45]. By contrast, on PC sustained interest more often comes from broad content variety, social features, or community activity. The emphasis on community and social play in PC titles suggests that players value social connectedness, aligning with the relatedness need described in self-determination theory [46].
5.1.2 Research question 2: In what ways do monetization strategies affect player enjoyment in free-to-play versus non-free-to-play games?
RQ2 looks at how monetization shapes player enjoyment across free-to-play and non-free-to-play titles. In PC games, pricey releases that add microtransactions or frequent DLC tend to draw strong pushback, in line with prior work on player resistance to paywalls [47]. From a motivation perspective, this reaction reflects how aggressive monetization can undermine players’ sense of autonomy and intrinsic motivation, thereby diminishing enjoyment [46]. Players generally want to feel in control of their play experience; if monetization schemes (e.g., paywalls or pay-to-win mechanics) are perceived as controlling or unfair, they erode the fun. Indeed, self-determination theory posits that when external rewards or costs pressure the player, they can thwart autonomy and reduce intrinsic enjoyment [46]. Multiplayer reception also slips as price rises, while lower-cost and free-to-play multiplayer games often build stronger communities, as seen with League of Legends and CS:GO [48, 49]. The thriving communities in these free titles suggest that social bonds can compensate for, or even enhance, enjoyment in the absence of upfront costs. In other words, a vibrant community can fulfill players’ relatedness needs, reinforcing their enjoyment even when the game’s monetization relies on optional purchases [46].
Furthermore, price relates to several design elements. Higher-priced games more often focus on avatar customization, graphics, controls, and replayability, which fits the idea that premium titles invest in visuals, polish, player expression, and long-term value [50]. By contrast, cheaper PC games tend to feel more demanding, which suggests sharper learning curves or less refined mechanics, while premium titles aim for a more accessible experience [51].
Story shows a split by platform. In VR, non-free-to-play games are closely linked with richer storytelling, hinting that premium VR releases put more weight on narrative quality. In PC, story quality appears less tied to price, which means strong narratives can show up at many price points.
On the other hand, VR pricing shows only a weak link with how players view monetization [52]. Many VR players accept paying up front and have encountered fewer pay-to-win or microtransaction-heavy models. VR also receives a larger share of positive views on monetization compared to PC. Taken together, if a PC game is free-to-play, players are more open to cosmetic or optional purchases, while premium PC games that stack on extra costs are judged more harshly.
5.1.3 Research question 3: What differences exist in player enjoyment between PC and VR game genres?
RQ3 examines how enjoyment differs by genre across PC and VR. Overall, VR titles tend to receive warmer responses, and this is most visible in features such as ‘Gameplay’, ‘Spatial Presence’, and ‘Replayability’. These patterns agree with prior work on VR user experience that links immersion with higher enjoyment [53].
By genre, games tagged with ‘Role-Playing Game’ and ‘Strategy’ stand out in VR, where immersion and deliberate decision making fit the strengths of the medium. These trends suggest that the strong presence and embodied play in VR can elevate genres reliant on narrative, tactics, or exploration. For example, a rich RPG story or complex strategy scenario may feel more compelling when the player is immersed in it [45]. ‘Battle Royale’ titles perform best on PC yet still earn solid reception in VR, indicating that fast-paced competitive play is enjoyable on both platforms, though players might appreciate it for different reasons in each context.
Clear platform preferences also appear. PC games lean toward ‘Casual’ and ‘Adventure’ genres, reflecting broad reach and easy entry. VR catalogs favor ‘Action’ and ‘Simulation’ genres, where hands-on interaction and spatial awareness are central. Multiplayer experiences often feel more engaging in VR, helped by the heightened sense of presence, spatial audio, and the ability to read others’ body language in-game. This richer social presence in VR means cooperative or competitive play can better satisfy players’ social needs, an echo of the relatedness factor in the player motivation theories detailed by Grasse et al. [46]. In an RPG, for instance, interacting with other players’ avatars in an embodied way can deepen camaraderie and enjoyment. ‘Controls’ and precision matter in both ecosystems (particularly for genres like ‘Racing’ or ‘Strategy’), but we see that VR releases often aim for approachability over very high difficulty in genres such as ‘Fighting’ or ‘Horror’. This difference might be due to VR developers balancing challenge to avoid player frustration or motion discomfort. From a flow perspective, keeping difficulty approachable helps a wider range of players enter a flow state of enjoyment rather than becoming overwhelmed [45]. In summary, the genre-based findings reinforce that VR’s strength lies in providing immersive, physically engaging experiences that align with theories of presence and flow, whereas PC’s strength is its versatility and social connectivity, speaking to a wide spectrum of player motivations.
5.1.4 Research question 4: How can generative AI techniques be employed to identify the primary factors influencing positive player reviews across different game genres and platforms?
RQ4 examines how generative AI can surface what drives positive reviews across genres and platforms. Our pipeline combines large-scale automated text summarization, sentiment analysis, and topic modeling; together, these methods reveal patterns that are hard to find by hand.
Sentiment extracted from player comments points to problem pairings, such as price with difficulty and price with monetization, and flags areas that may need attention, such as VR audio design. Topic modeling groups comments into clear themes, including pay-to-win frustration, concerns about fairness in skill-based systems, and dislike of short campaigns in expensive VR adventure games. Prior work, for example Guzsvinecz, shows that these methods can also expose genre-specific details that do not always appear with manual coding [5].
In sum, generative AI offers a scalable, data-driven way to learn which design features attract players and which push them away, letting developers and researchers refine their games and studies with greater focus and precision.
5.2 Implications of the study
The findings from this study offer practical and theoretical implications that can guide both industry practitioners and academic researchers. In this section we focus on how to act on them.
First, platform matters. Developers should tailor design to the strengths of each platform. In VR, prioritize immersion and physical engagement, with strong motion controls, high-fidelity visuals, and polished interactive mechanics that support ‘Spatial Presence’, ‘Gameplay’, and ‘Replayability’. On PC, give careful attention to audio design and to monetization choices. Players respond better when pricing and added purchases feel transparent and fair, which helps sustain goodwill and loyalty.
Second, integrate generative AI into the development and research cycle. Automated text summarization, sentiment analysis, and topic modeling let teams process large volumes of feedback, surface recurring pain points, and monitor how attitudes change over time. This supports quicker design pivots and clearer product decisions. The same approach can extend to mobile and console as cross-platform play grows.
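The monitoring idea above can be sketched in a few lines. The sketch assumes reviews have already been scored by a sentiment classifier on a -1 to 1 scale; the dates and scores are invented for illustration.

```python
# Sketch of tracking sentiment over time from already-scored reviews.
# Scores are hypothetical outputs of an SLM-based sentiment classifier.
from collections import defaultdict
from statistics import mean

scored_reviews = [
    ("2024-01-05", 0.8), ("2024-01-20", 0.4),
    ("2024-02-02", -0.3), ("2024-02-15", -0.6),
]

# Bucket scores by calendar month (the YYYY-MM prefix of the date).
by_month = defaultdict(list)
for date, score in scored_reviews:
    by_month[date[:7]].append(score)

# Average per month; a sudden dip flags a patch, pricing change,
# or controversy worth investigating.
trend = {month: mean(scores) for month, scores in sorted(by_month.items())}
print(trend)
```

In practice the same aggregation would run over the full review store (e.g. a BigQuery table), but the grouping logic is identical.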
Third, match genre to platform strengths. VR tends to reward genres that lean on deep interaction and narrative, such as RPG and Strategy, while some casual or educational titles may struggle to reach high enjoyment without careful adaptation. Aligning content with what each platform does best is essential for positive user experiences.
5.3 Limitations of the study
While this study provides valuable insights into player sentiment and game design trends in the post-pandemic gaming landscape, several limitations must be acknowledged. The focus on games released between 2020 and 2024, while justifiable due to shifts in player behavior, restricts the generalizability of findings to earlier or future releases. The presence of review bombing also poses a challenge for sentiment analysis, which may not always reflect genuine player experiences [7].
In addition, relying on generative AI for text tokenization can introduce bias, since models may misread sarcasm, humor, or context, which can affect sentiment classification [12]. Finally, due to limited resources and the compute required to process each review with the Phi-4 SLM, only the 100 most-rated reviews were analyzed for each of the 4,856 PC and VR games. This sampling choice can underrepresent lesser-known or niche titles and can emphasize mainstream preferences, overlooking perspectives from smaller player communities.
6 Conclusion and future works
In this study, we analyzed reviews from 4,856 PC and VR games, using a generative AI pipeline to uncover what drives player enjoyment across genres and platforms. The key takeaways:
- Platform matters. VR tends to earn stronger responses for immersion and hands-on interaction, while PC strengths vary by element and audience. Designing to each platform’s capabilities is key.
- Monetization shapes sentiment, especially for premium PC titles where added costs can strain goodwill. Clear, optional purchase models are received more positively, and VR players are generally more comfortable paying up front.
- Genre and platform should be matched with care. Story-heavy and strategy-focused experiences align well with VR’s sense of presence, while several competitive and pick-up-and-play genres remain a better fit for PC.
- Generative AI methods, including text summarization and topic modeling, help teams process large volumes of feedback and surface themes such as pay-to-win concerns and the importance of fair difficulty. These tools provide a practical, data-driven basis for design decisions.
Together, these points suggest that understanding the interplay among immersion, monetization, and player expectations is central to making more engaging games. Developers and publishers can use these insights to tune price models, adjust difficulty, and refine interactive systems. Researchers can apply the same approach to other platforms and contexts to test how robust these patterns are.
Future studies should widen the release window beyond 2020 to 2024 so results generalize to earlier and future titles. Methods that detect and reduce review bombing are also needed, for example agent-based AI workflows that combine gameplay telemetry, community discussions, and streaming interactions to spot manipulative or inauthentic reviews. Model accuracy can improve by refining Transformer models so they read sarcasm, humor, and context more reliably, supported by targeted human checks where needed. Finally, to reduce sampling bias, researchers can analyze more than the top 100 reviews per game and use stratified sampling that includes both popular and niche titles, while investing in more efficient processing so larger review sets are practical.
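The stratified-sampling suggestion could be implemented along the following lines. The sketch is a minimal illustration: the game names, tier cut-offs, and per-tier quota are all hypothetical.

```python
# Sketch of stratified sampling by popularity tier, so niche titles
# are represented alongside mainstream ones. All values are hypothetical.
import random

# Total review counts per game (invented for illustration).
games = {
    "mainstream_shooter": 120_000,
    "mid_tier_rpg": 8_000,
    "niche_vr_puzzle": 300,
    "indie_platformer": 150,
}

def tier(review_count: int) -> str:
    """Assign a popularity stratum from a game's total review count."""
    if review_count >= 50_000:
        return "popular"
    if review_count >= 1_000:
        return "mid"
    return "niche"

# Bucket games by stratum, then draw an equal quota from each bucket,
# instead of letting popular titles dominate the sample.
random.seed(0)
buckets: dict[str, list[str]] = {}
for name, count in games.items():
    buckets.setdefault(tier(count), []).append(name)

quota = 1
sample = [
    g
    for names in buckets.values()
    for g in random.sample(names, min(quota, len(names)))
]
print(sample)  # one game drawn from each popularity tier
```

The same per-stratum quota idea extends to sampling reviews within a game, e.g. drawing from both highly rated and low-visibility reviews.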
Data availability
The datasets analyzed in this study were collected from publicly accessible sources, namely Steam and the Meta Quest stores. The processed dataset, including quantified game design elements and metadata, is available upon request. Code used for data processing and analysis is available upon request.
References
GlobeNewswire (2024) Video games market size expected to reach USD 664.96 billion by 2033, registering a CAGR of 10.32% from 2024 to 2033. NASDAQ OMX Corporate Solutions, Inc.
Sweetser P, Wyeth P (2005) GameFlow: A Model for Evaluating Player Enjoyment in Games
Caroux L, Pujol M (2024) Player enjoyment in video games: a systematic review and meta-analysis of the effects of game design choices. Int J Hum Comput Interact 40(16):4227–4238. https://doi.org/10.1080/10447318.2023.2210880
Guzsvinecz T, Szűcs J (2023) Length and sentiment analysis of reviews about top-level video game genres on the steam platform. Comput Human Behav. https://doi.org/10.1016/j.chb.2023.107955
Guzsvinecz T (2023) The correlation between positive reviews, playtime, design and game mechanics in souls-like role-playing video games. Multimed Tools Appl 82(3):4641–4670. https://doi.org/10.1007/s11042-022-12308-1
Lin D, Bezemer CP, Zou Y, Hassan AE (2019) An empirical study of game reviews on the Steam platform. Empir Software Eng 24(1):170–207. https://doi.org/10.1007/s10664-018-9627-4
Epp R, Lin D, Bezemer CP (2021) An empirical study of trends of popular virtual reality games and their complaints. IEEE Trans Games 13(3):275–286. https://doi.org/10.1109/TG.2021.3057288
Lu Y, Ota K, Dong M (2024) An empirical study of VR head-mounted displays based on VR games reviews. Games: Research and Practice 2(3):1–20. https://doi.org/10.1145/3665988
Yoon D-M, Han S-H, Park I, Chung T-S (2024) Analyzing VR game user experience by genre: a text-mining approach on Meta Quest Store reviews. Electronics 13(19):3913. https://doi.org/10.3390/electronics13193913
Li X, Zhang Z, Stefanidis K (2021) A data-driven approach for video game playability analysis based on players’ reviews. Information. https://doi.org/10.3390/info12030129
Viggiato M, Lin D, Hindle A, Bezemer CP (2021) What causes wrong sentiment classifications of game reviews. IEEE Trans Games. https://doi.org/10.1109/TG.2021.3072545
Krugmann JO, Hartmann J (2024) Sentiment analysis in the age of generative AI. Customer Needs Solutions 11(1). https://doi.org/10.1007/s40547-024-00143-4
Rendevski N et al (2022) PC VR vs Standalone VR Fully-Immersive Applications: History, Technical Aspects and Performance, in 2022 57th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST), IEEE, Jun. pp. 1–4. https://doi.org/10.1109/ICEST55168.2022.9828656
Kang H-N (2017) A study of analyzing on online game reviews using a data mining approach: STEAM community data. Int J Innov Manage Technol 8(2):90–94. https://doi.org/10.18178/ijimt.2017.8.2.709
Meta, meta.com. Accessed: Dec. 02, 2024. [Online]. Available: https://www.meta.com/experiences/
Alsop T, Share of Steam users with a virtual reality (VR) headset worldwide as of September 2024, by device. Statista. [Online]. Available: https://www.statista.com/statistics/265018/proportion-of-directx-versions-on-the-platform-steam/
Pagano D, Maalej W (2013) User feedback in the appstore: An empirical study, in 2013 21st IEEE International Requirements Engineering Conference (RE), IEEE, Jul. pp. 125–134. https://doi.org/10.1109/RE.2013.6636712
Viggiato M, Bezemer CP (2024) Leveraging the OPT Large Language Model for Sentiment Analysis of Game Reviews, IEEE Trans Games, vol. 16, no. 2, pp. 493–496, Jun. https://doi.org/10.1109/TG.2023.3313121
Yu Y, Dinh DT, Nguyen BH, Yu F, Huynh VN (2023) Mining insights from esports game reviews with an aspect-based sentiment analysis framework. IEEE Access 11:61161–61172. https://doi.org/10.1109/ACCESS.2023.3285864
Dong J, Ota K, Dong M (2024) User Experience of Different Groups in Social VR Applications: An Empirical Study Based on User Reviews, IEEE Trans Comput Soc Syst, pp. 1–13, Sep. https://doi.org/10.1109/tcss.2024.3416208
Google (2025) BigQuery overview. Accessed: Mar. 01, 2025. [Online]. Available: https://cloud.google.com/bigquery/docs/introduction
Martin B (2022) Steam-Games-Scraper. GitHub. [Online]. Available: https://github.com/FronkonGames/Steam-Games-Scraper
Mukeshiyer23, MetaGamesExtraction. GitHub. Accessed: Jan. 03, 2025. [Online]. Available: https://github.com/mukeshiyer23/MetaGamesExtraction/tree/main
Zhu Z (2021) steam-review-scraper 0.1.0, May 26, 2021. PyPI. [Online]. Available: https://pypi.org/project/steam-review-scraper
VRDB (2024). Accessed: Nov. 03, 2024. [Online]. Available: https://vrdb.app/
Abdelqader H (2025) Steam games metadata and player reviews (2020–2024), Jun. 30, 2025. Mendeley Data. https://doi.org/10.17632/jxy85cr3th.2
PEGI, What do the labels mean. Accessed: Nov. 22, 2024. [Online]. Available: https://pegi.info/
VG Insights, Steam Analytics – customised video game analysis. Accessed: Nov. 22, 2024. [Online]. Available: https://vginsights.com/steam-analytics
Lee JH, Karlova N, Clarke RI, Thornton K, Perti A (2014) Facet analysis of video game genres, in iConference 2014 Proceedings, iSchools, Mar. https://doi.org/10.9776/14057
Li X (2020) Towards Factor-oriented Understanding of Video Game Genres using Exploratory Factor Analysis on Steam Game Tags, in IEEE International Conference on Progress in Informatics and Computing (PIC), IEEE, Dec. 2020, pp. 207–213. https://doi.org/10.1109/PIC50277.2020.9350753
Guo Z, Thawonmas R, Ren X (2024) Rethinking dynamic difficulty adjustment for video game design. Entertain Comput. https://doi.org/10.1016/j.entcom.2024.100663
Caroux L (2023) Presence in video games: a systematic review and meta-analysis of the effects of game design choices. Appl Ergon. https://doi.org/10.1016/j.apergo.2022.103936
Clement J (2024) Most common main languages of Steam gaming platform users in October 2024, Oct. Accessed: Nov. 30, 2024. [Online]. Available: https://www.statista.com/statistics/957319/steam-user-language/
Valve Corporation, Popular Tags. Accessed: Mar. 02, 2025. [Online]. Available: https://store.steampowered.com/tag/browse/#global_492
Uszkoreit J (2017) Transformer: A Novel Neural Network Architecture for Language Understanding, Aug. Accessed: Sep. 10, 2025. [Online]. Available: https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/
Microsoft (2024) Phi-4: Microsoft’s newest small language model specializing in complex reasoning. [Online]. Available: https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090
Colvin S Pydantic. [Online]. Available: https://docs.pydantic.dev/1.10/
Google, Looker Studio. Accessed: Jan. 19, 2025. [Online]. Available: https://cloud.google.com/looker
Gamalytic, HELLDIVERS 2 - Steam Stats. Accessed: Jan. 05, 2025. [Online]. Available: https://gamalytic.com/game/553850
GameFAQs (2025) Helldivers 2 - Statistics, Accessed: Jan. 05, 2025. [Online]. Available: https://gamefaqs.gamespot.com/pc/407551-helldivers-2/stats
Metacritic, Helldivers 2 PC User Reviews (2025), Accessed: Jan. 06, 2025. [Online]. Available: https://www.metacritic.com/game/helldivers-2/user-reviews/?platform=pc
Google (2025) Object storage for companies of all sizes, Accessed: Mar. 03, 2025. [Online]. Available: https://cloud.google.com/storage?hl=en
Asselman A, Khaldi M, Aammou S (2023) Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interact Learn Environ 31(6):3360–3379. https://doi.org/10.1080/10494820.2021.1928235
Chen T, Guestrin C (2016) XGBoost, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM, Aug. pp. 785–794. https://doi.org/10.1145/2939672.2939785
Yang S, Zhang W (2022) Presence and flow in the context of virtual reality storytelling: what influences enjoyment in virtual environments? Cyberpsychol Behav Soc Netw 25(2):101–109. https://doi.org/10.1089/cyber.2021.0037
Grasse KM, Kreminski M, Wardrip-Fruin N, Mateas M, Melcer EF (2022) Using self-determination theory to explore enjoyment of educational interactive narrative games: a case study of Academical. Front Virtual Real. https://doi.org/10.3389/frvir.2022.847120
Lemmens JS (2022) Play or pay to win: loot boxes and gaming disorder in FIFA ultimate team. Telemat Inform Rep 8:100023. https://doi.org/10.1016/j.teler.2022.100023
Song J (2023) Why League of Legends is the Greatest Multiplayer Game of All Time, Jul. Accessed: Mar. 10, 2025. [Online]. Available: https://medium.com/@jksong1106/why-league-of-legends-is-the-greatest-multiplayer-game-of-all-time-efd65b599975#:~:text=It%20may%20seem%20similar%20to,they%20will%20control%20each%20game
Moore B Counter-Strike: Global Offensive (for PC) Review, PCMAG. Accessed: Mar. 10, 2025. [Online]. Available: https://www.pcmag.com/reviews/counter-strike-global-offensive-for-pc
Sobociński MD (2019) Quality of video games: introduction to a complex issue. Quality Production Improvement - QPI 1(1):487–494. https://doi.org/10.2478/cqpi-2019-0066
Koster R (2018) The cost of games, Jan.
Buchta K et al (2022) Microtransactions in VR. A qualitative comparison between voice user interface and graphical user interface, in 2022 15th International Conference on Human System Interaction (HSI), IEEE, Jul. pp. 1–5. https://doi.org/10.1109/HSI55341.2022.9869475
Fisher N, Kulshreshth AK (2024) Exploring dynamic difficulty adjustment methods for video games. Virtual Worlds 3(2):230–255. https://doi.org/10.3390/virtualworlds3020012
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. This research received no specific grant from any funding agency, commercial entity, or not-for-profit organization. All computational resources were self-funded, and the study was conducted independently without external financial support.
Author information
Authors and Affiliations
Contributions
Hisham Abdelqader conceptualized the study, designed the methodology, and carried out the data collection and processing. The data analysis and visualization were conducted by the author using generative AI and machine learning techniques. The manuscript was drafted and revised by Hisham Abdelqader. The author takes full responsibility for the integrity and accuracy of the study’s findings.
Corresponding author
Ethics declarations
Ethical approval
This study was conducted in accordance with the ethical guidelines set forth by the University of Wollongong in Dubai. Given that this research involved the analysis of publicly available game reviews, it did not require formal approval from an institutional ethics review board. No personally identifiable data were collected, and all user-generated content was anonymized in adherence to ethical research principles.
Consent to participate
Not applicable. This study utilized publicly available data from digital game distribution platforms (Steam and Meta Quest) without direct interaction with human participants. As such, no consent to participate was required.
Consent to publish
Not applicable. The dataset used in this study consists of publicly accessible game reviews, and no proprietary or confidential information was disclosed. The findings of this research are presented in aggregate form to ensure no individual user can be identified.
Competing interests
The author declares no competing interests. The study was carried out independently, and no conflicts of interest, financial or otherwise, influenced the research process or its findings.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Abdelqader, H. Using generative AI to uncover what drives player enjoyment. Multimed Tools Appl 85, 217 (2026). https://doi.org/10.1007/s11042-026-21207-8
Received:
Revised:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1007/s11042-026-21207-8