External Correlates of Adult Digital Problem-Solving Process: An Empirical Analysis of PIAAC PSTRE Action Sequences
Abstract: Computerized assessments and interactive simulation tasks are increasingly popular and afford the collection of process data, i.e., an examinee's sequence of actions (e.g., clickstreams, keystrokes) that arises from interactions with each task. Action sequence data contain rich information on the problem-solving process but are in a nonstandard, variable-length discrete sequence format. Two methods that directly extract features from the raw action sequences, namely multidimensional scaling and sequence-to-sequence autoencoders, produce multidimensional numerical features that summarize original sequence information. This study explores the utility of action sequence features in understanding how problem-solving behavior relates to cognitive proficiencies and demographic characteristics. This is empirically illustrated with the process data from the 2012 PIAAC PSTRE digital assessment. Regularized regression results showed that action sequence features are more predictive of examinees' demographic and cognitive characteristics compared to final outcomes. Partial least squares analysis further aided the identification of behavioral patterns systematically associated with demographic/cognitive characteristics.
<cn><bold>By: Susu Zhang, Xueying Tang, Qiwei He, Jingchen Liu, Zhiliang Ying</bold>
<bold>Acknowledgement: </bold>Funding: This work was supported by the National Science Foundation (#SES-1826540, #SES-2119938, and #DMS-2310664) and by the Institute of Education Sciences, U.S. Department of Education, through Grant IES R305A210344 to Georgetown University.
Assessment of examinee proficiency using computerized simulation tasks is gaining increasing relevance in both large-scale assessments, such as the Programme for the International Assessment of Adult Competencies (PIAAC; e.g., OECD, 2012) and the Programme for International Student Assessment (PISA; e.g., OECD, 2014) surveys, and in high-stakes testing, such as the US medical licensure exam (e.g., Dillon et al., 2004). Simulation tasks are typically interactive and resemble real-life situations, requiring examinees to demonstrate the ability or skills to perform tasks that are often complex. This also introduces new measurement opportunities for the collection of process data that arise from an examinee’s interaction with each task/item. Process data are commonly logged by the computer as a time-stamped sequence of actions, such as clickstreams and keystrokes, performed by an examinee in pursuit of solving an item. In carefully engineered simulation tasks, computer-logged action sequences, which explicitly document test-taking behavior, may reveal information about the examinee’s response process. This affords analysis of the test-taking process at a larger scale compared to traditional think-aloud cognitive interviews, which typically involve a smaller number of examinees concurrently or retrospectively describing how they arrived at their answers (e.g., Ericsson & Simon, 1998).
Process data, on top of final scores, offer a wealth of information about individual differences, test-taking engagement, and the steps examinees take to reach their final response. Studies have demonstrated the utility of process data for a multitude of practical tasks: To start, process data can provide additional information on the measured proficiency or skills, allowing better measurement via process-incorporated scoring rules (Zhang et al., 2023) and process-based measurement models, which typically associate continuous latent proficiency (Chen, 2020; Han et al., 2022; LaMar, 2018; Liu et al., 2018; Xiao & Liu, 2024) or discrete latent skill mastery (Zhan & Qiao, 2022; Liang et al., 2022) with examinees’ choices of correct/incorrect subsequent actions, observed action subsequences, or sequence length. Furthermore, analyses of behavioral characteristics associated with successful/unsuccessful final performance (e.g., Gao, Cui, et al., 2022; Gao, Zhai, et al., 2022; Greiff et al., 2015; He & von Davier, 2016; Qiao & Jiao, 2018; Qiao et al., 2023; Ulitzsch et al., 2021, 2023) can inform test validation and automated scoring. Exploratory analyses of action sequences or sequence-derived patterns, often with cluster analysis (Eichmann et al., 2020; He et al., 2019; Gao, Cui, et al., 2022; Gao, Zhai, et al., 2022; He, Borgonovi, & Suárez-Álvarez, 2023; Hao & Mislevy, 2019; Ulitzsch et al., 2022) or with topic modeling of actions or subsequences (Fang & Ying, 2020; Xu et al., 2018), have revealed different behavioral prototypes among the examinees as they face the same task, providing insights on how individuals navigate and approach computerized tests, digital platforms encountered in daily life, collaborative problems, etc. Hidden Markov and neural language models applied to action sequences have also been shown to reveal stages or subtasks for solving a problem (Wang et al., 2023; Xiao et al., 2021; Xu et al., 2020).
The current paper aims to provide an approach to exploratory sequence analysis to understand the relationship between problem-solving behavior and the test taker’s external characteristics (i.e., background variables), for example, cognitive constructs other than the measured trait, demographics, and educational or job-related outcomes. Relationships between background variables and problem-solving behavior have been documented in many prior studies; for instance, demographic variables such as gender, migration status, or socioeconomic status were found related to interaction style in the PISA 2012 complex problem-solving assessment (Eichmann et al., 2020) and navigation behavior in PISA 2018 multiple-source reading tasks (He, Borgonovi, & Suárez-Álvarez, 2023). When a pattern in the problem-solving process is found associated with a background variable, the type of insight gained differs depending on whether the specific sequential pattern is theorized to provide evidence about the measured proficiency, i.e.,
Two overarching pursuits in exploratory sequence analysis for the process-background relationship are (1) to quantify the strength of association between a background variable and the problem-solving process and, where there is a substantial association, (2) to extract and interpret background-relevant sequential patterns. Despite their rich information, action sequences come in a nonstandard format: On a simulation task, each examinee’s observed process data form a temporally ordered sequence of computer-logged events. This precludes the direct use of most exploratory techniques, which require structured input data. Therefore, the first step of the proposed approach is to transform the process data into structured data, specifically numerical feature variables learned to maximally preserve original sequence information. The current study adopts two recent methods for data-driven feature extraction, namely multidimensional scaling (MDS; Tang et al., 2020) and sequence-to-sequence autoencoders (Seq2seq; Tang, Wang, et al., 2021). Both methods automatically extract numerical features from raw action sequences and do not require a priori feature engineering using domain knowledge or a term-document matrix. To quantify the strength of the association between a background variable and the problem-solving process on a simulation task, the second step builds a regression model for the background variable using the extracted sequence features; the prediction accuracy on new samples quantifies the amount of information on the background variable provided by the action sequences. Because the extracted features are high-dimensional and contain noise irrelevant to a specific background variable of interest, we employ regularized regression to perform variable selection. Moreover, features extracted in a data-driven manner are dense vectors that lack inherent interpretations.
To facilitate the identification of specific sequential patterns that explain the process-background relationship, the third step employs partial least squares analysis, which identifies a few principal variables that maximally explain the covariance between sequence features and a specified background variable. This affords inspection of how sequential patterns change as the principal variables vary from lowest to highest. We illustrate these steps via an empirical analysis of the Problem Solving in Technology-Rich Environments (PSTRE) assessment and background questionnaire data from the 2012 PIAAC survey, but our approach can generalize to the analysis of other simulation-based assessments that collect action sequences and background information.
The rest of the paper is organized as follows. The section Motivating Example introduces the 2012 PIAAC survey and the PSTRE assessment as well as the current research questions. The next section provides a review of the literature on sequence analysis applied to PIAAC PSTRE and introduces the proposed approach to exploratory sequence analysis of the process-background relationship. The Empirical Analysis Methods and Empirical Analysis Results sections present the methods and results of the empirical study. The Discussion section provides a discussion of the empirical findings, as well as their practical implications for assessment design and interventions.
Motivating Example
The PIAAC (e.g., Schleicher, 2008) is an international large-scale assessment carried out by the Organization for Economic Co-operation and Development (OECD) to assess the cognitive and workplace skills of working-age individuals worldwide. In its first cycle in 2012, working-age individuals (16–65 years) across 25 countries and regions were measured on literacy, numeracy, and PSTRE. In addition to the three cognitive assessments, participants were administered a background questionnaire, which collected self-reported information on their educational background, social background, engagement in literacy, numeracy, and use of information and communication technology (ICT) at home and at work, language background, employment information, and others such as health status and political efficacy (Kirsch & Thorn, 2013).
The PSTRE assessment consisted of two test blocks, with 14 items in total. For each PSTRE item, the test environment resembled commonly seen ICT platforms, such as e-mail clients, web browsers, and spreadsheets. Examinees were prompted to complete specific tasks on these interactive platforms. Under the PIAAC framework, PSTRE is defined as the use of digital technology, communication tools, and the internet to obtain and evaluate information, communicate with others, and perform practical tasks (OECD, 2012). As actual PSTRE items are unreleased, Figure 1 presents an item from the OECD Education and Skills Online assessment, which illustrates the interface of the PSTRE items. An examinee works in the simulated web browser to complete the task described on the left: Five web pages (1st subfigure) are returned from a search for “Job search,” and examinees are asked to bookmark all pages that do not require registration or fees. Clicking on each link will direct them to the corresponding website. For example, clicking the second link, “Work Links,” directs an examinee to the second subfigure, and further clicking on “Learn More” directs the examinee to the third subfigure. To finish and exit the item, they can click on the right arrow icon (“Next”) below the item instructions, and a pop-up window will appear with two options, namely confirming exit (“Next_OK”) or returning to the task (“Next_Cancel”). An action sequence on the task will consist of the clicks and keystrokes by the examinee on the simulated browser, with “Start” as the initial action and “Next_OK” at the end.
For example, if an examinee clicked the second link (“Work links”) on the initial page, clicked “Learn More” on the next page, clicked the “Back” button on the toolbar twice to return to the home page, and clicked “Next” in the left panel and “OK” in the pop-up window, this examinee’s action sequence will be recorded as “Start, Click_W2, Click_Learn_More, Toolbar_Back, Toolbar_Back, Next, Next_OK.” Based on predefined scoring rubrics, the PSTRE assessment computed a final binary or polytomous score for each item, which was used to estimate individuals’ PSTRE proficiency.
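The recording scheme just described can be sketched as a small parser. The (timestamp, action) log format and the helper name below are hypothetical stand-ins for illustration, not the actual PIAAC log-file format:

```python
# Toy sketch of the logging scheme described above. The (timestamp, action)
# tuples and the function name are hypothetical illustrations.
def to_action_sequence(events):
    """events: iterable of (timestamp, action_code); returns the ordered
    action sequence with the implicit "Start" action prepended."""
    ordered = sorted(events, key=lambda e: e[0])
    return ["Start"] + [code for _, code in ordered]

log = [
    (3.2, "Click_W2"),
    (7.9, "Click_Learn_More"),
    (12.1, "Toolbar_Back"),
    (13.0, "Toolbar_Back"),
    (20.4, "Next"),
    (21.1, "Next_OK"),
]
print(", ".join(to_action_sequence(log)))
# Start, Click_W2, Click_Learn_More, Toolbar_Back, Toolbar_Back, Next, Next_OK
```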
<anchor name="fig1"></anchor>
The PSTRE assessment measured adults’ abilities to solve problems in personal, work, and civic contexts using digital environments, with the aim of better understanding how working-age adults utilize digital tools in practical problem-solving, thereby providing insights to policymakers and educators on fostering digital literacy and addressing skill gaps. Across several studies, PSTRE proficiency was found to be associated with demographic characteristics as well as employment outcomes of the participants (He et al., 2019; Liao et al., 2019; Nwakasi et al., 2019). Although the strength of the association differed across countries with varying policies, labor market structures, and social contexts, PSTRE proficiency was positively associated with self-reported income in many countries or regions. Participation in adult education and training, which is expected to increase exposure to ICT tools, was consistently found to be associated with higher PSTRE proficiency. Furthermore, gender and age differences were found in PSTRE proficiency, with female and older adult participants receiving lower PSTRE scores in many countries and regions (He et al., 2021; Liao et al., 2019).
The high demand for digital literacy in most economic activities underscores the importance of PSTRE skills for the workforce. This calls for an understanding of the sequential patterns in digital problem-solving that explain the differences in PSTRE scores observed for individuals with varying demographics, exposure to adult training and ICT tools, and employment outcomes, as documented in the existing literature. The current study explores the relationship between how adults solve PSTRE problems, as reflected through the PSTRE action sequence data, and background variables. Six background variables were considered: two demographic variables, namely age (in years) and gender (male or female); one employment outcome variable, namely the log of country median-adjusted hourly income (log(Income)); and three variables related to education and ICT exposure, namely self-reported ICT skill use at home (ICTHome) and at work (ICTWork) and country median-adjusted total years of education (YRSEdu).<anchor name="b-fn1"></anchor><sups>1</sups> As some of the PSTRE tasks appeared to also involve numeracy skills (e.g., spreadsheets), performance (i.e., plausible values<anchor name="b-fn2"></anchor><sups>2</sups>) on numeracy was also included as an external cognitive variable. Specifically, the following research questions are addressed:
Analysis of PSTRE Action Sequence and Background-Process Relationship
The PIAAC PSTRE process data have been shown in multiple previous studies to provide valuable additional information on individuals’ problem-solving processes beyond the final scores. For example, on tasks involving spreadsheets, He and von Davier (2016) used
The relationship between the PSTRE problem-solving process and background variables has also been explored in several previous studies. Some examined demographic differences in globally defined problem-solving characteristics. For example, He et al. (2021) employed the longest common subsequence approach to examine how problem-solving efficiency and sequence similarity to a reference sequence (reflecting optimal problem-solving) on the seven items in the 2nd PSTRE block relate to demographic characteristics such as gender, age, and familiarity with digital platforms. A few studies also explored the demographic differences in task-specific behavior, for instance, key actions or short subsequences of actions (i.e.,
Action sequences come in a nonstandard format. As a toy example, below are three arbitrarily selected participants’ observed action sequences on item U06b, omitting “Start” and “Next, Next_OK” at the beginning and the end:
On the same item, the number of actions performed by each individual differed. An examinee’s observed sequence contains a list of temporally ordered categorical actions (e.g., “Click_W4”), with the number of possible actions ranging between 26 and 636 on the 14 items. A common approach in sequence analysis is to first transform the original variable-length, ordered, categorical sequences, which preclude most statistical methods, into rectangular data: In doing so, a set of numeric features is extracted to preserve original sequence information. Many methods can accomplish this task. One approach is via a
The goal is to extract
Seq2seq feature extraction (Tang, Wang, et al., 2021), on the other hand, aims to preserve information that can be used for sequence reconstruction: The task is achieved by training on the
MDS and Seq2seq both perform sequence-based feature extraction in that the feature extraction is conducted to reconstruct original sequences or pairwise differences between the original sequences. Compared to term-document-matrix-based methods for sequence feature extraction, which often adopt matrix factorization (e.g., latent semantic analysis, non-negative matrix factorization; Deerwester et al., 1990; Lee & Seung, 1999) to find lower-dimensional features that can reconstruct the term-document matrix, MDS and Seq2seq are expected to capture additional information beyond standalone frequencies of single actions or short subsequences, including the ordering of actions and long-term effects. On the PSTRE data, feature extraction based on MDS and Seq2seq has documented performance in preserving original sequence information (e.g., Tang et al., 2020; Tang, Wang, et al., 2021; Zhang et al., 2023), and software for both is available in the ProcData (Tang, Zhang, et al., 2021) R package. We chose MDS and Seq2seq for feature extraction because the objective of both methods is to maximally preserve the original sequence information. This preservation allows the relationships between background variables and sequential patterns to be retained as much as possible in the transformed feature space.
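The MDS route can be illustrated with a minimal Python sketch. The paper’s analyses used the ProcData R package; the sketch below substitutes a length-normalized Levenshtein distance for the sequence dissimilarity (an assumption; Tang et al., 2020, define the dissimilarity actually used) and metric MDS from scikit-learn, on three toy sequences:

```python
import numpy as np
from sklearn.manifold import MDS

def edit_distance(a, b):
    """Levenshtein distance between two action sequences (lists of action tokens)."""
    m, n = len(a), len(b)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(m + 1), np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + sub)
    return int(d[m, n])

seqs = [
    ["Start", "Click_W2", "Click_Learn_More", "Next", "Next_OK"],
    ["Start", "Click_W2", "Toolbar_Back", "Next", "Next_OK"],
    ["Start", "Next", "Next_OK"],
]
n = len(seqs)
diss = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # Normalize by the longer sequence so long sequences do not dominate
        diss[i, j] = diss[j, i] = edit_distance(seqs[i], seqs[j]) / max(len(seqs[i]), len(seqs[j]))

# Embed each sequence as a K-dimensional feature vector whose pairwise
# Euclidean distances approximate the sequence dissimilarities
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
features = mds.fit_transform(diss)
print(features.shape)  # (3, 2)
```

In the paper, K is on the order of 100 rather than 2; the principle is the same.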
<h31 id="zfp-232-2-120-d131e544">Quantifying Sequence-Background Association</h31>
For each examinee and item, the extracted MDS or Seq2seq features are dense
<anchor name="tbl1"></anchor>
When an outcome variable is well-predicted with the sequence features of an item, interpretations can be sought to uncover the specific sequential patterns associated with the variable. Although the sequence features are high-dimensional, one may conjecture that patterns relevant to the background may be represented with just a few principal dimensions. Partial least squares (PLS; Wold et al., 2002) decomposition can perform dimension reduction on the
The dimension for the PLS approximation,
as well as the standard error of RMSEP. The number of PLS components to retain,
Empirical Analysis Methods
<h31 id="zfp-232-2-120-d131e752">Data and Instruments</h31>
The current study used the PSTRE item-level sequence data and background questionnaire data from 3,645 examinees in five countries or regions, including the United Kingdom (England and Northern Ireland), Ireland, Japan, the Netherlands, and the United States. These examinees were administered all 14 PSTRE items, presented as two blocks of seven items each.<anchor name="b-fn3"></anchor><sups>3</sups> Action sequence data were available for each participant at the item level. Table 2 presents a brief description of the 14 PSTRE items, including task names, the types of environments involved, percent of individuals who received full credit, and descriptive statistics of the action sequences. The background variables were available in the PIAAC background questionnaire data. Here, to reduce the nuisance introduced by country effects, we converted hourly income to US dollars based on 2012 exchange rates and adjusted income and YRSEdu by subtracting the country’s median on that variable. There was missingness for a few background variables, including log(Income), ICTHome, ICTWork, and YRSEdu. For predicting a particular variable, individuals with missing values were case-wise deleted. In particular, two outcome variables, ICTWork and log(Income), contained a substantial amount of missingness across all countries or regions (31%–45%). ICTHome and YRSEdu contained a small to moderate amount of missingness across countries or regions (0%–13%). The missingness proportions on these variables for each of the five countries and regions are reported in Table E1 in ESM 1, Appendix III.
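The adjustment and deletion steps above can be sketched in pandas. The column names and exchange rates below are hypothetical illustrations, not the actual PIAAC variable names or 2012 rates:

```python
import pandas as pd

# Minimal sketch with hypothetical column names and illustrative exchange rates.
df = pd.DataFrame({
    "country": ["USA", "USA", "JPN", "JPN", "IRL"],
    "hourly_income_local": [25.0, 18.0, 2400.0, None, 21.0],
    "yrs_edu": [16.0, 12.0, 14.0, 16.0, None],
})
usd_per_local = {"USA": 1.0, "JPN": 0.0125, "IRL": 1.29}
df["income_usd"] = df["hourly_income_local"] * df["country"].map(usd_per_local)

# Subtract each country's median to reduce nuisance country effects
for col in ["income_usd", "yrs_edu"]:
    df[col + "_adj"] = df[col] - df.groupby("country")[col].transform("median")

# Casewise deletion: drop examinees missing the variable being predicted
income_sample = df.dropna(subset=["income_usd_adj"])
print(len(income_sample))  # 4
```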
<anchor name="tbl2"></anchor>
<anchor name="tbl3"></anchor>
In the current study,
To draw predictions about each background variable, for each item, three different types of predictors were considered, namely (1) the polytomous final score on the item, (2) the 100-dimensional principal sequence features extracted using the Seq2seq, and (3) the 100-dimensional principal features from MDS. For each item and background variable, 10 replications were carried out for each prediction model. In each replication, the data were randomly partitioned into three subsets, a training set (70%), a validation set (10%), and a test set (20%). The parameters of the GLM were estimated based on the training data, the optimal weight penalty for
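The split-and-tune scheme can be sketched with a generic lasso regression on synthetic data; the features, outcome, and penalty grid below are illustrative stand-ins, not the study's actual model or data:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one item's 100-dimensional sequence features and a
# continuous background variable with a sparse signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 100))
y = X[:, :5] @ np.array([1.0, -0.8, 0.6, 0.5, -0.4]) + rng.normal(size=1000)

# 70/10/20 train/validation/test split, as in the empirical study
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=2/3, random_state=0)

# Choose the penalty weight on the validation set...
alphas = [0.001, 0.01, 0.1, 1.0]
val_mse = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_tr, y_tr)
    val_mse.append(np.mean((model.predict(X_val) - y_val) ** 2))
best_alpha = alphas[int(np.argmin(val_mse))]

# ...then quantify out-of-sample association on the held-out test set
final = Lasso(alpha=best_alpha, max_iter=10_000).fit(X_tr, y_tr)
r = np.corrcoef(final.predict(X_te), y_te)[0, 1]
print(best_alpha, round(float(r), 2))
```

The held-out correlation (or prediction accuracy for gender) is the quantity compared across predictor types and items.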
For select items whose process features demonstrated especially high associations with a specific background variable, interpretations for the process-variable associations were further sought. PLS decomposition was applied to the 100-dimensional sequence features to find the top
Empirical Analysis Results
<h31 id="zfp-232-2-120-d131e814">Quantifying Background-Sequence Relationship (RQ1)</h31>
For each item, the prediction accuracy of the continuous variables and gender are reported in Figures 2 and 3. In each subplot, the
<anchor name="fig2"></anchor>
<anchor name="fig3"></anchor>
Although all 14 items were designed to measure the same trait (PSTRE), there was noticeable heterogeneity across items in the strength of their associations with various background variables. Features extracted from an item’s action sequences frequently showed a remarkable amount of predictive power for the background variables, but the magnitude of this predictive power differed vastly, both across background variables and across items. The sequence features in general showed higher associations with participants’ age and numeracy than with some of the employment outcome-related (i.e., income) and education/ICT exposure-related variables. For the same background variable, the strength of the sequence-background association also showed great variability across items: Taking age as an example, the
From the results on the general prediction accuracy, it was observed that not only do sequence features provide additional information about the examinees, but the prediction powers of different items also differed widely. This calls for a closer look at action sequences on tasks with strong background associations. As the MDS features often demonstrated the highest prediction power, for each continuous variable, we performed PLS decomposition on a selection of items where the MDS features had the highest predictive power of the external variable. Depending on the item and the external variable, the number of PLS components extracted based on the RMSEP criterion differed. Table 4 presents a summary of the PLS components identified for each background variable and item, the Pearson correlation between the PLS component score with the external variable (
<anchor name="tbl4"></anchor>
<bold>Age</bold>
We performed PLS decomposition on four items whose MDS sequence features showed high associations with age, namely U01a, U01b, U16, and U19a. By inspecting the PLS components for age across items, we found the following trends:
<bold>Income</bold>
When it comes to the participants’ hourly income, the MDS features from U16, U19a, and U19b demonstrated higher prediction accuracy compared to the others. PLS decomposition was hence applied to these three items. The identified PLS components were uniformly found to reflect efficient strategies for problem-solving, specifically the use of keyboard shortcuts for copy/pasting (i.e., Ctrl + C, Ctrl + V, item U16) and the tendency to use “search” or “sort” in spreadsheets (items U19a, U19b). These patterns related to efficient problem-solving tended to contain information that is relevant to the assessed PSTRE proficiency but was not directly considered in final scoring. For instance, on item U16, regardless of whether the examinee used keyboard shortcuts for copy/pasting, full score was given if the examinee completed the task (sending the requested information via e-mail). However, among participants who received full score on the item, those who used copy/pasting had higher overall PSTRE proficiency (
<bold>ICT Use at Home and at Work</bold>
PLS decomposition was performed on two items showing larger MDS features-ICTHome association, namely U01b and U16. On item U01b, it was observed that individuals with more self-reported use of ICT tools at home were more likely to create e-mail folders, which was a key step to correctly solving U01b. The second PLS component on U01b was a country artifact: Participants from Japan generally had lower usage of ICT at home, and the PLS score was associated with Japanese keystrokes. On U16, higher ICTHome individuals were generally higher on PLS components for (1) more keyboard entry (PLS 1) and (2) more usage of keyboard shortcuts for copy/pasting (PLS 2). For self-reported use of ICTWork, we performed PLS analysis on one item, U01b, and one PLS component was found, which was related to the creation of new folders for e-mails.
<bold>Total Years of Formal Education</bold>
We report the PLS analysis results and interpretations of two items with relatively high prediction power on the examinees’ total years of education, namely U04a and U16. On U04a, which involved creating a spreadsheet based on the information described in an e-mail, a PLS component positively related to years of education was found. This component was associated with a higher number of correct numerical entries in the spreadsheet. On U16, two PLS components positively associated with years of education were found, the first related to cc’ing (either typing in the “cc” field or using “reply all”) other recipients when drafting the e-mail, and the second related to using keyboard shortcuts for copy/paste.
<bold>Numeracy</bold>
For numeracy, we focused on the interpretation of item U04a, which required examinees to synthesize the information from the e-mail into a spreadsheet table. Participants needed to enter both the column/row titles and the numerical entries. Three PLS components positively correlated with numeracy proficiency were found. The first PLS component was associated with the correct choice of spreadsheet row and column titles. The second PLS component was associated with the ratio between the number of correct numerical entries in the spreadsheet cells and the number of correct entries in the spreadsheet’s row/column titles. The third component was related to the ratio between the number of times the examinee alternated between the spreadsheet and e-mail environments and the number of times the examinee worked on the spreadsheet entries/titles. Note that the relationship between the third PLS component and numeracy proficiency was clearly nonmonotonic (see ESM 1, Appendix II Figure E10 for LOWESS plot). The score on PLS 3 was highest for individuals with a moderate level of numeracy proficiency. Those with higher numeracy proficiency might have higher working memory capacity, allowing them to fill in more spreadsheet entries before referring back to the e-mail for the information. Those with low numeracy proficiency could not synthesize the relevant information in the e-mail into a spreadsheet table, thus referring back to the e-mail less often.
Discussion
This paper introduces a sequence feature-based approach to evaluating and interpreting the relationship between action sequences on computerized simulation tasks and participants’ backgrounds. MDS and Seq2seq were adopted for extracting features from raw action sequences to preserve as much information as possible. Sequence feature-based regularized regression further quantifies the strength of association between a background variable and the test-taking process on a task. The results on the prediction of different background variables showed that action sequence-derived features, especially those extracted with MDS, consistently showed higher associations with background variables than polytomous final scores on the PSTRE items. This suggests that the sequence of actions an individual performs on a simulation task contains unique information about a variety of background variables: Individuals with different backgrounds (e.g., age, income, ICT skills, years of education, basic numeracy skills) tended to demonstrate different problem-solving patterns on a task, but the associated behavior might not have been used as part of proficiency scoring. PLS analysis of specific items’ MDS features further unveiled sequential patterns that differentiate participants on specific background variables. Many of the identified patterns, such as the tendency to arrive at a correct response without using “search” on a spreadsheet (associated with age) and the number of correct numerical entries on the table construction problem (associated with years of education and numeracy), required a holistic inspection of the full action sequence. This showcases the utility of sequence-based feature extraction methods in identifying background-related long-term and overall sequential patterns, which action- or short-subsequence-based methods may fail to capture.
<h31 id="zfp-232-2-120-d131e974">Implications</h31>
Quantifying the association between action sequence-derived features and external variables offers a data-driven perspective on the evaluation, design, and scoring of simulation-based assessments, which has implications for both measurement and career counseling. To start, evaluating the prediction accuracy of a particular background variable based on extracted features presents a generic way to quantify the informativeness of the action sequence on a task for that variable: If a subset of items is to be selected for constructing a short test that can best differentiate examinees on the variable, priority may be given to items whose sequence data show a higher association. This is most relevant when the variable of interest is an external criterion variable that the test intends to predict, e.g., job performance, rather than demographic background variables such as age or gender. Potential applications to the workforce include the construction of simulation-based assessments for personnel selection (Tippins, 2015) optimizing predictive validity of job performance, as well as the prediction of readiness (in knowledge, skills, or abilities) for a job that fits an individual’s vocational interest, a predictor of job performance, satisfaction, and commitment (Nye et al., 2012). In a separate analysis, PSTRE action sequence features were found predictive of several knowledge, skills, and abilities required for the participants’ self-reported occupation (e.g., computers and electronics, reading comprehension, and judgment/decision making), when we linked the occupation category to O*NET expert ratings on knowledge, skills, and abilities required on that job (Fleisher & Tsacoumis, 2012). A brief graphical summary of the prediction results is provided in Figure E11 in ESM 1, Appendix III.
This may see applications in predicting the gap between job-seekers’ current capabilities and their occupational interests, which can inform the choice of targeted interventions for skill development and career readiness.
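The item-selection idea sketched above can be made concrete: rank items by the cross-validated accuracy with which their sequence features predict the criterion. The snippet below is a hypothetical sketch with simulated data; the item labels and the feature matrices are illustrative, not taken from the study:

```python
# Sketch: rank items by cross-validated R^2 of a criterion variable
# predicted from each item's extracted sequence features.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
criterion = rng.normal(size=200)  # e.g., a job-performance criterion

# Simulated per-item feature matrices (in practice: MDS/Seq2seq features)
item_features = {
    "item_A": criterion[:, None] * 0.9 + rng.normal(size=(200, 5)),  # informative
    "item_B": rng.normal(size=(200, 5)),                             # uninformative
}

cv_r2 = {}
for item, X in item_features.items():
    model = RidgeCV(alphas=np.logspace(-3, 2, 20))
    cv_r2[item] = cross_val_score(model, X, criterion, cv=5, scoring="r2").mean()

# Items with higher cross-validated R^2 are better candidates for a
# short test built to predict this criterion
ranking = sorted(cv_r2, key=cv_r2.get, reverse=True)
print(ranking)
```

Cross-validation matters here: in-sample R^2 would favor items with many features regardless of real predictive value.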
However, test development goes beyond maximizing predictive validity. In particular, if differences in sequential patterns are found to be associated with a background variable, a test developer may inspect the specific sequential patterns that contribute to the association to understand its measurement implications. This evaluation depends on whether the sequential pattern is construct-relevant, a question that requires both interpretation of the sequential pattern and the gathering of additional evidence. In the current study, we attempted to identify these patterns by interpreting the PLS components, and we subsequently examined the relationship between each identified sequential pattern and either the final response, responses on other PSTRE tasks, or overall PSTRE proficiency to determine (1) whether the pattern affected scoring and (2) whether it appeared construct-relevant.
For example, age was found in the current study to be related to an examinee’s choice between using a text-based drop-down menu and clicking a toolbar graphical icon to perform a step, e.g., creating a folder or sorting. This choice was uncorrelated with either the final item score or overall PSTRE proficiency. Clicking an icon and navigating through the drop-down menu achieve the same goal, but the former was less common among senior participants, which may indicate lower familiarity with toolbar icons in this group. When such construct-irrelevant behaviors are correlated with external background variables, test developers need to minimize their impact on examinees’ chances of successfully solving the question, as such nuisance factors can compromise test fairness by producing differential item functioning. Demographic differences in how a key step is approached underscore the importance of universal design principles for assessment interface design (see Steinfeld & Maisel, 2012): whereas the availability of both a drop-down menu and toolbar icons allowed participants from different age groups to perform a key step despite potential differences in familiarity with either option, the observations from item U19a suggested a potential need to allow zooming or font size adjustments. When this option is unavailable, of two equally capable participants, neither of whom knows how to locate information in a spreadsheet using search or sort, the younger one might be less visually burdened when scanning an entire spreadsheet in a small font, and thus have a higher chance of a correct response.
On the other hand, the exploratory findings suggest that some action sequence patterns associated with income, education, and ICT exposure, such as the use of tools for efficient problem-solving, were associated with individual differences in PSTRE proficiency, i.e., were construct-relevant, even though efficiency was not considered in the applicable items’ final scoring. This finding concurs with results from select PSTRE assessment items in prior studies (e.g., item U02 in Liao et al., 2019): across tasks, individuals with higher income and more years of education showed a tendency to utilize appropriate tools (e.g., keyboard shortcuts, spreadsheet searching/sorting) to facilitate efficient problem-solving, and when process data were used to improve PSTRE scoring precision (Zhang et al., 2023), the process-incorporated scoring algorithm picked up on such patterns. Scoring open-ended questions is a nontrivial task and often requires specifying behavioral evidence indicative of the measured proficiency (Mislevy et al., 2003). Construct-relevant behavioral patterns can contribute to more reliable scoring, as they provide additional proficiency information, but whether they should be incorporated into the scoring rule requires broader considerations, e.g., potential scoring consequences for different subgroups, which can be partially addressed by examining how a pattern relates to test takers’ external characteristics.
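A simple first check of construct relevance is to correlate a binary indicator of the candidate behavior with overall proficiency. The toy example below uses simulated data under an assumed logistic relation between proficiency and tool use; the variable names and numbers are illustrative only:

```python
# Sketch: point-biserial correlation between a binary behavioral pattern
# (e.g., "used the spreadsheet search tool") and overall proficiency.
import numpy as np

rng = np.random.default_rng(2)
proficiency = rng.normal(280, 40, size=500)          # PSTRE-like scale scores
p_use = 1 / (1 + np.exp(-(proficiency - 280) / 20))  # assumed usage propensity
used_tool = rng.binomial(1, p_use)                   # binary pattern indicator

# A sizable correlation suggests the pattern carries proficiency
# information and may be construct-relevant
r = np.corrcoef(used_tool, proficiency)[0, 1]
print(r)
```

A near-zero correlation, as with the menu-versus-icon choice discussed earlier, would instead flag the behavior as a potential fairness concern rather than a scoring opportunity.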
<h31 id="zfp-232-2-120-d131e1008">Limitations and Future Directions</h31>Several limitations merit discussion. First, all analyses were conducted on only five of the 28 countries that participated in the PIAAC 2012 cycle, although these countries ranked at markedly different positions in PSTRE proficiency (OECD, 2016). The five countries are relatively high-income developed countries and areas with small differences in sample proportions across PSTRE proficiency levels, matching the sample reported in previous studies by He et al. (2021). We thus caution against generalizing the empirical findings, both the sequence features’ prediction of various external variables and the interpretations of the identified sequential patterns, to the general population of working-age adults worldwide. Second, the current analysis of adult digital problem-solving was conducted only on the PIAAC PSTRE assessment, which was specifically designed to assess problem-solving in personal, work, and civic contexts on three common digital platforms (web browser, spreadsheet, e-mail client) and was low-stakes. This could limit the generalizability of the findings to adult digital problem-solving behavior in general and to high-stakes situations. Third, while this study has shed light on various aspects of the relationship between background variables and sequential patterns, it is important to acknowledge the limitations of the PLS analysis. Identifying sequential patterns by examining original sequences ranked by PLS component scores remains a speculative process; the variability in the PLS components could not always be fully accounted for by the identified patterns, suggesting that other sequence characteristics may not have been captured.
In addition, the exploratory nature of our study, based on predictions and correlations, should be considered when interpreting the results: these findings should not be construed as establishing causal relationships or conclusions about interrelationships among the variables under investigation. Moreover, the relationships observed between scores or extracted features and demographic variables do not inherently validate the scores or the process data. These associations are speculative in nature and should not be interpreted as confirmatory evidence for the validity of the measures used. Future research is needed to rigorously corroborate these preliminary interpretations.
Methodologically, there is also room for future development. To start, to afford a fair comparison across items, the dimension
Footnotes
<anchor name="fn1"></anchor><sups> 1 </sups> In the PIAAC background survey, scores reported on ICTHome and ICTWork were derived from examinees’ responses to a series of corresponding survey items. Specifically, eight items in ICTHome (i.e., H_Q03a, H_Q03b, H_Q03c, H_Q03d, H_Q03e, H_Q03f, H_Q03g, and H_Q03h) and eight items in ICTWork (i.e., G_Q05a, G_Q05b, G_Q05c, G_Q05d, G_Q05e, G_Q05f, G_Q05g, and G_Q05h) were included. See Chapter 3 of the PIAAC Technical Report (OECD, 2016) for details.
<anchor name="fn2"></anchor><sups> 2 </sups> The PIAAC survey derived plausible values of individuals’ cognitive performance (e.g., numeracy performance) based on both the responses to cognitive assessments and background variables. The numeracy performance variable used in the current study is based on the mean of the numeracy plausible values recorded in the official PIAAC data.
<anchor name="fn3"></anchor><sups> 3 </sups> In the PIAAC computer-based assessment, examinees were routed to modules of PSTRE, literacy, or numeracy items; each respondent was assigned two modules. See Chapter 1 of the PIAAC Technical Report (OECD, 2016) for details. The current study used participants who received two PSTRE modules.
References
<anchor name="c1"></anchor>Abele, S., & von Davier, M. (2019). CDMs in vocational education: Assessment and usage of diagnostic problem-solving strategies in car mechatronics. In M.von Davier & Y.Lee (Eds.)
AERA, APA, & NCME. (2014).
Bergner, Y., & von Davier, A. A. (2019). Process data in NAEP: Past, present, and future.
Chen, Y. (2020). A continuous-time dynamic choice measurement model for problem-solving process data.
Cleveland, W. S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis.
Dillon, G., Boulet, J., Hawkins, R., & Swanson, D. (2004). Simulations in the United States Medical Licensing Examination™ (USMLE™).
Eichmann, B., Greiff, S., Naumann, J., Brandhuber, L., & Goldhammer, F. (2020). Exploring behavioural patterns during complex problem-solving.
Ercikan, K., Guo, H., & He, Q. (2020). Use of response process data to inform group comparisons and fairness research.
Ericsson, K. A., & Simon, H. A. (1998). How to study thinking in everyday life: Contrasting think-aloud protocols with descriptions and explanations of thinking.
Fang, G., & Ying, Z. (2020). Latent theme dictionary model for finding co-occurrent patterns in process data.
Fleisher, M. S., & Tsacoumis, S. (2012).
Friedman, J., Hastie, T., & Tibshirani, R. (2009).
Gómez-Alonso, C., & Valls, A. (2008, October). A similarity measure for sequences of categorical data based on the ordering of common elements. In
Gao, Y., Cui, Y., Bulut, O., Zhai, X., & Chen, F. (2022). Examining adults' web navigation patterns in multi-layered hypertext environments.
Gao, Y., Zhai, X., Bulut, O., Cui, Y., & Sun, X. (2022). Examining humans' problem-solving styles in technology-rich environments using log file data.
Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving.
Han, Y., Liu, H., & Ji, F. (2022). A sequential response model for analyzing process data on technology-based problem-solving tasks.
Hao, J., & Mislevy, R. J. (2019). Characterizing interactive communications in computer-supported collaborative problem-solving tasks: A conditional transition profile approach.
He, Q., Borgonovi, F., & Paccagnella, M. (2021). Leveraging process data to assess adults' problem-solving skills: Using sequence mining to identify behavioral patterns across digital tasks.
He, Q., Borgonovi, F., & Suárez-Álvarez, J. (2023). Clustering sequential navigation patterns in multiple-source reading tasks with dynamic time warping method.
He, Q., Shi, Q., & Tighe, E. (2023). Predicting problem-solving proficiency with hierarchical supervised models on response process.
He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with N-grams. In Y.Rosen, S.Ferrara, & M.Mosharraf (Eds.),
He, Q., von Davier, M., & Han, Z. (2019). Exploring process data in problem-solving items in computer-based large-scale assessments: Case studies in PISA and PIAAC. In H.Jiao, R. W.Lissitz, & A.Van Wie (Eds.),
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components.
Kirsch, I., & Thorn, W. (2013).
LaMar, M. M. (2018). Markov decision process measurement model.
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization.
Liang, K., Tu, D., & Cai, Y. (2023). Using process data to improve classification accuracy of cognitive diagnosis model.
Liao, D., He, Q., & Jiao, H. (2019). Mapping background variables with sequential patterns in problem-solving environments: An investigation of United States adults' employment status in PIAAC.
Liu, H., Liu, Y., & Li, M. (2018). Analysis of process data of PISA 2012 computer-based problem solving: Application of the modified multilevel mixture IRT model.
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design.
Nwakasi, C. C., Cummins, P. A., Mehri, N., Zhang, J., & Yamashita, T. (2019).
Nye, C. D., Su, R., Rounds, J., & Drasgow, F. (2012). Vocational interests and performance.
OECD. (2012).
OECD. (2014).
OECD. (2016).
Qiao, X., & Jiao, H. (2018). Data mining techniques in analyzing process data: A didactic.
Qiao, X., Jiao, H., & He, Q. (2023). Multiple-group joint modeling of item responses, response times, and action counts with the Conway-Maxwell-Poisson distribution.
Schleicher, A. (2008). PIAAC: A new strategy for assessing adult competencies.
Steinfeld, E., & Maisel, J. (2012).
Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling.
Tang, X., Wang, Z., Liu, J., & Ying, Z. (2021). An exploratory analysis of the latent structure of process data via action sequence autoencoders.
Tang, X., Zhang, S., Wang, Z., Liu, J., & Ying, Z. (2021). ProcData: An R package for process data analysis.
Tippins, N. T. (2015). Technology and assessment in selection.
Ulitzsch, E., He, Q., & Pohl, S. (2022). Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks.
Ulitzsch, E., He, Q., Ulitzsch, V., Molter, H., Nichterlein, A., Niedermeier, R., & Pohl, S. (2021). Combining clickstream analyses and graph-modeled data clustering for identifying common response processes.
Ulitzsch, E., Ulitzsch, V., He, Q., & Lüdtke, O. (2023). A machine learning-based procedure for leveraging clickstream data to investigate early predictability of failure on interactive tasks.
Wang, Z., Tang, X., Liu, J., & Ying, Z. (2023). Subtask analysis of process data through a predictive model.
Wehrens, R., & Mevik, B. H. (2007). The pls package: Principal component and partial least squares regression in R.
Wold, S., Sjöström, M., & Eriksson, L. (2002). Partial least squares projections to latent structures (PLS) in chemistry. In
Xiao, Y., He, Q., Veldkamp, B., & Liu, H. (2021). Exploring latent states of problem-solving competence using hidden Markov model on process data.
Xiao, Y., & Liu, Y. (2024). A state response measurement model for problem-solving process data.
Xu, H., Fang, G., Chen, Y., Liu, J., & Ying, Z. (2018). Latent class analysis of recurrent events in problem-solving items.
Xu, H., Fang, G., & Ying, Z. (2020). A latent topic model with Markov transition for process data.
Zhan, P., & Qiao, X. (2022). Diagnostic classification analysis of problem-solving competence using process data: An item expansion method.
Zhang, S., Wang, Z., Qi, J., Liu, J., & Ying, Z. (2023). Accurate assessment via process data.
Zumbo, B. D., & Hubley, A. M. (2017).