Introduction

Increased availability of digital tools for handwriting has led to new questions about the impact of technology on the acquisition and assessment of handwriting in childhood (Graham, 2022; Guilbert et al., 2019; Karavanidou, 2017; Wollscheid et al., 2016). In particular, embodied approaches suggest that screen-based technologiesFootnote 1 may preserve the haptics of handwriting and reinstate the value of mark-making (Karavanidou, 2017; Kiefer & Velay, 2016; Mangen & Balsvik, 2016; Mangen & Velay, 2010). With regard to handwriting assessment, screen-based technologies have the advantage of preserving handwritten texts while also capturing the dynamic and temporal characteristics of the handwriting acts that produce them (Asselborn et al., 2018; Gerth et al., 2016a). Therefore, considerable research has been dedicated to screen-based assessments (SBAs)Footnote 2 of handwriting skills in childhood (Danna et al., 2023; Hammer et al., 2021).

In particular, SBAs have been used to assess both legibility and fluency of children’s handwriting. SBAs of legibility usually involve analytic assessments of a handwritten text (i.e., the handwritten product) and provide objective post-hoc measurements of individual grapho-motor parameters (GMPs).Footnote 3 This method is also used in traditional validated paper-based assessments (VPAs) of children’s grapho-motor skills (Rosenblum et al., 2003a; Sparaci et al., 2024). On the other hand, SBAs of fluency focus on the fine-motor movements performed while writing (i.e., the actual visuo-motor and proprioceptive processes enacted while handwriting) and involve objective online measurements of kinematic or dynamic parameters,Footnote 4 which can be captured only to a limited extent in VPAs (Asselborn et al., 2018; Danna et al., 2023). In the present paper, we focus on using SBAs to assess handwriting legibility in primary school children. Our main aim is to explore whether screen-based technologies and dedicated software solutions (both required for SBAs) can measure individual GMPs as effectively as traditional VPAs.

VPAs are gold-standard tools used to measure legibility, usually in primary school (Danna et al., 2023; Rosenblum et al., 2003a). They frequently rely on copying tasks in a specific handwriting style (print or cursive) and in different conditions (e.g., children may be asked to write slowly in their best handwriting or as quickly as possible under time constraints). VPAs provide clinicians with much-needed measures of GMPs, allowing individual performance to be compared to normative data (Rosenblum et al., 2003a; Sparaci et al., 2024). However, they suffer from multiple limitations because they are mostly post-hoc evaluations of children’s handwritten texts. In particular, multiple studies have pointed out that VPAs show overreliance on subjective coder judgements (i.e., asking coders to make post-hoc inferences on handwriting processes based on handwritten products), extremely time-consuming scoring systems (i.e., requiring fine-grained measurements that have to be carried out by hand) and limited ecological or external validity (Provenzale et al., 2023; Rosenblum et al., 2003a; Sparaci et al., 2024; Sudsawad et al., 2001). Given rising numbers of teacher referrals and a steady increase in children with handwriting difficulties,Footnote 5 there is a growing need for objective and fast assessments of handwriting, accompanied by tailored support strategies (Lyon, 1996; Indira & Vijayan, 2015; Marquardt et al., 2016; MI–DGSIS 2022). Some attempts have been made at using SBAs in kindergarten and primary school (Accardo & Perrone, 2008; Chang & Yu, 2022; Dui et al., 2020; Mekyska et al., 2016; Pagliarini et al., 2015; Philip et al., 2023; Polsley et al., 2022; Rosenblum et al., 2003b; Serpa-Andrade et al., 2021), but the use of SBAs of legibility is still limited and faces multiple drawbacks.

First, there is the issue of available software. Most software for SBAs currently relies on automatic measurements of fluency parameters (i.e., handwriting processes) and aims at detecting children with handwriting difficulties for further referral and clinical evaluation (Accardo & Perrone, 2008; Asselborn et al., 2018; Asselborn et al., 2020; Chang & Yu, 2022; Dui et al., 2021; Kedar et al., 2021; Mekyska et al., 2016; Pagliarini et al., 2015; Philip et al., 2023; Polsley et al., 2022; Rosenblum et al., 2003b; Rosenblum et al., 2006; Rosenblum & Dror, 2016; Šafárová et al., 2021; Serpa-Andrade et al., 2021; Zvoncak et al., 2019). Some attempts have been made at using SBAs to measure some GMPs related to legibility (i.e., the handwritten product), but direct comparisons of children’s scores on individual GMPs with VPAs are still lacking (Asselborn et al., 2020; Gerth et al., 2016b; Simonnet et al., 2019). Therefore, while extremely useful, these software solutions often need to be followed by further VPAs, since they provide limited data on individual GMPs and lack normative data. Assessing difficulties in specific GMPs is extremely relevant for educators and occupational therapists, who often use these parameters to define a child’s personal profile of strengths and weaknesses, select remediation strategies and monitor exercise efficacy (Cramm & Egan, 2015; Feder & Majnemer, 2007). For example, consider letter alignment and letter size (two GMPs measuring whether letters are written on the ruled line and in the appropriate size): these parameters are well known to teachers and occupational therapists, as they are commonly taught in primary schools (e.g., notebooks with different ruled lines are commonly used to teach letter alignment and sizing) and may be hard for children to master (Guilbert & Fernandez, 2024; Sparaci et al., 2024).
For a teacher or occupational therapist, it is important to know whether a child has a specific difficulty in letter alignment or sizing, rather than in other GMPs, because tailored exercises may then be provided (e.g., facilitating notebooks with highlighted lines delimiting the handwriting space are available on the market) (Guilbert & Fernandez, 2024; Pellegrini & Dongilli, 2010). However, software solutions for SBAs currently do not provide measurements of individual GMPs comparable to VPAs. Software solutions measuring GMPs have instead been implemented in systems that do not involve writing directly on a screen: for example, dedicated software for feature extraction from handwritten images (Dimauro et al., 2020; Isa et al., 2019) or graphic tabletsFootnote 6 often with an added paper sheet on top (Asselborn et al., 2018; Chang & Yu, 2022; Deschamps et al., 2021; Devillaine et al., 2021; Drey et al., 2022; Drotár & Dobeš, 2020; Falk et al., 2011; Gargot et al., 2020; Herstic et al., 2025). These studies are important, but they offer no data on using SBAs to measure individual GMPs, because they do not involve writing directly on a screen.

Only two research attempts have been made at implementing software measuring individual GMPs (rather than an overall score) directly comparable to those measured in a VPA while writing on a screen with a stylus (Provenzale et al., 2022, 2023). Both studies (one on adults, the other on children) aimed to assess the reliability of scoring systems (software vs. human) and asked participants to copy a phrase in cursive handwriting on a screen (Wacom Cintiq 16) in two writing conditions (i.e., using their best handwriting or writing as fast as possible). Texts were later analysed using two scoring systems: directly by the software, which scored screen-acquired texts, or by hand, relying on a human coder scoring paper print-outs of the screen-acquired texts. Comparable scoring methods, derived from multiple VPAs, were used and GMP scores were compared to assess the reliability of the two scoring methods. The first study comprised 10 adults and measured 8 GMPs, with results showing good agreement between software- and human-based scoring on all GMPs, with the exception of letter joins, where the software detected more errors (Provenzale et al., 2022). The second study measured 9 GMPs in 10 primary school children (second- and third-graders), with results confirming the absence of significant differences between scoring systems for 6 GMPs (i.e., max amplitude of letter misalignment, max variation of medium letters, max variation of ascending/descending letters, letter height, space between words, margin alignment), while for the other GMPs (i.e., joins, letter alignment, trace direction) software-based scoring detected significantly more errors (Provenzale et al., 2023).
Interestingly, in this second study the authors implemented a human–machine interaction approach: some GMPs (i.e., quantitative GMPs, such as speed, fluctuations, letter dimension, space between words, margin alignment) were automatically scored by the software, while others (i.e., qualitative GMPs, such as letter joins and direction of letter trace) were initially scored by the software, but the output was later checked by a human coder, who could add to or modify it. This human–machine interaction approach afforded a degree of system transparency, offering direct access to errors detected by the software on specific GMPs and providing evaluators with relevant information on children’s handwriting style (Provenzale et al., 2023). These studies, while including only a limited number of participants, provide evidence that software solutions for assessing legibility can yield data comparable to VPAs (provided that scoring methods are comparable). More importantly, they suggest that for some GMPs SBA software may detect more errors than a human coder. However, these studies provide no information on comparing SBAs to VPAs, because they focused on scoring methods and only involved texts written on screens.

This brings us to the second limitation of SBA use: differences between writing on a screen and writing on paper. Using a stylus on a screen is closer to handwriting on paper than typing is, but screens still differ from paper in terms of tactile, propriokinesthetic and even auditory feedback (Alamargot & Morin, 2015; Gerth et al., 2016b; Karavanidou, 2017; Mangen & Balsvik, 2016; Mangen & Velay, 2010; Mayer, 2020; Van der Weel et al., 2024). This is particularly true for cursive handwriting, which initially depends on memorizing letter forms and joins, but with practice becomes a “kinetic melody”, whose automaticity is strongly embedded in the haptics of the instrument with which it is played, so that small variations in surface resistance, visual and auditory feedback may alter it (Lurija, 1973; Mangen & Balsvik, 2016; Mangen & Velay, 2010). It is therefore reasonable to hypothesize that using SBAs to measure legibility may affect children’s cursive handwriting, and studies comparing screen vs. paper use seem to support this hypothesis. Alamargot and Morin (2015) measured fluency and showed that handwriting on a screen (Wacom Cintiq 21UX) in the preferred style with a plastic-tipped pen affects pen pauses in second grade and pen movements in ninth grade, suggesting a need for consistent motor adjustments in screen-based handwriting (Alamargot & Morin, 2015). Gerth et al. (2016b) measured fluency (e.g., speed, overall writing time) and legibility (e.g., overall legibility, letter shape) in 27 second-graders, asking them to either write on a tablet screen (ThinkPad X61) using a digital pen or on a paper sheet placed on a screen (Intuos4 XL DTP), without controlling for writing style or conditions (i.e., children were allowed to write using both print and cursive at their preferred pace).
In a somewhat paradoxical result, when writing on a screen second-graders showed longer writing time (longer overall duration of the copying task, calculated in milliseconds), but higher writing velocity (calculated as the millimetres of trace produced per second). This result was explained by the legibility data, which highlighted differences in letter size: children produced larger letters in the screen condition, resulting in overall longer writing time (Gerth et al., 2016b). A similar effect is also reported by Alamargot and Morin (2015), who showed that in their sample the distance travelled to form a letter was always longer when children were writing on a screen. Guilbert et al. (2019) showed that changes in proprioceptive and visual feedback impact handwriting speed, letter size and legibility in cursive handwriting, with reductions of visual and proprioceptive feedback having a greater effect on children than on adults. These studies indicate that embodied approaches may be right in suggesting the need for more in-depth analyses of the impact of SBAs in childhood, as these may entail higher cognitive and motor costs for children than for adults (Guilbert et al., 2019; Karavanidou, 2017; Mangen & Velay, 2010). Overall, handwriting on a screen seems to be harder for children, leading to motor adjustments that affect at least some aspects of legibility such as letter size, while less is known about effects on other GMPs (Gerth et al., 2016b; Guilbert et al., 2019; Wollscheid et al., 2016). To date, no study has compared GMP scores resulting from SBAs to those obtained in VPAs, using comparable methods, while controlling for writing style and conditions (see also Danna et al., 2023 for a comprehensive review).
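The distinction between overall writing time and writing velocity can be made concrete with a small sketch (Python here, purely illustrative; function and variable names are our own, not taken from any of the cited studies or software): a larger trace can yield both a longer duration and a higher velocity.

```python
import math

def velocity_and_duration(samples):
    """Compute overall duration (ms), trace length (mm) and mean writing
    velocity (mm/s) from pen samples given as (t_ms, x_mm, y_mm) tuples."""
    duration_ms = samples[-1][0] - samples[0][0]
    path_mm = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(samples, samples[1:])
    )
    velocity = path_mm / (duration_ms / 1000.0)  # mm per second
    return duration_ms, path_mm, velocity

# Toy traces: the "screen" trace covers twice the path in 1.5x the time,
# so it shows both a longer duration and a higher velocity.
paper = [(0, 0.0, 0.0), (1000, 10.0, 0.0)]                     # 10 mm in 1 s
screen = [(0, 0.0, 0.0), (750, 10.0, 0.0), (1500, 20.0, 0.0)]  # 20 mm in 1.5 s

d_p, _, v_p = velocity_and_duration(paper)   # 1000 ms, 10 mm/s
d_s, _, v_s = velocity_and_duration(screen)  # 1500 ms, ~13.3 mm/s
```

Under these toy numbers the screen trace has both the longer duration and the higher velocity, mirroring the pattern Gerth et al. (2016b) attribute to larger letters.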

Finally, there is the issue of tool familiarity and practice. With practice, both children and adults seem able to adapt to writing on a screen just as well as to writing on paper. Gerth et al. (2016b, pp. 13–14) have shown that over the course of a task requiring participants to write a short phrase multiple times (i.e., 10 repetitions of the phrase “Sonne und Wellen”), handwriting fluency (measured as the number of inversions in velocity, NIV, between first and last repetitions) improved in both adults and second-graders (i.e., both showing a gradual decrease in NIV over the 10 repetitions). However, practice with screen-based technologies requires, at the very least, that these technologies are accessible to children, enticing and equipped with appropriate learning environments. Current evidence suggests instead that use of screen-based technologies for handwriting is relatively low: children rarely experience either direct or indirect use of screens for handwriting at home, and even teachers are often not overly familiar with appropriate use of this technology for handwriting (Bonneton-Botté et al., 2021; Couse & Chen, 2010; Gerth et al., 2016b; Graham, 2022; Müller et al., 2015). This suggests that it is important to start investigating children’s familiarity and practice with screen-based tools when considering SBAs, especially considering that some GMPs may be affected by practice. For example, handwriting speed, which is often related to fluency (e.g., studies show that fast handwriting is associated with fewer NIV), is affected by handwriting practice, with multiple studies showing a gradual increase in handwriting speed between first and fourth grade (Accardo et al., 2013; Gerth et al., 2016b; Graham & Weintraub, 1996; Graham et al., 1998; Loizzo et al., 2023; Tressoldi et al., 2019; Yekeler Gökmen et al., 2022).
Overall, it seems important, in the future, to consider children’s familiarity with screen-based technologies and the opportunities that they have to practice using them, as these may be associated with differences in performance on specific GMPs, such as handwriting speed.
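The NIV fluency measure mentioned above (number of inversions in velocity) can be sketched as follows. This is an illustrative Python implementation under our own assumptions (counting local extrema in a sampled velocity profile), not the exact procedure used in the cited studies:

```python
def count_niv(velocities):
    """Number of inversions in velocity (NIV): count local extrema in a
    sampled velocity profile; fluent writing shows fewer inversions."""
    niv = 0
    for prev, cur, nxt in zip(velocities, velocities[1:], velocities[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # sign change in the discrete derivative
            niv += 1
    return niv

smooth = [0, 5, 10, 12, 10, 5, 0]      # one peak -> 1 inversion
jerky = [0, 8, 3, 9, 2, 10, 1, 6, 0]   # repeated speed-ups/slow-downs -> 7
assert count_niv(smooth) < count_niv(jerky)
```

A fluent stroke approximates a single bell-shaped velocity peak (NIV close to 1), while dysfluent writing produces many small accelerations and decelerations, inflating the count.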

Summing up, to evaluate the effective use of SBAs of legibility in primary school children, in the present study we aimed to compare GMP scores obtained using SBAs to those resulting from VPAs in a sample of primary school children, using comparable scoring methods and controlling for writing style and conditions. To this aim, 9 GMP scores obtained in two assessment conditions (SBAs and VPAs) were scored by the same expert coder and compared, investigating both correlations and differences between scores. Based on previous research, we expected to find at least some correlations between GMP scores (Accardo & Perrone, 2008; Dui et al., 2020), while we also expected SBAs to detect a greater number of errors for some GMPs (e.g., joins, letter alignment, trace direction) (Provenzale et al., 2022, 2023). Furthermore, given the documented impact of screen use on child performance, we expected to find more errors on GMP scores related to letter sizing (Bonneton-Botté et al., 2020; Gerth et al., 2016b; Guilbert et al., 2019; Mayer et al., 2020; Wollscheid et al., 2016). No predictions could be made on the effect of SBAs on other GMPs as compared to VPAs, and these data were considered explorative. We also wished to provide some preliminary data on children’s level of familiarity and practice with screen-based technologies in general. Therefore, we added a dedicated questionnaire asking a sub-group of children within our sample which screen-based technologies they had at home (i.e., computer, smartphone, tablet), how often they used them and, if they had a tablet, whether they used it with their fingers or with a stylus. In doing so, we did not limit questions to children’s experiences with specific handwriting tasks, but attempted to provide initial data on general tool familiarity and practice. Given that current literature indicates that children rarely use tablets for handwriting, we expected to find only a few occurrences of stylus use.
However, we did expect some children to have some experience with using screen-based technologies at home (e.g., for other tasks not involving handwriting), which may have led them to achieve at least some practice with these tools (e.g., having tactile experiences of screens’ resistance). We then explored whether any association emerged between frequency of use and handwriting speed in SBAs. Finally, we also explored children’s reactions to screen-based technologies, tentatively asking them whether they enjoyed using a screen and stylus. We consider these data on familiarity and practice as merely explorative, and further studies are needed to fully understand to what extent familiarity with screen-based tools may impact SBAs. However, we chose to report them in the hope of enriching future research on using SBAs of handwriting.

Methods

Participants

Forty-eight Italian primary school children were recruited for the present study as follows: 10 through word of mouth among colleagues and 38 in collaboration with the public primary school Istituto Comprensivo Via Merope, Rome, Italy. To guarantee some expertise in cursive handwriting, inclusion criteria were: being enrolled in the second semester of second grade or in third grade and actively using cursive in school.Footnote 7 Exclusion criteria were: not using cursive in school or not completing study sessions. Based on these criteria 8 children were excluded (7 for not using cursive, 1 for not completing study sessions) and the final sample included 40 children (3 second-graders, 37 third-graders), well balanced for gender and within population means for handedness (7.5% being left-handed) (Perelle & Ehrman, 1994) (Table 1). Sample size is comparable to other studies on similar populations/skills (Gerth et al., 2016b; Guilbert & Fernandez, 2024; Sparaci et al., 2024). All children had normal or corrected-to-normal vision (8 wore glasses). Non-verbal cognitive level, visuo-motor skills and handwriting skills were also measured using: Raven’s Coloured Progressive Matrices (RCPM, Raven et al., 1990), the Beery Visual Motor Integration Test (VMI), including the Visual Perception (VMI-V) and Motor Coordination (VMI-M) subtests (Beery & Beery, 2004), and the Italian standardized version of the Brave Handwriting Kinder (BHK) test (Di Brina & Rossini, 2010). Performance in these tests was used to describe sample characteristics: all children showed a non-verbal cognitive level ≥ 80, absence of visuo-motor coordination difficulties and no dysgraphia (see Table 1). Study procedures were approved by the CNR Ethics Committee (approval n. 0060644/2022) as well as by the Ethics Committee of the Università Campus Bio-Medico di Roma (approval n. PAR 73.21, Rome 28 Sept. 2021), with parents signing an informed consent form before inclusion of their child in the study.

Table 1 Participant characteristics

Materials and procedures

Legibility of children’s cursive handwriting was assessed in our sample using an SBA and a VPA (Fig. 1). To control for handwriting conditions, in both assessments children were asked first to copy a phrase in their best handwriting (Best condition) and then to copy it as fast as possible while maintaining legibility (Fast condition). Therefore, the final data sample included 160 texts (80 for SBAs, 80 for VPAs). Order of assessments was counterbalanced, with 21 children performing the SBA before the VPA and 19 doing the opposite. A questionnaire was administered after the SBA to a sub-group of children within our sample to provide preliminary data on children’s familiarity with and appreciation of screen-based technologies. The other standardized tests (i.e., RCPM, VMI, VMI-V, VMI-M, BHK) were administered to all children in our sample, and test order was randomized within participants. Children were evaluated individually in a quiet room at the child’s home, school or parents’ workplace at the Università Campus Bio-Medico di Roma, and assessments were carried out in one day, allowing for pauses between tests to avoid fatigue. Participants sat at a table in good lighting conditions, with the screen/paper sheet placed vertically in front of them at approximately 30 cm from the eyes. At the beginning of each assessment children were explicitly encouraged to rest the wrist of the writing hand on the screen/paper and their non-dominant hand to the side, but while writing they were left free to choose their preferred posture to avoid interruptions/interferences (Fig. 1). Given that ruled paper has been shown to support handwriting legibility, we chose a VPA requiring use of ruled paper, and this procedure was included in the SBA (Borean et al., 2012; Guilbert & Fernandez, 2024; Provenzale et al., 2023).
Prior to the writing tasks in the SBA and VPA, children were asked to choose the A4 ruled paper that they commonly used in school from four formatsFootnote 8 (these were shown on the screen in the SBA and physically presented in the VPA). The selected paper format was then used in the writing task: reproduced on the screen carefully respecting line spacing and proportionsFootnote 9 in the SBA, or using ruled paper sheets in the VPA.

Fig. 1
figure 1

Procedure, tool and handwriting samples for the SBA and the VPA

Screen-based assessments (SBAs)

An interactive display (Wacom Cintiq 16 Full HD with 1920 × 1080 pixel resolution) and its stylus (Wacom Pro Pen 2) were used for SBAs (see Fig. 1). This portable technology has a screen size (16 inches) that allows reproducing the exact size and proportions of a ruled A4 paper sheet (when placed vertically). The stylus digital trace was set at 2 pixels to mirror the trace of the pens used in the VPA (see below). Furthermore, the screen has a matte finish that reduces reflections and provides friction similar to paper. A laptop was connected to the interactive display for stimuli presentation and data acquisition. SBAs used the Eye and Pen software (Alamargot et al., 2006; Chesnet & Alamargot, 2005) and dedicated research software developed using MATLAB R2021a App Designer for offline extraction of 9 GMPs (see Provenzale et al., 2023 for details on software characteristics).
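Reproducing ruled paper at life size on a display requires a pixel-to-millimetre scale, which can be derived from the diagonal size and resolution. The following is a back-of-the-envelope sketch (assuming square pixels; the 8 mm line spacing is a hypothetical example, not a value from the study):

```python
import math

def px_per_mm(diag_inches, res_w, res_h):
    """Pixels per millimetre of a display, derived from its diagonal size
    and resolution (assumes square pixels)."""
    diag_px = math.hypot(res_w, res_h)   # diagonal length in pixels
    diag_mm = diag_inches * 25.4         # diagonal length in millimetres
    return diag_px / diag_mm

# Wacom Cintiq 16: 16-inch diagonal at 1920 x 1080
scale = px_per_mm(16, 1920, 1080)     # ~5.42 px/mm
line_spacing_px = round(8.0 * scale)  # hypothetical 8 mm ruled-line spacing
```

With such a scale factor, any ruled-line spacing or margin specified in millimetres on the paper formats can be converted to on-screen pixel distances.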

SBAs were preceded by a short practice phase during which children wrote their name, made some drawings and copied geometrical shapes on the screen. Then children listened to a short story about an elf who liked to receive and collect handwritten letters (the elf story was recorded, while all subsequent prompts were written on the screen and read out loud by the experimenter). When the story ended, children selected the paper format (see above) and were given the following instructions for the Best condition: “Now you will see a phrase on the topmost part of the screen and you will have to copy it on the paper below in cursive handwriting. You will have to write well, in an orderly fashion and in your best handwriting. Do not rush. The important thing is that you write as best as you can. If, while writing, you make a mistake, erase your letters by striking them out with a line. When you have finished use the mailbox below to send your letter to the elf”. The selected ruled paper was then shown full screen and children read and copied in cursive handwriting a typed sentence appearing at the top of the screen, containing all letters of the Italian alphabet (i.e., “L’elefante vide benissimo quel topo che rubava qualche pezzo di formaggio”, literally “The elephant saw very well that mouse who was stealing some piece of cheese”). The same procedure was used immediately afterwards for the Fast condition, but with the following instructions: “On the next page you will do a race. Try to write the sentence as fast as you can. This time do not worry if your handwriting isn’t as nice as before. The important thing is that what you write is legible. If, while writing, you make a mistake, erase your letters by striking them out with a line. When you have finished use the mailbox below to send your letter to the elf”.
Throughout SBAs children were allowed to pace themselves by advancing from one screen to the next, and when both conditions were completed they received a thank-you note from the elf. The entire SBA was built in accordance with procedures available from a VPA (i.e., the Italian validated Test per la Valutazione delle Difficoltà Grafo-Motorie e Posturali della Scrittura–DGM-P; Borean et al., 2012): using the same phrase, paper formats, handwriting style, conditions and instructions, with only minor adaptations (i.e., the elf story) to maintain children’s interest in an otherwise more passive task.Footnote 10

Scoring was conducted offline using research software to extract 9 GMPs relying on human–machine interaction (Provenzale et al., 2023) and scoring rules comparable to the ones used for the VPAs (Table 2). The software initially asked an expert human coder (second author) to segment handwritten texts by marking the beginning and end of each letter with simple mouse clicks (initial letter sequencing) (see Fig. 2 panel A). Then the software automatically provided output scores for all quantitative GMPs (GMPs 1,3,7,8,9), which require fast and objective measurements of time or space. These were reported as either a proportion (GMP 1), number of errors (GMP 3) or distance in mm (GMPs 7,8,9) (Table 2). For qualitative GMPs, reported as number of errors (GMPs 2,4,5,6), the software instead provided dedicated text visualizations supporting human–machine interaction and scoring (Table 2; Fig. 2 panels B, C, D). Notwithstanding the time initially required for letter segmentation and labelling, the software reduces coding time (i.e., previous comparisons between human-based coding and software GMP extraction showed that the software saved on average 17 min of coding time per participant per condition, for a total of 34 min of coding time per participant) (Provenzale et al., 2023). To guarantee consistency between SBAs and VPAs, all letter sequencing and human–machine interaction scoring for SBAs were conducted by the same expert coder (second author) who scored all VPAs.

Table 2 Full list of GMPs and of comparable methods used in scoring grapho-motor parameters (GMPs) derived from the screen-based assessment (SBA) and the validated paper-based assessment (VPA)
Fig. 2
figure 2

Sample of images provided by the software for GMPs requiring human–machine interaction: A initial letter sequencing for the word “elefante” (elephant); small dots between letters are placed by the coder (second author) to parse out individual letters. B Enlarged image of the letter “a” in the word “elefante” (elephant). This image allows coding of GMP 2 by showing letter traces coloured according to tracing order, with arrows for trace direction, and allowing the coder to mark incorrect trace order by checking the appropriate box below the image and to accept/change the software’s evaluation of trace direction as correct/incorrect. It is also used in coding GMP 4, as it shows continuous/discontinuous trace within the letter and allows the coder to indicate the presence of open/overlapping/separate trace or eyelets by checking the appropriate boxes below the image. Finally, it is used to code GMP 5, allowing the coder to indicate the presence of an ambiguous letter by checking the appropriate box. C Enlarged image of the entire phrase showing the presence of interrupted/overlapping joins between letters. D Enlarged image of individual words used by the coder to accept/decline errors pointed out by the software in image C by checking the appropriate accept/decline button

Validated paper-based assessments (VPAs)

VPAs were carried out following materials and procedures from the DGM-P test (Borean et al., 2012). Children were initially asked to select the paper format (see above) and the chosen sheet was attached to a plastic paper holder to control for surface resistance (Fig. 1). Children were given a black Bic Cristal ballpoint pen to use for handwriting while an experimenter (first, second or third authors) used a stopwatch to measure handwriting time. Children were then asked to read a phrase containing all letters of the Italian alphabet shown on a printed card (same phrase as in SBAs) and copy it in cursive handwriting on the selected paper sheet (Fig. 1). Instructions mirrored the ones described above for SBAs in both writing conditions.

All children’s texts were manually scored by the same expert coder (second author) following procedures derived from the DGM-P test manual (Borean et al., 2012) and comparable to the ones used to score SBAs (see Sparaci et al., 2024 for similar scoring procedures). Coding of handwriting speed (i.e., GMP 1) is based on actual handwriting execution time for each child (measured with the aid of a stopwatch), but all other scoring procedures for the VPAs are based on a post-hoc evaluation of children’s handwritten texts. For some GMPs (i.e., 3,7,8,9) this requires taking exact measurements using the transparent graph paper provided in the test materials, while for other GMPs (i.e., 2,4,5,6) the coder needs to observe the handwritten text very accurately and make the necessary evaluations and inferences (the latter resulting in a time-consuming assessment) (see Table 2 for detailed scoring methods). Coder reliability was evaluated by having a second expert coder (first author) code 22.5% of VPAs. The VPA provided legibility scores for 9 GMPsFootnote 11 measured as either a proportion (GMP 1), number of errors (GMPs 2,3,4,5,6) or distance in mm (GMPs 7,8,9) (Table 2).

Questionnaire

A short questionnaire containing 11 questions was used to provide explorative data on children’s familiarity and practice with screen-based technologies as well as their appreciation of these tools. For each question children were instructed to select one viable answer (see Table 3). Children filled in the questionnaire using a pen while an experimenter sat next to them offering guidance and/or clarifications if needed. Questionnaires were introduced based on researchers’ observations (first and second authors) during initial data collection (e.g., upon seeing the SBA, some children commented that they had never used this technology, others that they had tablets at home). Therefore, questionnaires were available and administered only to a subsample of 30 children (Familiarity sample in Table 1). Following questionnaire results, children were subdivided, based on their answers to questions 1 and 4, into two samples: 13 children who had a tablet at home and used it to some extent (i.e., they answered “sometimes”, “often” or “always” to question 4 assessing frequency of tablet use) were considered as having comparatively higher familiarity or practice with this tool (HF sample, Table 1), while 17 children who did not have a tablet at home or did not use it frequently (i.e., they did not have a tablet or answered “never” or “rarely” to question 4) were considered as having comparatively lower familiarity with this tool (LF sample, see Table 1).
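The HF/LF split described above amounts to a simple decision rule over two questionnaire answers; a minimal sketch (Python, with hypothetical field and category names) might look like this:

```python
# Frequencies counted as "uses the tablet to some extent" (question 4)
FREQUENT = {"sometimes", "often", "always"}

def familiarity_group(has_tablet, frequency):
    """Assign HF (higher familiarity) if the child has a tablet at home and
    uses it at least 'sometimes'; LF (lower familiarity) otherwise."""
    return "HF" if has_tablet and frequency in FREQUENT else "LF"

# Hypothetical answers: (has_tablet from question 1, frequency from question 4)
answers = [(True, "often"), (True, "rarely"), (False, None), (True, "sometimes")]
groups = [familiarity_group(t, f) for t, f in answers]
# -> ["HF", "LF", "LF", "HF"]
```

Note that children without a tablet fall into LF regardless of the frequency answer, matching the grouping criterion described in the text.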

Table 3 Results from the questionnaire

Data analyses

Inter-coder agreement between the first (second author) and second coder (first author) on individual GMP scores from VPAs was: 83.3% for speed (Cohen’s kappa = 0.822, 95% CI: 0.641–1.000), 83.3% for letter forming (Cohen’s kappa = 0.740, 95% CI: 0.489–0.992), 66.7% for letter alignment (Cohen’s kappa = 0.622, 95% CI: 0.385–0.860), 27.0% for letter distortions/interrupted overlapping joins (Cohen’s kappa = 0.228, 95% CI: 0.032–0.424),Footnote 12 77.8% for ambiguous letters (Cohen’s kappa = 0.619), 88.9% for unrecognizable letters (Cohen’s kappa = 0.684; 95% CI: 0.293–1.000), 72.2% for max amplitude of letter misalignment (Cohen’s kappa = 0.660; 95% CI: 0.411–0.909), 72.2% for max variation in size of medium letters (Cohen’s kappa = 0.647, 95% CI: 0.387–0.907) and 55.6% for max variation in size of ascending/descending letters (Cohen’s kappa = 0.495; 95% CI: 0.231–0.759). Consistency between the scoring of SBAs and VPAs for letter sequencing, as well as for GMPs requiring human–machine interaction (GMPs 2, 4, 5, 6), was guaranteed by having the same expert coder (second author) score both data sets (160 child texts), while consistency between machine and human coding for all other GMPs automatically coded in the SBA (i.e., GMPs 1, 3, 7, 8, 9) was documented in a previous study (see Provenzale et al., 2023).
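The two agreement statistics reported above (percent agreement and Cohen’s kappa) can be computed as follows. This is a minimal sketch for illustration only; the example score lists are hypothetical, not the study’s data.

```python
from collections import Counter

def percent_agreement(coder1, coder2):
    """Proportion of items on which two coders assigned the same score."""
    return sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)

def cohen_kappa(coder1, coder2):
    """Cohen's kappa: agreement between two coders corrected for the
    agreement expected by chance, (p_o - p_e) / (1 - p_e)."""
    n = len(coder1)
    p_o = percent_agreement(coder1, coder2)
    m1, m2 = Counter(coder1), Counter(coder2)
    # Chance agreement from the two coders' marginal score distributions
    p_e = sum((m1[c] / n) * (m2[c] / n) for c in set(coder1) | set(coder2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical error counts assigned by two coders to nine texts
c1 = [0, 1, 2, 0, 1, 1, 2, 0, 3]
c2 = [0, 1, 2, 1, 1, 1, 2, 0, 3]
```

Note that kappa is systematically lower than raw percent agreement (e.g., 27.0% agreement vs. kappa = 0.228 for GMP 4 above), since it discounts matches expected by chance.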

The normality of the distribution of GMP scores was tested using the Shapiro–Wilk test, and for some GMPs the null hypothesis was rejected. Therefore, Spearman’s rank correlation was used to verify the presence/absence of significant correlations between GMP scores from SBAs and VPAs, while the Wilcoxon signed-rank test was used to compare GMP scores from the two assessment conditions (SBA and VPA). Questionnaire answers were calculated as percentages and reported in Table 3. For the Familiarity sample, the Shapiro–Wilk test and Levene’s test showed normality of distribution and homogeneity of variance in handwriting speed (GMP 1) measured in SBAs (for both conditions). Therefore, to explore the presence/absence of an association between comparatively higher practice of tablet use at home and handwriting speed as measured by SBAs, two separate one-way ANOVAs, one for each condition (Best and Fast), were performed comparing HF sample and LF sample performance.
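The analysis pipeline just described can be sketched with standard SciPy routines. The data below are synthetic stand-ins (the real GMP scores are not reproduced here), and the sample sizes mirror those reported in the text (n = 40 children; 13 HF vs. 17 LF).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sba = rng.normal(10, 2, 40)          # hypothetical SBA scores for one GMP
vpa = sba + rng.normal(0, 1.5, 40)   # correlated hypothetical VPA scores

# 1. Normality check; where rejected, rank-based tests are used instead
_, p_norm = stats.shapiro(sba)

# 2. Association between assessment modalities (Spearman's rank correlation)
rho, p_rho = stats.spearmanr(sba, vpa)

# 3. Paired comparison of SBA vs. VPA scores (Wilcoxon signed-rank test)
w_stat, p_w = stats.wilcoxon(sba, vpa)

# 4. One-way ANOVA comparing HF vs. LF samples on handwriting speed
hf = rng.normal(1.2, 0.2, 13)        # hypothetical letters/second, HF sample
lf = rng.normal(1.0, 0.2, 17)        # hypothetical letters/second, LF sample
f_stat, p_f = stats.f_oneway(hf, lf)
```

In the study this pipeline was run once per GMP and condition; the sketch shows a single GMP for brevity.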

Results

Legibility assessments

Mean scores on the 9 GMPs measured using SBAs and VPAs in the two writing conditions (Best and Fast) are shown in Table 4. Spearman’s rank correlations between GMP scores from SBAs and VPAs showed moderate positive correlations for speed (r(38) = .55, p < .001), letter forming (r(38) = .42, p = .007), letter alignment (r(38) = .55, p < .001), ambiguous letters (r(38) = .64, p < .001), max variation in size of medium letters (r(38) = .49, p = .002) and max variation in size of ascending/descending letters (r(38) = .48, p = .002) in the Best condition (see Table 4). In the Fast condition, moderate positive correlations emerged for speed (r(38) = .71, p < .001), letter forming (r(38) = .41, p = .007), letter alignment (r(38) = .48, p = .002), ambiguous letters (r(38) = .54, p < .001) and max variation in size of medium letters (r(38) = .56, p < .001), and a weak positive correlation was present for max amplitude of letter misalignment (r(38) = .33, p = .041) (Table 4). Wilcoxon signed-rank tests evaluating the presence of significant differences between the 9 GMP scores obtained from SBAs and VPAs showed significant differences in the Best condition for: letter forming (Z = − 4.789, p < .001), letter alignment (Z = − 4.037, p < .001), letter distortions, interrupted/overlapping joins (Z = − 5.513, p < .001), ambiguous letters (Z = − 2.542, p = .011), unrecognizable letters (Z = − 2.632, p = .008), max amplitude of letter misalignment (Z = − 4.181, p < .001), max variation in size of medium letters (Z = − 4.154, p < .001) and max variation in size of ascending/descending letters (Z = − 4.234, p < .001) (Fig. 3A–C).
In the Fast condition, significant differences between GMP scores emerged for speed (Z = − 3.992, p < .001), letter forming (Z = − 4.962, p < .001), letter alignment (Z = − 4.064, p < .001), letter distortions, interrupted/overlapping joins (Z = − 5.516, p < .001), ambiguous letters (Z = − 3.368, p = .001), max amplitude of letter misalignment (Z = − 3.744, p < .001), max variation in size of medium letters (Z = − 4.543, p < .001) and max variation in size of ascending/descending letters (Z = − 3.804, p < .001) (Fig. 3A–C).

Table 4 Mean scores on grapho-motor parameters (GMPs) and Spearman’s rank correlations between mean GMP scores, in the Best and the Fast condition for the screen-based assessment (SBA) and the validated paper-based assessment (VPA)
Fig. 3
figure 3

Wilcoxon signed-rank tests evaluating differences between GMP scores from SBAs and VPAs, reported as proportion of letters per second (A), as mean millimetres (B) and as number of errors (C), in both conditions (Best and Fast). Numbers refer to GMPs as listed in Tables 2 and 4. Significant differences are indicated by asterisks (**p < .01; ***p < .001)

Questionnaire assessment

Answers to the questionnaire are reported as percentages in Table 3 and discussed below. One-way ANOVAs assessing differences between the HF and LF samples on handwriting speed in SBAs showed a significant difference in the Fast condition (F(1,28) = 4.442, p = .044), but not in the Best condition (F(1,28) = 2.442, p = .129) (Fig. 4).

Fig. 4
figure 4

One-way ANOVAs evaluating differences between the LF and HF samples on handwriting speed as measured in both conditions (Best and Fast) using SBAs. The significant difference is indicated by an asterisk (*p < .05)

Discussion

This study investigated the effectiveness of screen-based assessments (SBAs) of handwriting legibility in primary school children as compared to traditional validated paper-based assessments (VPAs). In particular, children’s scores on individual grapho-motor parameters (GMPs) assessing legibility in cursive handwriting obtained from SBAs were compared to corresponding scores from VPAs in two writing conditions: writing slowly in one’s best handwriting (Best condition) or as fast as possible while maintaining legibility (Fast condition). We also explored, in a sub-group of children, whether greater familiarity and practice with screen-based tools would be associated with higher handwriting speed in SBAs in both conditions.

Before each assessment (SBA and VPA) children were allowed to choose freely, among four paper formats (see above), the format that they were most accustomed to. With the exception of one child, all children proved consistent in their choices and were able to recognize and choose in both conditions the paper format that they commonly used in class (i.e., second-graders consistently chose second grade lines, while third-graders consistently chose third grade lines in both conditions) (see Fig. 1 for an example of third grade lines use). The one exception was a third-grade child who chose third grade lines in the VPA and the comparatively easier second grade lines in the SBA. However, given that choosing a comparatively easier paper format in the SBA did not prove to affect overall performance in this assessment (i.e., the child’s scores were within group means), we chose to include this child in the final sample as an occurrence of a viable behaviour, also to avoid reducing sample size. Such behaviour may arise because children are not always able to recognize their preferred paper format when it is presented on a screen, and may instead opt for an easier layout in a less familiar task such as handwriting on a tablet. This interpretation is consistent with studies suggesting that digital media provide fewer material anchors—that is, reduced tactile and spatial cues supporting perception and memory—compared to paper, leading to altered perceptual affordances when interacting with on-screen materials (Schilhab et al., 2018).

Legibility assessments

Our first step was to assess the presence of significant correlations between the 9 GMP scores from SBAs and VPAs in both conditions (Best and Fast). Results showed significant correlations for multiple GMPs in both conditions (significant correlations were present for six GMPs in each of the Best and Fast conditions), while only two parameters (GMPs 4 and 6) showed no correlation in either condition (Table 4). For letter distortions, interrupted/overlapping joins (GMP 4), we cannot rule out that this result may be due to the software used in scoring SBAs. In fact, previous studies comparing scoring methods (software vs. human) show that similar software solutions are able to detect comparatively more errors for letter joins and trace direction (GMP 4) (Provenzale et al., 2022, 2023). This is understandable, given that during scoring the software provides enlarged images of letters (Fig. 2B), which make this error type comparatively more visible and easier to detect. Therefore, the lack of correlation for this GMP may, at least in part, be ascribed to the SBA’s scoring. We also wish to underscore that VPA coding of GMP 4 led to the lowest inter-rater agreement scoresFootnote 13 (see above), supporting on one side the hypothesis that human coders may experience significant difficulties in scoring this GMP, but also suggesting that caution must be exercised in interpreting this result, which certainly requires further research to be better understood. As for unrecognizable letters (GMP 6), the absence of a significant correlation may be due to low error rates on this GMP in our sample (see Table 4 and Fig. 3C).Footnote 14 A viable cause may be that unrecognizable letters are more easily found in younger children (first graders), who are in the process of learning letter shapes, or in older children (fifth graders) who, by developing a personal style, may change letter shapes making them less recognizable (Hamstra-Bletz & Blöte, 1990).
Overall, the presence of significant correlations for multiple GMPs may be considered a promising result, suggesting that, with some exceptions, SBAs may be able to outline similar patterns of strengths and weaknesses in GMPs. Previous studies suggest that some GMPs can be harder to tackle than others for children (Hamstra-Bletz & Blöte, 1990). For example, data on DGM-P test scores show that letter alignment and max variation in size of ascending/descending letters (GMPs 3 and 9) result in higher error rates compared to ambiguous letters and max variation in size of medium letters (GMPs 5 and 8) (Sparaci et al., 2024). A similar pattern was found in our VPAs, but, more importantly, also in the SBAs (Table 4), supporting use of the latter as a viable resource to measure some GMPs. It is also worth noting that all significant correlations between GMP scores obtained from SBAs and VPAs were positive, although their magnitude ranged from weak to moderate. This pattern indicates that, despite systematic differences in absolute scores between the two assessment methods, children who performed relatively better or worse on a given GMP in the paper-based assessment tended to show comparable relative performance in the screen-based assessment. In this sense, the observed correlations support the association of individual GMP scores across assessment modalities. The presence of weak-to-moderate correlations is not unexpected, given that SBAs and VPAs differ in the tools used (screen vs. paper) as well as in the scoring procedures (software-assisted vs. fully human coding).

To better understand differences between SBAs and VPAs, our next step was to compare scores obtained on the 9 GMPs in both assessments. Results showed that, with minor exceptions,Footnote 15 SBAs always resulted in higher error rates (Fig. 3). For some GMPs (GMPs 2, 3, 4), based on previous studies, we can hypothesize that differences may be ascribed to the SBA scoring system (as stated above). But for the other parameters (GMPs 1, 5, 6, 7, 8, 9) this explanation is less viable, given that previous studies comparing scoring systems (software vs. human) detected no differences (Provenzale et al., 2022, 2023). Consequently, for speed (GMP 1), letter shape (GMPs 5 and 6), letter alignment (GMP 7) and variations in letter size (GMPs 8 and 9), a viable alternative is to hypothesize that the higher error rates in SBAs may be due to screen-based handwriting itself. This hypothesis is partially supported by previous studies suggesting that writing on a screen with a stylus puts higher demands on motor control, leading to motor adjustments, lower relative speed and larger or longer letter traces (Alamargot & Morin, 2015; Gerth et al., 2016b; Guilbert et al., 2019). In particular, it is interesting to note that children in our sample showed more variation in letter size when writing on a screen in both conditions (Fig. 3B), a phenomenon paralleled by lower handwriting speed in both conditions (Fig. 3A). Differently from other studies, which did not control for handwriting conditions (Gerth et al., 2016b), our data also showed that the difference in handwriting speed was significant only in the Fast condition. Possibly this happened because, when children were explicitly required to write in their best handwriting, they tended to slow down and be more conscious of handwriting quality even on paper, resulting in comparatively similar speeds.
However, these are mostly working hypotheses, and further studies will be needed to fully disentangle effects on GMP scores that are ascribable to the scoring system (software vs. human) from those that are due to the tools used (screen and stylus vs. paper and pen). Overall, it is important to note that screen-based handwriting seems to have an effect on children’s handwriting, as measured by individual GMPs, which extends beyond speed and letter size. Importantly, the coexistence of significant correlations and systematic differences suggests that SBAs may capture the same underlying grapho-motor constructs as VPAs, while remaining more sensitive to certain error types, leading to higher error rates. Taken together, results from the correlations and the comparison of GMP scores may be interpreted as evidence of convergent validity at the level of individual performance patterns, but also of a lack of interchangeability between SBA and VPA scores. These new data suggest that SBAs of legibility in primary school, while promising, should not be taken lightly. In particular, for this new technology to be efficiently used, as government agencies and researchers increasingly suggest (Danna et al., 2023; Istituto Superiore di Sanità, ISS, 2022; Philip et al., 2023), new normative data are needed. In other words, if we want to exploit the potential benefits of these new tools, we need to invest further time and resources in providing new normative data, given that population means available from VPAs will not apply.

Questionnaire assessment

Questionnaire data showed that while smartphones are commonly present in children’s homes, computers and tablets are not, and that even when they are present, they are less used. In particular, it is important to acknowledge that, out of the 30 children who were administered the questionnaire, 23 had a tablet at home, but only 13 stated that they used it (i.e., sometimes/often/always, see Table 3). Some children spontaneously volunteered explanations, saying that the tablet belonged to an adult (father or sister) and/or that they were not allowed access to it because it was considered fragile and/or costly (one child described accidentally breaking a tablet screen at home, after which he was no longer allowed to use it). Smartphones were not only more present, but also more accessible, with all children stating the presence of at least one smartphone at home (often more than one) and only 7 children declaring that they never/rarely used them (these children explicitly explaining that caregivers limited their smartphone use) (Table 3). We did not assess what children actually used their tablets for, as previous studies suggest that tablets and smartphones are mostly used by children for internet browsing and watching videos online (Radesky et al., 2020). But we did ask whether children who had a tablet were familiar with a stylus, setting aside the type of activity that this may be used for (e.g., writing, drawing). As expected, we found that stylus use was extremely rare (only 3 children in our sample) (Table 3). This is in line with previous studies suggesting that even when tablets are present in the home, they are rarely used for handwriting and/or drawing, but are rather employed to watch videos or play games (Couse & Chen, 2010). These results, even if obtained on a limited sample, suggest that the introduction of SBAs will require an evaluation of tool familiarity and practice, possibly developing appropriate training strategies.

To explore the impact of familiarity and practice with tablets on a specific GMP (i.e., handwriting speed) as measured in the SBA, children completing the questionnaire were subdivided into two samples (i.e., with comparatively higher or lower tool practice; see HF and LF samples in Table 1) and their performance was compared. We expected children in the HF sample, who had comparatively more occasions to experience and practice tool characteristics, to produce more letters per second; this prediction was confirmed, although the difference reached significance only in the Fast condition (Fig. 4). This result is in line with previous studies showing that practice with screen-based technologies may lead to more fluent handwriting (i.e., lower NIV) and higher handwriting speed (Gerth et al., 2016b). This preliminary result also suggests that even minimal familiarity (i.e., we included in the HF group even 4 children who responded ‘sometimes’ to the question on frequency of tablet use at home) may mitigate the challenges posed by these novel tools, especially when children are asked to write fast (as shown by the significant difference in the Fast condition). However, these data are purely exploratory, and further studies are needed to fully understand the impact of tool familiarity on SBAs of handwriting skills in childhood, possibly considering larger samples and/or more fine-grained questionnaires that do not rely on children’s self-reported perceptions, to avoid the risk of subjective bias. Furthermore, we cannot rule out that the higher handwriting speed found in the HF sample may have been due to other variables (e.g., children in this group may have come from educationally supportive home environments that allowed tablet use as well as other learning activities). In reporting these data, we are therefore attempting to point out the relevance of tool familiarity for SBAs, rather than proposing conclusive data on this issue.

Finally, our preliminary exploration of children’s appreciation of screen and stylus use showed that approximately 80% of children in the Familiarity sample found using this technology more fun than using regular pen and paper. This was not surprising, given the novelty effect and considering data from previous research suggesting appreciation of this technology in some student samples (Hammer et al., 2021). But it is interesting to note that on average 43% of these children also declared that using the screen and stylus made handwriting more difficult (Table 3), suggesting that children perceived the difficulties documented by the higher error rates reported above. Moreover, while 70% of these children would be willing to use screen-based technologies in school, only 50% would readily use them for homework. This may be due to a general lack of interest in or appreciation of homework, or because children perceived the new tool to be more difficult and therefore better suited to contexts where they could count on adult assistance (i.e., relying on teachers’ help in class). However, we think that these explorative data highlight the importance of considering children’s perception of and willingness to use screen-based technologies in future studies, as well as their familiarity with these new tools, to better understand performance outcomes.

Limitations

The present study has multiple limitations. First, given that SBAs and VPAs differ in scoring systems (software vs. human) as well as in tools used (screen and stylus vs. pen and paper), our data, while detecting differences in GMP scores, do not allow us to fully disentangle whether these differences are due to one or the other. Given that previous studies directly compared only scoring systems (software vs. human) (Provenzale et al., 2022, 2023), future studies may consider direct comparisons of tools (screen vs. paper) to better clarify this point. Second, some GMPs led to low or moderate inter-rater agreement in the VPA coding (i.e., GMPs 4 and 9), suggesting difficulties in achieving inter-coder agreement that have already been documented in the literature (Borean et al., 2012) but should be further explored. Third, while the sample size was comparable to other studies, future research may benefit from larger samples, possibly allowing the use of SBAs at different ages/school grades; this would also allow a better understanding of the impact of different paper formats on children’s performance. Fourth, this study only addressed handwriting legibility parameters, without analyzing the fluency parameters provided by SBAs, which will be the object of a future study. Finally, the selected task, replicating the procedures of the DGM-P test, only asked children to copy a phrase, so we are unable to evaluate the effects of SBAs when children are confronted with the production of longer texts.

Notwithstanding these limitations, we think that the data presented in this work will be relevant for future studies on the use of screen-based technologies for handwriting assessments in childhood. First, they support the viability of SBAs, as highlighted by correlations with VPAs in some, if not all, of the GMPs considered. Second, they suggest that some caution must be exercised in introducing SBAs, as new normative data will be needed, given that SBAs lead to comparatively higher error rates. Finally, they suggest the relevance of building tools to assess tool familiarity and practice in children, as well as their perception of novel tools. We think that in the future screen-based assessments of GMPs in primary school children may be useful for educators and occupational therapists, who commonly use these parameters to better understand a child’s profile of strengths and weaknesses. Given that screen-based handwriting appears both challenging and enjoyable to children, we are confident that future studies may lead to a more conscious exploitation of SBAs, supporting the acquisition of the kinetic melody of handwriting in childhood.