
With the increasing prevalence of developmental disabilities (DDs) among young children worldwide [1-3], early identification of DDs has become even more critical, as it allows appropriate interventions to improve the developmental trajectory of children with DDs [4-6]. However, due to the high cost and insufficient number of specialists [2,7], access to developmental confirmatory tests is limited, which delays the diagnosis of DDs and causes missed opportunities for intervention [8]. As a result, developmental screening tests have emerged as part of an accurate and cost-effective health management system for young children at the national level, with the expectation of a better prognosis for children at risk of DDs [9]. Many screening tests, such as the Denver Developmental Screening Test, the Ages and Stages Questionnaires (ASQ), and the Parents’ Evaluation of Developmental Status (PEDS), have been developed and implemented [10-12], and parent-administered screening tests have become more widely used, in line with the recommendation of the American Academy of Pediatrics [13]. One of the most widely used developmental screening tests is the ASQ, which is designed for periodic administration to infants and children under six years old [12]. Since its development in the 1970s in the US [14], it has been translated into dozens of languages and used globally in both clinical and experimental settings [15,16]. Despite the broad use of developmental screening tests like the ASQ, however, some researchers question their effectiveness, as they have shown only moderate accuracy in previous studies. Sheldrick et al. [17] compared the accuracy of three parent-performed developmental screening tests—ASQ, PEDS, and the Survey of Well-being of Young Children (SWYC)—among a total of 642 children, ages 0 to 66 months.
While specificity estimates exceeded 70% for all three screening tests, sensitivity was only low to moderate, ranging from 23.5% for the ASQ in the younger group to 61.8% for the PEDS in the older group [17]. Another study, which analyzed nationwide population-based data, likewise revealed moderate sensitivity for the Korean version of the ASQ and the Korean Developmental Screening Test for Infants and Children (K-DST)—64.1% and 44.4%, respectively, at their lowest [18].
One suspected cause is the reliability of parents as test administrators. In the study by Sheldrick et al. [17], 298 (20.5%) children screened positive on the ASQ, while 422 (29.0%) and 127 (8.8%) children screened positive on the PEDS and the SWYC, respectively; the pairwise concordance rates among the three screening tests ranged from 35% to 60%. In another study that examined agreement between the ASQ and the PEDS, 33% (20/60) of the test results disagreed; among children who received a positive result on one or both tests, 69% (20/29) of the results disagreed [19].
According to survey results in a policy research report that investigated difficulties in administering the K-DST, 40.2% (125/311) of parents had difficulty answering the questionnaires, mainly because they had not observed their child’s performance (86, 27.7%) or found the questions confusing or difficult (36, 11.6%) [20]. In addition, 35 of 85 (43.2%) professionals reported that parents could not understand the meaning of the questions and needed additional explanations [20]. Despite these reported difficulties, most screening tests give parents little administration guidance [21-23]. The K-DST user’s guide devotes only half a page to parental administration, instructing parents only to answer a question after letting the child perform the task if they are unsure of the child’s capability, and to mark “can do it” if the child clearly shows the ability even without a witnessed performance [21]. Other screening tests are not very different—the ASQ provides a 4-page flyer for parents with general instructions about the screening process along with its purpose and expected benefits [23], and the PEDS provides guidelines not for parents but only for professionals, to help score the results [24]. None of these guidelines provides scoring criteria for the scale, which may compromise diagnostic accuracy and delay the identification of children with DDs, especially those with mild developmental delay [17].
Therefore, this study aimed to 1) develop easily understandable guidelines that can help parents accurately administer a parent-performed developmental screening test and 2) evaluate the subjective usefulness of the developed guidelines.
We developed the guidelines based on the K-DST. The K-DST is a parent-performed broadband screener that has been used since 2014 for all children under seven years old in Korea as part of the National Health Screening Program (NHSP) for young children [18,20]. The items of the K-DST are categorized into five developmental domains—gross and fine motor movement, cognition, language, socialization, and self-help—and are designed for periodic administration [18]. Although the K-DST uses a zero-to-three-point scale rather than the zero-to-two-point scale more common in other screening tests, its 335 items are numerous enough to reflect the structural formats and item characteristics of other broadband developmental screening tests.
The initial questionnaire was developed in three steps (Fig. 1). In the first step, the authors reviewed the report by Eun [20], carried out for the Korea Centers for Disease Control and Prevention, which analyzed difficulties in the administration of the K-DST. The report contains the results of two surveys investigating item adequacy of the K-DST: one among 415 parents of young children and the other among 83 experts [20]. The parent survey collected opinions about overall problems, while the expert survey asked about each item individually. In the second step, the authors sorted items identified as difficult to understand or confusing into six categories: 1) items that can be assessed impromptu; 2) items that cannot be assessed impromptu; 3) items that present numbers in the performance criteria, such as “five words” or “ten steps”; 4) items that cannot be administered due to absence of task tools, absence of opportunities, safety concerns, or other reasons; 5) items that are difficult to understand or confusing; and 6) others. Items in the first and second categories were further classified by whether or not the child’s behavior had previously been observed. The sixth category contained items such as the person who administered the test, the location where the test was held, and the length of time spent on administration. The final step generated a list of questionnaire items containing all possible measuring methods for each category, based on the following sources: 1) user’s guidelines for other developmental confirmatory tests [25,26], 2) the clinical experience of developmental assessment professionals, and 3) the actual experience of parents. The first-round questionnaire comprised 33 items, including two open-ended questions. All items included a comment box (Table 1).
We built a panel of 20 experts [27], each with more than 10 years of clinical experience and expertise in pediatric psychiatry, pediatrics, child health nursing, developmental assessment, or special education. Of the 24 experts we initially contacted by email, 20 agreed to participate in the Delphi survey.
We distributed the first-round survey to the expert panel by email. The items in the survey were rated from “very inaccurate” to “very accurate” on a 5-point Likert scale. We calculated the first and third quartiles, the median agreement value (interquartile range=Q3-Q1), the convergence value
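As a rough illustration of these quartile-based Delphi round statistics, the sketch below computes them for a hypothetical set of panel ratings. The ratings array is invented for illustration, and the convergence formula (Q3−Q1)/2 is a common Delphi convention assumed here, not one confirmed by this study.

```python
from statistics import quantiles

# Hypothetical ratings from a 20-member panel for one item,
# on the 5-point scale (1 = very inaccurate ... 5 = very accurate).
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 3, 5, 4, 4, 5, 4, 4, 3, 5]

# quantiles(..., n=4) returns the three quartile cut points.
q1, q2, q3 = quantiles(ratings, n=4)

iqr = q3 - q1              # interquartile range (Q3 - Q1)
convergence = iqr / 2      # a common Delphi convergence measure: (Q3 - Q1) / 2

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}, convergence={convergence}")
```

Smaller interquartile ranges indicate tighter agreement among panelists, which is how consensus is typically judged between Delphi rounds.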
We surveyed parents’ subjective usefulness ratings to investigate the clinical feasibility of the developed guidelines. Parents aged 18 years or older who had administered the K-DST within the past six months were recruited through online communities for parents, and duplicate participation was prevented by requiring ID authentication. We targeted a total of 167 parents, calculated as 15 times the 10 influential factors—age, sex, living area, education level, number of children, age of the child, primary caregiver, location where the test was administered, observation of the child prior to administering the test, and scoring methods—allowing for a 10% attrition rate [28]. In addition to the subjective usefulness of each bullet point in the guidelines, we asked about parents’ socio-demographic factors and their usual methods of administering the K-DST within the six categories. We calculated counts and percentages to descriptively analyze the characteristics of the participants and the survey results. Comments were categorized by content and then qualitatively analyzed. Microsoft Excel® software (2019; Microsoft Corp., Redmond, WA, USA) was used for statistical analysis.
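One way to reproduce the target sample size of 167 is sketched below. It assumes the attrition adjustment divides the base sample by (1 − attrition rate), so that 10% dropout still leaves 150 participants; the paper does not state which adjustment convention was used, so this is an inference.

```python
import math

# 15 participants per influential factor, 10 factors.
factors = 10
per_factor = 15
base_n = factors * per_factor  # 150

# Inflate for an expected 10% attrition: after losing 10% of target_n,
# roughly base_n participants should remain.
attrition = 0.10
target_n = math.ceil(base_n / (1 - attrition))

print(target_n)  # 167
```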
This study was conducted after approval of the Institutional Review Board of Seoul National University (IRB No. 2007/002-002).
We obtained written informed consent from the expert panel members for the Delphi survey and online informed consent from the parent participants for the online survey prior to their participation.
The expert panel consisted of 20 panelists, four from each specialized field, and they all participated in both rounds. Of the 33 items, including the two open-ended questions, 14 items reached consensus during the first round. Based on the results and the comments from round one, we merged two items and modified four items. Therefore, 32 items were included in the second-round questionnaire, and they all reached consensus after round two. We selected the items with the highest accuracy score in the six categories and modified them for better readability and understandability for parents (Fig. 2). The finalized parent guidelines were approved by the panelists.
A total of 167 parents of young children participated in the online survey investigating the subjective usefulness of the developed guidelines. Among the participants, 132 (80.5%) were in their thirties, 157 (94.0%) were mothers, and 127 (76.0%) had a bachelor’s degree (Table 2). Regarding agreement between the parents’ usual measurement methods and the guidelines, the majority of participants answered that 1) the primary caregiver of the child administered the test (95.8%), 2) they conducted the assessment at home prior to the appointment at the clinic (77.8%), and 3) they scored the child’s performance according to proficiency for items in the socialization and self-help domains when the observation was made (73.1%) (Table 3). However, only a small number of parents answered that 1) they observed the child for seven days prior to administering the assessment (25, 15.0%), 2) they left items blank and assessed them in consultation with a physician when the items were difficult to understand (26, 15.6%), and 3) they scored the child’s performance according to the number of executions for items in the motor movement, cognition, and language domains when observations were made (29, 17.4%). Only half of the parents (51.5%) scored 0 points when the child performed below the numeric criteria. Regarding the subjective usefulness of the overall guidelines, 67.7% (113) of the parents found them useful. In a more detailed analysis, instructions recommending a scoring strategy different from the parents’ previous measurement methods tended to be regarded as more useful.
The instructions for items that are difficult to score due to a lack of assessment tools or inadequate understanding, for which agreement between the guidelines and the parents’ previous methods was around 20%, showed the largest proportion of parents (64.1%) answering “useful.” The results also revealed a substantial gap between the number of parents who answered “useful” and those who answered “unuseful” for each bullet point, with the former outnumbering the latter by four to nine times. Additional comments were categorized as satisfaction with the guidelines (101, 60.5%), K-DST/NHSP-related comments (50, 30.0%), and others (16, 9.6%) (Supplementary Table 1 in the online-only Data Supplement). Among the satisfaction-related comments, specific standards for test administration (32, 31.4%) was the reason most frequently mentioned in both satisfied and dissatisfied responses. Other parents were satisfied with the simplicity of the guidelines, the information about the length of the observation period, and the option of answering a question in consultation with a doctor. A strict scoring method, such as scoring 0 points for a child’s performance below the standard, and insufficient simplicity of the guidelines were reasons for dissatisfaction. Among the comments about the K-DST or the NHSP, 78% were related to difficulties in administering the test.
The purpose of this study was to develop guidelines that are easy to understand yet comprehensive enough to cover the difficulties parents face when administering screening tests. The final version of the guidelines provided sufficient information for answering the questions of the screening tests, from “who” should administer the tests and “where,” to “how” to score the child’s performance on the zero-to-three-point scale. Allowing parents to answer the questions based on objective evidence as much as possible was the priority of these guidelines, so specific instructions are stated according to the characteristics of the items and the developmental domains they measure. Although these guidelines were developed based on the K-DST, their usability can extend to other screeners because of the structural formats and item characteristics of the K-DST.
These instructions exhibited high similarity to the guidelines for the Bayley Scales of Infant and Toddler Development (BSITD), which is widely used as a “gold standard” test to measure developmental status. According to the administration manual of the BSITD [29], each of its 326 items includes a detailed instruction for administration, in addition to general instructions for the length of administration time, the number of trials, the time-measuring method, and so on. A pink three-piece jigsaw puzzle item, for example, includes guidance to provide only one opportunity with a time limit of 180 seconds, measuring the child’s performance by proficiency. In another case, the BSITD offers guidance for items whose performance criteria include numbers: to answer “yes” only when the child performs the exact number stated in the criteria. These measurements were also included in the developed guidelines. Although the scoring criteria of the BSITD differ from those of our guidelines—the BSITD includes only yes-or-no questions—most of the contents of our guidelines give instructions similar to those of the BSITD.
The rates of agreement between the guidelines and the parents’ usual measurement methods showed clear deviations based on the characteristics of the measurements, reflecting the challenges of accurately administering the screening tests. According to the survey results, measurements using subjective evidence, such as scoring the child’s performance based on proficiency, had higher agreement rates than those using objective evidence, such as scoring the performance based on the number of successful executions. In the case of items that include numbers, half of the parents gave points even when the child did not satisfy the exact numbers in the items, and 5% of those who wrote additional comments about satisfaction with the guidelines thought that scoring 0 points for a child’s performance below the standard was too strict. This may lead to an overestimation of a child’s abilities and an increase in false negatives. In fact, a previous study revealed that clinicians lacked trust in parents as administrators of developmental screening tests because of parents’ overestimation of their child’s abilities and their inadequate knowledge of development [30].
The strength of these guidelines is that they provide instructions based on the characteristics of the questions, so they can be applied generally to other developmental screening tests. They are simple to understand and easy to practice, and they can be used by parents when administering a developmental screening test with their children in a clinical setting. The usefulness survey showed a pattern in which the measurements with lower agreement rates drew higher percentages of parents answering “useful.” The measurements for items difficult to score, for example, due to not understanding the meaning or lacking an opportunity to administer them for various reasons, yielded the highest percentage of parents who considered the instructions useful, while about 80% of parents appeared to answer these items without consulting a physician. Additional comments from the survey showed high satisfaction with the usefulness of the guidelines, the most frequent reason being the more specific instructions for test administration, which indicates that these guidelines met parents’ needs. Overall, the percentage of parents who answered “useful” for each instruction and for the guidelines as a whole outweighed the percentage of those who answered “unuseful.”
A limitation of this study is the limited generalizability of its participants. Despite the typical size of a Delphi panel [31], only four experts in each of the five expertise areas were included, which may limit the representativeness of each field. For the usefulness survey, participants were recruited from online communities, so they may not represent the characteristics of the target population.
To the best of our knowledge, this is the first study to develop parent guidelines for the administration, not the interpretation, of screening tests, and the first study to use the Delphi technique to develop the guidelines. Findings from the usefulness survey reflected the parents’ needs for more specific scoring standards. Further studies are needed to evaluate the effectiveness of the guidelines in terms of the accuracy of test administration of parents and the diagnostic accuracy of the tests.
The online-only Data Supplement is available with this article at https://doi.org/10.5765/jkacap.230002
All data generated or analyzed during the study are included in this published article (and its supplementary information files).
The authors have no potential conflicts of interest to disclose.
Conceptualization: all authors. Data curation: Sung Sil Rah. Formal analysis: Sung Sil Rah. Funding acquisition: Sung Sil Rah. Investigation: Sung Sil Rah. Methodology: all authors. Project administration: all authors. Resources: all authors. Supervision: Soon-Beom Hong, Ju Young Yoon. Validation: all authors. Visualization: Sung Sil Rah. Writing—original draft: Sung Sil Rah. Writing—review & editing: all authors.
This study was supported by The Health Fellowship Foundation of Korea.