Test Development Proposal


This paper addresses the construct of cognitive ability. It defines and explains the terms ability and cognitive processes, as well as the construct of cognitive ability itself. The paper presents and examines five popular instruments used to measure cognitive ability: the Wonderlic Cognitive Ability Test, the Woodcock-Johnson Tests of Cognitive Abilities (WJ-III), the Wechsler Intelligence Scale for Children (WISC-IV), the Differential Ability Scale-2 (DAS-2), and the Kaufman Assessment Battery for Children (KABC-II). The paper concludes by considering the prospects for developing a new assessment instrument that would address the gaps, identified through the literature review, in the effectiveness of the prevailing cognitive ability tests.


 Literature Review


The paper will first introduce the construct “Cognitive Ability”. It will define it and present related terms. The paper will then give an in-depth analysis and review of five assessment tools that have been identified as important in measuring the cognitive abilities of people who are differentiated by various individual or group characteristics. To achieve this, the paper will review empirical literature in the field of psychology that touches on the cognitive ability construct. The paper will also endeavor to show that there is a gap, in that no single assessment tool effectively addresses all aspects of cognitive ability, and by extension to suggest an approach through which a new assessment tool can be developed to fill this gap.

A discussion of cognitive ability

From the psychometric perspective, an ability is defined as a developed skill, competence, or power to do something, especially an existing capacity to perform some function, whether physical, mental, or a combination of the two, without further education or training. An ability is a directly observable skill that can be judged in terms of level of proficiency, that is stable over time, and that is consistently displayed across varying opportunities to perform the behavior. At its most basic level, an ability may be viewed as the consistent performance of a discrete behavior in appropriate contexts, for example saying the word “No” in response to a question, or writing the letter “X” when asked to do so. However, under different conditions of task difficulty, individuals typically vary considerably in their abilities, and these individual differences are the foundation of ability measurement. From the structure provided by factor analysis and accompanying factor levels, the underlying sources of these individual differences in abilities can be inferred.

As an extension of this definition, cognitive ability is defined as any ability that concerns some class of tasks in which correct or appropriate processing of mental information is critical to successful performance (Harrison & Flanagan, 2012). It is an individual’s entire repertoire of learned skills, knowledge, and learning sets, as well as generalization tendencies thought of as intellectual in nature, that exists and is available at any one period of time.

Cognitive processes are inferred from the performance of individuals. Information processes have been defined as hypothetical constructs used by cognitive theorists to describe how persons apprehend, discriminate, select, and attend to certain aspects of the vast welter of stimuli that affect the sensorium, in order to develop internal representations that can be mentally manipulated, transformed, related to previous internal representations, stored in memory, and later retrieved from storage to govern the person’s decisions and behavior in a particular situation (Harrison & Flanagan, 2012). Although individual differences on cognitive ability tests are quantified by examining variability in performance across individuals, an individual’s cognitive processes can be inferred only by examining the item stimuli, the task demands, and the response requirements of these tests.

The Instruments for Measuring Cognitive Ability

Measurements of cognitive abilities have long been used to predict a number of socially important outcomes, such as academic attainment, occupational and social status, job performance, and income, to name a few. Ability testing typically refers to standardized measures of intelligence, aptitude, or achievement. The selection devices most often used in cognitive ability testing are standardized tests designed to measure developed abilities that are influenced, to no small degree, by environmental factors such as formal education. There are several tests for measuring cognitive ability; this review focuses on the Wonderlic Cognitive Ability Test, the WJ-III, the WISC-IV, the DAS-2, and the KABC-II.

The Wonderlic Cognitive Ability Test

The Wonderlic Cognitive Ability Test is administered to evaluate the aptitude of an individual, particularly with regard to problem-solving skills, and is the most widely used standardized test of cognitive ability (Groth-Marnat, 2009). It is recognized as a traditional test of cognitive ability and consists of items that measure vocabulary, reading comprehension, and math. It is a 12-minute test of problem-solving skills, or cognitive ability, given widely to job applicants. The test has been widely used in employee selection and is popularly known for its use in the National Football League draft.

It is a self-administered test in which applicants are directed to read the test instructions and to complete sample questions provided on the front page of the test booklet. The test itself comprises 50 questions and is administered for 12 minutes. Scoring the test is made convenient through the use of a scoring key. The scoring key is constructed so that the answers on the key line up with the answer blanks on the actual test. The test score is the total number of questions answered correctly. The test is designed to measure general mental ability (GMA) and comprises three subsets of items: vocabulary, arithmetic reasoning, and spatial reasoning. When combined, these subsets enable the test to establish a GMA score.
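
The scoring rule described above (one point for each response that matches the key) can be illustrated with a minimal sketch. The function name, the five-item key, and the applicant's responses below are hypothetical and are not taken from any Wonderlic form; an actual form has 50 items.

```python
from typing import Optional, Sequence


def score_test(responses: Sequence[Optional[str]], key: Sequence[str]) -> int:
    """Return the raw score: the number of responses that match the scoring key."""
    if len(responses) != len(key):
        raise ValueError("responses and key must contain the same number of items")
    # An unanswered item is recorded as None and therefore never matches the key.
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Hypothetical five-item key and one applicant's answers (illustration only).
key = ["B", "D", "A", "C", "A"]
responses = ["B", "D", "C", None, "A"]
raw_score = score_test(responses, key)
print(raw_score)
```

Because the score is a simple count of correct answers, wrong answers and omitted items are treated identically in this sketch.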

The reliability of this test has been examined in many studies, with alpha reliabilities ranging from .82 to .94. The alternate-form reliabilities have ranged from .73 to .95. The test is an example of a spiral omnibus measure, with items of various g-related content arranged in order of difficulty. Its correlations with other tests support the construct validity of the test as a measure of GMA. The Wonderlic Cognitive Ability Test is used almost exclusively for applicant screening. However, it is occasionally used as a marker test in GMA research.
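
For readers unfamiliar with alpha reliability, the following is a minimal sketch of how coefficient (Cronbach's) alpha is commonly computed from item-level scores. The response matrix is invented for illustration and is not Wonderlic data; the function name is likewise an assumption.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a matrix with one row per examinee and one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in zip(*item_scores))
    # Sample variance of the examinees' total scores.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical right/wrong (1/0) responses of five examinees to four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.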

A key strength of this test is its accessibility. For visually impaired applicants, two different large-print versions of the test are available, as well as a braille version. An audio version is also available, and all of these special forms of the test come with a supplementary user’s manual. In terms of cross-cultural factors, the test is offered in 11 alternate language forms: Chinese, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tagalog, and Vietnamese. The test is also available in a computerized version called the WTP-PC, which incorporates special diagnostic and reporting capabilities to support testing programs (Hersen, 2004).

One shortcoming of this test comes in the form of legal and ethical considerations. Any measure of GMA shows mean score differences by race. These race differences may require employers to expend effort to show that the tests are unbiased and job related.

The Woodcock-Johnson Tests of Cognitive Abilities (WJ-III)

The Woodcock-Johnson battery, of which the WJ-III Tests of Cognitive Abilities is the third edition, was first developed by Richard Woodcock and Mary Johnson in 1977. The tests allow for a detailed analysis of cognitive abilities and can be administered to individuals from two years of age through older adulthood (Educational Testing Service, 2010). The WJ-III assesses general intellectual ability, distinct cognitive abilities, oral language, and scholastic aptitude, as well as academic achievement, for individuals aged 2 through 90. It measures general and specific cognitive functions. The standard version of the WJ-III-COG involves 10 subtests, which can be supplemented in the extended version by another 10 subtests. The WJ-III uses the Cattell-Horn-Carroll (CHC) model as the foundation for the assessment of cognitive ability (Whiston, 2008).

The WJ-III series of tests differs in two ways from other cognitive assessment tools. It was designed specifically to assess the full range of cognitive abilities according to CHC theory, and it attempts to minimize factorial complexity by assessing specific narrow abilities. In addition, CHC theory is in many ways compatible with current neuropsychological theories about cognitive functioning, and the test interpretation materials include clinical interpretations based on information-processing theory (Davis, 2010). This means that the WJ-III is an excellent source of individual subtests for cognitive hypothesis testing (CHT), despite the fact that it was not designed primarily as a neuropsychological assessment instrument.

The WJ-III Cognitive was standardized on 8,818 people aged from 2 to over 90 years, stratified on age, sex, race and ethnicity, community size, U.S. region, type of school, and parental education and occupational level. Reliability studies indicated high reliability, with the General Intellectual Ability (GIA) score having a median reliability of .97 (Standard) and .98 (Extended) across all ages. The WJ-III Cognitive has extensive evidence of validity, with content coverage of all the major cognitive functions identified in CHC theory, which makes it an excellent source of supplemental tests for CHT. Factor analytic evidence supports the cluster structure. Validity evidence includes concurrent validity indices ranging from .58 to .72 between the GIA (Standard) and the WJ-III Achievement clusters for school-age children, generally exceeding the prediction of those same clusters by the WISC-III FSIQ score (Davis, 2010). Special group studies and bias analyses are also presented in the extensive technical manual. One of the most useful aspects of the examiner’s manual is the section on adapting testing for students with disabilities.

A unique aspect of the WJ-III is the weighting of the GIA according to the g-loading of each test at the individual levels, making it a robust measure of overall cognitive ability if one should be needed (Fiorello & Hale, 2004). However, this can also affect the GIA if a child with a disability performs poorly on one of the highly weighted tests. The test authors have made significant attempts to limit the factorial complexity of the measures, which is both a positive and a negative aspect of the WJ-III. Limiting the factorial complexity gives one a better diagnostic direction and a means to link assessment to intervention data. However, there is a drawback to such specific subtest construction. For instance, the WJ-III may not measure complex language as well as other measures do. This is good if one wants to differentiate language from crystallized abilities, but not if there is a need to examine the integrity of the left hemisphere. Another similar pro-and-con decision involves the spatial relations subtest. In an attempt to avoid contaminating spatial processing with motor skills and speed, a multiple-choice response format is used, which makes the subtest very different from a perceptual organization or production task. Thus, a child with poor spatial skills can deduce the correct answer by using a trial-and-error approach. Finally, there is a tendency for practitioners to interpret these subtests with a “one construct, one subtest” mentality, which does not accurately reflect the cognitive skills required for the subtest (Fiorello & Hale, 2004).
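
The effect of g-weighting on a composite can be illustrated with a minimal sketch. The test names, weights, and scores below are hypothetical and are not the published WJ-III weights, and the actual GIA is derived on the test's own score metric; the sketch only shows why a low score on a heavily weighted test lowers a weighted composite more than the same low score on a lightly weighted test.

```python
def weighted_composite(scores, weights):
    """Weighted mean of subtest scores; weights are normalized by their sum."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same tests")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical g-based weights for three tests (illustration only).
weights = {"test_a": 0.40, "test_b": 0.35, "test_c": 0.25}

# Two hypothetical children, each with one score of 70 and two scores of 100.
low_on_heavy = {"test_a": 70, "test_b": 100, "test_c": 100}
low_on_light = {"test_a": 100, "test_b": 100, "test_c": 70}

composite_heavy = weighted_composite(low_on_heavy, weights)
composite_light = weighted_composite(low_on_light, weights)
print(composite_heavy, composite_light)
```

Under these assumed weights the first child's composite is lower than the second's, even though both have the same unweighted mean.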

In general, the WJ-III provides numerous subtests covering a wide variety of cognitive skills; it is thus a very useful CHT screening tool and source of supplemental tests for CHT evaluations.

The Wechsler Intelligence Scale for Children (WISC-IV)

The Wechsler Intelligence Scale for Children (WISC-IV) was developed by David Wechsler. It consists of 10 subtests: oral language, listening, comprehension, written expression, spelling, pseudoword decoding, word reading, reading comprehension, numerical operations, and mathematics reasoning (Whiston, 2008, p. 180). The test is individually administered to children between the ages of 6 and 16 years and can be completed without the need for reading or writing. Testing takes between 65 and 80 minutes, after which an IQ score is generated that indicates the cognitive abilities of the child (Groth-Marnat, 2009).

The WISC-IV is the most recent version of the cognitive assessment instrument most commonly used in schools. Moving beyond the verbal-performance dichotomy, this version takes into account recent theoretical advances and research findings on cognitive functions and processes to provide a four-factor model. It also provides subtest and supplemental process scores to aid in the interpretation of strengths and weaknesses. The WISC-IV four-factor model has been strengthened to make interpretation at the index score level more useful. There are fewer, but psychometrically stronger, subtests for the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed indices. In addition, several supplemental tests can be used for hypothesis testing, including a new supplemental verbal word reasoning subtest, which is useful in assessing verbal reasoning and abstraction (Fiorello & Hale, 2004).

The WISC-IV was standardized on 2,331 children from ages 6 to 16, stratified on age, sex, race, parental education level, and region of the United States. Reliability studies indicated high reliability, with the FSIQ score reliability averaging .97 across the age levels. Individual subtests had average reliabilities ranging from .70 to .90, and the four index scores have reliabilities ranging from .88 to .94 (Davis, 2010). The WISC-IV provides considerable evidence of validity, including a concurrent validity of .89 with the WISC-III. Studies relating the WISC-IV to measures of achievement, memory, adaptive behavior, and emotional intelligence have also been completed. Factor analyses confirmed the structure of the test across the age groups, with some secondary subtest loadings indicating the continued factorial complexity of this measure.

The WISC-IV remains a psychometrically sound instrument consistent with the Wechsler tradition. Stimuli have been improved and directions simplified to reduce the impact that linguistic deficits could have on understanding tasks, especially for the Processing Speed subtests (Davis, 2010).

However, the WISC-IV has a number of weaknesses. There are a number of important cognitive functions that are indirectly assessed or not measured at all by this test. For instance, there is no good measure of auditory processing or phonemic awareness, although the Working Memory and Verbal Comprehension subtests assess these indirectly. Additionally, most WISC-IV subtests remain factorially complex, making them rich clinically but difficult to interpret at times (Fiorello & Hale, 2004).

In general, the WISC-IV is becoming increasingly useful in clinical practice and more closely aligned with current cognitive and neuropsychological theory.

The Kaufman Assessment Battery for Children (KABC II)

The Kaufman Assessment Battery for Children (KABC-II) is a measure used to determine how information processing takes place in a child. The battery has five scales: sequential, simultaneous, planning, learning, and knowledge (Snowman, McCown, & Biehler, 2012).

It is an individually administered achievement test. The test comes in two forms, the brief form and the comprehensive form, and covers reading, spelling, and mathematics. The comprehensive form is more extensive and was expanded, particularly in relation to reading. The reading composite comprises two subscales: letter and word recognition, and reading comprehension (Kaufman, Fletcher-Janzen, Lichtenberger, & Kaufman, 2005). There are six additional reading-related subtests: phonological awareness, nonsense word decoding, word recognition fluency, decoding fluency, associational fluency, and naming facility. It also includes a math composite that consists of two subscales: math concepts and applications, and math computation. The fourth composite is written language, which includes written expression and spelling. There is also a fifth composite called oral language, which assesses listening comprehension and oral expression. The final score is a comprehensive achievement composite. The comprehensive form of this test is appropriate for individuals ages 4-6 through 25 (Whiston, 2008).

Reliability studies conducted during the standardization process resulted in generally good coefficients, that is, above .80, although some were poor and fell below .70 (Kaufman, Fletcher-Janzen, Lichtenberger, & Kaufman, 2005). In the split-half analysis, Rasch-based ability scores for the odd and even subtest halves were obtained, correlated, and then adjusted using the Spearman-Brown formula. This yielded correlations between .60 and .95 on the core subtests and between .57 and .92 on the supplementary subtests. Nunnally’s formula was used to determine reliability coefficients for the factor indexes, and it yielded coefficients between .81 and .95. For both models, the Mental Processing Index (MPI) and the Fluid-Crystallized Index (FCI) reliabilities fell between .94 and .97 for all age groups except the 3-year age group, for which both were .90 (Davis, 2010).
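
The split-half procedure described above can be sketched as follows. The Spearman-Brown correction for a test of doubled length is r_full = 2r / (1 + r), where r is the correlation between the two halves. The odd-half and even-half scores below are invented for illustration and are not KABC-II data, and raw half scores stand in for the Rasch-based ability scores used in the actual analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2.0):
    """Project reliability for a test lengthened by `factor` (2 = full test from halves)."""
    return factor * r_half / (1 + (factor - 1) * r_half)


# Hypothetical scores of four examinees on the odd-item and even-item halves.
odd_half = [10, 12, 14, 16]
even_half = [11, 12, 15, 16]

r_half = pearson_r(odd_half, even_half)
r_full = spearman_brown(r_half)
print(round(r_half, 3), round(r_full, 3))
```

The correction is needed because each half contains only half the items, so the uncorrected correlation between halves understates the reliability of the full-length subtest.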

As with any standardized cognitive assessment, the KABC-II has its own profile of strengths and weaknesses. Strengths include independent statistical analyses that largely support the factor structure of the KABC-II for the CHC model. It is co-normed with the KTEA-II to provide appropriate assessments of learning disabilities (LDs). It also appears to be a more culturally fair assessment than other measures of cognitive ability (Davis, 2010). The test can also be used to assess students from different populations, such as English language learners, students with speech and language disorders, clinical populations, and individuals with hearing disorders. The KABC-II is a fun and engaging instrument for children, especially those of preschool age, as the pictures are colorful and the manipulatives are attractive.

Although the KABC-II provides two theoretical models for administration and interpretation, most current research has focused on the CHC model. Research has also criticized the KABC-II because of the interchangeability between the Luria and CHC models and the lack of evidence to support one model over the other (Davis, 2010). Additionally, users may be dissatisfied by the lack of information provided to guide treatment planning.

In general, despite these weaknesses, the KABC-II is a very useful addition to the multitude of cognitive assessments available to examiners.

The Differential Ability Scale-2 (DAS-2)

An adaptation of the British Ability Scales, the DAS-2 has been an increasingly popular cognitive assessment measure since its introduction in the United States in 1990. One important feature of the DAS-2 is the separation of verbal ability, non-verbal reasoning ability, and spatial ability tasks into cluster scores for meaningful interpretation (Fiorello & Hale, 2004). Another is the use of diagnostic subtests to assess rote and long-term memory and processing speed. The DAS-2 also has a preschool version.

Several DAS-2 subtests are useful in looking at essential cognitive processes and memory. The DAS-2 includes brief measures of basic achievement skills, but these should be seen as screening measures of word reading, math calculation, and spelling.

The availability of out-of-level testing makes the DAS especially attractive for assessing children with disabilities. A special non-verbal composite is also available, though an assessment that avoids language and crystallized abilities has limited utility in neuropsychological assessment.

The DAS provides extensive evidence of validity, including concurrent validity indices with the WISC-R of .84 to .91 at different ages, and predictive validity indices of .46 to .66 with the Basic Achievement Skills Individual Screener, .40 to .76 with group achievement tests, and .45 with grade point average. Factor analyses confirm the structure of the test, which shows increasing differentiation of abilities over time.

A key strength of the DAS-2 is its use of three main factors at the school-age level; it separates the non-verbal reasoning ability cluster from the spatial ability cluster, thereby de-emphasizing the traditional verbal-nonverbal dichotomy. The inclusion of diagnostic subtests to explore strengths and weaknesses is another key distinction and strength of this test. Furthermore, the technical manual is extremely complete and helpful for designing an appropriate assessment for an individual child, and it also presents information about special group studies and bias analysis. The interpretation section is especially strong, providing a great deal of information about the cognitive processes required for subtest performance; this information makes the DAS-2 very useful for CHT.

The DAS-2 has several drawbacks, necessitating the use of additional hypothesis-testing measures. First, it does not explore the complexity of language processes as well as some other cognitive measures do, and teasing out crystallized knowledge from language functioning is difficult. The assessment of memory is a good idea, but the DAS-2 memory-related subtests are not sufficient for a comprehensive memory assessment. The recall of designs subtest is a measure of spatial skills, but its interpretation is influenced by both visual memory and praxis components. Finally, the DAS-2 does not appear to have an adequate measure of executive function, although the nonverbal reasoning subtests are certainly affected by executive skills (Fiorello & Hale, 2004).

Test Development

Collectively, when their individual strengths are pieced together, these tests of cognitive ability form a strong and comprehensive collection that addresses all the aspects of cognitive ability that need to be measured in a particular case. Based simply on their collective strength, therefore, there is no need for any new instrument for assessing cognitive abilities.

However, taken individually, each of these tests has weaknesses serious enough to make its use on its own insufficient and unconvincing, which casts doubt on any deduction or finding drawn from a data analysis based on that test alone.

Therefore, taking into consideration the collective strength of these instruments and the individual weaknesses that each test exhibits by itself, the development of a new assessment instrument can only be based on a collection of the individual strengths of each instrument. For instance, even though the WJ-III has the weakness of not accommodating people with disabilities of various forms, such as visual or auditory impairments, the Wonderlic test has distinctive strengths in its ability to cater for visually impaired applicants through braille versions of the test and the inclusion of an audio version to accommodate applicants with a hearing problem. The weaknesses of the WISC-IV, namely its inability to measure all the cognitive functions and the factorial complexity that makes it difficult to interpret, can be addressed by the Differential Ability Scale-2 (DAS-2), which is easy to administer and interpret and comes with a simplified user manual.

This clearly shows that there is a gap and therefore a need for an improved instrument. Although it would be an additional instrument, it should be regarded not as entirely new but rather as an improved combination of the existing instruments.


The paper has addressed the construct of cognitive ability and given a brief description of the construct. This was followed by an in-depth review of five cognitive ability assessment instruments. The paper concluded by showing that a gap exists which can be filled by the development of a test that combines certain strengths of the five instruments discussed in the empirical literature.


Davis, A. S. (2010). Handbook of Pediatric Neuropsychology. Springer Publishing Company.

Educational Testing Service. (2010). The Official Guide to the GRE Revised General Test. McGraw-Hill.

Fiorello, C. A., & Hale, J. B. (2004). School Neuropsychology: A Practitioner’s Handbook. Guilford Press.

Groth-Marnat, G. (2009). Handbook of Psychological Assessment. New Jersey: John Wiley & Sons.

Harrison, P. L., & Flanagan, D. P. (2012). Contemporary Intellectual Assessment, Third Edition: Theories, Tests, and Issues. Guilford Press.

Hersen, M. (2004). Comprehensive Handbook of Psychological Assessment, Industrial and Organizational Assessment. New York: John Wiley & Sons.

Kaufman, N. L., Fletcher-Janzen, E., Lichtenberger, E. O., & Kaufman, A. S. (2005). Essentials of KABC-II Assessment. John Wiley & Sons.

Snowman, J., McCown, R. R., & Biehler, R. F. (2012). Psychology Applied to Teaching. California: Wadsworth.

Whiston, S. (2008). Principles and Applications of Assessment in Counseling. New York: Cengage Learning.
