Reliability and Validity

Discussion Number 1


Introduction to Reliability

Reliability refers to the consistency of measurements of a given test. This means that, no matter what the test is measuring, it will produce the same value, or one very close to it, every time it is used (Cohen, 2011, p. 86). Reliability means that, given the same circumstances, the observer who perceives a fact will perceive it again, and that other observers under the same set of circumstances will perceive the same fact (Faust, 2012, p. 288). When a test has low reliability, the scores may be inconsistent, unstable, and untrustworthy, and accurate predictions cannot be made from the scores obtained.

According to Spreen, Sherman, and Strauss (2006, p. 10), reliability can be defined in terms of consistency within itself (internal consistency), consistency over time (test-retest reliability), consistency across alternate forms (alternate form reliability), and consistency across raters (inter-rater reliability). Indices of reliability indicate the degree to which a test is free from measurement error or the proportion of variance in observed scores attributable to variance in true score.

Components Involved When Estimating Reliability

Two components are involved when estimating reliability: the true score and the error component. Neither component of an actual score on the measure can be directly observed; however, the stability of measures can be assessed using correlation coefficients. A correlation coefficient is a number that indicates how strongly two variables are related to each other. When discussing reliability, the most common correlation coefficient is the Pearson product-moment correlation coefficient.
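As a sketch of how such a coefficient is computed, the Pearson product-moment correlation can be calculated directly from two sets of scores (the score values below are hypothetical, for illustration only):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five examinees on two administrations of the same test
first = [85, 90, 78, 92, 88]
second = [83, 91, 80, 94, 86]
print(round(pearson_r(first, second), 2))  # 0.93
```

A value this close to 1 would indicate that the two administrations rank and space the examinees very similarly.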

One way of thinking about reliability is through variability. The only reason an observed score changes from one measurement to another is random error. Error scores are independent of the characteristic being measured; they are attributable to the measurement process, not to the individual. That is, the magnitude of error scores is unrelated to the magnitude of the characteristic being measured. If the measuring instrument is not very accurate, that is, if it adds large random error components to true scores, then the variance of the measured scores will be much larger than the variance of the true scores.
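This variance decomposition can be illustrated with a small simulation: adding independent random error to a set of hypothetical true scores inflates the variance of the observed scores, and the ratio of true-score variance to observed-score variance approximates the reliability (the distribution parameters below are assumptions chosen for illustration):

```python
import random

random.seed(0)

# Hypothetical true scores: mean 100, SD 15 (so true variance is about 225)
true_scores = [random.gauss(100, 15) for _ in range(10000)]
# Measurement adds random error (SD 5) independent of the true score
observed = [t + random.gauss(0, 5) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

var_true, var_obs = variance(true_scores), variance(observed)
print(var_obs > var_true)            # True: error inflates the observed variance
print(round(var_true / var_obs, 2))  # approximately 225 / (225 + 25) = 0.90
```

The ratio printed on the last line is exactly the quantity a reliability coefficient estimates: the proportion of observed-score variance attributable to true-score variance.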

Standard Errors of Measurement and Reliability Coefficients

The reliability coefficient measures the degree to which observed scores, made on the same stable characteristic, correlate with one another. To assess the reliability of a measure, at least two scores on the measure are needed from many individuals. If the measure is reliable, the two scores should be very similar, and the Pearson correlation coefficient relating them should be high and positive. This coefficient is referred to as the reliability coefficient. The reliability coefficient can be interpreted directly as the percentage of score variance attributable to different sources. When all sources of variance are known for the same group, that is, when the internal-consistency, test-retest, alternate-form, and inter-rater reliability coefficients for the same sample are known, it is possible to calculate the true score variance (Spreen, Sherman, & Strauss, 2006, p. 13).

The consistency of a test is generally expressed in terms of r. If r equals 1, the test can be considered perfectly reliable; if r = 0, the test is totally unreliable. If r is negative, the reliability of the test is difficult to determine, because negative relationships should not occur when dealing with reliability. In general, the reliability coefficient should be 0.9 or higher before a test is considered reliable.
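The standard error of measurement named in the heading above follows directly from the reliability coefficient: SEM = SD × √(1 − r), the expected spread of an individual's observed scores around the true score. A minimal sketch, using an assumed test standard deviation of 15 and an assumed reliability of .91:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

# Assumed values: SD = 15, r = .91
print(round(sem(15, 0.91), 1))  # 4.5
```

Note how a high reliability shrinks the SEM: at r = .91, an examinee's observed score is expected to fall within roughly ±4.5 points of the true score about two-thirds of the time.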

Estimation of Reliability and the Factors That Affect the Reliability of a Test

A number of different methods can be employed in estimating the reliability of a test, including the test-retest method, the equivalent-forms method, inter-rater reliability, the split-half method, and internal consistency (Newman & Newman, 1994, pp. 49-51).
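For example, the split-half method correlates scores on two halves of the test and then applies the Spearman-Brown correction, 2r / (1 + r), to estimate the reliability of the full-length test (halving a test understates its reliability, since shorter tests are less reliable). A minimal sketch with an assumed half-test correlation of .80:

```python
def spearman_brown(r_half):
    """Project a half-test correlation to full-test reliability: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# Assumed correlation of .80 between odd-item and even-item half scores
print(round(spearman_brown(0.80), 2))  # 0.89
```

The corrected value (.89) is higher than the raw half-test correlation (.80), reflecting the greater reliability of the full-length test.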

Various factors have been identified that affect the reliability of a test. These factors introduce an element of error into any set of measurements and can be organized into three broad categories: temporary individual characteristics, lack of standardization, and chance. They include test characteristics such as length, item type, and item homogeneity; sample characteristics such as sample size, range, and variability; and the influence of guessing. The clarity of a test is intimately related to its reliability. Reliable measures typically have clearly written items, easily understood test instructions, standardized administration conditions, explicit scoring rules that minimize subjectivity, and a process for training raters to a performance criterion.

Importance of Estimating the Precision of a Score

Precision is another important element of a test score. The precision of a score is important because it is a measure of the repeatability of a measurement or the reproducibility of an experiment; it is related to the dispersion, spread, or variability of the measurements. Precision means that all trial results were fairly close to each other. However, this does not necessarily mean the experiment was accurate, since precise measurements can all be close to each other yet far from the intended result. This could be the result of a systematic error.
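A small numeric example (with hypothetical trial values) shows how measurements can be precise yet inaccurate: the trials below cluster tightly, but a systematic error offsets all of them from the intended value.

```python
# Hypothetical experiment: the intended (true) value is 100.0
true_value = 100.0
trials = [92.1, 92.0, 92.2, 91.9, 92.1]  # tightly clustered readings

spread = max(trials) - min(trials)            # small spread = precise
bias = sum(trials) / len(trials) - true_value # systematic offset = inaccurate
print(round(spread, 1))  # 0.3
print(round(bias, 1))    # -7.9
```

The spread of 0.3 shows high precision, while the bias of nearly −8 shows low accuracy, exactly the pattern produced by a systematic error.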

Accuracy and precision are both important in an experiment, as together they ensure both a correct result and reproducibility. Test reliability is important because there is a need to know whether differences in test scores are “true” differences in the characteristic being considered or are attributable to “chance” errors. If a test is not reliable, its results lack predictive utility. When a test is reliable, changes in score are unlikely to be caused by temporary fluctuations in the examinee or the environment, and significant differences in scores are therefore more likely to be caused by a change in the examinee on the trait tested.

Effect of Reliability on the Confidence Level Placed On an Individual Score

The confidence placed on an individual score reflects the precision and accuracy of that score. Reliability plays a crucial role in determining this confidence: when an individual score has high reliability, high confidence can be placed on it, and when reliability is low, only low confidence is warranted. It is therefore important that a score have high reliability, as this implies that the confidence placed on it is high and reflects high accuracy of the score.


Discussion Number 2


An Article Discussing the Validity of the Woodcock-Johnson Tests of Cognitive Abilities

One of the instruments used to measure my construct of cognitive ability is the Woodcock-Johnson Tests of Cognitive Abilities. The related article, authored by Fredrick H. Navarro, is titled “The Woodcock-Johnson Tests of Cognitive Ability, Third Edition”.

Validity of the Woodcock-Johnson Tests of Cognitive Abilities

The Woodcock-Johnson Tests of Cognitive Abilities are a set of individually administered tests that assess an extensive range of intellectual and cognitive abilities in individuals from two years of age to over the age of 90 (Blackwell, 2001, p. 232).

The WJ III is considered a highly accurate and valid diagnostic system. The WJ III COG BIA can be applied to subjects from two years old to 90 plus, with reliability coefficients from .94 to .98 and concurrent validity correlations in the range of .60 to .69 with other measures of intelligence. This clearly reflects that the WJ III COG has high validity. Reliability data for the WJ III COG were derived from Rasch analysis and split-half methods; median reliability coefficients ranged from a low of .63 to a high of .91.

Evidence for the validity of the WJ III is provided in three categories: content, construct, and concurrent (Blackwell, 2001, p. 234). The author presents an extensive list of studies that have provided a broad variety of content and construct validity evidence supporting the WJ III COG. In terms of concurrent validity, the author describes correlations in the moderate to high range with other intelligence or cognitive assessments.

Based on a number of previous studies, the author concludes that the WJ III has high validity and can be used to investigate a wide range of cognitive abilities as potential sources of poor achievement. The article states that it can also be used to explore differences related to giftedness in children, difficulties in comprehension among children, and the intrapersonal structure of cognitive factors in relation to achievement in a person-centered context.

Strengths and Limitations of the Woodcock-Johnson Tests of Cognitive Abilities


A key strength of the WJ III COG is the new standardization sample used to norm it, which is large and was developed with great care. The inclusion of the WJ III ACH set of tests in the standardization sample also makes it unique, in that many areas of achievement can be related to diverse areas of cognitive function (Cizek, 2003). According to Hartlage and D’Amato (2008, p. 191), the CHC theory foundation upon which the WJ III COG is built is also a positive attribute and a feature that adds to its uniqueness. Further, the fact that the WJ III COG retains many features and scales from previous versions allows it to share the many validity studies conducted with its predecessors.

Additionally, the technical manual’s attention to AERA, APA, and NCME standards is a clear strength, as are the administration procedures, which make use of simple aids, a computer program to simplify and speed up scoring, and procedures that can be tailored to individuals with special needs (Blackwell, 2001, p. 234).


According to Cizek (2003), although the WJ III COG is modeled on CHC theory, its manual does not describe how this was accomplished. He adds that the manual also lacks examples of computer-scored output, as well as any description of recommended interventions for different test outcomes. Additionally, extreme scores related to either age or ability are estimated rather than derived from actual subject data. Sares (2003) adds that the WJ III COG provides frustratingly limited information about how the Rasch model contributed to test item selection, as well as about the factor analysis that confirmed the model. Phelps, McGrew, Knopik, and Ford (2005, p. 77), on the other hand, raised a concern about the lack of factor-analytic studies supporting the validity of the narrow ability scales.

Alternative Ways to Increase the Validity of the Woodcock-Johnson Tests

The Woodcock-Johnson III is considered a good measure with strong validity data. However, this validity depends on the procedures for administration and scoring, so it is imperative that these procedures be strictly followed. The multifaceted nature of the WJ III assures and increases the validity and generalizability of its scores. Therefore, the levels of validity achieved by the WJ III COG leave no alternative ways to increase the validity of the WJ III.



Blackwell, T. (2001). Test Review. Rehabilitation Counseling Bulletin, 44(4), 232. Retrieved July 30, 2014, from Academic Search Premier database.

Cizek, G. J. (2003). Review of Woodcock-Johnson III. Mental Measurements Yearbook database. Retrieved July 30, 2014

Cohen, L. J. (2011). The Handy Psychology Answer Book. Visible Ink Press.

Faust, D. (2012). Ziskin’s Coping with Psychiatric and Psychological Testimony (illustrated ed.). Oxford University Press.

Hartlage, L. C., & D’Amato, R. C. (2008). Essentials of Neuropsychological Assessment: Treatment Planning for Rehabilitation, Second Edition. New York: Springer Publishing Company.

Navarro, F. H. (2010, March). The Woodcock-Johnson Tests of Cognitive Ability, Third Edition. Retrieved July 29, 2014, from ResearchGate:

Newman, C., & Newman, I. (1994). Conceptual Statistics for Beginners. University Press of America.

Phelps, L., McGrew, K., Knopik, S., & Ford, L. (2005). The General, Broad, and Narrow CHC Stratum Characteristics of the WJ III and WISC-III Tests: A Confirmatory Cross-Battery Investigation. School Psychology Quarterly, 20(1), 66-88. Retrieved July 30, 2014

Sares, T. (2003). Review of the Woodcock-Johnson® III Diagnostic Supplement to the Tests of Cognitive Ability. In Woodcock, R., McGrew, K., Mather, N., & Schrank, F. (1977-2003). Woodcock-Johnson III Diagnostic Supplement to the Tests of Cognitive Abilities.

Spreen, O., Sherman, E. S., & Strauss, E. (2006). A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. Oxford: Oxford University Press.
