Case Study 3

The objective of interpreting the reliability of the two new methods of evaluating job applicants for the position of telephone customer service representative is to help Phonemin Company improve the caliber of the applicant pool from which new employees will be selected and hired. The role that customer service representatives play in the telephone ordering system makes this selection important. Additionally, Phonemin will be adding about forty employees to the call center so that the company can meet its predetermined phone sales volume. As a result, an appropriate method of assessing candidates for the telephone customer service representative positions is required.

To interpret the reliability results for the clerical test and the work sample within a single time period, both coefficient alpha and inter-rater agreement must be assessed. In this case, the clerical test was scored objectively. Therefore, coefficient alpha is used to determine its reliability at time 1 and time 2. It is recommended that coefficient alpha be no less than 0.80 for a measure to be accepted as reliable. From the figures given, the clerical test, at 0.85 and 0.86 for time 1 and time 2 respectively, meets this criterion. Because the work sample was rated subjectively, its reliability is established by examining inter-rater agreement, and the final ratings assigned by the raters are critical in this method. The recommended criterion for inter-rater agreement is that it should not fall below 75%. At time 1, inter-rater agreement was 88% for the tactfulness work sample and 80% for the customer concern work sample. At time 2, agreement was 79% for the tactfulness work sample and 82% for the customer concern work sample.
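The two reliability checks described above can be sketched in a few lines of Python. This is only an illustration, not the procedure Phonemin used: the function names `coefficient_alpha` and `percent_agreement` are hypothetical, the thresholds are treated as minimum acceptable values (0.80 for alpha, 75% for agreement), and the only figures used are the ones reported in the case.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha for an objectively scored test.

    item_scores: one row per test taker, one column per test item.
    """
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def percent_agreement(rater_a, rater_b):
    """Share of cases (in percent) on which two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Assumed minimum acceptable values, taken from the criteria in the text.
ALPHA_MIN = 0.80
AGREEMENT_MIN = 75.0

# Reliability figures reported in the case (time 1, time 2).
clerical_alpha = (0.85, 0.86)
work_sample_t_agreement = (88.0, 79.0)
work_sample_c_agreement = (80.0, 82.0)

clerical_ok = all(a >= ALPHA_MIN for a in clerical_alpha)
work_t_ok = all(p >= AGREEMENT_MIN for p in work_sample_t_agreement)
work_c_ok = all(p >= AGREEMENT_MIN for p in work_sample_c_agreement)

print("Clerical test meets alpha criterion:", clerical_ok)
print("Work sample (T) meets agreement criterion:", work_t_ok)
print("Work sample (C) meets agreement criterion:", work_c_ok)
```

With raw item scores or paired ratings, the two functions could be used to recompute the reported figures. The case gives only the summary values, so the sketch checks those directly against the thresholds.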

Notably, the test-retest reliability is high (92%). The clerical test's coefficient alphas of 0.85 and 0.86 are a positive sign and represent reasonable scores. For work sample (T), the inter-rater agreement of 88% and 79% is also adequate, although a higher percentage would be preferable. The same applies to work sample (C), whose inter-rater agreement of 80% and 82% meets the criterion at both times.

Overall, the company should consider using the work samples and the clerical test as methods for assessing candidates for the telephone order sales customer service representative positions. However, the company should work on improving the inter-rater reliability scores in the evaluation. One way to improve these scores is to offer training and timely recalibration of raters. According to Shoukri (2010), improving inter-rater reliability helps reduce associated problems such as drift away from the standards set out in the rating protocols. Even so, the tests above show adequate reliability.

  1. Validity is critical in almost all experiments. Validity tests help to ascertain that the instruments used are adequate, and they give greater confidence in the results. To establish validity, one has to apply a specific scientific measure designed to verify the type of validity under test. In most cases, researchers have relied on validity types such as content validity, criterion-related validity, construct validity, and consequential validity (Sireci & Faulkner-Bond, 2014). These types are further classified into sub-categories. For instance, content validity includes curricular validity and face validity, while construct validity is composed of convergent validity and discriminant validity. Furthermore, criterion-related validity is divided into predictive and concurrent validity.

In the current case, the significant correlations included a negative correlation with error rate and a positive correlation with speed, while the correlation with complaints was significant. The correlations among the experimental tests revealed little correlation between the clerical test and either of the work samples. There was, however, a high correlation between the two work samples. Both work samples produced very similar results, with non-significant correlations for error rate and speed and a significant correlation for complaints.
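As a rough illustration of how a correlation between a predictor and a criterion measure is obtained and tested, the sketch below computes Pearson's r and the corresponding t statistic. The scores are invented for illustration and are not Phonemin's data. The function names `pearson_r` and `t_statistic` are likewise hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def t_statistic(r, n):
    """t value for testing whether r differs from zero (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Hypothetical scores for six CSRs, invented purely for illustration.
clerical_scores = [52, 61, 47, 70, 66, 58]
error_rates = [9, 6, 11, 4, 5, 8]

r = pearson_r(clerical_scores, error_rates)
t = t_statistic(r, len(clerical_scores))
print(f"r = {r:.2f}, t = {t:.2f}")
```

A negative r here would mean that higher clerical scores go with fewer errors. Whether the correlation counts as significant depends on comparing t with the critical value for n - 2 degrees of freedom at the chosen significance level.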

In conclusion, these results indicate that the clerical test and either of the work samples can be used in the assessment of CSRs. The work samples produced the same results, which suggests that they are redundant. The clerical test, moreover, provides a good prediction of criteria such as error rate and speed. The test is therefore suitable for this purpose, but it needs to be complemented by a work sample. The best selection decisions can thus be achieved by combining the clerical test with either of the work samples (T or C).

  1. The current CSRs were selected to participate in the study. Because these people are already in the job, their performance on the tests may differ from that of new applicants. First, they have gained a great deal of experience during the period they have been working. Second, they have additional skills, particularly in dealing with complaints. Hence, it is critical to keep this in mind when deciding on the work samples and the clerical test. It is also important to ask whether the measures taken are real indicators of performance. As mentioned in the text, the KSAOs selected are likely to lead to successful performance as a CSR and are anticipated to have a strong effect on job performance. If they are not true indicators, the tests may fail to predict the performance that Phonemin is interested in. The quality of the selection decision also depends on how similar the test participants are to the anticipated applicants who will take the new positions. If the two groups differ significantly, the outcome of this study will be less generalizable to the applicant population.


Sireci, S., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema, 26(1), 100-107.

Shoukri, M. M. (2010). Measures of interobserver agreement and reliability. CRC Press.