A Comparison of Test Ratings and College Grades

Author:

Frank H. Reiter, Ph.D.,

University of Pennsylvania.

For several years the students in Psychology 1 at the University of Pennsylvania have been subjected to a series of tests as a part of their regular laboratory work. The results obtained from one of these classes have been used for this comparative study of test ratings and college grades. Two of these tests, the Witmer Formboard and the Witmer Cylinder Test, are performance tests; that is, the subjects are required to place into their respective recesses a certain number of inserts as quickly as possible. No specific intellectual training is necessary for the performance of these tests. A comprehension of very simple language will answer all requirements. Signs even may be used to get a subject to understand what is desired. The method for giving the formboard test standardized by Dr H. H. Young¹ was employed, and for the cylinder test the method differed only slightly from the standard method later adopted by Dr F. C. Paschal.² Each subject performed the test in a small booth with only an experimenter and a recorder present. In each of these tests three trials were given, and the shortest trial was used as indicative of formboard and cylinder ability. A stop-watch was used to determine the time for each trial.

¹ Young, H. H.: The Witmer Formboard. Psychological Clinic, X, 4 (June, 1916). Pp. 93 to 111.
² Paschal, F. C.: The Witmer Cylinder Test. The Hershey Press, 1918. Pp. 64.

In presenting the formboard test to the subject for the first trial, the blocks are removed and placed at random into the tray attached to the edge opposite the subject. While this is being done the subject is told that he is to put these blocks back where they belong as quickly as possible, using either one or both hands. For the second trial the blocks are placed into the tray, piled up in a definite order. The subject’s attention is called to this arrangement, and he is requested to attempt to put them back more quickly than he did before. For the third trial the subject is permitted to arrange the blocks for himself. If he avails himself of the opportunity given him here to arrange the inserts so that they will be placed as closely to their respective recesses as possible, he will be able to perform the test more quickly on this trial than on the other two. The arranging of the blocks in a definite order before the second trial should be enough of a suggestion for an adult to follow on the third trial, even if his own resourcefulness to do so might fail him. The third trial is, therefore, the most interesting one to observe.

For an adult the formboard is a test of motor-control or accuracy of movement; rapidity of movement, which is dependent upon the normal rate of discharge of energy for a particular subject; interest; and planfulness or resourcefulness, on the third trial. A comparison of the three trials gives some indication of the subject’s learning ability, but the quality of the performance must be taken into account as well as the time required to perform the test. These two aspects involved in the performance of a test of this type are, nevertheless, related; and a relatively short trial must be qualitatively a good one even if we make an allowance for individual differences in reaction time. For 90 per cent of the subjects in this study, the third trial is the shortest.

In the Cylinder Test the inserts are all of the same shape but vary in size. They are divided into three groups with reference to variation in size: first, seven blocks are of the same height but vary in diameter; second, seven of them vary in height but are of constant diameter; third, seven of them vary in both height and diameter. The recesses for the respective inserts of each group form an ascending or, if you wish, a descending series. There are eighteen inserts and recesses. However, three inserts are represented in each of two groups or series, i. e., the three series overlap. Thus three of the blocks are the last block of one series and the first of the next respectively.

The Cylinder Test was placed before the subject. While his attention was directed to it he was told the blocks would be removed and placed into the receptacle in the center, and that he was to replace the blocks as quickly as he could, using either one or both hands. If any of the blocks had been incorrectly placed at the end of the first trial, the experimenter, without saying a word to the subject, made the necessary changes with the subject looking on. The subject, therefore, saw all the blocks correctly placed before he attempted the second trial. The blocks were again removed and placed into the central receptacle, and the subject was told to replace them as quickly as he could. In case there were any errors on this trial the experimenter again made the proper corrections, removed the blocks as before and told the subject to replace them once more. If the subject failed to replace all of the blocks correctly the trial was scored a failure; that is, in this test 100 per cent accuracy is required on any particular trial. If any subject failed to replace the blocks correctly in all three trials his performance was considered a failure, regardless of the time required for the different trials.
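
The scoring rule for the Cylinder Test may be illustrated by a minimal sketch in Python. The function name and the representation of a trial as a pair (time in seconds, whether all blocks were correctly placed) are assumptions made for illustration. The sketch also assumes that the shortest time is taken among the error-free trials only, which the text implies but does not state outright.

```python
from typing import Optional, Sequence, Tuple

# A trial is (time in seconds, True if every block was correctly placed).
# This representation is hypothetical, chosen only for illustration.
Trial = Tuple[float, bool]


def cylinder_score(trials: Sequence[Trial]) -> Optional[float]:
    """Return the shortest error-free time, or None if every trial failed."""
    # A trial with any misplaced block is scored a failure and is ignored.
    correct_times = [seconds for seconds, all_correct in trials if all_correct]
    # If no trial reached 100 per cent accuracy, the performance is a failure.
    return min(correct_times) if correct_times else None


# First trial contains an error; the shortest of the remaining two is used.
print(cylinder_score([(95.0, False), (70.5, True), (62.0, True)]))
# All three trials contain errors: the performance is a failure.
print(cylinder_score([(95.0, False), (70.5, False), (62.0, False)]))
```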

For an adult the Cylinder Test tests motor-control or accuracy of movement, rapidity of movement, interest, analytic concentration and distributive attention, discrimination of small differences in size, ability quickly to analyze and apprehend a new problem or situation, and learning ability, if three trials are given. The different abilities entering into the accomplishment of a performance test are sometimes subsumed under one classifying term: psychomotor ability. The chief objection to terms of this character is that they do not designate precisely what the psychologist desires to know: the fundamental mental capacities tested by a given test. Psychomotor ability is not equally connotative for operations at different performance levels.

The other tests employed were memory span and language tests. They were given as group tests, i. e., the entire class was tested at one and the same time. The material used in the memory span test consisted of: first, series of digits beginning with series of four and ending with series of twelve; second, series of short three-letter words beginning with series of four words and ending with series of eight; third, a paragraph of such length that no subject was capable of reproducing it verbatim upon one oral presentation. The method employed in giving the test for digits and words is the same as that described by Dr H. J. Humpstone in his monograph, “Some Aspects of the Memory Span Test, A Study in Associability.”³ Before the paragraph to be reproduced was read the students were told to listen carefully and to reproduce in their own words the ideas it contained. This paragraph when logically analyzed contains sixteen ideas. All that is required of the subject is a reproduction of the ideas no matter how crudely they are expressed. The tests test the subject’s ability to retain in consciousness a certain number of discrete elements or units long enough to reproduce them graphically after one oral presentation. In the reproduction of the paragraph the subject’s ability to apprehend the relation of the ideas and the thought as a whole is tested, in addition to retention.

³ Humpstone, H. J.: Some Aspects of the Memory Span Test, A Study in Associability. Experimental Studies in Psychology and Pedagogy, No. 7 (1917). The Psychological Clinic Press, Philadelphia. Pp. 31.

The language test employed is one of the Trabue tests. It consists of a series of sentences with certain words omitted. The instructions given to the subjects are that they are to complete the sentences, using words which make the best sense. Trabue has published standards which are to be used as a guide in scoring the completed sentences. The results for this group of students were carefully scored on the basis of Trabue’s standards. Five minutes are allowed for the completion of this test. Any college student should have no difficulty in completing this test in the allotted time, so that a student’s rating is determined by the quality of his results. The choice of words which are most appropriate to complete the sentences is what the test actually calls for. It tests a subject’s imagination and language facility. More than one word may be chosen for a particular elision, but one of these may be more appropriate or elegant as to style. A sentence may be scored 0, 1 or 2: 0 if the words supplied distort the meaning, 1 if the words are poorly chosen and 2 if the selection of words completes the thought precisely. Discrimination in slight differences of meaning is what is required for a high score. There are eight sentences in the series. Sixteen is, therefore, a maximum score.
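
The arithmetic of the language test score, as described above, can be sketched in a few lines of Python. The function name is hypothetical; the judgment of whether a completion deserves 0, 1 or 2 rests on Trabue’s published standards and is not reproduced here.

```python
from typing import Sequence

SENTENCES = 8              # number of sentences in the series (from the text)
ALLOWED_SCORES = (0, 1, 2)  # 0 distorts meaning, 1 poorly chosen, 2 precise


def trabue_total(sentence_scores: Sequence[int]) -> int:
    """Sum the per-sentence scores; the maximum possible total is 16."""
    if len(sentence_scores) != SENTENCES:
        raise ValueError("exactly eight sentence scores are expected")
    if any(score not in ALLOWED_SCORES for score in sentence_scores):
        raise ValueError("each sentence is scored 0, 1 or 2")
    return sum(sentence_scores)


# A hypothetical student: five precise completions, two poor, one distorted.
print(trabue_total([2, 2, 1, 0, 2, 1, 2, 2]))
```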

Before taking up a distribution of the results of these tests I wish briefly to indicate the manner in which a student’s average college grade was obtained. The grades recognized by the office of the Dean of the College are D, G, P, N and F; that is, a five-point scale is employed. Thus in referring to any student’s grades they will be found recorded as D’s, G’s, et cetera. If we assume that the interval 90 to 100 on a percentile scale is the equivalent of D on the five-point scale, that 80 to 90 is the equivalent of G, that 60 to 80 is the equivalent of P, that 40 to 60 is the equivalent of N, and that 0 to 40 is the equivalent of F, the median values on a percentile scale for each of the above-mentioned grades would be D, 95; G, 85; P, 70; N, 50; and F, 20. In order to strike an average college grade for each student these percentile median values were used. The number of D units which a student had received in different courses was multiplied by 95, the number of G units was multiplied by 85, the number of P units by 70, et cetera. The sum of these results was then divided by the total number of units. This average was again translated into a grade on the five-point scale. Any average result, therefore, between 90 and 100 was recorded as D, between 80 and 90 as G, between 60 and 80 as P, and between 40 and 60 as N. The intervals for D and G are equal; likewise the intervals for P and N, but the latter are twice as great as either of the former. In distributing the results obtained from the tests, a five-point scale was also employed, 5 corresponding to D, 4 to G, 3 to P, 2 to N and 1 to F. The intervals for groups 5 and 4 are equal; the intervals for 3 and 2 are equal, but they likewise are twice the magnitude of the intervals of 5 and 4. In this way the college grade and test rating for any given student are directly comparable.
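
The averaging procedure just described may be made concrete with a short sketch in Python. The median values are those given in the text; the function names and the example student are hypothetical. Since the stated intervals share their end points (90 belongs both to "90 to 100" and to "80 to 90"), the sketch assumes that a boundary value is assigned to the higher grade.

```python
from typing import Mapping

# Median value on the percentile scale for each grade, as given in the text.
MEDIAN_VALUE = {"D": 95, "G": 85, "P": 70, "N": 50, "F": 20}


def average_percent(units: Mapping[str, float]) -> float:
    """Weight each grade's median value by the number of units earned."""
    total_units = sum(units.values())
    if total_units <= 0:
        raise ValueError("at least one unit is required")
    weighted_sum = sum(MEDIAN_VALUE[grade] * n for grade, n in units.items())
    return weighted_sum / total_units


def to_letter(average: float) -> str:
    """Translate an average back to the five-point scale.

    Assumption: a value on a boundary goes to the higher grade.
    """
    if average >= 90:
        return "D"
    if average >= 80:
        return "G"
    if average >= 60:
        return "P"
    if average >= 40:
        return "N"
    return "F"


# Hypothetical student: 3 units of D, 6 units of G, 3 units of P.
# (3*95 + 6*85 + 3*70) / 12 = 1005 / 12 = 83.75, which falls in the G interval.
example = {"D": 3, "G": 6, "P": 3}
print(average_percent(example), to_letter(average_percent(example)))
```
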
It matters little whether we call our best group the D group or group 5 as long as the method of distributing the results is consistently adhered to. A student’s final rating in the tests was obtained in precisely the same way as his college grade. Each test rating was translated into a median percentile value, and the sum of all the percentile values, divided by the number of tests employed, yielded a student’s percentile test rating. This rating was translated into a rating on a nine-point scale: 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0. In order to distribute the test ratings on a five-point scale, the results of 4.5 and 4.0 are combined and the results of 3.5 and 3.0 are combined. The test ratings for the students in the class are based on the results of six tests: (1) The Formboard, (2) The Cylinder Test, (3) Memory Span for Digits, (4) Memory Span for Words, (5) Memory Span for Ideas, that is, the Reproduction of the Binet Paragraph, and (6) A Trabue Language Test. The following is a comparative table based on college grades and test ratings of 94 students. Each student’s average college grade is compared directly with his average test rating. The table shows the percentage distribution of test ratings occurring for a given college grade.

Test Ratings                           2.5     3.0     3.5     4.0     4.5     5.0
College Grade D                          -   12.5%   37.5%   25.0%   12.5%   12.5%
College Grade G                          -     28%     42%     22%      8%       -
College Grade P                       6.1%   27.3%   48.4%   18.2%       -       -
Total Distribution of Test Ratings    3.2%   25.5%   44.7%   20.2%    5.3%    1.0%

The above table shows that the mode for test ratings as a whole and for their distribution in each class of college grades is 3.5. Of the students having an average college grade of D, 50 per cent have test ratings greater than the mode, and these ratings are distributed among the 5.0, 4.5, and 4.0 classes of test ratings. In this group 12.5 per cent of the students have a test rating less than the mode, all in the 3.0 class. Of the students having an average college grade of G, 30 per cent have test ratings greater than the mode; these ratings are distributed among the 4.5 and 4.0 classes of test ratings, none appearing in the 5.0 class. In this group 28 per cent of the students have a test rating less than the mode, all appearing in the 3.0 class. Of the students having an average college grade of P, 18.2 per cent have test ratings greater than the mode; these ratings all appear in the 4.0 class, none appearing in the 4.5 and 5.0 classes. In this group 33.4 per cent of the students have a test rating less than the mode, 27.3 per cent appearing in the 3.0 class and 6.1 per cent appearing in the 2.5 class. In connection with these comparisons, which show a decrease in the percentage of test ratings above the mode and an increase below the mode as we descend in the scale of college grades, it is also interesting to note an increase in the percentage of students at the mode as we descend in the scale of college grades: D, 37.5 per cent; G, 42.0 per cent; P, 48.4 per cent. The number of students having N for an average college grade is so small that a comparison of test ratings within this group is not significant. In general then we may conclude that a comparison of the average college grade and of the test ratings of individual students shows that the less proficient a student is in college work, the less proficiency he displays in the tests. The foregoing comparisons are presented in Graphs 1, 2 and 3.
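
The percentages above and below the mode quoted in this paragraph can be checked against the table with a short sketch in Python. The variable and function names are hypothetical. Classes left blank in the table are entered as 0.0, and the 8 per cent entry for grade G in the 4.5 class is taken from Graph 2 and from the statement that 30 per cent of that group lie above the mode.

```python
from typing import List, Tuple

RATINGS = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# Percentage of students in each test-rating class, by average college grade.
# Blank cells in the table are entered as 0.0 (an assumption).
TABLE = {
    "D": [0.0, 12.5, 37.5, 25.0, 12.5, 12.5],
    "G": [0.0, 28.0, 42.0, 22.0, 8.0, 0.0],
    "P": [6.1, 27.3, 48.4, 18.2, 0.0, 0.0],
}


def split_around_mode(percentages: List[float]) -> Tuple[float, float, float]:
    """Return (modal rating, per cent below the mode, per cent above it)."""
    mode_index = max(range(len(percentages)), key=percentages.__getitem__)
    below = sum(percentages[:mode_index])
    above = sum(percentages[mode_index + 1:])
    # Round to one decimal place to avoid floating-point noise in the sums.
    return RATINGS[mode_index], round(below, 1), round(above, 1)


for grade, row in TABLE.items():
    print(grade, split_around_mode(row))
```

For each of the three grades the modal class comes out as 3.5, with 12.5, 28.0 and 33.4 per cent below it and 50.0, 30.0 and 18.2 per cent above it, in agreement with the figures in the text.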

An average test rating is an index of a subject’s proficiency. The concept proficiency connotes the efficiency in a certain number of operations. An operation may be so simple as to arouse few mental abilities. In a more complex operation a greater number of mental abilities may be stimulated, or the same mental abilities may be aroused as in a simpler operation, only to a higher degree. In performing the formboard test the operation consists in replacing the blocks. For an adult this is so simple an operation that it becomes a speed test. The operation in the cylinder test also consists in replacing the blocks. This is a more complex operation because a greater number of movements are required and it calls for a higher performance level. Discrimination, analytic concentration and distributive attention are very definitely brought into play. This is also true of the formboard, but the nature of the cylinder test stresses these mental abilities or capacities. The fact that it is possible to observe various mental abilities or capacities in operation while the subject is performing a test of this type, gives these performance tests a definite and peculiar value in determining the proficiency of an individual. This is especially true of the cylinder test for the reason that no subject in three trials will replace the blocks with the minimum number of moves required.

The operation in the graphic reproduction of the Binet Paragraph may be considered as being more complex than the operations involved in the performance of the formboard and cylinder tests because of certain intellectual acquirements necessary. In the Trabue language test imagination, a very fundamental and important ability, is stimulated. This ability is also brought into play by the formboard and cylinder tests. In the language test, however, a specific kind of imagination, the recalling of words, the result of a specific kind of training, is called into play. The average test rating, therefore, indicates the efficiency exhibited in the performance of certain operations, that is to say, the average test rating is a proficiency index.

[Graphs 1, 2 and 3. Percentage distribution of test ratings, from 5.0 down to 2.5, for college grades D, G and P respectively.]

In the following table and Graph 4 are shown the distribution of the college grades, the grades which the students received in Psychology 1, and the average test ratings.

Grade    College Grades    Grades in Psychology 1    Test Ratings
D             8.5%                  4.4%                  1.0%
G            53.2%                 25.8%                 25.5%
P            35.0%                 50.5%                 70.2%
N             3.2%                 11.8%                  3.2%
F             0.0%                  7.5%                  0.0%

The comparison of college grades and test ratings shows that on the whole students displayed greater proficiency in college work than in the tests. In my opinion this difference between the proficiency indices of the college grades and test ratings can be explained in the following terms: the standards in the tests are rigidly and definitely fixed; in college work the student is granted the privilege of “making up” an exercise if it happens to be unsatisfactory; recitations and quizzes also help the student to prepare specifically for the examination at the end of the course. This is not true of tests. The students in Psychology 1 constituted a selected group composed of sophomores, juniors and seniors. There was only a small number of freshmen in the course.

A comparison of the grades in Psychology 1 with the test ratings in the D, G and P groups shows that the former more nearly approximate the latter than do the college grades. Of the students, 11.8 per cent received N for a grade in Psychology 1, while only 3.2 per cent received a similar grade in the college grades and test ratings; 7.5 per cent received F as a final grade in Psychology 1, while there are no students receiving a similar grade either in their college work or in the performance of the tests. In my opinion this is not so much due to a lack of ability on the part of the students who failed to pass the course in Psychology 1 as it is due to extraneous factors. Psychology may not appeal to these students in such a manner as to arouse sufficient interest. A new terminology has to be acquired and old concepts revamped. Terms such as sensation, for example, have a definite connotation differing from that which the student was wont to apply. In order to acquire these new meanings and these new terms, strict application is necessary throughout the greater part of the term in Psychology 1. The assignments are of such a character that if the student lags behind for a few weeks only he is unable to do justice to himself or to the subject. I believe that at the beginning Psychology 1 requires closer application than a majority of other college subjects. Some of the students fail to realize this fact until it is too late. Of course, no psychological tests are able to show whether a student will put forth effort to the best of his ability in college work. The contention may be urged that the tests given were entirely too easy. If this were the fact there should be a larger percentage of D’s and G’s, and the graphs would represent a curve skewed toward the D end and not in the opposite direction. With two exceptions the limits in the tests are determined by the results of the individuals of the group themselves.

If the tests had been used as a criterion for any purpose, all students who passed the tests, with two exceptions, would have done satisfactory college work.

[Graph 4. Distribution of college grades (Coll.), grades in Psychology 1 (Psy.) and test ratings (Tests) for each of the grades D, G, P, N and F.]
