Standard Tests and Scales of Measurements

Author:
  1. Bruce Birch, Ph.D., J

Wittenberg College, Springfield, Ohio.

Every teacher and every superintendent depends upon some tests or scales of measurement in determining the progress of pupils, grades, schools, or the efficiency of a school system. It is quite apparent that the old standards are unsatisfactory for accurately testing the product of the schools. Too much play is permitted the subjective element, the personal equation, mere opinion, or a priori judgments, in the estimate of academic attainments and educational progress; as is seen in the great variations shown by teachers in estimating the work of individual pupils in the same branch, in different branches, or in similar work at different times. The teacher seems to arrive at some degree of accuracy in estimating the progress of pupils by the trial and error method.

Even the examination, as an objective method of testing the progress of pupils, is too crude a standard of measurement. By it alone a teacher cannot secure any precise estimate of the progress of pupils, individually or collectively; nor can a superintendent judge of the efficiency of schools or school systems in terms of results. Each teacher usually establishes a self-made standard by measuring all pupils in terms of the grades of several. The more radically a teacher differs from other teachers in this estimate, the more he should have the assistance of an objective standard. Parker2 states that when the same geometry examination papers were marked by the teachers of mathematics in forty-three high schools, where 70 per cent is the passing mark, the grades varied from 25 per cent to 90 per cent; while in seventy-five schools, where the passing mark is 75 per cent, the grades varied from 39 per cent to 85 per cent.
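The variation among markers described above can be made concrete with a small sketch. The individual marks below are hypothetical and serve only for illustration; only the extremes (25 and 90 per cent) and the passing mark of 70 per cent come from the figures quoted from Parker. The function name `summarize_marks` is likewise an assumption of this sketch.

```python
from statistics import mean, pstdev


def summarize_marks(marks, passing_mark):
    """Summarize how differently several teachers marked the same paper."""
    lowest, highest = min(marks), max(marks)
    passing = sum(1 for m in marks if m >= passing_mark)
    return {
        "lowest": lowest,
        "highest": highest,
        "range": highest - lowest,       # distance between extreme marks
        "mean": mean(marks),             # average mark given to the paper
        "spread": pstdev(marks),         # standard deviation of the marks
        "passing": passing,              # teachers who would pass the paper
        "failing": len(marks) - passing, # teachers who would fail it
    }


# Hypothetical marks given by eight teachers to one geometry paper.
# Only the extremes (25 and 90) match the figures quoted in the text.
illustrative_marks = [25, 48, 60, 66, 70, 74, 81, 90]
summary = summarize_marks(illustrative_marks, passing_mark=70)
print(summary)
```

With marks spread this widely, the same paper passes under some teachers and fails under others, which is the difficulty an objective standard is meant to reduce.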

We are told that “human life is a deeper and more complicated subject than can be probed by quantitative tests, that the important elements in mental and moral development of pupils are of an intangible character and can’t be confined to terms of measurements,” that the spiritual side of education, though real, is so vague and indefinite that in all probability it defies and escapes measurement. But when these more subtle components of education have been excluded, there is no doubt of the fact that there are other elements in education, which are primarily objective and can be measured with reasonable exactness. The more clearly the objective results of education are understood, the greater the appreciation of the spiritual elements in education. There is, therefore, need to devise and to use more precise methods of measurements to estimate the progress of pupils, grades, and schools, and to test the efficiency of school systems. Any measurement of results, says Strayer, “furnishes primarily a knowledge of a situation which makes clear the problems involved and which suggest a method of experiment that looks toward desired improvement.”

1 A paper read before the Schoolmasters’ Club of Central Ohio, and prepared after listening to discussions at the recent N. E. A. meeting in Detroit, with the assistance of Part I of the Fifteenth Yearbook of the National Society for the Study of Education.

2 Methods of Teaching in High Schools, p. 566.

To prove the claim that progress in accomplishment has been made and that the aim of education has been realized in school work, it is necessary as Strayer says, “to secure more and better instruments of precision in the measurement as obtained by scales or units.” Such objective scales are needed to eliminate or at least properly subordinate all unsupported opinion or even cumulative a priori judgments; to get rid of subjective variation and to reduce the amount of variability of judgment of the same teacher or of various teachers to the minimum. The validity of a scale consists in its power to reduce the variability of judgment of a teacher’s estimate. This is accomplished by an objective measurement, which is defined as the result of an attempt to establish a definite standard under controlled conditions. “Teachers need to see the value of controlled experiments,” as Courtis stated at Detroit in the Normal School section of the N. E. A., for “objective measurements show that 40 to 50 per cent of pupils stand still.” That statement may be exaggerated, but it contains much food for reflection.

The aims or purposes of standard tests or scales are manifold. It is understood, of course, that the scales are not for daily use. They may be employed to discover at the beginning of a term what pupils know before being taught, and then again later to determine in a more accurate manner the progress made by pupils. They also assist the teacher’s judgment in forming a basis for inference, thus freeing the teacher from the charge of partiality. But the primary and distinctive purpose is to improve directly the instruction of pupils. They may lead to a definiteness in school work by serving the following uses:

(a) To show to the superintendent, or the principal, the extent to which their plans have been correctly interpreted and put into operation; to furnish them and the teacher with useful facts concerning pupils and classes, and to indicate general tendencies in the school system as a whole. This knowledge may provide a sound basis for the supervisor in judging the efficiency of teachers, and in determining the standard and needs of pupils. Such information will suggest some necessary changes in revision or in the introduction of new methods.

(b) To determine the most economical and efficient method of teaching school subjects.

(c) Not merely to accumulate educational statistics, but to enable the superintendent to check the results of his school system by a scientific test.

(d) To discover the variability of grades and schools, and by the employment of the average results of several like grades to secure a standard of measurement. Such tests can be applied especially to the accuracy of work done.

(e) To test by a relatively stable standard what content is retained by pupils for later use in their work, since certain content must be basic for the interpretation of future work; and also to indicate the standard units of work required for promotion.

(f) To secure, as Whipple1 says, “a relatively refined and precise method of more accurate determination of the mental traits or the general mental status of a pupil than can be measured by other methods, as by inspection of his marks or his school progress in terms of the teacher’s personal estimate.”

(g) To discover the bright pupils of every class who are over-practiced, or over-drilled, and release them from practice work. Able pupils are often harmed by too long continuance in drill while the dull or slower pupils may profit by more drill. In this way the variation in a group may be reduced to a minimum. Melcher states that “the majority of pupils are average pupils and should move in mass,” yet “there is a considerable number of especially slow pupils and also of especially bright pupils that should not be sacrificed to mass movement.” When a pupil reaches the standard in his grade, he may give his attention to other work, and be tested in the lapsed branch about every ten days. When the individual standard is in danger of becoming lowered, pupils should be put back into the work.

1 Whipple, G. M. Manual of Physical and Mental Tests, p. 549.

Starch believes that “one-third of the pupils waste time by being in classes in which they know practically all the material that is being covered in the recitation period and are able to perform all the tasks expected of them.” “One pupil out of every three is promoted too slowly and one pupil out of every three is promoted too rapidly.” If these statements are correct, there is great need of more definite standards, by which pupils can be compared so as to ascertain whether or not they are up to standard. “Standard scales indicate to the teacher that there is no further need to waste time in drilling the bright pupils; and release from drill does not discourage pupils, but rather encourages them to excel the standard.” The present average standard is good for the sub-normal, but not so good for super-normal pupils; indeed, it makes no provision for the super-normal. But the standard scales provide a means of interpretation of the pupil, determining the presence of general ability, and the presence or absence of special ability, as revealed especially by the mixed relation test. Thus they assist both the super-normal and the sub-normal. They indicate to the teacher those who should be promoted and also aid in the securing of reasons why pupils fail. Those pupils who have not shown progress can be selected and assigned to a normal student, or a graduate student, or even a skilled teacher, to find out the causes for failure to make progress. The value of properly used standard tests and scales is seen in the increase in “good teaching, and good grading, which keeps pupils of like ability together.”

Whipple says that “the object of mental tests, practically considered, is to secure by a relatively refined and precise method, a more accurate determination of the mental traits or the general mental status.” According to him there are two kinds of tests:

1. Those which “aim to determine with some precision the presence or the absence, or the amount of some specific mental characteristic.”

2. Those which “aim to determine with perhaps somewhat less precision the general status of the child’s intelligence, his mental level, or general all-around ability as related to other children of the same nationality, sex, age, and social status.”

Buckingham makes essentially a three-fold classification of tests on the basis of objective means of measuring school products:

1. Scales, or tests “based upon the judgments of competent persons.”

2. Scales, or tests based upon “the ratio of correct responses to total responses” in a typical group. These systematized scales are based upon many responses, or the ratio method, the per cent of correctness. The scales based upon the determination of “the ratio of correct responses to total responses” are somewhat dependent upon individual judgment. It is frequently a matter of opinion whether a response is correct or not, and judgment plays a more important role in considering the most definite subjects (spelling and arithmetic) and the least definite (penmanship, drawing, and English composition).

3. Mixed scales or tests; where both the judgment and ratio methods are employed in subjects ranging between the extremes of definiteness (as in geography, history, and grammar), for which correct ratio scales must be worked out.
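The ratio method named above, “the ratio of correct responses to total responses,” amounts to a simple per cent of correctness. The following is a minimal sketch only; the function names and the three-pupil spelling data are hypothetical, and the sketch assumes each response has already been judged correct or incorrect, which the text notes is itself sometimes a matter of opinion.

```python
def percent_correct(responses):
    """Per cent of correctness: correct responses over total responses.

    `responses` is a list of booleans, True meaning the response was
    judged correct.
    """
    if not responses:
        raise ValueError("at least one response is required")
    return 100.0 * sum(responses) / len(responses)


def group_percent_correct(group):
    """Pool every pupil's responses and return the group's per cent correct.

    `group` maps a pupil's name to that pupil's list of responses.
    """
    pooled = [r for responses in group.values() for r in responses]
    return percent_correct(pooled)


# Hypothetical four-word spelling test given to three pupils.
spelling_group = {
    "pupil_a": [True, True, True, False],
    "pupil_b": [True, False, False, False],
    "pupil_c": [True, True, True, True],
}

for name, responses in spelling_group.items():
    print(name, percent_correct(responses))
print("group", round(group_percent_correct(spelling_group), 1))
```

Pooling all responses before dividing gives the ratio for the typical group as a whole, as distinct from the per cent earned by any one pupil.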

“The movement for measurement is merely an application of scientific methods to the study of educational problems.” The question is whether or not tests and scales “are of value to superintendents and teachers generally.” In reply to this question many objections have been offered:

1. The standards are defective; there is need to standardize the standards.

2. Too much time and energy may be spent in performing the tests, or at least in overdoing it.

3. The testing interferes with school work, and makes all uncomfortable; nevertheless the value of such testing appears, if it leads to definite facts and results.

4. The teacher is not trained to it, does not know how to employ the tests, may secure merely useless data; or may overemphasize the branch in which testing is made to the neglect of other branches in which testing is not employed. Intensive work of one group upon a certain branch may detract from the other branches; but the teacher must see that the other branches are not neglected. Teachers must not set too high a standard by over-emphasizing particular branches, for there is no justifiable reason for speeding up in one branch at the expense of others. In fact, there should not be any speeding up before the fourth grade at least.

5. Teachers usually do not know which tests are best, but a detailed study of individual tests soon informs the teacher concerning the particular purpose and application of a special scale to given conditions, pupils, or groups of pupils.

6. The best teachers are selected to instruct the abnormal pupils. That is not the case as regards these standard tests of school products, for the tests are submitted by the regular teachers to all the pupils of their group.

7. The tests are determined by traditional psychology, by past theories. Such is not the case, for they are born of practice rather than of abstract theory.

8. The employment of standards may lead to the disregard of the value of grades and the teacher’s personal estimate. This is not true, for the results of the use of standards may mean very little, or may give rise to many errors, if the other helps are entirely disregarded.

9. They tend to uniformity. The one great weakness of the tests and scales is that the tests result in the standardizing of the individual pupil. They aim at the attainment of a grade of work rather than establishing units of work. Promotion is a transfer to other levels of work, and should be made with due consideration to age and units of work done. The purpose of tests is to establish a median level which agrees with the average of growth over a long period, and then group pupils and promote pupils as tested. No promotion is determined on the basis of the test, but the teacher does promote when he sees a pupil bright along all lines; or when a pupil manifests special ability the work may be fitted to his ability.

10. These tests aim at the standardizing of pupils, and may lead to the making of many classes, at great financial outlay. There should also be a standardizing of subjects. The characteristics manifested by pupils at certain ages should be discovered, and then such subject matter selected as is best adapted to the nature of the pupil during such periods; and especially should standardizing studies go hand in hand with standardizing pupils in the seventh and eighth grades.
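The “median level” mentioned under the ninth objection can be illustrated with a short sketch. The scores, grade labels, and function names below are hypothetical. In keeping with the text, the comparison is offered only as information for the teacher and does not by itself decide any promotion.

```python
from statistics import median


def median_levels(scores_by_grade):
    """Return the median test score for each grade."""
    return {grade: median(scores) for grade, scores in scores_by_grade.items()}


def compare_to_median(score, grade_median):
    """Describe where one pupil's score stands relative to the grade median."""
    if score > grade_median:
        return "above the median level"
    if score < grade_median:
        return "below the median level"
    return "at the median level"


# Hypothetical test scores for two grades.
scores_by_grade = {
    4: [12, 15, 9, 20, 14],
    5: [18, 22, 17, 25],
}

levels = median_levels(scores_by_grade)
print(levels)
print(compare_to_median(20, levels[4]))
```

The median is used here rather than the mean because a few very high or very low scores shift it less, which suits a level meant to hold steady over a long period.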

Instead of many institutions trying to do the work of standardizing tests, a department of research should be created in the school of education of a university, preferably a state university. Such a department should cooperate with the state superintendent of public instruction, so that all the forces under the immediate control of the state could be unified for economical and efficient work, both in devising scales and standards, and in formulating directions for the practical testing of them by public school teachers. Within the university, students of education, who should be chiefly graduate students, can be trained to a proper conception of the importance of the work, and can be given such knowledge of the technique of standard tests and scales, as will enable them to devise new standards and to test them in a specific and practical way within the state.

Just as there is in many state universities an agricultural extension department to carry on the work of instructing farmers, so there should be an educational extension department, organized to carry on this work so important to the school system and so vital to the highest welfare of the state. Such a department could send out trained students of education to assist in breaking down opposition, in creating a correct attitude towards tests, and in developing a proper conception of the value of standardized tests and scales. It may also train the teachers to use the tests intelligently and effectively. There is great need of wise direction of testing in a definite, purposive way. There is danger in an excessive number of measurements unless sufficient directions are given to the teacher employing the tests. After the teachers have secured the data in their school laboratory, instructions should be given relative to a proper procedure in examining the results and diagnosing and correcting weaknesses. It is very difficult to find causes and to correct them.

The university may then test all the results. In fact, the school of education should be a clearing-house and repository for records of results obtained within the state, and also a distributor of the conclusions drawn from a large body of definite facts. State-wide tests continued over a long period of time may in this way be coordinated and made most valuable to the schools of the state. Superintendents, principals and teachers, who are at present indifferent to this work, will soon discover the value of systematic cooperation, when they are shown the reasonableness of the procedure and its practical value to them. They naturally do not like to be annoyed by many different institutions which seek permission to send ill-trained investigators into their schools.

The only way to get a teacher to know how to use the scale is to have him use it in his own laboratory, the school room. It is not sufficient merely to tell him how to use it. There is need of a psychological clinic conducted by regular instructors, so that scientific training may be given under controlled conditions. It is unwise to use more than one scale at a time, or to emphasize too many points at once. The scales should be used over a long period of time. Theories will not always work out in detail, especially over a short period. The teacher must “not only directly measure ability to give information,” but also “indirectly measure ability of a general sort, including the power to think.” The practicability of indirect measurement is clearly evident although errors may appear. In order to prevent error, as far as possible, Courtis1 advises the standardizing of standards by analyzing and controlling the factors involved, and recommends the following procedure:

Factors involved; as illustrated in a reading test.

“A. Mechanical factors.
  1. Structure of sentence.
  2. Length of sentence.
  3. Size of type.
  4. Spacing.
  5. Length and character of the line.
  6. Position (upside down, or not).
  7. Difficulties due to word recognition.

“B. Content factors.
  1. Familiarity.
  2. Incentive.
  3. Content and experience factor. Difficulty of word determined by frequency and recency of use, interest, emotional atmosphere surrounding conditions of use.
  4. Need of an analysis of the vocabulary of children in terms of frequency and conditions of use.
  5. Final test of equal units is reaction of unselected groups of children.

“C. Condition factors.
  1. Testing conditions.
    (1) Incentive; instructions, examiner, manner.
    (2) Timing and length of tests; accuracy, fatigue, distribution.
    (3) Physical conditions; light, heat, paper, ink, etc.
  2. Scoring and tabulation.
    (1) Objective marking; approximate methods.
    (2) Need of simplicity.
    (3) Judgment scales subject to change.

The actual constructing and testing of conditions is a scientific activity, and involves a real problem; for it is a great task to devise on a scientific basis tests and examinations that are valid and capable of administration by class room teachers, and unless those factors are present, measurements will prove ineffective, except for supervision purposes.”

1 Courtis, S. A. Outline of Standardization of Teachers’ Examination

While these standards are for school use, their demands should be higher than the standards society requires; for it is hardly possible that pupils will be trained above the necessary level of actual life.

Many conclusions were brought out in the discussion of aims and purposes. The following is a brief summary:

1. School communities secure better results in the treatment of fundamental branches where scales have been used.

2. A decided improvement in teaching is to be noted.

3. A large body of definite fact is secured.

4. There is a reduction of the average running expenses in teaching, and also in management.

5. The need of scales for all subjects has been demonstrated.

6. A correlation may be established among pupils who are tested in several branches, as in reading, writing, and spelling.

7. The variability seen in the teacher’s estimate of a pupil’s progress is reduced to a minimum.

8. Pupils, teachers, and community, are incited to greater interest in the subjects tested. This interest should be communicated to other branches where tests are not already employed.
