The Binet-Simon Measuring Scale for Intelligence: Some Criticisms and Suggestions

Leonard P. Ayres, Ph.D., Russell Sage Foundation.

In 1908, the French psychologists Binet and Simon published their second and most famous series of tests for the diagnosis of the grade of intelligence of children. This series, translated and adapted for American use by Goddard and Whipple, has won for itself widespread acceptance and increasing application by those interested in the education of exceptional children. This widespread practical application has demonstrated that in various minor respects the tests, originally developed for use with French children, are ill adapted to the needs of Americans, and, as a result, many workers are at present engaged in trying out minor variations in the hope of improving the series. In other words, there is general agreement that the measuring instrument is based on correct principles, and constructed on the right plan, but that it needs minor adjustments to make it work smoothly under new conditions.

The object of this article is to present considerations which lead the writer to believe that what we must have is a new instrument rather than the readjustment of the old; a new instrument utilizing what is good in the old, but largely planned on different principles and constructed along new lines.

The Binet-Simon tests consist of a series of fifty-six tasks and questions adapted to the capabilities of normal children of from three to thirteen years of age, and their purpose is to provide a measuring scale whereby the intellectual performance of the child tested may be compared with that of the average normal child of the same age.

This arrangement of the tests on an age scale constitutes the feature of greatest value in the Binet-Simon series, and accounts for the enthusiastic reception accorded them immediately upon their publication. It enables the examiner to test a seven-year-old child, for example, and to discover that his mental development is two years in arrears if he is able to pass successfully only the tests assigned for children five years of age and younger. This method provides for the first time a definite and universally intelligible scale by which to measure the performance of the subject. A general idea of the character of the tests for each year may be gained from the following table, which only roughly indicates the nature of the tasks and questions:

Three Years.
1. Where is your nose, eyes, mouth?
2. Repetition of sentences.
3. Repetition of numbers.
4. Describing pictures.
5. Name of family.

Four Years.
6. Sex of child.
7. Naming familiar objects.
8. Repetition of figures.
9. Comparison of lines.

Five Years.
10. Comparison of weights.
11. Copying square.
12. Making rectangle with divided card.
13. Counting four cents.

Six Years.
14. Indicating right hand, left ear.
15. Repetition of sentence.
16. Esthetic comparison.
17. Definition of objects.
18. Execution of triple order.
19. Own age.
20. Knowing morning and afternoon.

Seven Years.
21. Unfinished picture.
22. Number of fingers.
23. Writing from copy.
24. Copying a diamond.
25. Repetition of figures.
26. Description of pictures.
27. Counting thirteen cents.
28. Naming four common coins.

Eight Years.
29. Reading and report.
30. Counting money.
31. Naming four colors.
32. Counting backwards.
33. Writing from dictation.
34. Comparing objects from memory.

Nine Years.
35. Knowing the date, day, month, day of month, and year.
36. Reciting days of week.
37. Making change.
38. Definition of familiar objects.
39. Reading and report.
40. Arrangement of weights.

Ten Years.
41. Reciting months of year.
42. Naming nine pieces of money.
43. Sentence building.
44. Problem questions.

Eleven Years.
45. Detecting absurd statements.
46. Sentence building.
47. Naming sixty words in three minutes.
48. Defining abstract terms.
49. Sentence building.

Twelve Years.
50. Repetition of figures.
51. Rhymes.
52. Repetition of sentence.
53. Problem questions.

Thirteen Years.
54. Drawing from design cut in paper.
55. Describing figure made from reversed triangle.
56. Differences between abstract terms.
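The age-scale reading described above, in which a seven-year-old who passes only the tests for five years and younger is rated two years in arrears, can be sketched in a few lines. This is a simplified illustration of that one example, not the full Binet-Simon scoring procedure; the function name and its parameters are hypothetical.

```python
def years_in_arrears(chronological_age, highest_age_level_passed):
    """Return how many years the child's tested level lags behind his age.

    Assumption (hypothetical simplification): the tested level is taken to be
    the highest age group whose tests the child passes. A positive result
    means the child is behind; a negative result means he is ahead.
    """
    return chronological_age - highest_age_level_passed


# The example from the text: a seven-year-old passing only the
# five-year tests and younger.
example = years_in_arrears(7, 5)
print(example)
```

A child who passes the tests for his own age would be rated at zero, that is, normal on this scale.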

These tests are designed to measure native ability, not scholastic attainment. They aim to provide the investigator with an instrument which will enable him to form a trustworthy estimate of the child’s capacity for adapting himself to his social environment, and so are designed with special reference to evaluating his judgment, good sense, initiative, and adaptability. Their value as a measure of this kind of intellectual capacity depends on whether or not they really test the qualities they aim to test and with what degree of accuracy. It is the opinion of the writer that they may be greatly improved in both respects. His criticisms fall under six general heads:

I. The tests predominantly reflect the child’s ability to use words fluently, and only in small measure his ability to do acts.

II. Five of them depend on the child’s recent environmental experience.

III. Seven depend on his ability to read or write.

IV. Too great weight is given to tests of ability to repeat words and numbers.

V. Too great weight is given to “puzzle tests.”

VI. Unreasonable emphasis is given to tests of ability to define abstract terms.

I. Talking vs. Doing. The first and most serious criticism is that the ability predominantly measured by these tests is the child’s ability to use words fluently, and that this gives a warped and partial measure of his real degree of intelligence. The keynote of this criticism is founded in the tests themselves, among the questions assigned for ten-year-old children, where the child is asked, “Why should you judge a person by what he does rather than by what he says?” A correct answer to this question would be that we should judge a person by his acts rather than by his words, because his acts are accurate indicators of what he really is, whereas his words may have only the slightest relation to his real self. This principle is reflected in the proverbs and literature of every age, and of all peoples. It is so axiomatic that Binet and Simon have rightly assumed that it forms a part of the knowledge of every normal ten-year-old child. Nevertheless, by careful count, two-thirds of their tests are tests of the child’s ability to use words, and only one-third indicate his ability to act. The assumption seems to be that native ability to do can be tested by testing the ability to use words about doing.

The fallacy of this assumption is acutely appreciated by the school superintendent who is forced to select principals and teachers on the basis of examinations, and then sadly observes the striking contrast between the way in which the candidates describe how they would teach and administer, and the way in which they actually succeed in teaching and administering. Federal, state and municipal officials, whose field and office forces are selected on the basis of examinations, are equally alive to similar common discrepancies between words and deeds.

The root of the fallacy is the fundamental fact that the motivating stimuli which shape one’s actions in coping with a real problem in life are invariably multiple and complex, whereas those which determine his answer to a hypothetical question are simple, few, and different in quality.

An illustration of this may be secured by putting to a number of intelligent adults of demonstrated practical ability the questions assigned to ten-year-old children in the tests under consideration. One of these reads, “What ought one to do before taking part in an important affair?” The writer’s experience in putting this question to business men is not encouraging. A few answers have been received ranging from “Take a bath,” and, “Put on your best clothes,” to “Take some money from the bank” and “Transfer your property to your wife”; but in general those questioned reply with energetic expressions of short and ugly words and emphatic protestations that the question is unanswerable. Again, these problem-questions overlook the importance of habit and of the emotions in influencing action. The ten-year-old child or the adult of indifferent mental ability may have a ready answer to the questions “What is the thing to do if you find out that your house is on fire?” “What ought one to do when he has been struck by a playmate who did not do it on purpose?” and, “What would you do if you were punished when you did not deserve it?” But the child or adult who does just the right thing when he has been struck by another, discovers that his house is on fire, or suffers undeserved punishment, thereby demonstrates a quality and degree of native ability to which few indeed among us may hope to attain.

A still further objection is that the tests assume an agreement between verbal equality and real equality which seldom exists. The blood-curdling series of tests put to eleven-year-old children to discover their ability to detect absurdities well illustrates this defect. The statement:

“There was found in the park to-day the body of an unfortunate young girl, frightfully mutilated, and chopped into eighteen pieces. It is thought that she committed suicide.” may well be a pleasant and entertaining narrative to a normal and somewhat phlegmatic child, but constitutes a serious nervous shock to his more sensitive companion. In the same statement read to both children there is verbal equality; in their psychical import there may be the most serious inequality.

II. Recent Environmental Experience.

Five tests depend in high degree on the child’s recent environmental experiences. These are the tests relating to time and to money. Some of them are “doing” tests, and some of them “saying” ones. The assumption with regard to the “time” questions is that intelligent children, irrespective of school training, should be able to name the day of the week, the month, the day of the month, the year, etc. Experiment among business and professional men shows that they are frequently unable to supply these data off-hand unless the nature of their business requires constant reference to them. Probably every reader will recall that it requires only three or four days of a camping trip or an ocean voyage to lose track of the days of the week and the days of the month, and that a distinct shock is experienced when someone mentions the fact that “to-day is Sunday.”

The writer recalls serving as a member of a Federal jury in the West Indies trying smuggling cases, in which the members of ocean-going trading sloops were the accused. In these cases it was proved beyond any question that these sloop captains were not only illiterate, but that they were absolutely ignorant of the names of the months, and did not keep track of the days of the week, with the exception of the Sundays. Nevertheless, these men were distinctly able and intelligent, spoke several languages, navigated dangerous waters, and carried cargoes of considerable value. In the writer’s opinion, the ability to name off-hand the day of the week and of the month is governed almost entirely by daily work and very little by native ability.

A similar objection, but one probably less serious, arises in connection with those questions having to do with money and making change. These again are abilities largely governed by environment. The ability of a child of ten years to recognize and name at sight a quarter, a fifty-cent piece, a five-dollar bill and a ten-dollar bill, depends not on native ability but rather on whether or not he is accustomed to see, have, handle or spend these pieces of money.

III. Ability to Read and Write. Seven tests depend on the subject’s ability to read and write, which commonly depends on the amount and kind of school experience he has had, and may be only slightly related to his native ability.

IV. Repetition of Words and Numbers. The repetition of words and numbers has an even more remote relation to the ability to cope with the problems of life, and yet one-seventh of the tests are of this sort. The simpler of them can be successfully passed by a gifted parrot; the more difficult ones recently proved beyond the ability of a university professor tested by the writer.

V. Puzzle Tests.

Several of the tests seem best designated as “puzzle tests,” and appear to have strikingly little relation to anything the normal person has to do in the ordinary day’s work. Such a one is the demand that the eight-year-old child count backwards from twenty. Counting backwards is one of the rarest things most people are called to do, and yet the proposal has recently been seriously made that these present tests be “improved” by requiring the subject to recite the names of the months from December back to January instead of forward from January to December. To teach children to recite backwards lists of words that have a normal fixed order is educationally vicious. To include such a requirement in tests of intellectual ability is at least questionable. Another “puzzle test” for thirteen-year-old children seems as foreign to everyday experience as the foregoing. It uses for material a visiting card cut along the diagonal and asks the child to describe the resulting shape if one of the triangles were turned about and placed so that its short leg was on the other hypotenuse and its right angle at the smaller of the two acute angles. So far the writer has failed to find any one able to describe the resulting shape.

VI. Abstract Terms.

Definitions of abstract terms and expressions of the difference between abstract words of similar sound but different meaning constitute the last class of tests to be here considered. The first objection to these is that philosophers are almost the only people who think in abstractions, and the second objection is that words of peculiarly difficult character have been chosen. Let the reader himself try the eleven-year-old test which demands definitions of Charity, Justice, and Goodness. To pass he must give two good definitions. Then let him try the thirteen-year-old tests and tell the difference between Pleasure and Honor; Evolution and Revolution; Event and Advent; Poverty and Misery; Pride and Pretension.

The third of these pairs is a good one to try on your friends. If not satisfied with their explanations recourse may be had to the Standard Dictionary where one will be rewarded by finding that an advent is the coming of an event, but just what the difference between them is remains undiscovered.

To sum up the case to this point: two-thirds of the Binet-Simon tests are tests of the child’s ability to use words, and only one-third tests of his ability to do acts. Among the reasons why certain of the tests fall short of providing satisfactory criteria for the judging of native ability are the following:

1. They overlook the fundamental difference between the multiple and complex stimuli which contribute to the motivating impulse in coping with real problems and the few simple ones entering as factors in answering questions or obeying commands.

2. The importance of the emotions and habit in influencing action is disregarded.

3. Real equality is attributed to verbal equality.

4. Ability to answer many of the questions depends on the child’s daily environmental experiences which differ radically among different children.

5. Ability to meet the requirements of several of the tests depends directly on the excellence of the child’s schooling.

6. Several tests depend on the mere ability to repeat words and numbers.

7. Counting backwards and solving puzzles constitute several tests.

8. Several tests turn on the ability to express in words comprehension of difficult, abstract terms.

There are two important sets of evidence in favor of the tests and they are both good in the sense that they constitute “pragmatic” arguments showing that the tests “work” successfully when applied. In the first place these tests have won rapid and widespread use and endorsement among hundreds of practical teachers and workers with children, whereas all previous tests of intelligence have been practically restricted in use to workers in psychological laboratories. In the writer’s opinion the reason for this has been pointed out by Professor Terman, of Stanford University, who calls attention to the fact that here for the first time we have a set of tests arranged with reference to steps on a scale which is constant and universally understood. Everyone has a fairly accurate conception of what is meant when one says that a given child shows intelligence equal to that of a ten-year-old normal child. We have had graded tests before but no one knew what the steps on the scale meant in terms of anything else, or where the lower end began or how far the upper end reached. This application of tests to a definite, universally understood scale constitutes the great contribution of Binet and Simon and it is so important a contribution that its excellence outweighs the shortcomings of the tests themselves.

The second set of evidence consists of the records of applying the tests to large numbers of normal school children with the result that the distribution of the children into retarded, normal, and advanced groups corresponds fairly well with what is termed in statistics the normal frequency distribution. Such studies have been made by Binet and Simon in France with 203 normal children and by Goddard in this country with 1547 children. In both cases the results showed about such a distribution of retarded, normal and advanced children as the theory of normal frequency distribution tells us that we should find, and in both cases the results have been widely cited as constituting a scientific demonstration of the correctness of the tests in so far as their degree of difficulty is concerned.

Unfortunately this conclusion is hardly justified by the results of the investigations as made public, for the reason that we have only the mass figures for the entire group tested and not the figures showing the results for children of each age. This process hides the details from view and if, as many workers report, the tests for the youngest children are too easy and those for the oldest ones too hard, these important facts are concealed by putting all the results for all the ages together. How this works is illustrated by comparing the results obtained by Goddard in his application of the Binet-Simon tests to 1547 normal school children with data recently gathered by the writer showing the progress of children in the public elementary schools of twenty-eight cities. In the accompanying diagram the solid line represents the distribution for the children tested by Goddard according as the tests showed them to be normal, one year behind, one year ahead, two years behind, two years ahead, etc. The dotted line is based on data showing how long it has taken 14,762 children in twenty-eight American cities to complete the work of seven grades. Those who have done so in seven years are rated as normal; those who have taken six years, as one year ahead in progress; those taking eight years, as one year behind, etc. To secure a proper basis of comparison both the Goddard data and the writer’s have been reduced to relative figures and are presented on the basis of 1000 cases:

[Figure: Distribution curves showing variations from normal of 1547 children tested by Binet-Simon scale (solid line) and 14,762 children in 28 cities rated by their progress through seven grades (dotted line). Curves based on relative figures showing distribution of 1000 cases of each kind.]

The significant feature of the diagram is that the curves are closely similar. If the solid curve constitutes a scientific demonstration of the correctness of the Binet tests, then it may fairly be argued that the dotted one constitutes a scientific demonstration that the public school systems and courses of study of these cities are correctly adjusted to the abilities of their pupils; neither too hard nor too easy, but almost exactly right. If this were true there would be far less need for securing a measuring scale of intelligence than there undoubtedly is, for in our public school system we should have just such a scale, scientifically correct and already at hand.

The fact is, however, that the progress figures from the twenty-eight cities referred to show great variations as between different grades and localities, and it is only by combining the figures for all the cities that the nearly normal curve shown is secured. A similar comparison could easily be made with the figures showing the results of the tests made by Binet and Simon in France. Indeed, it so happens that this curve almost exactly coincides with that showing the progress of the children in the eight grades of Bayonne, New Jersey, and here again the almost normal curve disappears when the data are presented by separate grades.
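The point that mass figures can conceal what the separate groups show may be illustrated with a small numerical sketch. The counts below are invented purely for illustration and are not Goddard's, Binet's, or the writer's data; the function names are likewise hypothetical. One group is skewed toward "ahead" (as if its tests were too easy), the other toward "behind" (as if its tests were too hard), yet the pooled figures, reduced to a basis of 1000 cases, come out symmetrical about normal.

```python
from collections import Counter

# HYPOTHETICAL counts, for illustration only.
# Key: years ahead (+) or behind (-) of normal; value: number of children.
youngest = {-1: 10, 0: 30, 1: 40, 2: 20}   # skewed ahead: tests too easy
oldest = {-2: 20, -1: 40, 0: 30, 1: 10}    # skewed behind: tests too hard


def pool(*groups):
    """Add the counts of several groups together, sorted by deviation."""
    total = Counter()
    for group in groups:
        total.update(group)  # Counter.update adds counts key by key
    return dict(sorted(total.items()))


def per_thousand(counts):
    """Reduce raw counts to relative figures on a basis of 1000 cases."""
    n = sum(counts.values())
    return {dev: round(1000 * c / n) for dev, c in counts.items()}


def mean_deviation(counts):
    """Average number of years ahead (+) or behind (-) of normal."""
    n = sum(counts.values())
    return sum(dev * c for dev, c in counts.items()) / n


pooled = pool(youngest, oldest)
print(per_thousand(pooled))
print(mean_deviation(youngest), mean_deviation(oldest), mean_deviation(pooled))
```

In this invented case the two groups average seven-tenths of a year ahead and seven-tenths of a year behind respectively, while the pooled average is exactly normal; the symmetry of the combined curve therefore says nothing about whether the tests fit the children of any single age.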

In presenting the foregoing considerations the writer does not wish to appear as an antagonist of the Binet-Simon Measuring Scale for Intelligence, for he is not. He does wish to sound a note of warning against accepting them in their present form as final and satisfactory. What is here set down is the result of his own attempts to discover ways in which they may be improved, together with ideas secured through lengthy discussions of their application with Mrs. Louise Stevens Bryant, of the Psychological Clinic of the University of Pennsylvania.

Binet and Simon have done a great and lasting service for the cause of childhood in basing their tests on a definite scale. The present situation offers a splendid opportunity to psychologists, teachers, and mothers to observe, discover, and record things which normal children do and know at each age. Work of this sort conducted by a large number of observers and co-ordinated by some central agency or agencies would soon give us a series of tests retaining all of the good of the present series and replacing present tests wherever experiment and observation show better ones can be found.

Above all let us steadfastly bear in mind that all measuring instruments must be judged for two qualities; first, what they measure, and, secondly, how accurately they measure it. The fact that the Binet-Simon tests are more or less accurately adjusted to the normal capabilities of the children of each age is only one, and the less important, criterion. The problem of paramount importance is whether or not they really measure native ability, and if they fall short, how we may develop a series of tests that will measure it.
