A Critique of High Range IQ Tests

This article attempts to highlight some of the main flaws inherent in many of the high range IQ tests constructed by individuals and placed on the Internet. In so doing, it attempts to lay down some basic quality standards. It is not the purpose of this article to discuss possible solutions to individual test items, and requests for such information will not be responded to.

Construct Validity

If the various high range tests I have seen around the Internet are anything to go by, many of their creators are unaware of, or have chosen to overlook, the fact that a psychometric test is supposed to measure something - hence the "metric" part of the name. Many people create these tests and place them on the Internet, leading testees to believe that the final result represents their "IQ", without even making clear what exactly the test is supposed to be measuring!

Jonathan Wai, author of the Strict Logical Sequences Exam (Forms I and II), gives a more honest introduction to a high range test, freely admitting that he does not know what the test measures, and that the testee should contact a psychologist for a reliable and valid IQ test if they wish to know their "true" score.

I believe that anyone constructing a measuring instrument should have at least a working hypothesis as to what is being measured. Otherwise, how would they even begin to determine the extent to which the test measures what it was intended to measure?

I therefore invite all high range test authors to explain in full on their site what they believe is being measured by their tests, and why.

Item Validity

A professionally published, standardized IQ test will have been checked by numerous experts and thoroughly beta-tested before publication to weed out bad items. If testees are submitting more than one logically defensible answer to an item, then it is up to the test author to disambiguate or replace that item.

However, it would appear that some test authors prefer not to revise or replace these ambiguous items, for whatever reason. Here is one justification I found on the site of one high range test author for keeping items with more than one possible solution: "Many times people believe that they have successfully responded to an item because their answer simply "completes" some logic. For example: 2,3,5,? could be completed with 8 taking into account an idea of Fibonacci or some other argument. However, this solution is not justified because it ignores that here we have the first four _ _ _ _ _ _. " [sic]

No elaboration is provided as to why assuming the solution is the next Fibonacci number, rather than the next prime number (or the next term of any other well-known numeric sequence), is less logical or less acceptable from a psychometric and cognitive point of view. Which cognitive processes exactly are superior, or lacking, if the testee chooses one interpretation over another, and why?
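To make the ambiguity concrete, here is a minimal Python sketch showing that the three given terms 2, 3, 5 fit both rules mentioned above equally well. The function names are hypothetical and chosen only for illustration; nothing here is taken from any actual test.

```python
def primes(n):
    """Return the first n prime numbers using simple trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        # A candidate is prime if no smaller prime divides it evenly.
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def fibonacci_from(a, b, n):
    """Return n terms of the Fibonacci-style sequence that starts a, b."""
    terms = [a, b]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms[:n]


given = [2, 3, 5]
candidates = {
    "primes": primes(4),
    "Fibonacci-style": fibonacci_from(2, 3, 4),
}

for name, terms in candidates.items():
    # Both rules reproduce the given terms exactly...
    assert terms[:3] == given
    # ...but disagree on the fourth.
    print(f"{name}: next term is {terms[3]}")
```

Both rules reproduce the given terms and then diverge (7 versus 8), so the item as posed cannot tell the two lines of reasoning apart.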

I am also suspicious of tests that require the testee to write or draw their solutions in full. This format lends itself to multiple interpretations and solutions, and leaves the testee trying to second-guess which one the author had in mind.

The bold text in the following quote is another astounding claim in defence of bad items: " contains items that have been removed from other tests in the course of twelve years for various reasons. They were too easy, too hard, "ambiguous", never solved, too knowledge-based, too math-biased, or simply weird. Experience has shown though that such "bad" items are often the best for detecting intelligence at the very highest level.".

This is a very bold assertion to make without any discussion of the exact cognitive processes at work at the very highest levels of intelligence that might explain why this is so, or of how it could be justified from a psychometric point of view. It raises the question of how "never solved" items can be construed as "the best for detecting intelligence at the highest level", since that description implies that even the most intelligent test takers to date have failed to arrive at the required answers.

Adequate Norm Samples

A key difficulty in norming tests designed for the high range is the very nature of the bell curve - the higher the IQ, the greater the statistical rarity. Most standardized tests stop measuring at about four standard deviations above the mean, or about 160 (s.d.15). The reason for this is as follows. Since an IQ that high is only found in about one individual in 30,000, there are obvious problems in finding enough beta-testers with very high IQs to conduct the rigorous statistical analysis needed to norm a professionally published test. Many psychometricians will tell you that IQ testing is not reliable beyond a certain level.
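The rarity figures can be checked with a short calculation. The sketch below assumes that scores follow an exact normal distribution with mean 100 and s.d. 15, which is an idealisation, particularly in the extreme tails. The helper names and the figure of 30 scorers are hypothetical, chosen only to illustrate the scale of the problem.

```python
import math

MEAN, SD = 100, 15  # scale used in the text (s.d. 15)


def upper_tail(z):
    """Fraction of a normal population scoring more than z SDs above the mean."""
    # erfc keeps precision in the far tail, where 1 - cdf(z) would lose it.
    return 0.5 * math.erfc(z / math.sqrt(2))


def one_in(z):
    """Rarity expressed as 'one person in N'."""
    return 1 / upper_tail(z)


def population_needed(z, wanted):
    """Size of a random sample expected to contain `wanted` people above z SDs."""
    return math.ceil(wanted * one_in(z))


for z in (2, 3, 4, 5, 6):
    iq = MEAN + SD * z
    print(f"IQ {iq} (+{z} SD): about 1 in {one_in(z):,.0f}; "
          f"~{population_needed(z, 30):,} people to expect 30 such scorers")
```

Under these assumptions, four standard deviations comes out at roughly one person in 31,600, consistent with the "one in 30,000" figure above, and the random sample needed to find even a few dozen such people grows very quickly with each further standard deviation.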

For that reason, I am quite sceptical of high range test authors who claim that their tests are reliable in very high ranges - five, six or even more standard deviations from the mean. High range tests tend to be trialled on relatively few individuals, who tend to be self-selected from the established online high IQ community, and whose prior scores on both high range and standardized tests seem to be taken largely on trust.

I therefore query both the size of the sample and whether the sample is sufficiently random, especially as it is drawn from an already test-savvy Internet population.

Clarity of the Test

It seems almost too obvious to need to be mentioned, but I have seen test documents that have been so poorly drawn and/or scanned before being uploaded that some items are difficult to discern. This could, of course, make all the difference to how an item is interpreted. Clear printing, format and layout should be a given.

There are also tests out there that contain no guidance as to how the test, or a section of the test, works or what the testee is supposed to do. As far as I can determine, all professionally published tests give examples of each task, and the examiner is supposed to make sure that the testee understands the task before letting him or her loose on the test questions. One such test contains a section where the "example" consists of a 4cm vertical line marked "5" and a 2cm line at right angles with no marking or number, and the "items" consist of lines of differing lengths at various angles. There were no other directions provided. I am not going to say how many different possible approaches to this task I came up with. Presumably only one of them is what the test author had in mind.

While on the subject of multiple solutions, there are also tests where there is no single solution or required answer - the question is entirely open ended. See some of the later questions in the Sigma Test for an illustration of this. This makes the scoring entirely down to the subjective judgement of the person doing the marking (who may, of course, fail to appreciate the quality of an answer provided by a testee substantially more intelligent than they are!).

It also goes without saying that later questions should not be made unanswerable by having their solution dependent upon having correctly solved an earlier question.

Distinguishing Academic Knowledge and Raw Ability

Unless it is the purpose of the test to measure the application of academic knowledge, it should not be assumed that the testee has any. While facts etc. can be looked up on an untimed, unsupervised, "references allowed" test, some subjects, e.g. higher mathematics, require more grounding than can be acquired in one afternoon with an encyclopaedia or search engine. Items based in, for example, some area of advanced geometry may be great items for an academic exam but are poor items for an IQ test.

Consistency of Measurement

Arthur Jensen has commented that the g-loading of tests diminishes at a certain level. This could be a problem to do with the type of test items, where in order to make them "hard" the author has assumed advanced academic knowledge, or simply left the testee to work out what is to be done because there are no instructions.

The issue here is whether the same set of cognitive abilities is being measured all the way up and down the IQ scale. It would not be helpful if, when measuring temperature, for instance, at a certain (and not clearly agreed) point on the thermometer other irrelevant physical effects were being conflated with the temperature measurement to an increasing but unknown degree. With IQ tests, it is not helpful if other factors such as specialised knowledge or trained skills are being conflated with raw ability.

One possible workaround, I suppose, would be for the publishers of standardized tests such as the Wechsler or Stanford-Binet to extend their scales with more intricate block designs, harder-to-define vocabulary words, more difficult logic sequences etc. I believe that the reason they do not do so, however, is the norm sample issue discussed above.

Concluding Remarks

If your IQ is "off the charts" - in other words, you took a standardized, psychometrician-administered test and hit the ceiling - then for all practical purposes your IQ is unmeasurable. Testing has its limitations.

A commenter on my blog was not happy with my conclusion that off the top is off the top, believing that finding or developing a test capable of discriminating accurately all the way from IQ 160 to IQ 220 would represent a major achievement for mankind. In response, I asked what the point was of worrying about whether we can accurately distinguish the 99.999th percentile from the 99.999999th percentile when there are far greater issues of concern with regard to how our society treats, and what it expects from, its cognitive outliers.

I discuss some of these problems and possible solutions in my Manifesto for the Exceptional Mind.

Treat high range tests as a piece of fun and enjoy the challenge, in the same way you would do a crossword or Sudoku. But do not attach any importance to the result, and be sceptical of treating your score on them as an "IQ" score.


Since writing this article, I have had several messages offering positive feedback, including from one of the regular writers for the Mensa Research Journal.

You may also be interested in this document written by Marco Ripà entitled HRTs (Big) Flaws.