Introduction
This test was developed
as an Internet self-administered spatial intelligence test for children
and adults, ages 5 and up. A previous test to measure verbal intelligence
in children has been developed by the author, with Internet norming and
very satisfactory reliability and validity properties.
It was felt that a spatial test would complement this
"Kids IQ Test and Free IQ Test"
and another Internet verbal test for adults, already marketed by
FunEducation.com in conjunction with Dr. McConochie.
Development
Test content was written
by the author in five categories: everyday physics, worldly knowledge,
patterns and shapes, directions, and common hand tools. Some items were
taken from another spatial intelligence measure previously developed by
the author. An effort was made to keep item content fair to both genders
and, so that the test measures aptitude rather than achievement,
independent of formal learning experiences. Specific content can be
examined by reviewing the
test online at www.funeducation.com/SpatialIQtest/. Approximately 42
items were written for each section, ranging in estimated difficulty from
very easy for a six-year-old to difficult for a bright young adult. The
format of the items is multiple-choice, with four or five options per item
plus an "I Don't Know" option.
Sample #1:
The test was put into
online format by FunEducation staff. A total of 75,000 prior Internet
customers were invited by e-mail to take the pilot version of the test,
and 862 completed the test within a couple of weeks. These data were used
to study test
properties and create initial norms and reliability data for persons aged
16 to 61. Sample sizes for persons below and above this range were
insufficient for reliable norming at this stage.
Statistical Properties
Males scored slightly
but significantly higher than females on all test sections:
Mean Raw Scores by Gender (269 males, 478 females), Aged 16 and Up:
| Physics | Worldly K. | Patterns | Directions | Common tools | Total |
[table values missing in source]
This finding is
compatible with similar slightly higher scores for males on the
Wechsler-Bellevue and WAIS intelligence tests (Wechsler, 1958, p. 144),
though for the WAIS-III these consistent but slight gender differences
have been explained, in some instances, as reflecting achievement rather
than aptitude factors (van der Sluis et al., 2006). Separate norms for
men and women are used for scoring the present test.
Means and standard deviations
Means and standard
deviations for the five sub-tests and all norm groups, ages 16 through
60, were quite similar, with means mostly in the 20s
and standard deviations between approximately 4.5 and 7. The range of
scores was also very satisfactory for both sub-tests and the total
score. The range of sub-test scores was typically between about 10 and
35 across 40-42 items. The highest possible total score is 208; the
highest obtained score was 180 across the 862 persons.
Reliabilities were
high and comparable to those for the Wechsler Adult Intelligence Scale
(WAIS-III). For example, for both men and women ages 21-30 the mean
section reliability is .81 and the total score reliability is .95.
Reliabilities were computed by the Kuder-Richardson-21 formula, which in
the author's experience yields values about .02 lower than Cronbach Alpha
reliabilities.
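As a minimal sketch of the Kuder-Richardson-21 computation mentioned above, the following assumes items scored 0 or 1 and of roughly equal difficulty, which is what the formula itself presumes. The function name `kr21` and the input values are illustrative only; the inputs are picked from within the ranges reported above (about 42 items, means in the 20s, standard deviations between 4.5 and 7) and are not the statistics of any actual norm group.

```python
def kr21(num_items: int, mean_score: float, sd_score: float) -> float:
    """Kuder-Richardson formula 21 reliability estimate.

    Needs only the number of items, the mean total score and the
    standard deviation of total scores.
    """
    if num_items < 2 or sd_score <= 0:
        raise ValueError("need at least 2 items and a positive SD")
    k = num_items
    variance = sd_score ** 2
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * variance))


# Illustrative inputs only (assumed, not taken from the norm tables).
example_reliability = kr21(num_items=42, mean_score=24.0, sd_score=7.0)
print(round(example_reliability, 2))
```

Because KR-21 treats all items as equally difficult, it generally gives a somewhat lower figure than item-level estimates such as Cronbach's alpha when item difficulties vary.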
The comparable
WAIS-III Performance sub-tests have a median split-half
reliability of .83 and total score reliabilities ranging from .88 to .92
across various age groups (media.wiley). The WAIS-III mean of
test-retest reliabilities for the five performance sub-tests is .75
(range .67 to .81) in a sample of 100 persons aged 16 to 29. For the
total score, the reliability is .88 for this sample (Tulsky and Zhu,
1997, p. 58).
Thus, the present test
reliability appears to be as good as or better than that of the widely
used WAIS-III test.
Internal consistency
A few test section and
total scores have minor correlations with age and education. When these
factors are controlled for, the five test sections for adults have a
mean between-test correlation of .45 (range .40 to .50) and all
sections correlate substantially with the total score, with a mean of
.75 (range .72 to .78). This is interpreted to indicate that each
section contributes equally a unique and valuable element to the total
score. It is desirable to have low correlations between test sections
and high correlations between test sections and the total score to which
they are contributing, as this maximizes the value of each section as a
unique contributor to the total score and maximizes its reliability.
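The two summary figures discussed above (the mean correlation between sections, and each section's correlation with the total) could be computed along the following lines. This is a simplified sketch: it uses plain Pearson correlations and does not control for age or education as the reported analysis did. The function names and the small data set are made up for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def section_summary(sections):
    """sections maps a section name to a list of raw scores, one per person.

    Returns the mean between-section correlation and a dict of
    section-with-total correlations (the total includes the section itself).
    """
    names = list(sections)
    totals = [sum(person) for person in zip(*sections.values())]
    between = [pearson(sections[a], sections[b])
               for a, b in combinations(names, 2)]
    with_total = {name: pearson(sections[name], totals) for name in names}
    return mean(between), with_total


# Made-up scores for six persons on three sections (illustration only).
scores = {
    "physics": [20, 25, 18, 30, 22, 27],
    "patterns": [22, 24, 20, 28, 19, 26],
    "directions": [18, 27, 21, 25, 23, 29],
}
mean_between, with_total = section_summary(scores)
```

With real data, a low `mean_between` together with high values in `with_total` is the pattern described above as desirable.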
These values are
somewhat better than those for the early WAIS test. For example,
for a sample of 300 males and females ages 20-34, the mean of the
correlations between the five WAIS Performance tests is .53 (range of
.44 to .62). The median correlation between these five tests and
the total Performance I.Q. score is only .53 (range .44 to .62)
(Wechsler, 1958, p. 100).
However, for the more
recent WAIS-III Wechsler test, the mean of the correlations between the
basic five performance tests is .47 (range .37-.60). The mean of the
correlations between these five and the total Performance I.Q. of which
they are a part is .76 (range .68 to .79) (Tulsky and Zhu, 1997).
Thus, the present author's
internal consistency test data is close to that of this widely used
current WAIS-III test, with a mean between-test correlation of .45
compared to .47 for the WAIS-III, and a mean of correlations between
sub-tests and total score of .75 compared to .76 for the WAIS-III.
Regarding age, the
only significant correlation for women was with Common Hand Tools
(.27**). For men the significant correlations were for Physics (.16**),
Tools (.28**) and Total Score (.16*).
Regarding education,
no tests correlated significantly with education for men. All did for
women, but not strongly: .10*, .10*, .11*, .23*, .12* and .19**
respectively for the five sub-tests and the total score.
This general lack of
relationship between test scores and age and education is interpreted to
indicate that the tests are measuring innate aptitude for learning ("intelligence")
more than amount learned ("achievement"),
and may be relatively "culture free".
Ethnic background was not assessed in the initial test takers due to an
oversight but will be solicited in subsequent use to obtain data for
checking possible minority group bias.
Validity
Other than content
validity, the test currently has no objectively established
validity. However, given its high reliability, it is expected to be as
valid as the Wechsler Performance tests and similar spatial aptitude
tests for predicting relevant behavior, such as success and enjoyment in
vocations and hobbies requiring spatial aptitude.
Initial Norms and Report Format
Norms as of February
21, 2007 were for about 1250 persons tested over the Internet. Norms as
of this date were not yet large enough for children under 16 or adults
over 60.
The educational
backgrounds of the 1250 persons who had taken the test as of February
21, 2007 are as follows:
Highest education | Frequency | Percent
Some high school | 301 | 24.1
Completed high school | 204 | 16.3
Some college or associate's degree | 463 | 37.1
Bachelor's degree | 175 | 14.0
Master's degree | 79 | 6.3
Ph.D. degree | 28 | 2.2
Norms as of this
sample were by gender and age group, in 10-year segments from 21 to 30,
etc., except 16 to 20 for teens and young adults. The top group was 51
to 60. I.Q.s are based on a mean of 100 and standard deviation of 15.
The printed report provides all scores, percentile equivalents and a
brief explanation of the general meaning of the scores as measures of
aptitude rather than achievement.
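The conversion from a raw score to an I.Q. on a scale with mean 100 and standard deviation 15 can be sketched as follows. The function names and the norm values in the example are assumptions made for illustration. The percentile step assumes normally distributed scores; the actual report may derive percentiles differently.

```python
from statistics import NormalDist


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw score to an I.Q. with mean 100 and SD 15."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z


def iq_percentile(iq: float) -> float:
    """Percentile equivalent of an I.Q., assuming a normal distribution."""
    return 100 * NormalDist(mu=100, sigma=15).cdf(iq)


# Hypothetical norm group: mean raw score 24, standard deviation 6.
example_iq = deviation_iq(raw_score=30, norm_mean=24, norm_sd=6)
example_percentile = iq_percentile(example_iq)
print(example_iq, round(example_percentile, 1))
```

In this example the raw score lies one standard deviation above the norm-group mean, so the I.Q. is 115, which corresponds to roughly the 84th percentile under the normality assumption.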
Norms were to be
increased as data is obtained from administration of the test to
Internet customers, as has been done for the Kids I.Q. test, which is
currently normed on several thousand children.
Sample #2, Updating Research Data
In September of 2007
all data available to date was analyzed. The sample totaled 2,854, 1011
males and 1843 females. The subjects ranged in age from 5 to 90.
Nationality data was available for about 1600 subjects and included
persons from Australia, Canada, Hong Kong, India, Ireland, New Zealand,
Pakistan, the Philippines, South Africa, the United Kingdom and the
United States. Thirty-nine subjects were from "other countries".
Alpha reliability
coefficients were computed for each ten-year age group (e.g. for ages 20
to 29, 30 to 39, etc.), separately for males and females. These
alpha coefficients were generally in the .80s and .90s. The
total score alphas were all in the .90s except for males in their 60s
(.86).
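For reference, the alpha coefficient reported above can be computed from item-level scores roughly as follows. This is a minimal sketch of the standard Cronbach's alpha formula; the function name and the tiny data set are made up for illustration, and the actual analysis was presumably run in a statistics package.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-person item-score lists.

    Each inner list holds one person's scores on every item (here 0 or 1).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = [pvariance(column) for column in zip(*item_scores)]
    total_variance = pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses: five persons, four items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
```

In practice the function would be applied separately to each gender and age group, as described above.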
Mean raw
scores for the five test sections and total score were plotted on a
graph for each 10-year age group and for children aged 5 to 10. The
mean for each group beginning at group 30 to 39 was computed and plotted
at the midpoint age for that group (e.g. age 35 for the 30 to 39 year
old group). These graphs were prepared for males and females separately.
For the
section scores, the resulting curves rose steadily to age 25 and then
remained essentially level to age 45. After that they gradually
declined so that the raw scores for the 60 to 69 age group were about
equal to those for the 20 to 29 age group. Of note was the more gradual
slope of the Common Hand Tools score and its climb to a later peak age,
the 50s for women and the 60s for men.
These curves
were then used to determine estimated means for each specific age from 5
to 25. Standard deviations for these ages were also estimated based on
data averaged across age groups, e.g. 10 to 19, 20 to 25. These
estimated means were close to the actual means for each teen year, e.g.
14, 15 and 16 but provided a smoothly consistent rise that seems a more
reasonable basis for future norm data. The sample sizes were small for
ages below 10 and the actual scores appear to be for very bright
children. Therefore, for this age range, 5 to 10, the smoothed curve
means differ considerably from the actual means.
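The curves described above were read from graphs. As a rough stand-in for that graphical step, estimated means for individual ages can be obtained by interpolating between a few plotted points. The sketch below uses simple linear interpolation, which is an assumption and not the author's procedure, and the three anchor points are hypothetical values chosen only for illustration.

```python
def interpolate_means(anchor_points, ages):
    """Estimate a mean for each age by linear interpolation.

    anchor_points: list of (age, mean) pairs, e.g. group means plotted
    at group midpoint ages. Returns a dict mapping age to estimated mean.
    """
    points = sorted(anchor_points)
    estimates = {}
    for age in ages:
        for (age0, mean0), (age1, mean1) in zip(points, points[1:]):
            if age0 <= age <= age1:
                fraction = (age - age0) / (age1 - age0)
                estimates[age] = mean0 + (mean1 - mean0) * fraction
                break
        else:
            raise ValueError(f"age {age} lies outside the anchor points")
    return estimates


# Hypothetical anchor points (age, mean raw score), for illustration only.
anchors = [(5, 12.0), (15, 21.0), (25, 24.0)]
estimated = interpolate_means(anchors, range(5, 26))
print(estimated[10], estimated[20])
```

A hand-drawn or fitted curve would be smoother than these straight segments, but the principle of reading a per-age estimate off a curve through group means is the same.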
Similarly, standard
deviations tended to vary from age to age, so mean standard deviations
were computed. The smoothed score means and mean standard deviations
are used for I.Q. report computations as of late September, 2007, as
these are judged to closely approximate the true values for the
English-speaking population likely to take the test over the Internet.
For example, consider
these actual and smoothed scores for the physics sub-test, ages 5
through 25:
Means and Standard Deviations for Physics Subtest, Males, Actual and Smoothed, Ages 5 to 25
Age | Actual Mean | Smoothed Mean | Actual Standard Deviation | Mean Standard Deviation
5 | 20.7 | 11.5 | 6.7 | 6.2
6 | 16.8 | 12.8 | 6.2 | 6.2
7 | 14.6 | 14.2 | 7.0 | 6.2
8 | 15.6 | 15.0 | 6.9 | 6.2
9 | 18.4 | 16.5 | 5.9 | 6.2
10 | 17.1 | 18.7 | 7.1 | 6.7
11 | 17.4 | 19.3 | 7.6 | 6.7
12 | 20.5 | 20.0 | 6.3 | 6.7
13 | 20.9 | 20.4 | 6.8 | 6.7
14 | 20.9 | 20.6 | 7.0 | 6.7
15 | 22.5 | 21.0 | 6.3 | 6.7
16 | 22.5 | 21.4 | 5.6 | 6.7
17 | 23.0 | 21.7 | 6.0 | 6.7
18 | 21.1 | 22.1 | 7.5 | 6.7
19 | 22.8 | 22.4 | 7.0 | 6.7
20 | 23.8 | 22.7 | 6.2 | 6.6
21 | 23.9 | 22.9 | 5.8 | 6.6
22 | 23.7 | 23.2 | 7.7 | 6.6
23 | 21.6 | 23.5 | 6.2 | 6.6
24 | 23.3 | 23.7 | 6.5 | 6.6
25 | 21.9 | 24.0 | 7.8 | 6.6
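The gap between actual and smoothed means in the table above can be summarized with a short calculation. The sketch below copies the two mean columns from the table; the variable and function names are illustrative. It checks that the smoothed means rise at every age and compares the average size of the gap for ages 5 to 10 with that for ages 11 to 25.

```python
ages = list(range(5, 26))

# Actual and smoothed means for the physics sub-test, males, from the table.
actual = [20.7, 16.8, 14.6, 15.6, 18.4, 17.1, 17.4, 20.5, 20.9, 20.9, 22.5,
          22.5, 23.0, 21.1, 22.8, 23.8, 23.9, 23.7, 21.6, 23.3, 21.9]
smoothed = [11.5, 12.8, 14.2, 15.0, 16.5, 18.7, 19.3, 20.0, 20.4, 20.6, 21.0,
            21.4, 21.7, 22.1, 22.4, 22.7, 22.9, 23.2, 23.5, 23.7, 24.0]

# Difference (actual minus smoothed) for each age, rounded to one decimal.
gaps = {age: round(a - s, 1) for age, a, s in zip(ages, actual, smoothed)}


def mean_abs_gap(first_age: int, last_age: int) -> float:
    """Mean absolute difference between actual and smoothed means."""
    selected = [abs(gaps[age]) for age in range(first_age, last_age + 1)]
    return sum(selected) / len(selected)


gap_young = mean_abs_gap(5, 10)    # ages with small samples
gap_older = mean_abs_gap(11, 25)
smoothed_rises = all(b > a for a, b in zip(smoothed, smoothed[1:]))
actual_rises = all(b > a for a, b in zip(actual, actual[1:]))
```

The largest single gap is at age 5, where the actual mean exceeds the smoothed mean by 9.2 raw-score points. The average gap is about 3 points for ages 5 to 10 and about 1 point for ages 11 to 25, and only the smoothed column rises at every age.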
The total raw
score (across all five sections) tended to rise steadily to the 50 to 59
age group and then decline slightly in a smooth arc.
Norms as of late September, 2007
Norms as of late
September, 2007 will be by gender and age, for each specific age from 5
to 25 and in clusters of age (e.g. 30 to 39) above 25. The total sample
as of this norming is 2854.
References
media.wiley: http://media.wiley.com/product_data/excerpt/52/04712829/0471282952.pdf
Tulsky, D. and Zhu, J., WAIS-III, WMS-III Technical Manual, The Psychological Corporation, San Antonio, Chicago, New York, 1997, (p. 98).
van der Sluis, S., Posthuma, D., Dolan, C., Geus, E., Colom, R. and Boomsma, D., Sex differences on the Dutch WAIS-III, Intelligence, Vol. 34, p. 283, 2006.
Wechsler, David, The Measurement and Appraisal of Adult Intelligence, The Williams and Wilkins Company, Baltimore, 1958.