The effect of Hausman and Bootstrap applications to the Rasch measurement model in educational testing
- Publication Type:
- Thesis
- Issue Date:
- 2007
Closed Access
Filename | Description | Size
---|---|---
01Front.pdf | contents and abstract | 1.31 MB
02Whole.pdf | thesis | 24.41 MB
This item is closed access and not available.
NO FULL TEXT AVAILABLE. Access is restricted indefinitely.
BACKGROUND: Since the introduction of the Rasch (1960) measurement model to
educational and psychological testing, several tests for determining model fit in Rasch
have been proposed. The two most commonly used tests for evaluating goodness of fit
for Rasch are the chi-square fit statistics (Linacre & Wright, 1994; Wright &
Panchapakesan, 1969) and the conditional maximum likelihood method (Andersen,
1973a,b). Other tests proposed for Rasch are the Wald test, the Lagrange Multiplier
test (Fischer & Molenaar, 1995) and the Hausman test (Hausman, 1978; Weesie, 1999).
The two statistical methods used in this study are the bootstrap and the Hausman test.
These two methods were introduced to improve on the existing methods for estimating fit
statistics in Rasch, especially when one is faced with poorly fitting items, missing items
and items collected at different levels of a hierarchy.
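For reference, the dichotomous Rasch model that these fit tests evaluate is usually written as below. The symbols are the conventional ones and are an assumption here, since the abstract does not define notation: $\theta_n$ is the ability of person $n$, $b_i$ is the difficulty of item $i$, and $X_{ni}$ is the scored response (1 = correct, 0 = incorrect).

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```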
This thesis is subdivided into three parts. First, it sheds light on the application of the
fit statistics to the multilevel Rasch model and to poorly fitting items. Second, it
examines five different statistical methods for imputing missing dichotomously scored
items. Third, the thesis introduces the bootstrap application to the Rasch
measurement model for the first time.
OBJECTIVES: The main objectives of this study are fourfold: firstly, to determine
whether the bootstrap method produces the same result as the parametric Rasch item
difficulty estimates; secondly, to examine the Hausman test for poorly fitting items;
thirdly, to assess whether the bootstrap imputation method is statistically better than
other parametric imputation methods when estimating Infit and Outfit statistics using
standard Rasch software; and lastly, to use the Hausman specification test to
determine the fit statistics for Rasch at two levels of hierarchy.
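The Infit and Outfit statistics named in the objectives are conventionally defined as mean squares of the residuals between observed and model-expected responses. The sketch below computes both for a single item under the assumption that person abilities and the item difficulty are already known. The function names are hypothetical, and this is not the WINSTEPS implementation, only the textbook definition.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, abilities, difficulty):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 score of each person on the item.
    abilities: ability of each person in logits (assumed already estimated).
    """
    squared_residual_sum = 0.0   # sum of (x - p)^2
    variance_sum = 0.0           # sum of p * (1 - p)
    z_squared_sum = 0.0          # sum of standardised squared residuals
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        w = p * (1.0 - p)
        squared_residual_sum += (x - p) ** 2
        variance_sum += w
        z_squared_sum += (x - p) ** 2 / w
    infit = squared_residual_sum / variance_sum   # information-weighted mean square
    outfit = z_squared_sum / len(responses)       # unweighted mean square
    return infit, outfit
```

Both statistics have an expected value of about 1 when the data fit the model. Values well above 1 indicate more unexpected responses than the model predicts, and values well below 1 indicate responses that are more predictable than expected.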
HYPOTHESES: The main hypotheses of this thesis are that
(a) The bootstrap replicates (B = 1000) produce standard error values similar to those of the
parametric method (N = 200);
(b) The bootstrap imputation method reduces imputation bias; and
(c) There are statistical differences between two-level and three-level item difficulty
estimates.
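Hypothesis (a) can be illustrated with a minimal sketch of a nonparametric bootstrap for item difficulty standard errors: persons are resampled with replacement B times and the spread of the re-estimated difficulties is taken as the standard error. Everything concrete here is an assumption rather than the thesis's procedure. The difficulty estimator is a crude centred log-odds proxy, not the estimation used by RUMM or WINSTEPS, the data are simulated, and all function names are hypothetical.

```python
import math
import random
import statistics


def item_difficulties(responses):
    """Centred log-odds of an incorrect answer per item (a rough proxy estimator)."""
    n_persons = len(responses)
    n_items = len(responses[0])
    logits = []
    for j in range(n_items):
        correct = sum(person[j] for person in responses)
        # Add 0.5 to the count so all-correct or all-wrong items stay finite.
        p = (correct + 0.5) / (n_persons + 1.0)
        logits.append(math.log((1.0 - p) / p))
    centre = statistics.fmean(logits)
    return [d - centre for d in logits]


def bootstrap_se(responses, n_replicates=1000, seed=0):
    """Resample persons with replacement; return the SD of each item estimate."""
    rng = random.Random(seed)
    n_persons = len(responses)
    replicates = []
    for _ in range(n_replicates):
        sample = [responses[rng.randrange(n_persons)] for _ in range(n_persons)]
        replicates.append(item_difficulties(sample))
    # Bootstrap standard error: standard deviation across replicates, item by item.
    return [statistics.stdev(column) for column in zip(*replicates)]


def simulate(n_persons, difficulties, seed=1):
    """Generate 0/1 responses under the Rasch model (illustrative data only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for b in difficulties:
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


if __name__ == "__main__":
    data = simulate(200, [-1.0, -0.5, 0.0, 0.5, 1.0])   # N = 200 as in study 1
    print(item_difficulties(data))
    print(bootstrap_se(data, n_replicates=1000))         # B = 1000
```

Comparing these bootstrap standard errors with the model-based standard errors reported by Rasch software is the kind of check hypothesis (a) describes.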
PARTICIPANTS: The participants in study 1 comprised 200 high school students
(male = 120, female = 80) ranging in age from 15 to 19 years (mean = 16.5 years, SD = 1.8)
in Nigeria. The sample comprised largely disadvantaged students from low socioeconomic
backgrounds.
The participants in study 2 comprised 2375 high school students in Indonesia (male =
1217, female = 1158) ranging in age from 14 to 16. Since this study focused primarily
on imputation methods, only data from participants who failed to complete 10 percent
to 26 percent of responses were retained, yielding a sample of 644 respondents.
Study 3 participants comprised 50 students (male = 26, female = 24) ranging in age from
12.5 to 13 years (mean = 12.98 years, SD = 0.14) from the International Association for
the Evaluation of Educational Achievement mathematics study for high school pupils
in Australia. The sample was drawn from the population in two stages. Only year 9
students with complete information were used in the analysis. The sample used in
this study covered three types of schools: 52% comprehensive, 26% selective
academic and 22% selective vocational.
STATISTICAL METHODS: The statistical methods used were the bootstrap, the
Hausman test, imputation methods and multilevel modelling. These were computed
using the following statistical software: RUMM, WINSTEPS, STATA and the R statistical
package.
RESULTS: In the first study, initial analysis using RUMM showed that the fit of the
items to the Rasch model was poor, and 1000 bootstrap replicates of the sample were
generated. The main finding was that the simulation and bootstrap methods for
estimating the Hausman test for Rasch were statistically better than the parametric
method.
In the second study, five statistical methods were examined for imputing missing items:
person mean imputation substitution, item mean imputation substitution, regression
imputation substitution, principal component imputation substitution (together called
parametric imputation methods) and the bootstrap imputation method. Results
showed that the mean values and the WINSTEPS Infit, Outfit and item difficulty for the
bootstrap imputation method were similar to the estimates from the original sample, while
the estimates from the parametric imputation methods were inconsistent with the estimates of
the original sample.
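Two of the parametric methods listed above, person mean and item mean substitution, are simple enough to sketch. The version below is an illustration under stated assumptions, not the thesis's procedure: missing responses are coded as `None`, and the substituted mean is rounded back to 0 or 1 at a threshold of 0.5 so that the data remain dichotomous. The abstract does not say how, or whether, the substituted values were rounded, and the function names are hypothetical.

```python
def _fill_value(observed, threshold):
    """Round the mean of the observed 0/1 scores to a 0/1 fill value."""
    if not observed:
        return 0  # Assumption: with nothing observed, fall back to 0 (incorrect).
    return 1 if sum(observed) / len(observed) >= threshold else 0


def person_mean_impute(data, threshold=0.5):
    """Replace each missing response with the rounded mean of that person's scores."""
    completed = []
    for row in data:
        fill = _fill_value([x for x in row if x is not None], threshold)
        completed.append([fill if x is None else x for x in row])
    return completed


def item_mean_impute(data, threshold=0.5):
    """Replace each missing response with the rounded mean of that item's scores."""
    n_items = len(data[0])
    fills = [
        _fill_value([row[j] for row in data if row[j] is not None], threshold)
        for j in range(n_items)
    ]
    return [[fills[j] if x is None else x for j, x in enumerate(row)] for row in data]


if __name__ == "__main__":
    # Rows are persons, columns are items; None marks a missing response.
    data = [[1, None, 1], [0, 0, None], [1, 1, 0]]
    print(person_mean_impute(data))
    print(item_mean_impute(data))
```

Both functions fill every gap deterministically from a single mean, so the completed data carry less variability than a fully observed data set would.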
Finally, the third study provided item difficulty estimates for two- and three-level
Rasch models, formulated as hierarchical generalised linear models, using a 10-item data
set from the second International Association for the Evaluation of Educational
Achievement (IEA) mathematics study for high school pupils in Australia, which was
conducted in 1978. The findings from the data set suggested that the Hausman test
shows a statistical difference in item difficulty estimates between the two- and three-level
Rasch models with random effects.
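For readers unfamiliar with it, the general form of the Hausman (1978) statistic used in such comparisons is given below. The notation is an assumption, not taken from the thesis: $\hat{\beta}_1$ is a vector of estimates that is consistent under both the null and the alternative, $\hat{\beta}_0$ is a vector that is efficient under the null, $V_1$ and $V_0$ are their estimated covariance matrices, and $k$ is the number of parameters compared. How the two- and three-level item difficulty estimates map onto these roles is not stated in the abstract.

```latex
H = (\hat{\beta}_1 - \hat{\beta}_0)^{\top} \, (V_1 - V_0)^{-1} \, (\hat{\beta}_1 - \hat{\beta}_0)
\;\overset{a}{\sim}\; \chi^2_k
```

A large value of $H$ relative to the $\chi^2_k$ reference distribution indicates a systematic difference between the two sets of estimates.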
CONCLUSION: The main purpose of this thesis was to examine the effect of applying the
bootstrap and the Hausman specification test to Rasch. The Hausman specification test
may be superior to Andersen's likelihood ratio test statistic because the Hausman test
procedure requires estimation of both the standard errors and the test statistic.
The bootstrap method is a resampling technique.
The findings of the first study, which supported the first hypothesis, are in two parts. The
first part suggested that, with a large number of bootstrap replicates (B = 1000 or more),
the item difficulty estimates for Rasch are not sensitive to sample size. The second part
concluded that there was no need to eliminate poorly fitting items, as suggested
previously in the literature; rather, simulation or bootstrap methods may be considered
as an alternative for estimating the fit statistics for Rasch. The second study
supported the hypothesis that the bootstrap imputation method was superior and that the
estimates produced by the parametric imputation methods were biased. Finally, the
third study concluded that ignoring the multilevel nature of a data set has an
adverse effect on how one interprets item difficulty estimates in Rasch, which
is in favour of the third hypothesis.
The findings from this thesis will not only provide Rasch users with an alternative way
of investigating research questions by using the bootstrap method but also demonstrate
how researchers can use the Hausman specification test to determine the fit statistics
for a hierarchically structured data set and for poorly fitting items.