The effect of Hausman and Bootstrap applications to the Rasch measurement model in educational testing

Agho, Kingsley Emwinyore

Publication Type: Thesis
Issue Date: 2007

Closed Access

Filename	Description	Size	Format
01Front.pdf	contents and abstract	1.31 MB	Adobe PDF
02Whole.pdf	thesis	24.41 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Agho, Kingsley Emwinyore
dc.date.accessioned	2016-11-07T04:38:26Z
dc.date.available	2016-11-07T04:38:26Z
dc.date.issued	2007
dc.identifier.uri	http://hdl.handle.net/10453/60208
dc.description	University of Technology, Sydney. Faculty of Education.	en_AU
dc.description	NO FULL TEXT AVAILABLE. Access is restricted indefinitely. The hardcopy may be available for consultation at the UTS Library.
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	The effect of Hausman and Bootstrap applications to the Rasch measurement model in educational testing	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	closed_access

Abstract:

BACKGROUND: Since the introduction of the Rasch (1960) measurement model to educational and psychological testing, several tests of model fit have been proposed. The two most commonly used for evaluating goodness of fit to the Rasch model are the chi-square fit statistics (Linacre & Wright, 1994; Wright & Panchapakesan, 1969) and Andersen's conditional maximum likelihood ratio test (Andersen, 1973a,b). Other tests used for Rasch model selection include the Wald test, the Lagrange multiplier test (Fischer & Molenaar, 1995) and the Hausman test (Hausman, 1978; Weesie, 1999). This study uses two statistical methods, the bootstrap and the Hausman test, which were introduced to improve on existing methods for estimating Rasch fit statistics, especially when one is faced with poorly fitting items, missing item responses, or items collected at different levels of a hierarchy. The thesis is divided into three parts. First, it examines the application of fit statistics to the multilevel Rasch model and to poorly fitting items. Second, it examines five statistical methods for imputing missing dichotomously scored item responses. Third, it applies the bootstrap to the Rasch measurement model for the first time.
OBJECTIVES: The study has four main objectives: first, to determine whether the bootstrap method produces the same results as the parametric Rasch item difficulty estimates; second, to examine the Hausman test for poorly fitting items; third, to assess whether the bootstrap imputation method is statistically better than parametric imputation methods when Infit and Outfit statistics are estimated with standard Rasch software; and lastly, to use the Hausman specification test to determine fit statistics for the Rasch model at two levels of a hierarchy.
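The Infit and Outfit statistics named in the objectives have standard mean-square definitions for dichotomous items, illustrated below in a minimal Python sketch (not the thesis's own code). It assumes person abilities `theta` and item difficulties `delta` have already been estimated, for example in RUMM or WINSTEPS, and that the response matrix is complete; the function names `rasch_prob` and `item_fit` are hypothetical, and neither program's estimation or standardization routines are reproduced.

```python
import numpy as np


def rasch_prob(theta, delta):
    """Dichotomous Rasch model: P(X_ni = 1) = exp(theta_n - delta_i) / (1 + exp(theta_n - delta_i))."""
    theta = np.asarray(theta, float)[:, None]  # persons as rows
    delta = np.asarray(delta, float)[None, :]  # items as columns
    return 1.0 / (1.0 + np.exp(-(theta - delta)))


def item_fit(x, theta, delta):
    """Return (infit, outfit) mean squares per item for a complete 0/1 matrix.

    x has one row per person and one column per item (hypothetical layout).
    """
    x = np.asarray(x, float)
    p = rasch_prob(theta, delta)
    w = p * (1.0 - p)            # model variance of each response
    sq_resid = (x - p) ** 2      # squared score residuals
    # Outfit: unweighted mean of squared standardized residuals z^2 = (x - p)^2 / w
    outfit = np.mean(sq_resid / w, axis=0)
    # Infit: squared residuals summed and divided by the summed model variance
    infit = sq_resid.sum(axis=0) / w.sum(axis=0)
    return infit, outfit
```

Because Outfit averages standardized residuals without weighting, it is more sensitive to unexpected responses from persons far from an item's difficulty, whereas Infit weights each response by its model variance.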
HYPOTHESES: The main hypotheses of this thesis are that (a) bootstrap replicates (B = 1000) produce standard errors similar to those of the parametric method (N = 200); (b) the bootstrap imputation method reduces imputation bias; and (c) there are statistical differences between two-level and three-level item difficulty estimates.
PARTICIPANTS: The participants in Study 1 were 200 high school students in Nigeria (120 male, 80 female) aged 15 to 19 years (mean = 16.5 years, SD = 1.8), largely disadvantaged students from low socioeconomic backgrounds. The participants in Study 2 were 2375 high school students in Indonesia (1217 male, 1158 female) aged 14 to 16 years. Because this study focused primarily on imputation methods, only data from participants who had failed to complete between 10 and 26 percent of responses were retained, yielding a sample of 644 respondents. The participants in Study 3 were 50 students (26 male, 24 female) aged 12.5 to 13 years (mean = 12.98 years, SD = 0.14) from the International Association for the Evaluation of Educational Achievement (IEA) mathematics study of high school pupils in Australia. The sample was drawn in two stages, and only Year 9 students with complete information were used in the analysis. The schools sampled were of three types: 52% comprehensive, 26% selective academic and 22% selective vocational.
STATISTICAL METHODS: The statistical methods used were the bootstrap, the Hausman test, imputation methods and multilevel modelling. Analyses were carried out with RUMM, WINSTEPS, STATA and the R statistical package.
RESULTS: In the first study, an initial analysis using RUMM showed poor fit of the items to the Rasch model, and 1000 bootstrap replicates of the sample were then generated.
The main finding was that the simulation and bootstrap methods for estimating the Hausman test for Rasch performed statistically better than the parametric method. In the second study, five statistical methods were examined for imputing missing item responses: person mean imputation, item mean imputation, regression imputation and principal component imputation (together referred to as parametric imputation methods), and the bootstrap imputation method. The mean values and the WINSTEPS Infit, Outfit and item difficulty estimates obtained with the bootstrap imputation method were similar to those from the original sample, whereas the estimates from the parametric imputation methods were inconsistent with those of the original sample. Finally, the third study estimated item difficulties for two- and three-level Rasch models, formulated as hierarchical generalised linear models, using a 10-item data set from the second International Association for the Evaluation of Educational Achievement (IEA) mathematics study of high school pupils in Australia, conducted in 1978. On this data set, the Hausman test showed a statistical difference in item difficulty estimates between the two- and three-level Rasch models with random effects.
CONCLUSION: The main purpose of this thesis is to examine the effect of the bootstrap and the Hausman specification test for Rasch. The Hausman specification test may be superior to Andersen's likelihood ratio test because the Hausman procedure requires estimation of both the standard errors and the test statistic, whereas the bootstrap method is a resampling technique. The findings of the first study, which supported the first hypothesis, have two parts.
The first part suggested that Rasch item difficulty estimates based on a larger number of bootstrap replicates (B = 1000 or more) are not sensitive to sample size. The second part concluded that, contrary to previous suggestions in the literature, poorly fitting items need not be eliminated; rather, simulation or bootstrap methods may be considered as an alternative way of estimating fit statistics for Rasch. The second study supported the hypothesis that the bootstrap imputation method is superior and that the estimates produced by the parametric imputation methods are biased. Finally, the third study concluded that ignoring the multilevel nature of a data set adversely affects the interpretation of Rasch item difficulty estimates, which supports the third hypothesis. The findings of this thesis not only provide Rasch users with an alternative way of investigating research questions, using the bootstrap method, but also demonstrate how researchers can use the Hausman specification test to determine fit statistics for a hierarchically structured data set and for poorly fitting items.
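The Hausman statistic referred to throughout can be sketched in its textbook form (Hausman, 1978); this minimal Python illustration is not the thesis's own code. Given an estimator b_c that is consistent under both hypotheses and an estimator b_e that is efficient under the null hypothesis, H = (b_c - b_e)' (V_c - V_e)^+ (b_c - b_e) is referred to a chi-square distribution with degrees of freedom equal to the rank of V_c - V_e. The covariance matrices may come from model-based standard errors or, in the spirit of the bootstrap approach described above, from resampling persons. The names `bootstrap_cov` and `hausman_test` are hypothetical, and the `estimator` argument is a placeholder for whatever Rasch estimation routine is actually used.

```python
import numpy as np
from scipy.stats import chi2


def bootstrap_cov(data, estimator, B=1000, seed=0):
    """Nonparametric bootstrap covariance of estimator(data).

    Rows (persons) are resampled with replacement B times; `estimator` is a
    hypothetical placeholder mapping a persons-by-items array to a vector
    of item parameters.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.array([
        estimator(data[rng.integers(0, n, size=n)]) for _ in range(B)
    ])
    # rowvar=False: each row is one replicate, each column one parameter
    return np.atleast_2d(np.cov(reps, rowvar=False))


def hausman_test(b_consistent, b_efficient, v_consistent, v_efficient):
    """Return (H, df, p) for H = d' (Vc - Ve)^+ d with d = bc - be."""
    d = np.asarray(b_consistent, float) - np.asarray(b_efficient, float)
    v_diff = np.asarray(v_consistent, float) - np.asarray(v_efficient, float)
    # Moore-Penrose inverse, because Vc - Ve need not be positive definite
    # in finite samples; degrees of freedom = rank of the difference matrix.
    h = float(d @ np.linalg.pinv(v_diff) @ d)
    df = int(np.linalg.matrix_rank(v_diff))
    return h, df, float(chi2.sf(h, df))
```

Which estimator is treated as efficient under the null hypothesis has to be justified for the application at hand, for instance when comparing item difficulties from two-level and three-level models; the generalized inverse only guards against a non-invertible difference matrix and does not repair a poorly chosen comparison.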

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/60208