Spatial modeling, covariate measurement error and design issues in environmental epidemiology

In this thesis we develop methods to resolve a series of problems motivated by the analysis of administrative data to help explain geographical variation in disease rates. The conditional auto-regressive (CAR) structure within a hierarchical generalized linear model offers a robust, flexible, and popular class of models for the exploration and analysis of geographical variation across small areas. However, the existing methodology is limited by a lack of modeling strategies for individual level covariate data. We propose an individual level covariate adjusted conditional auto-regressive (indiCAR) model that incorporates both individual and area level covariates while adjusting for spatial correlation in disease rates. We also extend the indiCAR method to a semiparametric mixed model framework that allows adjustment for smooth covariate effects (smooth-indiCAR). We illustrate the applicability of both methods in a distributed computing framework, which extends their application to Big Data settings involving a large number of individual/group level covariates. We evaluate the performance of indiCAR and smooth-indiCAR through simulation studies. Our results indicate that both methods provide reliable estimates of all the regression and random effect parameters.
The estimated regression coefficient from CAR modeling, however, appears to be sensitive to the assumed spatial correlation structure. We hypothesize that such sensitivity is especially likely to occur when the covariate of interest has been measured with error. We quantify the bias induced by covariate measurement error, showing that the amount of attenuation depends on the degree of spatial correlation in both the covariate of interest and the assumed random error from the regression model. These results explain why the estimates obtained from spatial regression modeling are often so sensitive to the assumed model error structure. We propose and develop both a parametric and a semiparametric approach to obtain bias-corrected estimates.
Statistical analysis of administrative data often helps in uncovering trends and patterns that need to be followed up via traditional epidemiologic investigations. Case-control studies are often the first choice. However, appropriate selection of controls and lack of power to detect interaction effects are the main concerns of a case-control design. We propose a variant of the classical case-control design, the exposure enriched case-control (EECC) design, where not only cases but also high (or low) exposed individuals are over-sampled, depending on the skewness of the exposure distribution. We show that judicious oversampling on exposure is feasible and can boost study power, particularly when susceptibility genes are rare and environmental exposure is highly skewed.
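The attenuation from covariate measurement error mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's spatial analysis: it assumes the simplest non-spatial case, with independent observations, a linear outcome model, and classical additive error (observed covariate = true covariate + independent noise). All names and parameter values (`simulate_attenuation`, `sd_u`, and so on) are hypothetical and chosen only for illustration. In that setting, the naive slope is expected to shrink toward zero by the factor var(x) / (var(x) + var(u)).

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=1.0, sd_x=1.0, sd_u=1.0, sd_e=0.5,
                         n=20000, seed=1):
    """Compare the naive slope with the classical attenuation prediction.

    Assumed setup (illustrative, non-spatial):
      x : true covariate,        x ~ N(0, sd_x^2)
      w : error-prone covariate, w = x + u, u ~ N(0, sd_u^2)
      y : outcome,               y = beta * x + e, e ~ N(0, sd_e^2)
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]

    naive = ols_slope(w, y)  # regress y on the mismeasured covariate
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)
    return naive, beta * reliability


naive, expected = simulate_attenuation()
print(f"naive slope: {naive:.3f}, predicted attenuated slope: {expected:.3f}")
```

With equal variances for the true covariate and the error, the predicted slope is half the true coefficient. The abstract's point is that under spatial correlation the attenuation factor also depends on the correlation structure of both the covariate and the model error, which this independent-data sketch does not capture.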