Fast approximate inference for longitudinal and multilevel data analysis

Publication Type:
Thesis
Issue Date:
2016
Generalised linear mixed models are the cornerstone of longitudinal and multilevel data analysis. However, exact inference for Bayesian mixed models with semiparametric extensions is typically intractable, so approximate inference methods are required in practice. Markov chain Monte Carlo (MCMC) is one of the most commonly used approximate inference methods in this setting, but it can be computationally intensive and often suffers from poor convergence in complex models. A faster alternative to MCMC is variational approximation, a class of deterministic algorithms that reformulates the computation of the posterior distribution as an optimisation problem, simplifies that problem and then solves the simplified problem. In this thesis, we work with a particular class of variational approximations known as mean field variational Bayes (MFVB). In essence, MFVB approximations are obtained by minimising the Kullback-Leibler divergence between the so-called approximating distribution and the posterior distribution, with the minimisation taken over the approximating distribution. We derive MFVB algorithms for a wide variety of Bayesian semiparametric mixed models with Gaussian, Student-t, Bernoulli and Poisson responses. To overcome the computational cost of the direct, naïve approach to the underlying MFVB calculations for these models, we introduce a novel, streamlined approach that involves matrix permutation and block decomposition. Through a series of numerical studies, we demonstrate that the MFVB algorithms achieve a good level of accuracy compared to an MCMC benchmark (our gold standard). Furthermore, our streamlined algorithms are shown to have a complexity that is linear in the number of groups at each level, which represents an improvement of two orders of magnitude over the naïve approach.
More importantly, the modularity of MFVB allows relatively simple extensions to more complicated scenarios, including higher-level random effects, measurement error and/or missing data problems, models with group-specific curves and real-time or online data processing. Illustrations from various real data examples are provided.
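To make the MFVB idea in the abstract concrete, the following is a minimal sketch of the direct (naïve) coordinate-ascent updates for a Gaussian random-intercept model. It is an illustration under stated assumptions, not the thesis's own algorithm: the inverse-gamma priors on the variance components, the hyperparameter values, the fixed iteration count and the function name `mfvb_random_intercept` are all hypothetical choices made here. The dense matrix inverse in each iteration is the step whose cost grows cubically with the number of groups, which is the kind of cost a streamlined approach would aim to avoid.

```python
import numpy as np


def mfvb_random_intercept(y, group, m, A=0.01, B=0.01, sigma_beta2=1e8, n_iter=100):
    """Naive MFVB for y_ij = beta0 + u_i + eps_ij (illustrative sketch).

    Assumed model: u_i ~ N(0, sigma_u^2), eps_ij ~ N(0, sigma_e^2),
    beta0 ~ N(0, sigma_beta2), sigma_u^2 and sigma_e^2 ~ Inverse-Gamma(A, B).
    Assumed factorisation: q(beta0, u) q(sigma_u^2) q(sigma_e^2).
    """
    n = y.size
    # Design matrix C = [1, Z]: intercept column plus one indicator per group.
    C = np.zeros((n, 1 + m))
    C[:, 0] = 1.0
    C[np.arange(n), 1 + group] = 1.0
    CtC = C.T @ C
    Cty = C.T @ y

    E_inv_sig_e = 1.0  # current value of E_q[1 / sigma_e^2]
    E_inv_sig_u = 1.0  # current value of E_q[1 / sigma_u^2]
    for _ in range(n_iter):
        # q(beta0, u) is Gaussian with mean mu and covariance Sigma.
        prior_prec = np.concatenate(([1.0 / sigma_beta2], np.full(m, E_inv_sig_u)))
        Sigma = np.linalg.inv(E_inv_sig_e * CtC + np.diag(prior_prec))  # O((1 + m)^3)
        mu = E_inv_sig_e * (Sigma @ Cty)

        # q(sigma_e^2) is Inverse-Gamma(A + n/2, B_e).
        resid = y - C @ mu
        B_e = B + 0.5 * (resid @ resid + np.trace(CtC @ Sigma))
        E_inv_sig_e = (A + 0.5 * n) / B_e

        # q(sigma_u^2) is Inverse-Gamma(A + m/2, B_u).
        B_u = B + 0.5 * (mu[1:] @ mu[1:] + np.trace(Sigma[1:, 1:]))
        E_inv_sig_u = (A + 0.5 * m) / B_u

    return mu, Sigma, (A + 0.5 * n, B_e), (A + 0.5 * m, B_u)


# Simulated data (values chosen purely for illustration).
rng = np.random.default_rng(0)
m, n_per = 50, 10
group = np.repeat(np.arange(m), n_per)
u_true = rng.normal(0.0, 1.0, size=m)
y = 2.0 + u_true[group] + rng.normal(0.0, 0.5, size=m * n_per)

mu, Sigma, (a_e, b_e), (a_u, b_u) = mfvb_random_intercept(y, group, m)
print("approx. posterior mean of beta0:", mu[0])
print("approx. posterior mean of sigma_e^2:", b_e / (a_e - 1))
print("approx. posterior mean of sigma_u^2:", b_u / (a_u - 1))
```

Each update has a closed form because every factor of the approximating distribution stays in a standard family, which is what makes the updates modular. In practice one would monitor the variational lower bound for convergence instead of running a fixed number of iterations.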