Modern studies of biological systems deliver high-dimensional outcome data, such as expression data from RNA sequencing, where recording every response results in large and complex datasets. Statistical analysis of these high-dimensional datasets needs to consider within-sample dependences, such as spatial correlation within the brain. Current experimental design strategy typically uses parallel groups of animals to compare a single variable, but longitudinal studies are also employed, where an individual animal or litter is measured repeatedly over time. The variable measured in these studies is often related to the amount of time between measurements, demonstrating time dependence. Statistical models for the design and analysis of longitudinal studies with high-dimensional outcomes need to consider both types of dependence. The framework for these models is yet to be established.
Why we funded it
This PhD Studentship aims to develop a novel statistical model for the design and analysis of high-dimensional longitudinal animal studies. Well-designed longitudinal studies can take advantage of time dependence to gain statistical power and reduce the number of animals required for the study.
The exact reduction in the number of animals depends on the correlation between the repeated measures. With a correlation of 0.2, the number of animals needed for the study can be reduced by 20% compared to a parallel group experimental design. To demonstrate the potential of this work, historic mouse functional brain imaging data will be analysed and an accurate reduction potential calculated retrospectively.
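The summary does not say how the 20% figure is derived. One simple calculation that gives a reduction of this size is sketched below. It assumes a single outcome, a normal approximation, and that each animal is measured under both conditions, so the variance of a within-animal difference is 2σ²(1 − ρ). The function names and the effect size, standard deviation, significance level and power are hypothetical values chosen for illustration. This is not the model proposed in the studentship.

```python
import math
from statistics import NormalDist


def parallel_n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Animals per group for a two-group comparison of means.

    Normal approximation: n = 2 * (sigma * (z_{1-alpha/2} + z_power) / delta)^2.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 2 * (sigma * z / delta) ** 2


def within_animal_n(delta, sigma, rho, alpha=0.05, power=0.8):
    """Animals needed when each animal is measured under both conditions.

    Assumption: the two measurements on one animal have correlation rho,
    so Var(difference) = 2 * sigma**2 * (1 - rho).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (1 - rho) * parallel_n_per_group(delta, sigma, alpha, power)


# Illustrative (hypothetical) inputs: effect of one standard deviation.
rho = 0.2
n_par = parallel_n_per_group(delta=1.0, sigma=1.0)
n_within = within_animal_n(delta=1.0, sigma=1.0, rho=rho)
reduction = 1 - n_within / n_par  # equals rho under these assumptions

print(math.ceil(n_par), math.ceil(n_within), f"{reduction:.0%}")
```

Under these assumptions the relative saving equals the correlation itself, so it does not depend on the chosen effect size or power. The comparison is between the number of animals in the repeated-measures design and the number per group in the parallel design. Other longitudinal designs and analyses give different relationships between correlation and sample size.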
The statistical model to be developed in this proposal will be based on generalised linear mixed models and Gaussian processes. Once developed, its statistical power will be evaluated by analysing brain imaging data from a genetic mouse model with behaviours relevant to both schizophrenia and autism. The brain shows spatial correlation patterns that statistical models must take into account, and these patterns will be used to inform the statistical model developed in this proposal. The model will allow the identification of statistically significant variation in brain metabolism and gene expression over time, and of differences between mutant and wild-type mice. This will allow for new insights into the developmental regulation of brain function in mice.
To reach sound scientific conclusions, investigators need to use sound experimental designs, followed by rigorous statistical analysis of the results. This is particularly important in studies involving animals, where, for ethical and economic reasons, we aim to reduce the number of animals used while making sure that the sample size is sufficient to gain maximal knowledge. The potential for reducing the number of animals needed by employing efficient experimental designs is huge. When animals or litters are measured repeatedly over time, we can use longitudinal design and analysis methods to gain statistical power. However, existing methods assume that each sample at each time provides only one, or at most a few, measured outcomes (Diggle, Heagerty, Liang and Zeger, 2002).
Meanwhile, modern high-throughput biotechnologies deliver very high-dimensional outcome data from a single sample. While most research acknowledges the dependence amongst different dimensions of the data (e.g. linkage disequilibrium in the genome, spatial correlation in brain function imaging), there is a need for study design and analysis methods that bridge the gap between traditional longitudinal studies and the high-dimensional world of biomedicine.
This project will develop a statistical model for the design and analysis of high-dimensional longitudinal studies, based on generalised linear mixed models and Gaussian processes. The student will apply this method to existing mouse functional brain imaging data from the lab of our collaborator Dr Neil Dawson. The aim will be to gain new scientific insights into developmental changes in mouse brain function, and to demonstrate the effectiveness of the high-dimensional longitudinal method for increasing statistical power and reducing the number of mice needed.